ISSN 2071-8594

Russian Academy of Sciences

Editor-in-Chief

Gennady Osipov

D.A. Devyatkin, R.E. Suvorov, I.V. Sochenkov. Information Retrieval System for Decision Support: Arctic-related Mass Media Case Study

Abstract.

The paper addresses the problem of building a comprehensive information retrieval system that supports decision making within a specified broad topic area. We analyze the requirements for such a system, the types of information sources, and typical search queries, and we propose an architecture and an integrated processing pipeline. We also present a case study in the field of Arctic exploration (oil and mining, ecological issues, etc.) and report its results, including the most prominent topics and typical associations between entities.

Keywords:

Information retrieval, mass media monitoring, event detection, information extraction, relation extraction, knowledge base, decision support.
