ISSN 2071-8594

Russian Academy of Sciences

Editor-in-Chief

Gennady Osipov

I.F. Kuzminov, P.D. Bakhtin, A.A. Timofeev, E.E. Khabirova, P.A. Lobanova, N.I. Zurabyan

Modern Natural Language Processing Technologies for Solving Strategic Analytics Tasks

Abstract.

This article reviews the latest natural language processing (NLP) technologies that can be applied in strategic analytics. The introduction outlines the main problems in this field and the specific tasks that NLP tools can help solve. The article then surveys the main application areas in which these tools are used, reviews recent advances in NLP, and assesses their potential. It concludes with a discussion of how NLP methods should develop to meet the future needs of strategic analytics.

Keywords:

NLP, artificial intelligence, text mining, strategic analytics.

PP. 3-16.

DOI 10.14357/20718594200101

References

1. Went P. How does the digital economy create ‘alternative data’? [Electronic resource]. URL: https://link.medium.com/dCTNLPPuEW (accessed 02.10.2019).
2. Kuz'minov I. F., Loginova I. V., Lobanova P. A. Perspektivy ispol'zovanija tehnologij analiza bol'shih dannyh dlja strategicheskoj analitiki agropromyshlennogo kompleksa // Saharnaja svekla. – 2018. – No. 9. – P. 2-7.
3. Sokolov A. V., Chulok A. A. Dolgosrochnyj prognoz nauchno-tehnologicheskogo razvitija Rossii na period do 2030 goda: kljuchevye osobennosti i pervye rezul'taty // Forsajt. – 2012. – Vol. 6. – No. 1. – P. 12-25.
4. Osipov G. et al. Information retrieval for R&D support // Professional search in the modern world. – Springer, Cham, 2014. – P. 45-69.
5. Meissner D. Approaches for Developing National STI Strategies // STI Policy Review. – 2014. – Vol. 5. – No. 1. – P. 34-56.
6. King D. A., Thomas S. M. Taking science out of the box: foresight recast // Science. – 2007. – Vol. 316. – No. 5832. – P. 1701-1702.
7. Martin B. R. Foresight in science and technology // Technology Analysis & Strategic Management. – 1995. – Vol. 7. – No. 2. – P. 139-168.
8. Sokolov A. et al. Future of S&T: Delphi survey results // Foresight and STI Governance (Foresight-Russia till No. 3/2015). – 2009. – Vol. 3. – No. 3. – P. 40-58.
9. Kuz'minov I. F., Lobanova P. A., Loginova I. V. Tehnologija analiza bol'shih dannyh dlja strategicheskoj analitiki otrasli // Kombikorma. – 2019. – No. 4. – P. 46-52.
10. Berry M. W. Survey of text mining // Computing Reviews. – 2004. – Vol. 45. – No. 9. – P. 548.
11. Patents in Artificial Intelligence - 1969-2017. [Electronic resource]. URL: https://www.econsight.ch/wp-content/ai/index.html (accessed 02.10.2019).
12. MarketsandMarkets. Natural Language Processing Market worth 16.07 Billion USD by 2021. [Electronic resource]. URL: https://www.marketsandmarkets.com/PressReleases/natural-language-processing-nlp.asp (accessed 02.10.2019).
13. Grand View Research. Voice and Speech Recognition Market Size, Share & Trends Analysis Report, By Function, By Technology (AI, Non-AI), By Vertical (Healthcare, BFSI, Automotive), And Segment Forecasts, 2018 – 2025. [Electronic resource]. URL: https://www.grandviewresearch.com/industry-analysis/voice-recognition-market (accessed 02.10.2019).
14. Kim M. 2019 UI and UX Design Trends. [Electronic resource]. URL: https://uxplanet.org/2019-ui-and-ux-design-trends-92dfa8323225 (accessed 14.10.2019).
15. MarketsandMarkets. Recommendation Engine Market by Type (Collaborative Filtering, Content-Based Filtering, and Hybrid Recommendation), Deployment Mode (Cloud and On-Premises), Technology, Application, End-User, and Region - Global Forecast to 2022. [Electronic resource]. URL: https://www.marketsandmarkets.com/Market-Reports/recommendation-engine-market-151385035.html (accessed 02.10.2019).
16. MarketsandMarkets. Natural Language Processing (NLP) in Healthcare and Life Sciences Market by Component (Technology and Services), Type (Rule-based, Statistical, and Hybrid), Application, Deployment Mode (Cloud and On Premise) and Region - Global Forecast to 2021. [Electronic resource]. URL: https://www.marketsandmarkets.com/Market-Reports/healthcare-lifesciences-nlp-market-131821021.html (accessed 02.10.2019).
17. Mordor Intelligence. Chatbots Market Size - Segmented by Type (Solution, Service), Deployment (On-Premise, Cloud), End-User Vertical (BFSI, Healthcare, IT and Telecommunication, Retail, Utilities, Government), and Region - Growth, Trends and Forecast (2019 - 2024). [Electronic resource]. URL: https://www.mordorintelligence.com/industry-reports/chatbots-market (accessed 02.10.2019).
18. Tractica. Emotion Recognition and Sentiment Analysis Market to Reach $3.8 Billion by 2025. [Electronic resource]. URL: https://www.tractica.com/newsroom/press-releases/emotion-recognition-and-sentiment-analysis-market-to-reach-3-8-billion-by-2025/ (accessed 02.10.2019).
19. Grand View Research. Machine Translation Market Size To Reach $983.3 Million by 2022. [Electronic resource]. URL: https://www.grandviewresearch.com/press-release/global-machine-translation-market (accessed 02.10.2019).
20. HR Technologist. HR Tech Marketplace Worth $400 Billion. [Electronic resource]. URL: https://www.hrtechnologist.com/news/digital-transformation/hr-tech-marketplace-worth-400-billion/ (accessed 14.10.2019).
21. Jain M. R. Why Google Needed a Graph Serving System. [Electronic resource]. URL: https://blog.dgraph.io/post/why-google-needed-graph-serving-system/ (accessed 14.10.2019).
22. Garcia C. Gulliver’s engine. [Electronic resource]. URL: https://computerhistory.org/blog/gullivers-engine/?key=gullivers-engine (accessed 14.10.2019).
23. SCIgen - An Automatic CS Paper Generator. [Electronic resource]. URL: https://pdos.csail.mit.edu/archive/scigen/ (accessed 14.10.2019).
24. Mathgen. [Electronic resource]. URL: http://thatsmathematics.com/mathgen/ (accessed 14.10.2019).
25. Laplante P.A. 3.7.5 Paper Generators // Technical Writing: A Practical Guide for Engineers and Scientists. – CRC Press, 2011. – P. 56–59.
26. Gibson J. From 0 to 2 million in 4 years. [Electronic resource]. URL: https://medium.com/vizzuality-blog/global-forest-watch-from-0-to-2-million-in-4-years-32f63cd9a46 (accessed 14.10.2019).
27. Leung J. The Benefits of Pair Programming. [Electronic resource]. URL: https://medium.com/better-programming/when-pair-programming-works-it-works-really-well-heres-why-c51857bbcf0f (accessed 14.10.2019).
28. Saravia E. NLP 2018 Highlights. [Electronic resource]. URL: http://elvissaravia.com/nlp-highlights-2018/ (accessed 14.10.2019).
29. Transformers from scratch. [Electronic resource]. URL: http://www.peterbloem.nl/blog/transformers (accessed 14.10.2019).
30. Il'vovskij D., Chernjak E. Glubinnoe obuchenie dlja avtomaticheskoj obrabotki tekstov // Otkrytye sistemy. SUBD. – 2017. – No. 2. – P. 26-29.
31. Lau J. H., Baldwin T. An Empirical Evaluation of doc2vec with Practical Insights into Document Embedding Generation // Proceedings of the 1st Workshop on Representation Learning for NLP. – 2016. – P. 78-86.
32. STSbenchmark. [Electronic resource]. URL: http://ixa2.si.ehu.es/stswiki/index.php/STSbenchmark (accessed 14.10.2019).
33. Mikolov T., Sutskever I., Chen K., Corrado G., Dean J. Distributed Representations of Words and Phrases and their Compositionality // Proceedings of NIPS. – 2013. [Electronic resource]. URL: https://arxiv.org/abs/1310.4546 (accessed 14.10.2019).
34. Mikolov T., Chen K., Corrado G., Dean J. Efficient Estimation of Word Representations in Vector Space // CoRR. – 2013. – abs/1301.3781. [Electronic resource]. URL: http://arxiv.org/abs/1301.3781.
35. Huber F. King - Man + Woman = King? [Electronic resource]. URL: https://blog.esciencecenter.nl/king-man-woman-king-9a7fd2935a85 (accessed 14.10.2019).
36. Singh C. Fine-Tune ERNIE 2.0 for Text Classification. [Electronic resource]. URL: https://towardsdatascience.com/https-medium-comgaganmanku96-fine-tune-ernie-2-0-for-text-classification-6f32bee9bf3c (accessed 14.10.2019).
37. Rajasekharan A. Deconstructing BERT. [Electronic resource]. URL: https://towardsdatascience.com/deconstructing-bert-reveals-clues-to-its-state-of-art-performance-in-nlp-tasks-76a7e828c0f1 (accessed 14.10.2019).
38. How To Make Custom AI-Generated Text With GPT-2. [Electronic resource]. URL: https://minimaxir.com/2019/09/howto-gpt2/ (accessed 14.10.2019).
39. Write With Transformer. [Electronic resource]. URL: https://transformer.huggingface.co/ (accessed 14.10.2019).
40. Ma E. Combing LDA and Word Embeddings for topic modeling. [Electronic resource]. URL: https://towardsdatascience.com/combing-lda-and-word-embeddings-for-topic-modeling-fe4a1315a5b4 (accessed 14.10.2019).
41. He R., Lee W.S., Ng H.T., Dahlmeier D. An Unsupervised Neural Attention Model for Aspect Extraction // Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). – 2017. – P. 388-397.
42. Lee J.-S., Hsiang J. Patent Claim Generation by Fine-Tuning OpenAI GPT-2. [Electronic resource]. URL: https://arxiv.org/abs/1907.02052 (accessed 14.10.2019).
43. Richter M. Comparing Word Embeddings. [Electronic resource]. URL: https://towardsdatascience.com/comparing-word-embeddings-c2efd2455fe3 (accessed 14.10.2019).
44. Hackathorn R. DxR: Bridging 2D Data Visualization into Immersive Spaces. [Electronic resource]. URL: https://towardsdatascience.com/dxr-bridging-2d-data-visualization-into-immersive-spaces-d77a20d5f9e9 (accessed 14.10.2019).
45. Reimers N., Gurevych I. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. [Electronic resource]. URL: https://arxiv.org/pdf/1908.10084.pdf (accessed 14.10.2019).
46. Sinclair C. Clustering Using OPTICS. [Electronic resource]. URL: https://towardsdatascience.com/clustering-using-optics-cac1d10ed7a7 (accessed 14.10.2019).
47. Oskolkov N. How to cluster in High Dimensions. [Electronic resource]. URL: https://towardsdatascience.com/how-to-cluster-in-high-dimensions-4ef693bacc6 (accessed 14.10.2019).
48. Kopichinsky G. Improve Heavy Elasticsearch Aggregations with Random Score and Sampler Aggregation. [Electronic resource]. URL: https://medium.com/cognigo/improve-heavy-elasticsearch-aggregations-with-random-score-and-sampler-aggregation-9e1857271059 (accessed 14.10.2019).
49. Gutteridge L. What I’m Telling Business People About Why Relational Databases Are So Bad. [Electronic resource]. URL: https://codeburst.io/what-im-telling-business-people-about-why-relational-databases-are-so-bad-6f38d3d6c995 (accessed 14.10.2019).
50. Liu Y., Lapata M. Text Summarization with Pretrained Encoders. [Electronic resource]. URL: https://www.researchgate.net/publication/335337738_Text_Summarization_with_Pretrained_Encoders (accessed 14.10.2019).
51. Liu Y. Fine-tune BERT for Extractive Summarization. [Electronic resource]. URL: https://www.researchgate.net/publication/331986865_Fine-tune_BERT_for_Extractive_Summarization (accessed 14.10.2019).
52. Zhang H., Xu J., Wang J. Pretraining-Based Natural Language Generation for Text Summarization. [Electronic resource]. URL: https://arxiv.org/pdf/1902.09243.pdf (accessed 14.10.2019).
53. Metz C., Blumenthal S. How A.I. Could Be Weaponized to Spread Disinformation. [Electronic resource]. URL: https://www.nytimes.com/interactive/2019/06/07/technology/ai-text-disinformation.html (accessed 14.10.2019).
54. Grover — A State-of-the-Art Defense against Neural Fake News. [Electronic resource]. URL: https://grover.allenai.org/ (accessed 14.10.2019).
55. Saravia E. XLNet outperforms BERT on several NLP Tasks. [Electronic resource]. URL: https://medium.com/dair-ai/xlnet-outperforms-bert-on-several-nlp-tasks-9ec867bb563b (accessed 14.10.2019).
56. Liu Y., Ott M., Goyal N., Du J., Joshi M., Chen D., Levy O., Lewis M., Zettlemoyer L., Stoyanov V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. [Electronic resource]. URL: http://arxiv.org/abs/1907.11692 (accessed 14.10.2019).
57. Trending Research. [Electronic resource]. URL: https://paperswithcode.com/ (accessed 14.10.2019).
58. NLP-progress. [Electronic resource]. URL: http://nlpprogress.com/ (accessed 14.10.2019).
59. Anan'eva M.I., Devjatkin D.A., Kobozeva M.V., Smirnov I.V., Solov'ev F.N., Chepovskij A.M. Issledovanie harakteristik tekstov protivopravnogo soderzhanija // Trudy Instituta sistemnogo analiza Rossijskoj akademii nauk. – 2017. – Vol. 67. – No. 3. – P. 86-97.
60. Stankevich M.A., Isakov V.A., Devjatkin D.A., Smirnov I.V. Postroenie klassifikacionnyh modelej dlja zadachi obnaruzhenija depressii u pol'zovatelej social'nyh setej // In: Informatika, upravlenie i sistemnyj analiz: Trudy V Vserossijskoj nauchnoj konferencii molodyh uchenyh s mezhdunarodnym uchastiem. – 2018. – P. 237-246.
61. Soms N.L., Dobrov A.V. AI-tehnologii NLU i Ontological Semantics v medicinskih jekspertnyh sistemah. [Electronic resource]. URL: https://armit.ru/medsoft/2019/presentation/Day_01/16_18/3.pdf (accessed 14.10.2019).
62. Efimenko I.V. Semanticheskij analiz tekstov v oblasti mediciny i bioteha: problemy i perspektivy. [Electronic resource]. URL: https://armit.ru/medsoft/2019/presentation/Day_01/16_18/6.pdf (accessed 14.10.2019).
63. Russkih A.N. Analiz otzyvov klientov o vedushhih laboratornyh provajderah. [Electronic resource]. URL: https://armit.ru/medsoft/2019/presentation/Day_01/16_18/4.pdf (accessed 14.10.2019).
64. Osipov G. et al. Exactus expert—search and analytical engine for research and development support // Novel Applications of Intelligent Systems. – Springer, Cham, 2016. – P. 269-285.
65. Monitoring trendov. [Electronic resource]. URL: https://digitaltrends.rt.ru (accessed 14.10.2019).
66. Pivovarov I.O. et al. Al'manah «Iskusstvennyj intellekt» / Ed. by S.A. Shumskij. [Electronic resource]. URL: http://www.aireport.ru/ (accessed 14.10.2019).
67. Osipov G.S., Smirnov I.V., Tihomirov I.A., Sochenkov I.V., Zubarev D.V., Isakov V.A. Tehnologii semanticheskogo poiska zaimstvovanij v nauchnyh tekstah // Kniga. Kul'tura. Obrazovanie. Innovacii (“Krym-2016”) Materialy Vtorogo Mezhdunarodnogo professional'nogo foruma. – 2016. – P. 311-313.
68. Ena O.V., Nagaev K.V. Avtomatizacija processov razrabotki tehnologicheskih dorozhnyh kart. Raschet integral'nyh pokazatelej primenimosti // Biznes-informatika. – 2013. – No. 3 (25). – P. 56-62.
69. Kuz'minov I. F., Bahtin P. D., Neznanov A. A., Lobanova P.A. Issledovanija struktury nauchnogo soobshhestva na osnove semanticheskogo analiza: vyjavlenie i klasterizacija centrov kompetencij i tematik // In: Upravlenie nauchnymi issledovanijami i razrabotkami. Gosudarstvo i nauka: novye modeli upravlenija – 2018. – Trudy Chetvertoj nauchno-prakticheskoj konferencii (26 nojabrja 2018 g., Moskva). – IPU RAN, 2019. – P. 128-137.
70. McClory P. Is AI Our Last Hope for a Big Disruption? Or Just The Newest One? [Electronic resource]. URL: https://towardsdatascience.com/is-ai-our-last-hope-for-a-big-disruption-or-just-the-newest-one-357f9c3db618 (accessed 14.10.2019).
71. Giacaglia G. The Road to Artificial General Intelligence. [Electronic resource]. URL: https://medium.com/datadriveninvestor/the-road-to-artificial-general-intelligence-cfcb37bdc432 (accessed 14.10.2019).
72. Kolakowski N. Unleashing Machine Learning on Literature’s Great Works. [Electronic resource]. URL: https://link.medium.com/kfFBclDJHW (accessed 14.10.2019).
73. Schuchmann S. History of the first AI Winter. [Electronic resource]. URL: https://link.medium.com/SSfaFF1PGW (accessed 14.10.2019).