ISSN 2071-8594

Russian Academy of Sciences

Editor-in-Chief

Gennady Osipov

V. I. Gorodetsky, O. N. Tushkanova. Semantic technologies for semantic applications. Part 2. Models of comparative text semantics

Abstract.

Both parts of the paper discuss the basic aspects of semantic computing, semantic technologies, and semantic applications as applied to big-data processing of natural-language (NL) texts for knowledge extraction and decision-making. The basic components of the corresponding systems and technologies are reviewed: ontologies and semantic models of their use, semantic resources, and the semantic component. The semantic resources contain knowledge about word semantics and the means for refining it. The semantic component of the technology is used to formally describe the meaning of NL entities and to numerically evaluate their pairwise semantic similarity. This part focuses on numerical models of pairwise semantic similarity of NL entities. These models are important for semantic clustering and classification of texts and for their various applications. Various types of semantic relatedness and semantic similarity measures for NL entities are discussed and compared in the context of semantic computing tasks. Problems that constrain the practical use of semantic technologies in the development of semantic applications are analyzed.

Keywords:

semantic technology, semantic computing, semantic resource, comparative semantics, semantic relatedness, semantic similarity.

Pp. 49-61.

DOI 10.14357/20718594190105
