ISSN 2071-8594

Russian Academy of Sciences

Editor-in-Chief

Gennady Osipov

A.O. Shelmanov, D.A. Devyatkin, V.A. Isakov, I.V. Smirnov

Open information extraction from texts. Part II. Extraction of semantic relations using unsupervised machine learning

Abstract.

In this paper, we discuss open information extraction from natural language texts. We present an approach to extracting semantic relations using unsupervised machine learning. The approach is based on deep clustering methods, in which a clustering algorithm is integrated into a multi-layer autoencoder neural network. This makes it possible to generalize surface relations (triplets) into semantic relations. The paper also describes a method for extracting surface relations.

Keywords:

open information extraction, semantic relations, unsupervised machine learning, neural networks, autoencoder.

PP. 39-49.

DOI 10.14357/20718594190204
