ISSN 2071-8594

Russian Academy of Sciences

Editor-in-Chief

Gennady Osipov

M.I. Suvorova, M.V. Kobozeva, E.G. Sokolova, S.Y. Toldova. Extraction of Script Knowledge from Texts. Part I. The Task and the Review of the State of the Art

Abstract:

This paper discusses the importance of automatic extraction of script knowledge for natural language understanding. We discuss theoretical approaches to the description of text structure: story grammars, scripts, frames, and narrative schemas. We list research fields in which automatically extracted script knowledge can improve precision and recall, such as automatic summarization, information extraction, and coreference resolution. The article also presents popular approaches to the automatic extraction of script knowledge and methods for evaluating them. In addition, we present a list of datasets that can be used to train and test new models.

Keywords:

script knowledge extraction, narrative schemas, scripts, frames, natural language processing.

Pp. 17-26.

DOI 10.14357/20718594200102

References

1. Mann W. C., Thompson S. A. 1988. Rhetorical structure theory: Toward a functional theory of text organization //Text - Interdisciplinary Journal for the Study of Discourse. 8(3): 243-281.
2. Chambers N., Jurafsky D. 2010. A Database of Narrative Schemas //LREC.
3. Propp V. 2010. Morphology of the Folktale. University of Texas Press. Vol. 9.
4. Mitrofanova O. 2019. Issledovaniye strukturnoy organizatsii khudozhestvennogo proizvedeniya s pomoshch'yu tematicheskogo modelirovaniya: opyt raboty s tekstom romana «Master i Margarita» M.A. Bulgakova [A study of the structural organization of fiction using topic modeling: the case study of the novel “The Master and Margarita” by Bulgakov] // Trudy mezhdunarodnoy konferentsii «Korpusnaya lingvistika-2019» [Proceedings of the international conference “Corpus Linguistics-2019”]. SPb. 387-394.
5. Martem'yanov Yu. 2004. Logika situatsiy [Logic of situations] // Stroyeniye teksta. Terminologichnost' slov. [Text structure. Terminology of words]. Moscow: YASK.
6. Baranov A.N. 2001. Vvedeniye v prikladnuyu lingvistiku [Introduction to Applied Linguistics]. Moscow: Editorial URSS.
7. Bodrova A. A., Bocharov V. V. 2014. Relationship Extraction from Literary Fiction //Dialogue: International Conference on Computational Linguistics.
8. Iyyer M., Guha A., Chaturvedi S., Boyd-Graber J., Daumé III H. 2016. Feuding families and former friends: Unsupervised learning for dynamic fictional relationships //Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 1534-1544.
9. Shenk R., Birnbaum L., May J. 1989. K integracii semantiki i pragmatiki [Integrating Semantics and Pragmatics], translated by G.Y. Levin //Novoe v zarubezhnoj lingvistike. 24: 32-47. (In Russian).
10. Minsky M. L. 1977. Frame-system theory //Thinking.
11. Charniak E. 1978. On the use of framed knowledge in language comprehension //Artificial Intelligence. 11(3): 225-265.
12. Schank R. C., Abelson R. P. 1977. Scripts //Plans, Goals and Understanding.
13. Fillmore C. J. et al. 1976. Frame semantics and the nature of language //Annals of the New York Academy of Sciences: Conference on the origin and development of language and speech. Vol. 280. No. 1. 20-32.
14. Schank R. C., Abelson R. P. 1975. Scripts, plans, and knowledge //IJCAI. 151-157.
15. Darbanov B. 2017. Teoriya skhemy, freym, skript, stsenariy kak modeli ponimaniya teksta [Schema theory, frame, script, and scenario as models of text understanding] // Aktual'nyye problemy gumanitarnykh i yestestvennykh nauk [Actual Problems of the Humanities and Natural Sciences]. No. 6-2. 75-78.
16. Tkhostov A., Nelyubina A. 2013. Illness Perceptions in Patients with Coronary Heart Disease and Their Doctors //Procedia - Social and Behavioral Sciences. 86: 574-577.
17. Chambers N., Jurafsky D. 2009. Unsupervised learning of narrative schemas and their participants //Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2-Volume 2. Association for Computational Linguistics. 602-610.
18. Chambers N., Jurafsky D. 2008. Unsupervised learning of narrative event chains //Proceedings of ACL-08: HLT. 789-797.
19. Kozerenko Ye. B., Kuznetsov K. I., Romanov D. A. 2018. Semanticheskaya obrabotka nestrukturirovannykh tekstovykh dannykh na osnove lingvisticheskogo protsessora PullEnti [Integrated Platform for Multilingual Text Knowledge Processing] // Informatika i yeyo primeneniya [Informatics and Applications]. 12(3): 91-98.
20. Shelmanov A.O., Isakov V.A., Stankevich M.A., Smirnov I.V. 2018. Otkrytoye izvlecheniye informatsii iz tekstov. Chast' I. Postanovka zadachi i obzor metodov [Open Information Extraction. Part I. The Task and the Review of the State of the Art] // Iskusstvennyy intellekt i prinyatiye resheniy [Artificial Intelligence and Decision Making]. 2: 47-61.
21. Chambers N., Jurafsky D. 2011. Template-based information extraction without the templates //Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1. Association for Computational Linguistics. 976-986.
22. Azzam S., Humphreys K., Gaizauskas R. 1999. Using coreference chains for text summarization //Proceedings of the Workshop on Coreference and its Applications. Association for Computational Linguistics. 77-84.
23. Filatova E., Hatzivassiloglou V. 2004. Event-based extractive summarization //Text Summarization Branches Out. 104-111.
24. DeJong G. 1982. An overview of the FRUMP system //Strategies for natural language processing. 113: 149-176.
25. Xu J., Gan Z., Cheng Y., Liu J. 2019. Discourse-Aware Neural Extractive Model for Text Summarization //arXiv preprint arXiv:1910.14142.
26. Bean D., Riloff E. 2004. Unsupervised learning of contextual role knowledge for coreference resolution //Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004. 297-304.
27. Irwin J., Komachi M., Matsumoto Y. 2011. Narrative schema as world knowledge for coreference resolution //Proceedings of the Fifteenth Conference on Computational Natural Language Learning: Shared Task. Association for Computational Linguistics. 86-92.
28. Simonson D., Davis A. 2016. NASTEA: Investigating narrative schemas through annotated entities //Proceedings of the 2nd Workshop on Computing News Storylines (CNS 2016). 57-66.
29. Doust R., Piwek P. 2017. A model of suspense for narrative generation //Proceedings of the 10th International Conference on Natural Language Generation. 178-187.
30. Balasubramanian N. et al. 2013. Generating coherent event schemas at scale //Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 1721-1731.
31. Pichotta K., Mooney R. J. 2016. Learning statistical scripts with LSTM recurrent neural networks //Thirtieth AAAI Conference on Artificial Intelligence.
32. Shibata T., Kohama S., Kurohashi S. 2014. A Large Scale Database of Strongly-related Events in Japanese //LREC. 3283-3288.
33. Borgelt C., Kruse R. 2002. Induction of association rules: Apriori implementation //Compstat. Physica, Heidelberg. 395-400.
34. Regneri M., Koller A., Pinkal M. 2010. Learning script knowledge with web experiments //Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. 979-988.
35. Taylor W. L. 1953. “Cloze procedure”: A new tool for measuring readability //Journalism Bulletin. 30(4): 415-433.
36. Mostafazadeh N. et al. 2016. A corpus and evaluation framework for deeper understanding of commonsense stories //arXiv preprint arXiv:1604.01696.
37. Mikolov T. et al. 2013. Distributed representations of words and phrases and their compositionality //Advances in neural information processing systems. 3111-3119.
38. Kiros R. et al. 2015. Skip-thought vectors //Advances in neural information processing systems. 3294-3302.
39. Huang P. S. et al. 2013. Learning deep structured semantic models for web search using clickthrough data //Proceedings of the 22nd ACM International Conference on Information & Knowledge Management. ACM. 2333-2338.
40. Devlin J. et al. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding //arXiv preprint arXiv:1810.04805.
41. Settles B. 2012. Active learning //Synthesis Lectures on Artificial Intelligence and Machine Learning. 6(1): 1-114.
42. Suvorov R., Shelmanov A., Smirnov I. 2017. Active Learning with Adaptive Density Weighted Sampling for Information Extraction from Scientific Papers //Conference on Artificial Intelligence and Natural Language. Springer, Cham. 77-90.
43. Snell J., Swersky K., Zemel R. 2017. Prototypical networks for few-shot learning //Advances in Neural Information Processing Systems. 4077-4087.
44. Sandhaus E. 2008. The New York Times Annotated Corpus. Linguistic Data Consortium, Philadelphia.
45. Pustejovsky J. et al. 2003. The TimeBank corpus //Corpus Linguistics. 40.