Y. M. Kuznetsova, I. V. Smirnov, M. A. Stankevich, N. V. Chudova. Creating a Text Analysis Tool for Socio-Humanitarian Research. Part 2. RSA Machine and the Experience of Using It
The second part of the work describes the best-known tools for linguistic-statistical analysis of text corpora and introduces RSA machine, a novel text analysis tool for socio-humanitarian research. The tool works with a network representation of text and can find constructions with a complex graph structure. RSA machine implements the following features: searching for constructions by query; computing frequencies and statistical characteristics for search results, corpora, or individual texts; and comparing texts by their statistical and frequency features. The paper describes the architecture of RSA machine and the tools used to develop it. We present the results of a pilot study of RSA machine on 142 text samples written by people with different psychological and demographic characteristics, 18 of whom had been diagnosed with a mental disorder. Correlation analysis revealed relationships between the extracted text attributes (e.g., the frequency of predicate types) and the results of psychological analysis performed by experts.
text corpora analysis, software architecture, graph database, semantic-syntactic constructions, socio-humanitarian research, worldview.
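The abstract describes searching for constructions in a network (graph) representation of text, counting how often they occur per text, and correlating such frequencies with expert assessments. The sketch below illustrates that general idea under our own assumptions, not as RSA machine's implementation: each sentence is a directed graph whose nodes carry token attributes and whose edges carry relation labels, a query is a small pattern graph, and matching is plain backtracking subgraph search. The names `TextGraph`, `find_matches`, `frequency_per_1000`, and `pearson`, as well as the attribute and relation labels (`pos`, `subj`, `comp`), are hypothetical. The keywords indicate that RSA machine relies on a graph database; its actual query language and matching procedure are not reproduced here.

```python
from dataclasses import dataclass, field
from math import sqrt
from typing import Dict, List, Set, Tuple


# Hypothetical graph model (not RSA machine's actual schema): nodes are tokens
# with attribute dicts; edges are labeled directed relations between tokens.
@dataclass
class TextGraph:
    nodes: Dict[int, Dict[str, str]] = field(default_factory=dict)
    edges: Set[Tuple[int, int, str]] = field(default_factory=set)

    def add_node(self, node_id: int, **attrs: str) -> None:
        self.nodes[node_id] = attrs

    def add_edge(self, src: int, dst: int, label: str) -> None:
        self.edges.add((src, dst, label))


def find_matches(pattern: TextGraph, text: TextGraph) -> List[Dict[int, int]]:
    """Return every injective mapping of pattern nodes onto text nodes that
    satisfies all node-attribute constraints and all labeled pattern edges."""
    p_ids = list(pattern.nodes)
    matches: List[Dict[int, int]] = []

    def edges_ok(mapping: Dict[int, int]) -> bool:
        # Check only the pattern edges whose two endpoints are already mapped.
        return all(
            (mapping[s], mapping[d], label) in text.edges
            for s, d, label in pattern.edges
            if s in mapping and d in mapping
        )

    def extend(i: int, mapping: Dict[int, int]) -> None:
        if i == len(p_ids):
            matches.append(dict(mapping))
            return
        pid = p_ids[i]
        used = set(mapping.values())
        for tid, attrs in text.nodes.items():
            # Skip used text nodes and nodes lacking a required attribute value.
            if tid in used or any(attrs.get(k) != v for k, v in pattern.nodes[pid].items()):
                continue
            mapping[pid] = tid
            if edges_ok(mapping):
                extend(i + 1, mapping)
            del mapping[pid]

    extend(0, {})
    return matches


def frequency_per_1000(pattern: TextGraph, sentences: List[TextGraph]) -> float:
    """Pattern matches per 1000 tokens across the sentence graphs of one text."""
    hits = sum(len(find_matches(pattern, s)) for s in sentences)
    tokens = sum(len(s.nodes) for s in sentences)
    return 1000.0 * hits / tokens if tokens else 0.0


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation, e.g. between a per-text feature and an expert score.
    Undefined (division by zero) if either series is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


# Toy example (illustrative only): "I think he left".
sentence = TextGraph()
for i, (form, pos) in enumerate([("I", "PRON"), ("think", "VERB"), ("he", "PRON"), ("left", "VERB")]):
    sentence.add_node(i, form=form, pos=pos)
sentence.add_edge(1, 0, "subj")
sentence.add_edge(1, 3, "comp")
sentence.add_edge(3, 2, "subj")

# Query: a verb whose subject is a pronoun.
pattern = TextGraph()
pattern.add_node(0, pos="VERB")
pattern.add_node(1, pos="PRON")
pattern.add_edge(0, 1, "subj")

print(find_matches(pattern, sentence))          # [{0: 1, 1: 0}, {0: 3, 1: 2}]
print(frequency_per_1000(pattern, [sentence]))  # 500.0
```

Across a corpus, the per-text values from `frequency_per_1000` would form one feature column, and `pearson` would relate that column to expert ratings of the same texts. Backtracking search is chosen here only for brevity; a pattern with symmetric structure can match more than once under node permutations, which a production matcher would need to deduplicate.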