ISSN 2071-8594

Russian Academy of Sciences

Editor-in-Chief

Gennady Osipov

A.S. Mikhailov, T.V. Sokolova, A.A. Chepovskiy, A.M. Chepovskiy. Identification of thematic orientation of natural language texts

Abstract.

This paper presents a novel method of text categorization based on specialized dictionaries. The method is applied to mass media texts and to short comments on the Internet. The analysis indicates that word stemming is effective and efficient for the text categorization problem, and it supports the validity of the proposed method.

Keywords:

automatic text categorization, specialized dictionary, stemming.

PP. 9-17
