N.L. Rusnachenko, N.V. Loukachevitch
Methods of lexicon integration with machine learning for sentiment analysis system
This paper describes the application of an SVM classifier to sentiment classification of Russian Twitter messages in the banking and telecommunications domains of the SentiRuEval-2016 competition. A variety of features was implemented to improve the quality of message classification, in particular sentiment score features based on a set of sentiment lexicons. We study the impact of the training collection type (balanced or imbalanced) and its size, as well as the advantages of applying several lexicon-based features. Before SentiRuEval-2016, the classifier was tuned on the collection from the previous year's competition (SentiRuEval-2015) to find better settings. The resulting system took third place at SentiRuEval-2016 in both tasks. Experiments performed after the SentiRuEval-2016 evaluation allowed us to improve our results by searching for a better value of the 'Cost' parameter of the SVM classifier and by extracting more information from the lexicons into new features. The final classifier achieved results close to the top results of the competition.
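As a rough illustration of what a lexicon-based sentiment score feature could look like, the sketch below maps a tokenized message to the sum, maximum, and minimum of the lexicon scores of its words, and concatenates these values across several lexicons. It is only a minimal sketch under stated assumptions: the toy lexicon, its scores, the choice of aggregates, and the names `lexicon_features` and `build_feature_vector` are hypothetical and are not taken from the described system.

```python
from typing import Dict, List

# Hypothetical toy lexicon: word -> sentiment score in [-1, 1].
# Entries and scores are illustrative assumptions, not the lexicons used in the paper.
TOY_LEXICON: Dict[str, float] = {
    "good": 1.0,
    "fast": 0.5,
    "slow": -0.5,
    "terrible": -1.0,
}


def lexicon_features(tokens: List[str], lexicon: Dict[str, float]) -> List[float]:
    """Return [sum, max, min] of the lexicon scores of the tokens found in the lexicon.

    Tokens missing from the lexicon are ignored; a message without matches maps to zeros.
    """
    scores = [lexicon[t] for t in tokens if t in lexicon]
    if not scores:
        return [0.0, 0.0, 0.0]
    return [sum(scores), max(scores), min(scores)]


def build_feature_vector(tokens: List[str], lexicons: List[Dict[str, float]]) -> List[float]:
    """Concatenate the score features computed from each lexicon in turn."""
    vector: List[float] = []
    for lexicon in lexicons:
        vector.extend(lexicon_features(tokens, lexicon))
    return vector


message = "the connection is slow but support is good".split()
features = build_feature_vector(message, [TOY_LEXICON])
print(features)
```

In a setup like the one described, such a vector would presumably be joined with the other message features before the SVM is trained; each additional lexicon simply contributes three more values.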
machine learning, SVM, sentiment analysis, lexicons, SentiRuEval-2016