Ch. Yu. Brester, V.V. Stanovov, O.E. Semenkina, E.S. Semenkin. About the use of evolutionary algorithms in Big Data analysis

Abstract.

This article is a survey: several examples demonstrate the expediency of using evolutionary algorithms in Big Data analysis. Evolutionary algorithms have evident advantages: their high scalability and flexibility, together with their ability to solve global optimization problems and to optimize several criteria simultaneously, are essential for feature selection, instance selection and missing-data imputation. Moreover, we illustrate the use of evolutionary algorithms in combination with such machine learning tools as neural networks and fuzzy systems. Our examples show that Evolutionary Machine Learning is becoming increasingly applicable in data processing, and we anticipate further development of this area, especially with respect to Big Data.
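As a purely illustrative sketch (not taken from the article; the population size, the 0.01 subset-size penalty and the k-nearest-neighbours classifier are hypothetical choices), the following Python fragment shows the kind of wrapper scheme the abstract refers to: a simple genetic algorithm searches over binary feature masks, scoring each candidate subset by cross-validated accuracy while penalizing its size.

# Illustrative sketch only: a genetic algorithm used as a wrapper for feature selection.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
n_features = X.shape[1]

def fitness(mask):
    # Two criteria folded into one scalar: accuracy minus a small subset-size penalty.
    if not mask.any():
        return 0.0
    acc = cross_val_score(KNeighborsClassifier(), X[:, mask], y, cv=3).mean()
    return acc - 0.01 * mask.sum() / n_features

def evolve(pop_size=20, generations=15, p_mut=0.05):
    pop = rng.random((pop_size, n_features)) < 0.5          # random binary feature masks
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        # Tournament selection of parents (size 2).
        parents = pop[[max(rng.choice(pop_size, 2), key=lambda i: scores[i])
                       for _ in range(pop_size)]]
        # Uniform crossover between consecutive parents.
        swap = rng.random((pop_size, n_features)) < 0.5
        children = np.where(swap, parents, np.roll(parents, 1, axis=0))
        # Bit-flip mutation.
        pop = children ^ (rng.random((pop_size, n_features)) < p_mut)
    scores = np.array([fitness(ind) for ind in pop])
    return pop[scores.argmax()]

best_mask = evolve()
print("selected features:", np.flatnonzero(best_mask))

A multi-objective variant of this scheme would keep accuracy and subset size as separate criteria and return a Pareto set of masks instead of a single scalarized winner.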

Keywords:

evolutionary algorithms, Big Data, feature selection, instance selection, missing-data imputation.

PP. 82-93.
