ISSN 2071-8594

Russian Academy of Sciences



Yu. A. Dubnov. Feature Selection Method Based on a Probabilistic Approach and a Cross-Entropy Metric for the Image Recognition Problem


The paper considers the problem of feature selection for classification. A method for selecting informative features based on a probabilistic approach and a cross-entropy metric is proposed. Several variants of the information criterion for feature selection in the binary classification problem are considered, along with a generalization to the multiclass case. The proposed method is illustrated on an image recognition task using the MNIST collection.
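The abstract does not reproduce the paper's exact criterion, but the general idea of scoring a feature by a cross-entropy metric between its class-conditional distributions can be sketched as follows. This is a minimal illustration under assumed details (histogram density estimates, binary labels, top-k ranking); the function names are for this sketch only:

```python
import numpy as np

def cross_entropy_score(x, y, bins=10, eps=1e-9):
    """Score one feature by the cross-entropy between its
    class-conditional histograms (binary labels 0/1 assumed)."""
    edges = np.histogram_bin_edges(x, bins=bins)
    p0, _ = np.histogram(x[y == 0], bins=edges)
    p1, _ = np.histogram(x[y == 1], bins=edges)
    # smooth and normalize so both histograms are proper distributions
    p0 = (p0 + eps) / (p0 + eps).sum()
    p1 = (p1 + eps) / (p1 + eps).sum()
    # H(p0, p1) = -sum p0 * log p1: large when the two class
    # distributions disagree, i.e. the feature is informative
    return -np.sum(p0 * np.log(p1))

def select_features(X, y, k):
    """Rank all features by the score above and keep the top k."""
    scores = np.array([cross_entropy_score(X[:, j], y)
                       for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k]

# toy data: feature 0 separates the classes, feature 1 is pure noise
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
X = np.column_stack([y + 0.1 * rng.standard_normal(200),
                     rng.standard_normal(200)])
print(select_features(X, y, 1))  # feature 0 should rank first
```

For a multiclass problem, one natural generalization along the lines mentioned in the abstract is to aggregate such pairwise scores over all class pairs (e.g. their sum or maximum); the paper itself should be consulted for the specific criterion used.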


Keywords: feature selection, classification, cross-entropy.

Pp. 78-85.

DOI 10.14357/20718594200206


1. T. Hastie, R. Tibshirani, J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2001.
2. C. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer, 758 p., 2006.
3. E. Alpaydin. Introduction to Machine Learning. 3rd ed., MIT Press, 640 p., 2014.
4. I.T. Jolliffe. Principal Component Analysis. Springer Series in Statistics, 2nd ed., Springer, NY, 487 p., 2002.
5. G.J. McLachlan. Discriminant Analysis and Statistical Pattern Recognition. Wiley Interscience, 2004.
6. P. Comon, C. Jutten. Handbook of Blind Source Separation: Independent Component Analysis and Applications. Academic Press, Oxford, UK, 2010.
7. M.W. Berry et al. Algorithms and Applications for Approximate Nonnegative Matrix Factorization // Computational Statistics & Data Analysis, vol. 52, pp. 155-173, 2007.
8. L. van der Maaten, G. Hinton. Visualizing High-Dimensional Data Using t-SNE // Journal of Machine Learning Research, vol. 9, pp. 2579-2605, 2008.
9. M.A. Carreira-Perpinan. A Review of Dimension Reduction Techniques. Technical Report CS-96-09, Department of Computer Science, University of Sheffield, 1997.
10. I.K. Fodor. A Survey of Dimension Reduction Techniques. Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, 2002.
11. P. Cunningham. Dimension Reduction. Technical Report UCD-CSI-2007-7, University College Dublin, 2007.
12. A. Blum, P. Langley. Selection of Relevant Features and Examples in Machine Learning // Artificial Intelligence, vol. 97(1-2), pp. 245-271, 1997.
13. J. Abellán, J.G. Castellano. Improving the Naive Bayes Classifier via a Quick Variable Selection Method Using Maximum of Entropy // Entropy, vol. 19, no. 6, 247, 2017.
14. H.C. Peng, F. Long, C. Ding. Feature Selection Based on Mutual Information: Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy // IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27(8), pp. 1226-1238, 2005.
15. Y. Zhang, S. Li, T. Wang, Z. Zhang. Divergence-Based Feature Selection for Separate Classes // Neurocomputing, vol. 101, pp. 32-42, 2013.
16. D.C. Cireşan, U. Meier, L.M. Gambardella, J. Schmidhuber. Deep Big Simple Neural Nets Excel on Handwritten Digit Recognition // Neural Computation, vol. 22, no. 12, 2010.
17. D. Cireşan, U. Meier, J. Schmidhuber. Multi-Column Deep Neural Networks for Image Classification // CVPR 2012, pp. 3642-3649.