ISSN 2071-8594

Russian academy of sciences

Editor-in-Chief

Gennady Osipov

V. E. Uvarov, A. A. Popov, T. A. Gultyaeva Analyzing Incomplete Sequences Using Gaussian Hidden Markov Models

Abstract.

This paper studies the methods of incomplete sequence analysis using Gaussian hidden Markov models (HMMs). We present Marginalization algorithm, which can be applied both for training HMM on incomplete sequences and for recognition of incomplete sequences using HMMs. In addition, we present a modification of Viterbi algorithm that can be used for decoding and imputation of incomplete sequences using HMM. Both presented algorithms significantly outperform the standard methods of incomplete sequence analysis, namely: elimination of missing observations in sequences followed by “gluing” of the remaining subsequences into one sequence and imputation of missing observations with the mean of the neighboring observations.

Keywords:

hidden Markov models, machine learning, sequences, Baum-Welch algorithm, missing observations, incomplete data, Viterbi algorithm, classification, decoding, imputation.

PP. 17-30.

REFERENCES

1. Baum L. E., Petrie T. Statistical inference for probabilistic functions of finite state Markov chains. The Annals of Mathematical Statistics, 1966, vol. 37, pp. 1554-1563.
2. Baum L. E., Egon J. A. An inequality with applications to statistical estimation for probabilistic functions of a Markov process and to a model for ecology. Bulletin of the American Meteorological Society, 1967, vol. 73, pp. 360-363.
3. Gales M., Young S. The Application of Hidden Markov Models in Speech Recognition. Signal Processing, 2007, vol. 1, no. 3, pp. 195-304.
4. Rabiner L. R. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Proceedings of the IEEE, 1989, vol. 77, pp. 257-285.
5. Frequencies of “hidden Markov models” keyword in literature published between 1800 and 2008 year provided by Google Ngram Viewer. Available at: https://books.google.com/ngrams/graph?content=hidden+Markov+models&year_start=1800&year_end=2008&corpus=15&s
moothing=3&share=&direct_url=t1%3B%2Chidden%20Markov%20models%3B%2Cc0
6. Cooke M., Green P., Josifovski L., Vizinh A. Robust automatic speech recognition with missing and unreliable acoustic data. Speech Communication, 2001, vol. 34, no. 3, pp. 267-285.
7. Lee D., Kulic D., Nakamura Y. Missing motion data recovery using factorial hidden Markov models. IEEE International Conference on Robotics and Automation, Pasadena, California, 2008, pp. 1722-1728.
8. Gultyaeva A., Popov A., Kokoreva V., Uvarov V. Classification of observation sequences described by Hidden Markov Models. Proceedings of the International Workshop Applied Methods of Statistical Analysis: Nonparametric approach, Novosibirsk, 2015, pp. 136-143.
9. Gultyaeva A., Popov A., Kokoreva V., Uvarov V. Training Hidden Markov Models on Incomplete Sequences. Proceedings of 13th International Conference on Actual Problems of Electronic Instrument Engineering, Novosibirsk, 2016. – Vol. 1. — pp. 317-320.
10. Gultyaeva A., Popov A., Sautin A. Metody statisticheskogo obucheniia v zadachakh regressii i klassifikatsii [Methods of Statistical Learning for the Problems of Regression and Classification]. Novosibirsk, 2016, 322 p.
11. Popov A., Gultyaeva A., Uvarov V. Issledovanie podkhodov k obucheniiu skrytykh markovskikh modelei pri nalichii propuskov v posledovatel'nostiakh [Training Hidden Markov Models on Incomplete Sequences]. Materialy konferentsii «Obrabotka informatsii i matematicheskoe modelirovanie», Rossiiskaia nauchno tekhnicheskaia konferentsiia [Proceeding of Russian Scientific conference “Information processing and mathematical modelling”]. Novosibirsk, 2016, pp. 125-139.
12. Popov A., Gultyaeva A., Uvarov V. A Comparison of Some Methods for Training Hidden Markov Models on Sequences with Missing Observations. Proceedings of 11th International Forum on Strategic Technology IFOST-2016, 2016, vol. 1, pp. 431-435
13. Popov A., Gultyaeva A., Uvarov V. Issledovanie Metodov Obucheniia Skrytykh Markovskikh Modelei pri Nalichii Propuskov v Posledovatel'nostiakh [Training Hidden Markov Models on Sequences with Missing Observations]. Trudy XIII mezhdunarodnoi konferentsii «Aktual'nye problemy elektronnogo priborostroeniia» [Proceeding of 13-th international conference “Actual Problems of Electronic Instrument Engineering”]. Novosibirsk, 2016, vol. 8, pp. 149-152.
14. Baum L. E., Sell G. R. Growth functions for transformations on manifolds. Pacific Journal of Mathematics, 1968, vol. 27, no. 2, pp. 211-227.
15. Dempster A. P., Laird N. M., Rubin D. B. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 1977, vol. 39, pp. 1-38.
16. Li X. Training Hidden Markov Models with Multiple Observations – A Combinatorial Method. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, vol. PAMI-22, no. 4, pp. 371-377.
17. Baum L. E., Petrie T., Soules G., Weiss N. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. The Annals of Mathematical Statistics, 1970, vol. 41, no. 1, pp. 164-171.
18. Viterbi A. J. Error Bounds for Convolutional Codes and an Asymptotically Optimum Decoding Algorithm. IEEE Transactions on Information Theory, 1967, no. 13, pp. 260-269.
19. Gelman A., Hill J. Data analysis using regression and multilevel/hierarchical models. Cambridge University Press, 2006.