ISSN 2071-8594

Russian Academy of Sciences

Editor-in-Chief

Gennady Osipov

M.Y. Chesnokov. Time series anomaly detection based on DBSCAN ensembles

Abstract.

The quality of anomaly detection algorithms depends heavily on their input parameters and on the internal structure of the dataset. Moreover, the problem usually arises in an unsupervised setting, which makes quality measurement conceptually difficult. In practice, anomaly detection results vary significantly because the many datasets under consideration have diverse internal structure. Outlier ensembles are a technique that can reduce the variance of anomaly detection and increase the overall quality of identification. In this paper we investigate anomaly detection in time series in the unsupervised setting and propose a method for constructing outlier ensembles based on the DBSCAN algorithm, which uses the internal structure of the time series for adaptive selection of input parameters. Experiments on synthetic and real datasets show reduced variance and high quality of the method compared with popular techniques such as Median Absolute Deviation, One-Class SVM, Isolation Forest, Local Outlier Factor, and plain DBSCAN.
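
To make the idea of a DBSCAN-based outlier ensemble concrete, the following is a minimal sketch and not the method of the paper. It assumes that each ensemble member is a DBSCAN run with a different `eps`, that the `eps` values are taken from quantiles of the k-th nearest neighbour distances, and that a point's anomaly score is the fraction of runs that label it as noise. The function names, the quantile heuristic, and the use of raw values as one-dimensional points are all illustrative assumptions.

```python
def dbscan_noise(points, eps, min_pts):
    """Return booleans: True where DBSCAN would label the point as noise."""
    n = len(points)
    # Neighbourhoods include the point itself, as in the DBSCAN definition.
    neighbours = [
        [j for j in range(n) if abs(points[i] - points[j]) <= eps]
        for i in range(n)
    ]
    core = [len(nb) >= min_pts for nb in neighbours]
    # Noise: not a core point and not within eps of any core point.
    return [
        not core[i] and not any(core[j] for j in neighbours[i])
        for i in range(n)
    ]


def kth_neighbour_distances(points, k):
    """Distance from each point to its k-th nearest other point."""
    n = len(points)
    result = []
    for i in range(n):
        dists = sorted(abs(points[i] - points[j]) for j in range(n) if j != i)
        result.append(dists[k - 1])
    return result


def ensemble_scores(points, min_pts=4, quantiles=(0.5, 0.75, 0.9)):
    """Fraction of DBSCAN runs (one per eps) that label each point as noise.

    The eps values come from quantiles of the k-distance distribution;
    this heuristic is an assumption made for illustration only.
    """
    kd = sorted(kth_neighbour_distances(points, min_pts - 1))
    eps_values = sorted(
        {kd[min(len(kd) - 1, int(q * len(kd)))] for q in quantiles}
    )
    votes = [0] * len(points)
    for eps in eps_values:
        for i, is_noise in enumerate(dbscan_noise(points, eps, min_pts)):
            votes[i] += is_noise
    return [v / len(eps_values) for v in votes]


# Toy series with a single spike at index 5.
series = [100, 102, 99, 101, 100, 250, 103, 98, 101, 100, 99, 102]
scores = ensemble_scores(series)
```

On this toy series the spike is labelled as noise in every run, while the remaining points are never labelled as noise. A realistic implementation would work on windowed or otherwise transformed time series features and avoid the quadratic neighbour search used here for brevity.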

Keywords:

anomaly detection, time series, unsupervised ensembles, DBSCAN.

PP. 99-107.

References

1. Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly detection: A survey. ACM computing surveys (CSUR), 41(3), 15.
2. Zimek, A., Campello, R. J., & Sander, J. (2014). Ensembles for unsupervised outlier detection: challenges and research questions (a position paper). ACM SIGKDD Explorations Newsletter, 15(1), 11-22.
3. Aggarwal, C. C. (2016). Outlier Analysis Second Edition.
4. Gupta, M., Gao, J., Aggarwal, C. C., & Han, J. (2014). Outlier detection for temporal data: A survey. IEEE Transactions on Knowledge and Data Engineering, 26(9), 2250-2267.
5. Aggarwal, C. C., & Sathe, S. (2015). Theoretical foundations and algorithms for outlier ensembles. ACM SIGKDD Explorations Newsletter, 17(1), 24-47.
6. Aggarwal, C. C., & Sathe, S. (2017). Outlier Ensembles: An Introduction. Springer.
7. Aggarwal, C. C. (2013). Outlier ensembles: position paper. ACM SIGKDD Explorations Newsletter, 14(2), 49-58.
8. Ester, M., Kriegel, H. P., Sander, J., & Xu, X. (1996, August). A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD (Vol. 96, No. 34, pp. 226-231).
9. Chandola, V., Banerjee, A., & Kumar, V. (2012). Anomaly detection for discrete sequences: A survey. IEEE Transactions on Knowledge and Data Engineering, 24(5), 823-839.
10. Breunig, M. M., Kriegel, H. P., Ng, R. T., & Sander, J. (2000, May). LOF: identifying density-based local outliers. In ACM sigmod record (Vol. 29, No. 2, pp. 93-104). ACM.
11. Shewhart, W. A. (1931). Economic control of quality of manufactured product. ASQ Quality Press.
12. Leys, C., Ley, C., Klein, O., Bernard, P., & Licata, L. (2013). Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median. Journal of Experimental Social Psychology, 49(4), 764-766.
13. Vapnik, V. (2013). The nature of statistical learning theory. Springer science & business media.
14. Liu, F. T., Ting, K. M., & Zhou, Z. H. (2008, December). Isolation forest. In Data Mining, 2008. ICDM'08. Eighth IEEE International Conference on (pp. 413-422). IEEE.
15. Schubert, E., Wojdanowski, R., Zimek, A., & Kriegel, H. P. (2012, April). On evaluation of outlier rankings and outlier scores. In Proceedings of the 2012 SIAM International Conference on Data Mining (pp. 1047-1058). Society for Industrial and Applied Mathematics.
16. Lazarevic, A., & Kumar, V. (2005, August). Feature bagging for outlier detection. In Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining (pp. 157-166). ACM.
17. Zhou, H., Wang, P., & Li, H. (2012). Research on adaptive parameter determination in DBSCAN algorithm. Journal of Xi'an University of Technology, 28(3), 289-292.
18. Karami, A., & Johansson, R. (2014). Choosing DBSCAN parameters automatically using differential evolution. International Journal of Computer Applications, 91(7).
19. Kut, A., & Birant, D. (2006). Spatio-temporal outlier detection in large databases. CIT. Journal of computing and information technology, 14(4), 291-297.
20. Yahoo! labs. Webscope dataset ydata-labeled-time-series-anomalies-v1_0 [Online]. Available at: http://webscope.sandbox.yahoo.com/ (accessed August 25, 2017).