ISSN 2071-8594

Russian Academy of Sciences


Gennady Osipov

N.V. Korepanova. Machine learning for treatment optimization in subgroups of patients


In clinical trials comparing an experimental and a control treatment, the effect of treatment often depends on a range of patient characteristics (biomarkers), such as clinical, anthropological, genetic, psychological, and social characteristics, among others. Personalized medicine aims to find such dependencies in order to tailor treatment strategies to the patient. This paper presents an overview of approaches to the analysis of clinical trial data that are intended to identify influential biomarkers and subgroups of patients in which the experimental and control treatments differ significantly in efficacy.


personalized medicine, subgroup analysis, clinical trials, machine learning.

PP. 54-66.


1. Brookes S.T., Whitley E., Peters T.J., Mulheran P.A., Egger M., Smith G.D. Subgroup analyses in randomised controlled trials: quantifying the risks of false-positives and false-negatives // Health Technology Assessment, Vol. 5, No. 33, 2001. pp. 1-56.
2. Cook D.I., Gebski V.J., Keech A.C. Subgroup analysis in clinical trials // Medical Journal of Australia, Vol. 180, No. 6, 2004. pp. 289-291.
3. Grouin J.M., Coste M., Lewis J. Subgroup Analyses in Randomized Clinical Trials: Statistical and Regulatory Issues // Journal of Biopharmaceutical Statistics, Vol. 15, No. 5, 2005. pp. 869-882.
4. Kent D.M., Rothwell P.M., Ioannidis J.P.A., Altman D.G., Hayward R.A. Assessing and reporting heterogeneity in treatment effects in clinical trials: a proposal // Trials, Vol. 11, No. 1, 2010. P. 85.
5. Pocock S.J., Assmann S.E., Enos L.E., Kasten L.E. Subgroup analysis, covariate adjustment and baseline comparisons in clinical trial reporting: current practice and problems // Statistics in Medicine, Vol. 21, No. 19, 2002. pp. 2917-2930.
6. Rothwell P.M. Subgroup analysis in randomised controlled trials: importance, indications, and interpretation // Lancet, Vol. 365, No. 9454, 2005. pp. 176-186.
7. Sleight P. Debate: Subgroup analyses in clinical trials — fun to look at, but don’t believe them! // Current Controlled Trials in Cardiovascular Medicine, Vol. 1, No. 1, 2000. pp. 25-27.
8. Sun X., Briel M., Walter S.D., Guyatt G.H. Is a subgroup effect believable? Updating criteria to evaluate the credibility of subgroup analyses // British Medical Journal, Vol. 340, 2010. pp. 850-854.
9. Wang R., Lagakos S.W., Ware J.H., Hunter D.J., Drazen J.M. Statistics in Medicine - Reporting of Subgroup Analyses in Clinical Trials // The New England Journal of Medicine, Vol. 357, 2007. pp. 2189-2194.
10. Lipkovich I., Dmitrienko A., D'Agostino R.B. Tutorial in Biostatistics: Data-Driven Subgroup Identification and Analysis in Clinical Trials // Statistics in Medicine, Vol. 36, No. 1, 2016. pp. 136-196.
11. Meinshausen N., Meier L., Bühlmann P. p-Values for High-Dimensional Regression // Journal of the American Statistical Association, Vol. 104, No. 488, 2009. pp. 1671-1681.
12. Lockhart R., Taylor J., Tibshirani R.J., Tibshirani R. A Significance Test for the Lasso // Annals of Statistics, Vol. 42, No. 2, 2014. pp. 413-468.
13. Freidlin B., Simon R. Adaptive Signature Design: An Adaptive Clinical Trial Design for Generating and Prospectively Testing A Gene Expression Signature for Sensitive Patients // Clinical Cancer Research, Vol. 11, No. 21, 2005. pp. 7872-7878.
14. Meinshausen N., Bühlmann P. Stability selection // Journal of the Royal Statistical Society, Series B, Vol. 72, No. 4, 2010. pp. 417-473.
15. Gunter L., Zhu J., Murphy S. Variable Selection for Qualitative Interactions in Personalized Medicine while Controlling The Familywise Error Rate // Journal of Biopharmaceutical Statistics, Vol. 21, No. 6, 2011. pp. 1063-1078.
16. Simon R.M., Subramanian J., Li M.C., Menezes S. Using cross-validation to evaluate predictive accuracy of survival risk classifiers based on high-dimensional data // Briefings in Bioinformatics, Vol. 12, No. 3, 2011. pp. 203-214.
17. Good P. Resampling Methods: a Practical Guide to Data Analysis. 3rd ed. Boston: Birkhäuser, 2005.
18. Loh W.Y., Shih Y.S. Split Selection Methods for Classification Trees // Statistica Sinica, Vol. 7, 1997. pp. 815-840.
19. Hothorn T., Hornik K., Zeileis A. Unbiased Recursive Partitioning: A Conditional Inference Framework // Journal of Computational and Graphical Statistics, Vol. 15, No. 3, 2006. pp. 651-674.
20. Hesterberg T., Moore D.S., Monaghan S., Clipson A., R. E. Bootstrap methods and permutation tests. Vol 5. // In: The Practice of Business Statistics / Ed. by Moore D.S. W. H. Freeman, 2005. pp. 1-70.
21. Varma S., Simon R. Bias in error estimation when using cross-validation for model selection // BMC Bioinformatics, Vol. 7, 2006. P. 91.
22. Foster J.C., Taylor J., Ruberg S.J. Subgroup identification from randomized clinical trial data // Statistics in Medicine, Vol. 30, No. 24, 2011. pp. 2867-2880.
23. Dixon D.O., Simon R. Bayesian Subset Analysis // Biometrics, Vol. 47, No. 3, 1991. pp. 871-881.
24. Berger J.O., Wang X., Shen L. A Bayesian Approach to Subgroup Identification // Journal of Biopharmaceutical Statistics, Vol. 24, No. 1, 2014. pp. 110-129.
25. Xu Y., Trippa L., Müller P., Ji Y. Subgroup-Based Adaptive (SUBA) Designs for Multi-Arm Biomarker Trials // Statistics in Biosciences, Vol. 8, No. 1, 2016. pp. 159-180.
26. Xu Y., Yu M., Zhao Y.Q., Li Q., Wang S., Shao J. Regularized Outcome Weighted Subgroup Identification for Differential Treatment Effects // Biometrics, Vol. 71, No. 3, 2015. pp. 645-653.
27. Little R.J., Rubin D.B. Causal effects in clinical and epidemiological studies via potential outcomes // Annual Review of Public Health, Vol. 21, 2000. pp. 121-145.
28. Cox D.R. Regression Models and Life-Tables // Journal of the Royal Statistical Society. Series B, Vol. 34, No. 2, 1972. pp. 187-220.
29. Breiman L., Friedman J.H., Olshen R.A., Stone C.J. Classification and Regression Trees. Wadsworth: Belmont, CA, 1984.
30. Royston P., Altman D.G. Regression Using Fractional Polynomials of Continuous Covariates: Parsimonious Parametric Modelling // Journal of the Royal Statistical Society. Series C, Vol. 43, No. 3, 1994. pp. 429-467.
31. Royston P., Sauerbrei W. A new approach to modelling interactions between treatment and continuous covariates in clinical trials by using fractional polynomials // Statistics in Medicine, Vol. 23, No. 16, 2004. pp. 2509-2525.
32. Tibshirani R. Regression shrinkage and selection via the lasso // Journal of the Royal Statistical Society. Series B, Vol. 58, No. 1, 1996. pp. 267-288.
33. Imai K., Ratkovic M. Estimating Treatment Effect Heterogeneity in Randomized Program Evaluation // The Annals of Applied Statistics, Vol. 7, No. 1, 2013. pp. 443-470.
34. Cai T., Tian L., Wong P.H., Wei L.J. Analysis of randomized comparative clinical trial data for personalized treatment selections // Biostatistics, Vol. 12, No. 2, 2011. pp. 270-282.
35. Song X., Pepe M.S. Evaluating Markers for Selecting a Patient's Treatment // Biometrics, Vol. 60, No. 4, 2004. pp. 874-883.
36. Huang Y., Gilbert P.B., Janes H. Assessing Treatment-Selection Markers using a Potential Outcomes Framework // Biometrics, Vol. 68, No. 3, 2012. pp. 687-696.
37. Zhao L., Tian L., Cai T., Claggett B., Wei L.J. Effectively Selecting a Target Population for a Future Comparative Study // Journal of the American Statistical Association, Vol. 108, No. 502, 2013. pp. 527-539.
38. Breiman L. Random forests // Machine Learning, Vol. 45, No. 1, 2001. pp. 5-32.
39. Dusseldorp E., Conversano C., Van Os B.J. Combining an Additive and Tree-Based Regression Model Simultaneously: STIMA // Journal of Computational and Graphical Statistics, Vol. 19, No. 3, 2010. pp. 514-530.
40. Hodges J.S., Cui Y., Sargent D.J., Carlin B.P. Smoothing Balanced Single-Error-Term Analysis of Variance // Technometrics, Vol. 49, No. 1, 2007. pp. 12-25.
41. Gu X., Yin C., Lee J.J. Bayesian Two-step Lasso Strategy for Biomarker Selection in Personalized Medicine Development for Time-to-Event Endpoints // Contemporary Clinical Trials, Vol. 36, No. 2, 2013. pp. 642-650.
42. Negassa A., Ciampi A., Abrahamowicz M., Shapiro S., Boivin J.F. Tree-structured subgroup analysis for censored survival data: Validation of computationally inexpensive model selection criteria // Statistics and Computing, Vol. 15, No. 3, 2005. pp. 231-239.
43. Su X., Tsai C.L., Wang H., Nickerson D.M., Li B. Subgroup Analysis via Recursive Partitioning // Journal of Machine Learning Research, Vol. 10, 2009. pp. 141-158.
44. Su X., Zhou T., Yan X. Interaction Trees with Censored Survival Data // The International Journal of Biostatistics, Vol. 4, No. 1, 2008. P. 2.
45. Loh W.Y., He X., Man M. A regression tree approach to identifying subgroups with differential treatment effects // Statistics in Medicine, Vol. 34, No. 11, 2015. pp. 1818-1833.
46. Loh W.Y., Fu H., Man M., Champion V., Yu M. Identification of subgroups with differential treatment effects for longitudinal and multiresponse variables // Statistics in Medicine, Vol. 35, No. 26, 2016. pp. 4837-4855.
47. Loh W.Y. Regression Trees with Unbiased Variable Selection and Interaction Detection // Statistica Sinica, Vol. 12, 2002. pp. 361-386.
48. Zeileis A., Hothorn T., Hornik K. Model-based Recursive Partitioning // Journal of Computational and Graphical Statistics, Vol. 17, No. 2, 2008. pp. 492-514.
49. Dusseldorp E., Mechelen I.V. Qualitative Interaction Trees: a tool to identify qualitative treatment-subgroup interactions // Statistics in Medicine, Vol. 33, No. 2, 2014. pp. 219-237.
50. Tian L., Alizadeh A.A., Gentles A.J., Tibshirani R. A Simple Method for Estimating Interactions between a Treatment and a Large Number of Covariates // Journal of the American Statistical Association, Vol. 109, No. 508, 2014. pp. 1517-1532.
51. Jones H.E., Ohlssen D.I., Neuenschwander B., Racine A., Branson M. Bayesian Models for Subgroup Analysis in Clinical Trials // Clinical Trials, Vol. 8, No. 2, 2011. pp. 129-143.
52. Qian M., Murphy S.A. Performance guarantees for individualized treatment rules // The Annals of Statistics, Vol. 39, No. 2, 2011. pp. 1180-1210.
53. Zhao Y., Zeng D., Rush A.J., Kosorok M.R. Estimating individualized treatment rules using outcome weighted learning // Journal of the American Statistical Association, Vol. 107, No. 449, 2012. pp. 1106-1118.
54. Lu W., Zhang H.H., Zeng D. Variable Selection for Optimal Treatment Decision // Statistical Methods in Medical Research, Vol. 22, No. 5, 2013. pp. 493-504.
55. Foster J., Taylor J.M.G., Kaciroti N., Nan B. Simple subgroup approximations to optimal treatment regimes from randomized clinical trial data // Biostatistics, Vol. 16, No. 2, 2015. pp. 368-382.
56. Zhang B., Tsiatis A.A., Davidian M., Zhang M., Laber E. Estimating Optimal Treatment Regimes from a Classification Perspective // Statistics, Vol. 1, No. 1, 2012. pp. 103-114.
57. Zhang B., Tsiatis A.A., Laber E.B., Davidian M. Robust estimation of optimal dynamic treatment regimes for sequential treatment decisions // Biometrika, Vol. 100, No. 3, 2013. pp. 681-694.
58. Laber E.B., Zhao Y.Q. Tree-based methods for individualized treatment regimes // Biometrika, Vol. 102, No. 3, 2015. pp. 501-514.
59. Zhang Y., Laber E.B., Tsiatis A., Davidian M. Using decision lists to construct interpretable and parsimonious treatment regimes // Biometrics, Vol. 71, No. 4, 2015. pp. 895-904.
60. Fu H., Zhou J., Faries D.E. Estimating optimal treatment regimes via subgroup identification in randomized control trials and observational studies // Statistics in Medicine, Vol. 35, No. 19, 2016. pp. 3285-3302.
61. Lo V.S.Y. The true lift model: a novel data mining approach to response modeling in database marketing // SIGKDD Explorations, Vol. 4, No. 2, 2002. pp. 78-86.
62. Larsen K. Net lift models: optimizing the impact of your marketing // Predictive Analytics World, 2011. Workshop presentation.
63. Robins J.M. Correcting for non-compliance in randomized trials using structural nested mean models // Communications in Statistics - Theory and Methods, Vol. 23, No. 8, 1994. pp. 2379-2412.
64. Robins J., Rotnitzky A. Estimation of treatment effects in randomised trials with non-compliance and a dichotomous outcome using structural mean models // Biometrika, Vol. 91, No. 4, 2004. pp. 763-783.
65. Jaskowski M., Jaroszewicz S. Uplift modeling for clinical trial data // ICML, 2012 workshop on machine learning for clinical data analysis. Edinburgh. Scotland. 2012.
66. Radcliffe N.J., Surry P.D. Differential analysis: modeling true response by isolating the effect of a single action // Proceedings of Credit Scoring and Credit Control VI. 1999.
67. Radcliffe N.J., Surry P.D. Real-world uplift modeling with significance-based uplift trees. Portrait Technical Report TR-2011-1, Stochastic Solutions, 2011.
68. Hansotia B., Rukstales B. Incremental Value Modeling // Journal of Interactive Marketing, Vol. 16, No. 3, 2002. pp. 35-46.
69. Chickering D.M., Heckerman D. A decision theoretic approach to targeted advertising // Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence (UAI'00). 2000. pp. 82-88.
70. Rzepakowski P., Jaroszewicz S. Decision trees for uplift modeling // Proceedings of the 10th IEEE International conference on data mining (ICDM). Sydney. Australia. 2010. pp. 441-450.
71. Rzepakowski P., Jaroszewicz S. Decision trees for uplift modeling with single and multiple treatments // Knowledge and Information Systems, Vol. 32, No. 2, 2012. pp. 303-327.
72. Kuusisto F., Costa V.S., Nassif H., Burnside E., Page D., Shavlik J. Support vector machines for differential prediction // Proceedings of the ECML-PKDD. 2014.
73. Jaroszewicz S., Zaniewicz Ł. Székely regularization for uplift modeling // In: Challenges in computational statistics and data mining. Springer International Publishing, 2016. pp. 135-154.
74. Zaniewicz L., Jaroszewicz S. Lp - Support vector machines for uplift modeling // Knowledge and Information Systems, Vol. 53, No. 1, 2017. pp. 269-296.
75. Guelman L., Guillen M., Perez-Marin A.M. Random forests for uplift modeling: an insurance customer retention case. Vol 115. // In: Modeling and simulation in engineering, economics and management. Springer, Berlin, 2012. pp. 123-133.
76. Soltys M., Jaroszewicz S., Rzepakowski P. Ensemble methods for uplift modeling // Data Mining and Knowledge Discovery, Vol. 29, No. 6, 2015. pp. 1531-1559.
77. Chen G., Zhong H., Belousov A., Devanarayan V. A PRIM approach to predictive-signature development for patient stratification // Statistics in Medicine, Vol. 34, No. 2, 2015. pp. 317-342.
78. Lipkovich I., Dmitrienko A., Denne J., Enas G. Subgroup identification based on differential effect search—A recursive partitioning method for establishing response to treatment in patient subpopulations // Statistics in Medicine, Vol. 30, No. 21, 2011. pp. 2601-2621.
79. Friedman J.H., Fisher N.I. Bump Hunting in High-Dimensional Data // Statistics and Computing, Vol. 9, No. 2, 1999. pp. 123-143.
80. Sivaganesan S., Laud P.W., Müller P. A Bayesian subgroup analysis with a zero-enriched Polya Urn scheme // Statistics in Medicine, Vol. 30, No. 4, 2010. pp. 312-323.
81. Korepanova N., Kuznetsov S.O., Karachunskiy A.I. Matchings and Decision Trees for Determining Optimal Therapy // In: Analysis of Images, Social Networks and Texts. Third International Conference, AIST 2014, Yekaterinburg, Russia, April 10-12, 2014, Revised Selected Papers. Springer International Publishing, 2014. pp. 101-110.
82. Gale D., Shapley L.S. College Admissions and the Stability of Marriage // The American Mathematical Monthly, Vol. 69, No. 1, 1962. pp. 9-15.
83. Roth A.E. Deferred acceptance algorithms: history, theory, practice, and open questions. Harvard University, 2007.
84. Alkan A., Gale D. Stable schedule matching under revealed preference // Journal of Economic Theory, Vol. 112, 2003. pp. 289-306.
85. Ganter B., Kuznetsov S.O. Pattern Structures and Their Projections // 9th International Conference on Conceptual Structures (ICCS 2001). 2001. Vol. 2120. pp. 129-142.
86. Kuznetsov S.O. Pattern Structures for Analyzing Complex Data // 12th International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing (RSFDGrC 2009). 2009. Vol. 5908. pp. 33-44.
87. Korepanova N., Kuznetsov S.O. Pattern Structures for Treatment Optimization // In: CLA 2016: Proceedings of the Thirteenth International Conference on Concept Lattices and Their Applications. CEUR Workshop Proceedings. Moscow: Higher School of Economics, National Research University, 2016. pp. 217-228.
88. Korepanova N.V., Kuznetsov S.O. Vybor terapii onkologicheskogo zabolevaniya v podgruppakh patsientov na osnove analiza zamknutykh opisaniy [Choosing therapy for an oncological disease in subgroups of patients based on the analysis of closed descriptions] // In: Pyatnadtsataya natsionalnaya konferentsiya po iskusstvennomu intellektu s mezhdunarodnym uchastiem KII-2016 [15th National Conference on Artificial Intelligence with International Participation CAI-2016] (October 3-7, 2016, Smolensk): Vol 1. Universum, Smolensk. pp. 352-359.