Statistical learning with latent prediction targets
Booil Jo, Ph.D.
In predictive modeling, often aided by machine learning methods, much effort is concentrated on identifying good predictors. The same level of rigor, however, is often absent when it comes to improving the outcome side of models. In this study, we focus on this relatively neglected aspect of model development and demonstrate the use of longitudinal information as a way of improving the outcome side of predictive models. This involves optimally characterizing individuals’ outcome status, classifying them, and validating the formulated prediction targets. None of these tasks is straightforward, which may explain why longitudinal prediction targets are not commonly used in practice despite their compelling benefits. As a practical way of improving this situation, we explore a semi-supervised learning approach based on growth mixture modeling (GMM), a method of identifying latent subpopulations that manifest heterogeneous outcome trajectories. In the proposed approach, we first use GMM in its conventional role, generating candidate models based on empirical model fitting, which can be viewed as unsupervised learning. We then evaluate the candidate GMM models on the basis of a direct measure of success: how well the trajectory types are predicted by clinically and demographically relevant baseline features, which can be viewed as supervised learning. We examine the proposed approach focusing on one particular use of latent trajectory classes: as outcomes that can serve as valid prediction targets in clinical prognostic models. Our approach is illustrated using data from the Longitudinal Assessment of Manic Symptoms study.
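The two-stage idea in the abstract (generate candidate classifications without supervision, then judge each candidate by how well baseline features predict it) can be illustrated with a minimal sketch. This is not the authors' procedure: as a loudly labeled simplification, hard k-means clustering of trajectories stands in for growth mixture modeling, and a nearest-centroid classifier scored by leave-one-out accuracy stands in for the supervised evaluation. The data are simulated rather than taken from the Longitudinal Assessment of Manic Symptoms study, and all function names (`simulate`, `cluster_trajectories`, `loo_accuracy`) are hypothetical.

```python
import random
from statistics import mean


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def simulate(n=60, seed=1):
    """Simulated data (an assumption): two trajectory types over 4 visits,
    plus one baseline feature that is informative about the type."""
    rng = random.Random(seed)
    trajs, feats = [], []
    for i in range(n):
        rising = i % 2 == 0
        slope = 1.0 if rising else 0.0
        trajs.append([slope * t + rng.gauss(0, 0.3) for t in range(4)])
        feats.append([(2.0 if rising else 0.0) + rng.gauss(0, 0.5)])
    return trajs, feats


def cluster_trajectories(trajs, k, iters=25):
    """Unsupervised step. Hard k-means is a crude stand-in for GMM here;
    centers are initialized deterministically by farthest-first selection."""
    centers = [trajs[0]]
    while len(centers) < k:
        centers.append(max(trajs, key=lambda t: min(sqdist(t, c) for c in centers)))
    labels = []
    for _ in range(iters):
        # Assign each trajectory to its nearest center.
        labels = [min(range(k), key=lambda c: sqdist(t, centers[c])) for t in trajs]
        # Move each center to the mean trajectory of its members.
        for c in range(k):
            members = [t for t, lab in zip(trajs, labels) if lab == c]
            if members:
                centers[c] = [mean(col) for col in zip(*members)]
    return labels


def loo_accuracy(feats, labels):
    """Supervised step. Leave-one-out accuracy of a nearest-centroid
    classifier that predicts the class label from baseline features."""
    hits = 0
    for i in range(len(feats)):
        # Class centroids computed from everyone except individual i.
        groups = {}
        for j, (x, y) in enumerate(zip(feats, labels)):
            if j != i:
                groups.setdefault(y, []).append(x)
        cents = {y: [mean(col) for col in zip(*xs)] for y, xs in groups.items()}
        pred = min(cents, key=lambda y: sqdist(feats[i], cents[y]))
        hits += pred == labels[i]
    return hits / len(feats)


trajs, feats = simulate()
# Each candidate number of classes yields one candidate prediction target.
results = {k: loo_accuracy(feats, cluster_trajectories(trajs, k)) for k in (2, 3)}
for k, acc in results.items():
    print(f"{k} classes: leave-one-out accuracy = {acc:.2f}")
```

Raw accuracy is only a placeholder criterion: it tends to fall as the number of classes grows, so comparing candidates with different class counts would in practice call for a chance-adjusted or otherwise better-suited measure of predictability.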
Jo B, Findling RL, Wang C-P, Hastie JT & the LAMS group (2017). Targeted use of growth mixture modeling: A learning perspective. Statistics in Medicine, 36, 671-686.
Jo B, Findling RL, Hastie JT, Youngstrom EA, Wang C-P & the LAMS group (in press). Construction of longitudinal prediction targets using semi-supervised learning. Statistical Methods in Medical Research.