Conceptual foundations for selecting optimal subgroups for treatment
Tyler VanderWeele, Ph.D.
What data are relevant when making a treatment decision for me? What replications are relevant for quantifying the uncertainty of this personalized decision? What does “relevant” even mean here? The multi-resolution (MR) perspective from the wavelets literature provides a convenient theoretical framework for contemplating such questions. Within the MR framework, signal and noise are two sides of the same coin: variation. They differ only in the resolution of that variation, and a threshold, the primary resolution, divides them. We use observed variations at or below the primary resolution (signal) to estimate a model and those above the primary resolution (noise) to estimate our uncertainty. The higher the primary resolution, the more relevant our model is for predicting a personalized response. The search for the appropriate primary resolution is thus an instance of the age-old bias-variance trade-off: estimating more precisely a less relevant treatment decision versus estimating less precisely a more relevant one. However, the MR setup crystallizes how the trade-off depends on three objects: (i) the estimand, which is independent of any statistical model, (ii) a model, which links the estimand to the data, and (iii) the estimator of the model. This trivial yet often overlooked distinction between estimand, model, and estimator supplies surprising new ways to improve mean squared error. The MR framework also permits a conceptual journey into the counterfactual world as the resolution level approaches infinity, where “me” becomes unique and hence can only be given a single treatment, necessitating the potential outcome setup. A real-life Simpson’s paradox involving two kidney stone treatments will be used to illustrate these points and engage the audience.
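The effect of changing the resolution can be sketched numerically. The abstract does not give the kidney stone data, so the sketch below assumes the widely cited counts from Charig et al. (1986), with treatment A denoting open surgery and B percutaneous nephrolithotomy; the function and variable names are illustrative only. It compares the two treatments at a coarse resolution (all patients pooled) and at a finer one (stratified by stone size), and reports a binomial standard error to show the loss of precision as the resolution rises.

```python
import math

# (successes, patients) by treatment and stone size.
# Assumption: these are the commonly quoted Charig et al. (1986) counts,
# not figures taken from the abstract itself.
counts = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def rate_and_se(successes, patients):
    """Success proportion and its binomial standard error."""
    p = successes / patients
    return p, math.sqrt(p * (1 - p) / patients)

def pooled(treatment):
    """Coarse resolution: ignore stone size and pool all patients."""
    s = sum(c[0] for c in counts[treatment].values())
    n = sum(c[1] for c in counts[treatment].values())
    return rate_and_se(s, n)

# Coarse resolution: one success rate per treatment.
coarse = {t: pooled(t) for t in counts}

# Finer resolution: one success rate per treatment within each stone size.
fine = {
    size: {t: rate_and_se(*counts[t][size]) for t in counts}
    for size in ("small", "large")
}

# Treatment with the higher estimated success rate at each resolution.
best_coarse = max(coarse, key=lambda t: coarse[t][0])
best_fine = {size: max(r, key=lambda t: r[t][0]) for size, r in fine.items()}

for t, (p, se) in coarse.items():
    print(f"pooled       {t}: rate={p:.3f} se={se:.3f}")
for size, r in fine.items():
    for t, (p, se) in r.items():
        print(f"{size:<5} stones {t}: rate={p:.3f} se={se:.3f}")
print("preferred when pooled:", best_coarse)
print("preferred by stone size:", best_fine)
```

Under these assumed counts, the pooled comparison favors one treatment while both strata favor the other, and the stratum-level estimates rest on fewer patients and so carry larger standard errors than the pooled ones. This is the trade-off described above: the finer resolution is more relevant to an individual patient but is estimated less precisely.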