Improving Generalization from Randomized Trials for Policy Purposes
Larry V. Hedges, Ph.D.
Randomized trials provide the gold standard of internal validity for making causal inferences about the effects of interventions. However, randomized trials are seldom conducted using probability samples, which would provide the same gold standard of generalizability (external validity). I will discuss methods to quantify and improve the generalizability of findings from randomized trials conducted to inform policy, and I will illustrate these ideas with the FIRST trial. I will begin by formalizing some subjective notions of generalizability in terms of estimating average treatment effects in well-defined inference populations. The problem is to use a study sample to estimate parameters of the distribution of treatment effects (e.g., the average treatment effect) in an inference population. When study samples are not probability samples, the inference process relies on matching the study sample to the inference population on a potentially large number of covariates that are related to variation in treatment effects. I outline methods that can, under definable assumptions, yield estimates of the population average treatment effect that are unbiased (or nearly so), with a standard error that depends largely on how well the study sample matches the inference population. If the standard error is reasonably small, the study sample yields generalizable effects; if it is large (or even infinite, as it can be), the evidence in the study sample has little or no generalizability to the inference population. I use the Flexibility In duty hour Requirements for Surgical Trainees (FIRST) trial to illustrate the use of these ideas.
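To make the role of the standard error concrete, here is a minimal sketch of one common reweighting approach, post-stratification, which may differ from the specific methods presented in the talk. It assumes the covariates have already been collapsed into a small number of strata, and that the population share of each stratum is known. The function name `poststratified_ate` and the dictionary keys are hypothetical, chosen only for illustration.

```python
import math

def poststratified_ate(strata):
    """Post-stratified estimate of the population average treatment effect.

    Illustrative sketch only. Each stratum is a dict with hypothetical keys:
      population_share : fraction of the inference population in the stratum
      n_treated, n_control : study-sample sizes in each arm
      mean_treated, mean_control : study-sample outcome means in each arm
      var_treated, var_control : study-sample outcome variances in each arm
    Returns (estimate, standard_error).
    """
    estimate = 0.0
    variance = 0.0
    for s in strata:
        w = s["population_share"]
        if w == 0:
            continue  # stratum absent from the inference population
        if s["n_treated"] == 0 or s["n_control"] == 0:
            # The study sample has no information about a stratum that
            # exists in the population: the effect there is not estimable,
            # so the standard error is treated as infinite.
            return float("nan"), float("inf")
        effect = s["mean_treated"] - s["mean_control"]
        effect_var = (s["var_treated"] / s["n_treated"]
                      + s["var_control"] / s["n_control"])
        estimate += w * effect          # weight by population share
        variance += w ** 2 * effect_var  # strata assumed independent
    return estimate, math.sqrt(variance)
```

In this sketch, a study sample that is thin in a stratum with a large population share inflates the standard error, and a stratum with no study units at all makes it infinite, which is the sense in which the evidence then has no generalizability to that inference population.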