Society for Epidemiologic Research

Causal Inference

Everything and the kitchen sink? Improving our models for selective attrition among breast and lung cancer survivors and the impact on estimated aging trajectories

Sophia Fuller*, Sowmya Vasan, Hailey Banack, Alexandra Binder, Elizabeth Cespedes Feliciano

In cancer research, evaluating long-term declines in physical function is complicated by differential censoring: censoring differs between cancer survivors and cancer-free individuals, and it also differs by cancer type, stage, and baseline functional status. Inverse probability of censoring weights (IPCW) can be used to account for this selective attrition. The objective of this research is to compare two approaches to fitting IPCW: 1) IPCW generated using Super Learner (IPCW-SL) and 2) IPCW generated using a standard generalized linear model approach (IPCW-GLM). Super Learner is an ensemble machine learning algorithm for prediction that becomes more practical as datasets grow larger and computation becomes more efficient. However, generalized linear models (GLMs) remain the default for estimating the probability of censoring. Using Women's Health Initiative data, we matched women with breast or lung cancer to women without cancer by age at diagnosis. We followed women for up to 16 years after diagnosis (or index date) and predicted the probability of censoring at each follow-up year using both a GLM and Super Learner with 10-fold cross-validation. We used the predicted probabilities from these models to create stabilized IPCW-SL and IPCW-GLM and compared their relative performance across successive waves of follow-up. Women with late-stage cancer, especially lung cancer, had larger weights than women with earlier-stage cancer and cancer-free women, reflecting their greater degree of censoring. We next used these weights in generalized estimating equations to examine trajectories of physical function decline among breast and lung cancer survivors vs. age-matched women without cancer; we will examine how each weighting scheme changes the estimated rate of physical function decline. As computational efficiency increases, ensemble prediction models are becoming more accessible and may outperform traditional methods.
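The stabilized weighting step can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the per-year censoring probabilities have already been predicted, with the denominator coming from either the GLM or Super Learner and the numerator from a hypothetical model using baseline covariates only, each conditional on remaining uncensored through the previous year. The stabilized weight at year t is then the cumulative product over years 1..t of P(uncensored | baseline covariates) / P(uncensored | full covariates). The function names `stabilized_ipcw` and `summarize_by_wave` are hypothetical, and the probabilities in the example are made up, not WHI estimates.

```python
from statistics import mean


def stabilized_ipcw(p_cens_num, p_cens_den):
    """Cumulative stabilized IPCW for one woman across follow-up years.

    Assumption (hypothetical setup, not the authors' code):
    p_cens_num[k] -- predicted probability of censoring in year k+1 from a
                     numerator model using baseline covariates only.
    p_cens_den[k] -- predicted probability of censoring in year k+1 from the
                     denominator model (GLM or Super Learner).
    Both are conditional on being uncensored through year k.
    Returns the weight at each year: prod_{j<=k} (1 - num_j) / (1 - den_j).
    """
    if len(p_cens_num) != len(p_cens_den):
        raise ValueError("numerator and denominator must cover the same years")
    weights = []
    w = 1.0
    for num, den in zip(p_cens_num, p_cens_den):
        # Ratio of probabilities of remaining uncensored this year.
        w *= (1.0 - num) / (1.0 - den)
        weights.append(w)
    return weights


def summarize_by_wave(weight_lists):
    """Mean and maximum weight at each follow-up wave across women.

    Women censored earlier contribute shorter lists, so each wave only
    summarizes the women still under follow-up at that wave.
    Returns a list of (wave, mean_weight, max_weight) tuples.
    """
    n_waves = max(len(ws) for ws in weight_lists)
    summary = []
    for k in range(n_waves):
        at_wave = [ws[k] for ws in weight_lists if len(ws) > k]
        summary.append((k + 1, mean(at_wave), max(at_wave)))
    return summary


if __name__ == "__main__":
    # Illustrative, made-up probabilities for one woman over three years.
    num = [0.10, 0.10, 0.10]        # hypothetical numerator predictions
    glm_den = [0.20, 0.25, 0.30]    # hypothetical GLM predictions
    sl_den = [0.15, 0.30, 0.40]     # hypothetical Super Learner predictions
    print("IPCW-GLM:", stabilized_ipcw(num, glm_den))
    print("IPCW-SL: ", stabilized_ipcw(num, sl_den))
```

Running the same numerator against the GLM and Super Learner denominators yields the two weight sets to compare wave by wave; a common check is that mean stabilized weights stay near 1 and that maxima are not extreme. The per-wave weights would then serve as observation weights in the weighted generalized estimating equations.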