Nutrition/Obesity
Into the Unknown – missing data in real-world analyses of metabolic syndrome and colorectal cancer survival
Anlan Cao*, Kristina L. Johnson, Jeffrey A. Meyerhardt, Edward Giovannucci, Elizabeth M. Cespedes Feliciano
Metabolic syndrome (MetSyn) is a cluster of risk factors that increase the incidence of and mortality from multiple conditions, including cancer. Electronic health records can be used to assess MetSyn, but incomplete measurement of its five components is common. Understanding the impact of missing data is essential because patients with unmeasured components cannot be presumed to be healthy.
We included patients aged 18 years or older diagnosed with stage I-III colorectal cancer (CRC) at Kaiser Permanente Northern California (2005-2020). MetSyn was defined as having ≥3 of 5 components abnormal, measured from 24 months before to 6 months after diagnosis and before systemic anticancer therapy: 1) body mass index ≥27.7 kg/m², 2) elevated blood pressure, 3) low high-density lipoprotein cholesterol, 4) hypertriglyceridemia, and 5) diabetes/impaired glucose tolerance. Patients with insufficient data for a definitive diagnosis were classified as unknown. We addressed the unknown group using four approaches: complete case analysis, a missing indicator, multiple imputation (MI), and inverse probability weighting (IPW). We used Cox proportional hazards models to estimate hazard ratios (HRs) for the association between MetSyn and survival, adjusted for confounders.
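The three-way classification described above can be illustrated with a short sketch. The abstract does not spell out the exact rule for "insufficient data", so the logic here is an assumption: a patient is definitively MetSyn once three components are abnormal, definitively not MetSyn once too few components remain unmeasured to reach three, and unknown otherwise. The function name `classify_metsyn` and the True/False/None encoding are hypothetical.

```python
from typing import Optional, Sequence


def classify_metsyn(components: Sequence[Optional[bool]]) -> str:
    """Classify MetSyn status from five components.

    Each entry is True (abnormal), False (normal), or None (not measured).
    Assumed rule (not confirmed by the abstract): the status is definitive
    only when the missing components could not change the answer.
    """
    if len(components) != 5:
        raise ValueError("expected exactly five components")

    abnormal = sum(1 for c in components if c is True)
    missing = sum(1 for c in components if c is None)

    if abnormal >= 3:
        # Already meets the >=3 of 5 threshold, whatever is missing.
        return "MetSyn"
    if abnormal + missing < 3:
        # Even if every missing component were abnormal, the threshold
        # could not be reached.
        return "No MetSyn"
    # The missing components could tip the result either way.
    return "Unknown"
```

Under this assumed rule, a patient with two abnormal, two normal, and one unmeasured component is classified as unknown rather than as not having MetSyn, which is the distinction the analysis turns on.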
Among 9,343 patients (mean age: 66 years, 50% female), 3,594 had definitive MetSyn, 4,004 did not, and 1,745 were unknown. Over a median follow-up of 7 years, 3,644 patients died. Complete case analysis showed MetSyn was associated with higher all-cause mortality (HR=1.27 [1.18-1.36]). Using a missing indicator for unknown, HRs for MetSyn remained similar, while the unknown group also had worse survival (HR=1.20 [1.09-1.32]). MI and IPW yielded similar MetSyn HRs with slightly wider confidence intervals (MI: HR=1.20 [1.11-1.29]; IPW: HR=1.24 [1.13-1.36]). CRC-specific mortality followed a similar pattern.
Missingness in MetSyn components is informative but predictable from measured variables. Rather than being presumed healthy, patients with missing components should be included in analyses, either as a separate group or through MI or IPW.