Social

Learning from the Unexpected: Drivers of Life Expectancy in Urban Areas of the United States

Jeanette Stingone*, Teshawna Badu

Background: Neighborhood is a strong predictor of life expectancy. Prediction methods are often used to classify spatial areas by life expectancy and to target public health interventions that aim to reduce disparities. But can we learn from areas where models perform poorly and use that to improve public health intervention? In previous work, researchers in Brazil used machine learning methods and traditional sociodemographic factors to predict life expectancy, then analyzed modifiable health characteristics of the municipalities whose observed life expectancies deviated from the prediction.

Methods: We replicate the Brazilian approach in urban areas of the United States, using a random forest to predict life expectancy at birth from twenty-six variables that represent social and structural drivers of health. Standard pipelines, including data partitioning and cross-validation, were applied. We then compared modifiable, health-related characteristics of the under- and overachievers (i.e., census tracts that have a 10% worse or better outcome than predicted), calculating the median difference in values between the two groups. Data come from the U.S. Small-area Life Expectancy Estimates Project, the U.S. Census American Community Survey and the CDC 500 Cities Project.
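The comparison step described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the model's predictions are already available, that the 10% threshold is a relative deviation of observed from predicted life expectancy, and that "median difference" means the difference between the two group medians. The function names, the dictionary keys and the toy values are all hypothetical.

```python
from statistics import median


def classify_tract(observed, predicted, threshold=0.10):
    """Label a tract by the relative deviation of observed from predicted
    life expectancy (assumed definition of the 10% cutoff)."""
    deviation = (observed - predicted) / predicted
    if deviation >= threshold:
        return "over"
    if deviation <= -threshold:
        return "under"
    return "as_predicted"


def median_difference(tracts, characteristic, threshold=0.10):
    """Median of a characteristic among overachievers minus the median
    among underachievers; None if either group is empty."""
    groups = {"over": [], "under": []}
    for tract in tracts:
        label = classify_tract(tract["observed"], tract["predicted"], threshold)
        if label in groups:
            groups[label].append(tract[characteristic])
    if not groups["over"] or not groups["under"]:
        return None
    return median(groups["over"]) - median(groups["under"])


# Toy data for illustration only; these are not study values.
tracts = [
    {"observed": 89.0, "predicted": 80.0, "smoking": 10.0},  # overachiever
    {"observed": 90.0, "predicted": 80.0, "smoking": 12.0},  # overachiever
    {"observed": 70.0, "predicted": 80.0, "smoking": 20.0},  # underachiever
    {"observed": 71.0, "predicted": 80.0, "smoking": 18.0},  # underachiever
    {"observed": 81.0, "predicted": 80.0, "smoking": 15.0},  # as predicted
]

smoking_gap = median_difference(tracts, "smoking")
```

A negative value here would mean the characteristic is less prevalent in overachieving tracts than in underachieving ones.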

Results: Initial results from New York City found that 43% of the variability in life expectancy at the census tract level was explained solely by sociodemographic predictors. Overachieving neighborhoods reported greater leisure time for physical activity, a higher prevalence of sleeping at least 7 hours a day, greater health insurance coverage and lower rates of smoking. In contrast, there were no differences in the prevalence of routine medical checkups or mammography, while underachieving neighborhoods had higher rates of cervical cancer screening.
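For readers unfamiliar with the "variability explained" figure, the sketch below shows one common way to compute it, the coefficient of determination (R squared). The abstract does not state which metric or which data partition was used, so this is an assumption; the function name and the toy values are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_observed = sum(observed) / len(observed)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - mean_observed) ** 2 for o in observed)
    return 1 - ss_residual / ss_total


# Toy life expectancies in years, for illustration only.
observed = [78.0, 80.0, 82.0, 84.0]
predicted = [79.0, 80.0, 82.0, 83.0]

share_explained = r_squared(observed, predicted)
```

Under this definition, a reported value of 43% corresponds to an R squared of 0.43.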

Discussion: Exploring areas that deviate from prediction may generate hypotheses for future research. The performance of the prediction model is a key limitation. Results from national data are forthcoming.