Social
A novel machine learning approach to identify neighborhood-level social phenotypes
Madeline Brooks*, Johns Hopkins Bloomberg School of Public Health
Indices of neighborhood-level disadvantage, such as the Area Deprivation Index (ADI) and the Social Vulnerability Index (SVI), are widely used as measures of social determinants of health (SDOH), but they were created for other purposes or represent only one SDOH domain. We used machine learning to identify multidimensional neighborhood phenotypes from 60+ SDOH indicators and validated them against the ADI, SVI, and area-level health outcomes.
We collected Maryland data on neighborhood-level health, interpersonal, organizational, economic, infrastructure, and structural factors and aggregated them to spatially consistent hexagonal units (N=33,326). We applied four clustering algorithms, using 10-fold cross-validation and cluster fit statistics to choose the optimal number of phenotypes. Results were pooled using an adaptive clustering ensemble approach. Overall and domain summary scores were created to compare phenotypes and to label them by the type and extent of relative advantage or disadvantage. Spatial regression models were used to estimate associations of phenotypes with tract-level prevalence of infant (preterm birth, low birthweight, stillbirth, and infant mortality) and adult (drug overdose and all-cause mortality) health outcomes.
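To illustrate what pooling several clusterings can involve, the following is a minimal sketch of one common consensus technique, a co-association (evidence accumulation) matrix followed by thresholding. It is not the adaptive clustering ensemble used in this study, whose details the abstract does not give. The function names, the 0.5 threshold, and the toy labels are all assumptions made for illustration.

```python
from itertools import combinations

def co_association(labelings):
    """Fraction of base clusterings that place each pair of units together."""
    n = len(labelings[0])
    weight = 1.0 / len(labelings)
    co = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += weight
    return co

def consensus_labels(labelings, threshold=0.5):
    """Link units co-clustered in more than `threshold` of the runs,
    then return the connected components as consensus clusters."""
    co = co_association(labelings)
    n = len(co)
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel components 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]

# Toy labels (hypothetical) for six units from four base algorithms.
runs = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same partition, different label names
    [0, 0, 1, 1, 1, 1],  # one unit assigned differently
    [2, 2, 2, 0, 0, 0],
]
consensus = consensus_labels(runs)
```

Because the matrix records only whether two units share a cluster, the arbitrary label names used by each algorithm do not matter. In the toy data, units 0-2 and units 3-5 end up in two consensus groups even though one run disagrees about unit 2.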
Ensemble learning produced five phenotypes: Overall Advantage (OA), Community Health Disadvantage (CHD), Structural Disadvantage (SD), Community Infrastructure Disadvantage (CID), and Overall Disadvantage (OD). The phenotype overall summary score was correlated with the ADI and SVI (r=0.64-0.66, p<0.05), suggesting convergent validity (Figure 1). Relative to the OA phenotype, the OD phenotype was associated with increased prevalence of all tract-level adverse health outcomes; associations for the other phenotypes varied by outcome.
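The convergent validity check above amounts to correlating one score with another across areas. The sketch below assumes a Pearson correlation, which the abstract does not specify, and uses made-up values rather than study data. The names `pearson_r`, `summary_score`, and `adi_rank` are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)

# Made-up illustrative values for five areas (not study data).
summary_score = [0.10, 0.40, 0.35, 0.80, 0.90]
adi_rank = [12, 40, 55, 70, 95]
r = pearson_r(summary_score, adi_rank)
```

A moderate positive r would indicate that the two measures rank areas similarly without being interchangeable, which is the pattern that supports convergent validity.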
Neighborhood phenotypes derived from machine learning may capture spatial variation in SDOH and related health outcomes, beyond the commonly used ADI and SVI.

