Big Data/Machine Learning/AI
The Sociome Data Commons: Pediatric asthma exacerbations on the South Side of Chicago, 2010 to 2019
Sandra Tilmon*, Sanjay Krishnan, Samuel Volchenboum, Jonathan Ozik, Ellen Cohen, Brian Furner, Julian Solway
The Sociome Data Commons collects and harmonizes data about non-clinical aspects of life that affect health, from environmental to economic and social factors. By collecting a breadth of data, the Sociome aims to facilitate discovery of novel influencers of health, providing opportunities for new interventions.
In Chicago, pediatric asthma has high morbidity and persistent disparities, and it has been selected as a community priority. The University of Chicago is located on the South Side of the city and serves patients mostly from the surrounding communities. For this analysis, patient race/ethnicity (largely non-Hispanic Black) was excluded to prevent overfitting on race and to permit later bias testing, and patient insurance (mostly Medicaid) was excluded because it is a proxy for poverty. Clinical covariates in the dataset include asthma phenotypes, medication, and distances to various health facilities. Non-clinical covariates include a poverty index constructed from the American Community Survey, air pollution, crime rates, tree cover, and housing conditions (including a propensity for urban flooding).
The first aim of this analysis is to explore the risks associated with asthma exacerbations among pediatric asthma visits. Several disparate models (logistic regression, support-vector machine, boosted decision tree, and neural network) are compared using standardized metrics: accuracy (proportion of all predictions, positive and negative, that are correct), recall (proportion of actual positives correctly identified), and R² (proportion of variance explained). Bias is assessed on the best-performing model by comparing the false negative, false omission, and true positive rates by vulnerable group status.
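As a minimal sketch of the comparison metrics and bias rates named above, the following computes accuracy, recall (the true positive rate), the false negative rate, and the false omission rate from binary labels, overall and per group. The function names, the group labels "a" and "b", and the toy labels are hypothetical and purely illustrative; they are not the study's data or code.

```python
from collections import defaultdict


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary 0/1 labels."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def safe_div(num, den):
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def rates(y_true, y_pred):
    """Accuracy, recall (TPR), false negative rate, false omission rate."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "recall": safe_div(tp, tp + fn),                # true positive rate
        "false_negative_rate": safe_div(fn, tp + fn),   # missed positives
        "false_omission_rate": safe_div(fn, fn + tn),   # wrong among predicted negatives
    }


def rates_by_group(y_true, y_pred, groups):
    """Compute the same rates separately for each group label."""
    buckets = defaultdict(lambda: ([], []))
    for t, p, g in zip(y_true, y_pred, groups):
        buckets[g][0].append(t)
        buckets[g][1].append(p)
    return {g: rates(ts, ps) for g, (ts, ps) in buckets.items()}


# Hypothetical toy data: 1 = exacerbation, 0 = no exacerbation.
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

overall = rates(y_true, y_pred)
per_group = rates_by_group(y_true, y_pred, groups)
```

Large gaps between groups in any of these rates would flag the model for closer review; how large a gap counts as meaningful is a judgment the sketch does not make.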
The second aim is to stratify the analysis by previously identified asthma exacerbation clusters (Moran’s I = 0.5958, p < .0001; see attached image) to identify cluster-specific risks. Implications for neighborhood- or cluster-specific interventions are discussed.
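For readers unfamiliar with the clustering statistic, here is a minimal sketch of global Moran's I, assuming a simple binary adjacency weight matrix. The abstract does not state which spatial weights or software were used, so the weights, the values, and the function name below are hypothetical. Significance testing (the p-value) is not shown.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and an n x n weight matrix.

    I = (n / W) * sum_ij w_ij * (x_i - mean) * (x_j - mean) / sum_i (x_i - mean)^2
    where W is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    denom = sum(d * d for d in dev)
    return (n / w_total) * (cross / denom)


# Hypothetical example: four areas in a row, each adjacent to its neighbors.
chain_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = [1, 2, 3, 4]      # similar values sit next to each other
alternating = [1, 4, 1, 4]    # neighbors are as dissimilar as possible

i_clustered = morans_i(clustered, chain_weights)      # positive
i_alternating = morans_i(alternating, chain_weights)  # negative
```

Positive values indicate that neighboring areas tend to have similar rates, which is the pattern the reported statistic of 0.5958 describes.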