Perinatal & Pediatric

Sub-phenotyping of gestational diabetes with machine learning in a large cohort study in China

Song-Ying Shen*, Cheng-Rui Wang, Francesca Crowe, Jin-Hua Lu, Dong-mei Wei, Wan-Qing Xiao, Xiao-yan Xia, Li-Fang Zhang, Yi-xin Guo, Jian-Rong He, Hui-Min Xia, Krishnarajah Nirantharakumar, Xiu Qiu

OBJECTIVE

To develop a new classification scheme for gestational diabetes (GDM) among women with GDM in the Born in Guangzhou Cohort Study, China.

RESEARCH DESIGN AND METHODS

Clusters of GDM were identified from high-dimensional epidemiological and medical data using the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) method. Prevalence ratios (PRs) of adverse perinatal outcomes for each cluster were estimated using Poisson regression. Classification and regression trees (CART) were used to develop algorithms that assign women to the subtypes.
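The clustering and tree-assignment steps can be sketched as follows. This is a minimal illustration on synthetic data, assuming scikit-learn's `DBSCAN` and `DecisionTreeClassifier` (an optimized CART variant). The two features, the `eps` and `min_samples` values, and the tree depth are hypothetical choices for illustration, not those used in the study, and the Poisson regression step is not shown.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the cohort: two well-separated groups in a
# hypothetical 2-feature space (e.g. maternal age, parity). Not real data.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=[26.0, 0.0], scale=[1.0, 0.1], size=(100, 2))
group_b = rng.normal(loc=[36.0, 2.0], scale=[1.0, 0.1], size=(100, 2))
X = np.vstack([group_a, group_b])

# DBSCAN is distance-based, so features are standardized first.
X_scaled = StandardScaler().fit_transform(X)

# Density-based clustering; the label -1 marks points treated as noise.
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X_scaled)
n_clusters = len(set(labels) - {-1})

# Fit a shallow CART-style tree that reproduces the cluster labels from
# the original (unscaled) features, excluding noise points.
mask = labels != -1
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X[mask], labels[mask])

# Share of clustered points the tree assigns to their DBSCAN cluster.
accuracy = tree.score(X[mask], labels[mask])
print(f"clusters found: {n_clusters}, tree agreement: {accuracy:.3f}")
```

The tree here is scored on the same points it was fitted to, so the agreement is optimistic. A held-out split would be needed to estimate how well such rules assign new women to subtypes.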

RESULTS

We identified six replicable and distinct clusters in 5689 women with GDM. Women in cluster 2 (48.8%) were mainly primiparous, were the youngest, and had few adverse conditions. Women in cluster 1 (39.9%) were multiparous and older, and had higher rates of large-for-gestational-age infants and cesarean section (CS). Women in cluster 3 (5.4%) had higher proportions of preterm delivery and neonatal complications, and had significantly higher PRs of maternal, placental, and neonatal complications than women in cluster 2. Cluster 4 (4.3%) was enriched for placental dysfunction/malformation and perinatal asphyxia. Cluster 5 (0.6%) was dominated by fetal macrosomia and CS. Cluster 6 (0.4%) was characterized by maternal congenital uterine malformation, CS, and early fetal deaths. Trajectories of biomarkers of biological processes, such as bile acid, differed between clusters throughout pregnancy and matched the enriched outcomes. CART algorithms assigned 91.6% of women accurately using early-to-mid pregnancy variables, improving to 95.9% with additional variables collected at birth.

CONCLUSIONS

The clustering and new classification system could serve as a basis to develop subtype-specific therapeutic strategies and aid in exploring the underlying disease mechanisms.

Figure legends

Fig. Clustering of GDM patients and patient distribution according to the method of classification (n=5698). Distribution of GDM subtypes based on (a) glucose measurements from OGTT and (b) clustering. (c) Plot of the GDM clustering cohort using principal component analysis. (d) Distribution of 8 clusters of patients according to the main individual maternal, placental, fetal, or newborn conditions.

i-IFG: isolated impaired fasting glucose, defined as fasting glucose ≥5.1 mmol/L with both 1-hour glucose <10.0 mmol/L and 2-hour glucose <8.5 mmol/L; i-IGT: isolated impaired post-load glucose tolerance, defined as 1-hour glucose ≥10.0 mmol/L and/or 2-hour glucose ≥8.5 mmol/L with fasting glucose <5.1 mmol/L; IFG+IGT: combined IFG and IGT, defined as 1-hour glucose ≥10.0 mmol/L and/or 2-hour glucose ≥8.5 mmol/L with fasting glucose ≥5.1 mmol/L; UMAP, Uniform Manifold Approximation and Projection; LGA, large for gestational age; CS, cesarean section; GDM, gestational diabetes; SGA, small for gestational age; CHD, congenital heart disease.

*Composite neonatal complication was defined as the presence of any of the following: birth trauma, respiratory distress, syndrome of infant of GDM mother, neonatal hypoglycemia, neonatal jaundice, perinatal hematological disorder, transitory neonatal disorders of calcium and magnesium metabolism, pneumonia, other neonatal infections, respiratory disease or cardiovascular disorders originating in the perinatal period, and intracranial nontraumatic hemorrhage of the fetus and newborn/hemorrhagic disease of the fetus and newborn.
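The glucose-based subtype definitions in the legend can be written as a short rule. The sketch below applies the stated OGTT thresholds (all values in mmol/L). The function name `classify_ogtt` and the `None` return for values meeting no threshold are illustrative assumptions, not part of the study.

```python
def classify_ogtt(fasting, one_hour, two_hour):
    """Classify OGTT results (mmol/L) using the legend's thresholds.

    Returns "i-IFG", "i-IGT", "IFG+IGT", or None when no threshold is met.
    """
    # Elevated fasting glucose: fasting >= 5.1 mmol/L
    ifg = fasting >= 5.1
    # Elevated post-load glucose: 1-hour >= 10.0 and/or 2-hour >= 8.5 mmol/L
    igt = one_hour >= 10.0 or two_hour >= 8.5

    if ifg and igt:
        return "IFG+IGT"
    if ifg:
        return "i-IFG"
    if igt:
        return "i-IGT"
    return None  # hypothetical convention: no diagnostic threshold met


# Illustrative (made-up) measurements: fasting, 1-hour, 2-hour
examples = [(5.3, 9.0, 7.0), (4.8, 10.4, 8.0), (5.2, 9.5, 8.7)]
for values in examples:
    print(values, classify_ogtt(*values))
```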