Diabetes
Diabetes Identification in Self-Reported Population-Based Survey Data Versus Health Administrative Data
Divine-Favour Chichenim Ofili*, Amélie Quesnel-Vallée
Background: In the absence of clinical diagnoses, diabetes is identified through self-reported data or algorithmic diagnoses from health administrative data (HAD). Understanding the diagnostic agreement and uniqueness of these sources is key for guiding data selection.
Aim: To evaluate diabetes diagnosis concordance across data sources in Quebec, Canada, and explore sociodemographic and health variations among individuals identified by each source.
Methods: This study used the Care Trajectories-Enriched Dataset (TorSaDE), which links the Canadian Community Health Survey to Quebec’s HAD. Survey respondents aged ≥20 years who completed at least one survey wave were included. Those with missing survey responses on diabetes diagnosis or with suspected gestational diabetes were excluded. Diabetes presence or absence was assessed using both self-reported data and HAD. Group characteristics were compared with ANOVA, chi-square, and post-hoc tests. Age and sex effects were adjusted for using linear and logistic regression.
Results: Of 101,150 individuals, 10,740 (10.6%) were identified as having diabetes. Diagnosis concordance was high: ~65% of individuals with diabetes were found in both sources, and ~90% of those self-reporting diabetes were also algorithmically diagnosed. Individuals with diabetes were generally older and had lower educational attainment, poorer health, and lower household income than those without diabetes. Significant differences in age, sex, residence, health, and healthcare use were found between individuals diagnosed exclusively in each source and those with concordant diagnoses, but their educational and economic status was comparable.
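To make the two concordance figures concrete, the following is a minimal sketch of how such proportions could be computed from two binary diabetes flags per person. It is not the authors' code: the function name `concordance`, the output keys, and the toy data are hypothetical, and it assumes that the ~65% figure is the share of people found in both sources among those identified by either source, which the abstract does not state explicitly.

```python
from collections import Counter


def concordance(self_report, had):
    """Cross-classify two binary diabetes flags and summarise their overlap.

    self_report, had: equal-length sequences of booleans, one pair per person
    (hypothetical inputs; True means diabetes is flagged in that source).
    """
    if len(self_report) != len(had):
        raise ValueError("both sources must cover the same individuals")

    # Count each (self-report, HAD) combination.
    counts = Counter(zip(self_report, had))
    both = counts[(True, True)]
    self_only = counts[(True, False)]
    had_only = counts[(False, True)]

    either = both + self_only + had_only   # flagged by at least one source
    self_total = both + self_only          # everyone who self-reported

    return {
        "both": both,
        "self_report_only": self_only,
        "had_only": had_only,
        # Assumed definition: share of all identified cases found in both sources.
        "share_in_both": both / either if either else None,
        # Share of self-reported cases that the HAD algorithm also flags.
        "share_self_report_confirmed": both / self_total if self_total else None,
    }


# Toy data for ten made-up individuals (not study data).
toy_self = [True, True, True, False, False, False, False, False, False, False]
toy_had = [True, True, False, True, False, False, False, False, False, False]
result = concordance(toy_self, toy_had)
```

The three counts returned here correspond to the three groups compared in the abstract: concordant diagnoses, self-report only, and HAD only.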
Conclusion: Despite high concordance between the two data sources, each identifies distinct subpopulations in terms of social and health characteristics. Combining both sources is best for more comprehensive studies, but if only one is available, results should be interpreted in light of the population it best represents.