COVID-19 Pandemic
Heterogeneity and Duplication in Common Data Elements for the Study of COVID-19
Megan M. Chenoweth*, Chloe A. Perry, John T. Kubale
INTRODUCTION: The COVID-19 pandemic has highlighted the importance of consistent, shared measures that support comparability of findings across studies. Common Data Elements (CDEs), which are standardized questions, variables, or measures with specific sets of responses that are common across multiple studies, are an important tool for achieving this. Since March 2020, researchers have worked to define CDEs related to COVID-19, add them to the National Library of Medicine (NLM) CDE repository, and endorse them according to criteria set forth by the NIH Scientific Data Council. Our aim is to examine COVID-related CDEs, categorize them by topic, and note duplication across sources.
METHODS: We inventoried COVID-related CDEs in the NLM CDE repository, characterizing each by source, endorsement status, and topic, and used the topics to identify duplication and heterogeneity across sources. We then manually grouped topics into categories and subcategories so that similarities could be easily identified; these groupings will be validated using textual analysis.
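The topic-based duplication check described in the methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' actual procedure: the `CDE` record, its fields, the source labels ("Source A" and so on), and the toy inventory are all hypothetical. A topic is treated as duplicated when CDEs from two or more distinct sources map to it; CDEs without a topic (such as administrative ones) are skipped.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CDE:
    """One inventoried data element (hypothetical fields for illustration)."""
    name: str
    source: str            # source that contributed the CDE (hypothetical label)
    endorsed: bool         # whether the CDE is endorsed
    category: str          # e.g. "clinical", "SBE", "administrative"
    topic: Optional[str]   # None for CDEs not categorized by topic


def duplicated_topics(cdes):
    """Return {topic: sorted list of sources} for topics covered by 2+ sources."""
    sources_by_topic = defaultdict(set)
    for cde in cdes:
        if cde.topic is not None:  # skip untopiced (e.g. administrative) CDEs
            sources_by_topic[cde.topic].add(cde.source)
    return {
        topic: sorted(sources)
        for topic, sources in sources_by_topic.items()
        if len(sources) >= 2
    }


def endorsed_share(cdes):
    """Fraction of all inventoried CDEs that are endorsed."""
    return sum(cde.endorsed for cde in cdes) / len(cdes)


# Toy inventory with hypothetical sources and topics (not the study's data).
inventory = [
    CDE("Fever present", "Source A", True, "clinical", "symptoms"),
    CDE("Symptom checklist", "Source B", False, "clinical", "symptoms"),
    CDE("Lost job due to COVID", "Source A", True, "SBE", "employment"),
    CDE("Employment change", "Source C", False, "SBE", "employment"),
    CDE("Employment status", "Source B", False, "SBE", "employment"),
    CDE("Mask wearing frequency", "Source C", False, "SBE", "prevention behaviors"),
    CDE("Study site ID", "Source A", True, "administrative", None),
]

dupes = duplicated_topics(inventory)
for topic, sources in sorted(dupes.items()):
    print(f"{topic}: {len(sources)} sources ({', '.join(sources)})")
print(f"endorsed: {endorsed_share(inventory):.1%}")
```

Grouping sources into a set per topic means several CDEs from the same source on one topic do not, by themselves, count as cross-source duplication.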
RESULTS: We identified 411 COVID-related CDEs from four sources in the NLM CDE repository. The endorsed CDEs (n=138, 33.6%) come from only two of these sources. We noted 80 topics in three categories: clinical (n=182, 29 topics); social, behavioral, and economic (SBE, n=193, 50 topics); and administrative (n=36, not categorized by topic). In 17 cases (two clinical and 15 SBE), CDEs from two or three sources covered the same topic.
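As a quick consistency check on the reported counts, the three category totals sum to the full inventory, and the endorsed share follows from the 138 endorsed CDEs:

```latex
\[
182 + 193 + 36 = 411, \qquad \frac{138}{411} \approx 0.336 = 33.6\%
\]
```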
DISCUSSION: Our inventory is the first step toward describing the extent of existing COVID CDEs, as well as the first attempt to demonstrate duplication within topics. The heterogeneity we observed suggests a lack of agreement on which CDEs are optimal in which contexts, especially in the SBE domain. This is exacerbated by the fact that some CDEs have not undergone endorsement and by changes over time to the endorsement process itself. Improved guidance on selecting CDEs could reduce heterogeneity and increase harmonization across studies.