LATEBREAKER

Big Data/Machine Learning/AI

CIPHER: A Key Resource for Evaluating Emerging Phenotypes

Vidisha Tanukonda*, Ashley Galloway, Monika Maripuri, Yuk-Lam Ho, Francesca Fontin, Jeffrey Gosian, Michael Murray, Rahul Sangar, Edward Zielinski, Joanne Sordillo, Katherine P Liao, Tianxi Cai, Sumitra Muralidhar, Jacqueline Honerlaw, Kelly Cho, Tiffany Sim

Electronic health record (EHR)-based phenotypes represent clinical conditions or characteristics derived from information contained within health systems, enabling reuse of EHR data for research and healthcare operations. Developing phenotypes with high accuracy is therefore critical to the quality of data analytics and utilization. To accelerate research, the Department of Veterans Affairs has developed CIPHER (Centralized Interactive Phenomics Resource), a growing public knowledgebase comprising over 5000 computable phenotype definitions and data visualization tools.

CIPHER is uniquely structured to provide information about the portability of a phenotype across healthcare systems. For example, ICD-10 code U09.9 (Post COVID-19 condition, unspecified), introduced in 2021, is used to identify patients with Long COVID. A recent manuscript provided one of the first assessments of the accuracy of the code at three healthcare systems. In January 2024, this nascent phenotype definition and its validation metrics were added to CIPHER (Figure 1A), making them widely available. CIPHER’s integrative platform accepts an array of validation metrics, so users can easily compare performance and quickly evaluate a definition’s applicability. For instance, the wide range in positive predictive value (PPV) for U09.9 across health systems (23.2% to 62.4%) shows that the code’s ability to accurately classify Long COVID cases varies by site (Figure 1B).
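
As a rough illustration of the site-level comparison described above, the sketch below computes PPV as confirmed cases divided by all code-flagged patients and reports the range across sites. The site names, counts, and the `SiteValidation` and `ppv_range` helpers are hypothetical placeholders chosen for this example; they are not data from the manuscript and do not reflect how CIPHER stores or computes metrics.

```python
from dataclasses import dataclass


@dataclass
class SiteValidation:
    """Chart-review results for one site (hypothetical structure)."""
    site: str
    true_positives: int   # code-flagged patients confirmed as cases
    false_positives: int  # code-flagged patients not confirmed as cases

    @property
    def ppv(self) -> float:
        # PPV = confirmed cases / all patients flagged by the code
        flagged = self.true_positives + self.false_positives
        if flagged == 0:
            raise ValueError(f"no flagged patients reviewed at {self.site}")
        return self.true_positives / flagged


def ppv_range(results):
    """Return the (lowest, highest) PPV across sites."""
    values = [r.ppv for r in results]
    return min(values), max(values)


# Hypothetical counts for illustration only; NOT the published results.
sites = [
    SiteValidation("Site A", true_positives=20, false_positives=80),
    SiteValidation("Site B", true_positives=45, false_positives=55),
    SiteValidation("Site C", true_positives=60, false_positives=40),
]

low, high = ppv_range(sites)
print(f"PPV range across sites: {low:.1%} to {high:.1%}")
```

A wide gap between the lowest and highest values, as with the placeholder counts here, is the kind of signal that suggests a definition may need site-specific validation before reuse.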

The CIPHER metadata standard ensures that the information needed to replicate a phenotype algorithm is available to users and provides critical information about phenotype portability and limitations. Definitions such as the Long COVID phenotype can be replicated in additional cohorts and, if validated, the resulting performance metrics can be deposited in the knowledgebase. CIPHER’s novel approach positions it as an invaluable resource for evaluating emerging phenotypes and presents a framework for standardizing existing definitions.