Big Data/Machine Learning/AI
Leveraging natural language processing to identify housing and income insecurity concepts from clinical notes of patients with opioid-related disorders
Shivani Mehta*, Hyelee Kim, Matthew S. Pantell, Tasce Bongiovanni, James D. Harrison, William Brown III
Introduction: Electronic health records (EHRs) contain structured data elements for capturing social determinants/drivers of health (SDOH), such as Z codes (International Classification of Diseases codes for social risk factors). Z codes are underutilized, which may lead to underestimation of the impact of SDOH on opioid-related disorders (ORDs). Natural language processing (NLP) offers an efficient way to identify SDOH-related information in patient clinical notes.

Objective: Use NLP to identify concepts of housing and income insecurity from clinical notes of patients with ORDs.

Methods: The sample included 2,846 adult patients (≥18 years) with ORDs seen in University of California, San Francisco Health outpatient encounters (January 1, 2018 – December 31, 2019). cTAKES (Clinical Text Analysis & Knowledge Extraction System) was used to extract housing and income insecurity concepts from patient clinical notes using the Unified Medical Language System (UMLS) dictionary. We used Systematized Nomenclature of Medicine – Clinical Terms (SNOMED CT) codes to map UMLS concept unique identifiers (CUIs) for broader capture. Validation involved manual assessment by two independent reviewers.

Results: For housing insecurity, cTAKES flagged 7,216 clinical notes associated with 834 unique patients (29.3%), whereas 333 patients (11.7%) had a Z code for housing insecurity (Z59.0, Z59.1, Z59.8). For income insecurity, cTAKES flagged 487 clinical notes associated with 273 unique patients (9.6%), whereas only 7 unique patients (0.25%) had a Z code for financial problems (Z59.5, Z59.6, Z59.7, Z59.86). cTAKES had 91% sensitivity and 100% specificity for housing insecurity, and 77% sensitivity and 99.5% specificity for income insecurity.

Conclusion: This study underscores the inadequacy of relying solely on structured data elements for comprehensive SDOH data. The text processing capabilities of NLP tools such as cTAKES can be crucial for identifying meaningful SDOH information in clinical narratives.
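
The patient-level percentages reported in the Results follow directly from the stated counts and the sample size of 2,846. The short Python sketch below reproduces that arithmetic and spells out the standard definitions of sensitivity and specificity. It is illustrative only: the abstract does not report the validation counts (true/false positives and negatives), so the counts passed to `sensitivity` and `specificity` at the end are hypothetical placeholders, not study data, and the function and variable names are our own.

```python
# Counts reported in the abstract; denominator is the 2,846 patients with ORDs.
TOTAL_PATIENTS = 2846
reported_counts = {
    "housing insecurity, flagged by cTAKES": 834,
    "housing insecurity, Z code": 333,
    "income insecurity, flagged by cTAKES": 273,
    "financial problems, Z code": 7,
}


def percent(count, total):
    """Share of `total` represented by `count`, as a percentage."""
    return 100.0 * count / total


def sensitivity(tp, fn):
    """True positives divided by all truly positive cases."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negatives divided by all truly negative cases."""
    return tn / (tn + fp)


for label, count in reported_counts.items():
    print(f"{label}: {count}/{TOTAL_PATIENTS} = {percent(count, TOTAL_PATIENTS):.2f}%")

# HYPOTHETICAL validation counts, chosen only to show the formulas;
# they are not taken from the study.
print("sensitivity:", sensitivity(tp=8, fn=2))
print("specificity:", specificity(tn=9, fp=1))
```

Rounded to the precision used in the abstract, the first four lines correspond to the reported 29.3%, 11.7%, 9.6%, and 0.25%.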