COVID-19 Pandemic

Evaluating the Use of Event Metadata Extracted from Online News Media for Disease Outbreak Detection

Yannan Shen*, David Buckeridge, Russell Steele, Philip Abdelmalik

Digital disease surveillance (DDS) uses internet-based data to detect and monitor health threats. DDS systems extract structured event metadata, recorded as event entities such as whether an event was caused by an unclassified pathogen. Our study aims to understand the data properties of event metadata and to explore the use of a self-controlled case series design to assess the temporal association between event entities and the emergence of the Omicron variant. We obtained COVID-19 event metadata for October 1 to December 31, 2021, from a DDS system, BioCaster, and daily counts of Omicron-positive genome samples for the same period from the Global Initiative on Sharing All Influenza Data (GISAID). Countries with detected change points in at least one entity were included. For each country, the emergence of the Omicron variant was identified from genome counts using Bayesian change point analysis. We defined a risk period of 9 days following each emergence and included an 18-day pre-exposure period to address a potential violation of the design's assumptions, as more genome samples may be collected after news reports of a new variant. Conditional Poisson regression was used to estimate the relative incidence (RI) and its 95% confidence interval (CI). Given the substantial media attention that followed the World Health Organization's announcement of the identification of the Omicron variant, we conducted a secondary analysis with the study period truncated at November 25, 2021, to eliminate the potential impact of the announcement. In total, 67 countries were included. The number of change points detected per entity ranged from 19 to 419. In the primary analysis, no increased incidence of change points was identified for any entity. After the study period was truncated, however, an increased RI was found for the entity indicating events caused by unclassified pathogens (RI 2.24; 95% CI 1.03, 4.84). These findings highlight the potential of online news media for signalling the emergence of significant infectious diseases.
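To make the window definitions concrete, the following minimal Python sketch labels each day of a country's observation period as baseline, pre-exposure, or risk, and tallies days and entity change points per window. It is an illustration under stated assumptions, not the authors' implementation: the function names, the emergence date used in the example, and the choice to start the 9-day risk window on the emergence day itself are all hypothetical. Only the 9-day and 18-day window lengths and the study dates come from the abstract.

```python
from datetime import date, timedelta

RISK_DAYS = 9            # risk period length, from the abstract
PRE_EXPOSURE_DAYS = 18   # pre-exposure period length, from the abstract


def label_day(day, emergence):
    """Classify one calendar day relative to a country's emergence date."""
    offset = (day - emergence).days
    if -PRE_EXPOSURE_DAYS <= offset < 0:
        return "pre_exposure"
    # Assumption: the risk window begins on the emergence day itself.
    if 0 <= offset < RISK_DAYS:
        return "risk"
    return "baseline"


def window_summary(start, end, emergence, change_points):
    """Count observed days and change points per window over [start, end]."""
    summary = {w: {"days": 0, "events": 0}
               for w in ("baseline", "pre_exposure", "risk")}
    events = set(change_points)  # at most one change point per day is counted
    day = start
    while day <= end:
        window = label_day(day, emergence)
        summary[window]["days"] += 1
        if day in events:
            summary[window]["events"] += 1
        day += timedelta(days=1)
    return summary


if __name__ == "__main__":
    # Hypothetical country: emergence date and change points are made up.
    result = window_summary(
        start=date(2021, 10, 1),
        end=date(2021, 12, 31),
        emergence=date(2021, 11, 26),
        change_points=[date(2021, 10, 10), date(2021, 11, 28)],
    )
    for window, counts in result.items():
        print(window, counts)
```

Per-country, per-window counts of this kind are the input a conditional Poisson model would take, with the number of days in each window entering as an offset and the country as the conditioning stratum.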