App Abstracts

Methods/Statistics

Application of a data fusion design to improve incidence estimates of etiology-specific diarrheal disease using hybrid study data

Maria Garcia Quesada*, Alex Breskin, James Platts-Mills, Patricia Pavlinac, Sean Galagan, Lance Waller, Benjamin Lopman, Elizabeth Rogawski-McQuade

Accurate estimates of infectious disease incidence are critical for prioritizing and powering studies of public health interventions, including vaccines. Because cohort studies are often cost-prohibitive, hybrid studies (i.e., facility-based disease surveillance combined with catchment area enumeration and a healthcare utilization survey) are an appealing alternative. Previously, healthcare utilization adjustments have been applied as simple proportions within one or two stratifying features, which can fail to capture the full heterogeneity in healthcare-seeking behavior by disease severity and sociodemographic characteristics. Given the potential impact of small differences in these proportions on adjusted incidence rates, proper adjustment is critical. Furthermore, commonly used methods for quantifying uncertainty around incidence estimates in hybrid designs often rely on computationally intensive bootstrapping or Monte Carlo simulation. Here, we use data from the Global Enteric Multicenter Study (GEMS) to illustrate a data fusion design with M-estimation for estimating etiology-specific diarrheal disease incidence from hybrid study data, with the goal of later applying these methods in the ongoing Enterics for Global Health (EFGH) study. M-estimation allows statistical parameters from different data sources to be estimated and combined using stacked estimating functions, and it provides a closed-form solution for the variance. Four sets of estimating equations must be defined: 1) a logistic regression to estimate healthcare-seeking propensity weights from the healthcare utilization survey, 2) the true number of cases, based on healthcare-seeking and facility sampling weights, 3) the total population at risk, based on community sampling weights, and 4) the estimated true incidence. We demonstrate how this method can be implemented for any study with hybrid data, and we compare the resulting incidence estimates and corresponding variances with those from other established methods.
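
The four sets of estimating equations can be sketched in a few lines of Python. This is a minimal illustration on simulated data, not the authors' implementation and not GEMS data. The record layout, the single binary severity covariate, the sampling weights, and all variable names (`src`, `severe`, `sought`, `positive`, `pyears`, `w`) are assumptions made for the example. Records from the three sources are stacked into one data set, each estimating function is switched on only for its own source, and the variance comes from the empirical sandwich estimator rather than from bootstrapping.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def psi(theta, d):
    """Stacked estimating functions: one row per record, one column per parameter."""
    b0, b1, cases, pop, inc = theta
    n = d["src"].size
    p = expit(b0 + b1 * d["severe"])                  # care-seeking propensity
    survey, facility, community = (d["src"] == k for k in (0, 1, 2))
    resid = survey * (d["sought"] - p)
    return np.column_stack([
        resid,                                         # 1) logistic score, intercept
        resid * d["severe"],                           # 1) logistic score, severity
        facility * d["w"] * d["positive"] / p - cases / n,  # 2) true number of cases
        community * d["w"] * d["pyears"] - pop / n,    # 3) person-years at risk
        np.full(n, (cases - inc * pop) / n),           # 4) incidence = cases / pop
    ])

def fit(d):
    """Solve sum(psi) = 0; the system is triangular, so solve it in order."""
    s = d["src"] == 0
    X = np.column_stack([np.ones(s.sum()), d["severe"][s]])
    beta = np.zeros(2)
    for _ in range(50):                                # Newton-Raphson for the logistic model
        p = expit(X @ beta)
        hess = (X * (p * (1 - p))[:, None]).T @ X
        beta += np.linalg.solve(hess, X.T @ (d["sought"][s] - p))
    p_all = expit(beta[0] + beta[1] * d["severe"])
    cases = np.sum((d["src"] == 1) * d["w"] * d["positive"] / p_all)
    pop = np.sum((d["src"] == 2) * d["w"] * d["pyears"])
    return np.array([beta[0], beta[1], cases, pop, cases / pop])

def sandwich(theta, d):
    """Empirical sandwich covariance: bread^{-1} meat bread^{-T}."""
    k = theta.size
    bread = np.empty((k, k))
    for j in range(k):                                 # central-difference Jacobian of sum(psi)
        h = np.zeros(k)
        h[j] = 1e-5 * max(1.0, abs(theta[j]))
        bread[:, j] = (psi(theta + h, d) - psi(theta - h, d)).sum(axis=0) / (2 * h[j])
    scores = psi(theta, d)
    bread_inv = np.linalg.inv(bread)
    return bread_inv @ (scores.T @ scores) @ bread_inv.T

# Simulated stacked data (all values hypothetical):
# src 0 = healthcare utilization survey, 1 = facility cases, 2 = community enumeration sample.
rng = np.random.default_rng(2024)
src = np.repeat([0, 1, 2], [500, 300, 400])
n = src.size
severe = np.where(src == 2, 0.0, rng.binomial(1, 0.5, n))
sought = np.where(src == 0, rng.binomial(1, expit(-1.0 + 1.2 * severe)), 0.0)
positive = np.where(src == 1, rng.binomial(1, 0.2, n), 0.0)   # etiology-attributed case
pyears = np.where(src == 2, rng.uniform(0.5, 1.0, n), 0.0)
w = np.where(src == 1, 2.0, np.where(src == 2, 25.0, 1.0))    # sampling weights
d = dict(src=src, severe=severe, sought=sought, positive=positive, pyears=pyears, w=w)

theta_hat = fit(d)
cov = sandwich(theta_hat, d)
inc, se = theta_hat[4], np.sqrt(cov[4, 4])
print(f"incidence per child-year: {inc:.4f} "
      f"(95% CI {inc - 1.96 * se:.4f}, {inc + 1.96 * se:.4f})")
```

Because the incidence equation is stacked with the others, uncertainty in the propensity model, the case count, and the population at risk all propagate into the incidence variance through the bread matrix. A real analysis would use a richer propensity model and the study's actual sampling weights.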