Methods/Statistics
A Causal Analytic Workflow for Processing Clinically Linked Antimicrobial Resistance Surveillance Data
Mengyao Tang*, Columbia University
Background:
Antimicrobial resistance (AMR) surveillance is essential for understanding disease burden and informing policy, yet analytic efforts are often constrained by fragmented data structures, inconsistent denominators, and limited linkage between microbiological and clinical data—particularly in low- and middle-income settings and neonatal populations. These challenges hinder reproducibility, comparability, and valid causal inference.
Methods:
We developed a standardized causal analytic workflow to guide the processing of clinically linked AMR surveillance data into a reproducible and generalizable analytic structure. Using data from two neonatal AMR surveillance systems—the ACORN2 project in the Philippines and the multicenter NeoSEAP study across South and Southeast Asia—we applied a directed acyclic graph (DAG)–informed approach to identify key variables along the causal pathway to resistant infection. Raw laboratory and clinical data with differing formats (long and wide) were transformed into a unified data architecture organized around four epidemiologic concepts: persons, episodes, specimens, and organisms. Reproducible R scripts were used to standardize data linkage, account for contaminants using a pathogenicity classification, and summarize antimicrobial use through WHO AWaRe categories.
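The persons, episodes, specimens, and organisms structure described above can be illustrated with a minimal sketch. This is not the authors' R code: it is written in Python for illustration, and every field name, the sample rows, the contaminant lookup, and the helper functions are hypothetical assumptions. The AWaRe mapping covers only a handful of antibiotics and is not the full WHO list.

```python
from collections import defaultdict

# Hypothetical long-format laboratory rows: one row per organism isolated.
LONG_ROWS = [
    {"person_id": "P1", "episode_id": "P1-E1", "specimen_id": "S1",
     "organism": "Klebsiella pneumoniae"},
    {"person_id": "P1", "episode_id": "P1-E1", "specimen_id": "S1",
     "organism": "Coagulase-negative staphylococci"},
    {"person_id": "P1", "episode_id": "P1-E2", "specimen_id": "S2",
     "organism": "Escherichia coli"},
    {"person_id": "P2", "episode_id": "P2-E1", "specimen_id": "S3",
     "organism": "Coagulase-negative staphylococci"},
]

# Illustrative pathogenicity lookup (an assumption, not the study's classification).
LIKELY_CONTAMINANTS = {"Coagulase-negative staphylococci"}

# Small illustrative subset of the WHO AWaRe classification.
AWARE = {
    "ampicillin": "Access",
    "gentamicin": "Access",
    "ceftriaxone": "Watch",
    "meropenem": "Watch",
    "colistin": "Reserve",
}


def build_hierarchy(rows):
    """Nest rows as person -> episode -> specimen -> list of organism records."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for row in rows:
        record = {
            "organism": row["organism"],
            # Flag, rather than drop, likely contaminants so they stay auditable.
            "contaminant": row["organism"] in LIKELY_CONTAMINANTS,
        }
        tree[row["person_id"]][row["episode_id"]][row["specimen_id"]].append(record)
    return tree


def summarize_aware(drugs):
    """Count antibiotics per AWaRe category; unknown names are 'Unclassified'."""
    counts = {"Access": 0, "Watch": 0, "Reserve": 0, "Unclassified": 0}
    for drug in drugs:
        counts[AWARE.get(drug.lower(), "Unclassified")] += 1
    return counts


tree = build_hierarchy(LONG_ROWS)
aware_counts = summarize_aware(["Ampicillin", "gentamicin", "Meropenem"])
```

Flagging contaminants instead of deleting them is one possible design choice: it keeps the raw isolate list intact while letting downstream analyses filter on the flag.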
Results:
The proposed workflow enabled consistent handling of multiple infection episodes, appropriate denominator selection, and systematic integration of clinical and microbiological information across heterogeneous data sources. Data monitoring reports identified missingness, data entry errors, and enrollment patterns, while analytic outputs supported epidemiologic analyses of pathogen distribution, antimicrobial use, and clinical outcomes. Applying the workflow across two structurally distinct datasets demonstrated its generalizability and scalability.
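Why denominator selection matters when a person can have multiple infection episodes can be shown with a small sketch. The records, field names, and functions below are hypothetical and written in Python for illustration; they are not drawn from the study data or the authors' scripts.

```python
# Hypothetical episode-level records; "pathogens" lists non-contaminant isolates.
EPISODES = [
    {"person_id": "P1", "episode_id": "P1-E1", "pathogens": ["Klebsiella pneumoniae"]},
    {"person_id": "P1", "episode_id": "P1-E2", "pathogens": ["Escherichia coli"]},
    {"person_id": "P2", "episode_id": "P2-E1", "pathogens": []},
    {"person_id": "P3", "episode_id": "P3-E1", "pathogens": []},
]


def denominators(episodes):
    """Return counts at the person level and at the episode level."""
    persons = {e["person_id"] for e in episodes}
    positive_episodes = [e for e in episodes if e["pathogens"]]
    positive_persons = {e["person_id"] for e in positive_episodes}
    return {
        "persons": len(persons),
        "episodes": len(episodes),
        "positive_persons": len(positive_persons),
        "positive_episodes": len(positive_episodes),
    }


def proportion(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


d = denominators(EPISODES)
# One of three persons had any culture-positive episode.
per_person = proportion(d["positive_persons"], d["persons"])
# Two of four episodes were culture-positive.
per_episode = proportion(d["positive_episodes"], d["episodes"])
```

In this toy data the person-level and episode-level proportions differ because one person contributes two positive episodes, so a report needs to state which unit its denominator counts.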
Conclusions:
This causal analytic workflow provides a practical framework for processing AMR surveillance data that enhances reproducibility, supports valid epidemiologic inference, and reduces analytic burden. By aligning data processing with causal structure and surveillance objectives, this approach facilitates higher-quality AMR reporting and can be adapted for diverse infectious disease surveillance settings.

