Big Data/Machine Learning/AI
Differential Diagnosis of Bacterial Meningitis Using Machine Learning Models
Audêncio Victor*, Diego Augusto Medeiros Santos, Pamella Cristina de Carvalho Lucas, Telma Regina Marques Pinto Carvalhanas
Introduction: Meningitis, an inflammatory condition affecting the membranes surrounding the brain and spinal cord, can be caused by various agents. Bacterial meningitis is particularly severe because of its high morbidity and mortality. This study explores the application of machine learning (ML) models to aid in the differential diagnosis of bacterial meningitis, using data from the outbreak database of SINAN (Brazil’s Notifiable Diseases Information System) for the State of São Paulo, Brazil.
Methods: Data were collected from the SINAN database, including demographic variables, clinical symptoms, and cerebrospinal fluid analyses. Five ML models (Random Forest, LightGBM, XGBoost, CatBoost, and AdaBoost) were applied to classify meningitis cases into bacterial, fungal, viral, and other types. Models were evaluated using the area under the receiver operating characteristic curve (AUC-ROC), accuracy, precision, recall, F1-score, and the Matthews correlation coefficient (MCC).
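The evaluation metrics named above can be illustrated with a minimal, dependency-free sketch. It is not the study's pipeline: the labels, scores, and the 0.5 decision threshold below are invented toy values (1 = bacterial, 0 = non-bacterial), and the function names are hypothetical.

```python
import math

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, specificity, F1 and MCC from 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # also called sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # MCC uses all four cells of the confusion matrix.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }

def auc_roc(y_true, scores):
    """AUC-ROC as the probability that a random positive case outscores
    a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data (assumption, not from SINAN): 1 = bacterial, 0 = non-bacterial.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = binary_metrics(y_true, y_pred)
auc = auc_roc(y_true, scores)
print(metrics, auc)
```

AUC-ROC is computed from the predicted scores and does not depend on the threshold, whereas the remaining metrics change with the threshold used to turn scores into class labels.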
Results: The CatBoost model performed best, achieving an AUC-ROC of 0.95 in binary classification (bacterial vs. non-bacterial) and 0.85 in multiclass classification (Neisseria meningitidis, Streptococcus pneumoniae, and Haemophilus influenzae). XGBoost and LightGBM also showed promising results, with AUC-ROC scores of 0.94 and 0.92, respectively, in binary classification. The CatBoost model further exhibited high sensitivity (1.00) and reasonable specificity, supporting its applicability to rapid and accurate meningitis diagnosis. SHAP (SHapley Additive exPlanations) analysis identified variables such as leukocyte count and presence of petechiae as influential predictors.
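To make the idea behind SHAP attributions concrete, the sketch below computes exact Shapley values by enumerating feature subsets for a tiny hand-written scoring function. Everything in it is an assumption for illustration: the scoring function, its coefficients, and the feature names (`high_leukocytes`, `petechiae`, `fever`) are hypothetical, and the study's models and the SHAP library rely on far more efficient tree-specific algorithms rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial

def toy_score(x):
    """Hypothetical risk score; NOT the study's model or coefficients."""
    return (0.1
            + 0.3 * x["high_leukocytes"]
            + 0.2 * x["petechiae"]
            + 0.2 * x["high_leukocytes"] * x["petechiae"]
            + 0.0 * x["fever"])

def shapley_values(predict, instance, baseline):
    """Exact Shapley values: each feature's average marginal contribution
    over all subsets of the other features. Features outside a subset
    are set to their baseline value."""
    names = list(instance)
    n = len(names)

    def value(subset):
        mixed = {k: (instance[k] if k in subset else baseline[k])
                 for k in names}
        return predict(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for subsets of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi

# Toy patient (all findings present) versus an all-absent baseline.
patient = {"high_leukocytes": 1, "petechiae": 1, "fever": 1}
baseline = {"high_leukocytes": 0, "petechiae": 0, "fever": 0}

phi = shapley_values(toy_score, patient, baseline)
print(phi)
```

The attributions sum to the difference between the patient's score and the baseline score, and the interaction term is split evenly between the two features involved. A feature the scoring function ignores (here `fever`) receives an attribution of zero.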
Conclusion: ML algorithms, particularly CatBoost, XGBoost, and LightGBM, proved highly effective for the differential diagnosis of meningitis, providing a valuable tool for the rapid identification of meningitis types and bacterial serogroups. These techniques can be integrated into public health protocols to enhance outbreak responses and optimize patient treatment.
Keywords: Meningitis, Machine Learning, Differential Diagnosis, SINAN (Notifiable Diseases Information System), Surveillance, Epidemiology.