Abstract Search – Society for Epidemiologic Research

Methods/Statistics

Transfer Learning for Improving Local Predictive Performance in Small Datasets

Carine Savalli*, André Henrique Alves Carneiro, Fabiano Barcellos Filho, Murilo Afonso Robiati Bigoto, Roberta Moreira Wichmann, Alexandre Dias Porto Chiavegatto Filho

Transfer learning is a machine learning technique that incorporates knowledge from a pre-trained model to improve the training and performance of a new model in a distinct but related context. This study focused on predicting intensive care unit (ICU) admission for COVID-19 patients in a hospital with a small sample size (H1, n=72). We employed a transfer learning approach and compared two competing source models trained in two other hospitals in Brazil (H2: n=1,330; H3: n=148). All hospitals collected the same demographic and laboratory variables. Each dataset was divided into training and testing subsets (70%/30%) using a hold-out approach stratified by the outcome variable. Pre-processing steps were applied to prepare the data for modeling, followed by training with the XGBoost algorithm. Hyperparameters were tuned through a randomized search combined with 3-fold cross-validation on the training set. The performance metric was the area under the receiver operating characteristic (ROC) curve (AUC). Both source hospitals showed excellent predictive performance for the task (H2: AUC=0.9410; H3: AUC=0.9316), whereas the model trained on H1 alone performed unsatisfactorily (AUC=0.6239). In the transfer learning approach, each source model was updated with 20 additional trees trained on data from H1 to capture local patterns. Using H2 as the source model resulted in an AUC of 0.7863 in H1, a relative increase of 26.0% over the local baseline, and using H3 as the source model yielded an AUC of 0.7094, a relative increase of 13.7%. These results suggest that transfer learning can improve predictive performance in resource-limited settings with small datasets. Leveraging knowledge from source hospitals with high-performing models yielded substantial AUC gains, highlighting its potential to support decision-making in healthcare.
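The reported percentage gains are consistent with a relative change against the H1 baseline AUC, which can be checked with a few lines of arithmetic. The sketch below is only an illustration of that calculation: the function name `relative_gain` and the dictionary layout are assumptions made here, not part of the study's code, and the AUC values are those quoted in the abstract.

```python
def relative_gain(baseline_auc: float, updated_auc: float) -> float:
    """Relative change in AUC, expressed as a percentage of the baseline."""
    return (updated_auc - baseline_auc) / baseline_auc * 100.0


# AUC values quoted in the abstract.
baseline_h1 = 0.6239                       # H1 model trained on local data only
updated_auc = {"H2": 0.7863, "H3": 0.7094}  # H1 test AUC after updating each source model

# Relative gain (percent of baseline) and absolute gain (AUC points) per source model.
relative = {src: round(relative_gain(baseline_h1, auc), 1) for src, auc in updated_auc.items()}
absolute = {src: round(auc - baseline_h1, 4) for src, auc in updated_auc.items()}

for src in updated_auc:
    print(f"{src}: +{absolute[src]:.4f} AUC, {relative[src]:.1f}% relative to H1 baseline")
```

Read this way, the 26.0% and 13.7% figures correspond to absolute AUC gains of 0.1624 and 0.0855 respectively, which is worth keeping in mind when comparing them with results reported in AUC points.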