Behavior
Predicting COVID Vaccine Uptake Using Machine Learning
Zichao Li*
Background
Vaccination has long been a controversial and divisive topic. The anti-vaccine movement resurged in the UK in the 1970s and has grown rapidly over the past 20 years, with a focus on childhood vaccines (Dubé et al., 2015). The COVID-19 pandemic has intensified this debate. According to a national survey fielded in October 2021, only 38% of respondents were vaccinated or very likely to get vaccinated, indicating that a majority of individuals in the U.S. are, to some degree, hesitant about getting vaccinated (OSF Preprints | The COVID States Project #35, n.d.).
Scholars further suggest that there is a discrepancy between attitudes toward vaccination and actual vaccine uptake behavior (Dubé et al., 2013), which makes it harder for policymakers and healthcare professionals to intervene on vaccine uptake effectively and accurately. Although machine learning has been used to investigate attitudes and beliefs about COVID vaccines (Lincoln et al., 2022), there has been little academic effort to predict COVID vaccination behavior itself. Previous studies have applied machine learning to other COVID-related behavioral outcomes, for instance, predicting the likelihood of following CDC health guidelines (Hajdu et al., 2022; Van Lissa et al., 2022a). This study uses supervised machine learning methods to predict individual COVID vaccine uptake, which could inform future vaccine promotion campaigns and interventions and allow resources to be allocated more efficiently.
Data
For this paper, I used the COVID-19 Behavior Determinants Database, which contains data from a three-wave survey fielded in the U.S. and Canada (Song et al., 2022). This web-based survey was administered to 8,070 English-speaking respondents above the age of 18. Of these, 5,326 reside in four U.S. states (New York, California, Florida, and Texas); the Canadian subsample includes all provinces except Quebec (Song et al., 2022). The first two waves were conducted in 2020, before the FDA approved a COVID-19 vaccine, so I used only the third wave (n = 3,024), fielded in March 2021. The outcome variable has no missing values in the third wave, so data from all 3,024 respondents were used in the analysis.
Method
During data preprocessing, I conducted principal component analysis (PCA) on several multi-item scales, including the Brief Locus of Control Scale, the General Trust Scale (GTSQ), and the Multidimensional Iowa Suggestibility Scale (SSSQ), to summarize each scale with a single score. After preprocessing, 98 predictors were used for model training and selection. I then applied supervised classification methods to predict individual COVID vaccine uptake, including logistic regression, k-nearest neighbors (KNN), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and tree-based methods. To manage the bias-variance tradeoff, I used 10-fold cross-validation, with 80% of the data serving as the training set.
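The paragraph above reduces each multi-item scale to a single score with PCA. The sketch below shows one common way to do this, assuming the composite is each respondent's projection onto the first principal component of the z-scored items (i.e., PCA on the item correlation matrix); the source does not state which scaling or component was used. It relies only on the Python standard library, finds the leading eigenvector by power iteration, and assumes any reverse-keyed items were recoded beforehand. All function names are hypothetical.

```python
import math
import statistics


def standardize(columns):
    """Z-score each item (column) so items on different ranges weigh equally."""
    out = []
    for col in columns:
        mu = statistics.fmean(col)
        sd = statistics.pstdev(col)
        out.append([(x - mu) / sd if sd > 0 else 0.0 for x in col])
    return out


def covariance(columns):
    """Covariance matrix of already-centered columns (population denominator).
    For z-scored columns this is the item correlation matrix."""
    n = len(columns[0])
    k = len(columns)
    return [[sum(columns[i][r] * columns[j][r] for r in range(n)) / n
             for j in range(k)] for i in range(k)]


def first_component(cov, iters=500, tol=1e-12):
    """Leading eigenvector (unit length) of a symmetric matrix via power iteration."""
    k = len(cov)
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    # The sign of a principal component is arbitrary; orient it so the
    # loadings sum to a positive value.
    if sum(v) < 0:
        v = [-x for x in v]
    return v


def scale_score(items):
    """items: list of item columns, each holding one response per respondent.
    Returns one composite score per respondent (projection onto the 1st PC)."""
    z = standardize(items)
    loadings = first_component(covariance(z))
    n = len(z[0])
    return [sum(loadings[i] * z[i][r] for i in range(len(z))) for r in range(n)]
```

For example, `scale_score([[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [1, 3, 2, 5, 4]])` returns five scores centered on zero. In practice, scikit-learn's `StandardScaler` followed by `PCA(n_components=1).fit_transform` serves the same purpose, up to the arbitrary sign of the component.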
Results
Logistic regression performed best among all models, combining high accuracy with a relatively low false negative rate; it achieved an accuracy of 79.01%. I also tried raising the logistic regression threshold for predicting a positive case from the default of 0.5 to 0.8 and 0.9, but model performance did not improve.
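The threshold experiment described above can be illustrated on a small example. The sketch below takes predicted probabilities from any classifier, converts them to labels at several thresholds, and reports accuracy and the false negative rate, FN / (FN + TP). The labels and probabilities are hypothetical toy values, not the study's data, and it assumes the positive class is coded as 1; the function names are also hypothetical.

```python
def confusion_counts(y_true, probs, threshold):
    """Return (tp, fp, tn, fn), predicting the positive class (1) when prob >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, probs):
        predicted_positive = p >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Share of all cases classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def false_negative_rate(tp, fp, tn, fn):
    """Share of actual positives the model misses: FN / (FN + TP)."""
    return fn / (fn + tp) if (fn + tp) else 0.0


def sweep(y_true, probs, thresholds=(0.5, 0.8, 0.9)):
    """Map each threshold to (accuracy, false negative rate)."""
    results = {}
    for t in thresholds:
        counts = confusion_counts(y_true, probs, t)
        results[t] = (accuracy(*counts), false_negative_rate(*counts))
    return results


if __name__ == "__main__":
    # Hypothetical toy labels and predicted probabilities.
    y = [1, 1, 1, 0, 0, 1, 0, 1]
    p = [0.95, 0.85, 0.6, 0.4, 0.7, 0.3, 0.1, 0.92]
    for t, (acc, fnr) in sweep(y, p).items():
        print(f"threshold={t}: accuracy={acc:.3f}, FNR={fnr:.3f}")
```

On this toy data, moving the threshold from 0.5 to 0.9 removes the single false positive but misses more actual positives, so accuracy falls from 0.75 to 0.625 and the false negative rate rises from 0.2 to 0.6.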
Overall, logistic regression yielded the strongest model, indicating that machine learning could be used to predict vaccination behavior at the individual level. This study offers practical implications for the design of vaccine promotion campaigns. In future public health crises, such a tool could help identify the vaccine-hesitant individuals that vaccine promotion programs intend to reach. This matters especially because the COVID-19 vaccine was novel, so public opinion about it could be quite volatile and risk-sensitive. In preparing for such challenges, this machine learning approach could deepen our understanding of vaccine-hesitant groups and help develop persuasive campaign messages for them.