Big Data/Machine Learning/AI

Machine Learning Models for Predicting Prescription Opioid Misuse Among U.S. High School Students: Links with Selected Risky Health Behaviors

Asef Raiyan Hoque*, Liling Li

Introduction:

Prescription opioid misuse among adolescents in the U.S. is an emerging public health concern. The objective of this study was to develop machine learning models to predict prescription opioid misuse among U.S. high school students using socio-demographic and selected behavioral risk factors.

Methods:

This study analyzed cross-sectional data from the Youth Risk Behavior Surveillance System (YRBSS), a nationally representative dataset of U.S. high school students in grades 9-12. Machine learning models were developed using complete case analysis, with a total of n = 11,492 students meeting the inclusion criteria. The outcome variable, prescription opioid misuse, was defined as reported lifetime misuse. Selected risky health behaviors identified in previous studies were included as predictors. Nine machine learning models were tested: neural network, logistic regression, linear support vector machine (SVM), radial basis function SVM, polynomial SVM, random forest, naive Bayes, decision tree, and extreme gradient boosting (XGBoost).
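Complete case analysis, as named above, keeps only respondents with no missing values on the outcome or any predictor. The following is a minimal sketch of that filtering step, not the authors' code: the records, the field names (`age`, `grade`, `ecstasy_ever`, `opioid_misuse_ever`), and the helper `complete_cases` are all hypothetical and chosen only for illustration.

```python
# Hypothetical survey records; None marks a missing response.
# Field names are illustrative assumptions, not actual YRBSS variable names.
records = [
    {"age": 16, "grade": 10, "ecstasy_ever": 0, "opioid_misuse_ever": 0},
    {"age": 17, "grade": None, "ecstasy_ever": 1, "opioid_misuse_ever": 1},
    {"age": 15, "grade": 9, "ecstasy_ever": 0, "opioid_misuse_ever": None},
    {"age": 18, "grade": 12, "ecstasy_ever": 1, "opioid_misuse_ever": 1},
]

# Outcome plus every predictor must be observed for a record to be kept.
REQUIRED = ("age", "grade", "ecstasy_ever", "opioid_misuse_ever")


def complete_cases(rows, required):
    """Return only the rows with a non-missing value for every required field."""
    return [
        row for row in rows
        if all(row.get(field) is not None for field in required)
    ]


analytic_sample = complete_cases(records, REQUIRED)
n_dropped = len(records) - len(analytic_sample)
```

In this toy data, the two records with a missing grade or a missing outcome are dropped and the other two are retained. The same rule applied to the full survey would yield the analytic sample size reported above.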

Results:

All predictive models achieved high test accuracy (85.8%–94.3%). Five of the nine classifiers had high test AUC, ranging from 79.1% to 89.6%. Random forest was the best predictive model, with both the highest accuracy (94.3%) and the highest AUC (89.6%). In the random forest model, the three most important socio-demographic predictors were age, race, and grade. The most important risky health behaviors by variable importance were, in order, lifetime ecstasy use, lifetime methamphetamine use, and current marijuana use.
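Accuracy and AUC measure different things, which is one reason both are worth reporting. The sketch below computes each from scratch on made-up labels and scores; the data and the 0.5 decision threshold are assumptions for illustration, and the abstract does not state the outcome prevalence or how predictions were thresholded. AUC is computed here through its pairwise definition: the share of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Equals the probability that a randomly chosen positive case is scored
    higher than a randomly chosen negative case (ties count as 0.5).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up test set: 8 negatives followed by 2 positives.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.10, 0.20, 0.15, 0.30, 0.05, 0.25, 0.35, 0.40, 0.45, 0.22]

# Assumed threshold of 0.5: every score falls below it, so all predictions are 0.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(y_true, y_pred)   # 0.8, although no positive case is detected
auc = roc_auc(y_true, scores)    # 0.75, reflecting imperfect ranking
```

In this toy example the classifier reaches 80% accuracy without identifying a single positive case, while the AUC of 0.75 shows how well the scores rank positives above negatives regardless of the threshold.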

Discussion:

The findings highlight the importance of socio-demographic factors and risky health behaviors in predicting prescription opioid misuse. Machine learning techniques are a promising tool for identifying at-risk individuals, enabling more targeted prevention efforts. Future research should explore additional risky health behaviors and use longitudinal datasets to improve model performance.