1364 Predicting obesity and smoking using medication data: a machine-learning approach
Abstract

Background
Administrative health datasets are widely used in public health research but often lack information about common confounders. We aimed to develop and validate machine learning (ML)-based models that use medication data from Australia's Pharmaceutical Benefits Scheme (PBS) database to predict obesity and smoking.

Methods
We used data from the D-Health Trial (N = 18,000) and the QSkin Study (N = 43,794). Smoking history, height, and weight were self-reported at study entry. Linkage to the PBS dataset captured 5 years of medication data after cohort entry. We used age, sex, and medication use, classified using Anatomical Therapeutic Chemical (ATC) classification codes, as potential predictors of smoking and obesity. We trained gradient-boosted machine learning models using data for the first 80% of participants enrolled and validated them using the remaining 20%. We assessed model performance overall and by sex and age, and compared models generated using 3 and 5 years of PBS data.

Results
In the validation dataset, using 3 years of PBS data, the area under the receiver operating characteristic curve (AUC) was 0.70 (95% confidence interval (CI) 0.68–0.71) for predicting obesity and 0.71 (95% CI 0.70–0.72) for predicting smoking. Models performed better in women than in men. Using 5 years of PBS data yielded only marginal improvement.

Conclusions
Medication data, in combination with age and sex, can be used to predict obesity and smoking. These models may be of value to researchers using data collected for administrative purposes.
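
The workflow described in the Methods (a gradient-boosted model trained on the first 80% of participants enrolled, validated on the remaining 20%, and summarised by AUC with a 95% CI) might be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation. The abstract does not name a software library or say how the confidence intervals were computed, so the use of scikit-learn's `GradientBoostingClassifier` and a percentile bootstrap are assumptions. The data are synthetic, and the helper names `make_synthetic_cohort`, `enrolment_order_split`, and `bootstrap_auc_ci` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score


def make_synthetic_cohort(n=2000, n_atc=20, seed=0):
    """Build a toy cohort (NOT real PBS data): age, sex, and binary ATC-class flags.

    Rows are assumed to be ordered by enrolment date.
    """
    rng = np.random.default_rng(seed)
    age = rng.uniform(40, 85, n)
    sex = rng.integers(0, 2, n)
    atc = rng.integers(0, 2, (n, n_atc))  # 1 = any dispensing in that ATC class
    # Hypothetical outcome model so that the features carry some signal.
    logit = -2.0 + 1.2 * atc[:, 0] + 0.8 * atc[:, 1] - 0.02 * (age - 60) + 0.3 * sex
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)
    X = np.column_stack([age, sex, atc])
    return X, y


def enrolment_order_split(X, y, train_fraction=0.8):
    """Train on the first 80% of rows (earliest enrolled), validate on the rest."""
    cut = int(len(y) * train_fraction)
    return X[:cut], X[cut:], y[:cut], y[cut:]


def bootstrap_auc_ci(y_true, scores, n_boot=200, seed=0):
    """Percentile bootstrap 95% CI for the AUC (an assumed method)."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        if y_true[idx].min() == y_true[idx].max():
            continue  # skip resamples that contain only one class
        aucs.append(roc_auc_score(y_true[idx], scores[idx]))
    low, high = np.percentile(aucs, [2.5, 97.5])
    return low, high


X, y = make_synthetic_cohort()
X_train, X_valid, y_train, y_valid = enrolment_order_split(X, y)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
scores = model.predict_proba(X_valid)[:, 1]  # predicted probability of the outcome

auc = roc_auc_score(y_valid, scores)
ci_low, ci_high = bootstrap_auc_ci(y_valid, scores)
print(f"Validation AUC {auc:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

Splitting by enrolment order rather than at random keeps the validation set separated in time from the training set, which matches the design stated in the Methods. Subgroup performance by sex or age could be obtained by applying `roc_auc_score` to the corresponding subset of validation rows.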