BACKGROUND
The rapid growth of social media as an information channel has made it possible for inaccurate or false vaccine information to spread quickly, creating obstacles to vaccine promotion.
OBJECTIVE
To develop and evaluate an intelligent automated protocol to identify and classify HPV vaccine misinformation on social media, using machine learning (ML)-based methods.
METHODS
Reddit posts (2007-2017, n=28,121) containing human papillomavirus (HPV) vaccine-related keywords were compiled. A random subset (n=2,200) was manually labeled for misinformation and served as the gold standard corpus for evaluation. Five ML-based algorithms designed to identify vaccine misinformation were evaluated for identification performance: support vector machines (SVM), logistic regression (LR), extremely randomized trees (ET), convolutional neural network (CNN), and recurrent neural network (RNN). Topic modeling was applied to identify the major categories associated with HPV vaccine misinformation.
RESULTS
The convolutional neural network model achieved the highest area under the curve (AUC), at 0.7943. Of the 28,121 Reddit posts, 7,207 (25.63%) were classified as vaccine misinformation, with discussions about general safety issues identified as the leading type of misinformed post (37%).
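As an illustration of the evaluation metric, the following minimal sketch shows one way an AUC could be computed from classifier scores and gold standard labels. It assumes the reported AUC is the area under the ROC curve and uses the rank-sum (Mann-Whitney U) identity. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """ROC AUC via the rank-sum (Mann-Whitney U) identity.

    labels: 0/1 gold standard labels (1 = misinformation); hypothetical input.
    scores: classifier scores, higher = more likely misinformation.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")

    # Sum the 1-based ranks of the positive items, averaging ranks over ties.
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(label for _, label in pairs[i:j])
        i = j

    # U counts (positive, negative) pairs ranked correctly; ties count as half.
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


# Toy data for illustration only (not from the study).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))  # prints 0.75
```

Under this reading, an AUC of 0.7943 would mean that a randomly chosen misinformation post receives a higher score than a randomly chosen non-misinformation post about 79% of the time.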
CONCLUSIONS
ML-based approaches are effective in identifying and classifying HPV vaccine misinformation on Reddit and may be generalizable to other social media platforms. ML-based methods may provide the capacity and utility to meet the challenge of intelligent automated monitoring and classification of public health misinformation in social media networks. The timely identification of vaccine misinformation online is a first step toward misinformation correction and vaccine promotion.