Using Machine Learning–Based Approaches for the Detection and Classification of Human Papillomavirus Vaccine Misinformation: Infodemiology Study of Reddit Discussions

Background The rapid growth of social media as an information channel has made it possible to quickly spread inaccurate or false vaccine information, thus creating obstacles for vaccine promotion. Objective The aim of this study is to develop and evaluate an intelligent automated protocol for identifying and classifying human papillomavirus (HPV) vaccine misinformation on social media using machine learning (ML)–based methods. Methods Reddit posts (from 2007 to 2017, N=28,121) that contained keywords related to HPV vaccination were compiled. A random subset (2200/28,121, 7.82%) was manually labeled for misinformation and served as the gold standard corpus for evaluation. A total of 5 ML-based algorithms, including a support vector machine, logistic regression, extremely randomized trees, a convolutional neural network, and a recurrent neural network designed to identify vaccine misinformation, were evaluated for identification performance. Topic modeling was applied to identify the major categories associated with HPV vaccine misinformation. Results A convolutional neural network model achieved the highest area under the receiver operating characteristic curve of 0.7943. Of the 28,121 Reddit posts, 7207 (25.63%) were classified as vaccine misinformation, with discussions about general safety issues identified as the leading type of misinformed posts (2666/7207, 36.99%). Conclusions ML-based approaches are effective in the identification and classification of HPV vaccine misinformation on Reddit and may be generalizable to other social media platforms. ML-based methods may provide the capacity and utility to meet the challenge involved in intelligent automated monitoring and classification of public health misinformation on social media platforms. The timely identification of vaccine misinformation on the internet is the first step in misinformation correction and vaccine promotion.

Download Full-text

Utilizing machine learning-based approaches for the detection and classification of human papillomavirus (HPV) vaccine misinformation: Infodemiology Study of Reddit Discussions (Preprint)

10.2196/preprints.26478 ◽

2020 ◽

Author(s):

Jingcheng Du ◽

Sharice Preston ◽

Hanxiao Sun ◽

Ross Shegog ◽

Rachel Cunningham ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Social Media ◽

Human Papillomavirus ◽

Convolutional Neural Network ◽

Hpv Vaccine ◽

Support Vector ◽

Safety Issues ◽

Vaccine Promotion

BACKGROUND The rapid growth of social media as an information channel has made it possible to quickly spread inaccurate or false vaccine information and thus create obstacles for vaccine promotion. OBJECTIVE To develop and evaluate an intelligent automated protocol to identify and classify HPV vaccine misinformation on social media, using machine learning (ML)-based methods. METHODS Reddit posts (2007-2017, n=28,121) were compiled that contained human papillomavirus (HPV) vaccine related keywords. A random subset (n=2200) was manually labeled for misinformation, serving as a gold standard corpus for evaluation. Five ML-based algorithms, including support vector machines (SVM), logistics regression (LR), extremely randomized trees (ET), convolutional neural network (CNN) and recurrent neural network (RNN), designed to identify vaccine misinformation, were evaluated for identification performance. Topic modeling was applied to identify the major categories associated with HPV vaccine misinformation. RESULTS A convolutional neural network model achieved the highest AUC at 0.7943. Of 28,121 Reddit posts, 7,207 (25.63%) were classified as vaccine misinformation with discussions about general safety issues identified as the leading type misinformed posts (37%). CONCLUSIONS ML-based approaches are effective in the identification and classification of HPV vaccine misinformation from Reddit and may be generalizable to other social media platforms. ML -based methods may provide the capacity and utility to meet the challenge for intelligent automated monitoring and classification of public health misinformation in social media networks. The timely identification of vaccine misinformation online is a first step for misinformation correction and vaccine promotion. CLINICALTRIAL

Download Full-text

Machine Learning Classification of Inflammatory Bowel Disease in Children Based on a Large Real-World Pediatric Cohort CEDATA-GPGE® Registry

Frontiers in Medicine ◽

10.3389/fmed.2021.666190 ◽

2021 ◽

Vol 8 ◽

Author(s):

Nicolas Schneider ◽

Keywan Sohrabi ◽

Henning Schneider ◽

Klaus-Peter Zimmer ◽

Patrick Fischer ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Support Vector Machine ◽

Convolutional Neural Network ◽

Support Vector ◽

Machine Learning Classification ◽

Extreme Gradient Boosting ◽

Inflammatory Bowel

Introduction: The rising incidence of pediatric inflammatory bowel diseases (PIBD) facilitates the need for new methods of improving diagnosis latency, quality of care and documentation. Machine learning models have shown to be applicable to classifying PIBD when using histological data or extensive serology. This study aims to evaluate the performance of algorithms based on promptly available data more suited to clinical applications.Methods: Data of inflammatory locations of the bowels from initial and follow-up visitations is extracted from the CEDATA-GPGE registry and two follow-up sets are split off containing only input from 2017 and 2018. Pre-processing excludes patients in remission and encodes the categorical data numerically. For classification of PIBD diagnosis, a support vector machine (SVM), a random forest algorithm (RF), extreme gradient boosting (XGBoost), a dense neural network (DNN) and a convolutional neural network (CNN) are employed. As best performer, a convolutional neural network is further improved using grid optimization.Results: The achieved accuracy of the optimized neural network reaches up to 90.57% on data inserted into the registry in 2018. Less performant methods reach 88.78% for the DNN down to 83.94% for the XGBoost. The accuracy of prediction for the 2018 follow-up dataset is higher than those for older datasets. Neural networks yield a higher standard deviation with 3.45 for the CNN compared to 0.83–0.86 of the support vector machine and ensemble methods.Discussion: The displayed accuracy of the convolutional neural network proofs the viability of machine learning classification in PIBD diagnostics using only timely available data.

Download Full-text

Convolutional neural network - Support vector machine based approach for classification of cyanobacteria and chlorophyta microalgae groups

Algal Research ◽

10.1016/j.algal.2021.102568 ◽

2021 ◽

pp. 102568

Author(s):

Mesut Ersin Sonmez ◽

Numan Eczacıoglu ◽

Numan Emre Gumuş ◽

Muhammet Fatih Aslan ◽

Kadir Sabanci ◽

...

Keyword(s):

Neural Network ◽

Support Vector Machine ◽

Convolutional Neural Network ◽

Support Vector ◽

Network Support

Download Full-text

Landslide susceptibility mapping based on convolutional neural network and conventional machine learning methods

10.21203/rs.3.rs-190195/v1 ◽

2021 ◽

Author(s):

Rui Liu ◽

Xin Yang ◽

Chong Xu ◽

Luyao Li ◽

Xiangqiang Zeng

Keyword(s):

Neural Network ◽

Machine Learning ◽

Convolutional Neural Network ◽

Landslide Susceptibility ◽

Susceptibility Mapping ◽

Landslide Susceptibility Mapping ◽

Support Vector ◽

Learning Methods ◽

Machine Learning Methods ◽

Conventional Machine

Abstract Landslide susceptibility mapping (LSM) is a useful tool to estimate the probability of landslide occurrence, providing a scientific basis for natural hazards prevention, land use planning, and economic development in landslide-prone areas. To date, a large number of machine learning methods have been applied to LSM, and recently the advanced Convolutional Neural Network (CNN) has been gradually adopted to enhance the prediction accuracy of LSM. The objective of this study is to introduce a CNN based model in LSM and systematically compare its overall performance with the conventional machine learning models of random forest, logistic regression, and support vector machine. Herein, we selected the Jiuzhaigou region in Sichuan Province, China as the study area. A total number of 710 landslides and 12 predisposing factors were stacked to form spatial datasets for LSM. The ROC analysis and several statistical metrics, such as accuracy, root mean square error (RMSE), Kappa coefficient, sensitivity, and specificity were used to evaluate the performance of the models in the training and validation datasets. Finally, the trained models were calculated and the landslide susceptibility zones were mapped. Results suggest that both CNN and conventional machine-learning based models have a satisfactory performance (AUC: 85.72% − 90.17%). The CNN based model exhibits excellent good-of-fit and prediction capability, and achieves the highest performance (AUC: 90.17%) but also significantly reduces the salt-of-pepper effect, which indicates its great potential of application to LSM.

Download Full-text

HARTH: A Human Activity Recognition Dataset for Machine Learning

Sensors ◽

10.3390/s21237853 ◽

2021 ◽

Vol 21 (23) ◽

pp. 7853

Author(s):

Aleksej Logacjov ◽

Kerstin Bach ◽

Atle Kongsvold ◽

Hilde Bremseth Bårdstu ◽

Paul Jarle Mork

Keyword(s):

Neural Network ◽

Machine Learning ◽

Support Vector Machine ◽

Convolutional Neural Network ◽

Activity Recognition ◽

Human Activity ◽

Short Term Memory ◽

Human Activity Recognition ◽

Support Vector ◽

Free Living

Existing accelerometer-based human activity recognition (HAR) benchmark datasets that were recorded during free living suffer from non-fixed sensor placement, the usage of only one sensor, and unreliable annotations. We make two contributions in this work. First, we present the publicly available Human Activity Recognition Trondheim dataset (HARTH). Twenty-two participants were recorded for 90 to 120 min during their regular working hours using two three-axial accelerometers, attached to the thigh and lower back, and a chest-mounted camera. Experts annotated the data independently using the camera’s video signal and achieved high inter-rater agreement (Fleiss’ Kappa =0.96). They labeled twelve activities. The second contribution of this paper is the training of seven different baseline machine learning models for HAR on our dataset. We used a support vector machine, k-nearest neighbor, random forest, extreme gradient boost, convolutional neural network, bidirectional long short-term memory, and convolutional neural network with multi-resolution blocks. The support vector machine achieved the best results with an F1-score of 0.81 (standard deviation: ±0.18), recall of 0.85±0.13, and precision of 0.79±0.22 in a leave-one-subject-out cross-validation. Our highly professional recordings and annotations provide a promising benchmark dataset for researchers to develop innovative machine learning approaches for precise HAR in free living.

Download Full-text

Malware Classification with Improved Convolutional Neural Network Model

International Journal of Computer Network and Information Security ◽

10.5815/ijcnis.2020.06.03 ◽

2020 ◽

Vol 12 (6) ◽

pp. 30-43

Author(s):

Sumit S. Lad ◽

◽

Amol C. Adamuthe

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Network Model ◽

Neural Network Model ◽

Personal Information ◽

Support Vector ◽

Svm Classifier ◽

Gray Scale ◽

Malware Classification

Malware is a threat to people in the cyber world. It steals personal information and harms computer systems. Various developers and information security specialists around the globe continuously work on strategies for detecting malware. From the last few years, machine learning has been investigated by many researchers for malware classification. The existing solutions require more computing resources and are not efficient for datasets with large numbers of samples. Using existing feature extractors for extracting features of images consumes more resources. This paper presents a Convolutional Neural Network model with pre-processing and augmentation techniques for the classification of malware gray-scale images. An investigation is conducted on the Malimg dataset, which contains 9339 gray-scale images. The dataset created from binaries of malware belongs to 25 different families. To create a precise approach and considering the success of deep learning techniques for the classification of raising the volume of newly created malware, we proposed CNN and Hybrid CNN+SVM model. The CNN is used as an automatic feature extractor that uses less resource and time as compared to the existing methods. Proposed CNN model shows (98.03%) accuracy which is better than other existing CNN models namely VGG16 (96.96%), ResNet50 (97.11%) InceptionV3 (97.22%), Xception (97.56%). The execution time of the proposed CNN model is significantly reduced than other existing CNN models. The proposed CNN model is hybridized with a support vector machine. Instead of using Softmax as activation function, SVM performs the task of classifying the malware based on features extracted by the CNN model. The proposed fine-tuned model of CNN produces a well-selected features vector of 256 Neurons with the FC layer, which is input to SVM. Linear SVC kernel transforms the binary SVM classifier into multi-class SVM, which classifies the malware samples using the one-against-one method and delivers the accuracy of 99.59%.

Download Full-text

Deep Learning Assisted Neonatal Cry Classification via Support Vector Machine Models

Frontiers in Public Health ◽

10.3389/fpubh.2021.670352 ◽

2021 ◽

Vol 9 ◽

Author(s):

Ashwini K ◽

P. M. Durai Raj Vincent ◽

Kathiravan Srinivasan ◽

Chuan-Yu Chang

Keyword(s):

Neural Network ◽

Machine Learning ◽

Support Vector Machine ◽

Feature Extraction ◽

Deep Learning ◽

Convolutional Neural Network ◽

Support Vector ◽

Svm Classifier ◽

Infant Cry ◽

Learning Techniques

Neonatal infants communicate with us through cries. The infant cry signals have distinct patterns depending on the purpose of the cries. Preprocessing, feature extraction, and feature selection need expert attention and take much effort in audio signals in recent days. In deep learning techniques, it automatically extracts and selects the most important features. For this, it requires an enormous amount of data for effective classification. This work mainly discriminates the neonatal cries into pain, hunger, and sleepiness. The neonatal cry auditory signals are transformed into a spectrogram image by utilizing the short-time Fourier transform (STFT) technique. The deep convolutional neural network (DCNN) technique takes the spectrogram images for input. The features are obtained from the convolutional neural network and are passed to the support vector machine (SVM) classifier. Machine learning technique classifies neonatal cries. This work combines the advantages of machine learning and deep learning techniques to get the best results even with a moderate number of data samples. The experimental result shows that CNN-based feature extraction and SVM classifier provides promising results. While comparing the SVM-based kernel techniques, namely radial basis function (RBF), linear and polynomial, it is found that SVM-RBF provides the highest accuracy of kernel-based infant cry classification system provides 88.89% accuracy.

Download Full-text

A Novel Method for Sea-Land Clutter Separation Using Regularized Randomized and Kernel Ridge Neural Networks

Sensors ◽

10.3390/s20226491 ◽

2020 ◽

Vol 20 (22) ◽

pp. 6491

Author(s):

Le Zhang ◽

Jeyan Thiyagalingam ◽

Anke Xue ◽

Shuwen Xu

Keyword(s):

Neural Network ◽

Machine Learning ◽

Neural Networks ◽

Classification Accuracy ◽

Signal Amplitude ◽

Statistical Characteristics ◽

Support Vector ◽

Amplitude Change ◽

Novel Method

Classification of clutter, especially in the context of shore based radars, plays a crucial role in several applications. However, the task of distinguishing and classifying the sea clutter from land clutter has been historically performed using clutter models and/or coastal maps. In this paper, we propose two machine learning, particularly neural network, based approaches for sea-land clutter separation, namely the regularized randomized neural network (RRNN) and the kernel ridge regression neural network (KRR). We use a number of features, such as energy variation, discrete signal amplitude change frequency, autocorrelation performance, and other statistical characteristics of the respective clutter distributions, to improve the performance of the classification. Our evaluation based on a unique mixed dataset, which is comprised of partially synthetic clutter data for land and real clutter data from sea, offers improved classification accuracy. More specifically, the RRNN and KRR methods offer 98.50% and 98.75% accuracy, outperforming the conventional support vector machine and extreme learning based solutions.

Download Full-text

Identification and Classification of Maize Drought Stress Using Deep Convolutional Neural Network

Symmetry ◽

10.3390/sym11020256 ◽

2019 ◽

Vol 11 (2) ◽

pp. 256 ◽

Cited By ~ 9

Author(s):

Jiangyong An ◽

Wanyi Li ◽

Maosong Li ◽

Sanrong Cui ◽

Huanran Yue

Keyword(s):

Neural Network ◽

Machine Learning ◽

Drought Stress ◽

Convolutional Neural Network ◽

Extraction Process ◽

Deep Convolutional Neural Network ◽

Gradient Boosting ◽

Moderate Drought ◽

Detection And Diagnosis

Drought stress seriously affects crop growth, development, and grain production. Existing machine learning methods have achieved great progress in drought stress detection and diagnosis. However, such methods are based on a hand-crafted feature extraction process, and the accuracy has much room to improve. In this paper, we propose the use of a deep convolutional neural network (DCNN) to identify and classify maize drought stress. Field drought stress experiments were conducted in 2014. The experiment was divided into three treatments: optimum moisture, light drought, and moderate drought stress. Maize images were obtained every two hours throughout the whole day by digital cameras. In order to compare the accuracy of DCNN, a comparative experiment was conducted using traditional machine learning on the same dataset. The experimental results demonstrated an impressive performance of the proposed method. For the total dataset, the accuracy of the identification and classification of drought stress was 98.14% and 95.95%, respectively. High accuracy was also achieved on the sub-datasets of the seedling and jointing stages. The identification and classification accuracy levels of the color images were higher than those of the gray images. Furthermore, the comparison experiments on the same dataset demonstrated that DCNN achieved a better performance than the traditional machine learning method (Gradient Boosting Decision Tree GBDT). Overall, our proposed deep learning-based approach is a very promising method for field maize drought identification and classification based on digital images.

Download Full-text

A Spectral Feature Based Convolutional Neural Network for Classification of Sea Surface Oil Spill

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi8040160 ◽

2019 ◽

Vol 8 (4) ◽

pp. 160 ◽

Cited By ~ 11

Author(s):

Bingxin Liu ◽

Ying Li ◽

Guannan Li ◽

Anling Liu

Keyword(s):

Neural Network ◽

Neural Networks ◽

Convolutional Neural Network ◽

Convolutional Neural Networks ◽

Machine Learning Algorithms ◽

Classification Model ◽

Support Vector ◽

Oil Film ◽

One Dimensional

Spectral characteristics play an important role in the classification of oil film, but the presence of too many bands can lead to information redundancy and reduced classification accuracy. In this study, a classification model that combines spectral indices-based band selection (SIs) and one-dimensional convolutional neural networks was proposed to realize automatic oil films classification using hyperspectral remote sensing images. Additionally, for comparison, the minimum Redundancy Maximum Relevance (mRMR) was tested for reducing the number of bands. The support vector machine (SVM), random forest (RF), and Hu’s convolutional neural networks (CNN) were trained and tested. The results show that the accuracy of classifications through the one dimensional convolutional neural network (1D CNN) models surpassed the accuracy of other machine learning algorithms such as SVM and RF. The model of SIs+1D CNN could produce a relatively higher accuracy oil film distribution map within less time than other models.

Download Full-text