Adapting and Extending a Typology to Identify Vaccine Misinformation on Twitter

2020 ◽  
Vol 110 (S3) ◽  
pp. S331-S339
Author(s):  
Amelia Jamison ◽  
David A. Broniatowski ◽  
Michael C. Smith ◽  
Kajal S. Parikh ◽  
Adeena Malik ◽  
...  

Objectives. To adapt and extend an existing typology of vaccine misinformation to classify the major topics of discussion across the total vaccine discourse on Twitter. Methods. Using 1.8 million vaccine-relevant tweets compiled from 2014 to 2017, we adapted an existing typology to Twitter data, first in a manual content analysis and then using latent Dirichlet allocation (LDA) topic modeling to extract 100 topics from the data set. Results. Manual annotation identified 22% of the data set as antivaccine, of which safety concerns and conspiracies were the most common themes. Seventeen percent of content was identified as provaccine, with roughly equal proportions of vaccine promotion, criticizing antivaccine beliefs, and vaccine safety and effectiveness. Of the 100 LDA topics, 48 contained provaccine sentiment and 28 contained antivaccine sentiment, with 9 containing both. Conclusions. Our updated typology successfully combines manual annotation with machine-learning methods to estimate the distribution of vaccine arguments, with greater detail on the most distinctive topics of discussion. With this information, communication efforts can be developed to better promote vaccines and avoid amplifying antivaccine rhetoric on Twitter.
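The final aggregation step, combining per-topic sentiment labels from manual annotation with a dominant topic per tweet from LDA, can be sketched in a few lines. The topic labels and tweet assignments below are hypothetical toy data, not the study's; a real pipeline would take the dominant topic of each tweet from a fitted LDA model.

```python
from collections import Counter

def sentiment_distribution(tweet_topics, topic_sentiment):
    """Estimate the share of each sentiment class across a corpus,
    given one dominant topic per tweet and a per-topic sentiment label."""
    counts = Counter(topic_sentiment.get(t, "neutral") for t in tweet_topics)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

# Hypothetical toy data: three topics labelled by manual annotation,
# and the dominant LDA topic assigned to each of eight tweets.
topic_sentiment = {0: "anti", 1: "pro", 2: "pro"}
tweet_topics = [0, 1, 2, 1, 0, 2, 1, 1]

dist = sentiment_distribution(tweet_topics, topic_sentiment)
```

With these toy assignments the estimate is 25% antivaccine and 75% provaccine content; the study's own proportions come from 1.8 million tweets and 100 topics.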

Electronics ◽  
2021 ◽  
Vol 10 (11) ◽  
pp. 1285
Author(s):  
Mohammed Al-Sarem ◽  
Faisal Saeed ◽  
Zeyad Ghaleb Al-Mekhlafi ◽  
Badiea Abdulkarem Mohammed ◽  
Tawfik Al-Hadhrami ◽  
...  

Security attacks on legitimate websites to steal users’ information, known as phishing attacks, have been increasing and affect individuals and organisations alike. Although several detection methods for phishing websites have been proposed using machine learning, deep learning, and other approaches, their detection accuracy still needs to be enhanced. This paper proposes an optimized stacking ensemble method for phishing website detection. The optimisation was carried out using a genetic algorithm (GA) to tune the parameters of several ensemble machine learning methods, including random forests, AdaBoost, XGBoost, Bagging, GradientBoost, and LightGBM. The optimized classifiers were then ranked, and the best three models were chosen as base classifiers of a stacking ensemble method. The experiments were conducted on three datasets consisting of both phishing and legitimate websites: the Phishing Websites Data Set from UCI (Dataset 1), the Phishing Dataset for Machine Learning from Mendeley (Dataset 2), and the Datasets for Phishing Websites Detection from Mendeley (Dataset 3). The experimental results showed an improvement using the optimized stacking ensemble method, with detection accuracy reaching 97.16%, 98.58%, and 97.39% for Dataset 1, Dataset 2, and Dataset 3, respectively.
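The GA tuning loop described above can be illustrated with a minimal sketch. The fitness function here is a stand-in (in the paper it would be cross-validated detection accuracy of an ensemble classifier with the candidate hyperparameters), and the parameter names and ranges are illustrative assumptions.

```python
import random

# Stand-in fitness: peaks at n_estimators=300, max_depth=6. A real run
# would score cross-validated accuracy of, e.g., an XGBoost model.
def fitness(params):
    return -(params["n_estimators"] - 300) ** 2 - (params["max_depth"] - 6) ** 2

def mutate(params):
    child = dict(params)
    child["n_estimators"] = max(10, child["n_estimators"] + random.randint(-50, 50))
    child["max_depth"] = max(1, child["max_depth"] + random.randint(-2, 2))
    return child

def genetic_search(generations=30, pop_size=20, seed=0):
    random.seed(seed)
    pop = [{"n_estimators": random.randint(10, 500),
            "max_depth": random.randint(1, 12)} for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]                     # truncation selection
        pop = parents + [mutate(random.choice(parents)) for _ in parents]
    return max(pop, key=fitness)

best = genetic_search()
```

The elitist loop (parents survive unchanged) guarantees the best fitness never decreases across generations; crossover, which the full GA would also use, is omitted for brevity.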


Author(s):  
Antônio Diogo Forte Martins ◽  
José Maria Monteiro ◽  
Javam Machado

During the coronavirus pandemic, the problem of misinformation arose once again, quite intensely, through social networks. In Brazil, one of the primary sources of misinformation is the messaging application WhatsApp. However, due to WhatsApp's private messaging nature, there are still few misinformation detection methods developed specifically for this platform. In this context, automatic misinformation detection (MID) for COVID-19-related WhatsApp messages in Brazilian Portuguese becomes a crucial challenge. In this work, we present COVID-19.BR, a data set of WhatsApp messages about coronavirus in Brazilian Portuguese, collected from Brazilian public groups and manually labeled. We then investigate different machine learning methods in order to build an efficient MID classifier for WhatsApp messages. So far, our best model achieves an F1 score of 0.774, limited by the predominance of short texts; when messages with fewer than 50 words are filtered out, the F1 score rises to 0.85.
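The short-text filtering step and the F1 evaluation it feeds can be reproduced with a minimal helper; the functions below are a plain-Python sketch under a whitespace-token word count, not the authors' code.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the positive (misinformation) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def filter_short(texts, labels, min_words=50):
    """Drop messages shorter than min_words, keeping labels aligned."""
    pairs = [(t, y) for t, y in zip(texts, labels) if len(t.split()) >= min_words]
    return [t for t, _ in pairs], [y for _, y in pairs]
```

Filtering trades coverage for accuracy: the classifier is only evaluated (and applied) on messages long enough to carry a usable signal.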


2019 ◽  
Author(s):  
Longxiang Su ◽  
Chun Liu ◽  
Dongkai Li ◽  
Jie He ◽  
Fanglan Zheng ◽  
...  

BACKGROUND Heparin is one of the most commonly used medications in intensive care units. In clinical practice, a weight-based heparin dosing nomogram is standard for the treatment of thrombosis. Recently, machine learning techniques have dramatically improved the ability of computers to provide clinical decision support and have allowed for the possibility of computer-generated, algorithm-based heparin dosing recommendations. OBJECTIVE The objective of this study was to predict the effects of heparin treatment using machine learning methods and to optimize heparin dosing in intensive care units based on the predictions. Patient state predictions were based upon activated partial thromboplastin time in 3 ranges: subtherapeutic, normal therapeutic, and supratherapeutic. METHODS Retrospective data from 2 intensive care unit research databases (Multiparameter Intelligent Monitoring in Intensive Care III, MIMIC-III; e–Intensive Care Unit Collaborative Research Database, eICU) were used for the analysis. Candidate machine learning models (random forest, support vector machine, adaptive boosting, extreme gradient boosting, and shallow neural network) were compared in 3 patient groups to evaluate the classification performance for predicting the subtherapeutic, normal therapeutic, and supratherapeutic patient states. The model results were evaluated using precision, recall, F1 score, and accuracy. RESULTS Data from the MIMIC-III database (n=2789 patients) and from the eICU database (n=575 patients) were used. In 3-class classification, the shallow neural network algorithm performed best (F1 scores of 87.26%, 85.98%, and 87.55% for data sets 1, 2, and 3, respectively).
The shallow neural network algorithm achieved the highest F1 scores within each therapeutic state group: subtherapeutic (data set 1: 79.35%; data set 2: 83.67%; data set 3: 83.33%), normal therapeutic (data set 1: 93.15%; data set 2: 87.76%; data set 3: 84.62%), and supratherapeutic (data set 1: 88.00%; data set 2: 86.54%; data set 3: 95.45%). CONCLUSIONS The most appropriate model for predicting the effects of heparin treatment was found by comparing multiple machine learning models and can be used to further guide optimal heparin dosing. Using multicenter intensive care unit data, our study demonstrates the feasibility of predicting the outcomes of heparin treatment using data-driven methods, and thus, how machine learning–based models can be used to optimize and personalize heparin dosing to improve patient safety. Manual analysis and validation suggested that the model outperformed standard-practice heparin dosing.
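The three-class target itself is just a thresholding of the aPTT measurement. A minimal sketch is below; the 60–100 second window is purely illustrative, since the clinically correct therapeutic range is protocol- and laboratory-specific and is not stated in the abstract.

```python
def aptt_state(aptt_seconds, low=60.0, high=100.0):
    """Map an aPTT measurement (seconds) to the study's three-class label.
    The [low, high] window here is an illustrative assumption, not the
    study's protocol."""
    if aptt_seconds < low:
        return "subtherapeutic"
    if aptt_seconds > high:
        return "supratherapeutic"
    return "normal therapeutic"
```

The classifiers then learn to predict this label from patient features before the aPTT result is available, which is what makes dose adjustment ahead of the lab value possible.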


2019 ◽  
Vol 23 (1) ◽  
pp. 125-142
Author(s):  
Helle Hein ◽  
Ljubov Jaanuska

In this paper, the Haar wavelet discrete transform, artificial neural networks (ANNs), and random forests (RFs) are applied to predict the location and severity of a crack in an Euler–Bernoulli cantilever subjected to transverse free vibration. An extensive investigation into two data collection sets and machine learning methods showed that the depth of a crack is more difficult to predict than its location. The data set of eight natural frequency parameters produces more accurate predictions of the crack depth, whereas the data set of eight Haar wavelet coefficients produces more precise predictions of the crack location. Furthermore, the analysis of the results showed that an ensemble of 50 ANNs trained by Bayesian regularization and Levenberg–Marquardt algorithms slightly outperforms the RFs.
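The ensemble step, averaging the (location, depth) outputs of many independently trained networks, can be sketched generically. The "members" below are stand-in callables with fixed outputs; in the study each member would be an ANN trained on frequency or wavelet features.

```python
def ensemble_predict(models, features):
    """Average (location, depth) predictions over an ensemble of models."""
    preds = [m(features) for m in models]
    n = len(preds)
    return tuple(sum(p[i] for p in preds) / n for i in range(2))

# Stand-in "networks" with fixed outputs; real members would be 50 ANNs
# trained with Bayesian regularization or Levenberg-Marquardt.
members = [lambda f: (0.4, 0.1), lambda f: (0.5, 0.2), lambda f: (0.6, 0.3)]
location, depth = ensemble_predict(members, None)
```

Averaging reduces the variance of any single network's estimate, which is the usual motivation for training an ensemble rather than one model.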


2016 ◽  
Vol 9 (3) ◽  
pp. 1065-1072 ◽  
Author(s):  
Wiesław Paja ◽  
Mariusz Wrzesien ◽  
Rafał Niemiec ◽  
Witold R. Rudnicki

Abstract. Climate models are extremely complex pieces of software. They reflect the best knowledge of the physical components of the climate; nevertheless, they contain several parameters that are too weakly constrained by observations and can potentially lead to a simulation crashing. Recently a study by Lucas et al. (2013) showed that machine learning methods can be used for predicting which combinations of parameters can lead to the simulation crashing and hence which processes described by these parameters need refined analyses. In the current study we reanalyse the data set used in that research using a different methodology. We confirm the main conclusion of the original study concerning the suitability of machine learning for the prediction of crashes. We show that only three of the eight parameters indicated in the original study as relevant for prediction of the crash are indeed strongly relevant, three others are relevant but redundant, and two are not relevant at all. We also show that the variance due to the split of data between training and validation sets has a large influence both on the accuracy of predictions and on the relative importance of variables; hence, only a cross-validated approach can deliver a robust prediction of performance and relevance of variables.
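The point about split variance is exactly what k-fold cross-validation addresses: accuracy and variable importance are averaged over k disjoint validation folds instead of resting on one arbitrary split. A minimal sketch of fold construction (round-robin assignment over shuffled indices, an implementation choice of this illustration):

```python
import random

def kfold_indices(n, k, seed=0):
    """Partition n sample indices into k disjoint validation folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

# Each fold serves once as the validation set; averaging accuracy and
# variable importance across the k runs damps split-to-split variance.
folds = kfold_indices(10, 5)
```

Every index lands in exactly one fold, so each sample is validated on exactly once across the k runs.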


2019 ◽  
pp. 089443931988844
Author(s):  
Ranjith Vijayakumar ◽  
Mike W.-L. Cheung

Machine learning methods have become very popular in diverse fields due to their focus on predictive accuracy, but little work has been conducted on how to assess the replicability of their findings. We introduce and adapt replication methods advocated in psychology to the aims and procedural needs of machine learning research. In Study 1, we illustrate these methods with the use of an empirical data set, assessing the replication success of a predictive accuracy measure, namely, R² on the cross-validated and test sets of the samples. We introduce three replication aims. First, tests of inconsistency examine whether single replications have successfully rejected the original study. Rejection will be supported if the 95% confidence interval (CI) of the R² difference estimate between the replication and the original does not contain zero. Second, tests of consistency help support claims of successful replication. We can decide a priori on a region of equivalence, where population values of the difference estimates are considered equivalent for substantive reasons. The 90% CI of a difference estimate lying fully within this region supports replication. Third, we show how to combine replications to construct meta-analytic intervals for better precision of predictive accuracy measures. In Study 2, R² is reduced from the original in a subset of replication studies to examine the ability of the replication procedures to distinguish true replications from nonreplications. We find that when combining studies sampled from the same population to form meta-analytic intervals, random-effects methods perform best for cross-validated measures while fixed-effects methods work best for test measures. Among machine learning methods, regression was comparable to many complex methods, while support vector machines performed most reliably across a variety of scenarios. Social scientists who use machine learning to model empirical data can use these methods to enhance the reliability of their findings.
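The two interval tests can be written down directly. The sketch below assumes the sampling error of each R² estimate is approximately normal with a known standard error; that normal approximation and the toy numbers in the test are assumptions of this illustration, not claims from the article.

```python
import math

def difference_ci(r2_rep, se_rep, r2_orig, se_orig, z=1.96):
    """CI for the difference in R^2 between replication and original,
    assuming independent, approximately normal sampling errors
    (z=1.96 gives a 95% CI, z=1.645 a 90% CI)."""
    diff = r2_rep - r2_orig
    se = math.sqrt(se_rep ** 2 + se_orig ** 2)
    return diff - z * se, diff + z * se

def inconsistent(ci_95):
    """Inconsistency test: the 95% CI excludes zero -> original rejected."""
    lo, hi = ci_95
    return not (lo <= 0.0 <= hi)

def consistent(ci_90, margin):
    """Consistency test: the 90% CI lies fully inside the a priori
    equivalence region [-margin, +margin]."""
    lo, hi = ci_90
    return -margin <= lo and hi <= margin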


2019 ◽  
Author(s):  
Frederik Sandfort ◽  
Felix Strieth-Kalthoff ◽  
Marius Kühnemund ◽  
Christian Beecks ◽  
Frank Glorius

Despite their enormous potential, machine learning methods have only found limited application in predicting reaction outcomes, as current models are often highly complex and, most importantly, are not transferable to different problem sets. Herein, we present the direct utilization of Lewis structures in a machine learning platform for diverse applications in organic chemistry. To this end, an input based on multiple fingerprint features (MFF) as a universal molecular representation was developed and used for problem sets of increasing complexity: First, molecular properties across a diverse array of molecules could be predicted accurately. Next, reaction outcomes such as stereoselectivities and yields were predicted for experimental data sets that were previously evaluated using (complex) problem-oriented descriptor models. As a final application, a systematic high-throughput data set showed good correlation when using the MFF model, which suggests that this approach is general and ready for immediate adoption by chemists.
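At its core, the MFF representation is a concatenation of several fingerprint vectors into one long feature vector. The sketch below shows only that concatenation; the toy "fingerprints" computed from a SMILES string stand in for real cheminformatics fingerprints (such as RDKit Morgan or MACCS keys), which are not reimplemented here.

```python
def mff(molecule, fingerprint_fns):
    """Concatenate several fingerprint vectors into one MFF feature vector."""
    features = []
    for fn in fingerprint_fns:
        features.extend(fn(molecule))
    return features

# Toy stand-in "fingerprints" computed from a SMILES string.
fps = [lambda s: [s.count("C"), s.count("O")], lambda s: [len(s) % 8]]
vector = mff("CCO", fps)
```

The appeal of the approach is that the same concatenated representation feeds every problem set, rather than hand-crafting descriptors per reaction class.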


Author(s):  
Nisha P. Shetty ◽  
Balachandra Muniyal ◽  
Arshia Anand ◽  
Sushant Kumar ◽  
Sushant Prabhu

Social network and microblogging sites such as Twitter are widespread amongst all generations nowadays, places where people connect and share their feelings, emotions, pursuits, etc. Depression, one of the most common mental disorders, is an acute state of sadness in which a person loses interest in all activities. If not treated immediately, it can have dire consequences, including death. In this era of the virtual world, people are more comfortable expressing their emotions on such sites, as these have become part and parcel of everyday life. The research put forth thus employs machine learning classifiers on a Twitter data set to detect whether a person's tweet indicates any sign of depression.
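A classifier of the kind the study describes can be sketched with a tiny multinomial naive Bayes over tweet tokens. This is a toy stand-in for the classifiers typically used on such data (e.g. SVMs or random forests), and the four training tweets are invented examples, not study data.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial naive Bayes over whitespace tokens with Laplace smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)
        self.class_counts = Counter(labels)
        for text, y in zip(texts, labels):
            self.word_counts[y].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        def log_prob(y):
            total = sum(self.word_counts[y].values()) + len(self.vocab)
            score = math.log(self.class_counts[y])          # class prior
            for w in text.lower().split():
                score += math.log((self.word_counts[y][w] + 1) / total)
            return score
        return max(self.class_counts, key=log_prob)

# Invented toy tweets for illustration only.
clf = NaiveBayes().fit(
    ["i feel hopeless and empty", "so sad and tired of everything",
     "great day at the beach", "excited for the concert tonight"],
    ["depressed", "depressed", "not", "not"])
```

A real system would need far more data, careful label validation, and clinical caution; word-overlap signals alone are a weak proxy for depression.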

