Credibility Detection on Twitter News Using Machine Learning Approach

Social media presence is a crucial portion of our life. It is considered one of the most important sources of information than traditional sources. Twitter has become one of the prevalent social sites for exchanging viewpoints and feelings. This work proposes a supervised machine learning system for discovering false news. One of the credibility detection problems is finding new features that are most predictive to better performance classifiers. Both features depending on new content, and features based on the user are used. The features' importance is examined, and their impact on the performance. The reasons for choosing the final feature set using the k-best method are explained. Seven supervised machine learning classifiers are used. They are Naïve Bayes (NB), Support vector machine (SVM), Knearest neighbors (KNN), Logistic Regression (LR), Random Forest (RF), Maximum entropy (ME), and conditional random forest (CRF). Training and testing models were conducted using the Pheme dataset. The feature's analysis is introduced and compared to the features depending on the content, as the decisive factors in determining the validity. Random forest shows the highest performance while using user-based features only and using a mixture of both types of features; features depending on content and the features based on the user, accuracy (82.2 %) in using user-based features only. We achieved the highest results by using both types of features, utilizing random forest classifier accuracy(83.4%). In contrast, logistic regression was the best as to using features that are based on contents. Performance is measured by different measurements accuracy, precision, recall, and F1_score. We compared our feature set with other studies' features and the impact of our new features. We found that our conclusions exhibit high enhancement concerning discovering and verifying the false news regarding the discovery and verification of false news, comparing it to the current results of how it is developed.

Download Full-text

Performance Analysis of Supervised Machine Learning Algorithms on Medical Dataset

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.f7908.038620 ◽

2020 ◽

Vol 8 (6) ◽

pp. 1637-1642

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Decision Tree ◽

Naive Bayes ◽

Naïve Bayes ◽

Learning System ◽

Supervised Machine Learning ◽

Support Vector ◽

Heart Problem

Machine learning (ML) algorithms are designed to perform prediction based on features. With the help of machine learning, system can automatically learn and improve by experience. Machine learning comes under Artificial intelligence. Machine learning is broadly categorized in two types: supervised and unsupervised. Supervised ML performs classification and unsupervised is for clustering. In present scenario, machine learning is used in various areas. It can be used for biometric recognition, hand writing recognition, medical diagnosis etc. In medical field, machine learning plays an important role in identifying diseases based on patient’s features. Presently,doctors use software application based on machine learning algorithm in various disease diagnosis like cancer, cardiac arrest and many more. In this paper we used an ensemble learning method to predict heart problem. Our study described the performance of ML algorithms by comparing various evaluating parameters such as F-measure, Recall, ROC, precision and accuracy. The study done with various combination ML classifiers such as, Decision Tree (DT), Naïve Bayes (NB), Support Vector Machine (SVM), Random Forest (RF) algorithm to predict heart problem. The result showed that by combining two ML algorithm, DT with NB, 81.1% accuracy was achieved. Simultaneously, the models like Support Vector machine (SVM), Decision tree, Naïve Bayes, Random Forest models were also trained and tested individually.

Download Full-text

A Two Layer Machine Learning System for Intrusion Detection Based on Random Forest and Support Vector Machine

2020 IEEE International Women in Engineering (WIE) Conference on Electrical and Computer Engineering (WIECON-ECE) ◽

10.1109/wiecon-ece52138.2020.9397945 ◽

2020 ◽

Author(s):

Sabrina Afroz ◽

S.M Ariful Islam ◽

Samin Nawer Rafa ◽

Maheen Islam

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Intrusion Detection ◽

Learning System ◽

Support Vector

Download Full-text

Detecting Face Touching Using Smartwatches to Mitigate the Spread of COVID-19: Pilot Study (Preprint)

10.2196/preprints.28799 ◽

2021 ◽

Author(s):

Chen Bai ◽

Yu-Peng Chen ◽

Adam Wolach ◽

Lisa Anthony ◽

Mamoun Mardini

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Random Forest ◽

Respiratory Diseases ◽

Window Size ◽

Support Vector ◽

Accelerometer Data ◽

Respiratory Illnesses ◽

Motion Data ◽

Machine Learning Methods

BACKGROUND Frequent spontaneous facial self-touches, predominantly during outbreaks, have the theoretical potential to be a mechanism of contracting and transmitting diseases. Despite the recent advent of vaccines, behavioral approaches remain an integral part of reducing the spread of COVID-19 and other respiratory illnesses. Real-time biofeedback of face touching can potentially mitigate the spread of respiratory diseases. The gap addressed in this study is the lack of an on-demand platform that utilizes motion data from smartwatches to accurately detect face touching. OBJECTIVE The aim of this study was to utilize the functionality and the spread of smartwatches to develop a smartwatch application to identifying motion signatures that are mapped accurately to face touching. METHODS Participants (n=10, 50% women, aged 20-83) performed 10 physical activities classified into: face touching (FT) and non-face touching (NFT) categories, in a standardized laboratory setting. We developed a smartwatch application on Samsung Galaxy Watch to collect raw accelerometer data from participants. Then, data features were extracted from consecutive non-overlapping windows varying from 2-16 seconds. We examined the performance of state-of-the-art machine learning methods on face touching movements recognition (FT vs NFT) and individual activity recognition (IAR): logistic regression, support vector machine, decision trees and random forest. RESULTS Machine learning models were accurate in recognizing face touching categories; logistic regression achieved the best performance across all metrics (Accuracy: 0.93 +/- 0.08, Recall: 0.89 +/- 0.16, Precision: 0.93 +/- 0.08, F1-score: 0.90 +/- 0.11, AUC: 0.95 +/- 0.07) at the window size of 5 seconds. IAR models resulted in lower performance; the random forest classifier achieved the best performance across all metrics (Accuracy: 0.70 +/- 0.14, Recall: 0.70 +/- 0.14, Precision: 0.70 +/- 0.16, F1-score: 0.67 +/- 0.15) at the window size of 9 seconds. CONCLUSIONS Wearable devices, powered with machine learning, are effective in detecting facial touches. This is highly significant during respiratory infection outbreaks, as it has a great potential to refrain people from touching their faces and potentially mitigate the possibility of transmitting COVID-19 and future respiratory diseases.

Download Full-text

A Framework for Effective Application of Machine Learning to Microbiome-Based Classification Problems

mBio ◽

10.1128/mbio.00434-20 ◽

2020 ◽

Vol 11 (3) ◽

Cited By ~ 9

Author(s):

Begüm D. Topçuoğlu ◽

Nicholas A. Lesniak ◽

Mack T. Ruffin ◽

Jenna Wiens ◽

Patrick D. Schloss

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Random Forest ◽

Sequence Data ◽

Characteristic Curve ◽

Predictive Performance ◽

Model Complexity ◽

Support Vector ◽

Classification Problems ◽

Microbial Biomarkers

ABSTRACT Machine learning (ML) modeling of the human microbiome has the potential to identify microbial biomarkers and aid in the diagnosis of many diseases such as inflammatory bowel disease, diabetes, and colorectal cancer. Progress has been made toward developing ML models that predict health outcomes using bacterial abundances, but inconsistent adoption of training and evaluation methods call the validity of these models into question. Furthermore, there appears to be a preference by many researchers to favor increased model complexity over interpretability. To overcome these challenges, we trained seven models that used fecal 16S rRNA sequence data to predict the presence of colonic screen relevant neoplasias (SRNs) (n = 490 patients, 261 controls and 229 cases). We developed a reusable open-source pipeline to train, validate, and interpret ML models. To show the effect of model selection, we assessed the predictive performance, interpretability, and training time of L2-regularized logistic regression, L1- and L2-regularized support vector machines (SVM) with linear and radial basis function kernels, a decision tree, random forest, and gradient boosted trees (XGBoost). The random forest model performed best at detecting SRNs with an area under the receiver operating characteristic curve (AUROC) of 0.695 (interquartile range [IQR], 0.651 to 0.739) but was slow to train (83.2 h) and not inherently interpretable. Despite its simplicity, L2-regularized logistic regression followed random forest in predictive performance with an AUROC of 0.680 (IQR, 0.625 to 0.735), trained faster (12 min), and was inherently interpretable. Our analysis highlights the importance of choosing an ML approach based on the goal of the study, as the choice will inform expectations of performance and interpretability. IMPORTANCE Diagnosing diseases using machine learning (ML) is rapidly being adopted in microbiome studies. However, the estimated performance associated with these models is likely overoptimistic. Moreover, there is a trend toward using black box models without a discussion of the difficulty of interpreting such models when trying to identify microbial biomarkers of disease. This work represents a step toward developing more-reproducible ML practices in applying ML to microbiome research. We implement a rigorous pipeline and emphasize the importance of selecting ML models that reflect the goal of the study. These concepts are not particular to the study of human health but can also be applied to environmental microbiology studies.

Download Full-text

Classification models using circulating neutrophil transcripts can detect unruptured intracranial aneurysm

Journal of Translational Medicine ◽

10.1186/s12967-020-02550-2 ◽

2020 ◽

Vol 18 (1) ◽

Author(s):

Kerry E. Poppenberg ◽

Vincent M. Tutino ◽

Lu Li ◽

Muhammad Waqas ◽

Armond June ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Prediction Models ◽

Model Performance ◽

Supervised Machine Learning ◽

Support Vector ◽

Learning Methods ◽

Training Cohort ◽

Network Analyses ◽

Machine Learning Methods

Abstract Background Intracranial aneurysms (IAs) are dangerous because of their potential to rupture. We previously found significant RNA expression differences in circulating neutrophils between patients with and without unruptured IAs and trained machine learning models to predict presence of IA using 40 neutrophil transcriptomes. Here, we aim to develop a predictive model for unruptured IA using neutrophil transcriptomes from a larger population and more robust machine learning methods. Methods Neutrophil RNA extracted from the blood of 134 patients (55 with IA, 79 IA-free controls) was subjected to next-generation RNA sequencing. In a randomly-selected training cohort (n = 94), the Least Absolute Shrinkage and Selection Operator (LASSO) selected transcripts, from which we constructed prediction models via 4 well-established supervised machine-learning algorithms (K-Nearest Neighbors, Random Forest, and Support Vector Machines with Gaussian and cubic kernels). We tested the models in the remaining samples (n = 40) and assessed model performance by receiver-operating-characteristic (ROC) curves. Real-time quantitative polymerase chain reaction (RT-qPCR) of 9 IA-associated genes was used to verify gene expression in a subset of 49 neutrophil RNA samples. We also examined the potential influence of demographics and comorbidities on model prediction. Results Feature selection using LASSO in the training cohort identified 37 IA-associated transcripts. Models trained using these transcripts had a maximum accuracy of 90% in the testing cohort. The testing performance across all methods had an average area under ROC curve (AUC) = 0.97, an improvement over our previous models. The Random Forest model performed best across both training and testing cohorts. RT-qPCR confirmed expression differences in 7 of 9 genes tested. Gene ontology and IPA network analyses performed on the 37 model genes reflected dysregulated inflammation, cell signaling, and apoptosis processes. In our data, demographics and comorbidities did not affect model performance. Conclusions We improved upon our previous IA prediction models based on circulating neutrophil transcriptomes by increasing sample size and by implementing LASSO and more robust machine learning methods. Future studies are needed to validate these models in larger cohorts and further investigate effect of covariates.

Download Full-text

Comparing Supervised Machine Learning Strategies and Linguistic Features to Search for Very Negative Opinions

Information ◽

10.3390/info10010016 ◽

2019 ◽

Vol 10 (1) ◽

pp. 16 ◽

Cited By ~ 3

Author(s):

Sattam Almatarneh ◽

Pablo Gamallo

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Empirical Study ◽

Learning Strategies ◽

Supervised Machine Learning ◽

Support Vector ◽

Word Embeddings ◽

Linguistic Features ◽

Machine Learning Classifiers ◽

Supervised Machine Learning Classifiers

In this paper, we examine the performance of several classifiers in the process of searching for very negative opinions. More precisely, we do an empirical study that analyzes the influence of three types of linguistic features (n-grams, word embeddings, and polarity lexicons) and their combinations when they are used to feed different supervised machine learning classifiers: Naive Bayes (NB), Decision Tree (DT), and Support Vector Machine (SVM). The experiments we have carried out show that SVM clearly outperforms NB and DT in all datasets by taking into account all features individually as well as their combinations.

Download Full-text

Application of Natural Language Processing with Supervised Machine Learning Techniques to Predict the Overall Drugs Performance

AJIT-e Online Academic Journal of Information Technology ◽

10.5824/ajite.2020.01.001.x ◽

2020 ◽

Vol 11 (40) ◽

pp. 8-23

Author(s):

Pius MARTHIN ◽

Duygu İÇEN

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Semantic Analysis ◽

Classification Tree ◽

Supervised Machine Learning ◽

Training Dataset ◽

Support Vector ◽

Learning Models ◽

Machine Learning Models

Online product reviews have become a valuable source of information which facilitate customer decision with respect to a particular product. With the wealthy information regarding user's satisfaction and experiences about a particular drug, pharmaceutical companies make the use of online drug reviews to improve the quality of their products. Machine learning has enabled scientists to train more efficient models which facilitate decision making in various fields. In this manuscript we applied a drug review dataset used by (Gräβer, Kallumadi, Malberg,& Zaunseder, 2018), available freely from machine learning repository website of the University of California Irvine (UCI) to identify best machine learning model which provide a better prediction of the overall drug performance with respect to users' reviews. Apart from several manipulations done to improve model accuracy, all necessary procedures required for text analysis were followed including text cleaning and transformation of texts to numeric format for easy training machine learning models. Prior to modeling, we obtained overall sentiment scores for the reviews. Customer's reviews were summarized and visualized using a bar plot and word cloud to explore the most frequent terms. Due to scalability issues, we were able to use only the sample of the dataset. We randomly sampled 15000 observations from the 161297 training dataset and 10000 observations were randomly sampled from the 53766 testing dataset. Several machine learning models were trained using 10 folds cross-validation performed under stratified random sampling. The trained models include Classification and Regression Trees (CART), classification tree by C5.0, logistic regression (GLM), Multivariate Adaptive Regression Spline (MARS), Support vector machine (SVM) with both radial and linear kernels and a classification tree using random forest (Random Forest). Model selection was done through a comparison of accuracies and computational efficiency. Support vector machine (SVM) with linear kernel was significantly best with an accuracy of 83% compared to the rest. Using only a small portion of the dataset, we managed to attain reasonable accuracy in our models by applying the TF-IDF transformation and Latent Semantic Analysis (LSA) technique to our TDM.

Download Full-text

Detecting Real-Time Fall of Elderly People Using Machine Learning

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.39635 ◽

2021 ◽

Vol 9 (12) ◽

pp. 1913-1918

Author(s):

Prathima P

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Elderly People ◽

Fall Detection ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Support Vector ◽

False Alarms ◽

Severe Injuries

Abstract: Fall is a significant national health issue for the elderly people, generally resulting in severe injuries when the person lies down on the floor over an extended period without any aid after experiencing a great fall. Thus, elders need to be cared very attentively. A supervised-machine learning based fall detection approach with accelerometer, gyroscope is devised. The system can detect falls by grouping different actions as fall or non-fall events and the care taker is alerted immediately as soon as the person falls. The public dataset SisFall with efficient class of features is used to identify fall. The Random Forest (RF) and Support Vector Machine (SVM) machine learning algorithms are employed to detect falls with lesser false alarms. The SVM algorithm obtain a highest accuracy of 99.23% than RF algorithm. Keywords: Fall detection, Machine learning, Supervised classification, Sisfall, Activities of daily living, Wearable sensors, Random Forest, Support Vector Machine

Download Full-text

Machine learning in the diagnosis of Myocardial Infarction with Non-Obstructive Coronary Arteries

European Heart Journal ◽

10.1093/eurheartj/ehab724.3067 ◽

2021 ◽

Vol 42 (Supplement_1) ◽

Author(s):

M J Espinosa Pascual ◽

P Vaquero Martinez ◽

V Vaquero Martinez ◽

J Lopez Pais ◽

B Izquierdo Coronel ◽

...

Keyword(s):

Machine Learning ◽

Myocardial Infarction ◽

Support Vector Machine ◽

Logistic Regression ◽

Random Forest ◽

Obstructive Coronary Artery Disease ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Classification Model ◽

Support Vector

Abstract Introduction Out of all patients admitted with Myocardial Infarction, 10 to 15% have Myocardial Infarction with Non-Obstructive Coronaries Arteries (MINOCA). Classification algorithms based on deep learning substantially exceed traditional diagnostic algorithms. Therefore, numerous machine learning models have been proposed as useful tools for the detection of various pathologies, but to date no study has proposed a diagnostic algorithm for MINOCA. Purpose The aim of this study was to estimate the diagnostic accuracy of several automated learning algorithms (Support-Vector Machine [SVM], Random Forest [RF] and Logistic Regression [LR]) to discriminate between people suffering from MINOCA from those with Myocardial Infarction with Obstructive Coronary Artery Disease (MICAD) at the time of admission and before performing a coronary angiography, whether invasive or not. Methods A Diagnostic Test Evaluation study was carried out applying the proposed algorithms to a database constituted by 553 consecutive patients admitted to our Hospital with Myocardial Infarction. According to the definitions of 2016 ESC Position Paper on MINOCA, patients were classified into two groups: MICAD and MINOCA. Out of the total 553 patients, 214 were discarded due to the lack of complete data. The set of machine learning algorithms was trained on 244 patients (training sample: 75%) and tested on 80 patients (test sample: 25%). A total of 64 variables were available for each patient, including demographic, clinical and laboratorial features before the angiographic procedure. Finally, the diagnostic precision of each architecture was taken. Results The most accurate classification model was the Random Forest algorithm (Specificity [Sp] 0.88, Sensitivity [Se] 0.57, Negative Predictive Value [NPV] 0.93, Area Under the Curve [AUC] 0.85 [CI 0.83–0.88]) followed by the standard Logistic Regression (Sp 0.76, Se 0.57, NPV 0.92 AUC 0.74 and Support-Vector Machine (Sp 0.84, Se 0.38, NPV 0.90, AUC 0.78) (see graph). The variables that contributed the most in order to discriminate a MINOCA from a MICAD were the traditional cardiovascular risk factors, biomarkers of myocardial injury, hemoglobin and gender. Results were similar when the 19 patients with Takotsubo syndrome were excluded from the analysis. Conclusion A prediction system for diagnosing MINOCA before performing coronary angiographies was developed using machine learning algorithms. Results show higher accuracy of diagnosing MINOCA than conventional statistical methods. This study supports the potential of machine learning algorithms in clinical cardiology. However, further studies are required in order to validate our results. FUNDunding Acknowledgement Type of funding sources: None. ROC curves of different algorithms

Download Full-text

A Supervised Machine Learning Approach to Detect the On/Off State in Parkinson’s Disease Using Wearable Based Gait Signals

Diagnostics ◽

10.3390/diagnostics10060421 ◽

2020 ◽

Vol 10 (6) ◽

pp. 421

Author(s):

Satyabrata Aich ◽

Jinyoung Youn ◽

Sabyasachi Chakraborty ◽

Pyari Mohan Pradhan ◽

Jin-han Park ◽

...

Keyword(s):

Machine Learning ◽

Parkinson’S Disease ◽

Parkinson's Disease ◽

Random Forest ◽

Wearable Devices ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Support Vector ◽

Healthcare Applications ◽

Reported Data

Fluctuations in motor symptoms are mostly observed in Parkinson’s disease (PD) patients. This characteristic is inevitable, and can affect the quality of life of the patients. However, it is difficult to collect precise data on the fluctuation characteristics using self-reported data from PD patients. Therefore, it is necessary to develop a suitable technology that can detect the medication state, also termed the “On”/“Off” state, automatically using wearable devices; at the same time, this could be used in the home environment. Recently, wearable devices, in combination with powerful machine learning techniques, have shown the potential to be effectively used in critical healthcare applications. In this study, an algorithm is proposed that can detect the medication state automatically using wearable gait signals. A combination of features that include statistical features and spatiotemporal gait features are used as inputs to four different classifiers such as random forest, support vector machine, K nearest neighbour, and Naïve Bayes. In total, 20 PD subjects with definite motor fluctuations have been evaluated by comparing the performance of the proposed algorithm in association with the four aforementioned classifiers. It was found that random forest outperformed the other classifiers with an accuracy of 96.72%, a recall of 97.35%, and a precision of 96.92%.

Download Full-text