Tweets Classification on the Base of Sentiments for US Airline Companies

The use of data from social networks such as Twitter has increased in recent years to improve political campaigns, the quality of products and services, sentiment analysis, etc. Classifying tweets by user sentiment is a collaborative and important task for many organizations. This paper proposes a voting classifier (VC) to help sentiment analysis for such organizations. The VC is based on logistic regression (LR) and a stochastic gradient descent classifier (SGDC) and uses a soft voting mechanism to make the final prediction. Tweets were classified into positive, negative and neutral classes based on the sentiments they contain. In addition, a variety of machine learning classifiers were evaluated using accuracy, precision, recall and F1 score as the performance metrics. The impact of feature extraction techniques, including term frequency (TF), term frequency-inverse document frequency (TF-IDF), and word2vec, on classification accuracy was investigated as well. Moreover, the performance of a deep long short-term memory (LSTM) network was analyzed on the selected dataset. The results show that the proposed VC performs better than the other classifiers. The VC achieves an accuracy of 0.789 with TF and 0.791 with TF-IDF feature extraction. The results also indicate that ensemble classifiers achieve higher accuracy than non-ensemble classifiers. The experiments further show that machine learning classifiers perform better when TF-IDF is used as the feature extraction method. Word2vec feature extraction performs worse than TF and TF-IDF feature extraction. The LSTM achieves a lower accuracy than the machine learning classifiers.
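The soft voting mechanism mentioned in the abstract can be illustrated with a minimal sketch. It assumes each base classifier (LR and SGDC here) outputs a probability per sentiment class, and that the ensemble averages those probabilities with equal weights before picking the most probable class. The function name `soft_vote` and the probability values are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

CLASSES = ["negative", "neutral", "positive"]


def soft_vote(prob_sets: List[Dict[str, float]]) -> str:
    """Average per-class probabilities across classifiers and return the top class."""
    averaged = {
        label: sum(probs[label] for probs in prob_sets) / len(prob_sets)
        for label in CLASSES
    }
    return max(averaged, key=averaged.get)


# Hypothetical probability outputs for a single tweet (illustrative values only).
lr_probs = {"negative": 0.50, "neutral": 0.30, "positive": 0.20}
sgd_probs = {"negative": 0.20, "neutral": 0.25, "positive": 0.55}

prediction = soft_vote([lr_probs, sgd_probs])
print(prediction)
```

In this made-up case LR alone would predict "negative", but the averaged probabilities favour "positive", which is the kind of disagreement soft voting is meant to resolve. Whether the paper weights the two classifiers equally is not stated in the abstract.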

Performance Analysis of Machine Learning Classifiers for Intrusion Detection using UNSW-NB15 Dataset

10.5121/csit.2020.102004, 2020

Author(s): Geeta Kocher, Gulshan Kumar

Keyword(s): Machine Learning, Intrusion Detection, Mean Squared Error, Internet Technology, Stochastic Gradient Descent, Detection Accuracy, Machine Learning Classifiers, Learning Classifiers, Positive Rate, The Impact

With the advancement of internet technology, the number of threats is also rising exponentially. To reduce the impact of these threats, researchers have proposed many solutions for intrusion detection. In the literature, various machine learning classifiers are trained on older datasets for intrusion detection, which limits their detection accuracy. There is therefore a need to train machine learning classifiers on the latest dataset. In this paper, UNSW-NB15, the latest dataset, is used to train machine learning classifiers. On the basis of theoretical analysis, a taxonomy is proposed in terms of lazy and eager learners. From this taxonomy, K-Nearest Neighbors (KNN), Stochastic Gradient Descent (SGD), Decision Tree (DT), Random Forest (RF), Logistic Regression (LR) and Naïve Bayes (NB) classifiers are selected for training. The performance of these classifiers is tested on the UNSW-NB15 dataset in terms of Accuracy, Mean Squared Error (MSE), Precision, Recall, F1-Score, True Positive Rate (TPR) and False Positive Rate (FPR), and a comparative analysis of the classifiers is carried out. The experimental results show that the RF classifier outperforms the other classifiers.
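The evaluation metrics listed in the abstract can all be derived from the counts of a binary confusion matrix. The sketch below assumes labels are encoded as 1 (attack) and 0 (normal); the function name `binary_metrics` and the label vectors are hypothetical and do not come from the paper or from UNSW-NB15.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute common binary classification metrics from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = len(pairs)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # recall is the same quantity as TPR
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / n,
        "mse": sum((t - p) ** 2 for t, p in pairs) / n,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "tpr": recall,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
    }


# Hypothetical labels: 1 = attack, 0 = normal traffic.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

With 0/1 labels, each squared error is 1 exactly when the prediction is wrong, so MSE equals 1 minus accuracy in this encoding.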

Predictive modelling of hospital readmission: Evaluation of different preprocessing techniques on machine learning classifiers

Intelligent Data Analysis, 10.3233/ida-205468, 2021, Vol 25 (5), pp. 1073-1098

Author(s): Nor Hamizah Miswan, Chee Seng Chan, Chong Guan Ng

Keyword(s): Machine Learning, Hospital Readmission, Performance Metrics, Predictive Performance, Predictive Modelling, Machine Learning Classifiers, Learning Classifiers, Set Up, The Right, The Impact

Hospital readmission is a major cost for healthcare systems worldwide. If patients with a higher potential of readmission could be identified at the start, existing resources could be used more efficiently, and appropriate plans could be implemented to reduce the risk of readmission. Therefore, it is important to predict the right target patients. Medical data is usually noisy, incomplete, and inconsistent. Hence, before developing a prediction model, it is crucial to set up the predictive model efficiently so that improved predictive performance is achieved. The current study aims to analyse the impact of different preprocessing methods on the performance of different machine learning classifiers. The preprocessing methods applied by previous hospital readmission studies were compared, and the most common approaches, such as missing value imputation, feature selection, data balancing, and feature scaling, were highlighted. The hyperparameters were selected using Bayesian optimisation. The different preprocessing pipelines were assessed using various performance metrics and computational costs. The results indicated that the preprocessing approaches helped improve the model's prediction of hospital readmission.
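Two of the preprocessing steps named in the abstract, missing value imputation and feature scaling, can be sketched in a few lines. The example assumes mean imputation and min-max scaling, which are common choices but are not confirmed by the abstract as the methods the study used. The function names and the `length_of_stay` values are hypothetical.

```python
from typing import List, Optional


def impute_mean(column: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [value for value in column if value is not None]
    mean = sum(observed) / len(observed)
    return [mean if value is None else value for value in column]


def min_max_scale(column: List[float]) -> List[float]:
    """Rescale values linearly to the range [0, 1]."""
    lowest, highest = min(column), max(column)
    if highest == lowest:
        # A constant column carries no spread, so map every value to 0.0.
        return [0.0 for _ in column]
    return [(value - lowest) / (highest - lowest) for value in column]


# Hypothetical feature with one missing value (illustrative data only).
length_of_stay = [2.0, None, 4.0, 6.0]
imputed = impute_mean(length_of_stay)
scaled = min_max_scale(imputed)
print(imputed, scaled)
```

In a real pipeline the imputation mean and the scaling range would normally be estimated on the training split only and then applied to the test split, to avoid leaking test information into the model.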

Sentiment Analysis on E-Learning Using Machine Learning Classifiers in Python

Rising Threats in Expert Applications and Solutions - Advances in Intelligent Systems and Computing, 10.1007/978-981-15-6014-9_1, 2020, pp. 1-8

Author(s): Shilpa Singh Hanswal, Astha Pareek, Geetika Vyas, Amita Sharma

Keyword(s): Machine Learning, Sentiment Analysis, Machine Learning Classifiers, Learning Classifiers, E Learning

CPEM: Accurate cancer type classification based on somatic alterations using an ensemble of a random forest and a deep neural network

Scientific Reports, 10.1038/s41598-019-53034-3, 2019, Vol 9 (1), Cited By ~ 5

Author(s): Kanggeun Lee, Hyoung-oh Jeong, Semin Lee, Won-Ki Jeong

Keyword(s): Machine Learning, Prediction Model, Genomic Data, The Cancer Genome Atlas, Cancer Type, Machine Learning Classifiers, Learning Classifiers, Somatic Alterations, The Impact, Type Classification

With recent advances in DNA sequencing technologies, fast acquisition of large-scale genomic data has become commonplace. For cancer studies, in particular, there is an increasing need for the classification of cancer type based on somatic alterations detected from sequencing analyses. However, the ever-increasing size and complexity of the data make the classification task extremely challenging. In this study, we evaluate the contributions of various input features, such as mutation profiles, mutation rates, mutation spectra and signatures, and somatic copy number alterations that can be derived from genomic data, and further utilize them for accurate cancer type classification. We introduce a novel ensemble of machine learning classifiers, called CPEM (Cancer Predictor using an Ensemble Model), which is tested on 7,002 samples representing over 31 different cancer types collected from The Cancer Genome Atlas (TCGA) database. We first systematically examined the impact of the input features. Features known to be associated with specific cancers had relatively high importance in our initial prediction model. We further investigated various machine learning classifiers and feature selection methods to derive the ensemble-based cancer type prediction model achieving up to 84% classification accuracy in the nested 10-fold cross-validation. Finally, we narrowed down the target cancers to the six most common types and achieved up to 94% accuracy.
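The nested 10-fold cross-validation mentioned in the abstract can be sketched as two layers of index splitting: an outer loop that holds out a test fold for the final accuracy estimate, and an inner loop over the remaining samples for model or feature selection. The sketch below assumes 10 inner folds as well and uses simple interleaved folds without shuffling or stratification; none of these details are stated in the abstract, and the function names are hypothetical.

```python
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int]]


def k_fold(indices: List[int], k: int) -> Iterator[Split]:
    """Yield (train, held_out) index lists for k interleaved folds."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, held_out


def nested_k_fold(
    n_samples: int, outer_k: int = 10, inner_k: int = 10
) -> Iterator[Tuple[List[int], List[int], List[Split]]]:
    """Yield outer (train, test) splits plus inner splits drawn only from outer train."""
    for outer_train, outer_test in k_fold(list(range(n_samples)), outer_k):
        # Inner folds never see the outer test fold, so tuning cannot leak into it.
        inner_splits = list(k_fold(outer_train, inner_k))
        yield outer_train, outer_test, inner_splits


splits = list(nested_k_fold(n_samples=100))
print(len(splits), len(splits[0][2]))
```

Hyperparameters and feature subsets would be chosen using only the inner splits, and each outer test fold would be scored once with the model selected for that outer round.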
