Classification of Sentiment of Reviews using Supervised Machine Learning Techniques

2020 ◽  
pp. 143-163
Author(s):  
Abinash Tripathy ◽  
Santanu Kumar Rath

Sentiment analysis helps to determine the hidden intention of an author on a given topic and provides an evaluation of the polarity of a document. The polarity may be positive, negative, or neutral. The data used in sentiment analysis very often consist of feedback given by various specialists on a topic or product. Categorizing each review into a class based on its polarity therefore gives a clearer picture of the product. This article proposes an approach for classifying review datasets into polarity groups. Four machine learning algorithms, namely Naive Bayes (NB), Support Vector Machine (SVM), Random Forest, and Linear Discriminant Analysis (LDA), are considered for the classification process. The accuracy values obtained by the algorithms are critically examined using different performance parameters on two different datasets.
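
As a rough illustration of polarity classification with one of the algorithms named in the abstract above, the following is a minimal sketch of a multinomial Naive Bayes classifier with add-one smoothing. It is not the authors' implementation: the function names, the whitespace tokenizer, and the four toy reviews are assumptions made purely for the example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labelled_reviews):
    """Count classes and per-class word frequencies from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labelled_reviews:
        tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log-posterior for the given text."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(class_counts):
        # Start from the log prior of the class.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocabulary:
                continue  # skip words never seen during training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up toy reviews, for illustration only (not from either dataset).
TOY_REVIEWS = [
    ("great movie loved it", "positive"),
    ("wonderful acting great plot", "positive"),
    ("terrible movie hated it", "negative"),
    ("boring plot terrible acting", "negative"),
]
MODEL = train_naive_bayes(TOY_REVIEWS)
```

Calling `predict(MODEL, "great wonderful plot")` on this toy model picks the positive class, because those words occur more often in the positive training reviews. A real study would use a proper tokenizer, a held-out test set, and a much larger corpus.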

2017 ◽  
Vol 4 (1) ◽  
pp. 56-74 ◽  
Author(s):  
Abinash Tripathy ◽  
Santanu Kumar Rath

Sentiment analysis helps to determine the hidden intention of an author on a given topic and provides an evaluation of the polarity of a document. The polarity may be positive, negative, or neutral. The data used in sentiment analysis very often consist of feedback given by various specialists on a topic or product. Categorizing each review into a class based on its polarity therefore gives a clearer picture of the product. This article proposes an approach for classifying review datasets into polarity groups. Four machine learning algorithms, namely Naive Bayes (NB), Support Vector Machine (SVM), Random Forest, and Linear Discriminant Analysis (LDA), are considered for the classification process. The accuracy values obtained by the algorithms are critically examined using different performance parameters on two different datasets.


Author(s):  
V Umarani ◽  
A Julian ◽  
J Deepa

Sentiment analysis has gained a lot of attention from researchers in the last year because it has been widely applied in a variety of domains such as business, government, education, sports, tourism, biomedicine, and telecommunication services. Sentiment analysis is an automated computational method for studying or evaluating sentiments, feelings, and emotions expressed as comments, feedback, or critiques. The process can be automated using machine learning techniques, which analyze text patterns faster. Supervised machine learning is the most widely used mechanism for sentiment analysis. The proposed work discusses the flow of the sentiment analysis process and investigates common supervised machine learning techniques such as multinomial naive Bayes, Bernoulli naive Bayes, logistic regression, support vector machine, random forest, K-nearest neighbor, and decision tree, as well as deep learning techniques such as Long Short-Term Memory and Convolutional Neural Network. The work examines these learning methods on a standard data set. The experimental results demonstrate the performance of the various classifiers in terms of precision, recall, F1-score, ROC curve, accuracy, running time, and k-fold cross-validation. They help in appreciating the novelty of the several deep learning techniques and give the user an overview for choosing the right technique for their application.
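
Several of the evaluation measures listed in the abstract above follow directly from the counts of correct and incorrect predictions. The sketch below shows how precision, recall, F1-score, and accuracy can be computed for a binary task, and how a k-fold split partitions sample indices. The function names and the contiguous, unshuffled folding scheme are assumptions for illustration, not details taken from the paper.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, F1 and accuracy for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size
```

In k-fold cross-validation, each fold serves once as the test set while the remaining folds are used for training, and the metric of interest is then averaged over the k runs. In practice the data are usually shuffled (and often stratified by class) before folding.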


Advances in cyber-attack technologies have ushered in various new attacks that are difficult to detect using traditional intrusion detection systems (IDS). Existing IDS are trained to detect known patterns, so newer attacks bypass them and go undetected. In this paper, a two-level framework is proposed that can be used to detect unknown new attacks using machine learning techniques. In the first level, the known attack classes are determined using supervised machine learning algorithms such as Support Vector Machine (SVM) and Neural Networks (NN). The second level uses unsupervised machine learning algorithms such as K-means. The experiments are carried out with four models on the NSL-KDD dataset in an OpenStack cloud environment. The model with Support Vector Machine for supervised learning, Gradual Feature Reduction (GFR) for feature selection, and K-means as the unsupervised algorithm provided the optimum efficiency of 94.56%.
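
To make the second, unsupervised level more concrete, here is a minimal sketch of the K-means algorithm (Lloyd's iteration) on small numeric feature vectors. It covers only the clustering step and does not reproduce the paper's SVM level, GFR feature selection, or NSL-KDD preprocessing. The function names and the choice to pass the initial centroids explicitly are assumptions made to keep the example deterministic.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, initial_centroids, iterations=10):
    """Cluster points around centroids; returns (centroids, assignments)."""
    centroids = [tuple(c) for c in initial_centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(len(centroids)),
                key=lambda i: squared_distance(p, centroids[i]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, assignments
```

In an intrusion-detection setting, records that fall far from every cluster of known traffic could then be flagged for inspection. How the paper actually combines the two levels is not described in the abstract, so that step is left out here.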


2020 ◽  
Author(s):  
Abdulhameed Ado Osi ◽  
Hussaini Garba Dikko ◽  
Mannir Abdu ◽  
Auwalu Ibrahim ◽  
Lawan Adamu Isma'il ◽  
...  

COVID-19 is an infectious disease discovered after an outbreak began in Wuhan, China, in December 2019, and it remains a growing global threat to public health. The virus has spread to many countries across the globe. This paper analyzed and compared the performance of three supervised machine learning techniques, Linear Discriminant Analysis (LDA), Random Forest (RF), and Support Vector Machine (SVM), on a COVID-19 dataset. The three algorithms were compared using several metrics for assessing predictive performance: accuracy, sensitivity, specificity, F-score, Kappa index, and ROC. RF was found to be the best algorithm, with 100% prediction accuracy, compared with 95.2% for LDA and 90.9% for SVM. Our analysis shows that, of these three classification models, RF predicts a COVID-19 patient's survival outcome with the highest accuracy. A chi-square test reveals that all seven features except sex were significantly correlated with the COVID-19 patient's outcome (P-value < 0.005). Therefore, RF was recommended for COVID-19 patient outcome prediction, which will help in the early identification of possible sensitive cases for quick provision of quality health care, support, and supervision.
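
Three of the metrics named in the abstract above (sensitivity, specificity, and the Kappa index) can be derived from a 2x2 confusion matrix. The sketch below assumes the Kappa index refers to Cohen's kappa, which is the usual meaning but is not stated in the abstract. The function name and any counts passed to it are hypothetical and are not results from the paper.

```python
def confusion_summary(tp, fn, fp, tn):
    """Sensitivity, specificity and Cohen's kappa from 2x2 confusion counts."""
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    # Observed agreement between predictions and true labels.
    p_observed = (tp + tn) / n
    # Agreement expected by chance, from the row and column marginals.
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = ((p_observed - p_expected) / (1 - p_expected)
             if p_expected != 1 else 0.0)
    return {"sensitivity": sensitivity,
            "specificity": specificity,
            "kappa": kappa}
```

Kappa corrects raw accuracy for agreement that would occur by chance, which matters when one outcome class is much more common than the other.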


Symmetry ◽  
2021 ◽  
Vol 13 (3) ◽  
pp. 403
Author(s):  
Muhammad Waleed ◽  
Tai-Won Um ◽  
Tariq Kamal ◽  
Syed Muhammad Usman

In this paper, we apply multi-class supervised machine learning techniques to classify agricultural farm machinery. Classifying farm machinery is important for automatically authenticating field activity in a remote setup. In the absence of a sound machine recognition system, there is every possibility of fraudulent activity taking place. To address this need, we classify the machinery using five machine learning techniques: K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and Gradient Boosting (GB). To train the models, we use the vibration and tilt of the machinery, recorded using accelerometer and gyroscope sensors, respectively. The machinery included the leveler, rotavator, and cultivator. A preliminary analysis of the collected data revealed that the farm machinery (when in operation) showed large variations in vibration and tilt but similar means. Additionally, vibration-based and tilt-based classifications each achieve good accuracy when used alone (with vibration showing slightly better numbers than tilt). However, the accuracies improve further when tilt and vibration are used together. Furthermore, all five machine learning algorithms have an accuracy of more than 82%, with random forest performing best. Gradient boosting and random forest show slight over-fitting (about 9%), but both algorithms produce high testing accuracy. In terms of execution time, the decision tree takes the least time to train, while gradient boosting takes the most.


The P300 speller in a Brain Computer Interface (BCI) allows locked-in or completely paralyzed patients to communicate with other people. Machine learning techniques are used to improve characterization performance and increase accuracy. This study concerns the detection and classification of the event-related potential (ERP) P300 signal using machine learning algorithms. Linear Discriminant Analysis (LDA) and Support Vector Machine (SVM) are used to separate P300 from non-P300 signals in Electroencephalography (EEG) recordings. The performance of the system is evaluated based on F1-score using BCI Competition III dataset II. Both classifiers gave 91.0% classification accuracy.


2020 ◽  
Vol 17 (3) ◽  
pp. 360-383 ◽  
Author(s):  
Anantha Narayanan ◽  
Farzanah Desai ◽  
Tom Stewart ◽  
Scott Duncan ◽  
Lisa Mackay

Background: Application of machine learning for classifying human behavior is increasingly common as access to raw accelerometer data improves. The aims of this scoping review are (1) to examine if machine-learning techniques can accurately identify human activity behaviors from raw accelerometer data and (2) to summarize the practical implications of these machine-learning techniques for future work. Methods: Keyword searches were performed in Scopus, Web of Science, and EBSCO databases in 2018. Studies that applied supervised machine-learning techniques to raw accelerometer data and estimated components of physical activity were included. Information on study characteristics, machine-learning techniques, and key study findings were extracted from included studies. Results: Of the 53 studies included in the review, 75% were published in the last 5 years. Most studies predicted postures and activity type, rather than intensity, and were conducted in controlled environments using 1 or 2 devices. The most common models were support vector machine, random forest, and artificial neural network. Overall, classification accuracy ranged from 62% to 99.8%, although nearly 80% of studies achieved an overall accuracy above 85%. Conclusions: Machine-learning algorithms demonstrate good accuracy when predicting physical activity components; however, their application to free-living settings is currently uncertain.


2021 ◽  
Vol 04 (01) ◽  
Author(s):  
Mahmood Umar ◽  

Nowadays, social media platforms, blogs, and e-commerce sites are commonly used to express opinions on politics, movies, products, and education, which in turn support election forecasting, business growth, and the improvement of teaching and learning. As a result, data generation has become easier, producing big data that requires appropriate techniques and tools to analyse easily, accurately, and in a timely manner. This makes sentiment analysis a highly demanded research area. This study investigates on what basis (sentiment classification level) or in which area of application (data source) supervised machine learning approaches, particularly Support Vector Machine (SVM), Naïve Bayes, and Maximum Entropy algorithms, and the lexicon-based approach give the best results in sentiment analysis. Based on the review of the literature, there is a contradiction on the point that SVM generated the best result in analyzing student sentiment at the document level. This study also finds that sentiment analysis differs from system to system based on polarity (the types of classes to predict: positive or negative, subjective or objective), the level of classification (sentence, phrase, or document level), and the language being processed. This research produces a taxonomy that serves as a guide for choosing techniques in sentiment analysis. The taxonomy covers the sentiment classification levels and data preprocessing stages. It also organises sentiment analysis techniques into three groups: machine learning, lexicon-based, and hybrid (combination). The machine learning techniques are sub-grouped into two: supervised and unsupervised. The supervised techniques are organised into two: classification and regression. Unsupervised machine learning techniques include clustering and association; the clustering technique consists of k-means.
The decision tree, a classification-based technique under supervised machine learning, consists of random forest (Akinkunmi, 2019), while the rule-based classifiers consist of the confidence criterion and the support criterion. The commonly used tools are Weka, the Python compiler, and the R programming tool.


Sentiment analysis, or opinion mining, has gained much attention in recent years. With constantly evolving social networks and internet marketing sites, reviews and blogs have accumulated on them and act as a significant source for future analysis and better decision making. These reviews are naturally unstructured and thus require preprocessing and further classification to extract the significant information for future use. These reviews and blogs can be of different types, such as positive, negative, and neutral. Supervised machine learning techniques help to classify these reviews. In this paper, five machine learning algorithms (K-Nearest Neighbors (KNN), Decision Tree, Artificial Neural Networks (ANNs), Naïve Bayes, and Support Vector Machine (SVM)) are used for the classification of sentiments. These algorithms are analyzed using a Twitter dataset. Their performance is analyzed using various performance measures such as accuracy, precision, recall, and F-measure. The evaluation of these techniques on the Twitter dataset showed the predictive ability of machine learning in opinion mining.


2019 ◽  
Vol 4 (1) ◽  
pp. 43
Author(s):  
Nfn Nofriani

Poverty has been a major problem for most countries around the world, including Indonesia. One approach to eradicating poverty is the equitable distribution of social assistance to target households based on the Integrated Database of social assistance. This study compared several well-known supervised machine learning techniques, namely the Naïve Bayes Classifier, Support Vector Machines, K-Nearest Neighbor Classification, the C4.5 Algorithm, and the Random Forest Algorithm, to predict household welfare status classification, using the Integrated Database as a case study. The main objective was to choose the best supervised machine learning approach for predicting a household's welfare status based on attributes in the Integrated Database. The results showed that the Random Forest Algorithm was the best.

