Sentiment Analysis in Portuguese Texts from Online Health Community Forums: Data, Model and Evaluation

This study introduces novel data and models for the task of Sentiment Analysis in Portuguese texts about Diabetes Mellitus. The corpus contains 1290 posts retrieved from Portuguese-language online health community forums and annotated by two annotators according to three sentiment categories (Positive, Neutral and Negative). An evaluation of traditional classifiers (Support Vector Machine, Decision Tree, Random Forest and Logistic Regression) and state-of-the-art BERT-based models on this task showed that, as expected, the BERT-based models perform better. Data and models are available to the community upon request.

Classifying Lensed Gravitational Waves in the Geometrical Optics Limit with Machine Learning

American Journal of Undergraduate Research ◽

10.33697/ajur.2019.019 ◽

2019 ◽

Vol 16 (2) ◽

pp. 5-16

Author(s):

Amit Singh ◽

Ivan Li ◽

Otto Hannuksela ◽

Tjonnie Li ◽

Kyungmin Kim

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Gravitational Wave ◽

Gravitational Waves ◽

Geometrical Optics ◽

Supervised Machine Learning ◽

Support Vector ◽

Multi Layer Perceptron ◽

Machine Learning Classifiers ◽

Learning Classifiers

Gravitational waves are theorized to be gravitationally lensed when they propagate near massive objects. Such lensing effects cause potentially detectable repeated gravitational wave patterns in ground- and space-based gravitational wave detectors. These effects are difficult to discriminate when the lens is small and the repeated patterns superpose. Traditionally, matched filtering techniques are used to identify gravitational-wave signals, but we instead aim to use machine learning techniques for this purpose. In this work, we implement supervised machine learning classifiers (support vector machine, random forest, multi-layer perceptron) to discriminate such lensing patterns in gravitational wave data. We train the classifiers with spectrograms of both lensed and unlensed waves using both point-mass and singular isothermal sphere lens models. As a result, the classifiers return F1 scores ranging from 0.852 to 0.996, with precisions from 0.917 to 0.992 and recalls from 0.796 to 1.000, depending on the type of classifier and lensing model used. This supports the idea that machine learning classifiers are able to correctly identify lensed gravitational wave signals. It also suggests that, in the future, machine learning classifiers may be used as an alternative way to identify lensed gravitational wave events, allowing us to study gravitational wave sources and massive astronomical objects through further analysis.

KEYWORDS: Gravitational Waves; Gravitational Lensing; Geometrical Optics; Machine Learning; Classification; Support Vector Machine; Random Tree Forest; Multi-layer Perceptron
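The F1 score reported in the abstract above is the harmonic mean of precision and recall. As a quick sanity check, the sketch below recomputes F1 from the endpoint values quoted in the abstract. Pairing the lowest precision with the lowest recall (and the highest with the highest) is an assumption made here for illustration only; the abstract does not say which classifier or lens model produced each value, and the function name `f1_score` is hypothetical.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Endpoint values quoted in the abstract; the pairing is an assumption.
low = f1_score(0.917, 0.796)
high = f1_score(0.992, 1.000)
print(round(low, 3), round(high, 3))  # prints 0.852 0.996
```

Under that assumed pairing, the recomputed values agree with the reported F1 range.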

Evaluating Annotated Dataset of Customer Reviews for Aspect Based Sentiment Analysis

Journal of Web Engineering ◽

10.13052/jwe1540-9589.2122 ◽

2021 ◽

Author(s):

Dimple Chehal ◽

Parul Gupta ◽

Payal Gulati

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Sentiment Analysis ◽

Nearest Neighbor ◽

Supervised Machine Learning ◽

Support Vector ◽

Product Reviews ◽

K Nearest Neighbor ◽

Customer Reviews ◽

Percent Accuracy

Sentiment analysis of product reviews on e-commerce platforms aids in determining the preferences of customers. Aspect-based sentiment analysis (ABSA) assists in identifying the contributing aspects and their corresponding polarity, thereby allowing for a more detailed analysis of the customer’s inclination toward product aspects. This analysis helps in the transition from the traditional rating-based recommendation process to an improved aspect-based process. To automate ABSA, a labelled dataset is required to train a supervised machine learning model. As the availability of such datasets is limited by the human effort involved, an annotated dataset is provided here for performing ABSA on customer reviews of mobile phones. The dataset, comprising product reviews of the Apple iPhone 11, has been manually annotated with predefined aspect categories and aspect sentiments. The dataset’s accuracy has been validated using state-of-the-art machine learning techniques such as Naïve Bayes, Support Vector Machine, Logistic Regression, Random Forest, K-Nearest Neighbor and Multi Layer Perceptron (MLP), a sequential model built with the Keras API. For classifying review text into aspect categories, the MLP model built with the Keras Sequential API produced the most accurate result, with 67.45 percent accuracy, while K-Nearest Neighbor performed the worst, with only 49.92 percent accuracy. For classifying review text into aspect sentiments, the Support Vector Machine had the highest accuracy, 79.46 percent, and the model built with the Keras API had the lowest, 76.30 percent. The contribution is beneficial as a benchmark dataset for ABSA of mobile phone reviews.

Implementation of n-gram Methodology for Rotten Tomatoes Review Dataset Sentiment Analysis

International Journal of Knowledge Discovery in Bioinformatics ◽

10.4018/ijkdb.2017010103 ◽

2017 ◽

Vol 7 (1) ◽

pp. 30-41 ◽

Cited By ~ 12

Author(s):

Prayag Tiwari ◽

Brojo Kishore Mishra ◽

Sachin Kumar ◽

Vivek Kumar

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Sentiment Analysis ◽

Maximum Entropy ◽

Learning Strategies ◽

Supervised Machine Learning ◽

Support Vector ◽

N Gram ◽

F Measure ◽

Blog Posts

Sentiment Analysis aims to capture the underlying perspective of a piece of content, which may be anything that holds a subjective opinion, for example, an online review, comments on blog posts, or a film rating. These reviews and blogs may be classified into different polarity groups, such as negative, positive, and neutral, in order to extract information from the input dataset. Supervised machine learning strategies classify these reviews. In this paper, three different machine learning algorithms, namely Support Vector Machine (SVM), Maximum Entropy (ME) and Naive Bayes (NB), have been considered for the classification of human opinions. The accuracy of the different methods is critically examined in order to assess their performance on the basis of parameters such as precision, recall, f-measure, and accuracy.

Comparing Supervised Machine Learning Strategies and Linguistic Features to Search for Very Negative Opinions

Information ◽

10.3390/info10010016 ◽

2019 ◽

Vol 10 (1) ◽

pp. 16 ◽

Cited By ~ 3

Author(s):

Sattam Almatarneh ◽

Pablo Gamallo

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Empirical Study ◽

Learning Strategies ◽

Supervised Machine Learning ◽

Support Vector ◽

Word Embeddings ◽

Linguistic Features ◽

Machine Learning Classifiers ◽

Supervised Machine Learning Classifiers

In this paper, we examine the performance of several classifiers in the process of searching for very negative opinions. More precisely, we conduct an empirical study that analyzes the influence of three types of linguistic features (n-grams, word embeddings, and polarity lexicons) and their combinations when they are used to feed different supervised machine learning classifiers: Naive Bayes (NB), Decision Tree (DT), and Support Vector Machine (SVM). The experiments we have carried out show that SVM clearly outperforms NB and DT on all datasets, whether the features are used individually or in combination.

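Aspect Category Classification dengan Pendekatan Machine Learning Menggunakan Dataset Bahasa Indonesia [Aspect Category Classification with a Machine Learning Approach Using an Indonesian-Language Dataset]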

Jurnal Nasional Teknik Elektro dan Teknologi Informasi (JNTETI) ◽

10.22146/jnteti.v10i3.1819 ◽

2021 ◽

Vol 10 (3) ◽

pp. 229-235

Author(s):

Syaifulloh Amien Pandega Perdana ◽

Teguh Bharata Aji ◽

Ridi Ferdiana

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Sentiment Analysis ◽

Support Vector ◽

Term Weighting ◽

Inverse Document Frequency ◽

Term Frequency ◽

Document Frequency ◽

Bahasa Indonesia

Customer reviews are opinions on the quality of goods or services as experienced by consumers. Customer reviews contain information that is useful to both consumers and providers of goods or services. The large volume of customer reviews available on websites calls for a framework that extracts sentiment automatically. A single customer review often covers many aspects, so Aspect Based Sentiment Analysis (ABSA) must be used to determine the polarity of each aspect. One important task in ABSA is Aspect Category Detection. Machine learning methods for Aspect Category Detection have been widely applied in the English-language domain, but only rarely in the Indonesian-language domain. This paper compares the performance of three machine learning algorithms, namely Naïve Bayes (NB), Support Vector Machine (SVM), and Random Forest (RF), on Indonesian-language customer reviews using Term Frequency–Inverse Document Frequency (TF-IDF) as term weighting. The results show that RF outperforms NB and SVM in three different domains, namely restaurants, hotels, and e-commerce, with f1-scores of 84.3%, 85.7%, and 89.3% for the respective domains.
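The TF-IDF term weighting named in the abstract above can be illustrated in a few lines. This is a minimal sketch of one common variant (term frequency normalised by document length, multiplied by log(N / df)); the paper does not state which variant it used, and the toy documents and the function name `tf_idf` are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per tokenised document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy, made-up review snippets (already tokenised).
docs = [
    ["food", "great", "service", "slow"],
    ["room", "clean", "service", "great"],
    ["delivery", "slow", "packaging", "good"],
]
weights = tf_idf(docs)
```

Terms that occur in fewer documents, such as `food`, receive a larger weight than terms shared across documents, such as `service`. The resulting weight vectors could then be passed to any of the classifiers the paper compares.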

Sentiment Analysis of Product Reviews using Support Vector Machine Learning Algorithm

Indian Journal of Science and Technology ◽

10.17485/ijst/2017/v10i35/118965 ◽

2017 ◽

Vol 10 (35) ◽

pp. 1-9 ◽

Cited By ~ 6

Author(s):

Esha Tyagi ◽

Arvind Kumar Sharma

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Sentiment Analysis ◽

Learning Algorithm ◽

Support Vector ◽

Machine Learning Algorithm ◽

Product Reviews

Sentiment Analysis on E-commerce Product using Machine Learning and Combination of TF-IDF and Backward Elimination

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.f7889.038620 ◽

2020 ◽

Vol 8 (6) ◽

pp. 2862-2867

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Feature Selection ◽

Sentiment Analysis ◽

Opinion Mining ◽

Classification Performance ◽

Support Vector ◽

Product Reviews ◽

Feature Selection Technique ◽

Backward Elimination

E-commerce is a website or mobile application platform that helps people buy products. Before purchasing a product, customers decide whether to buy it by reading reviews from previous buyers. The problem is that there are so many reviews that reading them all takes a long time. This research uses sentiment analysis to classify the review data. Sentiment analysis, or opinion mining, is a machine learning approach for classifying and analysing texts or documents about human sentiments, emotions, and opinions. In this research, sentiment analysis was used to classify product reviews from e-commerce websites into positive or negative classes. The results could be processed further and used to summarize customers' opinions about a certain product without reading every single review. The goal of this research is to optimize classification performance by using a feature selection technique. Term Frequency-Inverse Document Frequency (TF-IDF) feature extraction, Backward Elimination feature selection, and five different classifiers (Naïve Bayes, Support Vector Machine, K-Nearest Neighbour, Decision Tree, Random Forest) were used to analyse the sentiment of the reviews. The dataset is in the Indonesian language and is labelled with two classes (positive and negative). The best accuracy, 85.97%, is achieved by using TF-IDF, Backward Elimination and Support Vector Machine (SVM), an increase of 7.91% compared to the process without feature selection. Based on the results, Backward Elimination feature selection improved performance for all classifiers used in this research.
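In one common wrapper-style formulation, Backward Elimination starts from the full feature set and repeatedly drops the feature whose removal helps the evaluation score most. The sketch below shows that greedy loop in generic form. It is not the authors' implementation: the `score` callback (which in practice would train and validate a classifier on the chosen TF-IDF columns), the stopping rule, and the toy scoring function are all assumptions for illustration.

```python
from typing import Callable, FrozenSet, Iterable


def backward_elimination(
    features: Iterable[str],
    score: Callable[[FrozenSet[str]], float],
) -> FrozenSet[str]:
    """Greedily drop features while doing so strictly improves the score."""
    selected = frozenset(features)
    best = score(selected)
    while len(selected) > 1:
        # Score every subset obtained by removing exactly one feature.
        candidates = [(score(selected - {f}), f) for f in sorted(selected)]
        top_score, drop = max(candidates)
        if top_score <= best:
            break  # no single removal improves the score
        selected, best = selected - {drop}, top_score
    return selected


# Hypothetical scorer: rewards informative terms, penalises noisy ones.
USEFUL = {"good", "bad", "broken"}


def toy_score(subset: FrozenSet[str]) -> float:
    return len(subset & USEFUL) - 0.5 * len(subset - USEFUL)


kept = backward_elimination(["good", "bad", "broken", "the", "and"], toy_score)
print(sorted(kept))  # prints ['bad', 'broken', 'good']
```

With a real classifier behind `score`, each iteration costs one training run per remaining feature, so this loop is usually applied after the vocabulary has already been pruned.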

Comparing Supervised Machine Learning Strategies and Linguistic Features to Search for Very Negative Opinions

10.20944/preprints201811.0436.v1 ◽

2018 ◽

Author(s):

Sattam Almatarneh ◽

Pablo Gamallo

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Empirical Study ◽

Learning Strategies ◽

Supervised Machine Learning ◽

Support Vector ◽

Word Embeddings ◽

Linguistic Features ◽

Machine Learning Classifiers ◽

Supervised Machine Learning Classifiers

In this paper, we examine the performance of several classifiers in the process of searching for very negative opinions. More precisely, we conduct an empirical study that analyzes the influence of three types of linguistic features (n-grams, word embeddings, and polarity lexicons) and their combinations when they are used to feed different supervised machine learning classifiers: Support Vector Machine (SVM), Naive Bayes (NB), and Decision Tree (DT).

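Analisis Sentimen Wacana Pemindahan Ibu Kota Indonesia Menggunakan Algoritma Support Vector Machine (SVM) [Sentiment Analysis of the Discourse on Relocating Indonesia's Capital City Using the Support Vector Machine (SVM) Algorithm]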

Jurnal Teknologi Informasi dan Ilmu Komputer ◽

10.25126/jtiik.0813944 ◽

2021 ◽

Vol 8 (1) ◽

pp. 147

Author(s):

Primandani Arsi ◽

Retno Waluyo

Keyword(s):

Machine Learning ◽

Social Media ◽

Support Vector Machine ◽

Sentiment Analysis ◽

Public Discourse ◽

Capital City ◽

Support Vector ◽

The Public ◽

Machine Learning Methods ◽

Classification Technique

Today, social media is growing fast on the internet. One of the most popular social media platforms is Twitter. Many topics are discussed on Twitter, such as economics, politics, society, culture, and law. One of the hot topics discussed on Twitter is the issue of relocating Indonesia's capital city. However, there is controversy between supporters and opponents, who hold different views. This issue has led to a debate on Twitter that reflects collective concern about the public discourse. Sentiment analysis is the process of automatically extracting, understanding and processing unstructured text data to obtain the sentiment information contained in an opinion sentence. Several machine learning methods are commonly used for sentiment analysis. In this study, the Support Vector Machine (SVM) method is proposed for classifying the sentiment of tweets on the topic of relocating Indonesia's capital city. Tweets are classified into two classes, namely positive and negative. Testing on 1,236 tweets about the relocation of the capital city (404 positive and 832 negative) with SVM yielded accuracy = 96.68%, precision = 95.82%, recall = 94.04% and AUC = 0.979.

Comparison of Machine Learning and Sentiment Analysis in Detection of Suspicious Online Reviewers on Different Type of Data

Sensors ◽

10.3390/s22010155 ◽

2021 ◽

Vol 22 (1) ◽

pp. 155

Author(s):

Kristina Machova ◽

Marian Mach ◽

Matej Vasilko

Keyword(s):

Machine Learning ◽

Social Networks ◽

Support Vector Machine ◽

Sentiment Analysis ◽

Online Discussions ◽

Support Vector ◽

Machine Accuracy ◽

Detection Model ◽

Machine Learning Methods ◽

The Difference

The article focuses on the important problem of detecting suspicious reviewers in online discussions on social networks. We have concentrated on a special type of suspicious author: trolls. We have used machine learning methods to generate detection models that discriminate a troll reviewer from a common reviewer, and also sentiment analysis methods to recognize the sentiment typical of trolls' comments. Sentiment analysis can itself be performed using either a machine learning or a lexicon-based approach. We have used lexicon-based sentiment analysis because it is better able to detect the vocabulary typical of troll authors. We achieved Accuracy = 0.95 and F1 = 0.80 using sentiment analysis. The best results using machine learning methods were achieved by a support vector machine, with Accuracy = 0.986 and F1 = 0.988, using a dataset with the set of all selected attributes. We can conclude that the detection model based on machine learning is more successful than lexicon-based sentiment analysis, but the difference in accuracy is not as large as in the F1 measure.
