A Hybrid Approach for Sarcasm Detection

2019 ◽  
Vol 1 (1) ◽  
pp. 1-9
Author(s):  
S. Luintel ◽  
R.K. Sah ◽  
B.R. Lamichhane

User-generated text has grown rapidly with the rise in internet and social media users, and it includes an enormous amount of sarcastic words, emoji, and sentences. Sarcasm is a nuanced form of communication in which a speaker states the opposite of what is actually meant, in order to insult someone, to show irritation, or to be funny. It is considered one of the most difficult problems in sentiment analysis because of its ambiguous nature. Recognizing sarcasm in text can benefit many sentiment analysis and text summarization applications. To address this problem, several steps were adopted for sarcasm detection. Preprocessing techniques such as HyperText Markup Language (HTML) removal and stop-word removal were applied, and emoji and smileys were converted into their textual equivalents. The most frequent features were selected, and two hybrid approaches, a cascade and a weighted average, each combining random forest, naïve Bayes, and support vector machine, were used for sarcasm detection. A comparison of the two approaches on several criteria showed that the cascade approach outperformed the weighted-average approach. A further comparison of cascade variants by algorithm placement showed that random forest performed best.
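The abstract above does not say how the cascade or the weighted average is wired together, so the following is only a minimal sketch of one plausible reading. It assumes each base model returns a probability that a text is sarcastic, that the weighted average mixes those probabilities with fixed weights, and that the cascade stops at the first model that is confident enough. The names `weighted_average`, `cascade`, `margin`, and the `toy_*` stand-in classifiers are hypothetical and are not taken from the paper.

```python
from typing import Callable, Sequence

# Assumption: a "classifier" is any function mapping a text to P(sarcastic) in [0, 1].
Classifier = Callable[[str], float]


def weighted_average(classifiers: Sequence[Classifier],
                     weights: Sequence[float],
                     text: str) -> float:
    """Combine the classifiers' probabilities using fixed, non-negative weights."""
    if len(classifiers) != len(weights) or sum(weights) <= 0:
        raise ValueError("need one positive-sum weight per classifier")
    total = sum(weights)
    return sum(w * clf(text) for clf, w in zip(classifiers, weights)) / total


def cascade(classifiers: Sequence[Classifier],
            text: str,
            margin: float = 0.25) -> float:
    """Query classifiers in order and return the first confident probability.

    A prediction counts as confident when it lies at least `margin` away
    from 0.5. If no classifier is confident, the last one's output is used.
    """
    if not classifiers:
        raise ValueError("need at least one classifier")
    p = 0.5
    for clf in classifiers:
        p = clf(text)
        if abs(p - 0.5) >= margin:
            return p
    return p


# Hypothetical stand-ins for trained random forest, naive Bayes, and SVM models.
def toy_rf(text: str) -> float:
    return 0.9 if "yeah right" in text.lower() else 0.5


def toy_nb(text: str) -> float:
    return 0.8 if "!" in text else 0.2


def toy_svm(text: str) -> float:
    return 0.7 if "love" in text.lower() else 0.3
```

Under this reading, the order of the models matters only for the cascade, which is consistent with the abstract comparing cascade variants by algorithm placement.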

Author(s):  
Taynan Ferreira ◽  
Francisco Paiva ◽  
Roberto Silva ◽  
Angel Paula ◽  
Anna Costa ◽  
...  

Sentiment analysis (SA) is growing in importance due to the enormous amount of opinionated textual data available today. Most research has investigated different models, feature representations, and hyperparameters in SA classification tasks. However, few studies have evaluated the impact of these choices on regression SA tasks. In this paper, we conduct such an assessment on a financial-domain data set by investigating different feature representations and hyperparameters in two important models: Support Vector Regression (SVR) and Convolutional Neural Networks (CNN). We conclude by presenting the most relevant feature representations and hyperparameters and how they affect outcomes on a regression SA task.


2018 ◽  
Vol 127 ◽  
pp. 511-520 ◽  
Author(s):  
Yassine Al Amrani ◽  
Mohamed Lazaar ◽  
Kamal Eddine El Kadiri

2019 ◽  
Vol 11 (2) ◽  
pp. 144
Author(s):  
Danar Wido Seno ◽  
Arief Wibowo

The growth of written content on social media produces many new words and abbreviations on Twitter, which makes it increasingly difficult for sentiment analysis to achieve high accuracy on Twitter text. In this study, the authors analyzed sentiment toward the pairs of candidates for President and Vice President of Indonesia in the 2019 elections. To obtain higher accuracy and to accommodate the evolving text on Twitter, the authors combined an unsupervised method, Lexicon Based, with a supervised method, Support Vector Machine. The study used Twitter data from October 2018, collected with search keywords containing the names of each pair of candidates, totaling 800 data records. With these 800 records, the best accuracy obtained was 92.5%, using 80% of the data for training and 20% for testing, with per-class precision between 85.7% and 97.2% and per-class recall between 78.2% and 93.5%. With the Lexicon Based method used for labeling, the Support Vector Machine dataset no longer has to be labeled manually, and the lexicon dictionary can be extended as content on Twitter evolves.
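The labeling step described above can be pictured with a small sketch. It assumes a lexicon of positive and negative words and labels each text by the sign of a simple count-based score, then splits the labeled records 80/20 for a supervised classifier. The word lists, the function names `lexicon_label` and `split_train_test`, and the English example terms are illustrative assumptions; the study's actual lexicon and scoring rule are not given in the abstract.

```python
import re
from typing import List, Sequence, Tuple

# Hypothetical miniature lexicon; a real one would be far larger and in Indonesian.
POSITIVE = {"good", "great", "support", "honest"}
NEGATIVE = {"bad", "corrupt", "liar", "fail"}


def lexicon_label(text: str) -> str:
    """Label a text by counting lexicon hits: positive minus negative."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def split_train_test(items: Sequence, train_percent: int = 80) -> Tuple[List, List]:
    """Split items in order into training and testing parts (no shuffling here)."""
    cut = len(items) * train_percent // 100
    return list(items[:cut]), list(items[cut:])


# Automatically labeled records can then be fed to any supervised classifier.
tweets = ["Great and honest leader", "Such a corrupt liar", "The debate is tonight"]
labeled = [(t, lexicon_label(t)) for t in tweets]
```

Extending the lexicon only requires adding words to the two sets, which matches the abstract's point that the dictionary can grow with new Twitter vocabulary.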


2020 ◽  
Vol 11 (2) ◽  
pp. 66-81
Author(s):  
Badia Klouche ◽  
Sidi Mohamed Benslimane ◽  
Sakina Rim Bennabi

Sentiment analysis is an emerging research area in sentiment polarity classification and text mining, particularly given the considerable number of opinions available on social media. Like other operators, the Algerian telephone operator Ooredoo aims, in its new strategy, to win new customers by exploiting their opinions through sentiment analysis. The purpose of this work is to set up a system called “Ooredoo Rayek”, whose objective is to collect, transliterate, translate, and classify the textual data expressed by the Ooredoo operator's customers. This article develops a set of rules for transliterating Algerian Arabizi into Algerian dialect. Furthermore, the authors use Naïve Bayes (NB) and Support Vector Machine (SVM) classifiers to assign polarity tags to Facebook comments from the official Ooredoo pages, written in a multilingual and multi-dialect context. Experimental results show that the system performs well, with 83% accuracy.


Chronic Kidney Disease (CKD) is a worldwide concern that affects roughly 10% of the adult population. For most people, early diagnosis of CKD is often not possible. Modern computer-aided strategies are therefore important for making the conventional CKD diagnosis process more effective and precise. In this project, six machine learning techniques, namely Multilayer Perceptron Neural Network, Support Vector Machine, Naïve Bayes, K-Nearest Neighbor, Decision Tree, and Logistic Regression, were applied to the Chronic Kidney Disease dataset from the UCI Repository; ensemble algorithms such as AdaBoost, Gradient Boosting, Random Forest, Majority Voting, Bagging, and Weighted Average were then used to enhance performance. The models were fine-tuned to find the best hyperparameters for training. Performance was evaluated using Accuracy, Precision, Recall, F1-score, Matthews Correlation Coefficient, and the ROC-AUC curve. The experiment was performed first on the individual classifiers and then on the ensemble classifiers. Ensemble classifiers such as Random Forest and AdaBoost performed better, with 100% Accuracy, Precision, and Recall, compared with the individual classifiers, among which the Decision Tree algorithm obtained 99.16% Accuracy, 98.8% Precision, and 100% Recall.
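For reference, most of the metrics named above follow directly from the four cells of a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name `binary_metrics` and the example counts are illustrative and are not results from the study. ROC-AUC is omitted because it needs ranked scores rather than counts.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute accuracy, precision, recall, F1, and MCC from confusion counts.

    Ratios with a zero denominator are reported as 0.0 by convention here.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Matthews Correlation Coefficient: ranges from -1 to +1, 0 means chance level.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


# Illustrative counts only: 40 true positives, 10 false positives,
# 0 false negatives, 50 true negatives.
example = binary_metrics(tp=40, fp=10, fn=0, tn=50)
```

A classifier can reach 100% recall while precision stays lower, as in the example counts, which is why the abstract reports the metrics separately.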


Author(s):  
Syaifulloh Amien Pandega Perdana ◽  
Teguh Bharata Aji ◽  
Ridi Ferdiana

Customer reviews are opinions about the quality of goods or services as experienced by consumers. They contain information that is useful to both consumers and providers of goods or services. The large volume of customer reviews available on websites calls for a framework that extracts sentiment automatically. A single customer review often covers many aspects, so Aspect Based Sentiment Analysis (ABSA) must be used to determine the polarity of each aspect. One important task in ABSA is Aspect Category Detection. Machine learning methods for Aspect Category Detection have been widely applied in English-language domains, but rarely in Indonesian-language domains. This paper compares the performance of three machine learning algorithms, namely Naïve Bayes (NB), Support Vector Machine (SVM), and Random Forest (RF), on Indonesian-language customer reviews using Term Frequency–Inverse Document Frequency (TF-IDF) as term weighting. The results show that RF outperforms NB and SVM in three different domains, namely restaurant, hotel, and e-commerce, with f1-scores of 84.3%, 85.7%, and 89.3%, respectively.
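As a rough illustration of TF-IDF term weighting, the sketch below uses one common variant: term frequency as the count divided by document length, and inverse document frequency as the natural log of the number of documents over the number of documents containing the term. This variant is an assumption; the paper's exact formula is not stated, and libraries often add smoothing or normalization. The function name `tf_idf` and the toy tokenized reviews are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: weight} mapping per tokenized document.

    weight = (count of term in doc / doc length) * ln(N / document frequency)
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy tokenized Indonesian reviews (illustrative only).
reviews = [
    ["makanan", "enak"],      # "the food is tasty"
    ["kamar", "bersih"],      # "the room is clean"
    ["makanan", "mahal"],     # "the food is expensive"
]
vectors = tf_idf(reviews)
```

Terms that appear in fewer reviews receive larger weights, so "enak" outweighs "makanan" in the first review; the resulting vectors are what a classifier such as NB, SVM, or RF would take as input.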


Multiple sclerosis (MS) is among the world’s most common neurologic disorders. Severity classification of MS is necessary for treatment and medication dosage decisions and for understanding disease progression. To the best of the authors’ knowledge, this is the first study on severity classification of MS. In this study, a rough set (RS) approach is applied to discern three classes of MS severity (mild, moderate, and severe). The performance of the RS approach is then compared with machine learning (ML) classifiers, namely random forest, K-nearest neighbour, and support vector machine. Performance is evaluated on a dataset acquired from the Multiple Sclerosis Outcome Assessments Consortium (MSOAC), Arizona, US. The weighted average accuracy, precision, recall, and specificity for the RS approach are 84.04%, 76.99%, 76.75%, and 83.84%, respectively. Among the ML classifiers, random forest performs best, with weighted average accuracy, precision, recall, and specificity of 62.19%, 52.65%, 56.84%, and 59.87%, respectively. The RS approach is found to be far superior to the ML classifiers and may be used for MS severity classification. This study may help clinicians assess the severity of MS in patients and make medication and dosage decisions.


2022 ◽  
Vol 13 (1) ◽  
pp. 0-0

This research presents an approach to the feature selection problem for sentiment classification using an ensemble-based classifier. It uses a hybrid feature selection method that combines the minimum redundancy and maximum relevance (mRMR) technique with the Forest Optimization Algorithm (FOA), referred to as mRMR-FOA. Before being applied to sentiment analysis, FOA was used as a feature selection technique on 10 different classification datasets publicly available in the UCI machine learning repository. Classifiers such as k-Nearest Neighbor (k-NN), Support Vector Machine (SVM), and Naïve Bayes (NB) were combined in the ensemble-based algorithm on the available datasets. mRMR-FOA was applied to Blitzer’s dataset (customer reviews from an electronic products survey) to select the significant features. Sentiment classification was observed to improve by 12 to 18%. The results are further enhanced by the ensemble of k-NN, NB, and SVM, which reaches an accuracy of 88.47% on the sentiment classification task.


Author(s):  
Hendri Murfi ◽  
Furida Lusi Siagian ◽  
Yudi Satria

Purpose The purpose of this paper is to analyze topics as alternative features for sentiment analysis in Indonesian tweets. Design/methodology/approach Given Indonesian tweets, the sentiment analysis process starts by extracting features from the tweets. The features are words or topics. The authors use non-negative matrix factorization to extract the topics and apply a support vector machine to classify the tweets into their sentiment classes. Findings The authors analyze the accuracy using two-class and three-class sentiment analysis data sets. Both data sets concern sentiment toward candidates in the Indonesian presidential election. The experiments show that the standard word features give better accuracy than the topic features for two-class sentiment analysis. Moreover, the topic features can slightly improve the accuracy of the standard word features. The topic features can also improve the accuracy of the standard word features for three-class sentiment analysis. Originality/value The standard textual data representation for sentiment analysis using machine learning is bag of words and its extensions, mainly created by natural language processing. This paper applies topics as novel features for machine learning-based sentiment analysis in Indonesian tweets.

