Conceptual Sentiment Analysis Model

The bag-of-words approach is widely used for sentiment analysis. It maps the terms in reviews to term-document vectors and thereby discards the syntactic structure of the sentences. Associations among terms, and hence the semantic structure of sentences, are not preserved either. This research work focuses on classifying sentiment while taking the syntactic and semantic structure of review sentences into account. To improve accuracy, sentiment classifiers based on relative frequency, average frequency, and term frequency-inverse document frequency were proposed. Preprocessing techniques were extended to handle terms containing apostrophes. To focus on opinionated content, subjectivity extraction was performed at the phrase level. Experiments were performed on the Pang & Lee, Kaggle, and UCI datasets. Classifiers were also evaluated on the UCI Product and Restaurant datasets. Sentiment classification accuracy improved from 67.9% for a comparable term weighting technique, DeltaTFIDF, to as much as 77.2% for the proposed classifiers. Adding the proposed concept-based approach, subjectivity extraction, and the preprocessing extensions raised the accuracy to 93.9%.
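The abstract does not say how terms with apostrophes were handled. One common approach, shown here only as a minimal sketch, is to expand contractions before tokenization so that negation words survive as separate tokens. The `CONTRACTIONS` table and the function name are hypothetical and deliberately tiny.

```python
import re

# Hypothetical, deliberately small lookup table (an assumption, not the
# authors' list); a practical system would need many more entries.
CONTRACTIONS = {
    "can't": "can not",
    "won't": "will not",
    "don't": "do not",
    "isn't": "is not",
    "it's": "it is",
    "i'm": "i am",
}

# One alternation over all known contractions, anchored on word boundaries.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in CONTRACTIONS) + r")\b"
)


def expand_contractions(text: str) -> str:
    """Lowercase, normalise curly apostrophes, and expand known contractions."""
    text = text.lower().replace("\u2019", "'")
    return _PATTERN.sub(lambda match: CONTRACTIONS[match.group(1)], text)


print(expand_contractions("It\u2019s good but I can't recommend it"))
```

Expanding "can't" to "can not" keeps the negation visible to any later term-weighting step, which a plain punctuation-stripping tokenizer would otherwise turn into "cant" or "can" plus "t".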


An Improved Cross-Domain Sentiment Analysis Based on a Semi-Supervised Convolutional Neural Network

10.4018/978-1-7998-8413-2.ch007 ◽

2022 ◽

pp. 155-170

Author(s):

Lap-Kei Lee ◽

Kwok Tai Chui ◽

Jingjing Wang ◽

Yin-Chun Fung ◽

Zhanhui Tan

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Sentiment Analysis ◽

Language Processing ◽

Research Work ◽

Latent Semantic Indexing ◽

Future Research ◽

Training Time ◽

Cross Domain ◽

Document Frequency

Our growing dependence on the Internet in daily life creates opportunities to discover valuable, subjective information using advanced techniques such as natural language processing and artificial intelligence. This chapter focuses on a convolutional neural network for three-class (positive, neutral, and negative) cross-domain sentiment analysis. The model is enhanced in two ways. First, a similarity label method helps manage the relationship between the source and target domains to generate more labelled data. Second, term frequency-inverse document frequency (TF-IDF) and latent semantic indexing (LSI) are employed to compute the similarity between source and target domains. Performance is evaluated on three datasets: beauty reviews, toy reviews, and phone reviews. The proposed method improves accuracy by 4.3-7.6% and reduces training time by 50%. The limitations of the work are discussed and serve as the rationale for future research directions.


Sentiment Analysis with Machine Learning Methods on Social Media

ADCAIJ ADVANCES IN DISTRIBUTED COMPUTING AND ARTIFICIAL INTELLIGENCE JOURNAL ◽

10.14201/adcaij202093515 ◽

2020 ◽

Vol 9 (3) ◽

pp. 5-15

Author(s):

Muhammet Sinan Basarslan ◽

Fatih Kayaalp

Keyword(s):

Social Media ◽

Sentiment Analysis ◽

Support Vector ◽

Inverse Document Frequency ◽

Document Frequency ◽

The Social ◽

Vector Machines ◽

Artificial Neural ◽

Use Of The Internet ◽

Python Programming

Social media has become an important part of everyday life due to the widespread use of the Internet. Among social media services, Twitter is one of the most widely used around the world. People share their opinions by writing tweets about numerous subjects, such as politics, sports, and the economy. Millions of tweets per day create a huge dataset, which has drawn the attention of data scientists to these data for sentiment analysis. Sentiment analysis aims to identify users' social media posts about a specific topic and categorize them as positive, negative, or neutral. This study investigates the effect of different types of text representation on the performance of sentiment analysis. Two datasets were used in the experiments. The first consists of user reviews of movies from IMDB, labeled by Kotzias; the second consists of English-language tweets about health topics from 2019, collected using the Twitter API. The Python programming language was used both to implement the classification models, using the Naïve Bayes (NB), Support Vector Machines (SVM), and Artificial Neural Networks (ANN) algorithms, and to categorize the sentiments as positive, negative, or neutral. Features were extracted from the datasets using the Term Frequency-Inverse Document Frequency (TF-IDF) and Word2Vec (W2V) modeling techniques. The accuracy of the classification algorithms was then compared. According to the experimental results, the Artificial Neural Network achieved the best accuracy on both datasets.
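To make the TF-IDF representation mentioned above concrete, here is a minimal standard-library sketch. It assumes the plain variant with tf = count / document length and idf = log(N / df); the study does not state which variant it used, and library implementations typically add smoothing and normalization. The toy documents are invented for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenised document.

    Assumed variant: tf = count / len(doc), idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Invented toy corpus, already tokenised.
docs = [
    "great movie great cast".split(),
    "boring movie".split(),
    "great soundtrack".split(),
]
weights = tfidf(docs)
for vector in weights:
    print({term: round(w, 3) for term, w in vector.items()})
```

In the first document, "cast" receives a higher weight than "movie" even though both occur once, because "cast" appears in only one of the three documents.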


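Aspect Category Classification dengan Pendekatan Machine Learning Menggunakan Dataset Bahasa Indonesia (Aspect Category Classification with a Machine Learning Approach Using an Indonesian-Language Dataset)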

Jurnal Nasional Teknik Elektro dan Teknologi Informasi (JNTETI) ◽

10.22146/jnteti.v10i3.1819 ◽

2021 ◽

Vol 10 (3) ◽

pp. 229-235

Author(s):

Syaifulloh Amien Pandega Perdana ◽

Teguh Bharata Aji ◽

Ridi Ferdiana

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Sentiment Analysis ◽

Support Vector ◽

Term Weighting ◽

Inverse Document Frequency ◽

Term Frequency ◽

Document Frequency ◽

Bahasa Indonesia

Customer reviews are opinions about the quality of goods or services as experienced by consumers. Customer reviews contain information that is useful to both consumers and providers of goods or services. The availability of large numbers of customer reviews on websites calls for a framework that extracts sentiment automatically. A single customer review often covers many aspects, so Aspect Based Sentiment Analysis (ABSA) must be used to determine the polarity of each aspect. One important task in ABSA is Aspect Category Detection. Machine learning methods for Aspect Category Detection have been widely applied in English-language domains, but only rarely in Indonesian-language domains. This paper compares the performance of three machine learning algorithms, namely Naïve Bayes (NB), Support Vector Machine (SVM), and Random Forest (RF), on Indonesian-language customer reviews using Term Frequency-Inverse Document Frequency (TF-IDF) as term weighting. The results show that RF outperforms NB and SVM across three different domains, namely restaurant, hotel, and e-commerce, with f1-scores of 84.3%, 85.7%, and 89.3%, respectively.


Transfer learning for Twitter sentiment analysis: Choosing an effective source dataset

10.5753/kdmile.2020.11972 ◽

2020 ◽

Author(s):

Eliseu Guimarães ◽

Jonnathan Carvalho ◽

Aline Paes ◽

Alexandre Plastino

Keyword(s):

Sentiment Analysis ◽

Transfer Learning ◽

Distance Metrics ◽

Learning Approaches ◽

Target Domain ◽

Social Media Data ◽

Inverse Document Frequency ◽

Source Domain ◽

Document Frequency ◽

Media Data

Sentiment analysis on social media data can be a challenging task, among other reasons, because labeled data for training is not always available. Transfer learning approaches address this problem by leveraging a labeled source domain to obtain a model for a target domain that is different but related to the source domain. However, the question that arises is how to choose proper source data for training the target classifier, which can be made considering the similarity between source and target data using distance metrics. This article investigates the relation between these distance metrics and the classifiers’ performance. For this purpose, we propose to evaluate four metrics combined with distinct dataset representations. Computational experiments, conducted in the Twitter sentiment analysis scenario, showed that the cosine similarity metric combined with bag-of-words normalized with term frequency-inverse document frequency presented the best results in terms of predictive power, outperforming even the classifiers trained with the target dataset in many cases.
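The source-selection idea can be sketched as follows: represent each dataset as a TF-IDF-weighted bag of words and rank candidate sources by cosine similarity to the target. This is only an illustration under stated assumptions. It treats each dataset as a single document, uses a smoothed idf so that widely shared terms are not zeroed out, and the dataset names and contents are invented; the article's exact representation may differ.

```python
import math
from collections import Counter


def tfidf_vectors(datasets):
    """Weight each dataset's bag of words, treating a dataset as one document.

    Assumed weighting: raw count times the smoothed idf
    log((1 + n) / (1 + df)) + 1.
    """
    n = len(datasets)
    df = Counter(t for tokens in datasets.values() for t in set(tokens))
    return {
        name: {
            term: count * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in Counter(tokens).items()
        }
        for name, tokens in datasets.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Invented toy datasets, each flattened into a single token list.
datasets = {
    "target": "love this phone great battery".split(),
    "source_a": "great phone love the screen".split(),
    "source_b": "the match ended in a draw".split(),
}
vecs = tfidf_vectors(datasets)
ranked = sorted(
    ["source_a", "source_b"],
    key=lambda name: cosine(vecs["target"], vecs[name]),
    reverse=True,
)
print(ranked)
```

Here `source_a` ranks first because it shares vocabulary with the target, while `source_b` shares none and scores zero.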


A Method of Product Selection Based on Online Reviews

Mobile Information Systems ◽

10.1155/2021/9656315 ◽

2021 ◽

Vol 2021 ◽

pp. 1-16

Author(s):

Xia Liang ◽

Jie Guo ◽

Yan Sun ◽

Xiaoxiao Liu

Keyword(s):

Sentiment Analysis ◽

Rapid Development ◽

Online Reviews ◽

Product Attributes ◽

Product Selection ◽

Inverse Document Frequency ◽

Roulette Wheel Selection ◽

Document Frequency ◽

Criteria Importance

With the rapid development of information technology and the market economy, global e-commerce platforms have grown rapidly. Online reviews are now widely available on e-commerce platforms and express customers' experience of products. When ranking alternative products based on online reviews, making full use of the information in those reviews to represent their sentiment analysis results is an important prerequisite for decision analysis. To this end, we propose a method for measuring the time utility and support utility of online reviews. We then propose a method for representing the sentiment analysis results of online reviews in the form of a linguistic distribution. In addition, because the attributes and their weights are unknown, we propose a method for extracting product attributes from online reviews using the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm, and the objective weights of the attributes are determined through the Criteria Importance through Intercriteria Correlation (CRITIC) method. To highlight the differences between the alternatives, the roulette wheel selection algorithm is first used to randomly select product attributes. The alternative products can then be ranked by the extended Multi-Attributive Border Approximation area Comparison (MABAC) method with mixed information. Finally, we illustrate the applicability of the proposed method through a case study of selecting a 5G mobile phone and a simulation experiment.
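Roulette wheel selection picks an item with probability proportional to its weight. The sketch below shows the generic algorithm only; how the paper derives the selection weights is not stated in the abstract, so the attribute names and weights here are hypothetical.

```python
import random


def roulette_wheel_select(weights, rng):
    """Pick one key with probability proportional to its non-negative weight."""
    total = sum(w for w in weights.values() if w > 0)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    threshold = rng.random() * total
    cumulative = 0.0
    last = None
    for item, weight in weights.items():
        if weight <= 0:
            continue  # zero-weight items occupy no slice of the wheel
        cumulative += weight
        last = item
        if threshold < cumulative:
            return item
    return last  # guards against floating-point round-off at the upper edge


# Hypothetical attribute weights, for illustration only.
attribute_weights = {"battery": 0.5, "camera": 0.3, "price": 0.2, "colour": 0.0}
rng = random.Random(0)  # fixed seed so repeated runs give the same draws
picks = [roulette_wheel_select(attribute_weights, rng) for _ in range(1000)]
print({name: picks.count(name) for name in attribute_weights})
```

Over many draws the selection counts approach the weight proportions, and an attribute with zero weight is never chosen.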


Improve the Accuracy of Support Vector Machine Using Chi Square Statistic and Term Frequency Inverse Document Frequency on Movie Review Sentiment Analysis

Scientific Journal of Informatics ◽

10.15294/sji.v6i1.14244 ◽

2019 ◽

Vol 6 (1) ◽

pp. 138-149

Author(s):

Ukhti Ikhsani Larasati ◽

Much Aziz Muslim ◽

Riza Arifudin ◽

Alamsyah Alamsyah

Keyword(s):

Support Vector Machine ◽

Feature Selection ◽

Text Mining ◽

Sentiment Analysis ◽

Feature Weighting ◽

Support Vector ◽

Chi Square ◽

Inverse Document Frequency ◽

Term Frequency ◽

Document Frequency

Text mining techniques can be used to process data. Processing large volumes of text requires a machine to explore opinions, both positive and negative. Sentiment analysis applies text mining methods to determine whether the content of a text dataset is positive or negative. The support vector machine is one of the classification algorithms that can be used for sentiment analysis. However, the support vector machine works less well on large datasets. In addition, text mining is constrained by the number of attributes used: a large number of attributes reduces classifier performance and results in low accuracy. The purpose of this research is to increase support vector machine accuracy by applying feature selection and feature weighting. Feature selection removes a large number of irrelevant attributes. In this study, the top K = 500 features are selected. The selected attributes are then weighted to calculate the weight of each one. The feature selection method is the chi-square statistic, and feature weighting uses Term Frequency Inverse Document Frequency (TFIDF). In experiments run in Matlab R2017b with 10-fold cross-validation, combining the support vector machine with the chi-square statistic and TFIDF increased accuracy by 11.5 percentage points: the support vector machine alone reached 68.7% accuracy, while the support vector machine with the chi-square statistic and TFIDF reached 80.2%.
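Chi-square feature selection can be sketched as scoring each term by the chi-square statistic of its 2x2 term/class contingency table and keeping the top K terms. The code below is a minimal two-class illustration in Python (the study itself used Matlab), with an invented toy corpus and K = 2 instead of the study's K = 500.

```python
from collections import Counter


def chi_square_scores(docs, labels, positive="pos"):
    """Score each term by the chi-square statistic of its 2x2 term/class table."""
    n = len(docs)
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = n - n_pos
    in_pos, in_neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        # Count document presence, not raw term frequency.
        (in_pos if y == positive else in_neg).update(set(tokens))
    scores = {}
    for term in set(in_pos) | set(in_neg):
        a = in_pos[term]   # positive documents containing the term
        b = in_neg[term]   # negative documents containing the term
        c = n_pos - a      # positive documents without the term
        d = n_neg - b      # negative documents without the term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def top_k_terms(scores, k):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    return sorted(scores, key=lambda term: (-scores[term], term))[:k]


# Invented toy corpus, already tokenised.
docs = [
    "good fun film".split(),
    "good plot".split(),
    "bad film".split(),
    "bad dull plot".split(),
]
labels = ["pos", "pos", "neg", "neg"]
scores = chi_square_scores(docs, labels)
print(top_k_terms(scores, 2))
```

Terms such as "film" and "plot", which occur equally often in both classes, score zero and are the first to be dropped; the surviving terms would then be weighted with TF-IDF before training the classifier.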


Sentiment analysis of customer reviews in zomato bangalore restaurants using random forest classifier

Abstract Proceedings International Scholars Conference ◽

10.35974/isc.v7i1.1003 ◽

2019 ◽

Vol 7 (1) ◽

pp. 1831-1840

Author(s):

Bern Jonathan ◽

Jay Idoan Sihotang ◽

Stanley Martin

Keyword(s):

Natural Language Processing ◽

Random Forest ◽

Natural Language ◽

Sentiment Analysis ◽

Language Processing ◽

Natural Languages ◽

Inverse Document Frequency ◽

Customer Reviews ◽

Document Frequency ◽

Split Test

Introduction: Natural Language Processing is a part of Artificial Intelligence and Machine Learning concerned with the interactions between computers and human (natural) languages. Sentiment analysis is a part of Natural Language Processing that is often used to analyze the words people write in order to find positive, negative, or neutral sentiment. Sentiment analysis is useful for knowing whether users like something or not. Zomato is an application for rating restaurants. Each rating comes with a review of the restaurant, which can be used for sentiment analysis. On this basis, the authors set out to predict the sentiment of the reviews. Method: The reviews are preprocessed by lowercasing all words, tokenizing, removing numbers, punctuation, and stop words, and lemmatizing. The words are then converted to vectors with term frequency-inverse document frequency (TF-IDF). The data processed comprise 150,000 reviews. Reviews rated above 3 are labeled positive, reviews rated below 3 are labeled negative, and reviews rated exactly 3 are labeled neutral. The authors use a split test with 80% of the data for training and 20% for testing. The metrics used to evaluate the random forest classifier are precision, recall, and accuracy. The accuracy obtained in this research is 92%. Result: The precision for positive, negative, and neutral sentiment is 92%, 93%, and 96%, respectively. The recall for positive, negative, and neutral sentiment is 99%, 89%, and 73%, respectively. Average precision and recall are 93% and 87%. The 10 words that most affect the results are: “bad”, “good”, “average”, “best”, “place”, “love”, “order”, “food”, “try”, and “nice”.
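Per-class precision and recall, as reported above, can be computed from the true and predicted labels as follows. This is a generic sketch with invented labels, not the study's data or code.

```python
def precision_recall(y_true, y_pred, label):
    """Return (precision, recall) for one class label."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    predicted = sum(1 for p in y_pred if p == label)  # everything predicted as label
    actual = sum(1 for t in y_true if t == label)     # everything truly of that label
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Invented labels for illustration.
y_true = ["pos", "pos", "pos", "neg", "neg", "neu"]
y_pred = ["pos", "pos", "neg", "neg", "neg", "pos"]

for label in ("pos", "neg", "neu"):
    precision, recall = precision_recall(y_true, y_pred, label)
    print(label, round(precision, 2), round(recall, 2))
```

A class can combine high precision with low recall, as the neutral class does in the reported results: the classifier is usually right when it predicts that class but misses many of its true members.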


Sentiment Analysis Based on Deep Learning: A Comparative Study

Electronics ◽

10.3390/electronics9030483 ◽

2020 ◽

Vol 9 (3) ◽

pp. 483 ◽

Cited By ~ 12

Author(s):

Nhan Cach Dang ◽

María N. Moreno-García ◽

Fernando De la Prieta

Keyword(s):

Deep Learning ◽

Comparative Study ◽

Sentiment Analysis ◽

Language Processing ◽

Learning Models ◽

Inverse Document Frequency ◽

Document Frequency ◽

Wide Range ◽

Powerful Means ◽

Promising Solution

The study of public opinion can provide us with valuable information. The analysis of sentiment on social networks, such as Twitter or Facebook, has become a powerful means of learning about the users’ opinions and has a wide range of applications. However, the efficiency and accuracy of sentiment analysis is being hindered by the challenges encountered in natural language processing (NLP). In recent years, it has been demonstrated that deep learning models are a promising solution to the challenges of NLP. This paper reviews the latest studies that have employed deep learning to solve sentiment analysis problems, such as sentiment polarity. Models using term frequency-inverse document frequency (TF-IDF) and word embedding have been applied to a series of datasets. Finally, a comparative study has been conducted on the experimental results obtained for the different models and input features.


A comparative study of sentiment analysis using SVM and SentiWordNet

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v13.i3.pp902-909 ◽

2019 ◽

Vol 13 (3) ◽

pp. 902 ◽

Cited By ~ 7

Author(s):

Mohammad Fikri ◽

Riyanarto Sarno

Keyword(s):

Sentiment Analysis ◽

Extraction Method ◽

Support Vector ◽

The Internet ◽

Imbalanced Dataset ◽

Rule Based ◽

Inverse Document Frequency ◽

Feature Extraction Method ◽

Document Frequency ◽

Svm Algorithm

Sentiment analysis has grown rapidly alongside the number of internet-based services emerging in Indonesia. In this research, sentiment analysis is performed with a rule-based method supported by SentiWordNet and with the Support Vector Machine (SVM) algorithm using Term Frequency-Inverse Document Frequency (TF-IDF) as the feature extraction method. Because the number of sentences in the positive, negative, and neutral classes is imbalanced, an oversampling method is applied. On the imbalanced dataset, the rule-based SentiWordNet method and the SVM algorithm achieve accuracies of 56% and 76%, respectively. On the balanced dataset, they achieve accuracies of 52% and 89%, respectively.
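The abstract does not name the oversampling method. The simplest variant, random oversampling with replacement, is sketched below as an assumption: minority classes are padded with randomly repeated samples until every class matches the largest one. The sample identifiers are invented.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Invented sample identifiers and labels.
samples = ["s1", "s2", "s3", "s4", "s5", "s6"]
labels = ["pos", "pos", "pos", "neg", "neg", "neu"]
balanced_samples, balanced_labels = random_oversample(samples, labels)
print(Counter(balanced_labels))
```

Oversampling is normally applied to the training split only, so that duplicated samples do not leak into the evaluation data.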


Disquisition of Sentiment Inquiry with Hashing and Counting Vectorizer using Machine Learning Classification

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.a4220.119119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 737-743

Keyword(s):

Sentiment Analysis ◽

Data Cleaning ◽

Bayes Classifier ◽

Frequency Method ◽

Inverse Document Frequency ◽

Machine Learning Classification ◽

Review Analysis ◽

Document Frequency ◽

Resampled Dataset ◽

Logistic Regression Classifier

With the rapid growth in technology, analyzing customer feedback and reviews has become a major challenge for companies and industries. A company's profit depends mainly on customer satisfaction. The customer's view can be analyzed only through feedback. Review analysis can be used to predict a company's current and future sales. Against this background, the paper performs sentiment analysis of movie reviews. The type of comment given by the customer is predicted and categorized into classes. The Sentiment Analysis on Movie Reviews dataset from the Kaggle dataset repository is used for the implementation. The categorization of sentiment classes is carried out in five steps. First, the target count for each sentiment is shown, and resampling is done to equalize the target sentiment counts. Second, the sentiment feature words extracted for each target are displayed, and data cleaning is done with the Term Frequency Inverse Document Frequency method. Third, the resampled dataset is fitted with various classifiers: Multinomial Naive Bayes, Logistic Regression, K-Nearest Neighbors, Bernoulli Naive Bayes, Complement Naive Bayes, Nearest Centroid, Passive Aggressive, SGD, Ridge, and Perceptron. Fourth, feature extraction is done with a Hashing Vectorizer and a Counting Vectorizer, and the vocabulary features of the dataset are displayed. Fifth, classifier performance is analyzed with metrics such as Accuracy, Recall, FScore, and Precision. The implementation is carried out in Python in Spyder (Anaconda Navigator) using the IPython console. Experimental results show that sentiment prediction and classification with the Ridge classifier is effective, with a Precision of 0.89, Recall of 0.88, FScore of 0.87, and Accuracy of 89%.
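The difference between the two feature extractors named above can be illustrated with a small standard-library sketch. A counting vectorizer builds an explicit vocabulary and counts terms per column; a hashing vectorizer skips the vocabulary and maps each term to a column by hashing. The MD5-based hashing and the function names here are assumptions chosen for determinism; this is not the implementation used in the paper or in any particular library.

```python
import hashlib
from collections import Counter


def count_vectorize(docs):
    """Build a sorted vocabulary and one term-count row per document."""
    vocab = {term: i for i, term in enumerate(sorted({t for d in docs for t in d}))}
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for term, count in Counter(doc).items():
            row[vocab[term]] = count
        rows.append(row)
    return vocab, rows


def hash_vectorize(docs, n_features=8):
    """Map each term to a column by hashing; no vocabulary is stored."""
    rows = []
    for doc in docs:
        row = [0] * n_features
        for term in doc:
            digest = hashlib.md5(term.encode("utf-8")).digest()
            # Different terms may collide in the same column.
            row[int.from_bytes(digest[:4], "big") % n_features] += 1
        rows.append(row)
    return rows


# Invented toy documents, already tokenised.
docs = ["good movie good".split(), "bad movie".split()]
vocab, count_rows = count_vectorize(docs)
hash_rows = hash_vectorize(docs)
print(vocab, count_rows)
print(hash_rows)
```

The hashed representation has a fixed width regardless of vocabulary size, at the cost of possible collisions and of losing the mapping from columns back to words, which is why vocabulary features can only be displayed for the counting variant.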
