An Efficient Framework for Vietnamese Sentiment Classification

Knowledge Innovation Through Intelligent Software Methodologies, Tools and Techniques - Frontiers in Artificial Intelligence and Applications ◽

10.3233/faia200579 ◽

2020 ◽

Author(s):

Cuong V. Nguyen ◽

Khiem H. Le ◽

Anh M. Tran ◽

Binh T. Nguyen

Keyword(s):

Product Quality ◽

Sentiment Analysis ◽

New Products ◽

Classification Problem ◽

Research Community ◽

Sentiment Classification ◽

Experimental Results ◽

Data Sets ◽

Online Retailers

With the booming development of e-commerce platforms in many countries, there is a massive amount of customer review data on different products and services. Understanding customers’ feedback on both current and new products can help online retailers improve product quality, meet customers’ expectations, and increase revenue. In this paper, we investigate the Vietnamese sentiment classification problem on two datasets containing Vietnamese customers’ reviews. We propose eight different approaches, including Bi-LSTM, Bi-LSTM + Attention, Bi-GRU, Bi-GRU + Attention, Recurrent CNN, Residual CNN, Transformer, and PhoBERT, and conduct all experiments on two datasets: AIVIVN 2019 and our own dataset collected from multiple Vietnamese e-commerce websites. The experimental results show that all our proposed methods outperform the winning solution of the competition “AIVIVN 2019 Sentiment Champion” by a significant margin. In particular, Recurrent CNN has the best performance among the algorithms in terms of both AUC (98.48%) and F1-score (93.42%) on the competition dataset, and it also surpasses the other techniques on our collected dataset. Finally, we aim to publish our code and these two datasets later to contribute to the research community in the field of sentiment analysis.

TEXT SENTIMENT ANALYSIS BASED ON CNNS AND SVM

International Journal of Research -GRANTHAALAYAH ◽

10.29121/granthaalayah.v7.i6.2019.761 ◽

2019 ◽

Vol 7 (6) ◽

pp. 77-83 ◽

Cited By ~ 1

Author(s):

Dr. C. Arunabala ◽

P. Jwalitha ◽

Soniya Nuthalapati

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Sentiment Analysis ◽

Expressive Power ◽

Sentiment Classification ◽

Experimental Results ◽

Analysis Method ◽

Mapping Functions ◽

Generalization Ability ◽

Text Sentiment Analysis

The traditional text sentiment analysis method is mainly based on machine learning. However, its dependence on emotion dictionary construction and on manually designed and extracted features limits its generalization ability. In contrast, deep models have greater expressive power and can better learn complex mapping functions from data to affective semantics. In this paper, a text sentiment analysis method that combines a Convolutional Neural Network (CNN) model with an SVM is proposed. The experimental results show that the proposed method effectively improves the accuracy of text sentiment classification compared with a traditional CNN, confirming the effectiveness of sentiment analysis based on CNNs and SVM.

On the Feature Selection and Classification Based on Information Gain for Document Sentiment Analysis

Applied Computational Intelligence and Soft Computing ◽

10.1155/2018/1407817 ◽

2018 ◽

Vol 2018 ◽

pp. 1-5 ◽

Cited By ~ 14

Author(s):

Asriyanti Indah Pratiwi ◽

Adiwijaya

Keyword(s):

Feature Selection ◽

Sentiment Analysis ◽

Classification Scheme ◽

Information Gain ◽

Sentiment Classification ◽

Experimental Results ◽

Enormous Number

Sentiment analysis of movie reviews is a need of today's lifestyle. Unfortunately, an enormous number of features makes sentiment analysis slow and less sensitive. Finding the optimum feature selection and classification is still a challenge. To handle an enormous number of features and provide better sentiment classification, information gain-based feature selection and classification are proposed. The proposed method removes more than 90% of unnecessary features, while the proposed classification scheme achieves 96% accuracy in sentiment classification. From the experimental results, it can be concluded that the combination of the proposed feature selection and classification achieves the best performance so far.
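
As a rough illustration of feature selection by information gain, the sketch below scores each term by how much knowing its presence reduces label entropy. It is a minimal example under stated assumptions, not the authors' implementation: the toy corpus, the set-of-tokens document representation, and the function names `entropy` and `information_gain` are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """Reduction in label entropy from knowing whether `term` occurs in a document."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    # Weighted entropy of the two partitions; empty partitions contribute nothing.
    conditional = sum(
        len(part) / n * entropy(part) for part in (with_term, without_term) if part
    )
    return entropy(labels) - conditional


# Hypothetical toy corpus: each review is represented as a set of tokens.
docs = [{"great", "movie"}, {"great", "plot"}, {"boring", "movie"}, {"boring", "plot"}]
labels = ["pos", "pos", "neg", "neg"]

# Score every term in the vocabulary and rank from most to least informative.
scores = {t: information_gain(docs, labels, t) for t in set().union(*docs)}
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy corpus, "great" and "boring" separate the classes perfectly and score 1 bit, while "movie" and "plot" score 0. Keeping only the top-ranked terms is one simple way to discard most of the vocabulary before classification.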

A Statistical Parsing Framework for Sentiment Classification

Computational Linguistics ◽

10.1162/coli_a_00221 ◽

2015 ◽

Vol 41 (2) ◽

pp. 293-336 ◽

Cited By ~ 22

Author(s):

Li Dong ◽

Furu Wei ◽

Shujie Liu ◽

Ming Zhou ◽

Ke Xu

Keyword(s):

Sentiment Analysis ◽

Sentiment Classification ◽

Training Data ◽

Data Sets ◽

Syntactic Parsing ◽

Benchmark Data ◽

Sentence Level ◽

Statistical Parsing ◽

Parse Trees ◽

Context Free

We present a statistical parsing framework for sentence-level sentiment classification in this article. Unlike previous works that use syntactic parsing results for sentiment analysis, we develop a statistical parser to directly analyze the sentiment structure of a sentence. We show that complicated phenomena in sentiment analysis (e.g., negation, intensification, and contrast) can be handled the same way as simple and straightforward sentiment expressions in a unified and probabilistic way. We formulate the sentiment grammar upon Context-Free Grammars (CFGs), and provide a formal description of the sentiment parsing framework. We develop the parsing model to obtain possible sentiment parse trees for a sentence, from which the polarity model is proposed to derive the sentiment strength and polarity, and the ranking model is dedicated to selecting the best sentiment tree. We train the parser directly from examples of sentences annotated only with sentiment polarity labels but without any syntactic annotations or polarity annotations of constituents within sentences. Therefore we can obtain training data easily. In particular, we train a sentiment parser, s.parser, from a large amount of review sentences with users' ratings as rough sentiment polarity labels. Extensive experiments on existing benchmark data sets show significant improvements over baseline sentiment classification approaches.

Fusing Multi-Modal Char-Level Embeddings for Chinese Sentence Sentiment Analysis

10.21203/rs.3.rs-786024/v1 ◽

2021 ◽

Author(s):

Dong Liu ◽

Caihuan Zhang ◽

Yongxin Zhang ◽

Youzhong Ma

Keyword(s):

Sentiment Analysis ◽

Chinese Character ◽

Sentiment Classification ◽

Experimental Results ◽

Visual Features ◽

Chinese Characters ◽

Writing Systems ◽

Phonetic Information ◽

Context Features

Chinese characters are a logographic writing system. There is some association between the semantics of Chinese characters and their structure, shape, and phonetic information. In this work, multi-modal Chinese character-level embeddings are extracted, including visual features, pre-trained embeddings, shapes, and phonetic information. These embedding sequences of Chinese sentences are first fed into individual Bi-LSTM networks to capture context features, and then fused into one vector for sentiment analysis. Experimental results validate that multi-modal character-level embeddings can contribute to Chinese sentence sentiment classification. The effect of each modality on the result is analyzed with a modal feature ablation test.

Clustering helps to improve price prediction in online booking systems

International Journal of Web Information Systems ◽

10.1108/ijwis-11-2020-0065 ◽

2021 ◽

Vol 17 (1) ◽

pp. 45-53

Author(s):

Le Hong Trang ◽

Tran Duong Huy ◽

Anh Ngoc Le

Keyword(s):

Machine Learning ◽

Empirical Study ◽

Sentiment Analysis ◽

Design Methodology ◽

Prediction Performance ◽

Experimental Results ◽

Data Sets ◽

Classification Models ◽

Content Type ◽

Price Prediction

Purpose: Pricing on online booking systems is a difficult task for the host. The systems usually set prices that are lower than the general premises and quality warrant, which benefits only the system by easily attracting customers to use the service. The price of a new accommodation is often set based on location, the number of beds, the type of house and so on. The main problem is to predict the most reasonable price for the host. This paper aims to study the use of machine learning and sentiment analysis for predicting prices on online booking systems. Design/methodology/approach: In particular, an empirical study is first performed on some well-known classification models for the problem. The authors then propose to apply k-means, a clustering technique, together with Gradient Boost and XGBoost models to improve the prediction performance. Experiments are conducted on real Airbnb data sets collected in London City. Findings: Experimental results are given and compared to show that the authors’ method outperforms an up-to-date method. Originality/value: The authors use k-means and sampling together with Gradient Boost and XGBoost models to improve the prediction performance.

A Twin-Candidate Model for Learning-Based Anaphora Resolution

Computational Linguistics ◽

10.1162/coli.2008.07-004-r2-06-57 ◽

2008 ◽

Vol 34 (3) ◽

pp. 327-356 ◽

Cited By ~ 13

Author(s):

Xiaofeng Yang ◽

Jian Su ◽

Chew Lim Tan

Keyword(s):

Main Idea ◽

Classification Problem ◽

Learning Model ◽

Experimental Results ◽

Data Sets ◽

Coreference Resolution ◽

Anaphora Resolution ◽

Candidate Model ◽

Content Extraction ◽

Pronominal Anaphora

The traditional single-candidate learning model for anaphora resolution considers the antecedent candidates of an anaphor in isolation, and thus cannot effectively capture the preference relationships between competing candidates for its learning and resolution. To deal with this problem, we propose a twin-candidate model for anaphora resolution. The main idea behind the model is to recast anaphora resolution as a preference classification problem. Specifically, the model learns a classifier that determines the preference between competing candidates, and, during resolution, chooses the antecedent of a given anaphor based on the ranking of the candidates. We present in detail the framework of the twin-candidate model for anaphora resolution. Further, we explore how to deploy the model in the more complicated coreference resolution task. We evaluate the twin-candidate model in different domains using the Automatic Content Extraction data sets. The experimental results indicate that our twin-candidate model is superior to the single-candidate model for the task of pronominal anaphora resolution. For the task of coreference resolution, it also performs equally well, or better.
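
The resolution step described above can be sketched as a round-robin comparison: every pair of candidates is judged by a pairwise preference classifier, and the candidate with the most wins is chosen. This is only an illustrative sketch. The `prefer` function is a hand-written, hypothetical stand-in for the learned classifier, and the round-robin win count, feature names, and example data are assumptions rather than details taken from the paper.

```python
from itertools import combinations


def prefer(anaphor, cand_a, cand_b):
    """Hypothetical stand-in for the learned pairwise preference classifier.

    Prefers gender agreement first, then proximity to the anaphor.
    A trained model would replace this hand-written rule.
    """
    def score(c):
        return (c["gender"] == anaphor["gender"], -abs(anaphor["pos"] - c["pos"]))

    return cand_a if score(cand_a) >= score(cand_b) else cand_b


def resolve(anaphor, candidates):
    """Compare every pair of candidates and return the one with the most wins."""
    wins = {c["text"]: 0 for c in candidates}
    for a, b in combinations(candidates, 2):
        wins[prefer(anaphor, a, b)["text"]] += 1
    return max(candidates, key=lambda c: wins[c["text"]])["text"]


# Hypothetical example: token positions and gender features are made up.
anaphor = {"text": "she", "gender": "f", "pos": 10}
candidates = [
    {"text": "John", "gender": "m", "pos": 8},
    {"text": "Mary", "gender": "f", "pos": 3},
    {"text": "the report", "gender": "n", "pos": 6},
]
antecedent = resolve(anaphor, candidates)
```

The point of the pairwise setup is that each decision sees two competing candidates at once, so the preference between them is modeled directly instead of scoring each candidate in isolation.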

Aspect Sentiment Classification with both Word-level and Clause-level Attention Networks

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/617 ◽

2018 ◽

Cited By ~ 10

Author(s):

Jingjing Wang ◽

Jie Li ◽

Shoushan Li ◽

Yangyang Kang ◽

Min Zhang ◽

...

Keyword(s):

Sentiment Analysis ◽

Sentiment Classification ◽

Experimental Results ◽

Hierarchical Network ◽

Attention Networks ◽

Word Level ◽

Sentence Level

Aspect sentiment classification, a challenging task in sentiment analysis, has been attracting more and more attention in recent years. In this paper, we highlight the need for incorporating the importance degrees of both words and clauses inside a sentence and propose a hierarchical network with both word-level and clause-level attentions to aspect sentiment classification. Specifically, we first adopt sentence-level discourse segmentation to segment a sentence into several clauses. Then, we leverage multiple Bi-directional LSTM layers to encode all clauses and propose a word-level attention layer to capture the importance degrees of words in each clause. Third and finally, we leverage another Bi-directional LSTM layer to encode the outputs from the former layers and propose a clause-level attention layer to capture the importance degrees of all the clauses inside a sentence. Experimental results on the laptop and restaurant datasets from SemEval-2015 demonstrate the effectiveness of our proposed approach to aspect sentiment classification.
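
The two attention levels can be pictured as the same pooling operation applied twice: first over the words of each clause, then over the resulting clause vectors. The sketch below shows only that pooling step under simplifying assumptions. The Bi-directional LSTM encoders and the aspect information are omitted, the vectors and query vectors are made-up numbers standing in for learned parameters, and `attend` is a hypothetical helper name.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(vectors, query):
    """Attention pooling: weight each vector by softmax(dot(vector, query))."""
    weights = softmax([sum(v_i * q_i for v_i, q_i in zip(v, query)) for v in vectors])
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights


# Hypothetical 2-dimensional word vectors, grouped by clause.
clauses = [
    [[1.0, 0.0], [0.0, 1.0]],  # clause 1: two words
    [[0.0, 2.0], [0.0, 0.0]],  # clause 2: two words
]
word_query = [1.0, 1.0]    # stand-in for learned word-level attention parameters
clause_query = [0.0, 1.0]  # stand-in for learned clause-level attention parameters

# Word-level attention: one pooled vector per clause.
clause_vectors = [attend(words, word_query)[0] for words in clauses]
# Clause-level attention: one pooled vector for the whole sentence.
sentence_vector, clause_weights = attend(clause_vectors, clause_query)
```

In the full model the sentence vector would feed a classifier; here `clause_weights` simply shows which clause the second attention layer emphasizes.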

Pretraining and Fine-Tuning Strategies for Sentiment Analysis of Latvian Tweets

Frontiers in Artificial Intelligence and Applications - Human Language Technologies – The Baltic Perspective ◽

10.3233/faia200602 ◽

2020 ◽

Cited By ~ 1

Author(s):

Gaurish Thakkar ◽

Mārcis Pinnis

Keyword(s):

Sentiment Analysis ◽

Substantial Improvement ◽

Sentiment Classification ◽

Experimental Results ◽

Fine Tuning ◽

Classification Task ◽

Language Representation ◽

Training Strategies ◽

Fine Tune

In this paper, we present various pre-training strategies that aid in improving the accuracy of the sentiment classification task. At first, we pre-train language representation models using these strategies and then fine-tune them on the downstream task. Experimental results on a time-balanced tweet evaluation set show the improvement over the previous technique. We achieve 76% accuracy for sentiment analysis on Latvian tweets, which is a substantial improvement over previous work.

An Improved Intelligent Approach to Enhance the Sentiment Classifier for Knowledge Discovery Using Machine Learning

International Journal of Sensors Wireless Communications and Control ◽

10.2174/2210327910999200528114552 ◽

2020 ◽

Vol 10 (4) ◽

pp. 582-593

Author(s):

Midde Venkateswarlu Naik ◽

D. Vasumathi ◽

A.P. Siva Kumar

Keyword(s):

Support Vector Machine ◽

Feature Selection ◽

Global Warming ◽

Particle Swarm Optimization ◽

Sentiment Analysis ◽

Optimization Technique ◽

Particle Swarm ◽

Sentiment Classification ◽

Support Vector ◽

Swarm Optimization

Aims: The proposed research work presents an evolutionary enhanced method for sentiment or emotion classification of unstructured review text in the big data field. Sentiment analysis plays a vital role for the current generation in extracting valid decision points about aspects such as movie ratings, educational institute ratings, or political ratings. The proposed hybrid approach combines optimal feature selection using Particle Swarm Optimization (PSO) with sentiment classification using a Support Vector Machine (SVM). The performance of the approach is evaluated with statistical measures such as precision, recall, sensitivity, and specificity, and is compared with existing approaches. Earlier authors have achieved a sentiment classifier accuracy of up to 94% on English text. In the proposed scheme, the sentiment classifier reaches an average accuracy of 99% on different datasets by tuning SVM parameters, such as the constant C and the kernel gamma value, in association with the PSO technique. The proposed method uses three publicly available datasets: airline sentiment, weather, and global warming data. Background: Sentiment analysis plays a vital role for the current generation in extracting valid decisions about aspects such as movie ratings, educational institute ratings, or even political ratings. Sentiment Analysis (SA), or opinion mining, has become an attractive research domain. A key area is sentiment classification of semi-structured or unstructured data in different languages, which has become a major research topic. User-Generated Content (UGC) from different sources has increased significantly with the rapid growth of the web. The huge volume of user-generated data on social media provides substantial value for discovering hidden knowledge, correlations, patterns, and trends, or for extracting sentiment about any specific entity. SA is a computational analysis that determines the actual opinion about an entity expressed in text. SA is also described as the computation of emotional polarity expressed over social media as natural text in various languages. Usually, the best automatic sentiment classifier model depends on feature selection and classification algorithms. Methods: The proposed work uses a Support Vector Machine as the classification technique and Particle Swarm Optimization for feature selection. In this methodology, we tune various permutations and combinations of parameters to obtain the desired results, with and without the kernel technique, for sentiment classification on three datasets (airline, global warming, and weather sentiment) that are freely hosted for research purposes. Results: The proposed method achieved an average accuracy of 99.2% in classifying sentiment on different datasets, outperforming other machine learning techniques. The high accuracy attained in classifying sentiment or opinion in review text demonstrates superior effectiveness over existing sentiment classifiers. The results were produced by training and testing with 10-fold cross-validation (FCV) and a confusion matrix for estimating sentiment classifier accuracy. Conclusion: Sentiment classifier accuracy, the objective of this research, has been increased with the help of a kernel-based Support Vector Machine (SVM) with parameter optimization. The optimal feature selection for classifying sentiment or opinion in review documents has been determined with a particle swarm optimization approach. The proposed method used three freely available datasets to simulate the results: airline sentiment data, weather sentiment data, and global warming data.
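
To make the evaluation measures concrete, the sketch below derives precision, recall (sensitivity), specificity, and accuracy from a binary confusion matrix, and shows one simple way to split sample indices into 10 folds. It is a generic illustration, not the authors' evaluation code: the labels and predictions are made up, and the helper names `confusion_counts`, `metrics`, and `k_fold_indices` are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Standard measures; assumes every denominator is non-zero."""
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),        # also called sensitivity
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


def k_fold_indices(n, k=10):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


# Made-up labels and predictions for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
counts = confusion_counts(y_true, y_pred)
scores = metrics(*counts)
folds = list(k_fold_indices(25, k=10))
```

In a cross-validated experiment, the confusion counts from each test fold are typically accumulated or the per-fold metrics averaged, so that every sample is used for testing exactly once.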

Improving Sentiment Analysis using Hybrid Deep Learning Model

Recent Advances in Computer Science and Communications ◽

10.2174/2213275912666190328200012 ◽

2020 ◽

Vol 13 (4) ◽

pp. 627-640 ◽

Cited By ~ 1

Author(s):

Avinash Chandra Pandey ◽

Dharmveer Singh Rajpoot

Keyword(s):

Neural Network ◽

Deep Learning ◽

Sentiment Analysis ◽

Classification Accuracy ◽

Short Term Memory ◽

Computational Cost ◽

Extraction Process ◽

Learning Model ◽

Sentiment Classification ◽

Deep Learning Model

Background: Sentiment analysis is the contextual mining of text to determine the viewpoint of users with respect to sentiment-bearing topics commonly found on social networking websites. Twitter is one of the social sites where people express their opinions about any topic in the form of tweets. These tweets can be examined using various sentiment classification methods to find the opinion of users. Traditional sentiment analysis methods use manually extracted features for opinion classification. The manual feature extraction process is a complicated task since it requires predefined sentiment lexicons. On the other hand, deep learning methods automatically extract relevant features from data; hence, they provide better performance and richer representation competency than the traditional methods. Objective: The main aim of this paper is to enhance sentiment classification accuracy and to reduce the computational cost. Method: To achieve the objective, a hybrid deep learning model based on a convolutional neural network and a bi-directional long short-term memory neural network has been introduced. Results: The proposed sentiment classification method achieves the highest accuracy for most of the datasets. Further, the efficacy of the proposed method has been validated by statistical analysis. Conclusion: Sentiment classification accuracy can be improved by creating veracious hybrid models. Moreover, performance can also be enhanced by tuning the hyperparameters of deep learning models.
