Comparative Analysis of Supervised Learning for Sentiment Classification

There is often a need to perform sentiment classification in a particular domain where no labeled documents are available. Although we could use a general-purpose off-the-shelf sentiment classifier or one pre-built for a different domain, its effectiveness would be inferior. In this paper, we explore the possibility of building domain-specific sentiment classifiers with unlabeled documents only. Our investigation indicates that in the word embeddings learned from the unlabeled corpus of a given domain, the distributed word representations (vectors) for opposite sentiments form distinct clusters, though those clusters are not transferable across domains. Exploiting this clustering structure, we are able to use machine learning algorithms to induce a quality domain-specific sentiment lexicon from just a few typical sentiment words (“seeds”). An important finding is that simple supervised learning algorithms based on linear models (such as linear SVM) can actually work better than the more sophisticated semi-supervised/transductive learning algorithms that represent the state of the art in sentiment lexicon induction. The induced lexicon could be applied directly in a lexicon-based method for sentiment classification, but higher performance can be achieved through a two-phase bootstrapping method, which first uses the induced lexicon to assign positive/negative sentiment scores to unlabeled documents, and then uses the documents found to have clear sentiment signals as pseudo-labeled examples to train a document sentiment classifier via supervised learning algorithms (such as LSTM). On several benchmark datasets for document sentiment classification, our end-to-end pipelined approach, which is unsupervised overall (except for a tiny set of seed words), outperforms existing unsupervised approaches and achieves accuracy comparable to that of fully supervised approaches.
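
The first phase of the bootstrapping described in this abstract (score unlabeled documents with the lexicon, keep only those with a clear signal) might look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' implementation: the lexicon contents, the whitespace tokenizer, the summed-score rule, and the `margin` threshold are all hypothetical choices made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical induced lexicon: word -> polarity (+ positive, - negative).
LEXICON: Dict[str, float] = {
    "great": 1.0, "excellent": 1.0, "reliable": 0.5,
    "terrible": -1.0, "broken": -1.0, "slow": -0.5,
}

def lexicon_score(text: str, lexicon: Dict[str, float]) -> float:
    """Sum the polarity of every lexicon word in the text (assumed scoring rule)."""
    return sum(lexicon.get(token, 0.0) for token in text.lower().split())

def pseudo_label(docs: List[str], lexicon: Dict[str, float],
                 margin: float = 1.0) -> List[Tuple[str, int]]:
    """Keep documents whose score is at least `margin` away from zero.

    Returns (document, label) pairs with label 1 = positive, 0 = negative.
    Documents with a weak or mixed signal are left out.
    """
    labeled = []
    for doc in docs:
        score = lexicon_score(doc, lexicon)
        if score >= margin:
            labeled.append((doc, 1))
        elif score <= -margin:
            labeled.append((doc, 0))
    return labeled

docs = [
    "great screen and excellent battery",
    "terrible support and broken charger",
    "slow but reliable",      # mixed signal, scores 0.0
    "arrived on tuesday",     # no lexicon words
]
pseudo = pseudo_label(docs, LEXICON, margin=1.0)
```

In the second phase, the pairs in `pseudo` would serve as training data for any supervised document classifier; the abstract names LSTM as one option.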

Comparative Analysis of Regression based and Supervised Learning Algorithms for Predicting Traffic Noise Levels in Indian Scenario

International Journal of Computer Applications ◽

10.5120/12965-0240 ◽

2013 ◽

Vol 74 (15) ◽

pp. 45-50

Author(s):

Prashant Ruwali ◽

Vikas Tripathi

Keyword(s):

Comparative Analysis ◽

Supervised Learning ◽

Traffic Noise ◽

Learning Algorithms ◽

Noise Levels ◽

Supervised Learning Algorithms ◽

Indian Scenario

Semi-supervised Learning for Sentiment Classification using Small Number of Labeled Data

Procedia Computer Science ◽

10.1016/j.procs.2019.11.159 ◽

2019 ◽

Vol 161 ◽

pp. 577-584 ◽

Cited By ~ 1

Author(s):

Vivian Lay Shan Lee ◽

Keng Hoon Gan ◽

Tien Ping Tan ◽

Rosni Abdullah

Keyword(s):

Supervised Learning ◽

Sentiment Classification

Cooperative Hybrid Semi-Supervised Learning for Text Sentiment Classification

Symmetry ◽

10.3390/sym11020133 ◽

2019 ◽

Vol 11 (2) ◽

pp. 133 ◽

Cited By ~ 2

Author(s):

Yang Li ◽

Ying Lv ◽

Suge Wang ◽

Jiye Liang ◽

Juanzi Li ◽

...

Keyword(s):

Supervised Learning ◽

Large Scale ◽

Ensemble Classifier ◽

Sentiment Classification ◽

Training Dataset ◽

Support Vector ◽

Seed Selection ◽

Training Strategy ◽

Whole Process ◽

Self Learning

A large-scale, high-quality training dataset is an important prerequisite for learning a good classifier for text sentiment classification. However, manually constructing such a training dataset with sentiment labels is a labor-intensive and time-consuming task. Therefore, to make effective use of unlabeled samples, this paper proposes an integrated framework for text sentiment classification that covers the whole process of semi-supervised learning, from seed selection and iterative modification of the training text set to the co-training strategy of the classifier. To provide a basis for selecting the seed texts and modifying the training text set, three kinds of measures are defined: the cluster similarity degree of an unlabeled text, the cluster uncertainty degree of a pseudo-label text to a learner, and the reliability degree of a pseudo-label text to a learner. With these measures, a seed selection method based on Random Swap clustering, a hybrid modification method of the training text set based on active learning and self-learning, and an alternating co-training strategy for an ensemble classifier of Maximum Entropy and Support Vector Machine are proposed and combined into the framework. Experimental results on three real-world Chinese datasets (COAE2014, COAE2015, and a Hotel review dataset) and five real-world English datasets (Books, DVD, Electronics, Kitchen, and MR) verify the effectiveness of the proposed framework.
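
The self-learning component mentioned in this abstract can be illustrated with a generic self-training loop. The sketch below is not the paper's framework: the nearest-centroid classifier, the 2-D feature vectors, and the distance-margin confidence rule (`min_margin`) are assumptions standing in for the paper's Maximum Entropy/SVM ensemble and its reliability measures, whose definitions the abstract does not give.

```python
from typing import List, Tuple

Vector = Tuple[float, float]

def centroid(points: List[Vector]) -> Vector:
    """Mean of a non-empty list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two 2-D points."""
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

def self_train(seeds: List[Tuple[Vector, int]], unlabeled: List[Vector],
               min_margin: float, rounds: int = 5):
    """Grow the labeled set with confidently classified unlabeled points.

    `seeds` must contain at least one example of each class (1 and 0).
    Returns the enlarged labeled set and the points still unlabeled.
    """
    labeled = list(seeds)
    pool = list(unlabeled)
    for _ in range(rounds):
        # Refit the (assumed) nearest-centroid classifier on the current labels.
        pos = centroid([x for x, y in labeled if y == 1])
        neg = centroid([x for x, y in labeled if y == 0])
        confident, rest = [], []
        for x in pool:
            # Positive margin: closer to the positive centroid.
            margin = dist(x, neg) - dist(x, pos)
            if abs(margin) >= min_margin:
                confident.append((x, 1 if margin > 0 else 0))
            else:
                rest.append(x)
        if not confident:
            break  # nothing new was labeled, so further rounds change nothing
        labeled.extend(confident)
        pool = rest
    return labeled, pool

seeds = [((1.0, 1.0), 1), ((-1.0, -1.0), 0)]
unlabeled = [(0.9, 1.2), (-1.1, -0.8), (0.05, -0.05)]
labeled, remaining = self_train(seeds, unlabeled, min_margin=1.0)
```

The ambiguous point near the origin stays in `remaining`; in a hybrid scheme such as the one the abstract describes, uncertain texts like this are the natural candidates for active learning (manual labeling) rather than pseudo-labeling.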

Sentiment Classification in Under-Resourced Languages Using Graph-Based Semi-Supervised Learning Methods

IEICE Transactions on Information and Systems ◽

10.1587/transinf.e97.d.790 ◽

2014 ◽

Vol E97.D (4) ◽

pp. 790-797 ◽

Cited By ~ 4

Author(s):

Yong REN ◽

Nobuhiro KAJI ◽

Naoki YOSHINAGA ◽

Masaru KITSUREGAWA

Keyword(s):

Supervised Learning ◽

Sentiment Classification ◽

Learning Methods

Research of Sentiment Classification for Tibetan Texts by Supervised Learning

International Journal of Multimedia and Ubiquitous Engineering ◽

10.14257/ijmue.2016.11.9.38 ◽

2016 ◽

Vol 9 (9) ◽

pp. 385-394

Author(s):

Lirong Qiu ◽

Zhen Zhang

Keyword(s):

Supervised Learning ◽

Sentiment Classification

Sentiment Classification of Bank Clients’ Reviews Written in the Polish Language

Acta Universitatis Lodziensis Folia oeconomica ◽

10.18778/0208-6018.353.03 ◽

2021 ◽

pp. 43-56

Author(s):

Adam Piotr Idczak

Keyword(s):

Logistic Regression ◽

Comparative Analysis ◽

Text Classification ◽

Sentiment Classification ◽

Bayes Classifier ◽

Text Documents ◽

Text Document ◽

Polish Language ◽

Common Problems

It is estimated that approximately 80% of all data gathered by companies are text documents. This article is devoted to one of the most common problems in text mining, i.e. text classification in sentiment analysis, which focuses on determining a document’s sentiment. The lack of a defined structure in the text makes this problem more challenging, which has led to the development of various techniques for determining a document’s sentiment. This paper presents a comparative analysis of two methods of sentiment classification: the naive Bayes classifier and logistic regression. The analysed texts are written in Polish and come from banks. Classification was conducted by means of a bag-of-n-grams approach, where a text document is represented as a set of terms and each term consists of n words. The results show that logistic regression performed better.
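
The bag-of-n-grams representation mentioned in this abstract can be sketched in a few lines. This is a minimal illustration, assuming lowercase whitespace tokenization and term counts; the paper's actual preprocessing for Polish text is not described in the abstract, and the English example sentence is made up.

```python
from collections import Counter

def bag_of_ngrams(text: str, n: int) -> Counter:
    """Count every run of n consecutive words in the text.

    Tokenization is a plain lowercase whitespace split (an assumption).
    A text with fewer than n words yields an empty bag.
    """
    tokens = text.lower().split()
    return Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )

# Bigrams (n = 2) keep short negations such as "not helpful" together.
bag = bag_of_ngrams("The bank was not helpful", 2)
```

Each distinct n-gram then becomes one feature column, and the resulting count vectors can be fed to either of the two classifiers the paper compares.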
