Bootstrap Domain-Specific Sentiment Classifiers from Unlabeled Corpora

2018 ◽  
Vol 6 ◽  
pp. 269-285 ◽  
Author(s):  
Andrius Mudinas ◽  
Dell Zhang ◽  
Mark Levene

There is often the need to perform sentiment classification in a particular domain where no labeled document is available. Although we could make use of a general-purpose off-the-shelf sentiment classifier or a pre-built one for a different domain, the effectiveness would be inferior. In this paper, we explore the possibility of building domain-specific sentiment classifiers with unlabeled documents only. Our investigation indicates that in the word embeddings learned from the unlabeled corpus of a given domain, the distributed word representations (vectors) for opposite sentiments form distinct clusters, though those clusters are not transferable across domains. Exploiting such a clustering structure, we are able to utilize machine learning algorithms to induce a quality domain-specific sentiment lexicon from just a few typical sentiment words (“seeds”). An important finding is that simple linear-model-based supervised learning algorithms (such as linear SVM) can actually work better than more sophisticated semi-supervised/transductive learning algorithms which represent the state-of-the-art technique for sentiment lexicon induction. The induced lexicon could be applied directly in a lexicon-based method for sentiment classification, but a higher performance could be achieved through a two-phase bootstrapping method which uses the induced lexicon to assign positive/negative sentiment scores to unlabeled documents first, and then uses those documents found to have clear sentiment signals as pseudo-labeled examples to train a document sentiment classifier via supervised learning algorithms (such as LSTM). On several benchmark datasets for document sentiment classification, our end-to-end pipelined approach, which is overall unsupervised (except for a tiny set of seed words), outperforms existing unsupervised approaches and achieves an accuracy comparable to that of fully supervised approaches.
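The first phase of the bootstrapping method described above can be illustrated with a minimal sketch. It is not the authors' implementation: the lexicon values, the whitespace tokenizer, the `threshold` parameter, and the function names `lexicon_score` and `pseudo_label` are all assumptions made for illustration, and the lexicon is hand-written here rather than induced from word embeddings.

```python
# Toy lexicon standing in for an induced domain-specific lexicon.
# The words and scores are assumed values, not taken from the paper.
LEXICON = {
    "great": 1.0, "excellent": 1.0, "good": 0.5,
    "poor": -0.5, "terrible": -1.0, "awful": -1.0,
}


def lexicon_score(text, lexicon=LEXICON):
    """Sum the lexicon scores of the whitespace-separated tokens in a document."""
    return sum(lexicon.get(tok, 0.0) for tok in text.lower().split())


def pseudo_label(docs, threshold=1.0):
    """Keep only documents with a clear sentiment signal and label them.

    Returns (document, label) pairs where label 1 is positive and 0 is negative.
    The threshold is a hypothetical cut-off for "clear" sentiment.
    """
    labeled = []
    for doc in docs:
        score = lexicon_score(doc)
        if abs(score) >= threshold:
            labeled.append((doc, 1 if score > 0 else 0))
    return labeled


docs = [
    "great phone excellent battery",   # clearly positive
    "terrible screen awful support",   # clearly negative
    "good camera but poor speaker",    # mixed signal, dropped
    "arrived on tuesday",              # no sentiment words, dropped
]
pseudo = pseudo_label(docs)
```

The resulting `pseudo` list would then serve as training data for a supervised document classifier in the second phase; that classifier is not shown here.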

2021 ◽  
Vol 26 (5) ◽  
pp. 501-506
Author(s):  
Anuj Kumar Singh ◽  
Sandeep Kumar ◽  
Shashi Bhushan ◽  
Pramod Kumar ◽  
Arun Vashishtha

When anyone looks to enroll in a freely available online course, the first and best-known name that comes up is MOOC. In this article, our focus is to collect the comments left by users enrolled in a specified MOOC course and apply sentiment analysis to that data. The significance of our article is that it introduces a proficient sentiment analysis algorithm with high predictive performance on MOOC courses, following the principle of combining various supervised learning methods, and examines the performance of various supervised machine learning algorithms in sentiment analysis of MOOC data. Some research questions on sentiment analysis of MOOC data are addressed. For the assessment task, we investigated a large number of MOOC courses with different supervised learning methods and evaluated the results using metrics such as Precision, Recall and F1 Score. From the results we conclude that, with the bigram model applied to logistic regression, the Multilayer Perceptron (MLP) outperformed the other algorithms, such as SVM and Naive Bayes, and achieved an accuracy of 92.44 percent. To determine the sentiment polarity of a sentence, the suggested method uses term frequency (the number of positive and negative terms in the text). We use a logistic regression function to predict the sentiment class (positive or negative) of the comments in the data.
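As a rough illustration of term-frequency polarity and of the evaluation metrics named above, the following sketch counts positive and negative terms in each comment and then computes Precision, Recall and F1. The word lists, the sample comments, the gold labels, and the tie-breaking rule are hypothetical; the abstract does not specify any of them.

```python
# Hypothetical term lists; the article does not publish its vocabulary.
POSITIVE = {"clear", "helpful", "great", "engaging"}
NEGATIVE = {"boring", "confusing", "slow", "outdated"}


def polarity(comment):
    """Return 1 (positive) or 0 (negative) from counts of sentiment terms."""
    tokens = comment.lower().split()
    pos = sum(tok in POSITIVE for tok in tokens)
    neg = sum(tok in NEGATIVE for tok in tokens)
    return 1 if pos >= neg else 0  # ties go to positive (arbitrary choice)


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


comments = [
    "great lectures and helpful quizzes",
    "boring and confusing videos",
    "slow pace",
    "clear explanations but outdated slides",
]
y_true = [1, 0, 0, 0]  # made-up gold labels for illustration
y_pred = [polarity(c) for c in comments]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

The last comment contains one positive and one negative term, so the tie rule labels it positive and it counts as a false positive, which shows why raw term counts are usually combined with a trained classifier.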


2012 ◽  
pp. 695-703
Author(s):  
George Tzanis ◽  
Christos Berberidis ◽  
Ioannis Vlahavas

Machine learning is one of the oldest subfields of artificial intelligence and is concerned with the design and development of computational systems that can adapt themselves and learn. The most common machine learning algorithms can be either supervised or unsupervised. Supervised learning algorithms generate a function that maps inputs to desired outputs, based on a set of examples with known output (labeled examples). Unsupervised learning algorithms find patterns and relationships over a given set of inputs (unlabeled examples). Other categories of machine learning are semi-supervised learning, where an algorithm uses both labeled and unlabeled examples, and reinforcement learning, where an algorithm learns a policy of how to act given an observation of the world.



2020 ◽  
Vol 1 (2) ◽  
pp. 1-4
Author(s):  
Priyam Guha ◽  
Abhishek Mukherjee ◽  
Abhishek Verma

This research paper deals with using supervised machine learning algorithms to detect the authenticity of bank notes. We achieved very high accuracy (on the order of 99%) by applying some data preprocessing techniques and then running the processed data through supervised learning algorithms such as SVM, Decision Trees, Logistic Regression, and KNN. We then analyze the misclassified points, examining the confusion matrix to find out which algorithms produced more false positives and which produced more false negatives.
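The confusion-matrix comparison described above can be sketched as follows. The labels and predictions are invented for illustration and do not come from the paper; the convention that 1 is the positive class, and the names `model_a` and `model_b`, are assumptions.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives, treating label 1 as positive."""
    counts = Counter(tp=0, fp=0, tn=0, fn=0)
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            counts["tp"] += 1
        elif t == 0 and p == 1:
            counts["fp"] += 1
        elif t == 0 and p == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


# Made-up labels and predictions from two hypothetical classifiers.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
model_a = [1, 1, 0, 0, 0, 1, 0, 1]
model_b = [1, 1, 1, 1, 1, 0, 0, 1]

counts_a = confusion_counts(y_true, model_a)
counts_b = confusion_counts(y_true, model_b)
```

In this toy data the two classifiers differ in the kind of error they make: `model_a` has one false positive and one false negative, while `model_b` has two false positives and no false negatives.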


2015 ◽  
Vol 28 (6) ◽  
pp. 570-600 ◽  
Author(s):  
Grant Duwe ◽  
KiDeuk Kim

Recent research has produced mixed results as to whether newer machine learning algorithms outperform older, more traditional methods such as logistic regression in predicting recidivism. In this study, we compared the performance of 12 supervised learning algorithms to predict recidivism among offenders released from Minnesota prisons. Using multiple predictive validity metrics, we assessed the performance of these algorithms across varying sample sizes, recidivism base rates, and number of predictors in the data set. The newer machine learning algorithms generally yielded better predictive validity results. LogitBoost had the best overall performance, followed by Random forests, MultiBoosting, bagged trees, and logistic model trees. Still, the gap between the best and worst algorithms was relatively modest, and none of the methods performed the best in each of the 10 scenarios we examined. The results suggest that multiple methods, including machine learning algorithms, should be considered in the development of recidivism risk assessment instruments.


Author(s):  
M. Govindarajan

Big data mining involves knowledge discovery from large data sets. The purpose of this chapter is to provide an analysis of the different machine learning algorithms available for performing big data analytics. The machine learning algorithms fall into three key categories: supervised, unsupervised, and semi-supervised. Supervised learning algorithms are trained with a complete set of data and are therefore used to predict or forecast. Example algorithms include logistic regression and the back-propagation neural network. Unsupervised learning algorithms start learning from scratch and are therefore used for clustering. Example algorithms include the Apriori algorithm and K-Means. Semi-supervised learning combines supervised and unsupervised learning: the algorithms are trained on labeled data but also learn from unlabeled data.
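To make the unsupervised case concrete, here is a minimal K-Means sketch on one-dimensional data. It is a simplified illustration, not code from the chapter: the function name `kmeans_1d`, the fixed iteration count, the sample points, and the initial centroids are all assumptions.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Cluster 1-D points by alternating assignment and centroid update."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for x in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
            clusters[nearest].append(x)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Unlabeled toy data with two visible groups; no labels are supplied.
points = [1.0, 1.5, 2.0, 8.0, 9.0, 10.0]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
```

No labels are passed in at any point, which is the property that distinguishes this from the supervised examples mentioned in the abstract.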

