Sentiment classification of online reviews to travel destinations by supervised machine learning approaches

2009, Vol 36 (3), pp. 6527-6535
Author(s): Qiang Ye, Ziqiong Zhang, Rob Law

2011, Vol 10 (06), pp. 1097-1110
Author(s): Yijun Li, Qiang Ye, Ziqiong Zhang, Tienan Wang

Sentiment classification seeks to identify the general attitude, positive or negative, of a piece of text such as a comment or review on a given subject. Most existing research on sentiment classification employs supervised learning approaches that rely on annotated data. However, sentiment is expressed differently on different subjects in different domains, and having annotated corpora for every domain of interest is not always practical. This paper proposes an unsupervised learning approach for classifying the text of online reviews as recommended or not recommended. The proposed method is based on search engine snippets, the summary information shown on the result page of a search engine. A basic assumption is that terms with similar orientation tend to co-occur. Co-occurrence is measured using the snippets returned by search engines for a query consisting of the text and a positive or negative seed word. With the information in the snippets, the proposed method can estimate the association of candidate terms more accurately, which allows us to reliably predict the sentiment orientation of customer reviews. A customer review is then classified as recommended if the average sentiment orientation of its phrases is positive, and as not recommended if it is negative. The research data set of this study consists of 600 Chinese online reviews about travel destinations retrieved from Ctrip.com. Our approach achieves an accuracy of 76.5%. Factors that influence the accuracy of sentiment classification of Chinese online reviews are discussed.
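The final classification step described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact association measure, so the smoothed log-ratio below is an assumption in the spirit of PMI-style orientation scores, not the authors' formula. The snippet counts, phrase strings, and function names are hypothetical, and no real search engine is queried.

```python
import math

# Hypothetical counts, invented for illustration only: for each phrase, how
# many retrieved snippets co-occur with a positive or a negative seed word.
SNIPPET_COUNTS = {
    "beautiful scenery": {"pos": 40, "neg": 5},
    "long queue": {"pos": 4, "neg": 30},
    "friendly staff": {"pos": 25, "neg": 5},
}


def orientation(phrase, counts=SNIPPET_COUNTS, smoothing=0.5):
    """Smoothed log-ratio of positive to negative co-occurrence.

    This measure is an assumption; the abstract only states that
    co-occurrence with seed words is measured from snippets.
    """
    c = counts.get(phrase, {"pos": 0, "neg": 0})
    return math.log2((c["pos"] + smoothing) / (c["neg"] + smoothing))


def classify(phrases, counts=SNIPPET_COUNTS):
    """Label a review by the average orientation of its phrases.

    An average of exactly zero falls to "not recommended"; that tie-break
    is a choice made for this sketch.
    """
    scores = [orientation(p, counts) for p in phrases]
    average = sum(scores) / len(scores) if scores else 0.0
    label = "recommended" if average > 0 else "not recommended"
    return label, average


label, average = classify(["beautiful scenery", "friendly staff"])
print(label, round(average, 2))
```

The additive smoothing keeps the ratio defined when a phrase never co-occurs with one of the seed words; a phrase with no counts at all scores zero and so contributes nothing to the average.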


2014, Vol 2014, pp. 1-11
Author(s): Atika Qazi, Ram Gopal Raj, Muhammad Tahir, Erik Cambria, Karim Bux Shah Syed

Appropriate identification and classification of online reviews to satisfy the needs of current and potential users pose a critical challenge for the business environment. This paper focuses on a specific kind of review: the suggestive type. Suggestions have a significant influence on both consumers’ choices and designers’ understanding and, hence, they are key for tasks such as brand positioning and social media marketing. The proposed approach consists of three main steps: (1) classify comparative and suggestive sentences; (2) categorize suggestive sentences into different types, either explicit or implicit locutions; (3) perform sentiment analysis on the classified reviews. A range of supervised machine learning approaches and feature sets are evaluated to tackle the problem of suggestive opinion mining. Experimental results for all three tasks are obtained on a dataset of mobile phone reviews and demonstrate that extending a bag-of-words representation with suggestive and comparative patterns is ideal for distinguishing suggestive sentences. In particular, it is observed that classifying suggestive sentences into implicit and explicit locutions works best when using a mixed sequential rule feature representation. Sentiment analysis achieves maximum performance when employing additional preprocessing in the form of negation handling and target masking, combined with sentiment lexicons.
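The preprocessing mentioned at the end of this abstract, negation handling and target masking, can be sketched as follows. This is a common generic variant (masking target mentions with a placeholder and prefixing tokens inside a negation scope), not necessarily the procedure used in the paper. The negation word list, the `TARGET` placeholder, the `NOT_` prefix, and the function name are all assumptions.

```python
import re

NEGATIONS = {"not", "no", "never", "cannot"}  # assumed list, not from the paper
CLAUSE_END = {".", ",", ";", "!", "?"}        # punctuation that closes a negation scope


def preprocess(sentence, targets):
    """Mask target mentions, then prefix tokens in a negation scope with NOT_."""
    lowered = sentence.lower()
    for target in targets:
        # Replace each product mention with a placeholder so a classifier
        # cannot learn sentiment from the product name itself.
        pattern = r"\b" + re.escape(target.lower()) + r"\b"
        lowered = re.sub(pattern, "TARGET", lowered)
    tokens = re.findall(r"[A-Za-z_']+|[.,;!?]", lowered)
    out, negated = [], False
    for tok in tokens:
        if tok in CLAUSE_END:
            negated = False
            out.append(tok)
        elif tok in NEGATIONS or tok.endswith("n't"):
            negated = True
            out.append(tok)
        elif negated and tok != "TARGET":
            out.append("NOT_" + tok)
        else:
            out.append(tok)
    return out


print(preprocess("The Nokia is not good, but cheap.", ["Nokia"]))
```

Marking negated tokens lets a bag-of-words model treat "good" and "NOT_good" as distinct features, and closing the scope at clause punctuation keeps the negation from spilling into "but cheap".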


2017
Author(s): Sabrina Jaeger, Simone Fulle, Samo Turk

Inspired by natural language processing techniques, we here introduce Mol2vec, an unsupervised machine learning approach to learn vector representations of molecular substructures. Similarly to Word2vec models, where vectors of closely related words are in close proximity in the vector space, Mol2vec learns vector representations of molecular substructures that point in similar directions for chemically related substructures. Compounds can finally be encoded as vectors by summing the vectors of the individual substructures and, for instance, fed into supervised machine learning approaches to predict compound properties. The underlying substructure vector embeddings are obtained by training an unsupervised machine learning approach on a so-called corpus of compounds that consists of all available chemical matter. The resulting Mol2vec model is pre-trained once, yields dense vector representations, and overcomes drawbacks of common compound feature representations such as sparseness and bit collisions. The prediction capabilities are demonstrated on several compound property and bioactivity data sets and compared with results obtained for Morgan fingerprints as a reference compound representation. Mol2vec can be easily combined with ProtVec, which employs the same Word2vec concept on protein sequences, resulting in a proteochemometric approach that is alignment independent and can thus also be easily used for proteins with low sequence similarities.
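The encoding step described in this abstract, summing substructure vectors to obtain a compound vector, can be illustrated with a small sketch. It does not use the Mol2vec package or its API. The substructure identifiers, the three-dimensional embedding values, the zero-vector fallback for unseen substructures, and the function name are all hypothetical.

```python
# Toy embedding table: substructure identifier -> vector. The values are
# invented; a trained model would supply them.
EMBEDDINGS = {
    "sub_a": [0.2, -0.1, 0.5],
    "sub_b": [0.1, 0.3, -0.2],
    "sub_c": [-0.4, 0.0, 0.1],
}
UNKNOWN = [0.0, 0.0, 0.0]  # assumed fallback for substructures not in the table


def compound_vector(substructures, embeddings=EMBEDDINGS):
    """Encode a compound as the element-wise sum of its substructure vectors."""
    total = [0.0] * len(UNKNOWN)
    for sub in substructures:
        vec = embeddings.get(sub, UNKNOWN)
        for i, value in enumerate(vec):
            total[i] += value
    return total


# A compound made of sub_a, sub_b, and sub_a again (repeats count twice).
print(compound_vector(["sub_a", "sub_b", "sub_a"]))
```

The resulting fixed-length dense vector could then serve as the feature input of any supervised model, which is the use the abstract describes.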

