Predictive aspect-based sentiment classification of online tourist reviews

2018 ◽  
Vol 45 (3) ◽  
pp. 341-363 ◽  
Author(s):  
Muhammad Afzaal ◽  
Muhammad Usman ◽  
Alvis Fong

With the growing number of online tourist reviews, discovering the sentiment expressed towards a tourist place from the posted reviews is becoming a challenging task. The presence of various aspects discussed in user reviews makes it even harder to accurately extract and classify the sentiments. Aspect-based sentiment analysis aims to extract and classify the user’s positive or negative orientation towards each aspect. Although several aspect-based sentiment classification methods have been proposed in the past, limited work has targeted the automatic extraction of implicit, infrequent and co-referential aspects. Moreover, existing methods lack the ability to accurately classify the overall polarity of multi-aspect sentiments. This study aims to develop a predictive framework for aspect-based extraction and classification. The proposed framework utilises the semantic relations among review phrases to extract implicit and infrequent aspects for accurate sentiment predictions. Experiments have been performed using real-world data sets crawled from predominant tourist websites such as TripAdvisor and OpenTable. Experimental results and comparison with previously reported findings show that the predictive framework not only extracts the aspects effectively but also improves the accuracy of aspect-level sentiment prediction.

2013 ◽  
Vol 765-767 ◽  
pp. 1441-1445
Author(s):  
Jia Jun Cheng ◽  
Xin Zhang ◽  
Peng Yi Fan ◽  
Pei Li ◽  
Hui Wang

Chinese microblogging texts are typically short and casual, which causes problems for traditional learning-based sentiment classification methods. To overcome this problem, we use a rule-based approach to classify the sentiment of Chinese microblogging texts. According to the characteristics of Chinese microblogging texts, we construct a thesaurus of subjective words for them, summarize the basic semantic rules for expressing emotion and propose a rule-based approach to sentiment classification of Chinese microblogging texts. Finally, we compare our approach with an SVM-based approach. Our rule-based approach achieves an accuracy of 0.865, which is better than that of the SVM-based approach.


2019 ◽  
Vol 8 (3) ◽  
pp. 7071-7081

Current-generation real-world data sets processed through machine learning are imbalanced by nature. This imbalanced data presents researchers with a challenging scenario in the context of prediction for both machine learning and data mining algorithms. Past research studies show that most imbalanced data sets consist of majority and minority classes, with the majority class dominating the minority class. Several standard and hybrid prediction algorithms have been proposed in various application domains, but most of the real-time data sets analyzed in these studies are imbalanced by nature, which affects the accuracy of the prediction. This paper presents a systematic survey of past research studies to analyze intrinsic data characteristics and the techniques used for handling class-imbalanced data. In addition, this study reveals the research gaps, trends and patterns in existing studies and briefly discusses future research directions.
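One widely used technique for handling class-imbalanced data is random oversampling of the minority class. The sketch below is only an illustration of that general idea, not a method taken from the survey; the function name `random_oversample` and the toy data are assumptions made for the example.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of the smaller classes until
    every class has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        # Pool of original samples belonging to this class.
        pool = [s for s, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


# Toy data: four "major" samples and one "minor" sample (hypothetical).
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = ["major", "major", "major", "major", "minor"]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Oversampling should normally be applied to the training split only, since duplicating samples before splitting would leak copies into the test set.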


1994 ◽  
Vol 45 (6) ◽  
pp. 945 ◽  
Author(s):  
R Marchant ◽  
LA Barmuta ◽  
BC Chessman

Data on undisturbed lotic macroinvertebrate communities were assembled from a number of studies carried out in Victoria over the past 15 years; species-level information for 40 sites on nine rivers was available. Ordination (DECORANA and semi-strong hybrid multidimensional scaling) and classification (flexible UPGMA and TWINSPAN) techniques were used to assess the similarity of community composition among the sites. Correlation of environmental variables with both ordinations indicated that factors related to altitude and substratum were the most obvious gradients; a conductivity gradient was also present. The classification analyses identified four groups of sites that matched the altitudinal trends evident in the ordinations; but these techniques did not emphasize the substratum gradient. TWINSPAN also identified six groups of taxa that were characteristic of particular altitudes or regions or were widespread across all sites. The distinctiveness of the patterns from this preliminary study indicates that it would be worthwhile extending these analyses to much larger data sets from Victorian rivers.


2020 ◽  
Vol 48 (3) ◽  
pp. 117-128
Author(s):  
Barkha Bansal ◽  
Sangeet Srivastava

Purpose: Aspect-based sentiment classification is valuable for providing deeper insight into online consumer reviews (OCR). However, the majority of previous studies explicitly determine the orientation of the aspect-related sentiment-bearing word and overlook the aspect-context. Therefore, this paper aims to propose an aspect-context aware sentiment classification of OCR for deeper and more accurate insights. Design/methodology/approach: In the proposed methodology, first, aspect descriptions and sentiment-bearing words are extracted. Then, the skip-gram model is used to extract the first set of features to capture contextual information. For the second category of features, cosine similarity between a pre-defined seed word list and aspects is used to capture aspect-context-sensitive sentiments. The third set of features includes word vectors weighted using term frequency-inverse document frequency. After concatenating the features, an ensemble classifier built from three base classifiers is applied. Findings: Experimental results on two real-world data sets with variable lengths, acquired from Amazon.com and TripAdvisor.com, show that the proposed ensemble approach significantly outperforms state-of-the-art and baseline methods in sentiment classification accuracy. Originality/value: This method is capable of capturing the correct sentiment of ambiguous words and other special words by extracting aspect-context using word vector similarity instead of expensive lexical resources, and hence shows superior accuracy compared with other methods.
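The second feature category described above, cosine similarity between seed words and aspects, can be sketched as follows. This is a minimal illustration under stated assumptions: the three-dimensional vectors, the seed list and the helper `seed_similarity_features` are hypothetical stand-ins, and in practice the vectors would come from a trained skip-gram model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy embeddings (assumed values, not taken from any trained model).
word_vectors = {
    "battery": [0.9, 0.1, 0.0],
    "good": [0.8, 0.2, 0.1],
    "bad": [-0.7, 0.3, 0.1],
}
seed_words = ["good", "bad"]  # hypothetical seed word list


def seed_similarity_features(aspect, vectors, seeds):
    """One similarity score per seed word for the given aspect term."""
    return [cosine_similarity(vectors[aspect], vectors[s]) for s in seeds]


features = seed_similarity_features("battery", word_vectors, seed_words)
print(features)
```

The resulting list can be concatenated with other feature sets before being passed to a classifier.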


2014 ◽  
Vol 511-512 ◽  
pp. 871-874
Author(s):  
Hong Yan Zuo ◽  
Zhou Quan Luo ◽  
Chao Wu

A novel Mamdani fuzzy classifier based on an improved chaos immune algorithm is developed, in which the parameters of the bilateral Gaussian membership functions are set as constraint conditions, and the fuzzy classification effectiveness indexes and the number of correctly classified samples serve as the subgoals of the fitness function. The Iris database is used for a simulation experiment on classification effectiveness. The results show that the Mamdani fuzzy classifier based on the improved chaos immune algorithm can effectively improve the classification prediction accuracy for data sets with noise and outliers.
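The abstract does not define the bilateral Gaussian membership function. A minimal sketch, assuming one common form in which a Gaussian uses different widths on the left and right of its centre, is shown below; the function name and the parameters `center`, `sigma_left` and `sigma_right` are assumptions for illustration and may differ from the paper's parameterisation.

```python
import math


def bilateral_gaussian(x, center, sigma_left, sigma_right):
    """Two-sided Gaussian membership: width sigma_left applies for
    x below the centre, sigma_right for x at or above the centre."""
    sigma = sigma_left if x < center else sigma_right
    return math.exp(-((x - center) ** 2) / (2 * sigma ** 2))


# Membership degrees around a centre of 5.0 with a narrow left side
# and a wide right side (toy parameter values).
for x in (4.0, 5.0, 6.0):
    print(x, bilateral_gaussian(x, center=5.0, sigma_left=0.5, sigma_right=2.0))
```

In an optimisation setting such as the one described, the three parameters per membership function would be the quantities tuned by the search algorithm.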


2020 ◽  
Vol 34 (04) ◽  
pp. 4691-4698
Author(s):  
Shu Li ◽  
Wen-Tao Li ◽  
Wei Wang

In many real-world applications, the data have several disjoint sets of features, and each set is called a view. Researchers have developed many multi-view learning methods in the past decade. In this paper, we bring the Graph Convolutional Network (GCN) into multi-view learning and propose a novel multi-view semi-supervised learning method, Co-GCN, which adaptively exploits the graph information from the multiple views with combined Laplacians. Experimental results on real-world data sets verify that Co-GCN can achieve better performance than state-of-the-art multi-view semi-supervised methods.
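The idea of a combined Laplacian can be illustrated with a weighted sum of per-view normalized graph Laplacians. The sketch below assumes fixed weights and the symmetric normalization L = I - D^{-1/2} A D^{-1/2}; it is not the Co-GCN implementation, which according to the abstract combines the views adaptively, and the helper names and toy graphs are hypothetical.

```python
import math


def normalized_laplacian(adj):
    """Symmetric normalized Laplacian I - D^{-1/2} A D^{-1/2}
    of a graph given as a dense adjacency matrix (list of lists)."""
    n = len(adj)
    degrees = [sum(row) for row in adj]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in degrees]
    return [
        [(1.0 if i == j else 0.0) - inv_sqrt[i] * adj[i][j] * inv_sqrt[j]
         for j in range(n)]
        for i in range(n)
    ]


def combine_laplacians(laplacians, weights):
    """Element-wise weighted sum of same-sized Laplacian matrices."""
    n = len(laplacians[0])
    return [
        [sum(w * lap[i][j] for w, lap in zip(weights, laplacians))
         for j in range(n)]
        for i in range(n)
    ]


# Two toy views of the same three nodes (assumed graphs).
view_a = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # path 0-1-2
view_b = [[0, 0, 1], [0, 0, 1], [1, 1, 0]]  # node 2 linked to 0 and 1
laps = [normalized_laplacian(view_a), normalized_laplacian(view_b)]
combined = combine_laplacians(laps, weights=[0.5, 0.5])
for row in combined:
    print(row)
```

With weights that sum to one, the combined matrix stays symmetric and keeps a unit diagonal whenever no node is isolated in any view.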


Author(s):  
Titus Josef Brinker ◽  
Achim Hekler ◽  
Jochen Sven Utikal ◽  
Dirk Schadendorf ◽  
Carola Berking ◽  
...  

BACKGROUND: State-of-the-art classifiers based on convolutional neural networks (CNNs) generally outperform the diagnosis of dermatologists and could enable life-saving and fast diagnoses, even outside the hospital via installation on mobile devices. To our knowledge, at present, there is no review of the current work in this research area. OBJECTIVE: This study presents the first systematic review of the state-of-the-art research on classifying skin lesions with CNNs. We limit our review to skin lesion classifiers. In particular, methods that apply a CNN only for segmentation or for the classification of dermoscopic patterns are not considered here. Furthermore, this study discusses why the comparability of the presented procedures is very difficult and which challenges must be addressed in the future. METHODS: We searched the Google Scholar, PubMed, Medline, Science Direct, and Web of Science databases for systematic reviews and original research articles published in English. Only papers that reported sufficient scientific proceedings are included in this review. RESULTS: We found 13 papers that classified skin lesions using CNNs. In principle, classification methods can be differentiated according to three principles. Approaches that use a CNN already trained on another large data set and then optimize its parameters for the classification of skin lesions are both the most common and the best performing with the currently available limited data sets. CONCLUSIONS: CNNs display a high performance as state-of-the-art skin lesion classifiers. Unfortunately, it is difficult to compare different classification methods because some approaches use non-public data sets for training and/or testing, thereby making reproducibility difficult.


2020 ◽  
Vol 34 (04) ◽  
pp. 6430-6437 ◽  
Author(s):  
Xingyu Wu ◽  
Bingbing Jiang ◽  
Kui Yu ◽  
Huanhuan Chen ◽  
Chunyan Miao

Multi-label feature selection has received considerable attention during the past decade. However, existing algorithms do not attempt to uncover the underlying causal mechanism, and they solve different types of variable relationships individually, ignoring the mutual effects between them. Furthermore, these algorithms lack interpretability: they can only select features for all labels, but cannot explain the correlation between a selected feature and a certain label. To address these problems, in this paper, we theoretically study the causal relationships in multi-label data and propose a novel Markov blanket based multi-label causal feature selection (MB-MCF) algorithm. MB-MCF first mines the causal mechanism of labels and features to obtain a complete representation of information about labels. Based on the causal relationships, MB-MCF then selects predictive features and simultaneously distinguishes common features shared by multiple labels from label-specific features owned by single labels. Experiments on real-world data sets validate that MB-MCF can automatically determine the number of selected features and simultaneously achieve the best performance compared with state-of-the-art methods. An experiment on the Emotions data set further demonstrates the interpretability of MB-MCF.

