Evaluating the Impact of Feature Selection on Overall Performance of Sentiment Analysis

Abstract
Purpose: Online reviews of tourist attractions are an important reference for potential tourists choosing destinations. The main goal of this study is to conduct sentiment analysis that helps users make sense of the large volume of reviews, based on comments about Chinese attractions from the Japanese tourism website 4Travel.
Design/methodology/approach: Different statistics-based and rule-based methods are used to analyze the sentiment of the reviews. Three groups of novel statistics-based methods that combine feature selection functions with the traditional term frequency-inverse document frequency (TF-IDF) method are proposed. We also construct seven groups of rule-based methods. The macro-average and micro-average values for the best classification results of each method are calculated, and the performance of the methods is reported.
Findings: We compare the statistics-based and rule-based methods separately and then compare the overall performance of the two kinds of methods. The results indicate that combining feature selection functions with weightings can strongly improve the overall performance. The emotional vocabulary in the field of tourism (EVT), kaomojis, and negative and transitional words can notably improve the performance in all three categories. The rule-based methods outperform the statistics-based ones by a narrow margin.
Research limitation: Two limitations should be noted: 1) the empirical studies verifying the validity of the proposed methods are conducted only on Japanese; and 2) deep learning has not been incorporated into the methods.
Practical implications: The results help to elucidate the intrinsic characteristics of the Japanese language and their influence on sentiment analysis. These findings also provide practical usage guidelines for sentiment analysis of Japanese online tourism reviews.
Originality/value: Our research is of practical value. Currently, there are no studies that focus on the sentiment analysis of Japanese reviews about Chinese attractions.
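The abstract reports macro-average and micro-average values without defining them. The following is a minimal sketch of the usual distinction, shown for precision: macro-averaging gives every class equal weight, while micro-averaging pools the counts so larger classes dominate. The class names and counts are hypothetical and are not taken from the study.

```python
from typing import Dict, Tuple

# Hypothetical per-class counts as (true positives, false positives).
# The three category names and all numbers are illustrative assumptions.
Counts = Dict[str, Tuple[int, int]]


def macro_precision(counts: Counts) -> float:
    """Average the per-class precisions, weighting every class equally."""
    per_class = [tp / (tp + fp) for tp, fp in counts.values() if tp + fp > 0]
    return sum(per_class) / len(per_class)


def micro_precision(counts: Counts) -> float:
    """Pool the counts over all classes, then compute a single precision."""
    total_tp = sum(tp for tp, _ in counts.values())
    total_fp = sum(fp for _, fp in counts.values())
    return total_tp / (total_tp + total_fp)


example: Counts = {
    "positive": (90, 10),  # per-class precision 0.90
    "neutral": (10, 10),   # per-class precision 0.50
    "negative": (30, 10),  # per-class precision 0.75
}

print(round(macro_precision(example), 4))
print(round(micro_precision(example), 4))
```

With imbalanced classes the two averages can differ noticeably, which is presumably why the abstract reports both.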

The impact of the 1993 Colombian health sector reform on the overall performance of the health system

10.25148/etd.fi15101359 ◽ 2002
Author(s): Jesus D. Felizzola
Keyword(s): Health System ◽ Health Sector ◽ Health Sector Reform ◽ Sector Reform ◽ Overall Performance ◽ The Impact

Sentiment Analysis of Movie Reviews: A Study of Machine Learning Algorithms with Various Feature Selection Methods

International Journal of Computer Sciences and Engineering ◽ 10.26438/ijcse/v5i9.113121 ◽ 2017 ◽ Vol 5 (9) ◽ Cited By ~ 1
Author(s): Rajwinder Kaur
Keyword(s): Machine Learning ◽ Feature Selection ◽ Sentiment Analysis ◽ Learning Algorithms ◽ Machine Learning Algorithms ◽ Selection Methods

An Improved Intelligent Approach to Enhance the Sentiment Classifier for Knowledge Discovery Using Machine Learning

International Journal of Sensors Wireless Communications and Control ◽ 10.2174/2210327910999200528114552 ◽ 2020 ◽ Vol 10 (4) ◽ pp. 582-593
Author(s): Midde Venkateswarlu Naik ◽ D. Vasumathi ◽ A.P. Siva Kumar
Keyword(s): Support Vector Machine ◽ Feature Selection ◽ Global Warming ◽ Particle Swarm Optimization ◽ Sentiment Analysis ◽ Optimization Technique ◽ Particle Swarm ◽ Sentiment Classification ◽ Support Vector ◽ Swarm Optimization

Aims: This work proposes an evolutionary enhanced method for sentiment or emotion classification of unstructured review text in the big data field. Sentiment analysis plays a vital role in helping people reach sound decisions on matters such as movie ratings, educational institute ratings, or political ratings. The proposed hybrid approach combines optimal feature selection using Particle Swarm Optimization (PSO) with sentiment classification through a Support Vector Machine (SVM). Its performance is evaluated with statistical measures such as precision, recall, sensitivity, and specificity, and is compared with existing approaches. Earlier authors have reported sentiment classifier accuracy of up to 94% on English text. In the proposed scheme, the sentiment classifier reached an average accuracy of 99% across different datasets by tuning SVM parameters, such as the constant C and the kernel gamma value, in association with the PSO optimization technique. The proposed method used three publicly available datasets: airline sentiment data, weather data, and global warming data. The reported results were obtained with 10-fold cross-validation (FCV) and a confusion matrix for estimating sentiment classifier accuracy.
Background: Sentiment analysis plays a vital role in helping people reach sound decisions on matters such as movie ratings, educational institute ratings, or even political ratings. Sentiment Analysis (SA), or opinion mining, has become an active research domain. A key area is sentiment classification of semi-structured or unstructured data in different languages, which has become a major research topic. User-Generated Content (UGC) from various sources has increased significantly with the rapid growth of the web. The huge volume of user-generated data on social media provides substantial value for discovering hidden knowledge, correlations, patterns, and trends, and for extracting sentiment about any specific entity. SA is a computational analysis that determines the opinion about an entity as expressed in text. SA is also described as the computation of emotional polarity expressed on social media as natural text in various languages. Usually, the best automatic sentiment classifier model depends on both feature selection and classification algorithms.
Methods: The proposed work uses a support vector machine as the classification technique and particle swarm optimization for feature selection. We tune various combinations of parameters to obtain the desired results, with and without a kernel, for sentiment classification on three datasets (airline, global warming, and weather sentiment) that are freely hosted for research.
Results: The proposed method achieved an average accuracy of 99.2% in classifying sentiment on the different datasets, outperforming other machine learning techniques. The high accuracy attained in classifying sentiment or opinion in review text indicates greater effectiveness than existing sentiment classifiers. The reported results were obtained with 10-fold cross-validation (FCV) and a confusion matrix for estimating sentiment classifier accuracy.
Conclusion: The accuracy of the sentiment classifier has been increased with the help of a kernel-based Support Vector Machine (SVM) with parameter optimization. The optimal feature selection for classifying sentiment or opinion in review documents was determined with a particle swarm optimization approach. The proposed method used three freely available datasets to simulate the results: airline sentiment data, weather sentiment data, and global warming data.
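The abstract names precision, recall, sensitivity, and specificity as measures derived from a confusion matrix under 10-fold cross-validation. As a minimal sketch (not the authors' code), the snippet below computes these from binary confusion-matrix counts and pools per-fold matrices by summation. The class name `BinaryConfusion`, the helper `pool`, and all counts are hypothetical. Note that recall and sensitivity are the same quantity.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class BinaryConfusion:
    """Counts from a binary confusion matrix (hypothetical container)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        # Recall is another name for sensitivity (true positive rate).
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # True negative rate.
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)


def pool(folds: Iterable[BinaryConfusion]) -> BinaryConfusion:
    """Sum per-fold matrices, e.g. the 10 folds of a cross-validation run."""
    folds = list(folds)
    return BinaryConfusion(
        tp=sum(f.tp for f in folds),
        fp=sum(f.fp for f in folds),
        fn=sum(f.fn for f in folds),
        tn=sum(f.tn for f in folds),
    )


# Illustrative counts only; they are not results from the paper.
cm = BinaryConfusion(tp=45, fp=5, fn=3, tn=47)
print(cm.precision(), cm.recall(), cm.specificity(), cm.accuracy())
```

Pooling the fold matrices before computing the measures is one common convention; averaging per-fold scores is another, and the abstract does not say which was used.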

The Impact of Bar Service Operation Practices on Organizational Performance in Indian Hotel Industry

International Journal of Hospitality and Tourism Systems ◽ 10.21863/ijhts/2015.8.1.007 ◽ 2015 ◽ Vol 8 (1)
Author(s): Mohinder C. Dhiman ◽ Abhishek Ghai
Keyword(s): Organizational Performance ◽ Hotel Industry ◽ Demographic Variables ◽ Study Results ◽ Service Operation ◽ Managerial Implications ◽ Overall Performance ◽ The Impact ◽ The Relationship ◽ Bivariate Test

The paper has a twofold purpose: to examine the impact of bar service operation practices (BSOP) on organizational performance (OP), and to study the relationship between organizational performance and demographic variables. In a survey of 362 bar managers, perceptions of the impact of bar service operation practices on organizational performance were assessed using 59 practices and 6 demographic variables. A bivariate test and ANOVA were employed to test the working hypothesis of the study. Results indicated a positive relationship between bar service operation practices and organizational performance. Further, the results point to some practical and managerial implications for improving overall organizational performance.

Feature Selection Methods in Sentiment Analysis

Proceedings of the 3rd International Conference on Networking, Information Systems & Security ◽ 10.1145/3386723.3387840 ◽ 2020
Author(s): Nurilhami Izzatie Khairi ◽ Azlinah Mohamed ◽ Nor Nadiah Yusof
Keyword(s): Feature Selection ◽ Sentiment Analysis ◽ Selection Methods

The Impact of Arabic Part of Speech Tagging on Sentiment Analysis: A New Corpus and Deep Learning Approach

Procedia Computer Science ◽ 10.1016/j.procs.2021.03.026 ◽ 2021 ◽ Vol 184 ◽ pp. 148-155
Author(s): Abdul Munem Nerabie ◽ Manar AlKhatib ◽ Sujith Samuel Mathew ◽ May El Barachi ◽ Farhad Oroumchian
Keyword(s): Deep Learning ◽ Sentiment Analysis ◽ Learning Approach ◽ Part Of Speech Tagging ◽ Part Of Speech ◽ The Impact ◽ Speech Tagging

Impact of Dataset Size on Classification Performance: An Empirical Evaluation in the Medical Domain

Applied Sciences ◽ 10.3390/app11020796 ◽ 2021 ◽ Vol 11 (2) ◽ pp. 796
Author(s): Alhanoof Althnian ◽ Duaa AlSaeed ◽ Heyam Al-Baity ◽ Amani Samha ◽ Alanoud Bin Dris ◽ ...
Keyword(s): Empirical Evaluation ◽ Classification Performance ◽ Support Vector ◽ Robust Model ◽ Original Distribution ◽ C4.5 Decision Tree ◽ Dataset Size ◽ Overall Performance ◽ Medical Domain ◽ The Impact

Dataset size is considered a major concern in the medical domain, where lack of data is a common occurrence. This study aims to investigate the impact of dataset size on the overall performance of supervised classification models. We examined the performance of six models widely used in the medical field, namely support vector machine (SVM), neural networks (NN), C4.5 decision tree (DT), random forest (RF), AdaBoost (AB), and naïve Bayes (NB), on eighteen small medical UCI datasets. We further implemented three dataset size reduction scenarios on two large datasets and analyzed the performance of the models when trained on each resulting dataset with respect to accuracy, precision, recall, F-score, specificity, and area under the ROC curve (AUC). Our results indicated that the overall performance of classifiers depends on how well a dataset represents the original distribution rather than on its size. Moreover, we found that the most robust models for limited medical data are AB and NB, followed by SVM, and then RF and NN, while the least robust model is DT. Furthermore, an interesting observation is that a model's robustness to limited data does not necessarily imply that it provides the best performance compared to other models.
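The abstract does not spell out its three size reduction scenarios. As one hypothetical illustration of reducing a dataset while keeping it representative of the original distribution, the sketch below draws a stratified subsample that preserves each class's share. The function name, labels, and fraction are assumptions made for this example only.

```python
import random
from collections import Counter, defaultdict
from typing import Hashable, List, Sequence


def stratified_subsample(
    labels: Sequence[Hashable], fraction: float, seed: int = 0
) -> List[int]:
    """Return sorted indices of a subsample that keeps class proportions.

    Each class contributes round(class_size * fraction) items, with at
    least one item per class so that no class disappears entirely.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    chosen: List[int] = []
    for indices in by_class.values():
        keep = max(1, round(len(indices) * fraction))
        chosen.extend(rng.sample(indices, keep))
    return sorted(chosen)


# Hypothetical imbalanced label vector: 20% "sick", 80% "healthy".
labels = ["sick"] * 20 + ["healthy"] * 80
subset = stratified_subsample(labels, fraction=0.25)
print(Counter(labels[i] for i in subset))
```

A plain random subsample of the same size could shift the class balance by chance, which is one way a smaller dataset can stop representing the original distribution.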

A comprehensive investigation of the impact of feature selection techniques on crashing fault residence prediction models

Information and Software Technology ◽ 10.1016/j.infsof.2021.106652 ◽ 2021 ◽ pp. 106652
Author(s): Kunsong Zhao ◽ Zhou Xu ◽ Meng Yan ◽ Tao Zhang ◽ Dan Yang ◽ ...
Keyword(s): Feature Selection ◽ Prediction Models ◽ Comprehensive Investigation ◽ The Impact ◽ Feature Selection Techniques

Examining the Impact of Feature Selection on Sentiment Analysis for the Greek Language

Sentiment Analysis of Japanese Tourism Online Reviews