The Accuracy Improvement of Text Mining Classification on Hospital Review through The Alteration in The Preprocessing Stage

Triyas Hevianto Saputro; Arief Hermawan

doi:10.24203/ijcit.v10i4.138

International Journal of Computer and Information Technology (2279-0764) ◽

2021 ◽

Vol 10 (4) ◽

Author(s):

Triyas Hevianto Saputro ◽

Arief Hermawan

Keyword(s):

Machine Learning ◽

Text Mining ◽

Sentiment Analysis ◽

Text Classification ◽

Classification Model ◽

Training Process ◽

Accuracy Improvement ◽

Spelling Correction ◽

Preprocessing Technique ◽

Selection Of

Sentiment analysis is a branch of text mining used to extract information from a sentence or document. This study focuses on text classification for sentiment analysis of hospital reviews, using the criticism and suggestions that customers post on Google Maps Review. The collected texts still contain many nonstandard words, which cause problems in the preprocessing stage. The selection and combination of preprocessing techniques is therefore crucial for improving the accuracy of the machine learning computation. However, not every preprocessing technique helps improve classification accuracy. The objective of this study is to improve the accuracy of a classification model for sentiment analysis of customers' hospital reviews. Implementing a suitable combination of preprocessing techniques can produce a highly accurate classification model. This study experimented with several preprocessing techniques: (1) tokenization, (2) case folding, (3) stop word removal, (4) stemming, and (5) removal of punctuation and numbers. The experiment then added two further preprocessing methods: (1) spelling correction and (2) the Slang method. The results show that spelling correction and the Slang method help improve accuracy. Furthermore, selecting a suitable combination of preprocessing techniques can speed up the training process and produce a better text classification model.
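As a rough illustration of how such a preprocessing chain might be assembled, the Python sketch below applies the techniques named in the abstract to a single English review. The lookup tables (`SLANG`, `STOP_WORDS`, `VOCABULARY`, `SUFFIXES`), the function names, the order of the steps, and the sample sentence are all assumptions made for this example; the abstract does not state the study's actual resources, language, or ordering. Spelling correction is approximated with closest-match lookup from the standard library's `difflib`, and stemming with a naive suffix stripper. Both are stand-ins, not the authors' methods.

```python
import difflib
import re

# Hypothetical toy resources, for illustration only.
SLANG = {"gr8": "great", "thx": "thanks", "u": "you"}
STOP_WORDS = {"the", "a", "is", "was", "and", "to", "you"}
VOCABULARY = ["doctor", "nurse", "service", "friendly", "great", "thanks", "slow"]
SUFFIXES = ("ing", "ly", "s")


def tokenize(text):
    # Case folding, then keep alphanumeric runs, which drops punctuation.
    return re.findall(r"[a-z0-9]+", text.lower())


def normalize_slang(tokens):
    # Replace known slang tokens with their standard form.
    return [SLANG.get(t, t) for t in tokens]


def remove_numbers(tokens):
    # Drop tokens made up only of digits.
    return [t for t in tokens if not t.isdigit()]


def correct_spelling(tokens):
    # Map each unknown token to its closest vocabulary word, if one is close enough.
    corrected = []
    for t in tokens:
        if t in VOCABULARY or t in STOP_WORDS:
            corrected.append(t)
            continue
        match = difflib.get_close_matches(t, VOCABULARY, n=1, cutoff=0.8)
        corrected.append(match[0] if match else t)
    return corrected


def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]


def stem(tokens):
    # Naive stemmer: strip the first matching suffix, keeping at least 3 characters.
    stemmed = []
    for t in tokens:
        for suffix in SUFFIXES:
            if t.endswith(suffix) and len(t) - len(suffix) >= 3:
                t = t[: -len(suffix)]
                break
        stemmed.append(t)
    return stemmed


def preprocess(text):
    tokens = tokenize(text)
    tokens = normalize_slang(tokens)
    tokens = remove_numbers(tokens)
    tokens = correct_spelling(tokens)
    tokens = remove_stop_words(tokens)
    return stem(tokens)


print(preprocess("The docter was gr8 and friendly, thx! Room 12."))
# ['doctor', 'great', 'friend', 'thank', 'room']
```

Keeping each technique in its own function makes it easy to switch individual steps on or off and compare the resulting classification accuracy, which is the kind of comparison the abstract describes.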

Text Classification for Organizational Researchers

Organizational Research Methods ◽

10.1177/1094428117719322 ◽

2017 ◽

Vol 21 (3) ◽

pp. 766-799 ◽

Cited By ~ 18

Author(s):

Vladimer B. Kobayashi ◽

Stefan T. Mol ◽

Hannah A. Berkers ◽

Gábor Kismihók ◽

Deanne N. Den Hartog

Keyword(s):

Machine Learning ◽

Text Mining ◽

Text Classification ◽

Training Data ◽

Classification Model ◽

Data Preparation ◽

Organizational Research ◽

Job Vacancy ◽

Text Classifiers ◽

Effective Use

Organizations are increasingly interested in classifying texts or parts thereof into categories, as this enables more effective use of their information. Manual procedures for text classification work well for up to a few hundred documents. However, when the number of documents is larger, manual procedures become laborious, time-consuming, and potentially unreliable. Techniques from text mining facilitate the automatic assignment of text strings to categories, making classification expedient, fast, and reliable, which creates potential for its application in organizational research. The purpose of this article is to familiarize organizational researchers with text mining techniques from machine learning and statistics. We describe the text classification process in several roughly sequential steps, namely training data preparation, preprocessing, transformation, application of classification techniques, and validation, and provide concrete recommendations at each step. To help researchers develop their own text classifiers, the R code associated with each step is presented in a tutorial. The tutorial draws from our own work on job vacancy mining. We end the article by discussing how researchers can validate a text classification model and the associated output.
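The sequence of steps listed in this abstract can be sketched in a few lines of Python. The article's own tutorial uses R; the code below is not that tutorial but a minimal stand-in, assuming a bag-of-words transformation, a multinomial Naive Bayes classifier with add-one smoothing, and a tiny hold-out set for validation. The job-vacancy snippets, labels, and function names are invented for the example.

```python
import math
from collections import Counter, defaultdict

# Step 1: training data preparation (hypothetical labeled vacancy snippets).
train_set = [
    ("develop software in python", "it"),
    ("maintain database servers", "it"),
    ("care for hospital patients", "health"),
    ("assist nurses and doctors", "health"),
]
test_set = [
    ("python software engineer", "it"),
    ("hospital nurses wanted", "health"),
]


def transform(text):
    # Steps 2 and 3: lowercase, split on whitespace, count words (bag of words).
    return Counter(text.lower().split())


def train(labeled_texts):
    # Step 4: collect the counts that a multinomial Naive Bayes classifier needs.
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_texts:
        class_docs[label] += 1
        bag = transform(text)
        word_counts[label].update(bag)
        vocabulary.update(bag)
    return class_docs, word_counts, vocabulary


def predict(model, text):
    class_docs, word_counts, vocabulary = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        total_words = sum(word_counts[label].values())
        # Log prior plus log likelihoods with add-one (Laplace) smoothing.
        score = math.log(class_docs[label] / total_docs)
        for word, count in transform(text).items():
            if word not in vocabulary:
                continue  # words never seen in training are ignored
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += count * math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, labeled_texts):
    # Step 5: validation on held-out examples.
    hits = sum(predict(model, text) == label for text, label in labeled_texts)
    return hits / len(labeled_texts)


model = train(train_set)
print(accuracy(model, test_set))
```

With realistic data the hold-out set would be far larger, and accuracy alone may be a misleading measure when the categories are imbalanced.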

Text Mining Based Approach to Customer Sentiment Analysis Using Machine Learning

Journal of Advances and Scholarly Researches in Allied Education ◽

10.29070/15/57680 ◽

2018 ◽

Vol 15 (6) ◽

pp. 58-65

Author(s):

Gurjeet Kaur

Keyword(s):

Machine Learning ◽

Text Mining ◽

Sentiment Analysis

Deep Learning for text in limited data settings

10.36227/techrxiv.12100692 ◽

2020 ◽

Author(s):

Pathikkumar Patel ◽

Bhargav Lad ◽

Jinan Fiaidhi

Keyword(s):

Machine Learning ◽

Time Series ◽

Deep Learning ◽

Sentiment Analysis ◽

Transfer Learning ◽

Text Classification ◽

State Of The Art ◽

Time Series Forecasting ◽

Text Data ◽

Performance Levels

During the last few years, RNN models have been used extensively and have proven effective for sequence and text data. RNNs have achieved state-of-the-art performance levels in several applications such as text classification, sequence-to-sequence modelling, and time series forecasting. In this article, we review different machine learning and deep learning based approaches for text data and look at the results obtained with these methods. This work also explores the use of transfer learning in NLP and how it affects model performance on one specific application, sentiment analysis.

Detection of Economy-Related Turkish Tweets Based on Machine Learning Approaches

10.4018/978-1-7998-8413-2.ch008 ◽

2022 ◽

pp. 171-195

Author(s):

Jale Bektaş

Keyword(s):

Machine Learning ◽

Text Mining ◽

Text Classification ◽

Integration Method ◽

Classification Problem ◽

Feature Representation ◽

Learning Approaches ◽

Machine Learning Methods ◽

Linguistic Approach ◽

Turkish Language

Conducting NLP for Turkish is considerably harder than for other Latin-alphabet languages such as English. In this study, a preprocessing framework is built with text mining techniques, in which TF-IDF values are calculated in accordance with a linguistic approach on 7,731 tweets shared by 13 famous economists in Turkey, retrieved from Twitter. The classification results of four common machine learning methods (SVM, Naive Bayes, LR, and an integration of LR with SVM) are then compared. The TF-IDF features are tested with different N-grams. The findings show that the success of a text classification problem depends on the feature representation method, and that SVM outperforms the other ML methods with unigram feature representation. The best results are obtained with the integration of SVM with LR, with an accuracy (Acc) of 82.9%. These results show that these methodologies are satisfactory for the Turkish language.
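To make the feature representation concrete, the Python sketch below computes TF-IDF weights over word N-grams. It assumes the plain variant, term frequency multiplied by log(N / document frequency), which is only one of several common formulations; the abstract does not say which one the study used. The function names and the two English sample documents are invented for the example and stand in for tokenized tweets.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    # Join each run of n consecutive tokens into one feature string.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(documents, n=1):
    # documents: list of token lists; returns one {ngram: weight} dict per document.
    term_counts = [Counter(ngrams(doc, n)) for doc in documents]

    # Document frequency: in how many documents each n-gram occurs.
    doc_freq = Counter()
    for counts in term_counts:
        doc_freq.update(counts.keys())

    total_docs = len(documents)
    weights = []
    for counts in term_counts:
        length = sum(counts.values())
        weights.append({
            term: (count / length) * math.log(total_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = [
    ["interest", "rates", "rise"],
    ["interest", "rates", "fall"],
]
unigram_weights = tfidf(docs, n=1)
bigram_weights = tfidf(docs, n=2)
print(unigram_weights[0])
print(bigram_weights[0])
```

N-grams that occur in every document receive a weight of zero under this variant, so only the distinguishing terms carry signal into a classifier such as SVM or LR.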

Twitter sentiment analysis for the estimation of voting intention in the 2017 Chilean elections

Intelligent Data Analysis ◽

10.3233/ida-194768 ◽

2020 ◽

Vol 24 (5) ◽

pp. 1141-1160

Author(s):

Tomás Alegre Sepúlveda ◽

Brian Keith Norambuena

Keyword(s):

Machine Learning ◽

Support Vector Machines ◽

Sentiment Analysis ◽

Classification Model ◽

Machine Learning Techniques ◽

Support Vector ◽

Traditional Methods ◽

Actual Result ◽

Learning Techniques ◽

Vector Machines

In this paper, we apply sentiment analysis methods in the context of the first round of the 2017 Chilean elections. The purpose of this work is to estimate the voting intention associated with each candidate in order to contrast this with the results from classical methods (e.g., polls and surveys). The data are collected from Twitter, because of its high usage in Chile and in the sentiment analysis literature. We obtained tweets associated with the three main candidates: Sebastián Piñera (SP), Alejandro Guillier (AG) and Beatriz Sánchez (BS). For each candidate, we estimated the voting intention and compared it to the traditional methods. To do this, we first acquired the data and labeled the tweets as positive or negative. Afterward, we built a model using machine learning techniques. The classification model had an accuracy of 76.45% using support vector machines, which yielded the best model for our case. Finally, we use a formula to estimate the voting intention from the number of positive and negative tweets for each candidate. For the last period, we obtained a voting intention of 35.84% for SP, compared to a range of 34–44% according to traditional polls and 36% in the actual elections. For AG we obtained an estimate of 37%, compared with a range of 15.40% to 30.00% for traditional polls and 20.27% in the elections. For BS we obtained an estimate of 27.77%, compared with the range of 8.50% to 11.00% given by traditional polls and an actual result of 22.70% in the elections. These results are promising, in some cases providing an estimate closer to reality than traditional polls. Some differences can be explained by the fact that some candidates were omitted, even though they held a significant number of votes.

Optimizing the Performance of Machine Learning Based Traffic Classification

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.756-759.3506 ◽

2013 ◽

Vol 756-759 ◽

pp. 3506-3510

Author(s):

Qiu Chen Wang ◽

Lei Wang ◽

Ji Xiang

Keyword(s):

Machine Learning ◽

Real Life ◽

Classification Model ◽

Traffic Classification ◽

Traffic Flows ◽

Accuracy Improvement ◽

Security Monitoring ◽

Performance Improvements ◽

Critical Technology ◽

Critical Issues

Traffic classification is a critical technology in the areas of network management and security monitoring. Traditional port-based and payload-based classification are no longer effective due to the fact that many applications utilize unpredictable port numbers and packet encryption. Researchers tend to apply machine learning (ML) techniques to identify the traffic flows by recognizing statistical features. Unfortunately, looking back upon the related work, most of the ML-based classification algorithms have similar performance, and what really matters now is how to optimize these techniques. In this paper, we analyzed two critical issues (Feature Selection, Configuration of Parameters) of ML classification, and presented the corresponding viable methods to optimize the classification model. This paper also reported the experimental evaluation to assess the performance improvements introduced by our optimized methods; experimental results on real-life datasets and network traffic show that the classification model successfully achieves significant accuracy improvement.

PERFECTION OF CLASSIFICATION ACCURACY IN TEXT CATEGORIZATION

International Journal of Advanced Research ◽

10.21474/ijar01/13437 ◽

2021 ◽

Vol 9 (09) ◽

pp. 484-488

Author(s):

Rajeev Tripathi

Keyword(s):

Sentiment Analysis ◽

Text Classification ◽

Classification Accuracy ◽

Text Categorization ◽

Classification Model ◽

Text Data ◽

Twitter Data ◽

Long Time ◽

Google Alerts ◽

Email Spam

Problems and strategies for text classification have been known for a long time. They're widely used by companies like Google and Yahoo for email spam screening, sentiment analysis of Twitter data, and automatic news categorization in Google Alerts. We're still working on making the results as accurate as possible. When dealing with large amounts of text data, however, the model's performance and accuracy become a challenge. The type of words used in the corpus and the type of features produced for classification have a big impact on the performance of a text classification model.

A REVIEW ON SENTIMENT ANALYSIS OF SOCIAL MEDIA DATA USING TEXT MINING AND MACHINE LEARNING.

International Journal of Advanced Research ◽

10.21474/ijar01/526 ◽

2016 ◽

Vol 4 (5) ◽

pp. 772-775

Author(s):

GURPREET KAUR ◽

MANOJ KUMAR

Keyword(s):

Machine Learning ◽

Social Media ◽

Text Mining ◽

Sentiment Analysis ◽

Social Media Data ◽

Media Data

A Comprehensive Analysis of Approaches for Sentiment Analysis Using Twitter Data on COVID-19 Vaccines

Journal of Informatics Electrical and Electronics Engineering (JIEEE) ◽

10.54060/jieee/002.02.009 ◽

2021 ◽

Vol 2 (2) ◽

pp. 1-10

Author(s):

Amrita Mishra

Keyword(s):

Machine Learning ◽

Social Media ◽

Sentiment Analysis ◽

Text Classification ◽

Comprehensive Analysis ◽

Social Media Data ◽

The Public ◽

Opinion Analysis ◽

Twitter Data ◽

Media Data

Sentiment Analysis (SA) has paved the way for opinion analysis of the masses across unrestricted territorial limits. With the advent and growth of social media such as Twitter, Facebook, WhatsApp, and Snapchat, stakeholders and the public often take to these platforms to express their opinions and draw conclusions. While these social media data are extremely informative and well connected, the major challenge lies in incorporating efficient Text Classification strategies that not only overcome the unstructured and voluminous nature of the data but also generate the correct polarity of opinions (i.e., positive, negative, and neutral). This paper provides a brief study of various approaches to SA, including Machine Learning, Lexicon Based, and Automatic Approaches. The paper also compares positive, negative, and neutral tweets about the Sputnik V, Moderna, and Covaxin vaccines used for prevention and emergency use against COVID-19.

HMATC: Hierarchical multi-label Arabic text classification model using machine learning

Egyptian Informatics Journal ◽

10.1016/j.eij.2020.08.004 ◽

2020 ◽

Author(s):

Nawal Aljedani ◽

Reem Alotaibi ◽

Mounira Taileb

Keyword(s):

Machine Learning ◽

Text Classification ◽

Classification Model ◽

Arabic Text ◽

Arabic Text Classification
