Ensemble of Classifiers and Term Weighting Schemes for Sentiment Analysis in Turkish

Aytuğ Onan

doi:10.52460/src.2021.004


Mapping Intimacies ◽

10.52460/src.2021.004 ◽

2021 ◽

Vol 1 (1) ◽

pp. 1-12

Author(s):

Aytuğ Onan ◽

Keyword(s):

Sentiment Analysis ◽

Language Processing ◽

Nearest Neighbor ◽

Text Messages ◽

Support Vector ◽

K Nearest Neighbor ◽

Term Weighting ◽

Text Documents ◽

Weighting Schemes ◽

Short Text

With the advancement of information and communication technology, social networking and microblogging sites have become a vital source of information. Individuals use them to express their opinions, grievances, feelings, and attitudes about a variety of topics, including current events and products. Sentiment analysis is a significant area of research in natural language processing because it aims to identify the orientation of the sentiment contained in source materials. Twitter is one of the most popular microblogging sites on the internet, with millions of users publishing over one hundred million text messages (referred to as tweets) daily. For such short text messages, choosing an appropriate term representation scheme is critical, and term weighting schemes are an important way to represent text documents in the vector space model. In this paper, we present a comprehensive analysis of Turkish sentiment analysis using nine supervised and unsupervised term weighting schemes. The predictive performance of the term weighting schemes is investigated using four supervised learning algorithms (Naive Bayes, support vector machines, the k-nearest neighbor algorithm, and logistic regression) and three ensemble learning methods (AdaBoost, Bagging, and Random Subspace). The empirical results suggest that supervised term weighting models can outperform unsupervised term weighting models.
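The abstract does not name the nine weighting schemes, so the following is only an illustration of the supervised/unsupervised distinction, not the paper's implementation. It contrasts TF-IDF, which ignores class labels, with TF-RF (term frequency times relevance frequency), a supervised scheme that rewards terms concentrated in the positive class. The function names, the `log2(2 + a / max(1, c))` form of relevance frequency, and the toy Turkish tokens are assumptions for this sketch.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Unsupervised weighting: tf * ln(N / df). Class labels are never consulted."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: tf * math.log(n / df[t]) for t, tf in Counter(doc).items()} for doc in docs]


def tf_rf(docs, labels, positive="pos"):
    """Supervised weighting (relevance frequency): tf * log2(2 + a / max(1, c)),
    where a and c count positive and non-positive documents containing the term."""
    a, c = Counter(), Counter()
    for doc, label in zip(docs, labels):
        (a if label == positive else c).update(set(doc))
    return [
        {t: tf * math.log2(2 + a[t] / max(1, c[t])) for t, tf in Counter(doc).items()}
        for doc in docs
    ]


# Hypothetical toy corpus of tokenized tweets (not data from the paper).
docs = [["güzel", "film"], ["kötü", "film"], ["güzel", "güzel", "oyun"]]
labels = ["pos", "neg", "pos"]
print(tf_idf(docs)[0])
print(tf_rf(docs, labels)[0])
```

In the toy corpus, TF-IDF gives "güzel" and "film" the same weight in the first tweet because both occur in two of the three documents, while TF-RF gives "güzel" the higher weight because it appears only in positive tweets. For a multi-class task, relevance frequency would be computed once per class.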


Supervised Machine Learning Algorithms for Sentiment Analysis of Bangla Newspaper

International Journal of Innovative Computing ◽

10.11113/ijic.v11n2.321 ◽

2021 ◽

Vol 11 (2) ◽

pp. 15-23

Author(s):

Sabrina Jahan Maisha ◽

Nuren Nafisa ◽

Abdul Kadar Muhammad Masum

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Language Processing ◽

Nearest Neighbor ◽

Online News ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Support Vector ◽

K Nearest Neighbor ◽

Aged People

Bangla is undoubtedly rich enough to support a wide range of Natural Language Processing (NLP) tasks. Although it deserves proper attention, it has hardly been explored in the NLP field. In this age of digitalization, a large amount of Bangla news content is generated on online platforms, and some of this content is inappropriate for children or aged people. Motivated by the need to filter news content easily, this work performs document-level sentiment analysis (SA) on Bangla online news. The dataset was created by collecting news from an online Bangla newspaper archive, and the documents were manually annotated into positive and negative classes. A composite “Pipeline” that chains a Count Vectorizer, a TF-IDF transformer, and a machine learning (ML) classifier is used to extract features and train on the dataset. Six supervised ML classifiers (Multinomial Naive Bayes (MNB), K-Nearest Neighbor (K-NN), Random Forest (RF), C4.5 Decision Tree (DT), Logistic Regression (LR), and Linear Support Vector Machine (LSVM)) are compared to find the best classifier for the proposed model. Very few works have addressed SA of Bangla news, so this work is a small attempt to contribute to the field. The model performed well under both percentage-split validation and 10-fold cross-validation. Among the six classifiers, RF outperformed the others with 99% accuracy. LSVM showed the lowest accuracy, 80%, which is still considered a good result. The model also performed well on recent and critical Bangla news, indicating that the extracted features are suitable for building the model.
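The abstract reports results under both a percentage split and 10-fold cross-validation but does not give the split ratio. A minimal standard-library sketch of how the two protocols partition example indices follows; the 80/20 ratio, the fixed seed, and the function names are assumptions, and the folds are not stratified by class.

```python
import random


def percentage_split(n, train_fraction=0.8, seed=0):
    """Shuffle indices 0..n-1 and cut them into train/test lists at a fixed percentage."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(n * train_fraction)
    return idx[:cut], idx[cut:]


def k_fold(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.
    Every index lands in exactly one test fold; fold sizes differ by at most one."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


train, test = percentage_split(25, train_fraction=0.8)
print(len(train), len(test))
print([len(test) for _, test in k_fold(25, k=10)])
```

Each fold's test indices would be used to score a classifier trained on the matching train indices, and the ten scores averaged; a stratified variant would shuffle within each class before assigning folds.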


Aspect Term Extraction for Aspect Based Opinion Mining

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.k2050.0981119 ◽

2019 ◽

Vol 8 (11) ◽

pp. 2228-2233

Keyword(s):

Support Vector Machine ◽

Sentiment Analysis ◽

Random Fields ◽

Opinion Mining ◽

Nearest Neighbor ◽

Conditional Random Fields ◽

International Workshop ◽

Support Vector ◽

K Nearest Neighbor ◽

Term Extraction

Opinion Mining (OM) is also called Sentiment Analysis (SA), and Aspect Based Opinion Mining (ABOM) is also called Aspect Based Sentiment Analysis (ABSA). In this paper, three new features are proposed for extracting aspect terms in ABSA. The influence of the proposed features is evaluated on five classifiers: Decision Tree (DT), Naive Bayes (NB), K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Conditional Random Fields (CRF). The features are evaluated on two datasets, from the restaurant and laptop domains, released for the International Workshop on Semantic Evaluation 2014 (SemEval 2014), using precision, recall, and F1 measures. The proposed features have a strong influence on aspect term extraction across the classifiers. With the proposed features, the SVM and CRF classifiers perform better at aspect term extraction than the NB, DT, and KNN classifiers.
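The abstract does not say how aspect terms are labelled or how precision, recall, and F1 are computed. One common framing for token-level classifiers such as CRF or SVM, assumed here, is BIO tagging (B begins an aspect term, I continues it, O is outside), scored by exact span match. The sketch below decodes tags into spans and computes the three measures; the example sentence and tags are hypothetical.

```python
def bio_to_spans(tags):
    """Convert a BIO tag sequence into (start, end) token spans, end exclusive.
    An 'I' with no open span is treated as starting a new span."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            if start is not None:
                spans.append((start, i))  # close the previous span
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i))
            start = None
        # an 'I' inside an open span simply extends it
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def span_prf(gold, pred):
    """Exact-match precision, recall and F1 over sets of spans."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


tokens = ["The", "battery", "life", "is", "great", "but", "the", "screen", "flickers"]
gold = ["O", "B", "I", "O", "O", "O", "O", "B", "O"]
pred = ["O", "B", "O", "O", "O", "O", "O", "B", "O"]
print([" ".join(tokens[s:e]) for s, e in bio_to_spans(pred)])
print(span_prf(bio_to_spans(gold), bio_to_spans(pred)))
```

Under exact matching, predicting only "battery" for the gold term "battery life" counts as both a false positive and a false negative, which is why precision and recall are both 0.5 here.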


Generalized approach to sentiment analysis of short text messages in natural language processing

Information and Control Systems ◽

10.31799/1684-8853-2020-1-2-14 ◽

2020 ◽

pp. 2-14

Author(s):

Evrenii Polyakov ◽

Leonid Voskov ◽

Pavel Abramov ◽

Sergey Polyakov

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Data Analysis ◽

Natural Language ◽

Sentiment Analysis ◽

Language Processing ◽

Full Range ◽

Text Messages ◽

Basic Solution ◽

Short Text

Introduction: Sentiment analysis is a complex problem whose solution essentially depends on the context, the field of study, and the amount of text data. Analysis of publications shows that authors often do not use the full range of possible data transformations and their combinations. Using only a subset of the transformations limits the ways to develop high-quality classification models. Purpose: To develop and explore a generalized approach to building a model, which consists of sequentially passing through the stages of exploratory data analysis, obtaining a basic solution, vectorization, preprocessing, hyperparameter optimization, and modeling. Results: Comparative experiments using the generalized approach with classical machine learning and deep learning algorithms for sentiment analysis of short text messages in natural language processing have demonstrated that classification quality grows from one stage to the next. For classical algorithms, the increase in quality was insignificant, but for deep learning it averaged 8% at each stage. Additional studies have shown that automatic machine learning using classical classification algorithms is comparable in quality to manual model development, but it takes much longer. The use of transfer learning has a small but positive effect on classification quality. Practical relevance: The proposed sequential approach can significantly improve the quality of models developed for natural language processing problems.
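The hyperparameter optimization stage can be illustrated with an exhaustive grid search. In this sketch, `toy_validation_score` is a hypothetical stand-in for training a model and returning its validation quality, and the parameter names `C` and `ngram_max` are illustrative assumptions rather than settings from the paper.

```python
from itertools import product


def grid_search(score_fn, grid):
    """Evaluate score_fn on every combination in `grid` (a dict of name -> candidate values)
    and return (best_params, best_score). Ties keep the first combination found."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical stand-in for "train with these settings and return validation accuracy".
def toy_validation_score(C, ngram_max):
    return 0.80 - abs(C - 1.0) * 0.05 + (0.02 if ngram_max == 2 else 0.0)


best, score = grid_search(toy_validation_score, {"C": [0.1, 1.0, 10.0], "ngram_max": [1, 2, 3]})
print(best, round(score, 3))
```

Exhaustive search is the simplest option but its cost grows multiplicatively with each added parameter; random or Bayesian search would slot into the same stage.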


Twitter sentiment analysis of the relocation of Indonesia's capital city

Bulletin of Electrical Engineering and Informatics ◽

10.11591/eei.v9i4.2352 ◽

2020 ◽

Vol 9 (4) ◽

pp. 1620-1630

Author(s):

Edi Sutoyo ◽

Ahmad Almaarif

Keyword(s):

Support Vector Machine ◽

Sentiment Analysis ◽

Nearest Neighbor ◽

Point Of View ◽

Capital City ◽

Support Vector ◽

K Nearest Neighbor ◽

Evaluation Algorithm ◽

The Government ◽

The Many

Indonesia's capital, Jakarta, is one of the many big cities in the world. Jakarta plays a central role in Indonesia's dynamics because it is the political and government center as well as the business and economic center that drives the economy. The government's recent discourse on relocating the capital city has drawn various reactions from the public. This study therefore carried out sentiment analysis of the capital city relocation. Public sentiment was described by classifying Twitter data into two classes: positive and negative. The algorithms used were the Naïve Bayes classifier, logistic regression, support vector machine, and K-nearest neighbor. In the performance evaluation, the support vector machine outperformed the other three algorithms, with accuracy, precision, recall, and F-measure of 97.72%, 96.01%, 99.18%, and 97.57%, respectively. Sentiment analysis of the relocation discourse is expected to give the government an overview of public opinion as expressed in social media data.
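The four reported measures follow from a binary confusion matrix. The sketch below computes them for the positive class and, as a consistency check, recomputes the F-measure as the harmonic mean of the reported precision (96.01%) and recall (99.18%), which rounds to the reported 97.57%. The function names and the toy counts are assumptions.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def binary_metrics(tp, fp, fn, tn):
    """Accuracy, plus precision, recall and F-measure for the positive class."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall, f_measure(precision, recall)


# Consistency check on the reported SVM precision and recall.
print(round(100 * f_measure(0.9601, 0.9918), 2))

# Hypothetical confusion-matrix counts, for illustration only.
print(binary_metrics(tp=8, fp=2, fn=1, tn=9))
```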


Evaluating Annotated Dataset of Customer Reviews for Aspect Based Sentiment Analysis

Journal of Web Engineering ◽

10.13052/jwe1540-9589.2122 ◽

2021 ◽

Author(s):

Dimple Chehal ◽

Parul Gupta ◽

Payal Gulati

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Sentiment Analysis ◽

Nearest Neighbor ◽

Supervised Machine Learning ◽

Support Vector ◽

Product Reviews ◽

K Nearest Neighbor ◽

Customer Reviews ◽

Percent Accuracy

Sentiment analysis of product reviews on e-commerce platforms helps determine customer preferences. Aspect-based sentiment analysis (ABSA) identifies the contributing aspects and their corresponding polarity, allowing a more detailed analysis of the customer’s inclination toward product aspects. This analysis supports the transition from the traditional rating-based recommendation process to an improved aspect-based process. Automating ABSA requires a labelled dataset to train a supervised machine learning model. Because such datasets are scarce owing to the human effort involved, an annotated dataset is provided here for performing ABSA on customer reviews of mobile phones. The dataset, comprising product reviews of the Apple iPhone 11, has been manually annotated with predefined aspect categories and aspect sentiments. The dataset’s accuracy has been validated using state-of-the-art machine learning techniques: Naïve Bayes, Support Vector Machine, Logistic Regression, Random Forest, K-Nearest Neighbor, and a Multi-Layer Perceptron (MLP) built as a sequential model with the Keras API. For classifying review text into aspect categories, the MLP built with the Keras Sequential API produced the most accurate result, with 67.45 percent accuracy, while K-nearest neighbor performed the worst, with only 49.92 percent accuracy. For classifying review text into aspect sentiments, the Support Vector Machine had the highest accuracy, 79.46 percent, and the Keras model had the lowest, 76.30 percent. The contribution is intended as a benchmark dataset for ABSA of mobile phone reviews.
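Each annotated review pairs a predefined aspect category with an aspect sentiment. A minimal sketch of that record structure, a per-category sentiment tally, and a majority-class baseline (a useful floor when reading accuracies such as 49.92 or 67.45 percent) follows. The field names, category names, and label strings are hypothetical; the abstract does not list the dataset's categories.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AspectAnnotation:
    review_id: int
    category: str   # predefined aspect category (names below are hypothetical)
    sentiment: str  # aspect sentiment label (label strings are assumed)


def sentiment_by_category(annotations):
    """Tally sentiment labels per aspect category."""
    table = defaultdict(Counter)
    for ann in annotations:
        table[ann.category][ann.sentiment] += 1
    return table


def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most frequent label."""
    if not labels:
        return 0.0
    return max(Counter(labels).values()) / len(labels)


anns = [
    AspectAnnotation(1, "battery", "positive"),
    AspectAnnotation(1, "camera", "positive"),
    AspectAnnotation(2, "battery", "negative"),
    AspectAnnotation(3, "battery", "positive"),
]
print(dict(sentiment_by_category(anns)))
print(majority_baseline_accuracy([a.category for a in anns]))
```

A classifier whose accuracy falls near the majority baseline for the same label distribution has learned little beyond class frequencies, so reporting the baseline alongside benchmark results makes them easier to interpret.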


Study on XML Retrieval Results Classification

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.263-266.1773 ◽

2012 ◽

Vol 263-266 ◽

pp. 1773-1777

Author(s):

Hong Yu ◽

Xiao Lei Huang ◽

Zhi Ling Wei ◽

Chen Xia Yang

Keyword(s):

Nearest Neighbor ◽

Classification Performance ◽

Feedback Mechanism ◽

Support Vector ◽

Svm Classifier ◽

K Nearest Neighbor ◽

Xml Retrieval ◽

Text Documents ◽

Plain Text ◽

Xml Documents

Mining retrieval results (by classification or clustering) to serve the relevance feedback mechanism of a search engine is an important way to improve retrieval effectiveness. Unlike plain text documents, XML documents are semi-structured data, so classifying XML retrieval results can exploit the structural features of XML documents, such as tag paths and edges. We propose using a Support Vector Machine (SVM) classifier to classify XML retrieval results using both their content and structure features. We implemented the classification method on XML retrieval results based on the IEEE SC corpus. Compared with k-nearest neighbor (KNN) classification on the same dataset, SVM performed better. The experimental results also show that structure features, especially tag paths and edges, can significantly improve classification performance.
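The abstract names tag paths and edges as structure features but does not define them precisely. Assuming a tag path is the root-to-element sequence of tag names and an edge is a parent/child tag pair, both can be counted with Python's standard `xml.etree.ElementTree`; the function name and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def structure_features(xml_text):
    """Count tag paths (root-to-element tag sequences) and edges (parent/child tag pairs)."""
    root = ET.fromstring(xml_text)
    paths, edges = Counter(), Counter()

    def walk(elem, prefix):
        path = prefix + "/" + elem.tag
        paths[path] += 1
        for child in elem:
            edges[(elem.tag, child.tag)] += 1
            walk(child, path)

    walk(root, "")
    return paths, edges


doc = "<article><title>XML retrieval</title><sec><p>text</p><p>more</p></sec></article>"
paths, edges = structure_features(doc)
print(paths)
print(edges)
```

Counts like these could be given distinguishing prefixes (for example `path:/article/sec/p`) and appended to the content term features before training a classifier such as the SVM.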


Product Review Based Customer Sentiment Analysis using an Ensemble of mRMR and Forest Optimization Algorithm (FOA)

International Journal of Applied Metaheuristic Computing ◽

10.4018/ijamc.2022010107 ◽

2022 ◽

Vol 13 (1) ◽

pp. 0-0

Keyword(s):

Feature Selection ◽

Sentiment Analysis ◽

Optimization Algorithm ◽

Nearest Neighbor ◽

Hybrid Approach ◽

Support Vector ◽

K Nearest Neighbor ◽

Feature Selection Technique ◽

Feature Selection Problem

This research presents a feature selection approach for sentiment classification using an ensemble-based classifier. It is a hybrid of the minimum redundancy maximum relevance (mRMR) technique and the Forest Optimization Algorithm (FOA), i.e. mRMR-FOA-based feature selection. Before being applied to sentiment analysis, FOA was used as a feature selection technique on 10 classification datasets publicly available in the UCI machine learning repository, where classifiers such as k-Nearest Neighbor (k-NN), Support Vector Machine (SVM), and Naïve Bayes were used with the ensemble-based algorithm. mRMR-FOA then selects the significant features from Blitzer’s dataset (customer reviews of electronic products). Sentiment classification improved by 12 to 18%. The results are further improved by an ensemble of k-NN, NB, and SVM, which reaches an accuracy of 88.47% on the sentiment classification task.
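The FOA search itself is not sketched here. The following shows only the mRMR half: a greedy selection that, at each step, picks the feature whose mutual information with the labels, minus its mean mutual information with the already-selected features, is largest (the difference form of mRMR). Discrete feature columns, the function names, and the toy data are assumptions.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information, in nats, between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def mrmr(features, labels, k):
    """Greedy mRMR (difference form). `features` maps a name to a column of discrete values.
    Each step picks the feature maximizing relevance I(f; labels) minus the mean
    redundancy I(f; s) over already-selected features s."""
    relevance = {f: mutual_information(col, labels) for f, col in features.items()}
    selected, remaining = [], set(features)

    def score(f):
        if not selected:
            return relevance[f]
        redundancy = sum(mutual_information(features[f], features[s]) for s in selected)
        return relevance[f] - redundancy / len(selected)

    while remaining and len(selected) < k:
        best = max(sorted(remaining), key=score)  # sorted() makes tie-breaking deterministic
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical binary term-presence columns for eight reviews (1 = positive review).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "excellent": [1, 1, 0, 0, 0, 0, 0, 0],
    "excellent_copy": [1, 1, 0, 0, 0, 0, 0, 0],
    "love": [0, 0, 1, 1, 0, 0, 0, 0],
}
print(mrmr(features, labels, k=2))
```

All three toy columns are equally relevant to the labels, but once one copy of "excellent" is selected the other copy carries a large redundancy penalty, so the complementary "love" column is chosen instead.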


Multi - Class Document Classification: Effective and Systematized Method to Categorize Documents

International Journal of Scientific Research in Science Engineering and Technology ◽

10.32628/ijsrset207117 ◽

2020 ◽

pp. 118-123 ◽

Cited By ~ 1

Author(s):

Kaushika Pal ◽

Biraj V. Patel

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

English Language ◽

Nearest Neighbor ◽

Research Work ◽

Support Vector ◽

Indian Languages ◽

K Nearest Neighbor

A large section of the World Wide Web is full of documents and content: data, big data, formatted and unformatted data, and unstructured and unorganized data. We need an information infrastructure that makes this information useful and easily accessible as and when required. This research work combines Natural Language Processing and Machine Learning for content-based classification of documents. Natural Language Processing divides the problem of understanding an entire document at once into smaller chunks and yields only the useful tokens for feature extraction, a machine learning technique that creates a feature set used to train a classifier to predict the label of a new document and place it in the appropriate location. Machine learning, a subset of artificial intelligence, offers sophisticated algorithms such as Support Vector Machine, K-Nearest Neighbor, and Naïve Bayes, which work well for classifying content in many Indian languages and foreign languages. The model classifies documents with more than 70% accuracy for major Indian languages and more than 80% accuracy for English.
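The abstract lists SVM, KNN, and Naïve Bayes without further detail. As a sketch of multi-class content-based classification with one of them, here is a multinomial Naïve Bayes classifier with add-one smoothing. The whitespace tokenizer (a simplification, since Indian-language text usually needs a script-aware tokenizer), the class names, and the toy documents are assumptions.

```python
import math
import string
from collections import Counter, defaultdict


def tokenize(text):
    """Whitespace tokenization with ASCII punctuation stripped (a simplifying assumption)."""
    tokens = (t.strip(string.punctuation).lower() for t in text.split())
    return [t for t in tokens if t]


class MultinomialNB:
    """Multi-class multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, float("-inf")
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / n_docs)  # log prior
            for w in tokenize(text):
                if w in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Hypothetical toy training documents and category names.
texts = [
    "cricket match won by india",
    "election results announced today",
    "the team scored runs in the match",
    "parliament passed the election bill",
]
labels = ["sports", "politics", "sports", "politics"]
model = MultinomialNB().fit(texts, labels)
print(model.predict("India won the cricket match!"))
```

Working in log space avoids underflow when multiplying many small word probabilities, and adding more categories only requires more labelled documents, not code changes.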


Sentiment Analysis on Social Media using Machine Learning Approach

10.22541/au.163620143.37655829/v1 ◽

2021 ◽

Author(s):

Erick Omuya ◽

George Okeyo ◽

Michael Kimwele

Keyword(s):

Machine Learning ◽

Social Media ◽

Sentiment Analysis ◽

Language Processing ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

Learning Approach ◽

K Nearest Neighbor ◽

Machine Learning Approach

Social media has been embraced by many people as a convenient and official medium of communication. People write messages and attach images and videos on Twitter, Facebook, and other social media, which they share. These updates make social media a source of large amounts of sentiment-rich data. Sentiment analysis has been used to determine clients' opinions, for instance about a particular product or company. Knowledge-based and machine learning approaches are among the strategies used to analyze these sentiments. However, sentiment analysis performance is degraded by noise, the curse of dimensionality, the data domain, and the size of the data used for training and testing. This research aims to develop a sentiment analysis model in which dimensionality reduction and the use of different parts of speech improve performance. It uses natural language processing to filter, store, and perform sentiment analysis on social media data. The model is tested using the Naïve Bayes, Support Vector Machine, and K-Nearest Neighbor machine learning algorithms, and its performance is compared with that of two other sentiment analysis models. Experimental results show that the model improves sentiment analysis performance using machine learning techniques.
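The abstract does not specify which parts of speech are used or which dimensionality reduction method is applied. A minimal sketch under stated assumptions: tokens are already POS-tagged with Penn Treebank tags (the tagger itself is not shown), only adjectives, adverbs, and verbs are kept, and the vocabulary is then reduced by a minimum document frequency. All names and the example posts are hypothetical.

```python
from collections import Counter

# Assumed choice: keep adjectives (JJ*), adverbs (RB*) and verbs (VB*), Penn Treebank style.
KEEP_POS_PREFIXES = ("JJ", "RB", "VB")


def pos_filter(tagged_tokens, keep=KEEP_POS_PREFIXES):
    """Keep lowercased tokens whose POS tag starts with one of the `keep` prefixes.
    `tagged_tokens` is a list of (token, tag) pairs produced by some POS tagger (not shown)."""
    return [tok.lower() for tok, tag in tagged_tokens if tag.startswith(keep)]


def prune_vocabulary(docs, min_df=2):
    """Reduce dimensionality by keeping only terms that occur in at least `min_df` documents."""
    df = Counter(term for doc in docs for term in set(doc))
    return sorted(term for term, count in df.items() if count >= min_df)


def to_vectors(docs, vocab):
    """Binary term-presence vectors over the reduced vocabulary."""
    return [[int(term in present) for term in vocab] for present in map(set, docs)]


# Hypothetical, hand-tagged example posts.
tagged = [
    [("The", "DT"), ("phone", "NN"), ("is", "VBZ"), ("really", "RB"), ("good", "JJ")],
    [("Battery", "NN"), ("is", "VBZ"), ("bad", "JJ")],
    [("Really", "RB"), ("good", "JJ"), ("camera", "NN")],
]
docs = [pos_filter(sentence) for sentence in tagged]
vocab = prune_vocabulary(docs, min_df=2)
print(docs, vocab, to_vectors(docs, vocab))
```

Both steps shrink the feature space before a classifier such as Naïve Bayes, SVM, or KNN sees it; the price is that informative but rare words (here "bad") can be pruned away.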


Analisis Sentimen Masyarakat Terhadap COVID-19 Pada Media Sosial Twitter (Public Sentiment Analysis of COVID-19 on Twitter Social Media)

Journal of Dinda : Data Science, Information Technology, and Data Analytics ◽

10.20895/dinda.v1i1.180 ◽

2021 ◽

Vol 1 (1) ◽

pp. 42-51

Author(s):

Ardianne Luthfika Fairuz ◽

Rima Dias Ramadhani ◽

Nia Annisa Ferani Tanjung

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Sentiment Analysis ◽

Language Processing ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

K Nearest Neighbor

At the end of 2019, the world was shaken by the emergence of a disease caused by the SARS-CoV-2 virus, the newest type of coronavirus. The disease is known as COVID-19. It spread widely and rapidly, and within a short time it had reached every corner of the world, Indonesia included. The very high rate of transmission, together with the absence of a vaccine for COVID-19, caused turmoil in society and affected many sectors of public life. Many people are active on social media and write their views, opinions, and thoughts on platforms such as Twitter, and the pandemic prompted them to post their opinions, thoughts, and views about COVID-19 on Twitter. A sentiment analysis model is therefore needed to classify the public's tweets as positive or negative. Sentiment analysis is a part of Natural Language Processing that builds systems to recognize and extract opinions expressed in text. In this study, the Naive Bayes and K-Nearest Neighbor algorithms are used to build a sentiment analysis model of Twitter users' tweets about COVID-19. The Naïve Bayes algorithm achieved an accuracy of 85%, and the K-Nearest Neighbor algorithm achieved 82% with k = 6, 8, and 14.
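The reported K-Nearest Neighbor results use k = 6, 8, and 14. With an even k, two classes can receive equal votes, so some tie-breaking rule is needed. The sketch below classifies term-count vectors by cosine similarity and, as an assumption not taken from the paper, breaks ties by the larger summed similarity; the English toy tweets are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors (dicts or Counters)."""
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def knn_predict(query, train, k):
    """Majority vote among the k training examples most similar to `query`.
    `train` is a list of (vector, label) pairs. Vote ties (possible for even k)
    are broken by the larger summed similarity, an assumption of this sketch."""
    scored = sorted(((cosine(query, vec), label) for vec, label in train), reverse=True)[:k]
    votes, sim_sum = Counter(), Counter()
    for sim, label in scored:
        votes[label] += 1
        sim_sum[label] += sim
    return max(votes, key=lambda label: (votes[label], sim_sum[label]))


# Hypothetical toy tweets, tokenized by whitespace.
train = [
    (Counter("stay healthy and strong".split()), "positive"),
    (Counter("hope this pandemic ends soon".split()), "positive"),
    (Counter("scared the virus keeps spreading".split()), "negative"),
    (Counter("this pandemic is ruining everything".split()), "negative"),
]
query = Counter("hope the pandemic ends".split())
for k in (1, 2, 3, 4):
    print(k, knn_predict(query, train, k))
```

At k = 2 and k = 4 the votes split evenly and the tie-break decides the label, which is the situation that the even values k = 6, 8, and 14 can produce.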
