KNN classifier based approach for multi-class sentiment analysis of Twitter data

‘Sentiment’ literally means ‘emotions’. Sentiment analysis, synonymous with opinion mining, is a type of data mining that analyzes data obtained from microblogging sites, social media updates, online news reports, user reviews, etc., in order to study people's sentiments towards an event, organization, product, brand, person, etc. In this work, sentiments are classified into multiple classes. The proposed methodology, based on the KNN classification algorithm, shows an improvement over an existing methodology based on the SVM classification algorithm. The data used for analysis was taken from Twitter, the most popular microblogging site, and was extracted using Python’s Tweepy library. The N-gram modeling technique is used for feature extraction, and the supervised machine learning algorithm k-nearest neighbor (KNN) is used for sentiment classification. The performance of the proposed and existing techniques is compared in terms of accuracy, precision and recall, and the analysis concludes that the proposed technique performs better on all of these standard evaluation parameters.
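The two steps named above, n-gram feature extraction followed by k-nearest-neighbor voting, can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the whitespace tokenizer, the unigram-plus-bigram features, cosine similarity as the similarity measure, k = 3 and the six labelled tweets are all hypothetical choices, and the Tweepy collection step is omitted.

```python
from collections import Counter
import math

def ngrams(text, n_max=2):
    """Bag of word n-grams (unigrams up to n_max-grams) from whitespace tokens."""
    tokens = text.lower().split()
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats

def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_predict(train, text, k=3):
    """Label `text` by majority vote over its k most similar training tweets."""
    query = ngrams(text)
    scored = sorted(
        ((cosine(query, ngrams(tweet)), label) for tweet, label in train),
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy tweets for illustration only, not data from the paper.
TRAIN = [
    ("i love this phone", "positive"),
    ("great camera and great battery", "positive"),
    ("this is the worst service", "negative"),
    ("i hate the battery life", "negative"),
    ("the event starts at noon", "neutral"),
    ("the store opens on monday", "neutral"),
]

print(knn_predict(TRAIN, "i hate this service"))  # negative
```

With k = 3, two of the three nearest neighbours are negative tweets (sharing words such as "hate" and "service"), so they outvote the single positive neighbour. In practice k, the n-gram range and the similarity measure would be tuned on held-out data.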

Classification Approach for Sentiment Analysis Using Machine Learning

Applications of Artificial Neural Networks for Nonlinear Data - Advances in Computational Intelligence and Robotics; DOI: 10.4018/978-1-7998-4042-8.ch005; 2021; pp. 94-115

Author(s): Satyen M. Parikh, Mitali K. Shah

Keyword(s): Machine Learning, Sentiment Analysis, Language Processing, Nearest Neighbor, Learning Algorithm, Supervised Machine Learning, Computational Semantics, K Nearest Neighbor, Modeling Methodology, N Gram

Natural language processing (NLP) is an application of computational semantics. Any opinion expressed through attitudes, feelings, or thoughts can be identified as sentiment, and sentiment analysis recognizes people's views of specific events, brands, things, or organizations. Sentiments can be grouped into three separate categories: positive, negative, and neutral. Twitter, the most commonly used microblogging tool, is used to gather data for this research, and Tweepy is used to access Twitter's data. The classification algorithms are run on the collected data in Python. Sentiment analysis is performed in two steps, namely feature extraction and classification. Features are extracted using the n-gram modeling methodology, and a supervised machine learning algorithm then classifies each sentiment as positive, negative, or neutral. Support vector machine (SVM) and k-nearest neighbor (KNN) classification models are both used, and their results are compared.
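Comparing SVM and KNN on three classes calls for per-class metrics as well as overall accuracy. The following is a minimal sketch of accuracy plus per-class precision and recall for the positive, negative and neutral labels. The unweighted macro average and the six hypothetical labels are assumptions, since the abstract does not say how the scores were averaged.

```python
def evaluate(y_true, y_pred, labels=("positive", "negative", "neutral")):
    """Accuracy, per-class (precision, recall), and their unweighted macro averages."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    per_class = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)  # correctly labelled as c
        fp = sum(t != c and p == c for t, p in pairs)  # wrongly labelled as c
        fn = sum(t == c and p != c for t, p in pairs)  # c items that were missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        per_class[c] = (precision, recall)
    macro_precision = sum(p for p, _ in per_class.values()) / len(labels)
    macro_recall = sum(r for _, r in per_class.values()) / len(labels)
    return accuracy, per_class, macro_precision, macro_recall

# Hypothetical gold labels and one classifier's predictions for six tweets.
y_true = ["positive", "positive", "negative", "neutral", "neutral", "negative"]
y_pred = ["positive", "negative", "negative", "neutral", "positive", "negative"]

accuracy, per_class, macro_p, macro_r = evaluate(y_true, y_pred)
print(round(accuracy, 3), per_class["negative"])
```

Calling `evaluate` once with each model's predictions on the same test tweets gives directly comparable numbers for the two classifiers.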

Supervised Machine Learning Algorithms for Sentiment Analysis of Bangla Newspaper

International Journal of Innovative Computing; DOI: 10.11113/ijic.v11n2.321; 2021; Vol 11 (2); pp. 15-23

Author(s): Sabrina Jahan Maisha, Nuren Nafisa, Abdul Kadar Muhammad Masum

Keyword(s): Machine Learning, Sentiment Analysis, Language Processing, Nearest Neighbor, Online News, Machine Learning Algorithms, Supervised Machine Learning, Support Vector, K Nearest Neighbor, Aged People

Bangla is rich enough to support a wide range of Natural Language Processing (NLP) tasks, yet it has received little attention and the NLP field remains largely unexplored for it. In this age of digitalization, a large amount of Bangla news content is generated on online platforms, and some of it is inappropriate for children or aged people. Motivated by the need to filter out such news content easily, this work performs document-level sentiment analysis (SA) on Bangla online news. To this end, a dataset is created by collecting news from an online Bangla newspaper archive, and the documents are manually annotated into positive and negative classes. A composite “Pipeline” process, consisting of a Count Vectorizer, a TF-IDF transformer and a machine learning (ML) classifier, is employed to extract features and train on the dataset. Six supervised ML classifiers (Multinomial Naive Bayes (MNB), K-Nearest Neighbor (K-NN), Random Forest (RF), (C4.5) Decision Tree (DT), Logistic Regression (LR) and Linear Support Vector Machine (LSVM)) are compared to find the best classifier for the proposed model. Since there have been very few works on SA of Bangla news, this work is a small attempt to contribute to the field. The model showed strong results under both the percentage-split validation method and 10-fold cross-validation. Among the six classifiers, RF outperformed the others with 99% accuracy; LSVM showed the lowest accuracy, 80%, which is still considered a good result. The model also performed well on recent and critical Bangla news, indicating that its feature extraction is suitable for building the model.
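The Count Vectorizer and TF-IDF transformer stages of the pipeline can be illustrated with a small pure-Python sketch. It assumes whitespace tokenization and the smoothed IDF formula idf = ln((1 + n) / (1 + df)) + 1 followed by L2 row normalisation, which is the default of scikit-learn's `TfidfTransformer`; the abstract does not state which variant the authors used, and the three English documents are hypothetical stand-ins for Bangla news (real Bangla text would need a proper tokenizer).

```python
import math
from collections import Counter

def count_vectorize(docs):
    """Sorted vocabulary plus raw term counts per document (the counting step)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[term] for term in vocab])
    return vocab, rows

def tfidf_transform(rows):
    """Scale counts by smoothed IDF, then L2-normalise each document row."""
    n_docs = len(rows)
    n_terms = len(rows[0])
    df = [sum(1 for row in rows if row[j] > 0) for j in range(n_terms)]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    matrix = []
    for row in rows:
        weighted = [c * w for c, w in zip(row, idf)]
        norm = math.sqrt(sum(x * x for x in weighted)) or 1.0
        matrix.append([x / norm for x in weighted])
    return matrix

# Hypothetical English stand-ins for annotated news snippets.
docs = ["good news today", "bad news today", "good result"]
vocab, counts = count_vectorize(docs)
matrix = tfidf_transform(counts)
print(vocab)  # ['bad', 'good', 'news', 'result', 'today']
```

Terms that occur in fewer documents, such as "bad", receive larger weights than terms shared across documents, such as "news"; the resulting rows are what a downstream classifier would be trained on.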

Aspect Term Extraction for Aspect Based Opinion Mining

International Journal of Innovative Technology and Exploring Engineering - Special Issue; DOI: 10.35940/ijitee.k2050.0981119; 2019; Vol 8 (11); pp. 2228-2233

Keyword(s): Support Vector Machine, Sentiment Analysis, Random Fields, Opinion Mining, Nearest Neighbor, Conditional Random Fields, International Workshop, Support Vector, K Nearest Neighbor, Term Extraction

Opinion Mining (OM) is also called Sentiment Analysis (SA), and Aspect Based Opinion Mining (ABOM) is also called Aspect Based Sentiment Analysis (ABSA). In this paper, three new features are proposed for extracting aspect terms for ABSA. The influence of the proposed features is evaluated on five classifiers, namely Decision Tree (DT), Naive Bayes (NB), K-Nearest Neighbor (KNN), Support Vector Machine (SVM) and Conditional Random Fields (CRF), using two datasets from the Restaurant and Laptop domains made available by the International Workshop on Semantic Evaluation 2014 (SemEval 2014). The influence is measured using Precision, Recall and F1. The proposed features strongly influence aspect term extraction across the classifiers, and they benefit the SVM and CRF classifiers more than the NB, DT and KNN classifiers.
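Precision, recall and F1 for aspect term extraction can be computed by comparing the set of predicted terms with the gold annotations. The sketch below assumes exact string matching of (sentence id, term) pairs, which is a hypothetical simplification; the paper's matching rules are not given in the abstract, and the laptop-review terms are invented for illustration.

```python
def extraction_scores(gold, predicted):
    """Exact-match precision, recall and F1 over (sentence_id, aspect_term) pairs."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # terms found with exactly the right span
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical gold annotations and system output for two laptop-review sentences.
gold = {(1, "battery life"), (1, "screen"), (2, "keyboard")}
predicted = {(1, "battery"), (1, "screen"), (2, "keyboard")}

print(tuple(round(x, 3) for x in extraction_scores(gold, predicted)))  # (0.667, 0.667, 0.667)
```

Predicting "battery" where the gold term is "battery life" counts as both a false positive and a false negative under exact matching, which is why the three scores coincide here.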

Evaluating Annotated Dataset of Customer Reviews for Aspect Based Sentiment Analysis

Journal of Web Engineering; DOI: 10.13052/jwe1540-9589.2122; 2021

Author(s): Dimple Chehal, Parul Gupta, Payal Gulati

Keyword(s): Machine Learning, Support Vector Machine, Sentiment Analysis, Nearest Neighbor, Supervised Machine Learning, Support Vector, Product Reviews, K Nearest Neighbor, Customer Reviews, Percent Accuracy

Sentiment analysis of product reviews on e-commerce platforms helps determine customers' preferences. Aspect-based sentiment analysis (ABSA) identifies the contributing aspects and their corresponding polarity, allowing a more detailed analysis of the customer’s inclination toward product aspects. This analysis supports the transition from the traditional rating-based recommendation process to an improved aspect-based process. Automating ABSA requires a labelled dataset to train a supervised machine learning model. Because building such datasets involves human effort, few are available, so an annotated dataset is provided here for performing ABSA on customer reviews of mobile phones. The dataset, comprising product reviews of the Apple-iPhone11, has been manually annotated with predefined aspect categories and aspect sentiments. It has been validated using state-of-the-art machine learning techniques such as Naïve Bayes, Support Vector Machine, Logistic Regression, Random Forest, K-Nearest Neighbor and a Multilayer Perceptron (MLP), a sequential model built with the Keras API. For classifying review text into aspect categories, the MLP model built with the Keras Sequential API produced the most accurate result, with 67.45 percent accuracy, while K-nearest neighbor performed worst, with only 49.92 percent accuracy. For classifying review text into aspect sentiments, the Support Vector Machine had the highest accuracy, at 79.46 percent, and the model built with the Keras API had the lowest, at 76.30 percent. The dataset can serve as a benchmark for ABSA of mobile phone reviews.

A Novel Algorithm for Sentiment Analysis of Online Movie Reviews

Advances in Business Information Systems and Analytics - Social Network Analytics for Contemporary Business Organizations; DOI: 10.4018/978-1-5225-5097-6.ch007; 2018; pp. 106-140

Author(s): Bisma Shah, Farheen Siddiqui

Keyword(s): Sentiment Analysis, Nearest Neighbor, Supervised Machine Learning, Learning Approaches, World Knowledge, K Nearest Neighbor, Customer Feedback, Novel Approach, The One, Novel Algorithm

Others' opinions can be decisive when choosing among various options, especially when those choices involve valuable resources such as the time and money spent buying products or services. Because customers rely on their peers' past reviews on e-commerce websites and social media, sentiment analysis has drawn considerable interest for its commercial and business benefits. Sentiment analysis can be applied to movie reviews, blogs, customer feedback, etc. This chapter presents a novel approach to sentiment analysis of movie reviews given by users on different websites. It also addresses challenges such as the presence of thwarted words, world knowledge, and subjectivity detection. The results are validated using two supervised machine learning approaches, k-nearest neighbor and naive Bayes, applied both to a sentiment analysis method that leaves these challenges unaddressed and to the proposed method, which addresses all of them. Empirical results show that the proposed method outperforms the one that leaves the challenges unaddressed.

Sentiment Analysis of Tweets Using Naïve Bayes, KNN, and Decision Tree

International Journal of Organizational and Collective Intelligence; DOI: 10.4018/ijoci.2020100103; 2020; Vol 10 (4); pp. 35-49

Author(s): Kadda Zerrouki, Reda Mohamed Hamou, Abdellatif Rahmoun

Keyword(s): Decision Tree, Sentiment Analysis, Nearest Neighbor, Naive Bayes, Naïve Bayes, Machine Learning Algorithms, Supervised Machine Learning, K Nearest Neighbor, Use Of Social Media, The Masses

Using social media to analyze the perceptions of the masses about a product, event, or person has gained momentum in recent times. Out of a wide array of social networks, the authors chose Twitter for their analysis because the opinions expressed there are concise and have a distinctive polarity. Sentiment analysis is an approach to analyzing data and retrieving the sentiment it embodies. The paper discusses three supervised machine learning algorithms in detail (naïve Bayes, k-nearest neighbor (KNN), and decision tree) and compares their overall accuracy, precision, recall, F-measure, number of tweets correctly classified, number of tweets incorrectly classified, and execution time.
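Of the three classifiers compared, naive Bayes is the most compact to write out. Below is a minimal multinomial naive Bayes sketch with add-one (Laplace) smoothing. The multinomial variant, whitespace tokenization, skipping of unseen words and the four toy tweets are assumptions for illustration; the abstract does not specify which naive Bayes variant the paper uses.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """Count class frequencies and per-class word frequencies from (text, label) pairs."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(model, text, alpha=1.0):
    """Return the class with the highest log posterior under add-alpha smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior of the class
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        for tok in text.lower().split():
            if tok in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][tok] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy tweets for illustration only.
TWEETS = [
    ("love this phone", "positive"),
    ("great battery", "positive"),
    ("hate this phone", "negative"),
    ("terrible battery", "negative"),
]

model = train_nb(TWEETS)
print(predict_nb(model, "love the battery"))  # positive
```

Working in log space avoids numerical underflow when many word probabilities are multiplied, and smoothing keeps a single unseen class-word pair from zeroing out an entire class.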

Applying the SMOTE Technique to Handle Class Imbalance in the Objectivity Classification of Online News Using the KNN Algorithm

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi); DOI: 10.29207/resti.v3i2.945; 2019; Vol 3 (2); pp. 196-201; Cited by ~2

Author(s): Anis Nikmatul Kasanah, Muladi Muladi, Utomo Pujianto

Keyword(s): Machine Learning, Nearest Neighbor, Sampling Technique, Online News, Classification Algorithm, Average Increase, K Nearest Neighbor, Amount Of Information, Special System, Average Decrease

The growing amount of information in the form of online news needs to be matched by readers' ability to sort or classify news as subjective or objective, so a dedicated system is needed to classify online news by objectivity and help readers pick out subjective or objective news. This research proposes a machine learning technique that sorts news by objectivity automatically, based on its content, using the K-Nearest Neighbor (KNN) algorithm. The news samples, obtained by scraping kompas.com, exhibit class imbalance: the numbers of objective and subjective news items are not balanced, which can affect the performance of the classification algorithm. One technique for overcoming class imbalance is the Synthetic Minority Over-sampling Technique (SMOTE), which generates synthetic minority-class data until it matches the amount of majority-class data. This study compares the performance of the KNN algorithm with and without SMOTE. Testing neighbor values of k = 1, 3, 5, 7 and 9 showed that applying SMOTE improved the accuracy of the KNN algorithm at k = 1 and k = 3, with an average increase of 3.36, while at k = 5, 7 and 9 accuracy decreased by an average of 6.67.
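The oversampling step described above can be sketched by following the usual SMOTE recipe: pick a minority sample, pick one of its k nearest minority neighbours, and place a synthetic point at a random position on the line segment between them. The 2-D feature vectors, k = 3 and the fixed random seed are hypothetical; for news text, the same procedure would run on numeric feature vectors such as TF-IDF rows.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples in the style of SMOTE.

    Each sample interpolates between a randomly chosen minority point and one of
    its k nearest minority neighbours (squared Euclidean distance). Requires at
    least two minority points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position on the segment base -> other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic

# Hypothetical feature vectors: 8 majority-class items vs 3 minority-class items.
majority = [(float(x), float(x % 3)) for x in range(8)]
minority = [(5.0, 5.0), (6.0, 5.5), (5.5, 6.5)]

new_points = smote(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(balanced_minority), len(majority))  # 8 8
```

Oversampling should be applied only to the training split, so that synthetic points never leak into the evaluation data.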

Machine Learning in Sentiment Analysis Over Twitter

Advanced Deep Learning Applications in Big Data Analytics - Advances in Data Mining and Database Management; DOI: 10.4018/978-1-7998-2791-7.ch007; 2021; pp. 126-144

Author(s): Kadda Zerrouki

Keyword(s): Machine Learning, Sentiment Analysis, Nearest Neighbor, Machine Learning Algorithms, Supervised Machine Learning, K Nearest Neighbor, Aspect Extraction, Different Levels, Document Level, F Measure

Social networks are the main resource for gathering information about people's opinions and sentiments on different topics, since people spend hours daily on social media sharing their opinions. Twitter is a platform widely used to express opinions and display sentiments on different occasions. The task of sentiment analysis (SA) is to label the opinions in a given piece of text with categories such as positive and negative. Another task is to decide whether a given text is subjective, expressing the writer's opinions, or objective. These tasks have been performed at different levels of analysis, ranging from the document level to the sentence and phrase levels. A further task is aspect extraction, which originated in aspect-based sentiment analysis at the phrase level. All of these tasks fall under the umbrella of SA. In recent years, a large number of methods, techniques, and enhancements have been proposed for SA tasks at different levels. Sentiment analysis is an approach to analyzing data and retrieving the sentiment it embodies; Twitter sentiment analysis applies it to data from Twitter (tweets) in order to extract the sentiments conveyed by users. Research in this field has grown consistently over the past decades. The reason is the challenging format of tweets, which makes processing difficult: tweets are very short, which introduces a whole new dimension of problems such as the use of slang and abbreviations. The chapter discusses three supervised machine learning algorithms in detail (naïve Bayes, k-nearest neighbor (KNN), and decision tree) and compares their overall accuracy, precision, recall, F-measure, number of tweets correctly classified, number of tweets incorrectly classified, and execution time.
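One common way to soften the slang and abbreviation problems described above is to normalize tweets before feature extraction. The sketch below is a minimal illustration using regular expressions; the `<url>` and `<user>` placeholder tokens, the rule that shortens letters repeated three or more times to two, and the four-entry abbreviation table are hypothetical assumptions, not steps taken from the chapter.

```python
import re

# Hypothetical, deliberately tiny abbreviation table; a real system would need a
# much larger, curated slang lexicon.
ABBREVIATIONS = {"u": "you", "gr8": "great", "idk": "i do not know", "pls": "please"}

def normalize_tweet(text):
    """Lower-case a tweet and reduce common Twitter-specific noise before feature extraction."""
    text = text.lower()
    text = re.sub(r"https?://\S+", "<url>", text)  # replace links with a placeholder
    text = re.sub(r"@\w+", "<user>", text)  # replace @mentions with a placeholder
    text = re.sub(r"#(\w+)", r"\1", text)  # keep the hashtag word, drop the '#'
    text = re.sub(r"([a-z])\1{2,}", r"\1\1", text)  # "sooooo" -> "soo"
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in text.split())

print(normalize_tweet("@bob this is sooooo gr8 #happy https://t.co/x"))
# <user> this is soo great happy <url>
```

The elongation rule only touches letters, so numbers such as "1000" are left intact; keeping two repeated letters rather than one preserves a hint of emphasis that a sentiment classifier may be able to use.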

Quantum K-Nearest-Neighbor Image Classification Algorithm Based on K-L Transform

International Journal of Theoretical Physics; DOI: 10.1007/s10773-021-04747-7; 2021

Author(s): Nan-Run Zhou, Xiu-Xun Liu, Yu-Ling Chen, Ni-Suo Du

Keyword(s): Image Classification, Nearest Neighbor, Classification Algorithm, K Nearest Neighbor

Efficient detection of hacker community based on Twitter data using complex networks and machine learning algorithm

Journal of Intelligent & Fuzzy Systems; DOI: 10.3233/jifs-210458; 2021; pp. 1-17

Author(s): Ahmed Al-Tarawneh, Ja’afer Al-Saraireh

Keyword(s): Machine Learning, Complex Networks, Nearest Neighbor, Learning Algorithm, Machine Learning Algorithms, Machine Learning Techniques, Support Vector, K Nearest Neighbor, Efficient Detection, Suggested Keywords

Twitter is one of the most popular platforms for sharing and posting ideas. Hackers and anonymous attackers use such platforms maliciously, and their behavior can be used to predict the risk of future attacks by gathering and classifying hackers’ tweets with machine learning techniques. Previous approaches to detecting infected tweets rely on human effort or text analysis, so they are limited in their ability to capture the hidden text between tweet lines. The main aim of this research paper is to improve the efficiency of hacker detection on Twitter by combining the complex networks technique with adapted machine learning algorithms. This work presents a methodology that collects a list of users from a hackers’ community on Twitter, together with their followers who share posts with similar interests. The list is built from a set of suggested keywords, namely terms commonly used by hackers in their tweets. Next, a complex network is generated over all users to find the relations among them in terms of network centrality, closeness, and betweenness. From these values, a dataset of the most influential users in the hacker community is assembled. Tweets belonging to users in the extracted dataset are then gathered and classified into positive and negative classes, and the output of this process is fed into a machine learning process that applies different algorithms. This research builds and investigates an accurate dataset containing real users who belong to a hackers’ community. Accuracy, measured as correctly classified instances and averaged over the K-nearest neighbor, Naive Bayes, Random Tree, and support vector machine techniques, was about 90% for cross-validation and 88% for percentage split.
Consequently, the proposed network cyber Twitter model is able to detect hackers and determine whether tweets pose a future risk to institutions and individuals, providing early warning of possible attacks.
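The network step above ranks users by centrality, closeness and betweenness. The following is a minimal sketch of those three measures on a small unweighted, undirected user graph, with betweenness computed by Brandes' algorithm. The adjacency-dict representation, the five hypothetical users and the choice of unnormalised betweenness are assumptions; the abstract does not describe how the authors built or weighted their graph.

```python
from collections import deque

def degree_centrality(graph):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(graph)
    return {u: len(nbrs) / (n - 1) for u, nbrs in graph.items()}

def closeness_centrality(graph):
    """(reachable nodes - 1) / sum of BFS hop distances; 0.0 for isolated nodes."""
    scores = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores

def betweenness_centrality(graph):
    """Brandes' algorithm for an unweighted, undirected graph (not normalised)."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        stack = []
        preds = {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:  # back-propagate dependencies in order of decreasing distance
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was counted once from each endpoint.
    return {v: score / 2 for v, score in bc.items()}

# Hypothetical user graph: "a" is connected to three users, and "d" also to "e".
G = {"a": ["b", "c", "d"], "b": ["a"], "c": ["a"], "d": ["a", "e"], "e": ["d"]}

print(betweenness_centrality(G))  # {'a': 5.0, 'b': 0.0, 'c': 0.0, 'd': 3.0, 'e': 0.0}
```

User "a" lies on the shortest path between five of the ten user pairs, so it would rank as the most influential; a larger study would typically use a graph library rather than hand-written traversals.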
