Supervised Machine Learning Algorithms for Sentiment Analysis of Bangla Newspaper

Sabrina Jahan Maisha; Nuren Nafisa; Abdul Kadar Muhammad Masum

doi:10.11113/ijic.v11n2.321

Supervised Machine Learning Algorithms for Sentiment Analysis of Bangla Newspaper

International Journal of Innovative Computing ◽

10.11113/ijic.v11n2.321 ◽

2021 ◽

Vol 11 (2) ◽

pp. 15-23

Author(s):

Sabrina Jahan Maisha ◽

Nuren Nafisa ◽

Abdul Kadar Muhammad Masum

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Language Processing ◽

Nearest Neighbor ◽

Online News ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Support Vector ◽

K Nearest Neighbor ◽

Aged People

We can state undoubtedly that Bangla language is rich enough to work with and implement various Natural Language Processing (NLP) tasks. Though it needs proper attention, hardly NLP field has been explored with it. In this age of digitalization, large amount of Bangla news contents are generated in online platforms. Some of the contents are inappropriate for the children or aged people. With the motivation to filter out news contents easily, the aim of this work is to perform document level sentiment analysis (SA) on Bangla online news. In this respect, the dataset is created by collecting news from online Bangla newspaper archive. Further, the documents are manually annotated into positive and negative classes. Composite process technique of “Pipeline” class including Count Vectorizer, transformer (TF-IDF) and machine learning (ML) classifiers are employed to extract features and to train the dataset. Six supervised ML classifiers (i.e. Multinomial Naive Bayes (MNB), K-Nearest Neighbor (K-NN), Random Forest (RF), (C4.5) Decision Tree (DT), Logistic Regression (LR) and Linear Support Vector Machine (LSVM)) are used to analyze the best classifier for the proposed model. There has been very few works on SA of Bangla news. So, this work is a small attempt to contribute in this field. This model showed remarkable efficiency through better results in both the validation process of percentage split method and 10-fold cross validation. Among all six classifiers, RF has outperformed others by 99% accuracy. Even though LSVM has shown lowest accuracy of 80%, it is also considered as good output. However, this work has also exhibited surpassing outcome for recent and critical Bangla news indicating proper feature extraction to build up the model.

Download Full-text

Evaluating Annotated Dataset of Customer Reviews for Aspect Based Sentiment Analysis

Journal of Web Engineering ◽

10.13052/jwe1540-9589.2122 ◽

2021 ◽

Author(s):

Dimple Chehal ◽

Parul Gupta ◽

Payal Gulati

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Sentiment Analysis ◽

Nearest Neighbor ◽

Supervised Machine Learning ◽

Support Vector ◽

Product Reviews ◽

K Nearest Neighbor ◽

Customer Reviews ◽

Percent Accuracy

Sentiment analysis of product reviews on e-commerce platforms aids in determining the preferences of customers. Aspect-based sentiment analysis (ABSA) assists in identifying the contributing aspects and their corresponding polarity, thereby allowing for a more detailed analysis of the customer’s inclination toward product aspects. This analysis helps in the transition from the traditional rating-based recommendation process to an improved aspect-based process. To automate ABSA, a labelled dataset is required to train a supervised machine learning model. As the availability of such dataset is limited due to the involvement of human efforts, an annotated dataset has been provided here for performing ABSA on customer reviews of mobile phones. The dataset comprising of product reviews of Apple-iPhone11 has been manually annotated with predefined aspect categories and aspect sentiments. The dataset’s accuracy has been validated using state-of-the-art machine learning techniques such as Naïve Bayes, Support Vector Machine, Logistic Regression, Random Forest, K-Nearest Neighbor and Multi Layer Perceptron, a sequential model built with Keras API. The MLP model built through Keras Sequential API for classifying review text into aspect categories produced the most accurate result with 67.45 percent accuracy. K- nearest neighbor performed the worst with only 49.92 percent accuracy. The Support Vector Machine had the highest accuracy for classifying review text into aspect sentiments with an accuracy of 79.46 percent. The model built with Keras API had the lowest 76.30 percent accuracy. The contribution is beneficial as a benchmark dataset for ABSA of mobile phone reviews.

Download Full-text

Sentiment Analysis on Social Media using Machine Learning Approach

10.22541/au.163620143.37655829/v1 ◽

2021 ◽

Author(s):

Erick Omuya ◽

George Okeyo ◽

Michael Kimwele

Keyword(s):

Machine Learning ◽

Social Media ◽

Sentiment Analysis ◽

Language Processing ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

Learning Approach ◽

K Nearest Neighbor ◽

Machine Learning Approach

Social media has been embraced by different people as a convenient and official medium of communication. People write messages and attach images and videos on Twitter, Facebook and other social media which they share. Social media therefore generates a lot of data that is rich in sentiments from these updates. Sentiment analysis has been used to determine opinions of clients, for instance, relating to a particular product or company. Knowledge based approach and Machine learning approach are among the strategies that have been used to analyze these sentiments. The performance of sentiment analysis is however distorted by noise, the curse of dimensionality, the data domains and size of data used for training and testing. This research aims at developing a model for sentiment analysis in which dimensionality reduction and the use of different parts of speech improves sentiment analysis performance. It uses natural language processing for filtering, storing and performing sentiment analysis on the data from social media. The model is tested using Naïve Bayes, Support Vector Machines and K-Nearest neighbor machine learning algorithms and its performance compared with that of two other Sentiment Analysis models. Experimental results show that the model improves sentiment analysis performance using machine learning techniques.

Download Full-text

Leveraging Machine Learning Algorithms For Zero-Day Ransomware Attack

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.f8694.088619 ◽

2019 ◽

Vol 8 (6) ◽

pp. 4104-4107

Keyword(s):

Machine Learning ◽

Random Forest ◽

Nearest Neighbor ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Support Vector ◽

K Nearest Neighbor ◽

Supervised Learning Algorithms ◽

Microsoft Windows

Current global huge cyber protection attacks resulting from Infected Encryption ransomware structures over all international locations and businesses with millions of greenbacks lost in paying compulsion abundance. This type of malware encrypts consumer files, extracts consumer files, and charges higher ransoms to be paid for decryption of keys. An attacker could use different types of ransomware approach to steal a victim's files. Some of ransomware attacks like Scareware, Mobile ransomware, WannaCry, CryptoLocker, Zero-Day ransomware attack etc. A zero-day vulnerability is a software program security flaw this is regarded to the software seller however doesn’t have patch in vicinity to restore a flaw. Despite the fact that machine learning algorithms are already used to find encryption Ransomware. This is based on the analysis of a large number of PE file data Samples (benign software and ransomware utility) makes use of supervised machine learning algorithms for ascertain Zero-day attacks. This work was done on a Microsoft Windows operating system (the most attacked os through encryption ransomware) and estimated it. We have used four Supervised learning Algorithms, Random Forest Classifier , K-Nearest Neighbor, Support Vector Machine and Logistic Regression. Tests using machine learning algorithms evaluate almost null false positives with a 99.5% accuracy with a random forest algorithm.

Download Full-text

A REVIEW ON MACHINE LEARNING TECHNIQUES FOR ADVANCED HEALTH CARE SYSTEMS

June-2020 - International Journal of Engineering Sciences & Research Technology ◽

10.29121/ijesrt.v9.i11.2020.1 ◽

2020 ◽

Vol 9 (11) ◽

pp. 1-7

Keyword(s):

Machine Learning ◽

Health Care ◽

Logistic Regression ◽

Nearest Neighbor ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Support Vector ◽

K Nearest Neighbor

Artificial intelligence is the technology that lets a machine mimic the thinking ability of a human being. Machine learning is the subset of AI, that makes this machine exhibit human behavior by making it learn from the known data, without the need of explicitly programming it. The health care sector has adopted this technology, for the development of medical procedures, maintaining huge patient’s records, assist physicians in the prediction, detection, and treatment of diseases and many more. In this paper, a comparative study of six supervised machine learning algorithms namely Logistic Regression(LR),support vector machine(SVM),Decision Tree(DT).Random Forest(RF),k-nearest neighbor(k-NN),Naive Bayes (NB) are made for the classification and prediction of diseases. Result shows out of compared supervised learning algorithms here, logistic regression is performing best with an accuracy of 81.4 % and the least performing is k-NN with just an accuracy of 69.01% in the classification and prediction of diseases.

Download Full-text

An Analysis of Computational Complexity and Accuracy of Two Supervised Machine Learning Algorithms—K-Nearest Neighbor and Support Vector Machine

Data Management, Analytics and Innovation - Advances in Intelligent Systems and Computing ◽

10.1007/978-981-15-5616-6_24 ◽

2020 ◽

pp. 335-347

Author(s):

Susmita Ray

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Computational Complexity ◽

Nearest Neighbor ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Support Vector ◽

K Nearest Neighbor

Download Full-text

Machine Learning in Sentiment Analysis Over Twitter

Advanced Deep Learning Applications in Big Data Analytics - Advances in Data Mining and Database Management ◽

10.4018/978-1-7998-2791-7.ch007 ◽

2021 ◽

pp. 126-144

Author(s):

Kadda Zerrouki

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Nearest Neighbor ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

K Nearest Neighbor ◽

Aspect Extraction ◽

Different Levels ◽

Document Level ◽

F Measure

Social networks are the main resources to gather information about people's opinions and sentiments towards different topics as they spend hours daily on social media and share their opinions. Twitter is a platform widely used by people to express their opinions and display sentiments on different occasions. Sentiment analysis's (SA) task is to label people's opinions as different categories such as positive and negative from a given piece of text. Another task is to decide whether a given text is subjective, expressing the writer's opinions, or objective. These tasks were performed at different levels of analysis ranging from the document level to the sentence and phrase level. Another task is aspect extraction, which originated from aspect-based sentiment analysis in phrase level. All these tasks are under the umbrella of SA. In recent years, a large number of methods, techniques, and enhancements have been proposed for the problem of SA in different tasks at different levels. Sentiment analysis is an approach to analyze data and retrieve sentiment that it embodies. Twitter sentiment analysis is an application of sentiment analysis on data from Twitter (tweets) in order to extract sentiments conveyed by the user. In the past decades, the research in this field has consistently grown. The reason behind this is the challenging format of the tweets, which makes the processing difficult. The tweet format is very small, which generates a whole new dimension of problems like use of slang, abbreviations, etc. The chapter elaborately discusses three supervised machine learning algorithms—naïve Bayes, k-nearest neighbor (KNN), and decision tree—and compares their overall accuracy, precisions, as well as recall values; f-measure; number of tweets correctly classified; number of tweets incorrectly classified; and execution time.

Download Full-text

Classification Approach for Sentiment Analysis Using Machine Learning

Applications of Artificial Neural Networks for Nonlinear Data - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-7998-4042-8.ch005 ◽

2021 ◽

pp. 94-115

Author(s):

Satyen M. Parikh ◽

Mitali K. Shah

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Language Processing ◽

Nearest Neighbor ◽

Learning Algorithm ◽

Supervised Machine Learning ◽

Computational Semantics ◽

K Nearest Neighbor ◽

Modeling Methodology ◽

N Gram

A utilization of the computational semantics is known as natural language processing or NLP. Any opinion through attitude, feelings, and thoughts can be identified as sentiment. The overview of people against specific events, brand, things, or association can be recognized through sentiment analysis. Positive, negative, and neutral are each of the premises that can be grouped into three separate categories. Twitter, the most commonly used microblogging tool, is used to gather information for research. Tweepy is used to access Twitter's source of information. Python language is used to execute the classification algorithm on the information collected. Two measures are applied in sentiment analysis, namely feature extraction and classification. Using n-gram modeling methodology, the feature is extracted. Through a supervised machine learning algorithm, the sentiment is graded as positive, negative, and neutral. Support vector machine (SVM) and k-nearest neighbor (KNN) classification models are used and demonstrated both comparisons.

Download Full-text

Artificial Intelligence and Machine Learning in Pathology: The Present Landscape of Supervised Methods

Academic Pathology ◽

10.1177/2374289519873088 ◽

2019 ◽

Vol 6 ◽

pp. 237428951987308 ◽

Cited By ~ 25

Author(s):

Hooman H. Rashidi ◽

Nam K. Tran ◽

Elham Vali Betts ◽

Lydia P. Howell ◽

Ralph Green

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Nearest Neighbor ◽

Health Care Research ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Basic Knowledge ◽

Support Vector ◽

K Nearest Neighbor ◽

Supervised Methods

Increased interest in the opportunities provided by artificial intelligence and machine learning has spawned a new field of health-care research. The new tools under development are targeting many aspects of medical practice, including changes to the practice of pathology and laboratory medicine. Optimal design in these powerful tools requires cross-disciplinary literacy, including basic knowledge and understanding of critical concepts that have traditionally been unfamiliar to pathologists and laboratorians. This review provides definitions and basic knowledge of machine learning categories (supervised, unsupervised, and reinforcement learning), introduces the underlying concept of the bias-variance trade-off as an important foundation in supervised machine learning, and discusses approaches to the supervised machine learning study design along with an overview and description of common supervised machine learning algorithms (linear regression, logistic regression, Naive Bayes, k-nearest neighbor, support vector machine, random forest, convolutional neural networks).

Download Full-text

Efficient detection of hacker community based on twitter data using complex networks and machine learning algorithm

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-210458 ◽

2021 ◽

pp. 1-17

Author(s):

Ahmed Al-Tarawneh ◽

Ja’afer Al-Saraireh

Keyword(s):

Machine Learning ◽

Complex Networks ◽

Nearest Neighbor ◽

Learning Algorithm ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

K Nearest Neighbor ◽

Efficient Detection ◽

Suggested Keywords

Twitter is one of the most popular platforms used to share and post ideas. Hackers and anonymous attackers use these platforms maliciously, and their behavior can be used to predict the risk of future attacks, by gathering and classifying hackers’ tweets using machine-learning techniques. Previous approaches for detecting infected tweets are based on human efforts or text analysis, thus they are limited to capturing the hidden text between tweet lines. The main aim of this research paper is to enhance the efficiency of hacker detection for the Twitter platform using the complex networks technique with adapted machine learning algorithms. This work presents a methodology that collects a list of users with their followers who are sharing their posts that have similar interests from a hackers’ community on Twitter. The list is built based on a set of suggested keywords that are the commonly used terms by hackers in their tweets. After that, a complex network is generated for all users to find relations among them in terms of network centrality, closeness, and betweenness. After extracting these values, a dataset of the most influential users in the hacker community is assembled. Subsequently, tweets belonging to users in the extracted dataset are gathered and classified into positive and negative classes. The output of this process is utilized with a machine learning process by applying different algorithms. This research build and investigate an accurate dataset containing real users who belong to a hackers’ community. Correctly, classified instances were measured for accuracy using the average values of K-nearest neighbor, Naive Bayes, Random Tree, and the support vector machine techniques, demonstrating about 90% and 88% accuracy for cross-validation and percentage split respectively. Consequently, the proposed network cyber Twitter model is able to detect hackers, and determine if tweets pose a risk to future institutions and individuals to provide early warning of possible attacks.

Download Full-text

Financial Context News Sentiment Analysis for the Lithuanian Language

Applied Sciences ◽

10.3390/app11104443 ◽

2021 ◽

Vol 11 (10) ◽

pp. 4443

Author(s):

Rokas Štrimaitis ◽

Pavel Stefanovič ◽

Simona Ramanauskaitė ◽

Asta Slotkienė

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Short Term Memory ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Experimental Investigations ◽

Support Vector ◽

Applied Machine Learning ◽

Bayes Algorithm ◽

Website Content

Financial area analysis is not limited to enterprise performance analysis. It is worth analyzing as wide an area as possible to obtain the full impression of a specific enterprise. News website content is a datum source that expresses the public’s opinion on enterprise operations, status, etc. Therefore, it is worth analyzing the news portal article text. Sentiment analysis in English texts and financial area texts exist, and are accurate, the complexity of Lithuanian language is mostly concentrated on sentiment analysis of comment texts, and does not provide high accuracy. Therefore in this paper, the supervised machine learning model was implemented to assign sentiment analysis on financial context news, gathered from Lithuanian language websites. The analysis was made using three commonly used classification algorithms in the field of sentiment analysis. The hyperparameters optimization using the grid search was performed to discover the best parameters of each classifier. All experimental investigations were made using the newly collected datasets from four Lithuanian news websites. The results of the applied machine learning algorithms show that the highest accuracy is obtained using a non-balanced dataset, via the multinomial Naive Bayes algorithm (71.1%). The other algorithm accuracies were slightly lower: a long short-term memory (71%), and a support vector machine (70.4%).

Download Full-text