Sentiment Analysis on Social Media using Machine Learning Approach

Author(s):  
Erick Omuya ◽  
George Okeyo ◽  
Michael Kimwele

Social media has been embraced by different people as a convenient and official medium of communication. People write messages and attach images and videos on Twitter, Facebook and other social media, which they then share. These updates generate a large volume of data that is rich in sentiment. Sentiment analysis has been used to determine the opinions of clients, for instance about a particular product or company. Knowledge-based and machine learning approaches are among the strategies that have been used to analyze these sentiments. The performance of sentiment analysis is, however, distorted by noise, the curse of dimensionality, the data domains and the size of the data used for training and testing. This research aims to develop a model for sentiment analysis in which dimensionality reduction and the use of different parts of speech improve sentiment analysis performance. It uses natural language processing for filtering, storing and performing sentiment analysis on data from social media. The model is tested using Naïve Bayes, Support Vector Machine and K-Nearest Neighbor machine learning algorithms, and its performance is compared with that of two other sentiment analysis models. Experimental results show that the model improves sentiment analysis performance using machine learning techniques.

2018 ◽  
Vol 7 (2.8) ◽  
pp. 284 ◽  
Author(s):  
R Ragupathy ◽  
Lakshmana Phaneendra Maguluri

Sentiment analysis deals with identifying and classifying opinions or sentiments expressed in text. It is essentially a text classification task. Social media generates a vast amount of sentiment-rich data in the form of tweets, blog posts, comments, status updates, news, etc. Sentiment analysis of this user-generated data is very useful for gauging public opinion. The knowledge-based approach and the machine learning approach are the two strategies used for analyzing sentiments in text. In this paper, the machine learning approach is used for sentiment analysis of a movie review dataset, which is analysed with Naïve Bayes, Decision Tree, KNN, and SVM classifiers. Identifying the most efficient classification technique is the aim of the paper. The efficiency of each classifier is judged on standard parameters that are outputs of the classification techniques.
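As an illustration of the first classifier named in this abstract, the following is a minimal sketch of a multinomial Naïve Bayes text classifier with add-one (Laplace) smoothing. It is not the authors' implementation: the function names, the whitespace tokenizer, and the four toy reviews are assumptions made purely for the example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Collect per-class document counts, per-class word counts, and the vocabulary."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()  # assumed tokenizer: lowercase + whitespace split
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log posterior under add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # words never seen in training are ignored
            smoothed = (word_counts[label][token] + 1) / (total_words + len(vocab))
            score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy reviews, not data from the paper.
toy_docs = [
    "great movie loved it",
    "wonderful acting great plot",
    "terrible movie hated it",
    "boring plot awful acting",
]
toy_labels = ["positive", "positive", "negative", "negative"]
model = train_naive_bayes(toy_docs, toy_labels)
print(predict(model, "great wonderful movie"))
print(predict(model, "awful boring movie"))
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together.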


Author(s):  
Amit Purohit

Sentiment analysis is defined as the process of mining data, views, reviews, or sentences to predict the emotion of a sentence through natural language processing (NLP) or machine learning techniques. Sentiment analysis involves classifying text into three classes: “Positive”, “Negative”, or “Neutral”. The process of finding user opinions about a topic, product, or problem is called opinion mining. Analyzing the emotions in the extracted opinions is defined as sentiment analysis. The goal of opinion mining and sentiment analysis is to make computers able to recognize and express emotion. Users share their views and feelings conveniently through social media, e-commerce websites, and movie review sites such as Facebook, Twitter, Amazon, and Flipkart. Sentiment analysis is a machine learning approach in which machines classify and analyze human sentiments, emotions, and opinions about products. Of the various classification models, Naïve Bayes, Support Vector Machine (SVM), and Decision Tree are the most frequently used for product analysis. The proposed approach is expected to give better results than other machine learning techniques.


2021 ◽  
Vol 11 (2) ◽  
pp. 15-23
Author(s):  
Sabrina Jahan Maisha ◽  
Nuren Nafisa ◽  
Abdul Kadar Muhammad Masum

The Bangla language is undoubtedly rich enough to support various Natural Language Processing (NLP) tasks. Although it deserves proper attention, it has hardly been explored in the NLP field. In this age of digitalization, a large amount of Bangla news content is generated on online platforms. Some of this content is inappropriate for children or elderly people. With the motivation of filtering news content easily, the aim of this work is to perform document-level sentiment analysis (SA) on Bangla online news. To this end, a dataset is created by collecting news from an online Bangla newspaper archive. The documents are then manually annotated into positive and negative classes. A composite “Pipeline” class comprising a Count Vectorizer, a TF-IDF transformer, and machine learning (ML) classifiers is employed to extract features and train on the dataset. Six supervised ML classifiers (i.e. Multinomial Naive Bayes (MNB), K-Nearest Neighbor (K-NN), Random Forest (RF), (C4.5) Decision Tree (DT), Logistic Regression (LR) and Linear Support Vector Machine (LSVM)) are compared to find the best classifier for the proposed model. There have been very few works on SA of Bangla news, so this work is a small attempt to contribute to this field. The model showed remarkable efficiency, with better results in both validation procedures: the percentage split method and 10-fold cross-validation. Among the six classifiers, RF outperformed the others with 99% accuracy. Although LSVM showed the lowest accuracy, at 80%, this is still considered a good result. The work also produced strong results on recent and critical Bangla news, indicating that the feature extraction used to build the model is appropriate.
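The feature-extraction stage described in this abstract (term counting followed by TF-IDF reweighting) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the function names and English toy documents are invented, and the idf used here is the plain textbook form log(N / df), whereas library implementations often apply smoothed variants.

```python
import math
from collections import Counter


def count_vectorize(docs):
    """Return the sorted vocabulary and one term-count vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    counts = []
    for toks in tokenized:
        freq = Counter(toks)
        counts.append([freq[term] for term in vocab])
    return vocab, counts


def tfidf_transform(counts):
    """Reweight raw counts by idf = log(N / df) (unsmoothed, assumed form)."""
    n_docs = len(counts)
    n_terms = len(counts[0])
    # Document frequency: number of documents in which each term appears.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    idf = [math.log(n_docs / d) for d in df]
    return [[row[j] * idf[j] for j in range(n_terms)] for row in counts]


# Invented toy documents, not the Bangla news dataset.
docs = ["good news today", "bad news today", "good good report"]
vocab, counts = count_vectorize(docs)
weights = tfidf_transform(counts)
print(vocab)
print(counts[2])
```

The resulting weight vectors would then be passed to whichever classifier is being evaluated; chaining the steps in one object is what keeps the same transformation applied to training and test data.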


2021 ◽  
Vol 24 (4) ◽  
pp. 52-58
Author(s):  
Mohammed W. Habib ◽  
Zainab N. Sultani

Sentiment analysis is an active field of study whose importance is rising, owing to the growing number of data sources that require investigation. Among the most valuable sources is Twitter, in addition to Facebook and other social media platforms. The objective of sentiment analysis is to classify the sentiments or opinions of users as positive, negative, or neutral from textual data. This analysis is valuable for many applications that require understanding people's or users' opinions and emotions about a particular topic, product, or service. Several researchers tackle the problem of sentiment analysis using machine learning algorithms. In this paper, a comparative study is presented of various studies that conducted sentiment analysis on social media, especially on tweets. The survey carried out in this paper provides an overview of preprocessing steps, machine learning algorithms, and approaches used for sentiment classification during the period 2015-2020.


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Samina Amin ◽  
Muhammad Irfan Uddin ◽  
Duaa H. alSaeed ◽  
Atif Khan ◽  
Muhammad Adnan

Seasonal outbreaks have several different periods that occur primarily during winter in temperate regions, while influenza may occur throughout the year in tropical regions, triggering outbreaks more irregularly. Similarly, dengue occurs at the start of the rainy season in early May and reaches its peak in late June. Dengue and flu had an impact on various countries in the years 2017–2019, and streaming Twitter data reveals the status of dengue and flu outbreaks in the most affected regions. This research work shows that Social Media Analysis (SMA) can be used as a detector of epidemic outbreaks and to understand the sentiment of social media users regarding various diseases. Providing awareness about seasonal outbreaks through SMA is an effective approach for researchers and healthcare responders to detect outbreaks early. The proposed model aims to find the sentiment about the disease in tweets, and the tweets related to seasonal outbreaks are classified into two classes: disease positive and disease negative. This work proposes a machine-learning-based approach to detect dengue and flu outbreaks on the social media platform Twitter, using four machine learning algorithms: Random Forest (RF), K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Decision Tree (DT), with the help of Term Frequency and Inverse Document Frequency (TF-IDF). For experimental analysis, two datasets (dengue and flu) are analyzed individually. The experimental results show that the RF classifier outperformed the comparison models in terms of accuracy, precision, recall, F1-measure, and Receiver Operating Characteristic (ROC) curve. The proposed work offers favorable performance, with total precision, accuracy, recall, and F1-measure ranging from 84% to 88% for conventional machine learning techniques.
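The evaluation measures named in this abstract (accuracy, precision, recall, and F1-measure) follow directly from the confusion-matrix counts of a two-class problem. The sketch below shows the standard definitions; the function name, the label strings, and the toy predictions are assumptions for illustration and do not reproduce the paper's results.

```python
def binary_metrics(y_true, y_pred, positive="disease_positive"):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Invented labels: 4 positive and 4 negative tweets, with one error of each kind.
POS, NEG = "disease_positive", "disease_negative"
y_true = [POS, POS, POS, POS, NEG, NEG, NEG, NEG]
y_pred = [POS, POS, POS, NEG, POS, NEG, NEG, NEG]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

With one false positive and one false negative among eight tweets, all four measures coincide at 0.75 in this toy case; on imbalanced data they generally diverge, which is why abstracts of this kind report them separately.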


2021 ◽  
pp. 1-17
Author(s):  
Ahmed Al-Tarawneh ◽  
Ja’afer Al-Saraireh

Twitter is one of the most popular platforms used to share and post ideas. Hackers and anonymous attackers use these platforms maliciously, and their behavior can be used to predict the risk of future attacks by gathering and classifying hackers’ tweets using machine learning techniques. Previous approaches for detecting infected tweets are based on human effort or text analysis, and are therefore limited in capturing the hidden text between tweet lines. The main aim of this research paper is to enhance the efficiency of hacker detection on the Twitter platform using the complex networks technique with adapted machine learning algorithms. This work presents a methodology that collects a list of users, together with their followers, who share posts with similar interests from a hackers’ community on Twitter. The list is built from a set of suggested keywords that are commonly used by hackers in their tweets. After that, a complex network is generated for all users to find relations among them in terms of network centrality, closeness, and betweenness. After extracting these values, a dataset of the most influential users in the hacker community is assembled. Subsequently, tweets belonging to users in the extracted dataset are gathered and classified into positive and negative classes. The output of this process is fed into a machine learning process that applies different algorithms. This research builds and investigates an accurate dataset containing real users who belong to a hackers’ community. Correctly classified instances were measured for accuracy using the average values of the K-nearest neighbor, Naive Bayes, Random Tree, and support vector machine techniques, demonstrating about 90% and 88% accuracy for cross-validation and percentage split respectively. Consequently, the proposed network cyber Twitter model is able to detect hackers and determine whether tweets pose a future risk to institutions and individuals, providing early warning of possible attacks.
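Two of the network measures mentioned in this abstract can be sketched with the standard library alone. The example below computes degree centrality and closeness centrality on a small undirected, connected graph; it is an illustration under assumptions, not the authors' pipeline. The account names are hypothetical, and betweenness is omitted to keep the sketch short.

```python
from collections import deque


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def closeness_centrality(graph):
    """Reachable-node count divided by the sum of shortest-path distances (BFS)."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


# Hypothetical follower graph: one hub account linked to three others.
follows = {
    "hub": {"a", "b", "c"},
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub"},
}
degree = degree_centrality(follows)
closeness = closeness_centrality(follows)
print(max(degree, key=degree.get))
```

In this star-shaped toy graph the hub scores highest on both measures, which mirrors the idea of ranking accounts to pick out the most influential users.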


2018 ◽  
Vol 34 (3) ◽  
pp. 569-581 ◽  
Author(s):  
Sujata Rani ◽  
Parteek Kumar

In this article, an innovative approach to sentiment analysis (SA) is presented. The proposed system handles the issues of Romanized or abbreviated text and spelling variations in the text. The training data set of 3,000 movie reviews and tweets was manually labeled by native speakers of Hindi into three classes, i.e. positive, negative, and neutral. The system uses the WEKA (Waikato Environment for Knowledge Analysis) tool to convert these string data into numerical matrices and applies three machine learning techniques, i.e. Naive Bayes (NB), J48, and support vector machine (SVM). The proposed system was tested on 100 movie reviews and tweets, and SVM was observed to perform best among the classifiers, with an accuracy of 68% for movie reviews and 82% for tweets. The results of the proposed system are very promising, and it can be used in emerging applications such as SA of product reviews and social media analysis. Additionally, the proposed system can be used for other cultural or social benefits, such as predicting or countering riots.


Machine learning is empowering many aspects of day-to-day life, from filtering content on social networks to suggesting products that we may be looking for. This technology focuses on taking objects as image input to find new observations or show items based on user interest. The main discussion here concerns machine learning techniques based on supervised learning, where the computer learns from input/training data and predicts results based on experience. We also discuss the machine learning algorithms Naïve Bayes Classifier, K-Nearest Neighbor, Random Forest, Decision Trees, Boosted Trees, and Support Vector Machine, and apply these classifiers to the Malgenome and Drebin datasets, which are Android malware datasets. Android is an operating system that is gaining popularity, and with the rise in demand for these devices comes a rise in Android malware. The traditional techniques used to detect malware were unable to detect unknown applications. We have run these datasets on different machine learning classifiers and recorded the results. The experimental results provide a comparative analysis based on performance, accuracy, and cost.


Author(s):  
Mokhtar Al-Suhaiqi ◽  
Muneer A. S. Hazaa ◽  
Mohammed Albared

Due to the rapid growth of research articles in various languages, the cross-lingual plagiarism detection problem has received increasing interest in recent years. Cross-lingual plagiarism detection is a more challenging task than monolingual plagiarism detection. This paper addresses the problem of cross-lingual plagiarism detection (CLPD) by proposing a method that combines keyphrase extraction, monolingual detection methods and a machine learning approach. The research methodology used in this study has helped accomplish the objectives of designing, developing, and implementing an efficient Arabic-English cross-lingual plagiarism detection system. This paper empirically evaluates five different monolingual plagiarism detection methods, namely i) N-Grams Similarity, ii) Longest Common Subsequence, iii) Dice Coefficient, iv) Fingerprint-based Jaccard Similarity and v) Fingerprint-based Containment Similarity. In addition, three machine learning approaches, namely i) naïve Bayes, ii) Support Vector Machine, and iii) linear logistic regression classifiers, are used for Arabic-English cross-language plagiarism detection. Several experiments are conducted to evaluate the performance of the keyphrase extraction methods. In addition, several experiments are conducted to investigate the performance of the machine learning techniques and find the best method for Arabic-English cross-language plagiarism detection. According to these experiments, the highest result was obtained using the SVM classifier, with a 92% F-measure. In addition, all classifiers achieved their highest results when most of the monolingual plagiarism detection methods were used.
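Several of the monolingual similarity measures listed in this abstract have compact standard definitions, sketched below on word bigrams. This is an illustration only: the function names are invented, the n-gram sets are compared directly rather than through hashed fingerprints as the abstract's fingerprint-based variants would imply, and the two example sentences are made up.

```python
def word_ngrams(text, n=2):
    """Set of word n-grams from a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b) if (a | b) else 0.0


def dice(a, b):
    """2 |A ∩ B| / (|A| + |B|)."""
    return 2 * len(a & b) / (len(a) + len(b)) if (a or b) else 0.0


def containment(a, b):
    """Share of A's n-grams that also occur in B: |A ∩ B| / |A|."""
    return len(a & b) / len(a) if a else 0.0


def lcs_length(x, y):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(y) + 1)
    for xi in x:
        curr = [0]
        for j, yj in enumerate(y, start=1):
            curr.append(prev[j - 1] + 1 if xi == yj else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


# Invented sentence pair differing in one word.
suspicious = "the cat sat on the mat"
source = "the cat lay on the mat"
a, b = word_ngrams(suspicious), word_ngrams(source)
print(jaccard(a, b), dice(a, b), containment(a, b))
print(lcs_length(suspicious.split(), source.split()))
```

In a cross-lingual setting these scores would be computed after the texts are brought into a common language, and could then serve as input features for a classifier.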


Life ◽  
2020 ◽  
Vol 10 (9) ◽  
pp. 181
Author(s):  
Christopher T. Mandrell ◽  
Torrey E. Holland ◽  
James F. Wheeler ◽  
Sakineh M. A. Esmaeili ◽  
Kshitij Amar ◽  
...  

A machine learning approach is applied to Raman spectra of cells from the MIA PaCa-2 human pancreatic cancer cell line to distinguish between tumor repopulating cells (TRCs) and parental control cells, and to aid in the identification of molecular signatures. Fifty-one Raman spectra from the two types of cells are analyzed to determine the best combination of data type, dimension size, and classification technique to differentiate the cell types. An accuracy of 0.98 is obtained from support vector machine (SVM) and k-nearest neighbor (kNN) classifiers with various dimension reduction and feature selection tools. We also identify some possible biomolecules that cause the spectral peaks that led to the best results.
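The k-nearest neighbor classifier mentioned in this abstract can be sketched in a few lines: a query is assigned the majority label among its k closest training points. The two-dimensional feature values below are invented stand-ins for reduced-dimension data, not real Raman spectra, and the function name and labels are assumptions for the example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Majority vote among the k training points nearest to the query."""
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented 2-D features for two cell types (hypothetical values).
train_points = [
    (0.1, 0.2), (0.0, 0.3), (0.2, 0.1),   # "control"
    (0.9, 1.0), (1.0, 0.8), (0.8, 0.9),   # "TRC"
]
train_labels = ["control"] * 3 + ["TRC"] * 3
print(knn_predict(train_points, train_labels, (0.15, 0.15)))
print(knn_predict(train_points, train_labels, (0.95, 0.90)))
```

Because kNN relies on distances, reducing the number of dimensions before classification, as the abstract describes, tends to matter a great deal for high-dimensional inputs such as spectra.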

