Text pre-processing of multilingual for sentiment analysis based on social network data

Sentiment analysis (SA) is an enduring area of research, especially in the field of text analysis. Text pre-processing is important for performing SA accurately. This paper presents a text processing model for SA that applies natural language processing techniques to Twitter data. The basic phases for machine learning are text collection, text cleaning, pre-processing, and feature extraction, followed by categorization of the data according to the SA techniques. Keeping the focus on Twitter data, the data is extracted in a domain-specific manner. In the data cleaning phase, noisy data, missing data, punctuation, tags and emoticons are considered. For pre-processing, tokenization is performed, followed by stop word removal (SWR). The article provides insight into the techniques used for text pre-processing and the impact of their presence on the dataset. After text pre-processing is applied, the accuracy of the classification techniques improves and dimensionality is reduced. The proposed corpus can be utilized in the areas of market analysis, customer behaviour, polling analysis, and brand monitoring. The text pre-processing process can serve as a baseline for applying predictive analysis, machine learning and deep learning algorithms, and can be extended according to the problem definition.
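
The cleaning, tokenization and stop word removal phases described in this abstract can be sketched as follows. This is a minimal illustration and not the authors' model: the function names, the regular expressions and the tiny stop word list are all assumptions made for the example.

```python
import re

# Tiny illustrative stop word list (an assumption; real pipelines use
# much larger, language-specific lists).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on", "for"}

def clean_tweet(text: str) -> str:
    """Strip URLs, user mentions, hashtag symbols, punctuation, digits and emoticons."""
    text = re.sub(r"https?://\S+", " ", text)   # URLs
    text = re.sub(r"@\w+", " ", text)           # user mentions (tags)
    text = text.replace("#", " ")               # keep the hashtag word, drop the symbol
    text = re.sub(r"[^\w\s]|\d|_", " ", text)   # punctuation, emoticons, digits
    return text.lower()

def preprocess(text: str) -> list[str]:
    """Clean, tokenize on whitespace, then remove stop words."""
    tokens = clean_tweet(text).split()
    return [t for t in tokens if t not in STOP_WORDS]

tokens = preprocess("Loving the new #phone from @brand! :) http://t.co/xyz")
print(tokens)
```

Because `\w` matches Unicode letters in Python 3, the cleaning step keeps words written in non-Latin scripts, which matters for multilingual data; the stop word list would still need to be supplied per language.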

Sentiment Analysis on Twitter Data of World Cup Soccer Tournament Using Machine Learning

IoT ◽

10.3390/iot1020014 ◽

2020 ◽

Vol 1 (2) ◽

pp. 218-239 ◽

Cited By ~ 2

Author(s):

Ravikumar Patel ◽

Kalpdrum Passi

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Random Forest ◽

Natural Language ◽

Language Processing ◽

Machine Learning Algorithms ◽

World Cup ◽

Part Of Speech ◽

Twitter Data ◽

Processing Techniques

In the derived approach, an analysis is performed on Twitter data for the 2014 World Cup soccer tournament held in Brazil to detect the sentiment of people throughout the world using machine learning techniques. By filtering and analyzing the data using natural language processing techniques, sentiment polarity was calculated based on the emotion words detected in the user tweets. The dataset is normalized for use by machine learning algorithms and prepared using natural language processing techniques such as word tokenization, stemming and lemmatization, part-of-speech (POS) tagging, named entity recognition (NER), and parsing to extract emotions from the text of each tweet. This approach is implemented using the Python programming language and the Natural Language Toolkit (NLTK). A derived algorithm uses WordNet and the POS of each word to extract emotional words that have a meaning in the current context of the sentence, and assigns them sentiment polarity using the SentiWordNet dictionary or a lexicon-based method. The resultant polarity is further analyzed using naïve Bayes, support vector machine (SVM), K-nearest neighbor (KNN), and random forest machine learning algorithms and visualized on the Weka platform. Naïve Bayes gives the best accuracy of 88.17%, whereas random forest gives the best area under the receiver operating characteristic curve (AUC) of 0.97.
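
The lexicon-based polarity step can be illustrated with a small sketch. It does not use SentiWordNet or NLTK: `TOY_LEXICON` and the `polarity` function are hypothetical stand-ins, and the scores in the lexicon are invented for the example.

```python
# Hypothetical word-to-score lexicon (an assumption; the paper uses SentiWordNet).
TOY_LEXICON = {
    "great": 1.0, "win": 0.5, "love": 1.0,
    "bad": -1.0, "lose": -0.5, "terrible": -1.0,
}

def polarity(tokens):
    """Sum the lexicon scores of the tokens and map the total to a label."""
    score = sum(TOY_LEXICON.get(t, 0.0) for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(polarity(["great", "win"]))
print(polarity(["terrible", "match"]))
```

Words missing from the lexicon contribute nothing, so a tweet with no emotion words is labelled neutral under this scheme.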

Sentiment Analysis on Twitter Data: A Comparative Approach

International Journal of Computer Science and Mobile Applications ◽

10.47760/ijcsma.2021.v09i10.001 ◽

2021 ◽

Vol 9 (10) ◽

pp. 1-12

Author(s):

Subhadip Chandra ◽

Randrita Sarkar ◽

Sayon Islam ◽

Soham Nandi ◽

Avishto Banerjee ◽

...

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Computational Linguistics ◽

Language Processing ◽

Research Work ◽

Comparative Approach ◽

Support Vector ◽

Affective States ◽

Twitter Data ◽

Social Media Platforms

Sentiment analysis is the methodical recognition, extraction, quantification, and learning of affective states and subjective information using natural language processing, text analysis, computational linguistics, and biometrics. People frequently use Twitter, one of numerous popular social media platforms, to convey their thoughts and opinions about a business, a product, or a service. Analysis of tweet sentiments is particularly useful in detecting whether people have a positive, negative, or neutral opinion. This study assesses public opinion about an individual, activity, commodity, or organization. The Twitter API is utilised in this article to get tweets directly from Twitter and develop a sentiment categorization for them. This paper applies two separate approaches to Twitter data, viz., lexicon-based and machine learning. The lexicon-based approach is further divided into corpus-based and dictionary-based methods. Various machine learning-based approaches, such as Support Vector Machine (SVM), Naïve Bayes, and Maximum Entropy, are used to analyse Twitter data. Neural network (NN) and decision tree-based sentiment analysis are also covered in this research work, to compare the accuracy of the approaches across various data ranges. Graphs and confusion matrices are used to visualise the results of the analysis for positive, negative, and neutral opinions.
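
A confusion matrix for the three sentiment classes can be built with a few lines of standard-library Python. The sketch below is illustrative only; the label names and the helper functions `confusion_matrix` and `accuracy` are assumptions, not code from the paper.

```python
from collections import Counter

LABELS = ["positive", "negative", "neutral"]

def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

def accuracy(matrix):
    """Share of samples on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0

y_true = ["positive", "negative", "neutral", "positive"]
y_pred = ["positive", "neutral", "neutral", "negative"]
matrix = confusion_matrix(y_true, y_pred)
print(matrix, accuracy(matrix))
```

Reading along a row shows how tweets of one true class were distributed over the predicted classes, which is how such matrices expose systematic confusions such as negative tweets being labelled neutral.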

Approach for social media content-based analysis for vacation resorts

Journal of Communications Software and Systems ◽

10.24138/jcomss.v15i3.712 ◽

2019 ◽

Vol 15 (3) ◽

Author(s):

Snezhana Sulova ◽

Boris Bankov

Keyword(s):

Social Media ◽

Language Processing ◽

Text Processing ◽

Automated Analysis ◽

New Approach ◽

Useful Knowledge ◽

Media Messages ◽

Customer Feedback ◽

Processing Techniques ◽

The Impact

The impact of social networks on our lives keeps increasing because they provide content, generated and controlled by users, that is constantly evolving. They aid us in spreading news, statements, ideas and comments very quickly. Social platforms are currently one of the richest sources of customer feedback on a variety of topics. A topic that is frequently discussed is resorts and holiday villages and the tourist services offered there. Customer comments are valuable to both travel planners and tour operators. The accumulation of opinions in the web space is a prerequisite for using and applying appropriate tools for their computer processing and for extracting useful knowledge from them. While working with unstructured data, such as social media messages, there isn’t a universal text processing algorithm because each social network and its resources have their own characteristics. In this article, we propose a new approach for an automated analysis of a static set of historical data of user messages about holiday and vacation resorts, published on Twitter. The approach is based on natural language processing techniques and the application of machine learning methods. The experiments are conducted using the software product RapidMiner.

Whether the Weather Will Help Us Weather the COVID-19 Pandemic: Using Machine Learning to Measure Twitter Users' Perceptions

10.1101/2020.07.29.20164814 ◽

2020 ◽

Author(s):

Marichi Gupta ◽

Adity Bansal ◽

Bhav Jain ◽

Jillian Rochelle ◽

Atharv Oak ◽

...

Keyword(s):

Public Health ◽

Machine Learning ◽

Language Processing ◽

Scientific Evidence ◽

The Public ◽

Potential Impact ◽

Twitter Users ◽

Processing Techniques ◽

The Impact ◽

Weather’s Impact

Objective: The potential ability of weather to affect SARS-CoV-2 transmission has been an area of controversial discussion during the COVID-19 pandemic. Individuals' perceptions of the impact of weather can inform their adherence to public health guidelines; however, there is no measure of their perceptions. We quantified Twitter users' perceptions of the effect of weather and analyzed how they evolved with respect to real-world events and time. Materials and Methods: We collected 166,005 tweets posted between January 23 and June 22, 2020 and employed machine learning/natural language processing techniques to filter for relevant tweets, classify them by the type of effect they claimed, and identify topics of discussion. Results: We identified 28,555 relevant tweets and estimate that 40.4% indicate uncertainty about weather's impact, 33.5% indicate no effect, and 26.1% indicate some effect. We tracked changes in these proportions over time. Topic modeling revealed major latent areas of discussion. Discussion: There is no consensus among the public on weather's potential impact. Earlier months were characterized by tweets that were uncertain of weather's effect or claimed no effect; later, the portion of tweets claiming some effect of weather increased. Tweets claiming no effect of weather comprised the largest class by June. Major topics of discussion included comparisons to influenza's seasonality, President Trump's comments on weather's effect, and social distancing. Conclusion: There is a major gap between scientific evidence and public opinion of weather's impacts on COVID-19. We provide evidence of the public's misconceptions and topics of discussion, which can inform public health communications.

Arabic Sentiment Analysis on Chewing Khat Leaves using Machine Learning and Ensemble Methods

Engineering, Technology & Applied Science Research ◽

10.48084/etasr.4026 ◽

2021 ◽

Vol 11 (2) ◽

pp. 6845-6848

Author(s):

W. M. S. Yafooz ◽

E. A. Hizam ◽

W. A. Alromema

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Language Processing ◽

Social Issues ◽

Ensemble Methods ◽

Support Vector ◽

Khat Chewing ◽

User Comments ◽

Arabic Sentiment Analysis ◽

Processing Techniques

Sentiment analysis plays an important role in obtaining speakers' opinions or feelings towards events, products, topics, or services, helping businesses to improve their products. Moreover, governments and organizations investigate and solve current social issues by analyzing perspectives and feelings. This study evaluated the habit of chewing Khat (qat) leaves in Yemeni society. Chewing Khat plant leaves is a common habit in Yemen and East Africa. This paper proposes a model to detect information about the Khat chewing habit, how people explore it, and the preference for Khat leaves among Arabic people. A dataset consisting of user comments on 18 YouTube videos was prepared through several natural language processing techniques. Several experiments were conducted using six machine learning classifiers and four ensemble methods. Support Vector Machine and Linear Regression had almost 80% accuracy, whereas XGBoost was the most accurate ensemble method, reaching 77%.

Design of text sentiment analysis tool using feature extraction based on fusing machine learning algorithms

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189478 ◽

2020 ◽

pp. 1-9

Author(s):

P. Ajitha ◽

A. Sivasangari ◽

R. Immanuel Rajkumar ◽

S. Poonguzhali

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Language Processing ◽

Learning Algorithm ◽

Machine Learning Algorithms ◽

Analysis Tool ◽

Sentiment Mining ◽

Processing Techniques ◽

Text Sentiment Analysis ◽

Insight Into

Text sentiment analysis determines whether the polarity of feeling expressed in a series of texts, documents, or public opinions on a particular product or general subject is positive, negative, or neutral. Using machine learning and natural language processing techniques, the current work aims to gain insight into sentiment mining on tweets. Text classification is accomplished using a fusion technique based on machine learning algorithms. This research proposes a lexicon-based system for grading sentiment. Bag-of-words (BOW) or lexicon-based methodology is currently the main standard way of modeling text for machine learning in sentiment analysis approaches. Marketers can use sentiment analysis to analyze their business and services, gauge public opinion, or evaluate customer satisfaction. Organizations can even use this analysis to gather significant feedback on issues related to newly released products. The main objective of this work is to resolve the data overload problem.
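
The bag-of-words (BOW) representation mentioned here turns each text into a vector of word counts over a shared vocabulary. The following sketch shows the idea under simple assumptions (whitespace tokenization, lower-casing); the function names are illustrative and do not come from the paper.

```python
from collections import Counter

def build_vocabulary(docs):
    """Sorted list of every distinct lower-cased token across the documents."""
    return sorted({tok for doc in docs for tok in doc.lower().split()})

def bag_of_words(doc, vocab):
    """Count vector for one document, aligned with the vocabulary order."""
    counts = Counter(doc.lower().split())
    return [counts[w] for w in vocab]

docs = ["good phone good price", "bad phone"]
vocab = build_vocabulary(docs)
vectors = [bag_of_words(d, vocab) for d in docs]
print(vocab)
print(vectors)
```

Word order is discarded in this representation, which is why it is cheap to compute but cannot distinguish, for example, negation that depends on position.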

Analysis of Harassment Complaints to Detect Witness Intervention by Machine Learning and Soft Computing Techniques

Applied Sciences ◽

10.3390/app11178007 ◽

2021 ◽

Vol 11 (17) ◽

pp. 8007

Author(s):

Marina Alonso-Parra ◽

Cristina Puente ◽

Ana Laguna ◽

Rafael Palacios

Keyword(s):

Machine Learning ◽

Soft Computing ◽

Language Processing ◽

Public Awareness ◽

Free Text ◽

Soft Computing Techniques ◽

Awareness Raising ◽

Processing Techniques ◽

The Impact ◽

The City

This research aims to analyze textual descriptions of harassment situations collected anonymously by the Hollaback! project. Hollaback! is an international movement created to end harassment in all of its forms. Its goal is to collect stories of harassment from all around the world, through the web and a free app, to elevate victims’ individual voices and find a societal solution. Hollaback! intends to analyze the impact of a bystander during a harassment incident in order to launch a public awareness-raising campaign that equips everyday people with tools to undo harassment. Thus, the analysis presented in this paper is a first step towards Hollaback!’s purpose: the automatic detection of a witness intervention inferred from the victim’s own report. In the first part, natural language processing techniques were used to analyze the victims’ free-text descriptions. For this part, we used the whole dataset with all its countries and locations. In the second part of this study, classification models based on machine learning and soft computing techniques were developed to classify the descriptions into those that have bystander presence and those that do not. For this machine learning part, we selected the city of Madrid as an example, in order to establish a criterion of the witness behavior procedure.

Twitter Sentiment Analysis towards COVID-19 Vaccines in the Philippines Using Naïve Bayes

Information ◽

10.3390/info12050204 ◽

2021 ◽

Vol 12 (5) ◽

pp. 204

Author(s):

Charlyn Villavicencio ◽

Julio Jerison Macrohon ◽

X. Alphonse Inbaraj ◽

Jyh-Horng Jeng ◽

Jer-Guang Hsieh

Keyword(s):

Sentiment Analysis ◽

Language Processing ◽

Data Science ◽

Naive Bayes ◽

The Philippines ◽

Naïve Bayes ◽

Social Networking Site ◽

Bayes Model ◽

The Government ◽

Processing Techniques

A year into the COVID-19 pandemic and one of the longest recorded lockdowns in the world, the Philippines received its first delivery of COVID-19 vaccines on 1 March 2021 through WHO’s COVAX initiative. A month into inoculation of all frontline health professionals and other priority groups, the authors of this study gathered data on the sentiment of Filipinos regarding the Philippine government’s efforts using the social networking site Twitter. Natural language processing techniques were applied to understand the general sentiment, which can help the government in analyzing its response. The tweets were annotated, and a Naïve Bayes model was trained to classify English and Filipino language tweets into positive, neutral, and negative polarities using the RapidMiner data science software. The results yielded an 81.77% accuracy, which exceeds the accuracy of recent sentiment analysis studies using Twitter data from the Philippines.
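
The study trained its model in RapidMiner, so the following Python code is not the authors' implementation. It is a minimal sketch of a multinomial Naïve Bayes text classifier with add-one (Laplace) smoothing, written from scratch with a hypothetical class name and invented training tokens, to show how such a model assigns a polarity to a tokenized tweet.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative sketch)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.word_counts = defaultdict(Counter)      # word frequencies per class
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)    # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for t in tokens:
                if t in self.vocab:                  # words never seen in training are ignored
                    score += math.log((self.word_counts[label][t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Invented toy training data, for illustration only.
train_docs = [["good", "vaccine"], ["safe", "good"], ["bad", "delay"], ["slow", "bad"]]
train_labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict(["good"]), model.predict(["bad", "delay"]))
```

Working in log space avoids numerical underflow when many word probabilities are multiplied, and the smoothing term keeps a single unseen word-class pair from forcing a probability of zero.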

A Natural Language Processing Approach to Measuring Treatment Adherence and Consistency Using Semantic Similarity

AERA Open ◽

10.1177/23328584211028615 ◽

2021 ◽

Vol 7 ◽

pp. 233285842110286

Author(s):

Kylie L. Anglin ◽

Vivian C. Wong ◽

Arielle Boguslav

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Semantic Similarity ◽

Language Processing ◽

Intervention Implementation ◽

Proof Of Concept ◽

Coaching Intervention ◽

Processing Techniques ◽

Teacher Coaching ◽

The Impact

Though there is widespread recognition of the importance of implementation research, evaluators often face intense logistical, budgetary, and methodological challenges in their efforts to assess intervention implementation in the field. This article proposes a set of natural language processing techniques called semantic similarity as an innovative and scalable method of measuring implementation constructs. Semantic similarity methods are an automated approach to quantifying the similarity between texts. By applying semantic similarity to transcripts of intervention sessions, researchers can use the method to determine whether an intervention was delivered with adherence to a structured protocol, and the extent to which an intervention was replicated with consistency across sessions, sites, and studies. This article provides an overview of semantic similarity methods, describes their application within the context of educational evaluations, and provides a proof of concept using an experimental study of the impact of a standardized teacher coaching intervention.
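
One simple way to quantify similarity between two texts is cosine similarity over term-frequency vectors, sketched below. This is an assumption made for illustration: the abstract does not say which similarity measure or text representation the authors use, and a word-count representation captures lexical overlap only, whereas semantic similarity methods typically rely on richer representations such as embeddings.

```python
import math
from collections import Counter

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of the angle between the word-count vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical protocol text and session transcript, invented for the example.
protocol = "review the lesson plan and set one goal"
session = "we review the lesson plan then set one goal together"
print(round(cosine_similarity(protocol, session), 3))
```

In an adherence setting, a score like this would be computed between each session transcript and the protocol text, and between pairs of transcripts to gauge consistency across sessions.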

Exploring Impact of Age and Gender on Sentiment Analysis Using Machine Learning

Electronics ◽

10.3390/electronics9020374 ◽

2020 ◽

Vol 9 (2) ◽

pp. 374 ◽

Cited By ~ 2

Author(s):

Sudhanshu Kumar ◽

Monika Gahalawat ◽

Partha Pratim Roy ◽

Debi Prosad Dogra ◽

Byung-Gyu Kim

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Short Term Memory ◽

Age Groups ◽

Modern World ◽

Support Vector ◽

Digital Information ◽

Age And Gender ◽

And Gender ◽

The Impact

Sentiment analysis is a rapidly growing field of research due to the explosive growth in digital information. In the modern world of artificial intelligence, sentiment analysis is one of the essential tools for extracting emotion information from massive data. Sentiment analysis is applied to a variety of user data, from customer reviews to social network posts. To the best of our knowledge, there is little work on sentiment analysis based on the categorization of users by demographics. Demographics play an important role in deciding the marketing strategies for different products. In this study, we explore the impact of age and gender on sentiment analysis, as this can help e-commerce retailers to market their products based on specific demographics. The dataset is created by collecting book reviews from Facebook users, who answered a questionnaire containing questions about their preferences in books, along with their age group and gender. Next, the paper analyzes the segmented data for sentiments based on each age group and gender. Finally, sentiment analysis is done using different machine learning (ML) approaches, including maximum entropy, support vector machine, convolutional neural network, and long short-term memory, to study the impact of age and gender on user reviews. Experiments have been conducted to identify new insights into the effect of age and gender on sentiment analysis.
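
Segmenting sentiment results by demographic group amounts to a group-by aggregation. The sketch below assumes each review already carries a numeric sentiment score; the field names (`age_group`, `gender`, `score`) and the sample records are hypothetical and are not taken from the paper's dataset.

```python
from collections import defaultdict

def mean_score_by_group(records):
    """Average sentiment score for each (age group, gender) pair."""
    totals = defaultdict(lambda: [0.0, 0])   # key -> [sum of scores, count]
    for r in records:
        key = (r["age_group"], r["gender"])
        totals[key][0] += r["score"]
        totals[key][1] += 1
    return {key: s / n for key, (s, n) in totals.items()}

# Invented records, for illustration only.
records = [
    {"age_group": "18-25", "gender": "F", "score": 1.0},
    {"age_group": "18-25", "gender": "F", "score": 0.0},
    {"age_group": "26-35", "gender": "M", "score": -0.5},
]
print(mean_score_by_group(records))
```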
