A Review of Urdu Sentiment Analysis with Multilingual Perspective: A Case of Urdu and Roman Urdu Language

Research efforts in the field of sentiment analysis have increased exponentially in the last few years due to its applicability in areas such as online product purchasing, marketing, and reputation management. Social media and online shopping sites have become a rich source of user-generated data. Manufacturing, sales, and marketing organizations are increasingly turning to this source to obtain worldwide feedback on their activities and products. Millions of sentences in Urdu and Roman Urdu are posted daily on social sites such as Facebook, Instagram, Snapchat, and Twitter. Disregarding opinions expressed in Urdu and Roman Urdu and considering only the resource-rich English language means losing this vast amount of valuable data. Our research focused on collecting research papers related to the Urdu and Roman Urdu languages and analyzing them in terms of preprocessing, feature extraction, and classification techniques. This paper presents a comprehensive study of research conducted on Roman Urdu and Urdu text for product reviews. The study is divided into categories such as collection of relevant corpora, data preprocessing, feature extraction, classification platforms and approaches, limitations, and future work. The comparison was based on different research factors, such as corpus, lexicon, and opinions. Each reviewed paper was evaluated against a set of benchmarks and categorized accordingly. Based on the results obtained and the comparisons made, we suggest some helpful steps for future studies.
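As a rough illustration of the preprocessing stage the review covers, the following Python sketch normalizes a Roman Urdu sentence: it lowercases the text, drops non-letters, shortens elongated words, and maps spelling variants to one form. The variant map `VARIANTS` is a tiny hypothetical example; real Roman Urdu shows far more spelling variation, and each reviewed paper uses its own preprocessing steps.

```python
import re

# Hypothetical spelling-variant map; real Roman Urdu text has many more variants.
VARIANTS = {
    "acha": "achha",
    "achaa": "achha",
    "bohat": "bahut",
    "bohot": "bahut",
    "nai": "nahi",
    "nahin": "nahi",
}


def preprocess_roman_urdu(text: str) -> list[str]:
    """Lowercase, drop non-letters, shorten elongated words, and map spelling variants."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # keep only Latin letters and whitespace
    tokens = []
    for tok in text.split():
        tok = re.sub(r"(.)\1{2,}", r"\1\1", tok)  # runs of 3+ letters -> 2, e.g. "achaaaa" -> "achaa"
        tokens.append(VARIANTS.get(tok, tok))
    return tokens


print(preprocess_roman_urdu("Ye phone bohot ACHAAAA hai!!!"))
# ['ye', 'phone', 'bahut', 'achha', 'hai']
```

Collapsing runs to two letters rather than one avoids damaging words with legitimate double letters; the variant lookup then maps the shortened form to a canonical spelling.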

A Check on Annotation in Sentiment Research

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1065.0882s819 ◽

2019 ◽

Vol 8 (2S8) ◽

pp. 1346-1350

Keyword(s):

Social Media ◽

Sentiment Analysis ◽

English Language ◽

Research Area ◽

Research Literature ◽

Manual Annotation ◽

Automatic Annotation ◽

Annotation Method ◽

New Concepts

The research literature on sentiment analysis methodologies has grown exponentially in recent years. In a research area where new concepts and techniques are constantly introduced, it is of interest to analyze the latest trends in the literature. In particular, we chose to focus primarily on the literature of the last five years on annotation methodologies, including the frequently used datasets and the sources from which they were obtained. The survey suggests that researchers mostly rely on manual annotation when building sentiment corpora. As for datasets, English-language data taken from social media such as Twitter are still widely used. Many aspects of this research area remain to be explored, such as semi-automatic annotation methods, which researchers still rarely use. In addition, languages that receive less attention, such as Malay, Korean, and Japanese, still need corpora for sentiment analysis research.

Polarity Classification Tool for Sentiment Analysis in Malay Language

IAES International Journal of Artificial Intelligence (IJ-AI) ◽

10.11591/ijai.v8.i3.pp259-263 ◽

2019 ◽

Vol 8 (3) ◽

pp. 259

Author(s):

Normi Sham Awang Abu Bakar ◽

Ros Aziehan Rahmat ◽

Umar Faruq Othman

Keyword(s):

Social Media ◽

Sentiment Analysis ◽

English Language ◽

Sentiment Lexicon ◽

The Social ◽

Polarity Classification ◽

Media Channels ◽

Classification Tool

The popularity of social media channels has increased researchers' interest in the sentiment analysis (SA) area. One aspect of SA research is determining the polarity of social media comments, i.e., positive, negative, or neutral. However, Malay sentiment analysis tools are scarce because most of the work in the literature discusses polarity classification tools for English. This paper presents the development of a polarity classification tool called the Malay Polarity Classification Tool (MaCT). The tool is based on the AFINN sentiment lexicon for the English language. We attempted to translate each word in AFINN into its Malay equivalent and then used the lexicon to collect sentiment data from Twitter. The Twitter data are then classified as positive, negative, or neutral. For validation, we collected 400 positive, 400 negative, and 200 neutral tweets, ran them through our sentiment lexicon, and obtained a score of 90% for precision, recall, and accuracy. Our main contributions are the new Malay translation of AFINN and the classification of the sentiment data.
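A minimal Python sketch of AFINN-style lexicon scoring as described above: each token's integer score is summed, and the sign of the total gives the polarity, with a total of zero labelled neutral. The five Malay entries and their scores in `MALAY_LEXICON` are hypothetical stand-ins, not MaCT's actual translated lexicon.

```python
import re

# Hypothetical excerpt of a Malay AFINN-style lexicon: word -> integer score in [-5, 5].
MALAY_LEXICON = {"bagus": 3, "gembira": 3, "teruk": -3, "benci": -3, "sedih": -2}


def polarity(tweet: str, lexicon: dict[str, int] = MALAY_LEXICON) -> str:
    """Sum the lexicon scores of the tweet's tokens and map the sign to a label."""
    tokens = re.findall(r"[a-z]+", tweet.lower())
    score = sum(lexicon.get(tok, 0) for tok in tokens)  # words not in the lexicon score 0
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(polarity("Saya gembira hari ini!"))  # positive
```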

Let’s play on Facebook: using sentiment analysis and social media metrics to measure the success of YouTube gamers’ post types

Personal and Ubiquitous Computing ◽

10.1007/s00779-019-01361-7 ◽

2019 ◽

Cited By ~ 2

Author(s):

Flora Poecze ◽

Claus Ebster ◽

Christine Strauss

Keyword(s):

Social Media ◽

Sentiment Analysis ◽

Language Processing ◽

Nearest Neighbor ◽

Consumer Feedback ◽

Youtube Videos ◽

The Masses ◽

Processing Techniques ◽

Facebook Pages ◽

Future Work

This paper discusses the analysis results of successful self-marketing techniques on Facebook pages in the cases of three YouTube gamers: PewDiePie, Markiplier, and Kwebbelkop. The research focus was to identify significant differences in the gamers' user-generated Facebook metrics and commentary sentiments. Analysis of variance (ANOVA) and k-nearest neighbor sentiment analysis were employed as the core research methods. ANOVA of the classified post categories revealed that photos tended to attract significantly more user-generated interactions than other post types, whereas re-posted YouTube videos scored significantly lower on the retrieved metrics than other content types. K-nearest neighbor sentiment analysis revealed underlying follower negativity in cases where user-generated activity was relatively low, thereby improving the understanding of the opinion of the masses previously hidden behind metrics such as the number of likes, comments, and shares. The paper presents the methodological design of the study, a detailed discussion of the key findings and their implications, and directions for future work. The results indicate the need to use natural language processing techniques to optimize brand communication on social media and highlight the importance of considering machine learning sentiment analysis techniques for a better understanding of consumer feedback.
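The k-nearest neighbor approach mentioned above can be sketched as follows, assuming bag-of-words vectors compared by cosine similarity and a majority vote over the `k` closest labeled comments. The training comments are invented for illustration; the study's own features and value of k are not stated in this abstract.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_sentiment(comment: str, labeled: list[tuple[str, str]], k: int = 3) -> str:
    """Majority label among the k labeled comments most similar to `comment`."""
    query = bag_of_words(comment)
    ranked = sorted(labeled, key=lambda pair: cosine(query, bag_of_words(pair[0])), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labeled follower comments.
labeled = [
    ("love this video so much", "positive"),
    ("great content love it", "positive"),
    ("this is awesome", "positive"),
    ("boring and annoying", "negative"),
    ("hate this boring stuff", "negative"),
    ("worst video ever", "negative"),
]
print(knn_sentiment("so boring i hate it", labeled, k=3))  # negative
```

An odd `k` avoids ties in a two-class vote; in practice the labeled set would be much larger and `k` tuned on held-out data.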

SENTIMENT ANALYSIS ON TWITTER BY USING MAXIMUM ENTROPY AND SUPPORT VECTOR MACHINE METHOD

SINERGI ◽

10.22441/sinergi.2020.2.002 ◽

2020 ◽

Vol 24 (2) ◽

pp. 87

Author(s):

Mona Cindo ◽

Dian Palupi Rini ◽

Ermatita Ermatita

Keyword(s):

Social Media ◽

Support Vector Machine ◽

Feature Extraction ◽

Product Quality ◽

Sentiment Analysis ◽

Maximum Entropy ◽

The Other ◽

Classification Method ◽

Support Vector ◽

General Opinion

With the growth of social media, a large amount of data is available for social mining research. Twitter is one microblogging platform that can be used for this purpose. Many companies use Twitter data to analyze their customers' satisfaction with product quality. At the same time, many users use social media to express their daily emotions. Such data can support research aimed both at improving product quality and at analyzing opinions on certain events. This research is often called sentiment analysis or opinion mining. Although previous research proposed particularly useful features for sentiment analysis, its performance was still lacking. Furthermore, it used Support Vector Machine as the classification method, whereas most researchers have found another classification method, Maximum Entropy, to be more efficient. This research therefore used two datasets: general opinion data and airline opinion data. For feature extraction, we employ four feature types: pragmatic, lexical-grams, POS-grams, and sentiment lexical. For classification, we use both Support Vector Machine and Maximum Entropy to find the best result. In the end, the best result was achieved by Maximum Entropy, with 85.8% accuracy on the general opinion data and 92.6% accuracy on the airline opinion data.
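For two classes, a Maximum Entropy classifier is equivalent to logistic regression. The sketch below assumes binary labels and uses only unigram and bigram indicator features (a subset of the lexical-gram features named above), trained by batch gradient ascent on the log-likelihood. The airline-style examples and the hyperparameters `epochs` and `lr` are illustrative assumptions, not the paper's setup.

```python
import math
from collections import defaultdict


def features(text: str) -> list[str]:
    """Unigram and bigram indicator features for a whitespace-tokenized text."""
    toks = text.lower().split()
    feats = ["uni=" + t for t in toks]
    feats += ["bi=" + a + "_" + b for a, b in zip(toks, toks[1:])]
    return feats


def train_maxent(data, epochs=200, lr=0.1):
    """Binary MaxEnt (logistic regression) fitted by batch gradient ascent.

    data: list of (text, label) pairs with label 1 (positive) or 0 (negative).
    """
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        grad = defaultdict(float)
        grad_bias = 0.0
        for text, y in data:
            feats = features(text)
            z = bias + sum(weights[f] for f in feats)
            p = 1.0 / (1.0 + math.exp(-z))  # P(label = 1 | text)
            err = y - p  # derivative of the log-likelihood with respect to z
            for f in feats:
                grad[f] += err
            grad_bias += err
        for f, g in grad.items():
            weights[f] += lr * g
        bias += lr * grad_bias
    return weights, bias


def predict(model, text: str) -> int:
    weights, bias = model
    z = bias + sum(weights.get(f, 0.0) for f in features(text))
    return 1 if z > 0 else 0


# Hypothetical airline opinion snippets (1 = positive, 0 = negative).
data = [
    ("great flight friendly crew", 1),
    ("on time and comfortable", 1),
    ("love this airline", 1),
    ("flight delayed again", 0),
    ("rude crew and lost bag", 0),
    ("worst airline ever", 0),
]
model = train_maxent(data)
print(predict(model, "great friendly crew"))
```

An SVM on the same features would instead maximize the margin with a hinge loss; the probabilistic log-likelihood objective is what distinguishes MaxEnt.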

Classification of Code-Mixed Bilingual Phonetic Text Using Sentiment Analysis

International Journal on Semantic Web and Information Systems ◽

10.4018/ijswis.2021040104 ◽

2021 ◽

Vol 17 (2) ◽

pp. 59-78

Author(s):

Shailendra Kumar Singh ◽

Manoj Kumar Sachan

Keyword(s):

Social Media ◽

Sentiment Analysis ◽

Social Networking Sites ◽

Large Scale ◽

English Language ◽

Product Reviews ◽

Internet Users ◽

Social Media Text ◽

Analysis System

The rapid growth of internet facilities has greatly increased the volume of comments, posts, blogs, feedback, etc., on social networking sites. These social media data are available in unstructured form, including images, text, and videos. Processing these data is difficult, but sentiment analysis, information retrieval, and recommender systems can be used to process them. To extract internet users' opinions and sentiments from their social media text, a sentiment analysis system is needed that works on both monolingual and bilingual phonetic text. Therefore, a sentiment analysis (SA) system is developed that performs well on datasets from different domains. The system was tested on four datasets and improved accuracy over state-of-the-art techniques by 3% on social media datasets, 1.5% on movie reviews, 1.35% on Amazon product reviews, and 4.56% on large Amazon product reviews. In addition, a stemmer for English verbs (StemVerb) is proposed, which improves the SA system's performance.

Literature Review of The influence of Integrating Social Media as E-Learning Tools into Teaching and Learning the English Language

Advances in Social Sciences Research Journal ◽

10.14738/assrj.77.8593 ◽

2020 ◽

Vol 7 (7) ◽

pp. 172-177

Author(s):

Nada Alsheehri ◽

Hayfa Ali Alenezi ◽

Awatif AlMutairi ◽

Riam K. Almaqrn

Keyword(s):

Social Media ◽

Literature Review ◽

Teaching And Learning ◽

English Language ◽

Learning Tools ◽

New Strategy ◽

Language Classrooms ◽

E Learning ◽

Future Work ◽

The Impact

This literature review aimed to explore the impact of integrating social media (SM) tools into universities' English language classrooms. Currently, the English language plays an essential and fundamental role in education. It is therefore crucial for learners to acquire a good understanding of English in the academic environment. Teaching students with traditional methods might not help improve their English language skills. One technique that can be applied to improve learning skills quickly is using E-learning tools and SM in teaching strategies. Hence, this paper examines the influence of integrating E-learning tools and strategies such as SM into teaching and learning English in Saudi university classes. This new strategy was chosen because it allows learners to access information anywhere without time restrictions, and it can support them in gaining an in-depth understanding of English language content. Finally, this study provides some recommendations and suggestions for future work.

Sentiment Analysis of Cyberbullying on Instagram User Comments

Journal of Data Science and Its Applications ◽

10.21108/jdsa.2019.2.20 ◽

2019 ◽

Vol 2 (1) ◽

pp. 88-98 ◽

Cited By ~ 1

Author(s):

Muhammad Zidny Naf'an ◽

Alhamda Adisoka Bimantara ◽

Afiatari Larasati ◽

Ezar Mega Risondang ◽

Novanda Alim Setya Nugraha

Keyword(s):

Social Media ◽

Feature Extraction ◽

Sentiment Analysis ◽

Cross Validation ◽

Experimental Results ◽

Training Data ◽

Bayes Classifier ◽

User Comments ◽

One Act ◽

Fold Cross Validation

Instagram is a social media platform for sharing images, photos, and videos. It has many active users from various circles. In addition to sharing posts, Instagram users can like and comment on other users' posts. However, the comment feature is often misused, for example for cyberbullying, which is an unlawful act. Yet Instagram still does not provide a feature for detecting cyberbullying. Therefore, this study aims to build a system that classifies whether comments contain elements of cyberbullying. The classification results will be used to detect cyberbullying comments. The classification algorithm is the Naïve Bayes Classifier. Each comment passes through preprocessing and feature extraction stages using the TF-IDF method. Evaluation and testing use the K-Fold Cross Validation method. The experiment is divided into two settings: with stemming and without stemming. The training data consist of 455 samples. The best experimental results achieved an accuracy of 84% both with and without stemming.
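The pipeline above (TF-IDF features, a Naïve Bayes classifier, and K-fold cross-validation) can be sketched in plain Python. This is one plausible arrangement, not the authors' code: it assumes already tokenized comments, weights terms by tf × log(N/df), feeds those weights to a multinomial Naïve Bayes with Laplace smoothing, and scores held-out comments by their raw tokens. Folds are contiguous, so the data should be shuffled beforehand; the Indonesian example words are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tfidf(docs):
    """docs: list of token lists -> list of {term: tf * log(N / df)} dicts."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    vectors = []
    for d in docs:
        tf = Counter(d)
        vectors.append({t: (c / len(d)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one (needs n >= k)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def train_nb(vectors, labels, alpha=1.0):
    """Multinomial Naive Bayes over fractional TF-IDF weights, Laplace-smoothed."""
    classes = sorted(set(labels))
    log_prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    mass = {c: defaultdict(float) for c in classes}
    vocab = set()
    for vec, y in zip(vectors, labels):
        for term, w in vec.items():
            mass[y][term] += w
            vocab.add(term)
    totals = {c: sum(mass[c].values()) for c in classes}
    return classes, log_prior, mass, totals, len(vocab), alpha


def predict_nb(model, tokens):
    classes, log_prior, mass, totals, v, alpha = model

    def score(c):
        return log_prior[c] + sum(
            math.log((mass[c].get(t, 0.0) + alpha) / (totals[c] + alpha * v)) for t in tokens
        )

    return max(classes, key=score)


def cross_validate(docs, labels, k=5):
    """Mean accuracy over k folds, recomputing TF-IDF on each training split."""
    accuracies = []
    for fold in k_fold_indices(len(docs), k):
        held_out = set(fold)
        train = [i for i in range(len(docs)) if i not in held_out]
        model = train_nb(tfidf([docs[i] for i in train]), [labels[i] for i in train])
        correct = sum(predict_nb(model, docs[i]) == labels[i] for i in fold)
        accuracies.append(correct / len(fold))
    return sum(accuracies) / len(accuracies)


# Hypothetical tokenized comments, interleaved by class.
docs = [["bodoh", "jelek"], ["bagus", "keren"], ["jelek", "kamu"], ["keren", "sekali"]]
labels = ["bully", "not", "bully", "not"]
print(cross_validate(docs, labels, k=2))  # 1.0
```

Recomputing TF-IDF inside each fold keeps document frequencies from the held-out comments out of training, which avoids leaking test information into the features.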

Ensemble Classifiers for Arabic Sentiment Analysis of Social Network (Twitter Data) towards COVID-19-Related Conspiracy Theories

Applied Computational Intelligence and Soft Computing ◽

10.1155/2022/6614730 ◽

2022 ◽

Vol 2022 ◽

pp. 1-10

Author(s):

Abdullah Al-Hashedi ◽

Belal Al-Fuhaidi ◽

Abdulqader M. Mohsen ◽

Yousef Ali ◽

Hasan Ali Gamal Al-Kaf ◽

...

Keyword(s):

Machine Learning ◽

Social Media ◽

Sentiment Analysis ◽

English Language ◽

Word Embedding ◽

Ensemble Classifiers ◽

Conspiracy Theories ◽

Machine Learning Model ◽

Arabic Sentiment Analysis ◽

The Impact

Sentiment analysis has recently become increasingly important with the massive increase in online content. It involves analyzing textual data generated on social media, which can be easily accessed, obtained, and analyzed. Since the emergence of COVID-19, most published studies related to COVID-19 conspiracy theories have been surveys of people's sentiments and opinions and of the pandemic's impact on their lives. Only a few studies applied machine learning to sentiment analysis of social media. These studies focused mainly on English-language tweets and paid little attention to other languages such as Arabic. This study proposes a machine learning model to analyze Arabic tweets from Twitter. In this model, we apply Word2Vec word embeddings, which form the main source of features. Two pretrained continuous bag-of-words (CBOW) models are investigated, and Naïve Bayes is used as a baseline classifier. Several single and ensemble machine learning classifiers are evaluated with and without SMOTE (synthetic minority oversampling technique). The experimental results show that combining word embeddings with an ensemble classifier and SMOTE achieved a good improvement in average F1 score compared with the baseline classifier and with the other classifiers (single and ensemble) without SMOTE.
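SMOTE creates synthetic minority-class examples by interpolating between a minority sample and one of its nearest minority-class neighbours. A minimal sketch over plain feature vectors (for example, averaged word embeddings) follows. It picks base samples at random, which is one common variant of the original per-sample procedure, and the parameter names `n_new`, `k`, and `seed` are assumptions.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points, each on the segment between a random minority
    sample and one of its k nearest minority-class neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment x -> nn
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(len(smote(minority, 4, k=2)))  # 4
```

Because every new point lies between two real minority samples, oversampling fills in the minority region instead of duplicating points, which is why it can help the F1 score on imbalanced classes.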

A Comprehensive Study on Lexicon Based Approaches for Sentiment Analysis

Asian Journal of Computer Science and Technology ◽

10.51983/ajcst-2019.8.s2.2037 ◽

2019 ◽

Vol 8 (S2) ◽

pp. 1-6

Author(s):

Venkateswarlu Bonta ◽

Nandhini Kumaresh ◽

N. Janardhan

Keyword(s):

Social Media ◽

Public Opinion ◽

Sentiment Analysis ◽

Human Activities ◽

Analysis Tool ◽

Political Systems ◽

Cornell University ◽

Polarity Score ◽

The Web ◽

Comprehensive Study

In recent years, opinion-based postings on social media have been helping to reshape business and public sentiment, and emotions have an impact on our social and political systems. Opinions are central to almost all human activities because they are key influencers of our behaviour. Whenever we need to make a decision, we generally want to know others' opinions. Every organization and business wants to find out customer or public opinion about its products and services. Thus, it is necessary to gather and study opinions on the Web. However, finding and monitoring sites on the web and distilling their reviews remains a big task, because each site typically contains a huge volume of opinion text, and the average human reader will have difficulty identifying the polarity of each review and summarizing the opinions in it. Hence, automated sentiment analysis is needed to compute polarity scores and classify reviews as positive or negative. This article uses the NLTK, TextBlob, and VADER sentiment analysis tools to classify movie reviews from the website www.rottentomatoes.com, as provided by Cornell University, and compares these tools to find the most efficient one for sentiment classification. The experimental results confirm that VADER outperforms TextBlob.
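VADER sums the valences of the words in a text (after its heuristic adjustments) and squashes the total into (-1, 1) as x / sqrt(x² + α) with α = 15. The sketch below reproduces only that normalization step and the commonly used ±0.05 labelling thresholds. The word valences passed in are hypothetical; VADER's lexicon and rules are not reproduced here.

```python
import math


def normalize_compound(total: float, alpha: float = 15.0) -> float:
    """Squash a summed valence into (-1, 1) as total / sqrt(total^2 + alpha)."""
    return total / math.sqrt(total * total + alpha)


def label(compound: float) -> str:
    """Map a compound score to a label using the +/-0.05 thresholds."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


# Hypothetical per-word valences for a short review; VADER's real lexicon values may differ.
valences = [1.9, 1.3]
compound = normalize_compound(sum(valences))
print(round(compound, 3), label(compound))
```

The squashing keeps long, very positive reviews from producing unbounded scores while preserving the sign, so one threshold pair works for texts of any length.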

Language Identification for Multilingual Sentiment Examination

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1444.0982s1119 ◽

2019 ◽

Vol 8 (2S11) ◽

pp. 3571-3576

Keyword(s):

Social Media ◽

Sentiment Analysis ◽

English Language ◽

Language Identification ◽

Parts Of Speech ◽

Analysis Task ◽

E Learning ◽

Media Platform ◽

Speech Tagging ◽

Text Sentiment Analysis

Social media is the most popular platform on which users share their views, reviews, and knowledge about various topics, news, products, etc. Identifying users' sentiments or opinions is valuable for many e-commerce companies, hotels, e-learning providers, and so on. Such opinion analysis helps companies improve their services and products. With the growing number of web users across the globe, users post their views freely on the internet. Many different languages are spoken around the world, and the resulting multilingual nature of social media makes analyzing such text difficult. Sentiment analysis can be conducted on videos, images, or text; text sentiment analysis is the most popular form because content is freely available as blogs, reviews, comments, etc. Because social media platforms let people comment in any language, there is a need for multilingual sentiment analysis. The sentiment analysis task involves phases such as data collection, pre-processing, sentiment classification, and polarity identification. Multilingual text requires script identification, which labels the words in the text along with the scripts used to write them. The languages used in the text are identified, and Hindi text written in Romanized script is transliterated into Devanagari script. The text is then fully translated into English, and POS (Parts of Speech) tagging is performed on the result. The aim of this study is to survey different techniques for multilingual sentiment analysis and for language identification of source text, among which the n-gram model outperforms the others.
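Word-level language identification with character n-grams, one of the techniques the survey covers, can be sketched as follows: each language gets a character-trigram frequency profile, and a word is assigned to the language whose add-one-smoothed trigram model gives it the highest log-probability. The two training word lists and the profile names `english` and `hindi_roman` are tiny hypothetical samples; a practical system would train on large corpora.

```python
import math
from collections import Counter


def char_ngrams(word: str, n: int = 3) -> list[str]:
    """Character n-grams of a word padded with '<' and '>' boundary markers."""
    padded = f"<{word.lower()}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train_profiles(samples: dict[str, list[str]], n: int = 3) -> dict[str, Counter]:
    """Build one n-gram frequency profile per language from its sample words."""
    return {lang: Counter(g for w in words for g in char_ngrams(w, n)) for lang, words in samples.items()}


def identify(word: str, profiles: dict[str, Counter], n: int = 3) -> str:
    """Pick the language whose add-one-smoothed n-gram model scores the word highest."""
    grams = char_ngrams(word, n)
    vocab_size = len(set().union(*profiles.values())) + 1  # +1 slot for unseen n-grams

    def log_prob(profile: Counter) -> float:
        total = sum(profile.values())
        return sum(math.log((profile[g] + 1) / (total + vocab_size)) for g in grams)

    return max(profiles, key=lambda lang: log_prob(profiles[lang]))


# Hypothetical training samples for English and Romanized Hindi.
SAMPLES = {
    "english": ["the", "movie", "was", "really", "good", "this", "phone", "is", "nice", "very", "what", "when"],
    "hindi_roman": ["yeh", "bahut", "accha", "tha", "hai", "nahi", "kya", "bhai", "mujhe", "pasand", "kuch", "aur"],
}
profiles = train_profiles(SAMPLES)
print([identify(w, profiles) for w in "movies bahut nahin".split()])
```

Words tagged as Romanized Hindi would then be the ones passed on to transliteration, as in the pipeline described above.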