An Enhancement of Malay Social Media Text Normalization for Lexicon-Based Sentiment Analysis

Author(s):  
Muhammad Fakhrur Razi Abu Bakar ◽  
Norisma Idris ◽  
Liyana Shuib

In this digitized world, the Internet has become a prominent source of many kinds of information. Today, people prefer virtual reality to one-to-one communication. The majority of the population uses social networking sites to express themselves through posts, blogs, comments, likes, and dislikes. Their sentiments can be traced using opinion mining, also called sentiment analysis. Sentiment analysis of social media text is a useful technique for identifying people's positive, negative, or neutral emotions, sentiments, and opinions, and it has gained special attention from researchers over the last few years. Traditionally, machine learning algorithms such as naive Bayes and Support Vector Machine were used to implement it. To overcome the drawbacks of ML in terms of complex classification algorithms, deep learning-based algorithms such as CNN, RNN, and HNN have been introduced. In this paper, we study different deep learning algorithms and aim to propose a deep learning-based model that analyzes the behavior of an individual using social media text. The results given by the proposed model can be utilized in a range of fields such as business, education, industry, politics, psychology, and security.


2020 ◽  
Vol 13 (4) ◽  
pp. 407-435
Author(s):  
Jagroop Kaur ◽  
Jaswinder Singh

Purpose: Normalization is an important step in all natural language processing applications that handle social media text. Text from social media poses problems that are not present in regular text. Recently, a considerable amount of work has been done in this direction, but mostly for the English language. People who do not speak English code-mix their text with their native language and post it on social media using the Roman script. This kind of text further aggravates the normalization problem. This paper discusses the concept of normalization with respect to code-mixed social media text and proposes a model to normalize such text.

Design/methodology/approach: The system is divided into two phases: candidate generation and most probable sentence selection. Candidate generation is treated as a machine translation task in which the Roman text is the source language and the Gurmukhi text is the target language. A character-based translation system is proposed to generate candidate tokens. Once candidates are generated, the second phase uses the beam search method to select the most probable sentence based on a hidden Markov model.

Findings: Character error rate (CER) and bilingual evaluation understudy (BLEU) score are reported. The proposed system has been compared with Akhar software and the RB_R2G system, which are also capable of transliterating Roman text to Gurmukhi. The proposed system outperforms Akhar software. The CER and BLEU scores are 0.268121 and 0.6807939, respectively, for ill-formed text.

Research limitations/implications: It was observed that the system produces dialectal variations of a word, or the word with minor errors such as a missing diacritic. A spell checker can improve the output of the system by correcting these minor errors. Extensive experimentation is needed to optimize the language identifier, which will further help in improving the output. The language model also needs further exploration. Inclusion of wider context, particularly from social media text, is an important area that deserves further investigation.

Practical implications: The practical implications of this study are: (1) development of a parallel dataset containing Roman and Gurmukhi text; (2) development of a dataset annotated with language tags; (3) development of the normalizing system, which is the first of its kind and proposes a translation-based solution for normalizing noisy social media text from Roman to Gurmukhi, and which can be extended to any pair of scripts; (4) the proposed system can be used for better analysis of social media text. Theoretically, the study helps in better understanding text normalization in the social media context and opens the door to further research in multilingual social media text normalization.

Originality/value: Existing research work focuses on normalizing monolingual text. This study contributes towards the development of a normalization system for multilingual text.
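The second phase and the CER metric described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the candidate tokens, their scores, and the bigram table are hypothetical English placeholders standing in for Gurmukhi candidates, the function names `beam_search` and `char_error_rate` are invented for illustration, and the scoring simply adds a per-candidate log-probability to a bigram transition log-probability in the spirit of a hidden Markov model. CER is assumed here to be the Levenshtein distance divided by the reference length.

```python
def char_error_rate(reference: str, hypothesis: str) -> float:
    """Levenshtein distance between the strings, divided by the reference length."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_ch in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_ch in enumerate(hypothesis, start=1):
            cost = 0 if ref_ch == hyp_ch else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / max(len(reference), 1)


def beam_search(candidates, bigram_logprob, beam_width=3):
    """Pick the most probable token sequence.

    candidates: one list per input position, each holding
                (candidate_token, candidate_logprob) pairs.
    bigram_logprob: function (previous_token, token) -> log-probability.
    """
    beams = [(0.0, ["<s>"])]  # (accumulated log-probability, tokens so far)
    for options in candidates:
        expanded = []
        for score, seq in beams:
            for token, cand_lp in options:
                trans_lp = bigram_logprob(seq[-1], token)
                expanded.append((score + cand_lp + trans_lp, seq + [token]))
        # Keep only the highest-scoring partial sentences.
        expanded.sort(key=lambda beam: beam[0], reverse=True)
        beams = expanded[:beam_width]
    best_score, best_seq = beams[0]
    return best_seq[1:], best_score  # drop the start symbol


# Hypothetical toy data: two candidates per noisy input token.
candidates = [
    [("see", -0.2), ("sea", -0.1)],
    [("you", -0.3), ("ewe", -0.2)],
    [("later", -0.1), ("latter", -0.1)],
]
# Hypothetical bigram language model; unseen pairs get a low default score.
BIGRAMS = {("<s>", "see"): -0.5, ("see", "you"): -0.3, ("you", "later"): -0.4}


def bigram_logprob(prev_token, token):
    return BIGRAMS.get((prev_token, token), -5.0)


if __name__ == "__main__":
    tokens, score = beam_search(candidates, bigram_logprob, beam_width=2)
    print(tokens, score)
    print(char_error_rate("kitten", "sitting"))
```

In this toy run the language model outweighs the slightly higher individual score of "sea", so the context-consistent sequence is kept. A wider beam trades speed for a lower chance of pruning the best sentence early.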


2021 ◽  
Vol 2070 (1) ◽  
pp. 012079
Author(s):  
V Jagadishwari ◽  
A Indulekha ◽  
Kiran Raghu ◽  
P Harshini

Abstract: Social media has recently become an arena for people to share their perspectives on a variety of topics, and most social interactions now take place through it. Although online social networks allow users to express their views and opinions in many forms, such as audio, video, and text, the most popular forms of expression are text, emoticons, and emojis. The work presented in this paper aims at detecting the sentiments expressed in social media posts. Four machine learning models, namely Bernoulli Bayes, Multinomial Bayes, Regression, and SVM, were implemented. All of these models were trained and tested with Twitter data sets. Users on Twitter express their opinions in the form of tweets with a limited number of characters. Tweets also contain emoticons and emojis, so Twitter data sets are well suited for sentiment analysis. The effect of emoticons present in a tweet is also analyzed: the models are first trained with the text only and then with both the text and the emoticons in the tweet. The performance of all four models in both cases is tested, and the results are presented in the paper.
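The text-only versus text-plus-emoticon comparison described above can be illustrated with a small Python sketch. It is only a toy under stated assumptions, not the paper's pipeline: the four training tweets are invented, the `tokenize` helper and its emoticon pattern are hypothetical, and the classifier is a hand-written multinomial naive Bayes with add-one smoothing rather than any particular library implementation.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical pattern covering a few common ASCII emoticons such as :) :( ;-) :D
EMOTICON = re.compile(r"[:;]-?[)(DP]")
WORD = re.compile(r"[a-z']+")


def tokenize(text, keep_emoticons=True):
    """Split a tweet into lowercase words, optionally keeping emoticons as tokens."""
    emoticons = EMOTICON.findall(text)
    words = WORD.findall(EMOTICON.sub(" ", text).lower())
    return words + emoticons if keep_emoticons else words


class MultinomialNB:
    """Minimal multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.token_counts[label].update(tokens)
        self.vocab = {t for counts in self.token_counts.values() for t in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # class prior
            for token in tokens:
                if token in self.vocab:  # ignore tokens never seen in training
                    score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy training data, for illustration only.
TRAIN = [
    ("great game today :)", "pos"),
    ("love this phone :D", "pos"),
    ("worst service ever :(", "neg"),
    ("hate the delay :(", "neg"),
]


def train(keep_emoticons):
    docs = [tokenize(text, keep_emoticons) for text, _ in TRAIN]
    labels = [label for _, label in TRAIN]
    return MultinomialNB().fit(docs, labels)


if __name__ == "__main__":
    tweet = "the game today :("
    for keep in (False, True):
        model = train(keep)
        print(keep, model.predict(tokenize(tweet, keep)))
```

On this toy data the prediction for the sample tweet changes once the emoticon is kept as a token, which shows the kind of effect the two training conditions are meant to measure. It says nothing about the size of that effect on real Twitter data.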

