Identifying vulgarity in Bengali social media textual content

PeerJ Computer Science ◽

10.7717/peerj-cs.665 ◽

2021 ◽

Vol 7 ◽

pp. e665

Author(s):

Salim Sazzed

Keyword(s):

Social Media ◽

Gradient Descent ◽

Short Term Memory ◽

Stochastic Gradient Descent ◽

Media Content ◽

Short Term ◽

Long Short Term Memory ◽

Highly Correlated ◽

Negative Sentiment ◽

Textual Content

The presence of abusive and vulgar language in social media has become an issue of increasing concern in recent years. However, the prevalence and identification of vulgar language remain largely unexplored in low-resource languages such as Bengali. In this paper, we provide the first comprehensive analysis of the presence of vulgarity in Bengali social media content. We develop two benchmark corpora consisting of 7,245 reviews collected from YouTube and manually annotate them into vulgar and non-vulgar categories. The manual annotation reveals the ubiquity of vulgar and swear words in Bengali social media content, ranging from 20% to 34% in the two corpora. To automatically identify vulgarity, we employ various approaches, such as classical machine learning (CML) classifiers, a Stochastic Gradient Descent (SGD) optimizer, a deep learning (DL) based architecture, and lexicon-based methods. We find that the swear/vulgar lexicon, although small, is effective at identifying vulgar language because some swear terms occur very frequently in Bengali social media. We observe that the performance of machine learning (ML) classifiers is affected by the class distribution of the dataset. The DL-based BiLSTM (Bidirectional Long Short-Term Memory) model yields the highest recall for identifying vulgarity in both datasets, in both the original and class-balanced settings. In addition, the analysis reveals that vulgarity is highly correlated with negative sentiment in social media comments.
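
A lexicon-based vulgarity detector of the kind the abstract describes can be sketched as a simple membership test, with recall measured on the vulgar class. This is a minimal illustration, not the paper's implementation: the lexicon entries (`swear1`, `swear2`, `swear3`), the helper names `is_vulgar` and `recall`, the whitespace tokenization, and the toy comments are all hypothetical assumptions.

```python
from typing import Iterable, List, Set

# Hypothetical placeholder lexicon: a real system would load a curated
# Bengali swear/vulgar word list, which is not reproduced here.
SWEAR_LEXICON: Set[str] = {"swear1", "swear2", "swear3"}


def is_vulgar(comment: str, lexicon: Set[str] = SWEAR_LEXICON) -> bool:
    """Flag a comment as vulgar if any whitespace-separated token is in the lexicon."""
    # Assumption: plain whitespace tokenization with no punctuation stripping.
    tokens = comment.lower().split()
    return any(token in lexicon for token in tokens)


def recall(y_true: Iterable[bool], y_pred: Iterable[bool]) -> float:
    """Recall for the vulgar (positive) class: TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    return tp / (tp + fn) if tp + fn else 0.0


# Toy comments and gold labels, made up for illustration.
comments: List[str] = ["this is fine", "you swear1 idiot", "swear2 swear3", "an out-of-lexicon insult"]
gold = [False, True, True, True]
preds = [is_vulgar(c) for c in comments]
print(preds)  # [False, True, True, False]
print(recall(gold, preds))
```

The last toy comment is gold-labelled vulgar but missed, because none of its tokens is in the lexicon; a lexicon approach is effective only to the extent that a few frequent swear terms cover most vulgar comments.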

Credibility of Social-Media Content Using Bidirectional Long Short-Term Memory-Recurrent Neural Networks

10.1109/icetci51973.2021.9574061 ◽

2021 ◽

Author(s):

Sai Parichit Akula ◽

Nagendra Kamati

Keyword(s):

Neural Networks ◽

Social Media ◽

Recurrent Neural Networks ◽

Short Term Memory ◽

Media Content ◽

Short Term ◽

Term Memory ◽

Long Short Term Memory

Fake news in social media recognition using Modified Long Short-Term Memory network

Security in IoT Social Networks ◽

10.1016/b978-0-12-821599-9.00009-1 ◽

2021 ◽

pp. 205-227

Author(s):

Sivaranjani Reddi ◽

G.V. Eswar

Keyword(s):

Social Media ◽

Short Term Memory ◽

Fake News ◽

Short Term ◽

Term Memory ◽

Memory Network ◽

Long Short Term Memory

Detecting Fake News Over Job Posts via Bi-Directional Long Short-Term Memory (BIDLSTM)

International Journal of Web-Based Learning and Teaching Technologies ◽

10.4018/ijwltt.287096 ◽

2021 ◽

Vol 16 (6) ◽

pp. 1-18

Author(s):

T. V. Divya ◽

Barnali Gupta Banik

Keyword(s):

Social Media ◽

Performance Metrics ◽

Short Term Memory ◽

Word Embedding ◽

Support Vector ◽

Fake News ◽

Short Term ◽

Term Memory ◽

Online Social Media ◽

Long Short Term Memory

Fake news detection on job advertisements has attracted the attention of many researchers over the past decade. Classifiers such as Support Vector Machine (SVM), XGBoost, and Random Forest (RF) are widely used to distinguish fake from real job advertisement posts on social media. The Bi-Directional Long Short-Term Memory (Bi-LSTM) classifier is used together with a word embedding algorithm that learns word representations in a lower-dimensional vector space and reveals significant words or terms. The Bi-LSTM classifier detects fake and real news in job posts from online social media, and its performance is then evaluated. Performance metrics such as precision, recall, F1-score, and accuracy are used to assess its effectiveness at detecting fraudulent job posts. The outcome demonstrates the effectiveness and importance of the features for detecting false news.
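
The four metrics named in the abstract can be computed from binary predictions as follows. This is a generic sketch, not the authors' evaluation code; the function name `binary_metrics`, the label convention (1 = fake, 0 = real), and the toy labels are assumptions for illustration.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, F1 and accuracy for binary labels (assumed: 1 = fake, 0 = real)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Confusion-matrix counts with "fake" as the positive class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true) if y_true else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Toy labels, made up for illustration (not results from the paper).
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(binary_metrics(y_true, y_pred))
```

Precision and recall are reported for the positive class only, so on an imbalanced dataset of job posts they are more informative than accuracy alone.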

Comparative Analysis of Deep Learning Techniques for the Classification of Hate Speech

NIGERIAN ANNALS OF PURE AND APPLIED SCIENCES ◽

10.46912/napas.227 ◽

2021 ◽

Vol 4 (1) ◽

pp. 121-128

Author(s):

A Iorliam ◽

S Agber ◽

MP Dzungwe ◽

DK Kwaghtyo ◽

S Bum

Keyword(s):

Neural Network ◽

Social Media ◽

Deep Learning ◽

Hate Speech ◽

Short Term Memory ◽

Short Term ◽

Term Memory ◽

Learning Techniques ◽

Or Groups ◽

Long Short Term Memory

Social media provides opportunities for individuals to communicate anonymously and express hateful feelings and opinions from the comfort of their rooms. This anonymity has become a shield for many individuals or groups who use social media to express deep hatred for other individuals or groups, tribes, races, religions, genders, and belief systems. In this study, a comparative analysis of Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) deep learning techniques is performed for hate speech classification. The LSTM classifier achieved an accuracy of 92.47%, while the CNN classifier achieved an accuracy of 92.74%. These results show that deep learning techniques can effectively distinguish hate speech from normal speech.

Chrome Extension For Malicious URLs detection in Social Media Applications Using Artificial Neural Networks And Long Short Term Memory Networks

2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI) ◽

10.1109/icacci.2018.8554647 ◽

2018 ◽

Cited By ~ 2

Author(s):

S. Shivangi ◽

Pratyush Debnath ◽

K. Sajeevan ◽

D. Annapurna

Keyword(s):

Neural Networks ◽

Social Media ◽

Artificial Neural Networks ◽

Short Term Memory ◽

Short Term ◽

Term Memory ◽

Long Short Term Memory ◽

Artificial Neural ◽

Media Applications

Recognizing Continuous and Discontinuous Adverse Drug Reaction Mentions from Social Media Using LSTM-CRF

Wireless Communications and Mobile Computing ◽

10.1155/2018/2379208 ◽

2018 ◽

Vol 2018 ◽

pp. 1-8 ◽

Cited By ~ 1

Author(s):

Buzhou Tang ◽

Jianglu Hu ◽

Xiaolong Wang ◽

Qingcai Chen

Keyword(s):

Neural Networks ◽

Social Media ◽

Medical Information ◽

Conditional Random Fields ◽

Short Term Memory ◽

Knowledge Bases ◽

Short Term ◽

External Knowledge ◽

Long Short Term Memory ◽

First Time

Social media in medicine, where patients can express their personal treatment experiences on personal computers and mobile devices, usually contains plenty of useful medical information, such as adverse drug reactions (ADRs); mining this information from social media has attracted increasing attention from researchers. In this study, we propose a deep neural network (called LSTM-CRF) that combines long short-term memory (LSTM) neural networks (a type of recurrent neural network) with conditional random fields (CRFs) to recognize ADR mentions in medical social media, and we investigate the effects of three factors on ADR mention recognition: (1) the representation of continuous and discontinuous ADR mentions, for which two novel representations, "BIOHD" and "Multilabel," are compared; (2) the subject of posts, since each post has a subject (here, a drug); and (3) external knowledge bases. Experiments conducted on a benchmark corpus, CADEC, show that LSTM-CRF achieves a better F-score than CRF; that "Multilabel" represents continuous and discontinuous ADR mentions better than "BIOHD"; and that both the subjects of comments and external knowledge bases are individually beneficial to ADR mention recognition. To the best of our knowledge, this is the first study to investigate deep neural networks for mining continuous and discontinuous ADRs from social media.
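
For context on the representation question, the plain BIO scheme that sequence taggers such as LSTM-CRF commonly predict can be sketched as below. This shows only the standard BIO baseline for continuous spans; it is not the paper's "BIOHD" or "Multilabel" representation, both of which extend tagging to handle discontinuous mentions and are not reproduced here. The function name `bio_encode`, the `ADR` tag suffix, and the example sentence are hypothetical.

```python
from typing import List, Tuple


def bio_encode(tokens: List[str], spans: List[Tuple[int, int]]) -> List[str]:
    """Tag continuous mentions given as (start, end) token spans, end exclusive, with BIO labels."""
    tags = ["O"] * len(tokens)  # "O" = outside any mention
    for start, end in spans:
        if not 0 <= start < end <= len(tokens):
            raise ValueError(f"invalid span {(start, end)}")
        tags[start] = "B-ADR"  # first token of the mention
        for i in range(start + 1, end):
            tags[i] = "I-ADR"  # remaining tokens inside the mention
    return tags


tokens = ["I", "had", "severe", "muscle", "pain", "after", "the", "drug"]
print(bio_encode(tokens, [(2, 5)]))
# ['O', 'O', 'B-ADR', 'I-ADR', 'I-ADR', 'O', 'O', 'O']
```

Because each token receives exactly one label and a mention must be a contiguous run of `B`/`I` tags, plain BIO cannot express a mention whose tokens are separated by unrelated words, which is the gap the representations compared in the paper address.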

Malicious Text Identification: Deep Learning from Public Comments and Emails

Information ◽

10.3390/info11060312 ◽

2020 ◽

Vol 11 (6) ◽

pp. 312 ◽

Cited By ~ 1

Author(s):

Asma Baccouche ◽

Sadaf Ahmed ◽

Daniel Sierra-Sosa ◽

Adel Elmaghraby

Keyword(s):

Social Media ◽

Deep Learning ◽

Language Processing ◽

Short Term Memory ◽

Good Alternative ◽

Classification Problems ◽

Short Term ◽

Independent Dataset ◽

Proposed Model ◽

Long Short Term Memory

Identifying internet spam has been a challenging problem for decades. Several solutions have succeeded in detecting spam comments in social media or fraudulent emails. However, an adequate filtering strategy is difficult to achieve because these messages resemble real communications. From the Natural Language Processing (NLP) perspective, Deep Learning models are a good alternative for classifying preprocessed text. In particular, Long Short-Term Memory (LSTM) networks perform well on binary and multi-label text classification problems. In this paper, we present an approach that merges two different data sources, one intended for spam in social media posts and the other for fraud classification in emails. We designed a multi-label LSTM model and trained it on the joint datasets, including text with common bigrams extracted from each independent dataset. The experimental results show that our proposed model can identify malicious text regardless of the source. The LSTM model trained on the merged dataset outperforms the models trained independently on each dataset.
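
The abstract does not spell out how "text with common bigrams" is selected when merging the two sources. One possible interpretation, sketched here as an assumption rather than the authors' procedure, is to collect the word bigrams of each corpus, intersect the two sets, and keep texts from either corpus that contain at least one shared bigram. The helper names, whitespace tokenization, and toy texts are hypothetical, and labels are omitted for brevity.

```python
from typing import Iterable, List, Set, Tuple

Bigram = Tuple[str, str]


def bigrams(text: str) -> Set[Bigram]:
    """Lower-cased word bigrams of a text (assumed: whitespace tokenization)."""
    tokens = text.lower().split()
    return set(zip(tokens, tokens[1:]))


def corpus_bigrams(texts: Iterable[str]) -> Set[Bigram]:
    """Union of the bigrams of every text in a corpus."""
    result: Set[Bigram] = set()
    for text in texts:
        result |= bigrams(text)
    return result


def filter_by_common_bigrams(spam: List[str], fraud: List[str]) -> List[str]:
    """Keep texts from either corpus that contain a bigram occurring in both corpora."""
    common = corpus_bigrams(spam) & corpus_bigrams(fraud)
    return [text for text in spam + fraud if bigrams(text) & common]


# Toy texts, made up for illustration.
spam = ["click here to win", "free money now"]
fraud = ["click here to verify your account", "dear friend I need help"]
print(filter_by_common_bigrams(spam, fraud))
# ['click here to win', 'click here to verify your account']
```

Under this reading, the shared bigrams ("click here", "here to") act as a bridge between the two domains, so the merged training set emphasizes phrasing that spam posts and fraudulent emails have in common.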

Spam detection in social media using convolutional and long short term memory neural network

Annals of Mathematics and Artificial Intelligence ◽

10.1007/s10472-018-9612-z ◽

2019 ◽

Vol 85 (1) ◽

pp. 21-44 ◽

Cited By ~ 26

Author(s):

Gauri Jain ◽

Manisha Sharma ◽

Basant Agarwal

Keyword(s):

Neural Network ◽

Social Media ◽

Short Term Memory ◽

Spam Detection ◽

Short Term ◽

Term Memory ◽

Long Short Term Memory

Depression Analysis from Social Media Data in Bangla Language using Long Short Term Memory (LSTM) Recurrent Neural Network Technique

2019 International Conference on Computer, Communication, Chemical, Materials and Electronic Engineering (IC4ME2) ◽

10.1109/ic4me247184.2019.9036528 ◽

2019 ◽

Cited By ~ 2

Author(s):

Abdul Hasib Uddin ◽

Durjoy Bapery ◽

Abu Shamim Mohammad Arif

Keyword(s):

Neural Network ◽

Social Media ◽

Recurrent Neural Network ◽

Short Term Memory ◽

Short Term ◽

Social Media Data ◽

Term Memory ◽

Long Short Term Memory ◽

Media Data

ClickbaitTR: Dataset for clickbait detection from Turkish news sites and social media with a comparative analysis via machine learning algorithms

Journal of Information Science ◽

10.1177/01655515211007746 ◽

2021 ◽

pp. 016555152110077

Author(s):

Şura Genç ◽

Elif Surer

Keyword(s):

Machine Learning ◽

Social Media ◽

Logistic Regression ◽

Random Forest ◽

Short Term Memory ◽

Ensemble Classifier ◽

Machine Learning Algorithms ◽

Short Term ◽

Term Memory ◽

Long Short Term Memory

Clickbait is a strategy that aims to attract people’s attention and direct them to specific content. Clickbait titles, which use information not included in the main content or intriguing expressions with various text-related features, have become very popular, especially on social media. This study expands the Turkish clickbait dataset that we constructed for clickbait detection in our proof-of-concept study, written in Turkish. We reach a sample size of 48,060 by adding 8,859 tweets and release a publicly available dataset, ClickbaitTR, together with its open-source data analysis library. We apply machine learning algorithms such as Artificial Neural Network (ANN), Logistic Regression, Random Forest, Long Short-Term Memory Network (LSTM), Bidirectional Long Short-Term Memory (BiLSTM), and an Ensemble Classifier to 48,060 news headlines extracted from Twitter. The results show that Logistic Regression achieves 85% accuracy, Random Forest 86%, LSTM 93%, ANN 93%, the Ensemble Classifier 93%, and BiLSTM 97%. A thorough discussion is provided of the psychological aspects of the clickbait strategy, focusing on curiosity and interest arousal. In addition to successful clickbait detection performance and a detailed analysis of clickbait sentences in terms of language and psychological aspects, this study contributes the largest clickbait dataset in Turkish to clickbait detection research.