Hatred and trolling detection transliteration framework using hierarchical LSTM in code-mixed social media text

AbstractThe paper describes the usage of self-learning Hierarchical LSTM technique for classifying hatred and trolling contents in social media code-mixed data. The Hierarchical LSTM-based learning is a novel learning architecture inspired from the neural learning models. The proposed HLSTM model is trained to identify the hatred and trolling words available in social media contents. The proposed HLSTM systems model is equipped with self-learning and predicting mechanism for annotating hatred words in transliteration domain. The Hindi–English data are ordered into Hindi, English, and hatred labels for classification. The mechanism of word embedding and character-embedding features are used here for word representation in the sentence to detect hatred words. The method developed based on HLSTM model helps in recognizing the hatred word context by mining the intention of the user for using that word in the sentence. Wide experiments suggests that the HLSTM-based classification model gives the accuracy of 97.49% when evaluated against the standard parameters like BLSTM, CRF, LR, SVM, Random Forest and Decision Tree models especially when there are some hatred and trolling words in the social media data.

Download Full-text

Urban Region Function Mining Service Based on Social Media Text Analysis

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194021400088 ◽

2021 ◽

Vol 31 (04) ◽

pp. 563-586

Author(s):

Yanchun Sun ◽

Hang Yin ◽

Jiu Wen ◽

Zhiyu Sun

Keyword(s):

Social Media ◽

Text Analysis ◽

Human Activities ◽

Web Application ◽

Classification Model ◽

Urban Region ◽

Urban Regions ◽

Social Media Text ◽

Urban Function ◽

High Level

Urban region functions are the types of potential activities in an urban region, such as residence, commerce, transportation, entertainment, etc. A service which mines urban region functions is of great value for various applications, including urban planning and transportation management, etc. Many studies have been carried out to dig out different regions’ functions, but few studies are based on social media text analysis. Considering that the semantic information embedded in social media texts is very useful to infer an urban region’s main functions, we design a service which extracts human activities using Sina Weibo ( www.weibo.com ; the largest microblog system in Chinese, similar to Twitter) with location information and further describes a region’s main functions with a function vector based on the human activities. First, we predefine a variety of human activities to get the related activities corresponding to each Weibo post using an urban function classification model. Second, urban regions’ function vectors are generated, with which we can easily do some high-level work such as similar place recommendation. At last, with the function vectors generated, we develop a Web application for urban region function querying. We also conduct a case study among the urban regions in Beijing, and the experiment results demonstrate the feasibility of our method.

Download Full-text

Deep Learning Based Sentiment Analysis in a Code-Mixed English-Hindi and English-Bengali Social Media Corpus

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213020500141 ◽

2020 ◽

Vol 29 (05) ◽

pp. 2050014

Author(s):

Anupam Jamatia ◽

Steve Durairaj Swamy ◽

Björn Gambäck ◽

Amitava Das ◽

Swapan Debbarma

Keyword(s):

Social Media ◽

Deep Learning ◽

Sentiment Analysis ◽

Language Processing ◽

Mixed Data ◽

Code Mixing ◽

Language Representation ◽

The Social ◽

Social Media Text ◽

Traditional Approaches

Sentiment analysis is a circumstantial analysis of text, identifying the social sentiment to better understand the source material. The article addresses sentiment analysis of an English-Hindi and English-Bengali code-mixed textual corpus collected from social media. Code-mixing is an amalgamation of multiple languages, which previously mainly was associated with spoken language. However, social media users also deploy it to communicate in ways that tend to be somewhat casual. The coarse nature of social media text poses challenges for many language processing applications. Here, the focus is on the low predictive nature of traditional machine learners when compared to Deep Learning counterparts, including the contextual language representation model BERT (Bidirectional Encoder Representations from Transformers), on the task of extracting user sentiment from code-mixed texts. Three deep learners (a BiLSTM CNN, a Double BiLSTM and an Attention-based model) attained accuracy 20–60% greater than traditional approaches on code-mixed data, and were for comparison also tested on monolingual English data.

Download Full-text

Improving Chinese Word Representation in Social Media Text Using Four Corners Features

IEEE Transactions on Big Data ◽

10.1109/tbdata.2021.3106582 ◽

2021 ◽

pp. 1-1

Author(s):

Hai Jin ◽

Zhaobo Zhang ◽

Pingpeng Yuan

Keyword(s):

Social Media ◽

Chinese Word ◽

Four Corners ◽

Word Representation ◽

Social Media Text

Download Full-text

Comparison of SVM, RF and SGD Methods for Determination of Programmer's Performance Classification Model in Social Media Activities

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v4i2.1770 ◽

2020 ◽

Vol 4 (2) ◽

pp. 329-335

Author(s):

Rusydi Umar ◽

Imam Riadi ◽

Purwono

Keyword(s):

Social Media ◽

Gradient Descent ◽

Classification Model ◽

Stochastic Gradient Descent ◽

Support Vector ◽

Svm Algorithm ◽

Vector Machines ◽

Performance Patterns ◽

A Company

The failure of most startups in Indonesia is caused by team performance that is not solid and competent. Programmers are an integral profession in a startup team. The development of social media can be used as a strategic tool for recruiting the best programmer candidates in a company. This strategic tool is in the form of an automatic classification system of social media posting from prospective programmers. The classification results are expected to be able to predict the performance patterns of each candidate with a predicate of good or bad performance. The classification method with the best accuracy needs to be chosen in order to get an effective strategic tool so that a comparison of several methods is needed. This study compares classification methods including the Support Vector Machines (SVM) algorithm, Random Forest (RF) and Stochastic Gradient Descent (SGD). The classification results show the percentage of accuracy with k = 10 cross validation for the SVM algorithm reaches 81.3%, RF at 74.4%, and SGD at 80.1% so that the SVM method is chosen as a model of programmer performance classification on social media activities.

Download Full-text

Modeling Factuality Judgments in Social Media Text

10.3115/v1/p14-2068 ◽

2014 ◽

Cited By ~ 9

Author(s):

Sandeep Soni ◽

Tanushree Mitra ◽

Eric Gilbert ◽

Jacob Eisenstein

Keyword(s):

Social Media ◽

Social Media Text

Download Full-text

Urban Region Function Mining Service Based on Social Media Text Analysis

2020 International Conference on Service Science (ICSS) ◽

10.1109/icss50103.2020.00034 ◽

2020 ◽

Author(s):

Yanchun Sun ◽

Hang Yin ◽

Jiu Wen ◽

Zhiyu Sun

Keyword(s):

Social Media ◽

Text Analysis ◽

Urban Region ◽

Social Media Text

Download Full-text

Bi-LSTM and Ensemble based Bilingual Sentiment Analysis for a Code-mixed Hindi-English Social Media Text

2020 IEEE 17th India Council International Conference (INDICON) ◽

10.1109/indicon49873.2020.9342241 ◽

2020 ◽

Author(s):

Konark Yadav ◽

Aashish Lamba ◽

Dhruv Gupta ◽

Ansh Gupta ◽

Purnendu Karmakar ◽

...

Keyword(s):

Social Media ◽

Sentiment Analysis ◽

Social Media Text

Download Full-text

Anaphora Resolution from Social Media Text in Indian Languages (SocAnaRes-IL)- Overview

Forum for Information Retrieval Evaluation ◽

10.1145/3441501.3441512 ◽

2020 ◽

Author(s):

Sobha Lalitha Devi

Keyword(s):

Social Media ◽

Anaphora Resolution ◽

Indian Languages ◽

Social Media Text

Download Full-text

Intelligent Detection of False Information in Arabic Tweets Utilizing Hybrid Harris Hawks Based Feature Selection and Machine Learning Models

Symmetry ◽

10.3390/sym13040556 ◽

2021 ◽

Vol 13 (4) ◽

pp. 556

Author(s):

Thaer Thaher ◽

Mahmoud Saheb ◽

Hamza Turabieh ◽

Hamouda Chantar

Keyword(s):

Machine Learning ◽

Social Media ◽

Feature Selection ◽

Language Processing ◽

User Profile ◽

Vital Role ◽

Classification Model ◽

Fake News ◽

False Information ◽

Social Media Platforms

Fake or false information on social media platforms is a significant challenge that leads to deliberately misleading users due to the inclusion of rumors, propaganda, or deceptive information about a person, organization, or service. Twitter is one of the most widely used social media platforms, especially in the Arab region, where the number of users is steadily increasing, accompanied by an increase in the rate of fake news. This drew the attention of researchers to provide a safe online environment free of misleading information. This paper aims to propose a smart classification model for the early detection of fake news in Arabic tweets utilizing Natural Language Processing (NLP) techniques, Machine Learning (ML) models, and Harris Hawks Optimizer (HHO) as a wrapper-based feature selection approach. Arabic Twitter corpus composed of 1862 previously annotated tweets was utilized by this research to assess the efficiency of the proposed model. The Bag of Words (BoW) model is utilized using different term-weighting schemes for feature extraction. Eight well-known learning algorithms are investigated with varying combinations of features, including user-profile, content-based, and words-features. Reported results showed that the Logistic Regression (LR) with Term Frequency-Inverse Document Frequency (TF-IDF) model scores the best rank. Moreover, feature selection based on the binary HHO algorithm plays a vital role in reducing dimensionality, thereby enhancing the learning model’s performance for fake news detection. Interestingly, the proposed BHHO-LR model can yield a better enhancement of 5% compared with previous works on the same dataset.

Download Full-text

Perceiving Residents’ Festival Activities Based on Social Media Data: A Case Study in Beijing, China

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10070474 ◽

2021 ◽

Vol 10 (7) ◽

pp. 474

Author(s):

Bingqing Wang ◽

Bin Meng ◽

Juan Wang ◽

Siyu Chen ◽

Jian Liu

Keyword(s):

Social Media ◽

Language Processing ◽

Topic Model ◽

Central Area ◽

Classification Model ◽

Social Media Data ◽

Ring Road ◽

Different Types ◽

Spatial Differences ◽

Media Data

Social media data contains real-time expressed information, including text and geographical location. As a new data source for crowd behavior research in the era of big data, it can reflect some aspects of the behavior of residents. In this study, a text classification model based on the BERT and Transformers framework was constructed, which was used to classify and extract more than 210,000 residents’ festival activities based on the 1.13 million Sina Weibo (Chinese “Twitter”) data collected from Beijing in 2019 data. On this basis, word frequency statistics, part-of-speech analysis, topic model, sentiment analysis and other methods were used to perceive different types of festival activities and quantitatively analyze the spatial differences of different types of festivals. The results show that traditional culture significantly influences residents’ festivals, reflecting residents’ motivation to participate in festivals and how residents participate in festivals and express their emotions. There are apparent spatial differences among residents in participating in festival activities. The main festival activities are distributed in the central area within the Fifth Ring Road in Beijing. In contrast, expressing feelings during the festival is mainly distributed outside the Fifth Ring Road in Beijing. The research integrates natural language processing technology, topic model analysis, spatial statistical analysis, and other technologies. It can also broaden the application field of social media data, especially text data, which provides a new research paradigm for studying residents’ festival activities and adds residents’ perception of the festival. The research results provide a basis for the design and management of the Chinese festival system.

Download Full-text