Cyberbullying detection: advanced preprocessing techniques & deep learning architecture for Roman Urdu data

2021, Vol 8 (1)
Author(s): Amirita Dewani, Mohsin Ali Memon, Sania Bhatti

Social media have become a highly viable medium for communication, collaboration, and the exchange of information, knowledge, and ideas. However, partly because of the anonymity these platforms preserve, incidents of hate speech and cyberbullying have spread across the globe. This daunting problem has recently attracted the attention of researchers and scholars worldwide, and studies have been undertaken to formulate strategies for the automatic detection of cyberaggression and hate speech, ranging from machine learning models with large feature sets to more complex deep neural network models, across different social network (SN) platforms. However, existing research is directed towards mature languages, leaving a large gap for newly adopted, resource-poor languages. One such language, recently adopted worldwide and particularly in South Asian countries for communication on social media, is Roman Urdu, i.e., the Urdu language written in the Roman script. To address this research gap, we performed extensive preprocessing on Roman Urdu microtext. This involves building a Roman Urdu slang-phrase dictionary and mapping slang terms through it after tokenization. We also eliminated cyberbullying-domain-specific stop words to reduce the dimensionality of the corpus. The unstructured data were further processed to handle encoded text formats and metadata/non-linguistic features. Furthermore, we performed extensive experiments with RNN-LSTM, RNN-BiLSTM, and CNN models, varying the number of epochs and model layers and tuning hyperparameters, to analyze and uncover cyberbullying textual patterns in Roman Urdu. The efficiency and performance of the models were evaluated using several metrics to present a comparative analysis. Results show that RNN-LSTM and RNN-BiLSTM performed best, achieving validation accuracies of 85.5% and 85%, respectively, and F1 scores of 0.70 and 0.67, respectively, on the aggression class.
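The preprocessing steps named in this abstract (stripping non-linguistic features, tokenizing, mapping slang through a dictionary, then removing domain stop words) can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: the slang entries and stop words below are hypothetical examples, and the regex choices for URLs, mentions, and hashtags are our own.

```python
import re

# HYPOTHETICAL illustrative entries: the paper's actual slang-phrase
# dictionary and domain-specific stop-word list are not reproduced here.
SLANG_MAP = {
    "h": "hai",
    "kia": "kya",
    "nai": "nahi",
}
STOP_WORDS = {"hai", "ka", "ki", "ke", "se"}

# Assumed patterns for non-linguistic features: URLs, @mentions, #hashtags.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")


def preprocess(text: str) -> list[str]:
    """Normalize one Roman Urdu message into a list of tokens."""
    text = text.lower()
    text = URL_RE.sub(" ", text)        # drop links first (they may contain '@')
    text = MENTION_RE.sub(" ", text)    # drop user handles
    text = HASHTAG_RE.sub(r"\1", text)  # keep the hashtag word, drop '#'
    # Keep only ASCII letters and whitespace; this also strips emoji,
    # digits, punctuation, and other non-Roman-script residue.
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = text.split()
    # Map slang/variant spellings to a canonical form after tokenization.
    tokens = [SLANG_MAP.get(tok, tok) for tok in tokens]
    # Remove stop words last, so mapped forms (e.g. "h" -> "hai") are caught.
    return [tok for tok in tokens if tok not in STOP_WORDS]


print(preprocess("Tum kia kar rahe ho?? @user http://x.co #bakwas"))
print(preprocess("Woh pagal h"))
```

Running slang mapping before stop-word removal matters: a variant such as "h" only becomes the stop word "hai" after normalization, so reversing the order would leave it in the vocabulary.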

2020, Vol 34 (05), pp. 9282-9289
Author(s): Qingyang Wu, Lei Li, Hao Zhou, Ying Zeng, Zhou Yu

Many social media news writers are not professionally trained, so social media platforms have to hire professional editors to adjust amateur headlines to attract more readers. We propose automating this headline editing process with neural network models to provide more immediate writing support for these writers. To train such a neural headline editing model, we collected a dataset containing articles with their original headlines and professionally edited headlines. However, collecting a large number of professionally edited headlines is expensive. To address this low-resource problem, we design an encoder-decoder model that leverages large-scale pre-trained language models. We further improve the pre-trained model's quality by introducing headline generation as an intermediate task before the headline editing task. We also propose a Self Importance-Aware (SIA) loss to address the different levels of editing in the dataset by down-weighting the importance of easily classified tokens and sentences. With the help of Pre-training, Adaptation, and SIA, the model learns to generate headlines in the professional editor's style. Experimental results show that our method significantly improves the quality of headline editing compared with previous methods.
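The abstract does not give the exact form of the SIA loss, only that it down-weights easily classified tokens. A well-known technique with that property is focal-loss-style weighting, sketched below over the probabilities a decoder assigns to the reference tokens. This is a swapped-in illustration of the general idea, not the SIA loss itself; the function name and the `gamma` default are assumptions.

```python
import math


def focal_weighted_nll(token_probs: list[float], gamma: float = 2.0) -> float:
    """Mean focal-style loss over the probabilities assigned to reference tokens.

    Each token's negative log-likelihood is scaled by (1 - p) ** gamma, so
    tokens the model already predicts confidently (p near 1) contribute
    little, while hard tokens dominate. With gamma = 0 this reduces to the
    ordinary mean negative log-likelihood.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    losses = []
    for p in token_probs:
        p = min(max(p, 1e-12), 1.0)  # guard against log(0)
        losses.append(-((1.0 - p) ** gamma) * math.log(p))
    return sum(losses) / len(losses)


# An "easy" token (p = 0.9) versus a "hard" token (p = 0.3).
print(focal_weighted_nll([0.9]), focal_weighted_nll([0.3]))
```

A sentence-level variant could apply the same weighting to per-sentence losses; how SIA combines the token and sentence levels is not specified in the abstract.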


2020, Vol 19 (02), pp. 447-468
Author(s): Oğuzhan Kivrak, Cüneyt Akar

The main goal of this study is to investigate whether social media, as a recent communication channel, has an impact on customer lifetime value (CLV). No study with a similar purpose has been conducted in the telecommunication sector in Turkey. To reach this goal, we attempted to develop both artificial neural network models and applicable sector-specific models. Four years of data (2011 to 2014) from telecommunication-sector customers who have a Twitter account are used in this study. CLV is modeled with radial basis function (RBF), multilayer perceptron (MLP), and Elman neural network approaches, and the performance of these models is compared. According to the findings, the calculated CLV error values are within an acceptable range for all models. Additionally, CLV was calculated with a lower error in the models that used social media variables. The Elman neural network performed better than the RBF and MLP models.
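An Elman network is a simple recurrent network whose context layer feeds the previous hidden state back into the hidden layer. The sketch below shows only its forward pass in plain Python, under assumptions not stated in the abstract: the input is a sequence of yearly feature vectors per customer, the CLV estimate is read linearly from the final hidden state, and the weights shown are arbitrary placeholders rather than trained values.

```python
import math

Matrix = list[list[float]]
Vector = list[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def elman_forward(xs: list[Vector], w_xh: Matrix, w_hh: Matrix, b_h: Vector,
                  w_hy: Matrix, b_y: Vector) -> Vector:
    """Run an Elman (simple recurrent) network over a sequence.

    The context units hold the previous hidden state h_{t-1}:
        h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)
    and a linear output is read from the final hidden state:
        y = W_hy h_T + b_y
    """
    h = [0.0] * len(b_h)  # context units start at zero
    for x in xs:
        pre = [a + b + c for a, b, c in zip(matvec(w_xh, x), matvec(w_hh, h), b_h)]
        h = [math.tanh(z) for z in pre]
    return [a + b for a, b in zip(matvec(w_hy, h), b_y)]


# HYPOTHETICAL example: four yearly feature vectors for one customer
# (e.g. a usage feature and a social-media activity feature), 2 hidden units,
# and one output (a CLV estimate). Weights are placeholders, not trained.
years = [[0.2, 0.1], [0.4, 0.3], [0.5, 0.2], [0.6, 0.4]]
clv = elman_forward(
    years,
    w_xh=[[0.5, -0.3], [0.8, 0.1]],
    w_hh=[[0.1, 0.0], [0.0, 0.2]],
    b_h=[0.0, 0.0],
    w_hy=[[1.2, -0.7]],
    b_y=[0.1],
)
print(clv)
```

The recurrent context connection is what distinguishes this model from the MLP, which sees each input without a memory of earlier time steps.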


2018, Vol 373 (1740), pp. 20170043
Author(s): Marco Zorzi, Alberto Testolin

The finding that human infants and many other animal species are sensitive to numerical quantity has been widely interpreted as evidence for evolved, biologically determined numerical capacities across unrelated species, thereby supporting a ‘nativist’ stance on the origin of number sense. Here, we tackle this issue within the ‘emergentist’ perspective provided by artificial neural network models, and we build on computer simulations to discuss two different ways of thinking about the innateness of number sense. The first, illustrated by artificial life simulations, shows that numerical abilities can be supported by domain-specific representations emerging from evolutionary pressure. The second assumes that numerical representations need not be genetically predetermined but can emerge from the interplay between innate architectural constraints and domain-general learning mechanisms, instantiated in deep learning simulations. We show that deep neural networks endowed with basic visuospatial processing perform remarkably well at numerosity discrimination before any experience-dependent learning, whereas unsupervised sensory experience with visual sets subsequently improves number acuity and reduces the influence of continuous visual cues. The emergent neuronal code for numbers in the model includes both numerosity-sensitive (summation coding) and numerosity-selective response profiles, closely mirroring those found in monkey intraparietal neurons. We conclude that a form of innatism based on architectural and learning biases is a fruitful approach to understanding the origin and development of number sense. This article is part of a discussion meeting issue ‘The origins of numerical abilities’.
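The two response profiles mentioned here can be contrasted with idealized tuning curves. The sketch below is an illustration only, not the model's actual units: it assumes a summation-coding unit whose activity rises monotonically with log numerosity, and a numerosity-selective unit with Gaussian tuning on a log scale around a preferred numerosity. The `slope`, `sigma`, and `preferred` values are hypothetical.

```python
import math


def summation_response(n: int, slope: float = 1.0) -> float:
    """Idealized summation-coding unit: activity grows monotonically with log(n)."""
    return slope * math.log(n)


def selective_response(n: int, preferred: int, sigma: float = 0.3) -> float:
    """Idealized numerosity-selective unit: Gaussian tuning on a log scale,
    peaking (value 1.0) at the preferred numerosity."""
    d = math.log(n) - math.log(preferred)
    return math.exp(-(d ** 2) / (2 * sigma ** 2))


for n in range(1, 9):
    print(n, round(summation_response(n), 2), round(selective_response(n, preferred=4), 2))
```

On a log scale the selective curve is symmetric, so numerosities 2 and 8 evoke the same response from a unit preferring 4, which reflects the ratio-dependent discrimination typical of number acuity.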


2021, Vol 38 (1), pp. 1-11
Author(s): Hafzullah İş, Taner Tuncer

Detecting malicious account interactions in social networks is highly important from political, social, and economic perspectives. This paper analyzed the profile structure of social media users based on their data interactions. A total of 10 parameters, including diameter, density, reciprocity, centrality, and modularity, were used to comprehensively characterize the interactions of Twitter users. Moreover, a new dataset was formed by visualizing the data obtained with these parameters. User profiles were classified into active, passive, and malicious classes using deep learning Convolutional Neural Network models. The success rates of the classification algorithms were estimated across different hyperparameters and application platforms. The best model achieved a success rate of 98.67%. The methodology demonstrated that Twitter user profiles can be classified successfully using interaction-based parameters. This paper is expected to contribute to the published literature on behavioral analysis and the identification of malicious accounts in social networks.
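Two of the ten interaction parameters, density and reciprocity, have simple standard definitions for a directed graph. The sketch below computes them in plain Python for a small interaction graph, assuming each edge (u, v) means "user u interacted with user v" (e.g. a reply or retweet) and that there are no self-loops. The user names are hypothetical, and the remaining parameters (diameter, centrality, modularity, and so on) are not shown.

```python
def density(num_nodes: int, edges: set[tuple[str, str]]) -> float:
    """Directed-graph density: observed edges / possible ordered pairs n(n-1).

    Assumes no self-loops.
    """
    if num_nodes < 2:
        return 0.0
    return len(edges) / (num_nodes * (num_nodes - 1))


def reciprocity(edges: set[tuple[str, str]]) -> float:
    """Fraction of directed edges whose reverse edge also exists."""
    if not edges:
        return 0.0
    mutual = sum(1 for u, v in edges if (v, u) in edges)
    return mutual / len(edges)


# HYPOTHETICAL interactions: alice and bob reply to each other; alice also
# mentions carol, who never responds.
edges = {("alice", "bob"), ("bob", "alice"), ("alice", "carol")}
nodes = {user for edge in edges for user in edge}
print(density(len(nodes), edges), reciprocity(edges))  # density 0.5, reciprocity 2/3
```

Intuitively, an account that sends many interactions that are rarely returned shows low reciprocity, which is one reason such graph statistics can help separate malicious from ordinary behavior.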


2021, Vol 11 (14), pp. 6579
Author(s): Nailia Gabdrakhmanova, Maria Pilgun

The relevance of this study stems from the need to develop technologies for effective urban systems management and the resolution of urban planning conflicts. The paper presents an algorithm for analyzing urban planning conflicts. The study material was data from social networks, microblogs, blogs, instant messaging, forums, reviews, video hosting services, thematic portals, online media, print media, and TV related to the construction of the North-Eastern Chord (NEC) in Moscow (Russian Federation). A multimodal approach was used to analyze the social media content. The paper presents research results on the development of methods and approaches for constructing mathematical and neural network models that analyze social media users’ perceptions based on their digital footprints. The models were built using artificial neural networks, differential equations, and mathematical statistics. The differential equations of the dynamic systems were based on observations enabled by machine learning. Mathematical models were developed to quickly detect, prevent, and address urban planning conflicts in order to manage urban systems efficiently. Combined with the mathematical and neural network models, the developed approaches made it possible to conclude that the situation around the NEC construction was tense, to identify residents' complaints to constructors and city authorities, and to propose recommendations for resolving and preventing conflicts. The research data could be useful for solving similar problems in sociology, ecology, and economics.


2020, Vol 10 (23), pp. 8614
Author(s): Raghad Alshalan, Hend Al-Khalifa

With the rise of hate speech in the Twittersphere, significant research efforts have been undertaken to provide automatic solutions for detecting hate speech, ranging from simple machine learning models to more complex deep neural network models. Despite this, research investigating hate speech in Arabic is still limited. This paper therefore aimed to investigate several neural network models based on convolutional neural networks (CNN) and recurrent neural networks (RNN) to detect hate speech in Arabic tweets. It also evaluated the recent language representation model Bidirectional Encoder Representations from Transformers (BERT) on the task of Arabic hate speech detection. To conduct our experiments, we first built a new hate speech dataset containing 9316 annotated tweets. We then conducted a set of experiments on two datasets to evaluate four models: CNN, gated recurrent units (GRU), CNN + GRU, and BERT. Our experimental results on our dataset and an out-of-domain dataset showed that the CNN model gave the best performance, with an F1-score of 0.79 and an area under the receiver operating characteristic curve (AUROC) of 0.89.
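The two metrics reported here can be computed directly from labels and model scores. The sketch below uses their standard definitions in plain Python: F1 as the harmonic mean of precision and recall for a chosen positive class, and AUROC as the probability that a randomly chosen positive example is scored above a randomly chosen negative one (ties count half). The labels, scores, and 0.5 threshold are toy values, not data from the paper.

```python
def f1_score(y_true: list[int], y_pred: list[int], positive: int = 1) -> float:
    """F1 for one class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auroc(y_true: list[int], scores: list[float]) -> float:
    """AUROC via pairwise ranking: P(score_pos > score_neg), ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = hate speech, 0 = not hate speech.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
print(f1_score(labels, preds), auroc(labels, scores))
```

F1 depends on the chosen decision threshold, whereas AUROC summarizes ranking quality across all thresholds, which is why papers often report both. The pairwise loop is quadratic and meant for clarity; library implementations sort the scores instead.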

