Cyberbullying detection in social media text based on character‐level convolutional neural network with shortcuts

Author(s):  
Nijia Lu ◽  
Guohua Wu ◽  
Zhen Zhang ◽  
Yitao Zheng ◽  
Yizhi Ren ◽  
...  
Author(s):  
Gauri Jain ◽  
Manisha Sharma ◽  
Basant Agarwal

This article describes how spam detection in the social media text is becoming increasing important because of the exponential increase in the spam volume over the network. It is challenging, especially in case of text within the limited number of characters. Effective spam detection requires more number of efficient features to be learned. In the current article, the use of a deep learning technology known as a convolutional neural network (CNN) is proposed for spam detection with an added semantic layer on the top of it. The resultant model is known as a semantic convolutional neural network (SCNN). A semantic layer is composed of training the random word vectors with the help of Word2vec to get the semantically enriched word embedding. WordNet and ConceptNet are used to find the word similar to a given word, in case it is missing in the word2vec. The architecture is evaluated on two corpora: SMS Spam dataset (UCI repository) and Twitter dataset (Tweets scrapped from public live tweets). The authors' approach outperforms the-state-of-the-art results with 98.65% accuracy on SMS spam dataset and 94.40% accuracy on Twitter dataset.


2021 ◽  
Author(s):  
Mayank Mishra ◽  
Tanupriya Choudhury ◽  
Tanmay Sarkar

Abstract In our work, we look to classify images that make their way into our smartphone devices through various social-media text-messaging platforms. We aim at classifying images into three broad categories: document-based images, quote-based images, and photographs. People, especially students, share many document-based images that include snapshots of essential emails, handwritten notes, articles, etc. Quote based images, consisting of birthday wishes, motivational messages, festival greetings, etc., are among the highly shared images on social media platforms. A significant share of images constitutes photographs of people, including group photographs, selfies, portraits, etc. We train various convolutional neural network (CNN) based models on our self-made dataset and compare their results to find our task’s optimum model.


Author(s):  
Gauri Jain ◽  
Manisha Sharma ◽  
Basant Agarwal

This article describes how spam detection in the social media text is becoming increasing important because of the exponential increase in the spam volume over the network. It is challenging, especially in case of text within the limited number of characters. Effective spam detection requires more number of efficient features to be learned. In the current article, the use of a deep learning technology known as a convolutional neural network (CNN) is proposed for spam detection with an added semantic layer on the top of it. The resultant model is known as a semantic convolutional neural network (SCNN). A semantic layer is composed of training the random word vectors with the help of Word2vec to get the semantically enriched word embedding. WordNet and ConceptNet are used to find the word similar to a given word, in case it is missing in the word2vec. The architecture is evaluated on two corpora: SMS Spam dataset (UCI repository) and Twitter dataset (Tweets scrapped from public live tweets). The authors' approach outperforms the-state-of-the-art results with 98.65% accuracy on SMS spam dataset and 94.40% accuracy on Twitter dataset.


Author(s):  
Mohammad Javad Shooshtari ◽  
Hossein Etemadfard ◽  
Rouzbeh Shad

The widespread deployment of social media has helped researchers access an enormous amount of data in various domains, including the pandemic caused by the COVID-19 spread. This study presents a heuristic approach to classify Commercial Instagram Posts (CIPs) and explores how the businesses around the Holy Shrine – a sacred complex in Mashhad, Iran, surrounded by numerous shopping centers – were impacted by the pandemic. Two datasets of Instagram posts (one gathered data from March 14th to April 10th, 2020, when Holy Shrine and nearby shops were closed, and one extracted data from the same period in 2019), two word embedding models – aimed at vectorizing associated caption of each post, and two neural networks – multi-layer perceptron and convolutional neural network – were employed to classify CIPs in 2019. Among the scenarios defined for the 2019 CIPs classification, the results revealed that the combination of MLP and CBoW achieved the best performance, which was then used for the 2020 CIPs classification. It is found out that the fraction of CIPs to total Instagram posts has increased from 5.58% in 2019 to 8.08% in 2020, meaning that business owners were using Instagram to increase their sales and continue their commercial activities to compensate for the closure of their stores during the pandemic. Moreover, the portion of non-commercial Instagram posts (NCIPs) in total posts has decreased from 94.42% in 2019 to 91.92% in 2020, implying the fact that since the Holy Shrine was closed, Mashhad citizens and tourists could not visit it and take photos to post on their Instagram accounts.


Author(s):  
Feng Qian ◽  
Chengyue Gong ◽  
Karishma Sharma ◽  
Yan Liu

Fake news on social media is a major challenge and studies have shown that fake news can propagate exponentially quickly in early stages. Therefore, we focus on early detection of fake news, and consider that only news article text is available at the time of detection, since additional information such as user responses and propagation patterns can be obtained only after the news spreads. However, we find historical user responses to previous articles are available and can be treated as soft semantic labels, that enrich the binary label of an article, by providing insights into why the article must be labeled as fake. We propose a novel Two-Level Convolutional Neural Network with User Response Generator (TCNN-URG) where TCNN captures semantic information from article text by representing it at the sentence and word level, and URG learns a generative model of user response to article text from historical user responses which it can use to generate responses to new articles in order to assist fake news detection. We conduct experiments on one available dataset and a larger dataset collected by ourselves. Experimental results show that TCNN-URG outperforms the baselines based on prior approaches that detect fake news from article text alone.


Sign in / Sign up

Export Citation Format

Share Document