Detection of Cyberbullying Through BERT and Weighted Ensemble of Classifiers

2022 ◽  
Author(s):  
Christopher Graney-Ward ◽  
Biju Issac ◽  
Lida Ketsbaia ◽  
Seibu Mary Jacob

With the recent growth in popularity of social media platforms such as Facebook and Twitter, cyberbullying is becoming increasingly prevalent. This paper first reviews current research on cyberbullying and the NLP techniques used to classify this kind of online behaviour. It then reports experiments on combined Twitter datasets from Maryland and Cornell universities using several classification approaches: classical machine learning, RNN, CNN, and pretrained transformer-based classifiers. A state-of-the-art (SOTA) result was achieved by optimising BERTweet with a one-cycle policy and a decoupled weight decay optimiser (AdamW), improving the previous F1-score by up to 8.4% and reaching 64.8% macro F1. Particle Swarm Optimisation was then used to optimise the ensemble model. The ensemble, built from the optimised BERTweet model and a collection of models with varying data representations, outperformed the standalone BERTweet model by 0.53%, reaching 65.33% macro F1 on the TweetEval dataset, and by 0.55% on the combined datasets, reaching 68.1% macro F1.
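
The weighted ensemble and the macro F1 metric mentioned in this abstract can be illustrated with a minimal sketch. It assumes that each model outputs class probabilities per tweet and that the ensemble is a weighted average of those probabilities (soft voting); the abstract does not say how the models are combined, so this is only one plausible reading. The probabilities, labels, and weights below are made-up placeholders, and the weights are fixed by hand rather than searched with Particle Swarm Optimisation.

```python
def weighted_vote(prob_sets, weights):
    """Average per-model class probabilities with the given weights; return predicted labels."""
    total = sum(weights)
    n_samples = len(prob_sets[0])
    n_classes = len(prob_sets[0][0])
    preds = []
    for i in range(n_samples):
        combined = [
            sum(w * probs[i][c] for w, probs in zip(weights, prob_sets)) / total
            for c in range(n_classes)
        ]
        preds.append(combined.index(max(combined)))
    return preds


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical outputs of two models on four tweets (class 0 = benign, 1 = bullying).
model_a = [[0.9, 0.1], [0.4, 0.6], [0.7, 0.3], [0.2, 0.8]]
model_b = [[0.6, 0.4], [0.7, 0.3], [0.2, 0.8], [0.3, 0.7]]
labels = [0, 0, 1, 1]

preds = weighted_vote([model_a, model_b], weights=[0.3, 0.7])
score = macro_f1(labels, preds)
```

In a setting like the one described, an optimiser such as PSO would search over the `weights` vector to maximise `macro_f1` on held-out data; that search is left out here.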


Symmetry ◽  
2021 ◽  
Vol 13 (4) ◽  
pp. 556
Author(s):  
Thaer Thaher ◽  
Mahmoud Saheb ◽  
Hamza Turabieh ◽  
Hamouda Chantar

Fake or false information on social media platforms is a significant challenge because rumors, propaganda, and deceptive information about a person, organization, or service deliberately mislead users. Twitter is one of the most widely used social media platforms, especially in the Arab region, where the number of users is steadily increasing, accompanied by an increase in the rate of fake news. This has drawn researchers' attention to providing a safe online environment free of misleading information. This paper proposes a smart classification model for the early detection of fake news in Arabic tweets using Natural Language Processing (NLP) techniques, Machine Learning (ML) models, and the Harris Hawks Optimizer (HHO) as a wrapper-based feature selection approach. An Arabic Twitter corpus of 1862 previously annotated tweets was used to assess the efficiency of the proposed model. The Bag of Words (BoW) model is applied with different term-weighting schemes for feature extraction. Eight well-known learning algorithms are investigated with varying combinations of features, including user-profile, content-based, and word features. The reported results show that Logistic Regression (LR) with the Term Frequency-Inverse Document Frequency (TF-IDF) model ranks best. Moreover, feature selection based on the binary HHO (BHHO) algorithm plays a vital role in reducing dimensionality, thereby enhancing the learning model’s performance for fake news detection. The proposed BHHO-LR model yields an improvement of 5% over previous works on the same dataset.
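
As a rough illustration of the TF-IDF term weighting named in this abstract, the sketch below computes weights for a tiny tokenised corpus. It assumes the common variant tf × log(N / df); the abstract does not state which variant the authors used, and the example documents are hypothetical English stand-ins rather than tweets from the Arabic corpus. Classification and HHO-based feature selection are not shown.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenised document, using tf * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical, already tokenised stand-ins for tweets.
docs = [
    ["breaking", "news", "miracle", "cure"],
    ["official", "news", "report"],
    ["miracle", "cure", "shocking", "news"],
]
vectors = tf_idf(docs)
```

A term that appears in every document, such as "news" here, gets weight zero, while terms confined to one document get the highest weights; a wrapper-based selector would then choose a subset of these weighted features.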


2021 ◽  
pp. 1-13
Author(s):  
C S Pavan Kumar ◽  
L D Dhinesh Babu

Sentiment analysis is widely used to retrieve the hidden sentiments in medical discussions on online social networking platforms such as Twitter, Facebook, and Instagram. People often convey their feelings about their medical problems on social media platforms. Practitioners and health care workers have started to observe these discussions to assess the impact of health-related issues on people, which helps in providing better care and improving quality of life. Dementia is a serious disease in western countries like the United States of America and the United Kingdom, and the respective governments are providing facilities to the affected people. There is much chatter on social media platforms concerning patients’ care, healthy measures to follow to avoid the disease, and checking for early indications. This chatter has to be carefully monitored to help officials take the necessary precautions for the betterment of the affected. A novel feature engineering architecture is proposed that involves feature-split for sentiment analysis of medical chatter on online social networks, with a pipeline that can be used with any machine learning model. The proposed model uses a fuzzy membership function to refine the outputs: the sentiment score obtained by the machine learning model is subjected to fuzzification and defuzzification using the trapezoid membership function and the center of sums method, respectively. Three datasets are considered for comparing the proposed and the regular model. The proposed approach delivered better results than the regular approach and proved to be effective for sentiment analysis of medical discussions on online social networks.
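
The fuzzification and defuzzification step can be sketched as follows. This is a minimal illustration, assuming a sentiment score in [0, 1], three hand-picked trapezoidal sets, and a discretised form of the center of sums method; the set boundaries, the names `SETS`, `fuzzify`, and `center_of_sums`, and the grid resolution are all assumptions, not values taken from the paper.

```python
def trapezoid(x, a, b, c, d):
    """Membership of x in a trapezoid with feet a, d and shoulders b, c (a <= b <= c <= d)."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


# Hypothetical fuzzy sets over a sentiment score in [0, 1].
SETS = {
    "negative": (0.0, 0.0, 0.2, 0.4),
    "neutral": (0.3, 0.45, 0.55, 0.7),
    "positive": (0.6, 0.8, 1.0, 1.0),
}


def fuzzify(score):
    """Map a crisp score to its degree of membership in each set."""
    return {name: trapezoid(score, *params) for name, params in SETS.items()}


def center_of_sums(degrees, steps=1000):
    """Defuzzify by summing the clipped sets on a grid and taking the weighted mean of x."""
    num = den = 0.0
    for i in range(steps + 1):
        x = i / steps
        # Each set is clipped at its firing degree; overlaps are counted once per set.
        total = sum(min(degrees[name], trapezoid(x, *params)) for name, params in SETS.items())
        num += x * total
        den += total
    return num / den if den else 0.5


refined = center_of_sums(fuzzify(0.9))
```

Unlike a plain centroid of the union, the center of sums counts overlapping regions once for each set that covers them, which is why the clipped memberships are added rather than combined with `max`.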


PLoS ONE ◽  
2021 ◽  
Vol 16 (8) ◽  
pp. e0256696
Author(s):  
Anna Keuchenius ◽  
Petter Törnberg ◽  
Justus Uitermark

Despite the prevalence of disagreement between users on social media platforms, studies of online debates typically only look at positive online interactions, represented as networks with positive ties. In this paper, we hypothesize that the systematic neglect of conflict that these network analyses induce leads to misleading results on polarized debates. We introduce an approach to bring in negative user-to-user interaction, by analyzing online debates using signed networks with positive and negative ties. We apply this approach to the Dutch Twitter debate on ‘Black Pete’—an annual Dutch celebration with racist characteristics. Using a dataset of 430,000 tweets, we apply natural language processing and machine learning to identify: (i) users’ stance in the debate; and (ii) whether the interaction between users is positive (supportive) or negative (antagonistic). Comparing the resulting signed network with its unsigned counterpart, the retweet network, we find that traditional unsigned approaches distort debates by conflating conflict with indifference, and that the inclusion of negative ties changes and enriches our understanding of coalitions and division within the debate. Our analysis reveals that some groups are attacking each other, while others rather seem to be located in fragmented Twitter spaces. Our approach identifies new network positions of individuals that correspond to roles in the debate, such as leaders and scapegoats. These findings show that representing the polarity of user interactions as signs of ties in networks substantively changes the conclusions drawn from polarized social media activity, which has important implications for various fields studying online debates using network analysis.
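
The contrast between signed and unsigned networks can be made concrete with a small sketch. The users, stances, and interactions below are hypothetical, and the edge-list representation and the helper `ties_between_groups` are illustrative assumptions rather than the authors' pipeline.

```python
from collections import defaultdict

# Hypothetical stances and interactions; +1 = supportive tie, -1 = antagonistic tie.
stance = {"ann": "pro", "bob": "pro", "cas": "anti", "dan": "anti"}
signed_edges = [
    ("ann", "bob", +1),
    ("cas", "dan", +1),
    ("ann", "cas", -1),
    ("bob", "dan", -1),
]


def ties_between_groups(edges, stance):
    """Count positive and negative ties for every unordered pair of stance groups."""
    counts = defaultdict(lambda: {"positive": 0, "negative": 0})
    for source, target, sign in edges:
        pair = tuple(sorted((stance[source], stance[target])))
        counts[pair]["positive" if sign > 0 else "negative"] += 1
    return dict(counts)


signed = ties_between_groups(signed_edges, stance)
# An unsigned view keeps only the supportive ties, as a retweet network would.
unsigned = ties_between_groups([e for e in signed_edges if e[2] > 0], stance)
```

In the unsigned view the two camps share no ties at all, which looks the same as mutual indifference; the signed view records two antagonistic ties between them.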


Terrorist attacks have grown enormously in recent years, and online terrorism has been taking root on the internet alongside the growth of internet technology. Such activity includes social media threats such as hate speech and comments provoking terror on platforms such as Twitter and Facebook. These activities must be prevented before they make an impact. In this paper, we build classifiers that group and predict terrorism activities using the k-NN and random forest algorithms. We use the Global Terrorism Database (GTD), a publicly available database containing information on terrorist events around the world from 1970 through 2017, to train a machine learning-based intelligent system to predict future events that could threaten society.
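
For readers unfamiliar with k-NN, the following is a minimal sketch of the algorithm: a query point is labelled by majority vote among its k nearest training points. The two-dimensional features and the labels "A" and "B" are hypothetical placeholders and are not drawn from the GTD; the abstract does not describe the features the authors used.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical (features, label) pairs standing in for encoded event records.
train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.3), "A"),
    ((4.0, 4.2), "B"), ((4.3, 3.9), "B"), ((3.8, 4.1), "B"),
]

label_near_a = knn_predict(train, (1.1, 1.0))
label_near_b = knn_predict(train, (4.0, 4.0))
```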


2022 ◽  
pp. 20-39
Author(s):  
Elliot Mbunge ◽  
Benhildah Muchemwa

Social media platforms play a major role in the tourism and hospitality industry and are increasingly becoming a source of information. The complexity and growing size of tourists' online data make it difficult to extract meaningful insights using traditional models. This scoping and comprehensive review therefore analyzes machine learning and deep learning models applied to tourism data. The study revealed that deep learning and machine learning models are used for forecasting and predicting tourism demand using search query data, Google Trends, and social media platforms. It also revealed that data-driven models can assist managers and policymakers in mapping and segmenting tourism hotspots and attractions, predicting the revenue likely to be generated, exploring targeted marketing, and segmenting tourists based on their spending patterns, lifestyle, and age group. However, hybrid deep learning models such as InceptionV3, MobileNetV3, and YOLOv4 have not yet been explored in the tourism and hospitality industry.


2021 ◽  
pp. 68-80
Author(s):  
Muhammad Umer Hashmi ◽  
Ngoc Duy Nguyen ◽  
Michael Johnstone ◽  
Kathryn Backholer ◽  
Asim Bhatti

2020 ◽  
Vol 8 (4) ◽  
pp. 47-62
Author(s):  
Francisca Oladipo ◽  
Ogunsanya, F. B ◽  
Musa, A. E. ◽  
Ogbuju, E. E ◽  
Ariwa, E.

The social media space has evolved into a large labyrinth of information exchange platforms, and the growing adoption of these platforms has brought an increasing wave of interest in sentiment analysis as a paradigm for mining and analysing users’ opinions and sentiments based on their posts. In this paper, we present a review of contextual sentiment analysis on social media entries, with a specific focus on Twitter. Sentiment analysis consists of two broad approaches: the machine learning approach, which uses classification techniques to classify text and is further categorized into supervised and unsupervised learning; and the lexicon-based approach, which uses a dictionary and, unlike the machine learning approach, requires no test or training data set.
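
The lexicon-based approach can be illustrated with a minimal sketch: each word is looked up in a polarity dictionary and the polarities are summed, with no training data involved. The toy lexicon, the negator list, and the one-word negation rule are assumptions made for illustration; real lexicons are far larger and handle context more carefully.

```python
# Hypothetical toy lexicon mapping words to polarity scores.
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "awful": -2, "hate": -2}
NEGATORS = {"not", "never", "no"}


def lexicon_sentiment(text):
    """Sum word polarities, flipping the sign of a word that directly follows a negator."""
    score = 0
    negate = False
    for token in text.lower().split():
        word = token.strip(".,!?")
        if word in NEGATORS:
            negate = True
            continue
        polarity = LEXICON.get(word, 0)
        score += -polarity if negate else polarity
        negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, `lexicon_sentiment("This is not good.")` returns "negative" because the negator flips the polarity of "good".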


Author(s):  
Sonali Gaikwad ◽  
Tejashri Borate ◽  
Nandpriya Ashtekar ◽  
Umadevi Lade

Social media platforms involve not millions but billions of users around the globe. Interactions on easily accessible social media sites like Twitter have a huge impact on people, and nowadays that impact on daily life is often undesirable and negative. These heavily used communication platforms have become a major source of unwanted data and irrelevant information. Twitter, one of the most prominent social media platforms of our time and among the most popular microblogging services, is now used as a weapon to share unethical and unreasonable opinions and media. In this proposed work, dishonouring comments and tweets directed at people are categorized into 9 types, and each tweet is classified as one of these types or as non-shaming. Observation suggests that, of the many users who post remarks on a particular event, the lion's share are likely to shame the person in question. Moreover, it is the shamers, not the non-shamers, whose follower counts increase faster on Twitter.

