A Systematic Review of Machine Learning Algorithms in Cyberbullying Detection: Future Directions and Challenges

2021 ◽  
Vol 4 (1) ◽  
pp. 01-26
Author(s):  
Muhammad Arif

Social media networks are becoming an essential part of life for most of the world’s population. Detecting cyberbullying using machine learning and natural language processing algorithms is getting the attention of researchers. There is a growing need for automatic detection and mitigation of cyberbullying events on social media. In this study, research directions and the theoretical foundation in this area are investigated. A systematic review of the current state-of-the-art research in this area is conducted. A framework considering all possible actors in the cyberbullying event must be designed, including various aspects of cyberbullying and its effect on the participating actors. Furthermore, future directions and challenges are also discussed.

2021 ◽  
Vol 28 (1) ◽  
pp. e100262
Author(s):  
Mustafa Khanbhai ◽  
Patrick Anyadi ◽  
Joshua Symons ◽  
Kelsey Flott ◽  
Ara Darzi ◽  
...  

Objectives: Unstructured free-text patient feedback contains rich information, and analysing these data manually would require a lot of personnel resources which are not available in most healthcare organisations. To undertake a systematic review of the literature on the use of natural language processing (NLP) and machine learning (ML) to process and analyse free-text patient experience data. Methods: Databases were systematically searched to identify articles published between January 2000 and December 2019 examining NLP to analyse free-text patient feedback. Due to the heterogeneous nature of the studies, a narrative synthesis was deemed most appropriate. Data related to the study purpose, corpus, methodology, performance metrics and indicators of quality were recorded. Results: Nineteen articles were included. The majority (80%) of studies applied language analysis techniques on patient feedback from social media sites (unsolicited) followed by structured surveys (solicited). Supervised learning was frequently used (n=9), followed by unsupervised (n=6) and semisupervised (n=3). Comments extracted from social media were analysed using an unsupervised approach, and free-text comments held within structured surveys were analysed using a supervised approach. Reported performance metrics included the precision, recall and F-measure, with support vector machine and Naïve Bayes being the best performing ML classifiers. Conclusion: NLP and ML have emerged as an important tool for processing unstructured free text. Both supervised and unsupervised approaches have their role depending on the data source. With the advancement of data analysis tools, these techniques may be useful to healthcare organisations to generate insight from the volumes of unstructured free-text data.


2019 ◽  
Vol 2 (1) ◽  
Author(s):  
Ari Z. Klein ◽  
Abeed Sarker ◽  
Davy Weissenbacher ◽  
Graciela Gonzalez-Hernandez

Abstract Social media has recently been used to identify and study a small cohort of Twitter users whose pregnancies with birth defect outcomes—the leading cause of infant mortality—could be observed via their publicly available tweets. In this study, we exploit social media on a larger scale by developing natural language processing (NLP) methods to automatically detect, among thousands of users, a cohort of mothers reporting that their child has a birth defect. We used 22,999 annotated tweets to train and evaluate supervised machine learning algorithms—feature-engineered and deep learning-based classifiers—that automatically distinguish tweets referring to the user’s pregnancy outcome from tweets that merely mention birth defects. Because 90% of the tweets merely mention birth defects, we experimented with under-sampling and over-sampling approaches to address this class imbalance. An SVM classifier achieved the best performance for the two positive classes: an F1-score of 0.65 for the “defect” class and 0.51 for the “possible defect” class. We deployed the classifier on 20,457 unlabeled tweets that mention birth defects, which helped identify 542 additional users for potential inclusion in our cohort. Contributions of this study include (1) NLP methods for automatically detecting tweets by users reporting their birth defect outcomes, (2) findings that an SVM classifier can outperform a deep neural network-based classifier for highly imbalanced social media data, (3) evidence that automatic classification can be used to identify additional users for potential inclusion in our cohort, and (4) a publicly available corpus for training and evaluating supervised machine learning algorithms.
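
The under-sampling and over-sampling idea mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `resample`, the toy tweets, and the 1-to-9 label split are all hypothetical, and only random resampling with the Python standard library is shown.

```python
import random
from collections import Counter, defaultdict

def resample(examples, labels, strategy="under", seed=0):
    """Randomly balance classes (hypothetical helper, not from the paper).

    "under": shrink every class to the size of the smallest class.
    "over":  grow every class to the size of the largest class by
             duplicating randomly chosen examples.
    """
    if strategy not in ("under", "over"):
        raise ValueError("strategy must be 'under' or 'over'")
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for x, y in zip(examples, labels):
        by_label[y].append(x)
    sizes = [len(items) for items in by_label.values()]
    target = min(sizes) if strategy == "under" else max(sizes)
    balanced = []
    for y, items in by_label.items():
        if len(items) >= target:
            # Draw without replacement (keeps everything when sizes match).
            chosen = rng.sample(items, target)
        else:
            # Keep all originals and add duplicates drawn with replacement.
            chosen = items + rng.choices(items, k=target - len(items))
        balanced.extend((x, y) for x in chosen)
    rng.shuffle(balanced)
    return balanced

# Toy data invented for illustration: 1 "defect" tweet, 9 "mention" tweets.
texts = [f"tweet {i}" for i in range(10)]
labels = ["defect"] + ["mention"] * 9
under = resample(texts, labels, strategy="under")
over = resample(texts, labels, strategy="over")
print(Counter(y for _, y in under), Counter(y for _, y in over))
```

Resampling of this kind would normally be applied to the training split only, so that evaluation still reflects the original class distribution.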


2021 ◽  
Author(s):  
Md Anawar Hossen Wadud ◽  
Md Ashraf Uddin

Abstract The popularity of social media has exploded worldwide over the last few decades, and it has become the most preferred mode of social interaction. The internet also provides a new platform through which adolescents are being bullied. Appropriate means of cyberbullying detection are still partial and in some cases very limited. Moreover, research on cyberbullying detection focuses extensively on surveys and on the psychological impact of cyberbullying on victims. However, prevention has not been widely addressed. To bridge the gap, this paper aims to detect cyberbullying efficiently. This paper employs a standard machine learning method and a natural language processing technique as part of the detection process in a decentralized, blockchain-leveraged architecture. We provide a fog-based architecture for cyberbullying detection, aiming to relieve the server's load by placing the cyberbullying detection and prevention processes at the fog layer. The proposal might offer a probable solution to save users, particularly adolescents, from severe consequences of cyberbullying.


Nowadays, human relationships are maintained through social media networks, and traditional relationships are now obsolete. We use social networking sites to stay in contact, share ideas, and exchange knowledge. Social networking sites such as Twitter, Facebook, and LinkedIn are available for this kind of communication. On Twitter, users share their opinions, interests, and knowledge with others through messages. At the same time, some users mislead genuine users. Genuine users are also called solicited users, and the users who mislead them are called spammers. Spammers post unwanted information to non-spam users, who may retweet it to others and follow the spammers. To counter these spam messages, we propose a methodology that uses machine learning algorithms. Our approach uses a set of content-based features. The spam detection model uses the support vector machine (SVM) and Naive Bayes classification algorithms. To measure the performance of the model, we use the precision, recall, and F-measure metrics.
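
The precision, recall, and F-measure metrics named above can be computed from predicted and true labels as in the following minimal sketch. The function name `precision_recall_f1` and the six example labels are hypothetical and are not taken from the paper; the F-measure shown is the balanced F1 variant.

```python
def precision_recall_f1(y_true, y_pred, positive="spam"):
    """Return (precision, recall, F1) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Labels invented for illustration only.
y_true = ["spam", "spam", "ham", "ham", "spam", "ham"]
y_pred = ["spam", "ham", "ham", "spam", "spam", "ham"]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

In this toy case there are two true positives, one false positive, and one false negative, so precision, recall, and F1 all equal 2/3.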


Author(s):  
Erick Omuya ◽  
George Okeyo ◽  
Michael Kimwele

Social media has been embraced by different people as a convenient and official medium of communication. People write messages and attach images and videos, which they share on Twitter, Facebook and other social media. These updates therefore generate a lot of data that is rich in sentiments. Sentiment analysis has been used to determine the opinions of clients, for instance, relating to a particular product or company. Knowledge-based and machine learning approaches are among the strategies that have been used to analyze these sentiments. The performance of sentiment analysis is, however, distorted by noise, the curse of dimensionality, the data domains and the size of the data used for training and testing. This research aims at developing a model for sentiment analysis in which dimensionality reduction and the use of different parts of speech improve sentiment analysis performance. It uses natural language processing for filtering, storing and performing sentiment analysis on the data from social media. The model is tested using the Naïve Bayes, Support Vector Machine and K-Nearest Neighbor machine learning algorithms, and its performance is compared with that of two other sentiment analysis models. Experimental results show that the model improves sentiment analysis performance using machine learning techniques.


2019 ◽  
Author(s):  
Aziliz Le Glaz ◽  
Yannis Haralambous ◽  
Deok-Hee Kim-Dufor ◽  
Philippe Lenca ◽  
Romain Billot ◽  
...  

BACKGROUND Machine learning (ML) systems are parts of Artificial Intelligence (AI) that automatically learn models from data in order to make better decisions. Natural Language Processing (NLP), by using corpora and learning approaches, provides good performance in statistical tasks, such as text classification or sentiment mining. OBJECTIVE The primary aim of this systematic review is to summarize and characterize studies that used ML and NLP techniques for mental health, in methodological and technical terms. The secondary aim is to consider the interest of these methods in mental health clinical practice. METHODS This systematic review follows the PRISMA guidelines and is registered on PROSPERO. The research was conducted on 4 medical databases (Pubmed, Scopus, ScienceDirect and PsycINFO) with the following keywords: machine learning, data mining, psychiatry, mental health, mental disorder. The exclusion criteria are: languages other than English, anonymization process, case studies, conference papers and reviews. No limitations on publication dates were imposed. RESULTS 327 articles were identified, 269 were excluded, and 58 were included in the review. Results were organized through a qualitative perspective. Even though studies had heterogeneous topics and methods, some themes emerged. Population studies could be grouped into three categories: patients included in medical databases, patients who came to the emergency room, and social-media users. The main objectives were symptom extraction, severity of illness classification, comparison of therapy effectiveness, psychopathological clues, and nosography challenging. Data from electronic medical records and data from social media were the two major data sources. With regard to the methods used, preprocessing used the standard methods of NLP and unique identifier extraction dedicated to medical texts. Efficient classifiers were preferred over classifiers with "transparent" functioning. Python was the most frequently used platform. CONCLUSIONS ML and NLP models have been highly topical issues in medicine in recent years and may be considered a new paradigm in medical research. However, these processes tend to confirm clinical hypotheses rather than develop entirely new knowledge, and one major category of the population, social-media users, is obviously an imprecise cohort. In addition, some language-specific features can improve the performance of NLP methods, and their extension to other languages should be more closely investigated. However, ML and NLP techniques provide useful information from unexplored data (i.e., patients’ daily habits that are usually inaccessible to care providers). This may be considered to be an additional tool at every step of mental health care: diagnosis, prognosis, treatment efficacy and monitoring. Therefore, ethical issues – like predicting psychiatric troubles or involvement in the physician-patient relationship – remain and should be discussed in a timely manner. ML and NLP methods may offer multiple perspectives in mental health research but should also be considered as tools to support clinical practice. CLINICALTRIAL Number CRD42019107376


Electronics ◽  
2021 ◽  
Vol 10 (22) ◽  
pp. 2810
Author(s):  
Chahat Raj ◽  
Ayush Agarwal ◽  
Gnana Bharathy ◽  
Bhuva Narayan ◽  
Mukesh Prasad

The rise in web and social media interactions has resulted in the effortless proliferation of offensive language and hate speech. Such online harassment, insults, and attacks are commonly termed cyberbullying. The sheer volume of user-generated content has made it challenging to identify such illicit content. Machine learning has wide applications in text classification, and researchers are shifting towards using deep neural networks in detecting cyberbullying due to the several advantages they have over traditional machine learning algorithms. This paper proposes a novel neural network framework with parameter optimization and an algorithmic comparative study of eleven classification methods: four traditional machine learning and seven shallow neural networks on two real-world cyberbullying datasets. In addition, this paper also examines the effect of natural language processing based on feature extraction and word-embedding techniques on algorithmic performance. Key observations from this study show that bidirectional neural networks and attention models provide high classification results. Logistic Regression was observed to be the best among the traditional machine learning classifiers used. Term Frequency-Inverse Document Frequency (TF-IDF) demonstrates consistently high accuracies with traditional machine learning techniques. Global Vectors (GloVe) performs better with neural network models. Bi-GRU and Bi-LSTM worked best amongst the neural networks used. The extensive experiments performed on the two datasets establish the importance of this work by comparing eleven classification methods and seven feature extraction techniques. Our proposed shallow neural networks outperform existing state-of-the-art approaches for cyberbullying detection, with accuracy and F1-scores as high as ~95% and ~98%, respectively.
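
As a rough illustration of the TF-IDF weighting mentioned in the abstract above, the sketch below computes one common textbook variant: term frequency as count divided by document length, and inverse document frequency as the natural log of N over document frequency. The function name `tfidf`, the whitespace tokenizer, and the three example sentences are assumptions for illustration; the paper's exact feature extraction settings are not given here, and library implementations usually apply smoothing and normalization that this sketch omits.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document (unsmoothed TF-IDF)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

# Example sentences invented for illustration only.
docs = ["you are great", "you are awful", "great game today"]
vectors = tfidf(docs)
print(vectors[1])
```

Terms that occur in fewer documents, such as "awful" here, receive a larger weight than terms shared across documents, which is what makes the representation useful as input to a traditional classifier.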


2020 ◽  
Author(s):  
Md Anawar Hossen Wadud ◽  
Md Ashraf Uddin ◽  
Shamima Parvez ◽  
Mohammad Motiur Rahman ◽  
Ammar Alazab ◽  
...  

Abstract The popularity of social media has exploded worldwide over the last few decades, and it has become the most preferred mode of social interaction. The internet also provides a new platform through which adolescents are being bullied. Appropriate means of cyberbullying detection are still partial and in some cases very limited. Moreover, research on cyberbullying detection focuses extensively on surveys and on the psychological impact of cyberbullying on victims. However, prevention has not been widely addressed. To bridge the gap, this paper aims to detect cyberbullying efficiently. This paper employs a standard machine learning method and a natural language processing technique as part of the detection process in a decentralized, blockchain-leveraged architecture. We provide a fog-based architecture for cyberbullying detection, aiming to relieve the server's load by placing the cyberbullying detection and prevention processes at the fog layer. The proposal might offer a probable solution to save users, particularly adolescents, from severe consequences of cyberbullying.

