A Systematic Review of Machine Learning Algorithms in Cyberbullying Detection: Future Directions and Challenges

Objectives: Unstructured free-text patient feedback contains rich information, and analysing these data manually would require personnel resources that are not available in most healthcare organisations. To undertake a systematic review of the literature on the use of natural language processing (NLP) and machine learning (ML) to process and analyse free-text patient experience data. Methods: Databases were systematically searched to identify articles published between January 2000 and December 2019 examining NLP to analyse free-text patient feedback. Due to the heterogeneous nature of the studies, a narrative synthesis was deemed most appropriate. Data related to the study purpose, corpus, methodology, performance metrics and indicators of quality were recorded. Results: Nineteen articles were included. The majority (80%) of studies applied language analysis techniques on patient feedback from social media sites (unsolicited), followed by structured surveys (solicited). Supervised learning was frequently used (n=9), followed by unsupervised (n=6) and semisupervised (n=3). Comments extracted from social media were analysed using an unsupervised approach, and free-text comments held within structured surveys were analysed using a supervised approach. Reported performance metrics included the precision, recall and F-measure, with support vector machine and Naïve Bayes being the best performing ML classifiers. Conclusion: NLP and ML have emerged as important tools for processing unstructured free text. Both supervised and unsupervised approaches have their role depending on the data source. With the advancement of data analysis tools, these techniques may be useful to healthcare organisations to generate insight from the volumes of unstructured free-text data.

Towards scaling Twitter for digital epidemiology of birth defects

npj Digital Medicine ◽

10.1038/s41746-019-0170-5 ◽

2019 ◽

Vol 2 (1) ◽

Cited By ~ 4

Author(s):

Ari Z. Klein ◽

Abeed Sarker ◽

Davy Weissenbacher ◽

Graciela Gonzalez-Hernandez

Keyword(s):

Machine Learning ◽

Social Media ◽

Language Processing ◽

Birth Defects ◽

Birth Defect ◽

Learning Algorithms ◽

Class Imbalance ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Svm Classifier

Abstract Social media has recently been used to identify and study a small cohort of Twitter users whose pregnancies with birth defect outcomes—the leading cause of infant mortality—could be observed via their publicly available tweets. In this study, we exploit social media on a larger scale by developing natural language processing (NLP) methods to automatically detect, among thousands of users, a cohort of mothers reporting that their child has a birth defect. We used 22,999 annotated tweets to train and evaluate supervised machine learning algorithms—feature-engineered and deep learning-based classifiers—that automatically distinguish tweets referring to the user’s pregnancy outcome from tweets that merely mention birth defects. Because 90% of the tweets merely mention birth defects, we experimented with under-sampling and over-sampling approaches to address this class imbalance. An SVM classifier achieved the best performance for the two positive classes: an F1-score of 0.65 for the “defect” class and 0.51 for the “possible defect” class. We deployed the classifier on 20,457 unlabeled tweets that mention birth defects, which helped identify 542 additional users for potential inclusion in our cohort. Contributions of this study include (1) NLP methods for automatically detecting tweets by users reporting their birth defect outcomes, (2) findings that an SVM classifier can outperform a deep neural network-based classifier for highly imbalanced social media data, (3) evidence that automatic classification can be used to identify additional users for potential inclusion in our cohort, and (4) a publicly available corpus for training and evaluating supervised machine learning algorithms.
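The under-sampling and over-sampling idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the two-class toy labels ("mention" and "defect") and the placeholder tweets are assumptions made for illustration, and only random resampling is shown. Resampling of this kind is normally applied to the training split only, so that evaluation still reflects the original class distribution.

```python
import random
from collections import Counter


def _group_by_class(samples, labels):
    """Collect samples into a dict keyed by class label."""
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    return by_class


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority samples until all classes match the largest."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_samples.extend(items + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels


def random_undersample(samples, labels, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = min(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        out_samples.extend(rng.sample(items, target))
        out_labels.extend([label] * target)
    return out_samples, out_labels


# Hypothetical toy data: 90% of the items belong to the majority class.
tweets = [f"tweet {i}" for i in range(10)]
labels = ["mention"] * 9 + ["defect"]

over_samples, over_labels = random_oversample(tweets, labels)
under_samples, under_labels = random_undersample(tweets, labels)
print(Counter(over_labels))
print(Counter(under_labels))
```

Over-sampling keeps every original example at the cost of duplicated minority items, while under-sampling discards majority examples; which trade-off works better depends on the data and the classifier.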

Blockchain Leveraged Cyberbullying Preventing framework

10.21203/rs.3.rs-21075/v2 ◽

2021 ◽

Author(s):

Md Anawar Hossen Wadud ◽

Md Ashraf Uddin

Keyword(s):

Machine Learning ◽

Social Media ◽

Language Processing ◽

Processing Technique ◽

The Internet ◽

Machine Learning Method ◽

Learning Method ◽

Natural Language Processing Technique ◽

Being Bullied ◽

Cyberbullying Detection

Abstract The popularity of social media has exploded worldwide over the last few decades, and it has become the most preferred mode of social interaction. The internet also provides a new platform through which adolescents are being bullied. Appropriate means of cyberbullying detection are still partial and in some cases very limited. Moreover, research on cyberbullying focuses extensively on surveys and on its psychological impact on victims, whereas prevention has not been widely addressed. To bridge this gap, this paper aims to detect cyberbullying efficiently. It employs a standard machine learning method and a natural language processing technique as part of the detection process in a decentralized, Blockchain-leveraged architecture. We provide a fog-based architecture for cyberbullying detection that aims to relieve the server's load by placing the cyberbullying detection and prevention processes at the fog layer. The proposal might offer a probable solution to save users, particularly adolescents, from the severe consequences of cyberbullying.

Content Analysis of Extracted Suicide Texts From Social Media Networks by Using Natural Language Processing and Machine Learning Techniques

2021 IEEE International Conference on Smart Information Systems and Technologies (SIST) ◽

10.1109/sist50301.2021.9466001 ◽

2021 ◽

Author(s):

Bakhtiyor Meraliyev ◽

Kurmangazy Kongratbayev ◽

Nazerke Sultanova

Keyword(s):

Machine Learning ◽

Social Media ◽

Content Analysis ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Machine Learning Techniques ◽

Social Media Networks ◽

Learning Techniques

Detecting Spam Messages in Twitter Data by Machine learning Algorithms using Cross Validation

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.k1913.1081219 ◽

2019 ◽

Vol 8 (12) ◽

pp. 2941-2946

Keyword(s):

Machine Learning ◽

Social Media ◽

Cross Validation ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Human Relations ◽

Detection Model ◽

Social Media Networks ◽

Twitter Data

Nowadays, human relations are maintained through social media networks, and traditional relationships are becoming obsolete. To stay in touch, share ideas, and exchange knowledge, we use social media networking sites such as Twitter, Facebook, and LinkedIn. Through Twitter, users share their opinions, interests, and knowledge with others by posting messages. At the same time, some users mislead genuine users. The genuine users are also called solicited users, and the users who mislead them are called spammers. Spammers post unwanted information to non-spam users, who may retweet it to others and follow the spammers. To filter out these spam messages, we propose a methodology based on machine learning algorithms. Our approach uses a set of content-based features. In the spam detection model, we used the support vector machine (SVM) algorithm and the Naive Bayes classification algorithm. To measure the performance of our model, we used the precision, recall, and F-measure metrics.
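As an illustration of the evaluation metrics named above, the following sketch computes precision, recall, and the F-measure (here the balanced F1) for one positive class. The function name, the "spam"/"ham" labels, and the toy predictions are assumptions for illustration; they are not data or code from the paper.

```python
def precision_recall_f1(y_true, y_pred, positive="spam"):
    """Return (precision, recall, F1) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold labels and classifier output for eight messages.
y_true = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
y_pred = ["spam", "spam", "ham", "spam", "ham", "ham", "ham", "ham"]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this toy case there are two true positives, one false positive, and one false negative, so all three metrics equal 2/3.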

Sentiment Analysis on Social Media using Machine Learning Approach

10.22541/au.163620143.37655829/v1 ◽

2021 ◽

Author(s):

Erick Omuya ◽

George Okeyo ◽

Michael Kimwele

Keyword(s):

Machine Learning ◽

Social Media ◽

Sentiment Analysis ◽

Language Processing ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

Learning Approach ◽

K Nearest Neighbor ◽

Machine Learning Approach

Social media has been embraced by different people as a convenient and official medium of communication. People write messages and attach images and videos on Twitter, Facebook and other social media, which they then share. Social media therefore generates a large amount of data that is rich in sentiment. Sentiment analysis has been used to determine the opinions of clients, for instance, relating to a particular product or company. Knowledge-based and machine learning approaches are among the strategies that have been used to analyze these sentiments. The performance of sentiment analysis is, however, distorted by noise, the curse of dimensionality, the data domains and the size of the data used for training and testing. This research aims at developing a model for sentiment analysis in which dimensionality reduction and the use of different parts of speech improve sentiment analysis performance. It uses natural language processing for filtering, storing and performing sentiment analysis on data from social media. The model is tested using the Naïve Bayes, Support Vector Machine and K-Nearest Neighbor machine learning algorithms, and its performance is compared with that of two other sentiment analysis models. Experimental results show that the model improves sentiment analysis performance using machine learning techniques.

Machine learning and natural language processing in mental health: a systematic review (Preprint)

10.2196/preprints.15708 ◽

2019 ◽

Author(s):

Aziliz Le Glaz ◽

Yannis Haralambous ◽

Deok-Hee Kim-Dufor ◽

Philippe Lenca ◽

Romain Billot ◽

...

Keyword(s):

Mental Health ◽

Machine Learning ◽

Systematic Review ◽

Social Media ◽

Natural Language Processing ◽

Clinical Practice ◽

Natural Language ◽

Language Processing ◽

Care Providers ◽

Medical Databases

BACKGROUND Machine learning (ML) systems are parts of Artificial Intelligence (AI) that automatically learn models from data in order to make better decisions. Natural Language Processing (NLP), by using corpora and learning approaches, provides good performance in statistical tasks, such as text classification or sentiment mining. OBJECTIVE The primary aim of this systematic review is to summarize and characterize studies that used ML and NLP techniques for mental health, in methodological and technical terms. The secondary aim is to consider the interest of these methods in mental health clinical practice. METHODS This systematic review follows the PRISMA guidelines and is registered on PROSPERO. The research was conducted on 4 medical databases (Pubmed, Scopus, ScienceDirect and PsycINFO) with the following keywords: machine learning, data mining, psychiatry, mental health, mental disorder. The exclusion criteria are: languages other than English, anonymization process, case studies, conference papers and reviews. No limitations on publication dates were imposed. RESULTS 327 articles were identified, 269 were excluded, and 58 were included in the review. Results were organized through a qualitative perspective. Even though studies had heterogeneous topics and methods, some themes emerged. Population studies could be grouped into three categories: patients included in medical databases, patients who came to the emergency room, and social-media users. The main objectives were symptom extraction, severity of illness classification, comparison of therapy effectiveness, psychopathological clues, and nosography challenging. Data from electronic medical records and data from social media were the two major data sources. With regard to the methods used, preprocessing used the standard methods of NLP and unique identifier extraction dedicated to medical texts. Efficient classifiers were preferred rather than "transparent" functioning classifiers. Python was the most frequently used platform. CONCLUSIONS ML and NLP models have been highly topical issues in medicine in recent years and may be considered a new paradigm in medical research. However, these processes tend to confirm clinical hypotheses rather than developing entirely new knowledge, and one major category of the population, social-media users, is obviously an imprecise cohort. In addition, some language-specific features can improve the performance of NLP methods, and their extension to other languages should be more closely investigated. However, ML and NLP techniques provide useful information from unexplored data (i.e., patients' daily habits that are usually inaccessible to care providers). This may be considered to be an additional tool at every step of mental health care: diagnosis, prognosis, treatment efficacy and monitoring. Therefore, ethical issues, like predicting psychiatric troubles or involvement in the physician-patient relationship, remain and should be discussed in a timely manner. ML and NLP methods may offer multiple perspectives in mental health research but should also be considered as tools to support clinical practice. CLINICALTRIAL Number CRD42019107376

Detection of social media platform insults using Natural language processing and comparative study of machine learning algorithms

2020 24th International Conference on System Theory, Control and Computing (ICSTCC) ◽

10.1109/icstcc50638.2020.9259730 ◽

2020 ◽

Author(s):

Sruthi Chiramel ◽

Doina Logofatu ◽

Gheorghe Goldenthal

Keyword(s):

Machine Learning ◽

Social Media ◽

Natural Language Processing ◽

Natural Language ◽

Comparative Study ◽

Language Processing ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Social Media Platform ◽

Media Platform

Cyberbullying Detection: Hybrid Models Based on Machine Learning and Natural Language Processing Techniques

Electronics ◽

10.3390/electronics10222810 ◽

2021 ◽

Vol 10 (22) ◽

pp. 2810

Author(s):

Chahat Raj ◽

Ayush Agarwal ◽

Gnana Bharathy ◽

Bhuva Narayan ◽

Mukesh Prasad

Keyword(s):

Neural Network ◽

Machine Learning ◽

Neural Networks ◽

Feature Extraction ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Machine Learning Algorithms ◽

Classification Methods ◽

Cyberbullying Detection

The rise in web and social media interactions has resulted in the effortless proliferation of offensive language and hate speech. Such online harassment, insults, and attacks are commonly termed cyberbullying. The sheer volume of user-generated content has made it challenging to identify such illicit content. Machine learning has wide applications in text classification, and researchers are shifting towards using deep neural networks in detecting cyberbullying due to the several advantages they have over traditional machine learning algorithms. This paper proposes a novel neural network framework with parameter optimization and an algorithmic comparative study of eleven classification methods: four traditional machine learning and seven shallow neural networks on two real-world cyberbullying datasets. In addition, this paper examines the effect of feature extraction and word-embedding-based natural language processing techniques on algorithmic performance. Key observations from this study show that bidirectional neural networks and attention models provide high classification results. Logistic Regression was observed to be the best among the traditional machine learning classifiers used. Term Frequency-Inverse Document Frequency (TF-IDF) demonstrates consistently high accuracies with traditional machine learning techniques. Global Vectors (GloVe) perform better with neural network models. Bi-GRU and Bi-LSTM worked best amongst the neural networks used. The extensive experiments performed on the two datasets establish the importance of this work by comparing eleven classification methods and seven feature extraction techniques. Our proposed shallow neural networks outperform existing state-of-the-art approaches for cyberbullying detection, with accuracy and F1-scores as high as ~95% and ~98%, respectively.
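To make the TF-IDF feature extraction mentioned above concrete, here is a minimal sketch of the plain, unsmoothed weighting: term frequency within a document multiplied by the logarithm of the inverse document frequency. The function name, whitespace tokenization, and example sentences are assumptions for illustration; the paper does not state which TF-IDF variant it used, and library implementations typically add smoothing and normalization.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append(
            {term: (count / total) * math.log(n_docs / df[term])
             for term, count in counts.items()}
        )
    return vectors


# Hypothetical short messages; not taken from the paper's datasets.
docs = ["you are great", "you are awful", "great game today"]
vectors = tfidf(docs)
for doc, vec in zip(docs, vectors):
    print(doc, {term: round(weight, 3) for term, weight in vec.items()})
```

Terms that occur in fewer documents receive larger weights, so "awful" outweighs "you" in the second message, and a term present in every document gets a weight of zero under this variant.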

Blockchain Leveraged Cyberbullying Preventing framework

10.21203/rs.3.rs-21075/v1 ◽

2020 ◽

Author(s):

Md Anawar Hossen Wadud ◽

Md Ashraf Uddin ◽

Shamima Parvez ◽

Mohammad Motiur Rahman ◽

Ammar Alazab ◽

...

Keyword(s):

Machine Learning ◽

Social Media ◽

Language Processing ◽

Processing Technique ◽

The Internet ◽

Machine Learning Method ◽

Learning Method ◽

Natural Language Processing Technique ◽

Being Bullied ◽

Cyberbullying Detection

Abstract The popularity of social media has exploded worldwide over the last few decades, and it has become the most preferred mode of social interaction. The internet also provides a new platform through which adolescents are being bullied. Appropriate means of cyberbullying detection are still partial and in some cases very limited. Moreover, research on cyberbullying focuses extensively on surveys and on its psychological impact on victims, whereas prevention has not been widely addressed. To bridge this gap, this paper aims to detect cyberbullying efficiently. It employs a standard machine learning method and a natural language processing technique as part of the detection process in a decentralized, Blockchain-leveraged architecture. We provide a fog-based architecture for cyberbullying detection that aims to relieve the server's load by placing the cyberbullying detection and prevention processes at the fog layer. The proposal might offer a probable solution to save users, particularly adolescents, from the severe consequences of cyberbullying.
