Detection of Cyberbullying on Social Media Using Machine learning

Mitta Roja

doi:10.22214/ijraset.2021.38635

Detection of Cyberbullying on Social Media Using Machine learning

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.38635 ◽

2021 ◽

Vol 9 (10) ◽

pp. 1401-1409

Author(s):

Mitta Roja

Keyword(s):

Machine Learning ◽

Social Media ◽

Feature Extraction ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Hate Speech ◽

Text Data ◽

Model Based

Abstract: Cyberbullying is a major problem on the internet that affects teenagers as well as adults. It has led to tragic outcomes such as suicide and depression. Regulating content on social media platforms has become a growing need. This study uses data from two different forms of cyberbullying, hate speech tweets from Twitter and comments containing personal attacks from Wikipedia forums, to build a model for detecting cyberbullying in text data using natural language processing and machine learning. Three methods for feature extraction and four classifiers are studied to identify the best approach. For the tweet data the model achieves accuracies above 90%, and for the Wikipedia data it achieves accuracies above 80%. Keywords: Cyberbullying, Hate speech, Personal attacks, Machine learning, Feature extraction, Twitter, Wikipedia


Social Media Content Categorization Using Supervised Based Machine Learning Methods and Natural Language Processing in Bangla Language

2020 11th International Conference on Electrical and Computer Engineering (ICECE) ◽

10.1109/icece51571.2020.9393095 ◽

2020 ◽

Author(s):

Md. Rejaul Alam ◽

Afsana Akter ◽

Minhajul Abedin Shafin ◽

Md. Mehedi Hasan ◽

Antara Mahmud

Keyword(s):

Machine Learning ◽

Social Media ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Media Content ◽

Learning Methods ◽

Machine Learning Methods


Applying natural language processing and machine learning techniques to patient experience feedback: a systematic review

BMJ Health & Care Informatics ◽

10.1136/bmjhci-2020-100262 ◽

2021 ◽

Vol 28 (1) ◽

pp. e100262

Author(s):

Mustafa Khanbhai ◽

Patrick Anyadi ◽

Joshua Symons ◽

Kelsey Flott ◽

Ara Darzi ◽

...

Keyword(s):

Machine Learning ◽

Systematic Review ◽

Social Media ◽

Natural Language Processing ◽

Natural Language ◽

Patient Experience ◽

Language Processing ◽

Performance Metrics ◽

Free Text ◽

Patient Feedback

Objectives: Unstructured free-text patient feedback contains rich information, and analysing these data manually would require a lot of personnel resources which are not available in most healthcare organisations. To undertake a systematic review of the literature on the use of natural language processing (NLP) and machine learning (ML) to process and analyse free-text patient experience data. Methods: Databases were systematically searched to identify articles published between January 2000 and December 2019 examining NLP to analyse free-text patient feedback. Due to the heterogeneous nature of the studies, a narrative synthesis was deemed most appropriate. Data related to the study purpose, corpus, methodology, performance metrics and indicators of quality were recorded. Results: Nineteen articles were included. The majority (80%) of studies applied language analysis techniques on patient feedback from social media sites (unsolicited) followed by structured surveys (solicited). Supervised learning was frequently used (n=9), followed by unsupervised (n=6) and semisupervised (n=3). Comments extracted from social media were analysed using an unsupervised approach, and free-text comments held within structured surveys were analysed using a supervised approach. Reported performance metrics included the precision, recall and F-measure, with support vector machine and Naïve Bayes being the best performing ML classifiers. Conclusion: NLP and ML have emerged as an important tool for processing unstructured free text. Both supervised and unsupervised approaches have their role depending on the data source. With the advancement of data analysis tools, these techniques may be useful to healthcare organisations to generate insight from the volumes of unstructured free-text data.


Triage and diagnosis of COVID-19 from medical social media (Preprint)

10.2196/preprints.30397 ◽

2021 ◽

Author(s):

Abul Hasan ◽

Mark Levene ◽

David Weston ◽

Renate Fromson ◽

Nicolas Koslover ◽

...

Keyword(s):

Machine Learning ◽

Social Media ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Learning Models ◽

Rule Based ◽

Additional Information ◽

Processing Pipeline ◽

Machine Learning Models

BACKGROUND The COVID-19 pandemic has created a pressing need for integrating information from disparate sources in order to assist decision makers. Social media is important in this respect; however, to make sense of the textual information it provides and to automate the processing of large amounts of data, natural language processing methods are needed. Social media posts are often noisy, yet they may provide valuable insights regarding the severity and prevalence of the disease in the population. In particular, machine learning techniques for triage and diagnosis could allow for a better understanding of what social media may offer in this respect. OBJECTIVE This study aims to develop an end-to-end natural language processing pipeline for triage and diagnosis of COVID-19 from patient-authored social media posts, in order to provide researchers and other interested parties with additional information on the symptoms, severity and prevalence of the disease. METHODS The text processing pipeline first extracts COVID-19 symptoms and related concepts such as severity, duration, negations, and body parts from patients’ posts using conditional random fields. An unsupervised rule-based algorithm is then applied to establish relations between concepts in the next step of the pipeline. The extracted concepts and relations are subsequently used to construct two different vector representations of each post. These vectors are applied separately to build support vector machine learning models to triage patients into three categories and diagnose them for COVID-19. RESULTS We report macro- and micro-averaged F1 scores in the range of 71-96% and 61-87%, respectively, for the triage and diagnosis of COVID-19 when the models are trained on human-labelled data. Our experimental results indicate that similar performance can be achieved when the models are trained using predicted labels from concept extraction and rule-based classifiers, thus yielding end-to-end machine learning. Also, we highlight important features uncovered by our diagnostic machine learning models and compare them with the most frequent symptoms revealed in another COVID-19 dataset. In particular, we found that the most important features are not always the most frequent ones. CONCLUSIONS Our preliminary results show that it is possible to automatically triage and diagnose patients for COVID-19 from natural language narratives using a machine learning pipeline, in order to provide additional information on the severity and prevalence of the disease through the eyes of social media.
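
The abstract above reports both macro- and micro-averaged F1 scores. As a minimal sketch of how the two averages differ, the following computes both for single-label, multi-class predictions. The function name `f1_scores` and the three labels `low`, `medium` and `high` are hypothetical placeholders; the abstract does not name its triage categories, and this is not the authors' evaluation code.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: average the per-class F1 scores, so every class counts equally.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts over all classes, so every example counts equally.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Hypothetical, imbalanced toy labels (not data from the study).
y_true = ["low"] * 6 + ["medium"] * 3 + ["high"]
y_pred = ["low"] * 6 + ["medium"] * 2 + ["low", "medium"]
macro, micro = f1_scores(y_true, y_pred)
print(f"macro-F1 = {macro:.3f}, micro-F1 = {micro:.3f}")
```

On this toy input the single `high` example is misclassified, which pulls the macro average well below the micro average. This is why the two figures can diverge on imbalanced data.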


Sentiment Analysis on Twitter Airline Data

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.35807 ◽

2021 ◽

Vol 9 (VI) ◽

pp. 3767-3770

Author(s):

Kirti Jain

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Natural Language ◽

Sentiment Analysis ◽

Language Processing ◽

Learning Task ◽

Model Based ◽

Sentiment Mining ◽

General Opinion

Sentiment analysis, also known as sentiment mining, is a machine learning subtask in which we want to determine the overall sentiment of a particular document. With machine learning and natural language processing (NLP), we can extract information from a text and try to classify it as positive, neutral, or negative according to its polarity. In this project, we classify Twitter tweets into positive, negative, and neutral sentiments by building a model based on probabilities. Twitter is a blogging website where people can quickly and spontaneously share their feelings by sending tweets limited to 140 characters. Because of this pattern of use, Twitter is a perfect source of data for gauging the latest general opinion on anything.
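
The abstract above describes "a model based on probabilities" without naming it. One common probabilistic choice for three-way sentiment classification is multinomial Naive Bayes; the sketch below assumes that choice purely for illustration. The class name `NaiveBayes`, the whitespace tokenizer, the smoothing value and the toy tweets are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Start from the log prior, then add one log likelihood per word.
            logp = math.log(n_docs / total_docs)
            denom = (sum(self.word_counts[label].values())
                     + self.alpha * len(self.vocab))
            for word in doc.lower().split():
                if word in self.vocab:  # words never seen in training are skipped
                    count = self.word_counts[label][word]
                    logp += math.log((count + self.alpha) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Hypothetical toy tweets, two per class.
train_docs = [
    "great flight friendly crew",
    "love the service",
    "flight delayed terrible service",
    "lost my bag awful",
    "flight departs at noon",
    "what is the baggage allowance",
]
train_labels = ["positive", "positive", "negative", "negative",
                "neutral", "neutral"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("terrible delayed flight"))
print(model.predict("love the friendly crew"))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together, and the smoothing term keeps a word that is unseen for one class from driving that class's probability to zero.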


A Review of Natural Language Processing and Machine Learning Tools Used to Analyze Arabic Social Media

2019 IEEE Jordan International Joint Conference on Electrical Engineering and Information Technology (JEEIT) ◽

10.1109/jeeit.2019.8717369 ◽

2019 ◽

Cited By ~ 8

Author(s):

Tarek Kanan ◽

Odai Sadaqa ◽

Amal Aldajeh ◽

Hanadi Alshwabka ◽

Wassan AL-dolime ◽

...

Keyword(s):

Machine Learning ◽

Social Media ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Learning Tools


Concept of TF-IDF, Common Bag of Word and Word Embedding for Effective Sentiment Classification

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.f4582.049620 ◽

2020 ◽

Vol 9 (4) ◽

pp. 2198-2201

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Sentiment Classification ◽

Word Embedding ◽

Text Representation ◽

Human Beings ◽

Text Data

Sentiment classification is one of the best-known and most popular domains of machine learning and natural language processing. An algorithm is developed to understand the opinion of an entity in a way similar to human beings. This research article presents a similar approach. Concepts from natural language processing are considered for text representation. Later, a novel word embedding model is proposed for effective classification of the data. TF-IDF and Common BoW representation models were considered for the representation of text data. The importance of these models is discussed in the respective sections. The proposed model is tested using IMDB datasets. 50% training and 50% testing splits with three random shufflings of the datasets are used for evaluation of the model.
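
The abstract above names bag-of-words and TF-IDF as text representations. The following is a minimal sketch of both, assuming whitespace tokenization, term frequency defined as count divided by document length, and inverse document frequency defined as log(N / df). Many TF-IDF variants exist, and the abstract does not say which one the authors used. The function names and the three example sentences are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Return (vocab, vectors) with one raw term-count vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Return (vocab, vectors) where each entry is tf * idf."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: the number of documents containing each word.
    df = [sum(1 for vec in counts if vec[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    weighted = []
    for vec in counts:
        total = sum(vec) or 1  # guard against empty documents
        weighted.append([(c / total) * idf[j] for j, c in enumerate(vec)])
    return vocab, weighted


# Hypothetical review snippets (not drawn from the IMDB dataset).
docs = ["the movie was great", "the movie was dull", "the plot was great"]
vocab, counts = bag_of_words(docs)
_, weights = tf_idf(docs)
for word, weight in zip(vocab, weights[1]):
    print(f"{word:>6}: {weight:.3f}")
```

Under this variant, words such as "the" and "was" that occur in every document receive a weight of zero, while "dull", which occurs in only one document, receives the largest weight in the second document. Raw bag-of-words counts treat all of these words equally.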


Using social media, machine learning and natural language processing to map multiple recreational beneficiaries

Ecosystem Services ◽

10.1016/j.ecoser.2019.100958 ◽

2019 ◽

Vol 38 ◽

pp. 100958 ◽

Cited By ~ 13

Author(s):

Arjan S. Gosal ◽

Ilse R. Geijzendorffer ◽

Tomáš Václavík ◽

Brigitte Poulin ◽

Guy Ziv

Keyword(s):

Machine Learning ◽

Social Media ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing


Content Analysis of Extracted Suicide Texts From Social Media Networks by Using Natural Language Processing and Machine Learning Techniques

2021 IEEE International Conference on Smart Information Systems and Technologies (SIST) ◽

10.1109/sist50301.2021.9466001 ◽

2021 ◽

Author(s):

Bakhtiyor Meraliyev ◽

Kurmangazy Kongratbayev ◽

Nazerke Sultanova

Keyword(s):

Machine Learning ◽

Social Media ◽

Content Analysis ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Machine Learning Techniques ◽

Social Media Networks ◽

Learning Techniques


Natural Language Processing Model for Automatic Analysis of Cybersecurity-Related Documents

Symmetry ◽

10.3390/sym12030354 ◽

2020 ◽

Vol 12 (3) ◽

pp. 354

Author(s):

Tiberiu-Marian Georgescu

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Web Application ◽

Named Entity Recognition ◽

Relation Extraction ◽

Entity Recognition ◽

Model Based ◽

The Way

This paper describes the development and implementation of a natural language processing model based on machine learning which performs cognitive analysis for cybersecurity-related documents. A domain ontology was developed using a two-step approach: (1) the symmetry stage and (2) the machine adjustment. The first stage is based on the symmetry between the way humans represent a domain and the way machine learning solutions do. Therefore, the cybersecurity field was initially modeled based on the expertise of cybersecurity professionals. A dictionary of relevant entities was created; the entities were classified into 29 categories and later implemented as classes in a natural language processing model based on machine learning. After running successive performance tests, the ontology was remodeled from 29 to 18 classes. Using the ontology, a natural language processing model based on a supervised learning model was defined. We trained the model using sets of approximately 300,000 words. Remarkably, our model obtained an F1 score of 0.81 for named entity recognition and 0.58 for relation extraction, showing superior results compared to other similar models identified in the literature. Furthermore, in order to be easily used and tested, a web application that integrates our model as the core component was developed.


Monitoring COVID-19 pandemic through the lens of social media using natural language processing and machine learning

Health Information Science and Systems ◽

10.1007/s13755-021-00158-4 ◽

2021 ◽

Vol 9 (1) ◽

Author(s):

Yang Liu ◽

Christopher Whitfield ◽

Tianyang Zhang ◽

Amanda Hauser ◽

Taeyonn Reynolds ◽

...

Keyword(s):

Machine Learning ◽

Social Media ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing
