DISCERNMENT OF CYBERBULLYING APPROACH

Author(s):  
Shruti Aggarwal ◽  
Himanshu Sharma ◽  
Sanjana Mann ◽  
Mehul Gupta

People in the 21st century are being raised in an Internet-enabled world where social media has become an integral part of daily life, with communication just a click away. According to a recent survey, more than 3.96 billion individuals use social media worldwide, and the average user holds 8.7 accounts across various networking sites. Social media provides an opportunity to connect with people and share data in the form of posts, text, and images. Alongside these benefits, however, various individuals misuse it by spreading hatred towards a group, an individual, a topic, or an activity. As a result, cyberbullying has come into play, affecting the psychological state of its victims. Prevention is much needed, and many researchers have come together to establish technologies and programs for automatically detecting events of cyberbullying on social media and preventing them by analysing the patterns of posted comments or images. The purpose of this research is therefore to track and monitor such threats using supervised machine learning and data mining.

2021 ◽  
Vol 8 (10) ◽  
pp. 17-25
Author(s):  
Alghamdi et al. ◽  

Social media has become a major factor in people's lives, affecting their communication and psychological state. Its widespread use has given rise to new forms of violence, such as cyberbullying. Manual detection and reporting of violent texts in social media applications are challenging due to the increasing number of social media users and the huge amounts of generated data. Automatic detection of violent texts is language-dependent and requires an efficient detection approach that considers the unique features and structures of a specific language or dialect. Only a few studies have focused on the automatic detection and classification of violent texts in the Arabic language. This paper aims to build a two-level classifier model for classifying Arabic violent texts. The first level classifies text into violent and non-violent; the second level classifies violent text as either cyberbullying or threatening. The dataset used to build the classifier models was collected from Twitter using specific keywords and trending hashtags in Saudi Arabia. Supervised machine learning is used to build two classifier models with two different algorithms: Support Vector Machine (SVM) and Naive Bayes (NB). Both models are trained under different experimental settings, varying the feature extraction method and whether stop-word removal is applied. The performances of the proposed SVM-based and NB-based models have been compared. The SVM-based model outperforms the NB-based model, with F1 scores of 76.06% and 89.18% and accuracy scores of 73.35% and 87.79% for the first and second levels of classification, respectively.
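The two-level pipeline described above can be sketched in a few lines: one classifier gates text into violent vs. non-violent, and a second classifier runs only on the violent branch. This is an illustrative sketch with tiny synthetic English stand-in data and scikit-learn defaults, not the authors' Arabic dataset, features, or tuned settings.

```python
# Sketch of a two-level text classifier: level 1 separates violent from
# non-violent text; level 2 labels violent text as cyberbullying or threatening.
# Training data here is a tiny synthetic stand-in, purely for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

level1_texts = ["I will hurt you", "you are worthless",
                "nice weather today", "see you at lunch"]
level1_labels = ["violent", "violent", "non-violent", "non-violent"]

level2_texts = ["you are worthless and ugly", "I will hurt you tomorrow"]
level2_labels = ["cyberbullying", "threatening"]

level1 = make_pipeline(TfidfVectorizer(), LinearSVC())
level1.fit(level1_texts, level1_labels)

level2 = make_pipeline(TfidfVectorizer(), LinearSVC())
level2.fit(level2_texts, level2_labels)

def classify(text):
    """Route text through both levels: non-violent exits early."""
    if level1.predict([text])[0] == "non-violent":
        return "non-violent"
    return level2.predict([text])[0]
```

Cascading the classifiers this way means the second model only ever sees violent text, which mirrors the paper's design of training each level on its own label set.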


Author(s):  
V.T Priyanga ◽  
J.P Sanjanasri ◽  
Vijay Krishna Menon ◽  
E.A Gopalakrishnan ◽  
K.P Soman

The widespread use of social media like Facebook, Twitter, WhatsApp, etc. has changed the way news is created and published; accessing news has become easy and inexpensive. However, the scale of usage and the inability to moderate content have made social media a breeding ground for the circulation of fake news. Fake news is deliberately created either to increase readership or to disrupt order in society for political and commercial benefit. It is of paramount importance to identify and filter out fake news, especially in democratic societies. Most existing methods for detecting fake news involve traditional supervised machine learning, which has been quite ineffective. In this paper, we analyze word embedding features that can tell apart fake news from true news. We use the LIAR and ISOT datasets. We extract highly correlated news data from the entire dataset using cosine similarity and other such metrics, in order to distinguish their domains based on central topics. We then employ autoencoders to detect and differentiate between true and fake news while also exploring their separability through network analysis.
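The cosine-similarity filtering step mentioned above can be illustrated as follows. This is a minimal sketch: the documents, the TF-IDF representation, and the 0.5 threshold are illustrative assumptions, not the paper's actual data or parameters.

```python
# Sketch: group near-duplicate news items by cosine similarity over TF-IDF
# vectors, keeping one representative per cluster of correlated documents.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "president signs new climate bill",
    "president signs climate bill into law",
    "local team wins championship game",
]
tfidf = TfidfVectorizer().fit_transform(docs)
sim = cosine_similarity(tfidf)  # pairwise cosine similarity matrix

threshold = 0.5  # arbitrary illustrative cutoff
keep = []
for i in range(len(docs)):
    # Keep document i only if it is not too similar to anything already kept.
    if all(sim[i, j] < threshold for j in keep):
        keep.append(i)
```

Here the two "climate bill" items collapse to one representative while the unrelated sports item survives, which is the kind of topical grouping the paper uses to separate domains by central topic.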


2021 ◽  
Vol 35 (1) ◽  
pp. 11-21
Author(s):  
Himani Tyagi ◽  
Rajendra Kumar

IoT is characterized by communication between things (devices) that constantly share data, analyze it, and make decisions while connected to the internet. This interconnected architecture is attracting cyber criminals who expose IoT systems to failure. It therefore becomes imperative to develop a system that can accurately and automatically detect anomalies and attacks occurring in IoT networks. In this paper, an Intrusion Detection System (IDS) based on a novel feature set extracted from the BoT-IoT dataset is developed that can swiftly, accurately, and automatically differentiate benign and malicious traffic. Instead of using available feature reduction techniques like PCA, which can change the core meaning of variables, a unique feature set consisting of only seven lightweight features is developed that is also IoT-specific and attack-traffic independent. The results of the study demonstrate the effectiveness of the seven fabricated features in detecting four broad categories of attacks, namely DDoS, DoS, Reconnaissance, and Information Theft. Furthermore, this study proves the applicability and efficiency of supervised machine learning algorithms (KNN, LR, SVM, MLP, DT, RF) in IoT security. The performance of the proposed system is validated using performance metrics like accuracy, precision, recall, F-score, and ROC. Although the accuracies of the Decision Tree (99.9%) and Random Forest (99.9%) classifiers are the same, other metrics like training and testing time show that Random Forest is comparatively better.
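The Decision Tree vs. Random Forest comparison above can be sketched on synthetic data. This is not the BoT-IoT dataset or the authors' seven features; the seven-dimensional synthetic data, the classifiers' default settings, and the metrics collected are illustrative assumptions.

```python
# Sketch: compare Decision Tree and Random Forest on synthetic 7-feature data,
# recording both training accuracy and training time, as in the comparison above.
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Seven numeric features, echoing the paper's lightweight feature set.
X, y = make_classification(n_samples=500, n_features=7, random_state=0)

results = {}
for name, clf in [("DT", DecisionTreeClassifier(random_state=0)),
                  ("RF", RandomForestClassifier(random_state=0))]:
    start = time.perf_counter()
    clf.fit(X, y)
    results[name] = {
        "train_time": time.perf_counter() - start,
        "train_acc": clf.score(X, y),
    }
```

When accuracy ties, as in the study, secondary metrics such as fit and prediction time become the deciding factor between the two models.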


Author(s):  
Muskan Patidar

Abstract: Social networking platforms have given us more opportunities than ever before, and their benefits are undeniable. Despite these benefits, people may be humiliated, insulted, bullied, and harassed by anonymous users, strangers, or peers. Cyberbullying refers to the use of technology to humiliate and slander other people. It takes the form of hate messages sent through social media and emails. With the exponential increase in social media users, cyberbullying has emerged as a form of bullying through electronic messages. We propose a possible solution to this problem: our project aims to detect cyberbullying in tweets using ML classification algorithms like Naïve Bayes, KNN, Decision Tree, Random Forest, Support Vector Machine, etc. We also apply the NLTK (Natural Language Toolkit), using unigram, bigram, trigram, and general n-gram features with Naïve Bayes to check its accuracy. Finally, we compare the results of the proposed and baseline features with other machine learning algorithms. Findings of the comparison indicate the significance of the proposed features in cyberbullying detection. Keywords: Cyberbullying, Machine Learning Algorithms, Twitter, Natural Language Toolkit
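The n-gram experiment described above can be sketched as follows. This is an illustrative sketch only: it uses scikit-learn's vectorizer (rather than NLTK) for brevity, toy synthetic tweets, and evaluates on the training set, none of which reflect the project's actual data or evaluation.

```python
# Sketch: Naive Bayes over unigram and unigram+bigram features, in the spirit
# of the n-gram experiments described above. Data is a tiny synthetic stand-in.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

tweets = ["you are so stupid", "nobody likes you loser",
          "great game last night", "happy birthday friend"]
labels = [1, 1, 0, 0]  # 1 = bullying, 0 = not bullying

for ngram_range in [(1, 1), (1, 2)]:  # unigrams only, then unigrams+bigrams
    model = make_pipeline(CountVectorizer(ngram_range=ngram_range),
                          MultinomialNB())
    model.fit(tweets, labels)
    acc = model.score(tweets, labels)  # training accuracy on toy data
```

Varying `ngram_range` is the standard way to compare unigram, bigram, and higher-order n-gram features while holding the classifier fixed, which is the shape of the comparison the abstract describes.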


2019 ◽  
Vol 2 (1) ◽  
Author(s):  
Ari Z. Klein ◽  
Abeed Sarker ◽  
Davy Weissenbacher ◽  
Graciela Gonzalez-Hernandez

Abstract: Social media has recently been used to identify and study a small cohort of Twitter users whose pregnancies with birth defect outcomes—the leading cause of infant mortality—could be observed via their publicly available tweets. In this study, we exploit social media on a larger scale by developing natural language processing (NLP) methods to automatically detect, among thousands of users, a cohort of mothers reporting that their child has a birth defect. We used 22,999 annotated tweets to train and evaluate supervised machine learning algorithms—feature-engineered and deep learning-based classifiers—that automatically distinguish tweets referring to the user’s pregnancy outcome from tweets that merely mention birth defects. Because 90% of the tweets merely mention birth defects, we experimented with under-sampling and over-sampling approaches to address this class imbalance. An SVM classifier achieved the best performance for the two positive classes: an F1-score of 0.65 for the “defect” class and 0.51 for the “possible defect” class. We deployed the classifier on 20,457 unlabeled tweets that mention birth defects, which helped identify 542 additional users for potential inclusion in our cohort. Contributions of this study include (1) NLP methods for automatically detecting tweets by users reporting their birth defect outcomes, (2) findings that an SVM classifier can outperform a deep neural network-based classifier for highly imbalanced social media data, (3) evidence that automatic classification can be used to identify additional users for potential inclusion in our cohort, and (4) a publicly available corpus for training and evaluating supervised machine learning algorithms.
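The under-sampling strategy mentioned above for the 90/10 class imbalance can be sketched in plain Python. The 90/10 split mirrors the abstract, but the data items and the simple random down-sampling shown here are illustrative assumptions, not the study's exact procedure.

```python
# Sketch: random under-sampling of the majority class before training.
# 90% of tweets merely mention birth defects (label 0); 10% report an
# outcome (label 1), matching the imbalance described in the abstract.
import random

random.seed(0)  # for reproducibility of the sample
data = [("mention", 0)] * 90 + [("report", 1)] * 10

majority = [d for d in data if d[1] == 0]
minority = [d for d in data if d[1] == 1]

# Down-sample the majority class to the minority class size.
balanced = random.sample(majority, len(minority)) + minority
```

After balancing, a classifier trained on `balanced` no longer gains accuracy by always predicting the majority class; over-sampling, the alternative the study also tried, instead replicates minority examples up to the majority size.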


Sensors ◽  
2021 ◽  
Vol 21 (17) ◽  
pp. 5924
Author(s):  
Yi Ji Bae ◽  
Midan Shim ◽  
Won Hee Lee

Schizophrenia is a severe mental disorder that ranks among the leading causes of disability worldwide. However, many cases of schizophrenia remain untreated due to failure to diagnose, self-denial, and social stigma. With the advent of social media, individuals suffering from schizophrenia share their mental health problems and seek support and treatment options. Machine learning approaches are increasingly used for detecting schizophrenia from social media posts. This study aims to determine whether machine learning could be effectively used to detect signs of schizophrenia in social media users by analyzing their social media texts. To this end, we collected posts from the social media platform Reddit focusing on schizophrenia, along with non-mental-health-related posts (fitness, jokes, meditation, parenting, relationships, and teaching) for the control group. We extracted linguistic features and content topics from the posts. Using supervised machine learning, we classified posts belonging to schizophrenia and interpreted important features to identify linguistic markers of schizophrenia. We applied unsupervised clustering to the features to uncover a coherent semantic representation of words in schizophrenia. We identified significant differences in linguistic features and topics, including increased use of third-person plural pronouns, negative emotion words, and symptom-related topics. We distinguished schizophrenia posts from control posts with an accuracy of 96%. Finally, we found that coherent semantic groups of words were the key to detecting schizophrenia. Our findings suggest that machine learning approaches could help us understand the linguistic characteristics of schizophrenia and identify schizophrenia or otherwise at-risk individuals using social media texts.
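Linguistic-feature extraction of the kind described above (rates of third-person plural pronouns and negative-emotion words) can be sketched with simple word lists. The lists and the whitespace tokenizer here are illustrative stand-ins, not the study's lexicons or preprocessing.

```python
# Sketch: per-post rates of third-person plural pronouns and negative-emotion
# words, two of the linguistic markers discussed above. Word lists are
# illustrative stand-ins for the study's actual lexicons.
THIRD_PLURAL = {"they", "them", "their", "theirs", "themselves"}
NEGATIVE = {"afraid", "alone", "hate", "hurt", "sad", "worthless"}

def linguistic_features(post):
    """Return normalized counts of two marker categories for one post."""
    tokens = post.lower().split()
    n = max(len(tokens), 1)  # avoid division by zero on empty posts
    return {
        "third_plural_rate": sum(t in THIRD_PLURAL for t in tokens) / n,
        "negative_rate": sum(t in NEGATIVE for t in tokens) / n,
    }

feats = linguistic_features("they are watching me and I am afraid of them")
```

Feature vectors like these, stacked across posts, are what a supervised classifier of the kind the study uses would consume, and the per-category rates are directly interpretable as the linguistic markers the study reports.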


PLoS ONE ◽  
2021 ◽  
Vol 16 (6) ◽  
pp. e0252392
Author(s):  
Jiaojiao Ji ◽  
Naipeng Chao ◽  
Shitong Wei ◽  
George A. Barnett

The considerable amount of misinformation on social media regarding genetically modified (GM) food not only hinders public understanding but also misleads the public into making unreasoned decisions. This study discovered a new mechanism of misinformation diffusion in the case of GM food and applied a supervised machine learning framework to identify effective credibility indicators for predicting GM food misinformation. Main indicators are proposed, including the user identities involved in spreading information, linguistic styles, and propagation dynamics. Results show that linguistic styles, including sentiment and topics, have the dominant predictive power. In addition, among the user identity indicators, engagement and extroversion are effective predictors, while reputation has almost no predictive power in this study. Finally, we provide strategies that readers should be aware of when assessing the credibility of online posts and suggest improvements that Weibo can adopt to avoid rumormongering and enhance science communication about GM food.


Author(s):  
Edward Ombui ◽  
Lawrence Muchemi ◽  
Peter Wagacha

Presidential campaign periods are a major trigger for hate speech on social media in almost every country. A systematic review of previous studies indicates inadequate publicly available annotated datasets and hardly any evidence of theoretical underpinning for the annotation schemes used in hate speech identification. This situation stifles the development of empirically useful data for research, especially in supervised machine learning. This paper describes the methodology used to develop a multidimensional hate speech framework based on the components of the duplex theory of hate [1], which include distance, passion, commitment to hate, and hate as a story. Subsequently, an annotation scheme based on the framework was used to annotate a random sample of ~51k tweets from the ~400k tweets collected during the August and October 2017 presidential campaign period in Kenya. This resulted in a gold-standard code-switched dataset that could be used for comparative and empirical studies in supervised machine learning. Classifiers trained on this dataset could provide real-time monitoring of hate speech spikes on social media and inform data-driven decision-making by relevant government security agencies.


Author(s):  
K. Mahesh ◽  
Suwarna Gothane ◽  
Aashish Toshniwal ◽  
Vinay Nagarale ◽  
Harish Gopu

From the day the internet came into existence, the era of social networking began to sprout. In the beginning, no one may have imagined that the internet would host numerous remarkable services such as social networking. Today, online applications and social networking websites have become an inseparable part of one's life. Many people from diverse age groups spend hours daily on such websites. Although people are emotionally connected through social media, these facilities bring along serious threats such as cyber-attacks, which include cyberbullying.

