scholarly journals Detecting Potential Insider Threat: Analyzing Insiders’ Sentiment Exposed in Social Media

2018 ◽  
Vol 2018 ◽  
pp. 1-8 ◽  
Author(s):  
Won Park ◽  
Youngin You ◽  
Kyungho Lee

In the era of Internet of Things (IoT), impact of social media is increasing gradually. With the huge progress in the IoT device, insider threat is becoming much more dangerous. Trying to find what kind of people are in high risk for the organization, about one million of tweets were analyzed by sentiment analysis methodology. Dataset made by the web service “Sentiment140” was used to find possible malicious insider. Based on the analysis of the sentiment level, users with negative sentiments were classified by the criteria and then selected as possible malicious insiders according to the threat level. Machine learning algorithms in the open-sourced machine learning software “Weka (Waikato Environment for Knowledge Analysis)” were used to find the possible malicious insider. Decision Tree had the highest accuracy among supervised learning algorithms and K-Means had the highest accuracy among unsupervised learning. In addition, we extract the frequently used words from the topic modeling technique and then verified the analysis results by matching them to the information security compliance elements. These findings can contribute to achieve higher detection accuracy by combining individual’s characteristics to the previous studies such as analyzing system behavior.

2018 ◽  
Vol 7 (2.8) ◽  
pp. 684 ◽  
Author(s):  
V V. Ramalingam ◽  
Ayantan Dandapath ◽  
M Karthik Raja

Heart related diseases or Cardiovascular Diseases (CVDs) are the main reason for a huge number of death in the world over the last few decades and has emerged as the most life-threatening disease, not only in India but in the whole world. So, there is a need of reliable, accurate and feasible system to diagnose such diseases in time for proper treatment. Machine Learning algorithms and techniques have been applied to various medical datasets to automate the analysis of large and complex data. Many researchers, in recent times, have been using several machine learning techniques to help the health care industry and the professionals in the diagnosis of heart related diseases. This paper presents a survey of various models based on such algorithms and techniques andanalyze their performance. Models based on supervised learning algorithms such as Support Vector Machines (SVM), K-Nearest Neighbour (KNN), NaïveBayes, Decision Trees (DT), Random Forest (RF) and ensemble models are found very popular among the researchers.


Author(s):  
Muskan Patidar

Abstract: Social networking platforms have given us incalculable opportunities than ever before, and its benefits are undeniable. Despite benefits, people may be humiliated, insulted, bullied, and harassed by anonymous users, strangers, or peers. Cyberbullying refers to the use of technology to humiliate and slander other people. It takes form of hate messages sent through social media and emails. With the exponential increase of social media users, cyberbullying has been emerged as a form of bullying through electronic messages. We have tried to propose a possible solution for the above problem, our project aims to detect cyberbullying in tweets using ML Classification algorithms like Naïve Bayes, KNN, Decision Tree, Random Forest, Support Vector etc. and also we will apply the NLTK (Natural language toolkit) which consist of bigram, trigram, n-gram and unigram on Naïve Bayes to check its accuracy. Finally, we will compare the results of proposed and baseline features with other machine learning algorithms. Findings of the comparison indicate the significance of the proposed features in cyberbullying detection. Keywords: Cyber bullying, Machine Learning Algorithms, Twitter, Natural Language Toolkit


2019 ◽  
Vol 2 (1) ◽  
Author(s):  
Ari Z. Klein ◽  
Abeed Sarker ◽  
Davy Weissenbacher ◽  
Graciela Gonzalez-Hernandez

Abstract Social media has recently been used to identify and study a small cohort of Twitter users whose pregnancies with birth defect outcomes—the leading cause of infant mortality—could be observed via their publicly available tweets. In this study, we exploit social media on a larger scale by developing natural language processing (NLP) methods to automatically detect, among thousands of users, a cohort of mothers reporting that their child has a birth defect. We used 22,999 annotated tweets to train and evaluate supervised machine learning algorithms—feature-engineered and deep learning-based classifiers—that automatically distinguish tweets referring to the user’s pregnancy outcome from tweets that merely mention birth defects. Because 90% of the tweets merely mention birth defects, we experimented with under-sampling and over-sampling approaches to address this class imbalance. An SVM classifier achieved the best performance for the two positive classes: an F1-score of 0.65 for the “defect” class and 0.51 for the “possible defect” class. We deployed the classifier on 20,457 unlabeled tweets that mention birth defects, which helped identify 542 additional users for potential inclusion in our cohort. Contributions of this study include (1) NLP methods for automatically detecting tweets by users reporting their birth defect outcomes, (2) findings that an SVM classifier can outperform a deep neural network-based classifier for highly imbalanced social media data, (3) evidence that automatic classification can be used to identify additional users for potential inclusion in our cohort, and (4) a publicly available corpus for training and evaluating supervised machine learning algorithms.


2021 ◽  
pp. 68-80
Author(s):  
Muhammad Umer Hashmi ◽  
Ngoc Duy Nguyen ◽  
Michael Johnstone ◽  
Kathryn Backholer ◽  
Asim Bhatti

The internet has become an irreplaceable communicating and informative tool in the current world. With the ever-growing importance and massive use of the internet today, there has been interesting from researchers to find the perfect Cyber Attack Detection Systems (CADSs) or rather referred to as Intrusion Detection Systems (IDSs) to protect against the vulnerabilities of network security. CADS presently exist in various variants but can be largely categorized into two broad classifications; signature-based detection and anomaly detection CADSs, based on their approaches to recognize attack packets.The signature-based CADS use the well-known signatures or fingerprints of the attack packets to signal the entry across the gateways of secured networks. Signature-based CADS can only recognize threats that use the known signature, new attacks with unknown signatures can, therefore, strike without notice. Alternatively, anomaly-based CADS are enabled to detect any abnormal traffic within the network and report. There are so many ways of identifying anomalies and different machine learning algorithms are introduced to counter such threats. Most systems, however, fall short of complete attack prevention in the real world due system administration and configuration, system complexity and abuse of authorized access. Several scholars and researchers have achieved a significant milestone in the development of CADS owing to the importance of computer and network security. This paper reviews the current trends of CADS analyzing the efficiency or level of detection accuracy of the machine learning algorithms for cyber-attack detection with an aim to point out to the best. CADS is a developing research area that continues to attract several researchers due to its critical objective.


Author(s):  
Jiarui Yin ◽  
Inikuro Afa Michael ◽  
Iduabo John Afa

Machine learning plays a key role in present day crime detection, analysis and prediction. The goal of this work is to propose methods for predicting crimes classified into different categories of severity. We implemented visualization and analysis of crime data statistics in recent years in the city of Boston. We then carried out a comparative study between two supervised learning algorithms, which are decision tree and random forest based on the accuracy and processing time of the models to make predictions using geographical and temporal information provided by splitting the data into training and test sets. The result shows that random forest as expected gives a better result by 1.54% more accuracy in comparison to decision tree, although this comes at a cost of at least 4.37 times the time consumed in processing. The study opens doors to application of similar supervised methods in crime data analytics and other fields of data science


Sign in / Sign up

Export Citation Format

Share Document