Utilising Flow Aggregation to Classify Benign Imitating Attacks

Cyber-attacks continue to grow, both in terms of volume and sophistication. This is aided by an increase in available computational power, expanding attack surfaces, and advancements in the human understanding of how to make attacks undetectable. Unsurprisingly, machine learning is utilised to defend against these attacks. In many applications, the choice of features is more important than the choice of model. A range of studies have, with varying degrees of success, attempted to discriminate between benign traffic and well-known cyber-attacks. The features used in these studies are broadly similar and have demonstrated their effectiveness in situations where cyber-attacks do not imitate benign behaviour. To overcome this barrier, in this manuscript, we introduce new features based on a higher level of abstraction of network traffic. Specifically, we perform flow aggregation by grouping flows with similarities. This additional level of feature abstraction benefits from cumulative information, thus qualifying the models to classify cyber-attacks that mimic benign traffic. The performance of the new features is evaluated using the benchmark CICIDS2017 dataset, and the results demonstrate their validity and effectiveness. This novel proposal will improve the detection accuracy of cyber-attacks and also build towards a new direction of feature extraction for complex ones.
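
The flow aggregation idea described above can be illustrated with a minimal sketch. The grouping key (source IP, destination IP, destination port), the record layout, and the summary statistics below are assumptions chosen for illustration; the abstract does not specify which similarities or aggregate features the authors use.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical flow records: (src_ip, dst_ip, dst_port, duration_s, bytes)
flows = [
    ("10.0.0.5", "192.168.1.10", 80, 0.9, 420),
    ("10.0.0.5", "192.168.1.10", 80, 1.1, 380),
    ("10.0.0.5", "192.168.1.10", 80, 1.0, 400),
    ("10.0.0.7", "192.168.1.10", 443, 12.0, 15000),
]


def aggregate_flows(records):
    """Group flows sharing (src_ip, dst_ip, dst_port) and summarise each group."""
    groups = defaultdict(list)
    for src, dst, port, duration, nbytes in records:
        groups[(src, dst, port)].append((duration, nbytes))

    features = {}
    for key, members in groups.items():
        durations = [d for d, _ in members]
        sizes = [b for _, b in members]
        # Cumulative, group-level features (illustrative choice only).
        features[key] = {
            "flow_count": len(members),
            "mean_duration": mean(durations),
            "total_bytes": sum(sizes),
        }
    return features


aggregated = aggregate_flows(flows)
```

A single short flow may look benign in isolation, whereas a group-level feature such as `flow_count` exposes repetition that per-flow features cannot capture.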

SCADA System Testbed for Cybersecurity Research Using Machine Learning Approach

Future Internet ◽

10.3390/fi10080076 ◽

2018 ◽

Vol 10 (8) ◽

pp. 76 ◽

Cited By ~ 12

Author(s):

Marcio Teixeira ◽

Tara Salman ◽

Maede Zolanvari ◽

Raj Jain ◽

Nader Meskin ◽

...

Keyword(s):

Machine Learning ◽

Supervisory Control ◽

Network Traffic ◽

Learning Algorithms ◽

Cyber Attacks ◽

Machine Learning Algorithms ◽

Learning Models ◽

Scada System ◽

Machine Learning Approach ◽

Machine Learning Models

This paper presents the development of a Supervisory Control and Data Acquisition (SCADA) system testbed used for cybersecurity research. The testbed consists of a water storage tank’s control system, which is a stage in the process of water treatment and distribution. Sophisticated cyber-attacks were conducted against the testbed. During the attacks, the network traffic was captured, and features were extracted from the traffic to build a dataset for training and testing different machine learning algorithms. Five traditional machine learning algorithms were trained to detect the attacks: Random Forest, Decision Tree, Logistic Regression, Naïve Bayes and KNN. Then, the trained machine learning models were built and deployed in the network, where new tests were made using online network traffic. The performance obtained during the training and testing of the machine learning models was compared to the performance obtained during the online deployment of these models in the network. The results show the efficiency of the machine learning models in detecting the attacks in real time. The testbed provides a good understanding of the effects and consequences of attacks on real SCADA environments.

Log Message Anomaly Detection with Oversampling

10.31224/osf.io/d4e6a ◽

2020 ◽

Author(s):

Amir Farzad ◽

T. Aaron Gulliver

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Anomaly Detection ◽

Learning Algorithms ◽

Imbalanced Data ◽

Machine Learning Algorithms ◽

Detection Accuracy ◽

Data Sets ◽

Significant Challenge ◽

Proposed Model

Imbalanced data is a significant challenge in classification with machine learning algorithms. This is particularly important with log message data: negative logs are sparse, so the data is typically imbalanced. In this paper, a model to generate text log messages is proposed which employs a SeqGAN network. An autoencoder is used for feature extraction, and anomaly detection is done using a GRU network. The proposed model is evaluated with three imbalanced log data sets, namely BGL, OpenStack, and Thunderbird. Results are presented which show that appropriate oversampling and data balancing improve anomaly detection accuracy.
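
As a rough illustration of data balancing, the sketch below uses plain random oversampling, which duplicates minority-class samples until every class has as many samples as the largest one. This is a much simpler, swapped-in technique, not the SeqGAN-based generation the abstract describes; the function name and the sample log lines are hypothetical.

```python
import random


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples until all classes are equally sized."""
    rng = random.Random(seed)
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)

    target = max(len(items) for items in by_label.values())
    out_samples, out_labels = [], []
    for label, items in by_label.items():
        # Draw extra copies (with replacement) for under-represented classes.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical log messages: four normal lines, one anomalous line.
logs = ["job started", "job finished", "heartbeat ok", "cache hit", "kernel panic"]
tags = ["normal", "normal", "normal", "normal", "anomaly"]
balanced_logs, balanced_tags = random_oversample(logs, tags)
```

Duplication adds no new information, which is one motivation for generating synthetic messages instead.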

Hybrid rule-based botnet detection approach using machine learning for analysing DNS traffic

PeerJ Computer Science ◽

10.7717/peerj-cs.640 ◽

2021 ◽

Vol 7 ◽

pp. e640

Author(s):

Saif Al-mashhadi ◽

Mohammed Anbar ◽

Iznan Hasbullah ◽

Taief Alaa Alamiedy

Keyword(s):

Machine Learning ◽

False Positive ◽

False Positive Rate ◽

Communication Protocols ◽

Cyber Attacks ◽

Machine Learning Algorithms ◽

Detection Accuracy ◽

Botnet Detection ◽

Internet Service ◽

Positive Rate

Botnets can simultaneously control millions of Internet-connected devices to launch damaging cyber-attacks that pose significant threats to the Internet. In a botnet, bot-masters communicate with the command and control server using various communication protocols. One of the widely used communication protocols is the ‘Domain Name System’ (DNS) service, an essential Internet service. Bot-masters utilise Domain Generation Algorithms (DGA) and fast-flux techniques to avoid static blacklists and reverse engineering while remaining flexible. However, a botnet’s DNS communication generates anomalous DNS traffic throughout the botnet life cycle, and such anomalies are considered an indicator of the presence of DNS-based botnets in the network. Although several approaches have been proposed to detect botnets based on DNS traffic analysis, the problem persists and remains challenging for several reasons, such as not considering significant features and rules that contribute to the detection of DNS-based botnets. Therefore, this paper examines the abnormality of DNS traffic during the botnet life cycle to extract significant enriched features. These features are further analysed using two machine learning algorithms. The union of the outputs of the two algorithms forms a novel hybrid rule-based detection model. Two benchmark datasets are used to evaluate the performance of the proposed approach in terms of detection accuracy and false-positive rate. The experimental results show that the proposed approach has a 99.96% accuracy and a 1.6% false-positive rate, outperforming other state-of-the-art DNS-based botnet detection approaches.
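
The step of taking the union of two detectors' outputs can be sketched as follows. The two rules, their thresholds, and the feature names (`unique_domains`, `nxdomain_ratio`) are hypothetical stand-ins; the abstract does not name the algorithms or the rules they produce.

```python
def rule_a(record):
    """Hypothetical rule: the host queried an unusually large number of distinct domains."""
    return record["unique_domains"] > 100


def rule_b(record):
    """Hypothetical rule: a large share of the host's queries returned NXDOMAIN."""
    return record["nxdomain_ratio"] > 0.5


def hybrid_detect(record):
    """Union of both rule sets: flag the host if either rule fires."""
    return rule_a(record) or rule_b(record)


# Hypothetical per-host DNS summaries.
hosts = {
    "host-1": {"unique_domains": 12, "nxdomain_ratio": 0.02},
    "host-2": {"unique_domains": 450, "nxdomain_ratio": 0.10},
    "host-3": {"unique_domains": 30, "nxdomain_ratio": 0.80},
}
flagged = sorted(name for name, rec in hosts.items() if hybrid_detect(rec))
```

A union can only add detections relative to either rule alone, so it tends to favour recall, and the false-positive rate has to be checked separately.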

A malicious URLs detection system using optimization and machine learning classifiers

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v17.i3.pp1210-1214 ◽

2020 ◽

Vol 17 (3) ◽

pp. 1210

Author(s):

Ong Vienna Lee ◽

Ahmad Heryanto ◽

Mohd Faizal Ab Razak ◽

Anis Farihan Mat Raffei ◽

Danakorn Nincarean Eh Phon ◽

...

Keyword(s):

Machine Learning ◽

Detection System ◽

Cyber Attacks ◽

Support Vector ◽

Detection Accuracy ◽

Learning Approach ◽

Internet Users ◽

Analysis Technique ◽

Machine Learning Approach ◽

High Detection

The openness of the World Wide Web (Web) has left it more exposed to cyber-attacks. Attackers carry out cyber-attacks on the Web using malicious Uniform Resource Locators (URLs), since URLs are widely used by internet users. Therefore, an effective approach is required to detect malicious URLs and identify the nature of their attacks. This study aims to assess the efficiency of the machine learning approach in detecting and identifying malicious URLs. In this study, we applied feature optimization using a bio-inspired algorithm to select significant URL features that are able to detect malicious URLs. A machine learning approach with a static analysis technique is used for detecting malicious URLs. Based on this combination and the significant features, this paper shows promising results with higher detection accuracy. The bio-inspired algorithm, particle swarm optimization (PSO), is used to optimize the URL features. In detecting malicious URLs, naïve Bayes and support vector machine (SVM) are able to achieve a high detection accuracy of 99%, using the URL as a feature.

SAAE-DNN: Deep Learning Method on Intrusion Detection

Symmetry ◽

10.3390/sym12101695 ◽

2020 ◽

Vol 12 (10) ◽

pp. 1695

Author(s):

Chaofei Tang ◽

Nurbol Luktarhan ◽

Yuxin Zhao

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Random Forest ◽

Intrusion Detection ◽

Decision Tree ◽

Binary Classification ◽

Attention Mechanism ◽

Detection Methods ◽

Detection Accuracy ◽

Multi Classification

Intrusion detection systems (IDS) play a significant role in preventing network attacks and a vital role in the field of national security. At present, existing intrusion detection methods are generally based on traditional machine learning models, such as random forest and decision tree, but they rely heavily on manual feature extraction and have relatively low accuracy. To solve the problems of feature extraction and low detection accuracy in intrusion detection, an intrusion detection model, SAAE-DNN, based on a stacked autoencoder (SAE), an attention mechanism and a deep neural network (DNN), is proposed. The SAE represents data with a latent layer, and the attention mechanism enables the network to obtain the key features for intrusion detection. The trained SAAE encoder can not only automatically extract features, but also initialize the weights of the DNN hidden layers to improve the detection accuracy of the DNN. We evaluate the performance of SAAE-DNN in binary classification and multi-classification on the NSL-KDD dataset. The SAAE-DNN model can detect normal and attack traffic symmetrically, with accuracies of 87.74% (binary classification) and 82.14% (multi-classification), which are higher than those of machine learning methods such as random forest and decision tree. The experimental results show that the model performs better than the other methods compared.

Infill Defective Detection System Augmented by Semi-Supervised Learning

Volume 2B: Advanced Manufacturing ◽

10.1115/imece2020-23249 ◽

2020 ◽

Author(s):

Jinwoo Song ◽

Young B. Moon

Keyword(s):

Feature Extraction ◽

Additive Manufacturing ◽

Supervised Learning ◽

Network Model ◽

Detection System ◽

Cyber Attacks ◽

Training Data ◽

Detection Accuracy ◽

Data Sets ◽

Detection Systems

In an effort to identify cyber-attacks on infill structures, detection systems based on supervised learning have been attempted in Additive Manufacturing (AM) security investigations. However, supervised learning requires a large number of training data sets to achieve acceptable detection accuracy. Moreover, since it is impossible to train for unprecedented defect types, such detection systems cannot guarantee robustness against unforeseen attacks. To overcome these disadvantages of supervised learning, this paper presents an infill defective detection system (IDDS) augmented by semi-supervised learning. Semi-supervised learning allows a large volume of unlabeled data to be classified after training on a comparably small number of labeled data sets. Additionally, IDDS exploits self-training to increase robustness against various defect types that are not pre-trained. IDDS consists of feature extraction, pre-training, and self-training. To validate the usefulness of IDDS, five defect types were designed and tested with IDDS, which was trained with only normal labeled data sets. The results are compared with the baseline accuracy of a perceptron network model with supervised learning.
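
Self-training in general can be sketched as a loop that pseudo-labels the unlabeled points a model is confident about and then retrains on the enlarged set. The example below is a generic illustration using a one-dimensional nearest-centroid classifier with two labeled classes; it is not the IDDS pipeline, and the function names, the `margin` confidence rule, and the data are all assumptions.

```python
from statistics import mean


def centroids(labeled):
    """Mean feature value per class label."""
    by_label = {}
    for x, y in labeled:
        by_label.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_label.items()}


def self_train(labeled, unlabeled, margin=1.0, max_rounds=10):
    """Repeatedly pseudo-label confident points and refit the centroids.

    A point is 'confident' when it is at least `margin` closer to its
    nearest centroid than to the second nearest (needs >= 2 classes).
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(labeled)
        confident, remaining = [], []
        for x in pool:
            ranked = sorted(cents.items(), key=lambda kv: abs(x - kv[1]))
            (best_label, best_c), (_, second_c) = ranked[0], ranked[1]
            if abs(x - second_c) - abs(x - best_c) >= margin:
                confident.append((x, best_label))
            else:
                remaining.append(x)
        if not confident:
            break  # nothing new to learn from
        labeled.extend(confident)
        pool = remaining
    return centroids(labeled), pool


seed_labels = [(1.0, "normal"), (9.0, "defect")]
final_centroids, still_unlabeled = self_train(seed_labels, [1.5, 2.0, 8.0, 8.5, 5.0])
```

The ambiguous point at 5.0 sits equally far from both centroids, so it is never pseudo-labeled, which shows how the confidence rule limits error propagation.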

Feature extraction and prediction of Dengue Outbreaks

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit206544 ◽

2020 ◽

pp. 216-222

Author(s):

Kunal Parikh ◽

Tanvi Makadia ◽

Harshil Patel

Keyword(s):

Public Health ◽

Machine Learning ◽

Developing Countries ◽

Feature Extraction ◽

Predictive Analytics ◽

Learning Algorithm ◽

Machine Learning Algorithm ◽

Health Concerns ◽

The World ◽

Dengue Outbreaks

Dengue is unquestionably one of the biggest health concerns in India and in many other developing countries. Unfortunately, many people have lost their lives because of it. Every year, approximately 390 million dengue infections occur around the world, among which 500,000 people are seriously infected and 25,000 die. Many factors can contribute to dengue, such as temperature, humidity, precipitation, inadequate public health, and many others. In this paper, we propose a method to perform predictive analytics on a dengue dataset using KNN, a machine-learning algorithm. This analysis would help in the prediction of future cases and could save many lives.
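
A minimal k-nearest-neighbours (KNN) classifier is sketched below to show how such a prediction works. The two features (temperature and humidity), the labels, and every data value are invented for illustration and are not taken from the paper's dataset.

```python
import math
from collections import Counter

# Hypothetical training rows: (temperature_C, humidity_pct) -> label
training = [
    ((31.0, 85.0), "outbreak"),
    ((30.0, 80.0), "outbreak"),
    ((29.0, 90.0), "outbreak"),
    ((22.0, 40.0), "no_outbreak"),
    ((20.0, 35.0), "no_outbreak"),
    ((24.0, 45.0), "no_outbreak"),
]


def knn_predict(query, rows, k=3):
    """Return the majority label among the k rows closest to `query`."""
    nearest = sorted(rows, key=lambda row: math.dist(query, row[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


prediction = knn_predict((30.0, 82.0), training)
```

Because KNN relies on raw distances, features measured on different scales usually need to be normalised first; that step is omitted here for brevity.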

Document Preprocessing with TF-IDF to Improve the Polarity Classification Performance of Unstructured Sentiment Analysis

Kinetik Game Technology Information System Computer Network Computing Electronics and Control ◽

10.22219/kinetik.v5i3.1066 ◽

2020 ◽

pp. 235-242

Author(s):

Farrikh Alzami ◽

Erika Devi Udayanti ◽

Dwi Puji Prabowo ◽

Rama Aria Megantara

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Random Forest ◽

Sentiment Analysis ◽

Classification Performance ◽

Document Preparation ◽

Learning Models ◽

Polarity Classification ◽

Negative Sentiment ◽

Machine Learning Models

Sentiment analysis in terms of polarity classification is very important in everyday life: with polarity, people can find out whether a given document has positive or negative sentiment, which helps them choose and make decisions. Sentiment analysis is usually done manually; therefore, an automatic sentiment classification process is needed. However, studies that discuss which feature extraction methods and learning models suit unstructured sentiment analysis for the Amazon food review case are rare. This research explores feature extraction methods such as Bag of Words, TF-IDF, Word2Vec, and a combination of TF-IDF and Word2Vec, together with several machine learning models such as Random Forest, SVM, KNN and Naïve Bayes, to find a combination of feature extraction and learning model that can help add variety to the analysis of sentiment polarity. With document preprocessing, such as removing HTML tags, punctuation, and special characters and applying Snowball stemming, TF-IDF combined with SVM is found suitable for polarity classification in unstructured sentiment analysis for the Amazon food review case, with a performance of 87.3 percent.
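
For readers unfamiliar with TF-IDF, the sketch below computes the plain form, term frequency multiplied by log(N / document frequency), on a toy corpus. The corpus is invented, whitespace tokenisation stands in for the preprocessing described above, and libraries often use smoothed or normalised variants, so the exact weighting used in the paper may differ.

```python
import math
from collections import Counter

# Hypothetical, already-cleaned review snippets.
docs = [
    "great taste great price",
    "bad taste",
    "great coffee",
]


def tf_idf(corpus):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.split() for doc in corpus]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


vectors = tf_idf(docs)
```

Terms that occur in a single document, such as "bad", receive a higher weight than terms shared across documents, such as "taste", which is what makes the representation useful as classifier input.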

OutlierNets: Highly Compact Deep Autoencoder Network Architectures for On-Device Acoustic Anomaly Detection

Sensors ◽

10.3390/s21144805 ◽

2021 ◽

Vol 21 (14) ◽

pp. 4805

Author(s):

Saad Abbasi ◽

Mahmoud Famouri ◽

Mohammad Javad Shafiee ◽

Alexander Wong

Keyword(s):

Machine Learning ◽

Anomaly Detection ◽

Detection Methods ◽

Detection Accuracy ◽

Network Architectures ◽

Design Exploration ◽

Convolutional Autoencoder ◽

Acoustic Anomaly ◽

Human Operators ◽

Computational Resources

Human operators often diagnose industrial machinery via anomalous sounds. Given the new advances in the field of machine learning, automated acoustic anomaly detection can lead to reliable maintenance of machinery. However, deep learning-driven anomaly detection methods often require an extensive amount of computational resources, prohibiting their deployment in factories. Here we explore a machine-driven design exploration strategy to create OutlierNets, a family of highly compact deep convolutional autoencoder network architectures featuring as few as 686 parameters, model sizes as small as 2.7 KB, and as low as 2.8 million FLOPs, with a detection accuracy matching or exceeding published architectures with as many as 4 million parameters. The architectures are deployed on an Intel Core i5 as well as an ARM Cortex A72 to assess performance on hardware that is likely to be used in industry. Experimental results on the model’s latency show that the OutlierNet architectures can achieve as much as 30x lower latency than published networks.
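
The reported parameter count and model size are consistent with each other under a simple assumption, worked through below. The assumption is that each parameter is stored as a 32-bit float and that the file has no other overhead; the abstract does not state the storage format.

```python
BYTES_PER_PARAM = 4  # assumption: 32-bit floating-point weights


def model_size_kib(num_params, bytes_per_param=BYTES_PER_PARAM):
    """Approximate weight storage in KiB, ignoring any file-format overhead."""
    return num_params * bytes_per_param / 1024


small = model_size_kib(686)        # smallest OutlierNet parameter count from the abstract
large = model_size_kib(4_000_000)  # largest published architecture mentioned in the abstract
```

Under this assumption, 686 parameters occupy 2,744 bytes, which rounds to the stated 2.7 KB, while 4 million parameters would need roughly 15 MiB.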

Deep Transfer Learning Based Intrusion Detection System for Electric Vehicular Networks

Sensors ◽

10.3390/s21144736 ◽

2021 ◽

Vol 21 (14) ◽

pp. 4736

Author(s):

Sk. Tanzir Mehedi ◽

Adnan Anwar ◽

Ziaur Rahman ◽

Kawsar Ahmed

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Intrusion Detection ◽

Real Time ◽

Transfer Learning ◽

Security Requirements ◽

Detection Accuracy ◽

Area Network ◽

Complex Data ◽

Network Intrusion

The Controller Area Network (CAN) bus is an important protocol in real-time In-Vehicle Network (IVN) systems because of its simple, suitable, and robust architecture. IVN devices nevertheless remain insecure and vulnerable because of complex data-intensive architectures, which greatly increase accessibility to unauthorized networks and the possibility of various types of cyberattacks. Therefore, the detection of cyberattacks in IVN devices has attracted growing interest. With the rapid development of IVNs and evolving threat types, traditional machine learning-based IDSs must be updated to cope with the security requirements of the current environment. The progress of deep learning and deep transfer learning, and their impactful outcomes in several areas, point to them as an effective solution for network intrusion detection. This manuscript proposes a deep transfer learning-based IDS model for IVN with improved performance in comparison to several existing models. The unique contributions include effective attribute selection best suited to identifying malicious CAN messages and accurately detecting normal and abnormal activities, the design of a deep transfer learning-based LeNet model, and an evaluation on real-world data. To this end, an extensive experimental performance evaluation has been conducted. The architecture, along with empirical analyses, shows that the proposed IDS greatly improves detection accuracy over mainstream machine learning, deep learning, and benchmark deep transfer learning models and demonstrates better performance for real-time IVN security.
