Exploiting the Outcome of Outlier Detection for Novel Attack Pattern Recognition on Streaming Data

Michael Heigl; Enrico Weigelt; Andreas Urmann; Dalibor Fiala; Martin Schramm

doi:10.3390/electronics10172160

Exploiting the Outcome of Outlier Detection for Novel Attack Pattern Recognition on Streaming Data

Electronics ◽

10.3390/electronics10172160 ◽

2021 ◽

Vol 10 (17) ◽

pp. 2160

Author(s):

Michael Heigl ◽

Enrico Weigelt ◽

Andreas Urmann ◽

Dalibor Fiala ◽

Martin Schramm

Keyword(s):

Pattern Recognition ◽

Network Security ◽

Outlier Detection ◽

Concept Drift ◽

Streaming Data ◽

Alert Correlation ◽

Attack Pattern ◽

Correlation Methods ◽

Attack Patterns ◽

The One

Future-oriented networking infrastructures are characterized by highly dynamic Streaming Data (SD) whose volume, speed and number of dimensions have increased significantly over the past couple of years, driven by trends such as Software-Defined Networking or Artificial Intelligence. As an essential core component of network security, Intrusion Detection Systems (IDS) help to uncover malicious activity. In particular, subsequently applied alert correlation methods can aid in mining attack patterns based on the alerts generated by IDS. However, most of the existing methods lack the functionality to deal with SD affected by the phenomenon called concept drift and are mainly designed to operate on the output from signature-based IDS. Although unsupervised Outlier Detection (OD) methods have the ability to detect yet unknown attacks, most of the alert correlation methods cannot handle the outcome of such anomaly-based IDS. In this paper, we introduce a novel framework called Streaming Outlier Analysis and Attack Pattern Recognition, denoted as SOAAPR, which is able to process the output of various online unsupervised OD methods in a streaming fashion to extract information about novel attack patterns. SOAAPR computes three different privacy-preserving, fingerprint-like signatures from the clustered set of correlated alerts; these characterize and represent the potential attack scenarios with respect to their communication relations, their manifestation in the data's features and their temporal behavior. Beyond the recognition of known attacks by comparing derived signatures, they can be leveraged to find similarities between yet unknown and novel attack patterns. The evaluation, which is split into two parts, takes advantage of attack scenarios from the widely used CICIDS2017 and CSE-CIC-IDS2018 datasets. Firstly, the streaming alert correlation capability is evaluated on CICIDS2017 and compared to a state-of-the-art offline algorithm, called Graph-based Alert Correlation (GAC), which has the potential to deal with the outcome of anomaly-based IDS. Secondly, the three types of signatures are computed from attack scenarios in the datasets and compared to each other. The discussion of results shows, on the one hand, that SOAAPR can compete with GAC in terms of alert correlation capability across four different metrics and outperforms it significantly in terms of processing time, by an average factor of 70 in 11 attack scenarios. On the other hand, in most cases, all three types of signatures seem to reliably characterize attack scenarios such that similar ones are grouped together, with up to 99.05% similarity between the FTP and SSH Patator attacks.

Keywords: intrusion detection; alert analysis; alert correlation; outlier detection; attack scenario; streaming data; network security

Designing a Streaming Algorithm for Outlier Detection in Data Mining—An Incremental Approach

Sensors ◽

10.3390/s20051261 ◽

2020 ◽

Vol 20 (5) ◽

pp. 1261 ◽

Cited By ~ 1

Author(s):

Kangqing Yu ◽

Wei Shi ◽

Nicola Santoro

Keyword(s):

Real Time ◽

Outlier Detection ◽

Complexity Analysis ◽

Concept Drift ◽

Area Under The Curve ◽

Sliding Window ◽

Detection Algorithm ◽

Streaming Data ◽

Processing Unit ◽

Time Data

Designing an algorithm for detecting outliers over streaming data has become an important task in many common applications, arising in areas such as fraud detection, network analysis, environmental monitoring and so forth. Because real-time data may arrive in the form of streams rather than batches, properties such as concept drift, temporal context, transiency, and uncertainty need to be considered. In addition, data processing needs to be incremental, scalable, and able to run with limited memory resources. These requirements create big challenges for the accuracy of existing outlier detection algorithms when they are implemented in an incremental fashion, especially in the streaming environment. To address these problems, we first propose C_KDE_WR, which uses a sliding window and a kernel function to process the streaming data online, and report results demonstrating its high throughput on real-time streaming data, implemented in a CUDA framework on a Graphics Processing Unit (GPU). We also present another algorithm, C_LOF, based on a very popular and effective outlier detection algorithm called Local Outlier Factor (LOF), which unfortunately works only on batched data. Using a novel incremental approach that compensates for the high complexity of LOF, we show how to implement it in a streaming context and obtain results in a timely manner. Like C_KDE_WR, C_LOF employs a sliding window and a statistical summary to help make decisions based on the data in the current window. It also addresses all the challenges of streaming data addressed by C_KDE_WR. In addition, we report a comparative evaluation of the accuracy of C_KDE_WR against the state-of-the-art SOD_GPU using Precision, Recall and F-score metrics. Furthermore, a t-test is performed to demonstrate the significance of the improvement. We further report the testing results of C_LOF under different parameter settings and draw ROC and PR curves, with their area under the curve (AUC) and Average Precision (AP) values calculated respectively. Experimental results show that C_LOF can overcome the masquerading problem, which often exists in outlier detection on streaming data. We provide a complexity analysis and report experimental results on the accuracy of both C_KDE_WR and C_LOF in order to evaluate their effectiveness as well as their efficiency.
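
The sliding-window, kernel-based idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' C_KDE_WR (which runs on a GPU and uses details not given here); the class name `SlidingWindowKDE`, the Gaussian kernel, and the `window`, `bandwidth`, and `threshold` parameters are all assumptions chosen for illustration. Each arriving value is scored by its estimated density over the most recent values and flagged when that density falls below a fixed threshold.

```python
import math
from collections import deque


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel function."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


class SlidingWindowKDE:
    """Hypothetical one-dimensional detector: low density in the window = outlier."""

    def __init__(self, window=50, bandwidth=1.0, threshold=0.01):
        self.values = deque(maxlen=window)  # oldest values drop out automatically
        self.bandwidth = bandwidth
        self.threshold = threshold

    def density(self, x):
        """Kernel density estimate of x over the current window (None if empty)."""
        if not self.values:
            return None
        total = sum(gaussian_kernel((x - v) / self.bandwidth) for v in self.values)
        return total / (len(self.values) * self.bandwidth)

    def update(self, x):
        """Score x against the window, then add it; return True if flagged."""
        d = self.density(x)
        self.values.append(x)
        return d is not None and d < self.threshold


detector = SlidingWindowKDE(window=20, bandwidth=0.5, threshold=0.01)
stream = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.3, 25.0, 10.1]
flags = [detector.update(x) for x in stream]
```

Because the window has a fixed maximum length, memory use stays bounded and old observations stop influencing the score, which is one simple way to tolerate concept drift.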

Streaming Data Classification using Hybrid Classifiers to tackle Stability-Plasticity Dilemma and Concept Drift

2020 IEEE 4th Conference on Information & Communication Technology (CICT) ◽

10.1109/cict51604.2020.9312077 ◽

2020 ◽

Author(s):

A L Amutha ◽

R Annie Uthra ◽

J Preetha Roselyn ◽

R Golda Brunet

Keyword(s):

Concept Drift ◽

Data Classification ◽

Streaming Data ◽

Hybrid Classifiers

Handling adversarial concept drift in streaming data

Expert Systems with Applications ◽

10.1016/j.eswa.2017.12.022 ◽

2018 ◽

Vol 97 ◽

pp. 18-40 ◽

Cited By ~ 14

Author(s):

Tegjyot Singh Sethi ◽

Mehmed Kantardzic

Keyword(s):

Concept Drift ◽

Streaming Data

Deep learning framework for handling concept drift and class imbalanced complex decision-making on streaming data

Complex & Intelligent Systems ◽

10.1007/s40747-021-00456-0 ◽

2021 ◽

Author(s):

S. Priya ◽

R. Annie Uthra

Keyword(s):

Decision Making ◽

Deep Learning ◽

Concept Drift ◽

Class Imbalance ◽

Streaming Data ◽

Superior Performance ◽

Data Streaming ◽

Minority Class ◽

Concept Drift Detection

In present times, data science has become popular for supporting and improving decision-making processes. Owing to the wide range of applications of data streaming, class imbalance and concept drift have become crucial learning problems. Deep learning (DL) models have proven useful for classification under concept drift in data streaming applications. This paper presents an effective class imbalance with concept drift detection (CIDD) approach using Adadelta optimizer-based deep neural networks (ADODNN), named the CIDD-ADODNN model, for the classification of highly imbalanced streaming data. The presented model involves four processes, namely preprocessing, class imbalance handling, concept drift detection, and classification. The proposed model uses the adaptive synthetic (ADASYN) technique for handling class-imbalanced data, which utilizes a weighted distribution for diverse minority class examples based on their level of difficulty in learning. Next, a drift detection technique called adaptive sliding window (ADWIN) is employed to detect the existence of concept drift. The ADODNN model is then utilized for the classification process. To increase the performance of the DNN classifier, an ADO-based hyperparameter tuning process determines the optimal parameters of the DNN model. The performance of the presented model is evaluated using three streaming datasets, namely an intrusion detection (NSL KDDCup) dataset, a Spam dataset, and a Chess dataset. A detailed comparative analysis of the results is carried out, and the simulation results verify the superior performance of the presented model, which obtains maximum accuracies of 0.9592, 0.9320, and 0.7646 on the KDDCup, Spam, and Chess datasets, respectively.
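
The adaptive-window drift detection step named in this abstract can be sketched as follows. This is a deliberately simplified illustration in the spirit of ADWIN, not the bucketed ADWIN algorithm and not the paper's CIDD-ADODNN pipeline: it checks every split of a plain list (quadratic cost) and uses a Hoeffding-style threshold. The class name `SimpleAdaptiveWindow` and the `delta` and `min_split` parameters are assumptions, and inputs are assumed to lie in [0, 1] (for example, a per-sample error indicator).

```python
import math


class SimpleAdaptiveWindow:
    """Hypothetical drift detector: drop old data when two sub-window means diverge."""

    def __init__(self, delta=0.002, min_split=5):
        self.delta = delta          # confidence parameter; smaller means fewer alarms
        self.min_split = min_split  # minimum size of each sub-window
        self.window = []

    def update(self, value):
        """Add a value in [0, 1]; return True if drift was detected on this step."""
        self.window.append(value)
        drift = False
        shrunk = True
        while shrunk:  # re-check after every cut, since the window has changed
            shrunk = False
            n = len(self.window)
            for split in range(self.min_split, n - self.min_split + 1):
                old, new = self.window[:split], self.window[split:]
                n0, n1 = len(old), len(new)
                m = 1.0 / (1.0 / n0 + 1.0 / n1)
                # Hoeffding-style bound on the difference of the two means
                eps = math.sqrt(math.log(4.0 * n / self.delta) / (2.0 * m))
                if abs(sum(old) / n0 - sum(new) / n1) > eps:
                    self.window = new  # forget the older, outdated part
                    drift = shrunk = True
                    break
        return drift


detector = SimpleAdaptiveWindow(delta=0.002, min_split=5)
stream = [0.0] * 50 + [1.0] * 50  # abrupt change halfway through
detections = [i for i, v in enumerate(stream) if detector.update(v)]
```

In a classification pipeline, a detection would typically trigger retraining or adaptation of the classifier on the retained (recent) part of the window.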

RECONFIGURABLE SELF-ADDRESSABLE MEMORY-BASED FSM A SCALABLE INTRUSION DETECTION ENGINE

International Journal of Smart Sensor and Adhoc Network ◽

10.47893/ijssan.2012.1153 ◽

2012 ◽

pp. 144-151

Author(s):

B. SRILATHA ◽

KRISHNA KISHORE

Keyword(s):

High Speed ◽

Finite State Machine ◽

State Machine ◽

Network Attack ◽

Memory Efficiency ◽

Attack Pattern ◽

Current State ◽

Incoming Packet ◽

Finite State ◽

Attack Patterns

One way to detect and thwart a network attack is to compare each incoming packet with predefined patterns, stored in what is also called an attack pattern database, and to raise an alert upon detecting a match. This article presents a novel pattern-matching engine that exploits a memory-based, programmable state machine to achieve deterministic processing rates that are independent of packet and pattern characteristics. Our engine is a self-addressable memory-based finite state machine (sam-FSM), whose current state coding exhibits all its possible next states. Moreover, it is fully reconfigurable in that new attack patterns can be updated easily. A methodology was developed to program the memory and logic. Specifically, we merge “non-equivalent” states by introducing “super characters” on their inputs to further enhance memory efficiency without adding labels. This is the highest-speed self-addressable memory-based FSM. The sam-FSM is one of the most storage-efficient machines and reduces the memory requirement by 60 times. Experimental results are presented to demonstrate the validity of the sam-FSM.
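
The general idea of a memory-based state machine with a deterministic per-byte cost can be sketched with a generic table-driven multi-pattern matcher. This is a standard Aho-Corasick-style construction, not the paper's sam-FSM: it has no state merging or "super characters", and the function names `build_matcher` and `scan` are assumptions for illustration. Once the table is built, scanning performs exactly one table lookup per input byte, regardless of the patterns or the payload.

```python
from collections import deque


def build_matcher(patterns):
    """Build a dense transition table (one 256-entry row per state) for byte patterns."""
    goto = [{}]        # trie edges per state
    output = [set()]   # patterns that end when a state is entered
    for pat in patterns:
        state = 0
        for byte in pat:
            if byte not in goto[state]:
                goto.append({})
                output.append(set())
                goto[state][byte] = len(goto) - 1
            state = goto[state][byte]
        output[state].add(pat)

    table = [[0] * 256 for _ in goto]
    fail = [0] * len(goto)
    queue = deque()
    for byte, nxt in goto[0].items():
        table[0][byte] = nxt
        queue.append(nxt)
    while queue:  # breadth-first, so shallower rows are complete before they are used
        state = queue.popleft()
        output[state] |= output[fail[state]]
        for byte in range(256):
            if byte in goto[state]:
                nxt = goto[state][byte]
                fail[nxt] = table[fail[state]][byte]
                table[state][byte] = nxt
                queue.append(nxt)
            else:
                table[state][byte] = table[fail[state]][byte]
    return table, output


def scan(payload, table, output):
    """Return (start_offset, pattern) for every match found in the payload."""
    state = 0
    hits = []
    for pos, byte in enumerate(payload):
        state = table[state][byte]  # exactly one memory lookup per input byte
        for pat in output[state]:
            hits.append((pos - len(pat) + 1, pat))
    return hits


table, output = build_matcher([b"/etc/passwd", b"cmd.exe"])
hits = scan(b"GET /../../etc/passwd HTTP/1.1", table, output)
```

The dense 256-column rows make lookups trivial but waste memory on mostly identical entries, which is the kind of storage cost that memory-efficient designs aim to reduce.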

Concept Drift Detection on Streaming Data with Dynamic Outlier Aggregation

Lecture Notes in Business Information Processing - Process Mining Workshops ◽

10.1007/978-3-030-72693-5_16 ◽

2021 ◽

pp. 206-217

Author(s):

Ludwig Zellner ◽

Florian Richter ◽

Janina Sontheim ◽

Andrea Maldonado ◽

Thomas Seidl

Keyword(s):

Concept Drift ◽

Streaming Data ◽

Concept Drift Detection

Investigating Brute Force Attack Patterns in IoT Network

Journal of Electrical and Computer Engineering ◽

10.1155/2019/4568368 ◽

2019 ◽

Vol 2019 ◽

pp. 1-13 ◽

Cited By ~ 2

Author(s):

Deris Stiawan ◽

Mohd. Yazid Idris ◽

Reza Firsandaya Malik ◽

Siti Nurmaini ◽

Nizar Alsharif ◽

...

Keyword(s):

Brute Force ◽

File Transfer ◽

Iot Security ◽

Insider Attack ◽

Attack Pattern ◽

Internal Network ◽

Attack Patterns ◽

Iot Devices ◽

Set Up ◽

Brute Force Attack

Internet of Things (IoT) devices may transfer data to the gateway/application server through File Transfer Protocol (FTP) transactions. Unfortunately, in terms of security, the FTP server at a gateway or data sink is very often improperly set up. At the same time, password matching/theft holding is among the popular attacks when intruders attack the IoT network. Thus, this paper attempts to provide insight into this type of attack, with the main aim of deriving attack patterns that may help the IoT system administrator to analyze any similar attacks. This paper investigates brute force attacks (BFA) on the FTP server of the IoT network by using a time-sensitive statistical relationship approach and visualizing the attack patterns that identify its configurations. The investigation focuses on attacks launched from the internal network, under the assumption that the IoT network has already installed a firewall. An insider/internal attack launched from an internal network poses a greater danger to the entire IoT security system. The experiments use an IoT network testbed that mimics the internal attack scenario with three major goals: (i) to provide a topological description of how an insider attack occurs; (ii) to achieve attack pattern extraction from raw sniffed data; and (iii) to establish attack pattern identification as a parameter to visualize real-time attacks. Experimental results validate the investigation.
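
One simple time-sensitive indicator of a brute force attack on an FTP server is the number of failed logins per source within a short time window. The sketch below illustrates only that generic idea; it is not the statistical approach used in the paper, and the function name `find_brute_force`, the event tuple layout, and the `window_seconds` and `max_failures` parameters are assumptions.

```python
from collections import defaultdict, deque


def find_brute_force(events, window_seconds=60.0, max_failures=5):
    """Flag sources with more than max_failures failed logins inside the time window.

    events: iterable of (timestamp, source_ip, success), sorted by timestamp.
    Returns a list of (timestamp, source_ip) for each event that exceeded the limit.
    """
    recent = defaultdict(deque)  # source_ip -> timestamps of recent failures
    alerts = []
    for ts, src, success in events:
        if success:
            continue
        failures = recent[src]
        failures.append(ts)
        # discard failures that have fallen out of the time window
        while failures and ts - failures[0] > window_seconds:
            failures.popleft()
        if len(failures) > max_failures:
            alerts.append((ts, src))
    return alerts


# Ten rapid failures from one internal host, two widely spaced ones from another.
events = [(float(t), "10.0.0.5", False) for t in range(10)]
events += [(0.0, "10.0.0.9", False), (100.0, "10.0.0.9", False)]
events.sort(key=lambda e: e[0])
alerts = find_brute_force(events, window_seconds=60.0, max_failures=5)
```

Here only the host that produces many failures in quick succession is flagged; the thresholds would need tuning against real traffic.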

Data-driven decision support under concept drift in streamed big data

Complex & Intelligent Systems ◽

10.1007/s40747-019-00124-4 ◽

2019 ◽

Vol 6 (1) ◽

pp. 157-163 ◽

Cited By ~ 2

Author(s):

Jie Lu ◽

Anjin Liu ◽

Yiliao Song ◽

Guangquan Zhang

Keyword(s):

Decision Making ◽

Big Data ◽

Real Time ◽

Concept Drift ◽

High Volume ◽

Streaming Data ◽

Data Driven ◽

Research Directions ◽

Decision Outcomes ◽

Past Data

Data-driven decision-making (D3M) is often confronted by the problem of uncertainty or unknown dynamics in streaming data. To provide real-time accurate decision solutions, the systems have to promptly address changes in data distribution in streaming data, a phenomenon known as concept drift. Past data patterns may not be relevant to new data when a data stream experiences significant drift, so continuing to use models based on past data will lead to poor prediction and poor decision outcomes. This position paper discusses the basic framework and prevailing techniques in streaming-type big data and concept drift for D3M. The study first establishes a technical framework for real-time D3M under concept drift and details the characteristics of high-volume streaming data. The main methodologies and approaches for detecting concept drift and supporting D3M are highlighted and presented. Lastly, further research directions, related methods and procedures for using streaming data to support decision-making in concept drift environments are identified. We hope the observations in this paper can help researchers and professionals to better understand the fundamentals and research directions of D3M in streamed big data environments.
