A Survey on Data-Driven Learning for Intelligent Network Intrusion Detection Systems

Ghada Abdelmoumin; Jessica Whitaker; Danda B. Rawat; Abdul Rahman

doi:10.3390/electronics11020213

Electronics ◽

10.3390/electronics11020213 ◽

2022 ◽

Vol 11 (2) ◽

pp. 213

Author(s):

Ghada Abdelmoumin ◽

Jessica Whitaker ◽

Danda B. Rawat ◽

Abdul Rahman

Keyword(s):

Intrusion Detection ◽

Real Time ◽

Data Augmentation ◽

Synthetic Data ◽

Skewed Distribution ◽

Rapid Review ◽

Time Data ◽

Adversarial Learning ◽

Network Intrusion ◽

Real Time Data

An effective anomaly-based intelligent IDS (AN-Intel-IDS) must detect both known and unknown attacks. Hence, there is a need to train AN-Intel-IDS using dynamically generated, real-time data in an adversarial setting. Unfortunately, the public datasets available to train AN-Intel-IDS are ineluctably static, unrealistic, and prone to obsolescence. Further, the need to protect private data and conceal sensitive data features has limited data sharing, thus encouraging the use of synthetic data for training predictive and intrusion detection models. However, synthetic data can be unrealistic and potentially biased. Real-time data, on the other hand, are realistic and current; however, they are inherently imbalanced due to the uneven distribution of anomalous and non-anomalous examples. In general, non-anomalous (normal) examples are more frequent than anomalous (attack) examples, leading to a skewed distribution. While imbalanced data are common in intrusion detection applications, they can lead to inaccurate predictions and degraded performance. Furthermore, the lack of real-time data produces potentially biased models that are less effective in predicting unknown attacks. Therefore, training AN-Intel-IDS using imbalanced and adversarial learning is instrumental to their efficacy and high performance. This paper investigates imbalanced learning and adversarial learning for training AN-Intel-IDS using a qualitative study. It surveys and synthesizes generative-based data augmentation techniques for addressing the uneven data distribution and generative-based adversarial techniques for generating synthetic yet realistic data in an adversarial setting using rapid review, structured reporting, and subgroup analysis.
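
The effect of a skewed class distribution described in this abstract can be illustrated with a small sketch. It is not taken from the paper: the 990/10 split, the `always_normal` baseline, and the `random_oversample` helper are assumptions chosen for illustration, and plain random oversampling is used here as a much simpler stand-in for the generative augmentation techniques the survey covers.

```python
import random
from collections import Counter


def random_oversample(examples, labels, seed=0):
    """Duplicate minority-class examples at random until every class
    is as frequent as the largest one (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(examples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(examples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


def recall(y_true, y_pred, positive):
    """Fraction of truly positive examples that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if tp + fn else 0.0


# Assumed toy traffic: 990 normal records and 10 attack records.
labels = ["normal"] * 990 + ["attack"] * 10
examples = list(range(len(labels)))

# A trivial detector that never raises an alert.
always_normal = ["normal"] * len(labels)
accuracy = sum(t == p for t, p in zip(labels, always_normal)) / len(labels)
attack_recall = recall(labels, always_normal, "attack")

# Rebalance the training set by duplicating attack records.
balanced_x, balanced_y = random_oversample(examples, labels)
balanced_counts = Counter(balanced_y)
```

On this toy data the trivial detector scores high accuracy while detecting no attacks, which is why accuracy alone is a poor measure under imbalance. Oversampling only rebalances the class counts; it adds no new attack patterns, which is one motivation for the generative approaches the abstract mentions.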

Big Data Management in the Context of Real-Time Data Warehousing

Big Data Management, Technologies, and Applications - Advances in Data Mining and Database Management ◽

10.4018/978-1-4666-4699-5.ch007 ◽

2013 ◽

pp. 150-176

Author(s):

M. Asif Naeem ◽

Gillian Dobbie ◽

Gerald Weber

Keyword(s):

Big Data ◽

Data Integration ◽

Real Time ◽

Real Life ◽

Skewed Distribution ◽

Stream Data ◽

Time Data ◽

Master Data ◽

Real Time Data ◽

Resource Aware

In order to make timely and effective decisions, businesses need the latest information from big data warehouse repositories. To keep these repositories up to date, real-time data integration is required. An important phase in real-time data integration is data transformation, where a stream of updates, which is huge in volume and unbounded, is joined with large disk-based master data. Stream processing is an important concept in Big Data, since large volumes of data are often best processed immediately. Mesh Join (MESHJOIN), a well-known algorithm that uses limited memory, was proposed to join stream data with disk-based master data. MESHJOIN is a candidate for a resource-aware system setup. The problem that the authors consider in this chapter is that MESHJOIN is not very selective. In particular, the performance of the algorithm is always inversely proportional to the size of the master data table. As a consequence, the resource consumption is suboptimal in some scenarios. They present an algorithm called Cache Join (CACHEJOIN), which performs asymptotically at least as well as MESHJOIN but performs better in realistic scenarios, particularly if parts of the master data are used with different frequencies. In order to quantify the performance differences, the authors compare both algorithms with a synthetic dataset of a known skewed distribution as well as TPC-H and real-life datasets.
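
The intuition that caching helps when parts of the master data are used more often than others can be sketched as follows. This is not CACHEJOIN or MESHJOIN as defined by the authors; it is a minimal LRU-cached lookup join, and the class name `CachedLookupJoin`, the in-memory dictionary standing in for disk-based master data, and the `disk_reads` counter are all assumptions made for illustration.

```python
from collections import OrderedDict


class CachedLookupJoin:
    """Join stream tuples with master data, keeping recently used
    master rows in a small LRU cache (illustrative sketch only)."""

    def __init__(self, master, cache_size):
        self.master = master          # stand-in for disk-based master data
        self.cache = OrderedDict()    # key -> master row, oldest first
        self.cache_size = cache_size
        self.disk_reads = 0           # counts simulated disk accesses

    def _lookup(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)   # mark as most recently used
            return self.cache[key]
        self.disk_reads += 1              # cache miss: go to "disk"
        row = self.master.get(key)
        if row is not None:
            self.cache[key] = row
            if len(self.cache) > self.cache_size:
                self.cache.popitem(last=False)  # evict least recently used
        return row

    def join(self, stream):
        """Yield (key, stream payload, master row) for matching keys."""
        for key, payload in stream:
            row = self._lookup(key)
            if row is not None:
                yield key, payload, row


# Assumed toy data: 100 master rows and a short, skewed update stream.
master = {k: f"product-{k}" for k in range(100)}
stream = [(1, "a"), (1, "b"), (2, "c"), (1, "d"), (2, "e"), (3, "f")]

joiner = CachedLookupJoin(master, cache_size=2)
results = list(joiner.join(stream))
```

With this stream, six tuples are joined using three simulated disk reads, because the repeated keys are served from the cache. A uniform stream over many keys would see far fewer cache hits, which matches the abstract's point that the benefit depends on skew in how the master data is used.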

Real time data mining-based intrusion detection

Proceedings DARPA Information Survivability Conference and Exposition II. DISCEX'01 ◽

10.1109/discex.2001.932195 ◽

2002 ◽

Cited By ~ 26

Author(s):

Wenke Lee ◽

S.J. Stolfo ◽

P.K. Chan ◽

E. Eskin ◽

Wei Fan ◽

...

Keyword(s):

Data Mining ◽

Intrusion Detection ◽

Real Time ◽

Time Data ◽

Real Time Data

An Innovative Method to Extract Data in a Real-time Data Warehousing Environment

10.5121/csit.2021.112401 ◽

2021 ◽

Author(s):

Flavio de Assis Vilela ◽

Ricardo Rodrigues Ciferri

Keyword(s):

Real Time ◽

Data Warehousing ◽

Data Extraction ◽

Synthetic Data ◽

Knowledge Discovery In Databases ◽

Data Repository ◽

Time Data ◽

Innovative Method ◽

Real Time Data ◽

Time Requirements

ETL (Extract, Transform, and Load) is an essential process required to perform data extraction in knowledge discovery in databases and in data warehousing environments. The ETL process aims to gather data that is available from operational sources, process it, and store it in an integrated data repository. The ETL process can also be performed in a real-time data warehousing environment and store data in a data warehouse. This paper presents a new and innovative method named Data Extraction Magnet (DEM) to perform the extraction phase of the ETL process in a real-time data warehousing environment, based on non-intrusive, tag, and parallelism concepts. DEM has been validated on a dairy farming domain using synthetic data. The results showed a great performance gain in comparison to the traditional trigger technique and the fulfillment of real-time requirements.

HYBRIDJOIN for Near-Real-Time Data Warehousing

International Journal of Data Warehousing and Mining ◽

10.4018/jdwm.2011100102 ◽

2011 ◽

Vol 7 (4) ◽

pp. 21-42 ◽

Cited By ~ 13

Author(s):

M. Asif Naeem ◽

Gillian Dobbie ◽

Gerald Weber

Keyword(s):

Real Time ◽

Data Stream ◽

Time Integration ◽

Synthetic Data ◽

Performance Measurements ◽

Time Data ◽

Join Algorithm ◽

Real Time Data ◽

Set Up ◽

Nested Loop

An important component of near-real-time data warehouses is the near-real-time integration layer. One important element in near-real-time data integration is the join of a continuous input data stream with a disk-based relation. For high-throughput streams, stream-based algorithms, such as Mesh Join (MESHJOIN), can be used. However, in MESHJOIN the performance of the algorithm is inversely proportional to the size of the disk-based relation. The Index Nested Loop Join (INLJ) can be set up so that it processes stream input and can deal with intermittences in the update stream, but it has low throughput. This paper introduces a robust stream-based join algorithm called Hybrid Join (HYBRIDJOIN), which combines the two approaches. A theoretical result shows that HYBRIDJOIN is asymptotically as fast as the faster of the two algorithms. The authors present performance measurements of the implementation. In experiments using synthetic data based on a Zipfian distribution, HYBRIDJOIN performs significantly better for typical parameters of the Zipfian distribution, and in general performs in accordance with the theoretical model, while the other two algorithms are unacceptably slow under different settings.
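
For readers unfamiliar with Zipfian test data, the following sketch shows one simple way to generate a synthetic stream of join keys whose frequencies follow a Zipfian distribution. It is not the authors' data generator; the function name `zipf_keys`, the key count, the exponent, and the stream length are assumptions chosen for illustration.

```python
import random
from collections import Counter


def zipf_keys(num_keys, exponent, n, seed=0):
    """Draw n keys from range(num_keys), where the key of rank r
    (1-based) is chosen with probability proportional to 1 / r**exponent."""
    rng = random.Random(seed)
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_keys + 1)]
    return rng.choices(range(num_keys), weights=weights, k=n)


# Assumed parameters: 1000 distinct keys, exponent 1.0, 10000 stream tuples.
keys = zipf_keys(num_keys=1000, exponent=1.0, n=10000)
frequencies = Counter(keys)
most_common_key, most_common_count = frequencies.most_common(1)[0]
```

In such a stream a few keys account for a large share of the tuples, so a join algorithm that keeps the frequently accessed part of the disk-based relation close at hand can avoid many disk accesses.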

Real-Time Data Augmentation Based Transfer Learning Model for Breast Cancer Diagnosis Using Histopathological Images

Advances in Biomedical Engineering and Technology - Lecture Notes in Bioengineering ◽

10.1007/978-981-15-6329-4_39 ◽

2020 ◽

pp. 473-488

Author(s):

Rishi Rai ◽

Dilip Singh Sisodia

Keyword(s):

Breast Cancer ◽

Real Time ◽

Transfer Learning ◽

Cancer Diagnosis ◽

Data Augmentation ◽

Breast Cancer Diagnosis ◽

Learning Model ◽

Time Data ◽

Real Time Data ◽

Histopathological Images

HYBRIDJOIN for Near-Real-Time Data Warehousing

Developments in Data Extraction, Management, and Analysis ◽

10.4018/978-1-4666-2148-0.ch013 ◽

2013 ◽

pp. 280-302

Author(s):

M. Asif Naeem ◽

Gillian Dobbie ◽

Gerald Weber

Keyword(s):

Real Time ◽

Input Data ◽

Time Integration ◽

Synthetic Data ◽

Performance Measurements ◽

Time Data ◽

Join Algorithm ◽

Real Time Data ◽

Set Up ◽

Nested Loop

Real Time Call Monitoring System Using Spark Streaming and Network Intrusion Detection Using Distributed WekaSpark

Journal of Machine Intelligence ◽

10.21174/jomi.v2i1.99 ◽

2017 ◽

Vol 2 (1) ◽

pp. 7-13

Author(s):

Darshan V S ◽

Ria Raphael

Keyword(s):

Intrusion Detection ◽

Real Time ◽

Time Data ◽

Real Time Processing ◽

Network Failure ◽

Network Intrusion ◽

Call Usage ◽

Call Monitoring ◽

Made In

As call volumes in industry increase, it becomes very difficult to identify the calls made in a large organization. Studying the call history, whether generated in real time or stored, and developing analytics from it helps improve call quality through network failure analysis, analysis of call usage patterns from minimum to maximum to increase server efficiency, and analysis of user-level patterns. Processing, analysing, and evaluating real-time data in a system is a challenging task, and building an adaptable, fault-tolerant, and flexible monitoring framework that can deal with information continuously and at a huge scale is nontrivial. We present a novel framework for real-time processing and batch processing using Spark Streaming and Spark; an ensemble model is also used with distributed WekaSpark for intrusion detection.

AmpliconNet: Sequence Based Multi-layer Perceptron for Amplicon Read Classification Using Real-time Data Augmentation

2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) ◽

10.1109/bibm.2018.8621287 ◽

2018 ◽

Author(s):

Ali Kishk ◽

Mohamed El-Hadidi

Keyword(s):

Real Time ◽

Data Augmentation ◽

Multi Layer Perceptron ◽

Time Data ◽

Real Time Data

399-P: Nurse-Driven Intervention Using Real-Time Data to Improve Hypoglycemia Rates in a Respiratory Care Unit

Diabetes ◽

10.2337/db20-399-p ◽

2020 ◽

Vol 69 (Supplement 1) ◽

pp. 399-P

Author(s):

ANN MARIE HASSE ◽

RIFKA SCHULMAN ◽

TORI CALDER

Keyword(s):

Real Time ◽

Respiratory Care ◽

Time Data ◽

Real Time Data

Real-time data stream analysis and entire process quality monitoring based on plant information

Journal of Computer Applications ◽

10.3724/sp.j.1087.2012.02935 ◽

2013 ◽

Vol 32 (10) ◽

pp. 2935-2939

Author(s):

Xiao-yong BIAN ◽

Xiao-long ZHANG ◽

Hai YU

Keyword(s):

Real Time ◽

Data Stream ◽

Process Quality ◽

Quality Monitoring ◽

Time Data ◽

Entire Process ◽

Real Time Data ◽

Data Stream Analysis
