Research on Detection and Recognition of Abnormal Data caused by Network Intrusion using Deep Learning

Abstract: Based on deep learning, this study combined a sparse autoencoder (SAE) with an extreme learning machine (ELM) to design an SAE-ELM method that reduces the dimension of data features and classifies different types of data. Experiments were carried out on the NSL-KDD and UNSW-NB2015 data sets. The results showed that the proposed method performed better than the K-means algorithm and the SVM algorithm. On the NSL-KDD data set, the average accuracy rate of the SAE-ELM method was 98.93%, the false alarm rate was 0.17%, and the missing report rate was 5.36%. On the UNSW-NB2015 data set, the accuracy rate of the SAE-ELM method was 98.88%, the false alarm rate was 0.12%, and the missing report rate was 4.31%. The results show that the SAE-ELM method is effective in the detection and recognition of abnormal data and can be popularized and applied.
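
The three figures quoted above can be derived from a binary confusion matrix. The sketch below is illustrative only: the abstract does not define its metrics, so it assumes the common definitions (false alarm rate = FP / (FP + TN), missing report rate = FN / (FN + TP)). The function name `detection_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
def detection_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Return accuracy, false alarm rate and missing report rate as fractions.

    tp: attacks flagged as attacks      tn: normal records passed as normal
    fp: normal records flagged as attacks   fn: attacks passed as normal
    The definitions below are assumed, not quoted from the paper.
    """
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("empty confusion matrix")
    return {
        # Share of all records that were classified correctly.
        "accuracy": (tp + tn) / total,
        # Share of normal records that raised an alarm.
        "false_alarm_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        # Share of attacks that went unreported.
        "missing_report_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }


# Made-up counts for illustration only.
metrics = detection_metrics(tp=90, tn=895, fp=5, fn=10)
print(metrics)
```

Under these definitions a detector can combine a very low false alarm rate with a noticeably higher missing report rate, because the two rates are normalized by different classes (normal records versus attacks).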

Research on GPR image recognition based on deep learning

MATEC Web of Conferences ◽

10.1051/matecconf/202030903027 ◽

2020 ◽

Vol 309 ◽

pp. 03027

Author(s):

Zhimin Gong ◽

Huaiqing Zhang

Keyword(s):

Deep Learning ◽

Image Recognition ◽

Ground Penetrating Radar ◽

Simulation Software ◽

Accuracy Rate ◽

Data Set ◽

Average Accuracy ◽

Recognition Ability ◽

Ground Penetrating ◽

Traditional Image

It is difficult for traditional image recognition methods to accurately identify ground penetrating radar (GPR) images. This paper proposes a deep-learning-based Faster R-CNN algorithm for the automatic classification and recognition of GPR images. First, GPR images with different features were obtained using gprMax, a professional GPR simulation software package. Then, the features of the targets in the images were taken as the recognition objects and a data set was built. Finally, the ability of Faster R-CNN to recognize GPR images was analyzed in terms of accuracy, average accuracy and other indicators. The results showed that Faster R-CNN could successfully identify GPR images and accurately classify them, with an average accuracy rate of 93.9%.

Comparative Study of Datasets used in Cyber Security Intrusion Detection

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit2063103 ◽

2020 ◽

pp. 302-312

Author(s):

Rahul Yadav ◽

Phalguni Pathak ◽

Saumya Saraswat

Keyword(s):

Deep Learning ◽

Intrusion Detection ◽

Network Traffic ◽

Cyber Security ◽

Malware Detection ◽

Data Sets ◽

Security Threat ◽

Data Set ◽

Android Malware ◽

Network Intrusion

In recent years, deep learning frameworks have been applied in various domains and have shown promising performance, including malware detection software, self-driving cars, and identity recognition cameras. Adversarial attacks have become a crucial security threat to several deep learning applications. Deep learning techniques have become a core part of several cyber security applications such as intrusion detection, Android malware detection, spam, malware classification, binary analysis and phishing detection. One of the major research challenges in this field is the lack of a comprehensive data set that reflects contemporary network traffic scenarios, a broad range of low-footprint intrusions, and in-depth structured information about the network traffic. Many benchmark data sets for evaluating network intrusion detection systems were developed a decade ago. In this paper, we provide a focused literature survey of data sets used for network-based intrusion detection and characterize in detail the underlying packet- and flow-based network data used for intrusion detection in cyber security. Because data sets play a vital role in intrusion detection, we describe cyber security data sets and provide a categorization of them.

Human Activity Recognition using Fourier Transform Inspired Deep Learning Combination Model

International Journal of Sensors Wireless Communications and Control ◽

10.2174/2210327908666180727123657 ◽

2019 ◽

Vol 9 (1) ◽

pp. 16-31

Author(s):

Kyungkoo Jun

Keyword(s):

Fourier Transform ◽

Deep Learning ◽

Short Term Memory ◽

Window Size ◽

Sensor Data ◽

Data Sets ◽

Data Set ◽

Proposed Model ◽

Testing Data ◽

Labeling Scheme

Background & Objective: This paper proposes a Fourier transform inspired method to classify human activities from time series sensor data. Methods: Our method begins by decomposing a 1D input signal into 2D patterns, which is motivated by the Fourier transform. The decomposition is performed with the help of Long Short-Term Memory (LSTM), which captures the temporal dependency in the signal and then produces encoded sequences. The sequences, once arranged into a 2D array, can represent the fingerprints of the signals. The benefit of such a transformation is that we can exploit recent advances in deep learning models for image classification, such as the Convolutional Neural Network (CNN). Results: The proposed model is therefore a combination of LSTM and CNN. We evaluate the model on two data sets. For the first data set, which is more standardized than the other, our model outperforms or at least equals previous work. For the second data set, we devise schemes to generate training and testing data by changing the window size, the sliding size, and the labeling scheme. Conclusion: The evaluation results show that the accuracy is over 95% in some cases. We also analyze the effect of the parameters on the performance.
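
The window size, sliding size, and labeling scheme mentioned above can be illustrated with a small segmentation routine. This is a minimal sketch under stated assumptions, not the authors' procedure: the abstract does not say how windows are labeled, so majority voting over the per-sample labels is assumed here. The function name `sliding_windows` and the toy data are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def sliding_windows(
    signal: Sequence[float],
    labels: Sequence[str],
    window_size: int,
    slide_size: int,
) -> List[Tuple[List[float], str]]:
    """Cut a 1D signal into fixed-size windows, each with one label.

    Assumed labeling scheme: the most frequent per-sample label in the window.
    """
    if window_size <= 0 or slide_size <= 0:
        raise ValueError("window_size and slide_size must be positive")
    if len(signal) != len(labels):
        raise ValueError("signal and labels must have the same length")
    windows = []
    # Only full windows are produced; a trailing partial window is dropped.
    for start in range(0, len(signal) - window_size + 1, slide_size):
        end = start + window_size
        majority = Counter(labels[start:end]).most_common(1)[0][0]
        windows.append((list(signal[start:end]), majority))
    return windows


# Toy data for illustration only.
signal = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
labels = ["walk", "walk", "walk", "run", "run", "run"]
windows = sliding_windows(signal, labels, window_size=4, slide_size=2)
for samples, label in windows:
    print(label, samples)
```

A smaller sliding size yields more, heavily overlapping windows from the same recording, which is one way these parameters can change the amount of training and testing data.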

Investigating the impact of pre-processing techniques and pre-trained word embeddings in detecting Arabic health information on social media

Journal Of Big Data ◽

10.1186/s40537-021-00488-w ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Yahya Albalawi ◽

Jim Buckley ◽

Nikola S. Nikolov

Keyword(s):

Social Media ◽

Deep Learning ◽

Comprehensive Evaluation ◽

Classification Problem ◽

Data Sets ◽

Word Embeddings ◽

Data Set ◽

Lower Accuracy ◽

Health Related ◽

The Impact

This paper presents a comprehensive evaluation of data pre-processing and word embedding techniques in the context of Arabic document classification in the domain of health-related communication on social media. We evaluate 26 text pre-processings applied to Arabic tweets within the process of training a classifier to identify health-related tweets. For this task we use the (traditional) machine learning classifiers KNN, SVM, Multinomial NB and Logistic Regression. Furthermore, we report experimental results with the deep learning architectures BLSTM and CNN for the same text classification problem. Since word embeddings are more typically used as the input layer in deep networks, in the deep learning experiments we evaluate several state-of-the-art pre-trained word embeddings with the same text pre-processing applied. To achieve these goals, we use two data sets: one for both training and testing, and another only for testing the generality of our models. Our results point to the conclusion that only four out of the 26 pre-processings improve the classification accuracy significantly. For the first data set of Arabic tweets, we found that Mazajak CBOW pre-trained word embeddings as the input to a BLSTM deep network led to the most accurate classifier, with an F1 score of 89.7%. For the second data set, Mazajak Skip-Gram pre-trained word embeddings as the input to BLSTM led to the most accurate model, with an F1 score of 75.2% and accuracy of 90.7%, compared to an F1 score of 90.8% achieved by Mazajak CBOW for the same architecture but with lower accuracy of 70.89%. Our results also show that the performance of the best traditional classifier we trained is comparable to the deep learning methods on the first data set, but significantly worse on the second data set.

Implementation of a Modified Faster R-CNN for Target Detection Technology of Coastal Defense Radar

Remote Sensing ◽

10.3390/rs13091703 ◽

2021 ◽

Vol 13 (9) ◽

pp. 1703

Author(s):

He Yan ◽

Chao Chen ◽

Guodong Jin ◽

Jindong Zhang ◽

Xudong Wang ◽

...

Keyword(s):

False Alarm ◽

False Alarm Rate ◽

Target Detection ◽

Real Data ◽

Detection Performance ◽

Detection Accuracy ◽

Constant False Alarm Rate ◽

Data Set ◽

Detection Technology ◽

Coastal Defense

The traditional method of constant false-alarm rate detection is based on the assumption of an echo statistical model. Against a background of sea clutter and other interference, its target recognition accuracy is very low and its false-alarm rate is high. Therefore, computer vision technology is widely discussed as a way to improve the detection performance. However, the majority of studies have focused on synthetic aperture radar because of its high resolution. For the defense radar, the detection performance is not satisfactory because of its low resolution. To this end, we herein propose a novel target detection method for the coastal defense radar based on the faster region-based convolutional neural network (Faster R-CNN). The main processing steps are as follows: (1) the Faster R-CNN is selected as the sea-surface target detector because of its high target detection accuracy; (2) a modified Faster R-CNN based on the characteristics of sparsity and small target size in the data set is employed; and (3) soft non-maximum suppression is exploited to eliminate possibly overlapping detection boxes. Furthermore, detailed comparative experiments based on a real data set of coastal defense radar are performed. The mean average precision of the proposed method is improved by 10.86% compared with that of the original Faster R-CNN.
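
Step (3) above, soft non-maximum suppression, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which variant is used, so the widely used Gaussian score decay `score * exp(-IoU^2 / sigma)` is assumed. The names `iou` and `soft_nms`, the parameter values, and the example boxes are hypothetical.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(
    boxes: List[Box],
    scores: List[float],
    sigma: float = 0.5,
    score_threshold: float = 0.001,
) -> List[Tuple[Box, float]]:
    """Gaussian soft-NMS: decay the scores of overlapping boxes instead of deleting them."""
    remaining = list(zip(boxes, scores))
    kept = []
    while remaining:
        # Select the box with the highest current score.
        best_idx = max(range(len(remaining)), key=lambda i: remaining[i][1])
        best_box, best_score = remaining.pop(best_idx)
        kept.append((best_box, best_score))
        decayed = []
        for box, score in remaining:
            # The more a box overlaps the selected one, the more its score shrinks.
            new_score = score * math.exp(-(iou(best_box, box) ** 2) / sigma)
            if new_score >= score_threshold:
                decayed.append((box, new_score))
        remaining = decayed
    return kept


# Toy boxes for illustration only: two overlapping detections and one separate.
boxes = [(0.0, 0.0, 10.0, 10.0), (1.0, 1.0, 11.0, 11.0), (20.0, 20.0, 30.0, 30.0)]
scores = [0.9, 0.8, 0.7]
kept = soft_nms(boxes, scores)
for box, score in kept:
    print(box, round(score, 3))
```

Unlike hard non-maximum suppression, the overlapping second box is not discarded outright; its score is reduced so that it ranks below the non-overlapping box.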

Augmented Data Selector to Initiate Text-Based CAPTCHA Attack

Security and Communication Networks ◽

10.1155/2021/9930608 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Aolin Che ◽

Yalin Liu ◽

Hong Xiao ◽

Hao Wang ◽

Ke Zhang ◽

...

Keyword(s):

Deep Learning ◽

Training Data ◽

Accuracy Rate ◽

Data Set ◽

Research Directions ◽

Training Models ◽

The Past ◽

Future Work ◽

Design Cost ◽

Accuracy Rates

In the past decades, due to their low design cost and easy maintenance, text-based CAPTCHAs have been extensively used in constructing security mechanisms for user authentication. With the recent advances in machine/deep learning in recognizing CAPTCHA images, a growing number of attack methods have been presented to break text-based CAPTCHAs. These machine learning/deep learning-based attacks often rely on training models on massive volumes of training data. Poorly constructed CAPTCHA data also leads to low attack accuracy. To investigate this issue, we propose a simple, generic, and effective preprocessing approach to filter and enhance the original CAPTCHA data set so as to improve the accuracy of the previous attack methods. In particular, the proposed preprocessing approach consists of a data selector and a data augmentor. The data selector can automatically filter out a training data set with training significance. Meanwhile, the data augmentor uses four different image noises to generate different CAPTCHA images. The well-constructed CAPTCHA data set can better train deep learning models to further improve the accuracy rate. Extensive experiments demonstrate that the accuracy rates of five commonly used attack methods combined with our preprocessing approach are 2.62% to 8.31% higher than those without the preprocessing approach. Moreover, we also discuss potential research directions for future work.

Classification of Clinically Significant Prostate Cancer on Multi-Parametric MRI: A Validation Study Comparing Deep Learning and Radiomics

Cancers ◽

10.3390/cancers14010012 ◽

2021 ◽

Vol 14 (1) ◽

pp. 12

Author(s):

Jose M. Castillo T. ◽

Muhammad Arif ◽

Martijn P. A. Starmans ◽

Wiro J. Niessen ◽

Chris H. Bangma ◽

...

Keyword(s):

Prostate Cancer ◽

Deep Learning ◽

Characteristic Curve ◽

Model Development ◽

Learning Model ◽

Multiparametric Mri ◽

Data Sets ◽

Data Set ◽

Test Sets ◽

Deep Learning Model

The computer-aided analysis of prostate multiparametric MRI (mpMRI) could improve significant-prostate-cancer (PCa) detection. Various deep-learning- and radiomics-based methods for significant-PCa segmentation or classification have been reported in the literature. To be able to assess the generalizability of the performance of these methods, using various external data sets is crucial. While both deep-learning and radiomics approaches have been compared based on the same data set of one center, the comparison of the performances of both approaches on various data sets from different centers and different scanners is lacking. The goal of this study was to compare the performance of a deep-learning model with the performance of a radiomics model for the significant-PCa diagnosis of the cohorts of various patients. We included the data from two consecutive patient cohorts from our own center (n = 371 patients), and two external sets of which one was a publicly available patient cohort (n = 195 patients) and the other contained data from patients from two hospitals (n = 79 patients). Using multiparametric MRI (mpMRI), the radiologist tumor delineations and pathology reports were collected for all patients. During training, one of our patient cohorts (n = 271 patients) was used for both the deep-learning- and radiomics-model development, and the three remaining cohorts (n = 374 patients) were kept as unseen test sets. The performances of the models were assessed in terms of their area under the receiver-operating-characteristic curve (AUC). Whereas the internal cross-validation showed a higher AUC for the deep-learning approach, the radiomics model obtained AUCs of 0.88, 0.91 and 0.65 on the independent test sets compared to AUCs of 0.70, 0.73 and 0.44 for the deep-learning model. 
Our radiomics model that was based on delineated regions resulted in a more accurate tool for significant-PCa classification in the three unseen test sets when compared to a fully automated deep-learning model.
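
The AUC values compared above can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes AUC directly from that pairwise definition. It is illustrative only: the function name `roc_auc`, the labels, and the scores are made up and are unrelated to the study's data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up labels (1 = significant cancer) and model scores, for illustration only.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.7, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

On this reading, an AUC of 0.5 corresponds to random ranking, so a value below 0.5 on an external test set means the model ranked cases worse than chance there.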

Extreme Learning Machine with sigmoid activation function on large data

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1433.0982s1119 ◽

2019 ◽

Vol 8 (2S11) ◽

pp. 3523-3526

Keyword(s):

Efficient Algorithm ◽

Large Data ◽

Activation Function ◽

Large Data Sets ◽

Data Sets ◽

Data Set ◽

Learning Machine ◽

Sigmoid Activation Function ◽

State Of Art ◽

Better Than

This paper describes an efficient algorithm for classification on large data sets. While many classification algorithms exist, they are not suitable for larger contents and different data sets. Various ELM algorithms for working with large data sets are available in the literature. However, the existing algorithms use a fixed activation function, which may lead to deficiencies when working with large data. In this paper, we propose a novel ELM with a sigmoid activation function. The experimental evaluations demonstrate that our ELM-S algorithm performs better than ELM, SVM and other state-of-the-art algorithms on large data sets.
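
For orientation, a basic extreme learning machine with a sigmoid hidden layer can be sketched as below. This is a textbook-style ELM (random, fixed hidden weights and output weights solved by a pseudo-inverse), not the authors' ELM-S, whose details the abstract does not give. It assumes NumPy is installed; the function names, the hidden-layer size, and the synthetic two-cluster data are all hypothetical.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def train_elm(X, y, n_hidden, n_classes, rng):
    """Fit an ELM: random hidden layer, least-squares output layer."""
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights, never updated
    b = rng.standard_normal(n_hidden)                # random hidden biases, never updated
    H = sigmoid(X @ W + b)                           # hidden-layer activations
    T = np.eye(n_classes)[y]                         # one-hot targets
    beta = np.linalg.pinv(H) @ T                     # minimum-norm least-squares output weights
    return W, b, beta


def predict_elm(X, W, b, beta):
    return np.argmax(sigmoid(X @ W + b) @ beta, axis=1)


# Synthetic, well-separated two-class data for illustration only.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, (20, 2)), rng.normal(2.0, 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

W, b, beta = train_elm(X, y, n_hidden=30, n_classes=2, rng=rng)
train_accuracy = float(np.mean(predict_elm(X, W, b, beta) == y))
print(train_accuracy)
```

Because only the output weights are fitted, and in closed form, training involves no iterative gradient descent, which is the usual argument for using ELMs on large data sets.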

Intelligent Agent-Based Intrusion Detection System Using Enhanced Multiclass SVM

Computational Intelligence and Neuroscience ◽

10.1155/2012/850259 ◽

2012 ◽

Vol 2012 ◽

pp. 1-10 ◽

Cited By ~ 29

Author(s):

S. Ganapathy ◽

P. Yogesh ◽

A. Kannan

Keyword(s):

Intrusion Detection ◽

False Alarm ◽

False Alarm Rate ◽

Outlier Detection ◽

Intelligent Agent ◽

Support Vector ◽

Detection Accuracy ◽

Data Set ◽

Agent Based ◽

Multiclass Svm

Intrusion detection systems were used in the past along with various techniques to detect intrusions in networks effectively. However, most of these systems are able to detect intruders only with a high false alarm rate. In this paper, we propose a new intelligent agent-based intrusion detection model for mobile ad hoc networks using a combination of attribute selection, outlier detection, and enhanced multiclass SVM classification methods. For this purpose, an effective preprocessing technique is proposed that improves the detection accuracy and reduces the processing time. Moreover, two new algorithms, namely an Intelligent Agent Weighted Distance Outlier Detection algorithm and an Intelligent Agent-based Enhanced Multiclass Support Vector Machine algorithm, are proposed for detecting intruders in a distributed database environment that uses intelligent agents for trust management and coordination in transaction processing. The experimental results of the proposed model show that this system detects anomalies with a low false alarm rate and a high detection rate when tested on the KDD Cup 99 data set.

A Novel Network Traffic Anomaly Detection Based on Multi-Scale Fusion

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.48-49.102 ◽

2011 ◽

Vol 48-49 ◽

pp. 102-105

Author(s):

Guo Zhen Cheng ◽

Dong Nian Cheng ◽

He Lei

Keyword(s):

Anomaly Detection ◽

False Alarm ◽

False Alarm Rate ◽

Network Traffic ◽

Self Similarity ◽

Data Set ◽

Multi Scale ◽

Traffic Anomaly ◽

Detection Evaluation ◽

Better Than

Detecting network traffic anomalies is very important for network security. However, detection suffers from a high false alarm rate and a low detection rate, and cannot be performed well in real time in the backbone, because of the nonlinearity, nonstationarity and self-similarity of the traffic. We therefore propose a novel detection method, EMD-DS, and prove that it can efficiently reduce the mean error rate of anomaly detection after EMD. On the KDD CUP 1999 intrusion detection evaluation data set, this detector detects 85.1% of attacks at a low false alarm rate, which is better than some other systems.
