Automated Vulnerability Detection in Source Code Using Minimum Intermediate Representation Learning

Vulnerabilities are one of the root causes of network intrusion. An effective way to mitigate security threats is to discover and patch vulnerabilities before an attack. Traditional vulnerability detection methods rely on manual participation and incur a high false positive rate. Intelligent vulnerability detection methods suffer from long-term dependencies, out-of-vocabulary tokens, coarse detection granularity, and a lack of vulnerable samples. This paper proposes an automated, intelligent method for detecting vulnerabilities in source code based on minimum intermediate representation learning. First, each source code sample is transformed into a minimum intermediate representation to exclude irrelevant items and shorten dependencies. Next, the intermediate representation is transformed into a real-valued vector through pre-training on an extended corpus, so that structural and semantic information is retained. Then, the vector is fed to three concatenated convolutional neural networks to obtain high-level vulnerability features. Last, a classifier is trained on the learned features. An experiment was performed to validate the method. The empirical results confirm that, compared with traditional methods and state-of-the-art intelligent methods, our method performs better at fine granularity.

A Vulnerability Detection System Based on Fusion of Assembly Code and Source Code

Security and Communication Networks ◽ 10.1155/2021/9997641 ◽ 2021 ◽ Vol 2021 ◽ pp. 1-11

Author(s): Xingzheng Li ◽ Bingwen Feng ◽ Guofeng Li ◽ Tong Li ◽ Mingjin He

Keyword(s): Detection System ◽ Source Code ◽ Detection Methods ◽ Alignment Algorithm ◽ Vulnerability Detection ◽ Timely Manner ◽ Assembly Code ◽ Software Vulnerabilities ◽ Network Intrusion ◽ Deep Learning Model

Software vulnerabilities are one of the major causes of network intrusion, so it is vital to detect and fix them in a timely manner. Existing vulnerability detection methods usually rely on a single code model, which may miss some vulnerabilities. This paper implements a vulnerability detection system that combines source code and assembly code models. First, code slices are extracted from the source code and the assembly code. Second, these slices are aligned by the proposed code alignment algorithm. Third, the aligned code slices are converted into vectors and fed into a hyper fusion-based deep learning model. Experiments are carried out to verify the system. The results show that the system achieves stable, convergent detection performance.

Automated Software Vulnerability Detection Based on Hybrid Neural Network

Applied Sciences ◽ 10.3390/app11073201 ◽ 2021 ◽ Vol 11 (7) ◽ pp. 3201

Author(s): Xin Li ◽ Lu Wang ◽ Yang Xin ◽ Yixian Yang ◽ Qifeng Tang ◽ ...

Keyword(s): Neural Network ◽ Neural Networks ◽ Detection Methods ◽ Buffer Size ◽ Intermediate Representation ◽ Vulnerability Detection ◽ Global Features ◽ Hybrid Neural Network ◽ Structure Information ◽ High Level

Vulnerabilities threaten the security of information systems. It is crucial to detect and patch vulnerabilities before attacks happen. However, existing vulnerability detection methods suffer from long-term dependencies, out-of-vocabulary tokens, bias towards either global or local features, and coarse detection granularity. This paper proposes an automatic framework for detecting vulnerabilities in source code based on a hybrid neural network. First, the inputs are transformed into an intermediate representation with explicit structure information using the low level virtual machine intermediate representation (LLVM IR) and backward program slicing. After the transformation, the sample size and the vocabulary size are significantly reduced. A hybrid neural network model is then applied to extract high-level vulnerability features; it learns features from both convolutional neural networks (CNNs) and recurrent neural networks (RNNs). The former learn local vulnerability features, such as buffer size. The latter learn global features, such as data dependency. The extracted features are the concatenated outputs of the CNN and the RNN. Experiments are performed to validate the method. The results show that the proposed method achieves an F1-score of 98.6% and an accuracy of 99.0% on the SARD dataset, outperforming state-of-the-art methods.

PRATD: A Phased Remote Access Trojan Detection Method with Double-Sided Features

Electronics ◽ 10.3390/electronics9111894 ◽ 2020 ◽ Vol 9 (11) ◽ pp. 1894

Author(s): Chun Guo ◽ Zihua Song ◽ Yuan Ping ◽ Guowei Shen ◽ Yuhei Cui ◽ ...

Keyword(s): False Positive ◽ Detection Method ◽ False Positive Rate ◽ True Positive Rate ◽ Remote Access ◽ Detection Methods ◽ Security Threats ◽ True Positive ◽ Trojan Detection ◽ Positive Rate

Remote Access Trojans (RATs) are among the most serious security threats that organizations face today. At present, the two major RAT detection approaches are host-based and network-based. To combine their strengths, this article proposes a phased RAT detection method with double-sided features (PRATD). In PRATD, host-side and network-side features are combined to build detection models, which helps distinguish RATs from benign programs because RATs not only generate network traffic but also leave traces on the host at run time. In addition, PRATD trains two different detection models for the two runtime states of RATs to improve the True Positive Rate (TPR). Experiments on network and host records collected from five kinds of benign programs and 20 well-known RATs show that PRATD can effectively detect RATs: it achieves a TPR as high as 93.609% with a False Positive Rate (FPR) as low as 0.407% for known RATs, and a TPR of 81.928% with an FPR of 0.185% for unknown RATs, which suggests it is a competitive candidate for RAT detection.
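
For readers less familiar with the two metrics quoted in this abstract, the following minimal Python sketch shows how TPR and FPR are conventionally computed from confusion-matrix counts. The function name `rates` and the example counts are illustrative assumptions; they are not taken from the paper.

```python
def rates(tp, fn, fp, tn):
    """Return (TPR, FPR) from confusion-matrix counts.

    TPR = TP / (TP + FN): share of malicious samples that were detected.
    FPR = FP / (FP + TN): share of benign samples that were wrongly flagged.
    """
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Hypothetical counts, for illustration only.
tpr, fpr = rates(tp=9, fn=1, fp=1, tn=99)
print(f"TPR={tpr:.3f}, FPR={fpr:.3f}")
```

A detector is usually judged on both numbers together, since a high TPR is easy to reach if the FPR is allowed to grow.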

CBAM: A Contextual Model for Network Anomaly Detection

Computers ◽ 10.3390/computers10060079 ◽ 2021 ◽ Vol 10 (6) ◽ pp. 79

Author(s): Henry Clausen ◽ Gudmund Grov ◽ David Aspinall

Keyword(s): Intrusion Detection ◽ Network Flows ◽ Concept Drift ◽ False Positive Rate ◽ Real Life ◽ Remote Access ◽ Detection Methods ◽ Short Term ◽ Deep Model ◽ Network Intrusion

Anomaly-based intrusion detection methods aim to combat the increasing rate of zero-day attacks; however, their success is currently restricted to the detection of high-volume attacks using aggregated traffic features. Recent evaluations show that current anomaly-based network intrusion detection methods fail to reliably detect remote access attacks. These are smaller in volume and often stand out only when compared to their surroundings. Currently, anomaly methods try to detect access attack events mainly as point anomalies and neglect the context they appear in. We present and examine a contextual bidirectional anomaly model (CBAM) based on deep LSTM networks that is specifically designed to detect such attacks as contextual network anomalies. The model efficiently learns short-term sequential patterns in network flows as conditional event probabilities. Access attacks frequently break these patterns when exploiting vulnerabilities, and can thus be detected as contextual anomalies. We evaluated CBAM on an assembly of three datasets that provide representative network access attacks, real-life traffic over a long timespan, and traffic from a real-world red-team attack. We contend that this assembly is closer to a potential deployment environment than current NIDS benchmark datasets. We show that, by building a deep model, we are able to reduce the false positive rate to 0.16%, significantly lower than the operational range of other methods, while effectively detecting six out of seven access attacks. We further demonstrate that short-term flow structures remain stable over long periods of time, making CBAM robust against concept drift.
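
The idea of scoring events by their conditional probability given the preceding context can be illustrated with a much simpler stand-in than the paper's model. The sketch below deliberately swaps the deep bidirectional LSTM for a smoothed bigram count model, so it is not CBAM; the event names, `fit_bigram`, and `surprise` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def fit_bigram(sequences):
    """Count how often each event follows each preceding event."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1
    return counts


def surprise(counts, seq, vocab_size, alpha=1.0):
    """Mean negative log2 probability of each event given its predecessor.

    Add-alpha smoothing keeps unseen transitions at a small nonzero
    probability. Higher values mean the sequence breaks learned patterns.
    """
    total, steps = 0.0, 0
    for prev, cur in zip(seq, seq[1:]):
        row = counts.get(prev, Counter())
        p = (row[cur] + alpha) / (sum(row.values()) + alpha * vocab_size)
        total -= math.log2(p)
        steps += 1
    return total / steps if steps else 0.0


# Toy training data: benign sessions always follow dns -> http -> http.
model = fit_bigram([["dns", "http", "http"]] * 20)
usual = surprise(model, ["dns", "http", "http"], vocab_size=3)
unusual = surprise(model, ["dns", "ssh", "http"], vocab_size=3)
print(usual < unusual)
```

Each event on its own may look ordinary here; only the transition is rare, which is the sense in which such an anomaly is contextual rather than a point anomaly.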

An Area-Context-Based Credibility Detection for Big Data in IoT

Mobile Information Systems ◽ 10.1155/2020/5068731 ◽ 2020 ◽ Vol 2020 ◽ pp. 1-12

Author(s): Bo Zhao ◽ Xiang Li ◽ Jiayue Li ◽ Jianwen Zou ◽ Yifan Liu

Keyword(s): Big Data ◽ Data Analysis ◽ Detection Method ◽ Big Data Analysis ◽ Detection Methods ◽ High Detection ◽ Data Credibility ◽ High Level

To improve the credibility of the results of big data analysis platforms in IoT, it is necessary to improve the quality of IoT data. Many detection methods have been proposed to filter out non-credible data, but they have shortcomings: performance is not high, detection is not comprehensive, and the process is not credible. This paper therefore proposes an area-context-based credibility detection method for IoT data that can effectively detect point anomalies, behavioral anomalies, and contextual anomalies. The performance of the context determination and the data credibility detection of the device satisfying the area characteristics is superior to that of similar algorithms. As the experiments show, the proposed method reaches more than 97% on the reported metrics, which can effectively improve the quality of IoT data.

Outlier Detection Method for Flash Flood Disaster Monitoring Data based on Information Entropy

Journal of Physics Conference Series ◽ 10.1088/1742-6596/2138/1/012013 ◽ 2021 ◽ Vol 2138 (1) ◽ pp. 012013

Author(s): Yongzhi Chen ◽ Ziao Xu ◽ Chaoqun Niu

Keyword(s): Outlier Detection ◽ Information Entropy ◽ Detection Method ◽ Flash Flood ◽ False Positive Rate ◽ Flood Disaster ◽ Detection Methods ◽ Positive Rate ◽ Disaster Monitoring ◽ Local Outlier

In research on flash flood disaster monitoring and early warning, the Internet of Things is widely used for real-time information collection. The large volume of data collected by sensors contains anomalies such as noise, repetition, and errors, which lead to false alarms, lower prediction accuracy, and other problems. Exploiting the observation that a stream of sensor outliers causes an obvious fluctuation in information entropy, this paper proposes a local outlier detection method based on information entropy and optimized with a sliding window and LOF (Local Outlier Factor). The method can be used to improve data quality and thus the accuracy of disaster prediction. It is applied to data stream processing for a water sensor, and the experimental results show that it can accurately detect outliers. Compared with existing detection methods that rely only on data distance, the test positive rate is improved and the false positive rate is reduced.
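
The entropy-fluctuation idea in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' algorithm: the LOF refinement step is omitted, and the bin width, window size, threshold, and function names are all hypothetical choices.

```python
import math
from collections import Counter


def shannon_entropy(values, bin_width=1.0):
    """Shannon entropy (in bits) of values after fixed-width binning."""
    counts = Counter(math.floor(v / bin_width) for v in values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_jumps(stream, window=5, bin_width=1.0, threshold=0.5):
    """Return start indices of windows whose entropy differs from the
    previous window's entropy by more than `threshold` bits."""
    flagged = []
    previous = None
    for start in range(len(stream) - window + 1):
        h = shannon_entropy(stream[start:start + window], bin_width)
        if previous is not None and abs(h - previous) > threshold:
            flagged.append(start)
        previous = h
    return flagged


# Hypothetical water-level readings with one spike at index 8.
readings = [2.0] * 8 + [9.0] + [2.0] * 8
print(entropy_jumps(readings))
```

With these toy readings the entropy jumps once when the spike enters the window and once when it leaves, so the two flagged windows bracket the suspicious reading; a distance-based step such as LOF could then be applied inside that range.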

Webshell detection with byte-level features based on deep learning

Journal of Intelligent & Fuzzy Systems ◽ 10.3233/jifs-200314 ◽ 2021 ◽ Vol 40 (1) ◽ pp. 1585-1596

Author(s): Xiao Zhongzheng ◽ Nurbol Luktarhan

Keyword(s): Deep Learning ◽ Web Applications ◽ Detection Efficiency ◽ Source Code ◽ Detection Methods ◽ Web Page ◽ Matching Method ◽ Network Intrusion ◽ Signature Matching ◽ And Control

A webshell is a common tool for network intrusion. It poses a considerable threat and is well concealed. An attacker obtains management authority over web services through the webshell in order to penetrate and control web applications. Because the features of a webshell are almost identical to those of an ordinary web page, it can evade detection by traditional firewalls and anti-virus software. Moreover, as various anti-detection and feature-hiding techniques are applied to webshells, it is difficult to detect new patterns in time with the traditional signature matching method. This paper proposes a webshell detection method based on deep learning. First, the dataset is converted to opcodes, and the source code and opcode features are fused. Second, the processed dataset is reduced using an SRNN and an attention mechanism, and a capsule network improves predictions for unknown pages. Experiments show that the algorithm has higher detection efficiency and accuracy than traditional webshell detection methods, and that it can also detect new types of webshell with a certain probability.

Application of Representation Learning based Chronological Modeling for Network Intrusion Detection

International Journal of Information Security and Privacy ◽ 10.4018/ijisp.291701 ◽ 2022 ◽ Vol 16 (1) ◽ pp. 0-0

Keyword(s): Intrusion Detection ◽ Intrusion Detection System ◽ Detection System ◽ Representation Learning ◽ Detection Methods ◽ Network Intrusion Detection ◽ Moving Averages ◽ Malicious Activity ◽ Network Intrusion ◽ Individual Observation

An autoencoder has the potential to overcome the limitations of current intrusion detection methods by recognizing benign user activity rather than differentiating between benign and malicious activity. However, the line separating the two is blurry, with significant overlap. The first part of this study investigates the rationale behind this overlap. The results suggest that although a subset of traffic cannot be separated without labels, timestamps can be leveraged to identify activity that does not conform to the normal or expected behavior of the network. The second part aims to eliminate the dependence on visual inspection by exploring automation. The trend of errors for HTTP traffic was modeled chronologically using resampled data and moving averages. This model successfully identified attacks that had been orchestrated over HTTP within their respective time slots. These results support the hypothesis that it is technically feasible to build an anomaly-based intrusion detection system in which individual observations need not be categorized.
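
The chronological modeling step (resampling errors into time slots and tracking them with a moving average) might look roughly like the Python sketch below. It is an assumption-laden illustration rather than the study's pipeline: the slot width, window length, threshold factor, and the names `resample_mean` and `flag_slots` are all hypothetical, and the error values stand in for autoencoder reconstruction errors.

```python
from collections import defaultdict


def resample_mean(records, slot_seconds=60):
    """Average error per fixed-width time slot.

    `records` is an iterable of (timestamp_seconds, error) pairs.
    """
    buckets = defaultdict(list)
    for ts, err in records:
        buckets[int(ts // slot_seconds)].append(err)
    return {slot: sum(v) / len(v) for slot, v in sorted(buckets.items())}


def flag_slots(slot_means, window=3, factor=2.0):
    """Flag slots whose mean error exceeds `factor` times the moving
    average of the preceding `window` observed slots."""
    slots = list(slot_means)
    flagged = []
    for i in range(window, len(slots)):
        history = [slot_means[s] for s in slots[i - window:i]]
        baseline = sum(history) / window
        if slot_means[slots[i]] > factor * baseline:
            flagged.append(slots[i])
    return flagged


# Hypothetical (timestamp, error) pairs; the burst falls in slot 3.
records = [(0, 0.1), (30, 0.1), (60, 0.1), (90, 0.1),
           (120, 0.1), (180, 0.9), (240, 0.1)]
print(flag_slots(resample_mean(records)))
```

Note that this sketch treats the observed slots as consecutive; a slot with no traffic is skipped rather than counted as zero, which may or may not be appropriate for a given deployment.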

Network Intrusion Detection Methods Based on Deep Learning

Recent Patents on Engineering ◽ 10.2174/1872212114999200403092708 ◽ 2020 ◽ Vol 14

Author(s): Xiangwen Li ◽ Shuang Zhang

Keyword(s): Deep Learning ◽ Intrusion Detection ◽ False Positive Rate ◽ Detection Algorithm ◽ Detection Methods ◽ Network Intrusion Detection ◽ Test Time ◽ Data Set ◽ Network Intrusion ◽ Kdd Cup 99

To detect network attacks more effectively, this study uses honeypot techniques to collect the latest network attack data and proposes network intrusion detection classification models based on deep learning that combine DNN and LSTM models. Experiments showed that models trained on the collected data set achieved a better detection rate and false positive rate than the model trained on KDD CUP 99. The DNN-LSTM intrusion detection algorithm proposed in this study also gives better results than the KDD CUP 99 training model. Compared with other algorithms such as LeNet, the DNN-LSTM intrusion detection algorithm has a shorter classification test time along with better accuracy and recall.

A Hybrid NIDS Model Using Artificial Neural Network and D-S Evidence

International Journal of Digital Crime and Forensics ◽ 10.4018/ijdcf.2016010103 ◽ 2016 ◽ Vol 8 (1) ◽ pp. 37-50 ◽ Cited By ~ 1

Author(s): Chunlin Lu ◽ Yue Li ◽ Mingjie Ma ◽ Na Li

Keyword(s): Neural Network ◽ Intrusion Detection ◽ Bp Neural Network ◽ False Positive Rate ◽ Experimental Simulation ◽ Detection Methods ◽ General Process ◽ Training Time ◽ Network Intrusion ◽ Artificial Neural

Artificial Neural Networks (ANNs), especially back-propagation (BP) neural networks, can improve the performance of intrusion detection systems. However, current network intrusion detection methods still need improvement in detection precision (especially for low-frequency attacks), detection stability, and training time. This paper proposes a new model based on an optimized BP neural network and Dempster-Shafer theory to solve these problems and help NIDS achieve a higher detection rate, a lower false positive rate, and stronger stability. The general process of the authors' model is as follows. First, the main extracted features are divided into several different feature subsets. Then, different ANN models are trained on the different feature subsets to build the detection engine. Finally, D-S evidence theory is employed to integrate these results and obtain the final result. The effectiveness of this method is verified by experimental simulation using the KDD Cup 1999 dataset.
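
The final fusion step relies on Dempster's rule of combination, which can be sketched in Python as follows. The rule itself is standard; everything else here is an assumption for illustration: the two-class frame, the mass values, and the function name `dempster_combine` are hypothetical and do not come from the paper, which does not state how the ANN outputs are mapped to mass functions.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to its mass.
    Mass assigned to pairs of disjoint sets is treated as conflict
    and removed by renormalizing the remaining mass.
    """
    combined = {}
    conflict = 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + wa * wb
            else:
                conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


ATTACK = frozenset({"attack"})
NORMAL = frozenset({"normal"})
EITHER = ATTACK | NORMAL  # mass left uncommitted

# Hypothetical outputs of two classifiers trained on different feature subsets.
m_a = {ATTACK: 0.6, NORMAL: 0.1, EITHER: 0.3}
m_b = {ATTACK: 0.5, NORMAL: 0.2, EITHER: 0.3}
fused = dempster_combine(m_a, m_b)
print(max(fused, key=fused.get))
```

When both sources lean the same way, the fused mass on that hypothesis ends up higher than either input, which is the motivation for combining several weaker detectors.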
