Cross-Platform Binary Code Homology Analysis Based on GRU Graph Embedding

Binary code homology analysis detects whether two pieces of binary code were compiled from the same source code. It is a fundamental technique for many security applications, such as vulnerability search, plagiarism detection, and malware detection. With the increase in critical vulnerabilities in IoT devices, homology analysis is increasingly needed to perform cross-platform vulnerability searches. Existing methods for cross-platform binary code homology detection usually convert binary code to instruction sequences and embed the semantics of those sequences as if they were natural language. However, the gap between natural language and binary code is large, and the spatial features of the binary code are easily lost when only the semantics are compared. In this paper, we propose a GRU-based graph embedding method to compare the homology of binary functions. First, an attributed control flow graph (ACFG) is built for each assembly function; then a GRU-based graph embedding neural network generates an embedding vector for the ACFG; finally, the homology of the binary code is determined by calculating the distance between the embedding vectors. The experimental results show that our method greatly improves the detection accuracy on negative samples compared with Gemini, the latest graph embedding-based method for binary code similarity detection.
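
The three-step pipeline in this abstract (ACFG, GRU-based graph embedding, vector distance) can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `GRUGraphEmbedder`, the random untrained weights, the sum aggregation of neighbour states, and the use of cosine similarity as the distance measure are all assumptions made for illustration.

```python
import math
import random


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rand_matrix(rng, rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class GRUGraphEmbedder:
    """Hypothetical GRU-style message passing over an ACFG (random, untrained weights)."""

    def __init__(self, feat_dim, hidden_dim, rounds=3, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.rounds = rounds
        in_dim = feat_dim + hidden_dim  # block attributes + aggregated neighbour state
        self.Wz, self.Uz = rand_matrix(rng, hidden_dim, in_dim), rand_matrix(rng, hidden_dim, hidden_dim)
        self.Wr, self.Ur = rand_matrix(rng, hidden_dim, in_dim), rand_matrix(rng, hidden_dim, hidden_dim)
        self.Wh, self.Uh = rand_matrix(rng, hidden_dim, in_dim), rand_matrix(rng, hidden_dim, hidden_dim)

    def _gru_step(self, x, h):
        # Standard GRU cell: update gate z, reset gate r, candidate state.
        z = [sigmoid(a + b) for a, b in zip(matvec(self.Wz, x), matvec(self.Uz, h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.Wr, x), matvec(self.Ur, h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in zip(matvec(self.Wh, x), matvec(self.Uh, rh))]
        return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def embed(self, features, edges):
        """features: one attribute list per basic block; edges: (src, dst) block indices."""
        n = len(features)
        predecessors = [[] for _ in range(n)]
        for src, dst in edges:
            predecessors[dst].append(src)  # state flows along control-flow edges
        h = [[0.0] * self.hidden_dim for _ in range(n)]
        for _ in range(self.rounds):
            new_h = []
            for v in range(n):
                msg = [sum(h[u][k] for u in predecessors[v]) for k in range(self.hidden_dim)]
                new_h.append(self._gru_step(features[v] + msg, h[v]))
            h = new_h
        # Graph embedding: sum of the final node states.
        return [sum(h[v][k] for v in range(n)) for k in range(self.hidden_dim)]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


# Toy ACFG: three basic blocks with three made-up attributes each.
model = GRUGraphEmbedder(feat_dim=3, hidden_dim=4)
features = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [3.0, 1.0, 0.0]]
edges = [(0, 1), (0, 2), (1, 2)]
embedding = model.embed(features, edges)
print(cosine_similarity(embedding, model.embed(features, edges)))
```

In a trained system the weights would be learned so that functions compiled from the same source land close together; here they are random, so only the data flow is meaningful.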

Neural Network-based Graph Embedding for Cross-Platform Binary Code Similarity Detection

Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security - CCS '17 ◽

10.1145/3133956.3134018 ◽

2017 ◽

Cited By ~ 65

Author(s):

Xiaojun Xu ◽

Chang Liu ◽

Qian Feng ◽

Heng Yin ◽

Le Song ◽

...

Keyword(s):

Neural Network ◽

Binary Code ◽

Graph Embedding ◽

Similarity Detection ◽

Cross Platform

Cross-platform binary code similarity detection based on NMT and graph embedding

Mathematical Biosciences and Engineering ◽

10.3934/mbe.2021230 ◽

2021 ◽

Vol 18 (4) ◽

pp. 4528-4551

Author(s):

Xiaodong Zhu ◽

Liehui Jiang ◽

Zeng Chen

Keyword(s):

Binary Code ◽

Graph Embedding ◽

Similarity Detection ◽

Cross Platform

Hierarchical Attention Graph Embedding Networks for Binary Code Similarity against Compilation Diversity

Security and Communication Networks ◽

10.1155/2021/9954520 ◽

2021 ◽

Vol 2021 ◽

pp. 1-19

Author(s):

Yan Wang ◽

Peng Jia ◽

Cheng Huang ◽

Jiayong Liu ◽

Peisong He

Keyword(s):

Hierarchical Structure ◽

High Performance ◽

Binary Code ◽

Graph Embedding ◽

Control Flow ◽

Basic Block ◽

Control Flow Graph ◽

Semantic Features ◽

Similarity Comparison ◽

Flow Graph

Binary code similarity comparison determines whether two functions are similar by considering only their compiled form. It has many applications, including clone detection, malware classification, and vulnerability discovery. However, designing a robust code similarity comparison engine is challenging because different compilation settings make logically similar assembly functions appear very different. Moreover, existing approaches suffer from high performance overheads, low robustness, or poor scalability. In this paper, a novel solution, HBinSim, is proposed that employs multiview features of the function to address these challenges. It first extracts the syntactic and semantic features of each basic block by static analysis. HBinSim then analyzes the function and constructs a syntactic attributed control flow graph and a semantic attributed control flow graph for each function. Next, a hierarchical attention graph embedding network is designed for graph-structured data processing. The network model has a hierarchical structure that mirrors the hierarchical structure of the function. It has three levels of attention mechanisms, applied at the instruction, basic block, and function levels, enabling it to attend differentially to more and less critical content when constructing the function representation. We conduct extensive experiments to evaluate its effectiveness and efficiency. The results show that our tool outperforms state-of-the-art binary code similarity comparison tools by a large margin in clone search across diverse compilation settings. A real-world vulnerability search case further demonstrates the usefulness of our system.
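
The idea of attending differentially to more and less critical content at successive levels can be sketched with plain softmax attention pooling. This is a simplified two-level illustration (instructions to basic block, basic blocks to function), not HBinSim's network: the function names and the context vectors `instr_ctx` and `block_ctx` are hypothetical, and in a real model the context vectors would be learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, context):
    """Weight each vector by softmax(vector . context) and return the weighted sum."""
    scores = [sum(v_i * c_i for v_i, c_i in zip(v, context)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]
    return pooled, weights


def embed_function(blocks, instr_ctx, block_ctx):
    """blocks: list of basic blocks, each a list of instruction vectors."""
    # Level 1: pool instruction vectors into one vector per basic block.
    block_vecs = [attention_pool(instrs, instr_ctx)[0] for instrs in blocks]
    # Level 2: pool basic-block vectors into one function vector.
    return attention_pool(block_vecs, block_ctx)


# Toy function with two basic blocks and made-up 2-dimensional instruction vectors.
blocks = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [2.0, 0.0], [0.0, 2.0]],
]
function_vec, block_weights = embed_function(blocks, instr_ctx=[1.0, 0.0], block_ctx=[0.0, 1.0])
print(function_vec, block_weights)
```

The attention weights at each level sum to one, so the pooled vector is always a convex combination of its inputs.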

Semantic Understanding of Source and Binary Code based on Natural Language Processing

2021 IEEE 4th Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC) ◽

10.1109/imcec51613.2021.9482032 ◽

2021 ◽

Author(s):

Zhongtang Zhang ◽

Shengli Liu ◽

Qichao Yang ◽

Shichen Guo

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Binary Code

Adoption of Machine Learning With Adaptive Approach for Securing CPS

Handbook of Research on Machine and Deep Learning Applications for Cyber Security - Advances in Information Security, Privacy, and Ethics ◽

10.4018/978-1-5225-9611-0.ch018 ◽

2020 ◽

pp. 388-415

Author(s):

Rama Mercy Sam Sigamani

Keyword(s):

Machine Learning ◽

Cyber Security ◽

Smart Grids ◽

False Negative ◽

False Negative Rate ◽

Cyber Physical Systems ◽

Detection Accuracy ◽

Physical Systems ◽

Positive Rate ◽

Iot Devices

The safety and security of cyber-physical systems (CPS) is a major concern, given their incorporated components with interface standards, communication protocols, physical operational characteristics, and real-time sensing. The seamless integration of computational and distributed physical components with intelligent mechanisms increases the adaptability, autonomy, efficiency, functionality, reliability, safety, and usability of cyber-physical systems. In IoT-enabled cyber-physical systems, cyber security is an essential challenge because of the IoT devices in industrial control systems. Computational intelligence algorithms have been proposed to detect and mitigate cyber-attacks in cyber-physical systems, smart grids, and power systems. The various machine learning approaches to securing CPS are examined based on performance metrics such as detection accuracy, average classification rate, false negative rate, false positive rate, and processing time per packet. A unique feature of CPS is considered through structural adaptation, which facilitates a self-healing CPS.
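
Three of the metrics named in this abstract have standard definitions in terms of confusion-matrix counts. The helper below is a small illustrative sketch; the function name `detection_metrics` and the example counts are made up and do not come from the chapter.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute standard detection metrics from confusion-matrix counts.

    tp: attacks flagged as attacks      fp: benign traffic flagged as attacks
    tn: benign traffic passed as benign fn: attacks passed as benign
    """
    total = tp + fp + tn + fn
    return {
        # Fraction of all samples classified correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Fraction of benign samples wrongly flagged.
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        # Fraction of attacks that were missed.
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }


# Made-up counts: 100 attacks (90 caught) and 100 benign samples (95 passed).
print(detection_metrics(tp=90, fp=5, tn=95, fn=10))
```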

A Deep Paraphrase Identification Model Interacting Semantics with Syntax

Complexity ◽

10.1155/2020/9757032 ◽

2020 ◽

Vol 2020 ◽

pp. 1-14

Author(s):

Leilei Kong ◽

Zhongyuan Han ◽

Yong Han ◽

Haoliang Qi

Keyword(s):

Neural Network ◽

Natural Language ◽

Convolutional Neural Network ◽

Semantic Representation ◽

Experimental Results ◽

Plagiarism Detection ◽

Linguistic Features ◽

Syntactic Structures ◽

Syntactic Features ◽

Identification Model

Paraphrase identification is central to many natural language applications. Based on the insight that a successful paraphrase identification model needs to adequately capture the semantics of the language objects as well as their interactions, we present a deep paraphrase identification model interacting semantics with syntax (DPIM-ISS). DPIM-ISS introduces the linguistic features manifested in syntactic features to produce more explicit structures and encodes the semantic representation of a sentence on different syntactic structures by interacting semantics with syntax. DPIM-ISS then learns the paraphrase pattern from this representation by exploiting a convolutional neural network with a convolution-pooling structure. Experiments are conducted on the Microsoft Research Paraphrase (MSRP) corpus, the PAN 2010 corpus, and the PAN 2012 corpus for paraphrase plagiarism detection. The experimental results demonstrate that DPIM-ISS outperforms the classical word-matching approaches, the syntax-similarity approaches, the convolutional neural network-based models, and some deep paraphrase identification models.

Order Matters: Semantic-Aware Neural Networks for Binary Code Similarity Detection

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i01.5466 ◽

2020 ◽

Vol 34 (01) ◽

pp. 1145-1152 ◽

Cited By ~ 1

Author(s):

Zeping Yu ◽

Rui Cao ◽

Qiyi Tang ◽

Sen Nie ◽

Junzhou Huang ◽

...

Keyword(s):

Neural Network ◽

Neural Networks ◽

Computer Security ◽

Semantic Information ◽

Binary Code ◽

Graph Matching ◽

Control Flow ◽

Binary Function ◽

Similarity Detection ◽

Block Level

Binary code similarity detection, whose goal is to detect similar binary functions without having access to the source code, is an essential task in computer security. Traditional methods usually use graph matching algorithms, which are slow and inaccurate. Recently, neural network-based approaches have made great achievements. A binary function is first represented as a control-flow graph (CFG) with manually selected block features, and then a graph neural network (GNN) is adopted to compute the graph embedding. While these methods are effective and efficient, they cannot capture enough semantic information of the binary code. In this paper, we propose semantic-aware neural networks to extract the semantic information of the binary code. Specifically, we use BERT to pre-train the binary code on one token-level task, one block-level task, and two graph-level tasks. Moreover, we find that the order of the CFG's nodes is important for graph similarity detection, so we adopt a convolutional neural network (CNN) on adjacency matrices to extract the order information. We conduct experiments on two tasks with four datasets. The results demonstrate that our method outperforms the state-of-the-art models.
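
The remark that node order matters can be made concrete: the adjacency matrix a CNN would see depends on how the CFG's basic blocks are numbered, even though the graph itself is unchanged. The sketch below only illustrates that dependence; the helper `adjacency_matrix` and the toy diamond-shaped CFG are assumptions for illustration, not code from the paper.

```python
def adjacency_matrix(num_nodes, edges, order=None):
    """Build a 0/1 adjacency matrix for a CFG under a given node ordering.

    order: list giving the node placed at each row/column index
           (defaults to 0, 1, ..., num_nodes - 1).
    """
    order = list(range(num_nodes)) if order is None else order
    position = {node: index for index, node in enumerate(order)}
    matrix = [[0] * num_nodes for _ in range(num_nodes)]
    for src, dst in edges:
        matrix[position[src]][position[dst]] = 1
    return matrix


# Toy diamond-shaped CFG: entry block 0 branches to 1 and 2, which join at 3.
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]

in_order = adjacency_matrix(4, edges)                       # blocks in address order
reversed_order = adjacency_matrix(4, edges, [3, 2, 1, 0])   # same graph, renumbered

for row in in_order:
    print(row)
print()
for row in reversed_order:
    print(row)
```

Both matrices encode the same four edges, but the ones sit in different cells, which is the kind of ordering information a convolution over the matrix can pick up.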

A novel Ensemble of Hybrid Intrusion Detection System for Detecting Internet of Things Attacks

Electronics ◽

10.3390/electronics8111210 ◽

2019 ◽

Vol 8 (11) ◽

pp. 1210 ◽

Cited By ~ 11

Author(s):

Khraisat ◽

Gondal ◽

Vamplew ◽

Kamruzzaman ◽

Alazab

Keyword(s):

Internet Of Things ◽

Intrusion Detection ◽

Intrusion Detection System ◽

Detection System ◽

False Positive Rate ◽

Support Vector ◽

Detection Accuracy ◽

Lower False Positive Rate ◽

Positive Rate ◽

Iot Devices

The Internet of Things (IoT) has been rapidly evolving towards making a greater impact on everything from everyday life to large industrial systems. Unfortunately, this has attracted the attention of cybercriminals, who have made IoT a target of malicious activities, opening the door to possible attacks on the end nodes. Due to the large number and diverse types of IoT devices, it is a challenging task to protect the IoT infrastructure using a traditional intrusion detection system. To protect IoT devices, a novel ensemble Hybrid Intrusion Detection System (HIDS) is proposed by combining a C5 classifier and a One Class Support Vector Machine classifier. HIDS combines the advantages of a Signature Intrusion Detection System (SIDS) and an Anomaly-based Intrusion Detection System (AIDS). The aim of this framework is to detect both well-known intrusions and zero-day attacks with high detection accuracy and low false-alarm rates. The proposed HIDS is evaluated using the Bot-IoT dataset, which includes legitimate IoT network traffic and several types of attacks. Experiments show that the proposed hybrid IDS provides a higher detection rate and a lower false positive rate than the SIDS and AIDS techniques.
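
The division of labour between a signature stage (known intrusions) and an anomaly stage (zero-day attacks) can be sketched as a simple cascade. This is a deliberately simplified stand-in, not the paper's ensemble: a set lookup replaces the C5 classifier, a z-score test on a single numeric feature replaces the One Class Support Vector Machine, and the class name `HybridIDS`, the signature strings, and the sample values are all hypothetical.

```python
import statistics


class HybridIDS:
    """Toy two-stage detector: signature lookup first, then a one-class anomaly check."""

    def __init__(self, signatures, normal_samples, threshold=3.0):
        self.signatures = set(signatures)
        # Model "normal" behaviour from benign traffic only (one-class training).
        self.mean = statistics.fmean(normal_samples)
        self.stdev = statistics.pstdev(normal_samples) or 1.0  # avoid division by zero
        self.threshold = threshold

    def classify(self, signature, value):
        # Stage 1 (SIDS-like): match against known attack signatures.
        if signature in self.signatures:
            return "known-attack"
        # Stage 2 (AIDS-like): flag traffic that deviates strongly from normal.
        if abs(value - self.mean) / self.stdev > self.threshold:
            return "anomaly"
        return "benign"


# Made-up data: packets-per-second readings observed during normal operation.
ids = HybridIDS(signatures={"syn-flood"}, normal_samples=[10, 11, 9, 10, 10])
print(ids.classify("syn-flood", 10))   # matches a stored signature
print(ids.classify("unknown", 50))     # far from the normal profile
print(ids.classify("unknown", 10.5))   # close to the normal profile
```

The cascade shows why the combination helps: the signature stage alone would miss the second sample, while the anomaly stage alone would miss a known attack that happens to look statistically normal.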
