Towards Extracting Control Flow Abstraction with Static Disassembly for Binary Code

Binary code similarity detection, which aims to detect similar binary functions without access to the source code, is an essential task in computer security. Traditional methods usually rely on graph matching algorithms, which are slow and inaccurate. Recently, neural network-based approaches have achieved strong results: a binary function is first represented as a control-flow graph (CFG) with manually selected block features, and a graph neural network (GNN) is then used to compute the graph embedding. While these methods are effective and efficient, they cannot capture enough semantic information from the binary code. In this paper, we propose semantic-aware neural networks to extract the semantic information of the binary code. Specifically, we use BERT to pre-train the binary code on one token-level task, one block-level task, and two graph-level tasks. Moreover, we find that the order of the CFG's nodes is important for graph similarity detection, so we apply a convolutional neural network (CNN) to adjacency matrices to extract the order information. We conduct experiments on two tasks with four datasets. The results demonstrate that our method outperforms the state-of-the-art models.
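The point about node order can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `adjacency_matrix` helper, the node names, and the example CFG are hypothetical. It only shows that the same graph produces different adjacency matrices under different node orderings, which is the kind of ordering information a model reading the matrix could pick up.

```python
def adjacency_matrix(edges, node_order):
    """Build a 0/1 adjacency matrix whose rows and columns follow node_order."""
    index = {node: i for i, node in enumerate(node_order)}
    n = len(node_order)
    matrix = [[0] * n for _ in range(n)]
    for src, dst in edges:
        matrix[index[src]][index[dst]] = 1
    return matrix


# Hypothetical CFG of a function with a single if/else:
# entry branches to "then" or "else", and both fall through to "exit".
edges = [("entry", "then"), ("entry", "else"), ("then", "exit"), ("else", "exit")]

# Same graph, two node orderings (the second is an arbitrary shuffle).
by_layout = adjacency_matrix(edges, ["entry", "then", "else", "exit"])
shuffled = adjacency_matrix(edges, ["exit", "else", "entry", "then"])

for row in by_layout:
    print(row)
print(by_layout == shuffled)
```

Both matrices encode the same four edges, yet they differ cell by cell, so any model that consumes the raw matrix is sensitive to the ordering chosen.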


BinCFP: Efficient Multi-threaded Binary Code Control Flow Profiling

2016 IEEE 16th International Working Conference on Source Code Analysis and Manipulation (SCAM) ◽

10.1109/scam.2016.21 ◽

2016 ◽

Cited By ~ 1

Author(s):

Jiang Ming ◽

Dinghao Wu

Keyword(s):

Binary Code ◽

Control Flow


Identifying functions in binary code with reverse extended control flow graphs

Journal of Software Evolution and Process ◽

10.1002/smr.1733 ◽

2015 ◽

Vol 27 (10) ◽

pp. 793-820

Author(s):

Jing Qiu ◽

Xiaohong Su ◽

Peijun Ma

Keyword(s):

Binary Code ◽

Control Flow ◽

Flow Graphs


Research on Function Model of Binary Code

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.198-199.374 ◽

2012 ◽

Vol 198-199 ◽

pp. 374-379

Author(s):

Wei Gao ◽

Jing He ◽

Ke An ◽

Hong Xia Gao ◽

Cheng Liu

Keyword(s):

Binary Code ◽

Functional Model ◽

Control Flow ◽

Control Flow Graph ◽

Function Model ◽

Flow Graph ◽

Storage Structure ◽

Reverse Analysis

The paper presents a reverse analysis and crack-finding methodology for OS trusted mechanisms that is centred on a functional model, with a focus on how to establish and describe the functional model of binary code. It also designs a general XML-based storage structure for storing and transforming information across the levels of the trusted mechanism's hierarchical functional model, and develops an IDA plugin that automatically extracts and stores routine control flow graphs and the level functional model from binary code.


Java Ranger at SV-COMP 2020 (Competition Contribution)

Tools and Algorithms for the Construction and Analysis of Systems - Lecture Notes in Computer Science ◽

10.1007/978-3-030-45237-7_27 ◽

2020 ◽

pp. 393-397 ◽

Cited By ~ 3

Author(s):

Vaibhav Sharma ◽

Soha Hussein ◽

Michael W. Whalen ◽

Stephen McCamant ◽

Willem Visser

Keyword(s):

Binary Code ◽

Symbolic Execution ◽

Control Flow ◽

Bounded Control ◽

Java Bytecode ◽

Java Code

Path-merging is a known technique for accelerating symbolic execution. One such technique, named “veritesting” by Avgerinos et al., uses summaries of bounded control-flow regions and has been shown to accelerate symbolic execution of binary code. However, when applied to symbolic execution of Java code, veritesting needs to be extended to summarize dynamically dispatched methods and exceptional control flow. Such an extension of veritesting has been implemented in Java Ranger, which is built as an extension of Symbolic PathFinder, a symbolic executor for Java bytecode. In this paper, we briefly describe the architecture of Java Ranger and describe its setup for SV-COMP 2020.
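As a rough illustration of what summarizing a bounded control-flow region means, the Python sketch below merges the two sides of an if/else into a single symbolic state using if-then-else (`ite`) terms instead of forking into two paths. It is a toy model under stated assumptions, not Java Ranger's or Symbolic PathFinder's API: the `summarize_if_else` helper, the tuple encoding of `ite` terms, and the variable names are all hypothetical.

```python
def summarize_if_else(pre, cond, then_updates, else_updates):
    """Merge both branches of an if/else into one symbolic state.

    pre maps every variable to its symbolic value before the branch.
    then_updates / else_updates map the variables each branch assigns.
    Variables that differ across branches get an ("ite", cond, t, e) term.
    """
    post = dict(pre)
    for var in sorted(set(then_updates) | set(else_updates)):
        then_val = then_updates.get(var, pre[var])
        else_val = else_updates.get(var, pre[var])
        if then_val == else_val:
            post[var] = then_val
        else:
            post[var] = ("ite", cond, then_val, else_val)
    return post


# Hypothetical region: if (c) { x = a; } else { x = b; y = y0; }
pre_state = {"x": "x0", "y": "y0"}
post_state = summarize_if_else(
    pre_state, "c", {"x": "a"}, {"x": "b", "y": "y0"}
)
print(post_state)
```

A symbolic executor that forks would continue with two states after this region; the summary continues with one, at the cost of more complex terms handed to the solver.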


A Hybrid Approach for Control Flow Graph Construction from Binary Code

2013 20th Asia-Pacific Software Engineering Conference (APSEC) ◽

10.1109/apsec.2013.132 ◽

2013 ◽

Cited By ~ 3

Author(s):

Minh Hai Nguyen ◽

Thien Binh Nguyen ◽

Thanh Tho Quan ◽

Mizuhito Ogawa

Keyword(s):

Binary Code ◽

Hybrid Approach ◽

Control Flow ◽

Control Flow Graph ◽

Flow Graph


Hierarchical Attention Graph Embedding Networks for Binary Code Similarity against Compilation Diversity

Security and Communication Networks ◽

10.1155/2021/9954520 ◽

2021 ◽

Vol 2021 ◽

pp. 1-19

Author(s):

Yan Wang ◽

Peng Jia ◽

Cheng Huang ◽

Jiayong Liu ◽

Peisong He

Keyword(s):

Hierarchical Structure ◽

High Performance ◽

Binary Code ◽

Graph Embedding ◽

Control Flow ◽

Basic Block ◽

Control Flow Graph ◽

Semantic Features ◽

Similarity Comparison ◽

Flow Graph

Binary code similarity comparison determines whether two functions are similar by considering only their compiled form. It has many applications, including clone detection, malware classification, and vulnerability discovery. However, it is challenging to design a robust code similarity comparison engine, since different compilation settings can make logically similar assembly functions appear very different. Moreover, existing approaches suffer from high performance overheads, low robustness, or poor scalability. In this paper, a novel solution, HBinSim, is proposed that employs multiview features of the function to address these challenges. It first extracts the syntactic and semantic features of each basic block by static analysis. HBinSim then analyzes the function and constructs a syntactic attribute control flow graph and a semantic attribute control flow graph for each function. Next, a hierarchical attention graph embedding network is designed for graph-structured data processing. The network model has a hierarchical structure that mirrors the hierarchical structure of the function. It has three levels of attention mechanisms, applied at the instruction, basic block, and function levels, enabling it to attend differentially to more and less critical content when constructing the function representation. We conduct extensive experiments to evaluate its effectiveness and efficiency. The results show that our tool outperforms the state-of-the-art binary code similarity comparison tools by a large margin in clone search against compilation diversity. A real-world vulnerability search case further demonstrates the usefulness of our system.
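The idea of attending differentially to more and less critical content can be sketched as generic attention pooling in Python. This is not HBinSim's architecture: in a trained network the scores would be produced by learned parameters, whereas here the `block_scores`, the `block_vectors`, and the helper names are fixed, hypothetical values chosen for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, scores):
    """Weighted sum of vectors, with weights given by softmax(scores)."""
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return pooled, weights


# Hypothetical basic-block vectors and relevance scores for one function.
block_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
block_scores = [0.0, 0.0, 2.0]

function_vector, weights = attention_pool(block_vectors, block_scores)
print(weights)
print(function_vector)
```

The same pooling step could in principle be stacked, first over instructions within a block and then over blocks within a function, which is one way to read the hierarchical structure the abstract describes.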


Cross-Platform Binary Code Homology Analysis Based on GRU Graph Embedding

Security and Communication Networks ◽

10.1155/2021/3095203 ◽

2021 ◽

Vol 2021 ◽

pp. 1-8

Author(s):

Shen Wang ◽

Xunzhi Jiang ◽

Xiangzhan Yu ◽

Xiaohui Su

Keyword(s):

Natural Language ◽

Binary Code ◽

Graph Embedding ◽

Control Flow ◽

Detection Accuracy ◽

Plagiarism Detection ◽

Homology Detection ◽

Homology Analysis ◽

Cross Platform ◽

IoT Devices

Binary code homology analysis refers to detecting whether two pieces of binary code are compiled from the same piece of source code. It is a fundamental technique for many security applications, such as vulnerability search, plagiarism detection, and malware detection. With the increase in critical vulnerabilities in IoT devices, homology analysis is increasingly needed to perform cross-platform vulnerability searches. Existing methods for cross-platform binary code homology detection usually convert binary code to instruction sequences and embed the sequences semantically as if they were natural language. However, the gap between natural language and binary code is large, and the spatial features of the binary code are easily lost when only the semantics are compared. In this paper, we propose a GRU-based graph embedding method to compare the homology of binary functions. First, the attribute control flow graph (ACFG) is built for the assembly function; then a GRU-based graph embedding neural network generates the embedding vector for the ACFG; finally, the homology of the binary code is determined by calculating the distance between the embedding vectors. The experimental results show that our method greatly improves the detection accuracy on negative samples compared with Gemini, the latest graph-embedding-based method for binary code similarity detection.
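The final step, deciding homology from the distance between two embedding vectors, can be sketched in a few lines of Python. The abstract does not say which distance is used, so cosine similarity is an assumption here, as are the `is_homologous` helper, the 0.8 threshold, and the example vectors. The embedding network itself is not modelled.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def is_homologous(u, v, threshold=0.8):
    """Hypothetical decision rule: similar enough embeddings count as homologous."""
    return cosine_similarity(u, v) >= threshold


# Hypothetical embeddings of three functions.
emb_arm = [1.0, 2.0]
emb_x86 = [2.0, 4.0]    # same direction as emb_arm
emb_other = [2.0, -1.0]  # orthogonal to emb_arm

print(is_homologous(emb_arm, emb_x86))
print(is_homologous(emb_arm, emb_other))
```

In practice the threshold would be tuned on labelled pairs, since it trades false positives against false negatives.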
