Hierarchical Attention Graph Embedding Networks for Binary Code Similarity against Compilation Diversity

Binary code similarity comparison determines whether two functions are similar by considering only their compiled form. It has many applications, including clone detection, malware classification, and vulnerability discovery. However, designing a robust code similarity comparison engine is challenging, since different compilation settings can make logically similar assembly functions appear very different. Moreover, existing approaches suffer from high performance overhead, low robustness, or poor scalability. This paper proposes HBinSim, a novel solution that addresses these challenges by using multiview features of the function. It first extracts the syntactic and semantic features of each basic block by static analysis. HBinSim then analyzes the function further and constructs a syntactic attribute control flow graph and a semantic attribute control flow graph for each function. Next, a hierarchical attention graph embedding network is designed to process the graph-structured data. The network model has a hierarchical structure that mirrors the hierarchical structure of the function. It applies three levels of attention mechanisms, at the instruction, basic block, and function levels, which lets it attend differentially to more and less critical content when constructing the function representation. We conduct extensive experiments to evaluate its effectiveness and efficiency. The results show that our tool outperforms state-of-the-art binary code similarity comparison tools by a large margin in clone search across diverse compilation settings. A real-world vulnerability search case study further demonstrates the usefulness of our system.
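The idea of hierarchical attention described in the abstract can be illustrated with a minimal sketch. This is not the HBinSim network: it ignores the graph structure and the syntactic/semantic views, shows only two pooling steps (instructions into a basic block, basic blocks into a function), and replaces learned parameters with fixed, hypothetical query vectors (`instr_query`, `block_query`). All function names here are assumptions made for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(vectors, query):
    # Score each vector by its dot product with the query, then return
    # the softmax-weighted sum. In a trained model the query would be a
    # learned parameter; here it is a fixed, hypothetical vector.
    scores = [sum(v_i * q_i for v_i, q_i in zip(v, query)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]

def embed_function(blocks, instr_query, block_query):
    # blocks: list of basic blocks, each a list of instruction vectors.
    # Level 1: pool instructions into one vector per basic block.
    block_vecs = [attention_pool(instrs, instr_query) for instrs in blocks]
    # Level 2: pool basic-block vectors into one function vector.
    return attention_pool(block_vecs, block_query)

def cosine(a, b):
    # Cosine similarity between two function embeddings.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy functions with two basic blocks each (2-dimensional instruction vectors).
func_a = [[[1.0, 0.0], [0.5, 0.5]], [[0.0, 1.0]]]
func_b = [[[0.9, 0.1], [0.4, 0.6]], [[0.1, 0.9]]]
emb_a = embed_function(func_a, [1.0, 0.0], [0.0, 1.0])
emb_b = embed_function(func_b, [1.0, 0.0], [0.0, 1.0])
similarity = cosine(emb_a, emb_b)
```

Two functions would then be compared by the cosine similarity of their embeddings. A real model would learn the queries and propagate information along control-flow edges before pooling.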

Research on Function Model of Binary Code

Applied Mechanics and Materials ◽ 10.4028/www.scientific.net/amm.198-199.374 ◽ 2012 ◽ Vol 198-199 ◽ pp. 374-379

Author(s): Wei Gao ◽ Jing He ◽ Ke An ◽ Hong Xia Gao ◽ Cheng Liu

Keyword(s): Binary Code ◽ Functional Model ◽ Control Flow ◽ Control Flow Graph ◽ Function Model ◽ Flow Graph ◽ Storage Structure ◽ Reverse Analysis

This paper presents a reverse-analysis and crack-finding methodology for OS trusted mechanisms that is centred on the functional model. It focuses on establishing the functional model of binary code and on the method for describing it. It designs a general XML-based storage structure for storing and transforming information across the levels of the layered functional model of the trusted mechanism, and develops an IDA plugin that automatically extracts routine control flow graphs and the layered functional model from binary code and stores them.
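As an illustration of XML-based storage for a control flow graph, the following minimal sketch serializes a function's basic blocks and edges to XML and reads them back. The element and attribute names (`function`, `block`, `insn`, `edge`, `src`, `dst`) are hypothetical; the abstract does not give the paper's actual schema, and the IDA extraction step is not reproduced here.

```python
import xml.etree.ElementTree as ET

def cfg_to_xml(func_name, blocks, edges):
    # blocks: dict mapping block id -> list of instruction strings.
    # edges: list of (source block id, destination block id) pairs.
    root = ET.Element("function", name=func_name)
    for block_id, instructions in blocks.items():
        block_el = ET.SubElement(root, "block", id=str(block_id))
        for ins in instructions:
            ET.SubElement(block_el, "insn").text = ins
    for src, dst in edges:
        ET.SubElement(root, "edge", src=str(src), dst=str(dst))
    return ET.tostring(root, encoding="unicode")

def xml_to_cfg(xml_text):
    # Inverse of cfg_to_xml; block ids come back as strings.
    root = ET.fromstring(xml_text)
    blocks = {
        b.get("id"): [i.text for i in b.findall("insn")]
        for b in root.findall("block")
    }
    edges = [(e.get("src"), e.get("dst")) for e in root.findall("edge")]
    return root.get("name"), blocks, edges

# A toy function with a branch from block 0 to blocks 1 and 2.
blocks = {"0": ["cmp eax, 0", "je L2"], "1": ["inc eax"], "2": ["ret"]}
edges = [("0", "1"), ("0", "2"), ("1", "2")]
xml_text = cfg_to_xml("check_flag", blocks, edges)
name, blocks_back, edges_back = xml_to_cfg(xml_text)
```

Keeping edges as separate elements rather than nesting successors inside blocks makes it straightforward to add further levels of the model later without changing how blocks are stored.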

A LOOP-BASED SCHEDULING ALGORITHM FOR HARDWARE DESCRIPTION LANGUAGES

Parallel Processing Letters ◽ 10.1142/s0129626494000326 ◽ 1994 ◽ Vol 04 (03) ◽ pp. 351-364 ◽ Cited By ~ 5

Author(s): MAHER RAHMOUNI ◽ KEVIN O’BRIEN ◽ AHMED A. JERRAYA

Keyword(s): High Performance ◽ Scheduling Algorithm ◽ Control Flow ◽ Control Flow Graph ◽ Hardware Description Languages ◽ Loop Scheduling ◽ Flow Graph ◽ Loop Feedback ◽ Hardware Description ◽ Description Languages

This paper presents Dynamic Loop Scheduling (DLS), a loop-based algorithm that can efficiently schedule large, control-flow dominated designs. Its results compare favourably with those of traditional path-based approaches, and it requires much less overhead to implement. The high performance of DLS is due mainly to two features: the inclusion of loop feedback edges in the control-flow graph, and the interruption of path generation on the fly. The latter eliminates the generation of false paths and thereby avoids the path explosion problem.
