Identifying functions in binary code with reverse extended control flow graphs

When developing a software system, there are a number of principles, paradigms, and tools available to choose from. For a specific platform or programming language, a standard way can usually be found to achieve the ultimate system; for example, a combination of an incremental development process, object-oriented analysis and design, and a well-supported CASE (Computer-Aided Software Engineering) tool. Regardless of the technology adopted, the final outcome of software development is always a working software system. However, when it comes to software reengineering, there is rather less consensus on either approaches or outcomes. Shall we use black-box or white-box reverse engineering for program understanding? Shall we produce data and control flow graphs, or some kind of formal specification, as the output of analysis? Each of these techniques has its pros and cons in tackling various software reengineering problems, and none of them on its own suffices for a whole reengineering project. A proper integration of various techniques, each capable of solving a specific issue, could be an effective way to unravel a complicated software system. This kind of integration has to be done from an architectural point of view. One of the most exciting outcomes of recent efforts on software architecture is the Object Management Group’s (OMG) Model-Driven Architecture (MDA). MDA provides a unified framework for developing modern middleware-based distributed systems, and also a definite goal for software reengineering. This chapter presents a unified software reengineering methodology based on Model-Driven Architecture, which consists of a framework, a process, and related techniques.

Control-Flow Semantics for Assembly-Level Data-Flow Graphs

Relational Methods in Computer Science - Lecture Notes in Computer Science ◽ 10.1007/11734673_12 ◽ 2006 ◽ pp. 147-160 ◽ Cited By ~ 4

Author(s): Wolfram Kahl ◽ Christopher K. Anand ◽ Jacques Carette

Keyword(s): Data Flow ◽ Control Flow ◽ Level Data ◽ Data Flow Graphs ◽ Flow Graphs

Generating Control Flow Graphs from NATURAL

Advances in Intelligent Systems and Computing - Information Technology - New Generations ◽ 10.1007/978-3-319-54978-1_52 ◽ 2017 ◽ pp. 395-402

Author(s): Strauss Cunha Carvalho ◽ Renê Esteves Maria ◽ Leonardo Schmitt ◽ Luiz Alberto Vieira Dias

Keyword(s): Control Flow ◽ Flow Graphs

Order Matters: Semantic-Aware Neural Networks for Binary Code Similarity Detection

Proceedings of the AAAI Conference on Artificial Intelligence ◽ 10.1609/aaai.v34i01.5466 ◽ 2020 ◽ Vol 34 (01) ◽ pp. 1145-1152 ◽ Cited By ~ 1

Author(s): Zeping Yu ◽ Rui Cao ◽ Qiyi Tang ◽ Sen Nie ◽ Junzhou Huang ◽ ...

Keyword(s): Neural Network ◽ Neural Networks ◽ Computer Security ◽ Semantic Information ◽ Binary Code ◽ Graph Matching ◽ Control Flow ◽ Binary Function ◽ Similarity Detection ◽ Block Level

Binary code similarity detection, whose goal is to detect similar binary functions without having access to the source code, is an essential task in computer security. Traditional methods usually use graph matching algorithms, which are slow and inaccurate. Recently, neural network-based approaches have made great achievements. A binary function is first represented as a control-flow graph (CFG) with manually selected block features, and then a graph neural network (GNN) is adopted to compute the graph embedding. While these methods are effective and efficient, they cannot capture enough semantic information of the binary code. In this paper we propose semantic-aware neural networks to extract the semantic information of the binary code. Specifically, we use BERT to pre-train the binary code on one token-level task, one block-level task, and two graph-level tasks. Moreover, we find that the order of the CFG's nodes is important for graph similarity detection, so we adopt a convolutional neural network (CNN) on adjacency matrices to extract the order information. We conduct experiments on two tasks with four datasets. The results demonstrate that our method outperforms the state-of-the-art models.
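The remark about node order can be illustrated with a small sketch. It is not the model from the paper: the toy CFG, the block names, the two node orderings, and the fixed 2x2 kernel are all assumptions made for illustration, and a real CNN would learn its kernels. The sketch only shows that the same graph yields different adjacency matrices, and hence different convolution outputs, depending on how its nodes are ordered.

```python
# Hypothetical toy CFG of a simple loop; block names and edges are invented.
EDGES = [("entry", "check"), ("check", "body"), ("check", "exit"), ("body", "check")]


def adjacency_matrix(order, edges):
    """Return the 0/1 adjacency matrix with rows and columns in `order`."""
    index = {name: i for i, name in enumerate(order)}
    matrix = [[0] * len(order) for _ in order]
    for src, dst in edges:
        matrix[index[src]][index[dst]] = 1
    return matrix


def conv2d_valid(matrix, kernel):
    """Naive 'valid' 2D cross-correlation, the basic operation of a CNN layer."""
    k, n = len(kernel), len(matrix)
    return [
        [
            sum(kernel[i][j] * matrix[r + i][c + j] for i in range(k) for j in range(k))
            for c in range(n - k + 1)
        ]
        for r in range(n - k + 1)
    ]


# Assumed orderings: one following a plausible block layout, one shuffled.
layout_order = ["entry", "check", "body", "exit"]
shuffled_order = ["body", "exit", "entry", "check"]

a_layout = adjacency_matrix(layout_order, EDGES)
a_shuffled = adjacency_matrix(shuffled_order, EDGES)

# Arbitrary fixed kernel standing in for a learned one.
KERNEL = [[1, 0], [0, 1]]
f_layout = conv2d_valid(a_layout, KERNEL)
f_shuffled = conv2d_valid(a_shuffled, KERNEL)

for name, m in [("layout", a_layout), ("shuffled", a_shuffled)]:
    print(name, m)
print("feature maps differ:", f_layout != f_shuffled)
```

Both matrices describe the same four edges, but a convolution sees different local patterns in each, which is one way to read the claim that node order carries information a permutation-invariant graph embedding would discard.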
