Identifying functions in binary code with reverse extended control flow graphs

2015 ◽  
Vol 27 (10) ◽  
pp. 793-820
Author(s):  
Jing Qiu ◽  
Xiaohong Su ◽  
Peijun Ma
Keyword(s):  
Author(s):  
Rémi Géraud ◽  
Mirko Koscina ◽  
Paul Lenczner ◽  
David Naccache ◽  
David Saulpic
Keyword(s):  

Author(s):  
Bing Qiao ◽  
Hongji Yang ◽  
Alan O’Callaghan

When developing a software system, there are a number of principles, paradigms, and tools available to choose from. For a specific platform or programming language, a standard way can usually be found to archive the ultimate system; for example, a combination of an incremental development process, object-oriented analysis and design, and a well supported CASE (Computer-Aided Software Engineering) tool. Regardless of the technology to be adopted, the final outcome of the software development is always a working software system. However, when it comes to software reengineering, there is rather less consensus on either approaches or outcomes. Shall we use black-box or white-box reverse engineering for program understanding? Shall we produce data and control flow graphs, or some kind of formal specifications as the output of analysis? Each of these techniques has its pros and cons of tackling various software reengineering problems, and none of them on its own suffices to a whole reengineering project. A proper integration of various techniques capable of solving a specific issue could be an effective way to unravel a complicated software system. This kind of integration has to be done from an architectural point of view. One of the most exciting outcomes of recent efforts on software architecture is the Object Management Group’s (OMG) Model-Driven Architecture (MDA). MDA provides a unified framework for developing middleware-based modern distributed systems, and also a definite goal for software reengineering. This chapter presents a unified software reengineering methodology based on Model-Driven Architecture, which consists of a framework, a process, and related techniques.


Author(s):  
Strauss Cunha Carvalho ◽  
Renê Esteves Maria ◽  
Leonardo Schmitt ◽  
Luiz Alberto Vieira Dias
Keyword(s):  

2020 ◽  
Vol 34 (01) ◽  
pp. 1145-1152 ◽  
Author(s):  
Zeping Yu ◽  
Rui Cao ◽  
Qiyi Tang ◽  
Sen Nie ◽  
Junzhou Huang ◽  
...  

Binary code similarity detection, whose goal is to detect similar binary functions without having access to the source code, is an essential task in computer security. Traditional methods usually use graph matching algorithms, which are slow and inaccurate. Recently, neural network-based approaches have made great achievements. A binary function is first represented as an control-flow graph (CFG) with manually selected block features, and then graph neural network (GNN) is adopted to compute the graph embedding. While these methods are effective and efficient, they could not capture enough semantic information of the binary code. In this paper we propose semantic-aware neural networks to extract the semantic information of the binary code. Specially, we use BERT to pre-train the binary code on one token-level task, one block-level task, and two graph-level tasks. Moreover, we find that the order of the CFG's nodes is important for graph similarity detection, so we adopt convolutional neural network (CNN) on adjacency matrices to extract the order information. We conduct experiments on two tasks with four datasets. The results demonstrate that our method outperforms the state-of-art models.


Sign in / Sign up

Export Citation Format

Share Document