Machine Code
Recently Published Documents

Total documents: 216 (five years: 43)
H-index: 17 (five years: 3)

2021 ◽  
Vol 7 (4) ◽  
pp. 95-109
Author(s):  
K. Izrailov

Reverse engineering correct source code from machine code in order to find and neutralize vulnerabilities is one of the most pressing problems in the field of telecommunications equipment. The decompilation techniques applicable to this task have potentially reached their evolutionary limit, so new concepts are required that can make a quantum leap in solving the problem. Proceeding from this, the paper proposes the concept of genetic decompilation, which treats decompilation as a multiparameter optimization problem: instances of source code are iteratively approximated toward the "original" code that compiles to the given machine code. The concept is tested in a series of experiments with a developed software prototype on a basic machine code example. The experimental results serve as a proof of concept and suggest new directions for ensuring information security in this subject area.
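
As a rough illustration of the genetic-decompilation loop described above (a minimal sketch, not the author's prototype), the search can be framed as evolving candidate source snippets whose compiled form approaches the target machine code; the `compile_to_machine_code` helper, the token set, and the mutation operator below are hypothetical placeholders.

```python
# Toy sketch of genetic decompilation: evolve candidate source snippets so that
# their compiled machine code approaches a given target. The compile step and
# the operators below are illustrative placeholders, not the paper's prototype.
import random
import difflib

TARGET_MACHINE_CODE = b"\x55\x48\x89\xe5\xb8\x2a\x00\x00\x00\x5d\xc3"  # example bytes

def compile_to_machine_code(source: str) -> bytes:
    """Hypothetical compile step; a real prototype would invoke a compiler
    (e.g. `gcc -c`) and extract the .text bytes from the object file."""
    return source.encode()  # placeholder so the sketch runs end to end

def fitness(source: str) -> float:
    """Similarity between the compiled candidate and the target machine code (0..1)."""
    code = compile_to_machine_code(source)
    return difflib.SequenceMatcher(None, code, TARGET_MACHINE_CODE).ratio()

def mutate(source: str, tokens=("int", "return", "42", ";", "{", "}", "f", "(", ")")) -> str:
    """Insert a random token at a random position (placeholder variation operator)."""
    parts = source.split()
    i = random.randrange(len(parts) + 1)
    return " ".join(parts[:i] + [random.choice(tokens)] + parts[i:])

def genetic_decompile(generations=200, pop_size=30) -> str:
    population = ["int f ( ) { return 0 ; }" for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)       # fittest candidates first
        survivors = population[: pop_size // 2]
        population = survivors + [mutate(random.choice(survivors)) for _ in survivors]
    return max(population, key=fitness)

print(genetic_decompile())
```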


2021 ◽  
Author(s):  
Zhenshuo Chen ◽  
Eoin Brophy ◽  
Tomas Ward

Network and system security are critical concerns today. Owing to the rapid proliferation of malware, traditional analysis methods struggle to keep up with the enormous number of samples.

In this paper, we propose four easy-to-extract, small-scale features (the sizes and permissions of Windows PE sections, content complexity, and imported libraries) for classifying malware families, and we use automated machine learning to search for the best model and hyper-parameters for each feature and for their combinations. Compared with detailed behavior-related features such as API sequences, the proposed features provide macroscopic information about malware. The analysis is based on static disassembly scripts and hexadecimal machine code. Unlike dynamic behavior analysis, static analysis is resource-efficient and offers complete code coverage, but it is vulnerable to code obfuscation and encryption.

The results demonstrate that features which work well in dynamic analysis are not necessarily effective in static analysis. For instance, API 4-grams achieve only 57.96% accuracy and require a relatively high-dimensional feature set (5,000 dimensions). In contrast, the proposed features combined with a classical machine learning algorithm (Random Forest) achieve 99.40% accuracy with a much smaller feature vector (40 dimensions). We demonstrate the effectiveness of this approach through integration in IDA Pro, which also facilitates the collection of new training samples and subsequent model retraining.
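
A hedged sketch of how such small-scale static features could drive family classification with a Random Forest, assuming scikit-learn and NumPy are available; the feature rows and labels below are synthetic placeholders standing in for the section sizes, permission flags, content complexity, and import counts named in the abstract, not the paper's dataset or pipeline.

```python
# Sketch: classify malware families from small-scale static PE features with a
# Random Forest. Feature rows are synthetic placeholders for the kind of values
# the abstract describes (section sizes, permission flags, entropy, import count).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Each row: [.text size, .data size, executable flag, writable flag, entropy, #imported DLLs]
X = rng.random((500, 6))
y = rng.integers(0, 5, size=500)   # five hypothetical malware families

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```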


2021 ◽  
Vol 5 (OOPSLA) ◽  
pp. 1-30
Author(s):  
Son Tuan Vu ◽  
Albert Cohen ◽  
Arnaud De Grandmaison ◽  
Christophe Guillon ◽  
Karine Heydemann

Software protections against side-channel and physical attacks are essential to the development of secure applications. Such protections are meaningful at the machine-code or micro-architectural level, but they typically do not carry observable semantics at the source level. This renders them susceptible to miscompilation, and security engineers embed input/output side-effects to prevent optimizing compilers from altering them. Yet these side-effects are error-prone and compiler-dependent. The current practice involves analyzing the generated machine code to make sure security or privacy properties are still enforced. These side-effects may also be too expensive in fine-grained protections such as control-flow integrity. We introduce observations of the program state that are intrinsic to the correct execution of security protections, along with means to specify and preserve observations across the compilation flow. Such observations complement the input/output semantics-preservation contract of compilers. We introduce an opacification mechanism to preserve and enforce a partial ordering of observations. This approach is compatible with a production compiler and does not require any modification to its optimization passes. We validate the effectiveness and performance of our approach on a range of benchmarks, expressing the secure compilation of these applications in terms of observations to be made at specific program points.
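
The miscompilation risk and the role of observations can be pictured with a toy model (purely illustrative, not the paper's compiler integration): a naive dead-store-elimination pass deletes the zeroing of a secret unless that store is marked as an observation, which the pass must then treat as a use.

```python
# Toy illustration (not the paper's mechanism): a naive dead-store-elimination
# pass deletes stores to variables that are never read afterwards. Marking a
# store as an "observation" forces the pass to keep it, mirroring how observation
# points protect security-relevant code (e.g. zeroing a secret) from optimization.

def dead_store_elimination(ir):
    """ir: list of ('store', var, value, observed) and ('load', var) tuples."""
    kept = []
    live = set()                      # variables read later in the program
    for instr in reversed(ir):
        if instr[0] == 'load':
            live.add(instr[1])
            kept.append(instr)
        else:                         # 'store'
            _, var, _, observed = instr
            if observed or var in live:
                kept.append(instr)
            live.discard(var)         # a store kills the liveness of var
    return list(reversed(kept))

program = [
    ('store', 'key', 'derive()', False),
    ('load', 'key'),                  # key is used once ...
    ('store', 'key', 0, False),       # ... then zeroed: dead from the optimizer's view
]
protected = program[:2] + [('store', 'key', 0, True)]   # zeroing marked as an observation

print(dead_store_elimination(program))    # the zeroing store is removed
print(dead_store_elimination(protected))  # the zeroing store is preserved
```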


2021 ◽  
Vol 5 (OOPSLA) ◽  
pp. 1-32
Author(s):  
Peisen Yao ◽  
Qingkai Shi ◽  
Heqing Huang ◽  
Charles Zhang

This paper concerns the scalability challenges of symbolic abstraction: given a formula ϕ in a logic L and an abstract domain A, find a most precise element in the abstract domain that over-approximates the meaning of ϕ. Symbolic abstraction is an important point in the space of abstract interpretation, as it allows for automatically synthesizing the best abstract transformers. However, current techniques for symbolic abstraction can have difficulty delivering on these practical strengths due to performance issues. In this work, we introduce two algorithms for the symbolic abstraction of quantifier-free bit-vector formulas, which apply to the bit-vector interval domain and a certain kind of polyhedral domain, respectively. We implement and evaluate the proposed techniques on two machine code analysis clients, namely static memory corruption analysis and constrained random fuzzing. Using a suite of 57,933 queries from the clients, we compare our approach against a diverse group of state-of-the-art algorithms. The experiments show that our algorithms achieve a substantial speedup over existing techniques and illustrate significant precision advantages for the clients. Our work presents strong evidence that symbolic abstraction of numeric domains can be efficient and practical for large and realistic programs.
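
For intuition, symbolic abstraction into the unsigned bit-vector interval domain can be computed as a baseline by binary search over SMT queries, assuming the z3-solver Python package; this is only a naive reference procedure, not one of the algorithms proposed in the paper.

```python
# Baseline symbolic abstraction into the unsigned bit-vector interval domain:
# find the tightest [lo, hi] such that every model of phi assigns x a value in
# [lo, hi]. Binary search over SMT queries (z3-solver); assumes phi is satisfiable
# (an unsatisfiable phi would warrant returning the bottom element instead).
from z3 import BitVec, BitVecVal, Solver, ULE, And, sat

def unsigned_interval(phi, x, width=32):
    def holds(constraint):
        s = Solver()
        s.add(phi, constraint)
        return s.check() == sat

    # Smallest feasible value: least m such that "phi and x <= m" is satisfiable.
    lo, hi = 0, 2**width - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if holds(ULE(x, BitVecVal(mid, width))):
            hi = mid
        else:
            lo = mid + 1
    least = lo

    # Largest feasible value: greatest m such that "phi and x >= m" is satisfiable.
    lo, hi = least, 2**width - 1
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if holds(ULE(BitVecVal(mid, width), x)):
            lo = mid
        else:
            hi = mid - 1
    return least, lo

x = BitVec('x', 32)
phi = And(ULE(BitVecVal(10, 32), x), ULE(x, BitVecVal(100, 32)), x & 1 == 0)
print(unsigned_interval(phi, x))   # (10, 100) for this example formula
```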


2021 ◽  
Author(s):  
Mihai Oltean ◽  
D. Dumitrescu

Multi Expression Programming (MEP) is a new evolutionary paradigm intended for solving computationally difficult problems. MEP individuals are linear entities that encode complex computer programs. MEP chromosomes are represented in the same way that C or Pascal compilers translate mathematical expressions into machine code. MEP is used to solve difficult problems such as symbolic regression and game strategy discovery. MEP is compared with Gene Expression Programming (GEP) and Cartesian Genetic Programming (CGP) on several well-known test problems. For the considered problems MEP outperforms GEP and CGP; on these examples, MEP is two orders of magnitude better than CGP.


2021 ◽  
Author(s):  
Mihai Oltean

Multi Expression Programming (MEP) is a Genetic Programming variant that uses a linear representation of chromosomes. MEP individuals are strings of genes encoding complex computer programs. When MEP individuals encode expressions, their representation is similar to the way in which compilers translate C or Pascal expressions into machine code. A unique MEP feature is the ability to store multiple solutions to a problem in a single chromosome; usually, the best of them is chosen for fitness assignment. When solving symbolic regression or classification problems (or any other problem for which the training set is known before the problem is solved), MEP has the same complexity as techniques that store a single solution per chromosome (such as GP, CGP, GEP, or GE). The expressions encoded in an MEP individual can be evaluated in a single parse of the chromosome. Offspring obtained by crossover and mutation are always syntactically correct MEP individuals (computer programs), so no extra processing is needed to repair newly created individuals.
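
A small sketch of the MEP encoding and its single-pass evaluation (an illustrative reimplementation, not the authors' code): each gene is either a terminal or an operator whose argument indices point to earlier genes, all gene values are computed in one left-to-right pass, and the best-scoring gene is taken as the chromosome's solution.

```python
# Sketch of Multi Expression Programming evaluation (illustrative, not the
# authors' implementation). A chromosome is a list of genes; gene i is either a
# terminal ('x', constant) or an operator whose argument indices are < i, so the
# whole chromosome is evaluated in a single left-to-right parse. Every gene
# encodes a candidate expression; the best one is the chromosome's solution.
import operator

OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul}

# Genes: ('x',), ('const', c), or (op, i, j) with i, j indices of earlier genes.
chromosome = [
    ('x',),              # 0: x
    ('const', 1.0),      # 1: 1
    ('*', 0, 0),         # 2: x*x
    ('+', 2, 0),         # 3: x*x + x
    ('+', 3, 1),         # 4: x*x + x + 1
]

def evaluate(chromosome, x):
    """Single parse: compute the value of every gene (every encoded expression)."""
    values = []
    for gene in chromosome:
        if gene[0] == 'x':
            values.append(x)
        elif gene[0] == 'const':
            values.append(gene[1])
        else:
            op, i, j = gene
            values.append(OPS[op](values[i], values[j]))
    return values

# Symbolic-regression style fitness: pick the gene that best fits the data.
target = lambda x: x * x + x + 1
data = [(-2.0, target(-2.0)), (0.5, target(0.5)), (3.0, target(3.0))]

errors = [0.0] * len(chromosome)
for x, y in data:
    vals = evaluate(chromosome, x)
    for g, v in enumerate(vals):
        errors[g] += abs(v - y)

best = min(range(len(chromosome)), key=lambda g: errors[g])
print("best gene:", best, "error:", errors[best])   # gene 4 matches the target exactly
```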


2021 ◽  
Vol 14 (12) ◽  
pp. 2691-2694
Author(s):  
Henning Funke ◽  
Jens Teubner

Queue ◽  
2021 ◽  
Vol 19 (2) ◽  
pp. 21-28
Author(s):  
George V. Neville-Neil

When you're starting out, you want to be able to hold the entire program in your head if at all possible. Once you're conversant with your first, simple assembly language and the machine architecture you're working with, it will be completely possible to look at a page or two of your assembly and know not only what it is supposed to do but also what the machine will do for you step by step. When you look at a high-level language, you should be able to understand what you mean it to do, but often you have no idea just how your intent will be translated into action. Assembly and machine code is where the action is.

