Machine Code
Recently Published Documents

Total documents: 216 (five years: 43)
H-index: 17 (five years: 3)

2021 ◽  
Vol 7 (4) ◽  
pp. 95-109
Author(s):  
K. Izrailov

Reverse engineering correct source code from machine code in order to find and neutralize vulnerabilities is one of the most pressing problems in the field of telecommunications equipment. The decompilation techniques applicable to this task have potentially reached their evolutionary limit, so new concepts are required that can make a quantum leap in solving the problem. Proceeding from this, the paper proposes the concept of genetic decompilation, which treats decompilation as a multiparameter optimization problem: instances of source code are iteratively approximated toward the "original" code that compiles to the given machine code. The concept is tested in a series of experiments with a developed software prototype on a basic machine code example. The experimental results serve as a proof of concept and suggest new directions for ensuring information security in this subject area.
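
As a rough illustration of the genetic-decompilation loop described above (a minimal sketch, not the author's prototype), the search can be framed as evolving candidate source snippets whose compiled form approaches the target machine code; the `compile_to_machine_code` helper, the token set, and the mutation operator below are hypothetical placeholders.

```python
# Toy sketch of genetic decompilation: evolve candidate source snippets so that
# their compiled machine code approaches a given target. The compile step and
# the operators below are illustrative placeholders, not the paper's prototype.
import random
import difflib

TARGET_MACHINE_CODE = b"\x55\x48\x89\xe5\xb8\x2a\x00\x00\x00\x5d\xc3"  # example bytes

def compile_to_machine_code(source: str) -> bytes:
    """Hypothetical compile step; a real prototype would invoke a compiler
    (e.g. `gcc -c`) and extract the .text bytes from the object file."""
    return source.encode()  # placeholder so the sketch runs end to end

def fitness(source: str) -> float:
    """Similarity between the compiled candidate and the target machine code (0..1)."""
    code = compile_to_machine_code(source)
    return difflib.SequenceMatcher(None, code, TARGET_MACHINE_CODE).ratio()

def mutate(source: str, tokens=("int", "return", "42", ";", "{", "}", "f", "(", ")")) -> str:
    """Insert a random token at a random position (placeholder variation operator)."""
    parts = source.split()
    i = random.randrange(len(parts) + 1)
    return " ".join(parts[:i] + [random.choice(tokens)] + parts[i:])

def genetic_decompile(generations=200, pop_size=30) -> str:
    population = ["int f ( ) { return 0 ; }" for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)       # fittest candidates first
        survivors = population[: pop_size // 2]
        population = survivors + [mutate(random.choice(survivors)) for _ in survivors]
    return max(population, key=fitness)

print(genetic_decompile())
```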


2021 ◽  
Author(s):  
Zhenshuo Chen ◽  
Eoin Brophy ◽  
Tomas Ward

Network and system security are critical concerns today. Owing to the rapid proliferation of malware, traditional analysis methods struggle to keep up with the enormous number of samples.

In this paper, we propose four easy-to-extract, small-scale features (the sizes and permissions of Windows PE sections, content complexity, and imported libraries) for classifying malware families, and we use automated machine learning to search for the best model and hyper-parameters for each feature and for their combinations. Compared with detailed behavior-related features such as API sequences, the proposed features provide macroscopic information about malware. The analysis is based on static disassembly scripts and hexadecimal machine code. Unlike dynamic behavior analysis, static analysis is resource-efficient and offers complete code coverage, but it is vulnerable to code obfuscation and encryption.

The results demonstrate that features which work well in dynamic analysis are not necessarily effective in static analysis. For instance, API 4-grams achieve only 57.96% accuracy and require a relatively high-dimensional feature set (5,000 dimensions). In contrast, the proposed features combined with a classical machine learning algorithm (Random Forest) achieve 99.40% accuracy with a much smaller feature vector (40 dimensions). We demonstrate the effectiveness of this approach through integration in IDA Pro, which also facilitates the collection of new training samples and subsequent model retraining.
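
A hedged sketch of how such small-scale static features could drive family classification with a Random Forest, assuming scikit-learn and NumPy are available; the feature rows and labels below are synthetic placeholders standing in for the section sizes, permission flags, content complexity, and import counts named in the abstract, not the paper's dataset or pipeline.

```python
# Sketch: classify malware families from small-scale static PE features with a
# Random Forest. Feature rows are synthetic placeholders for the kind of values
# the abstract describes (section sizes, permission flags, entropy, import count).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Each row: [.text size, .data size, executable flag, writable flag, entropy, #imported DLLs]
X = rng.random((500, 6))
y = rng.integers(0, 5, size=500)   # five hypothetical malware families

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```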


2021 ◽  
Vol 5 (OOPSLA) ◽  
pp. 1-30
Author(s):  
Son Tuan Vu ◽  
Albert Cohen ◽  
Arnaud De Grandmaison ◽  
Christophe Guillon ◽  
Karine Heydemann

Software protections against side-channel and physical attacks are essential to the development of secure applications. Such protections are meaningful at the machine-code or micro-architectural level, but they typically do not carry observable semantics at the source level. This renders them susceptible to miscompilation, and security engineers embed input/output side-effects to prevent optimizing compilers from altering them. Yet these side-effects are error-prone and compiler-dependent. The current practice involves analyzing the generated machine code to make sure security or privacy properties are still enforced. These side-effects may also be too expensive in fine-grained protections such as control-flow integrity. We introduce observations of the program state that are intrinsic to the correct execution of security protections, along with means to specify and preserve observations across the compilation flow. Such observations complement the input/output semantics-preservation contract of compilers. We introduce an opacification mechanism to preserve and enforce a partial ordering of observations. This approach is compatible with a production compiler and does not require any modification to its optimization passes. We validate the effectiveness and performance of our approach on a range of benchmarks, expressing the secure compilation of these applications in terms of observations to be made at specific program points.
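
The miscompilation risk and the role of observations can be pictured with a toy model (purely illustrative, not the paper's compiler integration): a naive dead-store-elimination pass deletes the zeroing of a secret unless that store is marked as an observation, which the pass must then treat as a use.

```python
# Toy illustration (not the paper's mechanism): a naive dead-store-elimination
# pass deletes stores to variables that are never read afterwards. Marking a
# store as an "observation" forces the pass to keep it, mirroring how observation
# points protect security-relevant code (e.g. zeroing a secret) from optimization.

def dead_store_elimination(ir):
    """ir: list of ('store', var, value, observed) and ('load', var) tuples."""
    kept = []
    live = set()                      # variables read later in the program
    for instr in reversed(ir):
        if instr[0] == 'load':
            live.add(instr[1])
            kept.append(instr)
        else:                         # 'store'
            _, var, _, observed = instr
            if observed or var in live:
                kept.append(instr)
            live.discard(var)         # a store kills the liveness of var
    return list(reversed(kept))

program = [
    ('store', 'key', 'derive()', False),
    ('load', 'key'),                  # key is used once ...
    ('store', 'key', 0, False),       # ... then zeroed: dead from the optimizer's view
]
protected = program[:2] + [('store', 'key', 0, True)]   # zeroing marked as an observation

print(dead_store_elimination(program))    # the zeroing store is removed
print(dead_store_elimination(protected))  # the zeroing store is preserved
```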


2021 ◽  
Vol 5 (OOPSLA) ◽  
pp. 1-32
Author(s):  
Peisen Yao ◽  
Qingkai Shi ◽  
Heqing Huang ◽  
Charles Zhang

This paper concerns the scalability challenges of symbolic abstraction: given a formula ϕ in a logic L and an abstract domain A, find a most precise element in the abstract domain that over-approximates the meaning of ϕ. Symbolic abstraction is an important point in the space of abstract interpretation, as it allows for automatically synthesizing the best abstract transformers. However, current techniques for symbolic abstraction can have difficulty delivering on these practical strengths due to performance issues. In this work, we introduce two algorithms for the symbolic abstraction of quantifier-free bit-vector formulas, which apply to the bit-vector interval domain and a certain kind of polyhedral domain, respectively. We implement and evaluate the proposed techniques on two machine code analysis clients, namely static memory corruption analysis and constrained random fuzzing. Using a suite of 57,933 queries from the clients, we compare our approach against a diverse group of state-of-the-art algorithms. The experiments show that our algorithms achieve a substantial speedup over existing techniques and illustrate significant precision advantages for the clients. Our work presents strong evidence that symbolic abstraction of numeric domains can be efficient and practical for large and realistic programs.
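
For intuition, symbolic abstraction into the unsigned bit-vector interval domain can be computed as a baseline by binary search over SMT queries, assuming the z3-solver Python package; this is only a naive reference procedure, not one of the algorithms proposed in the paper.

```python
# Baseline symbolic abstraction into the unsigned bit-vector interval domain:
# find the tightest [lo, hi] such that every model of phi assigns x a value in
# [lo, hi]. Binary search over SMT queries (z3-solver); assumes phi is satisfiable
# (an unsatisfiable phi would warrant returning the bottom element instead).
from z3 import BitVec, BitVecVal, Solver, ULE, And, sat

def unsigned_interval(phi, x, width=32):
    def holds(constraint):
        s = Solver()
        s.add(phi, constraint)
        return s.check() == sat

    # Smallest feasible value: least m such that "phi and x <= m" is satisfiable.
    lo, hi = 0, 2**width - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if holds(ULE(x, BitVecVal(mid, width))):
            hi = mid
        else:
            lo = mid + 1
    least = lo

    # Largest feasible value: greatest m such that "phi and x >= m" is satisfiable.
    lo, hi = least, 2**width - 1
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if holds(ULE(BitVecVal(mid, width), x)):
            lo = mid
        else:
            hi = mid - 1
    return least, lo

x = BitVec('x', 32)
phi = And(ULE(BitVecVal(10, 32), x), ULE(x, BitVecVal(100, 32)), x & 1 == 0)
print(unsigned_interval(phi, x))   # (10, 100) for this example formula
```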


2021 ◽  
Author(s):  
Mihai Oltean ◽  
D. Dumitrescu

Multi Expression Programming (MEP) is a new evolutionary paradigm intended for solving computationally difficult problems. MEP individuals are linear entities that encode complex computer programs. MEP chromosomes are represented in the same way that C or Pascal compilers translate mathematical expressions into machine code. MEP is used to solve difficult problems such as symbolic regression and game strategy discovery. MEP is compared with Gene Expression Programming (GEP) and Cartesian Genetic Programming (CGP) on several well-known test problems. For the considered problems MEP outperforms GEP and CGP; on these examples, MEP is two orders of magnitude better than CGP.


2021 ◽  
Author(s):  
Mihai Oltean

Multi Expression Programming (MEP) is a Genetic Programming variant that uses a linear representation of chromosomes. MEP individuals are strings of genes encoding complex computer programs. When MEP individuals encode expressions, their representation is similar to the way in which compilers translate C or Pascal expressions into machine code. A unique MEP feature is the ability to store multiple solutions to a problem in a single chromosome; usually, the best of them is chosen for fitness assignment. When solving symbolic regression or classification problems (or any other problem for which the training set is known before the problem is solved), MEP has the same complexity as techniques that store a single solution per chromosome (such as GP, CGP, GEP, or GE). The expressions encoded in an MEP individual can be evaluated in a single parse of the chromosome. Offspring obtained by crossover and mutation are always syntactically correct MEP individuals (computer programs), so no extra processing is needed to repair newly created individuals.
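
A small sketch of the MEP encoding and its single-pass evaluation (an illustrative reimplementation, not the authors' code): each gene is either a terminal or an operator whose argument indices point to earlier genes, all gene values are computed in one left-to-right pass, and the best-scoring gene is taken as the chromosome's solution.

```python
# Sketch of Multi Expression Programming evaluation (illustrative, not the
# authors' implementation). A chromosome is a list of genes; gene i is either a
# terminal ('x', constant) or an operator whose argument indices are < i, so the
# whole chromosome is evaluated in a single left-to-right parse. Every gene
# encodes a candidate expression; the best one is the chromosome's solution.
import operator

OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul}

# Genes: ('x',), ('const', c), or (op, i, j) with i, j indices of earlier genes.
chromosome = [
    ('x',),              # 0: x
    ('const', 1.0),      # 1: 1
    ('*', 0, 0),         # 2: x*x
    ('+', 2, 0),         # 3: x*x + x
    ('+', 3, 1),         # 4: x*x + x + 1
]

def evaluate(chromosome, x):
    """Single parse: compute the value of every gene (every encoded expression)."""
    values = []
    for gene in chromosome:
        if gene[0] == 'x':
            values.append(x)
        elif gene[0] == 'const':
            values.append(gene[1])
        else:
            op, i, j = gene
            values.append(OPS[op](values[i], values[j]))
    return values

# Symbolic-regression style fitness: pick the gene that best fits the data.
target = lambda x: x * x + x + 1
data = [(-2.0, target(-2.0)), (0.5, target(0.5)), (3.0, target(3.0))]

errors = [0.0] * len(chromosome)
for x, y in data:
    vals = evaluate(chromosome, x)
    for g, v in enumerate(vals):
        errors[g] += abs(v - y)

best = min(range(len(chromosome)), key=lambda g: errors[g])
print("best gene:", best, "error:", errors[best])   # gene 4 matches the target exactly
```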


2021 ◽  
Vol 14 (12) ◽  
pp. 2691-2694
Author(s):  
Henning Funke ◽  
Jens Teubner

Queue ◽  
2021 ◽  
Vol 19 (2) ◽  
pp. 21-28
Author(s):  
George V. Neville-Neil

When you're starting out, you want to be able to hold the entire program in your head if at all possible. Once you're conversant with your first, simple assembly language and the machine architecture you're working with, it will be completely possible to look at a page or two of your assembly and know not only what it is supposed to do but also what the machine will do for you step by step. When you look at a high-level language, you should be able to understand what you mean it to do, but often you have no idea just how your intent will be translated into action. Assembly and machine code is where the action is.

