Machine learning steered symbolic execution framework for complex software code

Obfuscation protects programs from analysis and reverse engineering. Obfuscation methods that are theoretically effective and resistant exist, but most of them have not yet been implemented in practice, mainly because obfuscated code incurs a large execution overhead and because the methods apply only to a specific class of programs. On the other hand, many obfuscation methods have been developed and are applied in practice. Existing approaches to assessing such methods rely mainly on the static characteristics of programs. A comprehensive justification of their effectiveness and resistance, one that also takes the dynamic characteristics of programs into account, is therefore a relevant task. Such a justification can apparently be made using machine learning methods, based on feature vectors that describe both static and dynamic characteristics of programs. This paper proposes building such a vector from the characteristics of a pair of compared programs: original and obfuscated, original and deobfuscated, or obfuscated and deobfuscated. To obtain the dynamic characteristics of a program, a scheme based on symbolic execution is constructed and presented. Symbolic execution is chosen because such characteristics can describe how difficult the program is to comprehend during dynamic analysis. The paper proposes two implementations of the scheme: extended and simplified. The extended scheme is closer to how an analyst examines a program, since it includes disassembly and translation into intermediate code, while the simplified scheme omits these steps. To identify the characteristics of symbolic execution that are suitable for assessing the effectiveness and resistance of obfuscation with machine learning methods, experiments with the developed schemes were carried out. Based on the obtained results, a set of suitable characteristics is determined.
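
The abstract does not say how the characteristics of the two compared programs are combined into one feature vector. The following is a minimal sketch of one possible construction, assuming the vector simply concatenates both programs' values and their differences. The function name `pair_feature_vector` and the characteristic names (`instructions`, `paths_explored`) are hypothetical and do not come from the paper.

```python
def pair_feature_vector(first, second, keys):
    """Build one vector from two programs' characteristics.

    `first` and `second` map characteristic names to numbers (for example
    original vs. obfuscated). The layout [first..., second..., difference...]
    is an assumption made for illustration only.
    """
    a = [float(first[k]) for k in keys]
    b = [float(second[k]) for k in keys]
    delta = [y - x for x, y in zip(a, b)]
    return a + b + delta


# Hypothetical static ("instructions") and dynamic ("paths_explored") values.
keys = ["instructions", "paths_explored"]
original = {"instructions": 100, "paths_explored": 4}
obfuscated = {"instructions": 250, "paths_explored": 16}
vector = pair_feature_vector(original, obfuscated, keys)
print(vector)
```

The same function would cover the other two pairings named in the abstract (original and deobfuscated, obfuscated and deobfuscated) by passing a different pair of dictionaries.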

Symbolic execution of complex program driven by machine learning based constraint solving

Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering - ASE 2016 ◽

10.1145/2970276.2970364 ◽

2016 ◽

Cited By ~ 8

Author(s):

Xin Li ◽

Yongjuan Liang ◽

Hong Qian ◽

Yi-Qi Hu ◽

Lei Bu ◽

...

Keyword(s):

Machine Learning ◽

Symbolic Execution ◽

Constraint Solving ◽

Complex Program

A Machine Learning-based Software Code Weakness Detect Approach

Journal of Physics Conference Series ◽

10.1088/1742-6596/1213/2/022016 ◽

2019 ◽

Vol 1213 ◽

pp. 022016

Author(s):

AiGuo Lu ◽

KangYi Luo

Keyword(s):

Machine Learning ◽

Software Code

Engrossing Prosecution of Code Smells Type Identification and Rectification using Machine Learning AdaBoost Classifier

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.d1331.029420 ◽

2020 ◽

Vol 9 (4) ◽

pp. 624-631

Keyword(s):

Machine Learning ◽

Decision Tree ◽

False Positive Rate ◽

Source Code ◽

Minimum Cost ◽

Space Complexity ◽

Identification Accuracy ◽

Code Smells ◽

Code Smell ◽

Software Code

Software code smells are structural features that reside in software source code. Code smell detection is an established method for discovering problems in source code and reorganizing the internal structure of object-oriented software to improve its quality, particularly in terms of maintainability, reusability and cost minimization. Identifying where a code smell occurs within a system and rectifying it remains a major challenge for developers. Various code smell detection techniques have been designed, but they fail to classify the code smell type and to minimize rectification cost. To perform classification with minimum cost, an efficient technique called the Machine Learning Ada-Boost Classifier (MLABC) is introduced. MLABC improves software quality by identifying and rectifying the different types of code smell in source code. Initially, MLABC uses a decision tree as the base classifier, which classifies the code smell type according to a set of rules. The base classifiers are then combined into a strong classifier using the AdaBoost machine learning technique, and the output of the strong classifier identifies the code smell type. Finally, the code smell is rectified by applying a refactoring technique at the location where the smell was identified, with minimum cost and space complexity. Experimental results show that the proposed MLABC technique improves software code quality in terms of code smell type identification accuracy, false positive rate, code smell type rectification cost and space complexity with respect to the source code.
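
The idea of combining weak tree-based classifiers into a strong one with AdaBoost can be sketched as follows. This is a minimal illustration, not the MLABC implementation: it assumes binary labels (+1 for a smell, -1 for clean code) instead of multiple smell types, uses one-level decision trees (stumps) as base classifiers, and the two metrics per method (lines of code, parameter count) are hypothetical example data.

```python
import math


def train_stump(X, y, w):
    """Return (error, feature, threshold, polarity) with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                preds = [polarity if row[j] >= thr else -polarity for row in X]
                err = sum(wi for wi, p, t in zip(w, preds, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best


def stump_predict(stump, row):
    _, j, thr, polarity = stump
    return polarity if row[j] >= thr else -polarity


def adaboost(X, y, rounds=5):
    """Train `rounds` stumps, reweighting the samples after each round."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        # Clip the error so the logarithm below stays finite.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Misclassified samples gain weight, correct ones lose weight.
        w = [wi * math.exp(-alpha * t * stump_predict(stump, row))
             for wi, row, t in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    """Weighted vote of all stumps: the 'strong' classifier."""
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical data: [lines of code, parameter count]; +1 = "long method" smell.
X = [[120, 2], [15, 1], [200, 8], [30, 3], [90, 7], [10, 0]]
y = [1, -1, 1, -1, 1, -1]
ensemble = adaboost(X, y)
print([predict(ensemble, row) for row in X])
```

On this toy data a single threshold on method length already separates the classes, so the boosting rounds add little. On realistic, noisier metrics the reweighting step is what lets later stumps focus on the samples that earlier ones misclassified.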

CAVIAR: a method for automatic cavity detection, description and decomposition into subcavities

10.26434/chemrxiv.12806819.v3 ◽

2021 ◽

Author(s):

Jean-Rémy Marchand ◽

Bernard Pirard ◽

Peter Ertl ◽

Finton Sirockin

Keyword(s):

Machine Learning ◽

Binding Sites ◽

Protein Structures ◽

Cavity Detection ◽

Protein Binding Sites ◽

Machine Learning Methods ◽

Software Code ◽

Dynamics Simulations ◽

Hit Identification

The accurate description of protein binding sites is essential to the determination of similarity and the application of machine learning methods to relate the binding sites to observed functions. This work describes CAVIAR, a new open source tool for generating descriptors for binding sites, using protein structures in PDB and mmCIF format as well as trajectory frames from molecular dynamics simulations as input. The applicability of CAVIAR descriptors is showcased by computing machine learning predictions of binding site ligandability. The method can also automatically assign subcavities, even in the absence of a bound ligand. The defined subpockets mimic the empirical definitions used in medicinal chemistry projects. It is shown that the experimental binding affinity scales relatively well with the number of subcavities filled by the ligand, with compounds binding to more than three subcavities having nanomolar or better affinities to the target. The CAVIAR descriptors and methods can be used in any machine learning-based investigations of problems involving binding sites, from protein engineering to hit identification. The full software code is available on GitHub and a conda package is hosted on Anaconda cloud.

SADPonzi: Detecting and Characterizing Ponzi Schemes in Ethereum Smart Contracts

Proceedings of the ACM on Measurement and Analysis of Computing Systems ◽

10.1145/3460093 ◽

2021 ◽

Vol 5 (2) ◽

pp. 1-30

Author(s):

Weimin Chen ◽

Xinran Li ◽

Yuting Sui ◽

Ningyu He ◽

Haoyu Wang ◽

...

Keyword(s):

Machine Learning ◽

Semantic Information ◽

Symbolic Execution ◽

Experimental Result ◽

Smart Contracts ◽

Smart Contract ◽

Detection Approach ◽

Ponzi Scheme ◽

Feasible Path ◽

Definition Of

Ponzi schemes are financial scams that lure users under the promise of high profits. With the prosperity of Bitcoin and blockchain technologies, there has been growing anecdotal evidence that this classic fraud has emerged in the blockchain ecosystem. Existing studies have proposed machine-learning based approaches for detecting Ponzi schemes, based either on the operation codes (opcodes) of smart contract binaries or on the transaction patterns of addresses. However, state-of-the-art approaches face several major limitations, including a lack of interpretability and high false positive rates. Moreover, machine-learning based methods are susceptible to evasion techniques, and transaction-based techniques do not work on smart contracts that have a small number of transactions. These limitations render existing methods for detecting Ponzi schemes ineffective. In this paper, we propose SADPonzi, a semantic-aware detection approach for identifying Ponzi schemes in Ethereum smart contracts. Specifically, by strictly following the definition of Ponzi schemes, we propose a heuristic-guided symbolic execution technique to first generate the semantic information for each feasible path in smart contracts and then identify investor-related transfer behaviors and the distribution strategies adopted. Experimental results on a well-labelled benchmark suggest that SADPonzi can achieve 100% precision and recall, outperforming all existing machine-learning based techniques. We further apply SADPonzi to all 3.4 million smart contracts deployed by EOAs in Ethereum and identify 835 Ponzi scheme contracts, with over 17 million US Dollars invested by victims. Our observations confirm the urgency of identifying and mitigating Ponzi schemes in the blockchain ecosystem.

Generalized model of software code's static analysis based on machine learning for vulnerabilities search

Informatization and communication ◽

10.34219/2078-8320-2020-11-2-143-152 ◽

2020 ◽

pp. 143-152 ◽

Cited By ~ 3

Author(s):

M.V. Buinevich ◽

K.E. Izrailov

Keyword(s):

Machine Learning ◽

Static Analysis ◽

Domain Model ◽

Highly Qualified ◽

Program Code ◽

Learning Methods ◽

Static And Dynamic Analysis ◽

The Past ◽

Machine Learning Methods ◽

Software Code

In recent years, the use of unsafe software, in which vulnerabilities are searched for by static and dynamic analysis, has remained the main threat to the infosphere. Manual static analysis is extremely time-consuming and requires highly qualified, and therefore scarce, specialists. An alternative is to automate the process with artificial intelligence. This work seeks ways to apply machine learning methods at all stages of the static analysis of program code; to that end, the formal needs of the stages and the capabilities of the methods are studied and correlated. The main result of the study is a generalized domain model; the particular results are 14 solutions to the “key” problems of static analysis of program code using machine learning methods.

Application of Machine Training to Search Vulnerabilities in the Software Code

Telecom IT ◽

10.31854/2307-1303-2019-7-4-59-65 ◽

2019 ◽

Vol 7 (4) ◽

pp. 59-65

Author(s):

M. Buinevich ◽

P. Zhukovskaya ◽

K. Izrailov ◽

V. Pokussov

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Information Security ◽

Learning Technology ◽

Huge Amount ◽

Detection Algorithms ◽

The Subject ◽

Software Code

The article considers the possibility of using artificial intelligence in the field of information security. To this end, the following domain contradiction is highlighted: detection algorithms are tuned to specific vulnerabilities, whereas the vulnerability code is constantly modified. To resolve the contradiction, the object of study is taken to be software with vulnerabilities in the process of its development, and the subject is the ways of intelligent analysis of its characteristics. A hypothetical solution is proposed that uses Machine Learning Technology to identify vulnerabilities in software, drawing on a huge amount of accumulated knowledge. Possible features of the software representations on which the Technology operates are also given.

Enhancing Symbolic Execution by Machine Learning Based Solver Selection

Proceedings 2019 Workshop on Binary Analysis Research ◽

10.14722/bar.2019.23080 ◽

2019 ◽

Author(s):

Sheng-Han Wen ◽

Wei-Loon Mow ◽

Wei-Ning Chen ◽

Chien-Yuan Wang ◽

Hsu-Chun Hsiao

Keyword(s):

Machine Learning ◽

Symbolic Execution
