Machine learning steered symbolic execution framework for complex software code

Author(s):  
Lei Bu ◽  
Yongjuan Liang ◽  
Zhunyi Xie ◽  
Hong Qian ◽  
Yi-Qi Hu ◽  
...  
2021 ◽  
Vol 28 (1) ◽  
pp. 38-51
Author(s):  
Petr D. Borisov ◽  
Yury V. Kosolapov

Obfuscation is used to protect programs from analysis and reverse engineering. There are theoretically effective and resistant obfuscation methods, but most of them have not yet been implemented in practice. The main reasons are the large execution overhead of obfuscated code and the restriction of these methods to specific classes of programs. On the other hand, many obfuscation methods that are applied in practice have been developed. Existing approaches to assessing such methods rely mainly on the static characteristics of programs. Therefore, a comprehensive justification of their effectiveness and resistance, one that also takes the dynamic characteristics of programs into account, is a relevant task. It seems that such a justification can be made using machine learning methods, based on feature vectors that describe both static and dynamic characteristics of programs. In this paper, it is proposed to build such a vector from the characteristics of two compared programs, taken in pairs: original and obfuscated, original and deobfuscated, and obfuscated and deobfuscated. To obtain the dynamic characteristics of a program, a scheme based on symbolic execution is constructed and presented. Symbolic execution is chosen because its characteristics can describe how difficult a program is to comprehend during dynamic analysis. The paper proposes two implementations of the scheme: extended and simplified. The extended scheme is closer to the way an analyst examines a program, since it includes disassembly and translation into intermediate code, whereas the simplified scheme omits these steps. To identify the characteristics of symbolic execution that are suitable for assessing the effectiveness and resistance of obfuscation with machine learning methods, experiments with the developed schemes were carried out. Based on the results, a set of suitable characteristics is determined.
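The abstract does not list the concrete characteristics the authors settle on, so the following is only a minimal sketch of how a pairwise feature vector could be assembled. It assumes each program has already been summarized by a hypothetical `ProgramStats` record (the field names are illustrative, not taken from the paper) and that the three pairwise comparisons are simply concatenated, which is also an assumption.

```python
from dataclasses import astuple, dataclass


@dataclass
class ProgramStats:
    """Hypothetical characteristics of one program; the field names are illustrative."""
    paths_explored: int      # dynamic: paths covered by symbolic execution
    path_constraints: int    # dynamic: total branch constraints collected
    solver_queries: int      # dynamic: calls made to the constraint solver
    solver_time_s: float     # dynamic: time spent in the solver
    code_size_bytes: int     # static: size of the program


def pair_features(a: ProgramStats, b: ProgramStats) -> list[float]:
    """Describe program b relative to program a (e.g. original vs. obfuscated).

    For every characteristic the vector holds both raw values and the ratio
    (b + 1) / (a + 1); add-one smoothing avoids division by zero.
    """
    vec: list[float] = []
    for va, vb in zip(astuple(a), astuple(b)):
        vec.extend([float(va), float(vb), (vb + 1.0) / (va + 1.0)])
    return vec


def comparison_vector(orig: ProgramStats, obf: ProgramStats,
                      deobf: ProgramStats) -> list[float]:
    """Concatenate the three pairwise comparisons named in the abstract (assumed layout)."""
    return (pair_features(orig, obf)
            + pair_features(orig, deobf)
            + pair_features(obf, deobf))
```

Keeping both raw values and a ratio lets a downstream classifier see the absolute scale of a program as well as the relative change introduced by the transformation; a real study would pick the fields from the characteristics found suitable in the experiments.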


2019 ◽  
Vol 1213 ◽  
pp. 022016
Author(s):  
AiGuo Lu ◽  
KangYi Luo

Code smells are structural features that reside in software source code. Code smell detection is an established way to discover problems in source code and to reorganize the internal structure of object-oriented software, improving its quality, particularly in terms of maintainability, reusability and cost minimization. Identifying where code smells occur in a system and rectifying them is a major challenge for developers. Various code smell detection techniques have been designed, but they fail to classify the code smell type and to minimize the rectification cost. To perform classification with minimum cost, an efficient technique called the Machine Learning Ada-Boost Classifier (MLABC) is introduced. MLABC improves software quality by identifying and rectifying different types of code smells in source code. Initially, MLABC uses decision trees as base classifiers to identify the code smell type; each decision tree classifies the type according to certain rules. The base classifiers are then combined into a strong classifier using the AdaBoost machine learning technique, and the output of the strong classifier is used to identify the code smell type. Finally, each identified code smell is rectified by applying refactoring with minimum cost and space complexity. Experimental results show that MLABC improves software code quality in terms of code smell type identification accuracy, false positive rate, code smell rectification cost and space complexity.
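The abstract does not give MLABC's features, labels or training details. As a rough illustration of the boosting step it names (decision trees combined by AdaBoost), here is a from-scratch sketch of discrete AdaBoost over depth-1 decision trees (stumps). It simplifies the task to a binary smell/clean label instead of MLABC's multi-type classification, and the features in the usage example (method length, parameter count) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Stump:
    """Depth-1 decision tree: compares one feature with a threshold."""
    feature: int
    threshold: float
    polarity: int        # +1: predict +1 when x[feature] > threshold
    alpha: float = 0.0   # vote weight assigned by AdaBoost

    def predict(self, x: list[float]) -> int:
        return self.polarity if x[self.feature] > self.threshold else -self.polarity


def best_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best, best_err = None, float("inf")
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for p in (1, -1):
                s = Stump(f, t, p)
                err = sum(wi for xi, yi, wi in zip(X, y, w) if s.predict(xi) != yi)
                if err < best_err:
                    best, best_err = s, err
    return best, best_err


def adaboost(X, y, rounds=10):
    """Train discrete AdaBoost; labels in y must be +1 (smell) or -1 (clean)."""
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(rounds):
        stump, err = best_stump(X, y, w)
        if err >= 0.5:                 # no better than chance: stop boosting
            break
        err = max(err, 1e-10)          # keep alpha finite for a perfect stump
        stump.alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append(stump)
        # Misclassified samples gain weight, correctly classified ones lose it.
        w = [wi * math.exp(-stump.alpha * yi * stump.predict(xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x) -> int:
    """Weighted vote of all stumps in the ensemble."""
    score = sum(s.alpha * s.predict(x) for s in ensemble)
    return 1 if score >= 0 else -1
```

For example, with rows `[method_length, parameter_count]` such as `[[10, 1], [20, 2], [30, 1], [60, 3], [80, 2], [120, 5]]` labelled `[-1, -1, -1, 1, 1, 1]` for a "long method" smell, `adaboost` finds a stump on method length and `predict(model, [100, 1])` returns `1`. A multi-type variant would need a multiclass extension such as SAMME or one-vs-rest boosting; which one MLABC uses is not stated.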


2021 ◽  
Author(s):  
Jean-Rémy Marchand ◽  
Bernard Pirard ◽  
Peter Ertl ◽  
Finton Sirockin

The accurate description of protein binding sites is essential to the determination of similarity and the application of machine learning methods to relate the binding sites to observed functions. This work describes CAVIAR, a new open source tool for generating descriptors for binding sites, using protein structures in PDB and mmCIF format as well as trajectory frames from molecular dynamics simulations as input. The applicability of CAVIAR descriptors is showcased by computing machine learning predictions of binding site ligandability. The method can also automatically assign subcavities, even in the absence of a bound ligand. The defined subpockets mimic the empirical definitions used in medicinal chemistry projects. It is shown that the experimental binding affinity scales relatively well with the number of subcavities filled by the ligand, with compounds binding to more than three subcavities having nanomolar or better affinities to the target. The CAVIAR descriptors and methods can be used in any machine learning-based investigations of problems involving binding sites, from protein engineering to hit identification. The full software code is available on GitHub and a conda package is hosted on Anaconda cloud.


Author(s):  
Weimin Chen ◽  
Xinran Li ◽  
Yuting Sui ◽  
Ningyu He ◽  
Haoyu Wang ◽  
...  

Ponzi schemes are financial scams that lure users with the promise of high profits. With the prosperity of Bitcoin and blockchain technologies, there has been growing anecdotal evidence that this classic fraud has emerged in the blockchain ecosystem. Existing studies have proposed machine-learning based approaches for detecting Ponzi schemes, based either on the operation codes (opcodes) of smart contract binaries or on the transaction patterns of addresses. However, state-of-the-art approaches face several major limitations, including a lack of interpretability and high false positive rates. Moreover, machine-learning based methods are susceptible to evasion techniques, and transaction-based techniques do not work on smart contracts that have a small number of transactions. These limitations render existing methods for detecting Ponzi schemes ineffective. In this paper, we propose SADPonzi, a semantic-aware detection approach for identifying Ponzi schemes in Ethereum smart contracts. Specifically, by strictly following the definition of Ponzi schemes, we propose a heuristic-guided symbolic execution technique that first generates the semantic information for each feasible path in a smart contract and then identifies investor-related transfer behaviors and the distribution strategies adopted. Experimental results on a well-labelled benchmark suggest that SADPonzi can achieve 100% precision and recall, outperforming all existing machine-learning based techniques. We further apply SADPonzi to all 3.4 million smart contracts deployed by EOAs in Ethereum and identify 835 Ponzi scheme contracts, with over 17 million US Dollars invested by victims. Our observations confirm the urgency of identifying and mitigating Ponzi schemes in the blockchain ecosystem.
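The abstract does not spell out SADPonzi's rules, so the following is a toy illustration, not the paper's detector. It assumes symbolic execution has already reduced each feasible path to a hypothetical `PathSummary` (whether the path accepts a deposit, whether it records the depositor, and which kinds of addresses receive Ether), and then applies the textbook Ponzi definition: new deposits are paid out to earlier investors. The `Payee` categories are invented for this sketch.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Payee(Enum):
    """Where an outgoing transfer on a path sends Ether (hypothetical categories)."""
    CALLER = auto()            # msg.sender of the current transaction
    EARLIER_INVESTOR = auto()  # address read from storage, written by a previous deposit
    OWNER = auto()             # contract owner or fee address
    OTHER = auto()


@dataclass
class PathSummary:
    """Facts assumed to be extracted from one feasible path by symbolic execution."""
    accepts_deposit: bool      # path is feasible with msg.value > 0
    records_investor: bool     # path stores msg.sender in contract storage
    payees: list[Payee] = field(default_factory=list)


def looks_like_ponzi(paths: list[PathSummary]) -> bool:
    """Flag contracts that record investors and pay earlier investors out of new deposits."""
    records = any(p.accepts_deposit and p.records_investor for p in paths)
    redistributes = any(p.accepts_deposit and Payee.EARLIER_INVESTOR in p.payees
                        for p in paths)
    return records and redistributes
```

Working on per-path summaries rather than opcode statistics is what makes such a check interpretable: a positive verdict points to the concrete path on which a new investor's money flows to an earlier one.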


Author(s):  
M.V. Buinevich ◽  
K.E. Izrailov

Over the past years, the use of unsafe software, whose vulnerabilities are searched for by means of static and dynamic analysis, has remained the main threat to the infosphere. Conducting static analysis manually is extremely time-consuming and requires the involvement of highly qualified, and therefore scarce, specialists. An alternative is to automate the process using artificial intelligence. This work aims to find ways to apply machine learning methods at every stage of the static analysis of program code; to this end, the formal needs of each stage and the capabilities of the methods are studied and matched. The main result of the study is a generalized domain model; the particular results are 14 solutions to the “key” problems of static analysis of program code using machine learning methods.


Telecom IT ◽  
2019 ◽  
Vol 7 (4) ◽  
pp. 59-65
Author(s):  
M. Buinevich ◽  
P. Zhukovskaya ◽  
K. Izrailov ◽  
V. Pokussov

The article considers the possibility of using artificial intelligence in the field of information security. To this end, the following domain contradiction is highlighted: detection algorithms are being sharpened for specific vulnerabilities in VS code, while the vulnerability code is constantly being modified. To resolve this contradiction, the object of study is software with vulnerabilities during its development, and the subject is methods for the intelligent analysis of its characteristics. A hypothetical solution is proposed: using Machine Learning Technology to identify vulnerabilities in software, drawing on the large amount of accumulated knowledge. Possible features of the software representations used by the Technology are also given.
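The abstract does not say which features of software representations the authors propose. As one hypothetical example of turning source code into a numeric feature vector for a vulnerability classifier, the sketch below counts calls to a small, illustrative list of C library functions that are commonly associated with memory-safety bugs; both the list and the lexical approach are assumptions made for this illustration.

```python
import re
from collections import Counter

# Hypothetical feature set: C library calls often linked to memory-safety bugs.
RISKY_CALLS = ("strcpy", "strcat", "sprintf", "gets", "memcpy")

# An identifier followed (possibly after whitespace) by an opening parenthesis.
CALL_RE = re.compile(r"\b([A-Za-z_]\w*)\s*\(")


def call_features(source: str) -> list[int]:
    """Count how often each risky call appears in a C source string.

    Crude lexical approach: it also counts matches inside comments and
    string literals, and it does not resolve macros or function pointers.
    """
    calls = Counter(CALL_RE.findall(source))
    return [calls[name] for name in RISKY_CALLS]
```

For instance, a function that calls `strcpy` twice and `sprintf` once yields `[2, 0, 1, 0, 0]`. Because such lexical signs are easy to evade by rewriting the vulnerable code, a practical system would combine them with richer representations (syntax trees, control-flow or data-flow graphs), which is in line with the contradiction the article describes.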


Author(s):  
Sheng-Han Wen ◽  
Wei-Loon Mow ◽  
Wei-Ning Chen ◽  
Chien-Yuan Wang ◽  
Hsu-Chun Hsiao
