Benchmark programs are an integral part of program analysis research. Researchers
use benchmark programs to evaluate existing techniques and test the feasibility of
new approaches. The larger and more realistic the set of benchmarks, the more confident
a researcher can be about the correctness and reproducibility of their results. However,
obtaining an adequate set of benchmark programs has been a long-standing challenge in the
program analysis community.
In this thesis, we present APT, a framework we designed and implemented to
automate the generation of realistic benchmark programs suitable for program analysis
evaluations. Our tool targets intra-procedural analyses that operate on an integer domain,
specifically symbolic execution. The framework is composed of three main stages. In the
first stage, the tool extracts from open-source repositories candidate programs
suitable for symbolic execution. In the second stage, the tool transforms the extracted
programs into compilable, stand-alone benchmarks by removing external dependencies and
nonlinear expressions. In the third stage, the benchmarks are verified and made available
to the user.
We have designed our transformation algorithms to remove program dependencies and nonlinear
expressions while preserving semantic equivalence under the abstraction of symbolic
analysis. That is, the information the analysis computes on the original program and on
its transformed version should be equivalent. Our work provides static analysis researchers with
concise, compilable benchmark programs that are relevant to symbolic execution, allowing them
to focus their efforts on advancing analysis techniques. Furthermore, our work benefits the
software engineering community by enabling static analysis researchers to perform benchmarking
with a large, realistic set of programs, thus strengthening the empirical evidence for
advances in static program analysis.