Benchmark programs are an integral part of program analysis research. Researchers
use benchmark programs to evaluate existing techniques and test the feasibility of
new approaches. The larger and more realistic the set of benchmarks, the more confident
a researcher can be about the correctness and reproducibility of their results. However,
obtaining an adequate set of benchmark programs has been a long-standing challenge in the
program analysis community.
In this thesis, we present APT, a framework we designed and implemented to
automate the generation of realistic benchmark programs suitable for program analysis
evaluations. Our tool targets intra-procedural analyses that operate on an integer domain,
specifically symbolic execution. The framework is composed of three main stages. In the
first stage, the tool extracts from open-source repositories candidate programs
suitable for symbolic execution. In the second stage, the tool transforms the extracted
programs into compilable, stand-alone benchmarks by removing external dependencies and
nonlinear expressions. In the third stage, the benchmarks are verified and made available
to the user.
We have designed our transformation algorithms to remove program dependencies and nonlinear
expressions while preserving semantic equivalence under the abstraction of symbolic
analysis. That is, the information the analysis computes on the original program and on
its transformed version should be equivalent. Our work provides static analysis researchers with
concise, compilable benchmark programs that are relevant to symbolic execution, allowing them
to focus their efforts on advancing analysis techniques. Furthermore, our work benefits the
software engineering community by enabling static analysis researchers to perform benchmarking
with a large, realistic set of programs, thus strengthening the empirical evidence for
advances in static program analysis.