Program analysis via efficient symbolic abstraction

This paper concerns the scalability challenges of symbolic abstraction: given a formula ϕ in a logic L and an abstract domain A, find a most precise element in the abstract domain that over-approximates the meaning of ϕ. Symbolic abstraction is an important point in the space of abstract interpretation, as it allows for automatically synthesizing the best abstract transformers. However, current techniques for symbolic abstraction can have difficulty delivering on its practical strengths, due to performance issues. In this work, we introduce two algorithms for the symbolic abstraction of quantifier-free bit-vector formulas, which apply to the bit-vector interval domain and a certain kind of polyhedral domain, respectively. We implement and evaluate the proposed techniques on two machine code analysis clients, namely static memory corruption analysis and constrained random fuzzing. Using a suite of 57,933 queries from the clients, we compare our approach against a diverse group of state-of-the-art algorithms. The experiments show that our algorithms achieve a substantial speedup over existing techniques and illustrate significant precision advantages for the clients. Our work presents strong evidence that symbolic abstraction of numeric domains can be efficient and practical for large and realistic programs.
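To make the problem statement concrete, the following is a minimal sketch of symbolic abstraction into the unsigned bit-vector interval domain. It is not the algorithm proposed in the paper: it is a generic binary search over bounds, and the hypothetical helper `exists_model` stands in for an SMT query by brute-force enumeration, which is only feasible for small bit widths. The names `exists_model` and `best_interval` are assumptions made for illustration.

```python
from typing import Callable, Optional, Tuple

Formula = Callable[[int], bool]  # a predicate over one unsigned bit-vector value


def exists_model(phi: Formula, lo: int, hi: int) -> bool:
    """Stand-in for a solver call: is there an x in [lo, hi] that satisfies phi?

    Assumption: brute-force enumeration replaces a real SMT query here.
    """
    return any(phi(x) for x in range(lo, hi + 1))


def best_interval(phi: Formula, width: int) -> Optional[Tuple[int, int]]:
    """Tightest unsigned interval [lo, hi] containing every model of phi.

    Returns None when phi has no model (the bottom element).
    """
    max_val = (1 << width) - 1
    if not exists_model(phi, 0, max_val):
        return None

    # Binary search for the smallest model; it always lies in [lo, hi].
    lo, hi = 0, max_val
    while lo < hi:
        mid = (lo + hi) // 2
        if exists_model(phi, lo, mid):
            hi = mid
        else:
            lo = mid + 1
    lower = lo

    # Binary search for the largest model; it always lies in [lo, hi].
    lo, hi = lower, max_val
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if exists_model(phi, mid, hi):
            lo = mid
        else:
            hi = mid - 1
    return (lower, lo)


# 8-bit example with wraparound: (2 * x) mod 256 == 6 holds for x = 3 and x = 131.
wrap = lambda x: ((2 * x) & 0xFF) == 6
result = best_interval(wrap, 8)
```

The result is the most precise interval for the formula, yet it still contains many non-models between the two solutions, which illustrates why "most precise element" is relative to the chosen abstract domain.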


Practical Abstract Interpretation of Binary Code

Proceedings of the Institute for System Programming of RAS ◽

10.15514/ispras-2020-32(6)-8 ◽

2020 ◽

Vol 32 (6) ◽

pp. 101-110

Author(s):

Mikhail Aleksandrovich Solovev ◽

Maksim Gennadevich Bakulin ◽

Sergei Sergeevich Makarov ◽

Dmitrii Valerevich Manushin ◽

Vartan Andronikovich Padaryan

Keyword(s):

Program Analysis ◽

Abstract Interpretation ◽

Binary Code ◽

Transfer Functions ◽

Main Memory ◽

Code Analysis ◽

Concolic Execution ◽

Point Analysis ◽

Dynamic Execution ◽

Mathematical Foundations

The mathematical foundations of abstract interpretation provide a unified method of formalization and research of program analysis algorithms for a broad spectrum of practical problems. However, its practical usage for binary code analysis faces several challenges, of both scientific and engineering nature. In this paper we address some of those challenges. We describe an intermediate representation that is tailored to binary code analysis; unlike some other IRs it is still usable in system code analysis. To achieve this, we take into account the low-level specifics of how CPUs work; on the IR level this mostly pertains to modeling main memory in that accesses can fail, and addresses can alias. Further, we propose an infrastructure for carrying out abstract interpretation on top of the IR. The user needs to implement the abstract state and the transfer functions, and the infrastructure handles the rest: two executors are currently implemented, one for analysis of a single path, and one for fixed point analysis. Both executors handle interprocedural analysis internally, via inlining or using summaries, so the interpretations consider only one procedure at a time, which greatly simplifies implementation. The IR and the abstract interpretation framework are used together to define a model pipeline for a target instruction set architecture, consisting of a fetch stage, a decode stage, and an execute stage. A distinct fetch stage makes it possible to model delay slots, hardware loops, etc. We currently have limited implementations for RISC-V and x86. The x86 implementation is evaluated in two experiments where concolic execution is used to automatically analyze a «crackme» program, both in dynamic (execution trace) and static (executable image) setting. In conclusion, we outline the future directions of our project.
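The division of labor described in this abstract (the user supplies an abstract state and transfer functions, an executor computes the fixed point) can be illustrated with a generic worklist sketch. This is not the authors' infrastructure or IR: the function `fixed_point`, the toy control-flow graph, and the sign domain for a single variable are all assumptions chosen to keep the example small.

```python
from collections import deque


def fixed_point(cfg, entry, init, bottom, join, transfer):
    """Generic worklist executor (illustrative only).

    cfg      maps each node to its list of successors
    transfer maps (node, successor, state) to the state along that edge
    join     merges two abstract states
    """
    state = {node: bottom for node in cfg}
    state[entry] = init
    work = deque([entry])
    while work:
        node = work.popleft()
        for succ in cfg[node]:
            out = transfer(node, succ, state[node])
            merged = join(state[succ], out)
            if merged != state[succ]:  # state grew, so revisit the successor
                state[succ] = merged
                work.append(succ)
    return state


# User-supplied part: a sign domain for one variable x, as a set of possible signs.
def increment(signs):
    """Abstract effect of x = x + 1 on a set of signs."""
    result = set()
    for s in signs:
        if s == "-":
            result |= {"-", "0"}  # a negative value may stay negative or reach zero
        else:
            result.add("+")       # zero or positive becomes positive
    return frozenset(result)


def transfer(node, succ, signs):
    # Only the loop back edge executes x = x + 1; other edges leave x unchanged.
    return increment(signs) if (node, succ) == ("loop", "loop") else signs


cfg = {"entry": ["loop"], "loop": ["loop", "exit"], "exit": []}
states = fixed_point(
    cfg,
    entry="entry",
    init=frozenset({"0"}),  # x starts at 0
    bottom=frozenset(),
    join=lambda a, b: a | b,
    transfer=transfer,
)
```

Because the sign lattice is finite and `join` only ever grows a state, the worklist is guaranteed to empty; an infinite-height domain would additionally need widening.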


Annelid Diversity: Historical Overview and Future Perspectives

Diversity ◽

10.3390/d13030129 ◽

2021 ◽

Vol 13 (3) ◽

pp. 129

Author(s):

María Capa ◽

Pat Hutchings

Keyword(s):

State Of The Art ◽

Molecular Taxonomy ◽

Evolutionary Relationships ◽

Diverse Group ◽

Future Perspectives ◽

Special Issue ◽

Species Numbers ◽

Species Complexes ◽

Taxonomic Groups ◽

A New Species

Annelida is a ubiquitous, common and diverse group of organisms, found in terrestrial, freshwater and marine environments. Despite the large efforts put into resolving the evolutionary relationships of these and other Lophotrochozoa, and into delineating the basal nodes within the group, these questions remain unanswered. Annelida holds an enormous diversity of forms and biological strategies alongside a large number of species, placing it, after Arthropoda, Mollusca, Vertebrata and perhaps Platyhelminthes, among the most species-rich phyla within Metazoa. The number of currently accepted annelid species changes rapidly when taxonomic groups are revised, owing to synonymies and descriptions of new species. The group is also experiencing a recent increase in species numbers as a consequence of the use of molecular taxonomy methods, which allow the delineation of the entities within species complexes. This review succinctly surveys the state of the art of annelid diversity and summarizes the main systematic revisions carried out in the group. Moreover, it should be considered as the introduction to the papers that form this Special Issue on Systematics and Biodiversity of Annelids.


Bin2vec: learning representations of binary executable programs for security tasks

Cybersecurity ◽

10.1186/s42400-021-00088-4 ◽

2021 ◽

Vol 4 (1) ◽

Author(s):

Shushan Arakelyan ◽

Sima Arasteh ◽

Christophe Hauser ◽

Erik Kline ◽

Aram Galstyan

Keyword(s):

Program Analysis ◽

State Of The Art ◽

Classification Error ◽

New Approach ◽

Convolutional Networks ◽

Computational Program ◽

Functional Algorithm ◽

Binary Program ◽

Vulnerability Discovery ◽

Executable Programs

Tackling binary program analysis problems has traditionally implied manually defining rules and heuristics, a tedious and time-consuming task for human analysts. In order to improve automation and scalability, we propose an alternative direction based on distributed representations of binary programs with applicability to a number of downstream tasks. We introduce Bin2vec, a new approach leveraging Graph Convolutional Networks (GCN) along with computational program graphs in order to learn a high dimensional representation of binary executable programs. We demonstrate the versatility of this approach by using our representations to solve two semantically different binary analysis tasks: functional algorithm classification and vulnerability discovery. We compare the proposed approach to our own strong baseline as well as published results, and demonstrate improvement over state-of-the-art methods for both tasks. We evaluated Bin2vec on 49191 binaries for the functional algorithm classification task, and on 30 different CWE-IDs including at least 100 CVE entries each for the vulnerability discovery task. We set a new state-of-the-art result by reducing the classification error by 40% compared to the source-code based inst2vec approach, while working on binary code. For almost every vulnerability class in our dataset, our prediction accuracy is over 80% (and over 90% in multiple classes).


Recursively Binary Modification Model for Nested Named Entity Recognition

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6329 ◽

2020 ◽

Vol 34 (05) ◽

pp. 8164-8171

Author(s):

Bing Li ◽

Shifeng Liu ◽

Yifang Sun ◽

Wei Wang ◽

Xiang Zhao

Keyword(s):

Strong Evidence ◽

State Of The Art ◽

Named Entity Recognition ◽

Bayesian Framework ◽

Entity Recognition ◽

Named Entities ◽

Named Entity ◽

Nested Structures ◽

Benchmark Datasets ◽

Head Component

Recently, there has been an increasing interest in identifying named entities with nested structures. Existing models only make independent typing decisions on the entire entity span while ignoring strong modification relations between sub-entity types. In this paper, we present a novel Recursively Binary Modification model for nested named entity recognition. Our model utilizes the modification relations among sub-entity types to infer the head component on top of a Bayesian framework and uses the entity head as strong evidence to determine the type of the entity span. The process is recursive, allowing lower-level entities to help better model those on the outer level. To the best of our knowledge, our work is the first effort that uses modification relations in the nested NER task. Extensive experiments on four benchmark datasets demonstrate that our model outperforms state-of-the-art models in nested NER tasks, and delivers competitive results with state-of-the-art models in the flat NER task, without relying on any extra annotations or NLP tools.


Satisfiability in composition-nominative logics

Open Computer Science ◽

10.2478/s13537-012-0027-3 ◽

2012 ◽

Vol 2 (3) ◽

Cited By ~ 2

Author(s):

Mykola Nikitchenko ◽

Valentyn Tymofieiev

Keyword(s):

Infinite Number ◽

Program Analysis ◽

State Of The Art ◽

Satisfiability Problem ◽

Methodological Basis ◽

Logic Models ◽

Satisfiability Checking ◽

Partial Predicates

Composition-nominative logics are algebra-based logics of partial predicates constructed in a semantic-syntactic style on a methodological basis that is common with programming. They can be considered as generalizations of traditional logics on classes of partial predicates that do not have fixed arity. In this paper we present and investigate algorithms for solving the satisfiability problem in various classes of composition-nominative logics. We consider the satisfiability problem for logics of the propositional, renominative, and quantifier levels and prove the reduction of the problem to the satisfiability problem for classical logics. The method developed in the paper enables us to leverage existing state-of-the-art satisfiability checking procedures for solving the satisfiability problem in composition-nominative logics, which could be crucial for handling industrial instances coming from domains such as program analysis and verification. The reduction proposed in the paper requires extension of the logic language and logic models with an infinite number of unessential variables and with a predicate of equality to a constant.


Proceedings of the 3rd ACM SIGPLAN International Workshop on the State of the Art in Java Program Analysis - SOAP '14

10.1145/2614628 ◽

2014 ◽

Keyword(s):

Program Analysis ◽

State Of The Art ◽

International Workshop ◽

The State ◽

Java Program


Proceedings of the 5th ACM SIGPLAN International Workshop on State Of the Art in Program Analysis

10.1145/2931021 ◽

2016 ◽

Keyword(s):

Program Analysis ◽

State Of The Art ◽

International Workshop


A program analysis framework for tccp based on abstract interpretation

Formal Aspects of Computing ◽

10.1007/s00165-016-0409-8 ◽

2017 ◽

Vol 29 (3) ◽

pp. 531-557

Author(s):

Marco Comini ◽

María-del-Mar Gallardo ◽

Laura Titolo ◽

Alicia Villanueva

Keyword(s):

Program Analysis ◽

Abstract Interpretation ◽

Analysis Framework


Proceedings of the 8th ACM SIGPLAN International Workshop on State Of the Art in Program Analysis - SOAP 2019

10.1145/3315568 ◽

2019 ◽

Keyword(s):

Program Analysis ◽

State Of The Art ◽

International Workshop


A Fresh Look at Zones and Octagons

ACM Transactions on Programming Languages and Systems ◽

10.1145/3457885 ◽

2021 ◽

Vol 43 (3) ◽

pp. 1-51

Author(s):

Graeme Gange ◽

Zequn Ma ◽

Jorge A. Navas ◽

Peter Schachte ◽

Harald Søndergaard ◽

...

Keyword(s):

Data Structures ◽

Program Analysis ◽

Software Verification ◽

State Of The Art ◽

The State ◽

Bound Constraints ◽

Data Structures And Algorithms ◽

Alternative Approaches ◽

Abstract Domains ◽

Automated Discovery

Zones and Octagons are popular abstract domains for static program analysis. They enable the automated discovery of simple numerical relations that hold between pairs of program variables. Both domains are well understood mathematically but the detailed implementation of static analyses based on these domains poses many interesting algorithmic challenges. In this article, we study the two abstract domains, their implementation and use. Utilizing improved data structures and algorithms for the manipulation of graphs that represent difference-bound constraints, we present fast implementations of both abstract domains, built around a common infrastructure. We compare the performance of these implementations against alternative approaches offering the same precision. We quantify the differences in performance by measuring their speed and precision on standard benchmarks. We also assess, in the context of software verification, the extent to which the improved precision translates to better verification outcomes. Experiments demonstrate that our new implementations improve the state of the art for both Zones and Octagons significantly.
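As a rough illustration of the graphs of difference-bound constraints mentioned in this abstract, the sketch below stores Zone constraints of the form x_i - x_j <= c in a matrix and tightens them with the textbook cubic shortest-path closure (Floyd-Warshall). This is the standard baseline, not the improved data structures and algorithms the article presents; the function name `close` and the variable layout are assumptions.

```python
import math

INF = math.inf  # "no constraint" between a pair of variables


def close(dbm):
    """Tighten a difference-bound matrix, where dbm[i][j] = c means x_i - x_j <= c.

    Returns the closed matrix, or None if the constraints are unsatisfiable
    (the constraint graph contains a negative cycle).
    """
    n = len(dbm)
    m = [row[:] for row in dbm]
    for i in range(n):
        m[i][i] = min(m[i][i], 0)  # x_i - x_i <= 0 always holds
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # x_i - x_k <= a and x_k - x_j <= b imply x_i - x_j <= a + b
                if m[i][k] + m[k][j] < m[i][j]:
                    m[i][j] = m[i][k] + m[k][j]
    if any(m[i][i] < 0 for i in range(n)):
        return None
    return m


# Index 0 is a constant-zero variable, index 1 is x, index 2 is y.
zone = [[INF] * 3 for _ in range(3)]
zone[1][0] = 10  # x - 0 <= 10
zone[2][1] = 5   # y - x <= 5
closed = close(zone)
```

In the closed matrix the entry for y minus the zero variable is 15, the derived bound y <= 15 that neither input constraint states directly.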
