scholarly journals Program analysis via efficient symbolic abstraction

2021 ◽  
Vol 5 (OOPSLA) ◽  
pp. 1-32
Author(s):  
Peisen Yao ◽  
Qingkai Shi ◽  
Heqing Huang ◽  
Charles Zhang

This paper concerns the scalability challenges of symbolic abstraction: given a formula ϕ in a logic L and an abstract domain A , find a most precise element in the abstract domain that over-approximates the meaning of ϕ. Symbolic abstraction is an important point in the space of abstract interpretation, as it allows for automatically synthesizing the best abstract transformers. However, current techniques for symbolic abstraction can have difficulty delivering on its practical strengths, due to performance issues. In this work, we introduce two algorithms for the symbolic abstraction of quantifier-free bit-vector formulas, which apply to the bit-vector interval domain and a certain kind of polyhedral domain, respectively. We implement and evaluate the proposed techniques on two machine code analysis clients, namely static memory corruption analysis and constrained random fuzzing. Using a suite of 57,933 queries from the clients, we compare our approach against a diverse group of state-of-the-art algorithms. The experiments show that our algorithms achieve a substantial speedup over existing techniques and illustrate significant precision advantages for the clients. Our work presents strong evidence that symbolic abstraction of numeric domains can be efficient and practical for large and realistic programs.

2020 ◽  
Vol 32 (6) ◽  
pp. 101-110
Author(s):  
Mikhail Aleksandrovich Solovev ◽  
Maksim Gennadevich Bakulin ◽  
Sergei Sergeevich Makarov ◽  
Dmitrii Valerevich Manushin ◽  
Vartan Andronikovich Padaryan

The mathematical foundations of abstract interpretation provide a unified method of formalization and research of program analysis algorithms for a broad spectrum of practical problems. However, its practical usage for binary code analysis faces several challenges, of both scientific and engineering nature. In this paper we address some of those challenges. We describe an intermediate representation that is tailored to binary code analysis; unlike some other IRs it is still useable in system code analysis. To achieve this, we take into account the low-level specifics of how CPUs work; on the IR level this mostly pertains to modeling main memory in that accesses can fail, and addresses can alias. Further, we propose an infrastructure for carrying out abstract interpretation on top of the IR. The user needs to implement the abstract state and the transfer functions, and the infrastructure handles the rest: two executors are currently implemented, one for analysis of a single path, and one for fixed point analysis. Both executors handle interprocedural analysis internally, via inlining or using summaries, so the interpretations only consider only procedure at a time, which greatly simplifies implementation. The IR and the abstract interpretation framework are used together to define a model pipeline for a target instruction set architecture, consisting of a fetch stage, a decode stage, and an execute stage. A distinct fetch stage allows to model delay slots, hardware loops, etc. We currently have limited implementations for RISC-V and x86. The x86 implementation is evaluated in two experiments where concolic execution is used to automatically analyze a «crackme» program, both in dynamic (execution trace) and static (executable image) setting. In conclusion, we outline the future directions of our project.


Diversity ◽  
2021 ◽  
Vol 13 (3) ◽  
pp. 129
Author(s):  
María Capa ◽  
Pat Hutchings

Annelida is a ubiquitous, common and diverse group of organisms, found in terrestrial, fresh waters and marine environments. Despite the large efforts put into resolving the evolutionary relationships of these and other Lophotrochozoa, and the delineation of the basal nodes within the group, these are still unanswered. Annelida holds an enormous diversity of forms and biological strategies alongside a large number of species, following Arthropoda, Mollusca, Vertebrata and perhaps Platyhelminthes, among the species most rich in phyla within Metazoa. The number of currently accepted annelid species changes rapidly when taxonomic groups are revised due to synonymies and descriptions of a new species. The group is also experiencing a recent increase in species numbers as a consequence of the use of molecular taxonomy methods, which allows the delineation of the entities within species complexes. This review aims at succinctly reviewing the state-of-the-art of annelid diversity and summarizing the main systematic revisions carried out in the group. Moreover, it should be considered as the introduction to the papers that form this Special Issue on Systematics and Biodiversity of Annelids.


Cybersecurity ◽  
2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Shushan Arakelyan ◽  
Sima Arasteh ◽  
Christophe Hauser ◽  
Erik Kline ◽  
Aram Galstyan

AbstractTackling binary program analysis problems has traditionally implied manually defining rules and heuristics, a tedious and time consuming task for human analysts. In order to improve automation and scalability, we propose an alternative direction based on distributed representations of binary programs with applicability to a number of downstream tasks. We introduce Bin2vec, a new approach leveraging Graph Convolutional Networks (GCN) along with computational program graphs in order to learn a high dimensional representation of binary executable programs. We demonstrate the versatility of this approach by using our representations to solve two semantically different binary analysis tasks – functional algorithm classification and vulnerability discovery. We compare the proposed approach to our own strong baseline as well as published results, and demonstrate improvement over state-of-the-art methods for both tasks. We evaluated Bin2vec on 49191 binaries for the functional algorithm classification task, and on 30 different CWE-IDs including at least 100 CVE entries each for the vulnerability discovery task. We set a new state-of-the-art result by reducing the classification error by 40% compared to the source-code based inst2vec approach, while working on binary code. For almost every vulnerability class in our dataset, our prediction accuracy is over 80% (and over 90% in multiple classes).


2020 ◽  
Vol 34 (05) ◽  
pp. 8164-8171
Author(s):  
Bing Li ◽  
Shifeng Liu ◽  
Yifang Sun ◽  
Wei Wang ◽  
Xiang Zhao

Recently, there has been an increasing interest in identifying named entities with nested structures. Existing models only make independent typing decisions on the entire entity span while ignoring strong modification relations between sub-entity types. In this paper, we present a novel Recursively Binary Modification model for nested named entity recognition. Our model utilizes the modification relations among sub-entities types to infer the head component on top of a Bayesian framework and uses entity head as a strong evidence to determine the type of the entity span. The process is recursive, allowing lower-level entities to help better model those on the outer-level. To the best of our knowledge, our work is the first effort that uses modification relation in nested NER task. Extensive experiments on four benchmark datasets demonstrate that our model outperforms state-of-the-art models in nested NER tasks, and delivers competitive results with state-of-the-art models in flat NER task, without relying on any extra annotations or NLP tools.


2012 ◽  
Vol 2 (3) ◽  
Author(s):  
Mykola Nikitchenko ◽  
Valentyn Tymofieiev

AbstractComposition-nominative logics are algebra-based logics of partial predicates constructed in a semantic-syntactic style on the methodological basis, which is common with programming. They can be considered as generalizations of traditional logics on classes of partial predicates that do not have fixed arity. In this paper we present and investigate algorithms for solving the satisfiability problem in various classes of composition-nominative logics. We consider the satisfiability problem for logics of the propositional, renominative, and quantifier levels and prove the reduction of the problem to the satisfiability problem for classical logics. The method developed in the paper enables us to leverage existent state-of-the-art satisfiability checking procedures for solving the satisfiability problem in composition-nominative logics, which could be crucial for handling industrial instances coming from domains such as program analysis and verification. The reduction proposed in the paper requires extension of logic language and logic models with an infinite number of unessential variables and with a predicate of equality to a constant.


2017 ◽  
Vol 29 (3) ◽  
pp. 531-557
Author(s):  
Marco Comini ◽  
María-del-Mar Gallardo ◽  
Laura Titolo ◽  
Alicia Villanueva

2021 ◽  
Vol 43 (3) ◽  
pp. 1-51
Author(s):  
Graeme Gange ◽  
Zequn Ma ◽  
Jorge A. Navas ◽  
Peter Schachte ◽  
Harald Søndergaard ◽  
...  

Zones and Octagons are popular abstract domains for static program analysis. They enable the automated discovery of simple numerical relations that hold between pairs of program variables. Both domains are well understood mathematically but the detailed implementation of static analyses based on these domains poses many interesting algorithmic challenges. In this article, we study the two abstract domains, their implementation and use. Utilizing improved data structures and algorithms for the manipulation of graphs that represent difference-bound constraints, we present fast implementations of both abstract domains, built around a common infrastructure. We compare the performance of these implementations against alternative approaches offering the same precision. We quantify the differences in performance by measuring their speed and precision on standard benchmarks. We also assess, in the context of software verification, the extent to which the improved precision translates to better verification outcomes. Experiments demonstrate that our new implementations improve the state of the art for both Zones and Octagons significantly.


Sign in / Sign up

Export Citation Format

Share Document