Program Analysis and Programming Languages for Security

Detecting buffer overruns from source code is one of the most common and yet challenging tasks in program analysis. Current approaches based on rigid rules and handcrafted features are limited in flexibility and robustness because of the diverse bug patterns and characteristics found in sophisticated real-world software. In this paper, we propose a novel, data-driven approach that is completely end-to-end and requires no handcrafted features, and is thus free from programming-language-specific structural limitations. In particular, our approach leverages a recently proposed neural network model called memory networks, which have shown state-of-the-art performance mainly in question-answering tasks. Our experimental results using source code samples demonstrate that our proposed model is capable of accurately detecting different types of buffer overruns. We also present in-depth analyses of how a memory network can learn to understand the semantics of programming languages solely from raw source code, such as tracing variables of interest, identifying numerical values, and performing quantitative comparisons between them.

Profiling Initialisation Behaviour in Java

10.26686/wgtn.17003419 ◽

2021 ◽

Author(s):

Stephen Frank Nelson

Keyword(s):

Programming Languages ◽

Program Analysis ◽

Garbage Collection ◽

Hybrid Approach ◽

Memory Consumption ◽

Aspect Oriented Programming ◽

Analysis Study ◽

Programming Tools ◽

Mutable State ◽

Blank Slate

Freshly created objects are a blank slate: their mutable state and their constant properties must be initialised before they can be used. Programming languages like Java typically support object initialisation by providing constructor methods. This thesis examines the actual initialisation of objects in real-world programs to determine whether constructor methods support the initialisation that programmers actually perform. Determining which object initialisation techniques are most popular and how they can be identified will allow language designers to better understand the needs of programmers, and give insights that VM designers could use to optimise the performance of language implementations, reduce memory consumption, and improve garbage collection behaviour. Traditional profiling typically either focuses on timing, or uses sampling or heap snapshots to approximate whole program analysis. Classifying the behaviour of objects throughout their lifetime requires analysis of all program behaviour without approximation. This thesis presents two novel whole-program object profilers: one using purely class modification (#prof), and a hybrid approach utilising class modification and JVM support (rprof). #prof modifies programs using aspect-oriented programming tools to generate and aggregate data and examines objects that enter different collections to determine whether correlation exists between initialisation behaviour and the use of equality operators and collections. rprof confirms the results of an existing static analysis study of field initialisation using runtime analysis, and provides a novel study of object initialisation behaviour patterns.
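To make the question in this abstract concrete, the following Python sketch records whether each field of an object is first written inside its constructor or afterwards. It is only an illustration of the kind of data such a profiler collects: it is not #prof or rprof, which instrument Java programs, and the names `ProfiledMeta`, `Profiled`, `Point`, and `_first_write` are hypothetical.

```python
class ProfiledMeta(type):
    """Metaclass that marks the period during which __init__ is running."""

    def __call__(cls, *args, **kwargs):
        obj = cls.__new__(cls)
        # Bypass the profiling __setattr__ for the bookkeeping fields.
        object.__setattr__(obj, "_first_write", {})
        object.__setattr__(obj, "_constructing", True)
        obj.__init__(*args, **kwargs)
        object.__setattr__(obj, "_constructing", False)
        return obj


class Profiled(metaclass=ProfiledMeta):
    """Base class that logs the phase in which each field is first written."""

    def __setattr__(self, name, value):
        phase = "constructor" if self._constructing else "after constructor"
        # setdefault keeps only the first write to each field.
        self._first_write.setdefault(name, phase)
        object.__setattr__(self, name, value)


class Point(Profiled):
    def __init__(self, x):
        self.x = x          # initialised in the constructor

    def set_y(self, y):
        self.y = y          # initialised later, outside the constructor


p = Point(1)
p.set_y(2)
print(p._first_write)  # {'x': 'constructor', 'y': 'after constructor'}
```

A field such as `y` above is the kind of case the thesis is interested in: initialisation that constructor methods alone do not capture.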

Reconstruction of Multi-Dimensional Form of Linearized Accesses to Arrays in SAPFOR

Russian Digital Libraries Journal ◽

10.26907/1562-5419-2020-23-4-770-787 ◽

2020 ◽

Vol 23 (4) ◽

pp. 770-787

Author(s):

Nikita Andreevich Kataev ◽

Vladislav Nikolaevich Vasilkin

Keyword(s):

Programming Languages ◽

Program Analysis ◽

Dimensional Structure ◽

Loop Nests ◽

Data Dependence Analysis ◽

C Programming ◽

Program Parallelization ◽

Hard Problems ◽

Automate Program

The system for automated parallelization SAPFOR (System FOR Automated Parallelization) includes tools for program analysis and transformation. The main goal of the system is to reduce the complexity of program parallelization. SAPFOR focuses on the investigation of multilingual applications written in the Fortran and C programming languages. SAPFOR uses the low-level LLVM IR representation for program analysis. This representation allows us to perform various IR-level optimizations to improve the quality of program analysis. At the same time, it loses some features of the program that are available in its higher-level representation. One of these features is the multi-dimensional structure of arrays. Data dependence analysis is one of the main problems that must be solved to automate program parallelization. Moreover, such an analysis belongs to the class of NP-hard problems. Knowledge of the multi-dimensional structure of arrays makes it possible, in many cases, to take into account the structure of index expressions in array accesses and to reduce the complexity of the analysis. In addition, multi-dimensional arrays allow us to use a multi-dimensional processor matrix and to parallelize a whole loop nest rather than a single loop in the nest, which increases the parallelism of the program. These opportunities are natively supported in the DVM system. This paper discusses the approach used in the SAPFOR system to recover the form of multi-dimensional arrays from their linearized representation in LLVM IR. The proposed approach has been successfully evaluated on various applications, including performance tests from the NAS Parallel Benchmarks suite.
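As a small illustration of what a linearized access is and what recovering its multi-dimensional form means, consider the Python sketch below. It is not the SAPFOR algorithm, which works on index expressions in LLVM IR; it only shows the underlying arithmetic for a concrete index, assuming a row-major layout with known dimension sizes. The function names `linearize` and `delinearize` are hypothetical.

```python
from typing import Sequence, Tuple


def linearize(index: Sequence[int], dims: Sequence[int]) -> int:
    """Flatten a multi-dimensional index into a row-major offset."""
    offset = 0
    for i, d in zip(index, dims):
        if not 0 <= i < d:
            raise IndexError(f"index {i} out of range for dimension of size {d}")
        offset = offset * d + i
    return offset


def delinearize(offset: int, dims: Sequence[int]) -> Tuple[int, ...]:
    """Recover the multi-dimensional index from a row-major offset."""
    index = []
    # Peel off the fastest-varying (last) dimension first.
    for d in reversed(dims):
        offset, i = divmod(offset, d)
        index.append(i)
    if offset != 0:
        raise IndexError("offset lies outside the array")
    return tuple(reversed(index))


# A[2][3] in a 4x5 array is stored at offset 2*5 + 3.
print(linearize((2, 3), (4, 5)))   # 13
print(delinearize(13, (4, 5)))     # (2, 3)
```

Once the access is known to be `A[2][3]` rather than `A[13]`, each subscript can be analysed separately, which is what makes per-dimension dependence tests and loop-nest parallelization possible.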

Theoretical Aspects of Semantics-Based Language Implementation

DAIMI Report Series ◽

10.7146/dpb.v19i329.6561 ◽

1990 ◽

Vol 19 (329) ◽

Author(s):

Flemming Nielson

Keyword(s):

Programming Languages ◽

Program Analysis ◽

Code Generation ◽

Program Transformation ◽

Abstract Interpretation ◽

Language Implementation

The research summarised here concerns theoretical aspects of implementing programming languages directly from a description of their semantics. This involves a study of three subtasks: abstract interpretation (a framework for program analysis), code generation, and program transformation. The main aim has been to ensure the correctness of these subtasks.

Making abstract models complete

Mathematical Structures in Computer Science ◽

10.1017/s0960129514000358 ◽

2014 ◽

Vol 26 (4) ◽

pp. 658-701 ◽

Cited By ~ 3

Author(s):

Roberto Giacobazzi ◽

Isabella Mastroeni

Keyword(s):

Programming Languages ◽

Program Analysis ◽

Abstract Interpretation ◽

False Alarms ◽

Program Transformations ◽

Monotone Functions ◽

Complete Lattices ◽

Semantics Of Programming Languages ◽

Mathematical Properties ◽

Predicate Transformer

Completeness is a key feature of abstract interpretation. It corresponds to exactness of the abstraction of fix-points and reflects the need for an absence of false alarms in static program analysis. Making abstract interpretation complete is therefore a major problem in approximating the semantics of programming languages. In this paper, we consider the problem of making abstract interpretations complete by minimally modifying the predicate transformer, i.e. the semantics, of a program. We study the mathematical properties of complete functions on complete lattices and prove the existence of minimal transformations of monotone functions to achieve completeness. We then apply minimal complete transformers to prove the minimality of standard program transformations in security, such as static program monitoring.
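The notion of completeness can be illustrated with a toy sign analysis. The Python sketch below is not the construction from the paper, which modifies the semantics to obtain completeness; it only shows one input on which an abstract operation loses precision (the source of a false alarm) and one on which it does not. The domain encoding and the names `alpha`, `abs_add`, and `abs_mul` are assumptions made for this example.

```python
NEG, ZERO, POS, TOP = "neg", "zero", "pos", "top"


def alpha(values):
    """Abstract a non-empty set of integers to the most precise sign."""
    signs = {NEG if v < 0 else POS if v > 0 else ZERO for v in values}
    return signs.pop() if len(signs) == 1 else TOP


def abs_add(a, b):
    """Abstract addition on signs."""
    if a == ZERO:
        return b
    if b == ZERO:
        return a
    # pos + neg (or anything involving top) has an unknown sign.
    return a if a == b else TOP


def abs_mul(a, b):
    """Abstract multiplication on signs."""
    if ZERO in (a, b):
        return ZERO
    if TOP in (a, b):
        return TOP
    return POS if a == b else NEG


xs, ys = {2}, {-1}

# Addition: abstracting the concrete result is more precise than
# computing on abstractions, so the analysis is incomplete here.
print(alpha({x + y for x in xs for y in ys}))  # pos
print(abs_add(alpha(xs), alpha(ys)))           # top

# Multiplication: both routes agree on this input.
print(alpha({x * y for x in xs for y in ys}))  # neg
print(abs_mul(alpha(xs), alpha(ys)))           # neg
```

An analysis that must prove `x + y > 0` would raise an alarm for the addition case even though the concrete result is always positive.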

Integrated Hardware Garbage Collection

ACM Transactions on Embedded Computing Systems ◽

10.1145/3450147 ◽

2021 ◽

Vol 20 (5) ◽

pp. 1-25

Author(s):

Andrés Amaya García ◽

David May ◽

Ed Nutting

Keyword(s):

Software Development ◽

Programming Languages ◽

Real Time ◽

Program Analysis ◽

Garbage Collection ◽

Data Representation ◽

Garbage Collector ◽

Tightly Coupled ◽

Memory Cycle ◽

High Level

Garbage-collected programming languages, such as Python and C#, have accelerated software development. These modern languages increase productivity and software reliability as they provide high-level data representation and control structures. Modern languages are widely used in software development for mobile, desktop, and server devices, but their adoption is limited in real-time embedded systems. There is clear interest in supporting modern languages in embedded devices as emerging markets, like the Internet of Things, demand ever smarter and more reliable products. Multiple commercial and open-source projects, such as Zerynth and MicroPython, are attempting to provide support. But these projects rely on software garbage collectors that impose high overheads and introduce unpredictable pauses, preventing their use in many embedded applications. These limitations arise from the unsuitability of conventional processors for performing efficient, predictable garbage collection. We propose the Integrated Hardware Garbage Collector (IHGC), a garbage collector tightly coupled with the processor that runs continuously in the background. Further, we introduce a static analysis technique to guarantee that real-time programs are never paused by the collector. Our design allocates a memory cycle to the collector when the processor is not using the memory. The IHGC achieves this by careful division of collection work into single-memory-access steps that are interleaved with the processor’s memory accesses. As a result, our collector eliminates run-time overheads and enables real-time program analysis. The principles behind the IHGC can be used in conjunction with existing architectures. For example, we simulated the IHGC alongside the ARMv6-M architecture. Compared to a conventional processor, our experiments indicate that the IHGC offers 1.5–7 times better performance for programs that rely on garbage collection.
The IHGC delivers the benefits of garbage-collected languages with real-time performance but without the complexity and overheads inherent in software collectors.

Participants' Proceedings on the Workshop: Types for Program Analysis

DAIMI Report Series ◽

10.7146/dpb.v24i493.7021 ◽

1995 ◽

Vol 24 (493) ◽

Author(s):

Hanne Riis Nielson ◽

Kirsten Lackner Solberg

Keyword(s):

Programming Languages ◽

Program Analysis ◽

Abstract Interpretation ◽

Recent Trend ◽

Type Systems ◽

Program Committee ◽

Program Optimization ◽

Proof Techniques ◽

Areas Of Interest

As a satellite meeting of the TAPSOFT'95 conference we organized a small workshop on program analysis. The title of the workshop, "Types for Program Analysis", was motivated by the recent trend of letting the presentation and development of program analyses be influenced by annotated type systems, effect systems, and more general logical systems. The content of the workshop was intended to be somewhat broader; consequently the call for participation listed the following areas of interest:

- specification of specific analyses for programming languages,
- the role of effects, polymorphism, conjunction/disjunction types, dependent types etc. in specification of analyses,
- algorithmic tools and methods for solving general classes of type-based analyses,
- the role of unification, semi-unification etc. in implementations of analyses,
- proof techniques for establishing the safety of analyses,
- relationship to other approaches to program analysis, including abstract interpretation and constraint-based methods,
- exploitation of analysis results in program optimization and implementation.

The submissions were not formally refereed; however, each submission was read by several members of the program committee and received detailed comments and suggestions for improvement. We expect that several of the papers, in slightly revised forms, will show up at future conferences. The workshop took place at Aarhus University on May 26 and May 27 and lasted two half days.

A Clausal Normal Form Translation for FOOL

10.29007/ltkk ◽

2018 ◽

Author(s):

Evgenii Kotelnikov ◽

Laura Kovács ◽

Martin Suda ◽

Andrei Voronkov

Keyword(s):

Normal Form ◽

Programming Languages ◽

Program Analysis ◽

State Of The Art ◽

Order Logic ◽

Theorem Prover ◽

First Order Logic ◽

First Order ◽

Theorem Provers ◽

Superposition Calculus

Automated theorem provers for first-order logic usually operate on sets of first-order clauses. It is well known that the translation of a formula in full first-order logic to a clausal normal form (CNF) can crucially affect the performance of a theorem prover. In our recent work we introduced a modification of first-order logic extended by a first-class boolean sort and syntactical constructs that mirror features of programming languages. We called this logic FOOL. Formulas in FOOL can be translated to ordinary first-order formulas and checked by first-order theorem provers. While this translation is straightforward, it does not result in a CNF that can be efficiently handled by state-of-the-art theorem provers that use superposition calculus. In this paper we present a new CNF translation algorithm for FOOL that is friendly and efficient for superposition-based first-order provers. We implemented the algorithm in the Vampire theorem prover and evaluated it on a large number of problems coming from formalisation of mathematics and program analysis. Our experimental results show an increase in the performance of the prover with our CNF translation compared to the naive translation.
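Why the choice of CNF translation matters can be seen on a purely propositional example. The Python sketch below is not the FOOL translation from the paper or anything implemented in Vampire; it compares two textbook translations of a disjunction of conjunctions: distribution, and naming each conjunction with a fresh symbol. The function names and the string encoding of literals (`~` for negation, `d0`, `d1`, ... for fresh names) are assumptions made for this example.

```python
from itertools import product


def naive_cnf(disjuncts):
    """Distribute OR over AND: one clause per choice of a literal
    from each conjunction."""
    return [frozenset(choice) for choice in product(*disjuncts)]


def definitional_cnf(disjuncts):
    """Name each conjunction with a fresh symbol d_i, add d_i -> literal
    for every literal in it, and assert d_0 | d_1 | ... | d_n.
    The result is satisfiable exactly when the input formula is."""
    clauses, names = [], []
    for i, conjunction in enumerate(disjuncts):
        name = f"d{i}"
        names.append(name)
        for literal in conjunction:
            clauses.append(frozenset({f"~{name}", literal}))
    clauses.append(frozenset(names))
    return clauses


# (a0 & b0) | (a1 & b1) | ... | (a9 & b9)
formula = [(f"a{i}", f"b{i}") for i in range(10)]

print(len(naive_cnf(formula)))         # 1024
print(len(definitional_cnf(formula)))  # 21
```

The first translation grows exponentially with the number of disjuncts, while the second grows linearly at the cost of introducing new symbols, which is the kind of trade-off a CNF translation for a prover has to manage.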
