scholarly journals ARMv8-A System Semantics: Instruction Fetch in Relaxed Architectures

Author(s):  
Ben Simner ◽  
Shaked Flur ◽  
Christopher Pulte ◽  
Alasdair Armstrong ◽  
Jean Pichon-Pharabod ◽  
...  

AbstractComputing relies on architecture specifications to decouple hardware and software development. Historically these have been prose documents, with all the problems that entails, but research over the last ten years has developed rigorous and executable-as-test-oracle specifications of mainstream architecture instruction sets and “user-mode” concurrency, clarifying architectures and bringing them into the scope of programming-language semantics and verification. However, the system semantics, of instruction-fetch and cache maintenance, exceptions and interrupts, and address translation, remains obscure, leaving us without a solid foundation for verification of security-critical systems software.In this paper we establish a robust model for one aspect of system semantics: instruction fetch and cache maintenance for ARMv8-A. Systems code relies on executing instructions that were written by data writes, e.g. in program loading, dynamic linking, JIT compilation, debugging, and OS configuration, but hardware implementations are often highly optimised, e.g. with instruction caches, linefill buffers, out-of-order fetching, branch prediction, and instruction prefetching, which can affect programmer-observable behaviour. It is essential, both for programming and verification, to abstract from such microarchitectural details as much as possible, but no more. We explore the key architecture design questions with a series of examples, discussed in detail with senior Arm staff; capture the architectural intent in operational and axiomatic semantic models, extending previous work on “user-mode” concurrency; make these models executable as test oracles for small examples; and experimentally validate them against hardware behaviour (finding a bug in one hardware device). We thereby bring these subtle issues into the mathematical domain, clarifying the architecture and enabling future work on system software verification.

Author(s):  
Alasdair Armstrong ◽  
Brian Campbell ◽  
Ben Simner ◽  
Christopher Pulte ◽  
Peter Sewell

AbstractArchitecture specifications such as Armv8-A and RISC-V are the ultimate foundation for software verification and the correctness criteria for hardware verification. They should define the allowed sequential and relaxed-memory concurrency behaviour of programs, but hitherto there has been no integration of full-scale instruction-set architecture (ISA) semantics with axiomatic concurrency models, either in mathematics or in tools. These ISA semantics can be surprisingly large and intricate, e.g. 100k+ lines for Armv8-A.   In this paper we present a tool, Isla, for computing the allowed behaviours of concurrent litmus tests with respect to full-scale ISA definitions, in Sail, and arbitrary axiomatic relaxed-memory concurrency models, in the Cat language. It is based on a generic symbolic engine for Sail ISA specifications, which should be valuable also for other verification tasks. We equip the tool with a web interface to make it widely accessible, and illustrate and evaluate it for Armv8-A and RISC-V.   By using full-scale and authoritative ISA semantics, this lets one evaluate litmus tests using arbitrary user instructions with high confidence. Moreover, because these ISA specifications give detailed and validated definitions of the sequential aspects of systems functionality, as used by hypervisors and operating systems, e.g. instruction fetch, exceptions, and address translation, our tool provides a basis for developing concurrency semantics for these. We demonstrate this for the Armv8-A instruction-fetch model and self-modifying code examples of Simner et al.


2011 ◽  
Vol 20 (03) ◽  
pp. 349-373 ◽  
Author(s):  
NADIA NEDJAH ◽  
RODRIGO MARTINS DA SILVA ◽  
LUIZA DE MACEDO MOURELLE

There are several possible implementations of artificial neural network that are based either on software or hardware systems. Software implementations are rather inefficient due to the fact that the intrinsic parallelism of the underlying computation is usually not taken advantage of in a mono-processor kind of computing system. Existing hardware implementations of ANNs are efficient as the dedicated datapath used is optimized and the hardware is usually parallel. Hardware implementations of ANNs may be either digital, analog, or even hybrid. Digital implementations of ANNs tend to be of high complexity, thus of high cost, and somehow imprecise due to the use of lookup table for the activation function. On the other hand, analog implementation of ANNs are generally very simple and much more precise. In this paper, we focus on possible analog implementations of ANNs. The neuron is based on a simple operational amplifier. The reviewed implementations allow for the use of both negative and positive synaptic weights. An alternative implementation permits the realization of the training process.


2021 ◽  
Vol 14 (2) ◽  
pp. 1-18
Author(s):  
Shenghsun Cho ◽  
Mrunal Patel ◽  
Michael Ferdman ◽  
Peter Milder

Software verification is an important stage of the software development process, particularly for mission-critical systems. As the traditional methodology of using unit tests falls short of verifying complex software, developers are increasingly relying on formal verification methods, such as explicit state model checking, to automatically verify that the software functions properly. However, due to the ever-increasing complexity of software designs, model checking cannot be performed in a reasonable amount of time when running on general-purpose cores, leading to the exploration of hardware-accelerated model checking. FPGAs have been demonstrated to be promising verification accelerators, exhibiting nearly three orders of magnitude speedup over software. Unfortunately, the “FPGA programmability wall,” particularly the long synthesis and place-and-route times, block the general adoption of FPGAs for model checking. To address this problem, we designed a runtime-programmable pipeline specifically for model checkers on FPGAs to minimize the “preparation time” before a model can be checked. Our design of the successor state generator and the state validator modules enables FPGA-acceleration of model checking without incurring the time-consuming FPGA implementation stages, reducing the preparation time before checking a model from hours to less than a minute, while incurring only a 26% execution time overhead compared to model-specific implementations.


Author(s):  
Franck Cassez ◽  
Ralf Huuck ◽  
Gerwin Klein ◽  
Bastian Schlich

2020 ◽  
Vol 10 (7) ◽  
pp. 2464
Author(s):  
Sihyeong Park ◽  
Mi-Young Kwon ◽  
Hoon-Kyu Kim ◽  
Hyungshin Kim

Multicore architecture is applied to contemporary avionics systems to deal with complex tasks. However, multicore architectures can cause interference by contention because the cores share hardware resources. This interference reduces the predictable execution time of safety-critical systems, such as avionics systems. To reduce this interference, methods of separating hardware resources or limiting capacity by core have been proposed. Existing studies have modified kernels to control hardware resources. Additionally, an execution model has been proposed that can reduce interference by adjusting the execution order of tasks without software modification. Avionics systems require several rigorous software verification procedures. Therefore, modifying existing software can be costly and time-consuming. In this work, we propose a method to apply execution models proposed in existing studies without modifying commercial real-time operating systems. We implemented the time-division multiple access (TDMA) and acquisition execution restitution (AER) execution models with pseudo-partition and message queuing on VxWorks 653. Moreover, we propose a multi-TDMA model considering the characteristics of the target hardware. For the interference analysis, we measured the L1 and L2 cache misses and the number of main memory requests. We demonstrated that the interference caused by memory sharing was reduced by at least 60% in the execution model. In particular, multi-TDMA doubled utilization compared to TDMA and also reduced the execution time by 20% compared to the AER model.


Sign in / Sign up

Export Citation Format

Share Document