Specifying and testing GPU workgroup progress models

2021 ◽  
Vol 5 (OOPSLA) ◽  
pp. 1-30
Author(s):  
Tyler Sorensen ◽  
Lucas F. Salvador ◽  
Harmit Raval ◽  
Hugues Evrard ◽  
John Wickerson ◽  
...  

As GPU availability has increased and programming support has matured, a wider variety of applications are being ported to these platforms. Many parallel applications contain fine-grained synchronization idioms; as such, their correct execution depends on a degree of relative forward progress between threads (or thread groups). Unfortunately, many GPU programming specifications (e.g. Vulkan and Metal) say almost nothing about relative forward progress guarantees between workgroups. Although prior work has proposed a spectrum of plausible progress models for GPUs, cross-vendor specifications have yet to commit to any model. This work is a collection of tools and experimental data to aid specification designers when considering forward progress guarantees in programming frameworks. As a foundation, we formalize a small parallel programming language that captures the essence of fine-grained synchronization. We then provide a means of formally specifying a progress model, and develop a termination oracle that decides whether a given program is guaranteed to eventually terminate with respect to a given progress model. Next, we formalize a set of constraints that describe concurrent programs that require forward progress to terminate. This allows us to synthesize a large set of 483 progress litmus tests. Combined with the termination oracle, we can determine the expected status of each litmus test -- i.e. whether it is guaranteed to eventually terminate -- under various progress models. We present a large experimental campaign running the litmus tests across 8 GPUs from 5 different vendors. Our results highlight that GPUs have significantly different termination behaviors under our test suite. Most notably, we find that Apple and ARM GPUs do not support the linear occupancy-bound model, as was hypothesized by prior work.
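To make the shape of such a progress litmus test concrete, the following is a minimal sketch, written as a CPU analogue in C with pthreads and C11 atomics rather than in the paper's GPU test framework. One thread busy-waits on a flag that only the other thread sets, so termination depends on the scheduler granting relative forward progress to both threads; on a GPU, with the two threads as separate workgroups, a scheduler that never launches the setter (e.g. because the waiter occupies all compute resources) would make the test spin forever. All names here are illustrative.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int flag = 0;

/* "workgroup 0": cannot terminate until the setter makes progress */
static void *waiter(void *arg) {
    while (atomic_load_explicit(&flag, memory_order_acquire) == 0)
        ;  /* busy-wait */
    return NULL;
}

/* "workgroup 1": unblocks the waiter */
static void *setter(void *arg) {
    atomic_store_explicit(&flag, 1, memory_order_release);
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, waiter, NULL);
    pthread_create(&b, NULL, setter, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    puts("terminated: both threads were granted forward progress");
    return 0;
}
```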


2015 ◽  
Vol 50 (6) ◽  
pp. 77-87 ◽  
Author(s):  
Ilya Sergey ◽  
Aleksandar Nanevski ◽  
Anindya Banerjee


2021 ◽  
Vol 5 (OOPSLA) ◽  
pp. 1-30
Author(s):  
Son Tuan Vu ◽  
Albert Cohen ◽  
Arnaud De Grandmaison ◽  
Christophe Guillon ◽  
Karine Heydemann

Software protections against side-channel and physical attacks are essential to the development of secure applications. Such protections are meaningful at the machine-code or micro-architectural level, but they typically do not carry observable semantics at the source level. This renders them susceptible to miscompilation, so security engineers embed input/output side-effects to prevent optimizing compilers from altering them. Yet these side-effects are error-prone and compiler-dependent, and the current practice involves analyzing the generated machine code to make sure security or privacy properties are still enforced. These side-effects may also be too expensive in fine-grained protections such as control-flow integrity. We introduce observations of the program state that are intrinsic to the correct execution of security protections, along with means to specify and preserve observations across the compilation flow. Such observations complement the input/output semantics-preservation contract of compilers. We introduce an opacification mechanism to preserve and enforce a partial ordering of observations. This approach is compatible with a production compiler and does not incur any modification to its optimization passes. We validate the effectiveness and performance of our approach on a range of benchmarks, expressing the secure compilation of these applications in terms of observations to be made at specific program points.
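As a rough illustration of what such an observation buys, here is a common GCC/Clang inline-assembly idiom, not the paper's actual mechanism, that marks a value as observed at a program point so the optimizer cannot delete the protection that establishes it. The OBSERVE macro and the erasure scenario are illustrative assumptions.

```c
#include <string.h>

/* Tell the optimizer the value is consumed here: an empty asm that
   "reads" x and clobbers memory, pinning prior stores as observable. */
#define OBSERVE(x) __asm__ volatile("" : : "g"(x) : "memory")

void handle_secret(void) {
    char key[32];
    /* ... derive and use the key ... */
    memset(key, 0, sizeof key);  /* a dead store: without an observation,
                                    the compiler may legally remove it   */
    OBSERVE(key);                /* the zeroing must now actually happen */
}
```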



2021 ◽  
Vol 43 (4) ◽  
pp. 1-134
Author(s):  
Emanuele D’Osualdo ◽  
Julian Sutherland ◽  
Azadeh Farzan ◽  
Philippa Gardner

We present TaDA Live, a concurrent separation logic for reasoning compositionally about the termination of blocking fine-grained concurrent programs. The crucial challenge is how to deal with abstract atomic blocking: that is, abstract atomic operations that have blocking behaviour arising from busy-waiting patterns, as found in, for example, fine-grained spin locks. Our fundamental innovation is the design of abstract specifications that capture this blocking behaviour as liveness assumptions on the environment. We design a logic that can reason about the termination of clients that use such operations without breaking their abstraction boundaries, and about the correctness of the implementations of the operations with respect to their abstract specifications. We introduce a novel semantic model using layered subjective obligations to express liveness invariants, and a proof system that is sound with respect to the model. The subtlety of our specifications and reasoning is illustrated using several case studies.
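For readers unfamiliar with the idiom, the following is a minimal C11 spin lock of the kind TaDA Live targets (a sketch, not code from the paper): acquire is abstractly atomic, yet its termination is conditional on a liveness assumption about the environment, namely that whichever thread holds the lock eventually releases it.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_bool held; } spinlock_t;

/* Blocking busy-wait: terminates only if the environment eventually
   releases the lock, which is exactly the liveness assumption that
   TaDA Live specifications make explicit. */
static void acquire(spinlock_t *l) {
    while (atomic_exchange_explicit(&l->held, true, memory_order_acquire))
        ;  /* spin until the exchange returns false (lock was free) */
}

static void release(spinlock_t *l) {
    atomic_store_explicit(&l->held, false, memory_order_release);
}
```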



2016 ◽  
Vol 72 (3) ◽  
pp. 324-337 ◽  
Author(s):  
A. Janner

This article considers the coarse-grained modeling of icosahedral viruses in terms of a three-dimensional lattice (the digital modeling lattice) selected among the projected points in space of a six-dimensional icosahedral lattice. Backbone atomic positions (the Cα atoms of the capsid residues and the phosphorus atoms P of the genome nucleotides) are then indexed by their nearest lattice point. This leads to a fine-grained lattice-point characterization of the full viral chains in the backbone approximation (denoted digital modeling). Coarse-grained models then follow by a proper selection of the indexed backbone positions, where for each chain one can choose the desired coarseness. This approach is applied to three viruses, the Satellite tobacco mosaic virus, the bacteriophage MS2 and the Pariacoto virus, on the basis of structural data from the Brookhaven Protein Data Bank. In each case the various stages of the procedure are illustrated for a given coarse-grained model and the corresponding indexed positions are listed. Alternative coarse-grained models have been derived and compared. Comments on related results and approaches, found among the very large set of publications in this field, conclude the article.
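The core indexing step is simple to state: each backbone atom gets the integer coordinates of its nearest lattice point. The toy C sketch below uses a simple cubic lattice of spacing a in place of the paper's projected six-dimensional icosahedral lattice, purely to make the rounding step visible; the spacing and the sample coordinates are invented for illustration.

```c
#include <math.h>
#include <stdio.h>

typedef struct { int i, j, k; } lattice_index;

/* Index a position (x, y, z) by its nearest point on a cubic lattice
   of spacing a. The paper's digital modeling lattice is icosahedral
   and obtained by projection from six dimensions; a cubic lattice
   stands in here only to show the nearest-point assignment. */
static lattice_index nearest_lattice_point(double x, double y, double z,
                                           double a) {
    lattice_index idx = { (int)lround(x / a),
                          (int)lround(y / a),
                          (int)lround(z / a) };
    return idx;
}

int main(void) {
    /* e.g. a C-alpha position in angstroms, with an arbitrary spacing */
    lattice_index idx = nearest_lattice_point(12.7, -3.1, 8.4, 2.5);
    printf("indexed as (%d, %d, %d)\n", idx.i, idx.j, idx.k);
    return 0;
}
```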



1999 ◽  
Vol 09 (02) ◽  
pp. 243-252 ◽  
Author(s):  
O. LARSSON ◽  
M. FEIG ◽  
L. JOHNSSON

We demonstrate good metacomputing efficiency and portability for three typical large-scale parallel applications: one molecular dynamics code and two electromagnetics codes. The codes were developed for distributed-memory parallel platforms using Fortran77 or Fortran90 with MPI. The performance measurements were made on a testbed of two IBM SPs connected through the vBNS. No changes to the application codes were required for their correct execution on the testbed using the Globus Toolkit for the required metacomputing services. However, we observe that for good performance, it may be necessary for MPI codes to make use of overlapped computation and communication. For such MPI codes, a communications library designed for hierarchical or clustered communication can yield very good metacomputing efficiencies when high-performance networks, such as the vBNS or Abilene, are used for platform connectivity. We demonstrate this by inserting a thin layer between the MPI application and the MPI libraries, providing some clustering of communications between platforms.
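The overlap pattern the authors describe can be sketched in C with MPI's non-blocking primitives (the application codes themselves are Fortran; this sketch, with invented buffer names and sizes, only shows the structure): post the sends and receives, do local work that does not depend on the incoming data while the wide-area transfer is in flight, then wait before touching the received data.

```c
#include <mpi.h>

void exchange_and_compute(double *send_buf, double *recv_buf, int n,
                          int peer, MPI_Comm comm) {
    MPI_Request reqs[2];
    MPI_Irecv(recv_buf, n, MPI_DOUBLE, peer, 0, comm, &reqs[0]);
    MPI_Isend(send_buf, n, MPI_DOUBLE, peer, 0, comm, &reqs[1]);

    /* ... interior computation that needs neither buffer goes here,
       hiding the wide-area latency behind useful local work ... */

    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    /* ... boundary computation that consumes recv_buf ... */
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    enum { N = 1024 };
    double send_buf[N], recv_buf[N];
    for (int i = 0; i < N; i++) send_buf[i] = rank;
    if (size == 2)  /* run with: mpirun -np 2 ./a.out */
        exchange_and_compute(send_buf, recv_buf, N, 1 - rank,
                             MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}
```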



2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Victoria Nyst ◽  
Marta Morgado ◽  
Timothy Mac Hadjah ◽  
Marco Nyarko ◽  
Mariana Martins ◽  
...  

This article looks at cross-linguistic variation in lexical iconicity, asking to what extent and how this variation is patterned. Iconicity is far more frequent in the lexicons of sign languages than in those of spoken languages. It is also highly complex, in that multiple motivated components often jointly shape an iconic lexeme. Recent typological research on spoken languages finds tentative iconic patterning in a large number of basic lexical items, underlining once again the significance of iconicity for human language. The uncontested and widespread use of iconicity in the lexicons of sign languages enables us to take typological research into lexical iconicity to the next level. Indeed, previous studies have shown cross-linguistic variation in: a) the use of embodying and handling handshapes in sign languages (mostly of European origin) and b) the frequency of space-based size depiction in African and European sign languages. The two types of variation may be interrelated, as handling handshapes may use space-based size depiction. In this study, we first replicate earlier studies on the distribution of embodying and handling handshapes, this time in a data set consisting of a relatively large set of sign languages (n = 11), most of which are used in Africa. The results confirm significant variation across these sign languages. These findings are then compared to the use of space-based size depiction, revealing that space-based size depiction patterns independently of the distribution of embodying/handling handshapes. We argue that the results call for expanding typological studies of representational strategies in iconic signs beyond the now relatively well-studied instrument/manipulation alternation. Fine-grained analyses of a multitude of iconic features in signs are likely to reveal cross-linguistic variation in the iconic tendencies of sign language lexicons.



2021 ◽  
Vol 5 (OOPSLA) ◽  
pp. 1-28
Author(s):  
Dan Iorga ◽  
Alastair F. Donaldson ◽  
Tyler Sorensen ◽  
John Wickerson

Heterogeneous CPU/FPGA devices, in which a CPU and an FPGA can execute together while sharing memory, are becoming popular in several computing sectors. In this paper, we study the shared-memory semantics of these devices, with a view to providing a firm foundation for reasoning about the programs that run on them. Our focus is on Intel platforms that combine an Intel FPGA with a multicore Xeon CPU. We describe the weak-memory behaviours that are allowed (and observable) on these devices when CPU threads and an FPGA thread access common memory locations in a fine-grained manner through multiple channels. Some of these behaviours are familiar from well-studied CPU and GPU concurrency; others are weaker still. We encode these behaviours in two formal memory models: one operational, one axiomatic. We develop executable implementations of both models, using the CBMC bounded model-checking tool for our operational model and the Alloy modelling language for our axiomatic model. Using these, we cross-check our models against each other via a translator that converts Alloy-generated executions into queries for the CBMC model. We also validate our models against actual hardware by translating 583 Alloy-generated executions into litmus tests that we run on CPU/FPGA devices; when doing this, we avoid the prohibitive cost of synthesising a hardware design per litmus test by creating our own 'litmus-test processor' in hardware. We expect that our models will be useful for low-level programmers, compiler writers, and designers of analysis tools. Indeed, as a demonstration of the utility of our work, we use our operational model to reason about a producer/consumer buffer implemented across the CPU and the FPGA. When the buffer uses insufficient synchronisation -- a situation that our model is able to detect -- we observe that its performance improves at the cost of occasional data corruption.
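The flavour of these litmus tests can be conveyed with the classic message-passing pattern, written below as a C11-atomics CPU analogue rather than one of the 583 generated CPU/FPGA tests. With the release/acquire pair shown, a consumer that sees flag == 1 must also see data == 42; weaken the orderings to relaxed and a weak-memory device may expose stale data, which is exactly the kind of occasional corruption the under-synchronised producer/consumer buffer exhibits.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static int data = 0;          /* the payload (plain, non-atomic) */
static atomic_int flag = 0;   /* the "message sent" signal       */

static void *producer(void *arg) {
    data = 42;                                              /* write payload */
    atomic_store_explicit(&flag, 1, memory_order_release);  /* then publish  */
    return NULL;
}

static void *consumer(void *arg) {
    while (atomic_load_explicit(&flag, memory_order_acquire) == 0)
        ;  /* wait for the release store to become visible */
    printf("observed data = %d\n", data);  /* acquire/release forbid 0 here */
    return NULL;
}

int main(void) {
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}
```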


