Specifying and testing GPU workgroup progress models

2021 ◽  
Vol 5 (OOPSLA) ◽  
pp. 1-30
Author(s):  
Tyler Sorensen ◽  
Lucas F. Salvador ◽  
Harmit Raval ◽  
Hugues Evrard ◽  
John Wickerson ◽  
...  

As GPU availability has increased and programming support has matured, a wider variety of applications are being ported to these platforms. Many parallel applications contain fine-grained synchronization idioms; as such, their correct execution depends on a degree of relative forward progress between threads (or thread groups). Unfortunately, many GPU programming specifications (e.g. Vulkan and Metal) say almost nothing about relative forward progress guarantees between workgroups. Although prior work has proposed a spectrum of plausible progress models for GPUs, cross-vendor specifications have yet to commit to any model. This work is a collection of tools and experimental data to aid specification designers when considering forward progress guarantees in programming frameworks. As a foundation, we formalize a small parallel programming language that captures the essence of fine-grained synchronization. We then provide a means of formally specifying a progress model, and develop a termination oracle that decides whether a given program is guaranteed to eventually terminate with respect to a given progress model. Next, we formalize a set of constraints that describe concurrent programs that require forward progress to terminate. This allows us to synthesize a large set of 483 progress litmus tests. Combined with the termination oracle, we can determine the expected status of each litmus test -- i.e. whether it is guaranteed to eventually terminate -- under various progress models. We present a large experimental campaign running the litmus tests across 8 GPUs from 5 different vendors. Our results highlight that GPUs have significantly different termination behaviors under our test suite. Most notably, we find that Apple and ARM GPUs do not support the linear occupancy-bound model, as was hypothesized by prior work.
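The termination-versus-progress-model dependence can be illustrated with a toy simulation (a hedged sketch in Python, not the paper's formal language or termination oracle): one workgroup busy-waits on a flag that only a second workgroup can set, and whether the test terminates depends on whether the scheduler guarantees relative forward progress.

```python
# Toy progress litmus test: workgroup 0 spins on a flag that only
# workgroup 1 can set. Termination depends on the scheduler's
# progress model. (Illustrative sketch, not the paper's formalism.)
import itertools

def make_workgroups():
    state = {"flag": 0}

    def wg0():                      # busy-waits until wg1 sets the flag
        while state["flag"] == 0:
            yield

    def wg1():                      # sets the flag, then finishes
        state["flag"] = 1
        yield

    return [wg0(), wg1()]

def run(schedule, max_steps=1000):
    """schedule picks which runnable workgroup steps next.
    Returns True if all workgroups terminate within max_steps."""
    wgs = make_workgroups()
    done = [False, False]
    for _ in range(max_steps):
        runnable = [i for i in range(2) if not done[i]]
        if not runnable:
            return True
        i = schedule(runnable)
        try:
            next(wgs[i])
        except StopIteration:
            done[i] = True
    return False

# Fair scheduler: every runnable workgroup is eventually stepped.
ticks = itertools.count()
fair = lambda runnable: runnable[next(ticks) % len(runnable)]

# "Occupancy-1" scheduler: only the lowest-indexed resident workgroup
# ever runs, so wg1 is never scheduled while wg0 spins.
unfair = lambda runnable: runnable[0]

print(run(fair))    # True: terminates under fair progress
print(run(unfair))  # False: hangs under the occupancy-bound schedule
```

Under the fair schedule the test is guaranteed to terminate; under the occupancy-bound one it is not, which is exactly the kind of distinction the litmus tests probe on real hardware.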

2015 ◽  
Vol 50 (6) ◽  
pp. 77-87 ◽  
Author(s):  
Ilya Sergey ◽  
Aleksandar Nanevski ◽  
Anindya Banerjee

2021 ◽  
Vol 5 (OOPSLA) ◽  
pp. 1-30
Author(s):  
Son Tuan Vu ◽  
Albert Cohen ◽  
Arnaud De Grandmaison ◽  
Christophe Guillon ◽  
Karine Heydemann

Software protections against side-channel and physical attacks are essential to the development of secure applications. Such protections are meaningful at machine code or micro-architectural level, but they typically do not carry observable semantics at source level. This renders them susceptible to miscompilation, and security engineers embed input/output side-effects to prevent optimizing compilers from altering them. Yet these side-effects are error-prone and compiler-dependent. The current practice involves analyzing the generated machine code to make sure security or privacy properties are still enforced. These side-effects may also be too expensive in fine-grained protections such as control-flow integrity. We introduce observations of the program state that are intrinsic to the correct execution of security protections, along with means to specify and preserve observations across the compilation flow. Such observations complement the input/output semantics-preservation contract of compilers. We introduce an opacification mechanism to preserve and enforce a partial ordering of observations. This approach is compatible with a production compiler and does not incur any modification to its optimization passes. We validate the effectiveness and performance of our approach on a range of benchmarks, expressing the secure compilation of these applications in terms of observations to be made at specific program points.
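Why protections without observable semantics get miscompiled can be shown with a toy optimizer (a hedged sketch: a hypothetical mini-IR and dead-store-elimination pass, not the paper's compiler integration): a secret-wiping store is deleted because nothing reads the variable afterwards, while marking the wiped state as an observation keeps it alive.

```python
# Toy illustration: dead-store elimination (DSE) removes a security
# wipe, and an "observe" marker preserves it. (Hypothetical mini-IR;
# the paper works inside a production compiler, not this model.)

def dse(program):
    """Remove stores whose variable is never read or observed later."""
    out = []
    for i, (op, var) in enumerate(program):
        if op == "store":
            later = program[i + 1:]
            live = any(o in ("load", "observe") and v == var
                       for o, v in later)
            if not live:
                continue            # dead store: eliminated
        out.append((op, var))
    return out

wipe_unprotected = [
    ("store", "key"),   # key = secret
    ("load", "key"),    # use the key
    ("store", "key"),   # key = 0 -- the security wipe
]

wipe_protected = wipe_unprotected + [
    ("observe", "key"), # observation: the wiped state must be produced
]

print(dse(wipe_unprotected))  # the wiping store is (mis)optimized away
print(dse(wipe_protected))    # the wiping store survives
```

The observation plays the role the paper gives it: an extra semantics-preservation obligation that the optimizer must respect without modifying its passes.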


2021 ◽  
Vol 43 (4) ◽  
pp. 1-134
Author(s):  
Emanuele D’Osualdo ◽  
Julian Sutherland ◽  
Azadeh Farzan ◽  
Philippa Gardner

We present TaDA Live, a concurrent separation logic for reasoning compositionally about the termination of blocking fine-grained concurrent programs. The crucial challenge is how to deal with abstract atomic blocking: that is, abstract atomic operations that have blocking behaviour arising from busy-waiting patterns as found in, for example, fine-grained spin locks. Our fundamental innovation is the design of abstract specifications that capture this blocking behaviour as liveness assumptions on the environment. We design a logic that can reason about the termination of clients that use such operations without breaking their abstraction boundaries, and about the correctness of the implementations of the operations with respect to their abstract specifications. We introduce a novel semantic model using layered subjective obligations to express liveness invariants, and a proof system that is sound with respect to the model. The subtlety of our specifications and reasoning is illustrated using several case studies.


2016 ◽  
Vol 72 (3) ◽  
pp. 324-337 ◽  
Author(s):  
A. Janner

Considered is the coarse-grained modeling of icosahedral viruses in terms of a three-dimensional lattice (the digital modeling lattice) selected among the projected points in space of a six-dimensional icosahedral lattice. Backbone atomic positions (Cα's for the residues of the capsid and phosphorus atoms P for the genome nucleotides) are then indexed by their nearest lattice point. This leads to a fine-grained lattice point characterization of the full viral chains in the backbone approximation (denoted as digital modeling). Coarse-grained models then follow by a proper selection of the indexed backbone positions, where for each chain one can choose the desired coarseness. This approach is applied to three viruses, the Satellite tobacco mosaic virus, the bacteriophage MS2 and the Pariacoto virus, on the basis of structural data from the Brookhaven Protein Data Bank. In each case the various stages of the procedure are illustrated for a given coarse-grained model and the corresponding indexed positions are listed. Alternative coarse-grained models have been derived and compared. Comments on related results and approaches, found among the very large set of publications in this field, conclude this article.
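The indexing step can be sketched in a few lines (an illustrative Python sketch: a plain cubic lattice with spacing `a` stands in for the projected icosahedral lattice, and the positions are hypothetical): each backbone position is indexed by its nearest lattice point, and a coarse-grained model keeps a chosen subset of the indexed positions.

```python
# Sketch of digital modeling: index each backbone atom by its nearest
# lattice point, then coarse-grain by subsampling the indexed chain.
# (Toy cubic lattice, hypothetical coordinates; the article uses a
# lattice projected from a six-dimensional icosahedral lattice.)

def index_position(pos, a=1.0):
    """Nearest lattice point (integer indices) for a 3D position."""
    return tuple(round(x / a) for x in pos)

def digital_model(backbone, a=1.0):
    """Fine-grained model: index every backbone position."""
    return [index_position(p, a) for p in backbone]

def coarse_grain(indexed, stride):
    """Coarse-grained model: keep every `stride`-th indexed position."""
    return indexed[::stride]

# Hypothetical C-alpha trace, in units of the lattice spacing:
chain = [(0.1, 0.2, -0.1), (1.1, 0.9, 0.0),
         (2.2, 1.8, 0.1), (2.9, 3.1, 0.2)]
fine = digital_model(chain)
print(fine)                  # [(0, 0, 0), (1, 1, 0), (2, 2, 0), (3, 3, 0)]
coarse = coarse_grain(fine, 2)
print(coarse)                # [(0, 0, 0), (2, 2, 0)]
```

The stride plays the role of the per-chain coarseness choice described above.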


1999 ◽  
Vol 09 (02) ◽  
pp. 243-252 ◽  
Author(s):  
O. LARSSON ◽  
M. FEIG ◽  
L. JOHNSSON

We demonstrate good metacomputing efficiency and portability for three typical large-scale parallel applications: one molecular dynamics code and two electromagnetics codes. The codes were developed for distributed-memory parallel platforms using Fortran77 or Fortran90 with MPI. The performance measurements were made for a testbed of two IBM SPs connected through the vBNS. No changes to the application codes were required for correct execution on the testbed using the Globus Toolkit for the required metacomputing services. However, we observe that for good performance, it may be necessary for MPI codes to make use of overlapped computation and communication. For such MPI codes, a communications library designed for hierarchical or clustered communication can yield very good metacomputing efficiencies when high-performance networks, such as the vBNS or the Abilene networks, are used for platform connectivity. We demonstrate this by inserting a thin layer between the MPI application and the MPI libraries, providing some clustering of communications between platforms.
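The benefit of clustering communications can be seen in a toy message-count model (a hedged sketch, not the actual layer interposed between the application and the MPI libraries): without clustering, every cross-platform rank pair sends its own message over the wide-area link; with clustering, all messages between a pair of platforms in one communication phase are bundled into a single wide-area message.

```python
# Toy model of clustered communication between platforms: count how
# many messages cross the wide-area link with and without bundling.
# (Hypothetical testbed; the real mechanism is an MPI interposition
# layer, not this counter.)

def wan_message_count(msgs, platform_of, clustered):
    """msgs: (src_rank, dst_rank) pairs for one communication phase.
    Returns how many messages cross the wide-area link."""
    cross = [(platform_of[s], platform_of[d]) for s, d in msgs
             if platform_of[s] != platform_of[d]]
    if clustered:
        # one bundled message per ordered pair of distinct platforms
        return len(set(cross))
    return len(cross)

# Hypothetical testbed: two platforms with 4 ranks each, all-to-all.
platform_of = {rank: rank // 4 for rank in range(8)}
phase = [(s, d) for s in range(8) for d in range(8) if s != d]

print(wan_message_count(phase, platform_of, clustered=False))  # 32
print(wan_message_count(phase, platform_of, clustered=True))   # 2
```

The reduction from per-rank to per-platform wide-area traffic is what makes the thin clustering layer pay off on networks such as the vBNS.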


2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Victoria Nyst ◽  
Marta Morgado ◽  
Timothy Mac Hadjah ◽  
Marco Nyarko ◽  
Mariana Martins ◽  
...  

Abstract This article looks at cross-linguistic variation in lexical iconicity, addressing the question of to what extent and how this variation is patterned. More than in spoken languages, iconicity is highly frequent in the lexicons of sign languages. It is also highly complex, in that often multiple motivated components jointly shape an iconic lexeme. Recent typological research on spoken languages finds tentative iconic patterning in a large number of basic lexical items, underlining once again the significance of iconicity for human language. The uncontested and widespread use of iconicity found in the lexicons of sign languages enables us to take typological research into lexical iconicity to the next level. Indeed, previous studies have shown cross-linguistic variation in: a) the use of embodying and handling handshapes in sign languages (mostly of European origin) and b) the frequency of space-based size depiction in African and European sign languages. The two types of variation may be interrelated, as handling handshapes may use space-based size depiction. In this study, we first replicate earlier studies on the distribution of embodying and handling handshapes, this time in a data set consisting of a relatively large set of sign languages (n = 11), most of which are used in Africa. The results confirm significant variation across these sign languages. These findings are then compared to the use of space-based size depiction, revealing that it patterns independently of the distribution of embodying/handling handshapes. We argue that the results call for expanding typological studies on representational strategies in iconic signs beyond the now relatively well-studied instrument/manipulation alternation. Fine-grained analyses of a multitude of iconic features in signs are likely to reveal cross-linguistic variation in iconic tendencies in sign language lexicons.


2020 ◽  
Vol 64 (7) ◽  
pp. 1419-1444
Author(s):  
Peter Chini ◽  
Roland Meyer ◽  
Prakash Saivasan

Abstract We study the fine-grained complexity of Leader Contributor Reachability (LCR) and Bounded-Stage Reachability (BSR), two variants of the safety verification problem for shared-memory concurrent programs. For both problems, the memory is a single variable over a finite data domain. Our contributions are new verification algorithms and lower bounds. The latter are based on the Exponential Time Hypothesis (ETH), the problem Set Cover, and cross-compositions. LCR is the question whether a designated leader thread can reach an unsafe state when interacting with a certain number of equal contributor threads. We suggest two parameterizations: (1) by the size of the data domain D and the size of the leader L, and (2) by the size of the contributors C. We present algorithms for both cases. The key techniques are compact witnesses and dynamic programming. The algorithms run in O*((L·(D+1))^{L·D} · D^D) and O*(2^C) time, showing that both parameterizations are fixed-parameter tractable. We complement the upper bounds by (matching) lower bounds based on ETH and Set Cover. Moreover, we prove the absence of polynomial kernels. For BSR, we consider programs involving t different threads. We restrict the analysis to computations where the write permission changes s times between the threads. BSR asks whether a given configuration is reachable via such an s-stage computation. When parameterized by P, the maximum size of a thread, and t, the interesting observation is that the problem has a large number of difficult instances. Formally, we show that there is no polynomial kernel: no compression algorithm can reduce the size of the data domain D or the number of stages s to a polynomial dependence on P and t. This indicates that symbolic methods may be harder to find for this problem.
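The LCR question itself can be made concrete with a brute-force check on a tiny instance (an explicit-state sketch over a hypothetical leader/contributor pair; the paper's contribution is parameterized algorithms and lower bounds, not this enumeration): the leader reaches its unsafe state only if some contributor writes a value the leader is waiting for.

```python
# Brute-force LCR illustration: can the leader reach an unsafe state
# when run with k identical contributors over one shared finite-domain
# variable? (Tiny hypothetical instance; not the paper's FPT algorithm.)
from collections import deque

# Transition rules: (state, mem) -> list of (state', mem').
LEADER = {("l0", 1): [("bad", 1)]}      # leader goes unsafe on reading 1
CONTRIB = {("c0", 0): [("c1", 1)],      # a contributor writes 1
           ("c0", 1): [("c1", 1)]}

def leader_can_reach_bad(k):
    """BFS over configurations (leader state, memory, contributor states)."""
    start = ("l0", 0, ("c0",) * k)
    seen, frontier = {start}, deque([start])
    while frontier:
        l, m, cs = frontier.popleft()
        if l == "bad":
            return True
        succs = [(l2, m2, cs) for l2, m2 in LEADER.get((l, m), [])]
        for i, c in enumerate(cs):
            for c2, m2 in CONTRIB.get((c, m), []):
                cs2 = tuple(sorted(cs[:i] + (c2,) + cs[i + 1:]))
                succs.append((l, m2, cs2))
        for s in succs:
            if s not in seen:
                seen.add(s)
                frontier.append(s)
    return False

print(leader_can_reach_bad(0))  # False: no contributor ever writes 1
print(leader_can_reach_bad(1))  # True: one contributor suffices
```

This enumeration blows up with the parameters; the algorithms above replace it with compact witnesses and dynamic programming to obtain the fixed-parameter running times.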

