The Synchronization Power of Coalesced Memory Accesses

The use of mixed precision in numerical algorithms is a promising strategy for accelerating scientific applications. In particular, the adoption of specialized hardware and data formats for low-precision arithmetic in high-end GPUs (graphics processing units) has motivated numerous efforts aiming at carefully reducing the working precision in order to speed up the computations. For algorithms whose performance is bound by the memory bandwidth, the idea of compressing its data before (and after) memory accesses has received considerable attention. One idea is to store an approximate operator–like a preconditioner–in lower than working precision hopefully without impacting the algorithm output. We realize the first high-performance implementation of an adaptive precision block-Jacobi preconditioner which selects the precision format used to store the preconditioner data on-the-fly, taking into account the numerical properties of the individual preconditioner blocks. We implement the adaptive block-Jacobi preconditioner as production-ready functionality in the Ginkgo linear algebra library, considering not only the precision formats that are part of the IEEE standard, but also customized formats which optimize the length of the exponent and significand to the characteristics of the preconditioner blocks. Experiments run on a state-of-the-art GPU accelerator show that our implementation offers attractive runtime savings.

Download Full-text

Linear capabilities for fully abstract compilation of separation-logic-verified code

Journal of Functional Programming ◽

10.1017/s0956796821000022 ◽

2021 ◽

Vol 31 ◽

Author(s):

THOMAS VAN STRYDONCK ◽

FRANK PIESSENS ◽

DOMINIQUE DEVRIESE

Keyword(s):

Spatial Separation ◽

Separation Logic ◽

Target Language ◽

Fine Grained ◽

Dynamic Contract ◽

Modular Verification ◽

Source Program ◽

Memory Accesses ◽

Memory Resources ◽

Efficient Memory

Abstract Separation logic is a powerful program logic for the static modular verification of imperative programs. However, dynamic checking of separation logic contracts on the boundaries between verified and untrusted modules is hard because it requires one to enforce (among other things) that outcalls from a verified to an untrusted module do not access memory resources currently owned by the verified module. This paper proposes an approach to dynamic contract checking by relying on support for capabilities, a well-studied form of unforgeable memory pointers that enables fine-grained, efficient memory access control. More specifically, we rely on a form of capabilities called linear capabilities for which the hardware enforces that they cannot be copied. We formalize our approach as a fully abstract compiler from a statically verified source language to an unverified target language with support for linear capabilities. The key insight behind our compiler is that memory resources described by spatial separation logic predicates can be represented at run time by linear capabilities. The compiler is separation-logic-proof-directed: it uses the separation logic proof of the source program to determine how memory accesses in the source program should be compiled to linear capability accesses in the target program. The full abstraction property of the compiler essentially guarantees that compiled verified modules can interact with untrusted target language modules as if they were compiled from verified code as well. This article is an extended version of one that was presented at ICFP 2019 (Van Strydonck et al., 2019).

Download Full-text

GPrimer: a fast GPU-based pipeline for primer design for qPCR experiments

BMC Bioinformatics ◽

10.1186/s12859-021-04133-4 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Jeongmin Bae ◽

Hajin Jeon ◽

Min-Soo Kim

Keyword(s):

Data Structures ◽

Primer Design ◽

Main Memory ◽

Design Tools ◽

Workload Balancing ◽

Qpcr Analysis ◽

Computational Speed ◽

Entire Sequence ◽

Target Sequences ◽

Coalesced Memory

Abstract Background Design of valid high-quality primers is essential for qPCR experiments. MRPrimer is a powerful pipeline based on MapReduce that combines both primer design for target sequences and homology tests on off-target sequences. It takes an entire sequence DB as input and returns all feasible and valid primer pairs existing in the DB. Due to the effectiveness of primers designed by MRPrimer in qPCR analysis, it has been widely used for developing many online design tools and building primer databases. However, the computational speed of MRPrimer is too slow to deal with the sizes of sequence DBs growing exponentially and thus must be improved. Results We develop a fast GPU-based pipeline for primer design (GPrimer) that takes the same input and returns the same output with MRPrimer. MRPrimer consists of a total of seven MapReduce steps, among which two steps are very time-consuming. GPrimer significantly improves the speed of those two steps by exploiting the computational power of GPUs. In particular, it designs data structures for coalesced memory access in GPU and workload balancing among GPU threads and copies the data structures between main memory and GPU memory in a streaming fashion. For human RefSeq DB, GPrimer achieves a speedup of 57 times for the entire steps and a speedup of 557 times for the most time-consuming step using a single machine of 4 GPUs, compared with MRPrimer running on a cluster of six machines. Conclusions We propose a GPU-based pipeline for primer design that takes an entire sequence DB as input and returns all feasible and valid primer pairs existing in the DB at once without an additional step using BLAST-like tools. The software is available at https://github.com/qhtjrmin/GPrimer.git.

Download Full-text