High performance linear algebra package for FORTRAN 90

Author(s):  
Jerzy Waśniewski ◽  
Jack Dongarra
2021 ◽  
Vol 47 (2) ◽  
pp. 1-28
Author(s):  
Goran Flegar ◽  
Hartwig Anzt ◽  
Terry Cojean ◽  
Enrique S. Quintana-Ortí

The use of mixed precision in numerical algorithms is a promising strategy for accelerating scientific applications. In particular, the adoption of specialized hardware and data formats for low-precision arithmetic in high-end GPUs (graphics processing units) has motivated numerous efforts aiming at carefully reducing the working precision in order to speed up the computations. For algorithms whose performance is bound by the memory bandwidth, the idea of compressing its data before (and after) memory accesses has received considerable attention. One idea is to store an approximate operator–like a preconditioner–in lower than working precision hopefully without impacting the algorithm output. We realize the first high-performance implementation of an adaptive precision block-Jacobi preconditioner which selects the precision format used to store the preconditioner data on-the-fly, taking into account the numerical properties of the individual preconditioner blocks. We implement the adaptive block-Jacobi preconditioner as production-ready functionality in the Ginkgo linear algebra library, considering not only the precision formats that are part of the IEEE standard, but also customized formats which optimize the length of the exponent and significand to the characteristics of the preconditioner blocks. Experiments run on a state-of-the-art GPU accelerator show that our implementation offers attractive runtime savings.


2014 ◽  
Vol 40 (10) ◽  
pp. 559-573 ◽  
Author(s):  
Li Tan ◽  
Shashank Kothapalli ◽  
Longxiang Chen ◽  
Omar Hussaini ◽  
Ryan Bissiri ◽  
...  

Author(s):  
Emmanuel Agullo ◽  
Cédric Augonnet ◽  
Jack Dongarra ◽  
Hatem Ltaief ◽  
Raymond Namyst ◽  
...  

Author(s):  
Jack J. Dongarra ◽  
Iain S. Duff ◽  
Danny C. Sorensen ◽  
Henk A. van der Vorst

1997 ◽  
Vol 6 (1) ◽  
pp. 3-27 ◽  
Author(s):  
Corinne Ancourt ◽  
Fabien Coelho ◽  
FranÇois Irigoin ◽  
Ronan Keryell

High Performance Fortran (HPF) was developed to support data parallel programming for single-instruction multiple-data (SIMD) and multiple-instruction multiple-data (MIMD) machines with distributed memory. The programmer is provided a familiar uniform logical address space and specifies the data distribution by directives. The compiler then exploits these directives to allocate arrays in the local memories, to assign computations to elementary processors, and to migrate data between processors when required. We show here that linear algebra is a powerful framework to encode HPF directives and to synthesize distributed code with space-efficient array allocation, tight loop bounds, and vectorized communications forINDEPENDENTloops. The generated code includes traditional optimizations such as guard elimination, message vectorization and aggregation, and overlap analysis. The systematic use of an affine framework makes it possible to prove the compilation scheme correct.


2000 ◽  
Vol 5 (1) ◽  
pp. 44-54 ◽  
Author(s):  
J. Dongarra ◽  
J. Waśniewski

LAPACK95 is a set of FORTRAN95 subroutines which interfaces FORTRAN95 with LAPACK. All LAPACK driver subroutines (including expert drivers) and some LAPACK computationals have both generic LAPACK95 interfaces and generic LAPACK77 interfaces. The remaining computationals have only generic LAPACK77 interfaces. In both types of interfaces no distinction is made between single and double precision or between real and complex data types.


Sign in / Sign up

Export Citation Format

Share Document