High performance dense linear algebra on a spatially distributed processor

Author(s):  
Jeffrey R. Diamond ◽  
Behnam Robatmili ◽  
Stephen W. Keckler ◽  
Robert van de Geijn ◽  
Kazushige Goto ◽  
...  

2015 ◽  
Vol 85 ◽  
pp. 32-46 ◽  
Author(s):  
Mathieu Faverge ◽  
Julien Herrmann ◽  
Julien Langou ◽  
Bradley Lowery ◽  
Yves Robert ◽  
...  

2012 ◽  
Vol 72 (9) ◽  
pp. 1134-1143 ◽  
Author(s):  
Francisco D. Igual ◽  
Ernie Chan ◽  
Enrique S. Quintana-Ortí ◽  
Gregorio Quintana-Ortí ◽  
Robert A. van de Geijn ◽  
...  

2021 ◽  
Vol 47 (2) ◽  
pp. 1-28
Author(s):  
Goran Flegar ◽  
Hartwig Anzt ◽  
Terry Cojean ◽  
Enrique S. Quintana-Ortí

The use of mixed precision in numerical algorithms is a promising strategy for accelerating scientific applications. In particular, the adoption of specialized hardware and data formats for low-precision arithmetic in high-end GPUs (graphics processing units) has motivated numerous efforts to carefully reduce the working precision in order to speed up computations. For algorithms whose performance is bound by memory bandwidth, the idea of compressing their data before (and after) memory accesses has received considerable attention. One idea is to store an approximate operator, such as a preconditioner, in lower than working precision, ideally without affecting the algorithm's output. We realize the first high-performance implementation of an adaptive precision block-Jacobi preconditioner that selects the precision format used to store the preconditioner data on the fly, taking into account the numerical properties of the individual preconditioner blocks. We implement the adaptive block-Jacobi preconditioner as production-ready functionality in the Ginkgo linear algebra library, considering not only the precision formats that are part of the IEEE standard, but also customized formats that tailor the length of the exponent and significand to the characteristics of the preconditioner blocks. Experiments run on a state-of-the-art GPU accelerator show that our implementation offers attractive runtime savings.
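The per-block precision selection described in this abstract can be illustrated with a minimal Python sketch. It is not the Ginkgo implementation. The selection rule (condition number times unit roundoff below a tolerance), the tolerance value, and all function names are assumptions made for illustration; the abstract only says that the choice depends on the numerical properties of each block. Only the three IEEE formats are considered, not the customized ones.

```python
import struct

# (struct format code, unit roundoff), ordered from lowest to highest precision.
# "e" = IEEE half, "f" = IEEE single, "d" = IEEE double.
FORMATS = [("e", 2.0 ** -11), ("f", 2.0 ** -24), ("d", 2.0 ** -53)]


def invert(block):
    """Invert a small dense block by Gauss-Jordan elimination with partial pivoting."""
    n = len(block)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(block)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("singular diagonal block")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def norm1(m):
    """Matrix 1-norm: the largest absolute column sum."""
    return max(sum(abs(row[j]) for row in m) for j in range(len(m[0])))


def fits(fmt, m):
    """True if every entry of m can be packed in fmt without overflow."""
    try:
        for row in m:
            for v in row:
                struct.pack(fmt, v)
    except OverflowError:
        return False
    return True


def choose_format(block, inv, tol=1e-2):
    """Assumed rule: take the cheapest format with cond_1(block) * u <= tol."""
    cond = norm1(block) * norm1(inv)
    for fmt, u in FORMATS:
        if cond * u <= tol and fits(fmt, inv):
            return fmt
    return "d"  # fall back to working (double) precision


def store(fmt, m):
    """Round every entry of m to the chosen storage format."""
    return [[struct.unpack(fmt, struct.pack(fmt, v))[0] for v in row] for row in m]


def adaptive_block_jacobi(blocks, tol=1e-2):
    """Invert each diagonal block and store it in an individually chosen format."""
    result = []
    for block in blocks:
        inv = invert(block)
        fmt = choose_format(block, inv, tol)
        result.append((fmt, store(fmt, inv)))
    return result


# Three hypothetical 2x2 diagonal blocks with increasing condition numbers.
blocks = [
    [[4.0, 1.0], [1.0, 3.0]],
    [[1.0, 1.0], [1.0, 1.001]],
    [[1.0, 1.0], [1.0, 1.0 + 1e-7]],
]
preconditioner = adaptive_block_jacobi(blocks)
print([fmt for fmt, _ in preconditioner])
```

In this sketch, well-conditioned blocks are stored in half precision and progressively worse-conditioned blocks fall back to single or double precision. The arithmetic itself stays in double precision, so only the memory footprint of the stored inverse changes, which matches the bandwidth-bound motivation given in the abstract.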


2010 ◽  
Vol 45 (5) ◽  
pp. 345-346 ◽  
Author(s):  
Aparna Chandramowlishwaran ◽  
Kathleen Knobe ◽  
Richard Vuduc

2002 ◽  
Vol 28 (2) ◽  
pp. 155-185 ◽  
Author(s):  
Olivier Beaumont ◽  
Arnaud Legrand ◽  
Fabrice Rastello ◽  
Yves Robert

2018 ◽  
Vol 106 (11) ◽  
pp. 2040-2055 ◽  
Author(s):  
Jack Dongarra ◽  
Mark Gates ◽  
Jakub Kurzak ◽  
Piotr Luszczek ◽  
Yaohung M. Tsai

2014 ◽  
Vol 40 (10) ◽  
pp. 559-573 ◽  
Author(s):  
Li Tan ◽  
Shashank Kothapalli ◽  
Longxiang Chen ◽  
Omar Hussaini ◽  
Ryan Bissiri ◽  
...  
