High performance dense linear algebra on a spatially distributed processor

Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '08)

10.1145/1345206.1345218

2008

Author(s): Jeffrey R. Diamond, Behnam Robatmili, Stephen W. Keckler, Robert van de Geijn, Kazushige Goto, ...

Keyword(s): Linear Algebra, High Performance, Dense Linear Algebra, Spatially Distributed

Profiling high performance dense linear algebra algorithms on multicore architectures for power and energy efficiency

Computer Science - Research and Development

10.1007/s00450-011-0191-z

2011

Vol 27 (4)

pp. 277-287

Author(s): Hatem Ltaief, Piotr Luszczek, Jack Dongarra

Keyword(s): Energy Efficiency, Linear Algebra, High Performance, Multicore Architectures, Dense Linear Algebra, Power And Energy

New Generalized Data Structures for Matrices Lead to a Variety of High Performance Dense Linear Algebra Algorithms

Applied Parallel Computing. State of the Art in Scientific Computing - Lecture Notes in Computer Science

10.1007/11558958_2

2006

pp. 11-20

Author(s): Fred G. Gustavson

Keyword(s): Linear Algebra, Data Structures, High Performance, Dense Linear Algebra

Mixing LU and QR factorization algorithms to design high-performance dense linear algebra solvers

Journal of Parallel and Distributed Computing

10.1016/j.jpdc.2015.06.007

2015

Vol 85

pp. 32-46

Author(s): Mathieu Faverge, Julien Herrmann, Julien Langou, Bradley Lowery, Yves Robert, ...

Keyword(s): Linear Algebra, High Performance, QR Factorization, Dense Linear Algebra, Factorization Algorithms

The FLAME approach: From dense linear algebra algorithms to high-performance multi-accelerator implementations

Journal of Parallel and Distributed Computing

10.1016/j.jpdc.2011.10.014

2012

Vol 72 (9)

pp. 1134-1143

Author(s): Francisco D. Igual, Ernie Chan, Enrique S. Quintana-Ortí, Gregorio Quintana-Ortí, Robert A. van de Geijn, ...

Keyword(s): Linear Algebra, High Performance, Dense Linear Algebra

Adaptive Precision Block-Jacobi for High Performance Preconditioning in the Ginkgo Linear Algebra Software

ACM Transactions on Mathematical Software

10.1145/3441850

2021

Vol 47 (2)

pp. 1-28

Author(s): Goran Flegar, Hartwig Anzt, Terry Cojean, Enrique S. Quintana-Ortí

Keyword(s): Linear Algebra, Graphics Processing Units, High Performance, Numerical Algorithms, Mixed Precision, Before And After, Memory Accesses, Specialized Hardware, The Individual, Graphics Processing

The use of mixed precision in numerical algorithms is a promising strategy for accelerating scientific applications. In particular, the adoption of specialized hardware and data formats for low-precision arithmetic in high-end GPUs (graphics processing units) has motivated numerous efforts aiming at carefully reducing the working precision in order to speed up the computations. For algorithms whose performance is bound by the memory bandwidth, the idea of compressing their data before (and after) memory accesses has received considerable attention. One idea is to store an approximate operator, such as a preconditioner, in lower-than-working precision, hopefully without impacting the algorithm output. We realize the first high-performance implementation of an adaptive precision block-Jacobi preconditioner, which selects the precision format used to store the preconditioner data on the fly, taking into account the numerical properties of the individual preconditioner blocks. We implement the adaptive block-Jacobi preconditioner as production-ready functionality in the Ginkgo linear algebra library, considering not only the precision formats that are part of the IEEE standard, but also customized formats that optimize the length of the exponent and significand to the characteristics of the preconditioner blocks. Experiments run on a state-of-the-art GPU accelerator show that our implementation offers attractive runtime savings.
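The core idea of the abstract, choosing a storage precision per diagonal block from that block's numerical properties, can be illustrated with a minimal pure-Python sketch. It assumes the 1-norm condition number is the deciding property and applies a hypothetical rule: store the inverted block in the cheapest IEEE format whose unit roundoff, multiplied by the condition number, stays below `tol = 1e-2`. The names `select_format`, `build_block_jacobi` and the threshold are illustrative assumptions, not Ginkgo's API or the paper's exact criterion. Reduced storage is only emulated by rounding through the `struct` module's `'e'` (binary16) and `'f'` (binary32) codes, and the customized exponent/significand formats mentioned in the abstract are not modeled.

```python
import struct

# Unit roundoff u = 2**-p of IEEE binary16, binary32 and binary64,
# where p counts significand bits including the implicit leading bit.
UNIT_ROUNDOFF = {"half": 2.0 ** -11, "single": 2.0 ** -24, "double": 2.0 ** -53}

# struct codes used to emulate storage: 'e' = binary16, 'f' = binary32, 'd' = binary64.
STRUCT_CODE = {"half": "e", "single": "f", "double": "d"}


def norm1(a):
    """Matrix 1-norm: the largest absolute column sum."""
    n = len(a)
    return max(sum(abs(a[i][j]) for i in range(n)) for j in range(n))


def invert(a):
    """Invert a small dense block by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("singular diagonal block")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def select_format(cond, tol=1e-2):
    """Hypothetical rule: cheapest format with cond * u <= tol, else fall back to double."""
    for fmt in ("half", "single", "double"):
        if cond * UNIT_ROUNDOFF[fmt] <= tol:
            return fmt
    return "double"


def round_to(fmt, x):
    """Round x to the chosen format and back, emulating reduced-precision storage.

    This sketch ignores exponent range: struct raises OverflowError for
    binary16 values above 65504.
    """
    code = STRUCT_CODE[fmt]
    return struct.unpack(code, struct.pack(code, x))[0]


def build_block_jacobi(blocks, tol=1e-2):
    """Invert each diagonal block once and store the inverse in its selected format."""
    stored = []
    for block in blocks:
        inv = invert(block)
        cond = norm1(block) * norm1(inv)  # 1-norm condition number of the block
        fmt = select_format(cond, tol)
        stored.append((fmt, [[round_to(fmt, v) for v in row] for row in inv]))
    return stored


if __name__ == "__main__":
    blocks = [
        [[4, 1], [1, 3]],          # well conditioned (cond_1 = 25/11)
        [[1, 1], [1, 1.001]],      # cond_1 around 4e3
        [[1, 1], [1, 1 + 1e-9]],   # cond_1 around 4e9
    ]
    for fmt, inv in build_block_jacobi(blocks):
        print(fmt, inv)
```

Under these assumptions the three sample blocks are stored in half, single and double precision respectively. Applying such a preconditioner would still multiply each stored inverse with the matching slice of the residual in working precision; in this sketch only the storage format changes, which is where a bandwidth-bound kernel could save memory traffic.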

Applying the concurrent collections programming model to asynchronous parallel dense linear algebra

ACM SIGPLAN Notices

10.1145/1837853.1693506

2010

Vol 45 (5)

pp. 345-346

Author(s): Aparna Chandramowlishwaran, Kathleen Knobe, Richard Vuduc

Keyword(s): Linear Algebra, Programming Model, Dense Linear Algebra, Asynchronous Parallel

Dense linear algebra kernels on heterogeneous platforms: Redistribution issues

Parallel Computing

10.1016/s0167-8191(01)00134-x

2002

Vol 28 (2)

pp. 155-185

Author(s): Olivier Beaumont, Arnaud Legrand, Fabrice Rastello, Yves Robert

Keyword(s): Linear Algebra, Heterogeneous Platforms, Dense Linear Algebra

Batched Triangular Dense Linear Algebra Kernels for Very Small Matrix Sizes on GPUs

ACM Transactions on Mathematical Software

10.1145/3267101

2019

Vol 45 (2)

pp. 1-28

Author(s): Ali Charara, David Keyes, Hatem Ltaief

Keyword(s): Linear Algebra, Dense Linear Algebra

Autotuning Numerical Dense Linear Algebra for Batched Computation With GPU Hardware Accelerators

Proceedings of the IEEE

10.1109/jproc.2018.2868961

2018

Vol 106 (11)

pp. 2040-2055

Author(s): Jack Dongarra, Mark Gates, Jakub Kurzak, Piotr Luszczek, Yaohung M. Tsai

Keyword(s): Linear Algebra, Hardware Accelerators, Dense Linear Algebra

A survey of power and energy efficient techniques for high performance numerical linear algebra operations

Parallel Computing

10.1016/j.parco.2014.09.001

2014

Vol 40 (10)

pp. 559-573

Author(s): Li Tan, Shashank Kothapalli, Longxiang Chen, Omar Hussaini, Ryan Bissiri, ...

Keyword(s): Linear Algebra, Energy Efficient, High Performance, Numerical Linear Algebra, Power And Energy
