Mixed-Precision GPU-Multigrid Solvers with Strong Smoothers

Discretization-Error-Accurate Mixed-Precision Multigrid Solvers

SIAM Journal on Scientific Computing, 2021, pp. S420-S447. DOI: 10.1137/20m1349230

Cited By ~ 1

Author(s): Rasmus Tamstorf, Joseph Benzaken, Stephen F. McCormick

Keyword(s): Discretization Error, Multigrid Solvers, Mixed Precision

Development of O(Nm^2) preconditioned multigrid solvers for Euler and Navier-Stokes equations

AIAA Journal, 2000, Vol 38, pp. 717-720. DOI: 10.2514/3.14467

Author(s): Jack R. Edwards, James L. Thomas

Keyword(s): Stokes Equations, Navier Stokes, Navier Stokes Equations, Multigrid Solvers

On the challenges in programming mixed-precision deep neural networks

Proceedings of the 4th ACM SIGPLAN International Workshop on Machine Learning and Programming Languages, 2020. DOI: 10.1145/3394450.3397468

Author(s): Ruizhe Zhao, Wayne Luk, Chao Xiong, Xinyu Niu, Kuen Hung Tsoi

Keyword(s): Neural Networks, Deep Neural Networks, Mixed Precision

Adaptive Precision Block-Jacobi for High Performance Preconditioning in the Ginkgo Linear Algebra Software

ACM Transactions on Mathematical Software, 2021, Vol 47 (2), pp. 1-28. DOI: 10.1145/3441850

Author(s): Goran Flegar, Hartwig Anzt, Terry Cojean, Enrique S. Quintana-Ortí

Keyword(s): Linear Algebra, Graphics Processing Units, High Performance, Numerical Algorithms, Mixed Precision, Before And After, Memory Accesses, Specialized Hardware, The Individual, Graphics Processing

The use of mixed precision in numerical algorithms is a promising strategy for accelerating scientific applications. In particular, the adoption of specialized hardware and data formats for low-precision arithmetic in high-end GPUs (graphics processing units) has motivated numerous efforts aiming at carefully reducing the working precision in order to speed up the computations. For algorithms whose performance is bound by the memory bandwidth, the idea of compressing their data before (and after) memory accesses has received considerable attention. One idea is to store an approximate operator, such as a preconditioner, in lower than working precision, ideally without affecting the algorithm's output. We realize the first high-performance implementation of an adaptive precision block-Jacobi preconditioner which selects the precision format used to store the preconditioner data on the fly, taking into account the numerical properties of the individual preconditioner blocks. We implement the adaptive block-Jacobi preconditioner as production-ready functionality in the Ginkgo linear algebra library, considering not only the precision formats that are part of the IEEE standard, but also customized formats which optimize the length of the exponent and significand to the characteristics of the preconditioner blocks. Experiments run on a state-of-the-art GPU accelerator show that our implementation offers attractive runtime savings.
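
To make the idea of per-block storage precision concrete, the following is a minimal NumPy sketch, not Ginkgo's implementation or API. It assumes a hypothetical selection rule (condition number of the block times the format's machine epsilon must stay below a tolerance, and the inverse must fit in the format's range) and only considers the IEEE formats float16, float32, and float64; the function names and the tolerance are illustrative. Each diagonal block is inverted in float64, stored in the cheapest acceptable format, and upcast again when the preconditioner is applied.

```python
import numpy as np

# Candidate storage formats, cheapest first (assumption for this sketch;
# customized non-IEEE formats are not modeled here).
FORMATS = (np.float16, np.float32, np.float64)


def choose_format(block, block_inv, tol=1e-2):
    """Hypothetical rule: cond(block) * eps(format) < tol and the inverse fits in range."""
    kappa = np.linalg.cond(block)
    peak = np.max(np.abs(block_inv))
    for fmt in FORMATS:
        info = np.finfo(fmt)
        if kappa * info.eps < tol and peak < info.max:
            return fmt
    return np.float64


def build_block_jacobi(A, block_size, tol=1e-2):
    """Invert each diagonal block in float64, then store it in the chosen format."""
    n = A.shape[0]
    blocks = []
    for start in range(0, n, block_size):
        stop = min(start + block_size, n)
        D = A[start:stop, start:stop]
        D_inv = np.linalg.inv(D)
        fmt = choose_format(D, D_inv, tol)
        blocks.append((start, stop, D_inv.astype(fmt)))
    return blocks


def apply_block_jacobi(blocks, r):
    """Compute z = M^{-1} r, upcasting each stored inverse to float64 before use."""
    z = np.empty(r.shape[0], dtype=np.float64)
    for start, stop, D_inv in blocks:
        z[start:stop] = D_inv.astype(np.float64) @ r[start:stop]
    return z


A = np.zeros((4, 4))
A[:2, :2] = [[4.0, 1.0], [1.0, 3.0]]          # well-conditioned block
A[2:, 2:] = [[1.0, 1.0], [1.0, 1.0 + 1e-6]]   # nearly singular block
blocks = build_block_jacobi(A, block_size=2)
print([b[2].dtype for b in blocks])  # float16 for the first block, float64 for the second
```

Storing only the block inverses in reduced precision while computing in float64 mirrors the abstract's point that the preconditioner is an approximate operator, so lossy storage may cost little accuracy while reducing memory traffic.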

Rigorous floating-point mixed-precision tuning

Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages (POPL 2017), 2017. DOI: 10.1145/3009837.3009846

Cited By ~ 38

Author(s): Wei-Fan Chiang, Mark Baranowski, Ian Briggs, Alexey Solovyev, Ganesh Gopalakrishnan, ...

Keyword(s): Floating Point, Mixed Precision

GRAM

ACM Transactions on Architecture and Code Optimization, 2021, Vol 18 (2), pp. 1-24. DOI: 10.1145/3441830

Author(s): Nhut-Minh Ho, Himeshi De Silva, Weng-Fai Wong

Keyword(s): Performance Improvement, Trade Off, Accuracy Requirement, Output Error, Fine Grain, Mixed Precision, And Performance, Effective Use

This article presents GRAM (GPU-based Runtime Adaption for Mixed-precision), a framework for the effective use of mixed-precision arithmetic in CUDA programs. Our method provides a fine-grain tradeoff between output error and performance. It can create many variants that satisfy different accuracy requirements by adaptively assigning different groups of threads to different precision levels at runtime. To widen the range of applications that can benefit from its approximation, GRAM comes with an optional half-precision approximate math library. Using GRAM, we can trade off precision for performance improvements of up to 540%, depending on the application and accuracy requirement.
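
The core tradeoff described above, assigning some groups of threads to a lower precision and checking the resulting output error, can be illustrated with a small NumPy sketch. This is not GRAM's mechanism: "thread groups" are emulated as contiguous chunks of an array, the per-element workload in `kernel` is hypothetical, and the search in `tune` simply tries every split between float16 and float32 groups against a float64 reference and keeps the largest number of float16 groups that meets the error requirement.

```python
import numpy as np


def kernel(x, dtype):
    """Hypothetical per-element workload, evaluated entirely in `dtype`."""
    x = x.astype(dtype)
    return (np.sin(x) * np.exp(-x) + x * x).astype(np.float64)


def run_mixed(x, n_groups, n_low):
    """Emulate n_groups thread groups: the first n_low run in float16, the rest in float32."""
    groups = np.array_split(x, n_groups)
    outputs = [
        kernel(g, np.float16 if i < n_low else np.float32)
        for i, g in enumerate(groups)
    ]
    return np.concatenate(outputs)


def tune(x, n_groups, max_rel_err):
    """Largest number of float16 groups whose output error meets max_rel_err, or None."""
    reference = kernel(x, np.float64)
    scale = np.max(np.abs(reference))
    best = None
    for n_low in range(n_groups + 1):
        y = run_mixed(x, n_groups, n_low)
        rel_err = np.max(np.abs(y - reference)) / scale
        if rel_err <= max_rel_err:
            best = n_low
    return best


rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 10_000)
print(tune(x, n_groups=8, max_rel_err=1e-2))  # loose requirement: more float16 groups allowed
print(tune(x, n_groups=8, max_rel_err=1e-5))  # tight requirement: fewer float16 groups allowed
```

Each value of `n_low` corresponds to one "variant" in the abstract's sense; a looser accuracy requirement admits variants with more low-precision groups, which on real hardware would be the faster ones.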

GPU-FPtuner: Mixed-precision Auto-tuning for Floating-point Applications on GPU

2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC), 2020. DOI: 10.1109/hipc50609.2020.00043

Author(s): Ruidong Gu, Michela Becchi

Keyword(s): Floating Point, Mixed Precision, Auto Tuning

Increased space-parallelism via time-simultaneous Newton-multigrid methods for nonstationary nonlinear PDE problems

The International Journal of High Performance Computing Applications, 2021, Vol 35 (3), pp. 211-225. DOI: 10.1177/10943420211001940

Author(s): Jonas Dünnebacke, Stefan Turek, Christoph Lohmann, Andriy Sokolov, Peter Zajac

Keyword(s): Krylov Subspace, Solution Procedure, Grid Size, Waveform Relaxation, Test Problems, Subspace Method, Multigrid Algorithm, Multigrid Solvers, Large Systems, Linear PDEs

We discuss how “parallel-in-space & simultaneous-in-time” Newton-multigrid approaches can be designed that improve the scaling behavior of the spatial parallelism by reducing latency costs. The idea is to solve many time steps at once, thereby solving fewer but larger systems. These large systems are reordered and interpreted as a space-only problem, leading to a multigrid algorithm with semi-coarsening in space and line smoothing in the time direction. The smoother is further improved by embedding it as a preconditioner in a Krylov subspace method. As a prototypical application, we concentrate on scalar partial differential equations (PDEs) with up to many thousands of time steps, discretized in time by finite difference methods and in space by finite element methods. For linear PDEs, the resulting method is closely related to multigrid waveform relaxation and its theoretical framework. In our parabolic test problems, the numerical behavior of this multigrid approach is robust with respect to the spatial and temporal grid sizes and the number of simultaneously treated time steps. Moreover, we illustrate how corresponding time-simultaneous fixed-point and Newton-type solvers can be derived for nonlinear nonstationary problems that require the described solution of linearized problems in each outer nonlinear step. As the main result, we can generate much larger problems that can be treated by a large number of cores, so that combining the robustly scaling multigrid solvers with a larger degree of parallelism yields a faster solution procedure for nonstationary problems.
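
The reformulation "solve many time steps at once" can be checked on a toy problem. The sketch below assumes the linear 1D heat equation with implicit Euler in time and, for brevity, a finite difference Laplacian in space instead of the finite elements mentioned above. It assembles all time steps into one block lower-bidiagonal system with `np.kron` and solves it with a dense direct solver; it does not reproduce the paper's multigrid with semi-coarsening and line smoothing, and all function names are hypothetical. The point is only that the single large system has the same solution as classic step-by-step time marching.

```python
import numpy as np


def laplacian_1d(n, h):
    """3-point finite-difference approximation of -u'' with zero Dirichlet boundary values."""
    return (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2


def sequential_euler(A, u0, dt, n_steps):
    """Classic time stepping: one implicit Euler solve per step."""
    B = np.eye(A.shape[0]) + dt * A
    u, history = u0, []
    for _ in range(n_steps):
        u = np.linalg.solve(B, u)
        history.append(u)
    return np.concatenate(history)


def all_at_once_euler(A, u0, dt, n_steps):
    """Solve all n_steps implicit Euler steps as one block lower-bidiagonal system.

    Block row k (k = 1..n_steps) reads (I + dt*A) u_k - u_{k-1} = 0; the known
    initial state u_0 is moved to the right-hand side of the first block row.
    """
    n = A.shape[0]
    B = np.eye(n) + dt * A
    coupling = np.eye(n_steps, k=-1)  # links block row k to the previous time step
    big = np.kron(np.eye(n_steps), B) - np.kron(coupling, np.eye(n))
    rhs = np.zeros(n * n_steps)
    rhs[:n] = u0
    return np.linalg.solve(big, rhs)


n, n_steps, dt = 20, 30, 1e-3
h = 1.0 / (n + 1)
grid = np.linspace(h, 1.0 - h, n)  # interior grid points
A = laplacian_1d(n, h)
u0 = np.sin(np.pi * grid)
same = np.allclose(sequential_euler(A, u0, dt, n_steps), all_at_once_euler(A, u0, dt, n_steps))
print(same)  # True
```

In this form the time direction becomes one more coordinate of a single "space-only" system, which is what allows a solver to distribute all time steps across many cores instead of processing them one after another.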

Accelerating Geostatistical Modeling and Prediction With Mixed-Precision Computations: A High-Productivity Approach with PaRSEC

IEEE Transactions on Parallel and Distributed Systems, 2021, pp. 1-1. DOI: 10.1109/tpds.2021.3084071

Author(s): Sameh Abdulah, Qinglei Cao, Yu Pei, George Bosilca, Jack Dongarra, ...

Keyword(s): Geostatistical Modeling, High Productivity, Modeling And Prediction, Mixed Precision

Feature Map Alignment: Towards Efficient Design of Mixed-precision Quantization Scheme

2019 IEEE Visual Communications and Image Processing (VCIP), 2019. DOI: 10.1109/vcip47243.2019.8965724

Author(s): Yukun Bao, Yuhui Xu, Hongkai Xiong

Keyword(s): Quantization Scheme, Efficient Design, Feature Map, Mixed Precision, Map Alignment
