A survey of power and energy efficient techniques for high performance numerical linear algebra operations

2014 ◽  
Vol 40 (10) ◽  
pp. 559-573 ◽  
Author(s):  
Li Tan ◽  
Shashank Kothapalli ◽  
Longxiang Chen ◽  
Omar Hussaini ◽  
Ryan Bissiri ◽  
...  
Author(s):  
Jack J. Dongarra ◽  
Iain S. Duff ◽  
Danny C. Sorensen ◽  
Henk A. van der Vorst

2020 ◽  
Vol 5 (52) ◽  
pp. 2260
Author(s):  
Hartwig Anzt ◽  
Terry Cojean ◽  
Yen-Chen Chen ◽  
Goran Flegar ◽  
Fritz Göbel ◽  
...  

2021 ◽  
Vol 21 (2) ◽  
pp. e09
Author(s):  
Federico Favaro ◽  
Ernesto Dufrechou ◽  
Pablo Ezzatti ◽  
Juan Pablo Oliver

The spread of multi-core architectures and the later irruption of massively parallel devices led to a revolution in High-Performance Computing (HPC) platforms in recent decades. As a result, Field-Programmable Gate Arrays (FPGAs) are re-emerging as a versatile and more energy-efficient alternative to other platforms. Traditional FPGA design implies using low-level Hardware Description Languages (HDLs) such as VHDL or Verilog, which follow an entirely different programming model from standard software languages and require specialized knowledge of the underlying hardware. In recent years, manufacturers have made substantial efforts to provide High-Level Synthesis (HLS) tools in order to allow greater adoption of FPGAs in the HPC community. Our work studies the use of multi-core hardware and different FPGAs to address Numerical Linear Algebra (NLA) kernels such as the general matrix multiplication (GEMM) and the sparse matrix-vector multiplication (SpMV). Specifically, we compare the behavior of fine-tuned kernels on a multi-core CPU and HLS implementations on FPGAs. We evaluate our implementations experimentally on a low-end and a cutting-edge FPGA platform, in terms of runtime and energy consumption, and compare the results against the Intel MKL library on the CPU.
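
For readers unfamiliar with the two kernels named in this abstract, the following is a minimal reference sketch of what GEMM and SpMV compute. It is not the authors' tuned CPU or HLS code. The function names `gemm` and `spmv_csr` are illustrative, and the CSR (compressed sparse row) storage format for the sparse matrix is an assumption; the abstract does not state which format was used.

```python
def gemm(a, b):
    """Reference dense product C = A * B on row-major lists of lists."""
    n, k, m = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for p in range(k):
            a_ip = a[i][p]
            for j in range(m):
                c[i][j] += a_ip * b[p][j]
    return c


def spmv_csr(row_ptr, col_idx, values, x):
    """Reference sparse product y = A * x with A stored in CSR form.

    row_ptr[i]:row_ptr[i + 1] is the slice of col_idx/values that
    holds the nonzeros of row i (CSR is an assumption of this sketch).
    """
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for nz in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[nz] * x[col_idx[nz]]
    return y


# The 3x3 matrix [[4, 0, 1], [0, 3, 0], [2, 0, 5]] in CSR form.
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
```

GEMM has regular, predictable memory access, while SpMV reads `x` through the indirection `col_idx[nz]`. This difference is one common reason the two kernels behave differently across hardware platforms.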


2021 ◽  
Vol 47 (2) ◽  
pp. 1-4
Author(s):  
Sarah Osborn

The article by Flegar et al. titled “Adaptive Precision Block-Jacobi for High Performance Preconditioning in the Ginkgo Linear Algebra Software” presents a novel, practical implementation of an adaptive precision block-Jacobi preconditioner. Performance results using state-of-the-art GPU architectures for the block-Jacobi preconditioner generation and application demonstrate the practical usability of the method, compared to a traditional full-precision block-Jacobi preconditioner. A production-ready implementation is provided in the Ginkgo numerical linear algebra library. In this report, the Ginkgo library is reinstalled and performance results are generated to perform a comparison to the original results when using Ginkgo’s Conjugate Gradient solver with either the full or the adaptive precision block-Jacobi preconditioner for a suite of test problems on an NVIDIA GPU accelerator. After completing this process, the published results are deemed reproducible.
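
As a rough illustration of the idea behind an adaptive precision block-Jacobi preconditioner, the sketch below inverts each diagonal block and stores the inverse in reduced precision only when the block is well conditioned. This is not Ginkgo's implementation or API. The selection rule (a 1-norm condition number compared against the hypothetical threshold `cond_limit`), the restriction to two precisions, and all function names are assumptions made for this example.

```python
import struct


def to_float32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def invert(block):
    """Invert a small dense block by Gauss-Jordan with partial pivoting."""
    n = len(block)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(block)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("singular diagonal block")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def norm1(m):
    """Matrix 1-norm: the largest absolute column sum."""
    return max(sum(abs(row[j]) for row in m) for j in range(len(m[0])))


def build_preconditioner(blocks, cond_limit=1e4):
    """Invert each diagonal block and pick a storage precision per block.

    cond_limit is a hypothetical threshold chosen for this sketch.
    """
    stored = []
    for block in blocks:
        inv = invert(block)
        cond = norm1(block) * norm1(inv)
        if cond < cond_limit:
            # Well-conditioned block: single precision storage is assumed safe.
            inv = [[to_float32(v) for v in row] for row in inv]
            stored.append(("single", inv))
        else:
            stored.append(("double", inv))
    return stored


def apply_preconditioner(stored, r):
    """Compute z = M^{-1} r block by block, accumulating in double precision."""
    z, offset = [], 0
    for _, inv in stored:
        n = len(inv)
        seg = r[offset:offset + n]
        z.extend(sum(inv[i][j] * seg[j] for j in range(n)) for i in range(n))
        offset += n
    return z
```

In a preconditioned Conjugate Gradient solver, `apply_preconditioner` would be called once per iteration on the current residual. Storing some blocks in lower precision reduces the memory traffic of that step, which is the motivation for varying the precision per block.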


2015 ◽  
Vol 1 (4) ◽  
pp. 1-12
Author(s):  
Chidadala Janardhan ◽  
Bhagath Pyda ◽  
J. Manohar ◽  
K. V. Ramanaiah ◽  
...  
