matrix vector multiplication Latest Research Papers

Request, Coalesce, Serve, and Forget: Miss-Optimized Memory Systems for Bandwidth-Bound Cache-Unfriendly Applications on FPGAs

ACM Transactions on Reconfigurable Technology and Systems ◽

10.1145/3466823 ◽

2022 ◽

Vol 15 (2) ◽

pp. 1-33

Author(s):

Mikhail Asiatici ◽

Paolo Ienne

Keyword(s):

Large Scale ◽

Sparse Matrix ◽

Memory Systems ◽

Graph Analytics ◽

Matrix Vector Multiplication ◽

Area Reduction ◽

Cache Line ◽

Speed Up ◽

Memory Accesses ◽

On Chip

Applications such as large-scale sparse linear algebra and graph analytics are challenging to accelerate on FPGAs because their short, irregular memory accesses result in low cache hit rates. Nonblocking caches reduce the bandwidth required by misses by requesting each cache line only once, even when there are multiple misses corresponding to it. However, such a reuse mechanism is traditionally implemented using an associative lookup, which limits the number of misses considered for reuse to a few tens at most. In this article, we present an efficient pipeline that can process and store thousands of outstanding misses in cuckoo hash tables in on-chip SRAM with minimal stalls. This brings the same bandwidth advantage as a larger cache for a fraction of the area budget, because outstanding misses do not need a data array, and it can significantly speed up irregular memory-bound latency-insensitive applications. In addition, we extend nonblocking caches to generate variable-length bursts to memory, which increases the bandwidth delivered by DRAMs and their controllers. The resulting miss-optimized memory system provides up to 25% speedup with 24× area reduction on 15 large sparse matrix-vector multiplication benchmarks evaluated on an embedded and a datacenter FPGA system.
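The miss-coalescing idea in this abstract (request each cache line once, however many misses target it) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' pipeline: the `MissTable` class, its method names, and the 64-byte line size are hypothetical, and a Python dict stands in for the cuckoo hash tables in on-chip SRAM.

```python
from collections import defaultdict

LINE_BYTES = 64  # assumed cache-line size, not taken from the paper


class MissTable:
    """Toy model of outstanding-miss tracking (hypothetical names).

    A dict keyed by cache-line index stands in for the hardware
    hash tables; each entry lists the requests waiting on that line.
    """

    def __init__(self):
        self.pending = defaultdict(list)  # line index -> waiting request ids
        self.memory_requests = 0          # requests actually sent to memory

    def miss(self, req_id, addr):
        """Record a miss; return True if it triggers a new memory request."""
        line = addr // LINE_BYTES
        is_first = line not in self.pending
        self.pending[line].append(req_id)
        if is_first:
            self.memory_requests += 1
        return is_first

    def fill(self, line):
        """Memory returned `line`: serve all waiting requests, then forget it."""
        return self.pending.pop(line, [])


table = MissTable()
for req_id, addr in enumerate([0, 8, 64, 16, 72]):
    table.miss(req_id, addr)

print(table.memory_requests)  # 2 memory requests for 5 misses
print(table.fill(0))          # [0, 1, 3]
```

In this model five misses to two distinct lines cost two memory requests; no data array is needed for the pending entries, which is the area argument the abstract makes.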

Sparse Matrix-Vector Multiplication Cache Performance Evaluation and Design Exploration

10.1109/mascots53633.2021.9614301 ◽

2021 ◽

Author(s):

Jianfeng Cui ◽

Kai Lu ◽

Sheng Liu

Keyword(s):

Performance Evaluation ◽

Sparse Matrix ◽

Cache Performance ◽

Design Exploration ◽

Matrix Vector Multiplication ◽

Matrix Vector

Optimized Data Reuse via Reordering for Sparse Matrix-Vector Multiplication on FPGAs

10.1109/iccad51958.2021.9643453 ◽

2021 ◽

Author(s):

Shiqing Li ◽

Di Liu ◽

Weichen Liu

Keyword(s):

Sparse Matrix ◽

Data Reuse ◽

Matrix Vector Multiplication ◽

Matrix Vector

Energy-efficient algebra kernels in FPGA for High Performance Computing

Journal of Computer Science and Technology ◽

10.24215/16666038.21.e09 ◽

2021 ◽

Vol 21 (2) ◽

pp. e09

Author(s):

Federico Favaro ◽

Ernesto Dufrechou ◽

Pablo Ezzatti ◽

Juan Pablo Oliver

Keyword(s):

High Performance Computing ◽

Energy Efficient ◽

High Performance ◽

Programming Model ◽

Sparse Matrix ◽

Matrix Multiplication ◽

Numerical Linear Algebra ◽

Fpga Design ◽

Matrix Vector Multiplication ◽

Performance Computing

The dissemination of multi-core architectures and the later irruption of massively parallel devices led to a revolution in High-Performance Computing (HPC) platforms in the last decades. As a result, Field-Programmable Gate Arrays (FPGAs) are re-emerging as a versatile and more energy-efficient alternative to other platforms. Traditional FPGA design implies using low-level Hardware Description Languages (HDL) such as VHDL or Verilog, which follow an entirely different programming model than standard software languages, and their use requires specialized knowledge of the underlying hardware. In recent years, manufacturers have made big efforts to provide High-Level Synthesis (HLS) tools in order to allow a greater adoption of FPGAs in the HPC community. Our work studies the use of multi-core hardware and different FPGAs to address Numerical Linear Algebra (NLA) kernels such as the general matrix multiplication (GEMM) and the sparse matrix-vector multiplication (SpMV). Specifically, we compare the behavior of fine-tuned kernels on a multi-core CPU and HLS implementations on FPGAs. We perform the experimental evaluation of our implementations on a low-end and a cutting-edge FPGA platform, in terms of runtime and energy consumption, and compare the results against the Intel MKL library on the CPU.
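For readers unfamiliar with the SpMV kernel named in this abstract, a minimal reference version follows. It is a sketch only: the abstract does not say which storage format the authors use, so the compressed sparse row (CSR) layout and the function name `spmv_csr` are assumptions made for illustration.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form.

    row_ptr[i]:row_ptr[i+1] delimits the nonzeros of row i;
    col_idx and values hold their column indices and values.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # The read of x[col_idx[k]] is the irregular access
            # that makes SpMV memory-bound.
            y[i] += values[k] * x[col_idx[k]]
    return y


# A = [[1, 0, 2],
#      [0, 3, 0],
#      [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]

print(spmv_csr(row_ptr, col_idx, values, [1.0, 2.0, 3.0]))  # [7.0, 6.0, 19.0]
```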

ReSpar: Reordering Algorithm for ReRAM-based Sparse Matrix-Vector Multiplication Accelerator

10.1109/iccd53106.2021.00050 ◽

2021 ◽

Author(s):

Yi-Jou Hsiao ◽

Chin-Fu Nien ◽

Hsiang-Yun Cheng

Keyword(s):

Sparse Matrix ◽

Matrix Vector Multiplication ◽

Matrix Vector

High Speed and Power Efficient Multiplexer based Matrix Vector Multiplication for LSTM Network

10.1109/vdat53777.2021.9601075 ◽

2021 ◽

Author(s):

Tresa Joseph ◽

T. S. Bindiya

Keyword(s):

High Speed ◽

Power Efficient ◽

Matrix Vector Multiplication ◽

Lstm Network ◽

Matrix Vector

Sequences of Sparse Matrix-Vector Multiplication on Fugaku’s A64FX processors

10.1109/cluster48925.2021.00111 ◽

2021 ◽

Author(s):

Jerome Gurhem ◽

Maxence Vandromme ◽

Miwako Tsuji ◽

Serge G. Petiton ◽

Mitsuhisa Sato

Keyword(s):

Sparse Matrix ◽

Matrix Vector Multiplication ◽

Matrix Vector

PBBFMM3D: A parallel black-box algorithm for kernel matrix-vector multiplication

Journal of Parallel and Distributed Computing ◽

10.1016/j.jpdc.2021.04.005 ◽

2021 ◽

Vol 154 ◽

pp. 64-73

Author(s):

Ruoxi Wang ◽

Chao Chen ◽

Jonghyun Lee ◽

Eric Darve

Keyword(s):

Black Box ◽

Kernel Matrix ◽

Matrix Vector Multiplication ◽

Matrix Vector

Execution‐Cache‐Memory modeling and performance tuning of sparse matrix‐vector multiplication and Lattice quantum chromodynamics on A64FX

Concurrency and Computation Practice and Experience ◽

10.1002/cpe.6512 ◽

2021 ◽

Author(s):

Christie Alappat ◽

Nils Meyer ◽

Jan Laukemann ◽

Thomas Gruber ◽

Georg Hager ◽

...

Keyword(s):

Quantum Chromodynamics ◽

Sparse Matrix ◽

Cache Memory ◽

Performance Tuning ◽

Lattice Quantum Chromodynamics ◽

Matrix Vector Multiplication ◽

Lattice Quantum ◽

And Performance ◽

Matrix Vector ◽

Memory Modeling

Adaptive diagonal sparse matrix-vector multiplication on GPU

Journal of Parallel and Distributed Computing ◽

10.1016/j.jpdc.2021.07.007 ◽

2021 ◽

Author(s):

Jiaquan Gao ◽

Yifei Xia ◽

Renjie Yin ◽

Guixia He

Keyword(s):

Sparse Matrix ◽

Matrix Vector Multiplication ◽

Matrix Vector
