Sparse matrix–vector multiplication

Author(s):  
Rob H. Bisseling

This chapter introduces irregular algorithms through the example of parallel sparse matrix–vector multiplication (SpMV), the central operation in iterative linear system solvers. The irregular sparsity pattern of the matrix does not change during the multiplication, which may be repeated many times; this justifies investing considerable effort in finding a good data distribution. The Mondriaan distribution of a sparse matrix is a useful non-Cartesian distribution that can be found by hypergraph-based partitioning. The Mondriaan package implements such a partitioning, as well as the newer medium-grain partitioning method. The chapter analyses the special cases of random sparse matrices and Laplacian matrices, and uses performance profiles and geometric means to compare different partitioning methods. Furthermore, it presents the hybrid-BSP model and a hybrid-BSP SpMV, which are aimed at hybrid distributed/shared-memory architectures. The parallel SpMV can be incorporated into applications ranging from PageRank computation to artificial neural networks.
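
To make the communication structure concrete, the following is a minimal Python sketch of a row-distributed SpMV with an explicit fan-out phase, in which each processor fetches exactly the x-entries its local nonzeros reference before multiplying. This is an illustration under assumed conventions (a two-way row split, serial emulation of the processors), not code from the chapter or the Mondriaan package.

```python
# Minimal sketch (illustrative assumptions, not the chapter's code) of the
# fan-out phase in a distributed SpMV y = A*x: each "processor" owns a block
# of rows of A and of x, fetches the remote x-entries its nonzeros need,
# then multiplies locally. With a row distribution, no fan-in is required.
import numpy as np
import scipy.sparse as sp

n = 8
rng = np.random.default_rng(0)
A = sp.random(n, n, density=0.3, random_state=rng, format="csr")
x = rng.random(n)

half = n // 2
parts = [(A[:half, :], slice(0, half)), (A[half:, :], slice(half, n))]

y = np.zeros(n)
for A_local, owned in parts:
    # fan-out: the columns holding nonzeros tell us which x-entries to fetch
    needed_cols = np.unique(A_local.indices)
    x_fetched = x[needed_cols]            # stands in for communication
    # local multiply using only the fetched entries
    A_needed = A_local[:, needed_cols]
    y[owned] = A_needed @ x_fetched

assert np.allclose(y, A @ x)
```

A good partitioning (Mondriaan, medium-grain) aims precisely at shrinking the `needed_cols` sets that cross processor boundaries.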

2016
Vol 2016
pp. 1-12
Author(s):  
Guixia He
Jiaquan Gao

Sparse matrix-vector multiplication (SpMV) is an important operation in scientific computations. Compressed sparse row (CSR) is the most frequently used format for storing sparse matrices. However, CSR-based SpMV kernels on graphics processing units (GPUs), such as CSR-scalar and CSR-vector, usually perform poorly because of irregular memory access patterns. This motivates us to propose a perfect CSR-based SpMV on the GPU, called PCSR. PCSR involves two kernels and accesses the CSR arrays in a fully coalesced manner by introducing a middle array, which greatly alleviates the deficiencies of CSR-scalar (rare coalescing) and CSR-vector (partial coalescing). Test results on a single C2050 GPU show that PCSR outperforms CSR-scalar, CSR-vector, and the CSRMV and HYBMV routines of the vendor-tuned CUSPARSE library, and is comparable with CSR-Adaptive, a recently proposed CSR-based algorithm. Furthermore, we extend PCSR from a single GPU to multiple GPUs. Experimental results on four C2050 GPUs show that, whether or not inter-GPU communication is taken into account, PCSR on multiple GPUs achieves good performance and high parallel efficiency.
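
For readers unfamiliar with the format, here is a minimal Python sketch of the three CSR arrays the abstract refers to, together with the CSR-scalar traversal (one thread per row). The matrix values are an arbitrary example; this illustrates why CSR-scalar coalesces poorly, and is not PCSR itself.

```python
# Minimal sketch of CSR storage and the CSR-scalar traversal. On a GPU,
# consecutive threads read val/col_idx at the unrelated offsets given by
# row_ptr, which is why CSR-scalar exhibits rare coalescing.
import numpy as np

# A = [[4, 0, 1],
#      [0, 2, 0],
#      [3, 0, 5]]
val     = np.array([4.0, 1.0, 2.0, 3.0, 5.0])  # nonzeros, row by row
col_idx = np.array([0,   2,   1,   0,   2  ])  # column of each nonzero
row_ptr = np.array([0, 2, 3, 5])               # row i spans [row_ptr[i], row_ptr[i+1])

x = np.array([1.0, 2.0, 3.0])
y = np.zeros(3)

for row in range(3):                   # each iteration plays one GPU thread
    start, end = row_ptr[row], row_ptr[row + 1]
    y[row] = np.dot(val[start:end], x[col_idx[start:end]])

print(y)  # [ 7.  4. 18.]
```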


1998
Vol 2
Author(s):  
Giovanni Manzini

In this paper we consider the problem of computing, on a local memory machine, the product y = Ax, where A is a random n×n sparse matrix with Θ(n) nonzero elements. To study the average-case communication cost of this problem, we introduce four different probability measures on the set of sparse matrices. We prove that on most local memory machines with p processors, this computation requires Ω((n/p) log p) time on average. We prove that the same lower bound also holds, in the worst case, for matrices with only 2n or 3n nonzero elements.
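
The bound can be stated compactly as below; the numerical instantiation in the comment is an illustrative example, not taken from the paper.

```latex
\[
  y = Ax, \qquad A \in \mathbb{R}^{n \times n} \text{ sparse with } \Theta(n)
  \text{ nonzeros} \;\Longrightarrow\;
  T(n,p) \;=\; \Omega\!\left(\frac{n}{p}\,\log p\right).
\]
% Illustrative instantiation: for n = 10^6 and p = 1024, the average-case
% time is at least on the order of (10^6/1024) \cdot \log_2 1024
% \approx 977 \cdot 10 \approx 10^4 steps, so communication prevents the
% ideal n/p scaling from being reached.
```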


2014
Vol 22 (1)
pp. 1-19
Author(s):  
Valeria Cardellini
Salvatore Filippone
Damian W.I. Rouson

We apply object-oriented software design patterns to develop code for scientific software involving sparse matrices. Design patterns arise when multiple independent developments produce similar designs that converge onto a generic solution. We demonstrate how to use design patterns to implement an interface for sparse matrix computations on NVIDIA GPUs, starting from PSBLAS, an existing sparse matrix library, and from existing sets of GPU kernels for sparse matrices. We also compare the throughput of the PSBLAS sparse matrix–vector multiplication on two GPU-equipped platforms with that of a CPU-only PSBLAS implementation. Our double-precision experiments show encouraging GPU speedups: up to 35.35 on an NVIDIA GTX 285 with respect to an AMD Athlon 7750, and up to 10.15 on an NVIDIA Tesla C2050 with respect to an Intel Xeon X5650.
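
The backend-switching idea behind such an interface can be sketched in a few lines of Python. All class and method names below are hypothetical stand-ins chosen for this illustration; they are not PSBLAS's actual API.

```python
# Illustrative sketch of the design-pattern idea: a sparse matrix object
# delegates storage and SpMV to an interchangeable backend (CPU or GPU), in
# the spirit of the State/Bridge patterns. Names are hypothetical.
from abc import ABC, abstractmethod
import numpy as np
import scipy.sparse as sp

class SpMVBackend(ABC):
    @abstractmethod
    def spmv(self, A, x): ...

class CPUBackend(SpMVBackend):
    def spmv(self, A, x):
        return A @ x                       # SciPy's CSR kernel

class LoggingGPUBackend(SpMVBackend):
    """Stand-in for a GPU backend; a real one would launch device kernels."""
    def spmv(self, A, x):
        print("would launch GPU kernel here")
        return A @ x

class SparseMatrix:
    def __init__(self, A, backend: SpMVBackend):
        self.A, self.backend = sp.csr_matrix(A), backend
    def to_backend(self, backend):         # switch the storage/compute "state"
        self.backend = backend
    def __matmul__(self, x):
        return self.backend.spmv(self.A, x)

M = SparseMatrix(np.eye(3), CPUBackend())
print(M @ np.ones(3))                      # [1. 1. 1.]
M.to_backend(LoggingGPUBackend())
print(M @ np.ones(3))
```

The point of the pattern is that client code performing `M @ x` never changes when the storage format or the compute device does.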


2017
Vol 1 (2)
pp. 40-47
Author(s):  
Fawaz Hjouj

Given two regular functions (images) f and g on R², where g is formed from f by a general linear transformation g(x) = f(Ax + b), we present a procedure to determine the transformation parameters A and b using Radon projections of f and only two projections of g. We use these projections, together with simple facts about matrix–vector multiplication, to recover the matrix A. We assume that f is nonnegative and that A is nonsingular. Commonly used transformations in image processing, such as rotation and scaling, are special cases of our approach.
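
A classical identity explains why projections carry enough information: the Radon transform of an affinely transformed image is a scaled and shifted projection of the original at a rotated direction. The derivation below is standard background for such procedures, not the paper's specific recovery method.

```latex
% Radon transform on R^2, with unit direction \theta and offset s:
\[
  (\mathcal{R}f)(\theta, s)
  = \int_{\mathbb{R}^2} f(x)\,\delta(\langle x,\theta\rangle - s)\,dx .
\]
% If g(x) = f(Ax + b) with A nonsingular, substituting u = Ax + b gives
\[
  (\mathcal{R}g)(\theta, s)
  = \frac{1}{|\det A|\,\rho}\,
    (\mathcal{R}f)\!\left(\theta',\,
      \frac{s + \langle A^{-\mathsf{T}}\theta,\, b\rangle}{\rho}\right),
  \qquad
  \rho = \lVert A^{-\mathsf{T}}\theta\rVert,\quad
  \theta' = \frac{A^{-\mathsf{T}}\theta}{\rho}.
\]
% Each projection of g is thus a rescaled, shifted projection of f taken at
% the rotated direction \theta', so comparing projections constrains A and b.
```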


2021
Vol 2021
pp. 1-17
Author(s):  
Wenpeng Ma
Yiwen Hu
Wu Yuan
Xiazhen Liu

Solving triangular systems is a key building block of the preconditioned GMRES algorithm. Inexact preconditioning is attractive on accelerators because it exposes a high degree of parallelism. In this paper, we propose and implement an iterative, inexact block triangular solve on multiple GPUs based on PETSc's framework. In addition, by developing a distributed block sparse matrix-vector multiplication procedure and optimizing the vector operations, we form a multi-GPU-enabled preconditioned GMRES with a block Jacobi preconditioner. The implementation employs the GPU-Direct technique to avoid host-device memory copies. For performance comparison, we also investigate preconditioning steps based on PETSc's data structures and on the cuSPARSE library. The experiments show that the developed GMRES with inexact preconditioning on 8 GPUs achieves up to a 4.4x speedup over a CPU-only implementation with exact preconditioning using 8 MPI processes.
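
The structure of such a block Jacobi preconditioned GMRES can be illustrated on the CPU with SciPy (rather than PETSc), using exact block solves via splu where the paper uses inexact iterative triangular solves; the matrix, sizes, and block count below are arbitrary assumptions for the sketch.

```python
# CPU sketch of block Jacobi preconditioned GMRES: the diagonal blocks of A
# are factorized once, and each preconditioner application solves the blocks
# independently -- the part a multi-GPU code maps onto one GPU per block.
# Exact splu solves stand in here for the paper's inexact triangular solves.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, gmres, splu

n, nblocks = 200, 4
A = sp.diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

size = n // nblocks
blocks = [splu(A[i*size:(i+1)*size, i*size:(i+1)*size].tocsc())
          for i in range(nblocks)]

def apply_prec(r):
    z = np.empty_like(r)
    for i, lu in enumerate(blocks):        # independent block solves
        z[i*size:(i+1)*size] = lu.solve(r[i*size:(i+1)*size])
    return z

M = LinearOperator((n, n), matvec=apply_prec)
x, info = gmres(A, b, M=M)
print(info, np.linalg.norm(A @ x - b))     # 0 on convergence, small residual
```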


2010
Vol 8
pp. 289-294
Author(s):  
C.-C. Sun
J. Götze
H.-Y. Jheng
S.-J. Ruan

Abstract. In this paper, we present an idea for performing matrix-vector multiplication using a Network-on-Chip (NoC) architecture. In traditional IC design, on-chip communication is built from dedicated point-to-point interconnections, so regular, local data transfer is the guiding concept of many parallel implementations. However, in the parallel implementation of sparse matrix-vector multiplication (SMVM), which is the main step of all iterative algorithms for solving systems of linear equations, the required data transfers depend on the sparsity structure of the matrix and can be extremely irregular. An NoC architecture makes it possible to handle data transfers of arbitrary structure, i.e. the irregular structure of sparse matrices. So far, we have implemented the proposed SMVM-NoC architecture in 4×4 and 5×5 configurations with IEEE 754 single-precision floating point on an FPGA.
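
How a sparsity pattern induces an irregular transfer pattern can be demonstrated in a few lines of Python. The block mapping of indices onto a 4×4 mesh of processing elements below is an assumption for illustration, not the paper's hardware design.

```python
# Illustrative sketch (assumed mapping, not the paper's design): place matrix
# rows and vector entries block-wise onto a 4x4 mesh of processing elements,
# then derive from the sparsity pattern which x-entries must travel between
# PEs -- the transfer pattern is as irregular as the matrix itself.
import numpy as np
import scipy.sparse as sp

mesh = 4                       # 4x4 mesh -> 16 PEs
n = 64                         # rows/vector entries, 4 per PE
A = sp.random(n, n, density=0.05,
              random_state=np.random.default_rng(1)).tocsr()

owner = lambda i: i * mesh * mesh // n     # block mapping: index -> PE id

transfers = set()
rows, cols = A.nonzero()
for r, c in zip(rows, cols):
    src, dst = owner(c), owner(r)          # x[c] travels to the PE owning row r
    if src != dst:
        transfers.add((src, dst))

print(f"{len(transfers)} distinct PE-to-PE routes out of {16*15} possible")
```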


Author(s):  
Ernesto Dufrechou ◽  
Pablo Ezzatti ◽  
Enrique S Quintana-Ortí

More than 10 years of research on efficient GPU routines for the sparse matrix-vector product (SpMV) have led to several realizations, each with its own strengths and weaknesses. In this work, we review some of the most relevant efforts on the subject, evaluate a few prominent publicly available routines using more than 3000 matrices from different applications, and apply machine learning techniques to anticipate which SpMV realization will perform best for each sparse matrix on a given parallel platform. Our numerical experiments confirm that the methods' behavior varies so strongly with the matrix structure that identifying general rules to select the optimal method for a given matrix is extremely difficult, though some useful strategies (heuristics) can be defined. Using a machine learning approach, we show that it is possible to obtain inexpensive classifiers that predict the best method for a given sparse matrix with over 80% accuracy, demonstrating that this approach can deliver important reductions in both execution time and energy consumption.
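
The selection pipeline can be sketched in Python with scikit-learn. The features, the synthetic training set, and the made-up labelling rule below are illustrative assumptions; in the paper, labels come from measured runtimes on more than 3000 real matrices.

```python
# Sketch of the selection idea (synthetic stand-in, not the paper's dataset):
# extract cheap structural features per matrix and train a small classifier
# to predict the fastest SpMV routine. The labelling rule here is made up;
# real labels would come from timing each routine on each matrix.
import numpy as np
import scipy.sparse as sp
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)

def features(A):
    per_row = np.diff(A.indptr)            # nonzeros per row (CSR)
    return [A.shape[0], A.nnz, per_row.mean(), per_row.std()]

X, y = [], []
for _ in range(300):
    n = int(rng.integers(50, 500))
    A = sp.random(n, n, density=rng.uniform(0.001, 0.1)).tocsr()
    f = features(A)
    # made-up ground truth: vector-style kernels "win" on denser rows
    y.append("csr_vector" if f[2] > 8 else "csr_scalar")
    X.append(f)

clf = DecisionTreeClassifier(max_depth=3).fit(X[:200], y[:200])
print("held-out accuracy:", clf.score(X[200:], y[200:]))
```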


2017
Vol 43 (4)
pp. 1-49
Author(s):  
Salvatore Filippone
Valeria Cardellini
Davide Barbieri
Alessandro Fanfarillo
