On the Memory Wall and Performance of Symmetric Sparse Matrix Vector Multiplications In Different Data Structures on Shared Memory Machines

Author(s):  
Tongxiang Gu ◽  
Xingping Liu ◽  
Zeyao Mo ◽  
Xiaowen Xu ◽  
Shengxin Zhu
1993 ◽  
Vol 04 (01) ◽  
pp. 65-83 ◽  
Author(s):  
SERGE PETITON ◽  
YOUCEF SAAD ◽  
KESHENG WU ◽  
WILLIAM FERNG

This paper presents a preliminary experimental study of the performance of basic sparse matrix computations on the CM-5. We concentrate on examining various ways of performing general sparse matrix-vector operations and the basic primitives on which they are based. We compare various data structures for storing sparse matrices and their corresponding matrix-vector operations. Both SPMD and data-parallel modes are examined and compared.
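
The abstract does not reproduce the formats themselves; as a rough illustration of what such a comparison involves, the sketch below shows two storage schemes commonly examined in studies of this kind, compressed sparse row (CSR) and ELLPACK, together with their serial matrix-vector products. The type and function names, and the fixed-width ELLPACK padding, are assumptions for illustration, not taken from the paper.

/* Hypothetical sketch: serial SpMV in two common sparse formats (CSR, ELLPACK).
   Names and layouts are illustrative, not taken from the paper. */

/* CSR: row_ptr[i] .. row_ptr[i+1] index the nonzeros of row i. */
typedef struct {
    int     n;        /* number of rows */
    int    *row_ptr;  /* length n+1     */
    int    *col_idx;  /* length nnz     */
    double *val;      /* length nnz     */
} csr_t;

void spmv_csr(const csr_t *A, const double *x, double *y) {
    for (int i = 0; i < A->n; ++i) {
        double s = 0.0;
        for (int k = A->row_ptr[i]; k < A->row_ptr[i + 1]; ++k)
            s += A->val[k] * x[A->col_idx[k]];
        y[i] = s;
    }
}

/* ELLPACK: every row padded to a fixed width; column-major storage gives
   unit-stride access on vector/SIMD hardware. Padded entries carry value 0
   and point at column 0, so they are harmless. */
typedef struct {
    int     n, width; /* rows, padded nonzeros per row */
    int    *col_idx;  /* n * width entries, column-major */
    double *val;      /* n * width entries, column-major */
} ell_t;

void spmv_ell(const ell_t *A, const double *x, double *y) {
    for (int i = 0; i < A->n; ++i) y[i] = 0.0;
    for (int j = 0; j < A->width; ++j)
        for (int i = 0; i < A->n; ++i)
            y[i] += A->val[j * A->n + i] * x[A->col_idx[j * A->n + i]];
}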


Author(s):  
Hartwig Anzt ◽  
Moritz Kreutzer ◽  
Eduardo Ponce ◽  
Gregory D Peterson ◽  
Gerhard Wellein ◽  
...  

In this paper, we present an optimized GPU implementation of the induced dimension reduction (IDR) algorithm. We improve data locality, combine it with an efficient sparse matrix-vector kernel, and investigate the potential of overlapping computation with communication as well as the possibility of concurrent kernel execution. A comprehensive performance evaluation is conducted using a suitable performance model. The analysis reveals an efficiency of up to 90%, which indicates that the implementation achieves performance close to the theoretically attainable bound.
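
The paper's kernels are not shown in the abstract; the CUDA sketch below only illustrates the general mechanism under investigation, namely issuing independent kernels and transfers into separate streams so they may overlap or run concurrently. The axpy placeholder kernel and all names are hypothetical, not the IDR(s) kernels from the paper.

// Hypothetical sketch of overlap via CUDA streams; the kernel is a
// placeholder, not one of the IDR(s) kernels from the paper.
#include <cuda_runtime.h>

__global__ void axpy(int n, double a, const double *x, double *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] += a * x[i];
}

void overlapped_step(int n, double *d_x, double *d_y, double *d_z,
                     double *h_buf /* pinned host buffer */) {
    cudaStream_t s0, s1;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);

    int threads = 256, blocks = (n + threads - 1) / threads;

    // Independent work items go into different streams so the runtime may
    // execute the kernels concurrently and overlap the copy with stream s1.
    axpy<<<blocks, threads, 0, s0>>>(n, 2.0, d_x, d_y);
    axpy<<<blocks, threads, 0, s1>>>(n, 3.0, d_x, d_z);
    cudaMemcpyAsync(h_buf, d_y, n * sizeof(double),
                    cudaMemcpyDeviceToHost, s0);

    cudaStreamSynchronize(s0);
    cudaStreamSynchronize(s1);
    cudaStreamDestroy(s0);
    cudaStreamDestroy(s1);
}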


2014 ◽  
Vol 11 (supp01) ◽  
pp. 1344007 ◽  
Author(s):  
ABUL MUKID MOHAMMAD MUKADDES ◽  
MASAO OGINO ◽  
RYUJI SHIOYA

The use of proper data structures with corresponding algorithms is critical to achieving good performance in scientific computing. The need for sparse matrix-vector multiplication in each iteration of the iterative domain decomposition method has led to the implementation of a variety of sparse matrix storage formats. Many storage formats have been proposed to represent sparse matrices and have been integrated into the method. In this paper, the storage efficiency of these sparse matrix storage formats is evaluated and compared. The performance of sparse matrix-vector multiplication as used in the domain decomposition method is also considered. Based on our experiments on the FX10 supercomputer system, some useful conclusions are extracted that can serve as guidelines for the optimization of the domain decomposition method.
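
The abstract does not list the formats or their measured footprints; as a back-of-envelope illustration of what a storage-efficiency comparison measures, the sketch below estimates the memory required by three standard formats (COO, CSR, ELLPACK), assuming 4-byte indices and 8-byte values. The formulas are the standard ones; the example sizes are invented, not the paper's data.

// Hypothetical sketch: memory footprint of three common sparse formats for a
// matrix with n rows, nnz nonzeros, and max_row_nnz nonzeros in its fullest
// row. Assumes 4-byte indices and 8-byte values.
#include <cstdio>
#include <cstddef>

size_t bytes_coo(size_t nnz)                   { return nnz * (4 + 4 + 8); }         // row idx + col idx + value
size_t bytes_csr(size_t n, size_t nnz)         { return (n + 1) * 4 + nnz * (4 + 8); } // row pointers + (col idx, value)
size_t bytes_ell(size_t n, size_t max_row_nnz) { return n * max_row_nnz * (4 + 8); }   // padded (col idx, value)

int main() {
    size_t n = 100000, nnz = 700000, max_row_nnz = 27;  // invented example sizes
    std::printf("COO %zu  CSR %zu  ELL %zu bytes\n",
                bytes_coo(nnz), bytes_csr(n, nnz), bytes_ell(n, max_row_nnz));
    return 0;
}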


1999 ◽  
Vol 7 (3-4) ◽  
pp. 313-326 ◽  
Author(s):  
Jan F. Prins ◽  
Siddhartha Chatterjee ◽  
Martin Simons

Modern dialects of Fortran enjoy wide use and good support on high‐performance computers as performance‐oriented programming languages. By providing the ability to express nested data parallelism, modern Fortran dialects enable irregular computations to be incorporated into existing applications with minimal rewriting and without sacrificing performance within the regular portions of the application. Since performance of nested data‐parallel computation is unpredictable and often poor using current compilers, we investigate threading and flattening, two source‐to‐source transformation techniques that can improve performance and performance stability. For experimental validation of these techniques, we explore nested data‐parallel implementations of the sparse matrix‐vector product and the Barnes–Hut n‐body algorithm by hand‐coding thread‐based (using OpenMP directives) and flattening‐based versions of these algorithms and evaluating their performance on an SGI Origin 2000 and an NEC SX‐4, two shared‐memory machines.
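
The hand-coded versions in the paper are written in Fortran with OpenMP directives; the C++ sketch below is only a rendering of the thread-based ("threading") approach for the sparse matrix-vector product, with the outer row loop parallelized by a directive. The names and the scheduling clause are illustrative choices, not the authors'.

// Hypothetical C++ rendering of the "threading" approach: the outer (row)
// loop of the nested data-parallel SpMV is parallelized with an OpenMP
// directive. Compile with OpenMP enabled (e.g. -fopenmp).
#include <vector>

void spmv_threaded(int n,
                   const std::vector<int>    &row_ptr,   // n+1 entries
                   const std::vector<int>    &col_idx,
                   const std::vector<double> &val,
                   const std::vector<double> &x,
                   std::vector<double>       &y) {
    // Irregular inner-loop lengths make static scheduling load-imbalanced,
    // which is one source of the performance instability studied here.
    #pragma omp parallel for schedule(dynamic, 64)
    for (int i = 0; i < n; ++i) {
        double s = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            s += val[k] * x[col_idx[k]];
        y[i] = s;
    }
}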


2016 ◽  
Vol 2016 ◽  
pp. 1-14 ◽  
Author(s):  
Jiaquan Gao ◽  
Panpan Qi ◽  
Guixia He

Sparse matrix-vector multiplication (SpMV) is an important operation in computational science and needs to be accelerated because it often represents the dominant cost in many widely used iterative methods and eigenvalue problems. We achieve this objective by proposing a novel SpMV algorithm based on the compressed sparse row (CSR) format on the GPU. Our method dynamically assigns different numbers of rows to each thread block and executes different optimized implementations depending on the number of rows assigned to each block. Accesses to the CSR arrays are fully coalesced, and the GPU's DRAM bandwidth is efficiently utilized by loading data into shared memory, which alleviates the bottleneck of many existing CSR-based algorithms (i.e., CSR-scalar and CSR-vector). Test results on C2050 and K20c GPUs show that our method outperforms a perfect-CSR algorithm that inspired our work, the vendor-tuned CUSPARSE V6.5 and CUSP V0.5.1 libraries, and three popular algorithms: clSpMV, CSR5, and CSR-Adaptive.
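
The proposed kernel itself is not given in the abstract; for context, the CUDA sketches below show minimal versions of the two baselines it names, CSR-scalar (one thread per row) and CSR-vector (one warp per row), which make the coalescing bottleneck visible. These are textbook formulations, not the authors' code; the CSR-vector kernel assumes blockDim.x is a multiple of 32.

// Hypothetical sketches of the two CSR baselines named in the abstract,
// not the authors' proposed kernel.

// CSR-scalar: one thread per row. Neighbouring threads read distant parts of
// col_idx/val, so accesses to the CSR arrays are poorly coalesced.
__global__ void spmv_csr_scalar(int n, const int *row_ptr, const int *col_idx,
                                const double *val, const double *x, double *y) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n) {
        double s = 0.0;
        for (int k = row_ptr[row]; k < row_ptr[row + 1]; ++k)
            s += val[k] * x[col_idx[k]];
        y[row] = s;
    }
}

// CSR-vector: one 32-thread warp per row with a warp-level reduction.
// Accesses within a row are coalesced, but short rows leave lanes idle.
__global__ void spmv_csr_vector(int n, const int *row_ptr, const int *col_idx,
                                const double *val, const double *x, double *y) {
    int warp = (blockIdx.x * blockDim.x + threadIdx.x) / 32;
    int lane = threadIdx.x & 31;
    if (warp < n) {
        double s = 0.0;
        for (int k = row_ptr[warp] + lane; k < row_ptr[warp + 1]; k += 32)
            s += val[k] * x[col_idx[k]];
        for (int off = 16; off > 0; off >>= 1)   // warp-level sum reduction
            s += __shfl_down_sync(0xffffffff, s, off);
        if (lane == 0) y[warp] = s;
    }
}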

