Compressed Sparse Row
Recently Published Documents

TOTAL DOCUMENTS: 19 (FIVE YEARS: 2)
H-INDEX: 3 (FIVE YEARS: 0)

Electronics ◽  
2020 ◽  
Vol 9 (10) ◽  
pp. 1675
Author(s):  
Sarah AlAhmadi ◽  
Thaha Mohammed ◽  
Aiiad Albeshri ◽  
Iyad Katib ◽  
Rashid Mehmood

Graphics processing units (GPUs) have delivered remarkable performance for a variety of high-performance computing (HPC) applications through massive parallelism. One such application is sparse matrix–vector multiplication (SpMV), which is central to many scientific, engineering, and other applications, including machine learning. No single SpMV storage or computation scheme provides consistently high performance for all matrices, owing to their varying sparsity patterns. An extensive literature review reveals that the performance of SpMV techniques on GPUs has not been studied in sufficient detail. In this paper, we provide a detailed analysis of SpMV performance on GPUs using four notable sparse matrix storage schemes (compressed sparse row (CSR), ELLPACK (ELL), hybrid ELL/COO (HYB), and compressed sparse row 5 (CSR5)), five performance metrics (execution time, giga floating point operations per second (GFLOPS), achieved occupancy, instructions per warp, and warp execution efficiency), five matrix sparsity features (nnz, anpr, nprvariance, maxnpr, and distavg), and 17 sparse matrices from 10 application domains (chemical simulations, computational fluid dynamics (CFD), electromagnetics, linear programming, economics, etc.). Based on the deeper insights gained through this detailed performance analysis, we propose a technique called the heterogeneous CPU–GPU Hybrid (HCGHYB) scheme. It utilizes both the CPU and GPU in parallel and outperforms the HYB format by an average speedup of 1.7x. Heterogeneous computing is an important direction for SpMV and other application areas. Moreover, to the best of our knowledge, this is the first work to discuss SpMV performance on GPUs in such depth. We believe that this performance analysis and the heterogeneous scheme will open up many new directions and improvements for the SpMV computing field.
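To make the CSR scheme discussed above concrete, here is a minimal, illustrative sketch (not the paper's code) of the format's three arrays and the basic SpMV loop that all of the compared GPU kernels ultimately implement:

```python
# CSR stores a sparse matrix in three arrays: the nonzero values, their
# column indices, and per-row pointers into those arrays.

def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Nonzeros of row i occupy values[row_ptr[i]:row_ptr[i+1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y

# A = [[4, 0, 1],
#      [0, 3, 0],
#      [2, 0, 5]]
values  = [4.0, 1.0, 3.0, 2.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
print(csr_spmv(values, col_idx, row_ptr, [1.0, 1.0, 1.0]))  # [5.0, 3.0, 7.0]
```

The varying length of each row's slice is exactly the sparsity-pattern irregularity that makes no single scheme win on every matrix.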


Electronics ◽  
2018 ◽  
Vol 7 (11) ◽  
pp. 307 ◽  
Author(s):  
Cheng Qian ◽  
Bruce Childers ◽  
Libo Huang ◽  
Hui Guo ◽  
Zhiying Wang

Graph traversal is widely used in map routing, social network analysis, causal discovery, and many other applications. Because it is a memory-bound process, graph traversal puts significant pressure on the memory subsystem. Due to poor spatial locality and the increasing size of today's datasets, graph traversal consumes an ever-larger share of application execution time. One way to mitigate this cost is memory prefetching, which issues requests from the processor to memory in anticipation of needing certain data. However, traditional prefetching does not work well for graph traversal because of data dependencies, the parallel nature of graphs, and the need to move vast amounts of data from memory to the caches. In this paper, we propose CGAcc, a compressed-sparse-row-based graph accelerator on the Hybrid Memory Cube (HMC). CGAcc combines the Compressed Sparse Row (CSR) graph representation with in-memory prefetching and processing to improve the performance of graph traversal. Our approach integrates prefetching and processing into the logic layer of a 3D-stacked Dynamic Random-Access Memory (DRAM) architecture based on Micron's HMC. We selected the HMC to implement CGAcc because it provides high bandwidth and low access latency, and because its multiple DRAM layers are connected to internal logic that controls memory access and performs rudimentary computation. Using the CSR representation, CGAcc deploys prefetchers in the HMC to exploit the short transaction latency between the logic and DRAM layers, which also avoids large data-movement costs. At runtime, CGAcc pipelines prefetching to fetch data from the DRAM arrays, improving memory-level parallelism. To further reduce access latency, several optimized internal caches are introduced to hold the prefetched data to be Processed In-Memory (PIM). A comprehensive evaluation shows the effectiveness of CGAcc: compared to a conventional HMC main memory equipped with a stream prefetcher, CGAcc achieved an average 3.51× speedup with moderate hardware cost.
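The access pattern that defeats conventional prefetchers can be seen in a minimal software sketch of CSR-based breadth-first traversal (illustrative only; CGAcc itself is a hardware pipeline inside the HMC):

```python
from collections import deque

def bfs_csr(row_ptr, col_idx, src):
    """Breadth-first traversal of a graph stored as a CSR adjacency
    structure; returns each vertex's distance from src (-1 if unreachable)."""
    n = len(row_ptr) - 1
    dist = [-1] * n
    dist[src] = 0
    q = deque([src])
    while q:
        u = q.popleft()
        # Neighbors of u are col_idx[row_ptr[u]:row_ptr[u+1]].  This
        # dependent, pointer-chasing access (row_ptr -> col_idx -> next
        # frontier) is what a stride prefetcher cannot anticipate.
        for k in range(row_ptr[u], row_ptr[u + 1]):
            v = col_idx[k]
            if dist[v] == -1:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

# Undirected graph: edges 0-1, 0-2, 1-2, 2-3
row_ptr = [0, 2, 4, 7, 8]
col_idx = [1, 2, 0, 2, 0, 1, 3, 2]
print(bfs_csr(row_ptr, col_idx, 0))  # [0, 1, 1, 2]
```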


2018 ◽  
Vol 10 (2) ◽  
pp. 64-79
Author(s):  
Rafael Machado De Salles ◽  
Leonardo Figueira Werneck ◽  
Grazione De Souza ◽  
Helio Pedro Amaral Souto

This work focuses primarily on the numerical simulation of single-phase oil flow in anticline-type petroleum reservoirs. To this end, a specific technique for representing inactive cells was developed. In addition, to improve computational efficiency, the OpenMP programming interface was used together with the Compressed Sparse Row technique to parallelize the Conjugate Gradient method, which is employed to solve the algebraic system of equations arising from the discretization of the Hydraulic Diffusivity Equation (EDH) that governs the flow. Sensitivity, convergence, and performance tests were carried out for different anticline-type reservoirs.
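As an illustrative sketch (not the article's code), the Conjugate Gradient method with the matrix kept in CSR form looks as follows; the row loop inside the SpMV is the part that naturally maps to an OpenMP parallel-for, since each row is independent:

```python
def csr_spmv(values, col_idx, row_ptr, x):
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):  # independent rows: the natural parallelization target
        y[i] = sum(values[k] * x[col_idx[k]]
                   for k in range(row_ptr[i], row_ptr[i + 1]))
    return y

def conjugate_gradient(values, col_idx, row_ptr, b, tol=1e-12, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite A stored in CSR."""
    n = len(b)
    x = [0.0] * n
    r = list(b)              # residual b - A x  (x starts at zero)
    p = list(r)              # search direction
    rs_old = sum(v * v for v in r)
    for _ in range(max_iter):
        Ap = csr_spmv(values, col_idx, row_ptr, p)
        alpha = rs_old / sum(pi * ai for pi, ai in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        rs_new = sum(v * v for v in r)
        if rs_new < tol:
            break
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x

# SPD example: A = [[4, 1], [1, 3]], b = [1, 2]
x = conjugate_gradient([4.0, 1.0, 1.0, 3.0], [0, 1, 0, 1], [0, 2, 4],
                       [1.0, 2.0])
```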


Nova Scientia ◽  
2018 ◽  
Vol 10 (20) ◽  
pp. 263-279
Author(s):  
Gerardo Mario Ortigoza Capetillo ◽  
Alberto Pedro Lorandi Medina ◽  
Alfonso Cuauhtemoc García Reynoso

Reverse Cuthill-McKee (RCM) reordering can be applied to either the edges or the elements of unstructured (triangular/tetrahedral) meshes, according to the respective finite element formulation, to reduce the bandwidth of stiffness matrices. Grid generators are mainly designed for nodal-based finite elements: their output is a list of nodes (2D or 3D) and an array describing element connectivity, be it triangles or tetrahedra. For edge-defined finite element formulations, however, a numbering of the edges is required. Observations are reported for the Triangle/Tetgen Delaunay grid generators and for the sparse structure of the assembled matrices in both edge- and element-defined formulations. RCM is a renumbering algorithm traditionally applied to the nodal graph of the mesh; thus, in order to apply this renumbering to either the edges or the elements of the respective finite element formulation, graphs of the mesh were generated. Significant bandwidth reduction was obtained, which translates into a reduction in the execution effort of the sparse matrix-times-vector product. The Compressed Sparse Row format was adopted, and the matrix-times-vector product was implemented in an OpenMP parallel routine.
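The classical RCM algorithm applied here can be sketched in a few lines (a minimal illustration on an adjacency list, not the paper's implementation): breadth-first search from a minimum-degree vertex, visiting neighbors in order of increasing degree, then reversing the visit order.

```python
from collections import deque

def reverse_cuthill_mckee(adj):
    """RCM ordering of an undirected graph given as an adjacency list."""
    n = len(adj)
    degree = [len(adj[v]) for v in range(n)]
    visited = [False] * n
    order = []
    # Start each component from a vertex of minimum degree.
    for start in sorted(range(n), key=lambda v: degree[v]):
        if visited[start]:
            continue
        visited[start] = True
        q = deque([start])
        while q:
            u = q.popleft()
            order.append(u)
            # Visit neighbors in order of increasing degree.
            for v in sorted(adj[u], key=lambda w: degree[w]):
                if not visited[v]:
                    visited[v] = True
                    q.append(v)
    return order[::-1]  # the "reverse" in Reverse Cuthill-McKee

# Small mesh-like graph; the permutation clusters each vertex's neighbors
# near it, shrinking the bandwidth of the assembled stiffness matrix.
adj = [[3], [2, 4], [1, 4], [0, 4], [1, 2, 3]]
perm = reverse_cuthill_mckee(adj)
print(perm)
```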


2018 ◽  
Vol 10 (1) ◽  
pp. 54-70
Author(s):  
Saira Banu Jamal Mohammed ◽  
M. Rajasekhara Babu ◽  
Sumithra Sriram

With the growth of data-parallel computing, the role of GPU computing in non-graphics applications such as image processing has become a focus of research. Convolution is an integral operation in filtering, smoothing, and edge detection. In this article, the process of convolution is realized as a sparse linear system and solved using Sparse Matrix Vector Multiplication (SpMV). The Compressed Sparse Row (CSR) format of SpMV shows better CPU performance than normal convolution. To overcome the stalling of threads on short rows in the GPU implementation of CSR SpMV, a more efficient model is proposed that uses the Adaptive-Compressed Row Storage (A-CSR) format. Using CSR in the convolution process achieves speedups of 1.45x and 1.159x over normal convolution for image smoothing and edge detection operations, respectively. On the GPU, using the adaptive CSR format, an average speedup of 2.05x is achieved for image smoothing and 1.58x for edge detection.
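The idea of casting convolution as a sparse linear system can be sketched in one dimension (an illustrative toy, not the article's code): each valid filter position becomes one row of a banded matrix, and convolving is then a single CSR SpMV.

```python
def conv_as_csr(kernel, n):
    """Build CSR arrays of the (valid-mode) convolution matrix for a
    1-D signal of length n, so that convolution becomes an SpMV."""
    k = len(kernel)
    values, col_idx, row_ptr = [], [], [0]
    for i in range(n - k + 1):          # one output row per valid position
        for j, w in enumerate(kernel):  # the row holds the kernel taps
            values.append(w)
            col_idx.append(i + j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def csr_spmv(values, col_idx, row_ptr, x):
    n_rows = len(row_ptr) - 1
    return [sum(values[k] * x[col_idx[k]]
                for k in range(row_ptr[i], row_ptr[i + 1]))
            for i in range(n_rows)]

# 3-tap binomial smoothing kernel applied to a 5-sample signal
vals, cols, ptr = conv_as_csr([0.25, 0.5, 0.25], 5)
signal = [4.0, 8.0, 4.0, 8.0, 4.0]
print(csr_spmv(vals, cols, ptr, signal))  # [6.0, 6.0, 6.0]
```

Note that every row of this matrix has exactly `len(kernel)` nonzeros; the short, uniform rows are precisely the case where plain CSR-vector GPU kernels stall, motivating the adaptive A-CSR layout.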


Engevista ◽  
2017 ◽  
Vol 19 (4) ◽  
pp. 1095
Author(s):  
Gylles Ricardo Ströher ◽  
Thays Rolim Mendes ◽  
Neyva Maria Lopes Romeiro

Matrix compression schemes make it possible to store sparse matrices in vectors so that only the nonzero elements are kept, providing a significant reduction in the computer memory required for storing sparse matrices. Among the existing schemes, the one implemented in this work was Compressed Sparse Row (CSR), which stores only the nonzero elements of the matrix in three vectors. The CSR scheme was implemented in combination with three iterative methods for solving linear systems: Jacobi, Gauss-Seidel, and Conjugate Gradient. The results indicate the minimum order and degree of sparsity at which the CSR scheme becomes advantageous in terms of reduced memory consumption, and they also show that, because operations on zero elements are suppressed, the processing time for solving sparse linear systems can be significantly reduced with the compression scheme explored.
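The point about suppressed zero operations is visible in a minimal Jacobi sweep over a CSR matrix (an illustrative sketch, not this work's code): each sweep costs one pass over the nonzeros only, rather than O(n²) dense operations.

```python
def jacobi_csr(values, col_idx, row_ptr, b, iters=50):
    """Jacobi iteration for A x = b with A in CSR form; zero entries are
    simply absent, so each sweep touches only the stored nonzeros."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        x_new = [0.0] * n
        for i in range(n):
            sigma, diag = 0.0, 0.0
            for k in range(row_ptr[i], row_ptr[i + 1]):
                j = col_idx[k]
                if j == i:
                    diag = values[k]          # diagonal entry a_ii
                else:
                    sigma += values[k] * x[j]  # off-diagonal contributions
            x_new[i] = (b[i] - sigma) / diag
        x = x_new
    return x

# Diagonally dominant system A = [[4, 1], [1, 3]], b = [9, 5] -> x = [2, 1]
x = jacobi_csr([4.0, 1.0, 1.0, 3.0], [0, 1, 0, 1], [0, 2, 4], [9.0, 5.0])
print([round(v, 6) for v in x])  # [2.0, 1.0]
```

Gauss-Seidel differs only in updating `x` in place within the sweep, which reuses the same CSR traversal.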


2016 ◽  
Vol 2016 ◽  
pp. 1-12 ◽  
Author(s):  
Guixia He ◽  
Jiaquan Gao

Sparse matrix-vector multiplication (SpMV) is an important operation in scientific computations. Compressed sparse row (CSR) is the most frequently used format for storing sparse matrices. However, CSR-based SpMV kernels on graphics processing units (GPUs), such as CSR-scalar and CSR-vector, usually perform poorly due to irregular memory access patterns. This motivates us to propose a perfect CSR-based SpMV on the GPU, called PCSR. PCSR involves two kernels and accesses the CSR arrays in a fully coalesced manner by introducing a middle array, which greatly alleviates the deficiencies of CSR-scalar (rare coalescing) and CSR-vector (partial coalescing). Test results on a single C2050 GPU show that PCSR outperforms CSR-scalar, CSR-vector, and the CSRMV and HYBMV routines in the vendor-tuned CUSPARSE library, and is comparable with a recently proposed CSR-based algorithm, CSR-Adaptive. Furthermore, we extend PCSR from a single GPU to multiple GPUs. Experimental results on four C2050 GPUs show that, whether or not inter-GPU communication is taken into account, PCSR on multiple GPUs achieves good performance and high parallel efficiency.


2016 ◽  
Vol 2016 ◽  
pp. 1-14 ◽  
Author(s):  
Jiaquan Gao ◽  
Panpan Qi ◽  
Guixia He

Sparse matrix-vector multiplication (SpMV) is an important operation in computational science and needs to be accelerated because it often represents the dominant cost in many widely used iterative methods and eigenvalue problems. We achieve this objective by proposing a novel SpMV algorithm based on the compressed sparse row (CSR) format on the GPU. Our method dynamically assigns different numbers of rows to each thread block and executes different optimized implementations depending on the number of rows each block involves. Accesses to the CSR arrays are fully coalesced, and the GPU's DRAM bandwidth is efficiently utilized by loading data into shared memory, which alleviates the bottleneck of many existing CSR-based algorithms (i.e., CSR-scalar and CSR-vector). Test results on C2050 and K20c GPUs show that our method outperforms the perfect-CSR algorithm that inspired this work, the vendor-tuned CUSPARSE V6.5 and CUSP V0.5.1 libraries, and three popular algorithms: clSpMV, CSR5, and CSR-Adaptive.

