Evaluation of the Compressed Sparse Row storage format for the solution of sparse systems of linear equations

Engevista ◽  
2017 ◽  
Vol 19 (4) ◽  
pp. 1095
Author(s):  
Gylles Ricardo Ströher ◽  
Thays Rolim Mendes ◽  
Neyva Maria Lopes Romeiro

Matrix compression schemes make it possible to store sparse matrices in vectors so that only the nonzero elements are kept, providing a significant reduction in the computational memory required to store sparse matrices. Among the existing schemes, the one implemented in this work was Compressed Sparse Row (CSR), which stores only the nonzero elements of the matrix in three vectors. The CSR scheme was implemented in combination with three iterative methods for solving linear systems: Jacobi, Gauss-Seidel, and Conjugate Gradient. The results indicate the minimum matrix order and degree of sparsity at which the CSR scheme becomes advantageous in terms of reduced memory consumption, and they also show that, because operations with the zero elements are suppressed, the processing time for solving sparse linear systems can be significantly reduced with the compression scheme explored.
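The three-vector CSR layout the abstract describes can be sketched as follows; this is an illustrative conversion routine, not the authors' implementation, and the names (`values`, `col_idx`, `row_ptr`) are common conventions rather than anything taken from the paper.

```python
def to_csr(dense):
    """Convert a dense matrix (list of lists) to CSR's three vectors:
    values (nonzero entries), col_idx (their columns), row_ptr (row offsets)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, a in enumerate(row):
            if a != 0:
                values.append(a)
                col_idx.append(j)
        row_ptr.append(len(values))  # end of this row's entries
    return values, col_idx, row_ptr

A = [[4, 0, 0],
     [0, 5, 1],
     [2, 0, 6]]
vals, cols, ptr = to_csr(A)
# vals = [4, 5, 1, 2, 6], cols = [0, 1, 2, 0, 2], ptr = [0, 1, 3, 5]
```

Only the five nonzeros are stored, which is the memory saving the abstract quantifies against matrix order and sparsity.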

2014 ◽  
Vol 40 (5-6) ◽  
pp. 47-58 ◽  
Author(s):  
Urban Borštnik ◽  
Joost VandeVondele ◽  
Valéry Weber ◽  
Jürg Hutter

2016 ◽  
Vol 2016 ◽  
pp. 1-12 ◽  
Author(s):  
Guixia He ◽  
Jiaquan Gao

Sparse matrix-vector multiplication (SpMV) is an important operation in scientific computing. Compressed sparse row (CSR) is the most frequently used format for storing sparse matrices. However, CSR-based SpMV kernels on graphics processing units (GPUs), for example, CSR-scalar and CSR-vector, usually perform poorly due to irregular memory access patterns. This motivates us to propose a perfect CSR-based SpMV on the GPU, called PCSR. PCSR involves two kernels and accesses the CSR arrays in a fully coalesced manner by introducing a middle array, which greatly alleviates the deficiencies of CSR-scalar (rare coalescing) and CSR-vector (partial coalescing). Test results on a single C2050 GPU show that PCSR outperforms CSR-scalar, CSR-vector, and the CSRMV and HYBMV routines in the vendor-tuned CUSPARSE library, and is comparable with a recently proposed CSR-based algorithm, CSR-Adaptive. Furthermore, we extend PCSR from a single GPU to multiple GPUs. Experimental results on four C2050 GPUs show that, whether or not inter-GPU communication is taken into account, PCSR on multiple GPUs achieves good performance and high parallel efficiency.
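The SpMV kernel that CSR-scalar, CSR-vector, and PCSR all compute can be sketched sequentially as below; the loop over rows is what CSR-scalar maps one GPU thread to (one row per thread), which is the source of the uncoalesced accesses the abstract discusses. This is a minimal CPU sketch, not any of the GPU kernels themselves.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """y = A @ x for a matrix stored in CSR; rows are independent,
    which is why each row can be assigned to its own GPU thread."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):  # nonzeros of row i
            s += values[k] * x[col_idx[k]]
        y[i] = s
    return y

# Matrix [[4,0,0],[0,5,1],[2,0,6]] in CSR form:
y = csr_spmv([4, 5, 1, 2, 6], [0, 1, 2, 0, 2], [0, 1, 3, 5], [1.0, 1.0, 1.0])
# y = [4.0, 6.0, 8.0]
```

Because consecutive threads read `values[row_ptr[i]]` at widely separated offsets, adjacent memory requests do not coalesce, motivating PCSR's middle-array reorganization.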


2018 ◽  
Vol 10 (2) ◽  
pp. 64-79
Author(s):  
Rafael Machado De Salles ◽  
Leonardo Figueira Werneck ◽  
Grazione De Souza ◽  
Helio Pedro Amaral Souto

This work focuses mainly on the numerical simulation of single-phase oil flow in anticline-type petroleum reservoirs. To that end, a specific technique for representing inactive cells was developed. In addition, to improve computational efficiency, the OpenMP programming interface was used together with the Compressed Sparse Row technique to parallelize the Conjugate Gradient method, which is employed to solve the algebraic system of equations arising from the discretization of the Hydraulic Diffusivity Equation (HDE) governing the flow. Sensitivity, convergence, and performance tests were carried out for different anticline-type reservoirs.
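The Conjugate Gradient iteration that the authors parallelize is dominated by one SpMV per step, which is where the CSR layout and OpenMP threading pay off. The sketch below is a textbook serial CG, not the authors' parallel code; `spmv` stands for any CSR matrix-vector product, and all names are illustrative.

```python
def conjugate_gradient(spmv, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A, given a
    matrix-vector product spmv(p) = A @ p (e.g., a CSR kernel)."""
    x = [0.0] * len(b)
    r = b[:]                      # residual r = b - A x, with x = 0
    p = r[:]                      # initial search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        Ap = spmv(p)              # the dominant cost: one SpMV per iteration
        alpha = rs_old / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new ** 0.5 < tol:
            break
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

In an OpenMP setting, the SpMV loop over rows and the vector updates are the natural parallel regions, since each row's dot product is independent.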


2013 ◽  
Vol 191 (1) ◽  
pp. 19-27 ◽  
Author(s):  
R. R. Akhunov ◽  
S. P. Kuksenko ◽  
V. K. Salov ◽  
T. R. Gazizov

2012 ◽  
Vol 166-169 ◽  
pp. 3166-3173
Author(s):  
Guo Liang Ji ◽  
Yang De Feng ◽  
Wen Kai Cui ◽  
Liang Gang Lu

A technique to assemble the global stiffness matrix in a sparse storage format, together with two parallel solvers for the sparse linear systems arising from FEM, is presented. The assembly method uses a data structure called the associated node at intermediate stages before finally arriving at the Compressed Sparse Row (CSR) format. The associated nodes record how the nodes of the mesh are connected. The technique saves a large amount of memory because it stores only the nonzero elements of the global stiffness matrix; the method is simple and effective. The solvers are restarted GMRES iterative solvers with Jacobi and sparse approximate inverse (SPAI) preconditioning, respectively. Numerical experiments show that both preconditioners improve the convergence of the iterative method, and that SPAI is more powerful than Jacobi in the sense of reducing the number of iterations and improving parallel efficiency. Both solvers can be used to solve large sparse linear systems.
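A common way to realize the assembly the abstract describes is to collect element contributions as (row, column, value) triplets and then compress them to CSR, summing duplicates as FEM assembly requires. This is a generic sketch of that idea, not the authors' associated-node structure, and all names are illustrative.

```python
from collections import defaultdict

def triplets_to_csr(n, triplets):
    """Assemble (row, col, value) triplets into CSR, accumulating
    duplicate (row, col) entries as stiffness-matrix assembly requires."""
    rows = [defaultdict(float) for _ in range(n)]
    for i, j, v in triplets:
        rows[i][j] += v           # overlapping element contributions sum
    values, col_idx, row_ptr = [], [], [0]
    for r in rows:
        for j in sorted(r):       # sorted columns within each row
            values.append(r[j])
            col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

# Two elements both contribute to entry (0, 0):
csr = triplets_to_csr(2, [(0, 0, 1.0), (0, 0, 2.0), (0, 1, 3.0), (1, 1, 5.0)])
# csr = ([3.0, 3.0, 5.0], [0, 1, 1], [0, 2, 3])
```

Only nonzero entries survive, which is the memory saving claimed for the assembly technique.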


Electronics ◽  
2018 ◽  
Vol 7 (11) ◽  
pp. 307 ◽  
Author(s):  
Cheng Qian ◽  
Bruce Childers ◽  
Libo Huang ◽  
Hui Guo ◽  
Zhiying Wang

Graph traversal is widely used in map routing, social network analysis, causal discovery, and many other applications. Because it is a memory-bound process, graph traversal puts significant pressure on the memory subsystem. Due to poor spatial locality and the increasing size of today's datasets, graph traversal consumes an ever-larger share of application execution time. One way to mitigate this cost is memory prefetching, which issues requests from the processor to the memory in anticipation of needing certain data. However, traditional prefetching does not work well for graph traversal due to data dependencies, the parallel nature of graphs, and the need to move vast amounts of data from memory to the caches. In this paper, we propose a compressed sparse row representation-based graph accelerator on the Hybrid Memory Cube (HMC), called CGAcc. CGAcc combines the Compressed Sparse Row (CSR) graph representation with in-memory prefetching and processing to improve the performance of graph traversal. Our approach integrates the prefetching and processing in the logic layer of a 3D-stacked Dynamic Random-Access Memory (DRAM) architecture based on Micron's HMC. We selected the HMC to implement CGAcc because it provides high bandwidth and low access latency; furthermore, the device has multiple DRAM layers connected to internal logic that controls memory access and performs rudimentary computation. Using the CSR representation, CGAcc deploys prefetchers in the HMC to exploit the short transaction latency between the logic and DRAM layers, thereby also avoiding large data-movement costs. At runtime, CGAcc pipelines the prefetching of data from the DRAM arrays to improve memory-level parallelism. To further reduce access latency, several optimized internal caches are introduced to hold the prefetched data to be Processed In-Memory (PIM). A comprehensive evaluation shows the effectiveness of CGAcc: compared to a conventional HMC main memory equipped with a stream prefetcher, CGAcc achieved an average 3.51× speedup with moderate hardware cost.
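The CSR-based traversal pattern CGAcc accelerates can be illustrated with a breadth-first search: each vertex's neighbor list is a contiguous slice of `col_idx` delimited by `row_ptr`, which is exactly the access pattern an in-memory prefetcher can stream. This is a minimal software sketch, unrelated to the CGAcc hardware itself.

```python
from collections import deque

def bfs_csr(row_ptr, col_idx, source):
    """BFS over a graph stored in CSR; returns hop distances from source.
    Each vertex's adjacency list is the contiguous slice
    col_idx[row_ptr[u]:row_ptr[u + 1]]."""
    n = len(row_ptr) - 1
    dist = [-1] * n               # -1 marks unvisited vertices
    dist[source] = 0
    q = deque([source])
    while q:
        u = q.popleft()
        for k in range(row_ptr[u], row_ptr[u + 1]):  # neighbors of u
            v = col_idx[k]
            if dist[v] == -1:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

# 4 vertices, edges 0->1, 0->2, 1->3:
d = bfs_csr([0, 2, 3, 3, 3], [1, 2, 3], 0)
# d = [0, 1, 1, 2]
```

The data-dependent jump from `col_idx[k]` into `row_ptr` is what defeats conventional stride prefetchers and motivates placing the prefetch logic inside the memory cube.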

