Evaluation of the Compressed Sparse Row storage format for the solution of sparse systems of linear equations

Engevista ◽  
2017 ◽  
Vol 19 (4) ◽  
pp. 1095
Author(s):  
Gylles Ricardo Ströher ◽  
Thays Rolim Mendes ◽  
Neyva Maria Lopes Romeiro

Matrix compression schemes make it possible to store sparse matrices in vectors so that only the nonzero elements are kept, providing a significant reduction in the computational memory required to store sparse matrices. Among the existing schemes, the one implemented in this work was Compressed Sparse Row (CSR), which stores only the nonzero elements of the matrix in three vectors. The CSR scheme was implemented in combination with three iterative methods for solving linear systems: Jacobi, Gauss-Seidel, and Conjugate Gradient. The results indicate the minimum matrix order and degree of sparsity at which the CSR scheme becomes advantageous in terms of reduced memory consumption, and they also show that, because operations with the zero elements are suppressed, the processing time for solving sparse linear systems can be significantly reduced with the compression scheme explored.
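The three-vector CSR layout the abstract describes can be sketched as follows; this is an illustrative conversion routine, not the authors' implementation, and the names (`values`, `col_idx`, `row_ptr`) are common conventions rather than anything taken from the paper.

```python
def to_csr(dense):
    """Convert a dense matrix (list of lists) to CSR's three vectors:
    values (nonzero entries), col_idx (their columns), row_ptr (row offsets)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, a in enumerate(row):
            if a != 0:
                values.append(a)
                col_idx.append(j)
        row_ptr.append(len(values))  # end of this row's entries
    return values, col_idx, row_ptr

A = [[4, 0, 0],
     [0, 5, 1],
     [2, 0, 6]]
vals, cols, ptr = to_csr(A)
# vals = [4, 5, 1, 2, 6], cols = [0, 1, 2, 0, 2], ptr = [0, 1, 3, 5]
```

Only the five nonzeros are stored, which is the memory saving the abstract quantifies against matrix order and sparsity.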

2014 ◽  
Vol 40 (5-6) ◽  
pp. 47-58 ◽  
Author(s):  
Urban Borštnik ◽  
Joost VandeVondele ◽  
Valéry Weber ◽  
Jürg Hutter

2016 ◽  
Vol 2016 ◽  
pp. 1-12 ◽  
Author(s):  
Guixia He ◽  
Jiaquan Gao

Sparse matrix-vector multiplication (SpMV) is an important operation in scientific computing. Compressed sparse row (CSR) is the most frequently used format for storing sparse matrices. However, CSR-based SpMV kernels on graphics processing units (GPUs), for example, CSR-scalar and CSR-vector, usually perform poorly due to irregular memory access patterns. This motivates us to propose a perfect CSR-based SpMV on the GPU, called PCSR. PCSR involves two kernels and accesses the CSR arrays in a fully coalesced manner by introducing a middle array, which greatly alleviates the deficiencies of CSR-scalar (rare coalescing) and CSR-vector (partial coalescing). Test results on a single C2050 GPU show that PCSR outperforms CSR-scalar, CSR-vector, and the CSRMV and HYBMV routines in the vendor-tuned CUSPARSE library, and is comparable with a recently proposed CSR-based algorithm, CSR-Adaptive. Furthermore, we extend PCSR from a single GPU to multiple GPUs. Experimental results on four C2050 GPUs show that, whether or not inter-GPU communication is taken into account, PCSR on multiple GPUs achieves good performance and high parallel efficiency.
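The SpMV kernel that CSR-scalar, CSR-vector, and PCSR all compute can be sketched sequentially as below; the loop over rows is what CSR-scalar maps one GPU thread to (one row per thread), which is the source of the uncoalesced accesses the abstract discusses. This is a minimal CPU sketch, not any of the GPU kernels themselves.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """y = A @ x for a matrix stored in CSR; rows are independent,
    which is why each row can be assigned to its own GPU thread."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):  # nonzeros of row i
            s += values[k] * x[col_idx[k]]
        y[i] = s
    return y

# Matrix [[4,0,0],[0,5,1],[2,0,6]] in CSR form:
y = csr_spmv([4, 5, 1, 2, 6], [0, 1, 2, 0, 2], [0, 1, 3, 5], [1.0, 1.0, 1.0])
# y = [4.0, 6.0, 8.0]
```

Because consecutive threads read `values[row_ptr[i]]` at widely separated offsets, adjacent memory requests do not coalesce, motivating PCSR's middle-array reorganization.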


2018 ◽  
Vol 10 (2) ◽  
pp. 64-79
Author(s):  
Rafael Machado De Salles ◽  
Leonardo Figueira Werneck ◽  
Grazione De Souza ◽  
Helio Pedro Amaral Souto

This work focuses mainly on the numerical simulation of single-phase oil flow in anticline-type petroleum reservoirs. To that end, a specific technique for representing inactive cells was developed. In addition, to improve computational efficiency, the OpenMP programming interface was used together with the Compressed Sparse Row technique to parallelize the Conjugate Gradient method, which is employed to solve the algebraic system of equations arising from the discretization of the Hydraulic Diffusivity Equation (HDE) governing the flow. Sensitivity, convergence, and performance tests were carried out for different anticline-type reservoirs.
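The Conjugate Gradient iteration that the authors parallelize is dominated by one SpMV per step, which is where the CSR layout and OpenMP threading pay off. The sketch below is a textbook serial CG, not the authors' parallel code; `spmv` stands for any CSR matrix-vector product, and all names are illustrative.

```python
def conjugate_gradient(spmv, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A, given a
    matrix-vector product spmv(p) = A @ p (e.g., a CSR kernel)."""
    x = [0.0] * len(b)
    r = b[:]                      # residual r = b - A x, with x = 0
    p = r[:]                      # initial search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        Ap = spmv(p)              # the dominant cost: one SpMV per iteration
        alpha = rs_old / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new ** 0.5 < tol:
            break
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

In an OpenMP setting, the SpMV loop over rows and the vector updates are the natural parallel regions, since each row's dot product is independent.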


2013 ◽  
Vol 191 (1) ◽  
pp. 19-27 ◽  
Author(s):  
R. R. Akhunov ◽  
S. P. Kuksenko ◽  
V. K. Salov ◽  
T. R. Gazizov

2012 ◽  
Vol 166-169 ◽  
pp. 3166-3173
Author(s):  
Guo Liang Ji ◽  
Yang De Feng ◽  
Wen Kai Cui ◽  
Liang Gang Lu

A technique to assemble the global stiffness matrix in a sparse storage format, together with two parallel solvers for the sparse linear systems arising from FEM, is presented. The assembly method uses a data structure called the associated node at intermediate stages before finally arriving at the Compressed Sparse Row (CSR) format. The associated nodes record how the nodes of the mesh are connected. The technique saves a large amount of memory because it stores only the nonzero elements of the global stiffness matrix; the method is simple and effective. The solvers are restarted GMRES iterative solvers with Jacobi and sparse approximate inverse (SPAI) preconditioning, respectively. Numerical experiments show that both preconditioners improve the convergence of the iterative method, and that SPAI is more powerful than Jacobi in the sense of reducing the number of iterations and improving parallel efficiency. Both solvers can be used to solve large sparse linear systems.
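A common way to realize the assembly the abstract describes is to collect element contributions as (row, column, value) triplets and then compress them to CSR, summing duplicates as FEM assembly requires. This is a generic sketch of that idea, not the authors' associated-node structure, and all names are illustrative.

```python
from collections import defaultdict

def triplets_to_csr(n, triplets):
    """Assemble (row, col, value) triplets into CSR, accumulating
    duplicate (row, col) entries as stiffness-matrix assembly requires."""
    rows = [defaultdict(float) for _ in range(n)]
    for i, j, v in triplets:
        rows[i][j] += v           # overlapping element contributions sum
    values, col_idx, row_ptr = [], [], [0]
    for r in rows:
        for j in sorted(r):       # sorted columns within each row
            values.append(r[j])
            col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

# Two elements both contribute to entry (0, 0):
csr = triplets_to_csr(2, [(0, 0, 1.0), (0, 0, 2.0), (0, 1, 3.0), (1, 1, 5.0)])
# csr = ([3.0, 3.0, 5.0], [0, 1, 1], [0, 2, 3])
```

Only nonzero entries survive, which is the memory saving claimed for the assembly technique.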


Electronics ◽  
2018 ◽  
Vol 7 (11) ◽  
pp. 307 ◽  
Author(s):  
Cheng Qian ◽  
Bruce Childers ◽  
Libo Huang ◽  
Hui Guo ◽  
Zhiying Wang

Graph traversal is widely used in map routing, social network analysis, causal discovery, and many other applications. Because it is a memory-bound process, graph traversal puts significant pressure on the memory subsystem. Due to poor spatial locality and the increasing size of today's datasets, graph traversal consumes an ever-larger share of application execution time. One way to mitigate this cost is memory prefetching, which issues requests from the processor to the memory in anticipation of needing certain data. However, traditional prefetching does not work well for graph traversal due to data dependencies, the parallel nature of graphs, and the need to move vast amounts of data from memory to the caches. In this paper, we propose a compressed sparse row representation-based graph accelerator on the Hybrid Memory Cube (HMC), called CGAcc. CGAcc combines the Compressed Sparse Row (CSR) graph representation with in-memory prefetching and processing to improve the performance of graph traversal. Our approach integrates the prefetching and processing in the logic layer of a 3D-stacked Dynamic Random-Access Memory (DRAM) architecture based on Micron's HMC. We selected the HMC to implement CGAcc because it provides high bandwidth and low access latency; furthermore, the device has multiple DRAM layers connected to internal logic that controls memory access and performs rudimentary computation. Using the CSR representation, CGAcc deploys prefetchers in the HMC to exploit the short transaction latency between the logic and DRAM layers, thereby also avoiding large data-movement costs. At runtime, CGAcc pipelines the prefetching of data from the DRAM arrays to improve memory-level parallelism. To further reduce access latency, several optimized internal caches are introduced to hold the prefetched data to be Processed In-Memory (PIM). A comprehensive evaluation shows the effectiveness of CGAcc: compared to a conventional HMC main memory equipped with a stream prefetcher, CGAcc achieved an average 3.51× speedup with moderate hardware cost.
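The CSR-based traversal pattern CGAcc accelerates can be illustrated with a breadth-first search: each vertex's neighbor list is a contiguous slice of `col_idx` delimited by `row_ptr`, which is exactly the access pattern an in-memory prefetcher can stream. This is a minimal software sketch, unrelated to the CGAcc hardware itself.

```python
from collections import deque

def bfs_csr(row_ptr, col_idx, source):
    """BFS over a graph stored in CSR; returns hop distances from source.
    Each vertex's adjacency list is the contiguous slice
    col_idx[row_ptr[u]:row_ptr[u + 1]]."""
    n = len(row_ptr) - 1
    dist = [-1] * n               # -1 marks unvisited vertices
    dist[source] = 0
    q = deque([source])
    while q:
        u = q.popleft()
        for k in range(row_ptr[u], row_ptr[u + 1]):  # neighbors of u
            v = col_idx[k]
            if dist[v] == -1:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

# 4 vertices, edges 0->1, 0->2, 1->3:
d = bfs_csr([0, 2, 3, 3, 3], [1, 2, 3], 0)
# d = [0, 1, 1, 2]
```

The data-dependent jump from `col_idx[k]` into `row_ptr` is what defeats conventional stride prefetchers and motivates placing the prefetch logic inside the memory cube.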

