CUDA GPU libraries and novel sparse matrix-vector multiplication implementation and performance enhancement in unstructured finite element computations

A serious computational bottleneck in finite element analysis today is the solution of the underlying system of equations. To alleviate this problem, researchers have proposed using graphics processing units (GPUs) for fast iterative solution of such equations. Indeed, researchers have shown that a GPU implementation of double-precision sparse matrix-vector multiplication (which underlies all iterative methods) is approximately an order of magnitude faster than an optimized CPU implementation. Unfortunately, fast matrix-vector multiplication alone is insufficient: a good preconditioner is necessary for rapid convergence. Furthermore, most modern preconditioners, such as incomplete Cholesky, are expensive to compute and cannot be easily ported to the GPU. In this paper, we propose a special class of preconditioners for the analysis of thin structures, such as beams and plates. The proposed preconditioners are developed by combining the multigrid method with a recently developed dual-representation method for thin structures. It is shown that these preconditioners are computationally inexpensive, perform better than standard preconditioners, and can be easily ported to the GPU.
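
To illustrate where the matrix-vector product and the preconditioner enter an iterative solver, the following is a minimal sketch of a preconditioned conjugate gradient loop. It is not the method of the paper above: a simple Jacobi (diagonal) preconditioner is swapped in for the multigrid and dual-representation preconditioner, a dense product stands in for the sparse one for brevity, and every name (`pcg`, `jacobi`, `matvec`, the 3x3 test system) is a hypothetical choice made for this example.

```python
def matvec(A, x):
    """Dense matrix-vector product; a stand-in for the sparse product."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg(A, b, apply_preconditioner, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradient for a symmetric positive definite A.

    Returns the approximate solution and the number of iterations used.
    """
    x = [0.0] * len(b)
    r = list(b)                       # residual b - A x for the zero initial guess
    z = apply_preconditioner(r)
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5
    for k in range(1, max_iter + 1):
        Ap = matvec(A, p)             # one matrix-vector product per iteration
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, k
        z = apply_preconditioner(r)   # one preconditioner application per iteration
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


def jacobi(A):
    """Jacobi preconditioner: scale the residual by the inverse diagonal of A."""
    inv_diag = [1.0 / A[i][i] for i in range(len(A))]
    return lambda r: [d * ri for d, ri in zip(inv_diag, r)]


# Hypothetical small SPD system with a badly scaled diagonal.
A = [[4.0, 1.0, 0.0],
     [1.0, 300.0, 2.0],
     [0.0, 2.0, 20000.0]]
b = [1.0, 2.0, 3.0]

x, iters = pcg(A, b, jacobi(A))
x_plain, iters_plain = pcg(A, b, lambda r: list(r))  # identity, i.e. no preconditioning
```

Each iteration costs one matrix-vector product and one preconditioner application, which is why both need to run efficiently on the GPU for the solver as a whole to benefit.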


Comparison of GPU-Based Parallel Assembly and Assembly-Free Sparse Matrix Vector Multiplication for Finite Element Analysis of Three-Dimensional Structures

Proceedings of the Fifteenth International Conference on Civil, Structural and Environmental Engineering Computing ◽ 10.4203/ccp.108.222 ◽ 2015 ◽ Cited By ~ 1

Author(s): A. Akbariyeh ◽ B.H. Dennis ◽ B.P. Wang ◽ K.L. Lawrence

Keyword(s): Finite Element Analysis ◽ Finite Element ◽ Sparse Matrix ◽ Three Dimensional ◽ Element Analysis ◽ Matrix Vector Multiplication ◽ Parallel Assembly ◽ Matrix Vector


An efficient sparse matrix-vector multiplication on CUDA-enabled graphic processing units for finite element method simulations

International Journal for Numerical Methods in Engineering ◽ 10.1002/nme.5346 ◽ 2016 ◽ Vol 110 (1) ◽ pp. 57-78 ◽ Cited By ~ 2

Author(s): Atakan Altinkaynak

Keyword(s): Finite Element Method ◽ Finite Element ◽ Sparse Matrix ◽ Graphic Processing Units ◽ Matrix Vector Multiplication ◽ Matrix Vector ◽ Element Method ◽ Graphic Processing


A new sparse matrix vector multiplication graphics processing unit algorithm designed for finite element problems

International Journal for Numerical Methods in Engineering ◽ 10.1002/nme.4865 ◽ 2015 ◽ Vol 102 (12) ◽ pp. 1784-1814 ◽ Cited By ~ 15

Author(s): J. Wong ◽ E. Kuhl ◽ E. Darve

Keyword(s): Finite Element ◽ Graphics Processing Unit ◽ Sparse Matrix ◽ Processing Unit ◽ Matrix Vector Multiplication ◽ Graphics Processing ◽ Matrix Vector


Fast sparse matrix-vector multiplication on graphics processing unit for finite element analysis

2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems ◽ 10.1109/hpcc.2012.193 ◽ 2012 ◽ Cited By ~ 11

Author(s): Abal-Kassim Cheik Ahamed ◽ Frederic Magoules

Keyword(s): Finite Element Analysis ◽ Finite Element ◽ Graphics Processing Unit ◽ Sparse Matrix ◽ Processing Unit ◽ Element Analysis ◽ Matrix Vector Multiplication ◽ Graphics Processing ◽ Matrix Vector


Sparse Matrix-Vector Multiplication for Finite Element Method Matrices on FPGAs

2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines ◽ 10.1109/fccm.2006.65 ◽ 2006 ◽ Cited By ~ 13

Author(s): Yousef El-Kurdi ◽ Warren Gross ◽ Dennis Giannacopoulos

Keyword(s): Finite Element Method ◽ Finite Element ◽ Sparse Matrix ◽ Matrix Vector Multiplication ◽ Matrix Vector ◽ Element Method


FPGA architecture and implementation of sparse matrix–vector multiplication for the finite element method

Computer Physics Communications ◽ 10.1016/j.cpc.2007.11.014 ◽ 2008 ◽ Vol 178 (8) ◽ pp. 558-570 ◽ Cited By ~ 16

Author(s): Yousef Elkurdi ◽ David Fernández ◽ Evgueni Souleimanov ◽ Dennis Giannacopoulos ◽ Warren J. Gross

Keyword(s): Finite Element Method ◽ Finite Element ◽ Sparse Matrix ◽ The Finite Element Method ◽ Matrix Vector Multiplication ◽ Fpga Architecture ◽ Matrix Vector ◽ Element Method


Multi-GPU implementation and performance optimization for CSR-based sparse matrix-vector multiplication

2017 3rd IEEE International Conference on Computer and Communications (ICCC) ◽ 10.1109/compcomm.2017.8322969 ◽ 2017

Author(s): Ping Guo ◽ Changjiang Zhang

Keyword(s): Performance Optimization ◽ Sparse Matrix ◽ Matrix Vector Multiplication ◽ And Performance ◽ Matrix Vector ◽ Gpu Implementation


Using GPU-Based Computing to Solve Large Sparse Systems of Linear Equations

Volume 2: 31st Computers and Information in Engineering Conference, Parts A and B ◽ 10.1115/detc2011-48452 ◽ 2011

Author(s): Travis J. Carrigan ◽ Jacob Watt ◽ Brian H. Dennis

Keyword(s): Finite Element ◽ Domain Decomposition ◽ Graphics Processing Units ◽ Sparse Matrix ◽ Low Cost ◽ Linear Equations ◽ Parallel Architecture ◽ General Purpose ◽ Matrix Vector Multiplication ◽ Matrix Vector

Often thought of as tools for image rendering or data visualization, graphics processing units (GPUs) are becoming increasingly popular in scientific computing due to their low-cost, massively parallel architecture. With the introduction of CUDA C by NVIDIA and CUDA-enabled GPUs, it is now possible to perform general-purpose computations without resorting to shading languages. One application that benefits from the capabilities of NVIDIA hardware is computational continuum mechanics (CCM). The need to solve sparse linear systems of equations is common in CCM when partial differential equations are discretized. Often these systems are solved iteratively using domain decomposition among distributed processors working in parallel. In this paper we explore the benefits of using GPUs to improve the performance of sparse matrix operations, specifically sparse matrix-vector multiplication. Our approach does not require domain decomposition, so it is simpler than the corresponding implementation for distributed-memory parallel computers. We demonstrate that for matrices produced from finite element discretizations on unstructured meshes, the matrix-vector multiplication operation is just under 13 times faster than when run serially on an Intel i5 system. Furthermore, we show that when used in conjunction with the biconjugate gradient stabilized method (BiCGSTAB), a gradient-based iterative linear solver, the method is over 13 times faster than the serially executed C equivalent. Lastly, we emphasize the application of this method to solving Poisson's equation using the Galerkin finite element method, and demonstrate over 10.5 times higher performance on the GPU when compared with the Intel i5 system.
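
As a rough illustration of the two building blocks named in this abstract, the sketch below pairs a sparse matrix-vector product with an unpreconditioned BiCGSTAB loop, written serially in Python. It is not the authors' CUDA implementation: the compressed sparse row (CSR) storage format, the function names, and the small 1D Poisson test matrix are all assumptions made for this example. The row loop in `csr_spmv` is the part a GPU kernel could parallelize, for instance by assigning rows to threads.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """y = A x for a matrix stored in CSR format (assumed storage format)."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):         # rows are independent of each other
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bicgstab(row_ptr, col_idx, values, b, tol=1e-10, max_iter=100):
    """Unpreconditioned BiCGSTAB with a zero initial guess."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                       # residual for the zero initial guess
    r_hat = list(r)                   # fixed shadow residual
    rho = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    b_norm = dot(b, b) ** 0.5 or 1.0
    for _ in range(max_iter):
        rho_new = dot(r_hat, r)
        if rho_new == 0.0:
            break                     # breakdown; a production solver would restart
        beta = (rho_new / rho) * (alpha / omega)
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        v = csr_spmv(row_ptr, col_idx, values, p)      # first product
        alpha = rho_new / dot(r_hat, v)
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        if dot(s, s) ** 0.5 <= tol * b_norm:
            x = [xi + alpha * pi for xi, pi in zip(x, p)]
            break
        t = csr_spmv(row_ptr, col_idx, values, s)      # second product
        omega = dot(t, s) / dot(t, t)
        x = [xi + alpha * pi + omega * si for xi, pi, si in zip(x, p, s)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            break
        rho = rho_new
    return x


# Hypothetical test matrix: 1D Poisson stencil tridiag(-1, 2, -1) with n = 5.
n = 5
row_ptr, col_idx, values = [0], [], []
for i in range(n):
    for j, a in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
        if 0 <= j < n:
            col_idx.append(j)
            values.append(a)
    row_ptr.append(len(col_idx))

b = csr_spmv(row_ptr, col_idx, values, [1.0] * n)  # right-hand side for x = all ones
x = bicgstab(row_ptr, col_idx, values, b)
```

Each BiCGSTAB iteration performs two sparse matrix-vector products, so the cost of that kernel weighs heavily on the total solve time.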
