GPU Accelerated Reconstruction in Compton Scattering Tomography Using Matrix Compression

An acceleration strategy is proposed for the TV-ADM reconstruction algorithm in Compton scattering tomography (CST). First, exploiting the sparsity of the CST projection matrices, the matrices are stored in the compressed sparse row (CSR) and ELLPACK (ELL) formats, which greatly reduces memory consumption. Then, sparse matrix-vector multiplication (SpMV) is used to accelerate the projection and back-projection steps. Finally, because these operations are inherently parallel, TV-ADM is computed on a graphics processing unit (GPU). Numerical experiments show that TV-ADM with the presented acceleration strategy can achieve a 96-fold speedup and a 224-fold memory compression ratio without loss of precision.
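The storage and multiplication steps named in this abstract can be illustrated with a minimal serial sketch in Python. It is not the authors' GPU implementation: the function names and the tiny example matrix are assumptions made here for illustration, and the loops over rows only indicate where a GPU kernel would typically assign one thread (or warp) per row.

```python
def dense_to_csr(dense):
    """Convert a dense matrix (list of rows) to CSR arrays.

    Returns (values, col_idx, row_ptr); row i owns the slice
    values[row_ptr[i]:row_ptr[i + 1]].
    """
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_spmv(csr, x):
    """Forward projection y = A x; every row is independent of the others."""
    values, col_idx, row_ptr = csr
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y


def csr_spmv_transpose(csr, y, n_cols):
    """Back projection x = A^T y using the same CSR arrays (scatter form)."""
    values, col_idx, row_ptr = csr
    x = [0.0] * n_cols
    for i in range(len(row_ptr) - 1):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            x[col_idx[k]] += values[k] * y[i]
    return x


def dense_to_ell(dense):
    """Convert to ELL: each row is padded with zeros to a common width."""
    width = max(sum(1 for v in row if v != 0) for row in dense)
    data, cols = [], []
    for row in dense:
        nz = [(j, v) for j, v in enumerate(row) if v != 0]
        pad = width - len(nz)
        cols.append([j for j, _ in nz] + [0] * pad)      # padded column index 0
        data.append([v for _, v in nz] + [0.0] * pad)    # padded value 0.0
    return data, cols


def ell_spmv(ell, x):
    """y = A x in ELL form; padded entries contribute zero."""
    data, cols = ell
    return [sum(d * x[c] for d, c in zip(drow, crow))
            for drow, crow in zip(data, cols)]


# Hypothetical 4x4 example matrix with 5 non-zeros.
A = [[1, 0, 0, 2],
     [0, 3, 0, 0],
     [0, 0, 0, 0],
     [4, 0, 5, 0]]
x = [1, 2, 3, 4]
print(csr_spmv(dense_to_csr(A), x))
print(ell_spmv(dense_to_ell(A), x))
```

For this example both formats give the product (9, 6, 0, 19). CSR stores only the five non-zeros plus a row-pointer array, while ELL pads every row to the longest row's width, which trades some memory for a regular access pattern.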

A new sparse matrix vector multiplication graphics processing unit algorithm designed for finite element problems

International Journal for Numerical Methods in Engineering ◽

10.1002/nme.4865 ◽

2015 ◽

Vol 102 (12) ◽

pp. 1784-1814 ◽

Cited By ~ 15

Author(s):

J. Wong ◽

E. Kuhl ◽

E. Darve

Keyword(s):

Finite Element ◽

Graphics Processing Unit ◽

Sparse Matrix ◽

Processing Unit ◽

Matrix Vector Multiplication ◽

Graphics Processing ◽

Matrix Vector

Fast sparse matrix-vector multiplication on graphics processing unit for finite element analysis

2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems ◽

10.1109/hpcc.2012.193 ◽

2012 ◽

Cited By ~ 11

Author(s):

Abal-Kassim Cheik Ahamed ◽

Frederic Magoules

Keyword(s):

Finite Element Analysis ◽

Finite Element ◽

Graphics Processing Unit ◽

Sparse Matrix ◽

Processing Unit ◽

Element Analysis ◽

Matrix Vector Multiplication ◽

Graphics Processing ◽

Matrix Vector

Iterative sparse matrix-vector multiplication for accelerating the block Wiedemann algorithm over GF(2) on multi-graphics processing unit systems

Concurrency and Computation: Practice and Experience ◽

10.1002/cpe.2896 ◽

2012 ◽

Vol 25 (4) ◽

pp. 586-603 ◽

Cited By ~ 4

Author(s):

Bertil Schmidt ◽

Hans Aribowo ◽

Hoang-Vu Dang

Keyword(s):

Graphics Processing Unit ◽

Sparse Matrix ◽

Processing Unit ◽

Matrix Vector Multiplication ◽

Graphics Processing ◽

Matrix Vector

A novel multi-graphics processing unit parallel optimization framework for the sparse matrix-vector multiplication

Concurrency and Computation: Practice and Experience ◽

10.1002/cpe.3936 ◽

2016 ◽

Vol 29 (5) ◽

pp. e3936 ◽

Cited By ~ 10

Author(s):

Jiaquan Gao ◽

Yu Wang ◽

Jun Wang

Keyword(s):

Graphics Processing Unit ◽

Sparse Matrix ◽

Parallel Optimization ◽

Processing Unit ◽

Optimization Framework ◽

Matrix Vector Multiplication ◽

Graphics Processing ◽

Matrix Vector

A new diagonal storage for efficient implementation of sparse matrix–vector multiplication on graphics processing unit

Concurrency and Computation: Practice and Experience ◽

10.1002/cpe.6230 ◽

2021 ◽

Author(s):

Guixia He ◽

Qi Chen ◽

Jiaquan Gao

Keyword(s):

Graphics Processing Unit ◽

Sparse Matrix ◽

Efficient Implementation ◽

Processing Unit ◽

Matrix Vector Multiplication ◽

Graphics Processing ◽

Matrix Vector

Parallel computations of the step response of a floor heater with the use of a graphics processing unit. Part 2: results and their evaluation

Bulletin of the Polish Academy of Sciences Technical Sciences ◽

10.2478/bpasts-2013-0102 ◽

2013 ◽

Vol 61 (4) ◽

pp. 949-954 ◽

Cited By ~ 1

Author(s):

J. Gołębiowski ◽

J. Forenc

Keyword(s):

Graphics Processing Unit ◽

Sparse Matrix ◽

Temporal Distribution ◽

Step Response ◽

Processing Unit ◽

Commercial Program ◽

Speed Up ◽

Spatio Temporal ◽

Graphics Processing ◽

Linear Systems Of Equations

Using the models and algorithms presented in the first part of the article, the spatio-temporal distribution of the step response of a floor heater was determined. The results are presented as heating curves and temperature profiles of the heater at selected time instants. The computed results were verified by comparison with the solution obtained with a commercial program, NISA. Additionally, the distribution of the average time constant of the thermal processes occurring in the heater was determined. The use of a graphics processing unit in numerical computations based on the conjugate gradient method was analyzed. It was shown that using a graphics processing unit is profitable when solving linear systems of equations with dense coefficient matrices. For a sparse matrix, the speed-up depends on the number of its non-zero elements.
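The dependence on the number of non-zero elements can be seen in a minimal conjugate gradient sketch: each iteration performs one matrix-vector product, and with sparse storage its cost is proportional to the number of stored non-zeros. The code below is a serial Python illustration, not the authors' solver; the CSR layout, the function names, and the 1D Laplacian test matrix are assumptions chosen here for the example.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """y = A x for a CSR matrix; work is proportional to len(values)."""
    return [sum(values[k] * x[col_idx[k]]
                for k in range(row_ptr[i], row_ptr[i + 1]))
            for i in range(len(row_ptr) - 1)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(values, col_idx, row_ptr, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite CSR matrix A."""
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x for the zero initial guess
    p = list(r)              # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = csr_matvec(values, col_idx, row_ptr, p)   # the one SpMV per iteration
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


def laplacian_1d_csr(n):
    """Hypothetical test matrix: tridiagonal (-1, 2, -1), at most 3 non-zeros per row."""
    values, col_idx, row_ptr = [], [], [0]
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


values, col_idx, row_ptr = laplacian_1d_csr(5)
b = [1.0, 0.0, 0.0, 0.0, 1.0]     # equals A times the all-ones vector
print(conjugate_gradient(values, col_idx, row_ptr, b))
```

The right-hand side is chosen so that the exact solution is the all-ones vector, which makes the result easy to check by eye.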

GPU-Accelerated Parallel FDTD on Distributed Heterogeneous Platform

International Journal of Antennas and Propagation ◽

10.1155/2014/321081 ◽

2014 ◽

Vol 2014 ◽

pp. 1-8 ◽

Cited By ~ 2

Author(s):

Ronglin Jiang ◽

Shugang Jiang ◽

Yu Zhang ◽

Ying Xu ◽

Lei Xu ◽

...

Keyword(s):

Message Passing ◽

Message Passing Interface ◽

Graphics Processing Unit ◽

Processing Unit ◽

Problem Size ◽

Central Processing ◽

Execution Speed ◽

Speedup Ratio ◽

Electromagnetic Calculations ◽

Graphics Processing

This paper introduces a finite-difference time-domain (FDTD) code written in Fortran and CUDA for realistic electromagnetic calculations, parallelized with the Message Passing Interface (MPI) and Open Multiprocessing (OpenMP). Since both Central Processing Unit (CPU) and Graphics Processing Unit (GPU) resources are utilized, a faster execution speed can be reached than with a traditional pure GPU code. In our experiments, 64 NVIDIA TESLA K20m GPUs and 64 INTEL XEON E5-2670 CPUs are used to carry out the pure CPU, pure GPU, and CPU + GPU tests. Relative to the pure CPU calculations for the same problems, the speedup ratio achieved by the CPU + GPU calculations is around 14. Compared to the pure GPU calculations for the same problems, the CPU + GPU calculations show a 7.6%–13.2% performance improvement. Because of the small memory size of GPUs, the FDTD problem size is usually very small. However, this code can enlarge the maximum problem size by 25% without reducing the performance of a traditional pure GPU code. Finally, using this code, a microstrip antenna array with 16×18 elements is calculated and the radiation patterns are compared with those obtained with the method of moments (MoM). The results show good agreement between them.
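For readers unfamiliar with FDTD, the core update that such codes parallelize can be sketched in one dimension. The Python code below is a serial illustration only, not the Fortran/CUDA code described in the abstract; the normalized units, grid size, Courant number, and Gaussian source are all assumptions made here. Each cell update within a half step depends only on neighbouring values from the previous half step, which is what makes the method suitable for GPU and MPI domain decomposition.

```python
import math


def fdtd_1d(n_cells=200, n_steps=50, source_cell=100, courant=0.5):
    """1D Yee-scheme FDTD in normalized units (illustrative sketch).

    ez[i] lives on integer grid points; hy[i] lives between ez[i] and ez[i + 1].
    Both ends of ez are held at zero (perfect electric conductor walls).
    """
    ez = [0.0] * n_cells
    hy = [0.0] * (n_cells - 1)
    for n in range(n_steps):
        # Magnetic field half step: every hy[i] update is independent.
        for i in range(n_cells - 1):
            hy[i] += courant * (ez[i + 1] - ez[i])
        # Electric field half step: interior cells only, also independent.
        for i in range(1, n_cells - 1):
            ez[i] += courant * (hy[i] - hy[i - 1])
        # Additive (soft) Gaussian pulse source, an assumed excitation.
        ez[source_cell] += math.exp(-((n - 20) / 6.0) ** 2)
    return ez, hy


ez, hy = fdtd_1d()
print(max(abs(v) for v in ez))
```

Because each time step couples a cell only to its immediate neighbours, the pulse cannot travel more than one cell per step, so after 50 steps every cell farther than 50 cells from the source is still exactly zero.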