Multi-GPU implementation and performance optimization for CSR-based sparse matrix-vector multiplication

Author(s):  
Ping Guo ◽  
Changjiang Zhang
Author(s):  
Hartwig Anzt ◽  
Moritz Kreutzer ◽  
Eduardo Ponce ◽  
Gregory D Peterson ◽  
Gerhard Wellein ◽  
...  

In this paper, we present an optimized GPU implementation of the induced dimension reduction (IDR) algorithm. We improve data locality, combine the implementation with an efficient sparse matrix-vector kernel, and investigate the potential of overlapping computation with communication as well as the possibility of concurrent kernel execution. A comprehensive performance evaluation is conducted using a suitable performance model. The analysis reveals an efficiency of up to 90%, which indicates that the implementation achieves performance close to the theoretically attainable bound.
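For orientation, the sparse matrix-vector product that the abstract refers to can be sketched on the CPU in the CSR (compressed sparse row) format. This is only a minimal illustration and not the authors' GPU kernel: the function name `csr_spmv` and the array names `row_ptr`, `col_idx`, and `values` are assumptions chosen for this sketch.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR format.

    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i;
    col_idx and values hold their column indices and entries.
    (All names are illustrative, not taken from the paper.)
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Walk only the stored nonzeros of this row.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Example matrix:
# A = [[4, 0, 1],
#      [0, 3, 0],
#      [2, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
x = [1.0, 2.0, 3.0]
y = csr_spmv(row_ptr, col_idx, values, x)
```

In a GPU version the outer loop over rows would typically be distributed across threads. Because each nonzero is read once and used in a single multiply-add, kernels of this kind are usually limited by memory bandwidth rather than by arithmetic throughput.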


2017 ◽  
Vol 43 (4) ◽  
pp. 1-49 ◽  
Author(s):  
Salvatore Filippone ◽  
Valeria Cardellini ◽  
Davide Barbieri ◽  
Alessandro Fanfarillo
