Toward Performance Portability of Highly Parametrizable TRSM Algorithm Using SYCL

Author(s):  
Thales Sabino ◽  
Mehdi Goli
2007 ◽  
Vol 10 (2) ◽  
pp. 115-126 ◽  
Author(s):  
Weirong Zhu ◽  
Yanwei Niu ◽  
Guang R. Gao

2020 ◽  
Author(s):  
Luca Bertagna ◽  
Oksana Guba ◽  
Mark Taylor ◽  
James Foucar ◽  
Andrew Bradley ◽  
...  

Author(s):  
C. Kessler ◽  
U. Dastgeer ◽  
S. Thibault ◽  
R. Namyst ◽  
A. Richards ◽  
...  

Author(s):  
Alan Gray ◽  
Kevin Stratford

Leading high-performance computing systems achieve their status through the use of highly parallel devices such as NVIDIA graphics processing units (GPUs) or Intel Xeon Phi many-core CPUs. Performance portability across such architectures, as well as traditional CPUs, is vital for the application programmer. In this paper we describe targetDP, a lightweight abstraction layer that allows grid-based applications to target data-parallel hardware in a platform-agnostic manner. We demonstrate the effectiveness of our pragmatic approach by presenting performance results for a complex fluid application (with which the model was co-designed), plus a separate lattice quantum chromodynamics particle physics code. For each application, a single source code base is seen to achieve portable performance, as assessed within the context of the Roofline model. The targetDP layer can be combined with the Message Passing Interface (MPI) to allow use on systems containing multiple nodes: we demonstrate this by providing scaling results on traditional and GPU-accelerated large-scale supercomputers.

