Enhancing the Programmability and Performance Portability of GPU Tensor Operations

Author(s):  
Arya Mazaheri ◽  
Johannes Schulte ◽  
Matthew W. Moskewicz ◽  
Felix Wolf ◽  
Ali Jannesari
Author(s):  
C. Kessler ◽  
U. Dastgeer ◽  
S. Thibault ◽  
R. Namyst ◽  
A. Richards ◽  
...  

Author(s):  
Alexey Syschikov ◽  
Boris Sedov ◽  
Konstantin Nedovodeev ◽  
Vera Ivanova

The OpenVX standard emerged as the computer vision community's answer to the challenge of accelerating vision applications on embedded heterogeneous platforms. It is designed to exploit the potential of computer vision hardware while providing functional and performance portability. Because VIPE has a powerful model of computation, it can incorporate various other models. A language or framework based on an incorporated model can thus be extended with visual programming support and gain access to the existing performance analysis and deployment tools. The authors present the integration of OpenVX into the VIPE IDE. VIPE addresses the need to design OpenVX graphs in a natural visual form, automatically generating a full-fledged program and sparing the programmer from writing boilerplate code. To the best of the authors' knowledge, this is the first use of a graphical notation for OpenVX programming. Developing OpenVX programs in VIPE also makes its performance analysis tools available to them.


2018 ◽  
Vol 175 ◽  
pp. 09006 ◽  
Author(s):  
Peter A. Boyle ◽  
M.A. Clark ◽  
Carleton DeTar ◽  
Meifeng Lin ◽  
Verinder Rana ◽  
...  

One of the key requirements for the Lattice QCD Application Development as part of the US Exascale Computing Project is performance portability across multiple architectures. Using the Grid C++ expression template as a starting point, we report on the progress made with regard to the Grid GPU offloading strategies. We present both the successes and the issues encountered in using CUDA, OpenACC and Just-In-Time compilation. Experimentation and performance on GPUs with an SU(3)×SU(3) streaming test will be reported. We will also report on the challenges of using current OpenMP 4.x for GPU offloading in the same code.


2019 ◽  
Vol 132 ◽  
pp. 383-396 ◽  
Author(s):  
S.V. Adams ◽  
R.W. Ford ◽  
M. Hambley ◽  
J.M. Hobson ◽  
I. Kavčič ◽  
...  
