The Design and Performance of Batched BLAS on Modern High-Performance Computing Systems

2017, Vol. 108, pp. 495-504
Author(s): Jack Dongarra, Sven Hammarling, Nicholas J. Higham, Samuel D. Relton, Pedro Valero-Lara, ...

Author(s): Masahiro Nakao, Hitoshi Murai, Hidetoshi Iwashita, Taisuke Boku, Mitsuhisa Sato

To improve productivity in developing parallel applications for high-performance computing systems, the XcalableMP PGAS language has been proposed. XcalableMP supports both typical parallelization under the “global-view memory model”, which uses directives, and flexible parallelization under the “local-view memory model”, which uses coarray features. The goal of this paper is to clarify XcalableMP’s productivity and performance. To do so, we implement the HPC Challenge benchmarks, namely EP STREAM Triad, High Performance Linpack, Global Fast Fourier Transform, and RandomAccess, and evaluate them on the K computer using up to 16,384 compute nodes and on a generic cluster system using up to 128 compute nodes. We found that the benchmarks were easier to implement in XcalableMP than in MPI. Moreover, most of the XcalableMP performance results were almost the same as the corresponding MPI results.
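For readers unfamiliar with the first benchmark, the STREAM Triad kernel computes a[i] = b[i] + scalar * c[i]. The sketch below is plain Python, not XcalableMP or MPI code, and only illustrates why the "EP" (embarrassingly parallel) variant needs no communication: if each process owns a contiguous block of the arrays, the triad touches only local data. The helpers `block_range` and `ep_triad` are hypothetical names introduced for illustration, and the processes are simulated sequentially in a loop.

```python
def triad(b, c, scalar):
    """STREAM Triad kernel: return a with a[i] = b[i] + scalar * c[i]."""
    return [bi + scalar * ci for bi, ci in zip(b, c)]


def block_range(n, nprocs, rank):
    """Hypothetical helper: half-open index range [lo, hi) owned by `rank`
    when an array of length n is block-distributed over nprocs processes.
    The first n % nprocs ranks receive one extra element."""
    base, rem = divmod(n, nprocs)
    lo = rank * base + min(rank, rem)
    hi = lo + base + (1 if rank < rem else 0)
    return lo, hi


def ep_triad(b, c, scalar, nprocs):
    """Hypothetical simulation of the EP variant: each simulated process
    applies the triad to its own block only, with no communication. The
    blocks are concatenated here solely so the result can be inspected."""
    a = []
    for rank in range(nprocs):
        lo, hi = block_range(len(b), nprocs, rank)
        a.extend(triad(b[lo:hi], c[lo:hi], scalar))
    return a


if __name__ == "__main__":
    n = 10
    b = [float(i) for i in range(n)]
    c = [1.0] * n
    print(ep_triad(b, c, 3.0, nprocs=3))  # [3.0, 4.0, ..., 12.0]
```

Because every block is independent, the distributed result matches a single serial call to `triad`; the benchmark's measured quantity (sustained memory bandwidth) is not modeled here.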
