XcalableMP PGAS Programming Language
Latest Publications


TOTAL DOCUMENTS: 10 (five years: 10)

H-INDEX: 0 (five years: 0)

Published by Springer Singapore

ISBN: 9789811576829, 9789811576836

Author(s): Miwako Tsuji, Hitoshi Murai, Taisuke Boku, Mitsuhisa Sato, Serge G. Petiton, ...

Abstract: This chapter describes the multi-SPMD (mSPMD) programming model and a set of software and libraries that support it. The mSPMD programming model has been proposed to realize scalable applications on huge, hierarchical systems. It has become evident that simple SPMD programs written in MPI or XMP, or hybrid OpenMP/MPI programs, cannot exploit post-peta- or exascale systems efficiently because of the increasing complexity of both applications and systems. The mSPMD programming model is designed to adopt multiple programming models across the different levels of a system's architecture. Instead of invoking a single parallel program on millions of processor cores, multiple SPMD programs of moderate size work together in the mSPMD programming model. XMP is supported as a component of the mSPMD programming model. Fault-tolerance features, correctness checks, and implementations of some numerical libraries in the mSPMD programming model are also presented.


Author(s): Akihiro Tabuchi, Hitoshi Murai, Masahiro Nakao, Tetsuya Odajima, Taisuke Boku

Abstract: XcalableACC (XACC) is an extension of XcalableMP for accelerated clusters. It is defined as a diagonal integration of XcalableMP and OpenACC, another directive-based language designed for programming heterogeneous CPU/accelerator systems. XACC provides features for handling distributed-memory parallelism (inherited from XMP), for offloading tasks to accelerators (inherited from OpenACC), and two additional functions: data/work mapping among multiple accelerators and direct communication between accelerators.


Author(s): Keisuke Tsugane, Taisuke Boku, Hitoshi Murai, Mitsuhisa Sato, William Tang, ...

Abstract: XcalableMP (XMP) supports a global-view model that allows programmers to define global data and map them to a set of processors, which execute computations on the distributed global data as a single logical thread. XMP also adopts the concept of a coarray for local-view programming. In this study, we port the Gyrokinetic Toroidal Code - Princeton (GTC-P), a three-dimensional gyrokinetic PIC code developed at Princeton University to study the microturbulence phenomenon in magnetically confined fusion plasmas, to XMP as an example of hybrid memory-model coding with the global-view and local-view programming models. In local-view programming, the coarray notation is simple and intuitive compared with Message Passing Interface (MPI) programming, while its performance is comparable to that of the MPI version. Because the global-view model is well suited to expressing the data parallelism of the field's grid-space data, we implement a hybrid-view version that uses the global-view model to compute the field and the local-view model to compute the movement of particles. Its performance is about 20% lower than that of the original MPI version, but the hybrid-view version allows more natural expression of both static grid-space data (in the global-view model) and dynamic particle data (in the local-view model), and it increases the readability of the code for higher productivity.


Author(s): Atsushi Kubota, Tomohiro Matsushita, Naohisa Happo

Abstract: X-ray fluorescence holography is a three-dimensional middle-range local structure analysis method that can provide three-dimensional atomic images around specific elements within a radius of a few nanometers. The atomic images are reconstructed by applying a discrete Fourier transform (DFT) to hologram data, which currently takes a long time. In this study, the DFT program is parallelized using the parallel programming language XcalableMP. The DFT process, whose input is 21 holograms of 179 × 360 points and whose output is a three-dimensional atomic image of 192³ points, is executed on a PC cluster consisting of 8 nodes of Intel Xeon X5660 processors (96 cores in total); we confirmed that the parallelized DFT runs 94 times faster than the sequential execution.


Author(s): Hitoshi Sakagami

Abstract: To make parallel computers convenient, general-purpose tools for computational scientists, a high-level, easy-to-use, and portable parallel programming paradigm is mandatory. XcalableMP, proposed by the XcalableMP Specification Working Group, is a directive-based language extension for Fortran and C that makes it easy to describe parallelization of programs for distributed-memory parallel computers. The Omni XcalableMP compiler, provided as a reference XcalableMP compiler, is currently implemented as a source-to-source translator: it converts XcalableMP programs into standard MPI programs, which can be compiled by the native Fortran compiler and executed on most parallel computers. A three-dimensional Eulerian fluid code written in Fortran is parallelized with XcalableMP using two different programming models based on the ordinary domain decomposition method, and its performance is measured on the K computer. Programs converted by the Omni XcalableMP compiler prevent native Fortran compiler optimizations and show lower performance than hand-coded MPI programs. With specific options of the native Fortran compiler, almost the same performance is eventually obtained in the case of the global-view programming model, but the degradation is not improved by any native compiler options when the code is parallelized with the local-view programming model.


Author(s): Hidetoshi Iwashita, Masahiro Nakao

Abstract: Coarray features have been implemented in the Omni XcalableMP compiler with a source-to-source translator and layered runtime libraries. Three memory allocation methods for coarrays were implemented for the GASNet and MPI-3 communication libraries and for Fujitsu's native interface. For coarray PUT/GET communication, algorithms using DMA (zero-copy) and buffering were introduced. The key techniques for achieving high performance were non-blocking PUT communication, implemented in the runtime library, and an optimization for GET communication in the translator. The fundamental performance was evaluated and analyzed using the ping-pong benchmark and a modified version of it. The MPI version of the Himeno benchmark was ported to a coarray version and modified to make full use of non-blocking PUT. In the evaluation, the non-blocking coarray version clearly outperformed both the original and the non-blocking MPI versions.


Author(s): Hitoshi Murai, Masahiro Nakao, Mitsuhisa Sato

Abstract: XcalableMP (XMP) is a directive-based language extension of Fortran and C for distributed-memory parallel computers, and can be classified as a partitioned global address space (PGAS) language. One of the remarkable characteristics of XMP is that it supports both global-view and local-view parallel programming. This chapter describes the programming model and language specification of XMP.


Author(s): Mitsuhisa Sato, Hitoshi Murai, Masahiro Nakao, Keisuke Tsugane, Tetsuya Odajima, ...

Abstract: This chapter presents XcalableMP on the Fugaku supercomputer, the Japanese flagship supercomputer developed by the FLAGSHIP 2020 project at RIKEN R-CCS. The porting and performance evaluation were done as part of this project, and XcalableMP is available to Fugaku users for improving the productivity and performance of parallel programming. The performance of XcalableMP on Fugaku is enhanced by the many-core processor and the new Tofu-D interconnect. We are now working on the next version, XcalableMP 2.0, targeting cutting-edge high-performance systems with many-core processors through multithreading and multi-tasking, integrating the PGAS model with synchronization models. We conclude this book with retrospectives and challenges for future PGAS models.


Author(s): Masahiro Nakao, Hitoshi Murai

Abstract: This chapter describes the implementation and performance evaluation of the Omni compiler, a reference compiler implementation for XcalableMP. For the performance evaluation, this chapter also presents how the HPC Challenge benchmarks, a benchmark suite for HPC parallel languages, are implemented. The results show that the performance of XMP is comparable to that of MPI in many cases.


Author(s): Masahiro Nakao

Abstract: This chapter presents mixed-language programming with XcalableMP and other programming languages, supported by linkage functions between XcalableMP and the MPI library. We also demonstrate how to call an XcalableMP program from a Python program (M. Nakao et al., Linkage of XcalableMP and Python languages for high productivity on HPC cluster system, Proceedings of Workshops of HPC Asia, no. 9, pp. 39–47, 2018).

