Portable multi-node LQCD Monte Carlo simulations using OpenACC

2018 ◽  
Vol 29 (01) ◽  
pp. 1850010 ◽  
Author(s):  
Claudio Bonati ◽  
Enrico Calore ◽  
Massimo D’Elia ◽  
Michele Mesiti ◽  
Francesco Negro ◽  
...  

This paper describes a state-of-the-art parallel Lattice QCD Monte Carlo code for staggered fermions, purposely designed to be portable across different computer architectures, including GPUs and commodity CPUs. Portability is achieved with the OpenACC parallel programming model, which allows the same code to be compiled for several processor architectures. The paper focuses on parallelization across multiple computing nodes, using OpenACC to manage parallelism within each node and OpenMPI to manage parallelism among the nodes. We first discuss the strategies available to maximize performance, then describe selected relevant details of the code, and finally measure the performance and scaling we are able to achieve. The work focuses mainly on GPUs, which offer significantly higher performance for this application, but also compares with results measured on other processors.
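Splitting the lattice among nodes is the step that lets MPI handle inter-node parallelism while OpenACC handles the loop over sites inside each node. The sketch below is a minimal illustration, assuming a 1D "slab" decomposition along a single lattice direction with `lt` slices; this scheme and every name in it (`Slab`, `slab_for_rank`, `halo`) are hypothetical and are not taken from the paper. Each rank owns a contiguous block of slices, allocates extra halo slices on both sides, and identifies its periodic neighbours.

```c
/* Hypothetical 1D slab decomposition (illustrative assumption, not the
   paper's scheme): lt lattice slices along one direction are split across
   nranks nodes, with halo slices on both sides of each local block. */
typedef struct {
    int first;      /* global index of the first slice owned by this rank */
    int count;      /* number of slices owned by this rank */
    int prev_rank;  /* neighbour owning the slices just below (periodic) */
    int next_rank;  /* neighbour owning the slices just above (periodic) */
    int alloc;      /* slices allocated locally: owned + halo on each side */
} Slab;

Slab slab_for_rank(int lt, int nranks, int rank, int halo)
{
    Slab s;
    int base = lt / nranks;
    int extra = lt % nranks;  /* the first `extra` ranks get one more slice */

    s.count = base + (rank < extra ? 1 : 0);
    s.first = rank * base + (rank < extra ? rank : extra);
    s.prev_rank = (rank + nranks - 1) % nranks;  /* wraps 0 -> nranks-1 */
    s.next_rank = (rank + 1) % nranks;           /* wraps nranks-1 -> 0 */
    s.alloc = s.count + 2 * halo;
    return s;
}
```

In a real multi-node code the halo slices would be refreshed before each stencil application by exchanging boundary slices with `prev_rank` and `next_rank`, for example with `MPI_Sendrecv`; that exchange is omitted here so the sketch stays self-contained. Splitting along one direction keeps each node with only two neighbours, at the cost of a surface-to-volume ratio that grows as more nodes are added.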


2015 ◽  
Vol 44 (4) ◽  
pp. 832-866 ◽  
Author(s):  
Ren Li ◽  
Haibo Hu ◽  
Heng Li ◽  
Yunsong Wu ◽  
Jianxi Yang


2021 ◽  
Vol 24 (1) ◽  
pp. 157-183
Author(s):  
Никита Андреевич Катаев

Automation of parallel programming is important at every stage of parallel program development. These stages include profiling of the original program, program transformation (which allows higher performance to be achieved after parallelization), and, finally, construction and optimization of the parallel program. It is also important to choose a suitable parallel programming model to express the parallelism available in a program. On the one hand, the parallel programming model should be capable of mapping the parallel program onto a variety of existing hardware resources. On the other hand, it should simplify the development of assistant tools and allow the user to explore, in a semi-automatic way, the parallel program these tools generate. The SAPFOR (System FOR Automated Parallelization) system combines various approaches to the automation of parallel programming. Moreover, it allows the user to guide the parallelization when necessary. SAPFOR produces parallel programs according to the high-level DVMH parallel programming model, which simplifies the development of efficient parallel programs for heterogeneous computing clusters. This paper focuses on the approach to semi-automatic parallel programming that SAPFOR implements. We discuss the architecture of the system and present the interactive subsystem, which is useful for guiding SAPFOR through program parallelization. We used the interactive subsystem to parallelize programs from the NAS Parallel Benchmarks in a semi-automatic way. Finally, we compare the performance of manually written parallel programs with that of programs built by the SAPFOR system.



2018 ◽  
Vol 175 ◽  
pp. 09008
Author(s):  
Claudio Bonati ◽  
Enrico Calore ◽  
Simone Coscetti ◽  
Massimo D’Elia ◽  
Michele Mesiti ◽  
...  

Varying from multi-core CPU processors to many-core GPUs, the present scenario of HPC architectures is extremely heterogeneous. In this context, code portability is increasingly important for easy maintainability of applications; this is relevant in scientific computing where code changes are numerous and frequent. In this talk we present the design and optimization of a state-of-the-art production level LQCD Monte Carlo application, using the OpenACC directives model. OpenACC aims to abstract parallel programming to a descriptive level, where programmers do not need to specify the mapping of the code on the target machine. We describe the OpenACC implementation and show that the same code is able to target different architectures, including state-of-the-art CPUs and GPUs.



2017 ◽  
Vol 28 (05) ◽  
pp. 1750063 ◽  
Author(s):  
Claudio Bonati ◽  
Simone Coscetti ◽  
Massimo D’Elia ◽  
Michele Mesiti ◽  
Francesco Negro ◽  
...  

The present panorama of HPC architectures is extremely heterogeneous, ranging from traditional multi-core CPU processors, which support a wide class of applications but deliver moderate computing performance, to many-core Graphics Processing Units (GPUs), which exploit aggressive data parallelism and deliver higher performance for streaming computing applications. In this scenario, code portability (and performance portability) becomes necessary for easy maintainability of applications; this is very relevant in scientific computing, where code changes are very frequent and keeping different code versions aligned is tedious and error-prone. In this work, we present the design and optimization of a state-of-the-art production-level LQCD Monte Carlo application, using the directive-based OpenACC programming model. OpenACC abstracts parallel programming to a descriptive level, relieving programmers from specifying how code should be mapped onto the target architecture. We describe the implementation of a code fully written in OpenACC, and show that we are able to target several different architectures, including state-of-the-art traditional CPUs and GPUs, with the same code. We also measure performance, evaluating the computing efficiency of our OpenACC code on several architectures and comparing it with GPU-specific implementations, showing that a good level of performance portability can be reached.
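The "descriptive" style of OpenACC can be illustrated with a single annotated loop. The sketch below is a toy, assuming a 1D periodic nearest-neighbour stencil with real scalars as a stand-in for the staggered Dirac operator; a real LQCD kernel works on 4D sites with SU(3) link matrices and complex colour vectors. The function name `toy_stencil` and the `mass` parameter are hypothetical. The directive states that iterations are independent and which data must move to the device; it does not say how to map them onto threads, which is left to the compiler for each target.

```c
#include <stddef.h>

/* Toy stencil (illustrative only): out[i] = mass*in[i] + 0.5*(in[i+1] - in[i-1])
   with periodic boundaries, standing in for a lattice Dirac operator. */
void toy_stencil(size_t n, double mass,
                 const double *restrict in, double *restrict out)
{
    /* Describe the parallelism and data movement; the compiler decides the
       mapping (GPU gangs/vectors or CPU threads). Compilers built without
       OpenACC support ignore the pragma and run the loop serially. */
    #pragma acc parallel loop copyin(in[0:n]) copyout(out[0:n])
    for (size_t i = 0; i < n; ++i) {
        size_t ip = (i + 1) % n;      /* forward neighbour, periodic */
        size_t im = (i + n - 1) % n;  /* backward neighbour, periodic */
        out[i] = mass * in[i] + 0.5 * (in[ip] - in[im]);
    }
}
```

The same source can be built, for example, with an OpenACC-capable compiler targeting a GPU or a multicore CPU, or with a plain C99 compiler for a serial reference, which is the kind of single-source portability the abstract describes.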



1996 ◽  
Vol 118 (2) ◽  
pp. 388-393 ◽  
Author(s):  
J. Zaworski ◽  
J. R. Welty ◽  
B. J. Palmer ◽  
M. K. Drost

The spatial distribution of light through a rectangular gap bounded by highly reflective, diffuse surfaces was measured and compared with the results of Monte Carlo simulations. Incorporating radiant properties for real surfaces into a Monte Carlo code was seen to be a significant problem; a number of techniques for accomplishing this are discussed. Independent results are reported for measured values of the bidirectional reflectance distribution function over incident polar angles from 0 to 90 deg for a semidiffuse surface treatment (Krylon™ flat white spray paint). The inclusion of this information into a Monte Carlo simulation yielded various levels of agreement with experimental results. The poorest agreement occurred when the incident radiation was at a grazing angle with respect to the surface and the reflectance was nearly specular.



2016 ◽  
Vol 43 ◽  
pp. 95-103 ◽  
Author(s):  
James A. Ross ◽  
David A. Richie ◽  
Song J. Park ◽  
Dale R. Shires


Author(s):  
Françoise Baude ◽  
Guy Vidal-Naquet

