Special Issue on Automatic Application Tuning for HPC Architectures

2014 ◽  
Vol 22 (4) ◽  
pp. 259-260 ◽  
Author(s):  
Siegfried Benkner ◽  
Franz Franchetti ◽  
Hans Michael Gerndt ◽  
Jeffrey K. Hollingsworth

High Performance Computing architectures have become incredibly complex and exploiting their full potential is becoming more and more challenging. As a consequence, automatic performance tuning (autotuning) of HPC applications is of growing interest and many research groups around the world are currently involved. Autotuning is still a rapidly evolving research field with many different approaches being taken. This special issue features selected papers presented at the Dagstuhl seminar on “Automatic Application Tuning for HPC Architectures” in October 2013, which brought together researchers from the areas of autotuning and performance analysis in order to exchange ideas and steer future collaborations.

Author(s):  
Thomas M Evans ◽  
Julia C White

Multiphysics coupling presents a significant challenge in terms of both computational accuracy and performance. Achieving high performance on coupled simulations can be particularly challenging in a high-performance computing context. The US Department of Energy Exascale Computing Project has the mission to prepare mission-relevant applications for the delivery of the exascale computers starting in 2023. Many of these applications require multiphysics coupling, and the implementations must be performant on exascale hardware. In this special issue we feature six articles performing advanced multiphysics coupling that span the computational science domains in the Exascale Computing Project.


2020 ◽  
Vol 92 (1) ◽  
pp. 517-527
Author(s):  
Timothy Clements ◽  
Marine A. Denolle

Abstract We introduce SeisNoise.jl, a library for high-performance ambient seismic noise cross correlation, written entirely in the computing language Julia. Julia is a new language, with syntax and a learning curve similar to MATLAB (see Data and Resources), R, or Python and performance close to Fortran or C. SeisNoise.jl is compatible with high-performance computing resources, using both the central processing unit and the graphics processing unit. SeisNoise.jl is a modular toolbox, giving researchers common tools and data structures to design custom ambient seismic cross-correlation workflows in Julia.
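As a rough illustration of the cross-correlation step that such workflows are built around, the following is a minimal time-domain sketch. It is written in Python rather than Julia, it does not use or reproduce the SeisNoise.jl API, and the function names `cross_correlate` and `best_lag` are hypothetical. Production codes typically work in the frequency domain and add preprocessing that is omitted here.

```python
import math


def cross_correlate(a, b, max_lag):
    """Normalized cross-correlation of two equal-rate traces.

    Returns a dict mapping each lag in [-max_lag, max_lag] to the
    correlation value. A positive lag means b is delayed relative to a.
    """
    n = min(len(a), len(b))
    mean_a = sum(a[:n]) / n
    mean_b = sum(b[:n]) / n
    # Remove the mean so a constant offset does not dominate the result.
    a0 = [x - mean_a for x in a[:n]]
    b0 = [x - mean_b for x in b[:n]]
    norm = math.sqrt(sum(x * x for x in a0) * sum(x * x for x in b0))
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                total += a0[i] * b0[j]
        result[lag] = total / norm if norm > 0 else 0.0
    return result


def best_lag(a, b, max_lag):
    """Lag (in samples) at which the correlation is largest."""
    cc = cross_correlate(a, b, max_lag)
    return max(cc, key=cc.get)
```

Given two traces recorded at different stations, `best_lag(trace_a, trace_b, max_lag)` estimates the delay in samples between them. The direct sum costs O(n · max_lag) operations, which is one reason large-scale workflows tend to prefer FFT-based correlation.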


Physics Today ◽  
1993 ◽  
Vol 46 (3) ◽  
pp. 22-22 ◽  
Author(s):  
Steven A. Orszag ◽  
Norman J. Zabusky

Author(s):  
Masaki Iwasawa ◽  
Daisuke Namekata ◽  
Keigo Nitadori ◽  
Kentaro Nomura ◽  
Long Wang ◽  
...  

Abstract We describe algorithms implemented in FDPS (Framework for Developing Particle Simulators) to make efficient use of accelerator hardware such as GPGPUs (general-purpose computing on graphics processing units). We have developed FDPS to make it possible for researchers to develop their own high-performance parallel particle-based simulation programs without spending large amounts of time on parallelization and performance tuning. FDPS provides a high-performance implementation of parallel algorithms for particle-based simulations in a “generic” form, so that researchers can define their own particle data structure and interparticle interaction functions. FDPS compiled with user-supplied data types and interaction functions provides all the necessary functions for parallelization, and researchers can thus write their programs as though they are writing simple non-parallel code. It has previously been possible to use accelerators with FDPS by writing an interaction function that uses the accelerator. However, the efficiency was limited by the latency and bandwidth of communication between the CPU and the accelerator, and also by the mismatch between the available degree of parallelism of the interaction function and that of the hardware parallelism. We have modified the interface of the user-provided interaction functions so that accelerators are more efficiently used. We also implemented new techniques which reduce the amount of work on the CPU side and the amount of communication between CPU and accelerators. We have measured the performance of N-body simulations on a system with an NVIDIA Volta GPGPU using FDPS and the achieved performance is around 27% of the theoretical peak limit. We have constructed a detailed performance model, and found that the current implementation can achieve good performance on systems with much smaller memory and communication bandwidth. Thus, our implementation will be applicable to future generations of accelerator systems.
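The "generic" design described in the abstract, in which the user supplies a particle type and an interaction function and the framework drives the computation, can be sketched roughly as follows. This is an illustrative Python sketch only. It is not the FDPS interface, and `Particle`, `gravity`, `compute_forces`, and the `batch_size` parameter are all hypothetical names. The batching of targets is an assumption meant to suggest how handing more work to each interaction call could reduce per-call overhead; the abstract does not specify the actual mechanism.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Particle:
    """Hypothetical user-defined particle type."""
    mass: float
    pos: List[float]
    acc: List[float] = field(default_factory=lambda: [0.0, 0.0, 0.0])


def gravity(targets, sources, eps=1e-3):
    """User-supplied interaction function (G = 1, softening eps).

    Accumulates the acceleration on every target from every source.
    """
    for pi in targets:
        for pj in sources:
            if pi is pj:
                continue  # skip self-interaction
            d = [pj.pos[k] - pi.pos[k] for k in range(3)]
            r2 = sum(x * x for x in d) + eps * eps
            m_over_r3 = pj.mass / (r2 * r2 ** 0.5)
            for k in range(3):
                pi.acc[k] += m_over_r3 * d[k]


def compute_forces(particles: List[Particle],
                   interaction: Callable,
                   batch_size: int = 2) -> None:
    """Framework side: reset accumulators, then call the user's
    interaction once per batch of targets (direct summation)."""
    for p in particles:
        p.acc = [0.0, 0.0, 0.0]
    for start in range(0, len(particles), batch_size):
        interaction(particles[start:start + batch_size], particles)
```

A caller would build a list of `Particle` objects and invoke `compute_forces(particles, gravity)`. The user code contains no parallelization logic, which is the separation of concerns the abstract describes. A real framework would replace the direct sum with tree-based or domain-decomposed algorithms.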

