hybrid parallelization Latest Research Papers

This paper proposes an optimal strategy to parallelize the solution of large 3D magneto-quasi-static (MQS) problems by combining the MPI and OpenMP approaches. The numerical problem studied comes from a weak-form integral formulation of an MQS problem and is ultimately cast as a large linear system to be solved by a direct method. For this purpose, two main tasks are identified: the assembly and the inversion of the matrices. The paper focuses on optimizing the resources required for assembling the matrices by exploiting the features of a hybrid OpenMP–MPI approach. Specifically, the job is shared in parallel between clusters of nodes by adopting an OpenMP paradigm at the node level and an MPI one at the process level between nodes. Compared with other solutions, such as pure MPI, this hybrid parallelization optimizes the available resources with respect to speed, allocated memory, and communication between nodes. These advantages are clearly observed in the case studies analyzed in this paper, which come from the study of large plasma fusion machines such as the fusion reactor ITER. Indeed, the MQS problems associated with such applications are characterized by a huge computational cost that requires parallel computing approaches.
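The two-level split described in this abstract (processes between nodes, threads within a node) can be illustrated with a minimal sketch. It is not the paper's implementation: ranks are simulated by a sequential loop rather than real MPI, a thread pool stands in for OpenMP, and `kernel`, `block_range`, and the other names are hypothetical. Python threads do not give CPU parallelism for this kind of loop, so the sketch shows only how the rows are partitioned.

```python
from concurrent.futures import ThreadPoolExecutor


def block_range(n, parts, idx):
    """Return the half-open [start, stop) slice of n items owned by part idx."""
    base, extra = divmod(n, parts)
    start = idx * base + min(idx, extra)
    stop = start + base + (1 if idx < extra else 0)
    return start, stop


def kernel(i, j):
    """Hypothetical stand-in for one matrix entry of an integral formulation."""
    return 1.0 / (1.0 + abs(i - j))


def assemble_rows(n, rows):
    """Assemble the given rows of the dense n x n matrix (one worker's share)."""
    return [[kernel(i, j) for j in range(n)] for i in rows]


def assemble_on_rank(n, rank, n_ranks, n_threads):
    """Process level: this rank owns one contiguous row block.
    Thread level: the block is split again among n_threads workers."""
    start, stop = block_range(n, n_ranks, rank)
    chunks = []
    for t in range(n_threads):
        lo, hi = block_range(stop - start, n_threads, t)
        chunks.append(range(start + lo, start + hi))
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # pool.map keeps the chunk order, so rows come back in order.
        parts = list(pool.map(lambda rows: assemble_rows(n, rows), chunks))
    return [row for part in parts for row in part]


def assemble_hybrid(n, n_ranks, n_threads):
    """Simulate all ranks one after another; with MPI each rank would
    execute only its own call to assemble_on_rank."""
    matrix = []
    for rank in range(n_ranks):
        matrix.extend(assemble_on_rank(n, rank, n_ranks, n_threads))
    return matrix
```

Because each rank touches only its own row block, no data needs to be exchanged during assembly in this sketch; a real code would still have to decide how the assembled blocks are stored for the subsequent direct solve.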

MagIC v5.10: a two-dimensional message-passing interface (MPI) distribution for pseudo-spectral magnetohydrodynamics simulations in spherical geometry

Geoscientific Model Development ◽ 10.5194/gmd-14-7477-2021 ◽ 2021 ◽ Vol 14 (12) ◽ pp. 7477-7495

Author(s): Rafael Lago ◽ Thomas Gastine ◽ Tilman Dannert ◽ Markus Rampp ◽ Johannes Wicht

Keyword(s): Message Passing ◽ Message Passing Interface ◽ Distribution Data ◽ Two Dimensional ◽ Data Layout ◽ Time Step ◽ One Dimensional ◽ Hybrid Parallelization ◽ Dimensional Distribution ◽ Pseudo Spectral

Abstract. We discuss two parallelization schemes for MagIC, an open-source, high-performance, pseudo-spectral code for the numerical solution of the magnetohydrodynamics equations in a rotating spherical shell. MagIC calculates the non-linear terms on a numerical grid in spherical coordinates, while the time step updates are performed on radial grid points with a spherical harmonic representation of the lateral directions. Several transforms are required to switch between the different representations. The established hybrid parallelization of MagIC uses message-passing interface (MPI) distribution in radius and relies on existing fast spherical transforms using OpenMP. Our new two-dimensional MPI decomposition implementation also distributes the latitudes or the azimuthal wavenumbers across the available MPI tasks and compute cores. We discuss several non-trivial algorithmic optimizations and the different data distribution layouts employed by our scheme. In particular, the two-dimensional distribution data layout yields a code that strongly scales well beyond the limit of the current one-dimensional distribution. We also show that the two-dimensional distribution implementation, although not yet fully optimized, can already be faster than the existing finely optimized hybrid parallelization when using many thousands of CPU cores. Our analysis indicates that the two-dimensional distribution variant can be further optimized to also surpass the performance of the one-dimensional distribution for a few thousand cores.
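The difference between a radius-only distribution and a two-dimensional one can be sketched as follows. This is only an illustration under stated assumptions: the grid sizes, the row-major rank ordering, and the names `layout_1d`, `layout_2d`, and `block_range` are hypothetical, and latitude is used as the second distributed direction (the abstract also mentions azimuthal wavenumbers). MagIC's actual data layouts are not reproduced here.

```python
def block_range(n, parts, idx):
    """Return the half-open [start, stop) slice of n items owned by part idx."""
    base, extra = divmod(n, parts)
    start = idx * base + min(idx, extra)
    stop = start + base + (1 if idx < extra else 0)
    return start, stop


def layout_1d(n_r, n_theta, n_ranks):
    """Radius-only distribution: each rank owns a radial slab and all latitudes."""
    if n_ranks > n_r:
        raise ValueError("a 1D layout cannot use more ranks than radial points")
    return {
        rank: (block_range(n_r, n_ranks, rank), (0, n_theta))
        for rank in range(n_ranks)
    }


def layout_2d(n_r, n_theta, p_r, p_theta):
    """Radius x latitude distribution on a p_r x p_theta process grid."""
    if p_r > n_r or p_theta > n_theta:
        raise ValueError("process grid exceeds the number of grid points")
    layout = {}
    for rank in range(p_r * p_theta):
        i_r, i_theta = divmod(rank, p_theta)  # assumed row-major ordering
        layout[rank] = (
            block_range(n_r, p_r, i_r),
            block_range(n_theta, p_theta, i_theta),
        )
    return layout
```

With 4 radial points and 6 latitudes, `layout_1d` accepts at most 4 ranks, whereas `layout_2d(4, 6, 2, 3)` spreads the same grid over 6 ranks; rank 5 then owns radial indices 2 to 3 and latitude indices 4 to 5. This is the sense in which a second distributed direction raises the usable rank count beyond the number of radial points.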

Training Acceleration for Deep Neural Networks: A Hybrid Parallelization Strategy

10.1109/dac18074.2021.9586300 ◽ 2021

Author(s): Zihao Zeng ◽ Chubo Liu ◽ Zhuo Tang ◽ Wanli Chang ◽ Kenli Li

Keyword(s): Neural Networks ◽ Deep Neural Networks ◽ Parallelization Strategy ◽ Hybrid Parallelization

Optimizing the hybrid parallelization of BHAC

Astronomy and Computing ◽ 10.1016/j.ascom.2021.100509 ◽ 2021 ◽ pp. 100509

Author(s): S. Cielo ◽ O. Porth ◽ L. Iapichino ◽ A. Karmakar ◽ H. Olivares ◽ ...

Keyword(s): Hybrid Parallelization

Alternating directions parallel hybrid memory iGRM direct solver for non-stationary simulations

Computer Science ◽ 10.7494/csci.2020.21.4.3834 ◽ 2020 ◽ Vol 21 (4)

Author(s): Maciej Woźniak ◽ Anna Janina Bukowska

Keyword(s): Isogeometric Analysis ◽ Parallel Machines ◽ Sparse Matrix ◽ Computational Cost ◽ Three Dimensional ◽ Modern Method ◽ Basis Functions ◽ Computational Domain ◽ Hybrid Memory ◽ Hybrid Parallelization

Three-dimensional isogeometric analysis (IGA-FEM) is a modern simulation method. The idea is to use B-spline or NURBS basis functions both for describing the computational domain and for the engineering computations. Refined isogeometric analysis (rIGA) employs a mixture of patches of elements with B-spline basis functions and $C^0$ separators between them, which reduces the computational cost of direct solvers. Both IGA and rIGA come with a challenging sparse matrix structure that is expensive to generate. In this paper, we show a hybrid parallelization method that reduces the computational cost of the integration phase on hybrid-memory parallel machines. The two-level parallelization partitions the computational mesh into sub-domains on the first level (MPI) and parallelizes loops on the second level (OpenMP). We show that hybrid parallelization of the integration significantly reduces the contribution of this phase. Thus, alternative algorithms for fast isogeometric integration are not necessary.

petar: a high-performance N-body code for modelling massive collisional stellar systems

Monthly Notices of the Royal Astronomical Society ◽ 10.1093/mnras/staa1915 ◽ 2020 ◽ Vol 497 (1) ◽ pp. 536-555 ◽ Cited By ~ 1

Author(s): Long Wang ◽ Masaki Iwasawa ◽ Keigo Nitadori ◽ Junichiro Makino

Keyword(s): Globular Clusters ◽ High Performance ◽ Computational Time ◽ Stellar Systems ◽ Processing Unit ◽ Desktop Computer ◽ Multiple Systems ◽ Hybrid Parallelization ◽ Good Agreement

ABSTRACT Numerical simulations of massive collisional stellar systems, such as globular clusters (GCs), are very time consuming. Until now, only a few realistic million-body simulations of GCs with a small fraction of binaries (5 per cent) have been performed, using the nbody6++gpu code. Such models took half a year of computational time on a Graphics Processing Unit (GPU)-based supercomputer. In this work, we develop a new N-body code, petar, by combining the methods of Barnes–Hut tree, Hermite integrator and slow-down algorithmic regularization. The code can accurately handle an arbitrary fraction of multiple systems (e.g. binaries and triples) while maintaining high performance through hybrid parallelization with MPI, OpenMP, SIMD instructions and GPU. A few benchmarks indicate that petar and nbody6++gpu agree very well on the long-term evolution of the global structure, binary orbits and escapers. On a highly configured GPU desktop computer, a million-body simulation with all stars in binaries runs 11 times faster with petar than with nbody6++gpu. Moreover, on the Cray XC50 supercomputer, petar scales well as the number of cores increases. The 10 million-body problem, which covers the region of ultracompact dwarfs and nuclear star clusters, becomes solvable.
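Of the three methods named in this abstract, the Hermite integrator is the easiest to show in isolation. The sketch below is a plain fourth-order Hermite predictor-corrector step with direct summation of accelerations and jerks. It is not petar's code: there is no tree, no regularization, no softening, and no parallelism, and the function names and the choice G = 1 are assumptions made for illustration.

```python
import math


def acc_jerk(pos, vel, mass, G=1.0):
    """Direct-summation acceleration and jerk (its time derivative) per particle."""
    n = len(pos)
    acc = [[0.0] * 3 for _ in range(n)]
    jerk = [[0.0] * 3 for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dr = [pos[j][k] - pos[i][k] for k in range(3)]
            dv = [vel[j][k] - vel[i][k] for k in range(3)]
            r2 = sum(c * c for c in dr)
            r3 = r2 * math.sqrt(r2)
            rv = sum(dr[k] * dv[k] for k in range(3))
            for k in range(3):
                acc[i][k] += G * mass[j] * dr[k] / r3
                jerk[i][k] += G * mass[j] * (
                    dv[k] / r3 - 3.0 * rv * dr[k] / (r3 * r2)
                )
    return acc, jerk


def hermite_step(pos, vel, mass, dt):
    """Advance all particles by one shared time step dt."""
    n = len(pos)
    a0, j0 = acc_jerk(pos, vel, mass)
    # Predictor: Taylor expansion using acceleration and jerk.
    xp = [[pos[i][k] + vel[i][k] * dt + a0[i][k] * dt**2 / 2
           + j0[i][k] * dt**3 / 6 for k in range(3)] for i in range(n)]
    vp = [[vel[i][k] + a0[i][k] * dt + j0[i][k] * dt**2 / 2
           for k in range(3)] for i in range(n)]
    # Re-evaluate forces at the predicted state.
    a1, j1 = acc_jerk(xp, vp, mass)
    # Corrector: Hermite interpolation between old and new force values.
    vn = [[vel[i][k] + (a0[i][k] + a1[i][k]) * dt / 2
           + (j0[i][k] - j1[i][k]) * dt**2 / 12
           for k in range(3)] for i in range(n)]
    xn = [[pos[i][k] + (vel[i][k] + vn[i][k]) * dt / 2
           + (a0[i][k] - a1[i][k]) * dt**2 / 12
           for k in range(3)] for i in range(n)]
    return xn, vn


def total_energy(pos, vel, mass, G=1.0):
    """Kinetic plus pairwise potential energy, used as an accuracy check."""
    n = len(pos)
    energy = sum(0.5 * m * sum(c * c for c in v) for m, v in zip(mass, vel))
    for i in range(n):
        for j in range(i + 1, n):
            energy -= G * mass[i] * mass[j] / math.dist(pos[i], pos[j])
    return energy
```

A simple check is to integrate an equal-mass circular binary for one orbital period and compare `total_energy` before and after. The double loop in `acc_jerk` costs O(N²) per step, which is the part a tree method is meant to replace for distant interactions.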

MPI-OpenMP Hybrid Parallelization for Multibody Peridynamic Simulations

Journal of the Computational Structural Engineering Institute of Korea ◽ 10.7734/coseik.2020.33.3.171 ◽ 2020 ◽ Vol 33 (3) ◽ pp. 171-178

Author(s): Seungwoo Lee ◽ Youn Doh Ha

Keyword(s): Hybrid Parallelization

Evaluation of on-the-fly auto-tuning of hybrid parallelization on processors with integrated graphics

2019 4th South-East Europe Design Automation, Computer Engineering, Computer Networks and Social Media Conference (SEEDA-CECNSM) ◽ 10.1109/seeda-cecnsm.2019.8908447 ◽ 2019

Author(s): Akiyoshi Wakatani

Keyword(s): Hybrid Parallelization ◽ Auto Tuning

High-Performance Parallel Simulation of Airflow for Complex Terrain Surface

Modelling and Simulation in Engineering ◽ 10.1155/2019/5231839 ◽ 2019 ◽ Vol 2019 ◽ pp. 1-10

Author(s): Kenji Ono ◽ Takanori Uchida

Keyword(s): Mesh Generation ◽ High Performance ◽ Power Plants ◽ Domain Decomposition Method ◽ Simulation Method ◽ Two Stage ◽ Hybrid Parallelization ◽ Strong Scaling ◽ Computing Performance ◽ Two Stages

It is important to develop a reliable and high-throughput simulation method for predicting airflows in the installation planning phase of windmill power plants. This study proposes a two-stage mesh generation approach to reduce the meshing cost and introduces a hybrid parallelization scheme for atmospheric fluid simulations. The meshing approach splits mesh generation into two stages: in the first stage, the meshing parameters that uniquely determine the mesh distribution are extracted, and in the second stage, a mesh system is generated in parallel via an in situ approach using the parameters obtained in the initialization phase of the simulation. The proposed two-stage approach is flexible since an arbitrary number of processes can be selected at run time. An efficient OpenMP-MPI hybrid parallelization scheme using a middleware that provides a framework of parallel codes based on the domain decomposition method is also developed. The preliminary results of the meshing and computing performance show excellent scalability in the strong scaling test.
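The two-stage idea in this abstract, where a small set of parameters is extracted first and each process then builds its own part of the mesh at start-up, can be sketched for the simplest possible case. The example assumes a uniform one-dimensional mesh of cell centres; `MeshParams`, `extract_params`, and `generate_local_mesh` are hypothetical names, and ranks are simulated by plain function calls rather than MPI processes. The paper's actual mesh parameters and middleware are not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeshParams:
    """Stage 1 output: the few numbers that fix the whole mesh (assumed form)."""
    origin: float
    spacing: float
    n_cells: int


def extract_params(x_min, x_max, n_cells):
    """Stage 1: reduce the mesh description to compact parameters."""
    return MeshParams(origin=x_min, spacing=(x_max - x_min) / n_cells,
                      n_cells=n_cells)


def block_range(n, parts, idx):
    """Return the half-open [start, stop) slice of n items owned by part idx."""
    base, extra = divmod(n, parts)
    start = idx * base + min(idx, extra)
    stop = start + base + (1 if idx < extra else 0)
    return start, stop


def generate_local_mesh(params, rank, n_ranks):
    """Stage 2: a process builds only the cell centres of its own sub-domain."""
    start, stop = block_range(params.n_cells, n_ranks, rank)
    return [params.origin + (i + 0.5) * params.spacing
            for i in range(start, stop)]
```

Since every rank derives its cells from the same parameters and global cell indices, the union of the local meshes is the same for any process count, which is what allows the number of processes to be chosen at run time.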

hybrid parallelization
Recently Published Documents

Hybrid parallelization of molecular dynamics simulations to reduce load imbalance

Fast and Accurate Solution of Integral Formulations of Large MQS Problems Based on Hybrid OpenMP–MPI Parallelization

MagIC v5.10: a two-dimensional message-passing interface (MPI) distribution for pseudo-spectral magnetohydrodynamics simulations in spherical geometry

Training Acceleration for Deep Neural Networks: A Hybrid Parallelization Strategy

Optimizing the hybrid parallelization of BHAC

Alternating directions parallel hybrid memory iGRM direct solver for non-stationary simulations

petar: a high-performance N-body code for modelling massive collisional stellar systems

MPI-OpenMP Hybrid Parallelization for Multibody Peridynamic Simulations

Evaluation of on-the-fly auto-tuning of hybrid parallelization on processors with integrated graphics

High-Performance Parallel Simulation of Airflow for Complex Terrain Surface
