hybrid parallelization
Recently Published Documents


Total documents: 62 (five years: 13)
H-index: 8 (five years: 1)

2022 ◽  
Vol 12 (2) ◽  
pp. 627
Author(s):  
Salvatore Ventre ◽  
Francesca Cau ◽  
Andrea Chiariello ◽  
Gaspare Giovinco ◽  
Antonio Maffucci ◽  
...  

This paper proposes an optimal strategy to parallelize the solution of large 3D magneto-quasi-static (MQS) problems by combining the MPI and OpenMP approaches. The numerical problem studied comes from a weak-form integral formulation of an MQS problem and is finally cast as a large linear system to be solved by a direct method. For this purpose, two main tasks are identified: the assembly and the inversion of the matrices. The paper focuses on optimizing the resources required for assembling the matrices by exploiting the features of a hybrid OpenMP–MPI approach. Specifically, the job is shared between clusters of nodes in parallel by adopting an OpenMP paradigm at the node level and an MPI one at the process level between nodes. Compared with other solutions, such as pure MPI, this hybrid parallelization optimizes the use of the available resources with respect to speed, allocated memory, and communication between nodes. These advantages are clearly observed in the case studies analyzed in this paper, which come from the study of large plasma fusion machines such as the fusion reactor ITER. Indeed, the MQS problems associated with such applications are characterized by a huge computational cost that requires parallel computing approaches.
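The two-level split described in this abstract (MPI between nodes, OpenMP inside a node) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: `kernel` is a hypothetical stand-in for one matrix entry, the outer loop over ranks simulates separate MPI processes, and a thread pool stands in for OpenMP threads (so no speed-up should be expected in CPython). It is not the authors' implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def block_range(n_items, n_parts, part):
    """Return the [start, stop) slice of n_items owned by `part` of n_parts."""
    base, extra = divmod(n_items, n_parts)
    start = part * base + min(part, extra)
    stop = start + base + (1 if part < extra else 0)
    return start, stop


def kernel(i, j):
    # Hypothetical stand-in for one entry of the integral-formulation matrix.
    return 1.0 / (1.0 + abs(i - j))


def assemble_rows(rows, n):
    """Assemble the given rows of an n x n matrix."""
    return {i: [kernel(i, j) for j in range(n)] for i in rows}


def assemble_on_rank(rank, n_ranks, n_threads, n):
    """Rows owned by one simulated rank, split again over a pool of threads."""
    r0, r1 = block_range(n, n_ranks, rank)
    local_rows = list(range(r0, r1))
    chunks = []
    for t in range(n_threads):
        s, e = block_range(len(local_rows), n_threads, t)
        chunks.append(local_rows[s:e])
    block = {}
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        for part in pool.map(lambda rows: assemble_rows(rows, n), chunks):
            block.update(part)
    return block


def assemble(n, n_ranks, n_threads):
    """Assemble the full matrix; each loop pass would be one MPI process."""
    matrix = {}
    for rank in range(n_ranks):
        matrix.update(assemble_on_rank(rank, n_ranks, n_threads, n))
    return [matrix[i] for i in range(n)]
```

Because each rank owns a disjoint block of rows, no communication is needed during assembly in this sketch; only the final gather touches data from more than one rank.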


2021 ◽  
Vol 14 (12) ◽  
pp. 7477-7495
Author(s):  
Rafael Lago ◽  
Thomas Gastine ◽  
Tilman Dannert ◽  
Markus Rampp ◽  
Johannes Wicht

Abstract. We discuss two parallelization schemes for MagIC, an open-source, high-performance, pseudo-spectral code for the numerical solution of the magnetohydrodynamics equations in a rotating spherical shell. MagIC calculates the non-linear terms on a numerical grid in spherical coordinates, while the time step updates are performed on radial grid points with a spherical harmonic representation of the lateral directions. Several transforms are required to switch between the different representations. The established hybrid parallelization of MagIC uses message-passing interface (MPI) distribution in radius and relies on existing fast spherical transforms using OpenMP. Our new two-dimensional MPI decomposition implementation also distributes the latitudes or the azimuthal wavenumbers across the available MPI tasks and compute cores. We discuss several non-trivial algorithmic optimizations and the different data distribution layouts employed by our scheme. In particular, the two-dimensional distribution data layout yields a code that strongly scales well beyond the limit of the current one-dimensional distribution. We also show that the two-dimensional distribution implementation, although not yet fully optimized, can already be faster than the existing finely optimized hybrid parallelization when using many thousands of CPU cores. Our analysis indicates that the two-dimensional distribution variant can be further optimized to also surpass the performance of the one-dimensional distribution for a few thousand cores.
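The scaling limit mentioned above can be made concrete with a small sketch of how ranks map to grid blocks. It assumes a hypothetical grid of `n_r` radial points and `n_theta` latitudes and a simple block distribution; it is not MagIC's actual data layout. With a radius-only distribution the task count cannot exceed `n_r`, whereas a `p_r` by `p_theta` process grid can use up to `n_r * n_theta` tasks.

```python
def block_range(n_items, n_parts, part):
    """Return the [start, stop) slice of n_items owned by `part` of n_parts."""
    base, extra = divmod(n_items, n_parts)
    start = part * base + min(part, extra)
    stop = start + base + (1 if part < extra else 0)
    return start, stop


def layout_1d(rank, n_ranks, n_r):
    """Radius-only distribution: each task owns a contiguous radial slab."""
    if n_ranks > n_r:
        raise ValueError("1D distribution cannot use more tasks than radial points")
    return block_range(n_r, n_ranks, rank)


def layout_2d(rank, p_r, p_theta, n_r, n_theta):
    """Radius x latitude distribution on a p_r x p_theta process grid."""
    if not 0 <= rank < p_r * p_theta:
        raise ValueError("rank outside the process grid")
    i_r, i_theta = divmod(rank, p_theta)
    return block_range(n_r, p_r, i_r), block_range(n_theta, p_theta, i_theta)
```

For example, with 9 radial points a 1D layout stops at 9 tasks, while `layout_2d` on a 4 by 3 process grid assigns every one of the 9 x 10 grid points to exactly one of 12 tasks.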


2021 ◽  
pp. 100509
Author(s):  
S. Cielo ◽  
O. Porth ◽  
L. Iapichino ◽  
A. Karmakar ◽  
H. Olivares ◽  
...  

2020 ◽  
Vol 21 (4) ◽  
Author(s):  
Maciej Woźniak ◽  
Anna Janina Bukowska

Three-dimensional isogeometric analysis (IGA-FEM) is a modern simulation method. The idea is to use B-spline or NURBS basis functions for both the description of the computational domain and the engineering computations. Refined isogeometric analysis (rIGA) employs a mixture of patches of elements with B-spline basis functions and $C^0$ separators between them, which reduces the computational cost of direct solvers. Both IGA and rIGA come with a challenging sparse matrix structure that is expensive to generate. In this paper, we show a hybrid parallelization method that reduces the computational cost of the integration phase on hybrid-memory parallel machines. The two-level parallelization consists of partitioning the computational mesh into sub-domains on the first level (MPI) and loop parallelization on the second level (OpenMP). We show that hybrid parallelization of the integration reduces the contribution of this phase significantly. Thus, alternative algorithms for fast isogeometric integration are not necessary.
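The two levels named in this abstract (sub-domains first, an element loop second) can be sketched on a deliberately simplified problem. The assumptions are loud ones: a 1D scalar integrand replaces the 3D B-spline element matrices, two-point Gauss-Legendre quadrature replaces the paper's integration rule, the loop over sub-domains simulates MPI ranks, and a thread pool stands in for the OpenMP loop.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Two-point Gauss-Legendre rule on [-1, 1]; exact for cubic polynomials.
GAUSS_NODES = (-1.0 / math.sqrt(3.0), 1.0 / math.sqrt(3.0))
GAUSS_WEIGHTS = (1.0, 1.0)


def integrate_element(f, a, b):
    """Integrate f over one element [a, b] with the two-point rule."""
    half, mid = 0.5 * (b - a), 0.5 * (a + b)
    return half * sum(w * f(mid + half * x)
                      for x, w in zip(GAUSS_NODES, GAUSS_WEIGHTS))


def integrate_subdomain(f, elements, n_threads):
    """Second level: the element loop of one sub-domain runs on threads."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return sum(pool.map(lambda e: integrate_element(f, e[0], e[1]), elements))


def integrate(f, a, b, n_elements, n_subdomains, n_threads):
    """First level: split the mesh into sub-domains, then reduce partial sums."""
    h = (b - a) / n_elements
    elements = [(a + k * h, a + (k + 1) * h) for k in range(n_elements)]
    size = -(-n_elements // n_subdomains)  # ceiling division
    subdomains = [elements[s:s + size] for s in range(0, n_elements, size)]
    # Each pass of this generator would be one MPI rank in a real run.
    return sum(integrate_subdomain(f, sd, n_threads) for sd in subdomains)
```

The final `sum` over sub-domains plays the role of a reduction across ranks; in an actual element-matrix assembly the partial results would be sparse blocks rather than scalars.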


2020 ◽  
Vol 497 (1) ◽  
pp. 536-555
Author(s):  
Long Wang ◽  
Masaki Iwasawa ◽  
Keigo Nitadori ◽  
Junichiro Makino

ABSTRACT Numerical simulations of massive collisional stellar systems, such as globular clusters (GCs), are very time consuming. Until now, only a few realistic million-body simulations of GCs with a small fraction of binaries (5 per cent) have been performed using the nbody6++gpu code. Such models took half a year of computational time on a Graphics Processing Unit (GPU)-based supercomputer. In this work, we develop a new N-body code, petar, by combining the methods of Barnes–Hut tree, Hermite integrator and slow-down algorithmic regularization. The code can accurately handle an arbitrary fraction of multiple systems (e.g. binaries and triples) while maintaining high performance through hybrid parallelization with MPI, OpenMP, SIMD instructions and GPU. A few benchmarks indicate that petar and nbody6++gpu agree very well on the long-term evolution of the global structure, binary orbits and escapers. On a highly configured GPU desktop computer, a million-body simulation with all stars in binaries runs 11 times faster with petar than with nbody6++gpu. Moreover, on the Cray XC50 supercomputer, petar scales well as the number of cores increases. The 10 million-body problem, which covers the region of ultracompact dwarfs and nuclear star clusters, becomes solvable.
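One of the ingredients named above, the Hermite integrator, can be sketched in isolation. The following is a minimal fourth-order Hermite predictor-corrector step with direct summation, a fixed time step and G = 1, all of which are assumptions for illustration. It omits the Barnes–Hut tree, the slow-down regularization and every parallel layer, so it is not a description of how petar is implemented.

```python
import math


def acc_jerk(pos, vel, mass):
    """Direct-summation acceleration and jerk (G = 1) for every particle."""
    n = len(pos)
    acc = [[0.0] * 3 for _ in range(n)]
    jerk = [[0.0] * 3 for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dr = [pos[j][k] - pos[i][k] for k in range(3)]
            dv = [vel[j][k] - vel[i][k] for k in range(3)]
            r2 = sum(c * c for c in dr)
            r3 = r2 * math.sqrt(r2)
            rv = sum(dr[k] * dv[k] for k in range(3))
            for k in range(3):
                acc[i][k] += mass[j] * dr[k] / r3
                jerk[i][k] += mass[j] * (dv[k] / r3 - 3.0 * rv * dr[k] / (r3 * r2))
    return acc, jerk


def hermite_step(pos, vel, mass, dt):
    """Advance all particles by dt: predict, re-evaluate forces, correct."""
    n = len(pos)
    a0, j0 = acc_jerk(pos, vel, mass)
    # Predictor: Taylor expansion using acceleration and jerk.
    xp = [[pos[i][k] + vel[i][k] * dt + a0[i][k] * dt ** 2 / 2
           + j0[i][k] * dt ** 3 / 6 for k in range(3)] for i in range(n)]
    vp = [[vel[i][k] + a0[i][k] * dt + j0[i][k] * dt ** 2 / 2
           for k in range(3)] for i in range(n)]
    a1, j1 = acc_jerk(xp, vp, mass)
    # Corrector: Hermite interpolation between old and predicted forces.
    v1 = [[vel[i][k] + (a0[i][k] + a1[i][k]) * dt / 2
           + (j0[i][k] - j1[i][k]) * dt ** 2 / 12 for k in range(3)]
          for i in range(n)]
    x1 = [[pos[i][k] + (vel[i][k] + v1[i][k]) * dt / 2
           + (a0[i][k] - a1[i][k]) * dt ** 2 / 12 for k in range(3)]
          for i in range(n)]
    return x1, v1


def total_energy(pos, vel, mass):
    """Kinetic plus pairwise potential energy (G = 1)."""
    kinetic = sum(0.5 * m * sum(c * c for c in v) for m, v in zip(mass, vel))
    potential = 0.0
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            potential -= mass[i] * mass[j] / math.dist(pos[i], pos[j])
    return kinetic + potential
```

A simple sanity check is to integrate an equal-mass circular binary and confirm that `total_energy` stays close to its initial value; the O(N²) force loop in `acc_jerk` is the part that tree methods and parallel hardware are meant to accelerate.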


2019 ◽  
Vol 2019 ◽  
pp. 1-10
Author(s):  
Kenji Ono ◽  
Takanori Uchida

It is important to develop a reliable and high-throughput simulation method for predicting airflows in the installation planning phase of windmill power plants. This study proposes a two-stage mesh generation approach to reduce the meshing cost and introduces a hybrid parallelization scheme for atmospheric fluid simulations. The meshing approach splits mesh generation into two stages: in the first stage, the meshing parameters that uniquely determine the mesh distribution are extracted, and in the second stage, a mesh system is generated in parallel via an in situ approach using the parameters obtained in the initialization phase of the simulation. The proposed two-stage approach is flexible since an arbitrary number of processes can be selected at run time. An efficient OpenMP-MPI hybrid parallelization scheme using a middleware that provides a framework of parallel codes based on the domain decomposition method is also developed. The preliminary results of the meshing and computing performance show excellent scalability in the strong scaling test.
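The two-stage idea in this abstract can be illustrated with a toy example: stage one reduces the mesh to a few parameters, and stage two lets each process rebuild only its own part from them at run time. The sketch assumes a uniform 1D mesh and a block distribution, and the function names are hypothetical; the paper's actual meshes and middleware are not reproduced here.

```python
def block_range(n_items, n_parts, part):
    """Return the [start, stop) slice of n_items owned by `part` of n_parts."""
    base, extra = divmod(n_items, n_parts)
    start = part * base + min(part, extra)
    stop = start + base + (1 if part < extra else 0)
    return start, stop


def extract_mesh_parameters(x_min, x_max, n_cells):
    """Stage 1: keep only the parameters that uniquely determine the mesh."""
    return {"x_min": x_min, "dx": (x_max - x_min) / n_cells, "n_cells": n_cells}


def generate_local_mesh(params, rank, n_procs):
    """Stage 2: one process builds the cell centres of its own slab in situ."""
    start, stop = block_range(params["n_cells"], n_procs, rank)
    return [params["x_min"] + (i + 0.5) * params["dx"] for i in range(start, stop)]


def generate_global_mesh(params):
    """Serial reference: the whole mesh built by a single process."""
    return generate_local_mesh(params, 0, 1)
```

Since only the small parameter set is stored, the same `params` can be reused with any `n_procs` chosen at run time, which mirrors the flexibility claimed for the two-stage approach.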

