Hybrid Parallelization Techniques

Author(s): Takahiro Katagiri
2015 · Vol. 18 (4) · pp. 1363-1377
Author(s): Davi Teodoro Fernandes, Liang-Yee Cheng, Eric Henrique Favero, Kazuo Nishimoto

2009 · Vol. 58 (5) · pp. 1071-1080
Author(s): Vincent Heuveline, Mathias J. Krause, Jonas Latt

Author(s): Dorian Krause, Mark Potse, Thomas Dickopf, Rolf Krause, Angelo Auricchio, ...

2020 · Vol. 497 (1) · pp. 536-555
Author(s): Long Wang, Masaki Iwasawa, Keigo Nitadori, Junichiro Makino

ABSTRACT The numerical simulation of massive collisional stellar systems, such as globular clusters (GCs), is very time consuming. Until now, only a few realistic million-body simulations of GCs with a small fraction of binaries (5 per cent) have been performed, using the nbody6++gpu code. Such models took half a year of computing time on a Graphics Processing Unit (GPU)-based supercomputer. In this work, we develop a new N-body code, petar, by combining the Barnes–Hut tree, Hermite integrator, and slow-down algorithmic regularization methods. The code can accurately handle an arbitrary fraction of multiple systems (e.g. binaries and triples) while maintaining high performance through hybrid parallelization with mpi, openmp, simd instructions, and GPU. Benchmarks indicate that petar and nbody6++gpu agree very well on the long-term evolution of the global structure, binary orbits, and escapers. On a high-end GPU desktop computer, a million-body simulation with all stars in binaries runs 11 times faster with petar than with nbody6++gpu. Moreover, petar scales well on the Cray XC50 supercomputer as the number of cores increases. This makes the 10-million-body problem, which covers the regime of ultracompact dwarfs and nuclear star clusters, tractable.
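The short-range/long-range split at the heart of such tree-plus-direct hybrids can be illustrated with a toy calculation. The sketch below is not petar's algorithm (petar combines a Barnes–Hut tree with Hermite integration and slow-down regularization); it only shows the basic idea of summing close pairs directly while approximating distant particles by their centre of mass. All coordinates, masses, and the cutoff radius are invented:

```python
def accel_direct(pos, mass, i):
    """Exact acceleration on particle i from all others (softening-free toy, G = 1)."""
    ax = ay = 0.0
    for j, (m, (xj, yj)) in enumerate(zip(mass, pos)):
        if j == i:
            continue
        dx, dy = xj - pos[i][0], yj - pos[i][1]
        r3 = (dx * dx + dy * dy) ** 1.5
        ax += m * dx / r3
        ay += m * dy / r3
    return ax, ay

def accel_hybrid(pos, mass, i, r_cut):
    """Close neighbours summed directly; everything beyond r_cut lumped into
    a single centre-of-mass pseudo-particle (the crudest possible tree node)."""
    ax = ay = 0.0
    far_m = far_mx = far_my = 0.0
    for j, (m, (xj, yj)) in enumerate(zip(mass, pos)):
        if j == i:
            continue
        dx, dy = xj - pos[i][0], yj - pos[i][1]
        r2 = dx * dx + dy * dy
        if r2 < r_cut * r_cut:            # close pair: direct sum
            r3 = r2 ** 1.5
            ax += m * dx / r3
            ay += m * dy / r3
        else:                             # distant: accumulate the monopole
            far_m += m
            far_mx += m * xj
            far_my += m * yj
    if far_m > 0.0:                       # add the centre-of-mass contribution
        cx, cy = far_mx / far_m, far_my / far_m
        dx, dy = cx - pos[i][0], cy - pos[i][1]
        r3 = (dx * dx + dy * dy) ** 1.5
        ax += far_m * dx / r3
        ay += far_m * dy / r3
    return ax, ay
```

For a tight binary perturbed by a distant clump, the hybrid sum reproduces the direct sum very closely, because the dominant close-pair term is still computed exactly and only the small far-field term is approximated.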


Author(s): Carmen Cotelo Queijo, Anders Gomez Tato, Ignacio Lopez Cabido, Jose Manuel Cotos Yanez

2019 · Vol. 2019 · pp. 1-10
Author(s): Kenji Ono, Takanori Uchida

It is important to develop a reliable, high-throughput simulation method for predicting airflows during the installation planning phase of wind power plants. This study proposes a two-stage mesh generation approach to reduce meshing cost and introduces a hybrid parallelization scheme for atmospheric fluid simulations. The meshing approach splits mesh generation into two stages: in the first stage, the meshing parameters that uniquely determine the mesh distribution are extracted; in the second stage, a mesh system is generated in parallel via an in situ approach, using the parameters obtained in the initialization phase of the simulation. The proposed two-stage approach is flexible, since an arbitrary number of processes can be selected at run time. An efficient OpenMP-MPI hybrid parallelization scheme is also developed, using middleware that provides a framework for parallel codes based on the domain decomposition method. Preliminary results on meshing and computing performance show excellent scalability in a strong scaling test.
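The two-stage idea, extracting a small parameter set once and then letting each process build its own subdomain mesh in situ, can be sketched in a few lines. This is a hypothetical 1-D uniform-grid illustration, not the authors' mesh generator; the function names and the block-decomposition rule are invented:

```python
def extract_mesh_params(xmin, xmax, n_cells):
    """Stage 1: derive the few parameters that uniquely determine the mesh."""
    return {"xmin": xmin, "dx": (xmax - xmin) / n_cells, "n_cells": n_cells}

def generate_local_mesh(params, rank, n_procs):
    """Stage 2: each process builds only its own subdomain's cell centres,
    in situ, from the shared parameters (1-D block decomposition)."""
    n, dx, x0 = params["n_cells"], params["dx"], params["xmin"]
    # Block decomposition: spread the remainder cells over the first ranks.
    base, rem = divmod(n, n_procs)
    lo = rank * base + min(rank, rem)
    hi = lo + base + (1 if rank < rem else 0)
    return [x0 + (i + 0.5) * dx for i in range(lo, hi)]
```

Because stage 1 produces only a handful of scalars, the number of processes in stage 2 can be chosen freely at run time, and concatenating the local meshes in rank order reproduces the global mesh exactly.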


2000 · Vol. 8 (1) · pp. 5-12
Author(s): John Michalakes

Beginning with the March 1998 release of the Penn State University/NCAR Mesoscale Model (MM5), and continuing through eight subsequent releases up to the present, the official version has run on distributed-memory (DM) parallel computers. Source translation and runtime library support minimize the impact of parallelization on the original model source code, with the result that the majority of the code is line-for-line identical with the original version. Parallel performance and scaling are equivalent to earlier, hand-parallelized versions, and the modifications have no effect when the code is compiled and run without the DM option. Supported computers include the IBM SP, Cray T3E, Fujitsu VPP, Compaq Alpha clusters, and clusters of PCs (so-called Beowulf clusters). The approach is also compatible with shared-memory parallel directives, allowing distributed-memory/shared-memory hybrid parallelization on distributed-memory clusters of symmetric multiprocessors.
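The two-level decomposition described above, distributed-memory patches each further divided into shared-memory tiles, can be sketched as follows. This is a hypothetical Python illustration (MM5 itself is Fortran, parallelized with message passing and shared-memory directives): patches stand in for DM processes, a thread pool stands in for shared-memory threads, and the toy stencil is row-local so no halo exchange is needed:

```python
from concurrent.futures import ThreadPoolExecutor

def smooth_row(row):
    """Toy per-row stencil: 3-point average with clamped ends."""
    n = len(row)
    return [(row[max(i - 1, 0)] + row[i] + row[min(i + 1, n - 1)]) / 3.0
            for i in range(n)]

def process_patch(patch, n_threads=2):
    """Shared-memory level: threads work on tiles (here, rows) within one
    patch, standing in for compiler directives inside a single DM process."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(smooth_row, patch))

def run_hybrid(grid, n_patches=2, n_threads=2):
    """Distributed-memory level: split the grid row-wise into patches (each
    patch would live on its own process), then recombine the results."""
    size = (len(grid) + n_patches - 1) // n_patches
    patches = [grid[i:i + size] for i in range(0, len(grid), size)]
    out = []
    for p in patches:                 # executed serially here; ranks in real code
        out.extend(process_patch(p, n_threads))
    return out
```

A real model would also exchange halo rows between neighbouring patches before each step; here the stencil touches only one row, so the patch boundaries need no communication and the hybrid result matches a serial sweep exactly.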

