Parallelization of a 3-Dimensional Hydrodynamics Model Using a Hybrid Method with MPI and OpenMP

Processes, 2021, Vol 9 (9), pp. 1548
Author(s): Jung Min Ahn, Hongtae Kim, Jae Gab Cho, Taegu Kang, Yong-seok Kim, ...

Process-based numerical models developed to perform hydraulic/hydrologic/water quality analysis of watersheds and rivers have become highly sophisticated, with a corresponding increase in their computation time. However, for incidents such as water pollution, rapid analysis and decision-making are critical. This paper proposes an optimized parallelization scheme to reduce the computation time of the Environmental Fluid Dynamics Code-National Institute of Environmental Research (EFDC-NIER) model, which has been continuously developed for predicting water pollution and algal blooms in rivers. The existing source code and a parallel computational code using Open Multi-Processing (OpenMP) and the Message Passing Interface (MPI) were optimized, and their computation times were compared. Subsequently, the simulation results of the existing EFDC model and the model with the parallel computation code were compared. Furthermore, the optimal combination for hybrid parallel computation was evaluated by comparing simulation times across numbers of cores and threads. When code parallelization was applied, performance improved by a factor of approximately five compared to the existing source code. Thus, if the parallel computational source code applied in this study is used, urgent decision-making will be easier for events such as water pollution incidents.
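The hybrid pattern the paper evaluates can be illustrated with a minimal sketch: MPI ranks each own a slab of the grid while OpenMP threads split the cell loop within a rank. The code below is not EFDC-NIER source; the cell count, the decay-factor "physics", and all names are illustrative.

```cpp
// Minimal hybrid MPI+OpenMP sketch (not EFDC-NIER code): each MPI rank owns
// a slab of the grid and OpenMP threads split the cell loop within the rank.
#include <mpi.h>
#include <omp.h>
#include <vector>
#include <cstdio>

int main(int argc, char** argv) {
    int provided;
    // Request thread support so OpenMP regions can coexist with MPI calls.
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n_total = 1000000;            // illustrative total cell count
    const int n_local = n_total / size;     // cells owned by this rank
    std::vector<double> c(n_local, 1.0);    // e.g. a tracer concentration

    double local_sum = 0.0;
    // Threads inside the rank split the loop; the decay factor is made up.
    #pragma omp parallel for reduction(+:local_sum)
    for (int i = 0; i < n_local; ++i) {
        c[i] *= 0.99;                       // placeholder "physics" update
        local_sum += c[i];
    }

    // Ranks combine their partial results across the distributed grid.
    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) std::printf("global tracer mass: %f\n", global_sum);

    MPI_Finalize();
    return 0;
}
```

Built with, e.g., mpic++ -fopenmp, the cores-versus-threads trade-off the paper tunes can then be explored by varying the number of ranks and OMP_NUM_THREADS.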

Author(s): Ning Yang, Shiaaulir Wang, Paul Schonfeld

A Parallel Genetic Algorithm (PGA) is used for simulation-based optimization of waterway project schedules. The PGA is designed to distribute a Genetic Algorithm application over multiple processors in order to speed up the solution search for a very large combinatorial problem. The proposed PGA is based on a global parallel model, also called a master-slave model. The Message Passing Interface (MPI) is used in developing the parallel computing program. A case study is presented whose results show how the adaptation of a simulation-based optimization algorithm to parallel computing can greatly reduce computation time. Additional techniques found to further improve PGA performance include: (1) choosing an appropriate task distribution method, (2) distributing simulation replications instead of different solutions, (3) avoiding the simulation of duplicate solutions, (4) avoiding running multiple simulations simultaneously on shared-memory processors, and (5) avoiding using multiple processors that belong to different clusters (physical sub-networks).
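The global (master-slave) model described here maps naturally onto MPI collectives. Below is a minimal, hypothetical sketch, not the authors' program: rank 0 plays the master that owns the population, every rank evaluates an equal slice of solutions with a stand-in fitness function in place of the waterway simulation, and the fitness values are gathered back for the genetic operators.

```cpp
// Master-slave fitness evaluation sketch (illustrative, not the authors'
// code): rank 0 scatters chromosomes, all ranks "simulate" their slice,
// and the resulting fitness values are gathered back to the master.
#include <mpi.h>
#include <cmath>
#include <vector>
#include <cstdio>

// Stand-in for the waterway simulation: any expensive fitness function.
static double evaluate(double gene) { return std::sin(gene) + gene * gene; }

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int pop_per_rank = 4;                      // illustrative slice size
    std::vector<double> population(size * pop_per_rank);
    if (rank == 0)                                   // master builds the population
        for (int i = 0; i < size * pop_per_rank; ++i)
            population[i] = 0.1 * i;

    std::vector<double> my_genes(pop_per_rank), my_fitness(pop_per_rank);
    // Master hands each worker (and itself) an equal slice of solutions.
    MPI_Scatter(population.data(), pop_per_rank, MPI_DOUBLE,
                my_genes.data(),   pop_per_rank, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    for (int i = 0; i < pop_per_rank; ++i)           // each rank simulates its slice
        my_fitness[i] = evaluate(my_genes[i]);

    std::vector<double> fitness(population.size());
    MPI_Gather(my_fitness.data(), pop_per_rank, MPI_DOUBLE,
               fitness.data(),    pop_per_rank, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    if (rank == 0)                                   // master applies GA operators next
        std::printf("fitness of solution 0: %f\n", fitness[0]);

    MPI_Finalize();
    return 0;
}
```

Distributing simulation replications rather than distinct solutions, as recommendation (2) suggests, would change only which work items the scatter carries.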


2015, Vol 8 (3), pp. 473-483
Author(s): I. Honkonen

Abstract. I present a method for developing extensible and modular computational models without sacrificing serial or parallel performance or source code readability. By using a generic simulation cell method I show that it is possible to combine several distinct computational models to run in the same computational grid without requiring modification of existing code. This is an advantage for the development and testing of, e.g., geoscientific software as each submodel can be developed and tested independently and subsequently used without modification in a more complex coupled program. An implementation of the generic simulation cell method presented here, generic simulation cell class (gensimcell), also includes support for parallel programming by allowing model developers to select which simulation variables of, e.g., a domain-decomposed model to transfer between processes via a Message Passing Interface (MPI) library. This allows the communication strategy of a program to be formalized by explicitly stating which variables must be transferred between processes for the correct functionality of each submodel and the entire program. The generic simulation cell class requires a C++ compiler that supports a version of the language standardized in 2011 (C++11). The code is available at https://github.com/nasailja/gensimcell for everyone to use, study, modify and redistribute; those who do are kindly requested to acknowledge and cite this work.
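The idea of the generic cell can be sketched in a few lines of modern C++. The fragment below is a simplified illustration, not the gensimcell API: variables are empty tag types carrying a data_type, one variadic cell type aggregates them in a std::tuple, and std::get-by-type (a C++14 feature, used here for brevity, whereas gensimcell itself targets C++11) retrieves a variable's storage.

```cpp
// Simplified generic-cell sketch (not the gensimcell API): submodels declare
// variables as tag types, and one cell type aggregates them so submodels can
// share a grid without knowing about each other's code.
#include <array>
#include <tuple>
#include <vector>
#include <cstdio>

struct Density       { using data_type = double; };                // submodel A
struct MagneticField { using data_type = std::array<double, 3>; }; // submodel B

template <class... Variables>
struct Cell {
    std::tuple<typename Variables::data_type...> data;

    // Access a variable's storage by its tag type, e.g. cell[Density()].
    template <class Variable>
    typename Variable::data_type& operator[](const Variable&) {
        return std::get<typename Variable::data_type>(data);
    }
};

int main() {
    // One cell type serves both submodels on the same grid.
    std::vector<Cell<Density, MagneticField>> grid(100);
    grid[0][Density()]       = 1.25;
    grid[0][MagneticField()] = {0.0, 0.0, 5e-9};
    std::printf("rho = %f\n", grid[0][Density()]);
    return 0;
}
```

In the real library the variable list also drives the MPI transfer logic, so marking a variable for communication is what formalizes the program's communication strategy.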


2021
Author(s): Oluvaseun Owojaiye

Advancement in technology has brought considerable improvement to processor design, and manufacturers now place multiple processors on a single chip. Supercomputers today consist of clusters of interconnected nodes that collaborate to solve complex and advanced computational problems. Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) are the most widely used programming models for optimizing sequential codes by parallelizing them on the different multiprocessor architectures that exist today. In this thesis, we parallelize the non-slicing floorplan algorithm based on Multilevel Floorplanning/placement of large-scale modules using B*-tree (MB*-tree) with MPI and OpenMP on distributed- and shared-memory architectures, respectively. In VLSI (Very Large Scale Integration) design automation, floorplanning is an initial and vital task performed in the early design stage. Experimental results using MCNC benchmark circuits show that our parallel algorithm produces better results than the corresponding sequential algorithm; we were able to speed up the algorithm by up to 4 times, reducing computation time while maintaining floorplan solution quality. Comparing the two parallel versions, the OpenMP results were slightly better than the corresponding MPI results.
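Although the thesis code is not shown here, the shared-memory flavor of such a search can be sketched as follows: OpenMP threads independently score perturbed floorplan candidates and a min-reduction keeps the best area. The candidate evaluator below is a placeholder for packing an MB*-tree and measuring its bounding box; all names and counts are hypothetical.

```cpp
// Hypothetical OpenMP sketch (not the thesis code): threads each evaluate a
// perturbed floorplan candidate; a min-reduction keeps the smallest area.
#include <random>
#include <cstdio>

// Stand-in for packing an MB*-tree and measuring the bounding-box area.
static double evaluate_candidate(unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> dist(100.0, 200.0);
    return dist(rng);                     // placeholder "area" of this candidate
}

int main() {
    const int n_candidates = 64;          // illustrative number of perturbations
    double best_area = 1e300;

    // Each thread scores a share of the candidates independently.
    #pragma omp parallel for reduction(min : best_area)
    for (int i = 0; i < n_candidates; ++i) {
        const double area = evaluate_candidate(static_cast<unsigned>(i));
        if (area < best_area) best_area = area;
    }

    std::printf("best area found: %f\n", best_area);
    return 0;
}
```

The MPI variant of the same idea would score candidate slices on separate ranks and combine them with MPI_Allreduce using MPI_MIN.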


2014, Vol 20 (4), pp. 477-484
Author(s): Sarfraz Munir, Raja Rizwan Hussain, A. B. M. Saiful Islam

Parallel computing sharply reduces computation time through the simultaneous use of multiple computing resources. In this research, parallel computing techniques have been developed to parallelize a program for obtaining the response of a single-degree-of-freedom (SDOF) structure under earthquake loading. The study uses a Distributed Memory Processor (DMP) hardware architecture and the Message Passing Interface (MPI) to parallelize the program. The program is made parallel by domain decomposition: concurrency is created by dividing the program into two parts that run on different computers, calculating the forced response of the first half and the free response of the second half. The parallel framework successfully creates concurrency and finds structural responses in significantly less time than the sequential program.
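The decomposition described here rests on superposition: the total response is the sum of a forced part and a free-vibration part, so each part can be computed on a separate machine and summed. The following is a hedged two-rank sketch of that idea, not the authors' program; the response expressions are placeholder closed forms and it assumes exactly two MPI ranks.

```cpp
// Two-rank superposition sketch (illustrative, not the authors' program; run
// with exactly 2 MPI ranks): rank 0 evaluates a placeholder forced response,
// rank 1 a placeholder free-vibration response, and rank 0 sums the parts.
#include <mpi.h>
#include <cmath>
#include <vector>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const double pi   = 3.141592653589793;
    const int    n    = 1000;                 // time steps (illustrative)
    const double dt   = 0.01;
    const double wn   = 2.0 * pi;             // natural frequency, made up
    const double zeta = 0.05;                 // damping ratio, made up
    const double wd   = wn * std::sqrt(1.0 - zeta * zeta);

    std::vector<double> part(n, 0.0);
    for (int i = 0; i < n; ++i) {
        const double t = i * dt;
        if (rank == 0)   // forced part: placeholder harmonic response
            part[i] = std::sin(0.8 * wn * t);
        else             // free part: damped oscillation from initial conditions
            part[i] = std::exp(-zeta * wn * t) * std::cos(wd * t);
    }

    // Superpose the two independently computed parts on rank 0.
    std::vector<double> total(n, 0.0);
    MPI_Reduce(part.data(), total.data(), n, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) std::printf("u(t = 0) = %f\n", total[0]);

    MPI_Finalize();
    return 0;
}
```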


2014, Vol 16 (3), pp. 599-611
Author(s): George M. Petrov, Jack Davis

Abstract. The implicit 2D3V particle-in-cell (PIC) code developed to study the interaction of ultrashort pulse lasers with matter [G. M. Petrov and J. Davis, Computer Phys. Comm. 179, 868 (2008); Phys. Plasmas 18, 073102 (2011)] has been parallelized using MPI (Message Passing Interface). The parallelization strategy is optimized for a small number of computer cores, up to about 64. Details on the algorithm implementation are given, with emphasis on code optimization by overlapping computations with communications. Performance evaluation for 1D domain decomposition has been made on a small Linux cluster with 64 computer cores for two typical regimes of PIC operation: "particle dominated", for which the bulk of the computation time is spent on pushing particles, and "field dominated", for which computing the fields is prevalent. For a small number of computer cores, less than 32, the MPI implementation offers a significant numerical speed-up. In the "particle dominated" regime it is close to the maximum theoretical speed-up, while in the "field dominated" regime it is about 75-80% of the maximum. For a number of cores exceeding 32, performance degradation takes place as a result of the adopted 1D domain decomposition. The code parallelization will allow future implementation of atomic physics and extension to three dimensions.
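The overlap of computation with communication highlighted above is typically achieved with nonblocking halo exchanges in a 1D slab decomposition. The sketch below is illustrative rather than taken from the PIC code: ghost cells are exchanged with MPI_Irecv/MPI_Isend, interior cells (which need no remote data) are updated while messages are in flight, and the boundary cells are finished after MPI_Waitall. The stencil is a placeholder.

```cpp
// Nonblocking halo-exchange sketch for a 1D slab decomposition (illustrative,
// not the PIC code): overlap interior updates with neighbor communication.
#include <mpi.h>
#include <vector>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n = 1024;                          // local cells, plus 2 ghost cells
    std::vector<double> f(n + 2, 1.0);           // e.g. one field component
    const int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    const int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    MPI_Request reqs[4];
    // Exchange ghost cells with both neighbors without blocking.
    MPI_Irecv(&f[0],     1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(&f[n + 1], 1, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &reqs[1]);
    MPI_Isend(&f[1],     1, MPI_DOUBLE, left,  1, MPI_COMM_WORLD, &reqs[2]);
    MPI_Isend(&f[n],     1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[3]);

    // Interior cells need no remote data, so compute them during the exchange.
    std::vector<double> g(n + 2, 0.0);
    for (int i = 2; i <= n - 1; ++i)
        g[i] = 0.5 * (f[i - 1] + f[i + 1]);      // placeholder stencil update

    MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);   // ghost cells are now valid
    g[1] = 0.5 * (f[0] + f[2]);                  // finish the boundary cells
    g[n] = 0.5 * (f[n - 1] + f[n + 1]);

    if (rank == 0) std::printf("g[1] = %f\n", g[1]);
    MPI_Finalize();
    return 0;
}
```

With only two neighbors per rank, the communication volume of such a 1D decomposition is fixed while the slab shrinks as ranks are added, which is consistent with the degradation the authors report beyond 32 cores.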


Author(s): Thomas J. Plower, Kevin Manalo, Mireille A. Rowe

Current 3-D reactor burnup simulation codes typically utilize either transport-corrected diffusion theory or Monte Carlo methods to perform the flux calculations necessary for fuel depletion. Monte Carlo codes, particularly the Monte Carlo N-Particle transport code (MCNP) from Los Alamos, have become increasingly popular with the growth of parallel computing. While achieving a criticality eigenvalue is relatively straightforward, run times for large models requiring converged fission sources for proper burnup computation quickly become very long. Additionally, past analyses have shown difficulties in source convergence for lattice problems using Monte Carlo [1]. To provide an alternative means of computing core burnup and decrease computation time for large models, a deterministic tool such as the PENTRAN/PENBURN suite is necessary. PENTRAN is a multi-group, anisotropic Sn code for 3-D Cartesian geometries; it has been specifically designed for distributed-memory, scalable parallel computer architectures using the MPI (Message Passing Interface) library. Automatic domain decomposition among the angular, energy, and spatial variables, an adaptive differencing algorithm, and other numerical enhancements make PENTRAN an extremely robust solver with a 0.975 parallel code fraction (based on Amdahl's law). PENBURN (Parallel Environment BURNup), a recently developed fuel depletion solver, works in conjunction with PENTRAN and performs 3-D zone-based fuel burnup using the direct Bateman chain solution method. The aim of this paper is to demonstrate the capabilities and unique features of the PENTRAN/PENBURN suite through a fuel burnup study on a 3 wt% enriched UO2 fuel pin and a 17×17 Westinghouse OFA assembly.
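For context, the quoted 0.975 parallel fraction can be checked against Amdahl's law, S(N) = 1 / ((1 - p) + p/N), which caps the achievable speed-up at 1/(1 - p) = 40x. The short program below evaluates it at a few arbitrary processor counts; only the fraction p comes from the text.

```cpp
// Amdahl's law check for the quoted 0.975 parallel fraction:
// S(N) = 1 / ((1 - p) + p / N); the asymptotic limit is 1 / (1 - p) = 40x.
#include <cstdio>

int main() {
    const double p = 0.975;                 // parallel fraction from the text
    const int counts[] = {4, 16, 64, 256};  // arbitrary processor counts
    for (int n : counts) {
        const double speedup = 1.0 / ((1.0 - p) + p / n);
        std::printf("N = %3d  ->  speedup %.1fx\n", n, speedup);
    }
    return 0;
}
```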

