Parallelization of a 3-Dimensional Hydrodynamics Model Using a Hybrid Method with MPI and OpenMP

Processes, 2021, Vol 9 (9), pp. 1548
Author(s): Jung Min Ahn, Hongtae Kim, Jae Gab Cho, Taegu Kang, Yong-seok Kim, ...

Process-based numerical models developed to perform hydraulic/hydrologic/water quality analysis of watersheds and rivers have become highly sophisticated, with a corresponding increase in their computation time. However, for incidents such as water pollution, rapid analysis and decision-making are critical. This paper proposes an optimized parallelization scheme to reduce the computation time of the Environmental Fluid Dynamics Code-National Institute of Environmental Research (EFDC-NIER) model, which has been continuously developed for predicting water pollution and algal blooms in rivers. The existing source code and a parallel computational code using Open Multi-Processing (OpenMP) and the Message Passing Interface (MPI) were optimized, and their computation times were compared. Subsequently, the simulation results of the existing EFDC model and the model with the parallel computation code were compared. Furthermore, the optimal combination for hybrid parallel computation was evaluated by comparing simulation times across numbers of cores and threads. When code parallelization was applied, performance improved by a factor of approximately five compared to the existing source code. Thus, if the parallel computational source code applied in this study is used, urgent decision-making will be easier for events such as water pollution incidents.
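The hybrid pattern the paper evaluates can be illustrated with a minimal sketch: MPI ranks each own a slab of the grid while OpenMP threads split the cell loop within a rank. The code below is not EFDC-NIER source; the cell count, the decay-factor "physics", and all names are illustrative.

```cpp
// Minimal hybrid MPI+OpenMP sketch (not EFDC-NIER code): each MPI rank owns
// a slab of the grid and OpenMP threads split the cell loop within the rank.
#include <mpi.h>
#include <omp.h>
#include <vector>
#include <cstdio>

int main(int argc, char** argv) {
    int provided;
    // Request thread support so OpenMP regions can coexist with MPI calls.
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n_total = 1000000;            // illustrative total cell count
    const int n_local = n_total / size;     // cells owned by this rank
    std::vector<double> c(n_local, 1.0);    // e.g. a tracer concentration

    double local_sum = 0.0;
    // Threads inside the rank split the loop; the decay factor is made up.
    #pragma omp parallel for reduction(+:local_sum)
    for (int i = 0; i < n_local; ++i) {
        c[i] *= 0.99;                       // placeholder "physics" update
        local_sum += c[i];
    }

    // Ranks combine their partial results across the distributed grid.
    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) std::printf("global tracer mass: %f\n", global_sum);

    MPI_Finalize();
    return 0;
}
```

Built with, e.g., mpic++ -fopenmp, the cores-versus-threads trade-off the paper tunes can then be explored by varying the number of ranks and OMP_NUM_THREADS.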

Author(s): Ning Yang, Shiaaulir Wang, Paul Schonfeld

A Parallel Genetic Algorithm (PGA) is used for simulation-based optimization of waterway project schedules. The PGA is designed to distribute a Genetic Algorithm application over multiple processors in order to speed up the solution search for a very large combinatorial problem. The proposed PGA is based on a global parallel model, also called a master-slave model. The Message Passing Interface (MPI) is used in developing the parallel computing program. A case study is presented whose results show how the adaptation of a simulation-based optimization algorithm to parallel computing can greatly reduce computation time. Additional techniques found to further improve PGA performance include: (1) choosing an appropriate task distribution method, (2) distributing simulation replications instead of different solutions, (3) avoiding the simulation of duplicate solutions, (4) avoiding running multiple simulations simultaneously on shared-memory processors, and (5) avoiding using multiple processors that belong to different clusters (physical sub-networks).
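The global (master-slave) model described here maps naturally onto MPI collectives. Below is a minimal, hypothetical sketch, not the authors' program: rank 0 plays the master that owns the population, every rank evaluates an equal slice of solutions with a stand-in fitness function in place of the waterway simulation, and the fitness values are gathered back for the genetic operators.

```cpp
// Master-slave fitness evaluation sketch (illustrative, not the authors'
// code): rank 0 scatters chromosomes, all ranks "simulate" their slice,
// and the resulting fitness values are gathered back to the master.
#include <mpi.h>
#include <cmath>
#include <vector>
#include <cstdio>

// Stand-in for the waterway simulation: any expensive fitness function.
static double evaluate(double gene) { return std::sin(gene) + gene * gene; }

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int pop_per_rank = 4;                      // illustrative slice size
    std::vector<double> population(size * pop_per_rank);
    if (rank == 0)                                   // master builds the population
        for (int i = 0; i < size * pop_per_rank; ++i)
            population[i] = 0.1 * i;

    std::vector<double> my_genes(pop_per_rank), my_fitness(pop_per_rank);
    // Master hands each worker (and itself) an equal slice of solutions.
    MPI_Scatter(population.data(), pop_per_rank, MPI_DOUBLE,
                my_genes.data(),   pop_per_rank, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    for (int i = 0; i < pop_per_rank; ++i)           // each rank simulates its slice
        my_fitness[i] = evaluate(my_genes[i]);

    std::vector<double> fitness(population.size());
    MPI_Gather(my_fitness.data(), pop_per_rank, MPI_DOUBLE,
               fitness.data(),    pop_per_rank, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    if (rank == 0)                                   // master applies GA operators next
        std::printf("fitness of solution 0: %f\n", fitness[0]);

    MPI_Finalize();
    return 0;
}
```

Distributing simulation replications rather than distinct solutions, as recommendation (2) suggests, would change only which work items the scatter carries.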


2015, Vol 8 (3), pp. 473-483
Author(s): I. Honkonen

Abstract. I present a method for developing extensible and modular computational models without sacrificing serial or parallel performance or source code readability. By using a generic simulation cell method I show that it is possible to combine several distinct computational models to run in the same computational grid without requiring modification of existing code. This is an advantage for the development and testing of, e.g., geoscientific software as each submodel can be developed and tested independently and subsequently used without modification in a more complex coupled program. An implementation of the generic simulation cell method presented here, generic simulation cell class (gensimcell), also includes support for parallel programming by allowing model developers to select which simulation variables of, e.g., a domain-decomposed model to transfer between processes via a Message Passing Interface (MPI) library. This allows the communication strategy of a program to be formalized by explicitly stating which variables must be transferred between processes for the correct functionality of each submodel and the entire program. The generic simulation cell class requires a C++ compiler that supports a version of the language standardized in 2011 (C++11). The code is available at https://github.com/nasailja/gensimcell for everyone to use, study, modify and redistribute; those who do are kindly requested to acknowledge and cite this work.
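The idea of the generic cell can be sketched in a few lines of modern C++. The fragment below is a simplified illustration, not the gensimcell API: variables are empty tag types carrying a data_type, one variadic cell type aggregates them in a std::tuple, and std::get-by-type (a C++14 feature, used here for brevity, whereas gensimcell itself targets C++11) retrieves a variable's storage.

```cpp
// Simplified generic-cell sketch (not the gensimcell API): submodels declare
// variables as tag types, and one cell type aggregates them so submodels can
// share a grid without knowing about each other's code.
#include <array>
#include <tuple>
#include <vector>
#include <cstdio>

struct Density       { using data_type = double; };                // submodel A
struct MagneticField { using data_type = std::array<double, 3>; }; // submodel B

template <class... Variables>
struct Cell {
    std::tuple<typename Variables::data_type...> data;

    // Access a variable's storage by its tag type, e.g. cell[Density()].
    template <class Variable>
    typename Variable::data_type& operator[](const Variable&) {
        return std::get<typename Variable::data_type>(data);
    }
};

int main() {
    // One cell type serves both submodels on the same grid.
    std::vector<Cell<Density, MagneticField>> grid(100);
    grid[0][Density()]       = 1.25;
    grid[0][MagneticField()] = {0.0, 0.0, 5e-9};
    std::printf("rho = %f\n", grid[0][Density()]);
    return 0;
}
```

In the real library the variable list also drives the MPI transfer logic, so marking a variable for communication is what formalizes the program's communication strategy.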


2021
Author(s): Oluvaseun Owojaiye

Advancement in technology has brought considerable improvement to processor design, and manufacturers now place multiple processors on a single chip. Supercomputers today consist of clusters of interconnected nodes that collaborate to solve complex and advanced computational problems. Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) are the most widely used programming models for optimizing sequential codes by parallelizing them on the different multiprocessor architectures that exist today. In this thesis, we parallelize the non-slicing floorplan algorithm based on Multilevel Floorplanning/placement of large-scale modules using B*-tree (MB*-tree) with MPI and OpenMP on distributed- and shared-memory architectures, respectively. In VLSI (Very Large Scale Integration) design automation, floorplanning is an initial and vital task performed in the early design stage. Experimental results using MCNC benchmark circuits show that our parallel algorithm produces better results than the corresponding sequential algorithm; we were able to speed up the algorithm by up to 4 times, reducing computation time while maintaining floorplan solution quality. Comparing the two parallel versions, the OpenMP results were slightly better than the corresponding MPI results.
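Although the thesis code is not shown here, the shared-memory flavor of such a search can be sketched as follows: OpenMP threads independently score perturbed floorplan candidates and a min-reduction keeps the best area. The candidate evaluator below is a placeholder for packing an MB*-tree and measuring its bounding box; all names and counts are hypothetical.

```cpp
// Hypothetical OpenMP sketch (not the thesis code): threads each evaluate a
// perturbed floorplan candidate; a min-reduction keeps the smallest area.
#include <random>
#include <cstdio>

// Stand-in for packing an MB*-tree and measuring the bounding-box area.
static double evaluate_candidate(unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> dist(100.0, 200.0);
    return dist(rng);                     // placeholder "area" of this candidate
}

int main() {
    const int n_candidates = 64;          // illustrative number of perturbations
    double best_area = 1e300;

    // Each thread scores a share of the candidates independently.
    #pragma omp parallel for reduction(min : best_area)
    for (int i = 0; i < n_candidates; ++i) {
        const double area = evaluate_candidate(static_cast<unsigned>(i));
        if (area < best_area) best_area = area;
    }

    std::printf("best area found: %f\n", best_area);
    return 0;
}
```

The MPI variant of the same idea would score candidate slices on separate ranks and combine them with MPI_Allreduce using MPI_MIN.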


2014, Vol 20 (4), pp. 477-484
Author(s): Sarfraz Munir, Raja Rizwan Hussain, A. B. M. Saiful Islam

Parallel computing sharply reduces computation time through the simultaneous use of multiple computing resources. In this research, parallel computing techniques have been developed to parallelize a program for obtaining the response of a single-degree-of-freedom (SDOF) structure under earthquake loading. The study uses a Distributed Memory Processor (DMP) hardware architecture and the Message Passing Interface (MPI) to parallelize the program. The program is made parallel by domain decomposition: concurrency is created by dividing the program into two parts that run on different computers, calculating the forced response of the first half and the free response of the second half. The parallel framework successfully creates concurrency and finds structural responses in significantly less time than the sequential program.
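The decomposition described here rests on superposition: the total response is the sum of a forced part and a free-vibration part, so each part can be computed on a separate machine and summed. The following is a hedged two-rank sketch of that idea, not the authors' program; the response expressions are placeholder closed forms and it assumes exactly two MPI ranks.

```cpp
// Two-rank superposition sketch (illustrative, not the authors' program; run
// with exactly 2 MPI ranks): rank 0 evaluates a placeholder forced response,
// rank 1 a placeholder free-vibration response, and rank 0 sums the parts.
#include <mpi.h>
#include <cmath>
#include <vector>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const double pi   = 3.141592653589793;
    const int    n    = 1000;                 // time steps (illustrative)
    const double dt   = 0.01;
    const double wn   = 2.0 * pi;             // natural frequency, made up
    const double zeta = 0.05;                 // damping ratio, made up
    const double wd   = wn * std::sqrt(1.0 - zeta * zeta);

    std::vector<double> part(n, 0.0);
    for (int i = 0; i < n; ++i) {
        const double t = i * dt;
        if (rank == 0)   // forced part: placeholder harmonic response
            part[i] = std::sin(0.8 * wn * t);
        else             // free part: damped oscillation from initial conditions
            part[i] = std::exp(-zeta * wn * t) * std::cos(wd * t);
    }

    // Superpose the two independently computed parts on rank 0.
    std::vector<double> total(n, 0.0);
    MPI_Reduce(part.data(), total.data(), n, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) std::printf("u(t = 0) = %f\n", total[0]);

    MPI_Finalize();
    return 0;
}
```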


2014, Vol 16 (3), pp. 599-611
Author(s): George M. Petrov, Jack Davis

Abstract. The implicit 2D3V particle-in-cell (PIC) code developed to study the interaction of ultrashort pulse lasers with matter [G. M. Petrov and J. Davis, Computer Phys. Comm. 179, 868 (2008); Phys. Plasmas 18, 073102 (2011)] has been parallelized using MPI (Message Passing Interface). The parallelization strategy is optimized for a small number of computer cores, up to about 64. Details on the algorithm implementation are given, with emphasis on code optimization by overlapping computations with communications. Performance evaluation for 1D domain decomposition has been made on a small Linux cluster with 64 computer cores for two typical regimes of PIC operation: "particle dominated", for which the bulk of the computation time is spent on pushing particles, and "field dominated", for which computing the fields is prevalent. For a small number of computer cores, less than 32, the MPI implementation offers a significant numerical speed-up. In the "particle dominated" regime it is close to the maximum theoretical speed-up, while in the "field dominated" regime it is about 75-80% of the maximum. For a number of cores exceeding 32, performance degradation takes place as a result of the adopted 1D domain decomposition. The code parallelization will allow future implementation of atomic physics and extension to three dimensions.
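The overlap of computation with communication highlighted above is typically achieved with nonblocking halo exchanges in a 1D slab decomposition. The sketch below is illustrative rather than taken from the PIC code: ghost cells are exchanged with MPI_Irecv/MPI_Isend, interior cells (which need no remote data) are updated while messages are in flight, and the boundary cells are finished after MPI_Waitall. The stencil is a placeholder.

```cpp
// Nonblocking halo-exchange sketch for a 1D slab decomposition (illustrative,
// not the PIC code): overlap interior updates with neighbor communication.
#include <mpi.h>
#include <vector>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n = 1024;                          // local cells, plus 2 ghost cells
    std::vector<double> f(n + 2, 1.0);           // e.g. one field component
    const int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    const int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    MPI_Request reqs[4];
    // Exchange ghost cells with both neighbors without blocking.
    MPI_Irecv(&f[0],     1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(&f[n + 1], 1, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &reqs[1]);
    MPI_Isend(&f[1],     1, MPI_DOUBLE, left,  1, MPI_COMM_WORLD, &reqs[2]);
    MPI_Isend(&f[n],     1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[3]);

    // Interior cells need no remote data, so compute them during the exchange.
    std::vector<double> g(n + 2, 0.0);
    for (int i = 2; i <= n - 1; ++i)
        g[i] = 0.5 * (f[i - 1] + f[i + 1]);      // placeholder stencil update

    MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);   // ghost cells are now valid
    g[1] = 0.5 * (f[0] + f[2]);                  // finish the boundary cells
    g[n] = 0.5 * (f[n - 1] + f[n + 1]);

    if (rank == 0) std::printf("g[1] = %f\n", g[1]);
    MPI_Finalize();
    return 0;
}
```

With only two neighbors per rank, the communication volume of such a 1D decomposition is fixed while the slab shrinks as ranks are added, which is consistent with the degradation the authors report beyond 32 cores.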


Author(s): Thomas J. Plower, Kevin Manalo, Mireille A. Rowe

Current 3-D reactor burnup simulation codes typically utilize either transport-corrected diffusion theory or Monte Carlo methods to perform the flux calculations necessary for fuel depletion. Monte Carlo codes, particularly the Monte Carlo N-Particle transport code (MCNP) from Los Alamos, have become increasingly popular with the growth of parallel computing. While achieving a criticality eigenvalue is relatively straightforward, run times for large models requiring converged fission sources for proper burnup computation quickly become very long. Additionally, past analyses have shown difficulties in source convergence for lattice problems using Monte Carlo [1]. To provide an alternative means of computing core burnup and decrease computation time for large models, a deterministic tool such as the PENTRAN/PENBURN suite is necessary. PENTRAN is a multi-group, anisotropic Sn code for 3-D Cartesian geometries; it has been specifically designed for distributed-memory, scalable parallel computer architectures using the MPI (Message Passing Interface) library. Automatic domain decomposition among the angular, energy, and spatial variables, an adaptive differencing algorithm, and other numerical enhancements make PENTRAN an extremely robust solver with a 0.975 parallel code fraction (based on Amdahl's law). PENBURN (Parallel Environment BURNup), a recently developed fuel depletion solver, works in conjunction with PENTRAN and performs 3-D zone-based fuel burnup using the direct Bateman chain solution method. The aim of this paper is to demonstrate the capabilities and unique features of the PENTRAN/PENBURN suite through a fuel burnup study on a 3 wt% enriched UO2 fuel pin and a 17×17 Westinghouse OFA assembly.
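For context, the quoted 0.975 parallel fraction can be checked against Amdahl's law, S(N) = 1 / ((1 - p) + p/N), which caps the achievable speed-up at 1/(1 - p) = 40x. The short program below evaluates it at a few arbitrary processor counts; only the fraction p comes from the text.

```cpp
// Amdahl's law check for the quoted 0.975 parallel fraction:
// S(N) = 1 / ((1 - p) + p / N); the asymptotic limit is 1 / (1 - p) = 40x.
#include <cstdio>

int main() {
    const double p = 0.975;                 // parallel fraction from the text
    const int counts[] = {4, 16, 64, 256};  // arbitrary processor counts
    for (int n : counts) {
        const double speedup = 1.0 / ((1.0 - p) + p / n);
        std::printf("N = %3d  ->  speedup %.1fx\n", n, speedup);
    }
    return 0;
}
```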

