Parallel Implementation of the SHYFEM Model

2021
Author(s):
Giorgio Micaletto
Ivano Barletta
Silvia Mocavero
Ivan Federico
Italo Epicoco
...  

Abstract. This paper presents the MPI-based parallelization of the three-dimensional hydrodynamic model SHYFEM (System of HydrodYnamic Finite Element Modules). The original sequential version of the code was parallelized in order to reduce the execution time of high-resolution configurations on state-of-the-art HPC systems. A distributed-memory approach was used, based on the Message Passing Interface (MPI). Optimized numerical libraries were used to partition the unstructured grid (with a focus on load balancing) and to solve the sparse linear system of equations in parallel in the case of semi- to fully implicit time stepping. The parallel implementation of the model was validated by comparing its outputs with those of the sequential version. The performance assessment demonstrates a good level of scalability with a realistic configuration used as a benchmark.
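As a concrete illustration of the distributed-memory approach described above, the following minimal mpi4py sketch scatters a pre-computed element partition and performs a toy ghost exchange; the partition layout, variable names, and the single ghost value are hypothetical stand-ins, not SHYFEM's actual data structures.

```python
# Minimal sketch (not SHYFEM code): rank 0 distributes a pre-computed element
# partition of an unstructured grid, and each rank then performs a toy ghost
# exchange with its neighbours, as done every time step in a halo scheme.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

if rank == 0:
    n_elems = 1000
    owner = np.arange(n_elems) % size                        # toy round-robin partition;
    parts = [np.where(owner == r)[0] for r in range(size)]   # a real run would use METIS-style balancing
else:
    parts = None
my_elems = comm.scatter(parts, root=0)                       # local subdomain of elements

# One value per owned element, plus a single ghost value from the "left" rank.
state = np.random.rand(my_elems.size)
left, right = (rank - 1) % size, (rank + 1) % size
ghost = np.empty(1)
comm.Sendrecv(state[-1:], dest=right, recvbuf=ghost, source=left)
print(f"rank {rank}: {my_elems.size} elements, ghost value {ghost[0]:.3f}")
```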

Author(s):  
Amanda Bienz ◽  
William D Gropp ◽  
Luke N Olson

Algebraic multigrid (AMG) is often viewed as a scalable O(n) solver for sparse linear systems. Yet AMG lacks parallel scalability due to increasingly large costs associated with communication, both in the initial construction of a multigrid hierarchy and in the iterative solve phase. This work introduces a parallel implementation of AMG that reduces the cost of communication, yielding improved parallel scalability. It is common in Message Passing Interface (MPI) codes, particularly in the MPI-everywhere approach, to arrange inter-process communication so that messages are transported regardless of the location of the sending and receiving processes. Performance tests show notable differences in the cost of intra-node and inter-node communication, motivating a restructuring of communication. In this case, the communication schedule takes advantage of the less costly intra-node communication, reducing both the number and the size of inter-node messages. Node-centric communication is extended to a range of components in both the setup and solve phases of AMG, improving both the weak and strong scaling of the entire method.
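The node-aware idea can be illustrated with a short mpi4py sketch: ranks on the same node are grouped with a shared-memory communicator, intra-node data is aggregated onto one leader rank per node, and only the leaders exchange inter-node messages. This is a sketch of the concept only, not the authors' AMG implementation, and all variable names are hypothetical.

```python
# Concept sketch of node-aware communication (not the authors' AMG code):
# group ranks by node, aggregate data on one leader per node, and let only
# the leaders exchange inter-node messages.
from mpi4py import MPI
import numpy as np

world = MPI.COMM_WORLD
node = world.Split_type(MPI.COMM_TYPE_SHARED)        # ranks sharing a node
node_rank = node.Get_rank()

# Leaders (one rank per node) get their own communicator; all other ranks
# receive MPI.COMM_NULL and skip the inter-node step.
color = 0 if node_rank == 0 else MPI.UNDEFINED
leaders = world.Split(color, world.Get_rank())

local_data = np.full(4, world.Get_rank(), dtype=np.float64)
node_payload = node.gather(local_data, root=0)       # cheap intra-node step
if node_rank == 0:
    packed = np.concatenate(node_payload)            # one bigger payload per node
    all_nodes = leaders.allgather(packed)            # inter-node step: fewer, larger messages
```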


2005
Vol 19 (28n29)
pp. 1483-1486
Author(s):  
HAI-QING SI ◽  
TONG-GUANG WANG ◽  
XIAO-YUN LUO

A fully implicit unfactored algorithm for the three-dimensional Euler equations is developed and tested on multi-block curvilinear meshes. The convective terms are discretized using an upwind TVD scheme. The large sparse linear system generated at each implicit time step is solved by the GMRES* method combined with a block incomplete lower-upper (ILU) preconditioner. In order to reduce the memory requirements and the matrix-vector operation count, an approximate method is used to derive the Jacobian matrix, which costs only half the computational effort of the exact Jacobian calculation. The comparison between the numerical results and the experimental data shows good agreement, which demonstrates that the implicit algorithm presented is effective and efficient.
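A minimal SciPy sketch of this kind of implicit-step solve is shown below: GMRES preconditioned with an incomplete LU factorization applied to a sparse system. The toy tridiagonal matrix stands in for the Jacobian of the discretized Euler equations, and SciPy's gmres/spilu are stand-ins for the paper's GMRES* solver and block ILU preconditioner.

```python
# Sketch of an implicit-step linear solve with SciPy stand-ins: a sparse
# system solved by GMRES with an incomplete-LU preconditioner. The toy
# tridiagonal matrix replaces the Euler-equation Jacobian of the paper.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 500
A = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

ilu = spla.spilu(A, drop_tol=1e-4)                   # incomplete LU factorization
M = spla.LinearOperator((n, n), matvec=ilu.solve)    # wrap it as a preconditioner

x, info = spla.gmres(A, b, M=M, restart=30)          # restarted, preconditioned GMRES
print("converged" if info == 0 else f"gmres returned info={info}")
```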


Author(s):  
Carlos Teijeiro ◽  
Thomas Hammerschmidt ◽  
Ralf Drautz ◽  
Godehard Sutmann

Analytic bond-order potentials (BOPs) make it possible to obtain a highly accurate description of interatomic interactions at a reasonable computational cost. However, for simulations with very large systems, the high memory demands require a parallel implementation, which at the same time also optimizes the use of computational resources. The calculations of analytic BOPs are performed for a restricted volume around every atom and have therefore been shown to be well suited to a Message Passing Interface (MPI) parallelization using a domain decomposition scheme, in which one process manages one large domain using the entire memory of a compute node. Building on this approach, the present work focuses on analyzing and enhancing its performance on shared memory by using OpenMP threads on each MPI process, in order to use many cores per node to speed up computations and minimize memory bottlenecks. Different algorithms are described and their corresponding performance results are presented, showing significant performance gains for highly parallel systems with hybrid MPI/OpenMP simulations using up to several thousand threads.
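The hybrid layout described above (one MPI process per domain, many threads within each process) can be sketched as follows; a Python thread pool stands in for the OpenMP threads, and the per-atom workload is a toy placeholder rather than an actual analytic BOP evaluation.

```python
# Sketch of the hybrid layout (not the BOP code): one MPI rank per spatial
# domain, many threads inside each rank. A Python thread pool stands in for
# OpenMP; per_atom_energy is a toy placeholder for the local BOP evaluation.
from concurrent.futures import ThreadPoolExecutor
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_atoms_total = 10_000
my_atoms = np.array_split(np.arange(n_atoms_total), size)[rank]  # toy domain decomposition

def per_atom_energy(i):
    # Placeholder for the volume-restricted analytic BOP evaluation of atom i.
    return float(np.sin(i) ** 2)

with ThreadPoolExecutor(max_workers=8) as pool:      # "OpenMP threads" per rank
    local_energy = sum(pool.map(per_atom_energy, my_atoms))

total_energy = comm.allreduce(local_energy, op=MPI.SUM)
if rank == 0:
    print(f"toy total energy: {total_energy:.2f}")
```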


Author(s):  
Peng Wen ◽  
Wei Qiu

This paper presents the further development of a numerical simulation method for solving 3-D highly non-linear slamming problems using parallel computing algorithms. The water entry problems are treated as multi-phase problems (solid, water and air) governed by the Navier-Stokes (N-S) equations and are solved with the three-dimensional constrained interpolation profile (CIP) method. The interfaces between different phases are captured using density functions. In the computation, the 3-D CIP method is employed for the advection phase of the N-S equations and a pressure-based algorithm is applied for the non-advection phase. The bi-conjugate gradient stabilized method (BiCGSTAB) is utilized to solve the linear equation systems. A Message Passing Interface (MPI) parallel computing scheme was implemented in the computations, using a three-dimensional Cartesian decomposition of the computational domain. The speed-up performance of various decomposition schemes was studied. Validation studies were carried out for the water entry of a 3-D wedge and a 3-D ship section with prescribed velocities. The computed slamming forces, pressure distributions and free-surface elevations are compared with experimental results and numerical results obtained by other methods.
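The two parallel ingredients named above, a three-dimensional Cartesian decomposition of the domain and a BiCGSTAB solve of the linear systems, can be sketched with mpi4py and SciPy as follows; the local matrix is a toy example, not the pressure system of the CIP solver.

```python
# Sketch of the two parallel ingredients (toy data, not the CIP solver):
# a 3-D Cartesian decomposition of the MPI processes and a BiCGSTAB solve.
from mpi4py import MPI
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

comm = MPI.COMM_WORLD
dims = MPI.Compute_dims(comm.Get_size(), [0, 0, 0])  # balanced Px x Py x Pz grid
cart = comm.Create_cart(dims, periods=[False, False, False])
coords = cart.Get_coords(cart.Get_rank())            # this rank's block indices
src, dst = cart.Shift(0, 1)                          # neighbours along the x direction

# Toy local system standing in for the pressure-type system of the
# non-advection phase, solved with BiCGSTAB.
n = 200
A = sp.diags([-1.0, 4.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)
x, info = spla.bicgstab(A, b)
print(f"rank {cart.Get_rank()} at block {coords}, x-neighbours ({src}, {dst}), solver info {info}")
```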


2021
Author(s):  
Oluvaseun Owojaiye

Advancements in technology have brought considerable improvements to processor design, and manufacturers now place multiple processor cores on a single chip. Supercomputers today consist of clusters of interconnected nodes that collaborate to solve complex and advanced computational problems. Message Passing Interface (MPI) and Open Multiprocessing (OpenMP) are popular programming models for optimizing sequential codes by parallelizing them on the different multiprocessor architectures that exist today. In this thesis, we parallelize the non-slicing floorplan algorithm based on Multilevel Floorplanning/placement of large-scale modules using B*tree (MB*tree) with MPI and OpenMP on distributed- and shared-memory architectures, respectively. In VLSI (Very Large Scale Integration) design automation, floorplanning is an initial and vital task performed in the early design stage. Experimental results using MCNC benchmark circuits show that our parallel algorithm produced better results than the corresponding sequential algorithm; we were able to speed up the algorithm by up to 4 times, reducing computation time while maintaining floorplan solution quality. We also compared the two parallel versions; the OpenMP version gave slightly better results than the corresponding MPI version.
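A highly simplified sketch of the distributed search pattern is given below: each MPI rank evaluates its own set of candidate perturbations and the best cost is reduced globally. The MB*tree data structure and its actual perturbation moves are not reproduced here; evaluate_candidate is a hypothetical stand-in.

```python
# Highly simplified sketch of the parallel search pattern (the MB*tree data
# structure and its perturbation moves are not reproduced): each MPI rank
# evaluates its own stream of candidate floorplans and the best cost is
# reduced across ranks. evaluate_candidate is a hypothetical stand-in.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
rng = np.random.default_rng(seed=rank)               # independent search stream per rank

def evaluate_candidate(perturbation):
    # Placeholder cost; real code would rebuild the floorplan from a perturbed
    # MB*tree and return a weighted sum of area and wirelength.
    return float(np.sum(perturbation ** 2))

local_best = min(evaluate_candidate(rng.random(10)) for _ in range(100))
global_best = comm.allreduce(local_best, op=MPI.MIN)
if rank == 0:
    print(f"best cost over all ranks: {global_best:.4f}")
```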


2019
Author(s):  
Alex M. Ascension ◽  
Marcos J. Araúzo-Bravo

Abstract. Big Data analysis is a discipline with a growing number of areas where huge amounts of data are extracted and analyzed. Parallelization in Python integrates the Message Passing Interface via the mpi4py module. Since mpi4py does not support parallelization of objects greater than 2^31 bytes, we developed BigMPI4py, a Python module that wraps mpi4py and supports object sizes beyond this limit. BigMPI4py automatically determines the optimal object distribution strategy and also uses vectorized methods, achieving higher parallelization efficiency. BigMPI4py facilitates the implementation of Python for Big Data applications in multicore workstations and HPC systems. We validated BigMPI4py on whole genome bisulfite sequencing (WGBS) DNA methylation ENCODE data of 59 samples from 27 human tissues. We categorized the samples by the three germ layers and developed a parallel implementation of the Kruskal-Wallis test to find CpGs with differential methylation across germ layers. We observed a differentiation of the germ layers, with one set of hypermethylated genes in ectoderm- and mesoderm-related tissues and another set in endoderm-related tissues. The parallel evaluation of the significance of 55 million CpGs achieved a 22x speedup with 25 cores. BigMPI4py is available at https://gitlab.com/alexmascension/bigmpi4py and the Jupyter Notebook with the WGBS analysis at https://gitlab.com/alexmascension/wgbs-analysis
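The limitation addressed by BigMPI4py can be illustrated conceptually: a large array is communicated in chunks whose sizes stay below the 2^31-byte boundary of a single call. The sketch below shows the general idea with plain mpi4py and is not BigMPI4py's actual interface; chunked_bcast and CHUNK are hypothetical names.

```python
# Concept sketch only (not BigMPI4py's interface): communicate a large array
# in chunks so that each message stays below the 2^31-byte / 32-bit-count
# limit of a single mpi4py call. chunked_bcast and CHUNK are hypothetical names.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
CHUNK = 2**27                                        # elements per piece (1 GiB of float64)

def chunked_bcast(arr, root=0):
    """Broadcast a large 1-D float64 array piecewise."""
    n = comm.bcast(arr.shape[0] if rank == root else None, root=root)
    if rank != root:
        arr = np.empty(n, dtype=np.float64)
    for start in range(0, n, CHUNK):
        comm.Bcast(arr[start:start + CHUNK], root=root)   # buffer-based, no pickling
    return arr

data = np.random.rand(1_000_000) if rank == 0 else np.empty(0)
data = chunked_bcast(data, root=0)
```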


2006
Vol 14 (04)
pp. 445-467
Author(s):  
MARC BERNACKI ◽  
SERGE PIPERNO

In this paper, we present a dissipation-free time-domain discontinuous Galerkin method for the transient solution of the three-dimensional linearized Euler equations around a steady-state solution. In the general context of a nonuniform supporting flow, we prove, using the well-known symmetrization of the Euler equations, that an aeroacoustic energy satisfies a balance equation with a source term at the continuous level, and that our numerical framework satisfies an equivalent balance equation at the discrete level and is genuinely dissipation-free. In the case of ℙ1 Lagrange basis functions and tetrahedral unstructured meshes, a parallel implementation of the method has been developed, based on message passing and mesh partitioning. Three-dimensional numerical results confirm the theoretical properties of the method. They include test cases where Kelvin–Helmholtz instabilities appear.
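For orientation, the balance referred to above has the generic shape sketched in the following LaTeX fragment: the time derivative of an energy functional is balanced by a boundary flux and a volume source term tied to the nonuniform mean flow. This is only the generic form, not the paper's exact aeroacoustic energy or its discrete counterpart.

```latex
% Generic energy balance (sketch only; E, F and S stand for the aeroacoustic
% energy density, its flux and the mean-flow source term, all hypothetical
% placeholders for the quantities derived in the paper).
\frac{\mathrm{d}}{\mathrm{d}t}\int_{\Omega}\mathcal{E}\,\mathrm{d}x
  = -\oint_{\partial\Omega}\mathcal{F}\cdot\mathbf{n}\,\mathrm{d}s
    + \int_{\Omega}\mathcal{S}\,\mathrm{d}x
```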

