Parallel Implementation of the SHYFEM Model

2021
Author(s):
Giorgio Micaletto
Ivano Barletta
Silvia Mocavero
Ivan Federico
Italo Epicoco
...  

Abstract. This paper presents the MPI-based parallelization of the three-dimensional hydrodynamic model SHYFEM (System of HydrodYnamic Finite Element Modules). The original sequential version of the code was parallelized in order to reduce the execution time of high-resolution configurations on state-of-the-art HPC systems. A distributed-memory approach was used, based on the Message Passing Interface (MPI). Optimized numerical libraries were used to partition the unstructured grid (with a focus on load balancing) and to solve the sparse linear system of equations in parallel in the case of semi- to fully implicit time stepping. The parallel implementation of the model was validated by comparing its outputs with those of the sequential version. The performance assessment demonstrates a good level of scalability with a realistic configuration used as a benchmark.
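As a concrete illustration of the distributed-memory approach described above, the following minimal mpi4py sketch scatters a pre-computed element partition and performs a toy ghost exchange; the partition layout, variable names, and the single ghost value are hypothetical stand-ins, not SHYFEM's actual data structures.

```python
# Minimal sketch (not SHYFEM code): rank 0 distributes a pre-computed element
# partition of an unstructured grid, and each rank then performs a toy ghost
# exchange with its neighbours, as done every time step in a halo scheme.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

if rank == 0:
    n_elems = 1000
    owner = np.arange(n_elems) % size                        # toy round-robin partition;
    parts = [np.where(owner == r)[0] for r in range(size)]   # a real run would use METIS-style balancing
else:
    parts = None
my_elems = comm.scatter(parts, root=0)                       # local subdomain of elements

# One value per owned element, plus a single ghost value from the "left" rank.
state = np.random.rand(my_elems.size)
left, right = (rank - 1) % size, (rank + 1) % size
ghost = np.empty(1)
comm.Sendrecv(state[-1:], dest=right, recvbuf=ghost, source=left)
print(f"rank {rank}: {my_elems.size} elements, ghost value {ghost[0]:.3f}")
```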

Author(s):  
Amanda Bienz ◽  
William D Gropp ◽  
Luke N Olson

Algebraic multigrid (AMG) is often viewed as a scalable O(n) solver for sparse linear systems. Yet AMG lacks parallel scalability due to increasingly large costs associated with communication, both in the initial construction of a multigrid hierarchy and in the iterative solve phase. This work introduces a parallel implementation of AMG that reduces the cost of communication, yielding improved parallel scalability. It is common in Message Passing Interface (MPI) codes, particularly in the MPI-everywhere approach, to arrange inter-process communication so that messages are transported regardless of the location of the sending and receiving processes. Performance tests show notable differences in the cost of intra-node and inter-node communication, motivating a restructuring of communication. In this case, the communication schedule takes advantage of the less costly intra-node communication, reducing both the number and the size of inter-node messages. Node-centric communication is extended to a range of components in both the setup and solve phases of AMG, improving both the weak and strong scaling of the entire method.
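The node-aware idea can be illustrated with a short mpi4py sketch: ranks on the same node are grouped with a shared-memory communicator, intra-node data is aggregated onto one leader rank per node, and only the leaders exchange inter-node messages. This is a sketch of the concept only, not the authors' AMG implementation, and all variable names are hypothetical.

```python
# Concept sketch of node-aware communication (not the authors' AMG code):
# group ranks by node, aggregate data on one leader per node, and let only
# the leaders exchange inter-node messages.
from mpi4py import MPI
import numpy as np

world = MPI.COMM_WORLD
node = world.Split_type(MPI.COMM_TYPE_SHARED)        # ranks sharing a node
node_rank = node.Get_rank()

# Leaders (one rank per node) get their own communicator; all other ranks
# receive MPI.COMM_NULL and skip the inter-node step.
color = 0 if node_rank == 0 else MPI.UNDEFINED
leaders = world.Split(color, world.Get_rank())

local_data = np.full(4, world.Get_rank(), dtype=np.float64)
node_payload = node.gather(local_data, root=0)       # cheap intra-node step
if node_rank == 0:
    packed = np.concatenate(node_payload)            # one bigger payload per node
    all_nodes = leaders.allgather(packed)            # inter-node step: fewer, larger messages
```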


2005
Vol 19 (28n29)
pp. 1483-1486
Author(s):  
HAI-QING SI ◽  
TONG-GUANG WANG ◽  
XIAO-YUN LUO

A fully implicit unfactored algorithm for the three-dimensional Euler equations is developed and tested on multi-block curvilinear meshes. The convective terms are discretized using an upwind TVD scheme. The large sparse linear system generated at each implicit time step is solved by the GMRES* method combined with a block incomplete lower-upper (ILU) preconditioner. In order to reduce the memory requirements and the matrix-vector operation count, an approximate method is used to derive the Jacobian matrix, which costs only half the computational effort of the exact Jacobian calculation. The comparison between the numerical results and the experimental data shows good agreement, which demonstrates that the implicit algorithm presented is effective and efficient.
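A minimal SciPy sketch of this kind of implicit-step solve is shown below: GMRES preconditioned with an incomplete LU factorization applied to a sparse system. The toy tridiagonal matrix stands in for the Jacobian of the discretized Euler equations, and SciPy's gmres/spilu are stand-ins for the paper's GMRES* solver and block ILU preconditioner.

```python
# Sketch of an implicit-step linear solve with SciPy stand-ins: a sparse
# system solved by GMRES with an incomplete-LU preconditioner. The toy
# tridiagonal matrix replaces the Euler-equation Jacobian of the paper.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 500
A = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

ilu = spla.spilu(A, drop_tol=1e-4)                   # incomplete LU factorization
M = spla.LinearOperator((n, n), matvec=ilu.solve)    # wrap it as a preconditioner

x, info = spla.gmres(A, b, M=M, restart=30)          # restarted, preconditioned GMRES
print("converged" if info == 0 else f"gmres returned info={info}")
```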


Author(s):  
Carlos Teijeiro ◽  
Thomas Hammerschmidt ◽  
Ralf Drautz ◽  
Godehard Sutmann

Analytic bond-order potentials (BOPs) make it possible to obtain a highly accurate description of interatomic interactions at a reasonable computational cost. However, for simulations with very large systems, the high memory demands require a parallel implementation, which at the same time also optimizes the use of computational resources. The calculations of analytic BOPs are performed for a restricted volume around every atom and have therefore been shown to be well suited to a Message Passing Interface (MPI) parallelization using a domain decomposition scheme, in which one process manages one large domain using the entire memory of a compute node. Building on this approach, the present work focuses on analyzing and enhancing its performance on shared memory by using OpenMP threads on each MPI process, in order to use many cores per node to speed up computations and minimize memory bottlenecks. Different algorithms are described and their corresponding performance results are presented, showing significant performance gains for highly parallel systems with hybrid MPI/OpenMP simulations using up to several thousand threads.
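The hybrid layout described above (one MPI process per domain, many threads within each process) can be sketched as follows; a Python thread pool stands in for the OpenMP threads, and the per-atom workload is a toy placeholder rather than an actual analytic BOP evaluation.

```python
# Sketch of the hybrid layout (not the BOP code): one MPI rank per spatial
# domain, many threads inside each rank. A Python thread pool stands in for
# OpenMP; per_atom_energy is a toy placeholder for the local BOP evaluation.
from concurrent.futures import ThreadPoolExecutor
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_atoms_total = 10_000
my_atoms = np.array_split(np.arange(n_atoms_total), size)[rank]  # toy domain decomposition

def per_atom_energy(i):
    # Placeholder for the volume-restricted analytic BOP evaluation of atom i.
    return float(np.sin(i) ** 2)

with ThreadPoolExecutor(max_workers=8) as pool:      # "OpenMP threads" per rank
    local_energy = sum(pool.map(per_atom_energy, my_atoms))

total_energy = comm.allreduce(local_energy, op=MPI.SUM)
if rank == 0:
    print(f"toy total energy: {total_energy:.2f}")
```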


Author(s):  
Peng Wen ◽  
Wei Qiu

This paper presents the further development of a numerical simulation method for solving 3-D highly non-linear slamming problems using parallel computing algorithms. The water entry problems are treated as multi-phase problems (solid, water and air) governed by the Navier-Stokes (N-S) equations and are solved with the three-dimensional constrained interpolation profile (CIP) method. The interfaces between different phases are captured using density functions. In the computation, the 3-D CIP method is employed for the advection phase of the N-S equations and a pressure-based algorithm is applied for the non-advection phase. The bi-conjugate gradient stabilized method (BiCGSTAB) is utilized to solve the linear equation systems. A Message Passing Interface (MPI) parallel computing scheme was implemented in the computations, using a three-dimensional Cartesian decomposition of the computational domain. The speed-up performance of various decomposition schemes was studied. Validation studies were carried out for the water entry of a 3-D wedge and a 3-D ship section with prescribed velocities. The computed slamming forces, pressure distributions and free-surface elevations are compared with experimental results and numerical results obtained by other methods.
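The two parallel ingredients named above, a three-dimensional Cartesian decomposition of the domain and a BiCGSTAB solve of the linear systems, can be sketched with mpi4py and SciPy as follows; the local matrix is a toy example, not the pressure system of the CIP solver.

```python
# Sketch of the two parallel ingredients (toy data, not the CIP solver):
# a 3-D Cartesian decomposition of the MPI processes and a BiCGSTAB solve.
from mpi4py import MPI
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

comm = MPI.COMM_WORLD
dims = MPI.Compute_dims(comm.Get_size(), [0, 0, 0])  # balanced Px x Py x Pz grid
cart = comm.Create_cart(dims, periods=[False, False, False])
coords = cart.Get_coords(cart.Get_rank())            # this rank's block indices
src, dst = cart.Shift(0, 1)                          # neighbours along the x direction

# Toy local system standing in for the pressure-type system of the
# non-advection phase, solved with BiCGSTAB.
n = 200
A = sp.diags([-1.0, 4.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)
x, info = spla.bicgstab(A, b)
print(f"rank {cart.Get_rank()} at block {coords}, x-neighbours ({src}, {dst}), solver info {info}")
```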


2021
Author(s):  
Oluvaseun Owojaiye

Advancements in technology have brought considerable improvements to processor design, and manufacturers now place multiple processor cores on a single chip. Supercomputers today consist of clusters of interconnected nodes that collaborate to solve complex and advanced computational problems. Message Passing Interface (MPI) and Open Multiprocessing (OpenMP) are popular programming models for optimizing sequential codes by parallelizing them on the different multiprocessor architectures that exist today. In this thesis, we parallelize the non-slicing floorplan algorithm based on Multilevel Floorplanning/placement of large-scale modules using B*tree (MB*tree) with MPI and OpenMP on distributed- and shared-memory architectures, respectively. In VLSI (Very Large Scale Integration) design automation, floorplanning is an initial and vital task performed in the early design stage. Experimental results using MCNC benchmark circuits show that our parallel algorithm produced better results than the corresponding sequential algorithm; we were able to speed up the algorithm by up to 4 times, reducing computation time while maintaining floorplan solution quality. We also compared the two parallel versions; the OpenMP version gave slightly better results than the corresponding MPI version.
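A highly simplified sketch of the distributed search pattern is given below: each MPI rank evaluates its own set of candidate perturbations and the best cost is reduced globally. The MB*tree data structure and its actual perturbation moves are not reproduced here; evaluate_candidate is a hypothetical stand-in.

```python
# Highly simplified sketch of the parallel search pattern (the MB*tree data
# structure and its perturbation moves are not reproduced): each MPI rank
# evaluates its own stream of candidate floorplans and the best cost is
# reduced across ranks. evaluate_candidate is a hypothetical stand-in.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
rng = np.random.default_rng(seed=rank)               # independent search stream per rank

def evaluate_candidate(perturbation):
    # Placeholder cost; real code would rebuild the floorplan from a perturbed
    # MB*tree and return a weighted sum of area and wirelength.
    return float(np.sum(perturbation ** 2))

local_best = min(evaluate_candidate(rng.random(10)) for _ in range(100))
global_best = comm.allreduce(local_best, op=MPI.MIN)
if rank == 0:
    print(f"best cost over all ranks: {global_best:.4f}")
```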


2019
Author(s):  
Alex M. Ascension ◽  
Marcos J. Araúzo-Bravo

Abstract. Big Data analysis is a discipline with a growing number of areas where huge amounts of data are extracted and analyzed. Parallelization in Python integrates the Message Passing Interface via the mpi4py module. Since mpi4py does not support parallelization of objects greater than 2^31 bytes, we developed BigMPI4py, a Python module that wraps mpi4py and supports object sizes beyond this limit. BigMPI4py automatically determines the optimal object distribution strategy and also uses vectorized methods, achieving higher parallelization efficiency. BigMPI4py facilitates the implementation of Python for Big Data applications in multicore workstations and HPC systems. We validated BigMPI4py on whole genome bisulfite sequencing (WGBS) DNA methylation ENCODE data of 59 samples from 27 human tissues. We categorized the samples by the three germ layers and developed a parallel implementation of the Kruskal-Wallis test to find CpGs with differential methylation across germ layers. We observed a differentiation of the germ layers, with one set of hypermethylated genes in ectoderm- and mesoderm-related tissues and another set in endoderm-related tissues. The parallel evaluation of the significance of 55 million CpGs achieved a 22x speedup with 25 cores. BigMPI4py is available at https://gitlab.com/alexmascension/bigmpi4py and the Jupyter Notebook with the WGBS analysis at https://gitlab.com/alexmascension/wgbs-analysis
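The limitation addressed by BigMPI4py can be illustrated conceptually: a large array is communicated in chunks whose sizes stay below the 2^31-byte boundary of a single call. The sketch below shows the general idea with plain mpi4py and is not BigMPI4py's actual interface; chunked_bcast and CHUNK are hypothetical names.

```python
# Concept sketch only (not BigMPI4py's interface): communicate a large array
# in chunks so that each message stays below the 2^31-byte / 32-bit-count
# limit of a single mpi4py call. chunked_bcast and CHUNK are hypothetical names.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
CHUNK = 2**27                                        # elements per piece (1 GiB of float64)

def chunked_bcast(arr, root=0):
    """Broadcast a large 1-D float64 array piecewise."""
    n = comm.bcast(arr.shape[0] if rank == root else None, root=root)
    if rank != root:
        arr = np.empty(n, dtype=np.float64)
    for start in range(0, n, CHUNK):
        comm.Bcast(arr[start:start + CHUNK], root=root)   # buffer-based, no pickling
    return arr

data = np.random.rand(1_000_000) if rank == 0 else np.empty(0)
data = chunked_bcast(data, root=0)
```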


2006
Vol 14 (04)
pp. 445-467
Author(s):  
MARC BERNACKI ◽  
SERGE PIPERNO

In this paper, we present a dissipation-free time-domain discontinuous Galerkin method for the transient solution of the three-dimensional linearized Euler equations around a steady-state solution. In the general context of a nonuniform supporting flow, we prove, using the well-known symmetrization of the Euler equations, that an aeroacoustic energy satisfies a balance equation with a source term at the continuous level, and that our numerical framework satisfies an equivalent balance equation at the discrete level and is genuinely dissipation-free. In the case of ℙ1 Lagrange basis functions and tetrahedral unstructured meshes, a parallel implementation of the method has been developed, based on message passing and mesh partitioning. Three-dimensional numerical results confirm the theoretical properties of the method. They include test cases where Kelvin–Helmholtz instabilities appear.
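For orientation, the balance referred to above has the generic shape sketched in the following LaTeX fragment: the time derivative of an energy functional is balanced by a boundary flux and a volume source term tied to the nonuniform mean flow. This is only the generic form, not the paper's exact aeroacoustic energy or its discrete counterpart.

```latex
% Generic energy balance (sketch only; E, F and S stand for the aeroacoustic
% energy density, its flux and the mean-flow source term, all hypothetical
% placeholders for the quantities derived in the paper).
\frac{\mathrm{d}}{\mathrm{d}t}\int_{\Omega}\mathcal{E}\,\mathrm{d}x
  = -\oint_{\partial\Omega}\mathcal{F}\cdot\mathbf{n}\,\mathrm{d}s
    + \int_{\Omega}\mathcal{S}\,\mathrm{d}x
```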

