Parallel computing efficiency of SWAN 40.91

2021 ◽  
Vol 14 (7) ◽  
pp. 4241-4247
Author(s):  
Christo Rautenbach ◽  
Julia C. Mullarney ◽  
Karin R. Bryan

Abstract. Effective and accurate ocean and coastal wave predictions are necessary for engineering, safety and recreational purposes. Refining predictive capabilities is increasingly critical to reduce the uncertainties faced with a changing global wave climatology. Simulating WAves in the Nearshore (SWAN) is a widely used spectral wave modelling tool employed by coastal engineers and scientists, including for operational wave forecasting purposes. Fore- and hindcasts can span hours to decades, and a detailed understanding of the computational efficiencies is required to design optimized operational protocols and hindcast scenarios. To date, there exists limited knowledge of the relationship between the size of a SWAN computational domain and the optimal number of parallel computational threads/cores required to execute a simulation effectively. To test the scalability, a hindcast cluster of 28 computational threads/cores (1 node) was used to determine the computational efficiencies of a SWAN model configuration for southern Africa. The model extent and resolution emulate the current operational wave forecasting configuration developed by the South African Weather Service (SAWS). We implemented and compared both OpenMP (shared memory) and the Message Passing Interface (MPI, distributed memory) architectures. Three sequential simulations (corresponding to typical grid cell numbers) were compared to various permutations of parallel computations using the speed-up ratio, time-saving ratio and efficiency tests. Generally, a computational node configuration of six threads/cores produced the most effective computational set-up based on wave hindcasts of 1-week duration. The use of more than 20 threads/cores resulted in a decrease in speed-up ratio for the smallest computational domain, owing to the increased sub-domain communication times for limited domain sizes.
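
As a rough guide to how such benchmarks are evaluated, the sketch below computes the three metrics named in the abstract (speed-up ratio, time-saving ratio and efficiency) from measured wall-clock times, using their standard textbook definitions; the timings in the example are illustrative only and are not taken from the paper.

```python
# Minimal sketch (not the authors' code): standard parallel-performance metrics
# computed from wall-clock times. T1 is the sequential run time, Tp the run
# time on p threads/cores; the example numbers below are made up.

def speedup(T1: float, Tp: float) -> float:
    """Speed-up ratio S_p = T1 / Tp."""
    return T1 / Tp

def time_saving(T1: float, Tp: float) -> float:
    """Time-saving ratio (T1 - Tp) / T1."""
    return (T1 - Tp) / T1

def efficiency(T1: float, Tp: float, p: int) -> float:
    """Parallel efficiency E_p = S_p / p."""
    return speedup(T1, Tp) / p

if __name__ == "__main__":
    T1, Tp, p = 3600.0, 750.0, 6          # illustrative timings in seconds
    print(f"S_p = {speedup(T1, Tp):.2f}, "
          f"time saving = {time_saving(T1, Tp):.2%}, "
          f"E_p = {efficiency(T1, Tp, p):.2f}")
```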


2014 ◽  
Vol 493 ◽  
pp. 215-220
Author(s):  
Vivien Djanali ◽  
Steven W. Armfield ◽  
Michael P. Kirkpatrick ◽  
Stuart Norris

Parallel performance of a fractional step Navier-Stokes solver is investigated. Parallelisation is performed using the Message Passing Interface (MPI) with domain partitioning. Block preconditioning is applied to the solution of the pressure Poisson equation, which is often the bottleneck in the computation of the fractional step method. The preconditioners tested are classes of incomplete matrix decompositions and sparse approximate inverses. The computational domain is decomposed into eight parts of approximately equal size in terms of the number of cells, and solved on eight parallel processors. Several aspects of the parallelisation, such as domain splitting directions, speed-up and scalability of the preconditioners, are discussed.
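
To make one of the preconditioner classes named above concrete, the serial sketch below (not the authors' solver) applies an incomplete LU factorisation as a preconditioner for a Krylov solve of a discrete pressure Poisson system; the grid size, tolerances and right-hand side are arbitrary placeholders.

```python
# Serial illustration of incomplete-decomposition preconditioning for a
# pressure Poisson equation; not the paper's MPI solver.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 64                                            # grid points per direction
I = sp.identity(n)
T = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n))
A = (sp.kron(I, T) + sp.kron(T, I)).tocsc()       # 2-D Laplacian (5-point stencil)
b = np.ones(A.shape[0])                           # placeholder right-hand side

ilu = spla.spilu(A, drop_tol=1e-4, fill_factor=10)     # incomplete LU factors
M = spla.LinearOperator(A.shape, matvec=ilu.solve)     # preconditioner M ~ A^-1

x, info = spla.bicgstab(A, b, M=M)                # preconditioned Krylov iteration
print("converged" if info == 0 else f"info = {info}",
      "| residual norm =", np.linalg.norm(b - A @ x))
```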


Author(s):  
Ning Yang ◽  
Shiaaulir Wang ◽  
Paul Schonfeld

A Parallel Genetic Algorithm (PGA) is used for simulation-based optimization of waterway project schedules. The PGA is designed to distribute a Genetic Algorithm application over multiple processors in order to speed up the solution search for a very large combinatorial problem. The proposed PGA is based on a global parallel model, also called a master-slave model. The Message Passing Interface (MPI) is used in developing the parallel computing program. A case study is presented whose results show how the adaptation of a simulation-based optimization algorithm to parallel computing can greatly reduce computation time. Additional techniques found to further improve PGA performance include: (1) choosing an appropriate task distribution method, (2) distributing simulation replications instead of different solutions, (3) avoiding the simulation of duplicate solutions, (4) avoiding running multiple simulations simultaneously on shared-memory processors, and (5) avoiding the use of multiple processors belonging to different clusters (physical sub-networks).
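
The sketch below is a much-simplified illustration of the global (master-slave) pattern described above, not the authors' PGA: rank 0 holds the population, the costly fitness evaluation is spread over MPI ranks with a static split, and results are gathered back. The fitness function is a hypothetical stand-in for the waterway simulation.

```python
# Simplified master-slave fitness evaluation with mpi4py.
# Run with e.g.:  mpiexec -n 4 python pga_sketch.py
from mpi4py import MPI
import random

def simulate(solution):
    # Placeholder for an expensive simulation-based fitness evaluation.
    return sum(x * x for x in solution)

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

if rank == 0:
    population = [[random.uniform(-5, 5) for _ in range(10)] for _ in range(40)]
    chunks = [population[i::size] for i in range(size)]   # static task split
else:
    chunks = None

local = comm.scatter(chunks, root=0)                 # distribute candidates
local_fitness = [simulate(sol) for sol in local]     # evaluate in parallel
fitness = comm.gather(local_fitness, root=0)         # collect on the master

if rank == 0:
    flat = [f for part in fitness for f in part]
    print("best fitness this generation:", min(flat))
```

A production master-slave PGA would typically hand out tasks dynamically rather than with the static split shown here, which is one of the distribution choices the abstract highlights.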


2012 ◽  
Vol 263-266 ◽  
pp. 1315-1318
Author(s):  
Kun Ming Yu ◽  
Ming Gong Lee

This paper discusses how Python can be used to design a cluster parallel computing environment for the numerical solution of ordinary differential equations by a block predictor-corrector method. In the parallel process, MPI-2 (Message Passing Interface), as implemented in MPICH2, is used for communication between CPUs. Data sending and receiving are controlled by mpi4py, which is based on Python. An implementation of the block predictor-corrector numerical method with one and two CPUs, respectively, is used to test performance on an initial value problem. Only a minor speed-up is obtained, owing to the small problem sizes and the few CPUs used in the scheme; nevertheless, establishing this scheme in Python is valuable because very little research has been carried out on this kind of parallel structure under Python.
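
The toy example below shows the kind of mpi4py send/recv traffic the abstract refers to, but with a much simpler scheme than the paper's block method: rank 0 produces an explicit Euler predictor for the test problem y' = -y, y(0) = 1, and rank 1 applies a trapezoidal corrector. All names and step sizes are illustrative.

```python
# Two-rank predictor-corrector sketch using mpi4py point-to-point messages.
# Run with:  mpiexec -n 2 python pc_sketch.py
from mpi4py import MPI

def f(t, y):                 # right-hand side of the test IVP y' = -y
    return -y

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
h, t, y = 0.1, 0.0, 1.0

for _ in range(10):
    if rank == 0:
        y_pred = y + h * f(t, y)                   # explicit Euler predictor
        comm.send((t, y, y_pred), dest=1, tag=0)   # ship data to the corrector
        y = comm.recv(source=1, tag=1)             # receive corrected value
    elif rank == 1:
        t0, y0, y_pred = comm.recv(source=0, tag=0)
        y = y0 + 0.5 * h * (f(t0, y0) + f(t0 + h, y_pred))  # trapezoidal corrector
        comm.send(y, dest=0, tag=1)
    t += h

if rank == 0:
    print("y(1.0) ~", y)     # exact solution exp(-1) ~ 0.3679
```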


Author(s):  
Sotirios S. Sarakinos ◽  
Georgios N. Lygidakis ◽  
Ioannis K. Nikolos

In this study an academic Computational Fluid Dynamics (CFD) code, named Galatea-I, is described, which employs the Reynolds-Averaged Navier–Stokes (RANS) equations along with the artificial compressibility method and the SST (Shear Stress Transport) turbulence model for the prediction of incompressible viscous flows. For the representation of the computational domain, unstructured hybrid grids are utilized, composed of tetrahedral, prismatic and pyramidal elements, while for its discretization a node-centered finite-volume scheme is implemented. Galatea-I is enhanced with a parallelization method, which employs spatial domain decomposition, while the data exchange between processes is performed using the Message Passing Interface (MPI) protocol. In addition, a parallel agglomeration multigrid methodology has been incorporated to further improve its computational performance. The proposed code is validated against steady-state flow benchmark test cases, concerning laminar flow over a cubic cavity and a cylindrical surface, as well as turbulent flow over a rectangular wing with a NACA0012 airfoil. The obtained results, compared with those of corresponding reference solvers, reveal Galatea-I's potential for simulation of inviscid, viscous laminar and turbulent incompressible flows.
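
The sketch below (not part of Galatea-I) illustrates the kind of inter-process data exchange a node-centred, domain-decomposed solver needs: each rank packs the solution values at nodes it shares with a neighbouring subdomain and swaps them in a combined send/receive. The node numbering and shared-node list are hypothetical.

```python
# Interface-node value exchange between two subdomains with mpi4py.
# Run with:  mpiexec -n 2 python halo.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
if comm.Get_size() != 2:
    raise SystemExit("run this sketch with exactly 2 ranks")

rank = comm.Get_rank()
other = 1 - rank                          # the single neighbouring subdomain

u = np.full(100, float(rank))             # local nodal solution values
shared = np.array([10, 11, 12, 13])       # local indices of interface nodes

send_buf = np.ascontiguousarray(u[shared])
recv_buf = np.empty_like(send_buf)
comm.Sendrecv(send_buf, dest=other, sendtag=0,
              recvbuf=recv_buf, source=other, recvtag=0)

# The neighbour's interface values can now complete residual or gradient
# evaluations at the partition boundary.
print(f"rank {rank} received interface values {recv_buf}")
```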


Author(s):  
Peng Wen ◽  
Wei Qiu

This paper presents the further development of a numerical simulation method to solve 3-D highly non-linear slamming problems using parallel computing algorithms. The water entry problems are treated as multi-phase problems (solid, water and air) governed by the Navier-Stokes (N-S) equations. They are solved by the three-dimensional constrained interpolation profile (CIP) method. The interfaces between different phases are captured using density functions. In the computation, the 3-D CIP method is employed for the advection phase of the N-S equations and a pressure-based algorithm is applied for the non-advection phase. The bi-conjugate gradient stabilized method (BiCGSTAB) is used to solve the linear equation systems. A Message Passing Interface (MPI) parallel computing scheme was implemented in the computations. For the parallel computations, a three-dimensional Cartesian decomposition of the computational domain was used. The speed-up performance of various decomposition schemes was studied. Validation studies were carried out for the water entry of a 3-D wedge and a 3-D ship section with prescribed velocities. The computed slamming force, pressure distribution and free-surface elevations are compared with experimental results and with numerical results obtained by other methods.
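
A short sketch of how a 3-D Cartesian decomposition is typically set up with MPI topology routines is given below; it is not the authors' code, and only shows how each rank discovers its position and neighbours for subsequent halo exchanges.

```python
# 3-D Cartesian process decomposition with mpi4py topology routines.
# Run with any rank count, e.g.:  mpiexec -n 8 python cart.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
size = comm.Get_size()

dims = MPI.Compute_dims(size, [0, 0, 0])                # split ranks over 3 axes
cart = comm.Create_cart(dims, periods=[False] * 3, reorder=True)
coords = cart.Get_coords(cart.Get_rank())

neighbours = {axis: cart.Shift(axis, 1) for axis in range(3)}  # (lower, upper) per axis

print(f"rank {cart.Get_rank()} at {coords} in a {dims} grid; "
      f"x-neighbours (lower, upper) = {neighbours[0]}")
```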


F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 240
Author(s):  
Frédéric Jarlier ◽  
Nicolas Joly ◽  
Nicolas Fedy ◽  
Thomas Magalhaes ◽  
Leonor Sirotti ◽  
...  

Life science has entered the so-called 'big data era', in which biologists, clinicians and bioinformaticians are overwhelmed with high-throughput sequencing data. While these data offer new insights into genome structure, their ever-growing volume raises major challenges for their use in daily clinical practice and diagnosis. We therefore implemented software to reduce the time to delivery for the alignment and sorting of high-throughput sequencing data. Our solution is implemented using the Message Passing Interface and is intended for high-performance computing architectures. The software scales linearly with respect to the size of the data and ensures full reproducibility with respect to the traditional tools. For example, a 300X whole genome can be aligned and sorted in less than 9 hours with 128 cores. The software offers significant speed-up using multi-core and multi-node parallelization.
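
As a toy illustration only (not the published tool, and with a hypothetical record layout), the sketch below shows the basic distribute-sort-merge idea behind MPI-parallel sorting of sequencing records by genomic coordinate.

```python
# Toy distributed sort of (chromosome, position, id) records with mpi4py.
# Run with e.g.:  mpiexec -n 4 python sort_sketch.py
from mpi4py import MPI
import heapq, random

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

if rank == 0:
    reads = [(random.randint(1, 22), random.randint(1, 10**6), f"read{i}")
             for i in range(10_000)]                   # hypothetical records
    chunks = [reads[i::size] for i in range(size)]
else:
    chunks = None

local = comm.scatter(chunks, root=0)
local.sort(key=lambda r: (r[0], r[1]))                 # sort by coordinate
parts = comm.gather(local, root=0)

if rank == 0:
    merged = list(heapq.merge(*parts, key=lambda r: (r[0], r[1])))
    print("first record:", merged[0], "| total:", len(merged))
```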


Author(s):  
Vladimir Mironov ◽  
Alexander Moskovsky ◽  
Michael D’Mello ◽  
Yuri Alexeev

The Hartree–Fock method in the General Atomic and Molecular Electronic Structure System (GAMESS) quantum chemistry package represents one of the most irregular algorithms in computation today. Major steps in the calculation are the irregular computation of electron repulsion integrals and the building of the Fock matrix. These are the central components of the main self-consistent field (SCF) loop, the key hot spot in electronic structure codes. By threading the Message Passing Interface (MPI) ranks in the official release of the GAMESS code, we not only speed up the main SCF loop (4× to 6× for large systems) but also achieve a significant reduction in the overall memory footprint. These improvements are a direct consequence of memory access optimizations within the MPI ranks. We benchmark our implementation against the official release of the GAMESS code on the Intel® Xeon Phi™ supercomputer. Scaling numbers are reported on up to 7680 cores on Intel Xeon Phi coprocessors.
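
The schematic sketch below (not GAMESS code) shows the hybrid MPI-plus-threads idea in miniature: large read-only data, standing in for the integral/Fock work arrays, is held once per MPI rank and shared by that rank's threads rather than duplicated across many single-threaded ranks, and the partial results are combined with an MPI reduction.

```python
# Hybrid MPI + threads sketch: threads share one array per rank, ranks reduce.
# Run with e.g.:  mpiexec -n 2 python hybrid.py
from mpi4py import MPI
from concurrent.futures import ThreadPoolExecutor
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
nthreads = 4

data = np.random.default_rng(rank).random(1_000_000)   # one copy per rank, shared by threads

def partial_sum(tid):
    # Each thread works on a strided slice of the shared array (no copy made).
    return float(np.sum(data[tid::nthreads] ** 2))

with ThreadPoolExecutor(max_workers=nthreads) as pool:
    local = sum(pool.map(partial_sum, range(nthreads)))

total = np.zeros(1)
comm.Allreduce(np.array([local]), total, op=MPI.SUM)    # combine across ranks

if rank == 0:
    print("global sum of squares:", total[0])
```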


Author(s):  
Peng Wen ◽  
Wei Qiu

A constrained interpolation profile (CIP) method has been developed to solve 2-D water entry problems. This paper presents the further development of the numerical method using staggered grids and a parallel computing algorithm. In this work, the multi-phase slamming problems, governed by the Navier-Stokes (N-S) equations, are solved by a CIP-based finite difference method. The interfaces between different phases (solid, water and air) are captured using density functions. A parallel computing algorithm based on the Message Passing Interface (MPI) and a domain decomposition scheme was implemented to speed up the computations. The effect of the decomposition scheme on the solution and on the speed-up was studied. Validation studies were carried out for the water entry of various 2-D wedges and a ship section. The predicted slamming force, pressure distribution and free-surface elevation are compared with experimental results and other numerical results.
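
A back-of-the-envelope illustration (not taken from the paper) of why the choice of decomposition scheme affects speed-up: for the same number of subdomains, the number of interface cells whose data must be exchanged differs between a 1-D strip split and a 2-D block split of an Nx-by-Ny grid, and communication cost grows with that count.

```python
# Interface-cell counts for strip vs. block decompositions of a 2-D grid.

def strip_halo(nx, ny, p):
    """Cells on internal boundaries for p vertical strips."""
    return (p - 1) * ny

def block_halo(nx, ny, px, py):
    """Cells on internal boundaries for a px-by-py block decomposition."""
    return (px - 1) * ny + (py - 1) * nx

nx = ny = 1024
print("strips (8 x 1):", strip_halo(nx, ny, 8))     # 7168 interface cells
print("blocks (4 x 2):", block_halo(nx, ny, 4, 2))  # 4096 interface cells
print("blocks (2 x 4):", block_halo(nx, ny, 2, 4))  # 4096 interface cells
```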


2017 ◽  
Vol 139 (2) ◽  
Author(s):  
Taehyo Park ◽  
Shengjie Li ◽  
Mina Lee ◽  
Moonho Tak

Numerical methods have become very important for solving complex problems in engineering and science. Grid-based methods such as the finite difference method (FDM) and the finite element method (FEM) have already been widely applied to various areas; however, they still suffer from inherent difficulties that limit their application to many problems. Therefore, strong interest has recently focused on meshfree methods, such as smoothed particle hydrodynamics (SPH), for simulating fluid flow, owing to their advantages in dealing with complicated problems. In the SPH method, a great number of particles is used because the whole domain is represented by a set of arbitrarily distributed particles. To improve numerical efficiency, parallelization using the Message Passing Interface (MPI) is applied to problems with large computational domains. In parallel computing, the whole domain is decomposed in a way that preserves continuity across subdomain boundaries under the single-instruction multiple-data (SIMD) model and that follows the procedure of the SPH computations. In this work, a new parallel computing scheme is introduced into the SPH method to analyze particle-based fluid flow. In this scheme, the whole domain is decomposed into subdomains under the SIMD process, and boundary conditions are imposed on the interface particles, which improves the detection of neighbouring particles near the boundary. With this parallel computing approach, the SPH method becomes more flexible and performs better.
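
The compact sketch below (not the authors' implementation) shows the interface-particle idea in its simplest form: with a 1-D slab decomposition along x, each rank sends the particles lying within one smoothing length h of its upper boundary to the next rank, which keeps them as ghost particles so that neighbour searches near the interface see the full support of the kernel. Domain bounds, particle counts and h are illustrative.

```python
# Ghost-particle exchange for a 1-D slab decomposition of an SPH domain.
# Run with e.g.:  mpiexec -n 4 python sph_ghost.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

h = 0.05                                            # smoothing length
x_lo, x_hi = rank / size, (rank + 1) / size         # this rank's slab in [0, 1)
pos = x_lo + (x_hi - x_lo) * np.random.default_rng(rank).random((500, 2))

# Particles close to the upper boundary become ghosts on the next rank.
to_right = pos[pos[:, 0] > x_hi - h] if rank < size - 1 else np.empty((0, 2))

ghosts = None
if rank < size - 1:
    comm.send(to_right, dest=rank + 1, tag=7)
if rank > 0:
    ghosts = comm.recv(source=rank - 1, tag=7)

n_ghost = 0 if ghosts is None else len(ghosts)
print(f"rank {rank}: {len(pos)} owned particles, {n_ghost} ghost particles")
```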

