MPI to Coarray Fortran: Experiences with a CFD Solver for Unstructured Meshes

2017, Vol 2017, pp. 1-12
Author(s):  
Anuj Sharma
Irene Moulitsas

High-resolution numerical methods and unstructured meshes are required in many applications of Computational Fluid Dynamics (CFD). These methods are quite computationally expensive and hence benefit from being parallelized. The Message Passing Interface (MPI) has traditionally been utilized as the parallelization strategy. However, the inherent complexity of MPI adds further to the existing complexity of CFD scientific codes. The Partitioned Global Address Space (PGAS) parallelization paradigm was introduced in an attempt to improve the clarity of the parallel implementation. We present our experiences of converting an unstructured, high-resolution compressible Navier-Stokes CFD solver from MPI to PGAS Coarray Fortran, along with the challenges, methodology, and performance measurements of our approach. With the Cray compiler, we observe that Coarray Fortran is a viable alternative to MPI, and we are hopeful that the Intel and open-source implementations could be utilized in the future.
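
To give a flavour of the difference in programming style, the sketch below shows a one-dimensional periodic halo exchange written with Fortran 2008 coarrays, where one-sided remote references replace matched MPI send/receive pairs. It is a minimal illustrative example under assumed names (the field u, one halo cell per side), not code from the solver discussed in the abstract.

    program coarray_halo
      implicit none
      integer, parameter :: n = 8
      real :: u(0:n+1)[*]                  ! local field with one halo cell per side, declared as a coarray
      integer :: me, np, left, right

      me = this_image()
      np = num_images()
      left  = merge(np, me - 1, me == 1)   ! periodic neighbours
      right = merge(1,  me + 1, me == np)

      u(1:n) = real(me)                    ! fill the interior with this image's index
      sync all                             ! every image has written its interior

      ! One-sided remote reads replace paired MPI_Send/MPI_Recv calls.
      u(0)     = u(n)[left]
      u(n + 1) = u(1)[right]
      sync all

      print *, 'image', me, 'received halos', u(0), u(n + 1)
    end program coarray_halo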

Author(s):  
Amanda Bienz
William D Gropp
Luke N Olson

Algebraic multigrid (AMG) is often viewed as a scalable O(n) solver for sparse linear systems. Yet, AMG lacks parallel scalability due to the increasingly large costs associated with communication, both in the initial construction of the multigrid hierarchy and in the iterative solve phase. This work introduces a parallel implementation of AMG that reduces the cost of communication, yielding improved parallel scalability. It is common in Message Passing Interface (MPI) codes, particularly with the MPI-everywhere approach, to arrange inter-process communication so that messages are transported in the same way regardless of whether the sending and receiving processes reside on the same node. Performance tests show notable differences in the cost of intra-node and inter-node communication, motivating a restructuring of communication. In this case, the communication schedule takes advantage of the less costly intra-node communication, reducing both the number and the size of inter-node messages. Node-centric communication is extended to a range of components in both the setup and solve phases of AMG, yielding improvements in the weak and strong scaling of the entire method.
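
A minimal sketch of the communicator setup that such node-aware communication relies on is shown below: MPI-3's MPI_Comm_split_type groups the ranks that share a node, so that on-node exchanges can use the cheaper shared-memory transport while off-node data can be aggregated and sent once per node. This is a generic illustration, not the authors' AMG implementation.

    program node_aware_setup
      use mpi
      implicit none
      integer :: ierr, world_rank, node_comm, node_rank, node_size

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, world_rank, ierr)

      ! Group the processes that share a physical node: messages inside
      ! node_comm stay on the cheap intra-node (shared-memory) transport,
      ! so inter-node traffic can be aggregated and sent once per node.
      call MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, &
                               MPI_INFO_NULL, node_comm, ierr)
      call MPI_Comm_rank(node_comm, node_rank, ierr)
      call MPI_Comm_size(node_comm, node_size, ierr)

      print *, 'world rank', world_rank, 'is local rank', node_rank, 'of', node_size

      call MPI_Comm_free(node_comm, ierr)
      call MPI_Finalize(ierr)
    end program node_aware_setup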


2018, Vol 11 (10), pp. 3983-3997
Author(s):  
Vladimir V. Kalmykov
Rashit A. Ibrayev
Maxim N. Kaurkin
Konstantin V. Ushakov

Abstract. We present a new version of the Compact Modeling Framework (CMF3.0), developed to provide the software environment for stand-alone and coupled global geophysical fluid models. CMF3.0 is designed for use with high- and ultra-high-resolution models on massively parallel supercomputers. The key features of the previous version, CMF2.0, are recalled to reflect the progress of our research. In CMF3.0, the Message Passing Interface (MPI) approach with a high-level abstract driver, optimized coupler interpolation, and I/O algorithms is replaced with a communication scheme based on the Partitioned Global Address Space (PGAS) paradigm, while the central-hub architecture evolves into a set of simultaneously working services. Performance tests for both versions are carried out. In addition, some information is presented on the parallel realization of the EnOI (Ensemble Optimal Interpolation) data assimilation method and on the nesting technology, both implemented as program services of CMF3.0.
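
The PGAS-style, one-sided communication that replaces the two-sided MPI hub can be pictured with MPI's remote memory access interface: a rank deposits data directly into a window exposed by another rank, with no matching receive posted. The sketch below is a generic example of this pattern under assumed buffer names; it does not reproduce the CMF3.0 communication layer.

    program one_sided_put
      use mpi
      implicit none
      integer :: ierr, rank, nprocs, win
      integer(kind=MPI_ADDRESS_KIND) :: winsize, disp
      real(kind=8) :: buf(1), remote(1)

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

      remote  = 0.0d0
      winsize = 8_MPI_ADDRESS_KIND       ! expose one 8-byte real to remote access
      call MPI_Win_create(remote, winsize, 8, MPI_INFO_NULL, MPI_COMM_WORLD, win, ierr)

      buf(1) = real(rank, kind=8)
      disp   = 0_MPI_ADDRESS_KIND

      call MPI_Win_fence(0, win, ierr)
      ! Each rank writes its value into the window of the next rank;
      ! the target does not post a matching receive.
      call MPI_Put(buf, 1, MPI_DOUBLE_PRECISION, mod(rank + 1, nprocs), disp, &
                   1, MPI_DOUBLE_PRECISION, win, ierr)
      call MPI_Win_fence(0, win, ierr)

      print *, 'rank', rank, 'received', remote(1)
      call MPI_Win_free(win, ierr)
      call MPI_Finalize(ierr)
    end program one_sided_put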


2015, Vol 2015, pp. 1-10
Author(s):  
Daniel S. Abdi
Girma T. Bitsuamlak

A Navier-Stokes equations solver is parallelized to run on a cluster of computers using the domain decomposition method. Two approaches to communication and computation are investigated, namely, synchronous and asynchronous methods. Asynchronous communication between subdomains is not commonly used in CFD codes; however, it has the potential to alleviate scaling bottlenecks incurred by processors having to wait for each other at designated synchronization points. A common way to avoid this idle time is to overlap asynchronous communication with computation. For this to work, however, there must be something useful and independent a processor can do while waiting for messages to arrive. We investigate an alternative approach to computation, namely, conducting asynchronous iterations that improve the local subdomain solution while communication is in progress. An in-house CFD code is parallelized using the Message Passing Interface (MPI), and scalability tests are conducted that suggest asynchronous iterations are a viable way of parallelizing a CFD code.
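
The overlap of communication with asynchronous local iterations can be sketched as follows: non-blocking halo exchanges are posted, local relaxation sweeps over the interior of the subdomain proceed while the messages are in flight, and only then is the wait issued. The subroutine below is a one-dimensional Jacobi illustration with assumed names (u, left, right, five inner sweeps); it is not the in-house code described in the abstract.

    subroutine relax_with_overlap(u, n, left, right, comm)
      use mpi
      implicit none
      integer, intent(in) :: n, left, right, comm
      real(kind=8), intent(inout) :: u(0:n+1)
      integer :: req(4), ierr, it, i
      real(kind=8) :: unew(2:n-1)

      ! Post the non-blocking halo exchange with both neighbours.
      call MPI_Irecv(u(0),   1, MPI_DOUBLE_PRECISION, left,  0, comm, req(1), ierr)
      call MPI_Irecv(u(n+1), 1, MPI_DOUBLE_PRECISION, right, 1, comm, req(2), ierr)
      call MPI_Isend(u(1),   1, MPI_DOUBLE_PRECISION, left,  1, comm, req(3), ierr)
      call MPI_Isend(u(n),   1, MPI_DOUBLE_PRECISION, right, 0, comm, req(4), ierr)

      ! While the messages are in flight, keep improving the interior
      ! of the subdomain with purely local Jacobi sweeps.
      do it = 1, 5
        do i = 2, n - 1
          unew(i) = 0.5d0 * (u(i - 1) + u(i + 1))
        end do
        u(2:n-1) = unew
      end do

      ! Only now wait for the halos; the boundary cells are updated last.
      call MPI_Waitall(4, req, MPI_STATUSES_IGNORE, ierr)
      u(1) = 0.5d0 * (u(0) + u(2))
      u(n) = 0.5d0 * (u(n - 1) + u(n + 1))
    end subroutine relax_with_overlap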


Author(s):  
Lucas I Finn
Bruce M Boghosian
Christopher N Kottke

We describe a software package designed for the investigation of topological fluid dynamics with a novel algorithm for locating and tracking vortex cores. The package is equipped with modules for generating desired vortex knots and links and for evolving them according to the Navier–Stokes equations, while tracking and visualizing them. The package is parallelized using the Message Passing Interface (MPI) for a multiprocessor environment and makes use of a computational steering library for dynamic user intervention.


1997, Vol 6 (1), pp. 41-58
Author(s):  
T. Kamachi
A. Müller
R. Rühl
Y. Seo
K. Suehiro
...  

We have developed a compilation system which extends High Performance Fortran (HPF) in various aspects. We support the parallelization of well-structured problems with loop distribution and alignment directives similar to HPF's data distribution directives. Such directives both give the user additional control and simplify the compilation process. For the support of unstructured problems, we provide directives for dynamic data distribution through user-defined mappings. The compiler also allows the integration of Message Passing Interface (MPI) primitives. The system is part of a complete programming environment which also comprises a parallel debugger and a performance monitor and analyzer. After an overview of the compiler, we describe the language extensions and the related compilation mechanisms in detail. Performance measurements demonstrate the compiler's applicability to a variety of application classes.
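
For readers unfamiliar with HPF, the fragment below shows the kind of standard data-distribution and alignment directives that the extended compiler builds on; the loop distribution and alignment directives added by the system described above use their own syntax, which is not reproduced here, and the array names and sizes are purely illustrative.

    program hpf_directives
      implicit none
      integer, parameter :: n = 1024
      real :: a(n), b(n)
      integer :: i
      !HPF$ PROCESSORS procs(4)
      !HPF$ DISTRIBUTE a(BLOCK) ONTO procs
      !HPF$ ALIGN b(i) WITH a(i)

      b = 1.0
      !HPF$ INDEPENDENT
      do i = 2, n - 1
        a(i) = 0.5 * (b(i - 1) + b(i + 1))   ! each processor updates its own block of a
      end do
      print *, sum(a(2:n-1))
    end program hpf_directives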


Author(s):  
Sotirios S. Sarakinos
Georgios N. Lygidakis
Ioannis K. Nikolos

In this study an academic Computational Fluid Dynamics (CFD) code, named Galatea-I, is described, which employs the Reynolds-Averaged Navier–Stokes (RANS) equations along with the artificial compressibility method and the SST (Shear Stress Transport) turbulence model for the prediction of incompressible viscous flows. For the representation of the computational domain, unstructured hybrid grids are utilized, composed of tetrahedral, prismatic, and pyramidal elements, while for its discretization a node-centered finite-volume scheme is implemented. Galatea-I is enhanced with a parallelization method which employs spatial domain decomposition, while the data exchange between processors/processes is performed with the Message Passing Interface (MPI) protocol. In addition, a parallel agglomeration multigrid methodology has been incorporated to further improve its computational performance. The proposed code is validated against steady-state flow benchmark test cases concerning laminar flow over a cubic cavity and over a cylindrical surface, as well as turbulent flow over a rectangular wing with a NACA0012 airfoil. The obtained results, compared with those of corresponding reference solvers, reveal Galatea-I's potential for the simulation of inviscid, viscous laminar, and turbulent incompressible flows.
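
The node-centered finite-volume discretization mentioned above is typically driven by an edge-based loop over the unstructured grid, in which the flux through the dual face associated with each edge is added to one node's residual and subtracted from the other's. The sketch below shows this accumulation for a single linearly advected variable with assumed array names; it is a generic illustration, not Galatea-I's implementation.

    subroutine accumulate_residual(nnodes, nedges, edge, normal, q, res)
      implicit none
      integer, intent(in)  :: nnodes, nedges
      integer, intent(in)  :: edge(2, nedges)          ! the two nodes joined by each edge
      real(kind=8), intent(in)  :: normal(3, nedges)   ! dual-face normal stored per edge
      real(kind=8), intent(in)  :: q(nnodes)           ! one conserved variable per node
      real(kind=8), intent(out) :: res(nnodes)
      real(kind=8), parameter   :: vel(3) = (/ 1.0d0, 0.0d0, 0.0d0 /)  ! assumed advection velocity
      integer :: e, i, j
      real(kind=8) :: flux

      res = 0.0d0
      do e = 1, nedges
        i = edge(1, e)
        j = edge(2, e)
        ! Central flux through the dual face shared by nodes i and j.
        flux = 0.5d0 * (q(i) + q(j)) * dot_product(vel, normal(:, e))
        res(i) = res(i) + flux        ! flux leaves the control volume of node i
        res(j) = res(j) - flux        ! and enters the control volume of node j
      end do
    end subroutine accumulate_residual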


Author(s):  
Carlos Teijeiro
Thomas Hammerschmidt
Ralf Drautz
Godehard Sutmann

Analytic bond-order potentials (BOPs) make it possible to obtain a highly accurate description of interatomic interactions at a reasonable computational cost. For simulations of very large systems, however, the high memory demands require a parallel implementation, which at the same time also optimizes the use of computational resources. The calculations of analytic BOPs are performed for a restricted volume around every atom and have therefore been shown to be well suited to a Message Passing Interface (MPI) parallelization based on a domain decomposition scheme, in which one process manages one big domain using the entire memory of a compute node. On the basis of this approach, the present work focuses on the analysis and enhancement of its performance on shared memory by using OpenMP threads on each MPI process, in order to use many cores per node to speed up computations and minimize memory bottlenecks. Different algorithms are described and their corresponding performance results are presented, showing significant performance gains for highly parallel systems with hybrid MPI/OpenMP simulations of up to several thousand threads.
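
A minimal sketch of the hybrid setup described above is given below: one MPI process per node is started with MPI_THREAD_FUNNELED support, OpenMP threads share the node's memory for the local work, and only the inter-node reduction goes through MPI. The loop body and array are placeholders, not the analytic BOP kernels.

    program hybrid_mpi_openmp
      use mpi
      use omp_lib
      implicit none
      integer, parameter :: n = 100000
      integer :: ierr, provided, rank, i
      real(kind=8) :: energy(n), total, global

      ! FUNNELED is sufficient when only the master thread makes MPI calls.
      call MPI_Init_thread(MPI_THREAD_FUNNELED, provided, ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

      energy = 1.0d-3
      total  = 0.0d0
      !$omp parallel do reduction(+:total)
      do i = 1, n
        total = total + energy(i)      ! threads share the node's memory
      end do
      !$omp end parallel do

      ! Only the inter-node step uses MPI.
      call MPI_Allreduce(total, global, 1, MPI_DOUBLE_PRECISION, MPI_SUM, &
                         MPI_COMM_WORLD, ierr)
      if (rank == 0) print *, omp_get_max_threads(), 'threads per process; total =', global
      call MPI_Finalize(ierr)
    end program hybrid_mpi_openmp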


2015, Vol 8 (3), pp. 2369-2402
Author(s):  
W. He
C. Beyer
J. H. Fleckenstein
E. Jang
O. Kolditz
...  

Abstract. This technical paper presents an efficient and performance-oriented method to model reactive mass transport processes in environmental and geotechnical subsurface systems. The open-source scientific software packages OpenGeoSys and IPhreeqc have been coupled to combine their individual strengths and features in order to simulate thermo-hydro-mechanical-chemical coupled processes in porous and fractured media, with simultaneous consideration of aqueous geochemical reactions. Furthermore, a flexible parallelization scheme using MPI (Message Passing Interface) grouping techniques has been implemented, which allows an optimized allocation of computing resources for the node-wise calculation of chemical reactions on the one hand, and for the underlying processes such as groundwater flow or solute transport on the other. The coupling interface and the parallelization scheme have been tested and verified in terms of precision and performance.
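
The MPI grouping idea can be pictured with MPI_Comm_split, which colours the ranks of MPI_COMM_WORLD into separate communicators, e.g. one group for flow/transport and one for the chemistry calculations. The 50/50 split and the group labels below are assumptions for illustration, not the actual OpenGeoSys/IPhreeqc resource allocation.

    program mpi_grouping
      use mpi
      implicit none
      integer :: ierr, world_rank, world_size, color, group_comm, group_rank

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, world_rank, ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, world_size, ierr)

      ! First half of the ranks handle flow/transport, second half chemistry.
      if (world_rank < world_size / 2) then
        color = 0                          ! transport group (assumed split)
      else
        color = 1                          ! chemistry group (assumed split)
      end if
      call MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, group_comm, ierr)
      call MPI_Comm_rank(group_comm, group_rank, ierr)

      print *, 'world rank', world_rank, 'is rank', group_rank, 'in group', color

      call MPI_Comm_free(group_comm, ierr)
      call MPI_Finalize(ierr)
    end program mpi_grouping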

