MPI to Coarray Fortran: Experiences with a CFD Solver for Unstructured Meshes

2017, Vol 2017, pp. 1-12
Author(s):  
Anuj Sharma
Irene Moulitsas

High-resolution numerical methods and unstructured meshes are required in many applications of Computational Fluid Dynamics (CFD). These methods are quite computationally expensive and hence benefit from being parallelized. The Message Passing Interface (MPI) has traditionally been utilized as the parallelization strategy. However, the inherent complexity of MPI adds further to the existing complexity of CFD scientific codes. The Partitioned Global Address Space (PGAS) parallelization paradigm was introduced in an attempt to improve the clarity of the parallel implementation. We present our experiences of converting an unstructured, high-resolution compressible Navier-Stokes CFD solver from MPI to PGAS Coarray Fortran, along with the challenges, methodology, and performance measurements of our approach. With the Cray compiler, we observe that Coarray Fortran is a viable alternative to MPI, and we are hopeful that the Intel and open-source implementations could be utilized in the future.
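
To give a flavour of the difference in programming style, the sketch below shows a one-dimensional periodic halo exchange written with Fortran 2008 coarrays, where one-sided remote references replace matched MPI send/receive pairs. It is a minimal illustrative example under assumed names (the field u, one halo cell per side), not code from the solver discussed in the abstract.

    program coarray_halo
      implicit none
      integer, parameter :: n = 8
      real :: u(0:n+1)[*]                  ! local field with one halo cell per side, declared as a coarray
      integer :: me, np, left, right

      me = this_image()
      np = num_images()
      left  = merge(np, me - 1, me == 1)   ! periodic neighbours
      right = merge(1,  me + 1, me == np)

      u(1:n) = real(me)                    ! fill the interior with this image's index
      sync all                             ! every image has written its interior

      ! One-sided remote reads replace paired MPI_Send/MPI_Recv calls.
      u(0)     = u(n)[left]
      u(n + 1) = u(1)[right]
      sync all

      print *, 'image', me, 'received halos', u(0), u(n + 1)
    end program coarray_halo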

Author(s):  
Amanda Bienz
William D Gropp
Luke N Olson

Algebraic multigrid (AMG) is often viewed as a scalable O(n) solver for sparse linear systems. Yet, AMG lacks parallel scalability due to the increasingly large costs associated with communication, both in the initial construction of the multigrid hierarchy and in the iterative solve phase. This work introduces a parallel implementation of AMG that reduces the cost of communication, yielding improved parallel scalability. It is common in Message Passing Interface (MPI) codes, particularly with the MPI-everywhere approach, to arrange inter-process communication so that messages are transported in the same way regardless of whether the sending and receiving processes reside on the same node. Performance tests show notable differences in the cost of intra-node and inter-node communication, motivating a restructuring of communication. In this case, the communication schedule takes advantage of the less costly intra-node communication, reducing both the number and the size of inter-node messages. Node-centric communication is extended to a range of components in both the setup and solve phases of AMG, yielding improvements in the weak and strong scaling of the entire method.
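
A minimal sketch of the communicator setup that such node-aware communication relies on is shown below: MPI-3's MPI_Comm_split_type groups the ranks that share a node, so that on-node exchanges can use the cheaper shared-memory transport while off-node data can be aggregated and sent once per node. This is a generic illustration, not the authors' AMG implementation.

    program node_aware_setup
      use mpi
      implicit none
      integer :: ierr, world_rank, node_comm, node_rank, node_size

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, world_rank, ierr)

      ! Group the processes that share a physical node: messages inside
      ! node_comm stay on the cheap intra-node (shared-memory) transport,
      ! so inter-node traffic can be aggregated and sent once per node.
      call MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, &
                               MPI_INFO_NULL, node_comm, ierr)
      call MPI_Comm_rank(node_comm, node_rank, ierr)
      call MPI_Comm_size(node_comm, node_size, ierr)

      print *, 'world rank', world_rank, 'is local rank', node_rank, 'of', node_size

      call MPI_Comm_free(node_comm, ierr)
      call MPI_Finalize(ierr)
    end program node_aware_setup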


2018, Vol 11 (10), pp. 3983-3997
Author(s):  
Vladimir V. Kalmykov
Rashit A. Ibrayev
Maxim N. Kaurkin
Konstantin V. Ushakov

Abstract. We present a new version of the Compact Modeling Framework (CMF3.0), developed to provide the software environment for stand-alone and coupled global geophysical fluid models. CMF3.0 is designed for use with high- and ultra-high-resolution models on massively parallel supercomputers. The key features of the previous version, CMF2.0, are recalled to reflect the progress of our research. In CMF3.0, the Message Passing Interface (MPI) approach with a high-level abstract driver, optimized coupler interpolation, and I/O algorithms is replaced with a communication scheme based on the Partitioned Global Address Space (PGAS) paradigm, while the central-hub architecture evolves into a set of simultaneously working services. Performance tests for both versions are carried out. In addition, some information is presented on the parallel realization of the EnOI (Ensemble Optimal Interpolation) data assimilation method and on the nesting technology, both implemented as program services of CMF3.0.
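
The PGAS-style, one-sided communication that replaces the two-sided MPI hub can be pictured with MPI's remote memory access interface: a rank deposits data directly into a window exposed by another rank, with no matching receive posted. The sketch below is a generic example of this pattern under assumed buffer names; it does not reproduce the CMF3.0 communication layer.

    program one_sided_put
      use mpi
      implicit none
      integer :: ierr, rank, nprocs, win
      integer(kind=MPI_ADDRESS_KIND) :: winsize, disp
      real(kind=8) :: buf(1), remote(1)

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

      remote  = 0.0d0
      winsize = 8_MPI_ADDRESS_KIND       ! expose one 8-byte real to remote access
      call MPI_Win_create(remote, winsize, 8, MPI_INFO_NULL, MPI_COMM_WORLD, win, ierr)

      buf(1) = real(rank, kind=8)
      disp   = 0_MPI_ADDRESS_KIND

      call MPI_Win_fence(0, win, ierr)
      ! Each rank writes its value into the window of the next rank;
      ! the target does not post a matching receive.
      call MPI_Put(buf, 1, MPI_DOUBLE_PRECISION, mod(rank + 1, nprocs), disp, &
                   1, MPI_DOUBLE_PRECISION, win, ierr)
      call MPI_Win_fence(0, win, ierr)

      print *, 'rank', rank, 'received', remote(1)
      call MPI_Win_free(win, ierr)
      call MPI_Finalize(ierr)
    end program one_sided_put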


2015, Vol 2015, pp. 1-10
Author(s):  
Daniel S. Abdi
Girma T. Bitsuamlak

A Navier-Stokes equations solver is parallelized to run on a cluster of computers using the domain decomposition method. Two approaches to communication and computation are investigated, namely, synchronous and asynchronous methods. Asynchronous communication between subdomains is not commonly used in CFD codes; however, it has the potential to alleviate scaling bottlenecks incurred by processors having to wait for each other at designated synchronization points. A common way to avoid this idle time is to overlap asynchronous communication with computation. For this to work, however, there must be something useful and independent a processor can do while waiting for messages to arrive. We investigate an alternative approach to computation, namely, conducting asynchronous iterations that improve the local subdomain solution while communication is in progress. An in-house CFD code is parallelized using the Message Passing Interface (MPI), and scalability tests are conducted that suggest asynchronous iterations are a viable way of parallelizing a CFD code.
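
The overlap of communication with asynchronous local iterations can be sketched as follows: non-blocking halo exchanges are posted, local relaxation sweeps over the interior of the subdomain proceed while the messages are in flight, and only then is the wait issued. The subroutine below is a one-dimensional Jacobi illustration with assumed names (u, left, right, five inner sweeps); it is not the in-house code described in the abstract.

    subroutine relax_with_overlap(u, n, left, right, comm)
      use mpi
      implicit none
      integer, intent(in) :: n, left, right, comm
      real(kind=8), intent(inout) :: u(0:n+1)
      integer :: req(4), ierr, it, i
      real(kind=8) :: unew(2:n-1)

      ! Post the non-blocking halo exchange with both neighbours.
      call MPI_Irecv(u(0),   1, MPI_DOUBLE_PRECISION, left,  0, comm, req(1), ierr)
      call MPI_Irecv(u(n+1), 1, MPI_DOUBLE_PRECISION, right, 1, comm, req(2), ierr)
      call MPI_Isend(u(1),   1, MPI_DOUBLE_PRECISION, left,  1, comm, req(3), ierr)
      call MPI_Isend(u(n),   1, MPI_DOUBLE_PRECISION, right, 0, comm, req(4), ierr)

      ! While the messages are in flight, keep improving the interior
      ! of the subdomain with purely local Jacobi sweeps.
      do it = 1, 5
        do i = 2, n - 1
          unew(i) = 0.5d0 * (u(i - 1) + u(i + 1))
        end do
        u(2:n-1) = unew
      end do

      ! Only now wait for the halos; the boundary cells are updated last.
      call MPI_Waitall(4, req, MPI_STATUSES_IGNORE, ierr)
      u(1) = 0.5d0 * (u(0) + u(2))
      u(n) = 0.5d0 * (u(n - 1) + u(n + 1))
    end subroutine relax_with_overlap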


Author(s):  
Lucas I Finn
Bruce M Boghosian
Christopher N Kottke

We describe a software package designed for the investigation of topological fluid dynamics with a novel algorithm for locating and tracking vortex cores. The package is equipped with modules for generating desired vortex knots and links and for evolving them according to the Navier–Stokes equations, while tracking and visualizing them. The package is parallelized using the Message Passing Interface (MPI) for a multiprocessor environment and makes use of a computational steering library for dynamic user intervention.


1997, Vol 6 (1), pp. 41-58
Author(s):  
T. Kamachi
A. Müller
R. Rühl
Y. Seo
K. Suehiro
...  

We have developed a compilation system which extends High Performance Fortran (HPF) in various aspects. We support the parallelization of well-structured problems with loop distribution and alignment directives similar to HPF's data distribution directives. Such directives both give the user additional control and simplify the compilation process. For the support of unstructured problems, we provide directives for dynamic data distribution through user-defined mappings. The compiler also allows the integration of Message Passing Interface (MPI) primitives. The system is part of a complete programming environment which also comprises a parallel debugger and a performance monitor and analyzer. After an overview of the compiler, we describe the language extensions and the related compilation mechanisms in detail. Performance measurements demonstrate the compiler's applicability to a variety of application classes.
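
For readers unfamiliar with HPF, the fragment below shows the kind of standard data-distribution and alignment directives that the extended compiler builds on; the loop distribution and alignment directives added by the system described above use their own syntax, which is not reproduced here, and the array names and sizes are purely illustrative.

    program hpf_directives
      implicit none
      integer, parameter :: n = 1024
      real :: a(n), b(n)
      integer :: i
      !HPF$ PROCESSORS procs(4)
      !HPF$ DISTRIBUTE a(BLOCK) ONTO procs
      !HPF$ ALIGN b(i) WITH a(i)

      b = 1.0
      !HPF$ INDEPENDENT
      do i = 2, n - 1
        a(i) = 0.5 * (b(i - 1) + b(i + 1))   ! each processor updates its own block of a
      end do
      print *, sum(a(2:n-1))
    end program hpf_directives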


Author(s):  
Sotirios S. Sarakinos
Georgios N. Lygidakis
Ioannis K. Nikolos

In this study an academic Computational Fluid Dynamics (CFD) code, named Galatea-I, is described, which employs the Reynolds-Averaged Navier–Stokes (RANS) equations along with the artificial compressibility method and the SST (Shear Stress Transport) turbulence model for the prediction of incompressible viscous flows. For the representation of the computational domain, unstructured hybrid grids are utilized, composed of tetrahedral, prismatic, and pyramidal elements, while for its discretization a node-centered finite-volume scheme is implemented. Galatea-I is enhanced with a parallelization method which employs spatial domain decomposition, while the data exchange between processors/processes is performed with the Message Passing Interface (MPI) protocol. In addition, a parallel agglomeration multigrid methodology has been incorporated to further improve its computational performance. The proposed code is validated against steady-state flow benchmark test cases concerning laminar flow over a cubic cavity and over a cylindrical surface, as well as turbulent flow over a rectangular wing with a NACA0012 airfoil. The obtained results, compared with those of corresponding reference solvers, reveal Galatea-I's potential for the simulation of inviscid, viscous laminar, and turbulent incompressible flows.
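
The node-centered finite-volume discretization mentioned above is typically driven by an edge-based loop over the unstructured grid, in which the flux through the dual face associated with each edge is added to one node's residual and subtracted from the other's. The sketch below shows this accumulation for a single linearly advected variable with assumed array names; it is a generic illustration, not Galatea-I's implementation.

    subroutine accumulate_residual(nnodes, nedges, edge, normal, q, res)
      implicit none
      integer, intent(in)  :: nnodes, nedges
      integer, intent(in)  :: edge(2, nedges)          ! the two nodes joined by each edge
      real(kind=8), intent(in)  :: normal(3, nedges)   ! dual-face normal stored per edge
      real(kind=8), intent(in)  :: q(nnodes)           ! one conserved variable per node
      real(kind=8), intent(out) :: res(nnodes)
      real(kind=8), parameter   :: vel(3) = (/ 1.0d0, 0.0d0, 0.0d0 /)  ! assumed advection velocity
      integer :: e, i, j
      real(kind=8) :: flux

      res = 0.0d0
      do e = 1, nedges
        i = edge(1, e)
        j = edge(2, e)
        ! Central flux through the dual face shared by nodes i and j.
        flux = 0.5d0 * (q(i) + q(j)) * dot_product(vel, normal(:, e))
        res(i) = res(i) + flux        ! flux leaves the control volume of node i
        res(j) = res(j) - flux        ! and enters the control volume of node j
      end do
    end subroutine accumulate_residual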


Author(s):  
Carlos Teijeiro
Thomas Hammerschmidt
Ralf Drautz
Godehard Sutmann

Analytic bond-order potentials (BOPs) make it possible to obtain a highly accurate description of interatomic interactions at a reasonable computational cost. For simulations of very large systems, however, the high memory demands require a parallel implementation, which at the same time also optimizes the use of computational resources. The calculations of analytic BOPs are performed for a restricted volume around every atom and have therefore been shown to be well suited to a Message Passing Interface (MPI) parallelization based on a domain decomposition scheme, in which one process manages one big domain using the entire memory of a compute node. On the basis of this approach, the present work focuses on the analysis and enhancement of its performance on shared memory by using OpenMP threads on each MPI process, in order to use many cores per node to speed up computations and minimize memory bottlenecks. Different algorithms are described and their corresponding performance results are presented, showing significant performance gains for highly parallel systems with hybrid MPI/OpenMP simulations of up to several thousand threads.
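
A minimal sketch of the hybrid setup described above is given below: one MPI process per node is started with MPI_THREAD_FUNNELED support, OpenMP threads share the node's memory for the local work, and only the inter-node reduction goes through MPI. The loop body and array are placeholders, not the analytic BOP kernels.

    program hybrid_mpi_openmp
      use mpi
      use omp_lib
      implicit none
      integer, parameter :: n = 100000
      integer :: ierr, provided, rank, i
      real(kind=8) :: energy(n), total, global

      ! FUNNELED is sufficient when only the master thread makes MPI calls.
      call MPI_Init_thread(MPI_THREAD_FUNNELED, provided, ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

      energy = 1.0d-3
      total  = 0.0d0
      !$omp parallel do reduction(+:total)
      do i = 1, n
        total = total + energy(i)      ! threads share the node's memory
      end do
      !$omp end parallel do

      ! Only the inter-node step uses MPI.
      call MPI_Allreduce(total, global, 1, MPI_DOUBLE_PRECISION, MPI_SUM, &
                         MPI_COMM_WORLD, ierr)
      if (rank == 0) print *, omp_get_max_threads(), 'threads per process; total =', global
      call MPI_Finalize(ierr)
    end program hybrid_mpi_openmp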


2015, Vol 8 (3), pp. 2369-2402
Author(s):  
W. He
C. Beyer
J. H. Fleckenstein
E. Jang
O. Kolditz
...  

Abstract. This technical paper presents an efficient and performance-oriented method to model reactive mass transport processes in environmental and geotechnical subsurface systems. The open-source scientific software packages OpenGeoSys and IPhreeqc have been coupled to combine their individual strengths and features in order to simulate thermo-hydro-mechanical-chemical coupled processes in porous and fractured media, with simultaneous consideration of aqueous geochemical reactions. Furthermore, a flexible parallelization scheme using MPI (Message Passing Interface) grouping techniques has been implemented, which allows an optimized allocation of computing resources for the node-wise calculation of chemical reactions on the one hand, and for the underlying processes such as groundwater flow or solute transport on the other. The coupling interface and the parallelization scheme have been tested and verified in terms of precision and performance.
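
The MPI grouping idea can be pictured with MPI_Comm_split, which colours the ranks of MPI_COMM_WORLD into separate communicators, e.g. one group for flow/transport and one for the chemistry calculations. The 50/50 split and the group labels below are assumptions for illustration, not the actual OpenGeoSys/IPhreeqc resource allocation.

    program mpi_grouping
      use mpi
      implicit none
      integer :: ierr, world_rank, world_size, color, group_comm, group_rank

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, world_rank, ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, world_size, ierr)

      ! First half of the ranks handle flow/transport, second half chemistry.
      if (world_rank < world_size / 2) then
        color = 0                          ! transport group (assumed split)
      else
        color = 1                          ! chemistry group (assumed split)
      end if
      call MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, group_comm, ierr)
      call MPI_Comm_rank(group_comm, group_rank, ierr)

      print *, 'world rank', world_rank, 'is rank', group_rank, 'in group', color

      call MPI_Comm_free(group_comm, ierr)
      call MPI_Finalize(ierr)
    end program mpi_grouping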

