Parallel Solvers for Finite-Difference Modeling of Large-Scale, High-Resolution Electromagnetic Problems in MRI

2008 ◽  
Vol 2008 ◽  
pp. 1-12 ◽  
Author(s):  
Hua Wang ◽  
Adnan Trakic ◽  
Feng Liu ◽  
Bing Keong Li ◽  
Ewald Weber ◽  
...  

With the movement of magnetic resonance imaging (MRI) technology towards higher-field (and therefore higher-frequency) systems, the interaction of the fields generated by the system with patients, healthcare workers, and internally within the system is attracting more attention. Due to the complexity of these interactions, computational modeling plays an essential role in the analysis, design, and development of modern MRI systems. Because of the large computational scale of most MRI models, numerical schemes that rely on a single processing unit often require a significant amount of memory and long computation times, which makes modeling these problems quite inefficient. This paper presents dedicated message passing interface (MPI) and OpenMP parallel computing solvers for the finite-difference time-domain (FDTD) and quasistatic finite-difference (QSFD) schemes. The FDTD and QSFD methods have been widely used to model and analyze the induction of electric fields and currents in voxel phantoms and MRI system components at high and low frequencies, respectively. The power of the optimized parallel computing architectures is illustrated by distinct, large-scale field calculation problems, showing significant computational advantages over conventional single-processor platforms.
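As a rough illustration of the hybrid MPI/OpenMP approach described above, the following C sketch combines a one-dimensional MPI domain decomposition with OpenMP-threaded update loops for a 1D FDTD grid. The grid size and normalized update coefficients are illustrative assumptions, and source excitation and absorbing boundaries are omitted; this is not the paper's actual solver.

```c
/* Hedged sketch: 1D FDTD with MPI halo exchange and OpenMP threading.
 * Sizes and coefficients are illustrative; source terms and absorbing
 * boundaries are omitted for brevity. */
#include <mpi.h>
#include <stdlib.h>

#define NLOC   100000   /* cells owned by each rank (assumed) */
#define NSTEPS 1000

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    /* local fields with one ghost cell on each side */
    double *ez = calloc(NLOC + 2, sizeof(double));
    double *hy = calloc(NLOC + 2, sizeof(double));

    for (int n = 0; n < NSTEPS; ++n) {
        /* exchange the Ez halo needed by the Hy update */
        MPI_Sendrecv(&ez[NLOC], 1, MPI_DOUBLE, right, 0,
                     &ez[0],    1, MPI_DOUBLE, left,  0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        #pragma omp parallel for
        for (int i = 1; i <= NLOC; ++i)
            hy[i] += 0.5 * (ez[i] - ez[i - 1]);   /* normalized units */

        /* exchange the Hy halo needed by the Ez update */
        MPI_Sendrecv(&hy[1],        1, MPI_DOUBLE, left,  1,
                     &hy[NLOC + 1], 1, MPI_DOUBLE, right, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        #pragma omp parallel for
        for (int i = 1; i <= NLOC; ++i)
            ez[i] += 0.5 * (hy[i + 1] - hy[i]);
    }
    free(ez); free(hy);
    MPI_Finalize();
    return 0;
}
```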

Author(s):  
Alan Gray ◽  
Kevin Stratford

Leading high performance computing systems achieve their status through the use of highly parallel devices such as NVIDIA graphics processing units or Intel Xeon Phi many-core CPUs. The concept of performance portability across such architectures, as well as traditional CPUs, is vital for the application programmer. In this paper we describe targetDP, a lightweight abstraction layer which allows grid-based applications to target data-parallel hardware in a platform-agnostic manner. We demonstrate the effectiveness of our pragmatic approach by presenting performance results for a complex fluid application (with which the model was co-designed), plus a separate lattice quantum chromodynamics particle physics code. For each application, a single source code base is seen to achieve portable performance, as assessed within the context of the Roofline model. TargetDP can be combined with the Message Passing Interface (MPI) to allow use on systems containing multiple nodes: we demonstrate this through scaling results on traditional and graphics processing unit-accelerated large-scale supercomputers.
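To convey the flavor of such an abstraction layer, here is a minimal sketch using a hypothetical GRID_LOOP macro; this is not targetDP's actual API. The kernel body is written once, and the macro retargets it at compile time to a plain serial loop or an OpenMP-threaded one (compiled with, e.g., -DTARGET_OPENMP -fopenmp).

```c
/* Hedged sketch of the idea behind a targetDP-style abstraction layer,
 * using a hypothetical GRID_LOOP macro rather than the library's API. */
#include <stdio.h>

#ifdef TARGET_OPENMP
#define GRID_LOOP(i, n) _Pragma("omp parallel for") \
    for (int i = 0; i < (n); ++i)
#else
#define GRID_LOOP(i, n) for (int i = 0; i < (n); ++i)
#endif

/* a simple grid kernel written once against the abstraction */
static void scale(double *field, double a, int n) {
    GRID_LOOP(i, n) {
        field[i] *= a;   /* same data-parallel body on every target */
    }
}

int main(void) {
    double f[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    scale(f, 2.0, 8);
    printf("%g %g\n", f[0], f[7]);   /* prints: 2 16 */
    return 0;
}
```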


Author(s):  
Yu-Cheng Chou ◽  
Harry H. Cheng

Message Passing Interface (MPI) is a standardized library specification designed for message-passing parallel programming on large-scale distributed systems. A number of MPI libraries have been implemented to allow users to develop portable programs using the scientific programming languages Fortran, C, and C++. Ch is an embeddable C/C++ interpreter that provides an interpretive environment for C/C++ based scripts and programs. Combining Ch with any MPI C/C++ library provides the functionality for rapid development of MPI C/C++ programs without compilation. In this article, the method of interfacing Ch scripts with MPI C implementations is introduced using the MPICH2 C library as an example. The MPICH2-based Ch MPI package provides users with the ability to interpretively run MPI C programs based on the MPICH2 C library. Running MPI programs through the MPICH2-based Ch MPI package across heterogeneous platforms consisting of Linux and Windows machines is illustrated. Comparisons of bandwidth, latency, and parallel computation speedup between C MPI, Ch MPI, and MPI for Python in an Ethernet-based environment comprising identical Linux machines are presented. A Web-based example is given to demonstrate the use of Ch and MPICH2 in C-based CGI scripting to facilitate the development of Web-based applications for parallel computing.
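A ping-pong microbenchmark of the kind behind such bandwidth and latency comparisons can be written in a few lines of plain MPI C; the message size and repetition count below are illustrative assumptions. Being ordinary C, a program like this is the sort that could be run either compiled with mpicc or interpretively.

```c
/* Hedged sketch: a minimal MPI ping-pong latency/bandwidth test.
 * Run on exactly two ranks, e.g. mpiexec -n 2 ./pingpong */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int nbytes = 1 << 20;   /* 1 MiB payload (assumed) */
    const int reps = 100;
    char *buf = malloc(nbytes);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int r = 0; r < reps; ++r) {
        if (rank == 0) {          /* send, then wait for the echo */
            MPI_Send(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {   /* echo the message back */
            MPI_Recv(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double dt = MPI_Wtime() - t0;
    if (rank == 0)
        printf("round trip %.1f us, bandwidth %.1f MB/s\n",
               1e6 * dt / reps, 2.0 * nbytes * reps / dt / 1e6);
    free(buf);
    MPI_Finalize();
    return 0;
}
```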


2020 ◽  
Author(s):  
Jason Louis Turner ◽  
Samuel N. Stechmann

Parallel computing can offer substantial speedup of numerical simulations in comparison to serial computing, as parallel computing uses many processors simultaneously rather than a single processor. However, it typically also requires substantial time and effort to convert a serial code into a parallel code. Here, a new module is developed to reduce the time and effort required to parallelize a serial code. The tested version of the module is written in the Fortran programming language, while the framework could also be extended to other languages (C++, Python, Julia, etc.). The Message Passing Interface is used to allow for either shared-memory or distributed-memory computer architectures. The software is designed for solving partial differential equations on a rectangular two-dimensional or three-dimensional domain, using finite difference, finite volume, pseudo-spectral, or other similar numerical methods. Examples are provided for two idealized models of atmospheric and oceanic fluid dynamics: the two-level quasi-geostrophic equations, and the stochastic heat equation as a model for turbulent advection–diffusion of either water vapor and clouds or sea surface height variability. In tests of the parallelized code, the strong scaling efficiency of the finite difference code is roughly 80% to 90%, achieved by adding only roughly 10 new lines to the serial code. Therefore, EZ Parallel provides great benefits with minimal additional effort.
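For context, the following sketch shows the kind of halo-exchange boilerplate such a module hides behind its handful of added lines: a 2D Cartesian decomposition with a north/south ghost-row exchange. EZ Parallel itself is Fortran; this sketch is in C, and the local grid sizes are illustrative assumptions.

```c
/* Hedged sketch: 2D Cartesian decomposition with north/south halo
 * exchange, the boilerplate a parallelization module would hide. */
#include <mpi.h>
#include <stdlib.h>

#define NX 64   /* local interior cells per rank, illustrative */
#define NY 64

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int size;
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int dims[2] = {0, 0}, periods[2] = {0, 0};
    MPI_Dims_create(size, 2, dims);          /* pick a process grid */
    MPI_Comm cart;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &cart);

    int north, south;                        /* neighbors or MPI_PROC_NULL */
    MPI_Cart_shift(cart, 0, 1, &north, &south);

    /* local array with one ghost row/column on every side */
    double *u = calloc((NX + 2) * (NY + 2), sizeof(double));
    #define U(i, j) u[(i) * (NY + 2) + (j)]

    /* rows are contiguous in this layout: send the last interior row
       south and receive the north ghost row, then the reverse */
    MPI_Sendrecv(&U(NX, 1), NY, MPI_DOUBLE, south, 0,
                 &U(0, 1),  NY, MPI_DOUBLE, north, 0,
                 cart, MPI_STATUS_IGNORE);
    MPI_Sendrecv(&U(1, 1),      NY, MPI_DOUBLE, north, 1,
                 &U(NX + 1, 1), NY, MPI_DOUBLE, south, 1,
                 cart, MPI_STATUS_IGNORE);
    /* east/west exchange would use an MPI_Type_vector for strided columns */

    free(u);
    MPI_Finalize();
    return 0;
}
```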


2016 ◽  
Vol 14 (02) ◽  
pp. 1641008 ◽  
Author(s):  
Dmitry Suplatov ◽  
Nina Popova ◽  
Sergey Zhumatiy ◽  
Vladimir Voevodin ◽  
Vytas Švedas

Rapid expansion of online resources providing access to genomic, structural, and functional information associated with biological macromolecules opens an opportunity to gain a deeper understanding of the mechanisms of biological processes through systematic analysis of large datasets. This, however, requires novel strategies to optimally utilize computer processing power. Some methods in bioinformatics and molecular modeling require extensive computational resources. Other algorithms have fast implementations which take at most several hours to analyze a common input on a modern desktop station; however, due to multiple invocations for a large number of subtasks, the full task requires significant computing power. Therefore, an efficient computational solution to large-scale biological problems requires both a wise parallel implementation of resource-hungry methods and a smart workflow to manage multiple invocations of relatively fast algorithms. In this work, a new computer software package, mpiWrapper, has been developed to accommodate non-parallel implementations of scientific algorithms within the parallel supercomputing environment. The Message Passing Interface is used to exchange information between nodes. Two specialized threads — one for task management and communication, and another for subtask execution — are invoked on each processing unit to avoid deadlock while using blocking MPI calls. mpiWrapper can be used to launch any conventional Linux application without modifying its original source code, and it supports resubmission of subtasks on node failure. We show that this approach can be used to process huge amounts of biological data efficiently by running non-parallel programs in parallel mode on a supercomputer. The C++ source code and documentation are available from http://biokinet.belozersky.msu.ru/mpiWrapper .
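A minimal sketch of this master/worker pattern in MPI C is shown below. The task list, message tags, and single-threaded dispatch are illustrative simplifications: mpiWrapper itself additionally uses a dedicated communication thread per node and resubmits subtasks on failure.

```c
/* Hedged sketch: rank 0 hands out command lines, workers run them
 * with system() (an unmodified Linux binary) and report back. */
#include <mpi.h>
#include <stdlib.h>
#include <string.h>

#define TAG_TASK 1
#define TAG_DONE 2
#define TAG_STOP 3

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                         /* master */
        const char *tasks[] = { "./job input1", "./job input2",
                                "./job input3", "./job input4" };
        int ntasks = 4, next = 0, running = 0, rc;
        MPI_Status st;
        /* prime every worker, then refill as results come back */
        for (int w = 1; w < size && next < ntasks; ++w, ++next, ++running)
            MPI_Send(tasks[next], strlen(tasks[next]) + 1, MPI_CHAR,
                     w, TAG_TASK, MPI_COMM_WORLD);
        while (running > 0) {
            MPI_Recv(&rc, 1, MPI_INT, MPI_ANY_SOURCE, TAG_DONE,
                     MPI_COMM_WORLD, &st);
            --running;
            if (next < ntasks) {             /* refill the idle worker */
                MPI_Send(tasks[next], strlen(tasks[next]) + 1, MPI_CHAR,
                         st.MPI_SOURCE, TAG_TASK, MPI_COMM_WORLD);
                ++next; ++running;
            }
        }
        for (int w = 1; w < size; ++w)       /* shut all workers down */
            MPI_Send("", 1, MPI_CHAR, w, TAG_STOP, MPI_COMM_WORLD);
    } else {                                 /* worker */
        char cmd[256];
        MPI_Status st;
        while (1) {
            MPI_Recv(cmd, sizeof cmd, MPI_CHAR, 0, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_STOP) break;
            int rc = system(cmd);            /* run the subtask */
            MPI_Send(&rc, 1, MPI_INT, 0, TAG_DONE, MPI_COMM_WORLD);
        }
    }
    MPI_Finalize();
    return 0;
}
```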


Author(s):  
Ning Yang ◽  
Shiaaulir Wang ◽  
Paul Schonfeld

A Parallel Genetic Algorithm (PGA) is used for simulation-based optimization of waterway project schedules. The PGA distributes a Genetic Algorithm application over multiple processors in order to speed up the solution search for a very large combinatorial problem. The proposed PGA is based on a global parallel model, also called a master-slave model. The Message Passing Interface (MPI) is used in developing the parallel computing program. A case study is presented whose results show how the adaptation of a simulation-based optimization algorithm to parallel computing can greatly reduce computation time. Additional techniques found to further improve PGA performance include: (1) choosing an appropriate task distribution method, (2) distributing simulation replications instead of different solutions, (3) avoiding the simulation of duplicate solutions, (4) avoiding running multiple simulations simultaneously in shared-memory processors, and (5) avoiding the use of processors belonging to different clusters (physical sub-networks).
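The global (master-slave) model can be sketched as follows: the master holds the population, and each generation's fitness evaluations are scattered across processors. The chromosome encoding and the evaluate() stand-in for the waterway simulation below are hypothetical.

```c
/* Hedged sketch of the global (master-slave) PGA model: fitness
 * evaluation is the distributed step; selection, crossover, and
 * mutation would proceed serially on the master afterwards. */
#include <mpi.h>
#include <stdlib.h>

#define POP   64   /* population size, divisible by nprocs (assumed) */
#define GENES 16

/* hypothetical stand-in for one simulation-based fitness evaluation */
static double evaluate(const double *genes) {
    double f = 0.0;
    for (int g = 0; g < GENES; ++g) f += genes[g] * genes[g];
    return f;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int share = POP / size;                  /* individuals per rank */
    double *pop = NULL, *fit = NULL;
    if (rank == 0) {                         /* master holds the GA state */
        pop = malloc(POP * GENES * sizeof(double));
        fit = malloc(POP * sizeof(double));
        for (int i = 0; i < POP * GENES; ++i) pop[i] = drand48();
    }
    double *mine  = malloc(share * GENES * sizeof(double));
    double *myfit = malloc(share * sizeof(double));

    /* scatter chromosomes, evaluate locally, gather fitness values */
    MPI_Scatter(pop, share * GENES, MPI_DOUBLE,
                mine, share * GENES, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    for (int i = 0; i < share; ++i)
        myfit[i] = evaluate(&mine[i * GENES]);
    MPI_Gather(myfit, share, MPI_DOUBLE, fit, share, MPI_DOUBLE,
               0, MPI_COMM_WORLD);

    free(mine); free(myfit); free(pop); free(fit);
    MPI_Finalize();
    return 0;
}
```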


Author(s):  
Anoosheh Niavarani-Kheirier ◽  
Masoud Darbandi ◽  
Gerry E. Schneider

The main objective of the current work is to utilize the Lattice Boltzmann Method (LBM) for simulating buoyancy-driven flow using the hybrid thermal lattice Boltzmann equation (HTLBE). After deriving the required formulations, they are validated against a wide range of Rayleigh numbers for the buoyancy-driven square cavity problem. The performance of the method is investigated on parallel machines using the Message Passing Interface (MPI) library and a domain decomposition technique, in order to solve computationally demanding problems. The results show that the code is highly efficient at solving large-scale problems, with excellent speedup.
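Part of what makes domain decomposition so effective for LBM is that the collision step is entirely local to each lattice node. Below is a sketch of the standard D2Q9 BGK collision at a single node; the hybrid thermal coupling and buoyancy forcing of the HTLBE scheme are omitted, and the relaxation time tau is an arbitrary illustrative value.

```c
/* Hedged sketch: standard D2Q9 BGK collision step at one lattice node. */
#include <stdio.h>

static const double w[9]  = {4.0/9, 1.0/9, 1.0/9, 1.0/9, 1.0/9,
                             1.0/36, 1.0/36, 1.0/36, 1.0/36};
static const int ex[9] = {0, 1, 0, -1, 0, 1, -1, -1, 1};
static const int ey[9] = {0, 0, 1, 0, -1, 1, 1, -1, -1};

/* relax the nine distributions f[] toward local equilibrium */
void collide(double f[9], double tau) {
    double rho = 0.0, ux = 0.0, uy = 0.0;
    for (int k = 0; k < 9; ++k) {            /* moments: density, momentum */
        rho += f[k];
        ux  += f[k] * ex[k];
        uy  += f[k] * ey[k];
    }
    ux /= rho; uy /= rho;
    double usq = ux * ux + uy * uy;
    for (int k = 0; k < 9; ++k) {
        double eu  = ex[k] * ux + ey[k] * uy;
        double feq = w[k] * rho * (1.0 + 3.0 * eu + 4.5 * eu * eu
                                   - 1.5 * usq);
        f[k] -= (f[k] - feq) / tau;          /* BGK single-time relaxation */
    }
}

int main(void) {
    double f[9];
    for (int k = 0; k < 9; ++k) f[k] = w[k]; /* rho = 1, u = 0 */
    collide(f, 0.6);
    printf("f0 = %g\n", f[0]);               /* unchanged at equilibrium */
    return 0;
}
```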


Author(s):  
Roberto Porcù ◽  
Edie Miglio ◽  
Nicola Parolini ◽  
Mattia Penati ◽  
Noemi Vergopolan

Helicopters can experience brownout when flying close to a dusty surface. The uplifting of dust into the air can severely restrict the pilot's visibility. Consequently, a brownout can disorient the pilot and lead to the helicopter colliding with the ground. Given these risks, brownout has become a high-priority problem for civil and military operations. Proper helicopter design is thus critical, as it strongly influences the shape and density of the dust cloud that forms when brownout occurs. One way forward in improving aircraft design against brownout is the use of particle simulations. For simulations to be accurate and comparable to the real phenomenon, billions of particles are required. However, with such a large number of particles, serial simulations can be too slow and too computationally expensive to perform. In this work, we investigate a Message Passing Interface (MPI) + multiple graphics processing unit (multi-GPU) approach to simulate brownout. Specifically, we use a semi-implicit Euler method to integrate the particle dynamics in a Lagrangian way, and we adopt a precomputed aerodynamic field. We do not include particle–particle collisions in the model; this allows for independent trajectories and effective parallelization of the model. To support our methodology, we provide a speedup analysis of the parallelization with respect to the serial and pure-MPI simulations. The results show (i) very high speedups of the MPI + multi-GPU implementation with respect to the serial and pure-MPI ones, (ii) excellent weak and strong scalability of the implemented time-integration algorithm, and (iii) the possibility of running realistic simulations of brownout with billions of particles at a relatively small computational cost. This work paves the way toward more realistic brownout simulations, and it highlights the potential of high-performance computing for aiding and advancing aircraft design for brownout mitigation.
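Because trajectories are independent, the particle push parallelizes trivially. The sketch below shows a semi-implicit Euler update with drag toward a precomputed aerodynamic field, each MPI rank owning an independent slice of particles. The wind() interpolation stand-in, drag time scale, and sizes are assumptions, and the paper's implementation additionally offloads this loop to GPUs.

```c
/* Hedged sketch: semi-implicit Euler particle push, embarrassingly
 * parallel across ranks (no collisions, so no communication here). */
#include <mpi.h>
#include <stdlib.h>

#define NP_LOCAL 1000000   /* particles per rank (assumed) */

typedef struct { double x[3], v[3]; } particle;

/* hypothetical stand-in for interpolation into the precomputed field */
static void wind(const double x[3], double u[3]) {
    u[0] = 1.0; u[1] = 0.0; u[2] = 0.0;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    particle *p = calloc(NP_LOCAL, sizeof(particle));
    const double dt = 1e-3, tau_p = 1e-2, g = -9.81;  /* illustrative */

    for (int step = 0; step < 100; ++step) {
        for (int i = 0; i < NP_LOCAL; ++i) {
            double u[3];
            wind(p[i].x, u);
            for (int d = 0; d < 3; ++d) {
                /* drag toward the air velocity plus gravity in z */
                double a = (u[d] - p[i].v[d]) / tau_p + (d == 2 ? g : 0.0);
                p[i].v[d] += dt * a;           /* velocity first */
                p[i].x[d] += dt * p[i].v[d];   /* then position: semi-implicit */
            }
        }
    }
    free(p);
    MPI_Finalize();
    return 0;
}
```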


2019 ◽  
Vol 2019 ◽  
pp. 1-9
Author(s):  
Qian Yang ◽  
Bing Wei ◽  
Linqian Li ◽  
Debiao Ge

The plasma sheath is a popular topic in computational electromagnetics, and plasma cases are more resource-intensive than non-plasma ones. In this paper, a parallel shift-operator discontinuous Galerkin time-domain (DGTD) method using the MPI (Message Passing Interface) library is proposed to solve large-scale plasma problems. To demonstrate the algorithm, a plasma sheath model of a high-speed blunt cone was established based on results from multiphysics software, and the algorithm was used to extract the radar cross-section (RCS) of the model as a function of incident angle.
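The paper parallelizes the DGTD solver itself over MPI. As a simpler illustration of the outermost layer of such a study, the sketch below distributes the incident-angle sweep across ranks and gathers the RCS curve on rank 0; compute_rcs() is a hypothetical stand-in for a full solve, not the paper's interface.

```c
/* Hedged sketch: distributing an incident-angle RCS sweep over ranks. */
#include <mpi.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

#define NANGLES 180   /* sweep resolution, illustrative */

/* hypothetical stand-in: a real evaluation would run a full DGTD solve */
static double compute_rcs(int angle_deg) {
    return 1.0 + cos(angle_deg * M_PI / 180.0);
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int share = NANGLES / size;          /* assume it divides evenly */
    double *mine = malloc(share * sizeof(double));
    for (int i = 0; i < share; ++i)      /* each rank takes a block of angles */
        mine[i] = compute_rcs(rank * share + i);

    double *rcs = (rank == 0) ? malloc(NANGLES * sizeof(double)) : NULL;
    MPI_Gather(mine, share, MPI_DOUBLE, rcs, share, MPI_DOUBLE,
               0, MPI_COMM_WORLD);
    if (rank == 0) {
        for (int a = 0; a < NANGLES; ++a)
            printf("%3d deg  %g\n", a, rcs[a]);
        free(rcs);
    }
    free(mine);
    MPI_Finalize();
    return 0;
}
```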


2012 ◽  
Vol 532-533 ◽  
pp. 1115-1119
Author(s):  
Xiao Mei Guo ◽  
Wei Zhao ◽  
Li Hong Zhang ◽  
Wen Hua Yu

This paper introduces a parallel FDTD (Finite Difference Time Domain) algorithm based on the MPI (Message Passing Interface) parallel environment and a NUMA (Non-Uniform Memory Access) architecture workstation. The FDTD computation is carried out independently on the local mesh of each process, and data are exchanged between adjacent subdomains to realize the parallel FDTD method. The results show consistency between the serial and parallel algorithms, and computing efficiency is improved effectively.
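A sketch of this exchange pattern is given below, extended with nonblocking MPI calls so that interior updates can overlap the halo communication; the averaging stencil and sizes are illustrative assumptions, not the paper's scheme.

```c
/* Hedged sketch: nonblocking ghost-cell exchange overlapped with the
 * interior update of a 1D-decomposed field. */
#include <mpi.h>
#include <stdlib.h>

#define N 4096   /* local cells per rank, plus one ghost on each side */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    double *f = calloc(N + 2, sizeof(double));
    double *g = calloc(N + 2, sizeof(double));
    MPI_Request req[4];

    /* start the ghost-cell exchange with the two neighbors */
    MPI_Irecv(&f[0],     1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req[0]);
    MPI_Irecv(&f[N + 1], 1, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &req[1]);
    MPI_Isend(&f[1],     1, MPI_DOUBLE, left,  1, MPI_COMM_WORLD, &req[2]);
    MPI_Isend(&f[N],     1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[3]);

    /* meanwhile, update interior cells that do not touch the ghosts */
    for (int i = 2; i <= N - 1; ++i)
        g[i] = 0.5 * (f[i - 1] + f[i + 1]);

    /* finish communication, then update the two boundary cells */
    MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
    g[1] = 0.5 * (f[0] + f[2]);
    g[N] = 0.5 * (f[N - 1] + f[N + 1]);

    free(f); free(g);
    MPI_Finalize();
    return 0;
}
```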

