MPI-FT: PORTABLE FAULT TOLERANCE SCHEME FOR MPI

In this paper, we propose the design and development of a fault-tolerance and recovery scheme for the Message Passing Interface (MPI). The proposed scheme consists of a mechanism for detecting process failures and a recovery mechanism. Two cases are considered, both assuming the existence of a monitoring process, the Observer, which triggers the recovery procedure when a failure occurs. In the first case, each process keeps a buffer of its own message traffic for use in case of failure, while the implementor periodically tests for failure notifications from the Observer. The recovery function simulates all communication between the other processes and the dead one by re-sending to the replacement process all the messages that were destined for the dead process. In the second case, the Observer receives and stores all message traffic and sends the replacement all the buffered messages destined for the dead process. Solutions are provided for the dead-communicator problem caused by the death of a process. A description of the prototype is provided, along with the results of experiments on its efficiency and performance.
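
The first recovery case can be illustrated with a small single-process simulation: each sender keeps a log of what it sent, and on failure the Observer asks every survivor to replay its log for the dead rank into the replacement's empty mailbox. This is a minimal sketch under stated assumptions, not the MPI-FT implementation; the classes `LoggingProcess` and `Observer` and the dictionary of mailboxes that stands in for MPI point-to-point messaging are hypothetical, and failure detection itself is not modelled.

```python
from collections import defaultdict


class LoggingProcess:
    """Simulated MPI rank that keeps a copy of every message it sends (hypothetical)."""

    def __init__(self, rank, network):
        self.rank = rank
        self.network = network              # rank -> list of (source, payload) delivered to it
        self.sent_log = defaultdict(list)   # destination rank -> payloads sent to it

    def send(self, dest, payload):
        # Deliver the message and keep a copy in the sender-side buffer.
        self.network[dest].append((self.rank, payload))
        self.sent_log[dest].append(payload)

    def replay_to(self, dead_rank):
        # Re-send everything this process ever sent to the dead rank, in the
        # original order, so the replacement sees the same input from this sender.
        for payload in self.sent_log[dead_rank]:
            self.network[dead_rank].append((self.rank, payload))


class Observer:
    """Monitoring process that triggers recovery when a rank dies (hypothetical)."""

    def __init__(self, processes, network):
        self.processes = processes
        self.network = network

    def recover(self, dead_rank):
        # The replacement process starts with an empty mailbox ...
        self.network[dead_rank] = []
        # ... and every surviving process replays its buffer for the dead rank.
        for proc in self.processes:
            if proc.rank != dead_rank:
                proc.replay_to(dead_rank)
        return self.network[dead_rank]


if __name__ == "__main__":
    network = defaultdict(list)
    procs = [LoggingProcess(r, network) for r in range(3)]
    procs[0].send(2, "a")
    procs[1].send(2, "b")
    procs[0].send(2, "c")
    print(Observer(procs, network).recover(2))
```

Replay preserves the order of messages from each individual sender, but not the original interleaving between different senders. In the second case, the log would instead live in the Observer, which sees all traffic and could replay it in the order it was received.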

Design, implementation and performance of fault-tolerant message passing interface (MPI)

Proceedings. Seventh International Conference on High Performance Computing and Grid in Asia Pacific Region, 2004. DOI: 10.1109/hpcasia.2004.1324026

Author(s): A.D. Selvakumar, P.M. Sobha, G.C. Ravindra, R. Pitchiah

Keyword(s): Message Passing, Message Passing Interface, Fault Tolerant

A parallelization scheme to simulate reactive transport in the subsurface environment with OGS#IPhreeqc

Geoscientific Model Development Discussions, 2015, Vol 8 (3), pp. 2369-2402. DOI: 10.5194/gmdd-8-2369-2015

Author(s): W. He, C. Beyer, J. H. Fleckenstein, E. Jang, O. Kolditz, ...

Keyword(s): Message Passing, Reactive Transport, Message Passing Interface, Transport Processes, Coupled Processes, Scientific Software, Geochemical Reactions, Optimized Allocation

Abstract. This technical paper presents an efficient and performance-oriented method to model reactive mass transport processes in environmental and geotechnical subsurface systems. The open-source scientific software packages OpenGeoSys and IPhreeqc have been coupled to combine their individual strengths and features to simulate thermo-hydro-mechanical-chemical coupled processes in porous and fractured media with simultaneous consideration of aqueous geochemical reactions. Furthermore, a flexible parallelization scheme using MPI (Message Passing Interface) grouping techniques has been implemented, which allows an optimized allocation of computer resources between the node-wise calculation of chemical reactions on the one hand and the underlying processes, such as groundwater flow or solute transport, on the other. The coupling interface and parallelization scheme have been tested and verified in terms of precision and performance.
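
One way to picture the MPI grouping idea is the colour computation that precedes a communicator split: every rank decides which group it belongs to, and ranks with the same colour end up in the same sub-communicator, so the transport group and the chemistry group can be sized independently. The sketch below only simulates that bookkeeping in plain Python; the split rule (the first `n_transport` ranks handle flow and transport, the rest handle node-wise chemistry), the block distribution of mesh nodes, and all function names are assumptions for illustration, not the scheme used in OGS#IPhreeqc.

```python
TRANSPORT, CHEMISTRY = 0, 1   # hypothetical group colours


def color_for(rank, n_transport):
    """Assumed rule: the first n_transport ranks form the transport group."""
    return TRANSPORT if rank < n_transport else CHEMISTRY


def split_ranks(world_size, n_transport):
    """Mimic MPI_Comm_split: ranks with equal colour share a group, ordered by world rank."""
    groups = {TRANSPORT: [], CHEMISTRY: []}
    for rank in range(world_size):
        groups[color_for(rank, n_transport)].append(rank)
    return groups


def assign_nodes(n_nodes, chem_ranks):
    """Spread mesh nodes over the chemistry ranks in contiguous, near-equal blocks."""
    if not chem_ranks:
        raise ValueError("at least one chemistry rank is required")
    base, extra = divmod(n_nodes, len(chem_ranks))
    assignment, start = {}, 0
    for i, rank in enumerate(chem_ranks):
        size = base + (1 if i < extra else 0)   # the first `extra` ranks take one more node
        assignment[rank] = range(start, start + size)
        start += size
    return assignment


if __name__ == "__main__":
    groups = split_ranks(world_size=8, n_transport=2)
    print(groups)   # {0: [0, 1], 1: [2, 3, 4, 5, 6, 7]}
    for rank, nodes in assign_nodes(10, groups[CHEMISTRY]).items():
        print(rank, list(nodes))
```

In a real MPI program using mpi4py, each rank would instead call `MPI.COMM_WORLD.Split(color_for(rank, n_transport), rank)` and receive a communicator covering only its own group.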

The MPI and OpenMP Implementation of Parallel Algorithm for Generating Mandelbrot Set

Applied Mechanics and Materials, 2014, Vol 571-572, pp. 26-29. DOI: 10.4028/www.scientific.net/amm.571-572.26

Author(s): Xiang Wei Duan, Wei Chang Shen, Jun Guo

Keyword(s): Parallel Algorithm, Shared Memory, Message Passing, Message Passing Interface, Algorithm Design, Performance Testing, Mandelbrot Set

The paper introduces the Mandelbrot set, the Message Passing Interface (MPI) and shared-memory programming with OpenMP; analyses the characteristics of algorithm design in the MPI and OpenMP environments; describes the implementation of a parallel algorithm for the Mandelbrot set in each environment; conducts a series of evaluations and performance tests at run time; and compares the two implementations.
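
The row loop is the part both versions parallelize: every pixel's escape-time iteration z -> z*z + c is independent, so rows can be handed out to workers with no communication until the final gather. This is a sketch under assumptions, not the authors' code: it simulates the MPI ranks sequentially in plain Python, uses a cyclic (round-robin) row distribution because rows crossing the set typically need more iterations than rows far from it, and the image size, complex-plane window and iteration cap are illustrative.

```python
def escape_time(c, max_iter):
    """Number of iterations of z -> z*z + c before |z| > 2 (capped at max_iter)."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


def pixel_to_c(x, y, width, height, re=(-2.0, 1.0), im=(-1.5, 1.5)):
    """Map pixel (x, y) to a point in the complex plane (width, height > 1)."""
    real = re[0] + (re[1] - re[0]) * x / (width - 1)
    imag = im[0] + (im[1] - im[0]) * y / (height - 1)
    return complex(real, imag)


def rows_for_rank(rank, nprocs, height):
    """Cyclic row distribution: rank r owns rows r, r + P, r + 2P, ..."""
    return range(rank, height, nprocs)


def compute_rows(rows, width, height, max_iter):
    """Work done by one rank: escape times for its own rows only."""
    return {y: [escape_time(pixel_to_c(x, y, width, height), max_iter)
                for x in range(width)]
            for y in rows}


def render_parallel(width, height, max_iter, nprocs):
    """Simulate nprocs ranks one after another, then assemble their rows."""
    image = [None] * height
    for rank in range(nprocs):            # each iteration plays one rank's role
        part = compute_rows(rows_for_rank(rank, nprocs, height), width, height, max_iter)
        for y, row in part.items():       # gather step: in MPI, the root would receive these rows
            image[y] = row
    return image


if __name__ == "__main__":
    img = render_parallel(60, 24, 100, nprocs=4)
    for row in img:
        print("".join("#" if v == 100 else "." for v in row))
```

In an OpenMP version, the same cyclic split roughly corresponds to a `schedule(static, 1)` clause on the row loop; in an MPI version, each rank would compute only its own rows and the root would collect them, for instance through point-to-point receives.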

Unified fault-tolerance framework for hybrid task-parallel message-passing applications

The International Journal of High Performance Computing Applications, 2016, Vol 32 (5), pp. 641-657. DOI: 10.1177/1094342016669416

Cited by ~5

Author(s): Omer Subasi, Tatiana Martsinkevich, Ferad Zyulkyarov, Osman Unsal, Jesus Labarta, ...

Keyword(s): Fault Tolerance, Performance Improvement, Message Passing, Message Passing Interface, Fault Tolerant, Performance Score, Fine Grained, Transient Errors, Task Parallel, Complete Failure

We present a unified fault-tolerance framework for task-parallel message-passing applications to mitigate transient errors. First, we propose a fault-tolerant message-logging protocol that only requires the restart of the task that experienced the error and transparently handles any message passing interface calls inside the task. In our experiments we demonstrate that our fault-tolerant solution has a reasonable overhead, with a maximum observed overhead of 4.5%. We also show that fine-grained parallelization is important for hiding the overheads related to the protocol as well as the recovery of tasks. Secondly, we develop a mathematical model to unify task-level checkpointing and our protocol with system-wide checkpointing in order to provide complete failure coverage. We provide closed formulas for the optimal checkpointing interval and the performance score of the unified scheme. Experimental results show that the performance improvement can be as high as 98% with the unified scheme.
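
The abstract mentions closed formulas for the optimal checkpointing interval but does not reproduce them. As a point of reference only, the classic first-order model (Young's approximation) treats the fraction of time lost as checkpoint overhead C/tau plus expected rework tau/(2M), for checkpoint cost C, interval tau and mean time between failures M; setting the derivative to zero gives tau* = sqrt(2CM). The sketch below computes that interval and checks it against a brute-force scan; it is a swapped-in textbook model, not the authors' unified formula.

```python
import math


def waste_fraction(tau, ckpt_cost, mtbf):
    """First-order fraction of time lost: checkpoint overhead C/tau + expected rework tau/(2M)."""
    return ckpt_cost / tau + tau / (2.0 * mtbf)


def young_interval(ckpt_cost, mtbf):
    """Interval that minimises waste_fraction: setting d/dtau to 0 gives sqrt(2*C*M)."""
    return math.sqrt(2.0 * ckpt_cost * mtbf)


def brute_force_interval(ckpt_cost, mtbf, step=1.0, upper=None):
    """Scan candidate intervals and return the one with the lowest modelled waste."""
    if upper is None:
        upper = 10.0 * mtbf
    candidates = [step * k for k in range(1, int(upper / step) + 1)]
    return min(candidates, key=lambda t: waste_fraction(t, ckpt_cost, mtbf))


if __name__ == "__main__":
    C, M = 60.0, 86400.0   # 60 s checkpoint, one-day MTBF (illustrative values)
    tau = young_interval(C, M)
    print(f"tau* = {tau:.1f} s, waste = {waste_fraction(tau, C, M):.2%}")
```

With a 60 s checkpoint and a one-day MTBF, this gives an interval of roughly 3220 s (about 54 minutes) and a modelled waste of about 3.7% of run time.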

MPI to Coarray Fortran: Experiences with a CFD Solver for Unstructured Meshes

Scientific Programming, 2017, Vol 2017, pp. 1-12. DOI: 10.1155/2017/3409647

Cited by ~1

Author(s): Anuj Sharma, Irene Moulitsas

Keyword(s): High Resolution, Message Passing, Message Passing Interface, Parallel Implementation, Unstructured Meshes, Navier Stokes, Performance Measurements, Partitioned Global Address Space, Computational Fluid Dynamics (CFD)

High-resolution numerical methods and unstructured meshes are required in many applications of Computational Fluid Dynamics (CFD). These methods are computationally expensive and hence benefit from parallelization. The Message Passing Interface (MPI) has traditionally been used as a parallelization strategy. However, the inherent complexity of MPI adds to the existing complexity of CFD scientific codes. The Partitioned Global Address Space (PGAS) parallelization paradigm was introduced in an attempt to improve the clarity of parallel implementations. We present our experiences of converting an unstructured high-resolution compressible Navier-Stokes CFD solver from MPI to PGAS Coarray Fortran, and describe the challenges, methodology, and performance measurements of our approach using Coarray Fortran. With the Cray compiler, we find Coarray Fortran to be a viable alternative to MPI. We are hopeful that the Intel and open-source implementations can be used in the future.

VISUAL PROGRAMMING FOR MESSAGE-PASSING SYSTEMS

International Journal of Software Engineering and Knowledge Engineering, 1999, Vol 09 (04), pp. 397-423. DOI: 10.1142/s0218194099000231

Cited by ~8

Author(s): Nenad Stankovic, Kang Zhang

Keyword(s): Message Passing, Message Passing Interface, Direct Interaction, Visual Programming, Levels Of Abstraction, Concrete Objects, Real Objects, Flow Graphs, Parallel Debugging

The attractiveness of visual programming stems in large part from the direct interaction with program elements as if they were real objects, since people deal better with concrete objects than with abstract ones. This paper describes Visper, a new graph-based software visualization tool for parallel message-passing programming that combines the levels of abstraction at which message-passing parallel programs are expressed and makes use of compositional programming. Central to the tool is the Process Communication Graph, which correlates the control and data flow graphs in a single graph formalism without the need for complex textual annotation. The graph can express static and runtime communication and replication structures, as found in the Message Passing Interface (MPI) and Parallel Virtual Machine (PVM). It also forms the basis for visualizing parallel debugging and performance.

Porting the AVS/Express scientific visualization software to Cray XT4

Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 2011, Vol 369 (1949), pp. 3398-3412. DOI: 10.1098/rsta.2011.0133

Cited by ~2

Author(s): George W. Leaver, Martin J. Turner, James S. Perrin, Paul M. Mummery, Philip J. Withers

Keyword(s): Performance Analysis, Message Passing, Large Scale, Message Passing Interface, Materials Science, Scientific Visualization, Visualization Software, Science Community, Interactive Application

Remote scientific visualization, where rendering services are provided by larger-scale systems than are available on the desktop, is becoming increasingly important as dataset sizes grow beyond the capabilities of desktop workstations. Uptake of such services relies on access to suitable visualization applications and the ability to view the resulting visualization in a convenient form. We consider five rules from the e-Science community for meeting these goals when porting a commercial visualization package to a large-scale system. The application uses the Message Passing Interface (MPI) to distribute data among data processing and rendering processes. The use of MPI in such an interactive application is not compatible with restrictions imposed by the Cray system being considered. We present details and a performance analysis of a new MPI proxy method that allows the application to run within the Cray environment while still supporting the MPI communication it requires. Example use cases from materials science are considered.

Designing Zero-Copy Message Passing Interface Derived Datatype Communication Over Infiniband: Alternative Approaches and Performance Evaluation

The International Journal of High Performance Computing Applications, 2005, Vol 19 (2), pp. 129-142. DOI: 10.1177/1094342005054259

Cited by ~4

Author(s): Gopalakrishnan Santhanaraman, Jiesheng Wu, Wei Huang, Dhabaleswar K. Panda

Keyword(s): Performance Evaluation, Message Passing, Message Passing Interface, Zero Copy, Alternative Approaches

Shared Memory Transport for ALFA

EPJ Web of Conferences, 2019, Vol 214, pp. 05029. DOI: 10.1051/epjconf/201921405029

Cited by ~2

Author(s): Alexey Rybalchenko, Dennis Klein, Mohammad Al-Turany, Thorsten Kollegger

Keyword(s): Shared Memory, Particle Physics, Message Passing, Message Passing Interface, Large Data, Building Blocks, Data Transport, High Data, Physics Experiments

The high data rates expected for the next generation of particle physics experiments (e.g. new experiments at FAIR/GSI and the upgrade of CERN experiments) call for dedicated attention to the design of the required computing infrastructure. The common ALICE-FAIR framework ALFA is a modern software layer that serves as a platform for simulation, reconstruction and analysis of particle physics experiments. Besides the standard services needed for simulation and reconstruction, ALFA also provides tools for data transport, configuration and deployment. The FairMQ module in ALFA offers building blocks for creating distributed software components (processes) that communicate with each other via message passing. The abstract "message passing" interface in FairMQ currently has three implementations: ZeroMQ, nanomsg and shared memory. The newly developed shared memory transport is presented; it provides significant performance benefits for transferring large data chunks between components on the same node. The implementation in FairMQ allows users to switch between the different transports via a trivial configuration change. The design decisions, implementation details and performance numbers of the shared memory transport in FairMQ/ALFA are highlighted.
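
The general pattern behind a shared-memory transport is to place the payload in a segment that both processes can map and to send only a small handle (segment name and size) through the regular channel, instead of copying the bytes into the message. The sketch below illustrates that idea with Python's standard `multiprocessing.shared_memory` module and a transport picked from a configuration dictionary. It runs in one process for demonstration, and the class names and the `"copy"`/`"shm"` configuration keys are hypothetical; they do not reflect the FairMQ API, which is C++.

```python
from multiprocessing import shared_memory


class CopyTransport:
    """Sends the payload itself; the receiver gets its own copy."""

    def send(self, payload: bytes):
        return ("copy", bytes(payload))

    def receive(self, msg) -> bytes:
        return msg[1]


class ShmTransport:
    """Writes the payload into a shared segment and sends only its handle."""

    def __init__(self):
        self._segments = []   # sender keeps its handles until cleanup()

    def send(self, payload: bytes):
        # Segments must be non-empty; the OS may also round the size up,
        # so the true payload length travels with the handle.
        seg = shared_memory.SharedMemory(create=True, size=max(len(payload), 1))
        seg.buf[:len(payload)] = payload
        self._segments.append(seg)
        return ("shm", seg.name, len(payload))   # message size is independent of payload size

    def receive(self, msg) -> bytes:
        _, name, size = msg
        seg = shared_memory.SharedMemory(name=name)   # attach to the existing segment
        try:
            # Copied into bytes only so the demo can return a value; a zero-copy
            # consumer would read from seg.buf directly.
            return bytes(seg.buf[:size])
        finally:
            seg.close()

    def cleanup(self):
        for seg in self._segments:
            seg.close()
            seg.unlink()   # remove the segment once no one needs it
        self._segments.clear()


TRANSPORTS = {"copy": CopyTransport, "shm": ShmTransport}


def make_transport(config: dict):
    """Pick the transport from configuration, e.g. {"transport": "shm"}."""
    return TRANSPORTS[config.get("transport", "copy")]()


if __name__ == "__main__":
    payload = b"\x01" * 1_000_000
    for name in ("copy", "shm"):
        transport = make_transport({"transport": name})
        received = transport.receive(transport.send(payload))
        print(name, received == payload)
        if isinstance(transport, ShmTransport):
            transport.cleanup()
```

Switching transports here only changes one configuration value, which mirrors the "trivial configuration change" described above; lifetime management (who unlinks a segment, and when) is the part a production transport has to handle far more carefully than this sketch does.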

CAN AGENT INTELLIGENCE BE USED TO ACHIEVE FAULT TOLERANT PARALLEL COMPUTING SYSTEMS?

Parallel Processing Letters, 2011, Vol 21 (04), pp. 379-396. DOI: 10.1142/s012962641100028x

Cited by ~4

Author(s): Blesson Varghese, Gerard McKee, Vassil Alexandrov

Keyword(s): Parallel Computing, Fault Tolerance, Message Passing, Message Passing Interface, Fault Tolerant, Cognitive Agent, Computing Systems, Agent Based, Alternative Approach, Agent Intelligence

The work reported in this paper is motivated by the need to validate an alternative approach to fault tolerance over traditional methods, such as checkpointing, that constrain efficacious fault tolerance. Can agent intelligence be used to achieve fault-tolerant parallel computing systems? If so, the questions "What agent capabilities are required for fault tolerance?", "What parallel computational tasks can benefit from such agent capabilities?" and "How can agent capabilities be implemented for fault tolerance?" need to be addressed. Cognitive capabilities essential for achieving fault tolerance through agents are considered. Parallel reduction algorithms are identified as a class of algorithms that can benefit from cognitive agent capabilities. The Message Passing Interface is used to implement an intelligent agent-based approach. Preliminary experimental results validate the feasibility of an agent-based approach for achieving fault tolerance in parallel computing systems.
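
Parallel reduction is a natural fit for this idea because a sum gives the same result however the data are grouped (up to floating-point rounding), so data can move between hosts partway through the computation. The simulation below is only an illustration under assumptions, not the paper's cognitive-agent design: the `Agent` class, the `host_failing` flag standing in for an agent's ability to anticipate a failure, and the ring-order choice of the next healthy host are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Agent:
    """A hypothetical agent carrying a slice of the data to be reduced."""
    host: int
    data: List[float] = field(default_factory=list)
    host_failing: bool = False   # assumption: the agent can sense an impending host failure


def migrate_away_from_failures(agents):
    """Move data from agents on failing hosts to the next healthy agent in ring order."""
    healthy = [a for a in agents if not a.host_failing]
    if not healthy:
        raise RuntimeError("no healthy host left to migrate to")
    n = len(agents)
    for i, agent in enumerate(agents):
        if agent.host_failing and agent.data:
            # First healthy agent after position i, wrapping around the ring.
            target = next(agents[(i + k) % n] for k in range(1, n + 1)
                          if not agents[(i + k) % n].host_failing)
            target.data.extend(agent.data)
            agent.data.clear()
    return healthy


def reduce_sum(agents):
    """Each healthy agent sums its data locally, then the partial sums are combined."""
    partials = [sum(a.data) for a in migrate_away_from_failures(agents)]
    return sum(partials)


if __name__ == "__main__":
    agents = [Agent(0, [1, 2]), Agent(1, [3, 4], host_failing=True), Agent(2, [5])]
    print(reduce_sum(agents))   # 15, the same total as without the failing host
```

In an MPI setting, the migration step would correspond to the failing host's agent sending its data to a neighbour before the reduction proceeds, which is only possible if the failure is anticipated rather than sudden.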
