CAN AGENT INTELLIGENCE BE USED TO ACHIEVE FAULT TOLERANT PARALLEL COMPUTING SYSTEMS?

The work reported in this paper is motivated by the need to validate an alternative approach to fault tolerance, since traditional methods such as checkpointing constrain efficacious fault tolerance. Can agent intelligence be used to achieve fault tolerant parallel computing systems? If so, three questions need to be addressed: "What agent capabilities are required for fault tolerance?", "What parallel computational tasks can benefit from such agent capabilities?" and "How can agent capabilities be implemented for fault tolerance?" Cognitive capabilities essential for achieving fault tolerance through agents are considered. Parallel reduction algorithms are identified as a class of algorithms that can benefit from cognitive agent capabilities. The Message Passing Interface is utilized for implementing an intelligent agent based approach. Preliminary results obtained from the experiments validate the feasibility of an agent based approach for achieving fault tolerance in parallel computing systems.
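
For readers unfamiliar with the term, the shape of a parallel reduction can be sketched in a few lines. This is only an illustration of the pairwise combining pattern, run sequentially in plain Python; it is not the paper's agent-based implementation and does not use MPI. The function name `tree_reduce` is hypothetical.

```python
from operator import add


def tree_reduce(values, op=add):
    """Combine values pairwise, level by level, as a reduction tree would.

    Hypothetical helper: in a real parallel run, each pair at a given level
    could be combined by a different process.
    """
    if not values:
        raise ValueError("cannot reduce an empty sequence")
    level = list(values)
    while len(level) > 1:
        next_level = []
        # Combine neighbours (0,1), (2,3), ... at this level of the tree.
        for i in range(0, len(level) - 1, 2):
            next_level.append(op(level[i], level[i + 1]))
        # An unpaired last element is carried up unchanged.
        if len(level) % 2 == 1:
            next_level.append(level[-1])
        level = next_level
    return level[0]
```

Each level halves the number of partial results, so the loss of one process affects only the partial result it holds, which is presumably part of what makes this class of algorithms a candidate for the approach described.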

Unified fault-tolerance framework for hybrid task-parallel message-passing applications

The International Journal of High Performance Computing Applications ◽

10.1177/1094342016669416 ◽

2016 ◽

Vol 32 (5) ◽

pp. 641-657 ◽

Cited By ~ 5

Author(s):

Omer Subasi ◽

Tatiana Martsinkevich ◽

Ferad Zyulkyarov ◽

Osman Unsal ◽

Jesus Labarta ◽

...

Keyword(s):

Fault Tolerance ◽

Performance Improvement ◽

Message Passing ◽

Message Passing Interface ◽

Fault Tolerant ◽

Performance Score ◽

Fine Grained ◽

Transient Errors ◽

Task Parallel ◽

Complete Failure

We present a unified fault-tolerance framework for task-parallel message-passing applications to mitigate transient errors. First, we propose a fault-tolerant message-logging protocol that only requires the restart of the task that experienced the error and transparently handles any message passing interface calls inside the task. In our experiments we demonstrate that our fault-tolerant solution has a reasonable overhead, with a maximum observed overhead of 4.5%. We also show that fine-grained parallelization is important for hiding the overheads related to the protocol as well as the recovery of tasks. Secondly, we develop a mathematical model to unify task-level checkpointing and our protocol with system-wide checkpointing in order to provide complete failure coverage. We provide closed formulas for the optimal checkpointing interval and the performance score of the unified scheme. Experimental results show that the performance improvement can be as high as 98% with the unified scheme.
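
The closed formulas from the paper are not reproduced in this abstract. As a point of reference only, the classical first-order estimate of an optimal checkpointing interval (Young's approximation) is T = sqrt(2 * C * M), where C is the time to write one checkpoint and M is the mean time between failures. The sketch below computes that classical estimate; it is not the unified model described above, and the function name is hypothetical.

```python
import math


def young_interval(checkpoint_cost, mtbf):
    """Young's first-order estimate of the optimal checkpoint interval.

    checkpoint_cost: time to write one checkpoint (same unit as mtbf).
    mtbf: mean time between failures.
    This is the classical approximation, not the paper's closed formula.
    """
    if checkpoint_cost <= 0 or mtbf <= 0:
        raise ValueError("checkpoint_cost and mtbf must be positive")
    return math.sqrt(2.0 * checkpoint_cost * mtbf)
```

For example, a 50-second checkpoint and a mean time between failures of 10,000 seconds give an interval of 1,000 seconds under this approximation.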

Mobile Agent Based Autonomic Dynamic Parallel Computing

Volume 3: ASME/IEEE 2009 International Conference on Mechatronic and Embedded Systems and Applications; 20th Reliability, Stress Analysis, and Failure Prevention Conference ◽

10.1115/detc2009-87750 ◽

2009 ◽

Author(s):

Yu-Cheng Chou ◽

David Ko ◽

Harry H. Cheng

Keyword(s):

Parallel Computing ◽

Mobile Agent ◽

Message Passing ◽

Message Passing Interface ◽

Matrix Multiplication ◽

Computing Environment ◽

Heterogeneous Platforms ◽

Agent Based ◽

Networked Computers ◽

Platform Independence

Parallel computing is widely adopted in scientific and engineering applications to enhance efficiency. Moreover, there is increasing research interest in utilizing distributed networked computers for parallel computing. The Message Passing Interface (MPI) standard was designed to support portability and platform independence of a developed parallel program. However, the procedure to start an MPI-based parallel computation among distributed computers lacks autonomicity and flexibility. This article presents an autonomic dynamic parallel computing framework that provides the autonomicity and flexibility that are important and necessary to some parallel computing applications involving resource-constrained and heterogeneous platforms. In this framework, an MPI parallel computing environment consisting of multiple computing entities is dynamically established through inter-agent communications using IEEE Foundation for Intelligent Physical Agents (FIPA) compliant Agent Communication Language (ACL) messages. For each computing entity in the MPI parallel computing environment, load-balanced MPI program C source code, along with the MPI environment configuration statements, is dynamically composed as mobile agent code. A mobile agent wrapping the mobile agent code is created and sent to the computing entity, where the mobile agent code is retrieved and interpretively executed. An example of autonomic parallel matrix multiplication is given to demonstrate the self-configuration and self-optimization properties of the presented framework.

A General Framework of Algorithm-Based Fault Tolerance Technique for Computing Systems

Analyzing Security, Trust, and Crime in the Digital World - Advances in Information Security, Privacy, and Ethics ◽

10.4018/978-1-4666-4856-2.ch001 ◽

2014 ◽

pp. 1-21 ◽

Cited By ~ 1

Author(s):

Hodjatollah Hamidi

Keyword(s):

Fault Tolerance ◽

Error Correction ◽

General Framework ◽

Fault Tolerant ◽

Convolutional Code ◽

Numerical Algorithms ◽

Convolutional Codes ◽

Computing Systems ◽

Specific Level ◽

Computing Paradigm

The Algorithm-Based Fault Tolerance (ABFT) approach transforms a system that does not tolerate a specific type of fault, called the fault-intolerant system, into a system that provides a specific level of fault tolerance, namely recovery. The ABFT philosophy leads directly to a model from which error correction can be developed. By employing an ABFT scheme with an effective convolutional code, the design allows high throughput as well as high fault coverage. The ABFT techniques that detect errors rely on the comparison of parity values computed in two ways. The parallel processing of input parity values produces output parity values comparable with parity values regenerated from the original processed outputs, and can apply convolutional codes for the redundancy. This method is a new approach to concurrent error correction in fault-tolerant computing systems. This chapter proposes a novel computing paradigm to provide fault tolerance for numerical algorithms. The authors also present, implement, and evaluate early detection in ABFT.
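
The idea of comparing parity values computed in two ways can be illustrated with the classic checksum scheme for matrix multiplication, a simpler technique than the convolutional codes the chapter uses. In this sketch the inputs carry an extra checksum row and column, so the product carries checksums that can be compared against sums regenerated from the output. All function names are hypothetical, and integer matrices are assumed so that exact comparison is valid.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def with_column_checksum(a):
    """Append a row holding the sum of each column of a."""
    return a + [[sum(col) for col in zip(*a)]]


def with_row_checksum(b):
    """Append to each row of b the sum of that row."""
    return [row + [sum(row)] for row in b]


def find_error(c_full):
    """Locate a single corrupted data element, or return None if consistent."""
    n, m = len(c_full) - 1, len(c_full[0]) - 1
    # Regenerate each checksum from the data and compare with the carried one.
    bad_rows = [i for i in range(n) if sum(c_full[i][:m]) != c_full[i][m]]
    bad_cols = [j for j in range(m)
                if sum(c_full[i][j] for i in range(n)) != c_full[n][j]]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return bad_rows[0], bad_cols[0]
    raise ValueError("error pattern is not a single data-element fault")


def correct_error(c_full, i, j):
    """Rebuild element (i, j) from its row checksum, in place."""
    m = len(c_full[0]) - 1
    others = sum(c_full[i][k] for k in range(m) if k != j)
    c_full[i][j] = c_full[i][m] - others
```

A single corrupted element breaks exactly one row check and one column check, which is what allows it to be located and then recomputed from the surviving values.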

Simulation-Based Scheduling of Waterway Projects Using a Parallel Genetic Algorithm

Transportation Systems and Engineering ◽

10.4018/978-1-4666-8473-7.ch016 ◽

2015 ◽

pp. 334-347 ◽

Cited By ~ 2

Author(s):

Ning Yang ◽

Shiaaulir Wang ◽

Paul Schonfeld

Keyword(s):

Genetic Algorithm ◽

Parallel Computing ◽

Message Passing ◽

Message Passing Interface ◽

Computation Time ◽

Parallel Genetic Algorithm ◽

Simulation Based ◽

Multiple Processors ◽

Simulation Based Optimization ◽

Speed Up

A Parallel Genetic Algorithm (PGA) is used for a simulation-based optimization of waterway project schedules. This PGA is designed to distribute a Genetic Algorithm application over multiple processors in order to speed up the solution search procedure for a very large combinatorial problem. The proposed PGA is based on a global parallel model, which is also called a master-slave model. A Message-Passing Interface (MPI) is used in developing the parallel computing program. A case study is presented, whose results show how the adaptation of a simulation-based optimization algorithm to parallel computing can greatly reduce computation time. Additional techniques which are found to further improve the PGA performance include: (1) choosing an appropriate task distribution method, (2) distributing simulation replications instead of different solutions, (3) avoiding the simulation of duplicate solutions, (4) avoiding running multiple simulations simultaneously in shared-memory processors, and (5) avoiding using multiple processors which belong to different clusters (physical sub-networks).
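
The global (master-slave) model and technique (3) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' program: a thread pool stands in for the MPI worker processes, and `simulate` is a hypothetical placeholder for the expensive simulation of one candidate schedule.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(schedule):
    """Hypothetical stand-in for one expensive simulation run.

    Returns a toy cost: each job's value weighted by its position.
    """
    return sum((pos + 1) * job for pos, job in enumerate(schedule))


def evaluate_population(population, workers=4):
    """Master step: farm out fitness evaluations and gather the results.

    Duplicate candidates are simulated only once; candidates must be
    hashable (tuples here).
    """
    unique = list(dict.fromkeys(population))  # drop duplicates, keep order
    with ThreadPoolExecutor(max_workers=workers) as pool:
        costs = dict(zip(unique, pool.map(simulate, unique)))
    # Return one cost per original candidate, in the original order.
    return [costs[candidate] for candidate in population]
```

In this model only the fitness evaluation is distributed; selection, crossover, and mutation stay with the master, which is why the benefit grows with the cost of each simulation.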

Interpretive MPI for Parallel Computing

Volume 3: 28th Computers and Information in Engineering Conference, Parts A and B ◽

10.1115/detc2008-49996 ◽

2008 ◽

Author(s):

Yu-Cheng Chou ◽

Harry H. Cheng

Keyword(s):

Parallel Computing ◽

Programming Languages ◽

Message Passing ◽

Large Scale ◽

Message Passing Interface ◽

Rapid Development ◽

Web Based ◽

Heterogeneous Platforms ◽

C Programs ◽

Computation Speedup

Message Passing Interface (MPI) is a standardized library specification designed for message-passing parallel programming on large-scale distributed systems. A number of MPI libraries have been implemented to allow users to develop portable programs using the scientific programming languages Fortran, C and C++. Ch is an embeddable C/C++ interpreter that provides an interpretive environment for C/C++ based scripts and programs. Combining Ch with any MPI C/C++ library provides the functionality for rapid development of MPI C/C++ programs without compilation. In this article, the method of interfacing Ch scripts with MPI C implementations is introduced by using the MPICH2 C library as an example. The MPICH2-based Ch MPI package provides users with the ability to interpretively run MPI C programs based on the MPICH2 C library. Running MPI programs through the MPICH2-based Ch MPI package across heterogeneous platforms consisting of Linux and Windows machines is illustrated. Comparisons of the bandwidth, latency, and parallel computation speedup between C MPI, Ch MPI, and MPI for Python in an Ethernet-based environment comprising identical Linux machines are presented. A Web-based example is given to demonstrate the use of Ch and MPICH2 in C based CGI scripting to facilitate the development of Web-based applications for parallel computing.

Study on the Numerical Simulation of Explosion and Impact Processes Using PC Cluster System

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.433-440.2892 ◽

2012 ◽

Vol 433-440 ◽

pp. 2892-2898

Author(s):

Guang Lei Fei ◽

Jian Guo Ning ◽

Tian Bao Ma

Keyword(s):

Numerical Simulation ◽

Operating System ◽

Parallel Computing ◽

Message Passing ◽

Message Passing Interface ◽

Parallel Program ◽

Pc Cluster ◽

Computing Platform ◽

Impact Processes ◽

Platform System

Parallel computing has been applied in many fields, and a PC cluster based on the MPI (Message Passing Interface) library under the Linux operating system is a cost-effective parallel computing platform. In this paper, the key algorithm of a parallel program for explosion and impact simulation is presented. Techniques for resolving data dependence and realizing communication between subdomains are proposed. Tests of the program show that the portability of the MMIC-3D parallel program is satisfactory and that, compared with a single computer, the PC cluster can greatly improve the calculation speed and enlarge the problem scale.

Asynchronous Parallelization of a CFD Solver

Journal of Computational Engineering ◽

10.1155/2015/295393 ◽

2015 ◽

Vol 2015 ◽

pp. 1-10 ◽

Cited By ~ 3

Author(s):

Daniel S. Abdi ◽

Girma T. Bitsuamlak

Keyword(s):

Message Passing ◽

Message Passing Interface ◽

Stokes Equations ◽

Domain Decomposition Method ◽

Asynchronous Communication ◽

Navier Stokes ◽

Navier Stokes Equations ◽

Asynchronous Iterations ◽

Alternative Approach ◽

Asynchronous Methods

A Navier-Stokes equations solver is parallelized to run on a cluster of computers using the domain decomposition method. Two approaches to communication and computation are investigated, namely, synchronous and asynchronous methods. Asynchronous communication between subdomains is not commonly used in CFD codes; however, it has the potential to alleviate scaling bottlenecks incurred when processors have to wait for each other at designated synchronization points. A common way to avoid this idle time is to overlap asynchronous communication with computation. For this to work, however, there must be something useful and independent a processor can do while waiting for messages to arrive. We investigate an alternative approach to computation, namely, conducting asynchronous iterations to improve the local subdomain solution while communication is in progress. An in-house CFD code is parallelized using the Message Passing Interface (MPI), and scalability tests are conducted that suggest asynchronous iterations are a viable way of parallelizing CFD code.

Efficient Message Passing Interface (MPI) for Parallel Computing on Clusters of Workstations

Journal of Parallel and Distributed Computing ◽

10.1006/jpdc.1996.1267 ◽

1997 ◽

Vol 40 (1) ◽

pp. 19-34 ◽

Cited By ~ 32

Author(s):

Jehoshua Bruck ◽

Danny Dolev ◽

Ching-Tien Ho ◽

Marcel-Cătălin Roşu ◽

Ray Strong

Keyword(s):

Parallel Computing ◽

Message Passing ◽

Message Passing Interface

Fault tolerant memory design for HW/SW co-reliability in massively parallel computing systems

Second IEEE International Symposium on Network Computing and Applications, 2003. NCA 2003. ◽

10.1109/nca.2003.1201173 ◽

2003 ◽

Author(s):

M. Choi ◽

N.-J. Park ◽

K.M. George ◽

B. Jin ◽

N. Park ◽

...

Keyword(s):

Parallel Computing ◽

Fault Tolerant ◽

Massively Parallel ◽

Computing Systems ◽

Memory Design ◽

Massively Parallel Computing

CIP and Parallel Computing Based Numerical Solutions of 3-D Slamming Problems

Volume 11: Prof. Robert F. Beck Honoring Symposium on Marine Hydrodynamics ◽

10.1115/omae2015-41292 ◽

2015 ◽

Author(s):

Peng Wen ◽

Wei Qiu

Keyword(s):

Parallel Computing ◽

Message Passing ◽

Message Passing Interface ◽

Numerical Solutions ◽

Three Dimensional ◽

Simulation Method ◽

Water Entry ◽

Computational Domain ◽

Cip Method ◽

Constrained Interpolation

This paper presents the further development of a numerical simulation method to solve 3-D highly non-linear slamming problems using parallel computing algorithms. The water entry problems are treated as multi-phase problems (solid, water and air) and governed by the Navier-Stokes (N-S) equations. They are solved by the three-dimensional constrained interpolation profile (CIP) method. The interfaces between different phases are captured using density functions. In the computation, the 3-D CIP method is employed for the advection phase of the N-S equations and a pressure-based algorithm is applied for the non-advection phase. The bi-conjugate gradient stabilized method (BiCGSTAB) is utilized to solve the linear equation systems. A Message Passing Interface (MPI) parallel computing scheme was implemented in the computations. For the parallel computations, a three-dimensional Cartesian decomposition of the computational domain was used. The speed-up performance of various decomposition schemes was studied. Validation studies were carried out for the water entry of a 3-D wedge and a 3-D ship section with prescribed velocities. The computed slamming force, pressure distribution and free-surface elevations are compared with experimental results and numerical results by other methods.
