scholarly journals Designing Fault Tolerance Strategy by Iterative Redundancy for Component-Based Distributed Computing Systems

2014 ◽  
Vol 2014 ◽  
pp. 1-11
Author(s):  
Hui Wang ◽  
Yun Wang

Reliability is a critical issue for component-based distributed computing systems, some distributed software allows the existence of large numbers of potentially faulty components on an open network. Faults are inevitable in this large-scale, complex, distributed components setting, which may include a lot of untrustworthy parts. How to provide highly reliable component-based distributed systems is a challenging problem and a critical research. Generally, redundancy and replication are utilized to realize the goal of fault tolerance. In this paper, we propose a CFI (critical fault iterative) redundancy technique, by which the efficiency can be guaranteed to make use of resources (e.g., computation and storage) and to create fault-tolerance applications. When operating in an environment with unknown components’ reliability, CFI redundancy is more efficient and adaptive than other techniques (e.g., K-Modular Redundancy and N-Version Programming). In the CFI strategy of redundancy, the function invocation relationships and invocation frequencies are employed to rank the functions’ importance and identify the most vulnerable function implemented via functionally equivalent components. A tradeoff has to be made between efficiency and reliability. In this paper, a formal theoretical analysis and an experimental analysis are presented. Compared with the existing methods, the reliability of components-based distributed system can be greatly improved by tolerating a small part of significant components.

2007 ◽  
Vol 08 (02) ◽  
pp. 163-178 ◽  
Author(s):  
FATOS XHAFA ◽  
JAVIER CARRETERO ◽  
LEONARD BAROLLI ◽  
ARJAN DURRESI

In this paper we present a study on the requirements for the design and implementation of simulation packages for Grid systems. Grids are emerging as new distributed computing systems whose main objective is to manage and allocate geographically distributed computing resources to applications and users in an efficient and transparent manner. Grid systems are at present very difficult and complex to use for experimental studies of large-scale distributed applications. Although the field of simulation of distributed computing systems is mature, recent developments in large-scale distributed systems are raising needs not present in the simulation of the traditional distributed systems. Motivated by this, we present in this work a set of basic requirements that any simulation package for Grid computing should offer. This set of functionalities is obtained after a careful review of most important existing Grid simulation packages and includes new requirements not considered in such simulation packages. Based on the identified set of requirements, a Grid simulator is developed and exemplified for the Grid scheduling problem.


Sign in / Sign up

Export Citation Format

Share Document