Increasing the fault tolerance of distributed systems for the Hyper de Bruijn topology with excess code

Data replication is a well-known technique used in distributed systems in order to improve fault tolerance and make data access faster. Several copies of a dataset are created and placed at different nodes, so that users can access the replica closest to them, and at the same time the data access load is distributed among the replicas. In today’s Grid middleware solutions, data management services allow users to replicate datasets (i.e., flat files or databases) among storage elements within a Grid, but replicas are often considered read-only because of the absence of mechanisms able to propagate updates and enforce replica consistency. This entry analyzes the replica consistency problem and provides hints for the development of a Replica Consistency Service, highlighting the main issues and pros and cons of several approaches.

Fault Tolerance in Distributed Systems [Book Reviews]

IEEE Parallel & Distributed Technology Systems & Applications, 1996, Vol 4 (2), pp. 83

DOI: 10.1109/m-pdt.1996.494606

Author(s): N.K. Jha

Keyword(s): Distributed Systems, Fault Tolerance

De Bruijn graph-based communication modeling for fault tolerance in smart grids

2012 IEEE Asia Pacific Conference on Circuits and Systems, 2012

DOI: 10.1109/apccas.2012.6419112

Cited By ~ 1

Author(s): Bo-Chuan Cheng, Katherine Shu-Min Li, Sying-Jyan Wang

Keyword(s): Fault Tolerance, Smart Grids, De Bruijn Graph, De Bruijn

Integrating Fault Tolerance and Load Balancing in Distributed Systems Based on CORBA

Dependable Computing - EDCC 5 - Lecture Notes in Computer Science, 2005, pp. 154-166

DOI: 10.1007/11408901_11

Cited By ~ 2

Author(s): A. V. Singh, L. E. Moser, P. M. Melliar-Smith

Keyword(s): Distributed Systems, Fault Tolerance, Load Balancing

Soft-Checkpointing Based Hybrid Synchronous Checkpointing Protocol for Mobile Distributed Systems

International Journal of Distributed Systems and Technologies, 2011, Vol 2 (1), pp. 1-13

DOI: 10.4018/jdst.2011010101

Cited By ~ 6

Author(s): Parveen Kumar, Rachit Garg

Keyword(s): Distributed Systems, Fault Tolerance, Wireless Channels, Probabilistic Approach, Fixed Number, Mobile Host, Single Process, Coordinated Checkpointing, Coordinated Checkpoint

Minimum-process coordinated checkpointing is a suitable approach for introducing fault tolerance into mobile distributed systems transparently. To balance the checkpointing overhead against the loss of computation on recovery, the authors propose a hybrid checkpointing algorithm in which an all-process coordinated checkpoint is taken after the minimum-process coordinated checkpointing algorithm has executed a fixed number of times. In coordinated checkpointing, if a single process fails to take its checkpoint, all the checkpointing effort is wasted, because every process has to abort its tentative checkpoint. To take a tentative checkpoint, a mobile host (MH) must transfer a large amount of checkpoint data to its local MSS (mobile support station) over wireless channels. The authors therefore propose that, in the first phase, all concerned MHs take only a soft checkpoint, which is similar to a mutable checkpoint. If some process then fails to take its checkpoint in the first phase, the MHs need to abort only their soft checkpoints. The effort of taking a soft checkpoint is negligible compared with that of taking a tentative one. In the minimum-process coordinated checkpointing algorithm, a probabilistic approach is used to minimize the number of useless checkpoints and the blocking of processes.
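
The scheme described in this abstract can be illustrated with a small simulation. The sketch below is not the authors' protocol: the names `MobileHost`, `coordinated_round`, and `HybridCoordinator` are hypothetical, the set of dependent processes is simply passed in by the caller, and the second phase is reduced to a single upload to the MSS. It only shows the two ideas the abstract states: a failure in the first phase costs nothing more than discarding cheap soft checkpoints, and an all-process checkpoint replaces the minimum-process one after a fixed number of executions.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class MobileHost:
    """Hypothetical model of one mobile host (MH) running a single process."""
    name: str
    state: int = 0                    # stand-in for the application state
    can_checkpoint: bool = True       # set to False to simulate a failure
    soft: Optional[int] = None        # soft checkpoint, kept locally on the MH
    permanent: Optional[int] = None   # last committed checkpoint (held at the MSS)
    wireless_transfers: int = 0       # number of checkpoint uploads to the MSS

    def take_soft(self) -> bool:
        if not self.can_checkpoint:
            return False
        self.soft = self.state        # local copy only, no wireless traffic
        return True

    def abort_soft(self) -> None:
        self.soft = None              # cheap: nothing was sent to the MSS

    def commit(self) -> None:
        # Assumption: the checkpoint data crosses the wireless link only here.
        self.wireless_transfers += 1
        self.permanent = self.soft
        self.soft = None


def coordinated_round(participants: List[MobileHost]) -> bool:
    """Phase 1: soft checkpoints only. Phase 2: upload and commit."""
    taken: List[MobileHost] = []
    for mh in participants:
        if mh.take_soft():
            taken.append(mh)
        else:
            for done in taken:        # one failure aborts only soft checkpoints
                done.abort_soft()
            return False
    for mh in participants:
        mh.commit()
    return True


class HybridCoordinator:
    """Runs minimum-process rounds, then one all-process round, repeatedly."""

    def __init__(self, hosts: List[MobileHost], interval: int) -> None:
        self.hosts = hosts
        self.interval = interval      # the "fixed number of times"
        self.min_rounds = 0           # minimum-process executions so far

    def checkpoint(self, dependent: Set[str]) -> Tuple[str, bool]:
        if self.min_rounds >= self.interval:
            ok = coordinated_round(self.hosts)
            if ok:
                self.min_rounds = 0
            return "all-process", ok
        participants = [h for h in self.hosts if h.name in dependent]
        self.min_rounds += 1
        return "minimum-process", coordinated_round(participants)
```

With three hosts and `interval=2`, two calls to `checkpoint({"a", "b"})` run minimum-process rounds over hosts `a` and `b` only, and the third call checkpoints every host regardless of the set passed in. If `b` is unable to checkpoint in one of those rounds, `a` discards its soft checkpoint without any transfer to the MSS.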

Fault tolerance support in distributed systems

1990

DOI: 10.1145/504136.504145

Author(s): Andrew Birrell

Keyword(s): Distributed Systems, Fault Tolerance
