Increasing the fault tolerance of distributed systems for the Hyper de Bruijn topology with excess code

Author(s): Heorhii Loutskii, Artem Volokyta, Pavlo Rehida, Oleksandr Honcharenko, Bohdan Ivanishchev, ...

Author(s): Gianni Pucciani, Flavia Donno, Andrea Domenici, Heinz Stockinger

Data replication is a well-known technique used in distributed systems to improve fault tolerance and speed up data access. Several copies of a dataset are created and placed at different nodes, so that users can access the replica closest to them while the data access load is distributed among the replicas. In today’s Grid middleware solutions, data management services allow users to replicate datasets (i.e., flat files or databases) among storage elements within a Grid, but replicas are often considered read-only because there are no mechanisms to propagate updates and enforce replica consistency. This entry analyzes the replica consistency problem and provides hints for the development of a Replica Consistency Service, highlighting the main issues and the pros and cons of several approaches.
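
The consistency problem described above can be illustrated with a minimal sketch. It is not taken from the entry itself: the `Replica` class, the `distance` cost, and the per-replica `version` counter are all hypothetical names introduced only for illustration. The sketch shows that, without any update propagation, a write to one replica leaves the replica closest to the user stale.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    node: str          # hypothetical node name
    distance: int      # hypothetical access cost from the user to this node
    version: int = 0   # assumption: each local write bumps a version counter
    data: str = ""


def closest(replicas):
    """Pick the replica with the lowest access cost for the user."""
    return min(replicas, key=lambda r: r.distance)


def write(replica, data):
    """Update a single replica; nothing is propagated to the others."""
    replica.data = data
    replica.version += 1


def stale(replicas):
    """Return the nodes whose version is behind the newest known version."""
    newest = max(r.version for r in replicas)
    return [r.node for r in replicas if r.version < newest]


replicas = [Replica("a", 5), Replica("b", 1), Replica("c", 3)]
write(replicas[0], "v1")            # update lands on replica "a" only
nearest = closest(replicas)         # the user still reads from "b"
outdated = stale(replicas)          # "b" and "c" have not seen the update
```

Treating replicas as read-only avoids this situation entirely, which is why a consistency service is needed once updates are allowed.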


Author(s): Parveen Kumar, Rachit Garg

Minimum-process coordinated checkpointing is a suitable approach for introducing fault tolerance transparently in mobile distributed systems. To balance the checkpointing overhead against the loss of computation on recovery, the authors propose a hybrid checkpointing algorithm in which an all-process coordinated checkpoint is taken after the minimum-process coordinated checkpointing algorithm has executed a fixed number of times. In coordinated checkpointing, if a single process fails to take its checkpoint, the entire checkpointing effort is wasted, because every process has to abort its tentative checkpoint. To take a tentative checkpoint, a Mobile Host (MH) needs to transfer large checkpoint data to its local Mobile Support Station (MSS) over wireless channels. The authors therefore propose that, in the first phase, all concerned MHs take only a soft checkpoint, which is similar to a mutable checkpoint. If some process fails to take its checkpoint in the first phase, the MHs need to abort only their soft checkpoints. The effort of taking a soft checkpoint is negligible compared with that of taking a tentative one. In the minimum-process coordinated checkpointing algorithm, a probabilistic approach is used to minimize the number of useless checkpoints and the blocking of processes.
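
The hybrid schedule and the soft-checkpoint first phase can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' algorithm: the parameter `k`, the class and function names, and the `can_checkpoint` callback are hypothetical, and the step that promotes soft checkpoints to tentative ones once every process has succeeded is an assumption, since the abstract does not spell out the later phase.

```python
from dataclasses import dataclass


@dataclass
class HybridCheckpointScheduler:
    """Chooses the kind of coordinated checkpoint to run next."""
    k: int                    # hypothetical: minimum-process rounds per all-process round
    min_rounds_done: int = 0

    def next_kind(self) -> str:
        # After k minimum-process rounds, run one all-process round.
        if self.min_rounds_done >= self.k:
            self.min_rounds_done = 0
            return "all-process"
        self.min_rounds_done += 1
        return "minimum-process"


def first_phase(participants, can_checkpoint):
    """Take soft checkpoints; abort only those if any process fails.

    Returns (ok, log). Soft checkpoints are cheap because nothing is
    sent over the wireless channel to the MSS at this point.
    """
    log, soft = [], []
    for p in participants:
        if not can_checkpoint(p):
            # A failure costs only the soft checkpoints taken so far.
            log.extend(("abort_soft", q) for q in soft)
            return False, log
        soft.append(p)
        log.append(("soft", p))
    # Assumption: once all succeed, soft checkpoints become tentative.
    log.extend(("tentative", p) for p in soft)
    return True, log


scheduler = HybridCheckpointScheduler(k=2)
kinds = [scheduler.next_kind() for _ in range(6)]
ok, log = first_phase(["mh1", "mh2", "mh3"], lambda p: p != "mh3")
```

In the failing run above, no checkpoint data is transferred to the MSS, which is the saving the soft-checkpoint phase is meant to provide.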

