Performance and effectiveness trade-off for checkpointing in fault-tolerant distributed systems

2006 ◽  
Vol 19 (1) ◽  
pp. 37-63 ◽  
Author(s):  
Panagiotis Katsaros ◽  
Lefteris Angelis ◽  
Constantine Lazos
Author(s):  
Andreas Bolfing

Chapter 5 examines distributed systems in terms of their properties. The first section studies the classification of software systems, which are usually divided into centralized, decentralized and distributed systems. It examines the differences between these three major approaches, showing that the classification is multidimensional rather than linear. The most important case is that of distributed systems, which spread computational tasks across several autonomous, independently acting computational entities. A very important result for this case is the CAP theorem, which describes the trade-off between consistency, availability and partition tolerance. The last section deals with the possibility of reaching consensus in distributed systems, discussing how fault-tolerant consensus mechanisms enable mutual agreement among the individual entities in the presence of failures. One special case is that of so-called Byzantine failures, which are discussed in great detail. The main result is the so-called FLP Impossibility Result, which states that no deterministic algorithm can guarantee a solution to the consensus problem in the asynchronous case if even a single process may fail. The chapter concludes by considering practical solutions that circumvent the impossibility result in order to reach consensus.


1984 ◽  
Vol IV (3) ◽  
pp. 53-64 ◽  
Author(s):  
John C. Knight ◽  
John I. A. Urquhart

1987 ◽  
Vol VII (6) ◽  
pp. 61-63
Author(s):  
John C. Knight

2020 ◽  
Vol 65 (2) ◽  
pp. 66
Author(s):  
M. Petrescu ◽  
R. Petrescu

The implementation of a fault-tolerant system requires some type of consensus algorithm for correct operation. From Paxos to Viewstamped Replication and Raft, multiple algorithms have been developed to handle this problem. This paper presents and compares the Raft algorithm and Apache Kafka, a distributed messaging system which, although it operates at a higher level, implements many concepts present in Raft (strong leadership, append-only log, log compaction, etc.). This shows that mechanisms conceived to handle one class of problems (consensus algorithms) are also very useful for handling a larger category of problems in distributed systems.

