Performance and effectiveness trade-off for checkpointing in fault-tolerant distributed systems

Chapter 5 examines distributed systems in terms of their properties. The first section studies the classification of software systems, which are usually divided into centralized, decentralized and distributed systems. It examines the differences between these three major approaches and shows that the classification is multidimensional rather than linear. The most important case is that of distributed systems, which spread computational tasks across several autonomous, independently acting computational entities. A very important result for this case is the CAP theorem, which describes the trade-off between consistency, availability and partition tolerance. The last section deals with the possibility of reaching consensus in distributed systems and discusses how fault-tolerant consensus mechanisms enable mutual agreement among the individual entities in the presence of failures. One special case, Byzantine failures, is discussed in great detail. The main result is the FLP impossibility result, which states that no deterministic algorithm guarantees a solution to the consensus problem in an asynchronous system if even one process may crash. The chapter concludes by considering practical solutions that circumvent the impossibility result in order to reach consensus.

DAG Reliability Model and Fault-Tolerant Algorithm for Heterogeneous Distributed Systems

Chinese Journal of Computers ◽

10.3724/sp.j.1016.2013.02019 ◽

2014 ◽

Vol 36 (10) ◽

pp. 2019-2032 ◽

Cited By ~ 4

Author(s):

Guo-Qi XIE ◽

Ren-Fa LI ◽

Lin LIU ◽

Fan YANG

Keyword(s):

Distributed Systems ◽

Fault Tolerant ◽

Reliability Model ◽

Heterogeneous Distributed Systems

On the implementation and use of Ada on fault-tolerant distributed systems

ACM SIGAda Ada Letters ◽

10.1145/998410.998414 ◽

1984 ◽

Vol IV (3) ◽

pp. 53-64 ◽

Cited By ~ 4

Author(s):

John C. Knight ◽

John I. A. Urquhart

Keyword(s):

Distributed Systems ◽

Fault Tolerant

Ada on fault-tolerant distributed systems

ACM SIGAda Ada Letters ◽

10.1145/36792.36804 ◽

1987 ◽

Vol VII (6) ◽

pp. 61-63

Author(s):

John C. Knight

Keyword(s):

Distributed Systems ◽

Fault Tolerant

Message-optimal protocols for fault-tolerant broadcasts/multicasts in distributed systems with crash failures

IEEE Transactions on Computers ◽

10.1109/12.364545 ◽

1995 ◽

Vol 44 (2) ◽

pp. 346-352 ◽

Cited By ~ 3

Author(s):

Hong-Yi Tzeng ◽

Kai-Yeung Siu

Keyword(s):

Distributed Systems ◽

Fault Tolerant ◽

Crash Failures

Workshop on fault-tolerant parallel and distributed systems

18th International Parallel and Distributed Processing Symposium, 2004. Proceedings. ◽

10.1109/ipdps.2004.1303231 ◽

2004 ◽

Keyword(s):

Distributed Systems ◽

Fault Tolerant

Units of computation in fault-tolerant distributed systems

14th International Conference on Distributed Computing Systems ◽

10.1109/icdcs.1994.302480 ◽

2002 ◽

Author(s):

M. Ahuja ◽

S. Mishra

Keyword(s):

Distributed Systems ◽

Fault Tolerant

Fault Tolerant Leader Election in Distributed Systems

International Journal of Computer Science and Information Technology ◽

10.5121/ijcsit.2017.9102 ◽

2017 ◽

Vol 9 (1) ◽

pp. 13-20 ◽

Cited By ~ 1

Author(s):

Marius Rafailescu

Keyword(s):

Distributed Systems ◽

Fault Tolerant ◽

Leader Election

Log Replication in Raft vs Kafka

Studia Universitatis Babeș-Bolyai Informatica ◽

10.24193/subbi.2020.2.05 ◽

2020 ◽

Vol 65 (2) ◽

pp. 66

Author(s):

M. Petrescu ◽

R. Petrescu

Keyword(s):

Distributed Systems ◽

Fault Tolerant ◽

Consensus Algorithm ◽

Correct Operation ◽

Consensus Algorithms ◽

Fault Tolerant System ◽

Multiple Algorithms

The implementation of a fault-tolerant system requires some type of consensus algorithm for correct operation. From Paxos to Viewstamped Replication and Raft, multiple algorithms have been developed to handle this problem. This paper presents and compares the Raft algorithm and Apache Kafka, a distributed messaging system which, although it operates at a higher level, implements many concepts present in Raft (strong leadership, append-only log, log compaction, etc.). This shows that mechanisms conceived to handle one class of problems (consensus algorithms) are also very useful for a larger category of problems in distributed systems.

Basic concepts and issues in fault-tolerant distributed systems

Operating Systems of the 90s and Beyond - Lecture Notes in Computer Science ◽

10.1007/bfb0024534 ◽

2005 ◽

pp. 118-149 ◽

Cited By ~ 1

Author(s):

Flaviu Cristian

Keyword(s):

Distributed Systems ◽

Fault Tolerant ◽

Basic Concepts
