Development of functional fault-tolerant system

Evolvable hardware (EHW) is widely used in the design of fault-tolerant systems. A fault-tolerant system is typically a real-time system, so recovery time must be accounted for in fault detection and recovery. However, when EHW is applied, the real-time characteristic is usually ignored. In this paper, a fault-tolerant strategy based on EHW is proposed. The recovery time, predicted by fault tree analysis (FTA), is treated as a constraint. A configuration library is set up in the design phase to accelerate the repair of anticipated faults. An evolvable algorithm (EA) based on similarity is applied to evolve the repair circuit for unanticipated faults. When the library reaches its upper limit, the target system is reconfigured by the EA-repair technology. Extensive experiments on an FPGA platform show that the method improves the fault tolerance of the system while satisfying the real-time requirement. In a long-running system, the method maintains a higher fault recovery rate.
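
The two-tier repair flow described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `recover`, `toy_evolve`, `max_generations`, and `capacity` are hypothetical, a generation budget stands in for the FTA-predicted recovery-time constraint, and caching an evolved repair while the library has room is an assumption made for illustration.

```python
from typing import Callable, Dict, Optional

Config = str  # stand-in for an FPGA configuration (assumption)


def recover(fault: str,
            library: Dict[str, Config],
            evolve: Callable[[str, int], Optional[Config]],
            max_generations: int,
            capacity: int) -> Optional[Config]:
    """Return a repair configuration for `fault`, or None if none is found."""
    # Anticipated fault: reuse the configuration prepared at design time.
    if fault in library:
        return library[fault]
    # Unanticipated fault: run the search under a fixed generation budget,
    # used here as a proxy for the recovery-time constraint (assumption).
    repaired = evolve(fault, max_generations)
    # Cache the new repair only while the library is below its upper limit
    # (an illustrative policy, not taken from the paper).
    if repaired is not None and len(library) < capacity:
        library[fault] = repaired
    return repaired


def toy_evolve(fault: str, max_generations: int) -> Optional[Config]:
    """Hypothetical stand-in for the evolutionary search."""
    return f"evolved_{fault}" if max_generations > 0 else None


library = {"stuck_at_0": "cfg_A"}
known = recover("stuck_at_0", library, toy_evolve, 50, capacity=2)
new = recover("bridging", library, toy_evolve, 50, capacity=2)
```

In this sketch a known fault is served directly from the library, while a new fault falls through to the search and is stored for later reuse if space remains.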

Does your fault-tolerant system tolerate faults?

Proceedings of the ACM Symposium on Cloud Computing - SoCC '18 ◽

10.1145/3267809.3275451 ◽

2018 ◽

Author(s):

Kamala Ramasubramanian ◽

Peter Alvaro

Keyword(s):

Fault Tolerant ◽

Fault Tolerant System

FT-EST Framework: Reliability Estimation for the Purposes of Fault-Tolerant System Design Automation

2018 21st Euromicro Conference on Digital System Design (DSD) ◽

10.1109/dsd.2018.00053 ◽

2018 ◽

Cited By ~ 4

Author(s):

Jakub Lojda ◽

Jakub Podivinsky ◽

Ondrej Cekan ◽

Richard Panek ◽

Zdenek Kotasek

Keyword(s):

System Design ◽

Design Automation ◽

Fault Tolerant ◽

Reliability Estimation ◽

Fault Tolerant System

Overview of a Fault-Tolerant System

Fault-Tolerant Parallel and Distributed Systems ◽

10.1007/978-1-4615-5449-3_6 ◽

1998 ◽

pp. 109-121

Author(s):

Angelo Pruscino

Keyword(s):

Fault Tolerant ◽

Fault Tolerant System

Fault-Tolerant system design in multiple operating modes using a structural model

Advances in Safety, Reliability and Risk Management ◽

10.1201/b11433-77 ◽

2011 ◽

pp. 549-556 ◽

Cited By ~ 4

Author(s):

B Conrard ◽

V Cocquempot ◽

S Mili

Keyword(s):

System Design ◽

Structural Model ◽

Fault Tolerant ◽

Fault Tolerant System ◽

Operating Modes

Log Replication in Raft vs Kafka

Studia Universitatis Babeș-Bolyai Informatica ◽

10.24193/subbi.2020.2.05 ◽

2020 ◽

Vol 65 (2) ◽

pp. 66

Author(s):

M. Petrescu ◽

R. Petrescu

Keyword(s):

Distributed Systems ◽

Fault Tolerant ◽

Consensus Algorithm ◽

Correct Operation ◽

Consensus Algorithms ◽

Fault Tolerant System ◽

Multiple Algorithms

The implementation of a fault-tolerant system requires some type of consensus algorithm for correct operation. From Paxos to Viewstamped Replication and Raft, multiple algorithms have been developed to handle this problem. This paper presents and compares the Raft algorithm and Apache Kafka, a distributed messaging system which, although at a higher level, implements many concepts present in Raft (strong leadership, append-only log, log compaction, etc.). This shows that mechanisms conceived to handle one class of problems (consensus algorithms) are also useful for a larger category of problems in distributed systems.
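
Two of the concepts named in this abstract, the append-only log and log compaction, can be shown with a small in-memory sketch. It is illustrative only and uses neither Kafka's nor Raft's actual APIs: the `append` and `compact` helpers are hypothetical, and compaction here means key-based retention of the latest record per key, whereas Raft usually compacts by snapshotting.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, str, str]  # (offset, key, value)


def append(log: List[Record], key: str, value: str) -> int:
    """Append a record at the end of the log and return its offset."""
    offset = log[-1][0] + 1 if log else 0
    log.append((offset, key, value))
    return offset


def compact(log: List[Record]) -> List[Record]:
    """Keep only the most recent record for each key, in offset order."""
    latest: Dict[str, Record] = {}
    for record in log:
        latest[record[1]] = record  # later records overwrite earlier ones
    return sorted(latest.values())  # offsets are unique, so this orders by offset


log: List[Record] = []
append(log, "a", "1")
append(log, "b", "2")
append(log, "a", "3")
compacted = compact(log)
```

Existing records are never modified in place; compaction produces a shorter log that still yields the same latest value for every key.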

SOFTWARE IMPLEMENTED HARDWARE-TRANSIENT FAULTS DETECTION

International Journal of Computing ◽

10.47839/ijc.5.1.377 ◽

2014 ◽

pp. 26-30

Author(s):

Goutam Kumar Saha

Keyword(s):

Error Correction ◽

Fault Tolerant ◽

Low Cost ◽

Transient Faults ◽

Fault Tolerant System ◽

Cost Approach ◽

Run Time ◽

Commodity Systems ◽

Fail Safe ◽

Register Error

This paper examines a software-implemented self-checking technique that is capable of detecting hardware-transient faults in processor registers. The proposed approach is intended to detect run-time transient bit errors in memory and in the processor status register. Error correction is not considered here. However, this low-cost approach is intended to be adopted in commodity systems that use ordinary off-the-shelf microprocessors, for the purpose of detecting operational faults, as a step toward a fail-safe kind of fault-tolerant system.
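
One common way to detect transient bit errors purely in software is to keep a redundant copy of a word and compare the two before use. The sketch below illustrates that general idea only; the abstract does not specify the paper's technique, so the `CheckedWord` class, the 32-bit width, and the complemented shadow copy are all assumptions. Like the approach described, it detects errors and does not correct them.

```python
MASK = 0xFFFFFFFF  # assumed 32-bit word width


class CheckedWord:
    """A word stored together with its bitwise complement (hypothetical helper)."""

    def __init__(self, value: int) -> None:
        self.value = value & MASK
        self.shadow = ~value & MASK  # complemented redundant copy

    def is_intact(self) -> bool:
        # An uncorrupted pair XORs to all ones; a flipped bit in either
        # copy clears the corresponding bit of the XOR.
        return (self.value ^ self.shadow) == MASK


word = CheckedWord(0x1234)
before = word.is_intact()
word.value ^= 1 << 5  # simulate a transient single-bit flip
after = word.is_intact()
```

A fail-safe system would check `is_intact()` before using the value and move to a safe state when the check fails.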