SOFTWARE IMPLEMENTED HARDWARE-TRANSIENT FAULTS DETECTION

This chapter presents a solution for fault tolerance in Multi-Valued Logic (MVL) circuits composed of Carbon Nano-Tube Field Effect Transistors (CNTFETs). It reviews the basic primitives of MVL and describes ternary implementations of CNTFET circuits. Finally, it describes an error-correction method called Restorative Feedback (RFB). The RFB method is a variant of Triple-Modular Redundancy (TMR) that uses the fault-masking capability of the Muller C element to provide added protection against noisy transient faults. The fault-tolerant properties of the Muller C element are discussed, and the error-correction capability of the RFB method is demonstrated in detail.
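
A minimal behavioural sketch of the masking idea, in Python. It assumes the textbook "agree-or-hold" definition of the Muller C element (the output follows the inputs only when they all agree, otherwise it keeps its last value) and, as an assumption for MVL, applies the same rule to ternary values. It is not the RFB circuit described in the chapter; the names `CElement`, `majority` and `filter_trace` are hypothetical.

```python
from typing import Hashable, List, Optional, Sequence


class CElement:
    """Behavioural 'agree-or-hold' Muller C element.

    The output changes only when every input carries the same value;
    otherwise the previous output is held. Values may be binary (0/1)
    or, by assumption for MVL, ternary (0/1/2).
    """

    def __init__(self, initial: Hashable = 0) -> None:
        self.state = initial

    def step(self, inputs: Sequence[Hashable]) -> Hashable:
        if all(v == inputs[0] for v in inputs):
            self.state = inputs[0]  # all inputs agree: follow them
        return self.state           # disagreement: hold the last value


def majority(a: Hashable, b: Hashable, c: Hashable) -> Optional[Hashable]:
    """Plain TMR voter: value shared by at least two inputs, or None if all differ."""
    if a == b or a == c:
        return a
    if b == c:
        return b
    return None


def filter_trace(primary: Sequence[Hashable], replica: Sequence[Hashable],
                 initial: Hashable = 0) -> List[Hashable]:
    """Feed two redundant copies of a signal through a C element, cycle by cycle."""
    c = CElement(initial)
    return [c.step((p, r)) for p, r in zip(primary, replica)]


if __name__ == "__main__":
    primary = [0, 1, 1, 1, 2, 2]  # ternary signal
    replica = [0, 1, 0, 1, 2, 2]  # same signal with a transient glitch at t=2
    print(filter_trace(primary, replica))
    print(majority(1, 0, 1))
```

In this trace the glitch on the replica at t=2 never reaches the output. The same hold behaviour also delays legitimate transitions until all inputs agree, which is the price of this kind of masking.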

A new architecture for online error detection and isolation in network on chip

Journal of High Speed Networks ◽

10.3233/jhs-200646 ◽

2020 ◽

Vol 26 (4) ◽

pp. 307-323

Author(s):

Chakib Nehnouh

Keyword(s):

Error Detection ◽

Fault Tolerant ◽

High Reliability ◽

Low Cost ◽

Network On Chip ◽

Fault Detection And Isolation ◽

Main Concern ◽

Transient Faults ◽

Protection Factor ◽

On Chip

The Network-on-Chip (NoC) has become a promising communication infrastructure for Multiprocessor Systems-on-Chip (MPSoCs). Reliability is a main concern in NoCs, and performance degrades when a NoC is susceptible to faults. A fault can be defined as the cause of a deviation from the desired operation of the system (an error). To address these reliability challenges, this work proposes OFDIM (Online Fault Detection and Isolation Mechanism), a novel combined methodology for tolerating multiple permanent and transient faults. The new router architecture uses two modules to provide a highly reliable, low-cost fault-tolerance strategy. In contrast to existing work, our architecture has a smaller area, greater fault tolerance, and high reliability. A reliability comparison using the Silicon Protection Factor (SPF) shows a 22-fold improvement, while the additional circuitry incurs an area overhead of 27%, which is better than state-of-the-art reliable router architectures. The results also show that throughput decreases by only 5.19% and average latency increases by only 2.40% while high reliability is maintained.
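
The SPF metric can be illustrated with a small calculation. The sketch assumes the definition commonly used in the reliable-router literature: SPF is the mean number of faults a router tolerates before failing, divided by its normalised area (1 plus the fractional area overhead), and an unprotected baseline is assumed to fail at its first fault. The function name and the input of 10 faults are hypothetical and do not reproduce the paper's measurements.

```python
def silicon_protection_factor(mean_faults_to_failure: float, area_overhead: float) -> float:
    """SPF under the assumed definition: faults-to-failure per unit of normalised area.

    area_overhead is fractional, e.g. 0.27 for a 27% increase over the baseline router.
    """
    if mean_faults_to_failure < 0 or area_overhead < 0:
        raise ValueError("inputs must be non-negative")
    return mean_faults_to_failure / (1.0 + area_overhead)


if __name__ == "__main__":
    baseline = silicon_protection_factor(1, 0.0)     # unprotected router: first fault is fatal
    protected = silicon_protection_factor(10, 0.27)  # hypothetical protected router
    print(f"baseline SPF  = {baseline:.2f}")
    print(f"protected SPF = {protected:.2f}")
    print(f"improvement   = {protected / baseline:.2f}x")
```

Because the overhead sits in the denominator, a design scores well only if its added fault tolerance outweighs the extra silicon it costs.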

Lessons from FTM: an experiment in design and implementation of a low-cost fault tolerant system

IEEE Transactions on Reliability ◽

10.1109/24.510822 ◽

1996 ◽

Vol 45 (2) ◽

pp. 332-340 ◽

Cited By ~ 10

Author(s):

G. Muller ◽

M. Banatre ◽

N. Peyrouze ◽

B. Rochat

Keyword(s):

Fault Tolerant ◽

Low Cost ◽

Fault Tolerant System ◽

Design And Implementation

Fault-Tolerant and Fail-Safe Design Based on Reconfiguration

Design and Test Technology for Dependable Systems-on-Chip - Advances in Computer and Electrical Engineering ◽

10.4018/978-1-60960-212-3.ch008 ◽

2011 ◽

pp. 175-194 ◽

Cited By ~ 1

Author(s):

Hana Kubatova ◽

Pavel Kubalik

Keyword(s):

Fault Detection ◽

High Probability ◽

Fault Tolerant ◽

Main Property ◽

Transient Faults ◽

Trade Off ◽

Reliability Models ◽

On Line ◽

Mission Critical ◽

Fail Safe

The main aim of this chapter is to show how to design fault-tolerant or fail-safe systems in programmable hardware (FPGAs), and thereby to enable the use of FPGAs in mission-critical applications. RAM-based FPGAs are usually considered unreliable, and therefore unsuitable for this area, because of their high probability of transient faults (single-event upsets, SEUs). However, FPGAs can be easily reconfigured. The authors' aim is to use an appropriate type of FPGA reconfiguration and combine it with well-known methods for fail-safe and fault-tolerant design (duplex, TMR), including on-line testing methods that detect a fault and then start the reconfiguration process. Calculation of dependability parameters based on reliability models is an integral part of the proposed methodology. The main property and advantage of the methodology is the trade-off it offers between the requested level of dependability of the designed system and the area overhead, with respect to the faults possible in an FPGA.
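
The kind of dependability calculation mentioned above can be sketched with two textbook reliability models: a simplex module with a constant failure rate, R(t) = exp(-λt), and TMR with a perfect voter, R_TMR = 3R² − 2R³ (at least two of three modules working). These are generic models, not the chapter's specific ones; the failure rate is hypothetical, and reconfiguration is not modelled.

```python
import math


def simplex_reliability(failure_rate: float, t: float) -> float:
    """R(t) = exp(-lambda * t) for a single module with a constant failure rate."""
    return math.exp(-failure_rate * t)


def tmr_reliability(r: float) -> float:
    """Reliability of TMR with a perfect voter: P(at least 2 of 3 modules work)."""
    return 3 * r**2 - 2 * r**3


if __name__ == "__main__":
    lam = 1e-5  # hypothetical failure rate per hour
    for hours in (1_000, 50_000, 100_000):
        r = simplex_reliability(lam, hours)
        print(f"t={hours:>7} h  simplex={r:.4f}  TMR={tmr_reliability(r):.4f}")
```

TMR beats a single module only while R > 0.5; past that point the chance of two module failures dominates, which is one reason long missions pair redundancy with repair, such as the FPGA reconfiguration proposed here.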

Fault-Tolerant and Fail-Safe Design based on Reconfiguration

Small and Medium Enterprises ◽

10.4018/978-1-4666-3886-0.ch035 ◽

2013 ◽

pp. 695-714

Author(s):

Hana Kubatova ◽

Pavel Kubalik

Keyword(s):

Fault Detection ◽

High Probability ◽

Fault Tolerant ◽

Main Property ◽

Transient Faults ◽

Trade Off ◽

Reliability Models ◽

On Line ◽

Mission Critical ◽

Fail Safe

A Single-Version Algorithmic Approach to Fault Tolerant Computing Using Static Redundancy

CLEI electronic journal ◽

10.19153/cleiej.9.2.9 ◽

2006 ◽

Vol 9 (2) ◽

Author(s):

Goutam Kumar Saha

Keyword(s):

Fault Tolerant ◽

Low Cost ◽

Malicious Code ◽

Transient Faults ◽

Computing Systems ◽

Algorithmic Approach ◽

Application Program ◽

Proposed Model ◽

Multiple Copies ◽

Recovery Block

This paper describes a single-version algorithmic approach to designing fault-tolerant computing in various computing systems, using static redundancy to mask transient bit errors in processor memory and registers. This low-cost single-version scheme relies on time redundancy. The software scheme is not intended to tolerate software design bugs. Instead of using multiple independent versions of an application program, this single-version approach uses multiple copies of one application program. This low-cost approach is useful for tolerating various malicious code modifications and transient faults during the run time of an application, without the additional cost of extra hardware and extra software versions required by an N-version programming scheme (NVP) or a recovery block scheme (RBS). The proposed model is practical and usable, and demands an affordable redundancy in time and space. The proposed scheme can tolerate various operational faults that might occur during the execution of an application.
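
A minimal sketch of the general single-version, time-redundant pattern: the same program runs sequentially on three separately stored copies of its data, and the results are majority-voted. This illustrates the pattern under stated assumptions and is not the paper's algorithm; the function names and the injected fault (flipping bit 3 of the first element in the second copy) are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence


def run_with_time_redundancy(
    program: Callable[[List[int]], int],
    data: Sequence[int],
    copies: int = 3,
    fault: Optional[Callable[[int, List[int]], None]] = None,
) -> int:
    """Run `program` once per copy of `data` (one after another) and vote on the results."""
    replicas = [list(data) for _ in range(copies)]  # independent copies of the data
    results = []
    for i, replica in enumerate(replicas):
        if fault is not None:
            fault(i, replica)  # optional fault-injection hook, for demonstration only
        results.append(program(replica))
    value, votes = Counter(results).most_common(1)[0]
    if votes <= copies // 2:
        raise RuntimeError("no majority: the fault could not be masked")
    return value


def flip_bit_in_copy_1(copy_index: int, replica: List[int]) -> None:
    """Hypothetical transient fault: flip bit 3 of element 0 in the second copy."""
    if copy_index == 1:
        replica[0] ^= 1 << 3


if __name__ == "__main__":
    data = [1, 2, 3, 4]
    print(run_with_time_redundancy(sum, data))
    print(run_with_time_redundancy(sum, data, fault=flip_bit_in_copy_1))
```

Voting masks a fault confined to one copy. A fault that corrupts every copy identically, or a design bug in the program itself, passes through unchanged, which matches the scheme's stated scope.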

Exponential suppression of bit or phase errors with cyclic error correction

Nature ◽

10.1038/s41586-021-03588-y ◽

2021 ◽

Vol 595 (7867) ◽

pp. 383-387

Author(s):

Zijun Chen ◽

Kevin J. Satzinger ◽

Juan Atalaya ◽

Alexander N. Korotkov ◽

...

Keyword(s):

Error Correction ◽

Error Detection ◽

Fault Tolerant ◽

Error Rates ◽

Quantum Error Correction ◽

Superconducting Qubits ◽

Two Dimensional ◽

Logical Error ◽

Quantum Error ◽

Logical Qubit

Realizing the potential of quantum computing requires sufficiently low logical error rates (ref. 1). Many applications call for error rates as low as 10⁻¹⁵ (refs. 2–9), but state-of-the-art quantum platforms typically have physical error rates near 10⁻³ (refs. 10–14). Quantum error correction (refs. 15–17) promises to bridge this divide by distributing quantum logical information across many physical qubits in such a way that errors can be detected and corrected. Errors on the encoded logical qubit state can be exponentially suppressed as the number of physical qubits grows, provided that the physical error rates are below a certain threshold and stable over the course of a computation. Here we implement one-dimensional repetition codes embedded in a two-dimensional grid of superconducting qubits that demonstrate exponential suppression of bit-flip or phase-flip errors, reducing logical error per round more than 100-fold when increasing the number of qubits from 5 to 21. Crucially, this error suppression is stable over 50 rounds of error correction. We also introduce a method for analysing error correlations with high precision, allowing us to characterize error locality while performing quantum error correction. Finally, we perform error detection with a small logical qubit using the 2D surface code on the same device (refs. 18, 19) and show that the results from both one- and two-dimensional codes agree with numerical simulations that use a simple depolarizing error model. These experimental demonstrations provide a foundation for building a scalable fault-tolerant quantum computer with superconducting qubits.
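
The exponential suppression described above can be illustrated with a classical toy model. A distance-d repetition code (d odd) with majority decoding and independent bit flips of probability p fails only when more than half the bits flip, so P_L = Σ_{k > d/2} C(d, k) p^k (1 − p)^(d − k). This ignores measurement errors, the repeated stabilizer rounds, and the decoder used in the experiment, so its numbers are not comparable with the paper's; the value of p is hypothetical. Assuming the usual layout of d data qubits interleaved with d − 1 measure qubits, 5 and 21 qubits correspond to d = 3 and d = 11.

```python
from math import comb


def logical_error_rate(p: float, d: int) -> float:
    """Failure probability of majority decoding for a distance-d repetition code,
    assuming independent bit flips with probability p on each of the d bits."""
    if d < 1 or d % 2 == 0:
        raise ValueError("use a positive, odd code distance")
    return sum(comb(d, k) * p**k * (1 - p) ** (d - k) for k in range(d // 2 + 1, d + 1))


if __name__ == "__main__":
    p = 0.01  # hypothetical physical error rate per round
    for d in (3, 5, 7, 9, 11):
        print(d, f"{logical_error_rate(p, d):.3e}")
```

For small p, each step from d to d + 2 multiplies P_L by a roughly constant factor (about 4p for large d), which is the sense of "exponential suppression"; at p = 0.5 the code gives no benefit at all, a crude analogue of the threshold.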

A fault-tolerant system incorporating an Ada executive and 1750A processors

10.1145/339665.339692 ◽

1986 ◽

Author(s):

David Butler

Keyword(s):

Fault Tolerant ◽

Fault Tolerant System
