MobileRE: A replicas prioritized hybrid fault tolerance strategy for mobile distributed system

Real-time digital signal processing (DSP) applications require high-performance parallel architectures that are also reliable. VLSI arrays are good candidates for providing the high throughput these applications require. These arrays, which consist of a number of regularly interconnected processing elements (PEs), will not function correctly in the presence of even a single fault in any of the PEs. Fault tolerance has therefore become a vital design criterion for VLSI arrays. In this paper, a fault tolerance strategy for VLSI arrays is proposed, which significantly improves the reliability of the system. The fault tolerance scheme is composed of two phases: testing and locating faults (fault detection and diagnosis), and reconfiguration. The first phase employs an on-line error detection technique which achieves a compromise between the space and time redundancy approaches. This concurrent error detection technique reduces the rollback time considerably. The reconfiguration phase is achieved by using a global control responsible for changing the states of the switches in the interconnection network. Backtracking is introduced into the algorithm to maximize processor utilization while keeping the interconnection network as simple as possible. Finally, a reliability analysis of this scheme using a Markov model and a comparison with some previous schemes are given.

A reliable distributed system using dual level fault tolerance

Proceedings IEEE Southeastcon '92 ◽

10.1109/secon.1992.202268 ◽

2003 ◽

Cited By ~ 1

Author(s):

J.W. Hanna ◽

J.D. Johannes

Keyword(s):

Fault Tolerance ◽

Distributed System

CONSISTENCY OF DISTRIBUTED SYSTEM WITH ACTIVE INITIATOR PROCESS WITHOUT USELESS CHECKPOINTS

International Journal of Computing ◽

10.47839/ijc.5.1.387 ◽

2014 ◽

pp. 92-99

Author(s):

N. P. Gopalan ◽

K. Nagarajan

Keyword(s):

Fault Tolerance ◽

Distributed System ◽

Message Passing ◽

Domino Effect ◽

Exchange Of Information ◽

Global Consistency ◽

Software Fault Tolerance ◽

Software Fault ◽

The One ◽

Active Initiator

Checkpointing is one of the most attractive approaches for providing software fault tolerance in distributed message-passing systems. This paper aims to implement a distributed checkpointing technique that eliminates the drawbacks of the centralized approach, such as the “domino effect”, “useless checkpoints” (checkpoints that do not contribute to global consistency), and “hidden and zigzag” dependencies. The proposed checkpointing protocol has a checkpoint initiator, but coordination among the local checkpoints is done in a distributed fashion. The guarantee that no message is lost when a failure occurs is maintained by an exchange of information among the processes. There is no central checkpoint initiator; instead, the processes take turns acting as the initiator. Processes take local checkpoints only after being notified by the initiator. The processes synchronize their activities for the current checkpointing interval before finally committing their checkpoints. Thus, the checkpointing pattern described in this paper takes only those checkpoints that contribute to a consistent global snapshot, thereby eliminating useless checkpoints.
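
The notify, synchronize, and commit steps with a rotating initiator can be illustrated with a minimal single-threaded sketch. This is not the authors' protocol: the `Process` class, the `responsive` flag, the round-robin choice of initiator, and the `checkpoint_round` function are all assumptions made for illustration, and in-transit messages are not modelled.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Process:
    pid: int
    state: int = 0                       # stand-in for application state
    responsive: bool = True              # hypothetical flag: can this process answer?
    tentative: Optional[int] = None      # checkpoint taken but not yet committed
    committed: List[Tuple[int, int]] = field(default_factory=list)  # (round, state)

    def on_notify(self) -> bool:
        """Take a tentative local checkpoint only when the initiator asks."""
        if not self.responsive:
            return False
        self.tentative = self.state
        return True                      # acknowledgement to the initiator

    def on_commit(self, round_no: int) -> None:
        """Make the tentative checkpoint permanent."""
        self.committed.append((round_no, self.tentative))
        self.tentative = None

    def on_abort(self) -> None:
        """Discard the tentative checkpoint so no useless checkpoint is kept."""
        self.tentative = None


def checkpoint_round(processes: List[Process], round_no: int) -> Tuple[int, bool]:
    """Run one checkpointing interval; return (initiator pid, committed?)."""
    # Assumption: the initiator role rotates round-robin over the processes.
    initiator = processes[round_no % len(processes)]
    # Phase 1: every process is notified and may take a tentative checkpoint.
    acks = [p.on_notify() for p in processes]
    # Phase 2: commit only if all processes acknowledged, otherwise discard.
    committed = all(acks)
    for p in processes:
        if committed:
            p.on_commit(round_no)
        else:
            p.on_abort()
    return initiator.pid, committed
```

Because a checkpoint is kept only when every process has acknowledged the same round, each committed round corresponds to one set of local checkpoints taken together, which is the property the abstract relies on to avoid useless checkpoints.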

Fault Tolerance Mechanism of a Distributed System for Marine Communication Network

Journal of Coastal Research ◽

10.2112/si106-137.1 ◽

2020 ◽

Vol 106 (sp1) ◽

pp. 605

Author(s):

Jingwei Sun

Keyword(s):

Communication Network ◽

Fault Tolerance ◽

Distributed System ◽

Tolerance Mechanism

Current Sensor Active Fault Tolerance Control Based on Feedback Gain Reconfiguration

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.511-512.1012 ◽

2014 ◽

Vol 511-512 ◽

pp. 1012-1016 ◽

Cited By ~ 2

Author(s):

Zhi Qiang Wang ◽

Xiao Long Li ◽

Qing Zhen Wang

Keyword(s):

Fault Tolerance ◽

Fault Diagnosis ◽

Control Strategy ◽

Active Fault ◽

State Observer ◽

Current Sensor ◽

Feedback Gain ◽

Maglev Train ◽

Tolerance Strategy ◽

Tolerance Control

For the failure of the current sensor on a maglev train, an active fault tolerance control strategy based on feedback gain reconfiguration is proposed. A fault diagnosis unit based on a state observer is designed to monitor the output of the current sensor, and the diagnosis result is used to switch the control strategy. Simulation results indicate that the fault tolerance strategy meets the demands of the system.
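
The idea of an observer-based diagnosis unit that switches the feedback gain can be sketched as follows. Everything specific here is an assumption for illustration, since the abstract gives no model: the first-order current model `i[k+1] = a*i[k] + b*u[k]`, the residual threshold, the latching of the fault flag, and all parameter values are hypothetical.

```python
def run_diagnosis(measurements, inputs, a=0.9, b=0.1, observer_gain=0.5,
                  threshold=0.2, nominal_gain=2.0, fallback_gain=1.0):
    """Return the feedback gain selected at each time step.

    Assumed model: i[k+1] = a * i[k] + b * u[k], where the sensor reads i[k].
    """
    estimate = measurements[0]   # start the observer at the first reading
    faulty = False
    gains = []
    for y, u in zip(measurements, inputs):
        # Residual between the sensor output and the observer's estimate.
        residual = y - estimate
        if abs(residual) > threshold:
            faulty = True        # latch the fault once it is detected
        # Reconfigure the feedback gain according to the diagnosis result.
        gains.append(fallback_gain if faulty else nominal_gain)
        # Once the sensor is flagged, stop correcting the estimate with it.
        correction = 0.0 if faulty else observer_gain * residual
        estimate = a * estimate + b * u + correction
    return gains
```

For instance, with a constant input and a sensor reading that drops from 1.0 to 0.0 halfway through, the residual exceeds the threshold at the drop and the function switches from the nominal gain to the fallback gain from that step on.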

A cooperative fault tolerance strategy for distributed object lifting robots

IEEE/RSJ International Conference on Intelligent Robots and System ◽

10.1109/irds.2002.1041681 ◽

2003 ◽

Author(s):

F. Ghaderi ◽

M.N. Ahmadabadi

Keyword(s):

Fault Tolerance ◽

Distributed Object ◽

Tolerance Strategy ◽

Object Lifting
