Optimizing real-time fault tolerance design in WSI

The Recovery Language Approach

Application-Layer Fault-Tolerance Protocols ◽

10.4018/978-1-60566-182-7.ch006 ◽

2009 ◽

pp. 175-241

Author(s):

Vincenzo De Florio

Keyword(s):

Fault Tolerance ◽

Real Time ◽

Parallel Computers ◽

Fault Modeling ◽

Tolerance Design ◽

General Terms ◽

Fixed Set ◽

Time Requirements ◽

Resilient Computing ◽

Distributed Codes

With the general approach of fault-tolerance languages and their main features discussed, the focus now turns to one particular case: the ARIEL recovery language. Also described is an approach towards resilient computing based on ARIEL and therefore dubbed the “recovery language approach” (ReL). The chapter first introduces the main elements of ReL in general terms, coupling each concept to the technical foundations behind it. A fairly extensive description of ARIEL and of a compliant architecture then follows. Target applications for such an architecture are distributed codes with non-strict real-time requirements, written in a procedural language such as C, to be executed on distributed or parallel computers consisting of a predefined (fixed) set of processing nodes. ARIEL and its approach receive special emphasis not because of any special qualities but because the author conceived, designed, and implemented ARIEL in the course of his studies. This first-hand experience makes it possible to offer the reader a practical exercise in system and fault modeling and in application-level fault-tolerance design, recalling and applying several of the concepts introduced.


Reinforcement Learning Based Framework for Real Time Fault Tolerance

2020 11th IEEE Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON) ◽

10.1109/iemcon51383.2020.9284929 ◽

2020 ◽

Author(s):

Yehia Kotb ◽

Mouhammad Alakkoummi ◽

Hassan Kanj

Keyword(s):

Reinforcement Learning ◽

Fault Tolerance ◽

Real Time


A Practical Real Time SVD Machine With Multi-Level Fault Tolerance

10.1117/12.976256 ◽

1986 ◽

Cited By ~ 1

Author(s):

David E. Schimmel ◽

Franklin T. Luk

Keyword(s):

Fault Tolerance ◽

Real Time ◽

Multi Level


A Design for Fault-Tolerant Communication Middleware Based on Time-Triggered

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.548-549.1326 ◽

2014 ◽

Vol 548-549 ◽

pp. 1326-1329

Author(s):

Juan Jin ◽

Qing Fan Gu

Keyword(s):

Fault Tolerance ◽

Real Time ◽

Fault Location ◽

Fault Tolerant ◽

Application Software ◽

Tolerance Mechanisms ◽

The Real ◽

Communication Middleware ◽

Fault Mechanism ◽

Health Diagnosis

To address the unresolved problems of health diagnosis, fault location, and fault tolerance mechanisms in current avionics applications, this paper proposes a fault-tolerant communication middleware based on a time-triggered mechanism. The middleware is designed to provide a support platform for real-time applications built on communication middleware. Working at the communication middleware level and combining the time-triggered mechanism with a fault-tolerant strategy, it first diagnoses general faults and then routes each one to the appropriate fault-handling mechanism for processing. The middleware thereby completely separates fault-tolerant processing from the functions of the application software.


Guiding Circuit Level Fault-Tolerance Design with Statistical Methods

2008 Design, Automation and Test in Europe ◽

10.1109/date.2008.4484704 ◽

2008 ◽

Cited By ~ 1

Author(s):

Drew C. Ness ◽

David J. Lilja

Keyword(s):

Fault Tolerance ◽

Statistical Methods ◽

Tolerance Design


Minos—the design and implementation of an embedded real-time operating system with a perspective of fault tolerance

2008 International Multiconference on Computer Science and Information Technology ◽

10.1109/imcsit.2008.4747312 ◽

2008 ◽

Cited By ~ 7

Author(s):

Thomas Kaegi-Trachsel ◽

Juerg Gutknecht

Keyword(s):

Operating System ◽

Fault Tolerance ◽

Real Time ◽

Real Time Operating System ◽

Design And Implementation


Adaptable Fault Tolerance for Real-Time Systems

Responsive Computer Systems: Steps Toward Fault-Tolerant Real-Time Systems ◽

10.1007/978-1-4615-2271-3_10 ◽

1995 ◽

pp. 187-208 ◽

Cited By ~ 7

Author(s):

A. Bondavalli ◽

J. Stankovic ◽

L. Strigini

Keyword(s):

Fault Tolerance ◽

Real Time ◽

Real Time Systems ◽

Time Systems


Optimization of Data Center Fault Tolerance Design

Engineering and Management of Data Centers - Service Science: Research and Innovations in the Service Economy ◽

10.1007/978-3-319-65082-1_7 ◽

2017 ◽

pp. 141-162

Author(s):

Sascha Bosse ◽

Klaus Turowski

Keyword(s):

Fault Tolerance ◽

Data Center ◽

Tolerance Design


Work-in-Progress: Improving Resilience of Distributed Real-Time Applications via Security and Fault Tolerance Co-Design

10.1109/rtss52674.2021.00059 ◽

2021 ◽

Author(s):

Wei Jiang ◽

Xinke Liao ◽

Jinyu Zhan ◽

Ke Jiang

Keyword(s):

Fault Tolerance ◽

Real Time ◽

Work In Progress ◽

Real Time Applications


A Method to Support Fault Tolerance Design in Service Oriented Computing Systems

Theoretical and Analytical Service-Focused Systems Design and Development ◽

10.4018/978-1-4666-1767-4.ch019 ◽

2012 ◽

pp. 362-376

Author(s):

Domenico Cotroneo ◽

Antonio Pecchia ◽

Roberto Pietrantuono ◽

Stefano Russo

Keyword(s):

Fault Tolerance ◽

Common Ground ◽

Fault Injection ◽

Failure Behavior ◽

Tolerance Design ◽

System Failure ◽

Computing Systems ◽

Service Oriented Computing ◽

Service Oriented ◽

Tailored Design

Service Oriented Computing relies on the integration of heterogeneous software technologies and infrastructures that provide developers with a common ground for composing services and producing applications flexibly. This approach eases software development but makes dependability a major challenge. Integrating such diverse software items raises issues that traditional testing cannot cope with exhaustively. In this context, tolerating faults, rather than attempting to detect them solely by testing, is a more suitable solution. This paper proposes a method to support a tailored design of fault tolerance actions for the system being developed. The method characterizes system failure behavior through an extensive fault injection campaign in order to identify its criticalities and adopt the most appropriate countermeasures to tolerate operational faults. The proposed method is applied to two distinct SOC-enabling technologies. Results show how the findings allow designers to understand the system failure behavior and plan fault tolerance.
