Optimizing real-time fault tolerance design in WSI

Author(s):  
J.R. Samson
Author(s):  
Vincenzo De Florio

After the discussion of the general approach of fault-tolerance languages and their main features, the focus now turns to one particular case: the ARIEL recovery language. Also described is an approach to resilient computing based on ARIEL and therefore dubbed the “recovery language approach” (ReL). In this chapter, the main elements of ReL are first introduced in general terms, coupling each concept to the technical foundations behind it. A fairly extensive description of ARIEL and of a compliant architecture is then provided. Target applications for such an architecture are distributed codes, characterized by non-strict real-time requirements, written in a procedural language such as C, to be executed on distributed or parallel computers consisting of a predefined (fixed) set of processing nodes. ARIEL and its approach receive special emphasis not because of any special qualities but because the author, who conceived, designed, and implemented ARIEL in the course of his studies, has first-hand experience with them. This makes it possible to offer the reader a kind of practical exercise in system and fault modeling and in application-level fault-tolerance design, recalling and applying several of the concepts introduced.


2014, Vol 548-549, pp. 1326-1329
Author(s): Juan Jin, Qing Fan Gu

To address the shortcomings of the health diagnosis, fault location, and fault tolerance mechanisms in current avionics applications, this paper proposes a fault-tolerant communication middleware based on a time-triggered mechanism. The middleware is designed to provide a support platform for real-time applications built on communication middleware. Working at the communication middleware level and combining the time-triggered mechanism with a fault-tolerant strategy, it first diagnoses general faults and then routes them to the appropriate fault-handling mechanism for processing. The middleware thus completely separates fault-tolerant processing from the application software functions.


Author(s): Domenico Cotroneo, Antonio Pecchia, Roberto Pietrantuono, Stefano Russo

Service Oriented Computing relies on the integration of heterogeneous software technologies and infrastructures that provide developers with a common ground for composing services and producing applications flexibly. This approach eases software development but makes dependability a major challenge. Integrating such diverse software items raises issues that traditional testing cannot cope with exhaustively. In this context, tolerating faults, rather than attempting to detect them solely by testing, is a more suitable solution. This paper proposes a method to support a tailored design of fault tolerance actions for the system being developed. It characterizes system failure behavior through an extensive fault injection campaign in order to identify its criticalities and adopt the most appropriate countermeasures to tolerate operational faults. The proposed method is applied to two distinct SOC-enabling technologies. Results show how the findings allow designers to understand the system failure behavior and plan fault tolerance.

