A Design for Fault-Tolerant Communication Middleware Based on Time-Triggered

To address the persistent problems of health diagnosis, fault location, and fault-tolerance mechanisms in current avionics applications, this paper proposes a fault-tolerant communication middleware based on a time-triggered mechanism. The middleware is designed to provide a support platform for real-time applications built on communication middleware. Working at the communication-middleware level and combining the time-triggered mechanism with a fault-tolerance strategy, it first diagnoses general faults and then routes each one to the appropriate fault-handling mechanism. The middleware thus completely separates fault-tolerance processing from the functions of the application software.
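
The diagnose-then-route idea in this abstract can be sketched in a few lines. This is only an illustrative sketch, not the middleware described in the paper: the names `Frame`, `FaultKind`, `diagnose`, and `FaultRouter` are hypothetical, and the two fault classes (a missed time slot and a failed integrity check) are assumptions chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Dict, Optional


class FaultKind(Enum):
    # Hypothetical fault classes, assumed for this sketch only.
    TIMEOUT = auto()    # nothing arrived in the expected time slot
    CORRUPTED = auto()  # a frame arrived but failed its integrity check


@dataclass
class Frame:
    slot: int          # time-triggered slot the frame was expected in
    arrived: bool      # whether anything arrived during that slot
    checksum_ok: bool  # whether the payload passed the integrity check


def diagnose(frame: Frame) -> Optional[FaultKind]:
    """Classify a frame; return None when no fault is detected."""
    if not frame.arrived:
        return FaultKind.TIMEOUT
    if not frame.checksum_ok:
        return FaultKind.CORRUPTED
    return None


class FaultRouter:
    """Routes each diagnosed fault to the handler registered for its kind."""

    def __init__(self) -> None:
        self._handlers: Dict[FaultKind, Callable[[Frame], str]] = {}

    def register(self, kind: FaultKind, handler: Callable[[Frame], str]) -> None:
        self._handlers[kind] = handler

    def process(self, frame: Frame) -> str:
        kind = diagnose(frame)
        if kind is None:
            return "deliver"  # healthy frame goes straight to the application
        return self._handlers[kind](frame)


router = FaultRouter()
router.register(FaultKind.TIMEOUT, lambda f: f"switch-to-backup-channel(slot={f.slot})")
router.register(FaultKind.CORRUPTED, lambda f: f"request-retransmit(slot={f.slot})")
```

Because the application only ever sees the result of `process`, the fault handling stays outside the application code, which is the separation the abstract describes.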

Single Chip Microcomputer Cluster Management

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.933.584 ◽

2014 ◽

Vol 933 ◽

pp. 584-589

Author(s):

Zhi Chun Zhang ◽

Song Wei Li ◽

Wei Ren Wang ◽

Wei Zhang ◽

Li Jun Qi

Keyword(s):

Real Time ◽

Management System ◽

Large Scale ◽

Fault Tolerant ◽

Single Chip ◽

Single Chip Microcomputer ◽

Cluster Management ◽

The Real ◽

Time Performance ◽

Operational Phase

This paper presents a system in which cluster devices are controlled by single-chip microcomputers (SCMs), with emphasis on the techniques used to manage the SCM cluster. Each device in a cluster is controlled by an SCM, which collects sample data and sends it to the cluster management computer, and drives the device with the driving data it receives from that same computer; both directions travel over COM ports. The cluster management system running on the cluster management computer handles initial SCM identification, run time slice management, communication resource utilization, fault tolerance, and error correction for the SCMs. Initial SCM identification is achieved through signal responses between the SCMs and the cluster management computer. Port priorities and the parallelization of serial communications are used to maximize the system's real-time performance. Real-time performance can be tuned by increasing or decreasing the number of COMs and the ports linked to each COM, and can be raised further by configuring more cluster management computers. Fault-tolerant control occurs in both the initialization phase and the operational phase. In the initialization phase, the cluster management system incorporates unidentified SCMs into the system based on history information recorded on external storage media. In the operational phase, if the number of read/write errors on an SCM reaches a predetermined threshold, that SCM is regarded as seriously faulty or absent. The cluster management system maintains an accuracy-maintenance database on an external storage medium to handle the nonlinear control of specific devices and the loss of accuracy due to wear.
The cluster management system uses an object-oriented method to design a unified driver framework, so that its implementation is simplified, standardized, and easy to port. The system has been applied in a large-scale simulation system of 230 single-chip microcomputers, which shows that it is reliable, real-time, and easy to maintain.
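
The operational-phase rule (an SCM is written off once its read/write errors reach a threshold) might look roughly like the sketch below. The class name `ScmHealth`, the default threshold of 3, and the choice to reset the counter after a successful operation are all assumptions made for illustration; the abstract does not say whether errors are counted consecutively or cumulatively.

```python
from typing import Dict, Set


class ScmHealth:
    """Tracks read/write errors per SCM and marks an SCM offline at a threshold."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold        # assumed value, not from the paper
        self.errors: Dict[int, int] = {}  # SCM id -> current error count
        self.offline: Set[int] = set()    # SCMs regarded as faulty or absent

    def record(self, scm_id: int, ok: bool) -> bool:
        """Record one I/O outcome; return True while the SCM is still in service."""
        if scm_id in self.offline:
            return False
        if ok:
            # Assumption: a successful operation clears the error count.
            self.errors[scm_id] = 0
            return True
        self.errors[scm_id] = self.errors.get(scm_id, 0) + 1
        if self.errors[scm_id] >= self.threshold:
            self.offline.add(scm_id)
            return False
        return True
```

With this policy an occasional glitch on a serial port does not remove an SCM from the cluster, while a persistently failing one is dropped after `threshold` errors in a row.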

XFTRTS - A XtratuM Based Fault-Tolerant Real-Time Control System

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.433-440.4095 ◽

2012 ◽

Vol 433-440 ◽

pp. 4095-4100

Author(s):

Chan Juan Li ◽

Chuan De Zhang ◽

Qing Guo Zhou

Keyword(s):

Control System ◽

Fault Tolerance ◽

Real Time ◽

Operating Systems ◽

Fault Tolerant ◽

Real Time Control ◽

Time Control ◽

Performance Metric ◽

Virtualization Technology ◽

Single Host

Some recent works combine virtualization technology with fault-tolerance technology, because a virtualized system provides an environment in which multiple operating systems can run concurrently. In this paper, based on the real-time hypervisor XtratuM, we propose the architecture of a fault-tolerant real-time control system (XFTRTS), which provides local backup execution and supports different levels of diversity, including N-version programming, on a single host. Furthermore, we implement a prototype of XFTRTS and test an important performance metric, latency, which is within two microseconds.
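
N-version programming, mentioned above, runs several independently written versions of the same computation and accepts the majority result. The sketch below shows only the voting step with plain Python functions; in XFTRTS the versions would run in separate hypervisor partitions, which this example does not model. The function name `n_version_vote` is hypothetical.

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def n_version_vote(versions: Sequence[Callable[[int], int]], x: int) -> Optional[int]:
    """Run every version on x and return the strict-majority result, if any."""
    results = []
    for version in versions:
        try:
            results.append(version(x))
        except Exception:
            # A version that crashes simply casts no vote.
            continue
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    # Require agreement from more than half of all versions, not just survivors.
    return value if count > len(versions) // 2 else None


# Three hypothetical versions of "square a number"; the third one is faulty.
versions = [lambda v: v * v, lambda v: v ** 2, lambda v: v + 1]
```

Here `n_version_vote(versions, 3)` returns 9 because two of the three versions agree, while a call in which no result reaches a strict majority returns `None` so that the caller can fall back to a backup action.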

CHOOSING BETWEEN DESIGN OPTIONS FOR REAL-TIME COMPUTERS TOLERATING A SINGLE FAULT

Journal of Circuits System and Computers ◽

10.1142/s0218126610006591 ◽

2010 ◽

Vol 19 (05) ◽

pp. 1041-1068 ◽

Cited By ~ 2

Author(s):

REFIK SAMET

Keyword(s):

Fault Tolerance ◽

Optimal Design ◽

Real Time ◽

Graphical Models ◽

Fault Tolerant ◽

Single Fault ◽

Mode Of Operation ◽

Real Time Applications ◽

The Many ◽

Design Options

This paper proposes a methodology for supporting the design of fault-tolerant computers for real-time applications. To this end, the paper first presents the steps of fault tolerance and describes mechanisms that can be used to realize them. Design options composed of the described mechanisms are then proposed and summarized in a table. From that table, the paper derives a flowchart for choosing among the many design options available for building a redundant computer system. The optimal design option is chosen according to the number of redundant computers, their mode of operation, the computer failure mode, and the severity of the real-time constraint. Finally, graphical models for sequencing the mechanisms of the design options are proposed. The main merits of the proposed methodology are a spectrum of design options for fault-tolerant mechanisms in real-time computers tolerating a single fault at a time, and a guide for choosing between them.

Incorporation of security and fault tolerance mechanisms into real-time component-based distributed computing systems

Proceedings 20th IEEE Symposium on Reliable Distributed Systems ◽

10.1109/reldis.2001.969752 ◽

2002 ◽

Cited By ~ 1

Author(s):

K.H. Kim

Keyword(s):

Fault Tolerance ◽

Distributed Computing ◽

Real Time ◽

Tolerance Mechanisms ◽

Distributed Computing Systems ◽

Computing Systems ◽

Time Component

A New Practical Fault Location Algorithm for Two-Terminal Transmission Lines

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.383-390.4377 ◽

2011 ◽

Vol 383-390 ◽

pp. 4377-4384

Author(s):

Zhou Ma ◽

Xiao Ning Li ◽

Xiao Ming Zhang

Keyword(s):

Transmission Line ◽

Real Time ◽

Transmission Lines ◽

Fault Location ◽

Distributed Parameter ◽

The Real ◽

Three Phase ◽

Transition Resistance ◽

Location Algorithm ◽

Real Time Transmission

A new practical fault location algorithm using two-terminal electrical quantities is presented in this article; it takes the distributed-parameter line model into account. The analytical expression of the algorithm is derived from three-phase decoupling. First, the unsynchronized measurements are synchronized analytically using the determined synchronization operator, and the non-synchronizing angle is calculated from the two-terminal pre-fault electrical quantities. Then, the real-time transmission line parameters are calculated using the two-terminal non-synchronized electrical quantities and the non-synchronizing angle. The algorithm overcomes the drawbacks of traditional fault location algorithms and does not suffer from the pseudo-root problem. In addition, it is simple, practical, and robust, requires little computation, and needs no search or iteration. The algorithm is not influenced by fault type, transition resistance, or other factors. Finally, the developed fault location algorithm is tested using signals from versatile ATP-EMTP simulations of faults on a transmission line.
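
For orientation, the textbook distributed-parameter relation that two-terminal methods of this kind usually start from is sketched below. It is not taken from the paper and may differ from the authors' derivation. The symbols are assumptions for this sketch: $V_S, I_S$ and $V_R, I_R$ are the sending- and receiving-end phasors of one decoupled mode (currents taken as flowing into the line), $\gamma$ is the propagation constant, $Z_c$ the characteristic impedance, $\ell$ the line length, $d$ the distance from the sending end to the fault, and $\delta$ the unknown synchronization angle applied to the sending-end measurements.

```latex
% Fault-point voltage computed from the sending end (after synchronization)
V_F = V_S\,e^{j\delta}\cosh(\gamma d) - Z_c\,I_S\,e^{j\delta}\sinh(\gamma d)

% Fault-point voltage computed from the receiving end
V_F = V_R\cosh\bigl(\gamma(\ell - d)\bigr) - Z_c\,I_R\sinh\bigl(\gamma(\ell - d)\bigr)
```

Equating the two expressions gives a single complex equation in $d$ once $\delta$ is known, which is why the abstract determines the non-synchronizing angle from pre-fault quantities first.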

Fault-Tolerant Energy-Aware Task Scheduling on Multiprocessor System for Fixed-Priority Real-Time Tasks

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.a5177.119119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 2269-2275 ◽

Cited By ~ 1

Keyword(s):

Energy Efficiency ◽

Fault Tolerance ◽

Real Time ◽

Task Scheduling ◽

Fault Tolerant ◽

Multiprocessor System ◽

Energy Aware ◽

Fixed Priority ◽

Real Time Scheduling ◽

Time Scheduling

Energy-aware real-time scheduling has been gaining attention in recent years owing to environmental concerns and applications in numerous fields. System reliability is also adversely affected by increasing energy dissipation, which poses serious challenges for researchers. In view of this, researchers have recently turned to addressing fault tolerance and energy efficiency together. In the literature, DVFS and DPM, the most commonly used power-management techniques in task scheduling, are often combined with the Primary/Backup technique to achieve fault tolerance against transient and permanent faults. The optimal algorithms Earliest Deadline First (EDF) and Rate-Monotonic (RM), meant for scheduling dynamic- and fixed-priority tasks respectively, have mainly been analyzed using a dual-processor approach for fault tolerance and energy efficiency. In this paper, to handle higher workloads of fixed-priority real-time tasks, energy-aware fault-tolerant scheduling algorithms are proposed for multiprocessor systems with balanced and unbalanced numbers of main and auxiliary processors. Simulations over extensive task sets indicate that the balanced approach is more energy-efficient than the unbalanced one.
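
As background for the fixed-priority setting, the classic Liu and Layland utilization bound gives a quick sufficient (not necessary) schedulability check for Rate-Monotonic scheduling on a single processor. The sketch below is general background, not the algorithm proposed in the paper; the function names are hypothetical, and each task is assumed to be a `(worst_case_execution_time, period)` pair with its deadline equal to its period.

```python
from typing import Sequence, Tuple


def rm_utilization_bound(n: int) -> float:
    """Liu-Layland bound n * (2^(1/n) - 1) for n periodic tasks."""
    if n <= 0:
        raise ValueError("need at least one task")
    return n * (2 ** (1.0 / n) - 1)


def rm_schedulable(tasks: Sequence[Tuple[float, float]]) -> bool:
    """Sufficient test: total utilization must not exceed the bound.

    A False result is inconclusive; an exact test such as response-time
    analysis would be needed to settle it.
    """
    utilization = sum(wcet / period for wcet, period in tasks)
    return utilization <= rm_utilization_bound(len(tasks))
```

For example, the task set `[(1, 4), (1, 5), (2, 10)]` has utilization 0.65, below the three-task bound of about 0.78, so it passes; `[(2, 4), (2, 5)]` has utilization 0.9, above the two-task bound of about 0.83, so this test cannot confirm it.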

Fault Tolerance Techniques for Distributed, Parallel Applications

Innovative Research and Applications in Next-Generation High Performance Computing - Advances in Systems Analysis, Software Engineering, and High Performance Computing ◽

10.4018/978-1-5225-0287-6.ch009 ◽

2016 ◽

pp. 221-252

Author(s):

Camille Coti

Keyword(s):

Fault Tolerance ◽

High Performance Computing ◽

High Performance ◽

Fault Tolerant ◽

Distributed Applications ◽

Parallel Applications ◽

Rollback Recovery ◽

Tolerance Mechanisms ◽

Performance Computing

This chapter gives an overview of techniques used to tolerate failures in high-performance distributed applications. We describe basic replication techniques, automatic rollback recovery and application-based fault tolerance. We present the challenges raised specifically by distributed, high performance computing and the performance overhead the fault tolerance mechanisms are likely to cost. Last, we give an example of a fault-tolerant algorithm that exploits specific properties of a recent algorithm.
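
Rollback recovery, one of the techniques the chapter surveys, can be illustrated on a single process: state is saved periodically, and after a failure execution resumes from the last saved copy. The sketch below is a toy, in-memory illustration under stated assumptions; `run_with_rollback`, its parameters, and the use of `RuntimeError` to simulate a failure are all hypothetical, and real distributed checkpointing must also coordinate messages between processes.

```python
import copy
from typing import Callable, Dict, List


def run_with_rollback(
    steps: List[Callable[[Dict], None]],
    state: Dict,
    checkpoint_every: int = 2,
    max_retries: int = 3,
) -> Dict:
    """Run steps in order, checkpointing periodically and rolling back on failure."""
    saved_state, saved_index = copy.deepcopy(state), 0
    index, retries = 0, 0
    while index < len(steps):
        try:
            steps[index](state)
            index += 1
        except RuntimeError:
            retries += 1
            if retries > max_retries:
                raise
            # Discard possibly corrupted state and resume from the checkpoint.
            state = copy.deepcopy(saved_state)
            index = saved_index
            continue
        if index % checkpoint_every == 0:
            saved_state, saved_index = copy.deepcopy(state), index
    return state
```

The `checkpoint_every` parameter exposes the usual trade-off: frequent checkpoints cost more during fault-free runs but reduce the amount of work that must be redone after a failure.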

ADAPTIVE FAULT-TOLERANT TASK SCHEDULING FOR REAL-TIME ENERGY HARVESTING SYSTEMS

Journal of Circuits System and Computers ◽

10.1142/s0218126612500041 ◽

2012 ◽

Vol 21 (01) ◽

pp. 1250004 ◽

Cited By ~ 5

Author(s):

LINJIE ZHU ◽

TONGQUAN WEI ◽

XIAODAO CHEN ◽

YONGHE GUO ◽

SHIYAN HU

Keyword(s):

Fault Tolerance ◽

Energy Harvesting ◽

Real Time ◽

Task Scheduling ◽

Energy Efficient ◽

Fault Tolerant ◽

Multiprocessor System ◽

Technology Scaling ◽

Harvesting Systems ◽

Important Design

Fault tolerance and energy have become important design issues in multiprocessor systems-on-chip (SoCs) with technology scaling and the proliferation of battery-powered multiprocessor SoCs. This paper proposes an energy-efficient fault-tolerant task allocation scheme for multiprocessor SoCs in real-time energy harvesting systems. The proposed fault-tolerance scheme is based on the principle of primary/backup task scheduling and can tolerate at most one transient fault. Extensive simulation experiments show that the proposed scheme can save up to 30% of energy consumption and reduce the miss ratio to about 8% in the presence of faults.
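
The primary/backup principle can be sketched as a simple timing check: the backup copy of a job runs only if the primary is hit by a fault, and the job succeeds only if the backup still finishes by the deadline. This is a simplified single-job illustration, not the allocation scheme from the paper; the `Job` fields and the `finish_time` helper are hypothetical, and the backup is assumed to start immediately after the faulty primary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    name: str
    release: float       # time at which the job becomes ready
    deadline: float      # absolute deadline
    primary_time: float  # execution time of the primary copy
    backup_time: float   # execution time of the backup copy


def finish_time(job: Job, primary_faulty: bool) -> Optional[float]:
    """Return the completion time, or None if the deadline is missed."""
    t = job.release + job.primary_time
    if primary_faulty:
        # Assumption: the backup starts right after the faulty primary ends.
        t += job.backup_time
    return t if t <= job.deadline else None
```

When the primary succeeds, the backup never runs, so its time and energy budget is not spent; that is the usual source of energy savings in primary/backup schemes.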

A Software Architecture for Handling Complex Critical Section Constraints on Multiprocessors in a Fault-Tolerant Real-Time Embedded System

10.29007/brkj ◽

2019 ◽

Author(s):

Jia Xu

Keyword(s):

Fault Tolerance ◽

Embedded System ◽

Software Architecture ◽

Real Time ◽

Data Structures ◽

Fault Tolerant ◽

Critical Section ◽

Shared Data ◽

Time Task ◽

Critical Sections

In a real-time embedded system which uses a primary and an alternate for each real-time task to achieve fault tolerance, there is a need to allow both primaries and alternates to have critical sections/segments in which shared data structures can be read and updated while guaranteeing that the execution of any part of one critical section will not be interleaved with or overlap with the execution of any part of a critical section belonging to some other primary or alternate which reads and writes on those shared data structures. In this paper a software architecture is presented which effectively handles critical section constraints where both primaries and alternates may have critical sections which can either overrun or underrun, while still guaranteeing that all primaries or alternates that do not overrun will always meet their deadlines while keeping the shared data in a consistent state on a multiprocessor in a fault tolerant real-time embedded system.

Exploring Parallel MPI Fault Tolerance Mechanisms for Phylogenetic Inference with RAxML-NG

10.1101/2021.01.15.426773 ◽

2021 ◽

Author(s):

Lukas Hübner ◽

Alexey M. Kozlov ◽

Demian Hespe ◽

Peter Sanders ◽

Alexandros Stamatakis

Keyword(s):

Fault Tolerance ◽

Phylogenetic Trees ◽

Large Scale ◽

Fault Tolerant ◽

Phylogenetic Inference ◽

Molecular Data ◽

Supplementary Information ◽

Tolerance Mechanisms ◽

Recovery Mechanisms ◽

Mpi Implementation

Phylogenetic trees are now routinely inferred on large scale HPC systems with thousands of cores as the parallel scalability of phylogenetic inference tools has improved over the past years to cope with the molecular data avalanche. Thus, the parallel fault tolerance of phylogenetic inference tools has become a relevant challenge. To this end, we explore parallel fault tolerance mechanisms and algorithms, the software modifications required, and the performance penalties induced via enabling parallel fault tolerance by example of RAxML-NG, the successor of the widely used RAxML tool for maximum likelihood based phylogenetic tree inference. We find that the slowdown induced by the necessary additional recovery mechanisms in RAxML-NG is on average 2%. The overall slowdown by using these recovery mechanisms in conjunction with a fault tolerant MPI implementation amounts to 8% on average for large empirical datasets. Via failure simulations, we show that RAxML-NG can successfully recover from multiple simultaneous failures, subsequent failures, failures during recovery, and failures during checkpointing. Recoveries are automatic and transparent to the user. The modified fault tolerant RAxML-NG code is available under GNU GPL at https://github.com/lukashuebner/ft-raxml-ng. Contact: lukas.huebner@{kit.edu,h-its.org}. Supplementary information: Supplementary data are available at bioRχiv.
