Model Checking-based Software-FMEA: Assessment of Fault Tolerance and Error Detection Mechanisms

Author(s):  
Vince Molnár ◽  
István Majzik

Failure Mode and Effects Analysis (FMEA) is a systematic technique to explore the possible failure modes of individual components or subsystems and determine their potential effects at the system level. FMEA is commonly applied to hardware and communication failures, but analyzing software failures (SW-FMEA) poses a number of challenges. Failures may originate in permanent software faults, commonly called bugs, and their effects can be very subtle and hard to predict due to the complex nature of programs. Therefore, a behavior-based automatic method to analyze the potential effects of different types of bugs is desirable. Such a method could be used to automatically build an FMEA report about the fault effects, or to evaluate different failure mitigation and detection techniques. This paper follows the latter direction, using a model checking-based automated SW-FMEA approach to evaluate error detection and fault tolerance mechanisms in a case study inspired by safety-critical embedded operating systems.

2011 ◽  
Vol 2011 ◽  
pp. 1-15 ◽  
Author(s):  
Mohsin Amin ◽  
Abbas Ramazani ◽  
Fabrice Monteiro ◽  
Camille Diou ◽  
Abbas Dandache

We introduce a specialized self-checking hardware journal used as the centerpiece of our design strategy to build a processor tolerant to transient faults. Fault tolerance here relies on error detection techniques in the processor core together with journalization and rollback execution to recover from erroneous situations. Effective rollback recovery is possible thanks to the hardware journal and to choosing a stack computing architecture for the processor core instead of the usual RISC or CISC. The main objective of the journalization and the self-checking hardware journal is to prevent data that has not yet been validated from being sent to the main memory, and to allow fast rollback of execution in faulty situations. The main memory, assumed to be fault-secure in our model, contains only valid (uncorrupted) data obtained from fault-free computations. Error control coding techniques are used both in the processor core to detect errors and in the hardware journal to protect the temporarily stored data from possible changes induced by transient faults. Implementation results on an FPGA of the Altera Stratix-II family clearly show the relevance of the approach, both in terms of performance/area tradeoff and fault tolerance effectiveness, even for high error rates.
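The journalization and rollback idea described in this abstract can be illustrated with a small software sketch. This is only an illustration under stated assumptions, not the authors' hardware design: the class `JournaledMemory`, the function `run_sequence`, and the boolean `error_detected` flag are hypothetical names, and the error detection itself (done with error control codes in the paper) is reduced to that flag.

```python
class JournaledMemory:
    """Toy model: writes are held in a journal until they are validated."""

    def __init__(self):
        self.main = {}     # main memory: holds only validated data
        self.journal = {}  # temporary store for not-yet-validated writes

    def write(self, addr, value):
        # New data never goes straight to main memory.
        self.journal[addr] = value

    def read(self, addr):
        # The most recent (unvalidated) value wins, else fall back to main memory.
        if addr in self.journal:
            return self.journal[addr]
        return self.main.get(addr, 0)

    def validate(self):
        # Fault-free sequence: commit the journal to main memory.
        self.main.update(self.journal)
        self.journal.clear()

    def rollback(self):
        # Error detected: discard unvalidated data, main memory is untouched.
        self.journal.clear()


def run_sequence(mem, writes, error_detected):
    """Apply a sequence of (addr, value) writes, then validate or roll back.

    `error_detected` stands in for the processor's error detection logic
    (an assumption of this sketch).
    """
    for addr, value in writes:
        mem.write(addr, value)
    if error_detected:
        mem.rollback()
        return False
    mem.validate()
    return True
```

For example, a sequence that ends with `error_detected=True` leaves `mem.main` exactly as it was before the sequence started, so the sequence can simply be re-executed.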


2013 ◽  
Vol 33 (5) ◽  
pp. 1459-1462
Author(s):  
Xiaoming JU ◽  
Jiehao ZHANG ◽  
Yizhong ZHANG

Author(s):  
Dalila Amara Amara ◽  
Latifa Ben Arfa Rabai

Fault tolerance techniques are generally based on a common concept, redundancy, which therefore needs to be measured. A suite of four semantic metrics has been proposed to assess the redundancy of programs and reflect their ability to tolerate faults. The literature shows that one of these metrics, namely state redundancy, computes program redundancy only in the initial and final states and ignores the internal states. Consequently, the authors focus in this paper on overcoming this shortcoming by proposing a new redundancy-based semantic metric that computes the redundancy of the different program states, including internal ones. The empirical study they perform shows that the proposed metric is, on the one hand, a measure of program redundancy and, on the other hand, an error detection indicator. Moreover, they demonstrate that it is more accurate than the basic state redundancy metric in detecting masked errors. It helps testers determine whether a tested program is error-free and pinpoint the presence of masked errors even if the final states are equal to the expected ones.


2018 ◽  
Vol 2 (2) ◽  
pp. 63
Author(s):  
Ruaa Alaadeen Abdulsattar ◽  
Nada Hussein M. Ali

Error correction and error detection techniques are often used in wireless transmission systems. Color images in BMP format are used here as an application of the developed lookup table algorithms for detecting and correcting errors. Decimal Matrix Code (DMC) and Hamming code (HC) techniques were integrated to compose a Hybrid Matrix Code (HMC) to maximize error detection and correction. The results obtained from HMC still contain some uncorrected errors, because the redundant bits added to the data by the Hamming code are inadequate; the code is suitable for detection and correction only when the error rate is low. Moreover, a Hamming code cannot detect long error bursts, and in some cases the values coincide, so the error is not detected and the error ratio increases. The proposed algorithm, LUT-CORR, detects and corrects errors in color images transmitted over noisy channels. It depends on the parallel Cyclic Redundancy Code (CRC) method, which is based on two algorithms: Sarwate and Slicing-by-N. LUT-CORR and the aforementioned algorithms were merged to correct errors in color images, and the output corrects the corrupted images with a ratio of almost 100%. This high correction ratio is due to some unique values that the LUT-CORR algorithm has. HMC and the proposed algorithm were applied to different BMP images, and the results from LUT-CORR are compared with those of HMC in terms of both Mean Square Error (MSE) and correction ratio. The proposed algorithm shows good performance and achieves a high correction ratio in retrieving the source BMP image.
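For readers unfamiliar with the Sarwate algorithm mentioned in this abstract, the following is a minimal sketch of a table-driven, byte-at-a-time CRC. It is not the LUT-CORR algorithm and does not cover Slicing-by-N or error correction. The abstract does not state which polynomial is used; the reflected CRC-32 polynomial `0xEDB88320` is an assumption of this sketch, and the function names are illustrative.

```python
POLY = 0xEDB88320  # reflected CRC-32 polynomial (assumed, not stated in the abstract)


def make_table():
    """Precompute the 256-entry lookup table used by the Sarwate algorithm."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            # Shift out one bit; apply the polynomial when that bit was set.
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


TABLE = make_table()


def crc32_sarwate(data: bytes) -> int:
    """Compute CRC-32 one byte per step using a single table lookup."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def is_corrupted(received: bytes, expected_crc: int) -> bool:
    """Detect (not correct) corruption by comparing checksums."""
    return crc32_sarwate(received) != expected_crc
```

With these parameters the result matches the standard CRC-32 used by `zlib.crc32`, so the sketch can be cross-checked against the Python standard library. A sender would transmit `crc32_sarwate(pixels)` along with the pixel bytes, and the receiver would call `is_corrupted` on what arrives.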


Author(s):  
Deepali Chaurasia

Industrial electronics is trending towards more compact components and system integration, and towards innovative products offering greater flexibility, quality, safety, reliability, energy savings, and a wide range of connectivity with a long operating lifetime. Electronics is now widely used in information processing, telecommunication and signal processing. Due to the complex nature of electronics theory, laboratory experimentation is an important part of the development of electronic devices. These experiments are used to test or verify the proposed design and detect errors. Historically, electronics labs have consisted of electronic devices and equipment located in a physical space. In more recent years, however, the trend has been towards electronics lab simulation software, and SystemVue is one such tool. SystemVue is a focussed electronic design automation (EDA) environment for electronic system-level (ESL) design. It enables system architects and algorithm developers to innovate the physical layer (PHY) of wireless and aerospace/defence communication systems and provide unique value to RF, DSP and FPGA/ASIC implementers. As a dedicated platform for ESL design and signal processing realization, SystemVue replaces general-purpose digital, analog and math environments. SystemVue “speaks RF”, cuts PHY development and verification time in half and connects to your mainstream EDA flow.


Author(s):  
Gabriella Carrozza ◽  
Roberto Natella

This paper proposes an approach to software fault diagnosis in complex fault-tolerant systems, encompassing the phases of error detection, fault location, and system recovery. Errors are detected in the first phase, exploiting operating system support. Faults are identified during the location phase through a machine learning-based approach. The best recovery action is then triggered once the fault is located. Feedback actions are also used during the location phase to improve detection quality over time. A real-world application from the Air Traffic Control field has been used as a case study for evaluating the proposed approach. Experimental results, achieved by means of fault injection, show that the diagnosis engine is able to diagnose faults with high accuracy and at a low overhead.

