Model Checking-based Software-FMEA: Assessment of Fault Tolerance and Error Detection Mechanisms

Failure Mode and Effects Analysis (FMEA) is a systematic technique for exploring the possible failure modes of individual components or subsystems and determining their potential effects at the system level. FMEA is commonly applied to hardware and communication failures, but analyzing software failures (SW-FMEA) poses a number of challenges. Failures may originate in permanent software faults, commonly called bugs, and their effects can be subtle and hard to predict because of the complex nature of programs. A behavior-based automatic method for analyzing the potential effects of different types of bugs is therefore desirable. Such a method could be used to automatically build an FMEA report on the fault effects, or to evaluate different failure mitigation and detection techniques. This paper follows the latter direction: it uses a model checking-based automated SW-FMEA approach to evaluate error detection and fault tolerance mechanisms, demonstrated on a case study inspired by safety-critical embedded operating systems.
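The idea of exploring fault effects by exhaustive state-space search can be illustrated with a very small sketch. It is not the method of the paper: the counter model, the fault mode `no_wrap`, the range check, and the names `successors` and `classify` are all hypothetical, and a plain breadth-first search stands in for a real model checker.

```python
from collections import deque

LIMIT = 3    # largest legal counter value (assumed toy specification)
HAZARD = 6   # value at which the toy system is considered to have failed


def successors(state, fault, check_enabled):
    """Return the next states of a wrapping counter under a given fault mode."""
    value, status = state
    if status != "running":
        return []  # "detected" and "hazard" are terminal states
    nxt = value + 1
    if nxt > LIMIT and fault != "no_wrap":
        nxt = 0  # correct behavior: wrap around at the limit
    if check_enabled and nxt > LIMIT:
        return [(nxt, "detected")]  # range check acts as error detection
    if nxt >= HAZARD:
        return [(nxt, "hazard")]
    return [(nxt, "running")]


def classify(fault, check_enabled):
    """Explore all reachable states and report the worst system-level effect."""
    start = (0, "running")
    seen = {start}
    queue = deque([start])
    outcomes = set()
    while queue:
        state = queue.popleft()
        for nxt in successors(state, fault, check_enabled):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
                if nxt[1] != "running":
                    outcomes.add(nxt[1])
    if "hazard" in outcomes:
        return "hazard"
    if "detected" in outcomes:
        return "detected"
    return "no effect"


if __name__ == "__main__":
    for fault in (None, "no_wrap"):
        for check in (True, False):
            print(fault, check, classify(fault, check))
```

Each (fault mode, mechanism) pair yields one row of an FMEA-style table, so comparing the rows with the check enabled and disabled gives a rough picture of what the detection mechanism contributes.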

A Self-Checking Hardware Journal for a Fault-Tolerant Processor Architecture

International Journal of Reconfigurable Computing ◽

10.1155/2011/962062 ◽

2011 ◽

Vol 2011 ◽

pp. 1-15 ◽

Cited By ~ 3

Author(s):

Mohsin Amin ◽

Abbas Ramazani ◽

Fabrice Monteiro ◽

Camille Diou ◽

Abbas Dandache

Keyword(s):

Fault Tolerance ◽

Error Detection ◽

Error Control ◽

Fault Tolerant ◽

Error Rates ◽

Main Memory ◽

Transient Faults ◽

Processor Core ◽

Detection Techniques ◽

Performance Area

We introduce a specialized self-checking hardware journal that serves as the centerpiece of our design strategy for building a processor tolerant to transient faults. Fault tolerance here relies on error detection techniques in the processor core, together with journalization and rollback execution to recover from erroneous situations. Effective rollback recovery is made possible by the hardware journal and by choosing a stack computing architecture for the processor core instead of the usual RISC or CISC. The main objective of journalization and of the self-checking hardware journal is to prevent data that has not yet been validated from being sent to the main memory, and to allow fast rollback when a fault occurs. The main memory, assumed to be fault secure in our model, contains only valid (uncorrupted) data obtained from fault-free computations. Error control coding techniques are used both in the processor core, to detect errors, and in the hardware journal, to protect the temporarily stored data from changes induced by transient faults. Implementation results on an FPGA of the Altera Stratix-II family clearly show the relevance of the approach, in terms of both performance/area tradeoff and fault tolerance effectiveness, even at high error rates.
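The journalization and rollback idea described in this abstract can be sketched in software. This is only an illustration under stated assumptions: the class `WriteJournal`, its methods, and the single parity bit used as the error-detecting code are hypothetical and are not taken from the paper's hardware design.

```python
class WriteJournal:
    """Buffers unvalidated writes so that main memory only sees validated data."""

    def __init__(self, memory):
        self.memory = memory  # dict address -> value, assumed fault secure
        self.pending = []     # list of (address, value, parity) not yet validated

    @staticmethod
    def _parity(value):
        # Single even-parity bit: a stand-in for the journal's error control code.
        return bin(value).count("1") & 1

    def write(self, address, value):
        # Writes go to the journal first, never directly to main memory.
        self.pending.append((address, value, self._parity(value)))

    def read(self, address):
        # The most recent pending write wins; otherwise fall back to memory.
        for addr, value, _ in reversed(self.pending):
            if addr == address:
                return value
        return self.memory.get(address, 0)

    def validate(self, error_detected):
        """Commit pending writes, or discard them (rollback) if anything is wrong."""
        corrupted = any(self._parity(v) != p for _, v, p in self.pending)
        if error_detected or corrupted:
            self.pending.clear()  # rollback: memory keeps its last valid state
            return False
        for addr, value, _ in self.pending:
            self.memory[addr] = value
        self.pending.clear()
        return True


if __name__ == "__main__":
    memory = {}
    journal = WriteJournal(memory)
    journal.write(0, 5)
    print(journal.validate(error_detected=False), memory)
    journal.write(0, 9)
    print(journal.validate(error_detected=True), memory)
```

In this sketch, rollback costs nothing more than clearing the journal, because main memory was never touched by the unvalidated writes.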

Real-time error detection techniques based on FPGA

Journal of Computer Applications ◽

10.3724/sp.j.1087.2013.01459 ◽

2013 ◽

Vol 33 (5) ◽

pp. 1459-1462

Author(s):

Xiaoming JU ◽

Jiehao ZHANG ◽

Yizhong ZHANG

Keyword(s):

Real Time ◽

Error Detection ◽

Time Error ◽

Detection Techniques

Error injection aimed at fault removal in fault tolerance mechanisms-criteria for error selection using field data on software faults

Proceedings of ISSRE '96: 7th International Symposium on Software Reliability Engineering ◽

10.1109/issre.1996.558785 ◽

2002 ◽

Cited By ~ 11

Author(s):

J. Christmansson ◽

P. Santhanam

Keyword(s):

Fault Tolerance ◽

Field Data ◽

Tolerance Mechanisms ◽

Software Faults ◽

Fault Removal

Application of data reconciliation and gross error detection techniques to enhance reliability and consistency of the blast furnace process data

Asia-Pacific Journal of Chemical Engineering ◽

10.1002/apj.2628 ◽

2021 ◽

Author(s):

Sujan Hazra ◽

Prakash Abhale ◽

Samik Nag ◽

Sam Mathew ◽

Shankar Narasimhan

Keyword(s):

Blast Furnace ◽

Error Detection ◽

Data Reconciliation ◽

Blast Furnace Process ◽

Process Data ◽

Gross Error Detection ◽

Furnace Process ◽

Detection Techniques ◽

Gross Error

Towards a New Semantic Metric for Error Detection Based on Program State Redundancy

International Journal of Systems and Service-Oriented Engineering ◽

10.4018/ijssoe.2021070101 ◽

2021 ◽

Vol 11 (2) ◽

pp. 1-23

Author(s):

Dalila Amara Amara ◽

Latifa Ben Arfa Rabai

Keyword(s):

Fault Tolerance ◽

Empirical Study ◽

Error Detection ◽

Final States ◽

Internal States ◽

Common Concept ◽

Program Redundancy ◽

Semantic Metrics

Fault tolerance techniques are generally based on a common concept, redundancy, which therefore needs to be measured. A suite of four semantic metrics has been proposed to assess the redundancy of programs and reflect their ability to tolerate faults. The literature shows that one of these metrics, state redundancy, computes program redundancy only in the initial and final states and ignores internal states. The authors therefore focus in this paper on overcoming this shortcoming by proposing a new redundancy-based semantic metric that computes the redundancy of the different program states, including internal ones. Their empirical study shows that the proposed metric is both a measure of program redundancy and an indicator of error detection. Moreover, they demonstrate that it is more accurate than the basic state redundancy metric in detecting masked errors. It helps testers determine whether a tested program is error-free and pinpoint the presence of masked errors even when the final states equal the expected ones.

Towards System-level Fault-tolerance Using Formal Methods And Soc Methodologies

Unique Chips and Systems - Computer Engineering Series ◽

10.1201/9781420051759.ch12 ◽

2007 ◽

pp. 299-324

Author(s):

Kristina Lundqvist

Keyword(s):

Fault Tolerance ◽

Formal Methods ◽

System Level

Lookup Table Algorithm for Error Correction in Color Images

JOIV International Journal on Informatics Visualization ◽

10.30630/joiv.2.2.113 ◽

2018 ◽

Vol 2 (2) ◽

pp. 63

Author(s):

Ruaa Alaadeen Abdulsattar ◽

Nada Hussein M. Ali

Keyword(s):

Error Correction ◽

Error Detection ◽

Color Image ◽

Color Images ◽

Lookup Table ◽

Hamming Code ◽

Detection Techniques ◽

Error Ratio ◽

Error Detection And Correction ◽

Large Burst

Error correction and error detection techniques are often used in wireless transmission systems. BMP color images are used as an application for the developed lookup table algorithms, which detect and correct errors in these images. Decimal Matrix Code (DMC) and Hamming code (HC) techniques were integrated to form a Hybrid Matrix Code (HMC) to maximize error detection and correction. The results obtained from HMC still contain some uncorrected errors because the redundant bits added by Hamming codes are inadequate; HMC is suitable for detection and correction only when the error rate is low. Besides, a Hamming code cannot detect large burst errors, and values can sometimes coincide, which leaves errors undetected and consequently increases the error ratio. The proposed algorithm, LUT-CORR, detects and corrects errors in color images sent over noisy channels. It depends on the parallel Cyclic Redundancy Code (CRC) method, which is based on two algorithms: the Sarwate and Slicing-by-N algorithms. LUT-CORR and the aforementioned algorithms were merged to correct errors in color images, and the output corrects the corrupted images with a ratio of almost 100%. This high correction ratio is due to some unique values that the LUT-CORR algorithm has. HMC and the proposed algorithm were applied to different BMP images, and the results obtained from LUT-CORR are compared with those of HMC in terms of both Mean Square Error (MSE) and correction ratio. The proposed algorithm performs well and achieves a high correction ratio in retrieving the source BMP image.
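The Sarwate algorithm mentioned in this abstract is the classic byte-at-a-time, table-driven CRC computation. The sketch below shows only that building block, using the standard reflected CRC-32 polynomial as an assumption; it is not the LUT-CORR algorithm, and the function names and the sample pixel bytes are hypothetical.

```python
import zlib

CRC32_POLY = 0xEDB88320  # reflected CRC-32 polynomial (assumed for this sketch)


def build_table():
    """Precompute the CRC of every possible byte value (256 entries)."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ CRC32_POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


CRC_TABLE = build_table()


def crc32_sarwate(data: bytes) -> int:
    """Table-driven CRC-32: one table lookup per input byte."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = (crc >> 8) ^ CRC_TABLE[(crc ^ b) & 0xFF]
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    pixels = bytes([255, 0, 0, 0, 255, 0, 0, 0, 255])  # three made-up RGB pixels
    sent_crc = crc32_sarwate(pixels)
    corrupted = bytes([pixels[0] ^ 0x10]) + pixels[1:]  # flip one bit in transit
    print(hex(sent_crc), crc32_sarwate(corrupted) != sent_crc)
    print(crc32_sarwate(pixels) == zlib.crc32(pixels))
```

Slicing-by-N follows the same principle but uses N tables to consume N bytes per step, trading memory for throughput. A CRC on its own only detects errors; correcting them requires additional machinery, which this sketch does not attempt.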

A new approach to system-level fault-tolerance in message-passing multicomputers

Computing in the 90's - Lecture Notes in Computer Science ◽

10.1007/bfb0038515 ◽

1991 ◽

pp. 357-363 ◽

Cited By ~ 2

Author(s):

Guy W. Zimmerman

Keyword(s):

Fault Tolerance ◽

Message Passing ◽

System Level ◽

New Approach

PSK Transciever designing using SystemVue

SRI JNPG COLLEGE REVELATION A JOURNAL OF POPULAR SCIENCE ◽

10.29320/sjnpgrj.3.1.11 ◽

2019 ◽

Vol 3 (1) ◽

Author(s):

Deepali Chaurasia

Keyword(s):

Signal Processing ◽

System Integration ◽

Communication Systems ◽

Energy Savings ◽

Electronic Devices ◽

Physical Space ◽

Electronic System ◽

System Level ◽

Complex Nature ◽

Wide Range

Industrial electronics is trending towards more compact components and system integration, with innovative products offering greater flexibility, quality, safety, reliability, energy savings, and a wide range of connectivity with long operating lifetimes. Electronics is now widely used in information processing, telecommunication and signal processing. Due to the complex nature of electronics theory, laboratory experimentation is an important part of the development of electronic devices. These experiments are used to test or verify the proposed design and to detect errors. Historically, electronics labs have consisted of electronic devices and equipment located in a physical space. In more recent years, however, the trend has been towards electronics lab simulation software, and SystemVue is one such tool. SystemVue is a focussed electronic design automation (EDA) environment for electronic system-level (ESL) design. It enables system architects and algorithm developers to innovate the physical layer (PHY) of wireless and aerospace/defence communication systems and provides unique value to RF, DSP and FPGA/ASIC implementers. As a dedicated platform for ESL design and signal processing realization, SystemVue replaces general-purpose digital, analog and math environments. SystemVue “speaks RF”, cuts PHY development and verification time in half and connects to your mainstream EDA flow.

A Recovery-Oriented Approach for Software Fault Diagnosis in Complex Critical Systems

Innovations and Approaches for Resilient and Adaptive Systems ◽

10.4018/978-1-4666-2056-8.ch002 ◽

2012 ◽

pp. 29-56

Author(s):

Gabriella Carrozza ◽

Roberto Natella

Keyword(s):

Error Detection ◽

Traffic Control ◽

Fault Location ◽

Fault Tolerant ◽

Fault Injection ◽

Critical Systems ◽

Software Faults ◽

Real World Application ◽

Complex Fault ◽

Detection Quality

This paper proposes an approach to software fault diagnosis in complex fault tolerant systems, encompassing the phases of error detection, fault location, and system recovery. Errors are detected in the first phase by exploiting operating system support. Faults are identified during the location phase through a machine learning-based approach. Once the fault is located, the best recovery action is triggered. Feedback actions are also used during the location phase to improve detection quality over time. A real-world application from the Air Traffic Control field has been used as a case study for evaluating the proposed approach. Experimental results, obtained by means of fault injection, show that the diagnosis engine is able to diagnose faults with high accuracy and at a low overhead.
