Revisiting Symptom-Based Fault Tolerant Techniques against Soft Errors

Electronics ◽  
2021 ◽  
Vol 10 (23) ◽  
pp. 3028
Author(s):  
Hwisoo So ◽  
Moslem Didehban ◽  
Yohan Ko ◽  
Reiley Jeyapaul ◽  
Jongho Kim ◽  
...  

Aggressive technology scaling and near-threshold computing have made soft error reliability one of the leading design considerations in modern embedded microprocessors. Although traditional hardware/software redundancy-based schemes can provide a high level of protection, they incur significant overheads in terms of performance and hardware resources. The considerable overheads of such full redundancy-based techniques have motivated researchers to propose low-cost soft error protection schemes, such as symptom-based error protection schemes. The main idea behind a symptom-based error protection scheme is that soft errors in the system will quickly generate some symptoms, such as exceptions, branch mispredictions, cache or TLB misses, or unpredictable variable values. Therefore, monitoring such infrequent symptoms makes it possible to cover the manifestation of failures caused by soft errors. Symptom-based protection schemes have been suggested as shortcuts to achieve acceptable reliability with comparable overheads. Since symptom-based protection schemes seem attractive due to their generality and simplicity, even state-of-the-art protection schemes exploit them as baseline protections. However, our detailed analysis of the fault coverage and performance overheads of such schemes reveals that the user-visible failure coverage, particularly of ReStore, is limited (29% on average). By contrast, the runtime overheads are significant (40% on average) because the majority of the fault injection experiments that were counted as failures detected/recovered by low-level symptoms are actually benign faults, masked by program-level effects.

2017 ◽  
Vol 26 (08) ◽  
pp. 1740009
Author(s):  
Aitzan Sari ◽  
Mihalis Psarakis

Due to the high vulnerability of SRAM-based FPGAs to single-event upsets (SEUs), effective fault-tolerant soft processor architectures must be considered when we use FPGAs to build embedded systems for critical applications. In the past, the detection of symptoms of soft errors in the behavior of microprocessors has been used to implement low-budget error detection techniques instead of costly hardware redundancy techniques. To enable the development of such low-cost error detection techniques for FPGA soft processors, we propose an in-depth analysis of the symptoms of SEUs in the FPGA configuration memory. To this end, we present a flexible fault injection platform based on an open-source CAD framework (RapidSmith) for the soft error sensitivity analysis of soft processors in Xilinx SRAM-based FPGAs. Our platform supports the estimation of soft error sensitivity per configuration bit/frame, processor component and benchmark. The fault injection is performed on-chip by a dedicated microcontroller, which also monitors processor behavior to identify specific symptoms as consequences of soft errors. The analysis showed that these symptoms can be used to build an efficient, low-cost error detection scheme. The proposed platform is demonstrated through an extensive fault injection campaign on the Leon3 soft processor.


Electronics ◽  
2021 ◽  
Vol 10 (17) ◽  
pp. 2101
Author(s):  
Yohan Ko

The exponentially increasing occurrence of soft errors makes the optimization of reliability, performance, hardware area, and power consumption one of the main concerns in modern embedded processors. Since the design cost of hardware techniques aimed at improving the reliability of microprocessors is too high for resource-constrained embedded systems, software-level fault tolerance mechanisms have been proposed as an attractive solution for soft error threats. However, many software-level redundancy-based schemes are accompanied by considerable performance overhead, which is not acceptable for many embedded applications. In this work, we introduce an ultra-low-cost soft error protection scheme for embedded applications, which analyzes the source code to identify critical variables. After identification, these critical variables are protected by placing runtime checks at critical points of execution. Our experimental results based on several applications demonstrate that the proposed scheme can reduce the failure rate by 47% with negligible performance degradation.
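The idea of guarding a critical variable with runtime checks can be illustrated with a minimal sketch. The abstract does not say which check mechanism the scheme uses, so this example assumes simple duplicate-and-compare; `ProtectedVar`, `SoftErrorDetected`, and `sum_below` are hypothetical names introduced only for illustration.

```python
class SoftErrorDetected(Exception):
    """Raised when the two copies of a protected variable disagree."""


class ProtectedVar:
    """Hypothetical wrapper that keeps a shadow copy of a critical variable."""

    def __init__(self, value):
        self.value = value
        self._shadow = value

    def set(self, value):
        # Legitimate writes update both copies.
        self.value = value
        self._shadow = value

    def check(self):
        # Runtime check: compare the copies before the value is used.
        if self.value != self._shadow:
            raise SoftErrorDetected(f"{self.value} != {self._shadow}")
        return self.value


def sum_below(n, flip_at=None):
    """Sum 0..n-1, treating the loop counter as the critical variable.

    flip_at simulates a soft error: when the counter reaches that value,
    one bit of the primary copy is flipped (assumption for the demo).
    """
    i = ProtectedVar(0)
    total = 0
    while i.check() < n:
        if flip_at is not None and i.value == flip_at:
            i.value ^= 1 << 4  # simulated bit flip, shadow copy untouched
        total += i.check()     # check placed at the point of use
        i.set(i.value + 1)
    return total
```

Calling `sum_below(5)` returns the fault-free sum, while `sum_below(5, flip_at=2)` raises `SoftErrorDetected` at the first use after the flip. Without the check, the corrupted counter would silently end the loop early with a wrong total.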


Author(s):  
Jaan Raik ◽  
Urmas Repinski ◽  
Maksim Jenihhin ◽  
Anton Chepurov

This chapter addresses the above-mentioned challenges by presenting a holistic diagnosis approach for design error location and malicious fault list generation for soft errors. First, a method for locating design errors at the source level of hardware description language code, using the design representation of high-level decision diagrams, is explained. Subsequently, this method is reduced to malicious fault list generation at the high level. A minimized fault list is generated to reduce the time spent on the fault injection run needed to assess a design's vulnerability to soft errors.


Author(s):  
Qiang Guan ◽  
Nathan DeBardeleben ◽  
Sean Blanchard ◽  
Song Fu ◽  
Claude H. Davis IV ◽  
...  

As the high performance computing (HPC) community continues to push towards exascale computing, HPC applications of today are only affected by soft errors to a small degree, but we expect that this will become a more serious issue as HPC systems grow. We propose F-SEFI, a Fine-grained Soft Error Fault Injector, as a tool for profiling software robustness against soft errors. We utilize soft error injection to mimic the impact of errors on logic circuit behavior. Leveraging the open source virtual machine hypervisor QEMU, F-SEFI enables users to modify emulated machine instructions to introduce soft errors. F-SEFI can control which application and which sub-function to target, as well as when and how to inject soft errors, at different granularities and without interfering with other applications that share the same environment. We demonstrate use cases of F-SEFI on several benchmark applications with different characteristics to show how data corruption can propagate to incorrect results. The findings from the fault injection campaign can be used for designing robust software and power-efficient hardware.
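As a rough illustration of how an injected soft error can propagate to an incorrect result, the sketch below flips one bit in the binary64 encoding of an operand inside a dot product. It is not F-SEFI and does not use QEMU; `flip_bit`, `dot`, and the `fault` parameter are hypothetical names, and injecting at the Python level is an assumption made to keep the example self-contained.

```python
import struct


def flip_bit(x: float, bit: int) -> float:
    """Return x with one bit of its IEEE 754 binary64 encoding flipped.

    bit 0 is the least significant mantissa bit, bit 63 is the sign bit.
    """
    (raw,) = struct.unpack("<Q", struct.pack("<d", x))
    (out,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return out


def dot(a, b, fault=None):
    """Dot product of a and b.

    fault=(index, bit) corrupts a[index] just before it is used,
    mimicking a single-bit upset in one operand.
    """
    total = 0.0
    for k, (x, y) in enumerate(zip(a, b)):
        if fault is not None and fault[0] == k:
            x = flip_bit(x, fault[1])
        total += x * y
    return total


golden = dot([1.0, 2.0], [3.0, 4.0])                    # fault-free run
faulty = dot([1.0, 2.0], [3.0, 4.0], fault=(0, 63))     # sign bit of a[0] flipped
print(golden, faulty)
```

Comparing the faulty result against the fault-free ("golden") run is the usual way to classify an injection as corrupting or benign. Which bit is flipped matters a great deal: a flip in the sign or exponent bits changes the result far more than one in the low mantissa bits.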


2004 ◽  
Vol 14 (02) ◽  
pp. 299-309 ◽  
Author(s):  
R. C. BAUMANN

The once-ephemeral soft error has recently caused considerable concern for manufacturers of advanced silicon technology, as this phenomenon now has the potential to induce a failure rate higher than that of all other reliability mechanisms combined. We briefly review the three radiation mechanisms responsible for causing soft errors in commercial electronics and the basic physical mechanism by which ionizing radiation can produce a soft error. We then focus on the soft error sensitivity trends in commercial DRAM, SRAM, and peripheral logic devices as a function of technology scaling and discuss some of the solutions used for mitigating the impact of soft errors in high reliability systems.


2018 ◽  
Vol 27 (09) ◽  
pp. 1850144
Author(s):  
Bahman Arasteh

The shrinking scale of transistors and the exponential increase in transistor counts have made soft errors one of the major causes of software failures. Fault injection is a powerful method for assessing the dependability of a computer system against soft errors. A considerable number of the faults injected at random by current methods and tools are effect-less or equivalent. To overcome this problem and reduce the cost of fault injection, this study presents a software-based fault injection method that accurately evaluates the dependability of a computer system with a limited number of fault injections. Using a genetic algorithm (GA), the most vulnerable executable paths of an input program are identified; then only the basic blocks (BBs) in the identified vulnerable paths are considered as targets of fault injection. The results of fault injections on a set of 8 traditional benchmark programs show that the proposed method eliminates about 20% of effect-less faults by avoiding the injection of faults into the error-derating blocks of a program. Furthermore, the number of injected faults is reduced to 60% of the number used in random injection. The proposed method also provides more stable and accurate results than random injection.


The complexity of signal processing and communication systems increases year by year. This drives the demand for scaling and integration using advanced CMOS technologies. Soft errors are a reliability threat in the modern digital world, which explains the need for protection against errors in digital circuit applications. In some applications, techniques such as algorithm-based fault tolerance (ABFT) are used to detect and correct errors by exploiting properties of the algorithm. As filters are basic building blocks in most systems, FFTs are used with a protection scheme based on Parseval checks, which detects and corrects errors. The proposed technique consumes low power. The technique uses Parseval checks to protect the circuits from single-bit errors, is further improved to detect and correct multi-bit errors, and is evaluated in terms of area and delay.
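A Parseval check relies on the fact that the energy of a signal equals the energy of its DFT divided by the transform length. The sketch below shows only the detection half of that idea in software; the abstract describes a hardware scheme that also corrects errors, which is not reproduced here. The names `dft` and `parseval_ok` and the tolerance `rel_tol` are assumptions made for illustration.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform, standing in for an FFT."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def parseval_ok(x, X, rel_tol=1e-9):
    """Check Parseval's identity: sum |x[t]|^2 == (1/n) * sum |X[k]|^2.

    Returns False when the two energies differ by more than the
    (assumed) relative tolerance, which signals a corrupted output.
    """
    n = len(x)
    e_time = sum(abs(v) ** 2 for v in x)
    e_freq = sum(abs(v) ** 2 for v in X) / n
    return abs(e_time - e_freq) <= rel_tol * max(e_time, 1.0)


x = [1.0, 2.0, 3.0, 4.0]
X = dft(x)
clean = parseval_ok(x, X)        # fault-free transform passes the check

X_bad = list(X)
X_bad[1] += 0.5                  # simulated error in one output bin
corrupted = parseval_ok(x, X_bad)
```

The check costs one sum of squares on each side of the transform, which is why it is attractive as a low-overhead ABFT technique. An error that happens to leave the total energy unchanged would pass this check unnoticed.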

