Two control-flow error recovery methods for multithreaded programs running on multi-core processors

This paper presents two control-flow error (CFE) recovery techniques: CFE Recovery using Data-flow graph Consideration and CFE Recovery using Macro block-level Check pointing. Both techniques take thread interactions in the program into account and aim to reduce the high memory and performance overheads of conventional control-flow checking techniques. Each technique consists of two phases, control-flow error detection and recovery. These phases are implemented by inserting additional instructions into the program at compile time, guided by a dependency graph extracted from the control-flow and data-flow dependencies among basic blocks and from thread interactions in the program. To evaluate the proposed techniques, five multithreaded benchmarks are run on a multi-core processor, and a total of 10000 transient faults are injected into several executable points of each program. The fault injection experiments show that the proposed techniques recover from the detected errors in at least 91% of cases.
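The two phases described above, detection and recovery, can be illustrated with a minimal single-threaded sketch. It is not the paper's algorithm: the block names, the `CFG` table, the per-block checkpoint, and the `fault_at_step` injection hook are all hypothetical choices made for illustration. Detection checks each branch target against the legal successors of the current basic block, and recovery restores the state saved at block entry and re-executes the block.

```python
import copy

# Each basic block mutates `state` and returns the name of its successor.
def entry(state):
    state["i"], state["total"] = 0, 0
    return "loop"

def loop(state):
    state["total"] += state["i"]
    state["i"] += 1
    return "loop" if state["i"] < 5 else "exit"

def exit_block(state):
    return None  # None marks the end of the program

BLOCKS = {"entry": entry, "loop": loop, "exit": exit_block}

# Legal successors of each block, assumed to be known at compile time.
CFG = {"entry": {"loop"}, "loop": {"loop", "exit"}, "exit": {None}}

def run(fault_at_step=None, bad_target="entry"):
    """Run the program; optionally corrupt one branch target (transient fault).

    Returns (final total, number of recoveries performed).
    """
    state, current, step, recoveries = {}, "entry", 0, 0
    while current is not None:
        saved = copy.deepcopy(state)      # checkpoint at block entry
        nxt = BLOCKS[current](state)
        if step == fault_at_step:
            nxt = bad_target              # injected control-flow error
        step += 1
        if nxt not in CFG[current]:       # detection: illegal edge
            state = saved                 # recovery: roll back the block
            recoveries += 1
            continue                      # re-execute the same block
        current = nxt
    return state["total"], recoveries
```

In this sketch `run()` and `run(fault_at_step=3)` both produce a total of 10, with the second call reporting one recovery. A corrupted target that happens to be a legal successor (for example `"exit"` from `"loop"`) goes undetected by this simple edge check, which is one reason practical schemes add signatures or data-flow information.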


AN EFFECTIVE CONTROL FLOW CHECKING METHOD FOR MULTITASK PROCESSING IN HARSH ENVIRONMENTS

Journal of Circuits, Systems and Computers ◽

10.1142/s0218126613500679 ◽

2013 ◽

Vol 22 (08) ◽

pp. 1350067 ◽

Cited By ~ 1

Author(s):

SEYYED AMIR ASGHARI ◽

ATENA ABDI ◽

OKYAY KAYNAK ◽

HASSAN TAHERI ◽

HOSSEIN PEDRAM

Keyword(s):

Error Detection ◽

Optimal Solution ◽

Control Flow ◽

Simultaneous Optimization ◽

Harsh Environments ◽

Effective Control ◽

Power Performance ◽

Single Event Upsets ◽

Novel Method ◽

And Performance

Electronic equipment used in harsh environments such as space has to cope with many threats. One major threat is intense radiation, which gives rise to Single Event Upsets (SEUs) that lead to control flow errors and data errors. In the design of embedded systems to be used in space, the use of radiation-tolerant equipment may therefore be a necessity. However, even if the higher cost of such a choice is not a problem, the efficiency of such equipment is lower than that of commercial off-the-shelf (COTS) equipment. Therefore, the use of COTS equipment with appropriate measures to handle the threats may be the optimal solution, in which power, performance, reliability and cost are optimized simultaneously. In this paper, a novel method is presented for control flow error detection in multitask environments, with lower memory and performance overheads than other methods in the literature.


Proposal of an Adaptive Fault Tolerance Mechanism to Tolerate Intermittent Faults in RAM

Electronics ◽

10.3390/electronics9122074 ◽

2020 ◽

Vol 9 (12) ◽

pp. 2074

Author(s):

J.-Carlos Baraza-Calvo ◽

Joaquín Gracia-Morán ◽

Luis-J. Saiz-Adalid ◽

Daniel Gil-Tomás ◽

Pedro-J. Gil-Vicente

Keyword(s):

Fault Tolerance ◽

Error Correction ◽

Error Detection ◽

Fault Injection ◽

Error Correction Codes ◽

Transient Faults ◽

Tolerance Mechanism ◽

Intermittent Faults ◽

Risc Processor ◽

Simulation Based

Due to transistor shrinking, intermittent faults are a major concern in current digital systems. This work presents an adaptive fault tolerance mechanism based on error correction codes (ECC), able to modify its behavior when the error conditions change without increasing the redundancy. As a case example, we have designed a mechanism that can detect intermittent faults and swap from an initial generic ECC to a specific ECC capable of tolerating one intermittent fault. We have inserted the mechanism in the memory system of a 32-bit RISC processor and validated it by using VHDL simulation-based fault injection. We have used two (39, 32) codes: a single error correction–double error detection (SEC–DED) and a code developed by our research group, called EPB3932, capable of correcting single errors and double and triple adjacent errors that include a bit previously tagged as error-prone. The results of injecting transient, intermittent, and combinations of intermittent and transient faults show that the proposed mechanism works properly. As an example, the percentage of failures and latent errors is 0% when injecting a triple adjacent fault after an intermittent stuck-at fault. We have synthesized the adaptive fault tolerance mechanism proposed in two types of FPGAs: non-reconfigurable and partially reconfigurable. In both cases, the overhead introduced is affordable in terms of hardware, time and power consumption.
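To make the (39, 32) SEC–DED code concrete, the following is a minimal sketch of a textbook extended Hamming construction: 32 data bits, 6 Hamming parity bits, and 1 overall parity bit. The bit layout and the function names `encode` and `decode` are assumptions for illustration; the paper's own code matrices, and in particular EPB3932, are not reproduced here.

```python
# Positions 1..38 hold the Hamming code; position 0 holds the overall parity.
PARITY_POS = (1, 2, 4, 8, 16, 32)
DATA_POS = [p for p in range(1, 39) if p not in PARITY_POS]  # 32 positions

def encode(data):
    """Encode a 32-bit integer into a 39-bit codeword (list of 0/1 bits)."""
    code = [0] * 39
    for k, pos in enumerate(DATA_POS):
        code[pos] = (data >> k) & 1
    for p in PARITY_POS:
        # Parity bit p covers every position whose index has bit p set.
        code[p] = sum(code[i] for i in range(1, 39) if i & p) % 2
    code[0] = sum(code[1:]) % 2  # overall parity enables double-error detection
    return code

def decode(code):
    """Return (data, status); status is one of
    'ok', 'corrected', 'double-error', 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for i in range(1, 39):
        if code[i]:
            syndrome ^= i            # XOR of the indices of all set bits
    overall = sum(code) % 2
    if overall == 0:
        status = "ok" if syndrome == 0 else "double-error"
    elif syndrome <= 38:
        code[syndrome] ^= 1          # single error: syndrome is its position
        status = "corrected"
    else:
        status = "uncorrectable"     # syndrome points outside the codeword
    data = sum(code[pos] << k for k, pos in enumerate(DATA_POS))
    return data, status
```

Flipping any one bit of `encode(0xDEADBEEF)` yields `(0xDEADBEEF, "corrected")` on decoding, while flipping two bits yields the status `"double-error"`. A code of this kind cannot correct adjacent double or triple errors, which is the gap the abstract says EPB3932 is designed to cover.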


MAINTAINING DATA DEPENDENCIES ACROSS BPEL PROCESS FRAGMENTS

International Journal of Cooperative Information Systems ◽

10.1142/s0218843008001828 ◽

2008 ◽

Vol 17 (03) ◽

pp. 259-282 ◽

Cited By ~ 19

Author(s):

RANIA KHALAF ◽

OLIVER KOPP ◽

FRANK LEYMANN

Keyword(s):

Collective Behavior ◽

Data Flow ◽

Continuous Process ◽

Flow Analysis ◽

Control Flow ◽

Continuous Process Improvement ◽

Data Dependencies ◽

Data Flow Analysis ◽

Data Links ◽

Using Data

Continuous process improvement (CPI) may require a BPEL process to be split amongst different participants. In this paper, we enable splitting standard BPEL — without requiring any new middleware for the case of flat flows. The solution also supports splitting loops and scopes that have compensation and/or fault handlers. When splitting loops and scopes, we extend existing Web services standards and frameworks in a standard compliant manner in order to support the resulting split control (not data) between the fragments. Data dependencies, however, are handled directly using BPEL constructs placed in the fragments even for split loops and scopes. We present a solution that uses a BPEL process, partition information, and results of data-flow analysis to produce a BPEL process for each participant. The collective behavior of these participant processes recreates the control and data flow of the non-split process. Previous work presented process splitting using a variant of BPEL where data flow is modeled explicitly using "data links". We reuse the control flow aspect from that work as well as the control flow aspect from our work on splitting loops and scopes, focusing in this paper on maintaining the data dependencies in standard BPEL.


Improving the Accuracy of Integer Signedness Error Detection Using Data Flow Analysis

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194015400331 ◽

2015 ◽

Vol 25 (09n10) ◽

pp. 1573-1593

Author(s):

Hao Sun ◽

Chao Su ◽

Yue Wang ◽

Qingkai Zeng

Keyword(s):

Real World ◽

Error Detection ◽

Data Flow ◽

Flow Analysis ◽

Data Flow Analysis ◽

Static Data ◽

Error Detector ◽

Important Challenge ◽

Using Data ◽

Security Checks

Integer signedness errors can be exploited by adversaries to cause severe damage to computer systems. Despite significant advances in automating the detection of integer signedness errors, accurately differentiating exploitable and harmful signedness errors from unharmful ones remains an important challenge. In this paper, we present the design and implementation of SignFlow, an instrumentation-based integer signedness error detector that reduces the number of reports of unharmful signedness errors. SignFlow first uses static data flow analysis to identify unharmful integer sign conversions, based on where the source operands originate and whether the conversion results can propagate to security-related program points, and then inserts security checks for the remaining conversions to provide runtime protection. We evaluated SignFlow on 8 real-world harmful integer signedness bugs, the SPECint 2006 benchmarks, and 5 real-world applications. The experimental results show that SignFlow correctly detected all harmful integer signedness bugs (i.e. no false negatives) and achieved a reduction of 41% in false positives over IntFlow, the state of the art.
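As a rough illustration of the kind of runtime check inserted before a security-related program point, the sketch below models a C-style signed-to-unsigned 32-bit conversion in Python. It is not SignFlow's instrumentation: the helper names `to_uint32`, `checked_to_uint32` and `copy_bytes` are hypothetical, and inputs are assumed to lie in the signed 32-bit range.

```python
def to_uint32(value):
    """Reinterpret a signed 32-bit integer as unsigned, as a C cast would."""
    return value & 0xFFFFFFFF

def checked_to_uint32(value):
    """The same conversion, guarded by an inserted runtime check."""
    if value < 0:
        raise ValueError(
            f"signedness error: {value} would become {to_uint32(value)}"
        )
    return to_uint32(value)

def copy_bytes(src, length):
    """Hypothetical security-sensitive sink that copies `length` bytes."""
    n = checked_to_uint32(length)
    return src[:n]
```

Without the check, a length of -1 silently becomes 4294967295, which is the classic route to an oversized copy. With the check, `copy_bytes(b"hello", -1)` raises instead of proceeding. A static analysis in the spirit of the abstract would omit the check wherever the converted value cannot reach such a sink.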


INCORPORATING FAULT-TOLERANT FEATURES IN VLIW PROCESSORS

International Journal of Reliability, Quality and Safety Engineering ◽

10.1142/s0218539305001914 ◽

2005 ◽

Vol 12 (05) ◽

pp. 397-411

Author(s):

YUNG-YUAN CHEN

Keyword(s):

Error Detection ◽

Fault Tolerant ◽

Fault Injection ◽

Fault Simulation ◽

Error Recovery ◽

Instruction Level Parallelism ◽

Vliw Processors ◽

Test Instruction ◽

Vliw Processor ◽

Level Parallelism

In recent years, the very long instruction word (VLIW) processor has attracted much attention because it offers high instruction-level parallelism and reduces hardware design complexity. In this paper, we present two fault-tolerant schemes for VLIW processors. The first, termed the test-instruction scheme, is based on the concept of instruction duplication to detect errors. It consists of error detection, error rollback recovery and reconfiguration. The second, called the self-checking scheme, adopts the concept of self-checking logic to detect errors, and a real-time error recovery procedure is developed to recover from them. We implement the proposed fault-tolerant VLIW processor designs in VHDL and use fault injection and fault simulation to validate our schemes. The main contribution of this research is a complete framework, from error detection to error recovery, for the fault-tolerant design of VLIW processors. A lesson learned from this investigation is that error detection and error recovery need to be considered together; treating them separately may lead to improper conclusions.


Effects of Physical Injection of Transient Faults on Control Flow and Evaluation of Some Software-Implemented Error Detection Techniques

Dependable Computing and Fault-Tolerant Systems - Dependable Computing for Critical Applications 4 ◽

10.1007/978-3-7091-9396-9_36 ◽

1995 ◽

pp. 435-457 ◽

Cited By ~ 1

Author(s):

Ghassem Miremadi ◽

Jan Torin

Keyword(s):

Error Detection ◽

Control Flow ◽

Transient Faults ◽

Detection Techniques


REPAIR: Control Flow Protection based on Register Pairing Updates for SW-Implemented HW Fault Tolerance

ACM Transactions on Embedded Computing Systems ◽

10.1145/3477001 ◽

2021 ◽

Vol 20 (5s) ◽

pp. 1-22

Author(s):

Uzair Sharif ◽

Daniel Mueller-Gritschneder ◽

Ulf Schlichtmann

Keyword(s):

Fault Tolerance ◽

Data Flow ◽

Fault Injection ◽

Low Cost ◽

Computation Time ◽

Error Resilience ◽

Control Flow ◽

Soft Error ◽

High Coverage ◽

Computation Path

Safety-critical embedded systems may either use specialized hardware or rely on Software-Implemented Hardware Fault Tolerance (SIHFT) to meet soft error resilience requirements. SIHFT has the advantage that it can be used with low-cost, off-the-shelf components such as standard Micro-Controller Units. To this end, SIHFT methods apply redundancy in software computation and special checker codes to detect transient errors, so-called soft errors, that corrupt either the data flow or the control flow of the software and may lead to Silent Data Corruption (SDC). So far, this has been done by applying separate SIHFT methods for data flow and control flow protection, which leads to large overheads in computation time. This work, in contrast, presents REPAIR, a method that exploits the checks of the SIHFT data flow protection to also detect control flow errors, thereby yielding higher SDC resilience with less computational overhead. The data flow protection methods duplicate the computation and place subsequent checks strategically throughout the program. These checks ensure that the two redundant computation paths, which work on two different parts of the register file, yield the same result. By updating the pairing between the registers used in the primary computation path and the registers in the duplicated computation path using the REPAIR method, these checks also fail with high coverage when a control flow error that leads to an illegal jump occurs. Extensive RTL fault injection simulations are carried out to accurately quantify soft error resilience, evaluating MiBench programs along with an embedded case study running on an OpenRISC processor. Our method performs slightly better on average in terms of soft error resilience than the best state-of-the-art method while requiring significantly lower overheads. These results show that REPAIR is a valuable addition to the set of known SIHFT methods.
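The underlying data flow protection, a duplicated computation with checks that the two paths agree, can be sketched as follows. This covers only the generic duplication-and-compare idea. REPAIR's register pairing updates, which extend the same checks to control flow errors, are not modeled. The function `protected_dot`, the exception class, and the `flip_at` fault hook are hypothetical names chosen for illustration.

```python
class SoftErrorDetected(Exception):
    """Raised when the primary and shadow computation paths disagree."""

def protected_dot(xs, ys, flip_at=None):
    """Dot product computed twice, with a consistency check per iteration.

    `flip_at` optionally injects a single bit flip into the primary
    accumulator after the given iteration, emulating a soft error.
    """
    primary = 0   # stands in for a register of the primary path
    shadow = 0    # stands in for its paired register in the duplicated path
    for i, (x, y) in enumerate(zip(xs, ys)):
        primary += x * y
        shadow += x * y               # redundant copy of the same operation
        if i == flip_at:
            primary ^= 1 << 3         # injected transient bit flip
        if primary != shadow:         # check: both paths must agree
            raise SoftErrorDetected(f"mismatch after iteration {i}")
    return primary
```

Here `protected_dot([1, 2, 3], [4, 5, 6])` returns 32, while the same call with `flip_at=1` raises `SoftErrorDetected`. In a compiled setting the two accumulators would live in different parts of the register file, so that a single upset cannot corrupt both copies identically.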
