Decentralized Validation for Non-malicious Arbitrary Fault Tolerance in Paxos

2019 ◽  
Author(s):  
Rodrigo Barbieri ◽  
Enrique dos Santos ◽  
Gustavo Maciel Dias Vieira

Fault-tolerant distributed systems offer high reliability because they do not exhibit erroneous behavior even when some of their components fail. Depending on the fault model adopted, however, hardware and software errors that do not cause a process to crash are usually not tolerated. The usual way to tolerate these rather common failures is to adopt a stronger fault model, such as the arbitrary or Byzantine fault model. Algorithms created for this fault model, however, are considerably more complex and require more system resources than those developed for less strict fault models. One approach to reach a middle ground is the non-malicious arbitrary fault model. In this paper we describe how we extended an implementation of active replication in the non-malicious fault model with a basic type of distributed validation, in which a deviation from the expected algorithm behavior causes a process to crash. We experimentally evaluate this implementation using a fault injection framework, showing that it is feasible to extend the concept of non-malicious failures beyond hardware failures.
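The distributed validation described here converts a detected deviation into a crash. Below is a minimal Python sketch of that crash-on-deviation pattern; the class, message fields, and digest scheme are illustrative assumptions, not the authors' implementation:

```python
import hashlib
import sys

class ValidatingReplica:
    """Hypothetical replica that validates incoming protocol messages
    against the behavior a correct process must exhibit and crashes on
    any deviation, turning a non-malicious arbitrary fault into a
    simple crash fault that the underlying Paxos protocol tolerates."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.accepted = {}  # Paxos instance -> (ballot, value)

    def expected_digest(self, instance, ballot, value):
        # Digest of the state any correct process computes for this step.
        data = f"{instance}:{ballot}:{value}".encode()
        return hashlib.sha256(data).hexdigest()

    def on_accept(self, instance, ballot, value, sender_digest):
        if sender_digest != self.expected_digest(instance, ballot, value):
            # Deviation from expected behavior detected: fail-stop
            # instead of propagating the erroneous state.
            print(f"replica {self.replica_id}: validation failed, halting")
            sys.exit(1)
        self.accepted[instance] = (ballot, value)
```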

2012 ◽  
Vol 182-183 ◽  
pp. 1265-1269
Author(s):  
Zu Ming Xu ◽  
Xiong Fu

Wireless sensor networks require energy-efficient and robust routing protocols. Most routing protocols for sensor networks try to extend network lifetime by minimizing energy consumption, but do not take network reliability into account. In this paper, we analyze the relevant fault models and propose an ENergy-aware FAult-tolerant Routing scheme, termed ENFAR. First, a link-based uniform fault model is presented; we then adopt a cross-layer design that measures transmission delay in order to detect failed nodes.
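A rough sketch of how cross-layer delay measurement might drive failure detection and next-hop selection; the threshold, data structures, and API below are assumptions for illustration, not ENFAR's actual design:

```python
import time

DELAY_THRESHOLD = 0.5  # seconds; an assumed timeout, not a value from the paper

class LinkMonitor:
    """MAC-layer send/ack timestamps feed the routing layer, which
    flags next hops whose links exceed the delay threshold."""

    def __init__(self):
        self.sent = {}       # (next_hop, seq) -> send timestamp
        self.failed = set()  # next hops currently considered failed

    def on_send(self, node, seq):
        self.sent[(node, seq)] = time.monotonic()

    def on_ack(self, node, seq):
        start = self.sent.pop((node, seq), None)
        if start is not None and time.monotonic() - start > DELAY_THRESHOLD:
            # Excessive delay is treated as a failure under the
            # link-based fault model.
            self.failed.add(node)

    def next_hop(self, candidates):
        # The full scheme would also rank candidates by residual
        # energy; here we only filter out failed nodes.
        alive = [n for n in candidates if n not in self.failed]
        return alive[0] if alive else None
```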


Author(s):  
Peter Marwedel

Unfortunately, we cannot rely on designed and possibly already manufactured systems to operate as expected. These systems may have become defective during use, or their function may have been compromised during fabrication or design. The purpose of testing is to verify whether or not an existing embedded/cyber-physical system can be operated as expected. In this chapter, we present fundamental terms and techniques for testing, with a brief introduction to the aims of test pattern generation and their application. We introduce terms such as fault model, fault coverage, fault simulation, and fault injection. We also present techniques that improve testability, including the generation of pseudo-random patterns and signature analysis. It is beneficial to consider testability issues already during design. In the case of fault-tolerant systems, resilience must be verified.
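Two of the techniques named above lend themselves to a compact illustration: pseudo-random pattern generation with a linear feedback shift register (LFSR) and response compaction into a signature. The tap positions, widths, and seed below are illustrative:

```python
def lfsr_patterns(seed, taps, width, count):
    """Generate pseudo-random test patterns with a linear feedback
    shift register (LFSR), the standard hardware pattern generator."""
    state = seed
    for _ in range(count):
        yield state
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1
        state = ((state << 1) | feedback) & ((1 << width) - 1)

def signature(responses, taps, width):
    """Compress a stream of circuit responses into a signature (serial
    signature analysis); a faulty circuit almost surely produces a
    signature that differs from the fault-free one."""
    sig = 0
    for r in responses:
        feedback = r & 1
        for t in taps:
            feedback ^= (sig >> t) & 1
        sig = ((sig << 1) | feedback) & ((1 << width) - 1)
    return sig

# Example: 8-bit LFSR; taps chosen for a maximal-length sequence.
patterns = list(lfsr_patterns(seed=0xA5, taps=(7, 5, 4, 3), width=8, count=16))
golden = signature(patterns, taps=(7, 5, 4, 3), width=8)
```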


2021 ◽  
Author(s):  
Raha Abedi

One of the main goals of fault injection techniques is to evaluate the fault tolerance of a design. To have greater confidence in the fault tolerance of a system, an accurate fault model is essential. While more accurate than gate-level models, transistor-level fault models cannot be synthesized onto FPGA chips; transistor-level faults must therefore be mapped to the gate level to obtain both accuracy and synthesizability. Re-synthesizing a large system for fault injection is not cost-effective when the number of faults and the system complexity are high, so the system must be divided into partitions, reducing re-synthesis time because faults are injected into only a portion of the system. However, the complexity of module-based partial reconfiguration rises with the total number of partitions. An unbalanced partitioning methodology is introduced to reduce the total number of partitions in a system while keeping the partitions where faults are injected small enough to achieve an acceptable re-synthesis time.
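A toy sketch of the unbalanced idea: fault-injection targets are packed into small partitions (cheap to re-synthesize), while the remaining modules are packed into a few large ones (fewer partitions overall). The first-fit packing and capacities are illustrative assumptions, not the thesis's algorithm:

```python
def pack(mods, cap):
    """First-fit packing of (name, size) modules into partitions
    holding at most `cap` logic units each."""
    parts, cur, used = [], [], 0
    for name, size in mods:
        if used + size > cap and cur:
            parts.append(cur)
            cur, used = [], 0
        cur.append(name)
        used += size
    if cur:
        parts.append(cur)
    return parts

def unbalanced_partition(modules, fault_targets, small_cap, large_cap):
    # Fault-injection targets go into small partitions; everything
    # else is lumped into large ones, reducing the partition count.
    targets = [(n, s) for n, s in modules if n in fault_targets]
    others = [(n, s) for n, s in modules if n not in fault_targets]
    return pack(targets, small_cap) + pack(others, large_cap)

mods = [("alu", 40), ("regfile", 30), ("ctrl", 10), ("cache", 80), ("io", 20)]
print(unbalanced_partition(mods, {"alu", "ctrl"}, small_cap=50, large_cap=120))
# -> [['alu', 'ctrl'], ['regfile', 'cache'], ['io']]: three partitions,
#    with only the small first one needing re-synthesis per fault.
```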


2021 ◽  
Vol 26 (6) ◽  
pp. 1-24
Author(s):  
Xuefei Ning ◽  
Guangjun Ge ◽  
Wenshuo Li ◽  
Zhenhua Zhu ◽  
Yin Zheng ◽  
...  

With the fast evolution of embedded deep-learning computing systems, applications powered by deep learning are moving from the cloud to the edge. When deploying neural networks (NNs) onto devices operating in complex environments, many types of faults are possible: soft errors caused by cosmic radiation and radioactive impurities, voltage instability, aging, temperature variations, malicious attackers, and so on. The safety risk of deploying NNs is therefore drawing much attention. In this article, after analyzing the possible faults in various types of NN accelerators, we formalize and implement various fault models from the algorithmic perspective. We propose Fault-Tolerant Neural Architecture Search (FT-NAS) to automatically discover convolutional neural network (CNN) architectures that are reliable under the various faults found in today's devices. We then incorporate fault-tolerant training (FTT) into the search process to achieve better results, an approach we refer to as FTT-NAS. Experiments on CIFAR-10 show that the discovered architectures significantly outperform manually designed baseline architectures with comparable or fewer floating-point operations (FLOPs) and parameters. Specifically, under the same fault settings, F-FTT-Net, discovered under the feature fault model, achieves an accuracy of 86.2% (vs. 68.1% for MobileNet-V2), and W-FTT-Net, discovered under the weight fault model, achieves an accuracy of 69.6% (vs. 60.8% for ResNet-18). By inspecting the discovered architectures, we find that the operation primitives, the weight quantization range, the capacity of the model, and the connection pattern all influence the fault resilience of NN models.
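A sketch of a weight fault model of the kind formalized here: random bit flips injected into quantized weights, as would be applied in the forward pass during fault-tolerant training. The bit width, quantization scheme, and function name are assumptions, not the paper's implementation:

```python
import numpy as np

def inject_weight_faults(weights, ber, n_bits=8):
    """Flip each bit of symmetrically quantized weights independently
    with probability `ber` (bit error rate), then dequantize.
    Assumes a nonzero weight tensor."""
    scale = np.abs(weights).max() / (2 ** (n_bits - 1) - 1)
    # Quantize and view as n_bits-wide two's-complement integers.
    q = np.round(weights / scale).astype(np.int32) & ((1 << n_bits) - 1)
    flips = np.random.rand(*q.shape, n_bits) < ber
    for b in range(n_bits):
        q ^= flips[..., b].astype(np.int32) << b  # flip selected bits
    # Reinterpret as signed and map back to floating point.
    q = np.where(q >= 2 ** (n_bits - 1), q - 2 ** n_bits, q)
    return q * scale

w = np.random.randn(64, 64).astype(np.float32)
w_faulty = inject_weight_faults(w, ber=1e-3)
```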


2019 ◽  
Vol 63 (5) ◽  
pp. 758-773
Author(s):  
Matthew Leeke

The application of machine learning to software fault injection data has been shown to be an effective approach for the generation of efficient error detection mechanisms (EDMs). However, such approaches to the design of EDMs have invariably adopted a fault model with a single-fault assumption, limiting the relevance of the detectors and their evaluation. Software containing more than a single fault is commonplace, with safety standards recognizing that critical failures are often the result of unlikely or unforeseen combinations of faults. This paper addresses this shortcoming, demonstrating that it is possible to generate efficient EDMs under simultaneous fault models. In particular, it is shown that (i) efficient EDMs can be designed using fault injection data collected under models accounting for the occurrence of simultaneous faults, (ii) exhaustive fault injection under a simultaneous bit flip model can yield improved EDM efficiency, (iii) exhaustive fault injection under a simultaneous bit flip model can be made non-exhaustive and (iv) EDMs can be relocated within a software system using program slicing, reducing the resource costs of experimentation to practicable levels without sacrificing EDM efficiency.
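To make the fault models concrete, here is a small sketch contrasting an exhaustive single bit-flip model with an exhaustive simultaneous k-bit model over one machine word; the combinatorial growth is what motivates the non-exhaustive variant in point (iii). Function names and the 32-bit word width are illustrative:

```python
from itertools import combinations

def single_flips(word, width=32):
    """Exhaustive single bit-flip fault model over one machine word."""
    return [word ^ (1 << i) for i in range(width)]

def simultaneous_flips(word, k, width=32):
    """Exhaustive simultaneous fault model: every way of flipping k
    bits at once. The fault count grows as C(width, k)."""
    faults = []
    for bits in combinations(range(width), k):
        corrupted = word
        for b in bits:
            corrupted ^= 1 << b
        faults.append(corrupted)
    return faults

# 32 single-bit faults vs. C(32, 2) = 496 two-bit faults per word.
assert len(single_flips(0)) == 32
assert len(simultaneous_flips(0, k=2)) == 496
```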


Electronics ◽  
2019 ◽  
Vol 8 (8) ◽  
pp. 851 ◽  
Author(s):  
Gil-Tomàs ◽  
Gracia-Morán ◽  
Saiz-Adalid ◽  
Gil-Vicente

Due to the increasing defect rates in highly scaled complementary metal–oxide–semiconductor (CMOS) devices, and the emergence of alternative nanotechnology devices, reliability challenges are of growing importance. Understanding and controlling the fault mechanisms associated with new materials and structures, for both transistors and interconnects, is a key issue in novel nanodevices. The graphene nanoribbon field-effect transistor (GNR FET) has revealed itself as a promising technology for emerging research logic circuits because of its outstanding speed and power properties. This work presents a study of fault causes, mechanisms, and models at the device level, as well as their impact on logic circuits based on GNR FETs. From a literature review of fault causes and mechanisms, fault propagation was analyzed, and fault models were derived for the device and logic circuit levels. This study may be helpful for the prevention of faults in the design process of graphene nanodevices. In addition, it can help in the design and evaluation of defect- and fault-tolerant nanoarchitectures based on graphene circuits. Results are compared with other emerging devices, such as the carbon nanotube (CNT) FET and the nanowire (NW) FET.
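At the logic-circuit level, fault models like those derived here are commonly expressed as stuck-at abstractions. A tiny, hypothetical gate-level example of how a stuck-at fault propagates to an output and which test vectors detect it (the circuit is illustrative and unrelated to any specific GNR FET circuit in the paper):

```python
def circuit(a, b, c, fault=None):
    """Two-gate circuit y = (a AND b) OR c, with an optional stuck-at
    fault on the internal net n1 (the AND output)."""
    n1 = a & b
    if fault == "n1/0":
        n1 = 0  # stuck-at-0
    elif fault == "n1/1":
        n1 = 1  # stuck-at-1
    return n1 | c

def detecting_vectors(fault):
    """All input vectors that propagate the fault to the output."""
    return [(a, b, c)
            for a in (0, 1) for b in (0, 1) for c in (0, 1)
            if circuit(a, b, c) != circuit(a, b, c, fault)]

print(detecting_vectors("n1/0"))  # [(1, 1, 0)]: only a=b=1, c=0 exposes it
print(detecting_vectors("n1/1"))  # [(0, 0, 0), (0, 1, 0), (1, 0, 0)]
```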


Author(s):  
Gian Franco Sacco ◽  
Robert D. Ferraro ◽  
Paul von Allmen ◽  
Dave A. Rennels
