Dependable Embedded Systems - Embedded Systems
Latest Publications


TOTAL DOCUMENTS: 25 (FIVE YEARS: 25)

H-INDEX: 0 (FIVE YEARS: 0)

Published by Springer International Publishing

ISBN: 9783030520168, 9783030520175

Author(s): Florian Kriebel, Kuan-Hsun Chen, Semeen Rehman, Jörg Henkel, Jian-Jia Chen, et al.

Abstract: For generating and executing dependable software, the effects of hardware-layer faults at the software layer have to be accurately analyzed and modeled. This requires relevant information from the hardware and software layers, an in-depth analysis of how an application’s outputs are affected by errors, and a quantification of error masking and error propagation at the software layer. Based on this analysis, techniques for generating dependable software can be proposed, e.g., dependability-aware compiler-based software transformations or selective instruction protection. Besides functional aspects, timing also plays an important role: especially in real-time systems, tasks often have to finish before a deadline to provide useful information. Both aspects are jointly taken into account by the run-time system software, which decides, with the help of offline- and online-generated data, how to protect multiple concurrently executing applications and when to execute which application task, optimizing for dependability and timing correctness. This is achieved, for example, by selecting appropriate application versions and protection levels for single- and multi-core systems, e.g., using redundant multithreading (RMT) in different modes, under tolerable performance-overhead constraints.
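
As a hedged illustration of the selective instruction protection mentioned above, the C sketch below duplicates a "critical" computation in software and compares the results, retrying on a mismatch. The function names and the retry policy are invented for illustration; the chapter's actual compiler transformations are more sophisticated.

```c
#include <stdio.h>
#include <stdint.h>

/* stand-in for a fault-susceptible critical instruction sequence */
static int32_t critical_op(int32_t a, int32_t b) {
    return a * b + 42;
}

/* duplicate-and-compare wrapper of the kind a dependability-aware
   compiler pass might emit around selected instructions */
static int32_t protected_op(int32_t a, int32_t b) {
    for (;;) {
        int32_t r1 = critical_op(a, b);
        int32_t r2 = critical_op(a, b);
        if (r1 == r2)
            return r1;   /* both copies agree: accept the result */
        /* disagreement: a transient fault hit one copy; retry */
    }
}

int main(void) {
    printf("%d\n", protected_op(6, 7));
    return 0;
}
```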


Author(s): Eberle A. Rambo, Rolf Ernst

Abstract: The ASTEROID project developed a cross-layer fault-tolerance solution to provide reliable software execution on unreliable hardware under soft errors. The approach is based on replicated software execution with hardware support for error detection, exploiting future many-core platforms to increase reliability without resorting to redundancy in hardware. This chapter gives an overview of ASTEROID and then focuses on the performance of replicated execution and on the proposed replica-aware co-scheduling for mixed-criticality systems. The performance of systems with replicated execution strongly depends on the scheduling: standard schedulers such as Partitioned Strict Priority Preemptive (SPP) and Time-Division Multiplexing (TDM)-based ones, although widely employed, perform poorly in the face of replicated execution. By exploiting co-scheduling, the replica-aware approach achieves superior performance.
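
The error-detection principle behind replicated execution can be conveyed with a small sketch. ASTEROID compares replica outputs with hardware support; the user-level comparison below, with an invented toy task, only illustrates the concept.

```c
#include <stdio.h>
#include <string.h>
#include <stdint.h>

#define OUT_LEN 16

/* toy task: byte histogram of its input */
static void task(const uint8_t *in, size_t n, uint8_t out[OUT_LEN]) {
    memset(out, 0, OUT_LEN);
    for (size_t i = 0; i < n; i++)
        out[in[i] % OUT_LEN]++;
}

int main(void) {
    const uint8_t input[] = {1, 2, 3, 4, 5};
    uint8_t out_a[OUT_LEN], out_b[OUT_LEN];

    task(input, sizeof input, out_a);   /* replica A */
    task(input, sizeof input, out_b);   /* replica B */

    if (memcmp(out_a, out_b, OUT_LEN) != 0) {
        fprintf(stderr, "replica mismatch: soft error detected\n");
        return 1;                       /* would trigger re-execution */
    }
    puts("replicas agree");
    return 0;
}
```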


Author(s): Horst Schirmeier, Christoph Borchert, Martin Hoffmann, Christian Dietrich, Arthur Martens, et al.

Abstract: As all conceptual layers in the software stack depend on the operating system (OS) to reliably provide resource-management services and isolation, it can be considered the “reliable computing base” that must be hardened for correct operation under fault models such as transient hardware faults in the memory hierarchy. In this chapter, we approach the problem of system-software hardening in three complementary scenarios. (1) We address the following research question: where do the general reliability limits of static system-software stacks lie if they are designed from scratch with reliability as a first-class design goal? In order to reduce the proverbial “attack surface” as far as possible, we harness static application knowledge from an AUTOSAR-compliant task set and protect the whole OS kernel with AN-encoding. This static approach yields an extremely reliable software system, but is constrained to specific application domains. (2) We investigate how reliable a dynamic COTS embedded OS can become if hardened with programming-language and compiler-based fault-tolerance techniques. We show that aspect-oriented programming is an appropriate means to encapsulate generic software-implemented hardware fault-tolerance mechanisms that can be applied application-specifically to selected OS components. (3) We examine how system-software stacks can survive even more adverse fault models, such as whole-system outages, using emerging persistent memory (PM) technology as a vehicle for state conservation. Our findings include that software transactional memory facilitates maintaining consistent state within PM and enables fast recovery.
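
The AN-encoding used in scenario (1) can be sketched compactly: every value x is stored as A·x, so any word whose value is not divisible by A reveals a corrupted datum. The constant A and the injected fault below are illustrative.

```c
#include <stdio.h>
#include <stdint.h>

#define A 58659u  /* illustrative; real systems pick A for error coverage */

static uint64_t an_encode(uint32_t x)    { return (uint64_t)A * x; }
static int      an_check(uint64_t code)  { return code % A == 0; }
static uint32_t an_decode(uint64_t code) { return (uint32_t)(code / A); }

int main(void) {
    uint64_t c = an_encode(1234);

    c ^= 1ull << 17;   /* simulate a transient bit flip in memory */

    if (!an_check(c))
        fprintf(stderr, "AN-code violation: corrupted word detected\n");
    else
        printf("value: %u\n", an_decode(c));
    return 0;
}
```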


Author(s): Florian Kriebel, Faiq Khalid, Bharath Srinivas Prabakaran, Semeen Rehman, Muhammad Shafique

Abstract: Fault tolerance using (full-scale) redundancy-based techniques can detect and correct reliability errors (i.e., soft errors), but it incurs significant area and power overheads. On the other hand, owing to masking properties at different system layers and the error-tolerance properties of different applications, reliable heterogeneous architectures have emerged as an attractive design choice for power-efficient dependable computing platforms. This chapter discusses the building blocks of such computing systems, based on both embedded and superscalar processors, with different reliability (fault-tolerance) modes ranging from the architecture layer to memories such as caches, for heterogeneous in-order and out-of-order processors. We provide a comprehensive soft-error vulnerability analysis of different components in in-order and out-of-order processors, e.g., caches, and discuss methodologies that leverage these vulnerability analyses to improve the performance and power of such systems. Moreover, we show how such heterogeneous hardware-level hardening modes can further be complemented by software-level techniques realized using a reliability-driven compiler (as introduced in Chapter “Dependable Software Generation and Execution on Embedded Systems”).
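
One way to picture how such per-task reliability modes might be assigned under an overhead budget is the greedy sketch below. The modes, numbers, and selection policy are all invented for illustration and are not the chapter's actual heuristic.

```c
#include <stdio.h>

typedef struct { const char *name; double vuln, runtime; int mode; } Task;

/* protection modes 0..2: relative runtime overhead and the factor by
   which each mode reduces a task's vulnerability (illustrative numbers) */
static const double ovh[]   = { 0.0, 1.0, 2.0 };   /* none, DMR, TMR */
static const double resid[] = { 1.0, 0.10, 0.01 };

int main(void) {
    /* tasks listed in decreasing vulnerability (hypothetical values) */
    Task t[] = { {"ctrl", 0.90, 1.0, 0},
                 {"io",   0.20, 2.0, 0},
                 {"log",  0.05, 0.5, 0} };
    double budget = 2.5;  /* tolerable total runtime overhead */

    /* greedy: strongest affordable mode for the most vulnerable task first */
    for (int m = 2; m >= 1; m--)
        for (int i = 0; i < 3; i++) {
            double cost = ovh[m] * t[i].runtime;
            if (t[i].mode == 0 && cost <= budget) {
                t[i].mode = m;
                budget -= cost;
            }
        }

    for (int i = 0; i < 3; i++)
        printf("%-4s mode=%d residual vulnerability=%.3f\n",
               t[i].name, t[i].mode, t[i].vuln * resid[t[i].mode]);
    return 0;
}
```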


Author(s): Amir Mahdi Hosseini Monazzah, Amir M. Rahmani, Antonio Miele, Nikil Dutt

Abstract: Due to the constant quest for larger on-chip memories and caches in multicore and manycore architectures, Spin Transfer Torque Magnetic RAM (STT-MRAM or STT-RAM) has been proposed as a promising technology to replace classical SRAMs in near-future devices. The main advantages of STT-RAMs are a considerably higher transistor density and negligible leakage power compared with SRAM technology. The drawback of this technology, however, is a high probability of errors, especially in write operations. Such errors are asymmetric and transition-dependent: the 0 → 1 transition is the most critical one, and its error probability depends strongly on the amount of current (voltage) supplied to the memory during the write operation. As a consequence, STT-RAMs present an intrinsic trade-off between energy consumption and reliability that needs to be properly tuned w.r.t. the currently running application and its reliability requirement. This chapter proposes FlexRel, an energy-aware reliability-improvement architectural scheme for STT-RAM cache memories. FlexRel considers a memory architecture provided with Error Correction Codes (ECCs) and a custom current regulator for the various cache ways, and trades off reliability against energy consumption. The FlexRel cache controller dynamically profiles the number of 0 → 1 transitions of each bit-write operation in a cache block and, based on that, selects the most suitable cache way and current level to guarantee the necessary error-rate threshold (in terms of occurred write errors) while minimizing energy consumption. We experimentally evaluated the efficiency of FlexRel against the most efficient uniform protection scheme from the reliability, energy, area, and performance perspectives. Experimental simulations performed using gem5 demonstrate that, while satisfying the given error-rate threshold, FlexRel delivers up to 13.2% energy savings and up to 7.9% cache-way area savings. The performance overhead of the FlexRel algorithm, which changes the traffic patterns of the cache ways during execution, is 1.7% on average.
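
The profiling step at the heart of FlexRel, counting the error-prone 0 → 1 transitions a write would cause, can be sketched in a few lines of C; the decision threshold and way names below are illustrative.

```c
#include <stdio.h>
#include <stdint.h>

/* count bits that are 0 in the stored word and 1 in the incoming word
   (__builtin_popcountll is a GCC/Clang builtin) */
static int zero_to_one_bits(uint64_t old_w, uint64_t new_w) {
    return __builtin_popcountll(~old_w & new_w);
}

int main(void) {
    uint64_t stored   = 0x00FF00FF00FF00FFull;
    uint64_t incoming = 0xFFFF0000FFFF0000ull;

    int n = zero_to_one_bits(stored, incoming);

    /* more 0 -> 1 transitions need a higher write current (or a more
       strongly ECC-protected way) to stay within the error budget */
    const char *choice = (n > 16) ? "high-current / strong-ECC way"
                                  : "low-current way";
    printf("%d 0->1 transitions -> %s\n", n, choice);
    return 0;
}
```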


Author(s): Mahfuzul Islam, Hidetoshi Onodera

Abstract: Cross-layer resiliency has become a critical deciding factor for any successful product. This chapter focuses on monitor circuits, which are essential to realizing cross-layer resiliency. The role of monitor circuits is to establish a bridge between the hardware and other layers by providing run-time information about the devices and the operating environment. This chapter explores delay-based monitor circuits that fit design automation with the existing cell-based design methodology. It discusses several design techniques for monitoring threshold voltage, temperature, leakage current, critical delay, and aging, and then demonstrates a reconfigurable architecture that monitors multiple parameters with a small area footprint. Finally, an extraction methodology for physical parameters is discussed for model-hardware correlation. Utilizing the cell-based design flow, delay-based monitors can be placed inside the target digital circuit, so that a better correlation between monitor and target-circuit behavior can be realized.
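
To give a flavor of how a delay-based monitor reading becomes a physical parameter, the sketch below maps a ring-oscillator count to temperature through a two-point linear calibration. All counts and calibration values are invented; real monitors need per-chip calibration and the model-hardware correlation the chapter describes.

```c
#include <stdio.h>

/* two calibration points: (ring-oscillator count, known temperature);
   values are invented, and delay grows (count drops) when the chip is hot */
static const double c0 = 12000.0, t0 = 25.0;
static const double c1 = 10800.0, t1 = 85.0;

/* linear model fitted through the two calibration points */
static double count_to_temp(double count) {
    double slope = (t1 - t0) / (c1 - c0);   /* degC per count */
    return t0 + slope * (count - c0);
}

int main(void) {
    printf("count 11400 -> %.1f degC\n", count_to_temp(11400.0));
    return 0;
}
```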


Author(s): Jian-Jia Chen, Joerg Henkel

Abstract: Research and development in recent decades have led to silicon processes that are expected to become inherently undependable in the near future as they migrate to new technology nodes. The Special Priority Program (SPP) 1500, funded by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG) in 2010–2016, and the Variability Expedition, funded by the National Science Foundation (NSF) in 2010–2015, made a joint effort to explore the design challenges of Power Consumption, Reliability, Interference, and Manufacturability under such a design requirement.


Author(s): Muhammad Abdullah Hanif, Faiq Khalid, Rachmad Vidya Wicaksana Putra, Mohammad Taghi Teimoori, Florian Kriebel, et al.

Abstract: The drive for automation and constant monitoring has led to rapid development in the field of Machine Learning (ML). The high accuracy offered by state-of-the-art ML algorithms like Deep Neural Networks (DNNs) has paved the way for these algorithms to be used even in emerging safety-critical applications, e.g., autonomous driving and smart healthcare. However, these applications require assurances about the functionality of the underlying systems and algorithms. Therefore, the robustness of these ML algorithms to different reliability and security threats has to be thoroughly studied, and mechanisms and methodologies have to be designed that increase their inherent resilience. Since traditional reliability measures like spatial and temporal redundancy are costly, they may not be feasible for DNN-based ML systems, which are already highly compute- and memory-intensive; hence, new robustness methods for ML systems are required. Towards this, this chapter presents our analyses illustrating the impact of different reliability and security vulnerabilities on the accuracy of DNNs. We also discuss techniques that can be employed to design ML algorithms such that they are inherently resilient to reliability and security threats. Towards the end, the chapter outlines open research challenges and further research opportunities.
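
A minimal sketch of the kind of reliability analysis described above: inject a single bit flip into a DNN weight and observe the output deviation. The one-neuron "network" below is a toy, but the mechanism, bit-level corruption of a stored float, is the same one used in fault-injection studies.

```c
#include <stdio.h>
#include <string.h>
#include <stdint.h>

/* toy one-neuron "network": a dot product over four weights */
static float dot(const float *w, const float *x, int n) {
    float s = 0.0f;
    for (int i = 0; i < n; i++) s += w[i] * x[i];
    return s;
}

/* flip one bit of a float's IEEE-754 representation */
static float flip_bit(float v, int bit) {
    uint32_t u;
    memcpy(&u, &v, sizeof u);
    u ^= 1u << bit;
    memcpy(&v, &u, sizeof v);
    return v;
}

int main(void) {
    float w[4] = {0.5f, -1.2f, 0.3f, 0.9f};       /* hypothetical weights */
    const float x[4] = {1.0f, 0.5f, -1.0f, 2.0f}; /* hypothetical input  */

    float golden = dot(w, x, 4);
    w[1] = flip_bit(w[1], 30);   /* corrupt a high exponent bit */
    float faulty = dot(w, x, 4);

    printf("golden=%g faulty=%g deviation=%g\n",
           golden, faulty, faulty - golden);
    return 0;
}
```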


Author(s): Christian Weis, Christina Gimmler-Dumont, Matthias Jung, Norbert Wehn

Abstract: Many applications show an inherent error resilience due to their probabilistic behavior. This inherent error resilience can be exploited to reduce the design margins of advanced technology nodes, resulting in more energy- and area-efficient implementations. In this chapter we present a cross-layer approach for efficient reliability management in wireless baseband processing, with special emphasis on memories, since memories are most susceptible to dependability problems. A multiple-antenna (MIMO) system is used as a design example. We then focus on DRAMs (Dynamic Random Access Memories): all of today’s computing systems rely on dependable DRAMs, but with further scaling DRAM will become less dependable. This has to be counterbalanced with higher refresh rates, which lead to higher DRAM power consumption. Recent research activities have resulted in the concept of “approximate DRAM,” which saves power and improves performance by lowering the refresh rate or disabling refresh completely. Here, we present a holistic simulation environment for investigating approximate DRAM and show its impact on error-resilient applications.
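
A sketch of the kind of error model an approximate-DRAM simulation environment plugs in is shown below: the flip probability of a cell grows with the refresh interval, and errors are injected into a buffer accordingly. The exponential rate model and its constants are invented for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <math.h>

/* per-bit flip probability as a function of the refresh interval (ms);
   exponential growth beyond the nominal 64 ms is an invented model */
static double flip_prob(double refresh_ms) {
    const double nominal_ms = 64.0, base = 1e-9, k = 0.02;
    if (refresh_ms <= nominal_ms) return base;
    return base * exp(k * (refresh_ms - nominal_ms));
}

/* flip each bit of the buffer independently with probability p */
static void inject(uint8_t *buf, size_t n, double p) {
    for (size_t i = 0; i < n * 8; i++)
        if ((double)rand() / RAND_MAX < p)
            buf[i / 8] ^= (uint8_t)(1u << (i % 8));
}

int main(void) {
    uint8_t frame[4096] = {0};
    double p = flip_prob(512.0);   /* refresh slowed down by 8x */
    inject(frame, sizeof frame, p);
    printf("flip probability at 512 ms refresh: %.3g per bit\n", p);
    return 0;
}
```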


Author(s): Daniel Mueller-Gritschneder, Eric Cheng, Uzair Sharif, Veit Kleeberger, Pradip Bose, et al.

Abstract: Driven by technology scaling, integrated systems are becoming more susceptible to various causes of random hardware faults, such as radiation-induced soft errors. Such soft errors may cause a system to malfunction due to corrupted data or control flow, which may lead to unacceptable risks for life or property in safety-critical applications. Hence, safety-critical systems deploy protection techniques such as hardening and redundancy at different layers of the system stack (circuit, logic, architecture, OS/schedule, compiler, software, algorithm) to improve resiliency against soft errors. Cross-layer resilience techniques aim at finding lower-cost solutions by combining accurate estimation of soft-error resilience with a systematic exploration of protection techniques that work collaboratively across the system stack. This chapter demonstrates how to apply the cross-layer resilience principle to custom processors, fixed-hardware processors, accelerators, and SRAM memories (with a focus on soft errors) and presents the key insights obtained.
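
The resilience-estimation loop underlying such cross-layer analysis can be sketched as a fault-injection campaign: flip one state bit per run and classify each outcome as masked or silent data corruption (SDC). The toy workload below is chosen so both outcomes occur; crash and hang detection would require sandboxing and is omitted.

```c
#include <stdio.h>
#include <stdint.h>

#define STATE_BITS 32

/* toy workload that ignores the low half of its state word, so faults
   injected there are architecturally masked */
static uint32_t workload(uint32_t state) {
    return (state >> 16) * 2654435761u;
}

int main(void) {
    const uint32_t input = 0xDEADBEEF;
    const uint32_t golden = workload(input);
    int masked = 0, sdc = 0;

    /* exhaustive single-bit-flip campaign over the input state */
    for (int bit = 0; bit < STATE_BITS; bit++) {
        uint32_t faulty = workload(input ^ (1u << bit));
        if (faulty == golden) masked++;
        else                  sdc++;
    }
    printf("masked=%d SDC=%d (SDC rate %.1f%%)\n",
           masked, sdc, 100.0 * sdc / STATE_BITS);
    return 0;
}
```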

