Power-Aware Fault-Tolerance for Embedded Systems
AbstractPower-constrained fault-tolerance has emerged as a key challenge in the deep sub-micron technology. Multi-/many-core chips can support different hardening modes considering variants of redundant multithreading (RMT). In dark silicon chips, the maximum number of cores that can simultaneously be powered-on (at the full performance level) is constrained by the thermal design power (TDP). The rest of the cores have to be power-gated (i.e., stay “dark”), or the cores have to operate at a lower performance level. It has been predicted that about 25–50% of a many-core chip can potentially be “dark.” In this chapter, a system-level power–reliability management technique is presented. The technique jointly considers multiple hardening modes at the software and hardware levels, each offering distinct power, reliability, and performance properties. Also, a framework for the system-level optimization is introduced which considers different power–reliability–performance management problems for many-core processors depending upon the target system and user constraints.