Resource Management for Improving Overall Reliability of Multi-Processor Systems-on-Chip
Abstract
Multi-processor systems-on-chip (MPSoCs) are widely deployed in real-time embedded systems. In such systems, soft-error reliability (degraded by transient faults) and lifetime reliability (degraded by permanent faults) are both critical design concerns. Most existing work considers only one of the two classes of faults. Unfortunately, techniques that improve one kind of reliability may degrade the other. Achieving high overall reliability therefore requires a trade-off between soft-error reliability and lifetime reliability. In this chapter, we first introduce the concepts and models associated with the two reliability metrics, then present two techniques that optimize them separately. Finally, we show how to make appropriate trade-offs using two case studies involving “big–little” type MPSoCs and CPU–GPU integrated MPSoCs.