Exploiting Memory Resilience for Emerging Technologies: An Energy-Aware Resilience Exemplar for STT-RAM Memories

Author(s):  
Amir Mahdi Hosseini Monazzah ◽  
Amir M. Rahmani ◽  
Antonio Miele ◽  
Nikil Dutt

Due to the constantly pressing demand for larger on-chip memories and caches in multicore and manycore architectures, Spin Transfer Torque Magnetic RAM (STT-MRAM or STT-RAM) has been proposed as a promising technology to replace classical SRAMs in near-future devices. The main advantages of STT-RAM are a considerably higher density and negligible leakage power compared with SRAM technology. However, the drawback of this technology is the high probability of errors, especially in write operations. Such errors are asymmetric and transition-dependent: the 0 → 1 transition is the most critical one, and its error rate strongly depends on the current (voltage) supplied to the memory during the write operation. As a consequence, STT-RAMs present an intrinsic trade-off between energy consumption and reliability that needs to be properly tuned with respect to the currently running application and its reliability requirement. This chapter proposes FlexRel, an energy-aware reliability improvement architectural scheme for STT-RAM cache memories. FlexRel considers a memory architecture equipped with Error Correction Codes (ECCs) and a custom current regulator for the various cache ways, and trades off reliability against energy consumption. The FlexRel cache controller dynamically profiles the number of 0 → 1 transitions of each write operation on a cache block and, based on that, selects the most suitable cache way and current level to guarantee the required error rate threshold (in terms of occurring write errors) while minimizing energy consumption. We experimentally evaluated the efficiency of FlexRel against the most efficient uniform protection scheme from the reliability, energy, area, and performance perspectives. Simulations performed with gem5 demonstrate that, while satisfying the given error rate threshold, FlexRel delivers up to 13.2% energy savings and up to 7.9% savings in cache way area. Furthermore, the performance overhead of the FlexRel algorithm, which changes the traffic patterns of the cache ways during execution, is 1.7% on average.
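The per-write selection logic can be pictured as follows. This is a minimal illustrative sketch, not the chapter's implementation: the current levels, their per-bit 0 → 1 error rates, energy figures, and the 512-bit block size are all assumptions made for the example.

```python
# Illustrative sketch: choose a write-current level for a cache block based on the
# number of 0 -> 1 transitions, assuming higher currents lower the 0 -> 1 error rate.

# Hypothetical current levels with assumed per-bit 0 -> 1 error rates (not from the chapter).
CURRENT_LEVELS = [
    {"name": "low",    "energy_per_bit_pj": 0.6, "p_err_0to1": 1e-4},
    {"name": "medium", "energy_per_bit_pj": 0.9, "p_err_0to1": 1e-6},
    {"name": "high",   "energy_per_bit_pj": 1.3, "p_err_0to1": 1e-8},
]

BLOCK_BITS = 512  # assumed 64-byte cache block

def count_0_to_1_transitions(old_block: int, new_block: int) -> int:
    """Bits that flip from 0 to 1 when writing new_block over old_block."""
    return bin(~old_block & new_block & (2**BLOCK_BITS - 1)).count("1")

def block_error_probability(transitions: int, p_err: float) -> float:
    """Probability that at least one 0 -> 1 write in the block fails."""
    return 1.0 - (1.0 - p_err) ** transitions

def choose_current_level(old_block: int, new_block: int, error_threshold: float) -> dict:
    """Pick the cheapest current level whose expected block error rate
    stays below the application's error-rate threshold."""
    transitions = count_0_to_1_transitions(old_block, new_block)
    for level in CURRENT_LEVELS:  # ordered from lowest to highest energy
        if block_error_probability(transitions, level["p_err_0to1"]) <= error_threshold:
            return level
    return CURRENT_LEVELS[-1]  # fall back to the strongest write current
```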


2015 ◽  
Vol 56 ◽  
pp. 421-426
Author(s):  
Jawwad Latif ◽  
Hassan Nazeer Chaudhry ◽  
Sadia Azam ◽  
Naveed Khan Baloch


Author(s):  
Juan P. Silva ◽  
Ernesto Dufrechou ◽  
Pablo Ezzatti ◽  
Enrique S. Quintana-Ortí ◽  
Alfredo Remón ◽  
...  

The high-performance computing community has traditionally focused solely on reducing execution time, though in recent years the optimization of energy consumption has become a major concern. Reducing energy usage without degrading performance requires the adoption of energy-efficient hardware platforms accompanied by the development of energy-aware algorithms and computational kernels. The solution of linear systems is a key operation for many scientific and engineering problems. Its relevance has motivated a large amount of work, and consequently high-performance solvers are available for a wide variety of hardware platforms. In this work, we aim to develop a high-performance and energy-efficient linear system solver. In particular, we develop two solvers for a low-power CPU-GPU platform, the NVIDIA Jetson TK1. These solvers implement the Gauss-Huard algorithm, yielding efficient usage of the target hardware as well as efficient memory access. The experimental evaluation shows that the new solvers deliver important savings in both time and energy consumption when compared with the state-of-the-art solvers for the platform.
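For reference, the core update pattern of the Gauss-Huard elimination scheme can be sketched as below. This is a plain NumPy illustration under simplifying assumptions (no column pivoting, no CPU-GPU work partitioning); it is not the paper's solver.

```python
import numpy as np

def gauss_huard_solve(A: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Solve A x = b by Gauss-Huard elimination (no pivoting, for illustration only)."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), b.reshape(n, 1).astype(float)])  # augmented [A | b]
    for k in range(n):
        # 1) Eliminate the entries of row k left of the diagonal using rows 0..k-1,
        #    whose leading columns already form the identity.
        M[k, k:] -= M[k, :k] @ M[:k, k:]
        M[k, :k] = 0.0
        # 2) Normalize row k so the diagonal element becomes 1
        #    (a robust implementation would apply column pivoting here).
        M[k, k:] /= M[k, k]
        # 3) Eliminate column k above the diagonal.
        M[:k, k:] -= np.outer(M[:k, k], M[k, k:])
    return M[:, n]  # [A | b] has been reduced to [I | x]

# Example: gauss_huard_solve(np.array([[2., 1.], [1., 3.]]), np.array([5., 10.]))  # -> [1., 3.]
```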



2018 ◽  
Vol 4 (3) ◽  
pp. 327-339 ◽  
Author(s):  
Anderson L. Sartor ◽  
Pedro H. E. Becker ◽  
Joost Hoozemans ◽  
Stephan Wong ◽  
Antonio C. S. Beck


Author(s):  
Aditya Yanamandra ◽  
Soumya Eachempati ◽  
Vijaykrishnan Narayanan ◽  
Mary Jane Irwin

Recently, chip multi-processors (CMPs) have emerged to fully utilize the increased transistor count within stringent power budgets. Transistor scaling has led to more error-prone and defective components, and static and run-time variations in the circuit reduce yield and reliability. Providing reliability at low overhead, specifically in terms of power, is a challenging task that requires innovative solutions for building future integrated chips. Static variations have been studied previously; in this proposal, we study the impact of run-time variations on reliability. The on-chip interconnection network that forms the communication fabric of the CMP plays a crucial role in determining the performance, power consumption, and reliability of the system. We address protecting the data in a network-on-chip (NoC) from transient errors induced by voltage fluctuations. Variations in operating conditions result in significant variation in the reliability of the system, motivating the need for tunable levels of data protection. For example, the Dynamic Voltage and Frequency Scaling (DVFS) techniques used in most CMPs today result in voltage variation across the chip, giving rise to variable error rates across the chip. We investigate the design of a dynamically reconfigurable error protection scheme in a NoC that achieves a desired level of reliability while minimizing the power and performance overhead incurred. With our proposed reconfigurable ECC, we obtain up to 55% savings in the power expended for error protection in the network while maintaining constant reliability. Further, a 35% reduction in the average message latency of the network is observed, making a case for tunable error protection in the on-chip network fabric.
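The tunability idea can be illustrated with a small sketch that picks the weakest ECC scheme still meeting a residual error target at the current operating voltage. The code table, the voltage-to-error-rate model, and the energy figures are hypothetical placeholders for illustration, not values or mechanisms taken from the proposal.

```python
from dataclasses import dataclass
from math import comb

@dataclass
class EccScheme:
    name: str
    correctable_bits: int      # errors the code can correct per flit
    energy_per_flit_pj: float  # assumed encode+decode energy

# Ordered from cheapest/weakest to most expensive/strongest (assumed figures).
ECC_SCHEMES = [
    EccScheme("parity (detect only)", 0, 0.2),
    EccScheme("SEC Hamming",          1, 0.6),
    EccScheme("DEC BCH",              2, 1.4),
]

def bit_error_rate(voltage: float) -> float:
    """Assumed model: lower supply voltage -> exponentially higher bit error rate."""
    return 1e-9 * 10 ** (4 * (1.0 - voltage))  # purely illustrative

def residual_flit_error_rate(ber: float, flit_bits: int, correctable: int) -> float:
    """Probability of more bit errors per flit than the code can correct
    (binomial tail, approximated by its first term for small ber)."""
    k = correctable + 1
    return comb(flit_bits, k) * ber ** k

def pick_ecc(voltage: float, flit_bits: int = 128, target: float = 1e-12) -> EccScheme:
    """Choose the cheapest scheme whose residual error rate meets the target."""
    ber = bit_error_rate(voltage)
    for scheme in ECC_SCHEMES:
        if residual_flit_error_rate(ber, flit_bits, scheme.correctable_bits) <= target:
            return scheme
    return ECC_SCHEMES[-1]
```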



2015 ◽  
Vol 25 (03) ◽  
pp. 1541005
Author(s):  
Alexandra Vintila Filip ◽  
Ana-Maria Oprescu ◽  
Stefania Costache ◽  
Thilo Kielmann

High-Performance Computing (HPC) systems consume large amounts of energy. As energy consumption predictions for HPC continue to grow, it is important to make users aware of the energy spent executing their applications. Drawing from our experience with exposing cost and performance in public clouds, in this paper we present a generic mechanism to compute fast and accurate estimates of the trade-offs between the performance (expressed as makespan) and the energy consumption of applications running on HPC clusters. We validate our approach by implementing it in a prototype, called E-BaTS, and evaluating it with a wide variety of HPC bags-of-tasks. Our experiments show that E-BaTS produces conservative estimates with errors below 5%, while requiring at most 12% of the energy and time of an exhaustive search to provide configurations close to the optimal ones in terms of trade-offs between energy consumption and makespan.
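A rough sketch of how such makespan/energy trade-off estimates might be derived from a small runtime sample is shown below; the frequency-scaling and power models and the configuration fields are assumptions made for illustration and do not describe E-BaTS internals.

```python
# Illustrative sketch (not E-BaTS): estimate makespan and energy for a bag-of-tasks
# under candidate cluster configurations, then keep only Pareto-optimal points.

def estimate_tradeoffs(sampled_runtimes_s, n_tasks, configs):
    """configs: list of dicts with assumed fields 'nodes', 'freq_ghz', 'power_w' (per node)."""
    mean_runtime = sum(sampled_runtimes_s) / len(sampled_runtimes_s)
    estimates = []
    for cfg in configs:
        # Assume CPU-bound tasks whose runtime scales inversely with frequency,
        # sampled at an assumed reference frequency of 2.0 GHz, spread evenly over nodes.
        task_time = mean_runtime * (2.0 / cfg["freq_ghz"])
        makespan = task_time * n_tasks / cfg["nodes"]
        energy_kj = cfg["power_w"] * cfg["nodes"] * makespan / 1000.0
        estimates.append({"config": cfg, "makespan_s": makespan, "energy_kj": energy_kj})
    # Keep only Pareto-optimal points: no other configuration is at least as good
    # in both metrics and strictly better in one.
    pareto = [e for e in estimates
              if not any((o["makespan_s"] <= e["makespan_s"] and o["energy_kj"] < e["energy_kj"])
                         or (o["makespan_s"] < e["makespan_s"] and o["energy_kj"] <= e["energy_kj"])
                         for o in estimates)]
    return sorted(pareto, key=lambda e: e["makespan_s"])
```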



2017 ◽  
Vol 2 (2) ◽  
pp. 113-126 ◽  
Author(s):  
Md Sabbir Hasan ◽  
Frederico Alvares ◽  
Thomas Ledoux ◽  
Jean-Louis Pazat


2015 ◽  
Vol 24 (06) ◽  
pp. 1550079 ◽  
Author(s):  
Tiefei Zhang ◽  
Jixiang Zhu ◽  
Jun Fu ◽  
Tianzhou Chen

Due to its large leakage power and low density, conventional SRAM is becoming less appealing for implementing large on-chip caches because of energy issues. Emerging non-volatile memory technologies, such as phase-change memory (PCM) and spin-transfer torque RAM (STT-RAM), offer low leakage power and high density, which makes them good candidates for on-chip caches. In particular, STT-RAM has longer endurance and shorter access latency than PCM. There are two kinds of STT-RAM so far: single-level cell (SLC) STT-RAM and multi-level cell (MLC) STT-RAM. Compared to SLC STT-RAM, MLC STT-RAM has higher density and lower leakage power, which makes it an even more promising candidate for future on-chip caches. However, MLC STT-RAM improves density at the cost of almost doubled write latency and energy compared to SLC STT-RAM. These drawbacks degrade system performance and diminish the energy benefits. To alleviate these problems, we propose a novel cache organization, the companion write cache (CWC), a small fully associative SRAM cache that works alongside the main MLC STT-RAM cache in a master-and-servant fashion. The key function of CWC is to absorb energy-consuming write updates from the MLC STT-RAM cache. The experimental results are promising: compared to a baseline, CWC greatly reduces write energy and dynamic energy while improving the performance and endurance of the MLC STT-RAM cache.
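The companion-write-cache idea can be sketched as a small fully associative write buffer placed in front of the MLC STT-RAM array; the capacity, LRU policy, and interface below are illustrative assumptions rather than the paper's design.

```python
from collections import OrderedDict

class CompanionWriteCache:
    """Small fully associative SRAM buffer that absorbs writes to hot blocks,
    so only evicted (cold) blocks pay the expensive MLC STT-RAM write."""

    def __init__(self, capacity_blocks: int = 16):
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()   # address -> data, kept in LRU order
        self.sttram_writes = 0        # writes that actually reach the MLC STT-RAM array

    def write(self, address: int, data: bytes) -> None:
        if address in self.blocks:
            self.blocks.move_to_end(address)            # write hit: absorbed in SRAM
        elif len(self.blocks) >= self.capacity:
            victim_addr, victim_data = self.blocks.popitem(last=False)  # evict LRU block
            self.write_back_to_sttram(victim_addr, victim_data)
        self.blocks[address] = data

    def read(self, address: int):
        if address in self.blocks:                      # serve from the SRAM companion
            self.blocks.move_to_end(address)
            return self.blocks[address]
        return None                                     # fall through to the MLC STT-RAM cache

    def write_back_to_sttram(self, address: int, data: bytes) -> None:
        # Placeholder for the costly MLC STT-RAM write; here we only count it.
        self.sttram_writes += 1
```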


