Quantitative analysis and optimization techniques for on-chip cache leakage power

2005 ◽  
Vol 13 (10) ◽  
pp. 1147-1156 ◽  
Author(s):  
Nam Sung Kim ◽  
D. Blaauw ◽  
T. Mudge
2013 ◽  
Vol 22 (05) ◽  
pp. 1350038 ◽  
Author(s):  
TIEFEI ZHANG ◽  
TIANZHOU CHEN ◽  
JIANZHONG WU ◽  
YOUTIAN QU

Owing to its low leakage power and high density, spin torque transfer RAM (STT-RAM) has become a good candidate for future on-chip caches. However, STT-RAM suffers from higher write energy than SRAM. One state-of-the-art technique to alleviate this problem is read-before-write (RBW). In this paper, we study the pattern of write accesses to the L2 cache and show that directly applying RBW to an STT-RAM L2 cache can be problematic from an energy perspective. We then propose a selective read-before-write (SRW) scheme to further reduce the dynamic write energy of the STT-RAM cache. Additional optimizations are included in the design of SRW so that it saves a considerable amount of energy at negligible overhead. The experimental results show that SRW achieves an 86.0% reduction in write energy consumption versus a baseline without any write optimization techniques, and a further 6.55% reduction compared to the RBW scheme.
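The energy argument for and against RBW can be illustrated with a toy model. The sketch below is not the paper's SRW scheme, whose selection policy the abstract does not describe; the energy constants and function names are hypothetical placeholders chosen only to show when reading a line first pays off and when it does not.

```python
# Toy per-line energy model for read-before-write (RBW).
# The constants are arbitrary illustrative units, NOT values from the paper.
E_READ_LINE = 10   # assumed cost of reading one cache line
E_WRITE_BIT = 1    # assumed cost of writing one STT-RAM bit


def flipped_bits(old: int, new: int) -> int:
    """Count the bit positions where the old and new line contents differ."""
    return bin(old ^ new).count("1")


def direct_write_energy(line_bits: int) -> int:
    """Baseline: every bit of the line is rewritten."""
    return line_bits * E_WRITE_BIT


def rbw_energy(old: int, new: int) -> int:
    """RBW: read the old line first, then write only the bits that changed."""
    return E_READ_LINE + flipped_bits(old, new) * E_WRITE_BIT


if __name__ == "__main__":
    line_bits = 64
    few_changes = rbw_energy(0, 0xFF)                    # 8 bits differ
    all_changes = rbw_energy(0, (1 << line_bits) - 1)    # all 64 bits differ
    print(direct_write_energy(line_bits), few_changes, all_changes)
```

Under these assumed constants, RBW wins when few bits change but costs more than a direct write when nearly every bit flips, since the extra read is then wasted. That is one plausible reason a selective policy could help.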


The need for miniaturization has been the driving force in chip manufacturing. The proliferation of IoT, robotics, consumer electronics and medical instruments poses unprecedented demands on embedded system design. Area optimization can be achieved either by reducing the size of transistors or by optimizing (reducing) the circuit at the gate level. The first solution has attracted many researchers, while the latter has not been explored to its full potential. The aim is to design a System on Chip (SoC) that satisfies the dynamic requirements of disruptive technologies while occupying less area. The design and testing of communication interfaces such as Serial Peripheral Interface (SPI), Inter-IC Communication (I2C), and Universal Asynchronous Receiver and Transmitter (UART) are crucial to the area optimization of microcontroller design. Since SPI is an important communication protocol, this work reports preliminary research carried out on its design and verification. Verilog is used for the design and verification of the SPI module. The results show a drastic reduction in the number of Look-Up Tables (LUTs) and slices required to build the circuit. We conclude that sophisticated gate-level optimization techniques have the potential to reduce the area by half.


2021 ◽  
Author(s):  
Muhammad Obaidullah

Network-on-Chip (NoC) has been proposed as an interconnection framework for connecting a large number of cores in a System-on-Chip (SoC). Assuming a mesh-based NoC, we investigate application mapping and NoC configuration optimization using a hybrid optimization scheme. Our technique, Hybrid Discrete Particle Swarm Optimization (HDPSO), combines Tabu search, communication-volume-based core swapping, and swarm intelligence. We employ a Tabu list to discourage swarm particles from revisiting the explored search space and to propose an alternative route towards the intended movement direction. In each iteration of the swarm, a sub-swarm containing configuration solutions (sub-particles) searches for the optimal configuration for the parent particle (mapping solution). Optimization goals include minimum average communication latency, power, area, and credit loop latency, and maximum average link duty factor. The proposed technique is tested on well-known multimedia application core graphs and several large synthetic core graphs. On average, our hybrid scheme generates high-quality NoC mapping and configuration solutions when compared to some existing stochastic optimization techniques.
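Two of the ingredients named above, core swapping and a Tabu list, can be sketched on a small mesh. This is a simplified illustration and not HDPSO itself: it has no swarm and no configuration sub-swarm. It also assumes a single cost (communication volume times hop distance) in place of the multi-objective goals listed in the abstract. All function names are hypothetical.

```python
import itertools


def manhattan(a, b, width):
    """Hop distance between tiles a and b on a mesh that is `width` tiles wide."""
    ax, ay = a % width, a // width
    bx, by = b % width, b // width
    return abs(ax - bx) + abs(ay - by)


def comm_cost(mapping, traffic, width):
    """Sum of volume * hop distance; mapping[core] = tile, traffic[(src, dst)] = volume."""
    return sum(vol * manhattan(mapping[s], mapping[d], width)
               for (s, d), vol in traffic.items())


def tabu_swap_search(mapping, traffic, width, iters=50, tabu_len=5):
    """Repeatedly apply the best non-tabu core swap, remembering recent swaps."""
    current = list(mapping)
    best, best_cost = list(current), comm_cost(current, traffic, width)
    tabu = []
    for _ in range(iters):
        candidates = []
        for i, j in itertools.combinations(range(len(current)), 2):
            if (i, j) in tabu:
                continue  # recently used swap: skip to avoid cycling
            current[i], current[j] = current[j], current[i]
            candidates.append((comm_cost(current, traffic, width), i, j))
            current[i], current[j] = current[j], current[i]  # undo trial swap
        if not candidates:
            break
        cost, i, j = min(candidates)
        current[i], current[j] = current[j], current[i]
        tabu.append((i, j))
        if len(tabu) > tabu_len:
            tabu.pop(0)
        if cost < best_cost:
            best, best_cost = list(current), cost
    return best, best_cost


if __name__ == "__main__":
    # Four cores on a 2x2 mesh; cores 0<->3 and 1<->2 communicate.
    traffic = {(0, 3): 10, (1, 2): 5}
    print(comm_cost([0, 1, 2, 3], traffic, 2))
    print(tabu_swap_search([0, 1, 2, 3], traffic, 2))
```

In the example, the initial placement puts both communicating pairs on diagonally opposite tiles. A single swap makes each pair adjacent, which is the lowest cost reachable on a 2x2 mesh.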


2018 ◽  
Vol 28 (01) ◽  
pp. 1950011 ◽  
Author(s):  
Khushbu Chandrakar ◽  
Suchismita Roy

A possible solution to the rising complexity of modern Systems-on-Chip (SoCs) is to raise the level of abstraction for design and optimization. Better optimization of performance and power can be achieved at higher abstraction levels by applying suitable optimization techniques. Inserting clock gating logic into the generated Register-Transfer Level (RTL) design lowers dynamic power consumption by switching off the clock signal to portions of the circuit not currently in use, thereby reducing unnecessary toggling. In this work, we minimize the power consumption of synchronous circuits by reducing the number of activity string patterns. Activity-driven clock trees are used, in which sections of the clock tree are turned off by gating the clock signals. Since gating the clock signal requires additional control signals and gates, there is always a trade-off between the logic circuit area overhead and the total power consumption of the clock tree. We propose a pseudo-Boolean satisfiability (PB-SAT)-based approach that lowers power consumption by reducing the activity patterns of the clock tree through appropriate module-binding solutions.
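The effect of gating on clock toggling can be shown with a small counting model. This sketch does not implement the PB-SAT formulation or module binding, and it ignores the area and power cost of the gating logic. It only assumes that an ungated module receives a clock edge every cycle while a gated one is clocked only in the cycles where it is active. The function name and the module names are hypothetical.

```python
def clock_toggles(activity, gated):
    """Count module clock events over a schedule.

    activity: list of sets, one per cycle, naming the modules active in that cycle.
    gated:    if False, every module is clocked every cycle;
              if True, a module is clocked only in cycles where it is active.
    """
    modules = set().union(*activity)
    if not gated:
        return len(modules) * len(activity)
    return sum(len(active) for active in activity)


if __name__ == "__main__":
    # Four cycles, two modules; cycle 2 is idle for both.
    schedule = [{"alu"}, {"alu", "mul"}, set(), {"mul"}]
    print(clock_toggles(schedule, gated=False), clock_toggles(schedule, gated=True))
```

In this schedule the ungated design clocks both modules in all four cycles, while the gated design clocks them only when they are active, halving the count in this particular case.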


2015 ◽  
Vol 2015 ◽  
pp. 1-16 ◽  
Author(s):  
Feng Wang ◽  
Xiantuo Tang ◽  
Zuocheng Xing

Network-on-Chip (NoC) is one of the critical communication architectures for future many-core systems. As technology continues to scale down, on-chip networks face a growing leakage power crisis. As a leakage power mitigation technique, power-gating can be applied to the on-chip network to address this crisis. However, network performance is severely affected by the disconnection in a conventional power-gated NoC. In this paper, we propose a novel partial power-gating approach to improve performance in the power-gated NoC. The approach mainly involves a direction-slicing scheme, an improved routing algorithm, and a deadlock recovery mechanism. In the synthetic traffic simulation, the proposed design shows favorable power-efficiency in the low-load range and achieves better performance than the conventional power-gated one. In the application trace simulation, the design in the mesh/torus network consumes 15.2%/18.9% more power on average, whereas it obtains a 45.0%/28.7% performance improvement on average compared with the conventional power-gated design. On balance, the proposed design with partial power-gating offers a better tradeoff between performance and power-efficiency.
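The tradeoff claim can be checked with simple arithmetic on the reported percentages. The sketch below assumes that "performance improvement" is a multiplicative gain on a higher-is-better metric, which the abstract does not state, and uses a hypothetical helper name. Under that assumption, performance per unit power relative to the conventional power-gated design is the ratio of the two factors.

```python
def perf_per_power(perf_gain_pct, power_increase_pct):
    """Relative performance-per-power versus the baseline (1.0 means no change).

    Assumes both percentages are multiplicative changes relative to the baseline.
    """
    return (1 + perf_gain_pct / 100) / (1 + power_increase_pct / 100)


if __name__ == "__main__":
    mesh = perf_per_power(45.0, 15.2)    # figures quoted in the abstract
    torus = perf_per_power(28.7, 18.9)
    print(f"mesh: {mesh:.2f}, torus: {torus:.2f}")
```

Under this reading, both ratios come out above 1 (roughly 1.26 for the mesh and 1.08 for the torus), which is consistent with the abstract's conclusion that the extra power buys a proportionally larger performance gain.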

