A Novel Compiler Assisted Approach for Issue Queue Power Reduction

2014 ◽  
Vol 543-547 ◽  
pp. 1981-1986
Author(s):  
Peng Wei Lv ◽  
Jian Qing Xiao ◽  
Sen Mao Shi

Superscalar processors contain complex control logic in order to extract sufficient instruction level parallelism (ILP). The issue logic is one of the main sources of power dissipation in current superscalar processors. It has been estimated that up to 30% of the energy consumed by a processor is in the issue logic. This paper presents a novel compiler assisted approach to power reduction where we use compiler analysis to pass information to the processor about the number of entries needed, allowing the processor to resize the issue queue dynamically which limit the number of instruction dispatched and resident in the queue reduces the energy consumption without adversely affecting performance. Compared with hardware scheme, our approach is simpler faster and saves more energy. Using the approach we achieve 43.3% dynamic and 28.5% static power savings.

2021 ◽  
Vol 11 (3) ◽  
pp. 1225
Author(s):  
Woohyong Lee ◽  
Jiyoung Lee ◽  
Bo Kyung Park ◽  
R. Young Chul Kim

Geekbench is one of the most referenced cross-platform benchmarks in the mobile world. Most of its workloads are synthetic but some of them aim to simulate real-world behavior. In the mobile world, its microarchitectural behavior has been reported rarely since the hardware profiling features are limited to the public. As a popular mobile performance workload, it is hard to find Geekbench’s microarchitecture characteristics in mobile devices. In this paper, a thorough experimental study of Geekbench performance characterization is reported with detailed performance metrics. This study also identifies mobile system on chip (SoC) microarchitecture impacts, such as the cache subsystem, instruction-level parallelism, and branch performance. After the study, we could understand the bottleneck of workloads, especially in the cache sub-system. This means that the change of data set size directly impacts performance score significantly in some systems and will ruin the fairness of the CPU benchmark. In the experiment, Samsung’s Exynos9820-based platform was used as the tested device with Android Native Development Kit (NDK) built binaries. The Exynos9820 is a superscalar processor capable of dual issuing some instructions. To help performance analysis, we enable the capability to collect performance events with performance monitoring unit (PMU) registers. The PMU is a set of hardware performance counters which are built into microprocessors to store the counts of hardware-related activities. Throughout the experiment, functional and microarchitectural performance profiles were fully studied. This paper describes the details of the mobile performance studies above. In our experiment, the ARM DS5 tool was used for collecting runtime PMU profiles including OS-level performance data. After the comparative study is completed, users will understand more about the mobile architecture behavior, and this will help to evaluate which benchmark is preferable for fair performance comparison.


Author(s):  
Dennis Wolf ◽  
Andreas Engel ◽  
Tajas Ruschke ◽  
Andreas Koch ◽  
Christian Hochberger

AbstractCoarse Grained Reconfigurable Arrays (CGRAs) or Architectures are a concept for hardware accelerators based on the idea of distributing workload over Processing Elements. These processors exploit instruction level parallelism, while being energy efficient due to their simplistic internal structure. However, the incorporation into a complete computing system raises severe challenges at the hardware and software level. This article evaluates a CGRA integrated into a control engineering environment targeting a Xilinx Zynq System on Chip (SoC) in detail. Besides the actual application execution performance, the practicability of the configuration toolchain is validated. Challenges of the real-world integration are discussed and practical insights are highlighted.


2015 ◽  
Vol 28 (3) ◽  
pp. 393-405 ◽  
Author(s):  
Sushanta Mohapatra ◽  
Kumar Pradhan ◽  
Prasanna Sahu

The present understanding of this work is about to evaluate and resolve the temperature compensation point (TCP) or zero temperature coefficient (ZTC) point for a sub-20 nm FinFET. The sensitivity of geometry parameters on assorted performances of Fin based device and its reliability over ample range of temperatures i.e. 25?C to 225?C is reviewed to extend the benchmark of device scalability. The impact of fin height (HFin), fin width (WFin), and temperature (T) on immense performance metrics including on-off ratio (Ion/Ioff), transconductance (gm), gain (AV), cut-off frequency (fT), static power dissipation (PD), energy (E), energy delay product (EDP), and sweet spot (gmfT/ID) of the FinFET is successfully carried out by commercially available TCAD simulator SentaurusTM from Synopsis Inc.


Author(s):  
B.T. Krishna ◽  
◽  
Shaik. mohaseena Salma ◽  

A flux-controlled memristor using complementary metal–oxide–(CMOS) structure is presented in this study. The proposed circuit provides higher power efficiency, less static power dissipation, lesser area, and can also reduce the power supply by using CMOS 90nm technology. The circuit is implemented based on the use of a second-generation current conveyor circuit (CCII) and operational transconductance amplifier (OTA) with few passive elements. The proposed circuit uses a current-mode approach which improves the high frequency performance. The reduction of a power supply is a crucial aspect to decrease the power consumption in VLSI. An offered emulator in this proposed circuit is made to operate incremental and decremental configurations well up to 26.3 MHZ in cadence virtuoso platform gpdk using 90nm CMOS technology. proposed memristor circuit has very little static power dissipation when operating with ±1V supply. Transient analysis, memductance analysis, and dc analysis simulations are verified practically with the Experimental demonstration by using ideal memristor made up of ICs AD844AN and CA3080, using multisim which exhibits theoretical simulation are verified and discussed.


Author(s):  
Diksha Siddhamshittiwar

Static power reduction is a challenge in deep submicron VLSI circuits. In this paper 28T full adder circuit, 14T full adder circuit and 32 bit power gated BCD adder using the full adders respectively were designed and their average power was compared. In existing work a conventional full adder is designed using 28T and the same is used to design 32 bit BCD adder. In the proposed architecture 14T transmission gate based power gated full adder is used for the design of 32 bit BCD adder. The leakage supremacy dissipated during standby mode in all deep submicron CMOS devices is reduced using efficient power gating and multi-channel technique. Simulation results were obtained using Tanner EDA and TSMC_180nm library file is used for the design of 28T full adder, 14T full adder and power gated BCD adder and a significant power reduction is achieved in the proposed architecture.


Sign in / Sign up

Export Citation Format

Share Document