Dynamically Reconfiguring Processor Resources to Reduce Power Consumption in High-Performance Processors

Author(s):  
Roberto Maro ◽  
Yu Bai ◽  
R. Iris Bahar
2007 ◽  
Vol 16 (02) ◽  
pp. 169-179 ◽  
Author(s):  
INHWA JUNG ◽  
MOO-YOUNG KIM ◽  
CHULWOO KIM

In many VLSI chips, the power dissipation of the clocking system that includes clock distribution network and flip-flops is often the largest portion of total chip power consumption. In the near future, this portion is likely to dominate total chip power consumption due to higher clock frequency and deeper pipeline design trend. Traditionally, two approaches have been used: (1) to reduce power consumption in the clock tree, several low-swing clock flip-flops and double-edge flip-flops have been introduced; (2) to reduce power consumption in flip-flops, conditional capture, clock-on-demand, data-transition look-ahead techniques have been developed. Recently, pulsed latch type flip-flops are introduced in several high-performance microprocessors to reduce E × D. In this paper, these flip-flops are described with their pros and cons. Then, a new circuit technique is described along with simulation results. The proposed pulsed latch reduces E × D by 82.6% to 95.4% compared to conventional flip-flops.


2012 ◽  
Vol 13 (03n04) ◽  
pp. 1250010
Author(s):  
SHAOSHAN LIU ◽  
WON W. RO ◽  
CHEN LIU ◽  
ALFREDO CRISTOBAL-SALAS ◽  
CHRISTOPHE CÉRIN ◽  
...  

The computer industry is moving towards two extremes: extremely high-performance high-throughput cloud computing, and low-power mobile computing. Cloud computing, while providing high performance, is very costly. Google and Microsoft Bing spend billions of dollars each year to maintain their server farms, mainly due to the high power bills. On the other hand, mobile computing is under a very tight energy budget, but yet the end users demand ever increasing performance on these devices. This trend indicates that conventional architectures are not able to deliver high-performance and low power consumption at the same time, and we need a new architecture model to address the needs of both extremes. In this paper, we thus introduce our Extremely Heterogeneous Architecture (EHA) project: EHA is a novel architecture that incorporates both general-purpose and specialized cores on the same chip. The general-purpose cores take care of generic control and computation. On the other hand, the specialized cores, including GPU, hard accelerators (ASIC accelerators), and soft accelerators (FPGAs), are designed for accelerating frequently used or heavy weight applications. When acceleration is not needed, the specialized cores are turned off to reduce power consumption. We demonstrate that EHA is able to improve performance through acceleration, and at the same time reduce power consumption. Since EHA is a heterogeneous architecture, it is suitable for accelerating heterogeneous workloads on the same chip. For example, data centers and clouds provide many services, including media streaming, searching, indexing, scientific computations. The ultimate goal of the EHA project is two-fold: first, to design a chip that is able to run different cloud services on it, and through this design, we would be able to greatly reduce the cost, both recurring and non-recurring, of data centers\clouds; second, to design a light-weight EHA that runs on mobile devices, providing end users with improved experience even under tight battery budget constraints.


2014 ◽  
Vol E97.B (12) ◽  
pp. 2698-2705
Author(s):  
Tomoyuki HINO ◽  
Hitoshi TAKESHITA ◽  
Kiyo ISHII ◽  
Junya KURUMIDA ◽  
Shu NAMIKI ◽  
...  

Author(s):  
A. Ferrerón Labari ◽  
D. Suárez Gracia ◽  
V. Viñals Yúfera

In the last years, embedded systems have evolved so that they offer capabilities we could only find before in high performance systems. Portable devices already have multiprocessors on-chip (such as PowerPC 476FP or ARM Cortex A9 MP), usually multi-threaded, and a powerful multi-level cache memory hierarchy on-chip. As most of these systems are battery-powered, the power consumption becomes a critical issue. Achieving high performance and low power consumption is a high complexity challenge where some proposals have been already made. Suarez et al. proposed a new cache hierarchy on-chip, the LP-NUCA (Low Power NUCA), which is able to reduce the access latency taking advantage of NUCA (Non-Uniform Cache Architectures) properties. The key points are decoupling the functionality, and utilizing three specialized networks on-chip. This structure has been proved to be efficient for data hierarchies, achieving a good performance and reducing the energy consumption. On the other hand, instruction caches have different requirements and characteristics than data caches, contradicting the low-power embedded systems requirements, especially in SMT (simultaneous multi-threading) environments. We want to study the benefits of utilizing small tiled caches for the instruction hierarchy, so we propose a new design, ID-LP-NUCAs. Thus, we need to re-evaluate completely our previous design in terms of structure design, interconnection networks (including topologies, flow control and routing), content management (with special interest in hardware/software content allocation policies), and structure sharing. In CMP environments (chip multiprocessors) with parallel workloads, coherence plays an important role, and must be taken into consideration.


Author(s):  
Deepika Bansal ◽  
Bal Chand Nagar ◽  
Brahamdeo Prasad Singh ◽  
Ajay Kumar

Background & Objective: In this paper, a modified pseudo domino configuration has been proposed to improve the leakage power consumption and Power Delay Product (PDP) of dynamic logic using Carbon Nanotube MOSFETs (CN-MOSFETs). The simulations for proposed and published domino circuits are verified by using Synopsys HSPICE simulator with 32nm CN-MOSFET technology which is provided by Stanford. Methods: The simulation results of the proposed technique are validated for improvement of wide fan-in domino OR gate as a benchmark circuit at 500 MHz clock frequency. Results: The proposed configuration is suitable for cascading of the high performance wide fan-in circuits without any charge sharing. Conclusion: The performance analysis of 8-input OR gate demonstrate that the proposed circuit provides lower static and dynamic power consumption up to 62 and 40% respectively, and PDP improvement is 60% as compared to standard domino circuit.


Nanophotonics ◽  
2020 ◽  
Vol 10 (2) ◽  
pp. 937-945
Author(s):  
Ruihuan Zhang ◽  
Yu He ◽  
Yong Zhang ◽  
Shaohua An ◽  
Qingming Zhu ◽  
...  

AbstractUltracompact and low-power-consumption optical switches are desired for high-performance telecommunication networks and data centers. Here, we demonstrate an on-chip power-efficient 2 × 2 thermo-optic switch unit by using a suspended photonic crystal nanobeam structure. A submilliwatt switching power of 0.15 mW is obtained with a tuning efficiency of 7.71 nm/mW in a compact footprint of 60 μm × 16 μm. The bandwidth of the switch is properly designed for a four-level pulse amplitude modulation signal with a 124 Gb/s raw data rate. To the best of our knowledge, the proposed switch is the most power-efficient resonator-based thermo-optic switch unit with the highest tuning efficiency and data ever reported.


Author(s):  
Chun-Yuan Lin ◽  
Jin Ye ◽  
Che-Lun Hung ◽  
Chung-Hung Wang ◽  
Min Su ◽  
...  

Current high-end graphics processing units (abbreviate to GPUs), such as NVIDIA Tesla, Fermi, Kepler series cards which contain up to thousand cores per-chip, are widely used in the high performance computing fields. These GPU cards (called desktop GPUs) should be installed in personal computers/servers with desktop CPUs; moreover, the cost and power consumption of constructing a high performance computing platform with these desktop CPUs and GPUs are high. NVIDIA releases Tegra K1, called Jetson TK1, which contains 4 ARM Cortex-A15 CPUs and 192 CUDA cores (Kepler GPU) and is an embedded board with low cost, low power consumption and high applicability advantages for embedded applications. NVIDIA Jetson TK1 becomes a new research direction. Hence, in this paper, a bioinformatics platform was constructed based on NVIDIA Jetson TK1. ClustalWtk and MCCtk tools for sequence alignment and compound comparison were designed on this platform, respectively. Moreover, the web and mobile services for these two tools with user friendly interfaces also were provided. The experimental results showed that the cost-performance ratio by NVIDIA Jetson TK1 is higher than that by Intel XEON E5-2650 CPU and NVIDIA Tesla K20m GPU card.


Author(s):  
GOPALA KRISHNA.M ◽  
UMA SANKAR.CH ◽  
NEELIMA. S ◽  
KOTESWARA RAO.P

In this paper, presents circuit design of a low-power delay buffer. The proposed delay buffer uses several new techniques to reduce its power consumption. Since delay buffers are accessed sequentially, it adopts a ring-counter addressing scheme. In the ring counter, double-edge-triggered (DET) flip-flops are utilized to reduce the operating frequency by half and the C-element gated-clock strategy is proposed. Both total transistor count and the number of clocked transistors are significantly reduced to improve power consumption and speed in the flip-flop. The number of transistors is reduced by 56%-60% and the Area-Speed-Power product is reduced by 56%-63% compared to other double edge triggered flip-flops. This design is suitable for high-speed, low-power CMOS VLSI design applications.


Sign in / Sign up

Export Citation Format

Share Document