Fuzzy-Based Thermal Management Scheme for 3D Chip Multicores with Stacked Caches

Lili Shen; Ning Wu; Gaizhen Yan

doi:10.3390/electronics9020346

Fuzzy-Based Thermal Management Scheme for 3D Chip Multicores with Stacked Caches

Electronics ◽

10.3390/electronics9020346 ◽

2020 ◽

Vol 9 (2) ◽

pp. 346 ◽

Cited By ~ 1

Author(s):

Lili Shen ◽

Ning Wu ◽

Gaizhen Yan

Keyword(s):

Power Consumption ◽

Thermal Management ◽

System Performance ◽

Control Policy ◽

Three Dimension ◽

Processor Core ◽

Management Scheme ◽

And Performance ◽

On Chip ◽

Silicon Vias

By using through-silicon vias (TSVs), three-dimensional integration technology can stack a large memory on top of the cores as a last-level on-chip cache (LLC) to reduce off-chip memory accesses and enhance system performance. However, integrating more on-chip cache increases chip power density, which might lead to temperature-related issues in power consumption, reliability, cooling cost, and performance. An effective thermal management scheme is required to ensure the performance and reliability of the system. In this study, a fuzzy-based thermal management scheme (FBTM) is proposed that simultaneously considers cores and stacked caches. The proposed method combines a dynamic cache reconfiguration scheme with a fuzzy-based control policy in a temperature-aware manner. The dynamic cache reconfiguration scheme determines the size of the cache for the processor core according to the application, which yields substantial power savings. The fuzzy-based control policy changes the frequency level of the processor core based on the dynamic cache reconfiguration, which can further improve system performance. Experiments show that, compared with other thermal management schemes, the proposed FBTM achieves, on average, a 3-degree reduction in temperature and a 41% reduction in leakage energy.
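As a rough illustration of what a fuzzy frequency-selection policy can look like, the sketch below maps a core temperature to a frequency level. It is not the FBTM controller from the paper: the membership breakpoints (60, 75, 90 °C), the frequency levels, the three rules, and the function names are all assumptions made for this example, and the cache-reconfiguration input described in the abstract is left out.

```python
# Hypothetical DVFS levels in GHz; not taken from the paper.
FREQ_LEVELS_GHZ = (1.0, 1.5, 2.0, 2.5, 3.0)

def falling(x, lo, hi):
    """Membership that is 1 below lo, 0 above hi, linear in between."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return (hi - x) / (hi - lo)

def rising(x, lo, hi):
    """Mirror image of falling(): 0 below lo, 1 above hi."""
    return 1.0 - falling(x, lo, hi)

def triangle(x, lo, peak, hi):
    """Triangular membership: 0 outside [lo, hi], 1 at peak."""
    return min(rising(x, lo, peak), falling(x, peak, hi))

def choose_frequency(temp_c):
    """Pick a frequency level from a temperature using three fuzzy rules."""
    # Fuzzification with assumed breakpoints at 60, 75 and 90 degrees C.
    cool = falling(temp_c, 60.0, 75.0)
    warm = triangle(temp_c, 60.0, 75.0, 90.0)
    hot = rising(temp_c, 75.0, 90.0)
    # Assumed rules: cool -> 3.0 GHz, warm -> 2.0 GHz, hot -> 1.0 GHz.
    weights = (cool, warm, hot)
    targets = (3.0, 2.0, 1.0)
    # Weighted-average defuzzification; the memberships always sum to 1 here.
    crisp = sum(w * t for w, t in zip(weights, targets)) / sum(weights)
    # Snap the crisp value to the nearest available level.
    return min(FREQ_LEVELS_GHZ, key=lambda f: abs(f - crisp))
```

With these assumed settings, a core at 67.5 °C is half "cool" and half "warm", so the controller settles on the 2.5 GHz level rather than jumping straight between 3.0 and 2.0 GHz, which is the kind of gradual response a fuzzy policy is meant to give.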

Power and Performance Evaluation of Memory-Intensive Applications

Energies ◽

10.3390/en14144089 ◽

2021 ◽

Vol 14 (14) ◽

pp. 4089

Author(s):

Kaiqiang Zhang ◽

Dongyang Ou ◽

Congfeng Jiang ◽

Yeliang Qiu ◽

Longchuan Yan

Keyword(s):

Energy Efficiency ◽

Energy Consumption ◽

Power Consumption ◽

Job Scheduling ◽

Memory System ◽

Processor Core ◽

Memory Efficiency ◽

And Performance ◽

Reasonable Use ◽

Server System

In terms of power and energy consumption, DRAM, like the processor, plays a key role in a modern server system. Although power-aware scheduling is based on the proportion of energy between DRAM and other components, when running memory-intensive applications, the energy consumption of the whole server system will be significantly affected by the non-energy proportion of DRAM. Furthermore, modern servers usually use the NUMA architecture to replace the original SMP architecture to increase their memory bandwidth. It is of great significance to study the energy efficiency of these two different memory architectures. Therefore, in order to explore the power consumption characteristics of servers under memory-intensive workloads, this paper evaluates the power consumption and performance of memory-intensive applications on different generations of real rack servers. Through analysis, we find that: (1) Workload intensity and the number of concurrent execution threads affect server power consumption, but a fully utilized memory system does not necessarily bring good energy efficiency indicators. (2) Even if the memory system is not fully utilized, the memory capacity of each processor core has a significant impact on application performance and server power consumption. (3) When running memory-intensive applications, memory utilization is not always a good indicator of server power consumption. (4) The reasonable use of the NUMA architecture will improve the memory energy efficiency significantly. The experimental results show that reasonable use of the NUMA architecture can improve memory efficiency by 16% compared with the SMP architecture, while unreasonable use of the NUMA architecture reduces memory efficiency by 13%. The findings we present in this paper provide useful insights and guidance for system designers and data center operators to help them with energy-efficiency-aware job scheduling and energy conservation.

Efficiency Analysis of Approaches for Temperature Management and Task Mapping in Networks-on-Chip

Advances in Systems Analysis, Software Engineering, and High Performance Computing - Advancing Embedded Systems and Real-Time Communications with Emerging Technologies ◽

10.4018/978-1-4666-6034-2.ch015 ◽

2014 ◽

pp. 368-398

Author(s):

Tim Wegner ◽

Martin Gag ◽

Dirk Timmermann

Keyword(s):

Thermal Management ◽

System Performance ◽

Task Mapping ◽

Systematic Analysis ◽

Networks On Chip ◽

Chip Temperature ◽

Management Measures ◽

Mapping Process ◽

On Chip ◽

And Task

With the progress of deep submicron technology, power consumption and temperature-related issues have become dominant factors for chip design. Therefore, very large-scale integrated systems like Systems-on-Chip (SoCs) are exposed to increasing thermal stress. On the one hand, this necessitates effective mechanisms for thermal management and task mapping. On the other hand, the application of such thermal-aware approaches is accompanied by disturbance of system integrity and degradation of system performance. In this chapter, a method to predict and proactively manage the on-chip temperature distribution of systems based on Networks-on-Chip (NoCs) is proposed. Thereby, traditional reactive approaches for thermal management and task mapping can be replaced. This results in shorter response times for the application of management measures, and therefore in a reduction of temperature and thermal imbalances and in less impairment of system performance. The systematic analysis of simulations conducted for NoC sizes up to 4x4 shows that under certain conditions the proactive approach is able to mitigate the negative impact of thermal management on system performance while still improving the on-chip temperature profile. Similar effects can be observed for proactive thermal-aware task mapping at system runtime, which allows prospective thermal conditions to be considered during the mapping process.

Performance Analysis of Temperature Management Approaches in Networks-on-Chip

International Journal of Embedded and Real-Time Communication Systems ◽

10.4018/jertcs.2012100102 ◽

2012 ◽

Vol 3 (4) ◽

pp. 19-41

Author(s):

Tim Wegner ◽

Martin Gag ◽

Dirk Timmermann

Keyword(s):

Thermal Management ◽

System Performance ◽

Large Scale ◽

Negative Impact ◽

Temperature Management ◽

Systematic Analysis ◽

Networks On Chip ◽

Chip Temperature ◽

System Integrity ◽

On Chip

With the progress of deep submicron technology, power consumption and temperature-related issues have become dominant factors for chip design. Therefore, very large-scale integrated systems like Systems-on-Chip (SoCs) are exposed to increasing thermal stress. On the one hand, this necessitates effective mechanisms for thermal management. On the other hand, the application of thermal management is accompanied by disturbance of system integrity and degradation of system performance. In this paper the authors propose to precompute and proactively manage the on-chip temperature of systems based on Networks-on-Chip (NoCs). Thereby, traditional reactive approaches, which utilize the NoC infrastructure to perform thermal management, can be replaced. This results not only in shorter response times for the application of management measures and a reduction of temperature and thermal imbalances, but also in less impairment of system integrity and performance. The systematic analysis of simulations conducted for NoC sizes ranging from 2x2 to 4x4 shows that under certain conditions the proactive approach is able to mitigate the negative impact of thermal management on system performance while still improving the on-chip temperature profile.

A Novel Optimization Approach Based on Scratchpad Memory for Mobile LBS

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.325-326.935 ◽

2013 ◽

Vol 325-326 ◽

pp. 935-938

Author(s):

Rui Xin Hu ◽

Wei Hu ◽

Ze Yu Zuo ◽

Min Wang ◽

Jing Xu ◽

...

Keyword(s):

Power Consumption ◽

Mobile Devices ◽

Rapid Development ◽

Main Memory ◽

Optimization Approach ◽

Performance Requirement ◽

Scratchpad Memory ◽

Frequency Of Use ◽

And Performance ◽

On Chip

With the popularization of mobile broadband networks, mobile devices have been used more widely in recent years. As an important part of mobile services, LBS (Location Based Service) has also developed rapidly for such devices. However, LBS on mobile devices consumes more energy, and mobile users have stricter performance requirements than before. As an important part of on-chip memory, scratchpad memory (SPM) has lower power consumption and higher performance, because SPM is controlled by software and needs no extra tags. In this paper, we propose a novel optimization approach based on SPM for mobile LBS to reduce the power consumption and improve the performance of the application. In our approach, SPM is used as the on-chip main memory to hold the data with a high frequency of use. The experimental results show that SPM can optimize mobile LBS in terms of both power consumption and performance.
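The idea of placing frequently used data in the scratchpad can be sketched as a simple greedy selection. This is only an illustration under assumed inputs: the abstract does not describe the actual allocation algorithm, and the function name, the profile format, and the example object names below are hypothetical.

```python
def select_for_spm(objects, capacity_bytes):
    """Greedily pick the most frequently accessed objects that fit in the SPM.

    objects maps a name to (size_bytes, access_count); both values would
    come from profiling in a real flow (assumed here).
    """
    chosen = []
    used = 0
    # Visit objects from most to least frequently accessed.
    ranked = sorted(objects.items(), key=lambda kv: kv[1][1], reverse=True)
    for name, (size, _count) in ranked:
        # Skip anything that would overflow the remaining scratchpad space.
        if used + size <= capacity_bytes:
            chosen.append(name)
            used += size
    return chosen

# Hypothetical profile of an LBS application.
profile = {
    "map_tile_index": (4096, 900),
    "route_cache": (8192, 500),
    "log_buffer": (16384, 20),
    "gps_fix": (64, 1500),
}
spm_contents = select_for_spm(profile, capacity_bytes=8192)
```

A greedy pass by access count is easy to reason about but not optimal in general; ranking by accesses per byte, or solving the placement as a knapsack problem, are common alternatives.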

Design and analysis of buffer and bufferless routing based NoC for high throughput and low latency communication on FPGA

International Journal of Pervasive Computing and Communications ◽

10.1108/ijpcc-05-2021-0115 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Sujata S.B. ◽

Anuradha M. Sandi

Keyword(s):

Power Consumption ◽

High Throughput ◽

Data Communication ◽

Injection Rate ◽

Area Network ◽

Verilog Hdl ◽

Content Type ◽

Chip Area ◽

And Performance ◽

On Chip

Purpose: The small-area network used for data communication within routers suffers from limitations in packet storage, throughput, latency and power consumption. There are many solutions for increasing communication speed and optimizing power consumption; one of them is the Network-on-Chip (NoC). The literature describes several NoCs that can be reconfigured dynamically and whose results can be easily tested and validated on an FPGA. Still, NoCs have limitations regarding chip area, reconfiguration time and throughput. Design/methodology/approach: To address these limitations, this research proposes the dynamically buffered and bufferless reconfigurable NoC (DB2R NoC), using the X-Y algorithm for routing, Torus for switching and Flexible Direction Order (FDOR) for direction finding between source and destination nodes. Thus, the 3 × 3 and 4 × 4 DB2R NoCs are made deadlock-free, with low power and latency and high throughput. To demonstrate the applicability and analyze the performance of the DB2R NoC for 3 × 3 and 4 × 4 routers on FPGA, the 22-bit buffered and 19-bit bufferless designs have been successfully synthesized using Verilog HDL and implemented on an Artix-7 FPGA development board. The Virtual Input/Output (VIO) ChipScope Pro tool has been incorporated in the design to verify and debug the complete design on the Artix-7 FPGA. Findings: The results show a 35% improvement in throughput, a 23% improvement in latency and a 47% optimization in area. The complete design has been tested with 28 packets at an injection rate of 0.01; the packets were generated using an NLFSR. Originality/value: The results show a 35% improvement in throughput, a 23% improvement in latency and a 47% optimization in area. The complete design has been tested with 28 packets at an injection rate of 0.01; the packets were generated using an NLFSR.
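For readers unfamiliar with X-Y routing, the following minimal sketch shows the textbook dimension-order version on a plain 2D mesh: a packet first travels along X until its column matches the destination, then along Y. It assumes no torus wraparound links and does not model the buffered/bufferless switching or FDOR described in the abstract; the function name and coordinate convention are chosen for this example only.

```python
def xy_route(src, dst):
    """Return the list of (x, y) routers visited from src to dst.

    Dimension-order routing on a mesh: resolve the X offset first,
    then the Y offset. Wraparound links are not used in this sketch.
    """
    x, y = src
    path = [src]
    # Phase 1: move along the X dimension.
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # Phase 2: move along the Y dimension.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

# Example on a 3 x 3 mesh: from router (0, 0) to router (2, 1).
hops = xy_route((0, 0), (2, 1))
```

Because every packet finishes its X moves before making any Y move, certain turns never occur, which is the usual argument for why dimension-order routing is deadlock-free on a mesh.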

Applying Periodic Thermal Management on Hard Real-Time Systems to Minimize Peak Temperature

Journal of Circuits System and Computers ◽

10.1142/s0218126618502080 ◽

2018 ◽

Vol 27 (13) ◽

pp. 1850208 ◽

Cited By ~ 2

Author(s):

Long Cheng ◽

Kai Huang ◽

Gang Chen ◽

Biao Hu ◽

Zhuangyi Jiang ◽

...

Keyword(s):

Closed Form ◽

Real Time ◽

Thermal Management ◽

Peak Temperature ◽

Real Time Systems ◽

Event Stream ◽

And Performance ◽

On Chip ◽

Hard Real Time ◽

Time Systems

Due to growing power density, on-chip temperature increases rapidly, which has hampered the reliability and performance of modern real-time systems. This paper studies how to minimize the peak temperature of real-time systems under hard real-time constraints with periodic thermal management. A closed-form representation of the peak temperature for such a periodic scheme is derived to tackle this problem. Based on this closed form and the arrival curve model, one offline approach and one online approach are proposed to minimize the peak temperature for a given event stream. The offline one does thermal optimization in the design phase and introduces negligible runtime overhead. The online one computes dynamic power-control schemes which are adaptive to actual event arrivals and execution states. We conduct experiments on a real single-core processor and compare our approaches to two existing works. The temperature results measured from a physical thermal sensor demonstrate that the achieved maximal and average temperature reductions are 5 K and 2.6 K, respectively.

A low latency and high efficient three-dimension Network-on-Chip based on hierarchical structure

Modern Physics Letters B ◽

10.1142/s0217984917400619 ◽

2017 ◽

Vol 31 (19-21) ◽

pp. 1740061

Author(s):

Chen Zhu ◽

Huatao Zhao ◽

Tinghuan Chen ◽

Tianbo Zhu

Keyword(s):

Power Consumption ◽

Hierarchical Structure ◽

3D Structure ◽

Network On Chip ◽

Low Latency ◽

Three Dimension ◽

High Efficient ◽

On Chip ◽

3D Topology

Currently, most Network-on-Chip (NoC) research is based on 2D algorithms or simple 3D structures. However, congestion and faulty links in the topology can increase latency and power consumption. In this paper, the authors build a novel 3D topology based on a hierarchical structure and TSV links, which can reduce latency and power consumption by decreasing the number of hops packets take. We employ a C++ tool to test our method, and the results show that performance improves by about 21%–36% in throughput and 3%–11% in latency.

On-Chip Power Minimization Using Serialization-Widening with Frequent Value Encoding

VLSI Design ◽

10.1155/2014/801241 ◽

2014 ◽

Vol 2014 ◽

pp. 1-14 ◽

Cited By ~ 4

Author(s):

Khader Mohammad ◽

Ahsan Kabeer ◽

Tarek Taha

Keyword(s):

Power Consumption ◽

High Performance ◽

Chip Multiprocessors ◽

Data Transfer ◽

High Volume ◽

Power Minimization ◽

Processor Core ◽

L2 Cache ◽

Data Bus ◽

On Chip

In chip-multiprocessor (CMP) architectures, the L2 cache is shared by the L1 cache of each processor core, resulting in a high volume of diverse data transfer through the L1-L2 cache bus. High-performance CMP and SoC systems have a significant amount of data transfer between the on-chip L2 cache and the L3 cache of off-chip memory through the power-expensive off-chip memory bus. This paper addresses the problem of the high power consumption of on-chip data buses, exploring a framework for minimizing memory data bus power consumption. A comprehensive analysis of the existing bus power minimization approaches is provided based on performance, power, and area overhead considerations. A novel approach for reducing the power consumption of the on-chip bus is introduced. In particular, serialization-widening (SW) of the data bus with frequent value encoding (FVE), called the SWE approach, is proposed as the best power-saving approach for the on-chip cache data bus. The experimental results show that the SWE approach with FVE can achieve approximately 54% power savings over the conventional bus for multicore applications using a 64-bit wide data bus in 45 nm technology.
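The general idea behind frequent value encoding can be sketched as a small lookup table shared by sender and receiver: values found in the table are sent as a short index, and everything else is sent unencoded. The sketch below is a software illustration of that idea only. It does not model the serialization-widening part, the bus signaling, or the table-update policy of the SWE approach, and the function names, table size, and symbol format are assumptions.

```python
from collections import Counter

def build_table(words, size=4):
    """Collect the `size` most frequent values seen in a trace of bus words."""
    return [value for value, _count in Counter(words).most_common(size)]

def encode(words, table):
    """Replace table hits with ("idx", i); pass misses through as ("raw", w)."""
    index = {value: i for i, value in enumerate(table)}
    return [("idx", index[w]) if w in index else ("raw", w) for w in words]

def decode(symbols, table):
    """Invert encode() using the same table on the receiving side."""
    return [table[payload] if kind == "idx" else payload
            for kind, payload in symbols]

# Hypothetical trace of 32-bit bus words.
trace = [0, 0, 0xFFFFFFFF, 7, 0, 0xFFFFFFFF, 12345]
table = build_table(trace, size=2)
encoded = encode(trace, table)
```

In hardware, an index hit toggles only a few wires instead of a full-width word, which is where the power saving would come from; the tuples here stand in for that distinction.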

Collaborative fuzzy‐based partially‐throttling dynamic thermal management scheme for three‐dimensional networks‐on‐chip

IET Computers & Digital Techniques ◽

10.1049/iet-cdt.2015.0198 ◽

2016 ◽

Vol 11 (1) ◽

pp. 24-32

Author(s):

Gaizhen Yan ◽

Ning Wu ◽

Fen Ge ◽

Hao Xiao ◽

Fang Zhou

Keyword(s):

Thermal Management ◽

Three Dimensional ◽

Networks On Chip ◽

Dynamic Thermal Management ◽

Management Scheme ◽

On Chip

Efficient Instruction and Data Caching for High Performance Embedded Processors

Jornada de Jóvenes Investigadores del I3A ◽

10.26754/jji-i3a.201201788 ◽

1970 ◽

pp. 9

Author(s):

A. Ferrerón Labari ◽

D. Suárez Gracia ◽

V. Viñals Yúfera

Keyword(s):

Embedded Systems ◽

Power Consumption ◽

Low Power ◽

Interconnection Networks ◽

High Performance ◽

Critical Issue ◽

Content Management ◽

Structure Design ◽

Portable Devices ◽

On Chip

In recent years, embedded systems have evolved so that they offer capabilities previously found only in high-performance systems. Portable devices already have on-chip multiprocessors (such as the PowerPC 476FP or ARM Cortex A9 MP), usually multi-threaded, and a powerful multi-level on-chip cache memory hierarchy. As most of these systems are battery-powered, power consumption becomes a critical issue. Achieving high performance and low power consumption is a highly complex challenge for which some proposals have already been made. Suarez et al. proposed a new on-chip cache hierarchy, the LP-NUCA (Low Power NUCA), which is able to reduce the access latency by taking advantage of NUCA (Non-Uniform Cache Architecture) properties. The key points are decoupling the functionality and utilizing three specialized on-chip networks. This structure has been shown to be efficient for data hierarchies, achieving good performance and reducing energy consumption. On the other hand, instruction caches have different requirements and characteristics than data caches, contradicting the requirements of low-power embedded systems, especially in SMT (simultaneous multi-threading) environments. We want to study the benefits of utilizing small tiled caches for the instruction hierarchy, so we propose a new design, ID-LP-NUCAs. Thus, we need to completely re-evaluate our previous design in terms of structure design, interconnection networks (including topologies, flow control and routing), content management (with special interest in hardware/software content allocation policies), and structure sharing. In CMP (chip multiprocessor) environments with parallel workloads, coherence plays an important role and must be taken into consideration.
