scholarly journals Energy and Performance Trade-Off Optimization in Heterogeneous Computing via Reinforcement Learning

Electronics ◽  
2020 ◽  
Vol 9 (11) ◽  
pp. 1812 ◽  
Author(s):  
Zheqi Yu ◽  
Pedro Machado ◽  
Adnan Zahid ◽  
Amir M. Abdulghani ◽  
Kia Dashtipour ◽  
...  

This paper suggests an optimisation approach in heterogeneous computing systems to balance energy power consumption and efficiency. The work proposes a power measurement utility for a reinforcement learning (PMU-RL) algorithm to dynamically adjust the resource utilisation of heterogeneous platforms in order to minimise power consumption. A reinforcement learning (RL) technique is applied to analyse and optimise the resource utilisation of field programmable gate array (FPGA) control state capabilities, which is built for a simulation environment with a Xilinx ZYNQ multi-processor systems-on-chip (MPSoC) board. In this study, the balance operation mode for improving power consumption and performance is established to dynamically change the programmable logic (PL) end work state. It is based on an RL algorithm that can quickly discover the optimization effect of PL on different workloads to improve energy efficiency. The results demonstrate a substantial reduction of 18% in energy consumption without affecting the application’s performance. Thus, the proposed PMU-RL technique has the potential to be considered for other heterogeneous computing platforms.

2021 ◽  
Author(s):  
Abdullah Siddiqui

One of the most critical steps of embedded systems design is Hardware-Software partitioning. It is characterized by distributing the components of an application between hardware and software such that the user defined system constraints are satisfied. Heterogeneous computing platforms consisting of CPUs and GPUs have tremendous potential for enhancing the performance of embedded applications. The challenge of application partitioning for CPU-GPU mapping is much greater on such platforms due to their unique and diverse characteristics. In this thesis, an optimization algorithm is devised and presented for partitioning and mapping computational tasks on CPU-GPU platforms while keeping a check on the power consumption. Our methodology also uses parallelism in applications and their tasks by utilizing the architectural capabilities of the GPU. The optimization algorithm was tested with a MJPEG decoder, several benchmarks and synthetic graphs.


2018 ◽  
Vol 8 (3) ◽  
pp. 25 ◽  
Author(s):  
Siddhartha Joshi ◽  
Dawei Li ◽  
Seda Ogrenci-Memik ◽  
Grzegorz Deptuch ◽  
James Hoff ◽  
...  

In this paper, we characterize the interplay between power consumption and performance of a matchline-based Content Addressable Memory and then propose the use of a multi-Vdd design to save power and increase post-fabrication tunability. Exploration of the power consumption behavior of a CAM chip shows the drastically different behavior among the components and suggests the use of different and independent power supplies. The complete design, simulation and testing of a multi-Vdd CAM chip along with an exploration of the multi-Vdd design space are presented. Our analysis has been applied to simulated models on two different technology nodes (130 nm and 45 nm), followed by experiments on a 246-kb test chip fabricated in 130 nm Global Foundries Low Power CMOS technology. The proposed design, operating at an optimal operating point in a triple-Vdd configuration, increases the power-delay operation range by 2.4 times and consumes 25.3% less dynamic power when compared to a conventional single-Vdd design operating over the same voltage range with equivalent noise margin. Our multi-Vdd design also helps save 51.3% standby power. Measurement results from the test chip combined with the simulation analysis at the two nodes validate our thesis.


2021 ◽  
Author(s):  
Abdullah Siddiqui

One of the most critical steps of embedded systems design is Hardware-Software partitioning. It is characterized by distributing the components of an application between hardware and software such that the user defined system constraints are satisfied. Heterogeneous computing platforms consisting of CPUs and GPUs have tremendous potential for enhancing the performance of embedded applications. The challenge of application partitioning for CPU-GPU mapping is much greater on such platforms due to their unique and diverse characteristics. In this thesis, an optimization algorithm is devised and presented for partitioning and mapping computational tasks on CPU-GPU platforms while keeping a check on the power consumption. Our methodology also uses parallelism in applications and their tasks by utilizing the architectural capabilities of the GPU. The optimization algorithm was tested with a MJPEG decoder, several benchmarks and synthetic graphs.


2013 ◽  
Vol 18 ◽  
pp. 1891-1898
Author(s):  
Chetan Kumar N G ◽  
Sudhanshu Vyas ◽  
Ron K. Cytron ◽  
Christopher D. Gill ◽  
Joseph Zambreno ◽  
...  

Energies ◽  
2021 ◽  
Vol 14 (14) ◽  
pp. 4089
Author(s):  
Kaiqiang Zhang ◽  
Dongyang Ou ◽  
Congfeng Jiang ◽  
Yeliang Qiu ◽  
Longchuan Yan

In terms of power and energy consumption, DRAMs play a key role in a modern server system as well as processors. Although power-aware scheduling is based on the proportion of energy between DRAM and other components, when running memory-intensive applications, the energy consumption of the whole server system will be significantly affected by the non-energy proportion of DRAM. Furthermore, modern servers usually use NUMA architecture to replace the original SMP architecture to increase its memory bandwidth. It is of great significance to study the energy efficiency of these two different memory architectures. Therefore, in order to explore the power consumption characteristics of servers under memory-intensive workload, this paper evaluates the power consumption and performance of memory-intensive applications in different generations of real rack servers. Through analysis, we find that: (1) Workload intensity and concurrent execution threads affects server power consumption, but a fully utilized memory system may not necessarily bring good energy efficiency indicators. (2) Even if the memory system is not fully utilized, the memory capacity of each processor core has a significant impact on application performance and server power consumption. (3) When running memory-intensive applications, memory utilization is not always a good indicator of server power consumption. (4) The reasonable use of the NUMA architecture will improve the memory energy efficiency significantly. The experimental results show that reasonable use of NUMA architecture can improve memory efficiency by 16% compared with SMP architecture, while unreasonable use of NUMA architecture reduces memory efficiency by 13%. The findings we present in this paper provide useful insights and guidance for system designers and data center operators to help them in energy-efficiency-aware job scheduling and energy conservation.


Electronics ◽  
2020 ◽  
Vol 9 (2) ◽  
pp. 346 ◽  
Author(s):  
Lili Shen ◽  
Ning Wu ◽  
Gaizhen Yan

By using through-silicon-vias (TSV), three dimension integration technology can stack large memory on the top of cores as a last-level on-chip cache (LLC) to reduce off-chip memory access and enhance system performance. However, the integration of more on-chip caches increases chip power density, which might lead to temperature-related issues in power consumption, reliability, cooling cost, and performance. An effective thermal management scheme is required to ensure the performance and reliability of the system. In this study, a fuzzy-based thermal management scheme (FBTM) is proposed that simultaneously considers cores and stacked caches. The proposed method combines a dynamic cache reconfiguration scheme with a fuzzy-based control policy in a temperature-aware manner. The dynamic cache reconfiguration scheme determines the size of the cache for the processor core according to the application that reaches a substantial amount of power consumption savings. The fuzzy-based control policy is used to change the frequency level of the processor core based on dynamic cache reconfiguration, a process which can further improve the system performance. Experiments show that, compared with other thermal management schemes, the proposed FBTM can achieve, on average, 3 degrees of reduction in temperature and a 41% reduction of leakage energy.


2022 ◽  
Vol 15 (2) ◽  
pp. 1-27
Author(s):  
Andrea Damiani ◽  
Giorgia Fiscaletti ◽  
Marco Bacis ◽  
Rolando Brondolin ◽  
Marco D. Santambrogio

“Cloud-native” is the umbrella adjective describing the standard approach for developing applications that exploit cloud infrastructures’ scalability and elasticity at their best. As the application complexity and user-bases grow, designing for performance becomes a first-class engineering concern. As an answer to these needs, heterogeneous computing platforms gained widespread attention as powerful tools to continue meeting SLAs for compute-intensive cloud-native workloads. We propose BlastFunction, an FPGA-as-a-Service full-stack framework to ease FPGAs’ adoption for cloud-native workloads, integrating with the vast spectrum of fundamental cloud models. At the IaaS level, BlastFunction time-shares FPGA-based accelerators to provide multi-tenant access to accelerated resources without any code rewriting. At the PaaS level, BlastFunction accelerates functionalities leveraging the serverless model and scales functions proactively, depending on the workload’s performance. Further lowering the FPGAs’ adoption barrier, an accelerators’ registry hosts accelerated functions ready to be used within cloud-native applications, bringing the simplicity of a SaaS-like approach to the developers. After an extensive experimental campaign against state-of-the-art cloud scenarios, we show how BlastFunction leads to higher performance metrics (utilization and throughput) against native execution, with minimal latency and overhead differences. Moreover, the scaling scheme we propose outperforms the main serverless autoscaling algorithms in workload performance and scaling operation amount.


Sign in / Sign up

Export Citation Format

Share Document