Power Efficient High-Level Synthesis by Centralized and Fine-Grained Clock Gating

Author(s):  
Mohsen Riahi Alam ◽  
Mostafa Ersali Salehi Nasab ◽  
Sied Mehdi Fakhraie
Electronics ◽  
2021 ◽  
Vol 10 (1) ◽  
pp. 73
Author(s):  
Francesco Ratto ◽  
Tiziana Fanni ◽  
Luigi Raffo ◽  
Carlo Sau

With the diffusion of cyber-physical systems and the internet of things, adaptivity and low power consumption have become of primary importance in digital systems design. Reconfigurable heterogeneous platforms seem to be one of the most suitable choices to cope with such a challenging context. However, their development and power optimization are not trivial, especially considering hardware acceleration components. On the one hand, high-level synthesis could simplify the design of such systems, but on the other hand it can limit the positive effects of the adopted power-saving techniques. In this work, the mutual impact of different high-level synthesis tools and the application of the well-known clock gating strategy in the development of reconfigurable accelerators is studied. The aim is to optimize the application of clock gating according to the chosen high-level synthesis engine and target technology (Application Specific Integrated Circuit (ASIC) or Field Programmable Gate Array (FPGA)). Different levels of application of clock gating are evaluated, including a novel multi-level solution. Besides assessing the benefits and drawbacks of applying clock gating at different levels, hints for future design automation of low-power reconfigurable accelerators through high-level synthesis are also derived.


2003 ◽  
Vol 12 (01) ◽  
pp. 1-17
Author(s):  
Sungpack Hong ◽  
Taewhan Kim

Sub-micron feature sizes have resulted in a considerable portion of power being dissipated on the buses, drawing increased attention to power savings at the behavioral level and the RT level of design. This paper addresses the problem of minimizing the power dissipated in the switching of the buses in the high-level synthesis of data-dominated behavioral descriptions. Unlike previous approaches, in which the minimization of the power consumed in buses is not considered until operation scheduling is completed, our approach integrates the bus binding problem into scheduling to exploit the impact of scheduling on the reduction of power dissipated on the buses more fully and effectively. We accomplish this by formulating the problem as a flow problem in a network, and devising an efficient algorithm that iteratively finds maximum-flow, minimum-cost solutions in the network. Experimental results on a number of benchmark problems show that, given resource and global timing constraints, our designs are 19.8% more power-efficient than the designs produced by a random-move based solution, and 15.5% more power-efficient than the designs produced by a clock-step based optimal solution.
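
As a rough illustration of why the order of bus transfers matters for switching power, the sketch below counts bit transitions on a single shared bus for two orderings of the same values. It is not the network-flow algorithm from the abstract: the data values are hypothetical, data dependencies are ignored, and the best ordering is found by brute force.

```python
from itertools import permutations


def hamming(a: int, b: int) -> int:
    """Number of bit positions that differ between two bus words."""
    return bin(a ^ b).count("1")


def bus_toggles(words) -> int:
    """Total bit transitions when the words are driven onto one bus in order."""
    return sum(hamming(x, y) for x, y in zip(words, words[1:]))


# Hypothetical 8-bit values that must cross a shared bus (assumption, not from the paper).
transfers = [0b00001111, 0b11110000, 0b00001110, 0b11110001]

# Transitions if the values are sent in the order listed above.
naive = bus_toggles(transfers)

# Brute-force search over all orderings; only feasible for tiny examples.
best_order = min(permutations(transfers), key=bus_toggles)
best = bus_toggles(best_order)

print(f"listed order: {naive} toggles, best order: {best} toggles")
```

Grouping similar values next to each other cuts the transition count sharply in this toy case, which is the effect a scheduler can exploit when bus binding is decided together with operation scheduling.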


2021 ◽  
Author(s):  
Pallabi Sarkar

High-Level Synthesis (HLS) or Electronic System Level (ESL) synthesis requires scheduling algorithms that can reach optimal or near-optimal solutions quickly and accurately. A novel power-efficient scheduling approach using the ‘PI’ method is presented in this thesis that reduces the final power consumption of the solution at the expense of a minimal number of latency clock cycles. The proposed scheduling approach is based on the ‘Priority indicator (PI)’ metric and ‘Intersect Matrix’ topology methods, which tend to escape local optima and thereby reach global solutions. Application of the proposed approach results in an even distribution of allocated hardware functional units, thereby yielding power-efficient scheduling solutions. The two main novel and significant aspects of the thesis are: a) the introduction of the ‘Intersect Matrix’ topology with its associated algorithm, which is used to check for precedence violations during scheduling; b) the introduction of the PI method using the Priority indicator metric, which assists in choosing the highest-priority node during each iteration of the scheduling optimization process. The proposed approach is compared with an existing design space exploration method for qualitative assessment using the proposed ‘Quality Cost Factor (Q-metric)’. This Q-metric is a combination of the latency and power consumption values of the solution found, and it expresses the quality of the final solutions in terms of cost for both the proposed and existing approaches. An average improvement of approximately 12% in the quality of the final solution and an average reduction of 59% in runtime have been achieved by the proposed approach compared to a current scheduling approach for the DSP benchmarks.


2014 ◽  
Vol 7 (0) ◽  
pp. 74-80
Author(s):  
Hiroyuki Akasaka ◽  
Shin-ya Abe ◽  
Masao Yanagisawa ◽  
Nozomu Togawa

2019 ◽  
Vol 5 (1) ◽  
pp. 7
Author(s):  
Paulo Garcia ◽  
Deepayan Bhowmik ◽  
Robert Stewart ◽  
Greg Michaelson ◽  
Andrew Wallace

Memory is the biggest limiting factor to the widespread use of FPGAs for high-level image processing, which requires complete frame(s) to be stored in situ. Since FPGAs have limited on-chip memory capabilities, efficient use of such resources is essential to meet performance, size and power constraints. In this paper, we investigate the allocation of on-chip memory resources in order to minimize resource usage and power consumption, contributing to the realization of power-efficient high-level image processing fully contained on FPGAs. We propose methods for generating memory architectures, from both Hardware Description Language and High-Level Synthesis designs, which minimize memory usage and power consumption. Based on a formalization of on-chip memory configuration options and a power model, we demonstrate how our partitioning algorithms can outperform traditional strategies. Compared to commercial FPGA synthesis and High-Level Synthesis tools, our results show that the proposed algorithms can result in up to 60% higher utilization efficiency, increasing the sizes and/or number of frames that can be accommodated, and reduce frame buffers’ dynamic power consumption by up to approximately 70%. In our experiments using Optical Flow and MeanShift Tracking, two representative high-level algorithms, data show that partitioning algorithms can reduce total power by up to 25% and 30%, respectively, without impacting performance.
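
To make the idea of on-chip memory configuration options concrete, the sketch below counts how many block RAMs a frame buffer needs under several depth-by-width configurations and picks the cheapest. The configuration list, the 18,432-bit block capacity, and the 320x240 8-bit frame are assumptions chosen for illustration; they are not taken from the paper, and the paper's partitioning algorithms and power model are not reproduced here.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def blocks_needed(depth: int, width: int, cfg_depth: int, cfg_width: int) -> int:
    """Blocks required to tile a depth x width buffer with one block configuration."""
    return ceil_div(depth, cfg_depth) * ceil_div(width, cfg_width)


# Assumed (depth, width) options for one block RAM; illustrative only.
CONFIGS = [(16384, 1), (8192, 2), (4096, 4), (2048, 9), (1024, 18), (512, 36)]
BLOCK_BITS = 18432  # assumed block capacity in bits

# Hypothetical frame buffer: 320 x 240 pixels, 8 bits per pixel.
depth, width = 320 * 240, 8

# Block count for every configuration, then the smallest count.
counts = {cfg: blocks_needed(depth, width, *cfg) for cfg in CONFIGS}
best_blocks = min(counts.values())
utilization = (depth * width) / (best_blocks * BLOCK_BITS)

for cfg, n in counts.items():
    print(f"{cfg[0]:>5} x {cfg[1]:<2} -> {n} blocks")
print(f"best: {best_blocks} blocks, utilization {utilization:.1%}")
```

Even this uniform tiling shows a wide spread in block count across configurations; mixing configurations within one buffer, as a partitioning algorithm might, can reduce the wasted capacity further.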

