Minimizing leakage power in aging-bounded high-level synthesis with design time multi-Vth assignment

Author(s):  
Yibo Chen ◽  
Yuan Xie ◽  
Yu Wang ◽  
Andres Takach
Author(s):  
Nan WANG ◽  
Song CHEN ◽  
Cong HAO ◽  
Haoran ZHANG ◽  
Takeshi YOSHIMURA

Electronics ◽  
2021 ◽  
Vol 10 (23) ◽  
pp. 2934
Author(s):  
Tobías Alonso ◽  
Gustavo Sutter ◽  
Jorge E. López de Vergara

In this work, we present and evaluate a hardware architecture for the LOCO-ANS (Low Complexity Lossless Compression with Asymmetric Numeral Systems) lossless and near-lossless image compressor, which is based on JPEG-LS standard. The design is implemented in two FPGA generations, evaluating its performance for different codec configurations. The tests show that the design is capable of up to 40.5 MPixels/s and 124 MPixels/s per lane for Zynq 7020 and UltraScale+ FPGAs, respectively. Compared to the single thread LOCO-ANS software implementation running in a 1.2 GHz Raspberry Pi 3B, each hardware lane achieves 6.5 times higher throughput, even when implemented in an older and cost-optimized chip like the Zynq 7020. Results are also presented for a lossless only version, which achieves a lower footprint and approximately 50% higher performance than the version that supports both lossless and near-lossless. Interestingly, these great results were obtained applying High-Level Synthesis, describing the coder with C++ code, which tends to establish a trade-off between design time and quality of results. These results show that the algorithm is very suitable for hardware implementation. Moreover, the implemented system is faster and achieves higher compression than the best previously available near-lossless JPEG-LS hardware implementation.


2019 ◽  
Vol 15 (4) ◽  
pp. 388-409
Author(s):  
Xiuyan Zhang ◽  
Ouwen Shi ◽  
Jian Xu ◽  
Shantanu Dutt

We present a power-driven hierarchical framework for module/functional-unit selection, scheduling, and binding in high level synthesis. A significant aspect of algorithm design for large and complex problems is arriving at tradeoffs between quality of solution and timing complexity. Towards this end, we integrate an improved version of the very runtime-efficient list scheduling algorithm called modified list scheduling (MLS) with a power-driven simulated annealing (SA) algorithm for module selection. Our hierarchical framework efficiently explores the problem solution space by an extensive exploration of the power-driven module-selection solution space via SA, and for each module selection solution, uses MLS to obtain a scheduling and (integrated) binding (S&B) solution in which the binding is either a regular one (minimizing number of FUs and thus FU leakage power) or power-driven with mux/demux power considerations. This framework avoids the very runtime intensive exploration of both module selection and S&B within a conventional SA algorithm, but retains the basic prowess of SA by exploring only the important aspect of power-driven module-selection in a stochastic manner. The proposed hierarchical framework provides an average of 9.5% FU leakage power improvement over state of the art (approximate) algorithms that optimize only FU leakage power, and has a smaller runtime by factors of 2.5–3x. Further, compared to a sophisticated flat simulated annealing framework and an optimal 0/1-ILP formulation for total (dynamic and leakage) FU and architecture power optimization under latency constraints, PSA-MLS provides an improvement of 5.3–5.8% with a runtime advantage of 2x, and has an average optimality gap of only 4.7–4.8% with a significant runtime advantage of a factor of more than 1900, respectively.


Sign in / Sign up

Export Citation Format

Share Document