An FPGA-Based LOCO-ANS Implementation for Lossless and Near-Lossless Image Compression Using High-Level Synthesis

In this work, we present and evaluate a hardware architecture for the LOCO-ANS (Low Complexity Lossless Compression with Asymmetric Numeral Systems) lossless and near-lossless image compressor, which is based on the JPEG-LS standard. The design is implemented on two FPGA generations, and its performance is evaluated for different codec configurations. The tests show that the design is capable of up to 40.5 MPixels/s and 124 MPixels/s per lane on Zynq 7020 and UltraScale+ FPGAs, respectively. Compared to the single-threaded LOCO-ANS software implementation running on a 1.2 GHz Raspberry Pi 3B, each hardware lane achieves 6.5 times higher throughput, even when implemented on an older, cost-optimized chip like the Zynq 7020. Results are also presented for a lossless-only version, which has a smaller footprint and approximately 50% higher performance than the version that supports both lossless and near-lossless compression. Notably, these results were obtained with High-Level Synthesis, describing the coder in C++, an approach that tends to involve a trade-off between design time and quality of results. The results indicate that the algorithm is well suited to hardware implementation. Moreover, the implemented system is faster and achieves higher compression than the best previously available near-lossless JPEG-LS hardware implementation.

Using Global Code Motions to Improve the Quality of Results for High-Level Synthesis

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2004, Vol 23 (2), pp. 302-312

DOI: 10.1109/tcad.2003.822105

Cited By ~ 43

Author(s): S. Gupta, N. Savoiu, N. Dutt, R. Gupta, A. Nicolau

Keyword(s): High Level Synthesis, Quality Of Results, High Level

Fast and Accurate Estimation of Quality of Results in High-Level Synthesis with Machine Learning

2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2018

DOI: 10.1109/fccm.2018.00029

Cited By ~ 15

Author(s): Steve Dai, Yuan Zhou, Hang Zhang, Ecenur Ustun, Evangeline F.Y. Young, ...

Keyword(s): Machine Learning, High Level Synthesis, Accurate Estimation, Quality Of Results, High Level

Model Based Design needs high level synthesis - A collection of high level synthesis techniques to improve productivity and quality of results for model based electronic design

2009 Design, Automation & Test in Europe Conference & Exhibition, 2009

DOI: 10.1109/date.2009.5090845

Cited By ~ 8

Author(s): S. Perry

Keyword(s): High Level Synthesis, Quality Of Results, Synthesis Techniques, Model Based, High Level, Electronic Design

Efficient Hardware Implementation of PQC Primitives and PQC algorithms Using High-Level Synthesis

2021

DOI: 10.1109/isvlsi51109.2021.00061

Author(s): Deepraj Soni, Ramesh Karri

Keyword(s): Hardware Implementation, High Level Synthesis, High Level

An Efficient Hardware Implementation of TimSort and MergeSort Algorithms Using High Level Synthesis

2017 International Conference on High Performance Computing & Simulation (HPCS), 2017

DOI: 10.1109/hpcs.2017.92

Cited By ~ 1

Author(s): Yomna Ben Jmaa, Karim M. A. Ali, David Duvivier, Maher Ben Jemaa, Rabie Ben Atitallah

Keyword(s): Hardware Implementation, High Level Synthesis, High Level

A hardware implementation for real-time lane detection using high-level synthesis

2018 International Workshop on Advanced Image Technology (IWAIT), 2018

DOI: 10.1109/iwait.2018.8369730

Cited By ~ 1

Author(s): Chanon Khongprasongsiri, Pinit Kumhom, Watcharapan Suwansantisuk, Teerasak Chotikawanid, Surachate Chumpol, ...

Keyword(s): Real Time, Hardware Implementation, Lane Detection, High Level Synthesis, High Level

LP-HLS: Automatic power-intent generation for high-level synthesis based hardware implementation flow

Microprocessors and Microsystems, 2017, Vol 50, pp. 26-38

DOI: 10.1016/j.micpro.2017.02.002

Cited By ~ 11

Author(s): Affaq Qamar, Fahad Bin Muslim, Javed Iqbal, Luciano Lavagno

Keyword(s): Hardware Implementation, High Level Synthesis, High Level

Power Efficient Rapid Design Space Exploration of Integrated Scheduling and Module Selection in High Level Synthesis

2021

DOI: 10.32920/ryerson.14644968

Author(s): Pallabi Sarkar

Keyword(s): Power Consumption, Design Space Exploration, Design Space, Space Exploration, High Level Synthesis, Optimal Solutions, Power Efficient, Pi Method, High Level

High-level synthesis (HLS), or Electronic System Level (ESL) synthesis, requires scheduling algorithms that can reach optimal or near-optimal solutions quickly and accurately. This thesis presents a novel power-efficient scheduling approach, the ‘PI’ method, which reduces the final power consumption of the solution at the expense of minimal additional latency (clock cycles). The proposed approach is based on the ‘Priority indicator (PI)’ metric and the ‘Intersect Matrix’ topology, which tend to escape locally optimal solutions and thereby reach global solutions. Applying the approach distributes the allocated hardware functional units evenly, yielding power-efficient scheduling solutions. The two main novel aspects of the thesis are: a) the introduction of the ‘Intersect Matrix’ topology and its associated algorithm, which is used to check for precedence violations during scheduling; and b) the introduction of the PI method, which uses the Priority indicator metric to help choose the highest-priority node in each iteration of the scheduling optimization process. The proposed approach is compared with an existing design space exploration method using the proposed ‘Quality Cost Factor (Q-metric)’. The Q-metric combines the latency and power consumption of the solution found, and expresses the quality of the final solution in terms of cost for both the proposed and existing approaches. Compared to a current scheduling approach on the DSP benchmarks, the proposed approach achieves an average improvement of approximately 12% in the quality of the final solution and an average reduction of 59% in runtime.