ProACt: Hardware Architecture for Cross-Layer Approximate Computing

The new Versatile Video Coding (VVC) standard was recently developed to improve compression efficiency of previous video coding standards and to support new applications. This was achieved at the cost of an increase in the computational complexity of the encoder algorithms, which leads to the need to develop hardware accelerators and to apply approximate computing techniques to achieve the performance and power dissipation required for systems that encode video. This work proposes the implementation of an approximate hardware architecture for interpolation filters defined in the VVC standard targeting real-time processing of high resolution videos. The architecture is able to process up to 2560x1600 pixels videos at 30 fps with power dissipation of 23.9 mW when operating at a frequency of 522 MHz, with an average compression efficiency degradation of only 0.41% compared to default VVC video encoder software configuration.

Download Full-text

Plasticine: A Cross-layer Approximation Methodology for Multi-kernel Applications through Minimally Biased, High-throughput, and Energy-efficient SIMD Soft Multiplier-divider

ACM Transactions on Design Automation of Electronic Systems ◽

10.1145/3486616 ◽

2022 ◽

Vol 27 (2) ◽

pp. 1-33

Author(s):

Zahra Ebrahimi ◽

Dennis Klar ◽

Mohammad Aasim Ekhtiyar ◽

Akash Kumar

Keyword(s):

High Throughput ◽

Performance Metrics ◽

Synergistic Effects ◽

Rapid Evolution ◽

Future Research ◽

Cross Layer ◽

Approximate Computing ◽

Approximation Techniques ◽

End To End ◽

A Chain

The rapid evolution of error-resilient programs intertwined with their quest for high throughput has motivated the use of Single Instruction, Multiple Data (SIMD) components in Field-Programmable Gate Arrays (FPGAs). Particularly, to exploit the error-resiliency of such applications, Cross-layer approximation paradigm has recently gained traction, the ultimate goal of which is to efficiently exploit approximation potentials across layers of abstraction. From circuit- to application-level, valuable studies have proposed various approximation techniques, albeit linked to four drawbacks: First, most of approximate multipliers and dividers operate only in SISD mode. Second, imprecise units are often substituted, merely in a single kernel of a multi-kernel application, with an end-to-end analysis in Quality of Results (QoR) and not in the gained performance. Third, state-of-the-art (SoA) strategies neglect the fact that each kernel contributes differently to the end-to-end QoR and performance metrics. Therefore, they lack in adopting a generic methodology for adjusting the approximation knobs to maximize performance gains for a user-defined quality constraint. Finally, multi-level techniques lack in being efficiently supported, from application-, to architecture-, to circuit-level, in a cohesive cross-layer hierarchy. In this article, we propose Plasticine , a cross-layer methodology for multi-kernel applications, which addresses the aforementioned challenges by efficiently utilizing the synergistic effects of a chain of techniques across layers of abstraction. To this end, we propose an application sensitivity analysis and a heuristic that tailor the precision at constituent kernels of the application by finding the most tolerable degree of approximations for each of consecutive kernels, while also satisfying the ultimate user-defined QoR. The chain of approximations is also effectively enabled in a cross-layer hierarchy, from application- to architecture- to circuit-level, through the plasticity of SIMD multiplier-dividers, each supporting dynamic precision variability along with hybrid functionality. The end-to-end evaluations of Plasticine on three multi-kernel applications employed in bio-signal processing, image processing, and moving object tracking for Unmanned Air Vehicles (UAV) demonstrate 41%–64%, 39%–62%, and 70%–86% improvements in area, latency, and Area-Delay-Product (ADP), respectively, over 32-bit fixed precision, with negligible loss in QoR. To springboard future research in reconfigurable and approximate computing communities, our implementations will be available and open-sourced at https://cfaed.tu-dresden.de/pd-downloads.

Download Full-text