Exploring the Design Space of HEVC Inverse Transforms with Dataflow Programming

Author(s):  
Khoo Zhi Yion ◽  
Ab Al-Hadi Ab Rahman

<p>This paper presents the design space exploration of the hardware-based inverse fixed-point integer transform for High Efficiency Video Coding (HEVC). The designs are specified at high-level using CAL dataflow language and automatically synthesized to HDL for FPGA implementation. Several parallel design alternatives are proposed with trade-off between performance and resource. The HEVC transform consists of several independent components from 4x4 to 32x32 discrete cosine transform and 4x4 discrete sine transform. This work explores the strategies to efficiently compute the transforms by applying data parallelism on the different components. Results show that an intermediate version of parallelism, whereby the 4x4 and 8x8 are merged together, and the 16x16 and 32x32 merged together gives the best trade-off between performance and resource. The results presented in this work also give an insight on how the HEVC transform can be designed efficiently in parallel for hardware implementation.</p>

Author(s):  
Umar Ibrahim Minhas ◽  
Roger Woods ◽  
Georgios Karakonstantis

AbstractWhilst FPGAs have been used in cloud ecosystems, it is still extremely challenging to achieve high compute density when mapping heterogeneous multi-tasks on shared resources at runtime. This work addresses this by treating the FPGA resource as a service and employing multi-task processing at the high level, design space exploration and static off-line partitioning in order to allow more efficient mapping of heterogeneous tasks onto the FPGA. In addition, a new, comprehensive runtime functional simulator is used to evaluate the effect of various spatial and temporal constraints on both the existing and new approaches when varying system design parameters. A comprehensive suite of real high performance computing tasks was implemented on a Nallatech 385 FPGA card and show that our approach can provide on average 2.9 × and 2.3 × higher system throughput for compute and mixed intensity tasks, while 0.2 × lower for memory intensive tasks due to external memory access latency and bandwidth limitations. The work has been extended by introducing a novel scheduling scheme to enhance temporal utilization of resources when using the proposed approach. Additional results for large queues of mixed intensity tasks (compute and memory) show that the proposed partitioning and scheduling approach can provide higher than 3 × system speedup over previous schemes.


Author(s):  
Zsolt Lattmann ◽  
Adam Nagel ◽  
Jason Scott ◽  
Kevin Smyth ◽  
Chris vanBuskirk ◽  
...  

We describe the use of the Cyber-Physical Modeling Language (CyPhyML) to support trade studies and integration activities in system-level vehicle designs. CyPhyML captures parameterized component behavior using acausal models (i.e. hybrid bond graphs and Modelica) to enable automatic composition and synthesis of simulation models for significant vehicle subsystems. Generated simulations allow us to compare performance between different design alternatives. System behavior and evaluation are specified independently from specifications for design-space alternatives. Test bench models in CyPhyML are given in terms of generic assemblies over the entire design space, so performance can be evaluated for any selected design instance once automated design space exploration is complete. Generated Simulink models are also integrated into a mobility model for interactive 3-D simulation.


Author(s):  
Nicolas Albarello ◽  
Jean-Baptiste Welcomme

The design of systems architectures often involve a combinatorial design-space made of technological and architectural choices. A complete or large exploration of this design space requires the use of a method to generate and evaluate design alternatives. This paper proposes an innovative approach for the design-space exploration of systems architectures. The SAMOA (System Architecture Model-based OptimizAtion) tool associated to the method is also introduced. The method permits to create a large number of various system architectures combining a set of possible components to address given system functions. The method relies on models that are used to represent the problem and the solutions and to evaluate architecture performances. An algorithm first synthesizes design alternatives (a physical architecture associated to a functional allocation) based on the functional architecture of the system, the system interfaces, a library of available components and user-defined design rules. Chains of components are sequentially added to an initially empty architecture until all functions are fulfilled. The design rules permit to guarantee the viability and validity of the chains of components and, consequently, of the generated architectures. The design space exploration is then performed in a smart way through the use of an evolutionary algorithm, the evolution mechanisms of which are specific to system architecting. Evaluation modules permit to assess the performances of alternatives based on the structure of the architecture model and the data embedded in the component models. These performances are used to select the best generated architectures considering constraints and quality metrics. This selection is based on the Pareto-dominance-based NSGA-II algorithm or, alternatively, on an interactive preference-based algorithm. Iterating over this evolution-evaluation-selection process permits to increase the quality of solutions and, thus, to highlight the regions of interest of the design-space which can be used as a base for further manual investigations. By using this method, the system designers have a larger confidence in the optimality of the adopted architecture than using a classical derivative approach as many more solutions are evaluated. Also, the method permits to quickly evaluate the trade-offs between the different considered criteria. Finally, the method can also be used to evaluate the impact of a technology on the system performances not only by a substituting a technology by another but also by adapting the architecture of the system.


2021 ◽  
Author(s):  
Aakriti Tarun Sharma

The process of converting a behavioral specification of an application to its equivalent system architecture is referred to as High Level-Synthesis (HLS). A crucial stage in embedded systems design involves finding the trade off between resource utilization and performance. An exhaustive search would yield the required results, but would take a huge amount of time to arrive at the solution even for smaller designs. This would result in a high time complexity. We employ the use of Design Space Exploration (DSE) in order to reduce the complexity of the design space and to reach the desired results in less time. In reality, there are multiple constraints defined by the user that need to be satisfied simultaneously. Thus, the nature of the task at hand is referred to as Multi-Objective Optimization. In this thesis, the design process of DSP benchmarks was analyzed based on user defined constraints such as power and execution time. The analyzed outcome was compared with the existing approaches in DSE and an optimal design solution was derived in a shorter time period.


2014 ◽  
Vol 27 (2) ◽  
pp. 235-249 ◽  
Author(s):  
Anirban Sengupta ◽  
Reza Sedaghat ◽  
Vipul Mishra

Design space exploration is an indispensable segment of High Level Synthesis (HLS) design of hardware accelerators. This paper presents a novel technique for Area-Execution time tradeoff using residual load decoding heuristics in genetic algorithms (GA) for integrated design space exploration (DSE) of scheduling and allocation. This approach is also able to resolve issues encountered during DSE of data paths for hardware accelerators, such as accuracy of the solution found, as well as the total exploration time during the process. The integrated solution found by the proposed approach satisfies the user specified constraints of hardware area and total execution time (not just latency), while at the same time offers a twofold unified solution of chaining based schedule and allocation. The cost function proposed in the genetic algorithm approach takes into account the functional units, multiplexers and demultiplexers needed during implementation. The proposed exploration system (ExpSys) was tested on a large number of benchmarks drawn from the literature for assessment of its efficiency. Results indicate an average improvement in Quality of Results (QoR) greater than 26% when compared to a recent well known GA based exploration method.


Author(s):  
MyungJun Kim ◽  
Yung-Lyul Lee

High Efficiency Video Coding (HEVC) uses an 8-point filter and a 7-point filter, which are based on the discrete cosine transform (DCT), for the 1/2-pixel and 1/4-pixel interpolations, respectively. In this paper, discrete sine transform (DST)-based interpolation filters (IF) are proposed. The first proposed DST-based IFs (DST-IFs) use 8-point and 7-point filters for the 1/2-pixel and 1/4-pixel interpolations, respectively. The final proposed DST-IFs use 12-point and 11-point filters for the 1/2-pixel and 1/4-pixel interpolations, respectively. These DST-IF methods are proposed to improve the motion-compensated prediction in HEVC. The 8-point and 7-point DST-IF methods showed average BD-rate reductions of 0.7% and 0.3% in the random access (RA) and low delay B (LDB) configurations, respectively. The 12-point and 11-point DST-IF methods showed average BD-rate reductions of 1.4% and 1.2% in the RA and LDB configurations for the Luma component, respectively.


Sign in / Sign up

Export Citation Format

Share Document