Real-Time Image and Video Processing Using High-Level Synthesis (HLS)

Author(s):  
Murad Qasaimeh ◽  
Ehab Najeh Salahat

Implementing high-performance, low-cost hardware accelerators for computationally intensive image and video processing algorithms has attracted considerable attention over the last 20 years. Most recent research efforts have sought new design automation methods to close the gap between the ability to realize efficient accelerators in hardware and the tight performance requirements of complex image processing algorithms. High-level synthesis (HLS) is a new method that automates the design process by transforming a high-level algorithmic description into digital hardware while satisfying the design constraints. This chapter evaluates the suitability of HLS as a tool for accelerating the most demanding image and video processing algorithms in hardware. It discusses the benefits and current limitations, recent academic and commercial tools, the compiler's optimization techniques, and four case studies.

2018 ◽  
pp. 1004-1022

2020 ◽  
Vol 18 (02) ◽  
pp. 311-318
Author(s):  
Carlos Alejandro Perez ◽  
Mario Sergio Cleva ◽  
Diego Orlando Liska ◽  
Dominga Concepcion Aquino ◽  
Claudio Rodrigues da Fonseca

2015 ◽  
Vol 2015 ◽  
pp. 1-16
Author(s):  
Muhammad Asif ◽  
Imtiaz A. Taj ◽  
S. M. Ziauddin ◽  
Maaz Bin Ahmad ◽  
M. Tahir

One of the key requirements for mobile devices is to provide high-performance computing at low power consumption. The processors used in these devices provide specific hardware resources to handle computationally intensive video processing and interactive graphical applications. Moreover, processors designed for low-power applications may limit the availability and usage of resources, which presents additional challenges to system designers. Owing to the specific design of the JZ47x series of mobile application processors, a hybrid software-hardware implementation scheme for an H.264/AVC encoder is proposed in this work. The proposed scheme distributes the encoding tasks among hardware and software modules. A series of optimization techniques is developed to speed up memory access and data transfer between memories. Moreover, an efficient data reuse design is proposed for the deblocking filter video processing unit to reduce memory accesses. Furthermore, fine-grained macroblock (MB) level parallelism is exploited, and a pipelined approach is proposed for efficient utilization of the hardware processing cores. Finally, based on the parallelism in the proposed design, encoding tasks are distributed between two processing cores. Experiments show that, owing to the proposed techniques, the hybrid encoder is 12 times faster than a highly optimized sequential encoder.


2018 ◽  
pp. 1133-1154
Author(s):  
Ahmed Abouelfarag ◽  
Marwa Ali Elshenawy ◽  
Esraa Alaaeldin Khattab

Computer vision now plays an important role in many essential human-computer interaction applications. These applications are subject to a “real-time” constraint and therefore require a fast and reliable computational system. Edge detection is the most widely used approach for segmenting images based on changes in intensity. Various kernels are used to perform edge detection, such as Sobel, Roberts, and Prewitt, of which Sobel is the most commonly used. In this research, a novel type of operator cell that performs addition is introduced to achieve computational acceleration. The novel operator cells have been deployed on the chosen FPGA platform, the ZedBoard, which is well suited for real-time image and video processing. The Sobel edge detection technique is accelerated using different tools, such as the High-Level Synthesis tools provided by Vivado. This enhancement decreases the computational time by 26% compared with conventional adder cells.
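For readers unfamiliar with the Sobel operator mentioned in this abstract, the following is a minimal software sketch of the conventional technique. It is not the authors' implementation and does not model their novel adder cells. The function name `sobel_edges`, the row-major 8-bit image layout, the |Gx| + |Gy| magnitude approximation, and the zeroed border are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical helper: conventional Sobel edge strength for a row-major,
// 8-bit grayscale image. The magnitude is approximated as |Gx| + |Gy| and
// saturated to 255. Border pixels are left at 0.
std::vector<std::uint8_t> sobel_edges(const std::vector<std::uint8_t>& img,
                                      int width, int height) {
    assert(width > 0 && height > 0);
    assert(img.size() == static_cast<std::size_t>(width) * height);

    // Standard 3x3 Sobel kernels for horizontal and vertical gradients.
    static const int kx[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
    static const int ky[3][3] = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};

    std::vector<std::uint8_t> out(img.size(), 0);
    for (int y = 1; y + 1 < height; ++y) {
        for (int x = 1; x + 1 < width; ++x) {
            int gx = 0;
            int gy = 0;
            // Accumulate both gradients over the 3x3 neighbourhood.
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    const int p = img[(y + dy) * width + (x + dx)];
                    gx += kx[dy + 1][dx + 1] * p;
                    gy += ky[dy + 1][dx + 1] * p;
                }
            }
            const int mag = std::abs(gx) + std::abs(gy);
            out[y * width + x] = static_cast<std::uint8_t>(mag > 255 ? 255 : mag);
        }
    }
    return out;
}
```

The inner accumulation is dominated by additions, which is presumably why faster adder cells can shorten the overall computation time.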



2021 ◽  
Vol 29 (2) ◽  
Author(s):  
Panadda Solod ◽  
Nattha Jindapetch ◽  
Kiattisak Sengchuai ◽  
Apidet Booranawong ◽  
Pakpoom Hoyingcharoen ◽  
...  

In this work, we propose High-Level Synthesis (HLS) optimization processes to improve the speed and resource usage of complex algorithms, especially those with nested loops. The proposed HLS optimization processes are divided into four steps: array sizing, to decrease resource usage on the Programmable Logic (PL) part; loop analysis, to determine which loops should be unrolled or pipelined; array partitioning, to resolve the bottlenecks of loop unrolling and loop pipelining; and HLS interface selection, to choose the best block-level and port-level interfaces for the array arguments of the RTL design. A road lane detection case study was analyzed and implemented with suitable optimization techniques on the Xilinx Zynq-7000 family (Zybo ZC7010-1), a low-cost FPGA. In the experimental results, the proposed method is 6.66 times faster than the primitive method at a clock frequency of 100 MHz, or about 6 FPS. Although the proposed methods cannot reach standard real-time performance (25 FPS), they can guide HLS developers in increasing speed and reducing resource usage on an FPGA.
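The loop and array optimizations named in this abstract can be illustrated with a small sketch. It is not the authors' lane detection code: the kernel `fir_row`, the sizes `N` and `TAPS`, and the choice of directives are hypothetical, and the pragma spellings assume Vivado/Vitis HLS syntax. An ordinary C++ compiler ignores the unknown pragmas, so the function also runs as plain software.

```cpp
#include <cassert>
#include <cstddef>

constexpr int N = 64;    // hypothetical row length (fixed "array sizing")
constexpr int TAPS = 4;  // hypothetical filter length

// Hypothetical kernel: weighted sum over a sliding window of TAPS samples.
void fir_row(const int in[N], const int coeff[TAPS], int out[N]) {
    assert(in && coeff && out);  // software-only sanity check

    int window[TAPS] = {0, 0, 0, 0};
    // Array partitioning: split the small arrays into individual registers
    // so that every tap can be read in the same clock cycle.
#pragma HLS ARRAY_PARTITION variable=window complete dim=1
#pragma HLS ARRAY_PARTITION variable=coeff complete dim=1

    for (int i = 0; i < N; ++i) {
        // Loop pipelining: request one new input sample per clock cycle.
#pragma HLS PIPELINE II=1
        for (int t = TAPS - 1; t > 0; --t) {
            // Loop unrolling: shift the whole window in parallel.
#pragma HLS UNROLL
            window[t] = window[t - 1];
        }
        window[0] = in[i];

        int acc = 0;
        for (int t = 0; t < TAPS; ++t) {
            // Loop unrolling: compute all products in parallel.
#pragma HLS UNROLL
            acc += coeff[t] * window[t];
        }
        out[i] = acc;
    }
}
```

Without the partitioning directives, a tool would typically map `window` to a memory with limited ports, which is the kind of bottleneck for unrolling and pipelining that the abstract describes.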


2019 ◽  
Vol 18 (02) ◽  
pp. 311-318
Author(s):  
Carlos Alejandro Perez ◽  
Mario Sergio Cleva ◽  
Diego Orlando Liska ◽  
Dominga Concepcion Aquino ◽  
Claudio Rodrigues da Fonseca

Electronics ◽  
2020 ◽  
Vol 9 (8) ◽  
pp. 1275
Author(s):  
Changdao Du ◽  
Yoshiki Yamaguchi

Due to performance and energy requirements, FPGA-based accelerators have become a promising solution for high-performance computations. Meanwhile, with the help of high-level synthesis (HLS) compilers, FPGAs can be programmed using common programming languages such as C, C++, or OpenCL, thereby improving design efficiency and portability. Stencil computations are significant kernels in various scientific applications. In this paper, we introduce an architecture design for implementing stencil kernels on a state-of-the-art FPGA with high bandwidth memory (HBM). Traditional FPGAs are usually equipped with external memory, e.g., DDR3 or DDR4, which limits the design space exploration in the spatial domain of stencil kernels. Therefore, many previous studies relied mainly on exploiting parallelism in the temporal domain to eliminate the bandwidth limitations. In our approach, we scale up the design performance by considering both the spatial and temporal parallelism of the stencil kernel equally. We also discuss the design portability among different HLS compilers. We use typical stencil kernels to evaluate our design on a Xilinx U280 FPGA board and compare the results with other existing studies. By adopting our method, developers can take broad parallelization strategies based on specific FPGA resources to improve performance.
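As a rough illustration of what a stencil kernel is and where its spatial and temporal parallelism come from, here is a minimal software sketch. It is not the architecture from the paper: the 1D 3-point averaging stencil, the fixed boundary cells, and the names `stencil_step` and `stencil_run` are assumptions chosen for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One time step of a hypothetical 1D 3-point averaging stencil.
// The first and last cells are treated as fixed boundary values.
std::vector<float> stencil_step(const std::vector<float>& in) {
    std::vector<float> out(in);
    for (std::size_t i = 1; i + 1 < in.size(); ++i) {
        // Each output cell depends only on the previous time step, so all
        // cells are independent of one another: spatial parallelism.
        out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0f;
    }
    return out;
}

// Apply `steps` consecutive time steps. In a hardware design, each step
// could be a separate pipeline stage fed by the previous one, which is the
// temporal parallelism referred to above.
std::vector<float> stencil_run(std::vector<float> grid, int steps) {
    assert(steps >= 0);
    for (int s = 0; s < steps; ++s) {
        grid = stencil_step(grid);
    }
    return grid;
}
```

Widening the inner loop into several lanes needs more memory bandwidth per cycle, while chaining time steps reuses data already on chip. This is the trade-off between the spatial and temporal domains that the abstract discusses.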

