A Scalable High-Performance Hardware Architecture for Real-Time Stereo Vision by Semi-Global Matching

Author(s):  
Jaco Hofmann ◽  
Jens Korinth ◽  
Andreas Koch
2019 ◽  
Vol 17 (5) ◽  
pp. 1447-1468 ◽  
Author(s):  
Lucas F. S. Cambuim ◽  
Luiz A. Oliveira ◽  
Edna N. S. Barros ◽  
Antonyus P. A. Ferreira

2012 ◽  
Vol 479-481 ◽  
pp. 2521-2524
Author(s):  
Guang Hua Chen ◽  
Wen Peng Su ◽  
Feng Jiao Wang ◽  
An Qi Wang ◽  
Wei Min Zeng ◽  
...  

Designing an H.264/AVC interpolation unit is challenging because of the high memory bandwidth and computational complexity introduced by two coding features: variable block size (VBS) and the 6-tap filter. This paper proposes a novel one-step interpolation algorithm that reduces processing cycles by requiring fewer memory accesses. In addition, a data reuse scheme saves both processing cycles and memory bandwidth. A high-performance hardware architecture is implemented based on these methods. It achieves a 26% reduction in memory bandwidth and a 45% reduction in processing cycles, which shows that the architecture is an efficient hardware acceleration solution suitable for real-time encoders.
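
For readers unfamiliar with the 6-tap filter mentioned above, the following is a minimal software sketch of standard H.264 half-sample luma interpolation along one row, using the taps (1, -5, 20, 20, -5, 1) with rounding and clipping to 8 bits. It is not the one-step algorithm or the hardware architecture of the paper; the function names and the edge-replication policy at the row borders are assumptions made for illustration.

```python
# Illustrative sketch only: 1-D half-sample interpolation with the H.264 6-tap filter.
# Function names and border handling (edge replication) are assumptions, not from the paper.

TAPS = (1, -5, 20, 20, -5, 1)  # H.264 luma half-sample filter taps, sum = 32


def clip8(value):
    """Clamp a value to the 8-bit sample range [0, 255]."""
    return max(0, min(255, value))


def half_pel_row(row):
    """Return the half-sample values located between row[i] and row[i + 1]."""
    n = len(row)
    out = []
    for i in range(n - 1):
        acc = 0
        for k, tap in enumerate(TAPS):
            # The filter window covers samples i-2 .. i+3; replicate edges when outside.
            j = min(max(i - 2 + k, 0), n - 1)
            acc += tap * row[j]
        # Round to nearest and divide by 32, then clip to 8 bits.
        out.append(clip8((acc + 16) >> 5))
    return out


if __name__ == "__main__":
    print(half_pel_row([0, 0, 0, 255, 255, 255]))
```

Each half-sample needs six neighbouring integer samples, which is why overlapping windows across adjacent blocks make data reuse attractive for reducing memory traffic.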


Sensors ◽  
2021 ◽  
Vol 21 (11) ◽  
pp. 3938
Author(s):  
Boitumelo Ruf ◽  
Jonas Mohrs ◽  
Martin Weinmann ◽  
Stefan Hinz ◽  
Jürgen Beyerer

With the emergence of low-cost robotic systems, such as UAVs, the importance of embedded high-performance image processing has increased. For a long time, FPGAs were the only processing hardware capable of high-performance computing while preserving the low power consumption that is essential for embedded systems. However, the increasing availability of embedded GPU-based systems, such as the NVIDIA Jetson series, which combines an ARM CPU with an NVIDIA Tegra GPU, allows for massively parallel embedded computing on graphics hardware. With this in mind, we propose an approach for real-time embedded stereo processing on ARM and CUDA-enabled devices, based on the widely used Semi-Global Matching algorithm. We optimize the algorithm for embedded CUDA GPUs by using massively parallel computing, and for embedded ARM CPUs by using NEON intrinsics for vectorized SIMD processing. We have evaluated our approach with different configurations on two public stereo benchmark datasets to demonstrate that they can reach an error rate as low as 3.3%. Furthermore, our experiments show that the fastest configuration of our approach reaches up to 46 FPS at VGA image resolution. Finally, in a use-case specific qualitative evaluation, we have evaluated the power consumption of our approach and deployed it on the DJI Manifold 2-G attached to a DJI Matrix 210v2 RTK UAV, demonstrating its suitability for real-time stereo processing onboard a UAV.
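
As background on the Semi-Global Matching algorithm named above, the following is a minimal sketch of its core step: aggregating matching costs along a single path (here left to right on one scanline) with a small penalty `p1` for disparity changes of one and a larger penalty `p2` for bigger jumps, followed by a winner-takes-all disparity choice. It is plain Python for readability and does not reflect the CUDA or NEON optimizations of the paper; the function names, the toy cost values, and the penalty values are assumptions made for illustration.

```python
# Illustrative sketch only: SGM cost aggregation along one path on one scanline.
# Names, toy costs, and penalties are assumptions, not taken from the paper.


def aggregate_left_to_right(cost, p1, p2):
    """Aggregate cost[x][d] along the left-to-right path and return L[x][d]."""
    ndisp = len(cost[0])
    agg = [list(cost[0])]  # the first pixel has no predecessor on this path
    for x in range(1, len(cost)):
        prev = agg[-1]
        prev_min = min(prev)
        row = []
        for d in range(ndisp):
            best = prev[d]                          # same disparity, no penalty
            if d > 0:
                best = min(best, prev[d - 1] + p1)  # disparity change of one
            if d + 1 < ndisp:
                best = min(best, prev[d + 1] + p1)
            best = min(best, prev_min + p2)         # any larger jump
            # Subtracting prev_min keeps the aggregated values bounded.
            row.append(cost[x][d] + best - prev_min)
        agg.append(row)
    return agg


def winner_takes_all(agg):
    """Pick the disparity with the lowest aggregated cost at each pixel."""
    return [min(range(len(row)), key=row.__getitem__) for row in agg]


if __name__ == "__main__":
    toy_cost = [[0, 10], [5, 4], [0, 10]]  # 3 pixels, 2 disparity candidates
    aggregated = aggregate_left_to_right(toy_cost, p1=3, p2=8)
    print(aggregated)
    print(winner_takes_all(aggregated))
```

In the toy input, the raw cost of the middle pixel slightly favors disparity 1, while the aggregated cost favors disparity 0, in line with its neighbors. A full implementation sums such path costs over several directions; because the per-disparity updates at a given pixel are independent, they map naturally onto SIMD lanes or GPU threads.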


2011 ◽  
Vol 2011 ◽  
pp. 1-9 ◽  
Author(s):  
Marcel M. Corrêa ◽  
Mateus T. Schoenknecht ◽  
Robson S. Dornelles ◽  
Luciano V. Agostini

This paper presents a high-performance hardware architecture for H.264/AVC Half-Pixel Motion Estimation that targets high-definition videos. The design can process very high-definition videos such as QHDTV () in real time (30 frames per second). The paper also presents an optimized arrangement of interpolated samples, which is the key to achieving an efficient search. The interpolation process is interleaved with the SAD calculation and comparison, which allows high throughput. The architecture was fully described in VHDL and synthesized for two different Xilinx FPGA devices, and it achieved very good results when compared to related works.

