Data partitioning for parallel implementation of real-time video processing systems

For real-time applications, an efficient VLSI implementation of the discrete wavelet transform (DWT) is desirable. This paper presents a DWT architecture based on retiming for pipelining and unfolding. The architecture builds on the lifting-based one-dimensional Cohen-Daubechies-Feauveau (CDF) (5,3) wavelet filter, which extends easily to a 2-D implementation, and it consists of low-complexity, easily repeatable components. The paper focuses on minimizing the critical path and optimizing throughput at the same time. The architecture has been implemented on a Xilinx Virtex 6 FPGA platform. The implementation results show that the critical path is reduced by a factor of four to five while throughput is doubled, making the overall architecture approximately ten times faster than the conventional lifting-based DWT architecture. Furthermore, with the parallel implementation, throughput doubles without any increase in the number of row buffers, which implies that the architecture is also memory efficient. The even and odd rows of the image are scanned in parallel. A 2-D DWT of a 15-megapixel image takes 16.86 ms, which corresponds to about 59 images of that size per second (1 / 0.01686 s ≈ 59.3). The architecture can therefore be used for real-time video processing, even for high-resolution video.
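The abstract names the lifting-based CDF (5,3) filter but gives no equations. As a rough software illustration only, the sketch below shows one level of the 1-D transform in its common integer (reversible) form: a predict step followed by an update step, with symmetric extension at the borders. The function names, the integer rounding convention, and the even-length restriction are assumptions made for this example; it does not model the retimed, unfolded hardware architecture described above.

```python
def cdf53_forward(x):
    """One level of an integer CDF (5,3) lifting transform (assumed rounding form)."""
    if len(x) < 2 or len(x) % 2:
        raise ValueError("expected an even-length sequence with at least 2 samples")
    even, odd = x[0::2], x[1::2]
    n = len(odd)
    # Predict: detail = odd sample minus the floored mean of its even neighbours.
    # At the right border the last even sample is mirrored (symmetric extension).
    detail = [odd[i] - ((even[i] + even[min(i + 1, n - 1)]) >> 1) for i in range(n)]
    # Update: approximation = even sample plus a rounded quarter of the two
    # neighbouring details. At the left border the first detail is mirrored.
    approx = [even[i] + ((detail[max(i - 1, 0)] + detail[i] + 2) >> 2) for i in range(n)]
    return approx, detail


def cdf53_inverse(approx, detail):
    """Undo cdf53_forward by running the lifting steps in reverse order."""
    n = len(detail)
    even = [approx[i] - ((detail[max(i - 1, 0)] + detail[i] + 2) >> 2) for i in range(n)]
    odd = [detail[i] + ((even[i] + even[min(i + 1, n - 1)]) >> 1) for i in range(n)]
    x = [0] * (2 * n)
    x[0::2], x[1::2] = even, odd
    return x


if __name__ == "__main__":
    samples = [12, 15, 9, 7, 30, 31, 2, 0]
    low, high = cdf53_forward(samples)
    print(low, high)
    print(cdf53_inverse(low, high) == samples)
```

Because each lifting step only adds or subtracts a function of the other half of the samples, the inverse recovers the input exactly. A 2-D transform can be obtained by applying the same routine along rows and then along columns.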

Parallel implementation of background subtraction algorithms for real-time video processing on a supercomputer platform

Journal of Real-Time Image Processing ◽ 10.1007/s11554-012-0310-5 ◽ 2012 ◽ Vol 11 (1) ◽ pp. 111-125 ◽ Cited By ~ 13

Author(s): Grzegorz Szwoch ◽ Damian Ellwart ◽ Andrzej Czyżewski

Keyword(s): Real Time ◽ Video Processing ◽ Background Subtraction ◽ Parallel Implementation

Real-time parallel implementation of road traffic radar video processing algorithms on a parallel architecture based on DSP and ARM processors

2015 15th International Conference on Intelligent Systems Design and Applications (ISDA) ◽ 10.1109/isda.2015.7489222 ◽ 2015 ◽ Cited By ~ 2

Author(s): Abdessamad Klilou ◽ Francois Bourzeix ◽ Omar Bourja ◽ Yahya Zennayi ◽ Lhoussein Mabrouk ◽ ...

Keyword(s): Real Time ◽ Video Processing ◽ Road Traffic ◽ Parallel Implementation ◽ Parallel Architecture ◽ Processing Algorithms

Parallel Implementation of Real-Time Communication and IP Communication by using Multiple Ring Buffers

IEEJ Transactions on Electronics Information and Systems ◽ 10.1541/ieejeiss.134.1031 ◽ 2014 ◽ Vol 134 (8) ◽ pp. 1031-1038

Author(s): Kazuki Ueda ◽ Tatsushi Kikutani ◽ Takahiro Yakoh

Keyword(s): Real Time ◽ Parallel Implementation ◽ Multiple Ring ◽ IP Communication

Design of a real-time video processing system with FPGA

10.1117/12.481630 ◽ 2002

Author(s): Wei Liu ◽ Zeying Chi ◽ Wenjian Chen

Keyword(s): Real Time ◽ Video Processing ◽ Processing System

A Modified KNN Algorithm for High-Performance Computing on FPGA of Real-Time m-QAM Demodulators

Electronics ◽ 10.3390/electronics10050627 ◽ 2021 ◽ Vol 10 (5) ◽ pp. 627

Author(s): David Marquez-Viloria ◽ Luis Castano-Londono ◽ Neil Guerrero-Gonzalez

Keyword(s): Real Time ◽ High Performance ◽ Interference Mitigation ◽ Parallel Implementation ◽ Computational Time ◽ Successful Implementation ◽ Interchannel Interference ◽ The Difference ◽ High Level ◽ Performance Computing

A methodology for scalable and concurrent real-time implementation of highly recurrent algorithms is presented and experimentally validated using the AWS-FPGA. This paper presents a parallel implementation of a KNN algorithm for m-QAM demodulators, using high-level synthesis for fast prototyping, parameterization, and scalability of the design. The proposed design demonstrates a successful implementation of the KNN algorithm for interchannel interference mitigation in a 3 × 16 Gbaud 16-QAM Nyquist WDM system. Additionally, we present a modified version of the KNN algorithm in which comparisons among data symbols are reduced by identifying the closest neighbor using the rule of 8-connected clusters used in image processing. The real-time implementation of the modified KNN on a Xilinx Virtex UltraScale+ VU9P AWS-FPGA board was compared with results obtained in previous work, which used the same data from the same experimental setup but processed it offline with DSP in MATLAB. The results show that the difference is negligible below the FEC limit. Additionally, the modified KNN reduces the number of operations by 43 percent to 75 percent, depending on the symbol's position in the constellation, and achieves a 47.25% reduction in total computational time for 100 K input symbols processed on 20 parallel cores compared to the KNN algorithm.
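The abstract does not spell out how the 8-connected rule is applied, so the following is only one plausible reading, offered as a sketch. It assumes the 16-QAM constellation is treated as a 4 × 4 grid of clusters and that a symbol is compared only against its own cluster and the clusters touching it (the 8-connected neighbourhood) instead of all 16. Under that assumption, the fraction of cluster comparisons avoided depends on the position in the grid, and the extremes happen to match the quoted range of roughly 43 to 75 percent. The function names and the grid model are hypothetical and are not taken from the paper.

```python
def candidate_clusters(row, col, side=4):
    """Grid cells in the 8-connected neighbourhood of (row, col), itself included."""
    return [
        (r, c)
        for r in range(max(row - 1, 0), min(row + 1, side - 1) + 1)
        for c in range(max(col - 1, 0), min(col + 1, side - 1) + 1)
    ]


def comparison_reduction(row, col, side=4):
    """Fraction of cluster comparisons avoided versus checking all side*side clusters."""
    return 1 - len(candidate_clusters(row, col, side)) / (side * side)


if __name__ == "__main__":
    # Assumed 4 x 4 layout for 16-QAM: corner, edge, and interior positions.
    for name, pos in [("corner", (0, 0)), ("edge", (0, 1)), ("interior", (1, 1))]:
        print(name, len(candidate_clusters(*pos)), comparison_reduction(*pos))
```

In this model a corner symbol keeps 4 of 16 clusters (75% fewer comparisons), an edge symbol keeps 6 (62.5% fewer), and an interior symbol keeps 9 (43.75% fewer).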
