Real-time parallel implementation of road traffic radar video processing algorithms on a parallel architecture based on DSP and ARM processors

The active appearance model (AAM) is one of the most powerful model-based object detection and tracking methods and has been widely used in various situations. However, its high-dimensional texture representation makes the computations very time-consuming, which makes the AAM difficult to apply in real-time systems. The emergence of modern graphics processing units (GPUs) that feature a many-core, fine-grained parallel architecture provides new and promising solutions to this computational challenge. In this paper, we propose an efficient parallel implementation of the AAM fitting algorithm on GPUs. Our design uses fine-grained parallelism: we distribute the texture data of the AAM, pixel by pixel, across thousands of parallel GPU threads, which fits the algorithm better to the GPU architecture. We implement our algorithm using the compute unified device architecture (CUDA) on Nvidia's GTX 650 GPU, which has the latest Kepler architecture. To compare the performance of our algorithm with different data sizes, we built sixteen face AAM models with textures of different dimensions. The experimental results show that our parallel AAM fitting algorithm can achieve real-time performance for videos even on very high-dimensional textures.
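
The per-pixel decomposition described in this abstract can be illustrated with a small sketch. It is not the authors' CUDA code: it is a plain-Python emulation in which each call to `pixel_residual` stands in for the work of one GPU thread, and `tree_reduce_sum` mimics the pairwise reduction commonly used to combine per-thread results. The linear texture model (mean plus weighted modes) and all function names are assumptions made for illustration.

```python
def pixel_residual(tid, sampled, mean, modes, coeffs):
    """Work of one hypothetical GPU thread: residual for pixel `tid`."""
    # Assumed linear texture model: mean texture plus weighted texture modes.
    model = mean[tid] + sum(c * mode[tid] for c, mode in zip(coeffs, modes))
    return sampled[tid] - model


def tree_reduce_sum(values):
    """Pairwise (tree) reduction, the pattern a GPU block would run in log2(n) steps."""
    buf = list(values)
    n = len(buf)
    stride = 1
    while stride < n:
        # On a GPU, every iteration of this inner loop would run concurrently.
        for i in range(0, n - stride, 2 * stride):
            buf[i] += buf[i + stride]
        stride *= 2
    return buf[0] if buf else 0


def texture_error(sampled, mean, modes, coeffs):
    """Sum of squared per-pixel residuals between sampled and model texture."""
    residuals = [
        pixel_residual(tid, sampled, mean, modes, coeffs)
        for tid in range(len(sampled))  # one emulated thread per pixel
    ]
    return tree_reduce_sum(r * r for r in residuals)
```

Because each residual depends only on its own pixel index, the work scales with the number of texture pixels and maps naturally onto one thread per pixel; only the final sum requires cooperation between threads.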

Data partitioning for parallel implementation of real-time video processing systems

Proceedings of the 2005 European Conference on Circuit Theory and Design, 2005. ◽

10.1109/ecctd.2005.1522948 ◽

2006 ◽

Cited By ~ 1

Author(s):

M. O'Nils ◽

P.-R. Lilljefjall ◽

B. Thornberg

Keyword(s):

Real Time ◽

Video Processing ◽

Parallel Implementation ◽

Data Partitioning

A High Speed Architecture for Lifting-based 2-D Cohen-Daubechies-Feauveau (5,3) Discrete Wavelet Transform used in JPEG2000

International Journal of Advances in Telecommunications Electrotechnics Signals and Systems ◽

10.11601/ijates.v6i1.202 ◽

2017 ◽

Vol 6 (1) ◽

pp. 24 ◽

Cited By ~ 2

Author(s):

Mohammad Rafi Lone ◽

Najeed- Ud-Din

Keyword(s):

Real Time ◽

Video Processing ◽

High Speed ◽

Parallel Implementation ◽

Critical Path ◽

Low Complexity ◽

Discrete Wavelet ◽

Throughput Optimization ◽

Xilinx FPGA ◽

Parallel Fashion

For real-time applications, an efficient VLSI implementation of the DWT is desired. In this paper, a DWT architecture based on retiming for pipelining and unfolding is presented. The architecture is based on the lifting one-dimensional Cohen-Daubechies-Feauveau (CDF) (5,3) wavelet filter, which is easily extended to a 2-D implementation. It consists of low-complexity and easily repeatable components. This paper focuses on minimizing the critical path and optimizing throughput at the same time. The architecture has been implemented on a Virtex 6 Xilinx FPGA platform. The implementation results show that the critical path is reduced by a factor of four to five, while throughput is doubled, making the overall architecture approximately ten times faster than the conventional lifting-based DWT architecture. Furthermore, with parallel implementation, the throughput is doubled without any increase in the number of row buffers, implying that the architecture is memory efficient as well. The even and odd rows of the image are scanned in parallel. Performing the 2-D DWT of a 15-megapixel image takes 16.86 ms, which implies that 59 images of that size can be processed in one second. This can be utilized for real-time video processing applications, even for high-resolution videos.
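
As a reference point for the filter this abstract builds on, the following is a minimal software sketch of one level of the 1-D CDF (5,3) lifting transform. It assumes the reversible integer variant used in JPEG2000, symmetric boundary extension, and an even-length input; it does not model the paper's retimed, pipelined, or unfolded hardware architecture. The helper `images_per_second` simply checks the throughput arithmetic quoted above. All function names are illustrative.

```python
def cdf53_forward(x):
    """One level of the integer CDF (5,3) lifting transform (even-length input)."""
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("this sketch expects an even-length signal")
    half = n // 2
    even, odd = x[0::2], x[1::2]
    # Predict step: detail = odd sample minus the floor-average of its even neighbours.
    d = [odd[i] - ((even[i] + even[min(i + 1, half - 1)]) >> 1) for i in range(half)]
    # Update step: approximation = even sample plus a rounded quarter of neighbouring details.
    s = [even[i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(half)]
    return s, d


def cdf53_inverse(s, d):
    """Undo the lifting steps in reverse order; reconstruction is exact."""
    half = len(s)
    even = [s[i] - ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(half)]
    odd = [d[i] + ((even[i] + even[min(i + 1, half - 1)]) >> 1) for i in range(half)]
    x = [0] * (2 * half)
    x[0::2], x[1::2] = even, odd
    return x


def images_per_second(ms_per_image):
    """Throughput implied by a per-image latency given in milliseconds."""
    return 1000.0 / ms_per_image
```

With 16.86 ms per image, `images_per_second(16.86)` is about 59.3, consistent with the 59 images per second stated in the abstract. A 2-D transform applies the same 1-D steps along rows and then columns.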

Parallel implementation of background subtraction algorithms for real-time video processing on a supercomputer platform

Journal of Real-Time Image Processing ◽

10.1007/s11554-012-0310-5 ◽

2012 ◽

Vol 11 (1) ◽

pp. 111-125 ◽

Cited By ~ 13

Author(s):

Grzegorz Szwoch ◽

Damian Ellwart ◽

Andrzej Czyżewski

Keyword(s):

Real Time ◽

Video Processing ◽

Background Subtraction ◽

Parallel Implementation

Optimized Parallel Implementation of Extended Kalman Filter Using FPGA

Journal of Circuits, Systems and Computers ◽

10.1142/s0218126618500093 ◽

2017 ◽

Vol 27 (01) ◽

pp. 1850009 ◽

Cited By ~ 1

Author(s):

Amin Jarrah ◽

Abdel-Karim Al-Tamimi ◽

Tala Albashir

Keyword(s):

Kalman Filter ◽

Real Time ◽

Extended Kalman Filter ◽

Parallel Implementation ◽

Parallel Architecture ◽

Optimization Techniques ◽

Graphic Processing Units ◽

Linear Measurements ◽

Performance Levels ◽

The Future

A great many applications require tracking algorithms to predict the future states of a system from its previously accumulated states. Thus, many efficient techniques are widely adopted to estimate the future states of a system at every point in time to reach the desired performance levels. The Kalman filter is a popular and efficient method for online estimation with linear measurements. The Extended Kalman Filter (EKF), on the other hand, is better suited to nonlinear measurements. However, the EKF algorithm is well known to be computationally intensive and may not meet the strict requirements of real-time applications. This issue has motivated researchers to consider parallel processing platforms such as Field Programmable Gate Arrays (FPGAs) and Graphic Processing Units (GPUs) to meet real-time requirements. This paper provides an optimized parallel architecture for the EKF using an FPGA. Our approach exploits many optimization and parallelization techniques, such as pipelining, loop unrolling, dataflow, and inlining, and utilizes the inherently parallel architecture of FPGAs to accelerate the estimation process. Our experimental analyses show that our optimized implementation of the EKF can achieve better results than other implementations on GPU and multicore platforms. Moreover, higher performance levels can be achieved when operating on larger data sizes. This is due to the optimization techniques we applied and the inherent parallelism we exploited among EKF operations.
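
For readers unfamiliar with the algorithm being accelerated, here is a minimal sketch of one EKF predict/update cycle for a scalar state. It shows only the textbook equations in software; it says nothing about the paper's FPGA architecture. The function name `ekf_step` and the example models passed to it are hypothetical.

```python
def ekf_step(x, p, z, f, f_jac, h, h_jac, q, r):
    """One scalar EKF cycle.

    x, p   : prior state estimate and its variance
    z      : new measurement
    f, h   : (possibly nonlinear) state-transition and measurement functions
    f_jac, h_jac : their derivatives, used to linearise around the estimate
    q, r   : process and measurement noise variances
    """
    # Predict: propagate the state and linearise f around the current estimate.
    x_pred = f(x)
    F = f_jac(x)
    p_pred = F * p * F + q

    # Update: linearise h around the prediction and blend in the measurement.
    H = h_jac(x_pred)
    innovation = z - h(x_pred)
    s = H * p_pred * H + r          # innovation variance
    k = p_pred * H / s              # Kalman gain
    x_new = x_pred + k * innovation
    p_new = (1.0 - k * H) * p_pred
    return x_new, p_new
```

In the matrix form of the filter, the products and the inversion of the innovation covariance dominate the cost, which is the kind of regular arithmetic that pipelining and loop unrolling target.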

Efficient Parallel Video Processing Techniques on GPU: From Framework to Implementation

The Scientific World Journal ◽

10.1155/2014/716020 ◽

2014 ◽

Vol 2014 ◽

pp. 1-19 ◽

Cited By ~ 5

Author(s):

Huayou Su ◽

Mei Wen ◽

Nan Wu ◽

Ju Ren ◽

Chunyuan Zhang

Keyword(s):

Video Processing ◽

Parallel Implementation ◽

Parallel Architecture ◽

Optimization Methods ◽

Memory Bandwidth ◽

Deblocking Filter ◽

Parallel Video Processing ◽

Speedup Ratio ◽

Processing Techniques ◽

Execution Order

By reorganizing the execution order and optimizing the data structure, we propose an efficient parallel framework for an H.264/AVC encoder based on a massively parallel architecture. We implemented the proposed framework with CUDA on NVIDIA's GPU. Not only are the compute-intensive components of the H.264 encoder parallelized, but the control-intensive components, such as CAVLC and the deblocking filter, are also realized effectively. In addition, we propose a series of optimization methods, including multiresolution multiwindow motion estimation, a multilevel parallel strategy to enhance the parallelism of intracoding as much as possible, component-based parallel CAVLC, and a direction-priority deblocking filter. More than 96% of the workload of the H.264 encoder is offloaded to the GPU. Experimental results show that the parallel implementation outperforms the serial program with a speedup of 20 times and satisfies the requirement of real-time HD encoding at 30 fps. The PSNR loss ranges from 0.14 dB to 0.77 dB at the same bitrate. Through analysis of the kernels, we found that the speedup ratios of the compute-intensive algorithms are proportional to the computation power of the GPU. However, the performance of the control-intensive parts (CAVLC) depends strongly on memory bandwidth, which gives an insight for new architecture design.
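
The two headline figures in this abstract (96% of the workload offloaded, 20 times overall speedup) can be related through Amdahl's law. The sketch below is a back-of-the-envelope check, not an analysis from the paper: it assumes that the 96% figure is a fraction of serial run time and that CPU and GPU work do not overlap. Both function names are illustrative.

```python
def amdahl_speedup(offloaded, part_speedup):
    """Overall speedup when a fraction `offloaded` of the run time is accelerated."""
    return 1.0 / ((1.0 - offloaded) + offloaded / part_speedup)


def required_part_speedup(offloaded, overall):
    """Speedup the offloaded part needs in order to reach a given overall speedup."""
    remaining = 1.0 / overall - (1.0 - offloaded)
    if remaining <= 0:
        raise ValueError("overall speedup exceeds the Amdahl bound")
    return offloaded / remaining
```

Under these assumptions, offloading 96% caps the overall speedup at 1 / 0.04 = 25 times, and reaching 20 times overall would require the offloaded portion to run roughly 96 times faster than its serial version.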

Tools and Techniques for Implementation of Real-time Video Processing Algorithms

Journal of Signal Processing Systems ◽

10.1007/s11265-018-1402-7 ◽

2018 ◽

Vol 91 (1) ◽

pp. 93-113

Author(s):

Vecdi Emre Levent ◽

Aydin E. Guzel ◽

Mustafa Tosun ◽

Mert Buyukmihci ◽

Furkan Aydin ◽

...

Keyword(s):

Real Time ◽

Video Processing ◽

Processing Algorithms ◽

Tools And Techniques

Multithreading Application for Counting Vehicle by Using Background Subtraction Method

IJCCS (Indonesian Journal of Computing and Cybernetics Systems) ◽

10.22146/ijccs.57594 ◽

2020 ◽

Vol 14 (3) ◽

pp. 309

Author(s):

Yohanssen Pratama ◽

Puspoko Ponco Ratno

Keyword(s):

Data Acquisition ◽

Real Time ◽

Video Processing ◽

Intelligent Transportation System ◽

Road Traffic ◽

CCD Camera ◽

Acquisition Time ◽

Application Performance ◽

Typical Application ◽

Before And After

Image and video processing has become an important part of intelligent transportation system (ITS) applications, especially for collecting road traffic data. Pictures collected by a charge-coupled device (CCD) camera are usually processed by several image processing algorithms, and the application's code is executed over a large number of iterations because many algorithms are involved in processing each frame captured by the camera. A typical application processes the first frame to completion and only then continues to the next frame, so each new frame must wait until the previous one has been processed. If the algorithms are complex and have a significant running time, frames are dropped and the time difference between data acquisition and the real-time video grows by a large margin. We propose a multithreaded implementation to boost application performance so that data can be acquired in real time and every new frame can be processed in a short time. The application's performance before and after multithreading is determined by comparing the data acquisition times stored in the database. The application's effectiveness can be assessed by running multiple video streams at the same resolution.
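
The idea of decoupling frame acquisition from frame processing can be sketched with a queue and a small pool of worker threads. This is a structural illustration only, not the authors' application: `run_pipeline` and its parameters are hypothetical, frames are stand-in values rather than camera images, and in CPython a CPU-bound `process` function would need native code that releases the interpreter lock (or separate processes) to see a real speedup.

```python
import queue
import threading


def run_pipeline(frames, process, workers=2):
    """Feed frames to worker threads so acquisition never waits on processing."""
    pending = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = pending.get()
            if item is None:        # sentinel: no more frames
                break
            index, frame = item
            output = process(frame)
            with lock:              # results is shared between workers
                results[index] = output

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()

    # Acquisition loop: enqueue each frame and move on immediately.
    for index, frame in enumerate(frames):
        pending.put((index, frame))
    for _ in threads:
        pending.put(None)
    for t in threads:
        t.join()

    # Return outputs in acquisition order, regardless of completion order.
    return [results[i] for i in range(len(results))]
```

Tagging each frame with its index lets the results be reassembled in order even when workers finish out of sequence, which matters when counts or timestamps are later written to a database.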
