Real-time parallel implementation of road traffic radar video processing algorithms on a parallel architecture based on DSP and ARM processors

Author(s): Abdessamad Klilou, Francois Bourzeix, Omar Bourja, Yahya Zennayi, Lhoussein Mabrouk, ...
2014, Vol 2014, pp. 1-13
Author(s): Jinwei Wang, Xirong Ma, Yuanping Zhu, Jizhou Sun

The active appearance model (AAM) is one of the most powerful model-based methods for object detection and tracking and has been widely used in many settings. However, its high-dimensional texture representation makes fitting very time-consuming, which makes the AAM difficult to apply in real-time systems. Modern graphics processing units (GPUs), which feature a many-core, fine-grained parallel architecture, offer a new and promising way to overcome this computational challenge. In this paper, we propose an efficient parallel implementation of the AAM fitting algorithm on GPUs. Our design is based on fine-grained parallelism: we distribute the AAM texture data, pixel by pixel, across thousands of parallel GPU threads, which lets the algorithm map well onto the GPU architecture. We implement the algorithm using the Compute Unified Device Architecture (CUDA) on an Nvidia GTX 650 GPU, which is based on the Kepler architecture (the latest at the time). To compare performance across data sizes, we built sixteen face AAM models with textures of different dimensions. The experimental results show that our parallel AAM fitting algorithm achieves real-time performance on video even with very high-dimensional textures.
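The "one thread per pixel" decomposition described above can be sketched as follows. This is a sequential Python emulation of a CUDA-style 1-D grid launch, not the authors' kernel: the names `residual_kernel`, `launch_residual`, and `fitting_error` are hypothetical, and we assume the per-pixel work is the texture residual (sampled image texture minus model texture) that AAM fitting minimizes. On a GPU, every (block, thread) pair would run concurrently.

```python
# Sequential emulation of a CUDA-style "one thread per pixel" launch (illustrative only).


def residual_kernel(block_idx, thread_idx, block_dim, image_tex, model_tex, out):
    """Body executed by one (emulated) GPU thread: handles exactly one texture pixel."""
    i = block_idx * block_dim + thread_idx  # global thread index -> pixel index
    if i < len(out):  # guard: the last block may have more threads than pixels left
        out[i] = image_tex[i] - model_tex[i]


def launch_residual(image_tex, model_tex, block_dim=256):
    """Emulate a 1-D grid launch that covers every pixel of the texture vector."""
    n = len(image_tex)
    out = [0.0] * n
    grid_dim = (n + block_dim - 1) // block_dim  # ceiling division, as in typical launches
    for b in range(grid_dim):        # on a GPU, blocks run concurrently
        for t in range(block_dim):   # ...and so do the threads inside each block
            residual_kernel(b, t, block_dim, image_tex, model_tex, out)
    return out


def fitting_error(residual):
    """Sum of squared texture residuals (a parallel reduction on a real GPU)."""
    return sum(r * r for r in residual)


if __name__ == "__main__":
    image = [float(v) for v in range(10)]
    model = [0.5] * 10
    res = launch_residual(image, model, block_dim=4)  # 3 blocks, 12 threads, 2 idle
    print(res, fitting_error(res))
```

Because each pixel's residual is independent, the work scales with texture dimension rather than with the number of model parameters, which is why larger textures still parallelize well under this scheme.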


Author(s): Mohammad Rafi Lone, Najeed-Ud-Din

For real-time applications, an efficient VLSI implementation of the discrete wavelet transform (DWT) is desirable. In this paper, a DWT architecture based on retiming (for pipelining) and unfolding is presented. The architecture is based on the lifting form of the one-dimensional Cohen-Daubechies-Feauveau (CDF) (5,3) wavelet filter and is easily extended to a 2-D implementation. It consists of low-complexity, easily replicated components. The paper focuses on minimizing the critical path and optimizing throughput at the same time. The architecture has been implemented on a Xilinx Virtex-6 FPGA platform. The implementation results show that the critical path is reduced by a factor of four to five while throughput is doubled, making the overall architecture approximately ten times faster than the conventional lifting-based DWT architecture. Furthermore, a parallel implementation doubles the throughput without increasing the number of row buffers, which means the architecture is also memory efficient. In this parallel implementation, the even and odd rows of the image are scanned simultaneously. A 2-D DWT of a 15-megapixel image takes 16.86 ms, so 59 images of that size can be processed per second. This makes the architecture suitable for real-time video processing applications, even at high resolutions.
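For reference, the lifting steps of the (5,3) filter can be sketched in software. This is a minimal 1-D sketch assuming the reversible integer form used in JPEG 2000 (predict, then update) with whole-sample symmetric extension at the borders; it does not model the retiming or unfolding, which are hardware-level transformations. The function names are hypothetical.

```python
def cdf53_forward(x):
    """One level of the reversible integer CDF (5,3) lifting transform on a 1-D signal.

    Assumes an even-length integer sequence (len >= 2) and whole-sample symmetric
    extension at the borders. Returns (approximation s, detail d).
    """
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("signal length must be even and at least 2")
    even, odd = list(x[0::2]), list(x[1::2])
    half = n // 2
    # Predict: each odd sample minus the floored mean of its two even neighbours.
    d = [odd[i] - ((even[i] + even[min(i + 1, half - 1)]) >> 1) for i in range(half)]
    # Update: each even sample plus a rounded quarter of its two neighbouring details.
    s = [even[i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(half)]
    return s, d


def cdf53_inverse(s, d):
    """Undo the lifting steps in reverse order; reconstruction is exact for integers."""
    half = len(s)
    even = [s[i] - ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(half)]
    odd = [d[i] + ((even[i] + even[min(i + 1, half - 1)]) >> 1) for i in range(half)]
    x = []
    for e, o in zip(even, odd):
        x.extend((e, o))  # re-interleave even and odd samples
    return x


if __name__ == "__main__":
    s, d = cdf53_forward([1, 2, 3, 4, 5, 6, 7, 8])
    print(s, d)  # a linear ramp leaves almost no detail energy
    print(cdf53_inverse(s, d))
```

A 2-D transform applies the same steps along rows and then along columns, and each lifting step only needs neighbouring samples, which is what makes the structure easy to pipeline. The throughput figure quoted in the abstract follows directly: 1000 ms / 16.86 ms ≈ 59.3 frames per second.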


2017, Vol 27 (01), pp. 1850009
Author(s): Amin Jarrah, Abdel-Karim Al-Tamimi, Tala Albashir

A very large number of applications require tracking algorithms that predict the future states of a system from its previously accumulated states. Many efficient techniques have therefore been adopted to estimate a system's state at every point in time and reach the desired performance levels. The Kalman filter is a popular and efficient method for online estimation with linear measurement models, while the Extended Kalman Filter (EKF) is better suited to nonlinear measurements. However, the EKF algorithm is well known to be computationally intensive and may not meet the strict requirements of real-time applications. This has motivated researchers to consider parallel processing platforms such as Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs) to meet real-time requirements. This paper presents an optimized parallel architecture for the EKF on an FPGA. Our approach applies optimization and parallelization techniques such as pipelining, loop unrolling, dataflow, and inlining, and exploits the inherently parallel nature of FPGAs to accelerate the estimation process. Our experimental analyses show that our optimized EKF implementation achieves better results than other implementations on GPU and multicore platforms. Moreover, performance improves further with larger data sizes. We attribute this to the optimization techniques we applied and to the inherent parallelism we exploited among the EKF operations.
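To make the predict/linearize/update structure of the EKF concrete, here is a minimal scalar sketch. It is not the authors' FPGA design: the random-walk state model and the range-style measurement h(x) = sqrt(x² + d²) are hypothetical choices made only so that the measurement is nonlinear and its Jacobian is easy to write down. In a matrix EKF, each of these lines becomes a matrix operation, and those operations are what pipelining and loop unrolling accelerate on an FPGA.

```python
import math


def ekf_step(x, P, z, q, r, d):
    """One predict/update cycle of a scalar extended Kalman filter.

    Hypothetical model (illustration only):
      state:        x_k = x_{k-1} + w,             w ~ N(0, q)   (random walk)
      measurement:  z_k = sqrt(x_k^2 + d^2) + v,   v ~ N(0, r)
    i.e. the range from a sensor at perpendicular offset d to a target at position x.
    """
    # Predict: the state transition is the identity, so only the covariance grows.
    x_pred = x
    P_pred = P + q
    # Linearize the nonlinear measurement function around the prediction.
    h = math.sqrt(x_pred ** 2 + d ** 2)
    H = x_pred / h                 # Jacobian dh/dx evaluated at x_pred
    # Update using the linearized model.
    S = H * P_pred * H + r         # innovation covariance
    K = P_pred * H / S             # Kalman gain
    x_new = x_pred + K * (z - h)   # correct with the innovation z - h(x_pred)
    P_new = (1.0 - K * H) * P_pred
    return x_new, P_new


if __name__ == "__main__":
    true_x, d = 3.0, 4.0
    z = math.hypot(true_x, d)      # noise-free range measurement
    x, P = 1.0, 10.0               # deliberately poor initial guess
    for _ in range(50):
        x, P = ekf_step(x, P, z, q=1e-3, r=1e-2, d=d)
    print(x, P)
```

Because this range measurement is symmetric in x, the filter settles on +3 only because the initial guess is positive. This is a reminder that an EKF's linearization is local and depends on a reasonable starting estimate.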


2014, Vol 2014, pp. 1-19
Author(s): Huayou Su, Mei Wen, Nan Wu, Ju Ren, Chunyuan Zhang

By reorganizing the execution order and optimizing the data structures, we propose an efficient parallel framework for the H.264/AVC encoder based on a massively parallel architecture. We implemented the proposed framework in CUDA on an NVIDIA GPU. Not only are the compute-intensive components of the H.264 encoder parallelized, but the control-intensive components, such as CAVLC and the deblocking filter, are also implemented efficiently. In addition, we propose a series of optimizations, including multiresolution multiwindow motion estimation, a multilevel parallel strategy that increases the parallelism of intra coding as much as possible, component-based parallel CAVLC, and a direction-priority deblocking filter. More than 96% of the H.264 encoder workload is offloaded to the GPU. Experimental results show that the parallel implementation achieves a 20-fold speedup over the serial program and meets the requirement of real-time HD encoding at 30 fps. At the same bitrate, the PSNR loss ranges from 0.14 dB to 0.77 dB. Analyzing the kernels, we found that the speedups of the compute-intensive algorithms are proportional to the computing power of the GPU, whereas the performance of the control-intensive part (CAVLC) depends largely on memory bandwidth, which offers insight for new architecture designs.
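Motion estimation is the compute-intensive core that parallelizes well. The sketch below is a generic full-search block-matching baseline using the sum of absolute differences (SAD). It is not the authors' multiresolution multiwindow scheme, and the helper names, block size, and search radius are hypothetical. Its point is structural: every candidate displacement is scored independently, so on a GPU each candidate (or each block) can be assigned to its own thread.

```python
import random


def sad(cur, ref, bx, by, dx, dy, bs):
    """Sum of absolute differences between the bs x bs block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(bs):
        for x in range(bs):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, bs, radius):
    """Exhaustive block matching: return (best_sad, dx, dy) within +/- radius.

    Candidates are independent of one another, which is what makes this step
    a natural fit for thousands of parallel GPU threads.
    """
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip candidate blocks that would fall outside the reference frame.
            if not (0 <= by + dy and by + dy + bs <= h and 0 <= bx + dx and bx + dx + bs <= w):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, bs)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    H = W = 16
    ref = [[rng.randrange(256) for _ in range(W)] for _ in range(H)]
    # Current frame: content at (x, y) comes from (x + 2, y + 1) in the reference.
    cur = [[ref[(y + 1) % H][(x + 2) % W] for x in range(W)] for y in range(H)]
    print(full_search(cur, ref, bx=6, by=6, bs=4, radius=3))
```

Full search has a cost proportional to (2·radius + 1)² per block. Hierarchical or multiresolution searches, such as the one the abstract mentions, exist mainly to cut that count while keeping the per-candidate work data-parallel.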


2018, Vol 91 (1), pp. 93-113
Author(s): Vecdi Emre Levent, Aydin E. Guzel, Mustafa Tosun, Mert Buyukmihci, Furkan Aydin, ...

Author(s): Yohanssen Pratama, Puspoko Ponco Ratno

Image and video processing has become an important part of intelligent transportation system (ITS) applications, especially for collecting road traffic data. Frames captured by a charge-coupled device (CCD) camera are usually processed by several image processing algorithms, so the application's code runs through a large number of iterations for each captured frame. A typical application processes the first frame to completion before moving on to the next, so each new frame must wait until the previous one has been processed. If the algorithms are complex and have significant running times, frames are dropped and the delay between data acquisition and the real-time video grows large. We propose a multithreaded implementation to boost application performance so that data can be acquired in real time and every new frame can be processed quickly. Application performance before and after multithreading is evaluated by comparing the data acquisition times stored in the database. The application's effectiveness can be assessed by running multiple video streams at the same resolution.
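The decoupling described above (capture keeps running while processing happens elsewhere) is commonly expressed as a producer-consumer pipeline. The sketch below is a minimal version using Python's standard `threading` and `queue` modules; it is not the authors' implementation. `process_frame` is a hypothetical stand-in for the detection and counting algorithms, and frames are represented by integer indices.

```python
import queue
import threading


def process_frame(frame_id):
    """Hypothetical stand-in for the per-frame image processing algorithms."""
    return frame_id * 2


def run_pipeline(num_frames, num_workers=4):
    """Capture loop (producer) feeds a bounded queue; worker threads consume it."""
    frames = queue.Queue(maxsize=32)  # bounded so memory stays flat if workers lag
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = frames.get()
            if item is None:          # sentinel: no more frames for this worker
                break
            out = process_frame(item)
            with lock:                # results are shared between worker threads
                results[item] = out   # keyed by frame index so order can be restored

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in workers:
        t.start()
    # Producer: the capture loop only enqueues frames and never waits for processing.
    for i in range(num_frames):
        frames.put(i)
    for _ in workers:
        frames.put(None)              # one sentinel per worker
    for t in workers:
        t.join()
    return results


if __name__ == "__main__":
    r = run_pipeline(100)
    print(len(r), r[10])
```

Two design notes. First, `frames.put` blocks when the queue is full; a strictly real-time capture loop might instead call `put_nowait` and drop the frame on `queue.Full`. Second, in CPython, pure-Python CPU-bound work does not run in parallel because of the global interpreter lock. The threads pay off when the heavy lifting happens in native code that releases the lock; otherwise `multiprocessing` is the usual alternative.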


Author(s): Yang Xu, Zhang Zhenjiang, Liu Yun
