New Motion Estimation Algorithms and its VLSI architectures for Real Time High Definition Video Coding

This paper presents efficient hardware designs for two new Motion Estimation (ME) algorithms: Multi-point Diamond Search (MPDS) and Dynamic Multi-Point Diamond Search (DMPDS). These algorithms are better than traditional fast algorithms at avoiding falls into local minima. This improves the quality of the motion vectors, especially in High Definition (HD) videos, where local minima are considerably more frequent. Two versions of the MPDS algorithm were proposed. The first, focused on high performance, can process QFHD videos at 30 frames per second when synthesized to Altera Stratix 4 and to 90 nm TSMC technology, with a power consumption of only 18 mW. The second version focuses on quality enhancement and can process HD 1080p videos in real time. The DMPDS architecture was also developed for high performance and was synthesized to Altera Stratix 4; it can process QFHD videos at 34 frames per second. Compared with related works, our solutions obtained the highest processing rates and a good trade-off among power consumption, area, memory bits, and performance.
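The multi-start idea behind MPDS can be illustrated with a minimal Python sketch. The abstract does not say how many start points MPDS uses or where they are placed, so the start list, block size, search range, and all function names below are hypothetical. Each start runs a classic Diamond Search (large diamond until the center wins, then one small-diamond step), and the lowest-SAD result is kept, which is how several starts can escape a local minimum that would trap a single search.

```python
# Hypothetical sketch: a generic multi-start Diamond Search, not the authors' exact MPDS.
LDSP = [(2, 0), (-2, 0), (0, 2), (0, -2), (1, 1), (1, -1), (-1, 1), (-1, -1)]
SDSP = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def in_window(ref, bx, by, n, dx, dy, srange):
    """True if the candidate block lies inside the frame and the search range."""
    h, w = len(ref), len(ref[0])
    return (abs(dx) <= srange and abs(dy) <= srange
            and 0 <= bx + dx <= w - n and 0 <= by + dy <= h - n)


def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in cur
    and the block displaced by (dx, dy) in ref."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))


def diamond_search(cur, ref, bx, by, n, start, srange):
    """Classic Diamond Search from one start vector; returns (mv, cost)."""
    best, best_cost = start, sad(cur, ref, bx, by, *start, n)
    while True:  # large diamond pattern until the center is the best point
        center = best
        for ox, oy in LDSP:
            dx, dy = center[0] + ox, center[1] + oy
            if in_window(ref, bx, by, n, dx, dy, srange):
                cost = sad(cur, ref, bx, by, dx, dy, n)
                if cost < best_cost:
                    best, best_cost = (dx, dy), cost
        if best == center:
            break
    center = best  # one small diamond refinement step
    for ox, oy in SDSP:
        dx, dy = center[0] + ox, center[1] + oy
        if in_window(ref, bx, by, n, dx, dy, srange):
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost


def multipoint_diamond_search(cur, ref, bx, by, n, starts, srange):
    """Run DS from every valid start point and keep the lowest-SAD result."""
    results = [diamond_search(cur, ref, bx, by, n, s, srange)
               for s in starts if in_window(ref, bx, by, n, *s, srange)]
    return min(results, key=lambda r: r[1])
```

How DMPDS chooses its start points dynamically is not described in this abstract, so the sketch does not attempt to model it.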


A High-Throughput Hardware Architecture for the H.264/AVC Half-Pixel Motion Estimation Targeting High-Definition Videos

International Journal of Reconfigurable Computing ◽

10.1155/2011/254730 ◽

2011 ◽

Vol 2011 ◽

pp. 1-9 ◽

Cited By ~ 2

Author(s):

Marcel M. Corrêa ◽

Mateus T. Schoenknecht ◽

Robson S. Dornelles ◽

Luciano V. Agostini

Keyword(s):

Motion Estimation ◽

Real Time ◽

High Throughput ◽

High Performance ◽

Hardware Architecture ◽

Interpolation Process ◽

High Definition ◽

Efficient Search ◽

Xilinx Fpga ◽

Very High

This paper presents a high-performance hardware architecture for H.264/AVC half-pixel Motion Estimation that targets high-definition videos. The design can process very-high-definition videos such as QHDTV in real time (30 frames per second). It also presents an optimized arrangement of interpolated samples, which is the key to achieving an efficient search. The interpolation process is interleaved with the SAD calculation and comparison, which enables the high throughput. The architecture was fully described in VHDL and synthesized for two different Xilinx FPGA devices, and it achieved good results compared with related works.
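A sketch of how half-sample interpolation and SAD computation can be interleaved, assuming the standard H.264/AVC six-tap luma filter (1, -5, 20, 20, -5, 1) with rounding by (x + 16) >> 5 and clipping to 8 bits. Only horizontal and vertical half-sample positions are shown; the diagonal position, which the standard derives from unrounded intermediate values, is omitted. Function names are hypothetical, and this is a behavioral model, not the authors' VHDL datapath or their sample arrangement.

```python
# Hypothetical helper names; the filter taps and rounding follow H.264/AVC luma half-samples.
TAPS = (1, -5, 20, 20, -5, 1)


def clip255(v):
    """Clip to the 8-bit sample range."""
    return max(0, min(255, v))


def half_pel_h(row, x):
    """Half-sample between row[x] and row[x + 1]; reads row[x - 2] .. row[x + 3]."""
    acc = sum(t * row[x - 2 + k] for k, t in enumerate(TAPS))
    return clip255((acc + 16) >> 5)


def half_pel_v(frame, x, y):
    """Half-sample between frame[y][x] and frame[y + 1][x]."""
    acc = sum(t * frame[y - 2 + k][x] for k, t in enumerate(TAPS))
    return clip255((acc + 16) >> 5)


def sad_half_h(cur, ref, bx, by, dx, dy, n):
    """SAD for the candidate displaced by (dx + 1/2, dy): each half-sample is
    interpolated on the fly and accumulated immediately, so no interpolated
    frame has to be stored before the SAD is computed."""
    total = 0
    for y in range(n):
        row = ref[by + dy + y]
        for x in range(n):
            total += abs(cur[by + y][bx + x] - half_pel_h(row, bx + dx + x))
    return total
```

On a linear ramp such as `[0, 10, 20, 30, 40, 50]`, `half_pel_h(row, 2)` returns 25, the midpoint of 20 and 30; on sharp edges the negative taps can overshoot, which is why the result is clipped.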


DMPDS: A Fast Motion Estimation Algorithm Targeting High Resolution Videos and Its FPGA Implementation

International Journal of Reconfigurable Computing ◽

10.1155/2012/186057 ◽

2012 ◽

Vol 2012 ◽

pp. 1-12 ◽

Cited By ~ 7

Author(s):

Gustavo Sanchez ◽

Felipe Sampaio ◽

Marcelo Porto ◽

Sergio Bampi ◽

Luciano Agostini

Keyword(s):

High Resolution ◽

Motion Estimation ◽

Real Time ◽

Video Quality ◽

Estimation Algorithm ◽

High Definition ◽

Fast Motion Estimation ◽

Processing Rate ◽

Diamond Search ◽

Fast Motion

This paper presents a new fast motion estimation (ME) algorithm targeting high-resolution digital videos, together with an efficient hardware architecture for it. The new Dynamic Multipoint Diamond Search (DMPDS) algorithm is a fast algorithm that increases ME quality compared with other fast ME algorithms. DMPDS achieves better digital video quality by reducing falls into local minima, especially in high-definition videos. The quality results show that DMPDS reaches an average PSNR gain of 1.85 dB over the well-known Diamond Search (DS) algorithm. Compared with the optimal results generated by the Full Search (FS) algorithm, DMPDS loses only 1.03 dB of PSNR, while reducing complexity by more than 45 times relative to FS. The quality gains over DS come at an expected cost in complexity: DMPDS uses 6.4 times more calculations than DS. The DMPDS architecture was designed for high performance and low cost, targeting real-time processing (30 frames per second) of Quad Full High Definition (QFHD) videos. The architecture was described in VHDL and synthesized to Altera Stratix 4 and Xilinx Virtex 5 FPGAs. The synthesis results show that the architecture achieves processing rates higher than 53 QFHD frames per second, meeting the real-time requirement. The DMPDS architecture achieved the highest processing rate among related works in the literature. This high processing rate was obtained by designing an architecture with a high operating frequency and a low number of cycles per block.
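The quality figures above are PSNR values. As a minimal sketch (function names hypothetical), here is PSNR for 8-bit frames plus two pieces of arithmetic behind the comparison: a PSNR gain of g dB on the same frame means the MSE shrinks by a factor of 10^(g/10), so 1.85 dB corresponds to roughly 1.53 times lower MSE; and Full Search evaluates (2R + 1)^2 candidates per block for a ±R search window.

```python
import math


def mse(a, b):
    """Mean squared error between two equally sized frames (lists of rows)."""
    diffs = [(p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def psnr(a, b, peak=255):
    """Peak signal-to-noise ratio in dB; infinite for identical frames."""
    e = mse(a, b)
    return math.inf if e == 0 else 10 * math.log10(peak ** 2 / e)


def mse_ratio_from_db_gain(gain_db):
    """Factor by which MSE shrinks for a given PSNR gain on the same frame."""
    return 10 ** (gain_db / 10)


def full_search_candidates(srange):
    """SAD evaluations Full Search needs per block for a +/-srange window."""
    return (2 * srange + 1) ** 2
```

With a hypothetical ±32 window (the search range used in the paper is not stated here), Full Search needs 4225 SAD evaluations per block, so a 45-fold reduction would mean roughly 94 evaluations per block on average. Because the reported 1.85 dB is an average over test videos, the 1.53 MSE factor is only a rough reading of it.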


High-Performance Image Filters via Sparse Approximations

Proceedings of the ACM on Computer Graphics and Interactive Techniques ◽

10.1145/3406182 ◽

2020 ◽

Vol 3 (2) ◽

pp. 1-19

Author(s):

Kersten Schuster ◽

Philip Trettner ◽

Leif Kobbelt

Keyword(s):

High Performance ◽

Hardware Acceleration ◽

Optimization Method ◽

Translation Invariant ◽

Approximation Quality ◽

Trade Offs ◽

Sparse Approximations ◽

Image Filters ◽

Good Trade ◽

And Performance

We present a numerical optimization method to find highly efficient (sparse) approximations for convolutional image filters. Using a modified parallel tempering approach, we solve a constrained optimization that maximizes approximation quality while strictly staying within a user-prescribed performance budget. The results are multi-pass filters where each pass computes a weighted sum of bilinearly interpolated sparse image samples, exploiting hardware acceleration on the GPU. We systematically decompose the target filter into a series of sparse convolutions, trying to find good trade-offs between approximation quality and performance. Since our sparse filters are linear and translation-invariant, they do not exhibit the aliasing and temporal coherence issues that often appear in filters working on image pyramids. We show several applications, ranging from simple Gaussian or box blurs to the emulation of sophisticated Bokeh effects with user-provided masks. Our filters achieve high performance as well as high quality, often providing significant speed-up at acceptable quality even for separable filters. The optimized filters can be baked into shaders and used as a drop-in replacement for filtering tasks in image processing or rendering pipelines.
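A single pass of such a filter can be sketched as follows, assuming a pass is a list of taps (x offset, y offset, weight) evaluated with bilinear interpolation and clamp-to-edge borders. The tap values in the example are hand-picked for illustration, not produced by the authors' parallel-tempering optimizer, and the code runs on the CPU rather than using GPU texture filtering. All names are hypothetical.

```python
import math


def bilinear(img, x, y):
    """Bilinearly interpolated sample at real-valued (x, y), clamped to the border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def sparse_pass(img, taps):
    """One pass: weighted sum of a few bilinear samples around every pixel.
    taps is a list of (dx, dy, weight) with possibly fractional offsets."""
    h, w = len(img), len(img[0])
    return [[sum(wt * bilinear(img, x + dx, y + dy) for dx, dy, wt in taps)
             for x in range(w)] for y in range(h)]


def multi_pass(img, passes):
    """Apply the passes in sequence, as in a multi-pass filter."""
    for taps in passes:
        img = sparse_pass(img, taps)
    return img
```

With taps at offsets -0.5 and +0.5 and weights 0.5, each bilinear fetch averages two neighbours, so this two-tap pass applies the three-tap kernel [1, 2, 1]/4. On the GPU that averaging comes from the texture hardware, which is the kind of saving the sparse filters exploit.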


Computation and performance trade-offs in motion estimation algorithms

Proceedings International Conference on Information Technology: Coding and Computing ◽

10.1109/itcc.2001.918803 ◽

2002 ◽

Cited By ~ 5

Author(s):

K.R. Namuduri ◽

Aiyuan Ji

Keyword(s):

Motion Estimation ◽

Estimation Algorithms ◽

Trade Offs ◽

And Performance


Motion Estimation Architecture Using Efficient Adder-Compressors for HDTV Video Coding

Journal of Integrated Circuits and Systems ◽

10.29292/jics.v5i1.312 ◽

2010 ◽

Vol 5 (1) ◽

pp. 78-88 ◽

Cited By ~ 1

Author(s):

Marcelo Porto ◽

André Silva ◽

Sergo Almeida ◽

Eduardo Da Costa ◽

Sergio Bampi

Keyword(s):

Motion Estimation ◽

Real Time ◽

Search Algorithm ◽

Absolute Difference ◽

High Definition ◽

Case Processing ◽

Worst Case ◽

Average Case ◽

High Definition Television ◽

Internal Structures

This paper presents a real-time HDTV (High Definition Television) architecture for Motion Estimation (ME) using efficient adder compressors. The architecture is based on the Quarter Sub-sampled Diamond Search (QSDS) algorithm with Dynamic Iteration Control (DIC). Its main characteristic is the large number of Processing Units (PUs) used to calculate the SAD (Sum of Absolute Differences) metric. Internally, the PUs consist of a large number of addition operations that compute the SADs. In this paper, efficient 4-2 and 8-2 adder compressors are used in the PUs to reach the performance required to process HDTV videos in real time at 30 frames per second. These adder compressors add 4 and 8 operands simultaneously, respectively. The PUs using adder compressors were then applied to the ME architecture. The architecture was described in VHDL and synthesized to FPGA and, with the Leonardo Spectrum tool, to TSMC 0.18 μm CMOS standard cell technology. Synthesis results indicate that the new QSDS-DIC architecture reaches the best performance result, with a 12% gain in processing rate. The architecture achieves real time for full HDTV (1920x1080 pixels), processing 65 frames per second in the worst case and 269 HDTV frames per second in the average case.
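A bit-level model of a 4-2 compressor in Python, using the common textbook construction of two cascaded full adders. It satisfies x1 + x2 + x3 + x4 + cin = sum + 2(carry + cout), and cout does not depend on cin, so a row of compressors reduces four operands to two without a rippling carry chain. This is a behavioral sketch with hypothetical function names; the paper's optimized gate-level compressors may be structured differently, and the 8-2 compressor is not modeled.

```python
def full_adder(a, b, c):
    """One-bit full adder: returns (sum, carry)."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry


def compressor_4_2(x1, x2, x3, x4, cin):
    """4-2 compressor from two full adders: returns (sum, carry, cout).
    cout depends only on x1..x3, so it never waits for cin."""
    s1, cout = full_adder(x1, x2, x3)
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout


def compress_words(a, b, c, d, width):
    """Reduce four unsigned width-bit operands to two words whose sum equals
    a + b + c + d, using one 4-2 compressor per bit column."""
    s_word = c_word = 0
    cin = 0
    for i in range(width + 2):  # two extra columns hold the growth of a 4-operand sum
        bits = [(v >> i) & 1 for v in (a, b, c, d)]
        s, carry, cout = compressor_4_2(*bits, cin)
        s_word |= s << i
        c_word |= carry << (i + 1)
        cin = cout  # lateral carry into the next column
    return s_word, c_word
```

In a SAD processing unit, the absolute differences of a block row would be fed to such compressors, leaving a single final carry-propagate addition per reduction tree.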


On Optimizing the Visual Quality of HASM-Based Streaming—The Study the Sensitivity of Motion Estimation Techniques for Mesh-Based Codecs in Ultra High Definition Large Format Real-Time Video Coding

Enabling Machine Learning Applications in Data Science - Algorithms for Intelligent Systems ◽

10.1007/978-981-33-6129-4_15 ◽

2021 ◽

pp. 207-219

Author(s):

Khaled Ezzat ◽

Ahmed Tarek Mohamed ◽

Ibrahim El-Shal ◽

Wael Badawy

Keyword(s):

Motion Estimation ◽

Video Coding ◽

Real Time ◽

Visual Quality ◽

High Definition ◽

Large Format ◽

Estimation Techniques


High Performance Architecture of Motion Estimation Algorithm for Video Compression

Journal of Circuits System and Computers ◽

10.1142/s0218126616500833 ◽

2016 ◽

Vol 25 (08) ◽

pp. 1650083

Author(s):

P. Muralidhar ◽

C. B. Rama Rao

Keyword(s):

Motion Estimation ◽

Video Compression ◽

High Performance ◽

Estimation Algorithm ◽

Block Matching ◽

High Definition ◽

Systolic Architecture ◽

Hardware Architectures ◽

Computationally Intensive ◽

Full Search

Motion estimation (ME) is a highly computationally intensive operation in video compression, and many efficient ME architectures have been proposed in the literature. This paper presents an efficient, low-computational-complexity systolic architecture for the full search block matching ME (FSBME) algorithm. The proposed architecture is based on the one-bit transform (1BT) based full search (FS) algorithm and performs FS ME for four macroblocks (MBs) in parallel. The hardware architecture is implemented in VHDL. The FSBME hardware uses 34% of the slices of a Xilinx Virtex-6 XC6VLX240T FPGA, reaches a maximum frequency of 133 MHz, and can process full high definition (HD) frames at 60 frames per second.
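A minimal sketch of one-bit-transform-based full search in Python. Each pixel is binarized by comparing it with a filtered version of the frame, and candidates are ranked by the number of non-matching points (an XOR count), which replaces the subtractors and adders of SAD with XOR gates and a bit counter. Assumption: the 1BT literature uses a specific multi-bandpass kernel; a simple box mean of radius 2 stands in for it here, and all names are hypothetical. The four-macroblock parallelism of the proposed systolic architecture is not modeled.

```python
def one_bit_transform(frame, radius=2):
    """Binarize: 1 where a pixel is >= the box mean of its (clamped) window.
    The box mean is a stand-in for the multi-bandpass filter of the original 1BT."""
    h, w = len(frame), len(frame[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            total = count = 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += frame[yy][xx]
                    count += 1
            # pixel >= mean, written without division
            row.append(1 if frame[y][x] * count >= total else 0)
        out.append(row)
    return out


def nnmp(bcur, bref, bx, by, dx, dy, n):
    """Number of non-matching points between two binary n x n blocks."""
    return sum(bcur[by + y][bx + x] ^ bref[by + dy + y][bx + dx + x]
               for y in range(n) for x in range(n))


def full_search_1bt(bcur, bref, bx, by, n, srange):
    """Exhaustive search over a +/-srange window; returns (mv, nnmp)."""
    h, w = len(bref), len(bref[0])
    best_mv, best_cost = None, None
    for dy in range(-srange, srange + 1):
        for dx in range(-srange, srange + 1):
            if 0 <= bx + dx <= w - n and 0 <= by + dy <= h - n:
                cost = nnmp(bcur, bref, bx, by, dx, dy, n)
                if best_cost is None or cost < best_cost:
                    best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

Because every candidate is still visited, the result is a true full search; only the matching criterion changes, trading some accuracy relative to SAD for much cheaper hardware per comparison.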


Data memory power optimization and performance exploration of embedded systems for implementing motion estimation algorithms

Real-Time Imaging ◽

10.1016/j.rti.2003.09.006 ◽

2003 ◽

Vol 9 (6) ◽

pp. 371-386

Author(s):

K. Tatas ◽

M. Dasygenis ◽

N. Kroupis ◽

A. Argyriou ◽

D. Soudris ◽

...

Keyword(s):

Embedded Systems ◽

Motion Estimation ◽

Power Optimization ◽

Estimation Algorithms ◽

Data Memory ◽

Memory Power ◽

And Performance ◽

Optimization And Performance


High Performance Hierarchical Block-based Motion Estimation for Real-Time Video Coding

Real-Time Imaging ◽

10.1006/rtim.1996.0064 ◽

1998 ◽

Vol 4 (1) ◽

pp. 67-79 ◽

Cited By ~ 2

Author(s):

Marco Accame ◽

Francesco G.B. De Natale ◽

Daniele D. Giusto

Keyword(s):

Motion Estimation ◽

Video Coding ◽

Real Time ◽

High Performance ◽

Block Based


Bayesian Localization in Real-Time using Probabilistic Maps and Unscented-Kalman-Filters

Journal of Engineering Research ◽

10.36909/jer.11073 ◽

2021 ◽

Author(s):

Wael Farag ◽

Keyword(s):

Particle Filter ◽

Real Time ◽

High Performance ◽

Unscented Kalman Filter ◽

Spatial Clustering ◽

Computational Cost ◽

Measurement Data ◽

High Definition ◽

Probabilistic Maps ◽

Reference Map

In this paper, a Real-Time Monte Carlo Localization (RT_MCL) method for autonomous cars is proposed, based on the fusion of lidar and radar measurement data, high-definition probabilistic maps, and a tailored particle filter. The lidar and radar devices are installed on the ego car, and a customized Unscented Kalman Filter (UKF) fuses their data. Lidars are accurate in determining objects' positions and have a much higher spatial resolution, whereas radars are more accurate in measuring objects' velocities and perform well in extreme weather conditions. The UKF therefore combines the merits of both sensors to provide pose estimates of pole-like static objects, which are well suited to serve as landmarks for vehicle localization in urban environments. These pose estimates are then clustered with the Grid-Based Density-Based Spatial Clustering of Applications with Noise (GB-DBSCAN) algorithm, which represents each pole landmark as a source-point model to reduce computational cost and memory requirements. A reference map containing the pole landmarks is generated off-line from a 3-D lidar and used by a carefully designed Particle Filter (PF) for accurate ego-car localization. The particle filter is initialized with the combined GPS+IMU reading and uses an ego-car motion model to predict the states of the particles. Data association between the landmarks estimated by the UKF and those in the reference map is performed with the Iterative Closest Point (ICP) algorithm. The proposed pipeline is implemented in the high-performance language C++ and uses highly optimized math and optimization libraries for real-time performance. Extensive simulation studies were carried out to evaluate the performance of RT_MCL in both longitudinal and lateral localization.
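The particle-filter stage can be sketched in Python as a predict, weight, and resample loop. Simplifications, all assumptions: the state is 2-D position only (no heading), each measurement is a range to a pole landmark from the reference map, landmark association is taken as known (the paper uses ICP for this), and systematic resampling is used. The UKF sensor fusion and GB-DBSCAN clustering steps are not modeled, and the names and noise parameters are hypothetical.

```python
import math
import random


def systematic_resample(weights, u0):
    """Systematic resampling: one offset u0 in [0, 1) picks n evenly spaced points."""
    n = len(weights)
    total = sum(weights)
    indices, j = [], 0
    cumulative = weights[0] / total
    for i in range(n):
        p = (u0 + i) / n
        while cumulative <= p and j < n - 1:
            j += 1
            cumulative += weights[j] / total
        indices.append(j)
    return indices


def pf_step(particles, control, ranges, landmarks, motion_std, meas_std, rng):
    """One predict / weight / resample cycle for 2-D position particles."""
    # Predict: apply the motion model (a known displacement) plus Gaussian noise.
    moved = [(x + control[0] + rng.gauss(0.0, motion_std),
              y + control[1] + rng.gauss(0.0, motion_std)) for x, y in particles]
    # Weight: Gaussian likelihood of each measured landmark range, in log space.
    log_w = []
    for x, y in moved:
        lw = 0.0
        for (lx, ly), r in zip(landmarks, ranges):
            lw -= 0.5 * ((r - math.hypot(lx - x, ly - y)) / meas_std) ** 2
        log_w.append(lw)
    top = max(log_w)  # subtract the maximum to avoid underflow in exp
    weights = [math.exp(lw - top) for lw in log_w]
    # Resample in proportion to the weights.
    return [moved[i] for i in systematic_resample(weights, rng.random())]


def estimate(particles):
    """Position estimate as the particle mean."""
    n = len(particles)
    return (sum(p[0] for p in particles) / n, sum(p[1] for p in particles) / n)
```

A usage pattern would be to draw the initial particles around the GPS+IMU fix, then call `pf_step` once per measurement cycle and read the position from `estimate`.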
