scholarly journals Motion Estimation Architecture Using Efficient Adder-Compressors for HDTV Video Coding

2010 ◽  
Vol 5 (1) ◽  
pp. 78-88 ◽  
Author(s):  
Marcelo Porto ◽  
André Silva ◽  
Sergo Almeida ◽  
Eduardo Da Costa ◽  
Sergio Bampi

This paper presents real time HDTV (High Definition Television) architecture for Motion Estimation (ME) using efficient adder compressors. The architecture is based on the Quarter Sub-sampled Diamond Search algorithm (QSDS) with Dynamic Iteration Control (DIC) algorithm. The main characteristic of the proposed architecture is the large amount of Processing Units (PUs) that are used to calculate the SAD (Sum of Absolute Difference) metric. The internal structures of the PUs are composed by a large number of addition operations to calculate the SADs. In this paper, efficient 4-2 and 8-2 adder compressors are used in the PUs architecture to achieve the performance to work with HDTV (High Definition Television) videos in real time at 30 frames per second. These adder compressors enable the simultaneous addition of 4 and 8 operands respectively. The PUs, using adder compressors, were applied to the ME architecture. The implemented architecture was described in VHDL and synthesized to FPGA and, with Leonardo Spectrum tool, to the TSMC 0.18μm CMOS standard cell technology. Synthesis results indicate that the new QSDS-DIC architecture reach the best performance result and enable gains of 12% in terms of processing rate. The architecture can reach real time for full HDTV (1920x1080 pixels) in the worst case processing 65 frames per second, and it can process 269 HDTV frames per second in the average case.

2016 ◽  
Vol 2016 ◽  
pp. 1-11
Author(s):  
Nehal N. Shah ◽  
Harikrishna Singapuri ◽  
Upena D. Dalal

Video coding standards such as MPEG-x and H.26x incorporate variable block size motion estimation (VBSME) which is highly time consuming and extremely complex from hardware implementation perspective due to huge computation. In this paper, we have discussed basic aspects of video coding and studied and compared existing architectures for VBSME. Various architectures with different pixel scanning pattern give a variety of performance results for motion vector (MV) generation, showing tradeoff between macroblock processed per second and resource requirement for computation. Aim of this paper is to design VBSME architecture which utilizes optimal resources to minimize chip area and offer adequate frame processing rate for real time implementation. Speed of computation can be improved by accessing 16 pixels of base macroblock of size 4 × 4 in single clock cycle using z scanning pattern. Widely adopted cost function for hardware implementation known as sum of absolute differences (SAD) is used for VBSME architecture with multiplexer based absolute difference calculator and partial summation term reduction (PSTR) based multioperand adders. Device utilization of proposed implementation is only 22k gates and it can process 179 HD (1920 × 1080) resolution frames in best case and 47 HD resolution frames in worst case per second. Due to such higher throughput design is well suitable for real time implementation.


2006 ◽  
Vol 6 (6) ◽  
pp. 483-494
Author(s):  
T. Tulsi ◽  
L.K. Grover ◽  
A. Patel

The standard quantum search lacks a feature, enjoyed by many classical algorithms, of having a fixed point, i.e. monotonic convergence towards the solution. Recently a fixed point quantum search algorithm has been discovered, referred to as the Phase-\pi/3 search algorithm, which gets around this limitation. While searching a database for a target state, this algorithm reduces the error probability from \epsilon to \epsilon^{2q+1} using q oracle queries, which has since been proved to be asymptotically optimal. A different algorithm is presented here, which has the same worst-case behavior as the Phase-\pi/3 search algorithm but much better average-case behavior. Furthermore the new algorithm gives \epsilon^{2q+1} convergence for all integral q, whereas the Phase-\pi/3 search algorithm requires q to be (3^{n}-1)/2 with n a positive integer. In the new algorithm, the operations are controlled by two ancilla qubits, and fixed point behavior is achieved by irreversible measurement operations applied to these ancillas. It is an example of how measurement can allow us to bypass some restrictions imposed by unitarity on quantum computing.


2002 ◽  
Vol 24 (4) ◽  
pp. 215-228 ◽  
Author(s):  
Pai-Chi Li ◽  
Wei-Ning Lee

An efficient speckle tracking algorithm is proposed for motion estimation in ultrasonic imaging. Speckle tracking involves a matching process and a searching process. The matching process of the proposed algorithm is based on a Block Sum Pyramid algorithm that significantly reduces the computational complexity while maintaining the same accuracy as the conventional sum of absolute difference approach. The searching process, on the other hand, is based on a multilevel search strategy rather than the full-search strategy used by most conventional tracking methods. Both simulated speckle images and clinical breast images were used to test the performance of the proposed algorithm. The results show that the computation efficiency is improved by up to a factor of five over the conventional approach. The improved efficiency enables real-time or near-real-time implementation of motion estimation in ultrasonic imaging, which is particularly beneficial in areas such as blood flow estimation, elasticity imaging, speckle image registration, and strain compounding.


2016 ◽  
Vol 57 ◽  
pp. 307-343 ◽  
Author(s):  
Nathan R. Sturtevant ◽  
Vadim Bulitko

Real-time agent-centered heuristic search is a well-studied problem where an agent that can only reason locally about the world must travel to a goal location using bounded computation and memory at each step. Many algorithms have been proposed for this problem and theoretical results have also been derived for the worst-case performance with simple examples demonstrating worst-case performance in practice. Lower bounds, however, have not been widely studied. In this paper we study best-case performance more generally and derive theoretical lower bounds for reaching the goal using LRTA*, a canonical example of a real-time agent-centered heuristic search algorithm. The results show that, given some reasonable restrictions on the state space and the heuristic function, the number of steps an LRTA*-like algorithm requires to reach the goal will grow asymptotically faster than the state space, resulting in ``scrubbing'' where the agent repeatedly visits the same state. We then show that while the asymptotic analysis does not hold for more complex real-time search algorithms, experimental results suggest that it is still descriptive of practical performance.


2012 ◽  
Vol 7 (1) ◽  
pp. 37-46
Author(s):  
Gustavo Sanchez ◽  
Marcelo Porto ◽  
Diego Noble ◽  
Sergio Bampi ◽  
Luciano Agostini

This paper presents an efficient hardware design using the new Motion Estimation (ME) algorithms named: Multi-point Diamond Search (MPDS) and Dynamic Multi-Point Diamond Search (DMPDS). These algorithms are more efficient to avoid from local minima falls than traditional fast algorithms.This fact contributes to increase the quality of the motion vectors, especially in High Definition (HD) videos, were the number of local minima are considerable higher. Two versions of MPDS algorithm were proposed. The first one, focused on high performance, is capable to process videos QFHD at 30 frames per second when synthesized to Altera Stratix 4 and 90nm TSCM, with only 18mW. The second version is focused on quality enhancement and is capable to process HD 1080p videos in real time. The DMPDS architecture has been developed focusing on high performance and was synthesized to Altera stratix 4. This architecture is capable to process videos QFHD at 34 frames per second. In comparison to related works, our solutions obtained the highest processing rates, and a good trade-off among power consumption, area, memory bits and performance.


2012 ◽  
Vol 2012 ◽  
pp. 1-10 ◽  
Author(s):  
Alba Sandyra Bezerra Lopes ◽  
Ivan Saraiva Silva ◽  
Luciano Volcan Agostini

The motion estimation is the most complex module in a video encoder requiring a high processing throughput and high memory bandwidth, mainly when the focus is high-definition videos. The throughput problem can be solved increasing the parallelism in the internal operations. The external memory bandwidth may be reduced using a memory hierarchy. This work presents a memory hierarchy model for a full-search motion estimation core. The proposed memory hierarchy model is based on a data reuse scheme considering the full search algorithm features. The proposed memory hierarchy expressively reduces the external memory bandwidth required for the motion estimation process, and it provides a very high data throughput for the ME core. This throughput is necessary to achieve real time when processing high-definition videos. When considering the worst bandwidth scenario, this memory hierarchy is able to reduce the external memory bandwidth in 578 times. A case study for the proposed hierarchy, using32×32search window and8×8block size, was implemented and prototyped on a Virtex 4 FPGA. The results show that it is possible to reach 38 frames per second when processing full HD frames (1920×1080pixels) using nearly 299 Mbytes per second of external memory bandwidth.


Sign in / Sign up

Export Citation Format

Share Document