Parallel Full Search Algorithm for Motion Estimation on Graphic Processing Unit

Background: Advances in video compression technology have been driven by everincreasing processing power available in software and hardware. Methods: The emerging High-Efficiency Video Coding (HEVC) standard aims to provide a doubling in coding efficiency with respect to the H.264/AVC high profile, delivering the same video quality at half the bit rate. Results: Thus, the results show high computational complexity. In both standards, the motion estimation block presents a significant challenge in clock latency since it consumes more than 40% of the total encoding time. For these reasons, we proposed an optimized implementation of this algorithm on a low-cost NVIDIA GPU developed with CUDA language. Conclusion: This optimized implementation can provide high-performance video encoder where the speed reaches about 85.

Download Full-text

A Motion Estimation based Algorithm for Encoding Time Reduction in HEVC

Defence Science Journal ◽

10.14429/dsj.72.16733 ◽

2022 ◽

Vol 72 (1) ◽

pp. 56-66

Author(s):

S. Karthik Sairam ◽

P. Muralidhar

Keyword(s):

Motion Estimation ◽

Video Coding ◽

Video Compression ◽

High Efficiency ◽

Search Algorithm ◽

Video Quality ◽

Search Pattern ◽

Absolute Difference ◽

High Efficiency Video Coding ◽

Search Patterns

High Efficiency Video Coding (HEVC) is a video compression standard that offers 50% more efficiency at the expense of high encoding time contrasted with the H.264 Advanced Video Coding (AVC) standard. The encoding time must be reduced to satisfy the needs of real-time applications. This paper has proposed the Multi- Level Resolution Vertical Subsampling (MLRVS) algorithm to reduce the encoding time. The vertical subsampling minimizes the number of Sum of Absolute Difference (SAD) computations during the motion estimation process. The complexity reduction algorithm is also used for fast coding the coefficients of the quantised block using a flag decision. Two distinct search patterns are suggested: New Cross Diamond Diamond (NCDD) and New Cross Diamond Hexagonal (NCDH) search patterns, which reduce the time needed to locate the motion vectors. In this paper, the MLRVS algorithm with NCDD and MLRVS algorithm with NCDH search patterns are simulated separately and analyzed. The results show that the encoding time of the encoder is decreased by 55% with MLRVS algorithm using NCDD search pattern and 56% with MLRVS using NCDH search pattern compared to HM16.5 with Test Zone (TZ) search algorithm. These results are achieved with a slight increase in bit rate and negligible deterioration in output video quality.

Download Full-text

Video Compression for Screen Recorded Sequences Following Eye Movements

Journal of Signal Processing Systems ◽

10.1007/s11265-021-01719-2 ◽

2021 ◽

Author(s):

Diego Jesus Serrano-Carrasco ◽

Antonio Jesus Diaz-Honrubia ◽

Pedro Cuenca

Keyword(s):

Eye Tracking ◽

Video Coding ◽

Video Compression ◽

High Performance ◽

High Efficiency ◽

Tracking System ◽

Bit Rate ◽

Video Traffic ◽

High Efficiency Video Coding ◽

Perceptual Video Coding

AbstractWith the advent of smartphones and tablets, video traffic on the Internet has increased enormously. With this in mind, in 2013 the High Efficiency Video Coding (HEVC) standard was released with the aim of reducing the bit rate (at the same quality) by 50% with respect to its predecessor. However, new contents with greater resolutions and requirements appear every day, making it necessary to further reduce the bit rate. Perceptual video coding has recently been recognized as a promising approach to achieving high-performance video compression and eye tracking data can be used to create and verify these models. In this paper, we present a new algorithm for the bit rate reduction of screen recorded sequences based on the visual perception of videos. An eye tracking system is used during the recording to locate the fixation point of the viewer. Then, the area around that point is encoded with the base quantization parameter (QP) value, which increases when moving away from it. The results show that up to 31.3% of the bit rate may be saved when compared with the original HEVC-encoded sequence, without a significant impact on the perceived quality.

Download Full-text

High Speed Parallel SAD Architecture Implementation on FPGA for HEVC encoder

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.f8380.088619 ◽

2019 ◽

Vol 8 (6) ◽

pp. 1235-1238

Keyword(s):

Motion Estimation ◽

Video Compression ◽

High Speed ◽

High Performance ◽

High Efficiency ◽

Block Size ◽

Video Encoding ◽

Computational Time ◽

Absolute Difference ◽

Asymmetric Mode

Video compression is a very complex and time consuming task which generally pursuit high performance. Motion Estimation (ME) process in any video encoder is responsible to primarily achieve the colossal performance which contributes to significant compression gain. Summation of Absolute Difference (SAD) is widely applied as distortion metric for ME process. With the increase in block size to 64×64 for real time applications along with the introduction of asymmetric mode motion partitioning(AMP) in High Efficiency Video Encoding (HEVC)causes variable block size motion estimation very convoluted. This results in increase in computational time and demands for significant requirement of hardware resources. In this paper parallel SAD hardware circuit for ME process in HEVC is propound where parallelism is used at various levels. The propound circuit has been implemented using Xilinx Virtex-5 FPGA for XC5VLX20T family. Synthesis results shows that the propound circuit provides significant reduction in delay and increase in frequency in comparison with results of other parallel architectures.

Download Full-text

Performance Analysis of OpenCL and CUDA Programming Models for the High Efficiency Video Coding

10.5772/intechopen.99823 ◽

2021 ◽

Author(s):

Randa Khemiri ◽

Soulef Bouaafia ◽

Asma Bahba ◽

Maha Nasr ◽

Fatma Ezahra Sayadi

Keyword(s):

Motion Estimation ◽

Execution Time ◽

High Efficiency ◽

Graphics Processing Unit ◽

Block Matching ◽

Performance Ratio ◽

Processing Unit ◽

High Efficiency Video Coding ◽

Estimation Algorithms ◽

Cuda Programming

In Motion estimation (ME), the block matching algorithms have a great potential of parallelism. This process of the best match is performed by computing the similarity for each block position inside the search area, using a similarity metric, such as Sum of Absolute Differences (SAD). It is used in the various steps of motion estimation algorithms. Moreover, it can be parallelized using Graphics Processing Unit (GPU) since the computation algorithm of each block pixels is similar, thus offering better results. In this work a fixed OpenCL code was performed firstly on several architectures as CPU and GPU, secondly a parallel GPU-implementation was proposed with CUDA and OpenCL for the SAD process using block of sizes from 4x4 to 64x64. A comparative study established between execution time on GPU on the same video sequence. The experimental results indicated that GPU OpenCL execution time was better than that of CUDA times with performance ratio that reached the double.

Download Full-text

FPGA-Based ROI Encoding for HEVC Video Bitrate Reduction

Journal of Circuits System and Computers ◽

10.1142/s0218126620501820 ◽

2020 ◽

Vol 29 (11) ◽

pp. 2050182

Author(s):

Zhilei Chai ◽

Shen Li ◽

Qunfang He ◽

Mingsong Chen ◽

Wenjie Chen

Keyword(s):

Data Storage ◽

Video Compression ◽

High Efficiency ◽

Video Quality ◽

Region Of Interest ◽

Absolute Difference ◽

High Efficiency Video Coding ◽

High Definition ◽

Mapping Algorithm ◽

Block Based

The explosive growth of video applications has produced great challenges for data storage and transmission. In this paper, we propose a new ROI (region of interest) encoding solution to accelerate the processing and reduce the bitrate based on the latest video compression standard H.265/HEVC (High-Efficiency Video Coding). The traditional ROI extraction mapping algorithm uses pixel-based Gaussian background modeling (GBM), which requires a large number of complex floating-point calculations. Instead, we propose a block-based GBM to set up the background, which is in accord with the block division of HEVC. Then, we use the SAD (sum of absolute difference) rule to separate the foreground block from the background block, and these blocks are mapped into the coding tree unit (CTU) of HEVC. Moreover, the quantization parameter (QP) is adjusted according to the distortion rate automatically. The experimental results show that the processing speed on FPGA has reached a real-time level of 22 FPS (frames per second) for full high-definition videos ([Formula: see text]), and the bitrate is reduced by 10% on average with stable video quality.

Download Full-text

Hardware-aware motion estimation search algorithm development for high-efficiency video coding (HEVC) standard

2012 19th IEEE International Conference on Image Processing ◽

10.1109/icip.2012.6467163 ◽

2012 ◽

Cited By ~ 8

Author(s):

Mahmut E. Sinangil ◽

Anantha P. Chandrakasan ◽

Vivienne Sze ◽

Minhua Zhou

Keyword(s):

Motion Estimation ◽

Video Coding ◽

High Efficiency ◽

Search Algorithm ◽

High Efficiency Video Coding ◽

Algorithm Development

Download Full-text

Motion Estimation in HEVC/H.265: Metaheuristic Approach to Improve the Efficiency

Engineering Proceedings ◽

10.3390/engproc2021012059 ◽

2021 ◽

Vol 12 (1) ◽

pp. 59

Author(s):

Khwaja Humble Hassan ◽

Shahzad Ahmad Butt

Keyword(s):

Motion Estimation ◽

Video Coding ◽

High Efficiency ◽

Optimization Problems ◽

Search Algorithm ◽

High Efficiency Video Coding ◽

High Definition ◽

The Social ◽

Test Zone ◽

Encoding Efficiency

An ever increasing use of digital video applications such as video telephony, broadcast and the storage of high and ultra-high definition videos has steered the development of video coding standards. The state of the art video coding standard is High Efficiency Video Coding (HEVC) or otherwise known as H.265. It promises to be 50 percent more efficient than the previous video coding standard H.264. Ultimately, H.265 provides significant improvement in compression at the expense of computational complexity. HEVC encoder is very complex and 50 percent of the encoding consists of Motion Estimation (ME). It uses a Test Zone (TZ) fast search algorithm for its motion estimation, which compares a block of pixels with a few selected blocks in the search region of a referenced frame. However, the encoding time is not suitable to meet the needs of real time video applications. So, there is a requirement to improve the search algorithm and to provide comparable results to TZ search to save a substantial amount of time. In our paper, we aim to study the effects of a meta-heuristic algorithm on motion estimation. One such suitable algorithm for this task is the Firefly Algorithm (FA). FA is inspired by the social behavior of fireflies and is generally used to solve optimization problems. Our results show that implementing FA for ME saves a considerable amount of time with a comparable encoding efficiency.

Download Full-text

Rate Distortion Performance of Motion Estimation for High Efficiency Video Coding

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrteb2132.078219 ◽

2019 ◽

Vol 8 (2) ◽

pp. 2855-2860

Keyword(s):

Computational Complexity ◽

Motion Estimation ◽

Video Coding ◽

High Efficiency ◽

Search Algorithm ◽

Rate Distortion ◽

Objective Evaluation ◽

Video Sequences ◽

High Efficiency Video Coding ◽

Different Types

The contemporary coding standard for video is High Efficiency Video Coding Standard (HEVC). It’s introduced by ITU-T (International Telegraph Union) and Joint Collaborative Team on Video Coding (JCT-VC). HEVC attains the requirement of video storage and transmission with high resolution. Although it requires the high amount of computational complexity. Motion Vectors are determined with motion estimation analysis; it is implemented with different types of algorithm. In this paper, Motion Estimation Process is implementing with the content split block search algorithm. It improves Peak Signal Noise Ratio (PSNR) than to the existing algorithms. The Objective evaluation has been performed with various video sequences such as BQ Terrace and also improved PSNR.

Download Full-text

A modified TZ search algorithm for parallel integer motion estimation in high efficiency video coding

2015 International SoC Design Conference (ISOCC) ◽

10.1109/isocc.2015.7401664 ◽

2015 ◽

Author(s):

Doan Trung Nghia ◽

Tae Sung Kim ◽

Hyuk-Jae Lee ◽

Soo-Ik Chae

Keyword(s):

Motion Estimation ◽

Video Coding ◽

High Efficiency ◽

Search Algorithm ◽

High Efficiency Video Coding ◽

Integer Motion Estimation

Download Full-text

Complexity Analysis of a Versatile Video Coding Decoder over Embedded Systems and General Purpose Processors

Sensors ◽

10.3390/s21103320 ◽

2021 ◽

Vol 21 (10) ◽

pp. 3320

Author(s):

Anup Saha ◽

Miguel Chavarrías ◽

Fernando Pescador ◽

Ángel M. Groba ◽

Kheyter Chassaigne ◽

...

Keyword(s):

Video Coding ◽

Video Compression ◽

High Performance ◽

High Efficiency ◽

Complexity Analysis ◽

General Purpose ◽

High Efficiency Video Coding ◽

General Purpose Processor ◽

Fine Grain ◽

Quality Video

The increase in high-quality video consumption requires increasingly efficient video coding algorithms. Versatile video coding (VVC) is the current state-of-the-art video coding standard. Compared to the previous video standard, high efficiency video coding (HEVC), VVC demands approximately 50% higher video compression while maintaining the same quality and significantly increasing the computational complexity. In this study, coarse-grain profiling of a VVC decoder over two different platforms was performed: One platform was based on a high-performance general purpose processor (HGPP), and the other platform was based on an embedded general purpose processor (EGPP). For the most intensive computational modules, fine-grain profiling was also performed. The results allowed the identification of the most intensive computational modules necessary to carry out subsequent acceleration processes. Additionally, the correlation between the performance of each module on both platforms was determined to identify the influence of the hardware architecture.

Download Full-text