A Multi-Scale Position Feature Transform Network for Video Frame Interpolation

2020 ◽  
Vol 30 (11) ◽  
pp. 3968-3981 ◽  
Author(s):  
Xianhang Cheng ◽  
Zhenzhong Chen


Symmetry ◽  
2019 ◽  
Vol 11 (10) ◽  
pp. 1251 ◽  
Author(s):  
Ahn ◽  
Jeong ◽  
Kim ◽  
Kwon ◽  
Yoo

Recently, video frame interpolation research based on convolutional neural networks has shown remarkable results. However, these methods demand large amounts of memory and long run times for high-resolution videos, and are unable to process a 4K frame in a single pass. In this paper, we propose a fast 4K video frame interpolation method based upon a multi-scale optical flow reconstruction scheme. The proposed method predicts low-resolution bi-directional optical flow and reconstructs it at high resolution. We also propose consistency and multi-scale smoothness losses to enhance the quality of the predicted optical flow. Furthermore, we use an adversarial loss to make the interpolated frame more seamless and natural. We demonstrate that the proposed method outperforms existing state-of-the-art methods in quantitative evaluation, while running up to 4.39× faster than those methods on 4K videos.
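
For concreteness, the following is a minimal sketch (PyTorch assumed as the framework) of what bi-directional consistency and multi-scale smoothness losses of this kind commonly look like. The function names, loss weights, and warping details are illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def smoothness_loss(flow):
    """First-order (total-variation) smoothness of a flow field (B, 2, H, W)."""
    dx = (flow[:, :, :, 1:] - flow[:, :, :, :-1]).abs().mean()
    dy = (flow[:, :, 1:, :] - flow[:, :, :-1, :]).abs().mean()
    return dx + dy

def multi_scale_smoothness(flow_pyramid, weights=(1.0, 0.5, 0.25)):
    """Smoothness applied to flows predicted at several resolutions."""
    return sum(w * smoothness_loss(f) for w, f in zip(weights, flow_pyramid))

def warp(x, flow):
    """Backward-warp x (B, C, H, W) by flow (B, 2, H, W) using grid_sample."""
    b, _, h, w = x.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(x.device)   # (2, H, W) pixel grid
    coords = base.unsqueeze(0) + flow                           # absolute sample positions
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0                     # normalize to [-1, 1]
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)                        # (B, H, W, 2)
    return F.grid_sample(x, grid, align_corners=True)

def consistency_loss(flow_fwd, flow_bwd):
    """Forward flow should cancel the backward flow warped into its frame."""
    return (flow_fwd + warp(flow_bwd, flow_fwd)).abs().mean()
```

A full training objective would then combine such terms with the reconstruction and adversarial losses mentioned above, with relative weights chosen on a validation set.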


2020 ◽  
Vol 34 (07) ◽  
pp. 11278-11286 ◽  
Author(s):  
Soo Ye Kim ◽  
Jihyong Oh ◽  
Munchurl Kim

Super-resolution (SR) has been widely used to convert low-resolution legacy videos to high-resolution (HR) ones, to suit the increasing resolution of displays (e.g. UHD TVs). However, it becomes easier for humans to notice motion artifacts (e.g. motion judder) in HR videos rendered on larger displays. Thus, broadcasting standards support higher frame rates for UHD (Ultra High Definition) videos (4K@60 fps, 8K@120 fps), meaning that applying SR alone is insufficient to produce genuinely high-quality videos. Hence, to up-convert legacy videos for realistic applications, not only SR but also video frame interpolation (VFI) is required. In this paper, we first propose a joint VFI-SR framework for up-scaling the spatio-temporal resolution of videos from 2K 30 fps to 4K 60 fps. For this, we propose a novel training scheme with a multi-scale temporal loss that imposes temporal regularization on the input video sequence, which can be applied to any general video-related task. The proposed structure is analyzed in depth with extensive experiments.
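
The idea of a multi-scale temporal loss can be made concrete with a short sketch. The version below (PyTorch assumed; strides and weights are illustrative) penalizes per-frame reconstruction error plus mismatches in temporal differences taken at several frame strides; it is one plausible reading of temporal regularization across time scales, not the paper's actual definition.

```python
import torch

def multi_scale_temporal_loss(pred, target, strides=(1, 2), weights=(1.0, 0.5)):
    """pred, target: (B, T, C, H, W) video clips of equal length."""
    loss = (pred - target).abs().mean()              # per-frame reconstruction term
    for s, w in zip(strides, weights):
        dp = pred[:, s:] - pred[:, :-s]              # temporal differences at stride s
        dt = target[:, s:] - target[:, :-s]
        loss = loss + w * (dp - dt).abs().mean()     # match motion across time scales
    return loss
```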


IEEE Access ◽  
2021 ◽  
pp. 1-1 ◽  
Author(s):  
Whan Choi ◽  
Yeong Jun Koh ◽  
Chang-Su Kim

2020 ◽  
Vol 34 (07) ◽  
pp. 10607-10614 ◽  
Author(s):  
Xianhang Cheng ◽  
Zhenzhong Chen

Learning to synthesize non-existing frames from the original consecutive video frames is a challenging task. Recent kernel-based interpolation methods predict pixels with a single convolution process to remove the dependency on optical flow. However, when scene motion is larger than the pre-defined kernel size, these methods yield poor results even though they take thousands of neighboring pixels into account. To solve this problem, in this paper we propose deformable separable convolution (DSepConv) to adaptively estimate kernels, offsets, and masks, allowing the network to obtain information from far fewer but more relevant pixels. In addition, we show that kernel-based methods and conventional flow-based methods are specific instances of the proposed DSepConv. Experimental results demonstrate that our method significantly outperforms other kernel-based interpolation methods and performs on par with or even better than state-of-the-art algorithms, both qualitatively and quantitatively.
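
The sampling step of a deformable separable convolution can be sketched as follows (PyTorch assumed). Per-pixel separable kernels, per-tap offsets, and a modulation mask gather a small set of relevant pixels from one source frame; the shapes, the single-frame setting, and the bilinear sampling via grid_sample are simplifying assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def dsepconv_sample(img, kv, kh, offsets, mask, K=3):
    """
    img:     (B, C, H, W)       source frame
    kv, kh:  (B, K, H, W)       per-pixel vertical / horizontal 1-D kernels
    offsets: (B, 2*K*K, H, W)   learned (dx, dy) offset per kernel tap
    mask:    (B, K*K, H, W)     per-tap modulation weights
    """
    b, c, h, w = img.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(img.device)   # (2, H, W) pixel grid
    out = torch.zeros_like(img)
    r = K // 2
    for i in range(K):                # vertical tap index
        for j in range(K):            # horizontal tap index
            t = i * K + j
            off = offsets[:, 2 * t:2 * t + 2]                    # (B, 2, H, W)
            pos = base.unsqueeze(0) + off
            pos = pos + torch.tensor([j - r, i - r], device=img.device,
                                     dtype=torch.float32).view(1, 2, 1, 1)
            gx = 2.0 * pos[:, 0] / (w - 1) - 1.0                 # normalize to [-1, 1]
            gy = 2.0 * pos[:, 1] / (h - 1) - 1.0
            grid = torch.stack((gx, gy), dim=-1)                 # (B, H, W, 2)
            sampled = F.grid_sample(img, grid, align_corners=True)
            weight = (kv[:, i] * kh[:, j] * mask[:, t]).unsqueeze(1)
            out = out + weight * sampled                         # weighted gather
    return out
```

Fixing all offsets to zero and the mask to one reduces this to an ordinary adaptive separable convolution, which is the sense in which kernel-based methods are a special case of the deformable formulation.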

