Video Frame Interpolation Based on Multi-scale Convolutional Network and Adversarial Training

Author(s):  
Chenguang Li ◽  
Donghao Gu ◽  
Xueyan Ma ◽  
Kai Yang ◽  
Shaohui Liu ◽  
...  


Symmetry ◽  
2019 ◽  
Vol 11 (10) ◽  
pp. 1251 ◽  
Author(s):  
Ahn ◽  
Jeong ◽  
Kim ◽  
Kwon ◽  
Yoo

Recently, video frame interpolation research based on convolutional neural networks has shown remarkable results. However, these methods demand large amounts of memory and long run times for high-resolution videos, and are unable to process a 4K frame in a single pass. In this paper, we propose a fast 4K video frame interpolation method based on a multi-scale optical flow reconstruction scheme. The proposed method predicts low-resolution bi-directional optical flow and reconstructs it at high resolution. We also propose consistency and multi-scale smoothness losses to enhance the quality of the predicted optical flow. Furthermore, we use an adversarial loss to make the interpolated frame more seamless and natural. We demonstrate that the proposed method outperforms existing state-of-the-art methods in quantitative evaluation, while running up to 4.39× faster than those methods on 4K videos.
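Below is a minimal PyTorch-style sketch of the core idea this abstract describes: predicting coarse bi-directional optical flow at low resolution, reconstructing it at full resolution, and penalizing forward/backward inconsistency. The module name, layer sizes, and the simplified consistency term are assumptions for illustration, not the authors' released implementation.

```python
# Hypothetical sketch: low-resolution bi-directional flow prediction, upsampled
# (reconstructed) to full resolution, plus a simplified consistency penalty.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowResFlowNet(nn.Module):
    """Predicts coarse forward and backward flow from two downsampled frames."""
    def __init__(self, scale=0.25):
        super().__init__()
        self.scale = scale
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 4, 3, padding=1),   # 2 channels per flow direction
        )

    def forward(self, frame0, frame1):
        h, w = frame0.shape[-2:]
        x = F.interpolate(torch.cat([frame0, frame1], dim=1),
                          scale_factor=self.scale, mode='bilinear',
                          align_corners=False)
        flow = self.net(x)
        # Reconstruct full-resolution flow; flow vectors are rescaled with size.
        flow = F.interpolate(flow, size=(h, w), mode='bilinear',
                             align_corners=False) / self.scale
        return flow[:, :2], flow[:, 2:]        # forward flow, backward flow

def consistency_loss(flow_fw, flow_bw):
    """Simplified stand-in: forward and backward flow should roughly cancel.
    (The paper's exact consistency formulation may differ.)"""
    return (flow_fw + flow_bw).abs().mean()
```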


2020 ◽  
Vol 34 (07) ◽  
pp. 11278-11286 ◽  
Author(s):  
Soo Ye Kim ◽  
Jihyong Oh ◽  
Munchurl Kim

Super-resolution (SR) has been widely used to convert low-resolution legacy videos to high-resolution (HR) ones, to suit the increasing resolution of displays (e.g., UHD TVs). However, motion artifacts (e.g., motion judder) become easier for humans to notice when HR videos are rendered on larger displays. Thus, broadcasting standards support higher frame rates for UHD (Ultra High Definition) videos (4K@60 fps, 8K@120 fps), meaning that applying SR alone is insufficient to produce genuinely high-quality videos. Hence, to up-convert legacy videos for realistic applications, not only SR but also video frame interpolation (VFI) is required. In this paper, we first propose a joint VFI-SR framework for up-scaling the spatio-temporal resolution of videos from 2K 30 fps to 4K 60 fps. For this, we propose a novel training scheme with a multi-scale temporal loss that imposes temporal regularization on the input video sequence and can be applied to any general video-related task. The proposed structure is analyzed in depth with extensive experiments.
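As a rough illustration of what a multi-scale temporal loss can look like, the sketch below compares predicted and ground-truth frame differences at several temporal strides, so that both short- and long-range temporal consistency are penalized. The strides, shapes, and loss form are assumptions for illustration rather than the exact loss used in the paper.

```python
# Hypothetical multi-scale temporal loss over a video sequence.
import torch

def multi_scale_temporal_loss(pred_seq, gt_seq, strides=(1, 2)):
    """pred_seq, gt_seq: tensors of shape (B, T, C, H, W)."""
    loss = 0.0
    for s in strides:
        # Frame differences at temporal stride s approximate motion at that scale.
        pred_diff = pred_seq[:, s:] - pred_seq[:, :-s]
        gt_diff = gt_seq[:, s:] - gt_seq[:, :-s]
        loss = loss + torch.mean(torch.abs(pred_diff - gt_diff))
    return loss
```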


IEEE Access ◽  
2021 ◽  
pp. 1-1
Author(s):  
Whan Choi ◽  
Yeong Jun Koh ◽  
Chang-Su Kim

2021 ◽  
Vol 10 (7) ◽  
pp. 488
Author(s):  
Peng Li ◽  
Dezheng Zhang ◽  
Aziguli Wulamu ◽  
Xin Liu ◽  
Peng Chen

A deep understanding of our visual world requires more than isolated perception of a series of objects; the relationships between them also carry rich semantic information. This is especially true for satellite remote sensing images, whose large spatial extent means that objects vary widely in size and form complex spatial compositions. Recognizing semantic relations therefore strengthens the understanding of remote sensing scenes. In this paper, we propose a novel multi-scale semantic fusion network (MSFN). In this framework, dilated convolution is introduced into a graph convolutional network (GCN) based on an attention mechanism to fuse and refine multi-scale semantic context, which is crucial for strengthening the cognitive ability of our model. In addition, based on the mapping between visual features and semantic embeddings, we design a sparse relationship extraction module to remove meaningless connections among entities and improve the efficiency of scene graph generation. Meanwhile, to further promote research on scene understanding in the remote sensing field, this paper also proposes a remote sensing scene graph dataset (RSSGD). We carried out extensive experiments, and the results show that our model significantly outperforms previous methods on scene graph generation. In addition, RSSGD effectively bridges the huge semantic gap between low-level perception and high-level cognition of remote sensing images.
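The following sketch illustrates the two ingredients named above: parallel dilated convolutions that gather context at several receptive fields, and an attention-weighted graph convolution over object-node features. Module names, channel sizes, and the attention form are illustrative assumptions, not the MSFN implementation.

```python
# Hypothetical sketch of multi-scale dilated context extraction and a
# dot-product-attention graph convolution over object nodes.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleDilatedBlock(nn.Module):
    def __init__(self, channels=256, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
             for d in dilations])
        self.fuse = nn.Conv2d(channels * len(dilations), channels, 1)

    def forward(self, x):                      # x: (B, C, H, W) feature map
        return self.fuse(torch.cat([F.relu(b(x)) for b in self.branches], dim=1))

class AttentionGraphConv(nn.Module):
    """One graph-convolution step whose adjacency is learned by attention."""
    def __init__(self, dim=256):
        super().__init__()
        self.q, self.k, self.v = (nn.Linear(dim, dim) for _ in range(3))

    def forward(self, nodes):                  # nodes: (N, dim) object features
        attn = torch.softmax(self.q(nodes) @ self.k(nodes).t()
                             / nodes.shape[-1] ** 0.5, dim=-1)
        return F.relu(attn @ self.v(nodes) + nodes)   # message passing + residual
```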


2021 ◽  
Vol 13 (12) ◽  
pp. 2425
Author(s):  
Yiheng Cai ◽  
Dan Liu ◽  
Jin Xie ◽  
Jingxian Yang ◽  
Xiangbin Cui ◽  
...  

Analyzing the surface and bedrock locations in radar imagery enables the computation of ice-sheet thickness, which is important for the study of ice sheets, their volume, and how they may contribute to global climate change. However, traditional handcrafted methods for detecting ice-surface and ice-bed layers in ice-sheet radargrams cannot quickly provide quantitative, objective, and reliable information extraction; they require complex human involvement and are difficult to apply to large datasets, whereas deep learning methods can obtain better results in a generalized way. In this study, an end-to-end multi-scale attention network (MsANet) is proposed to estimate and reconstruct layers in sequences of ice-sheet radar tomographic images. First, we use an improved 3D convolutional network, C3D-M, whose first fully connected layer is replaced by a convolution unit to better preserve the spatial structure of ice-layer features, as the backbone. Then, an adjustable multi-scale module uses filters of different scales to learn scale information and enhance the feature extraction capability of the network. Finally, an attention module extended to 3D space, with a redundant bottleneck unit removed, fuses and refines ice-layer features. Radar sequential images collected by the Center for Remote Sensing of Ice Sheets in 2014 are used as training and testing data. Compared with state-of-the-art deep learning methods, the MsANet reduces the average mean absolute column-wise error for detecting the ice-surface and ice-bottom layers by 10% (2.14 pixels), runs faster, and uses approximately 12 million fewer parameters.
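The sketch below illustrates, under assumed shapes and names, two of the ideas mentioned: a multi-scale 3D module with filters of different sizes, and a 1×1×1 convolutional head standing in for the removed fully connected layer so that spatial structure is preserved. It is not the MsANet code.

```python
# Hypothetical sketch of a multi-scale 3D block and a convolutional head.
import torch
import torch.nn as nn

class MultiScale3D(nn.Module):
    def __init__(self, channels=64, kernel_sizes=(1, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv3d(channels, channels, k, padding=k // 2)
             for k in kernel_sizes])
        self.fuse = nn.Conv3d(channels * len(kernel_sizes), channels, 1)

    def forward(self, x):                      # x: (B, C, D, H, W) radar volume
        return self.fuse(torch.cat([torch.relu(b(x)) for b in self.branches], dim=1))

# Replacing a flatten + fully connected head with a 1x1x1 convolution keeps the
# column-wise spatial layout needed to localize layer boundaries.
head = nn.Conv3d(64, 2, kernel_size=1)         # 2 outputs: ice-surface, ice-bed
```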


Information ◽  
2020 ◽  
Vol 12 (1) ◽  
pp. 3
Author(s):  
Shuang Chen ◽  
Zengcai Wang ◽  
Wenxin Chen

The effective detection of driver drowsiness is an important measure to prevent traffic accidents. Most existing drowsiness detection methods use only a single facial feature to identify fatigue status, ignoring the complex correlations between fatigue features and the temporal information of those features, which reduces recognition accuracy. To solve these problems, we propose a driver sleepiness estimation model based on factorized bilinear feature fusion and a long short-term recurrent convolutional network to detect driver drowsiness efficiently and accurately. The proposed framework includes three models: fatigue feature extraction, fatigue feature fusion, and driver drowsiness detection. First, we used a convolutional neural network (CNN) to effectively extract deep representations of eye- and mouth-related fatigue features from the face area detected in each video frame. Then, based on the factorized bilinear feature fusion model, we performed a nonlinear fusion of the deep feature representations of the eyes and mouth. Finally, we fed the sequence of fused frame-level features into a long short-term memory (LSTM) unit to capture the temporal information of the features and used a softmax classifier to detect sleepiness. The proposed framework was evaluated on the National Tsing Hua University drowsy driver detection (NTHU-DDD) video dataset. The experimental results show that this method has better stability and robustness compared with other methods.
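A compact sketch of the pipeline described above: per-frame eye and mouth features are fused by a low-rank (factorized) bilinear layer, the fused sequence is passed through an LSTM, and a linear head produces drowsiness logits. Feature dimensions, pooling size, and module names are illustrative assumptions rather than the authors' implementation.

```python
# Hypothetical sketch: factorized bilinear fusion of eye/mouth features,
# followed by an LSTM over the fused frame-level sequence.
import torch
import torch.nn as nn

class FactorizedBilinearFusion(nn.Module):
    """Low-rank bilinear interaction: pool((U^T x) * (V^T y))."""
    def __init__(self, dim_x=256, dim_y=256, factor=512, out=128, pool=4):
        super().__init__()
        self.U = nn.Linear(dim_x, factor)
        self.V = nn.Linear(dim_y, factor)
        self.pool = nn.AvgPool1d(pool)
        self.out = nn.Linear(factor // pool, out)

    def forward(self, x, y):
        z = self.U(x) * self.V(y)                      # element-wise interaction
        z = self.pool(z.unsqueeze(1)).squeeze(1)       # pooling over factor dim
        return self.out(z)

class DrowsinessClassifier(nn.Module):
    def __init__(self, fused_dim=128, hidden=128, classes=2):
        super().__init__()
        self.lstm = nn.LSTM(fused_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, classes)

    def forward(self, fused_seq):                      # (B, T, fused_dim)
        out, _ = self.lstm(fused_seq)
        return self.head(out[:, -1])                   # logits; softmax at inference
```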


Electronics ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 319
Author(s):  
Yi Wang ◽  
Xiao Song ◽  
Guanghong Gong ◽  
Ni Li

With the rapid development of deep learning and artificial intelligence techniques, denoising via neural networks has drawn great attention owing to its flexibility and excellent performance. However, for most convolutional network denoising methods, the convolution kernel is only one layer deep and features of distinct scales are neglected. Moreover, in the convolution operation all channels are treated equally, and the relationships between channels are not considered. In this paper, we propose a multi-scale feature extraction-based normalized attention neural network (MFENANN) for image denoising. In MFENANN, we define a multi-scale feature extraction block to extract and combine features at distinct scales of the noisy image. In addition, we propose a normalized attention network (NAN) to learn the relationships between channels, which smooths the optimization landscape and speeds up the convergence of attention-model training. Moreover, we introduce the NAN into convolutional network denoising, in which each channel receives its own gain so that channels can play different roles in subsequent convolutions. To verify the effectiveness of the proposed MFENANN, we conducted experiments on both grayscale and color image sets with noise levels ranging from 0 to 75. The experimental results show that, compared with some state-of-the-art denoising methods, the restored images of MFENANN achieve higher peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) values and have a better overall appearance.
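The sketch below shows one plausible reading of the two components named above: a multi-scale feature extraction block with parallel kernels of different sizes, and a channel-attention gate with a normalization step so that each channel receives its own gain. It is illustrative only, not the MFENANN reference code.

```python
# Hypothetical sketch: multi-scale feature extraction and a normalized
# channel-attention gate for a denoising network.
import torch
import torch.nn as nn

class MultiScaleFeatures(nn.Module):
    def __init__(self, in_ch=3, out_ch=64, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, k, padding=k // 2) for k in kernel_sizes])
        self.fuse = nn.Conv2d(out_ch * len(kernel_sizes), out_ch, 1)

    def forward(self, x):                              # x: (B, in_ch, H, W)
        return self.fuse(torch.cat([torch.relu(b(x)) for b in self.branches], 1))

class NormalizedChannelAttention(nn.Module):
    """Squeeze-and-excitation-style gate with a batch-normalized excitation."""
    def __init__(self, channels=64, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.BatchNorm1d(channels),                   # the "normalized" step
            nn.Sigmoid())

    def forward(self, x):                               # x: (B, C, H, W)
        gain = self.fc(x.mean(dim=(2, 3)))              # global average pooling
        return x * gain.unsqueeze(-1).unsqueeze(-1)     # per-channel gain
```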

