Video Frame Interpolation Based on Multi-scale Convolutional Network and Adversarial Training

Author(s):  
Chenguang Li ◽  
Donghao Gu ◽  
Xueyan Ma ◽  
Kai Yang ◽  
Shaohui Liu ◽  
...  


Symmetry ◽  
2019 ◽  
Vol 11 (10) ◽  
pp. 1251 ◽  
Author(s):  
Ahn ◽  
Jeong ◽  
Kim ◽  
Kwon ◽  
Yoo

Recently, video frame interpolation research based on convolutional neural networks has shown remarkable results. However, these methods demand large amounts of memory and long run times for high-resolution videos, and are unable to process a 4K frame in a single pass. In this paper, we propose a fast 4K video frame interpolation method based on a multi-scale optical flow reconstruction scheme. The proposed method predicts low-resolution bi-directional optical flow and reconstructs it at high resolution. We also propose consistency and multi-scale smoothness losses to enhance the quality of the predicted optical flow. Furthermore, we use an adversarial loss to make the interpolated frame more seamless and natural. We demonstrate that the proposed method outperforms existing state-of-the-art methods in quantitative evaluation, while running up to 4.39× faster than those methods on 4K videos.
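Below is a minimal PyTorch-style sketch of the core idea this abstract describes: predicting coarse bi-directional optical flow at low resolution, reconstructing it at full resolution, and penalizing forward/backward inconsistency. The module name, layer sizes, and the simplified consistency term are assumptions for illustration, not the authors' released implementation.

```python
# Hypothetical sketch: low-resolution bi-directional flow prediction, upsampled
# (reconstructed) to full resolution, plus a simplified consistency penalty.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowResFlowNet(nn.Module):
    """Predicts coarse forward and backward flow from two downsampled frames."""
    def __init__(self, scale=0.25):
        super().__init__()
        self.scale = scale
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 4, 3, padding=1),   # 2 channels per flow direction
        )

    def forward(self, frame0, frame1):
        h, w = frame0.shape[-2:]
        x = F.interpolate(torch.cat([frame0, frame1], dim=1),
                          scale_factor=self.scale, mode='bilinear',
                          align_corners=False)
        flow = self.net(x)
        # Reconstruct full-resolution flow; flow vectors are rescaled with size.
        flow = F.interpolate(flow, size=(h, w), mode='bilinear',
                             align_corners=False) / self.scale
        return flow[:, :2], flow[:, 2:]        # forward flow, backward flow

def consistency_loss(flow_fw, flow_bw):
    """Simplified stand-in: forward and backward flow should roughly cancel.
    (The paper's exact consistency formulation may differ.)"""
    return (flow_fw + flow_bw).abs().mean()
```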


2020 ◽  
Vol 34 (07) ◽  
pp. 11278-11286 ◽  
Author(s):  
Soo Ye Kim ◽  
Jihyong Oh ◽  
Munchurl Kim

Super-resolution (SR) has been widely used to convert low-resolution legacy videos to high-resolution (HR) ones, to suit the increasing resolution of displays (e.g., UHD TVs). However, motion artifacts (e.g., motion judder) become easier for humans to notice when HR videos are rendered on larger displays. Thus, broadcasting standards support higher frame rates for UHD (Ultra High Definition) videos (4K@60 fps, 8K@120 fps), meaning that applying SR alone is insufficient to produce genuinely high-quality videos. Hence, to up-convert legacy videos for realistic applications, not only SR but also video frame interpolation (VFI) is required. In this paper, we first propose a joint VFI-SR framework for up-scaling the spatio-temporal resolution of videos from 2K 30 fps to 4K 60 fps. For this, we propose a novel training scheme with a multi-scale temporal loss that imposes temporal regularization on the input video sequence and can be applied to any general video-related task. The proposed structure is analyzed in depth with extensive experiments.
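As a rough illustration of what a multi-scale temporal loss can look like, the sketch below compares predicted and ground-truth frame differences at several temporal strides, so that both short- and long-range temporal consistency are penalized. The strides, shapes, and loss form are assumptions for illustration rather than the exact loss used in the paper.

```python
# Hypothetical multi-scale temporal loss over a video sequence.
import torch

def multi_scale_temporal_loss(pred_seq, gt_seq, strides=(1, 2)):
    """pred_seq, gt_seq: tensors of shape (B, T, C, H, W)."""
    loss = 0.0
    for s in strides:
        # Frame differences at temporal stride s approximate motion at that scale.
        pred_diff = pred_seq[:, s:] - pred_seq[:, :-s]
        gt_diff = gt_seq[:, s:] - gt_seq[:, :-s]
        loss = loss + torch.mean(torch.abs(pred_diff - gt_diff))
    return loss
```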


IEEE Access ◽  
2021 ◽  
pp. 1-1
Author(s):  
Whan Choi ◽  
Yeong Jun Koh ◽  
Chang-Su Kim

2021 ◽  
Vol 10 (7) ◽  
pp. 488
Author(s):  
Peng Li ◽  
Dezheng Zhang ◽  
Aziguli Wulamu ◽  
Xin Liu ◽  
Peng Chen

A deep understanding of our visual world requires more than isolated perception of a series of objects; the relationships between them also carry rich semantic information. This is especially true for satellite remote sensing images, whose large spatial extent means that objects vary widely in size and form complex spatial compositions. Recognizing semantic relations therefore strengthens the understanding of remote sensing scenes. In this paper, we propose a novel multi-scale semantic fusion network (MSFN). In this framework, dilated convolution is introduced into a graph convolutional network (GCN) based on an attention mechanism to fuse and refine multi-scale semantic context, which is crucial for strengthening the cognitive ability of our model. In addition, based on the mapping between visual features and semantic embeddings, we design a sparse relationship extraction module to remove meaningless connections among entities and improve the efficiency of scene graph generation. Meanwhile, to further promote research on scene understanding in the remote sensing field, this paper also proposes a remote sensing scene graph dataset (RSSGD). We carried out extensive experiments, and the results show that our model significantly outperforms previous methods on scene graph generation. In addition, RSSGD effectively bridges the huge semantic gap between low-level perception and high-level cognition of remote sensing images.
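The following sketch illustrates the two ingredients named above: parallel dilated convolutions that gather context at several receptive fields, and an attention-weighted graph convolution over object-node features. Module names, channel sizes, and the attention form are illustrative assumptions, not the MSFN implementation.

```python
# Hypothetical sketch of multi-scale dilated context extraction and a
# dot-product-attention graph convolution over object nodes.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleDilatedBlock(nn.Module):
    def __init__(self, channels=256, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
             for d in dilations])
        self.fuse = nn.Conv2d(channels * len(dilations), channels, 1)

    def forward(self, x):                      # x: (B, C, H, W) feature map
        return self.fuse(torch.cat([F.relu(b(x)) for b in self.branches], dim=1))

class AttentionGraphConv(nn.Module):
    """One graph-convolution step whose adjacency is learned by attention."""
    def __init__(self, dim=256):
        super().__init__()
        self.q, self.k, self.v = (nn.Linear(dim, dim) for _ in range(3))

    def forward(self, nodes):                  # nodes: (N, dim) object features
        attn = torch.softmax(self.q(nodes) @ self.k(nodes).t()
                             / nodes.shape[-1] ** 0.5, dim=-1)
        return F.relu(attn @ self.v(nodes) + nodes)   # message passing + residual
```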


2021 ◽  
Vol 13 (12) ◽  
pp. 2425
Author(s):  
Yiheng Cai ◽  
Dan Liu ◽  
Jin Xie ◽  
Jingxian Yang ◽  
Xiangbin Cui ◽  
...  

Analyzing the surface and bedrock locations in radar imagery enables the computation of ice-sheet thickness, which is important for the study of ice sheets, their volume, and how they may contribute to global climate change. However, traditional handcrafted methods for detecting ice-surface and ice-bed layers in ice-sheet radargrams cannot quickly provide quantitative, objective, and reliable information extraction; they require complex human involvement and are difficult to apply to large datasets, whereas deep learning methods can obtain better results in a generalized way. In this study, an end-to-end multi-scale attention network (MsANet) is proposed to estimate and reconstruct layers in sequences of ice-sheet radar tomographic images. First, we use an improved 3D convolutional network, C3D-M, whose first fully connected layer is replaced by a convolution unit to better preserve the spatial structure of ice-layer features, as the backbone. Then, an adjustable multi-scale module uses filters of different scales to learn scale information and enhance the feature extraction capability of the network. Finally, an attention module extended to 3D space, with a redundant bottleneck unit removed, fuses and refines ice-layer features. Radar sequential images collected by the Center for Remote Sensing of Ice Sheets in 2014 are used as training and testing data. Compared with state-of-the-art deep learning methods, the MsANet reduces the average mean absolute column-wise error for detecting the ice-surface and ice-bottom layers by 10% (2.14 pixels), runs faster, and uses approximately 12 million fewer parameters.
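The sketch below illustrates, under assumed shapes and names, two of the ideas mentioned: a multi-scale 3D module with filters of different sizes, and a 1×1×1 convolutional head standing in for the removed fully connected layer so that spatial structure is preserved. It is not the MsANet code.

```python
# Hypothetical sketch of a multi-scale 3D block and a convolutional head.
import torch
import torch.nn as nn

class MultiScale3D(nn.Module):
    def __init__(self, channels=64, kernel_sizes=(1, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv3d(channels, channels, k, padding=k // 2)
             for k in kernel_sizes])
        self.fuse = nn.Conv3d(channels * len(kernel_sizes), channels, 1)

    def forward(self, x):                      # x: (B, C, D, H, W) radar volume
        return self.fuse(torch.cat([torch.relu(b(x)) for b in self.branches], dim=1))

# Replacing a flatten + fully connected head with a 1x1x1 convolution keeps the
# column-wise spatial layout needed to localize layer boundaries.
head = nn.Conv3d(64, 2, kernel_size=1)         # 2 outputs: ice-surface, ice-bed
```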


Information ◽  
2020 ◽  
Vol 12 (1) ◽  
pp. 3
Author(s):  
Shuang Chen ◽  
Zengcai Wang ◽  
Wenxin Chen

The effective detection of driver drowsiness is an important measure to prevent traffic accidents. Most existing drowsiness detection methods use only a single facial feature to identify fatigue status, ignoring the complex correlations between fatigue features and the temporal information of those features, which reduces recognition accuracy. To solve these problems, we propose a driver sleepiness estimation model based on factorized bilinear feature fusion and a long short-term recurrent convolutional network to detect driver drowsiness efficiently and accurately. The proposed framework includes three models: fatigue feature extraction, fatigue feature fusion, and driver drowsiness detection. First, we used a convolutional neural network (CNN) to effectively extract deep representations of eye- and mouth-related fatigue features from the face area detected in each video frame. Then, based on the factorized bilinear feature fusion model, we performed a nonlinear fusion of the deep feature representations of the eyes and mouth. Finally, we fed the sequence of fused frame-level features into a long short-term memory (LSTM) unit to capture the temporal information of the features and used a softmax classifier to detect sleepiness. The proposed framework was evaluated on the National Tsing Hua University drowsy driver detection (NTHU-DDD) video dataset. The experimental results show that this method has better stability and robustness compared with other methods.
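A compact sketch of the pipeline described above: per-frame eye and mouth features are fused by a low-rank (factorized) bilinear layer, the fused sequence is passed through an LSTM, and a linear head produces drowsiness logits. Feature dimensions, pooling size, and module names are illustrative assumptions rather than the authors' implementation.

```python
# Hypothetical sketch: factorized bilinear fusion of eye/mouth features,
# followed by an LSTM over the fused frame-level sequence.
import torch
import torch.nn as nn

class FactorizedBilinearFusion(nn.Module):
    """Low-rank bilinear interaction: pool((U^T x) * (V^T y))."""
    def __init__(self, dim_x=256, dim_y=256, factor=512, out=128, pool=4):
        super().__init__()
        self.U = nn.Linear(dim_x, factor)
        self.V = nn.Linear(dim_y, factor)
        self.pool = nn.AvgPool1d(pool)
        self.out = nn.Linear(factor // pool, out)

    def forward(self, x, y):
        z = self.U(x) * self.V(y)                      # element-wise interaction
        z = self.pool(z.unsqueeze(1)).squeeze(1)       # pooling over factor dim
        return self.out(z)

class DrowsinessClassifier(nn.Module):
    def __init__(self, fused_dim=128, hidden=128, classes=2):
        super().__init__()
        self.lstm = nn.LSTM(fused_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, classes)

    def forward(self, fused_seq):                      # (B, T, fused_dim)
        out, _ = self.lstm(fused_seq)
        return self.head(out[:, -1])                   # logits; softmax at inference
```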


Electronics ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 319
Author(s):  
Yi Wang ◽  
Xiao Song ◽  
Guanghong Gong ◽  
Ni Li

With the rapid development of deep learning and artificial intelligence techniques, denoising via neural networks has drawn great attention owing to its flexibility and excellent performance. However, for most convolutional network denoising methods, the convolution kernel is only one layer deep and features of distinct scales are neglected. Moreover, in the convolution operation all channels are treated equally, and the relationships between channels are not considered. In this paper, we propose a multi-scale feature extraction-based normalized attention neural network (MFENANN) for image denoising. In MFENANN, we define a multi-scale feature extraction block to extract and combine features at distinct scales of the noisy image. In addition, we propose a normalized attention network (NAN) to learn the relationships between channels, which smooths the optimization landscape and speeds up the convergence of attention-model training. Moreover, we introduce the NAN into convolutional network denoising, in which each channel receives its own gain so that channels can play different roles in subsequent convolutions. To verify the effectiveness of the proposed MFENANN, we conducted experiments on both grayscale and color image sets with noise levels ranging from 0 to 75. The experimental results show that, compared with some state-of-the-art denoising methods, the restored images of MFENANN achieve higher peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) values and have a better overall appearance.
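The sketch below shows one plausible reading of the two components named above: a multi-scale feature extraction block with parallel kernels of different sizes, and a channel-attention gate with a normalization step so that each channel receives its own gain. It is illustrative only, not the MFENANN reference code.

```python
# Hypothetical sketch: multi-scale feature extraction and a normalized
# channel-attention gate for a denoising network.
import torch
import torch.nn as nn

class MultiScaleFeatures(nn.Module):
    def __init__(self, in_ch=3, out_ch=64, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, k, padding=k // 2) for k in kernel_sizes])
        self.fuse = nn.Conv2d(out_ch * len(kernel_sizes), out_ch, 1)

    def forward(self, x):                              # x: (B, in_ch, H, W)
        return self.fuse(torch.cat([torch.relu(b(x)) for b in self.branches], 1))

class NormalizedChannelAttention(nn.Module):
    """Squeeze-and-excitation-style gate with a batch-normalized excitation."""
    def __init__(self, channels=64, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.BatchNorm1d(channels),                   # the "normalized" step
            nn.Sigmoid())

    def forward(self, x):                               # x: (B, C, H, W)
        gain = self.fc(x.mean(dim=(2, 3)))              # global average pooling
        return x * gain.unsqueeze(-1).unsqueeze(-1)     # per-channel gain
```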

