Video Face Super-Resolution with Motion-Adaptive Feedback Cell

2020 ◽  
Vol 34 (07) ◽  
pp. 12468-12475
Author(s):  
Jingwei Xin ◽  
Nannan Wang ◽  
Jie Li ◽  
Xinbo Gao ◽  
Zhifeng Li

Video super-resolution (VSR) methods have recently achieved remarkable success due to the development of deep convolutional neural networks (CNNs). Current state-of-the-art CNN methods usually treat the VSR problem as a large number of separate multi-frame super-resolution tasks, in which a batch of low-resolution (LR) frames is used to generate a single high-resolution (HR) frame, and a sliding window run over the entire video selects the LR frames that produce the series of HR frames. However, due to the complex temporal dependency between frames, the quality of the reconstructed HR frames degrades as the number of LR input frames increases. The reason is that these methods lack the ability to model complex temporal dependencies and struggle to provide accurate motion estimation and compensation for the VSR process, which makes performance degrade drastically when the motion between frames is complex. In this paper, we propose a Motion-Adaptive Feedback Cell (MAFC), a simple but effective block that can efficiently capture motion compensation and feed it back to the network in an adaptive way. Our approach efficiently exploits inter-frame motion information, so the network's dependence on explicit motion estimation and compensation methods can be avoided. In addition, benefiting from this property of the MAFC, the network achieves better performance in extremely complex motion scenarios. Extensive evaluations and comparisons validate the strengths of our approach, and the experimental results demonstrate that the proposed framework outperforms the state-of-the-art methods.
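The sliding-window formulation this abstract criticizes can be sketched in a few lines. The window radius and the edge-padding policy below are illustrative assumptions, not details taken from the paper:

```python
# Sliding-window VSR: each HR frame t is reconstructed from a window of
# LR frame indices centered on t. Borders are clamped to the clip edges
# (an assumed padding policy).
def lr_windows(num_frames, radius):
    """Return, for each target frame, the indices of its LR input window."""
    windows = []
    for t in range(num_frames):
        window = [min(max(i, 0), num_frames - 1)
                  for i in range(t - radius, t + radius + 1)]
        windows.append(window)
    return windows

# A 5-frame window (radius 2) over a 6-frame clip:
wins = lr_windows(6, 2)
```

Each output frame thus triggers an independent multi-frame reconstruction, which is exactly the per-window independence that makes modeling long-range temporal dependencies hard.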

Author(s):  
Bo Yan ◽  
Chuming Lin ◽  
Weimin Tan

For video super-resolution, current state-of-the-art approaches either process multiple low-resolution (LR) frames to produce each output high-resolution (HR) frame separately in a sliding-window fashion, or recurrently exploit the previously estimated HR frames to super-resolve the following frame. The main weaknesses of these approaches are: 1) separately generating each output frame may obtain high-quality HR estimates but results in unsatisfactory flickering artifacts, and 2) combining previously generated HR frames can produce temporally consistent results when the information flow is short, but it causes significant jitter and jagged artifacts because previous super-resolution errors are constantly accumulated into subsequent frames. In this paper, we propose a fully end-to-end trainable frame- and feature-context video super-resolution (FFCVSR) network that consists of two key sub-networks: a local network and a context network. The first explicitly utilizes a sequence of consecutive LR frames to generate a local feature and a local SR frame, and the other combines the outputs of the local network with the previously estimated HR frames and features to super-resolve the subsequent frame. Our approach takes full advantage of the inter-frame information from multiple LR frames and the context information from previously predicted HR frames, producing temporally consistent high-quality results while maintaining real-time speed by directly reusing previous features and frames. Extensive evaluations and comparisons demonstrate that our approach produces state-of-the-art results on a standard benchmark dataset, with advantages in accuracy, efficiency, and visual quality over existing approaches.
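The control flow of the two-subnetwork recurrence can be sketched as follows. The `local_net` and `context_net` functions here are toy stand-ins (a mean plus nearest-neighbor upscale, and a fixed blend) for the learned sub-networks, chosen only to show how previous HR frames and features are reused rather than recomputed:

```python
import numpy as np

def local_net(lr_window):
    feat = lr_window.mean(axis=0)                 # "local feature" stand-in
    sr = np.repeat(np.repeat(feat, 2, 0), 2, 1)   # naive 2x upscale stand-in
    return sr, feat

def context_net(sr_local, feat_local, prev_sr, prev_feat):
    # Toy fusion of the current local estimate with the previous context.
    sr = 0.7 * sr_local + 0.3 * prev_sr
    feat = 0.5 * (feat_local + prev_feat)
    return sr, feat

def run_ffcvsr_like(lr_frames, radius=1):
    T = lr_frames.shape[0]
    prev_sr, prev_feat = None, None
    outputs = []
    for t in range(T):
        idx = [min(max(i, 0), T - 1) for i in range(t - radius, t + radius + 1)]
        sr_l, f_l = local_net(lr_frames[idx])
        if prev_sr is None:
            sr, feat = sr_l, f_l
        else:
            sr, feat = context_net(sr_l, f_l, prev_sr, prev_feat)
        prev_sr, prev_feat = sr, feat             # reused, not recomputed
        outputs.append(sr)
    return np.stack(outputs)

out = run_ffcvsr_like(np.ones((4, 3, 3)))
```

The key structural point is the last line of the loop: the context step costs only a blend, because the previous frame's HR output and feature are carried forward directly.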


Electronics ◽  
2020 ◽  
Vol 9 (12) ◽  
pp. 2085
Author(s):  
Lei Han ◽  
Cien Fan ◽  
Ye Yang ◽  
Lian Zou

Recently, convolutional neural networks have achieved remarkable performance for video super-resolution. However, how to exploit the spatial and temporal information of video efficiently and effectively remains challenging. In this work, we design a bidirectional temporal-recurrent propagation unit, which allows temporal information to flow from frame to frame in an RNN-like manner and avoids complex motion estimation modeling and motion compensation. To better fuse the information of the two temporal-recurrent propagation units, we use channel attention mechanisms. Additionally, we recommend a progressive up-sampling method instead of one-step up-sampling, as we find that progressive up-sampling yields better experimental results. Extensive experiments show that our algorithm outperforms several recent state-of-the-art video super-resolution (VSR) methods with a smaller model size.
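The difference between progressive and one-step up-sampling is purely structural: two 2x stages versus a single 4x stage. The sketch below uses nearest-neighbor resizing as a stand-in for the learned up-sampling layers; in the actual network each stage would refine features between the two steps, which is where the benefit comes from:

```python
import numpy as np

def upsample_nn(x, factor):
    """Nearest-neighbor upscale of a 2-D array by an integer factor."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

def one_step_4x(x):
    return upsample_nn(x, 4)

def progressive_4x(x):
    # Two successive 2x steps; a learned network would insert
    # refinement convolutions between them.
    return upsample_nn(upsample_nn(x, 2), 2)

lr = np.arange(4.0).reshape(2, 2)
a, b = one_step_4x(lr), progressive_4x(lr)
```

With plain nearest-neighbor resizing the two paths are identical; the experimental gap reported above comes from the intermediate processing that the progressive structure makes room for.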


2019 ◽  
Author(s):  
Mehrdad Shoeiby ◽  
Mohammad Ali Armin ◽  
Sadegh Aliakbarian ◽  
Saeed Anwar ◽  
Lars Petersson

Advances in the design of multi-spectral cameras have led to great interest in a wide range of applications, from astronomy to autonomous driving. However, such cameras inherently suffer from a trade-off between spatial and spectral resolution. In this paper, we propose to address this limitation by introducing a novel method to carry out super-resolution on raw mosaic images, multi-spectral or RGB Bayer, captured by modern real-time single-shot mosaic sensors. To this end, we design a deep super-resolution architecture that benefits from a sequential feature pyramid along the depth of the network. This is achieved by utilizing a convolutional LSTM (ConvLSTM) to learn the inter-dependencies between features at different receptive fields. Additionally, by investigating the effect of different attention mechanisms in our framework, we show that a ConvLSTM-inspired module is able to provide superior attention in our context. Our extensive experiments and analyses evidence that our approach yields significant super-resolution quality, outperforming current state-of-the-art mosaic super-resolution methods on both Bayer and multi-spectral images. Additionally, to the best of our knowledge, our method is the first specialized method to super-resolve mosaic images, whether multi-spectral or Bayer.
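A raw mosaic image interleaves color samples in a repeating pattern, which is why super-resolving it directly differs from super-resolving a demosaicked RGB image. The sketch below splits an RGGB Bayer mosaic into its four sample planes; the RGGB ordering is an assumption for illustration (real sensors also use GRBG, BGGR, and other layouts):

```python
import numpy as np

def split_rggb(mosaic):
    """Split an RGGB Bayer mosaic (H, W) into four half-resolution planes."""
    r  = mosaic[0::2, 0::2]   # red samples: even rows, even cols
    g1 = mosaic[0::2, 1::2]   # green samples on red rows
    g2 = mosaic[1::2, 0::2]   # green samples on blue rows
    b  = mosaic[1::2, 1::2]   # blue samples: odd rows, odd cols
    return r, g1, g2, b

raw = np.arange(16).reshape(4, 4)
r, g1, g2, b = split_rggb(raw)
```

Any method operating on the raw mosaic must respect this layout, since neighboring pixels in the mosaic belong to different spectral bands.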


2021 ◽  
Vol 2021 ◽  
pp. 1-17
Author(s):  
Juan F. Ramirez Rochac ◽  
Nian Zhang ◽  
Lara A. Thompson ◽  
Tolessa Deksissa

Hyperspectral imaging is an area of active research with many applications in remote sensing, mineral exploration, and environmental monitoring. Deep learning and, in particular, convolution-based approaches are the current state-of-the-art classification models. However, in the presence of noisy hyperspectral datasets, these deep convolutional neural networks underperform. In this paper, we propose a feature augmentation approach to increase noise resistance in imbalanced hyperspectral classification. Our method calculates context-based features and uses a deep convolutional network (DCN). We tested our proposed approach on the Pavia datasets, comparing three models, DCN, PCA + DCN, and our context-based DCN, on both the original datasets and the datasets with added noise. Our experimental results show that DCN and PCA + DCN perform well on the original datasets but not on the noisy ones. Our robust context-based DCN outperformed the others in the presence of noise while maintaining comparable classification accuracy on clean hyperspectral images.
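One simple form of context-based feature augmentation is to concatenate each pixel's spectrum with the mean spectrum of its spatial neighborhood, so the classifier sees spatial context as well as the (possibly noisy) pixel itself. The 3x3 mean below is an illustrative choice, not the paper's exact feature definition:

```python
import numpy as np

def augment_with_context(cube, radius=1):
    """cube: (H, W, B) hyperspectral cube -> (H, W, 2B) augmented features."""
    H, W, B = cube.shape
    out = np.empty((H, W, 2 * B))
    for i in range(H):
        for j in range(W):
            # clip the neighborhood window at the image borders
            i0, i1 = max(i - radius, 0), min(i + radius + 1, H)
            j0, j1 = max(j - radius, 0), min(j + radius + 1, W)
            context = cube[i0:i1, j0:j1].mean(axis=(0, 1))
            out[i, j] = np.concatenate([cube[i, j], context])
    return out

cube = np.ones((4, 4, 5))
feats = augment_with_context(cube)
```

Averaging over a neighborhood suppresses independent per-pixel noise, which is a plausible reason such context features improve robustness on noisy cubes.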


2021 ◽  
Vol 7 ◽  
pp. e495
Author(s):  
Saleh Albahli ◽  
Hafiz Tayyab Rauf ◽  
Abdulelah Algosaibi ◽  
Valentina Emilia Balas

Artificial intelligence (AI) has played a significant role in image analysis and feature extraction applied to detecting and diagnosing a wide range of chest-related diseases. Although several researchers have used current state-of-the-art approaches and produced impressive chest-related clinical outcomes, a technique offers limited benefit if it detects only one type of disease while the rest go unidentified. Attempts to identify multiple chest-related diseases have been hampered by insufficient and imbalanced data. This research contributes to the healthcare industry and the research community by proposing synthetic data augmentation in three deep Convolutional Neural Network (CNN) architectures for the detection of 14 chest-related diseases. The employed models are DenseNet121, InceptionResNetV2, and ResNet152V2; after training and validation, an average ROC-AUC score of 0.80 was obtained, competitive with previous models trained for multi-class classification to detect anomalies in X-ray images. This research illustrates how the proposed model applies state-of-the-art deep neural networks to classify 14 chest-related diseases with better accuracy.


2018 ◽  
Vol 10 (11) ◽  
pp. 1700 ◽  
Author(s):  
Kui Jiang ◽  
Zhongyuan Wang ◽  
Peng Yi ◽  
Junjun Jiang ◽  
Jing Xiao ◽  
...  

Deep convolutional neural networks (CNNs) have been widely used and have achieved state-of-the-art performance in many image and video processing and analysis tasks. In particular, for image super-resolution (SR), previous CNN-based methods have led to significant improvements compared with shallow learning-based methods. However, previous CNN-based algorithms with simple direct or skip connections perform poorly when applied to SR of remote sensing satellite images. In this study, a simple but effective CNN framework, namely the deep distillation recursive network (DDRN), is presented for video satellite image SR. DDRN includes a group of ultra-dense residual blocks (UDB), a multi-scale purification unit (MSPU), and a reconstruction module. In particular, through the addition of rich interactive links within and between the multiple-path units in each UDB, features extracted from multiple parallel convolution layers can be shared effectively. Compared with classical dense-connection-based models, DDRN possesses the following main properties: (1) DDRN contains more linking nodes for the same number of convolution layers. (2) A distillation and compensation mechanism, which performs feature distillation and compensation at different stages of the network, is also constructed; in particular, the high-frequency components lost during information propagation can be compensated in the MSPU. (3) The final SR image can benefit from the feature maps extracted by the UDB and the compensated components obtained from the MSPU. Experiments on the Kaggle Open Source Dataset and Jilin-1 video satellite images illustrate that DDRN outperforms conventional CNN-based baselines and some state-of-the-art feature extraction approaches.


Sensors ◽  
2020 ◽  
Vol 20 (20) ◽  
pp. 5850
Author(s):  
Pablo Blanco-Medina ◽  
Eduardo Fidalgo ◽  
Enrique Alegre ◽  
Rocío Alaiz-Rodríguez ◽  
Francisco Jáñez-Martino ◽  
...  

Retrieving text embedded within images is a challenging task in real-world settings. Multiple problems, such as low resolution and the orientation of the text, can hinder the extraction of information. These problems are common in environments such as the Tor Darknet and Child Sexual Abuse images, where text extraction is crucial for the prevention of illegal activities. In this work, we evaluate eight text recognizers and, to increase the performance of text transcription, we combine these recognizers with rectification networks and super-resolution algorithms. We test our approach on four state-of-the-art datasets and two custom ones (TOICO-1K and Child Sexual Abuse (CSA)-text, based on text retrieved from the Tor Darknet and Child Sexual Exploitation Material, respectively). We obtained a score of 0.3170 for correctly recognized words on the TOICO-1K dataset when combining Deep Convolutional Neural Network (CNN) and rectification-based recognizers. For the CSA-text dataset, applying resolution enhancements achieved a final score of 0.6960. The highest performance increase was achieved on the ICDAR 2015 dataset, with an improvement of 4.83% when combining the MORAN recognizer and the Residual Dense resolution approach. We conclude that rectification outperforms super-resolution when applied separately, while their combination achieves the best average improvements on the chosen datasets.
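The word-level scores quoted above can be read as the fraction of exactly matched words between transcription and ground truth, which is a common convention for this metric (the papers' exact protocol may differ, e.g. in case folding or punctuation handling):

```python
def word_score(predictions, ground_truth):
    """Fraction of predicted words that exactly match the reference words."""
    correct = sum(p == g for p, g in zip(predictions, ground_truth))
    return correct / len(ground_truth)

# Hypothetical three-word example: one OCR error ("acces" vs "access").
score = word_score(["tor", "onion", "acces"], ["tor", "onion", "access"])
```

Under this reading, a score of 0.3170 on TOICO-1K means roughly one word in three was transcribed exactly right.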


2019 ◽  
Vol 11 (15) ◽  
pp. 1817 ◽  
Author(s):  
Jun Gu ◽  
Xian Sun ◽  
Yue Zhang ◽  
Kun Fu ◽  
Lei Wang

Recently, deep convolutional neural networks (DCNNs) have obtained promising results in single-image super-resolution (SISR) of remote sensing images. Due to the high complexity of remote sensing image distributions, most existing methods are not good enough for remote sensing image super-resolution; enhancing the representation ability of the network is one of the critical factors for improving performance. To address this problem, we propose a new SISR algorithm called the Deep Residual Squeeze and Excitation Network (DRSEN). Specifically, we propose a residual squeeze and excitation block (RSEB) as the building block of DRSEN. The RSEB fuses the input with the internal features of the current block, and models the interdependencies and relationships between channels to enhance representation power. At the same time, we improve the up-sampling module and the global residual pathway to reduce the number of network parameters. Experiments on two public remote sensing datasets (UC Merced and NWPU-RESISC45) show that our DRSEN achieves better accuracy and visual quality than most state-of-the-art methods, and is beneficial for progress in the field of remote sensing image super-resolution.
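The channel-interdependency modeling inside an RSEB follows the general squeeze-and-excitation pattern: global-average-pool each channel ("squeeze"), pass the result through a small bottleneck ("excite"), then rescale the channels by the resulting gates. The sketch below shows that pattern with random matrices standing in for the learned bottleneck weights; it is not the paper's exact block:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def squeeze_excite(features, w1, w2):
    """features: (C, H, W); w1: (C//r, C) and w2: (C, C//r) bottleneck weights."""
    z = features.mean(axis=(1, 2))             # squeeze: per-channel statistic (C,)
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))  # excite: gates in (0, 1)
    return features * s[:, None, None]         # rescale each channel map

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))
w1 = rng.standard_normal((2, 8))   # reduction ratio r = 4 (assumed)
w2 = rng.standard_normal((8, 2))
y = squeeze_excite(x, w1, w2)
```

Because each gate lies in (0, 1), the block can only attenuate channels relative to one another, which is how it expresses learned channel importance.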


Sensors ◽  
2021 ◽  
Vol 21 (12) ◽  
pp. 4184
Author(s):  
Zhiwei Cao ◽  
Huihua Yang ◽  
Juan Zhao ◽  
Shuhong Guo ◽  
Lingqiao Li

Multispectral pedestrian detection, which combines a color stream and a thermal stream, is essential under conditions of insufficient illumination, because the fusion of the two streams can provide complementary information for detecting pedestrians with deep convolutional neural networks (CNNs). In this paper, we introduce and adapt the simple and efficient one-stage YOLOv4 to replace the current state-of-the-art two-stage Fast R-CNN for multispectral pedestrian detection and to directly predict bounding boxes with confidence scores. To further improve detection performance, we analyze existing multispectral fusion methods and propose a novel multispectral channel feature fusion (MCFF) module that integrates the features from the color and thermal streams according to the illumination conditions. Moreover, several fusion architectures, such as Early Fusion, Halfway Fusion, Late Fusion, and Direct Fusion, were carefully designed based on the MCFF to transfer feature information from the bottom to the top at different stages. Finally, experimental results on the KAIST and Utokyo pedestrian benchmarks showed that Halfway Fusion achieved the best performance of all the architectures and that the MCFF could adaptively fuse features from the two modalities. The log-average miss rates (MR) on the two benchmarks under reasonable settings were 4.91% and 23.14%, respectively.
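The log-average miss rate quoted above is conventionally computed by sampling the miss-rate curve at nine false-positives-per-image (FPPI) points spaced evenly in log space over [1e-2, 1e0] and averaging them geometrically. The sketch below follows that convention on a synthetic curve; both the sampling rule (nearest achieved FPPI not exceeding the reference) and the curve itself are illustrative assumptions:

```python
import numpy as np

def log_average_miss_rate(fppi, miss_rate):
    """fppi ascending, miss_rate the corresponding (decreasing) miss rates."""
    ref = np.logspace(-2.0, 0.0, 9)
    samples = []
    for r in ref:
        mask = fppi <= r
        # if no operating point reaches this FPPI, count a full miss
        samples.append(miss_rate[mask][-1] if mask.any() else 1.0)
    # geometric mean, guarded against log(0)
    return np.exp(np.mean(np.log(np.maximum(samples, 1e-10))))

# Synthetic detector curve, for illustration only.
fppi = np.array([0.005, 0.01, 0.05, 0.1, 0.5, 1.0])
mr   = np.array([0.60, 0.50, 0.30, 0.20, 0.10, 0.05])
lamr = log_average_miss_rate(fppi, mr)
```

Lower is better: the 4.91% figure reported for KAIST means the geometric mean of the sampled miss rates is about 0.049.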

