Panchromatic Image Super-Resolution via Self Attention-augmented WGAN

Sensors ◽  
2021 ◽  
Vol 21 (6) ◽  
pp. 2158
Author(s):  
Juan Du ◽  
Kuanhong Cheng ◽  
Yue Yu ◽  
Dabao Wang ◽  
Huixin Zhou

Panchromatic (PAN) images contain abundant spatial information that is useful for earth observation, but they often suffer from low resolution (LR) due to sensor limitations and the large-scale field of view. Current super-resolution (SR) methods based on traditional attention mechanisms have shown remarkable advantages but remain imperfect at reconstructing the edge details of SR images. To address this problem, an improved SR model involving a self-attention-augmented Wasserstein generative adversarial network (SAA-WGAN) is designed to mine the reference information among multiple features for detail enhancement. We use an encoder-decoder network followed by a fully convolutional network (FCN) as the backbone to extract multi-scale features and reconstruct the high-resolution (HR) results. To exploit the relevance between multi-layer feature maps, we first integrate a convolutional block attention module (CBAM) into each skip-connection of the encoder-decoder subnet, generating weighted maps that automatically enhance both channel-wise and spatial-wise feature representation. Besides, considering that the HR results and LR inputs are highly similar in structure, yet this similarity cannot be fully reflected by traditional attention mechanisms, we design a self-augmented attention (SAA) module in which the attention weights are produced dynamically via a similarity function between hidden features; this design allows the network to flexibly adjust the fractional relevance among multi-layer features and to keep long-range inter-feature information, which helps preserve details. In addition, the pixel-wise loss is combined with perceptual and gradient losses to achieve comprehensive supervision. Experiments on benchmark datasets demonstrate that the proposed method outperforms other SR methods in terms of both objective evaluation and visual effect.
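
The abstract does not include the SAA module's implementation. As a rough illustration only, a similarity-driven attention over hidden feature maps, in the style of scaled dot-product self-attention, could be sketched as follows in PyTorch; the class name, reduction factor, and learnable residual scale are assumptions, not the authors' design:

import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAugmentedAttention(nn.Module):
    # Hypothetical sketch: attention weights come from a similarity function
    # (scaled dot product) between hidden features, so every position can
    # draw on long-range context across the feature map.
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // reduction, 1)
        self.key = nn.Conv2d(channels, channels // reduction, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual scale

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)            # (b, hw, c/r)
        k = self.key(x).flatten(2)                              # (b, c/r, hw)
        attn = F.softmax(q @ k / (k.shape[1] ** 0.5), dim=-1)   # similarity weights
        v = self.value(x).flatten(2).transpose(1, 2)            # (b, hw, c)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return x + self.gamma * out  # add long-range context residually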


2021 ◽  
Vol 11 (7) ◽  
pp. 3111
Author(s):  
Enjie Ding ◽  
Yuhao Cheng ◽  
Chengcheng Xiao ◽  
Zhongyu Liu ◽  
Wanli Yu

Lightweight convolutional neural networks (CNNs) suffer from limited feature representation capability due to low computational budgets, which degrades performance. To make CNNs more efficient, dynamic neural networks (DyNet) have been proposed; they increase model capacity by using the Squeeze-and-Excitation (SE) module to adaptively obtain the importance of each convolution kernel through an attention mechanism. However, the attention mechanism in the SE network (SENet) selects all channel information for its calculations, which brings two essential challenges: (a) interference caused by internally redundant information; and (b) an increased number of network computations. To address these problems, this work proposes a dynamic convolutional network (termed EAM-DyNet) that reduces the number of channels in feature maps by extracting only the useful spatial information. EAM-DyNet first uses random channel reduction and channel-grouping reduction to remove redundant information. Because downsampling can discard useful information, it then applies adaptive average pooling to maintain information integrity. Extensive experimental results on baseline datasets demonstrate that EAM-DyNet outperforms existing approaches, achieving higher test accuracy with fewer network parameters.
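
For context, the SE gate that the abstract builds on can be sketched as below; this is the standard SENet-style block, and EAM-DyNet's random and grouped channel-reduction steps, which the abstract does not detail, are not reproduced here:

import torch
import torch.nn as nn

class SEBlock(nn.Module):
    # Standard Squeeze-and-Excitation gate: global average pooling squeezes
    # each channel to one statistic, a bottleneck MLP scores the channels,
    # and a sigmoid produces per-channel weights.
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        weights = self.fc(x.mean(dim=(2, 3)))  # squeeze: (b, c)
        return x * weights.view(b, c, 1, 1)    # excite: reweight channels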


Author(s):  
Xiaobin Zhu ◽  
Zhuangzi Li ◽  
Xiao-Yu Zhang ◽  
Changsheng Li ◽  
Yaqi Liu ◽  
...  

Video super-resolution is a challenging task that has attracted great attention in the research and industry communities. In this paper, we propose a novel end-to-end architecture, called the Residual Invertible Spatio-Temporal Network (RISTN), for video super-resolution. RISTN sufficiently exploits the spatial information from low resolution to high resolution and effectively models the temporal consistency of consecutive video frames. Compared with existing recurrent-convolutional-network-based approaches, RISTN is much deeper but more efficient. It consists of three major components: in the spatial component, a lightweight residual invertible block is designed to reduce information loss during feature transformation and provide robust feature representations; in the temporal component, a novel recurrent convolutional model with residual dense connections is proposed to construct a deeper network and avoid feature degradation; in the reconstruction component, a new fusion method based on a sparse strategy is proposed to integrate the spatial and temporal features. Experiments on public benchmark datasets demonstrate that RISTN outperforms the state-of-the-art methods.
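
The residual invertible block itself is not specified in the abstract. As one hedged reading, invertible blocks are commonly built from additive coupling (as in RevNets), which loses no information because the inverse is exact; a minimal sketch under that assumption:

import torch
import torch.nn as nn

class ResidualInvertibleBlock(nn.Module):
    # Additive-coupling sketch: split the channels, update one half using
    # features computed from the other. Invertible because the input half
    # can be recovered exactly: x1 = y1 - F(x2).
    def __init__(self, channels):
        super().__init__()
        half = channels // 2  # assumes an even channel count
        self.f = nn.Sequential(
            nn.Conv2d(half, half, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(half, half, 3, padding=1),
        )

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=1)
        y1 = x1 + self.f(x2)              # coupled update; no information lost
        return torch.cat([y1, x2], dim=1)

    def inverse(self, y):
        y1, x2 = y.chunk(2, dim=1)
        return torch.cat([y1 - self.f(x2), x2], dim=1)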


Author(s):  
Qiang Yu ◽  
Feiqiang Liu ◽  
Long Xiao ◽  
Zitao Liu ◽  
Xiaomin Yang

Deep-learning (DL)-based methods are of growing importance in the field of single image super-resolution (SISR). The practical application of these DL-based models remains a problem due to their heavy computation and storage requirements. The powerful feature maps of hidden layers in convolutional neural networks (CNNs) help the model learn useful information, but there exists redundancy among feature maps that can be further exploited. To address these issues, this paper proposes a lightweight efficient feature generating network (EFGN) for SISR built from efficient feature generating blocks (EFGBs). Specifically, the EFGB conducts plain operations on the original features to produce more feature maps with only a slight increase in parameters. With the help of these extra feature maps, the network can extract more useful information from low-resolution (LR) images to reconstruct the desired high-resolution (HR) images. Experiments conducted on the benchmark datasets demonstrate that the proposed EFGN outperforms other deep-learning-based methods in most cases while possessing relatively low model complexity. Additionally, running-time measurements indicate its feasibility for real-time monitoring.
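
The abstract's "plain operations on the original features" are reminiscent of ghost-module-style feature generation; the sketch below follows that reading and is purely illustrative, not the EFGB definition:

import torch
import torch.nn as nn

class EfficientFeatureGenBlock(nn.Module):
    # Hypothetical sketch: a few "primary" features come from a standard
    # convolution, extra maps come from a cheap grouped convolution applied
    # to those primaries, and the two sets are concatenated.
    def __init__(self, in_ch, out_ch, expand=2):
        super().__init__()
        primary = out_ch // expand  # expand=2 keeps the group sizes valid
        self.primary_conv = nn.Conv2d(in_ch, primary, 3, padding=1)
        self.cheap_op = nn.Conv2d(primary, out_ch - primary, 3,
                                  padding=1, groups=primary)  # depthwise-style

    def forward(self, x):
        p = self.primary_conv(x)
        return torch.cat([p, self.cheap_op(p)], dim=1)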


Electronics ◽  
2021 ◽  
Vol 10 (11) ◽  
pp. 1234
Author(s):  
Lei Zha ◽  
Yu Yang ◽  
Zicheng Lai ◽  
Ziwei Zhang ◽  
Juan Wen

In recent years, neural networks for single image super-resolution (SISR) have adopted deeper network structures to extract extra image details, which brings difficulties in model training. To deal with these training problems, researchers utilize dense skip connections, which promote the model's feature representation ability by reusing deep features of different receptive fields. Benefiting from the dense connection block, SRDensenet has achieved excellent performance in SISR. Although the densely connected structure provides rich information, it also introduces redundant and useless information. To tackle this problem, in this paper we propose a Lightweight Dense Connected Approach with Attention for Single Image Super-Resolution (LDCASR), which employs an attention mechanism to extract useful information in the channel dimension. In particular, we propose the recursive dense group (RDG), consisting of Dense Attention Blocks (DABs), which obtains more significant representations by extracting deep features with the aid of both dense connections and the attention module, making the whole network attach importance to learning more advanced feature information. Additionally, we introduce group convolution in the DABs, which reduces the number of parameters to 0.6 M. Extensive experiments on benchmark datasets demonstrate the superiority of our proposed method over five chosen SISR methods.
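
A hedged sketch of what one Dense Attention Block might look like, combining a grouped dense step with a channel attention gate; the channel counts, growth rate, and group count are placeholders, not the paper's configuration:

import torch
import torch.nn as nn

class DenseAttentionBlock(nn.Module):
    # Sketch: one dense step (concatenate the input with newly produced
    # features) followed by channel attention, with group convolution
    # keeping the parameter count low.
    def __init__(self, channels=64, growth=32, groups=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, growth, 3, padding=1, groups=groups),
            nn.ReLU(inplace=True),
        )
        total = channels + growth
        self.attn = nn.Sequential(              # channel attention gate
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(total, total // 4, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(total // 4, total, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        dense = torch.cat([x, self.conv(x)], dim=1)  # dense connection
        return dense * self.attn(dense)              # suppress useless channels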


2018 ◽  
pp. 1307-1321
Author(s):  
Vinh-Tiep Nguyen ◽  
Thanh Duc Ngo ◽  
Minh-Triet Tran ◽  
Duy-Dinh Le ◽  
Duc Anh Duong

Large-scale image retrieval has shown remarkable potential in real-life applications. The standard approach is based on inverted indexing, where images are represented using the Bag-of-Words model. However, one major limitation of both the inverted index and the Bag-of-Words representation is that they ignore the spatial information of visual words in image representation and comparison. As a result, retrieval accuracy decreases. In this paper, the authors investigate an approach to integrating spatial information into the inverted index to improve accuracy while maintaining short retrieval time. Experiments conducted on several benchmark datasets (Oxford Building 5K, Oxford Building 5K+100K and Paris 6K) demonstrate the effectiveness of the proposed approach.
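
The abstract leaves the integration scheme unspecified. As a toy illustration, an inverted index can carry a quantized position alongside each posting so that matches vote only when word and location agree; the 4x4 grid and the voting rule here are assumptions:

from collections import defaultdict

def build_index(images):
    # images: {image_id: [(word_id, x, y), ...]} with x, y in [0, 1)
    index = defaultdict(list)
    for image_id, words in images.items():
        for word_id, x, y in words:
            cell = (int(x * 4), int(y * 4))      # quantize onto a 4x4 grid
            index[word_id].append((image_id, cell))
    return index

def score(index, query_words):
    # A vote counts only when both the visual word and its grid cell match,
    # giving a weak spatial-consistency check at query time.
    votes = defaultdict(int)
    for word_id, x, y in query_words:
        cell = (int(x * 4), int(y * 4))
        for image_id, stored_cell in index.get(word_id, []):
            if stored_cell == cell:
                votes[image_id] += 1
    return sorted(votes.items(), key=lambda kv: -kv[1])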


2020 ◽  
Vol 12 (10) ◽  
pp. 1660 ◽  
Author(s):  
Qiang Li ◽  
Qi Wang ◽  
Xuelong Li

Deep-learning-based hyperspectral image super-resolution (SR) methods have achieved great success recently. However, there are two main problems in previous works. One is the use of typical three-dimensional (3D) convolution throughout, which results in more network parameters. The other is insufficient attention to mining the spatial information of hyperspectral images while the spectral information is extracted. To address these issues, in this paper we propose a mixed convolutional network (MCNet) for hyperspectral image super-resolution. We design a novel mixed convolutional module (MCM) that extracts the potential features by 2D/3D convolution instead of a single type of convolution, which enables the network to better mine the spatial features of hyperspectral images. To exploit the effective features from the 2D units, we design local feature fusion to adaptively fuse all the hierarchical features in the 2D units. In the 3D units, we employ spatially and spectrally separable 3D convolution to extract spatial and spectral information, which reduces unaffordable memory usage and training time. Extensive evaluations and comparisons on three benchmark datasets demonstrate that the proposed approach achieves superior performance in comparison to existing state-of-the-art methods.
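
The spatial-spectral separable 3D convolution can be sketched directly: a full 3x3x3 kernel is factorized into a spatial 1x3x3 step and a spectral 3x1x1 step, cutting parameters and memory. The kernel sizes and their ordering below are illustrative assumptions:

import torch
import torch.nn as nn

class SeparableConv3d(nn.Module):
    # Factorized 3D convolution: filter within each band first (spatial),
    # then across neighboring bands (spectral).
    def __init__(self, channels):
        super().__init__()
        self.spatial = nn.Conv3d(channels, channels, kernel_size=(1, 3, 3),
                                 padding=(0, 1, 1))
        self.spectral = nn.Conv3d(channels, channels, kernel_size=(3, 1, 1),
                                  padding=(1, 0, 0))

    def forward(self, x):
        # x: (batch, channels, bands, height, width)
        return self.spectral(self.spatial(x))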


2021 ◽  
Vol 13 (16) ◽  
pp. 3211
Author(s):  
Tian Tian ◽  
Zhengquan Chu ◽  
Qian Hu ◽  
Li Ma

Semantic segmentation is a fundamental task in remote sensing image interpretation, which aims to assign a semantic label to every pixel in a given image. Accurate semantic segmentation remains challenging due to the complex distributions of various ground objects. With the development of deep learning, a series of segmentation networks represented by the fully convolutional network (FCN) has made remarkable progress on this problem, but the segmentation accuracy is still far from expectations. This paper focuses on the importance of class-specific features of different land cover objects and presents a novel end-to-end class-wise processing framework for segmentation. The proposed class-wise FCN (C-FCN) is shaped as an encoder-decoder structure with skip-connections, in which the encoder is shared to produce general features for all categories and the decoder is class-wise to process class-specific features. In detail, class-wise transition (CT), class-wise up-sampling (CU), class-wise supervision (CS), and class-wise classification (CC) modules are designed to achieve the class-wise transfer, recover the resolution of class-wise feature maps, bridge the encoder and the modified decoder, and implement class-wise classification, respectively. Class-wise and group convolutions are adopted in the architecture to control the number of parameters. The method is tested on the public ISPRS 2D semantic labeling benchmark datasets. Experimental results show that the proposed C-FCN significantly improves segmentation performance compared with many state-of-the-art FCN-based networks, revealing its potential for accurate segmentation of complex remote sensing images.
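
A hedged sketch of the class-wise convolution idea: with one convolution group per class, each category keeps its own feature channels while the parameter count stays controlled. The channel counts below are placeholders, not the C-FCN configuration:

import torch
import torch.nn as nn

num_classes = 6                       # e.g., the ISPRS labeling categories
feat_per_class = 16                   # assumed per-class channel budget
conv = nn.Conv2d(num_classes * feat_per_class,
                 num_classes * feat_per_class,
                 kernel_size=3, padding=1,
                 groups=num_classes)  # class k only sees class k's channels

x = torch.randn(1, num_classes * feat_per_class, 64, 64)
y = conv(x)                           # same layout: per-class feature stacks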


2021 ◽  
Vol 15 ◽  
Author(s):  
Hongbo Wang ◽  
Yu Liu ◽  
Xiaoxiao Zhen ◽  
Xuyan Tu

Depression has become one of the main afflictions threatening people's mental health. However, traditional diagnosis methods have certain limitations, so it is necessary to find an objective, intelligent-technology-based method of evaluating depression to assist in the early diagnosis and treatment of patients. Because the abnormal speech features of patients with depression are related to their mental state to some extent, it is valuable to use speech acoustic features as objective indicators for the diagnosis of depression. To deal with the complexity of depressed speech and the limited performance of traditional feature extraction methods for speech signals, this article proposes a Three-Dimensional Convolutional filter bank with Highway Networks and Bidirectional GRU (Gated Recurrent Unit) with an Attention mechanism (3D-CBHGA for short), which includes two key strategies. (1) Three-dimensional feature extraction of the speech signal captures the expressive characteristics of depression-related signals in a timely manner. (2) Based on the attention mechanism in the GRU network, frame-level vectors are weighted through self-learning to obtain the hidden emotion vector. Experiments show that the proposed 3D-CBHGA can well establish a mapping from speech signals to depression-related features and improve the accuracy of depression detection in speech signals.
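
The frame-level attention over GRU outputs can be sketched as follows; the layer sizes are assumptions, and the 3D convolutional filter bank and highway layers that precede the GRU in 3D-CBHGA are omitted:

import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveBiGRU(nn.Module):
    # Sketch: a bidirectional GRU whose frame outputs are combined by
    # learned attention weights into one utterance-level "emotion" vector.
    def __init__(self, feat_dim, hidden=128):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True,
                          bidirectional=True)
        self.score = nn.Linear(2 * hidden, 1)  # one relevance score per frame

    def forward(self, x):
        # x: (batch, frames, feat_dim) frame-level acoustic features
        h, _ = self.gru(x)                      # (batch, frames, 2*hidden)
        w = F.softmax(self.score(h), dim=1)     # attention weights over frames
        return (w * h).sum(dim=1)               # weighted hidden emotion vector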

