A Cross-Direction and Progressive Network for Pan-Sharpening

2021 ◽  
Vol 13 (15) ◽  
pp. 3045
Author(s):  
Han Xu ◽  
Zhuliang Le ◽  
Jun Huang ◽  
Jiayi Ma

In this paper, we propose a cross-direction and progressive network, termed CPNet, to solve the pan-sharpening problem. The full processing of information is the main characteristic of our model, which is reflected as follows: on the one hand, we process the source images in a cross-direction manner to obtain source images at different scales as the inputs of the fusion modules at different stages, which maximizes the use of multi-scale information in the source images; on the other hand, a progressive reconstruction loss is designed to boost the training of our network and avoid partial inactivation, while maintaining the consistency of the fused result with the ground truth. Since the extraction of information from the source images and the reconstruction of the fused image are based on the entire image rather than a single type of information, little spatial or spectral information is lost to insufficient processing. Extensive experiments, including qualitative and quantitative comparisons, demonstrate that our model maintains more spatial and spectral information than state-of-the-art pan-sharpening methods.
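As a rough illustration of the progressive reconstruction loss described above, the following PyTorch-style sketch sums stage-wise errors between each intermediate fused output and the ground truth resized to that stage's scale. The stage list, per-stage weights, and bilinear resizing are our assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def progressive_reconstruction_loss(fused_stages, ground_truth, weights=None):
    """Hypothetical multi-stage loss: each stage's output is compared against
    the ground truth downscaled to that stage's resolution."""
    weights = weights or [1.0] * len(fused_stages)
    loss = 0.0
    for w, fused in zip(weights, fused_stages):
        # Resize the ground truth to this stage's spatial resolution.
        gt = F.interpolate(ground_truth, size=fused.shape[-2:],
                           mode='bilinear', align_corners=False)
        loss = loss + w * F.l1_loss(fused, gt)
    return loss
```

Supervising every stage this way, rather than only the final output, is what keeps intermediate fusion modules from going inactive during training.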

Author(s):  
Han Xu ◽  
Pengwei Liang ◽  
Wei Yu ◽  
Junjun Jiang ◽  
Jiayi Ma

In this paper, we propose a new end-to-end model, called the dual-discriminator conditional generative adversarial network (DDcGAN), for fusing infrared and visible images of different resolutions. Unlike pixel-level methods and existing deep learning-based methods, the fusion task is accomplished through an adversarial process between a generator and two discriminators, in addition to a specially designed content loss. The generator is trained to generate realistic fused images that fool the discriminators. The two discriminators are trained to estimate, respectively, the JS divergence between the probability distributions of downsampled fused images and infrared images, and the JS divergence between the probability distributions of the gradients of fused images and the gradients of visible images. Thus, the fused images can compensate for features that are not constrained by the content loss alone. Consequently, the prominence of thermal targets in the infrared image and the texture details in the visible image can be preserved, or even enhanced, simultaneously in the fused image. Moreover, by constraining and distinguishing between the downsampled fused image and the low-resolution infrared image, DDcGAN is particularly well suited to fusing images of different resolutions. Qualitative and quantitative experiments on publicly available datasets demonstrate the superiority of our method over the state-of-the-art.
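To make the two adversarial constraints concrete, the sketch below prepares the pairs each discriminator would see: downsampled fused vs. infrared for one, gradient maps of fused vs. visible for the other. The average-pooling downsampler and the Laplacian gradient operator are illustrative choices, not necessarily those used by DDcGAN.

```python
import torch
import torch.nn.functional as F

def gradient(img):
    """Laplacian filter as a simple per-channel gradient operator."""
    k = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]],
                     device=img.device).view(1, 1, 3, 3)
    k = k.repeat(img.shape[1], 1, 1, 1)
    return F.conv2d(img, k, padding=1, groups=img.shape[1])

def discriminator_inputs(fused, infrared, visible, scale=4):
    # D1 compares the downsampled fused image with the low-res infrared image.
    fused_lr = F.avg_pool2d(fused, kernel_size=scale)
    # D2 compares gradient maps of the fused and visible images.
    return (fused_lr, infrared), (gradient(fused), gradient(visible))
```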


2020 ◽  
Vol 34 (07) ◽  
pp. 12484-12491 ◽  
Author(s):  
Han Xu ◽  
Jiayi Ma ◽  
Zhuliang Le ◽  
Junjun Jiang ◽  
Xiaojie Guo

In this paper, we present a new unsupervised and unified densely connected network for different types of image fusion tasks, termed FusionDN. In our method, the densely connected network is trained to generate the fused image conditioned on the source images. Meanwhile, a weight block is applied to obtain two data-driven weights as the retention degrees of features from the different source images, measuring the quality and amount of information in each. Similarity losses based on these weights are applied for unsupervised learning. In addition, we obtain a single model applicable to multiple fusion tasks by applying elastic weight consolidation to avoid forgetting what was learned from previous tasks when training tasks sequentially, rather than training individual models for each fusion task or crudely training all tasks jointly. Qualitative and quantitative results demonstrate the advantages of FusionDN over state-of-the-art methods on different fusion tasks.
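Elastic weight consolidation itself is a standard technique (due to Kirkpatrick et al.); a minimal sketch of the quadratic penalty it adds when training the next fusion task is shown below. The Fisher-information and saved-parameter dictionaries are assumed to have been computed after the previous task; how FusionDN estimates them is not specified here.

```python
import torch

def ewc_penalty(model, fisher, old_params, lam=1.0):
    """EWC: keep parameters that were important to previous tasks (high
    diagonal Fisher information) close to their previously learned values."""
    loss = 0.0
    for name, p in model.named_parameters():
        if name in fisher:
            loss = loss + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return lam / 2.0 * loss
```

Adding this penalty to the current task's fusion loss is what lets a single network serve several fusion tasks sequentially without catastrophic forgetting.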


2020 ◽  
Vol 34 (07) ◽  
pp. 10551-10558 ◽  
Author(s):  
Long Chen ◽  
Chujie Lu ◽  
Siliang Tang ◽  
Jun Xiao ◽  
Dong Zhang ◽  
...  

In this paper, we focus on the task of query-based video localization, i.e., localizing a query in a long and untrimmed video. The prevailing solutions to this problem fall into two categories: i) Top-down approaches pre-cut the video into a set of moment candidates, then perform classification and regression for each candidate; ii) Bottom-up approaches inject the whole query content into each video frame, then predict the probability of each frame being a ground-truth segment boundary (i.e., start or end). Both frameworks have shortcomings: top-down models suffer from heavy computation and are sensitive to heuristic rules, while bottom-up models have thus far lagged behind their top-down counterparts in performance. However, we argue that the performance of the bottom-up framework is severely underestimated because of unreasonable current designs in both the backbone and the head network. To this end, we design a novel bottom-up model: Graph-FPN with Dense Predictions (GDP). For the backbone, GDP first generates a frame feature pyramid to capture multi-level semantics, then utilizes graph convolution to encode rich scene relationships, which incidentally mitigates the semantic gaps in the multi-scale feature pyramid. For the head network, GDP regards all frames falling within the ground-truth segment as foreground, and each foreground frame regresses the distances from its location to the two segment boundaries. Extensive experiments on two challenging query-based video localization tasks (natural language video localization and video relocalization), involving four challenging benchmarks (TACoS, Charades-STA, ActivityNet Captions, and Activity-VRL), show that GDP surpasses the state-of-the-art top-down models.
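A small sketch of the dense head's regression targets as described: every frame inside the ground-truth segment is foreground and regresses its distances to the start and end boundaries. The exact encoding (units, normalization, background handling) is our assumption.

```python
import torch

def dense_boundary_targets(num_frames, start, end):
    """Hypothetical target encoding for the dense head: frames inside the
    ground-truth segment [start, end] are foreground and regress their
    distances to both boundaries; background frames get no target."""
    idx = torch.arange(num_frames, dtype=torch.float32)
    foreground = (idx >= start) & (idx <= end)
    targets = torch.stack([idx - start, end - idx], dim=1)  # (num_frames, 2)
    targets[~foreground] = 0.0
    return foreground, targets
```

For example, dense_boundary_targets(128, 40, 75) marks frames 40 through 75 as foreground, with frame 50 regressing the pair (10, 25).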


Sensors ◽  
2021 ◽  
Vol 21 (19) ◽  
pp. 6336
Author(s):  
Shuai Yang ◽  
Rong Huang ◽  
Fang Han

Image inpainting aims to fill in corrupted regions with visually realistic and semantically plausible content. In this paper, we propose a progressive image inpainting method based on a forked-then-fused decoder network. A unit called PC-RN, the combination of partial convolution and region normalization, serves as the basic component for constructing the inpainting network. The PC-RN unit can extract useful features from the valid surroundings while suppressing interference caused by incompleteness. The forked-then-fused decoder network consists of a local reception branch, a long-range attention branch, and a squeeze-and-excitation-based fusing module. Two multi-scale contextual attention modules are deployed in the long-range attention branch to adaptively borrow features from distant spatial positions. The progressive inpainting strategy allows the attention modules to use the previously filled region, reducing the risk of allocating wrong attention. We conduct extensive experiments on three benchmark databases: Places2, Paris StreetView, and CelebA. Qualitative and quantitative results show that the proposed inpainting model is superior to state-of-the-art works. Moreover, we perform ablation studies to reveal the functionality of each module for the image inpainting task.
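Partial convolution is a published building block; the sketch below shows its core renormalize-and-update-mask logic, i.e., the PC half of the PC-RN unit. Region normalization would be applied as a separate step, and the details here may differ from the paper's exact unit.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartialConv2d(nn.Module):
    """Convolution restricted to valid (unmasked) pixels, with mask updating.
    Simplified sketch: bias omitted, region normalization not included."""
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, padding=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding,
                              bias=False)
        self.register_buffer('ones', torch.ones(1, 1, kernel_size, kernel_size))

    def forward(self, x, mask):
        # mask: (N, 1, H, W), 1 for valid pixels, 0 inside the corrupted hole.
        with torch.no_grad():
            valid = F.conv2d(mask, self.ones, stride=self.conv.stride,
                             padding=self.conv.padding)
        out = self.conv(x * mask)
        # Rescale by how many valid pixels each window actually covered,
        # and zero out windows that saw no valid pixel at all.
        out = out * (self.ones.numel() / valid.clamp(min=1.0)) * (valid > 0).float()
        return out, (valid > 0).float()
```

Because the returned mask marks every window that touched at least one valid pixel, stacking such units progressively shrinks the hole, which is what a progressive inpainting strategy exploits.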


Electronics ◽  
2020 ◽  
Vol 9 (11) ◽  
pp. 1881
Author(s):  
Yuhui Chang ◽  
Jiangtao Xu ◽  
Zhiyuan Gao

To improve the accuracy of stereo matching, we propose the multi-scale dense attention network (MDA-Net). The network introduces two novel modules in the feature extraction stage to better exploit context information: a dual-path upsampling (DU) block and an attention-guided context-aware pyramid feature extraction (ACPFE) block. The DU block fuses feature maps of different scales; it introduces sub-pixel convolution to compensate for the information loss caused by traditional interpolation-based upsampling. The ACPFE block extracts multi-scale context information: pyramid atrous convolution is adopted to exploit multi-scale features, and channel attention is used to fuse them. The proposed network has been evaluated on several benchmark datasets. The three-pixel error evaluated over all ground-truth pixels is 2.10% on the KITTI 2015 dataset. The experimental results show that MDA-Net achieves state-of-the-art accuracy on the KITTI 2012 and 2015 datasets.
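Sub-pixel convolution is likewise a standard operator; here is a minimal PyTorch sketch of the learned upsampling step that the DU block relies on (the surrounding dual-path fusion logic is omitted, and the channel counts are placeholders).

```python
import torch
import torch.nn as nn

class SubPixelUpsample(nn.Module):
    """Learned upsampling: a convolution expands channels by scale**2, then
    PixelShuffle rearranges those channels into a finer spatial grid."""
    def __init__(self, channels, scale=2):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels * scale ** 2, 3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)

    def forward(self, x):
        return self.shuffle(self.conv(x))  # (N, C, H, W) -> (N, C, 2H, 2W)
```

In a block like DU, such a learned upsampler would replace fixed bilinear interpolation when merging a coarse feature map into a finer one, recovering detail that interpolation discards.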


2021 ◽  
Vol 13 (10) ◽  
pp. 1962
Author(s):  
Qin Liu ◽  
Letong Han ◽  
Rui Tan ◽  
Hongfei Fan ◽  
Weiqi Li ◽  
...  

Pansharpening aims at fusing the rich spectral information of multispectral (MS) images with the spatial details of panchromatic (PAN) images to generate a fused image with both high spectral and high spatial resolution. In general, existing pansharpening methods suffer from spectral distortion and a lack of spatial detail, which can hinder accurate ground-object identification. To alleviate these problems, we propose a Hybrid Attention mechanism-based Residual Neural Network (HARNN). In the proposed network, we develop an encoder attention module in the feature extraction part to better utilize the spectral and spatial features of MS and PAN images. Furthermore, a fusion attention module is designed to alleviate spectral distortion and improve the contour details of the fused image. A series of ablation and comparison experiments are conducted on the GF-1 and GF-2 datasets. The fusion results, with fewer distorted pixels and more spatial details, demonstrate that HARNN performs the pansharpening task effectively and outperforms state-of-the-art algorithms.
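The abstract does not spell out the attention modules, so the following is only one plausible shape for an attention-guided fusion of MS and PAN features, using squeeze-and-excitation-style channel gating; the actual HARNN modules may differ substantially.

```python
import torch
import torch.nn as nn

class ChannelAttentionFusion(nn.Module):
    """Illustrative fusion block: concatenate MS and PAN branch features,
    reweight channels with a learned gate, then project back down."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, 2 * channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(2 * channels // reduction, 2 * channels, 1),
            nn.Sigmoid(),
        )
        self.project = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, ms_feat, pan_feat):
        fused = torch.cat([ms_feat, pan_feat], dim=1)
        # Channel gating lets the network emphasize spectral (MS) or
        # spatial (PAN) channels adaptively before projection.
        return self.project(fused * self.gate(fused))
```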


Author(s):  
Markos Georgopoulos ◽  
James Oldfield ◽  
Mihalis A. Nicolaou ◽  
Yannis Panagakis ◽  
Maja Pantic

Deep learning has catalysed progress in tasks such as face recognition and analysis, leading to the rapid integration of technological solutions into multiple layers of our society. While such systems have proven accurate by standard evaluation metrics and benchmarks, a surge of recent work has exposed the demographic bias that such algorithms exhibit, highlighting that accuracy does not entail fairness. Clearly, deploying biased systems in real-world settings can have grave consequences for affected populations. Indeed, learning methods are prone to inheriting, or even amplifying, the bias present in a training set, manifested as uneven representation across demographic groups. In facial datasets, this particularly relates to attributes such as skin tone, gender, and age. In this work, we address the problem of mitigating bias in facial datasets through data augmentation. We propose a multi-attribute framework that can successfully transfer complex, multi-scale facial patterns even when these belong to underrepresented groups in the training set. This is achieved by relaxing the rigid dependence on a single attribute label and introducing a tensor-based mixing structure that captures multiplicative interactions between attributes in a multilinear fashion. We evaluate our method with an extensive set of qualitative and quantitative experiments on several datasets, with rigorous comparisons to state-of-the-art methods. We find that the proposed framework can successfully mitigate dataset bias, as evinced by extensive evaluations on established diversity metrics, while significantly improving fairness metrics such as equality of opportunity.
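For intuition about what a multiplicative, tensor-based interaction between attribute codes looks like, here is a toy two-attribute bilinear case; the paper's multilinear structure is more general, and every name below is hypothetical.

```python
import torch
import torch.nn as nn

class BilinearAttributeMixing(nn.Module):
    """Toy illustration: two attribute embeddings (e.g., skin tone and age)
    interact through a learned third-order tensor, so attribute pairs
    contribute multiplicatively rather than being concatenated additively."""
    def __init__(self, dim_a, dim_b, dim_out):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(dim_a, dim_b, dim_out) * 0.01)

    def forward(self, attr_a, attr_b):
        # out_k = sum_ij a_i * b_j * W_ijk
        return torch.einsum('ni,nj,ijk->nk', attr_a, attr_b, self.weight)
```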


Entropy ◽  
2021 ◽  
Vol 23 (2) ◽  
pp. 239
Author(s):  
Yansong Gu ◽  
Xinya Wang ◽  
Can Zhang ◽  
Baiyang Li

Obtaining key and rich visual information under sophisticated road conditions is one of the central requirements of advanced driving assistance. In this paper, a novel end-to-end model for advanced driving assistance, based on the fusion of infrared and visible images and termed FusionADA, is proposed. In our model, we are committed to extracting and fusing the optimal texture details and salient thermal targets from the source images. To achieve this goal, our model establishes an adversarial framework between a generator and a discriminator. Specifically, the generator aims to generate a fused image with basic intensity information together with the optimal texture details from the source images, while the discriminator aims to force the fused image to retain the salient thermal targets of the source infrared image. In addition, FusionADA is a fully end-to-end model, avoiding the manually designed activity-level measurements and fusion rules required by traditional methods. Qualitative and quantitative experiments on the publicly available RoadScene and TNO datasets demonstrate the superiority of FusionADA over state-of-the-art approaches.
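A hedged sketch of what a generator objective in this spirit could look like: an adversarial term plus intensity and gradient (texture) fidelity terms. The loss composition, the weights, and the elementwise-max intensity target are our assumptions, not FusionADA's published formulation.

```python
import torch
import torch.nn.functional as F

def generator_loss(fused, infrared, visible, disc_out, alpha=1.0, beta=5.0):
    """Hypothetical content + adversarial objective for IR/visible fusion."""
    # Adversarial term: the generator tries to make the discriminator say "real".
    adv = F.binary_cross_entropy_with_logits(disc_out, torch.ones_like(disc_out))
    # Intensity term: keep brightness close to the brighter source pixel.
    intensity = F.l1_loss(fused, torch.max(infrared, visible))
    def grad(img):  # simple horizontal finite difference as a texture proxy
        return img[..., :, 1:] - img[..., :, :-1]
    # Texture term: pull the fused gradients toward the visible image's detail.
    texture = F.l1_loss(grad(fused), grad(visible))
    return adv + alpha * intensity + beta * texture
```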


Author(s):  
Gurjit Singh Walia ◽ 
Rajiv Kapoor

The aim of this paper is to give an overview of state-of-the-art human detection methods, to classify them into reasonable categories, and to identify new trends in the field. Human detection in static images or video sequences is a challenging area of research, with its main applications in surveillance, defence, intelligent vehicles, and robotics. In this survey, we classify human detection methods according to the mode of data acquisition and the features used for detection. The different algorithms reported in recent years for human detection are reviewed and tabulated. We also review the standard datasets for evaluating human detection and report their statistics, such as the number of frames, challenging environmental conditions, ground-truth structure, and camera details. Performance measures for human detection algorithms, both qualitative and quantitative, are listed. Our survey gauges the gap between present research and future requirements.


Author(s):  
Elisa González Torres

This article establishes the theoretical and methodological bases for the study of the productivity of affixal predicates in a database of Old English. After a brief critical review of the state of the art in lexical productivity, we propose to distinguish qualitative from quantitative productivity. Qualitative productivity is analysed from the point of view of the distribution and behaviour of affixal predicates. Tentatively, the affixal predicates a-, æ-, be-, for-, ofer-, and to- are considered quantitatively productive. The difference between absolute and relative hapax legomena is also proposed and illustrated. After an analysis of some one thousand three hundred derived verbal predicates, the affixal predicates for-, on-, and to- are confirmed as productive.
Keywords: Old English, grammar, morphology, derivation, affixes.
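To illustrate the absolute/relative hapax distinction computationally (our illustration, not the paper's operationalization): a relative hapax occurs once among the tokens bearing a given affix, while an absolute hapax occurs once in the whole corpus, and a Baayen-style productivity ratio follows from the relative count.

```python
from collections import Counter

def hapax_productivity(corpus_tokens, affix):
    """Count relative and absolute hapaxes for one affix and return a
    hapax-per-token productivity ratio (toy example)."""
    corpus_counts = Counter(corpus_tokens)
    affixed = [t for t in corpus_tokens if t.startswith(affix)]
    affixed_counts = Counter(affixed)
    relative_hapaxes = [w for w, c in affixed_counts.items() if c == 1]
    # An absolute hapax also occurs exactly once in the corpus as a whole.
    absolute_hapaxes = [w for w in relative_hapaxes if corpus_counts[w] == 1]
    productivity = len(relative_hapaxes) / max(len(affixed), 1)
    return productivity, relative_hapaxes, absolute_hapaxes
```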

