Singe Image Rain Removal with Unpaired Information: A Differentiable Programming Perspective

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33019332 ◽

2019 ◽

Vol 33 ◽

pp. 9332-9339 ◽

Cited By ~ 7

Author(s):

Hongyuan Zhu ◽

Xi Peng ◽

Joey Tianyi Zhou ◽

Songfan Yang ◽

Vijay Chanderasekh ◽

...

Keyword(s):

Physical Model ◽

State Of The Art ◽

Input Image ◽

Training Data ◽

Challenging Problem ◽

Single Image ◽

Running Time ◽

Image Set ◽

Attention Memory ◽

Adversarial Model

Single image rain-streak removal is an extremely challenging problem due to the presence of non-uniform rain densities in images. Previous works solve this problem using various hand-designed priors or by explicitly mapping synthetic rain to paired clean image in a supervised way. In practice, however, the pre-defined priors are easily violated and the paired training data are hard to collect. To overcome these limitations, in this work, we propose RainRemoval-GAN (RRGAN), the first end-to-end adversarial model that generates realistic rain-free images using only unpaired supervision. Our approach alleviates the paired training constraints by introducing a physical-model which explicitly learns a recovered images and corresponding rain-streaks from the differentiable programming perspective. The proposed network consists of a novel multiscale attention memory generator and a novel multiscale deeply supervised discriminator. The multiscale attention memory generator uses a memory with attention mechanism to capture the latent rain streaks context at different stages to recover the clean images. The deeply supervised multiscale discriminator imposes constraints at the recovered output in terms of local details and global appearance to the clean image set. Together with the learned rainstreaks, a reconstruction constraint is employed to ensure the appearance consistent with the input image. Experimental results on public benchmark demonstrates our promising performance compared with nine state-of-the-art methods in terms of PSNR, SSIM, visual qualities and running time.

Download Full-text

Single Image Dehazing Using Sparse Contextual Representation

Atmosphere ◽

10.3390/atmos12101266 ◽

2021 ◽

Vol 12 (10) ◽

pp. 1266

Author(s):

Jing Qin ◽

Liang Chen ◽

Jian Xu ◽

Wenqi Ren

Keyword(s):

Sparse Representation ◽

State Of The Art ◽

Input Image ◽

Single Image ◽

Image Dehazing ◽

Dark Channel ◽

Local Patch ◽

Novel Method ◽

Contextual Representation ◽

Single Image Dehazing

In this paper, we propose a novel method to remove haze from a single hazy input image based on the sparse representation. In our method, the sparse representation is proposed to be used as a contextual regularization tool, which can reduce the block artifacts and halos produced by only using dark channel prior without soft matting as the transmission is not always constant in a local patch. A novel way to use dictionary is proposed to smooth an image and generate the sharp dehazed result. Experimental results demonstrate that our proposed method performs favorably against the state-of-the-art dehazing methods and produces high-quality dehazed and vivid color results.

Download Full-text

Direction-aware Feature-level Frequency Decomposition for Single Image Deraining

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/90 ◽

2021 ◽

Author(s):

Sen Deng ◽

Yidan Feng ◽

Mingqiang Wei ◽

Haoran Xie ◽

Yiping Chen ◽

...

Keyword(s):

High Frequency ◽

State Of The Art ◽

Low Frequency ◽

Communication Channels ◽

Input Image ◽

Experimental Results ◽

Training Procedure ◽

Single Image ◽

Frequency Decomposition

We present a novel direction-aware feature-level frequency decomposition network for single image deraining. Compared with existing solutions, the proposed network has three compelling characteristics. First, unlike previous algorithms, we propose to perform frequency decomposition at feature-level instead of image-level, allowing both low-frequency maps containing structures and high-frequency maps containing details to be continuously refined during the training procedure. Second, we further establish communication channels between low-frequency maps and high-frequency maps to interactively capture structures from high-frequency maps and add them back to low-frequency maps and, simultaneously, extract details from low-frequency maps and send them back to high-frequency maps, thereby removing rain streaks while preserving more delicate features in the input image. Third, different from existing algorithms using convolutional filters consistent in all directions, we propose a direction-aware filter to capture the direction of rain streaks in order to more effectively and thoroughly purge the input images of rain streaks. We extensively evaluate the proposed approach in three representative datasets and experimental results corroborate our approach consistently outperforms state-of-the-art deraining algorithms.

Download Full-text

A Deep Adversarial Model for Segmentation Assisted COVID-19 Diagnosis Using CT Images

10.21203/rs.3.rs-1147701/v1 ◽

2021 ◽

Author(s):

Hai-yan Yao ◽

Wang-gen Wan ◽

Xiang Li

Keyword(s):

State Of The Art ◽

Imaging Techniques ◽

Ct Images ◽

Training Data ◽

Superior Performance ◽

Common Complication ◽

Adversarial Network ◽

Global Pandemic ◽

Public Dataset ◽

Adversarial Model

Abstract The outbreak of coronavirus disease 2019(COVID-19) is spreading rapidly around the world, resulting in a global pandemic. Imaging techniques such as computed tomography (CT) play an essential role in the diagnosis and treatment of the disease since lung infection or pneumonia is a common complication. However, training a deep network to learn how to diagnose COVID-19 rapidly and accurately in CT images and segment the infected regions like a radiologist is challenging. Since the infectious area are difficult to distinguish, and manually annotation the segmentation results is time-consuming. To tackle these problems, we propose an efficient method based on a deep adversarial network to segment the infection regions automatically. Then the predicted segment results can assist the diagnosis network in identifying the COVID-19 samples from the CT images. On the other hand, a radiologist-like segmentation network provides detailed information of the infectious regions by separating areas of ground-glass, consolidation, and pleural effusion, respectively. Our method can accurately predict the COVID-19 infectious probability and provides lesion regions in CT images with limited training data. Additionally, we have established a public dataset for multitask learning. Extensive experiments on diagnoses and segmentation show superior performance over state-of-the-art methods.

Download Full-text

Single Image Defogging Method Based on Image Patch Decomposition and Multi-Exposure Image Fusion

Frontiers in Neurorobotics ◽

10.3389/fnbot.2021.700483 ◽

2021 ◽

Vol 15 ◽

Author(s):

Qiuzhuo Liu ◽

Yaqin Luo ◽

Ke Li ◽

Wenfeng Li ◽

Yi Chai ◽

...

Keyword(s):

Image Restoration ◽

Physical Model ◽

State Of The Art ◽

The State ◽

Depth Information ◽

Single Image ◽

Image Patch ◽

Image Defogging ◽

Image Patches ◽

Scene Depth

Bad weather conditions (such as fog, haze) seriously affect the visual quality of images. According to the scene depth information, physical model-based methods are used to improve image visibility for further image restoration. However, the unstable acquisition of the scene depth information seriously affects the defogging performance of physical model-based methods. Additionally, most of image enhancement-based methods focus on the global adjustment of image contrast and saturation, and lack the local details for image restoration. So, this paper proposes a single image defogging method based on image patch decomposition and multi-exposure fusion. First, a single foggy image is processed by gamma correction to obtain a set of underexposed images. Then the saturation of the obtained underexposed and original images is enhanced. Next, each image in the multi-exposure image set (including the set of underexposed images and the original image) is decomposed into the base and detail layers by a guided filter. The base layers are first decomposed into image patches, and then the fusion weight maps of the image patches are constructed. For detail layers, the exposure features are first extracted from the luminance components of images, and then the extracted exposure features are evaluated by constructing gaussian functions. Finally, both base and detail layers are combined to obtain the defogged image. The proposed method is compared with the state-of-the-art methods. The comparative experimental results confirm the effectiveness of the proposed method and its superiority over the state-of-the-art methods.

Download Full-text

Single Image Atmospheric Veil Removal Using New Priors for Better Genericity

Atmosphere ◽

10.3390/atmos12060772 ◽

2021 ◽

Vol 12 (6) ◽

pp. 772

Author(s):

Alexandra Duminil ◽

Jean-Philippe Tarel ◽

Roland Brémond

Keyword(s):

State Of The Art ◽

Input Image ◽

Airborne Particles ◽

Single Image ◽

Color Channel ◽

Qualitative And Quantitative ◽

Image Defogging ◽

Learning Techniques ◽

Different Types ◽

Broad Generalization

From an analysis of the priors used in state-of-the-art algorithms for single image defogging, a new prior is proposed to obtain a better atmospheric veil removal. Our hypothesis is based on a physical model, considering that the fog appears denser near the horizon rather than close to the camera. It leads to more restoration when the fog depth is more important, for a more natural rendering. For this purpose, the Naka–Rushton function is used to modulate the atmospheric veil according to empirical observations on synthetic foggy images. The parameters of this function are set from features of the input image. This method also prevents over-restoration and thus preserves the sky from artifacts and noises. The algorithm generalizes to different kinds of fog, airborne particles, and illumination conditions. The proposed method is extended to the nighttime and underwater images by computing the atmospheric veil on each color channel. Qualitative and quantitative evaluations show the benefit of the proposed algorithm. The quantitative evaluation shows the efficiency of the algorithm on four databases with different types of fog, which demonstrates the broad generalization allowed by the proposed algorithm, in contrast with most of the currently available deep learning techniques.

Download Full-text

An Appearance-and-Structure Fusion Network for Object Viewpoint Estimation

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/684 ◽

2018 ◽

Cited By ~ 1

Author(s):

Yueying Kao ◽

Weiming Li ◽

Zairan Wang ◽

Dongqing Zou ◽

Ran He ◽

...

Keyword(s):

State Of The Art ◽

Machine Intelligence ◽

Intelligence Community ◽

Challenging Problem ◽

Single Image ◽

Structure Information ◽

Current State ◽

Art Methods ◽

Viewpoint Estimation ◽

Visual Ambiguity

Automatic object viewpoint estimation from a single image is an important but challenging problem in machine intelligence community. Although impressive performance has been achieved, current state-of-the-art methods still have difficulty to deal with the visual ambiguity and structure ambiguity in real world images. To tackle these problems, a novel Appearance-and-Structure Fusion network, which we call it ASFnet that estimates viewpoint by fusing both appearance and structure information, is proposed in this paper. The structure information is encoded by precise semantic keypoints and can help address the visual ambiguity. Meanwhile, distinguishable appearance features contribute to overcoming the structure ambiguity. Our ASFnet integrates an appearance path and a structure path to an end-to-end network and allows deep features effectively share supervision from both the two complementary aspects. A convolutional layer is learned to fuse the two path results adaptively. To balance the influence from the two supervision sources, a piecewise loss weight strategy is employed during training. Experimentally, our proposed network outperforms state-of-the-art methods on a public PASCAL 3D+ dataset, which verifies the effectiveness of our method and further corroborates the above proposition.

Download Full-text

Im2Mesh GAN: Accurate 3D Hand Mesh Recovery from a Single RGB Image

10.21203/rs.3.rs-955086/v1 ◽

2021 ◽

Author(s):

Akila Pemasiri ◽

Kien Nguyen ◽

Sridha Sridha ◽

Clinton Fookes

Keyword(s):

State Of The Art ◽

Input Image ◽

Data Availability ◽

The Other ◽

Challenging Problem ◽

Topological Relationship ◽

New Type ◽

Adversarial Training ◽

Rgb Image ◽

Better Than

Abstract This work addresses hand mesh recovery from a single RGB image. In contrast to most of the existing approaches where parametric hand models are employed as the prior, we show that the hand mesh can be learned directly from the input image. We propose a new type of GAN called Im2Mesh GAN to learn the mesh through end-to-end adversarial training. By interpreting the mesh as a graph, our model is able to capture the topological relationship among the mesh vertices. We also introduce a 3D surface descriptor into the GAN architecture to further capture the associated 3D features. We conduct experiments with the proposed Im2Mesh GAN architecture in two settings: one where we can reap the benefits of coupled groundtruth data availability of the images and the corresponding meshes; and the other which combats the more challenging problem of mesh estimation without the corresponding groundtruth. Through extensive evaluations we demonstrate that even without using any hand priors the proposed method performs on par or better than the state-of-the-art.

Download Full-text

CoCoX: Generating Conceptual and Counterfactual Explanations via Fault-Lines

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i03.5643 ◽

2020 ◽

Vol 34 (03) ◽

pp. 2594-2601

Author(s):

Arjun Akula ◽

Shuai Wang ◽

Song-Chun Zhu

Keyword(s):

Neural Network ◽

State Of The Art ◽

Input Image ◽

Classification Model ◽

Learning Models ◽

Fault Line ◽

Semantic Level ◽

Explainable Ai ◽

Fault Lines ◽

Classification Category

We present CoCoX (short for Conceptual and Counterfactual Explanations), a model for explaining decisions made by a deep convolutional neural network (CNN). In Cognitive Psychology, the factors (or semantic-level features) that humans zoom in on when they imagine an alternative to a model prediction are often referred to as fault-lines. Motivated by this, our CoCoX model explains decisions made by a CNN using fault-lines. Specifically, given an input image I for which a CNN classification model M predicts class cpred, our fault-line based explanation identifies the minimal semantic-level features (e.g., stripes on zebra, pointed ears of dog), referred to as explainable concepts, that need to be added to or deleted from I in order to alter the classification category of I by M to another specified class calt. We argue that, due to the conceptual and counterfactual nature of fault-lines, our CoCoX explanations are practical and more natural for both expert and non-expert users to understand the internal workings of complex deep learning models. Extensive quantitative and qualitative experiments verify our hypotheses, showing that CoCoX significantly outperforms the state-of-the-art explainable AI models. Our implementation is available at https://github.com/arjunakula/CoCoX

Download Full-text

Automatic Detection of Discrimination Actions from Social Images

Electronics ◽

10.3390/electronics10030325 ◽

2021 ◽

Vol 10 (3) ◽

pp. 325

Author(s):

Zhihao Wu ◽

Baopeng Zhang ◽

Tianchen Zhou ◽

Yan Li ◽

Jianping Fan

Keyword(s):

Action Recognition ◽

State Of The Art ◽

Automatic Detection ◽

Experimental Results ◽

Practical Approach ◽

Detection And Identification ◽

Art Methods ◽

Image Set ◽

Social Images ◽

Relationship Identification

In this paper, we developed a practical approach for automatic detection of discrimination actions from social images. Firstly, an image set is established, in which various discrimination actions and relations are manually labeled. To the best of our knowledge, this is the first work to create a dataset for discrimination action recognition and relationship identification. Secondly, a practical approach is developed to achieve automatic detection and identification of discrimination actions and relationships from social images. Thirdly, the task of relationship identification is seamlessly integrated with the task of discrimination action recognition into one single network called the Co-operative Visual Translation Embedding++ network (CVTransE++). We also compared our proposed method with numerous state-of-the-art methods, and our experimental results demonstrated that our proposed methods can significantly outperform state-of-the-art approaches.

Download Full-text

Improving Semi-Supervised Learning for Audio Classification with FixMatch

Electronics ◽

10.3390/electronics10151807 ◽

2021 ◽

Vol 10 (15) ◽

pp. 1807

Author(s):

Sascha Grollmisch ◽

Estefanía Cano

Keyword(s):

Neural Networks ◽

Supervised Learning ◽

Transfer Learning ◽

Data Transfer ◽

State Of The Art ◽

Training Data ◽

Audio Classification ◽

Image Domain ◽

Full Dataset ◽

Audio Data

Including unlabeled data in the training process of neural networks using Semi-Supervised Learning (SSL) has shown impressive results in the image domain, where state-of-the-art results were obtained with only a fraction of the labeled data. The commonality between recent SSL methods is that they strongly rely on the augmentation of unannotated data. This is vastly unexplored for audio data. In this work, SSL using the state-of-the-art FixMatch approach is evaluated on three audio classification tasks, including music, industrial sounds, and acoustic scenes. The performance of FixMatch is compared to Convolutional Neural Networks (CNN) trained from scratch, Transfer Learning, and SSL using the Mean Teacher approach. Additionally, a simple yet effective approach for selecting suitable augmentation methods for FixMatch is introduced. FixMatch with the proposed modifications always outperformed Mean Teacher and the CNNs trained from scratch. For the industrial sounds and music datasets, the CNN baseline performance using the full dataset was reached with less than 5% of the initial training data, demonstrating the potential of recent SSL methods for audio data. Transfer Learning outperformed FixMatch only for the most challenging dataset from acoustic scene classification, showing that there is still room for improvement.

Download Full-text