scholarly journals Fully Convolutional Network with Multi-Step Reinforcement Learning for Image Processing

Author(s):  
Ryosuke Furuta ◽  
Naoto Inoue ◽  
Toshihiko Yamasaki

This paper tackles a new problem setting: reinforcement learning with pixel-wise rewards (pixelRL) for image processing. After the introduction of the deep Q-network, deep RL has been achieving great success. However, the applications of deep RL for image processing are still limited. Therefore, we extend deep RL to pixelRL for various image processing applications. In pixelRL, each pixel has an agent, and the agent changes the pixel value by taking an action. We also propose an effective learning method for pixelRL that significantly improves the performance by considering not only the future states of the own pixel but also those of the neighbor pixels. The proposed method can be applied to some image processing tasks that require pixel-wise manipulations, where deep RL has never been applied.We apply the proposed method to three image processing tasks: image denoising, image restoration, and local color enhancement. Our experimental results demonstrate that the proposed method achieves comparable or better performance, compared with the state-of-the-art methods based on supervised learning.

2020 ◽  
Vol 34 (07) ◽  
pp. 12935-12942 ◽  
Author(s):  
Yungeng Zhang ◽  
Yuru Pei ◽  
Yuke Guo ◽  
Gengyu Ma ◽  
Tianmin Xu ◽  
...  

In this paper, we propose a fully convolutional network-based dense map from voxels to invertible pair of displacement vector fields regarding a template grid for the consistent voxel-wise correspondence. We parameterize the volumetric mapping using a convolutional network and train it in an unsupervised way by leveraging the spatial transformer to minimize the gap between the warped volumetric image and the template grid. Instead of learning the unidirectional map, we learn the nonlinear mapping functions for both forward and backward transformations. We introduce the combinational inverse constraints for the volumetric one-to-one maps, where the pairwise and triple constraints are utilized to learn the cycle-consistent correspondence maps between volumes. Experiments on both synthetic and clinically captured volumetric cone-beam CT (CBCT) images show that the proposed framework is effective and competitive against state-of-the-art deformable registration techniques.


2019 ◽  
Vol 9 (10) ◽  
pp. 2042 ◽  
Author(s):  
Rachida Tobji ◽  
Wu Di ◽  
Naeem Ayoub

In Deep Learning, recent works show that neural networks have a high potential in the field of biometric security. The advantage of using this type of architecture, in addition to being robust, is that the network learns the characteristic vectors by creating intelligent filters in an automatic way, grace to the layers of convolution. In this paper, we propose an algorithm “FMnet” for iris recognition by using Fully Convolutional Network (FCN) and Multi-scale Convolutional Neural Network (MCNN). By taking into considerations the property of Convolutional Neural Networks to learn and work at different resolutions, our proposed iris recognition method overcomes the existing issues in the classical methods which only use handcrafted features extraction, by performing features extraction and classification together. Our proposed algorithm shows better classification results as compared to the other state-of-the-art iris recognition approaches.


Author(s):  
Rajae Moumen ◽  
Raddouane Chiheb ◽  
Rdouan Faizi

The aim of this research is to propose a fully convolutional approach to address the problem of real-time scene text detection for Arabic language. Text detection is performed using a two-steps multi-scale approach. The first step uses light-weighted fully convolutional network: TextBlockDetector FCN, an adaptation of VGG-16 to eliminate non-textual elements, localize wide scale text and give text scale estimation. The second step determines narrow scale range of text using fully convolutional network for maximum performance. To evaluate the system, we confront the results of the framework to the results obtained with single VGG-16 fully deployed for text detection in one-shot; in addition to previous results in the state-of-the-art. For training and testing, we initiate a dataset of 575 images manually processed along with data augmentation to enrich training process. The system scores a precision of 0.651 vs 0.64 in the state-of-the-art and a FPS of 24.3 vs 31.7 for a VGG-16 fully deployed.


2020 ◽  
Vol 34 (04) ◽  
pp. 5883-5891
Author(s):  
Jianwen Sun ◽  
Tianwei Zhang ◽  
Xiaofei Xie ◽  
Lei Ma ◽  
Yan Zheng ◽  
...  

Adversarial attacks against conventional Deep Learning (DL) systems and algorithms have been widely studied, and various defenses were proposed. However, the possibility and feasibility of such attacks against Deep Reinforcement Learning (DRL) are less explored. As DRL has achieved great success in various complex tasks, designing effective adversarial attacks is an indispensable prerequisite towards building robust DRL algorithms. In this paper, we introduce two novel adversarial attack techniques to stealthily and efficiently attack the DRL agents. These two techniques enable an adversary to inject adversarial samples in a minimal set of critical moments while causing the most severe damage to the agent. The first technique is the critical point attack: the adversary builds a model to predict the future environmental states and agent's actions, assesses the damage of each possible attack strategy, and selects the optimal one. The second technique is the antagonist attack: the adversary automatically learns a domain-agnostic model to discover the critical moments of attacking the agent in an episode. Experimental results demonstrate the effectiveness of our techniques. Specifically, to successfully attack the DRL agent, our critical point technique only requires 1 (TORCS) or 2 (Atari Pong and Breakout) steps, and the antagonist technique needs fewer than 5 steps (4 Mujoco tasks), which are significant improvements over state-of-the-art methods.


2021 ◽  
Vol 13 (24) ◽  
pp. 5100
Author(s):  
Teerapong Panboonyuen ◽  
Kulsawasd Jitkajornwanich ◽  
Siam Lawawirojwong ◽  
Panu Srestasathiern ◽  
Peerapon Vateekul

Transformers have demonstrated remarkable accomplishments in several natural language processing (NLP) tasks as well as image processing tasks. Herein, we present a deep-learning (DL) model that is capable of improving the semantic segmentation network in two ways. First, utilizing the pre-training Swin Transformer (SwinTF) under Vision Transformer (ViT) as a backbone, the model weights downstream tasks by joining task layers upon the pretrained encoder. Secondly, decoder designs are applied to our DL network with three decoder designs, U-Net, pyramid scene parsing (PSP) network, and feature pyramid network (FPN), to perform pixel-level segmentation. The results are compared with other image labeling state of the art (SOTA) methods, such as global convolutional network (GCN) and ViT. Extensive experiments show that our Swin Transformer (SwinTF) with decoder designs reached a new state of the art on the Thailand Isan Landsat-8 corpus (89.8% F1 score), Thailand North Landsat-8 corpus (63.12% F1 score), and competitive results on ISPRS Vaihingen. Moreover, both our best-proposed methods (SwinTF-PSP and SwinTF-FPN) even outperformed SwinTF with supervised pre-training ViT on the ImageNet-1K in the Thailand, Landsat-8, and ISPRS Vaihingen corpora.


2018 ◽  
Author(s):  
Rikifumi Ota ◽  
Takahiro Ide ◽  
Tatsuo Michiue

AbstractCell segmentation is crucial in the study of morphogenesis in developing embryos, but it is limited in its accuracy. In this study we provide a novel method for cell segmentation using machine-learning, termed Cell Segmenter using Machine Learning (CSML). CSML performed better than state-of-the-art methods, such as RACE and watershed, in the segmentation of ectodermal cells in the Xenopus embryo. CSML required only one whole embryo image for training a Fully Convolutional Network classifier, and it took 20 seconds per each image to return a segmented image. To validate its accuracy, we compared it to other methods in assessing several indicators of cell shape. We also examined the generality by measuring its performance in segmenting independent images. Our data demonstrates the superiority of CSML, and we expect this application to significantly improve efficiency in cell shape studies.


2019 ◽  
Author(s):  
Marcelo Pecenin ◽  
André Murbach Maidl ◽  
Daniel Weingaertner

Writing efficient image processing code is a very demanding task and much programming effort is put into porting existing code to new generations of hardware. Besides, the definition of what is an efficient code varies according to the desired optimization target, such as runtime, energy consumption or memory usage. We present a semi-automatic schedule generation system for the Halide DSL that uses a Reinforcement Learning agent to choose a set of scheduling options that optimizes the runtime of the resulting application. We compare our results to the state of the art implementations of three Halide pipelines and show that our agent is able to surpass hand-tuned code and Halide’s auto-scheduler on most scenarios for CPU and GPU architectures.


2018 ◽  
Vol 10 (12) ◽  
pp. 1984 ◽  
Author(s):  
Yangyang Li ◽  
Yanqiao Chen ◽  
Guangyuan Liu ◽  
Licheng Jiao

Polarimetric synthetic aperture radar (PolSAR) image classification has become more and more popular in recent years. As we all know, PolSAR image classification is actually a dense prediction problem. Fortunately, the recently proposed fully convolutional network (FCN) model can be used to solve the dense prediction problem, which means that FCN has great potential in PolSAR image classification. However, there are some problems to be solved in PolSAR image classification by FCN. Therefore, we propose sliding window fully convolutional network and sparse coding (SFCN-SC) for PolSAR image classification. The merit of our method is twofold: (1) Compared with convolutional neural network (CNN), SFCN-SC can avoid repeated calculation and memory occupation; (2) Sparse coding is used to reduce the computation burden and memory occupation, and meanwhile the image integrity can be maintained in the maximum extent. We use three PolSAR images to test the performance of SFCN-SC. Compared with several state-of-the-art methods, SFCN-SC achieves promising results in PolSAR image classification.


2019 ◽  
Vol 11 (3) ◽  
pp. 280 ◽  
Author(s):  
Yongyong Fu ◽  
Kunkun Liu ◽  
Zhangquan Shen ◽  
Jinsong Deng ◽  
Muye Gan ◽  
...  

Impervious surfaces play an important role in urban planning and sustainable environmental management. High-spatial-resolution (HSR) images containing pure pixels have significant potential for the detailed delineation of land surfaces. However, due to high intraclass variability and low interclass distance, the mapping and monitoring of impervious surfaces in complex town–rural areas using HSR images remains a challenge. The fully convolutional network (FCN) model, a variant of convolution neural networks (CNNs), recently achieved state-of-the-art performance in HSR image classification applications. However, due to the inherent nature of FCN processing, it is challenging for an FCN to precisely capture the detailed information of classification targets. To solve this problem, we propose an object-based deep CNN framework that integrates object-based image analysis (OBIA) with deep CNNs to accurately extract and estimate impervious surfaces. Specifically, we also adopted two widely used transfer learning technologies to expedite the training of deep CNNs. Finally, we compare our approach with conventional OBIA classification and state-of-the-art FCN-based methods, such as FCN-8s and the U-Net methods. Both of these FCN-based methods are well designed for pixel-wise classification applications and have achieved great success. Our results show that the proposed approach effectively identified impervious surfaces, with 93.9% overall accuracy. Compared with the existing methods, i.e., OBIA, FCN-8s and U-Net methods, it shows that our method achieves obviously improvement in accuracy. Our findings also suggest that the classification performance of our proposed method is related to training strategy, indicating that significantly higher accuracy can be achieved through transfer learning by fine-tuning rather than feature extraction. Our approach for the automatic extraction and mapping of impervious surfaces also lays a solid foundation for intelligent monitoring and the management of land use and land cover.


Sign in / Sign up

Export Citation Format

Share Document