Fully Convolutional Network with Multi-Step Reinforcement Learning for Image Processing

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33013598 ◽

2019 ◽

Vol 33 ◽

pp. 3598-3605 ◽

Cited By ~ 6

Author(s):

Ryosuke Furuta ◽

Naoto Inoue ◽

Toshihiko Yamasaki

Keyword(s):

Image Processing ◽

Reinforcement Learning ◽

State Of The Art ◽

Great Success ◽

Local Color ◽

Convolutional Network ◽

Fully Convolutional Network ◽

Effective Learning ◽

Pixel Value ◽

Problem Setting

This paper tackles a new problem setting: reinforcement learning with pixel-wise rewards (pixelRL) for image processing. After the introduction of the deep Q-network, deep RL has been achieving great success. However, the applications of deep RL for image processing are still limited. Therefore, we extend deep RL to pixelRL for various image processing applications. In pixelRL, each pixel has an agent, and the agent changes the pixel value by taking an action. We also propose an effective learning method for pixelRL that significantly improves the performance by considering not only the future states of the agent's own pixel but also those of the neighboring pixels. The proposed method can be applied to some image processing tasks that require pixel-wise manipulations, where deep RL has never been applied. We apply the proposed method to three image processing tasks: image denoising, image restoration, and local color enhancement. Our experimental results demonstrate that the proposed method achieves comparable or better performance, compared with the state-of-the-art methods based on supervised learning.
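The per-pixel agent idea in this abstract can be illustrated with a minimal sketch. The action set, the clipping range, and the reward (reduction in squared error against a clean target) are assumptions chosen for illustration; the abstract does not specify them, and the function names are hypothetical.

```python
# Hypothetical action set: each per-pixel agent lowers, keeps, or raises its value.
ACTIONS = (-1.0, 0.0, 1.0)


def step(image, action_ids):
    """Apply one action per pixel; both arguments are same-shaped 2D lists."""
    return [
        [min(255.0, max(0.0, px + ACTIONS[a])) for px, a in zip(row, arow)]
        for row, arow in zip(image, action_ids)
    ]


def pixelwise_reward(prev, curr, target):
    """Assumed reward: how much the squared error at each pixel decreased."""
    return [
        [(t - p) ** 2 - (t - c) ** 2 for p, c, t in zip(prow, crow, trow)]
        for prow, crow, trow in zip(prev, curr, target)
    ]


noisy = [[10.0, 12.0], [9.0, 11.0]]
clean = [[11.0, 12.0], [10.0, 10.0]]
actions = [[2, 1], [2, 0]]  # raise, keep, raise, lower

after = step(noisy, actions)
rewards = pixelwise_reward(noisy, after, clean)
```

In this toy case every agent that moved toward the clean value earns a positive reward and the agent that correctly left its pixel alone earns zero. A learned policy would replace the hand-picked `actions` grid.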

PixelRL: Fully Convolutional Network With Reinforcement Learning for Image Processing

IEEE Transactions on Multimedia ◽

10.1109/tmm.2019.2960636 ◽

2020 ◽

Vol 22 (7) ◽

pp. 1704-1719 ◽

Cited By ~ 1

Author(s):

Ryosuke Furuta ◽

Naoto Inoue ◽

Toshihiko Yamasaki

Keyword(s):

Image Processing ◽

Reinforcement Learning ◽

Convolutional Network ◽

Fully Convolutional Network

Fully Convolutional Network for Consistent Voxel-Wise Correspondence

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6992 ◽

2020 ◽

Vol 34 (07) ◽

pp. 12935-12942 ◽

Cited By ~ 1

Author(s):

Yungeng Zhang ◽

Yuru Pei ◽

Yuke Guo ◽

Gengyu Ma ◽

Tianmin Xu ◽

...

Keyword(s):

Vector Fields ◽

State Of The Art ◽

Displacement Vector ◽

Cone Beam Ct ◽

Deformable Registration ◽

Cone Beam ◽

Nonlinear Mapping ◽

Convolutional Network ◽

Mapping Functions ◽

Fully Convolutional Network

In this paper, we propose a fully convolutional network-based dense map from voxels to an invertible pair of displacement vector fields with respect to a template grid for consistent voxel-wise correspondence. We parameterize the volumetric mapping using a convolutional network and train it in an unsupervised way by leveraging the spatial transformer to minimize the gap between the warped volumetric image and the template grid. Instead of learning a unidirectional map, we learn the nonlinear mapping functions for both forward and backward transformations. We introduce combinational inverse constraints for the volumetric one-to-one maps, where pairwise and triple constraints are utilized to learn the cycle-consistent correspondence maps between volumes. Experiments on both synthetic and clinically captured volumetric cone-beam CT (CBCT) images show that the proposed framework is effective and competitive against state-of-the-art deformable registration techniques.
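To make the idea of an invertible pair of displacement fields concrete, here is a minimal one-dimensional sketch of a pairwise inverse-consistency check. It is an illustration under assumptions, not the paper's loss: the 1D grid, linear interpolation, boundary clamping, and the function names are all hypothetical, and the triple constraint mentioned in the abstract is not covered.

```python
def sample(field, pos):
    """Linearly interpolate a 1D field at a fractional position, clamped to the grid."""
    pos = min(max(pos, 0.0), len(field) - 1.0)
    lo = int(pos)
    hi = min(lo + 1, len(field) - 1)
    w = pos - lo
    return (1.0 - w) * field[lo] + w * field[hi]


def inverse_consistency_error(forward, backward):
    """Mean |u(x) + v(x + u(x))| over grid points.

    The value is zero when following the forward displacement u and then the
    backward displacement v returns every point to where it started.
    """
    residuals = [abs(u + sample(backward, x + u)) for x, u in enumerate(forward)]
    return sum(residuals) / len(residuals)


forward = [1.0, 1.0, 1.0, 1.0]       # move every point right by one cell
backward = [-1.0, -1.0, -1.0, -1.0]  # move every point left by one cell
identity = [0.0, 0.0, 0.0, 0.0]      # a backward field that undoes nothing

consistent = inverse_consistency_error(forward, backward)
inconsistent = inverse_consistency_error(forward, identity)
```

A training objective in this spirit would penalize the residual so that the two learned fields approximate each other's inverse.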

FMnet: Iris Segmentation and Recognition by Using Fully and Multi-Scale CNN for Biometric Security

Applied Sciences ◽

10.3390/app9102042 ◽

2019 ◽

Vol 9 (10) ◽

pp. 2042 ◽

Cited By ~ 6

Author(s):

Rachida Tobji ◽

Wu Di ◽

Naeem Ayoub

Keyword(s):

Neural Networks ◽

Iris Recognition ◽

State Of The Art ◽

Features Extraction ◽

Iris Segmentation ◽

Recognition Method ◽

Convolutional Network ◽

Fully Convolutional Network ◽

Multi Scale ◽

Biometric Security

In deep learning, recent works show that neural networks have high potential in the field of biometric security. The advantage of using this type of architecture, in addition to being robust, is that the network learns the characteristic vectors by automatically creating intelligent filters, thanks to the convolution layers. In this paper, we propose an algorithm, “FMnet”, for iris recognition using a Fully Convolutional Network (FCN) and a Multi-scale Convolutional Neural Network (MCNN). By taking into consideration the ability of Convolutional Neural Networks to learn and work at different resolutions, our proposed iris recognition method overcomes the existing issues in the classical methods, which only use handcrafted feature extraction, by performing feature extraction and classification together. Our proposed algorithm shows better classification results compared with other state-of-the-art iris recognition approaches.

Real-time Arabic scene text detection using fully convolutional neural networks

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v11i2.pp1634-1640 ◽

2021 ◽

Vol 11 (2) ◽

pp. 1634

Author(s):

Rajae Moumen ◽

Raddouane Chiheb ◽

Rdouan Faizi

Keyword(s):

Real Time ◽

Data Augmentation ◽

State Of The Art ◽

Arabic Language ◽

Text Detection ◽

The State ◽

Convolutional Network ◽

Fully Convolutional Network ◽

Scene Text Detection ◽

Scene Text

The aim of this research is to propose a fully convolutional approach to address the problem of real-time scene text detection for the Arabic language. Text detection is performed using a two-step multi-scale approach. The first step uses a lightweight fully convolutional network, TextBlockDetector FCN, an adaptation of VGG-16, to eliminate non-textual elements, localize wide-scale text, and give a text scale estimation. The second step determines the narrow scale range of text using a fully convolutional network for maximum performance. To evaluate the system, we compare the results of the framework with the results obtained with a single VGG-16 fully deployed for text detection in one shot, in addition to previous state-of-the-art results. For training and testing, we build a dataset of 575 manually processed images, along with data augmentation to enrich the training process. The system scores a precision of 0.651 vs 0.64 in the state-of-the-art and an FPS of 24.3 vs 31.7 for a VGG-16 fully deployed.

Stealthy and Efficient Adversarial Attacks against Deep Reinforcement Learning

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.6047 ◽

2020 ◽

Vol 34 (04) ◽

pp. 5883-5891

Author(s):

Jianwen Sun ◽

Tianwei Zhang ◽

Xiaofei Xie ◽

Lei Ma ◽

Yan Zheng ◽

...

Keyword(s):

Deep Learning ◽

Reinforcement Learning ◽

Critical Point ◽

State Of The Art ◽

Great Success ◽

Severe Damage ◽

Minimal Set ◽

Adversarial Attack ◽

Attack Strategy ◽

Critical Moments

Adversarial attacks against conventional Deep Learning (DL) systems and algorithms have been widely studied, and various defenses were proposed. However, the possibility and feasibility of such attacks against Deep Reinforcement Learning (DRL) are less explored. As DRL has achieved great success in various complex tasks, designing effective adversarial attacks is an indispensable prerequisite towards building robust DRL algorithms. In this paper, we introduce two novel adversarial attack techniques to stealthily and efficiently attack the DRL agents. These two techniques enable an adversary to inject adversarial samples in a minimal set of critical moments while causing the most severe damage to the agent. The first technique is the critical point attack: the adversary builds a model to predict the future environmental states and agent's actions, assesses the damage of each possible attack strategy, and selects the optimal one. The second technique is the antagonist attack: the adversary automatically learns a domain-agnostic model to discover the critical moments of attacking the agent in an episode. Experimental results demonstrate the effectiveness of our techniques. Specifically, to successfully attack the DRL agent, our critical point technique only requires 1 (TORCS) or 2 (Atari Pong and Breakout) steps, and the antagonist technique needs fewer than 5 steps (4 Mujoco tasks), which are significant improvements over state-of-the-art methods.

Transformer-Based Decoder Designs for Semantic Segmentation on Remotely Sensed Images

Remote Sensing ◽

10.3390/rs13245100 ◽

2021 ◽

Vol 13 (24) ◽

pp. 5100

Author(s):

Teerapong Panboonyuen ◽

Kulsawasd Jitkajornwanich ◽

Siam Lawawirojwong ◽

Panu Srestasathiern ◽

Peerapon Vateekul

Keyword(s):

Image Processing ◽

Deep Learning ◽

Natural Language Processing ◽

Language Processing ◽

State Of The Art ◽

Semantic Segmentation ◽

Landsat 8 ◽

Convolutional Network ◽

Image Labeling ◽

Feature Pyramid

Transformers have demonstrated remarkable accomplishments in several natural language processing (NLP) tasks as well as image processing tasks. Herein, we present a deep-learning (DL) model that is capable of improving the semantic segmentation network in two ways. First, using the pretrained Swin Transformer (SwinTF) under Vision Transformer (ViT) as a backbone, the model is adapted to downstream tasks by joining task layers upon the pretrained encoder. Second, three decoder designs, U-Net, pyramid scene parsing (PSP) network, and feature pyramid network (FPN), are applied to our DL network to perform pixel-level segmentation. The results are compared with other state-of-the-art (SOTA) image labeling methods, such as global convolutional network (GCN) and ViT. Extensive experiments show that our Swin Transformer (SwinTF) with decoder designs reached a new state of the art on the Thailand Isan Landsat-8 corpus (89.8% F1 score) and the Thailand North Landsat-8 corpus (63.12% F1 score), and competitive results on ISPRS Vaihingen. Moreover, both of our best proposed methods (SwinTF-PSP and SwinTF-FPN) even outperformed SwinTF with supervised pre-training ViT on ImageNet-1K on the Thailand Landsat-8 and ISPRS Vaihingen corpora.

A novel cell segmentation method for developing embryos using machine learning

10.1101/288720 ◽

2018 ◽

Author(s):

Rikifumi Ota ◽

Takahiro Ide ◽

Tatsuo Michiue

Keyword(s):

Machine Learning ◽

Cell Shape ◽

State Of The Art ◽

Cell Segmentation ◽

Segmentation Method ◽

Convolutional Network ◽

Fully Convolutional Network ◽

Segmented Image ◽

Novel Method ◽

Better Than

Cell segmentation is crucial in the study of morphogenesis in developing embryos, but it is limited in its accuracy. In this study we provide a novel method for cell segmentation using machine learning, termed Cell Segmenter using Machine Learning (CSML). CSML performed better than state-of-the-art methods, such as RACE and watershed, in the segmentation of ectodermal cells in the Xenopus embryo. CSML required only one whole embryo image for training a Fully Convolutional Network classifier, and it took 20 seconds per image to return a segmented image. To validate its accuracy, we compared it to other methods in assessing several indicators of cell shape. We also examined its generality by measuring its performance in segmenting independent images. Our data demonstrate the superiority of CSML, and we expect this application to significantly improve efficiency in cell shape studies.

Optimization of Halide Image Processing Schedules with Reinforcement Learning

10.5753/wscad.2019.8655 ◽

2019 ◽

Author(s):

Marcelo Pecenin ◽

André Murbach Maidl ◽

Daniel Weingaertner

Keyword(s):

Image Processing ◽

Reinforcement Learning ◽

State Of The Art ◽

Memory Usage ◽

Generation System ◽

Learning Agent ◽

Programming Effort ◽

Efficient Code ◽

Gpu Architectures ◽

Definition Of

Writing efficient image processing code is a very demanding task, and much programming effort is put into porting existing code to new generations of hardware. Moreover, the definition of efficient code varies according to the desired optimization target, such as runtime, energy consumption, or memory usage. We present a semi-automatic schedule generation system for the Halide DSL that uses a Reinforcement Learning agent to choose a set of scheduling options that optimizes the runtime of the resulting application. We compare our results to the state-of-the-art implementations of three Halide pipelines and show that our agent is able to surpass hand-tuned code and Halide’s auto-scheduler in most scenarios for CPU and GPU architectures.

A Novel Deep Fully Convolutional Network for PolSAR Image Classification

Remote Sensing ◽

10.3390/rs10121984 ◽

2018 ◽

Vol 10 (12) ◽

pp. 1984 ◽

Cited By ~ 10

Author(s):

Yangyang Li ◽

Yanqiao Chen ◽

Guangyuan Liu ◽

Licheng Jiao

Keyword(s):

Image Classification ◽

Sparse Coding ◽

State Of The Art ◽

Sliding Window ◽

Synthetic Aperture ◽

Prediction Problem ◽

Convolutional Network ◽

Polarimetric Synthetic Aperture Radar ◽

Fully Convolutional Network ◽

Image Integrity

Polarimetric synthetic aperture radar (PolSAR) image classification has become increasingly popular in recent years. PolSAR image classification is essentially a dense prediction problem. The recently proposed fully convolutional network (FCN) model can be used to solve dense prediction problems, which means that FCN has great potential in PolSAR image classification. However, there are some problems to be solved in PolSAR image classification by FCN. Therefore, we propose sliding window fully convolutional network and sparse coding (SFCN-SC) for PolSAR image classification. The merit of our method is twofold: (1) compared with a convolutional neural network (CNN), SFCN-SC can avoid repeated calculation and memory occupation; (2) sparse coding is used to reduce the computation burden and memory occupation, while the image integrity is maintained to the maximum extent. We use three PolSAR images to test the performance of SFCN-SC. Compared with several state-of-the-art methods, SFCN-SC achieves promising results in PolSAR image classification.
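The abstract does not describe how the sliding window is laid out, so the following is only a hedged sketch of one common way to tile a large image into fixed-size windows for a fully convolutional model. The function name, the stride parameter, and the choice to add a final window flush with the image edge are assumptions for illustration.

```python
def window_origins(height, width, window, stride):
    """Return top-left (row, col) origins of windows that together cover the image.

    A final window is placed flush with the bottom/right edge when the stride
    does not divide the image evenly, so no pixels are left uncovered.
    """
    def starts(size):
        if size <= window:
            return [0]
        positions = list(range(0, size - window + 1, stride))
        if positions[-1] != size - window:
            positions.append(size - window)
        return positions

    return [(r, c) for r in starts(height) for c in starts(width)]


even = window_origins(5, 5, window=3, stride=2)
uneven = window_origins(6, 4, window=3, stride=2)
```

Each window would then be classified densely in a single forward pass, instead of running a patch classifier once per pixel.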

Mapping Impervious Surfaces in Town–Rural Transition Belts Using China’s GF-2 Imagery and Object-Based Deep CNNs

Remote Sensing ◽

10.3390/rs11030280 ◽

2019 ◽

Vol 11 (3) ◽

pp. 280 ◽

Cited By ~ 7

Author(s):

Yongyong Fu ◽

Kunkun Liu ◽

Zhangquan Shen ◽

Jinsong Deng ◽

Muye Gan ◽

...

Keyword(s):

Transfer Learning ◽

Rural Areas ◽

State Of The Art ◽

Classification Performance ◽

Learning Technologies ◽

Fine Tuning ◽

Great Success ◽

Impervious Surfaces ◽

Convolutional Network ◽

Object Based

Impervious surfaces play an important role in urban planning and sustainable environmental management. High-spatial-resolution (HSR) images containing pure pixels have significant potential for the detailed delineation of land surfaces. However, due to high intraclass variability and low interclass distance, the mapping and monitoring of impervious surfaces in complex town–rural areas using HSR images remains a challenge. The fully convolutional network (FCN) model, a variant of convolutional neural networks (CNNs), recently achieved state-of-the-art performance in HSR image classification applications. However, due to the inherent nature of FCN processing, it is challenging for an FCN to precisely capture the detailed information of classification targets. To solve this problem, we propose an object-based deep CNN framework that integrates object-based image analysis (OBIA) with deep CNNs to accurately extract and estimate impervious surfaces. Specifically, we also adopted two widely used transfer learning technologies to expedite the training of deep CNNs. Finally, we compare our approach with conventional OBIA classification and state-of-the-art FCN-based methods, such as FCN-8s and U-Net. Both of these FCN-based methods are well designed for pixel-wise classification applications and have achieved great success. Our results show that the proposed approach effectively identified impervious surfaces, with 93.9% overall accuracy. Compared with the existing methods, i.e., OBIA, FCN-8s, and U-Net, our method achieves an obvious improvement in accuracy. Our findings also suggest that the classification performance of our proposed method is related to training strategy, indicating that significantly higher accuracy can be achieved through transfer learning by fine-tuning rather than feature extraction. Our approach for the automatic extraction and mapping of impervious surfaces also lays a solid foundation for intelligent monitoring and the management of land use and land cover.
