Towards High-Level Intrinsic Exploration in Reinforcement Learning

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/733 ◽

2020 ◽

Author(s):

Nicolas Bougie ◽

Ryutaro Ichise

Keyword(s):

Reinforcement Learning ◽

Time Horizon ◽

State Of The Art ◽

Experimental Results ◽

Prior Work ◽

Extrinsic Rewards ◽

Intrinsic Reward ◽

Long Time ◽

End To End ◽

High Level

Deep reinforcement learning (DRL) methods traditionally struggle with tasks where environment rewards are sparse or delayed, so exploration remains one of the key challenges of DRL. Instead of relying solely on extrinsic rewards, many state-of-the-art methods use intrinsic curiosity as an exploration signal. While they hold promise of better local exploration, discovering global exploration strategies is beyond the reach of current methods. We propose a novel end-to-end intrinsic reward formulation that introduces high-level exploration in reinforcement learning. Our curiosity signal is driven by a fast reward that deals with local exploration and a slow reward that incentivizes long-time horizon exploration strategies. We formulate curiosity as the error in an agent’s ability to reconstruct the observations given their contexts. Experimental results show that this high-level exploration enables our agents to outperform prior work in several Atari games.

Fast and slow curiosity for high-level exploration in reinforcement learning

Applied Intelligence ◽

10.1007/s10489-020-01849-3 ◽

2020 ◽

Author(s):

Nicolas Bougie ◽

Ryutaro Ichise

Keyword(s):

Reinforcement Learning ◽

Real World ◽

Open Problem ◽

Time Horizon ◽

Experimental Results ◽

Time Horizons ◽

Designed Environment ◽

Long Time ◽

Efficient Exploration ◽

High Level

Deep reinforcement learning (DRL) algorithms rely on carefully designed environment rewards that are extrinsic to the agent. However, in many real-world scenarios rewards are sparse or delayed, motivating the need for discovering efficient exploration strategies. While intrinsically motivated agents hold promise of better local exploration, solving problems that require coordinated decisions over long-time horizons remains an open problem. We postulate that to discover such strategies, a DRL agent should be able to combine local and high-level exploration behaviors. To this end, we introduce the concept of fast and slow curiosity that aims to incentivize long-time horizon exploration. Our method decomposes the curiosity bonus into a fast reward that deals with local exploration and a slow reward that encourages global exploration. We formulate this bonus as the error in an agent’s ability to reconstruct the observations given their contexts. We further propose to dynamically weight local and high-level strategies by measuring state diversity. We evaluate our method on a variety of benchmark environments, including Minigrid, Super Mario Bros, and Atari games. Experimental results show that our agent outperforms prior approaches in most tasks in terms of exploration efficiency and mean scores.
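
The abstract describes a curiosity bonus split into a fast and a slow reward, each measured as a reconstruction error, with a dynamic weight between them. A minimal sketch of that idea follows. The abstract does not say how the two rewards are combined, so the convex combination, the mean-squared-error measure, and the names `reconstruction_error`, `curiosity_bonus`, and `alpha` are all assumptions made for illustration; `alpha` stands in for whatever weight the state-diversity measure would produce.

```python
import numpy as np


def reconstruction_error(observation, reconstruction):
    """Mean squared error between an observation and its reconstruction."""
    observation = np.asarray(observation, dtype=float)
    reconstruction = np.asarray(reconstruction, dtype=float)
    return float(np.mean((observation - reconstruction) ** 2))


def curiosity_bonus(obs, fast_recon, slow_recon, alpha):
    """Blend a fast (local) and a slow (global) reward.

    alpha in [0, 1] is a hypothetical weight; the convex combination
    is an assumption, not the formulation from the paper.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    fast_reward = reconstruction_error(obs, fast_recon)
    slow_reward = reconstruction_error(obs, slow_recon)
    return alpha * fast_reward + (1.0 - alpha) * slow_reward
```

With `alpha` close to 1 the bonus is dominated by the fast, local term; with `alpha` close to 0 it is dominated by the slow, global term.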

Efficient End-to-End Sentence-Level Lipreading with Temporal Convolutional Networks

Applied Sciences ◽

10.3390/app11156975 ◽

2021 ◽

Vol 11 (15) ◽

pp. 6975

Author(s):

Tao Zhang ◽

Lun He ◽

Xudong Li ◽

Guoqing Feng

Keyword(s):

Performance Improvement ◽

State Of The Art ◽

Error Rates ◽

Convolutional Network ◽

Convolutional Networks ◽

Sentence Level ◽

End To End ◽

High Level ◽

Improved Accuracy ◽

Talking Face

Lipreading aims to recognize sentences being spoken by a talking face. In recent years, lipreading methods have achieved a high level of accuracy on large datasets and made breakthrough progress. However, lipreading is still far from being solved: existing methods tend to have high error rates on in-the-wild data and suffer from vanishing training gradients and slow convergence. To overcome these problems, we propose an efficient end-to-end sentence-level lipreading model, using an encoder based on a 3D convolutional network, ResNet50, and a Temporal Convolutional Network (TCN), and a CTC objective function as the decoder. More importantly, the proposed architecture incorporates the TCN as a feature learner to decode features. It can partly eliminate the vanishing gradients and insufficient performance of RNNs (LSTM, GRU), which yields a notable performance improvement as well as faster convergence. Experiments show that training and convergence are 50% faster than with the state-of-the-art method, and that accuracy improves by 2.4% on the GRID dataset.
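
Because the model is trained with a CTC objective, its per-frame outputs have to be collapsed into a label sequence at inference time. The abstract does not describe the decoding procedure, so the sketch below shows only the standard greedy (best-path) CTC rule: merge consecutive repeated labels, then drop the blank symbol. The function name `ctc_greedy_decode` and the blank index `0` are assumptions for illustration.

```python
from itertools import groupby


def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse per-frame argmax labels into an output sequence.

    Step 1: merge runs of identical consecutive labels.
    Step 2: remove the blank label (index 0 here, by assumption).
    """
    collapsed = [label for label, _ in groupby(frame_labels)]
    return [label for label in collapsed if label != blank]


# Frames: blank, 1, 1, blank, 1, 2, 2, blank
print(ctc_greedy_decode([0, 1, 1, 0, 1, 2, 2, 0]))  # prints [1, 1, 2]
```

The blank between the two runs of label `1` is what lets the decoder emit the same label twice in a row.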

An End-To-End Model for Pipe Crack Three-Dimensional Visualization Based on a Cascade Neural Network

Applied Sciences ◽

10.3390/app10041290 ◽

2020 ◽

Vol 10 (4) ◽

pp. 1290

Author(s):

Xia Fang ◽

Yang Wang ◽

Yong Li ◽

Jie Wang ◽

Libin Zhou

Keyword(s):

Neural Network ◽

Crack Detection ◽

Three Dimensional ◽

Side Wall ◽

Long Time ◽

End To End ◽

Cascade Neural Network ◽

Ambiguous Data ◽

High Level ◽

Navigation Equipment

With the continuous progress of machine vision technology, crack detection in pipelines has been greatly improved. For crack detection in deep holes, inner tubes, and other environments, it is not only necessary to detect the existence of cracks, but also to collect important information regarding the crack detection direction for further analysis. Because shooting with a frontal field of view causes the real side wall images to produce certain distortions, the detection and calibration of cracks requires a certain amount of professional technology and time. It usually takes a long time to collect the image to eliminate the distortion, and then to identify the crack and mark the direction according to the data line. Therefore, a simple and efficient end-to-end neural network model for crack recognition and three-dimensional visualization is proposed, using a cascade network and simple recognition technology in conjunction with inertial navigation equipment. In addition, we screen the crack data via pixel calibration and eliminate the ambiguous data to make the visualization more accurate. Experiments in pipelines and burrows show that the accuracy, performance, and efficiency of the proposed method reached a high level.

A new joint CTC-attention-based speech recognition model with multi-level multi-head attention

EURASIP Journal on Audio Speech and Music Processing ◽

10.1186/s13636-019-0161-0 ◽

2019 ◽

Vol 2019 (1) ◽

Cited By ~ 2

Author(s):

Chu-Xiong Qin ◽

Wen-Lin Zhang ◽

Dan Qu

Keyword(s):

Speech Recognition ◽

Nonnegative Matrix Factorization ◽

State Of The Art ◽

Nonnegative Matrix ◽

Attention Mechanism ◽

Word Error Rate ◽

Absolute Value ◽

Multi Level ◽

End To End ◽

High Level

A method called joint connectionist temporal classification (CTC)-attention-based speech recognition has recently received increasing focus and has achieved impressive performance. A hybrid end-to-end architecture that adds an extra CTC loss to the attention-based model could force extra restrictions on alignments. To better explore end-to-end models, we propose improvements to the feature extraction and attention mechanism. First, we introduce a joint model trained with nonnegative matrix factorization (NMF)-based high-level features. Then, we put forward a hybrid attention mechanism by incorporating multi-head attentions and calculating attention scores over multi-level outputs. Experiments on TIMIT indicate that the new method achieves state-of-the-art performance with our best model. Experiments on WSJ show that our method exhibits a word error rate (WER) that is only 0.2% worse in absolute value than the best referenced method, which is trained on a much larger dataset, and it beats all present end-to-end methods. Further experiments on LibriSpeech show that our method is also comparable to the state-of-the-art end-to-end system in WER.
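
The WSJ and LibriSpeech results above are reported as word error rate (WER). As a reminder of what that metric measures, here is a minimal sketch using the usual definition: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions; the paper's own scoring setup is not described in the abstract.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("mat"): 2 / 5
print(word_error_rate("the cat sat on mat", "the cat sit on"))  # prints 0.4
```

A difference of 0.2% absolute, as quoted for WSJ, corresponds to a change of 0.002 in this ratio.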

Deriving Subgoals Autonomously to Accelerate Learning in Sparse Reward Domains

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.3301881 ◽

2019 ◽

Vol 33 ◽

pp. 881-889 ◽

Cited By ~ 1

Author(s):

Michael Dann ◽

Fabio Zambetta ◽

John Thangarajah

Keyword(s):

Reinforcement Learning ◽

Domain Knowledge ◽

State Of The Art ◽

Significant Challenge ◽

Intrinsic Reward ◽

Art Methods ◽

Efficient Exploration

Sparse reward games, such as the infamous Montezuma’s Revenge, pose a significant challenge for Reinforcement Learning (RL) agents. Hierarchical RL, which promotes efficient exploration via subgoals, has shown promise in these games. However, existing agents rely either on human domain knowledge or slow autonomous methods to derive suitable subgoals. In this work, we describe a new, autonomous approach for deriving subgoals from raw pixels that is more efficient than competing methods. We propose a novel intrinsic reward scheme for exploiting the derived subgoals, applying it to three Atari games with sparse rewards. Our agent’s performance is comparable to that of state-of-the-art methods, demonstrating the usefulness of the subgoals found.

Dialogue Generation: From Imitation Learning to Inverse Reinforcement Learning

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33016722 ◽

2019 ◽

Vol 33 ◽

pp. 6722-6729 ◽

Cited By ~ 4

Author(s):

Ziming Li ◽

Julia Kiseleva ◽

Maarten De Rijke

Keyword(s):

Reinforcement Learning ◽

State Of The Art ◽

The State ◽

Experimental Results ◽

Imitation Learning ◽

Local Optimum ◽

Inverse Reinforcement Learning ◽

High Quality ◽

Overall Performance

The performance of adversarial dialogue generation models relies on the quality of the reward signal produced by the discriminator. The reward signal from a poor discriminator can be very sparse and unstable, which may lead the generator to fall into a local optimum or to produce nonsense replies. To alleviate the first problem, we first extend a recently proposed adversarial dialogue generation method to an adversarial imitation learning solution. Then, in the framework of adversarial inverse reinforcement learning, we propose a new reward model for dialogue generation that can provide a more accurate and precise reward signal for generator training. We evaluate the performance of the resulting model with automatic metrics and human evaluations in two annotation settings. Our experimental results demonstrate that our model can generate more high-quality responses and achieve higher overall performance than the state-of-the-art.

Scene text removal via cascaded text stroke detection and erasing

Computational Visual Media ◽

10.1007/s41095-021-0242-8 ◽

2021 ◽

Vol 8 (2) ◽

pp. 273-287

Author(s):

Xuewei Bian ◽

Chaoqun Wang ◽

Weize Quan ◽

Juntao Ye ◽

Xiaopeng Zhang ◽

...

Keyword(s):

Performance Improvement ◽

Real World ◽

Large Scale ◽

State Of The Art ◽

The State ◽

Experimental Results ◽

Processing Unit ◽

Final Model ◽

Scene Text ◽

End To End

Recent learning-based approaches show promising performance improvement for the scene text removal task but usually leave several remnants of text and provide visually unpleasant results. In this work, a novel end-to-end framework is proposed based on accurate text stroke detection. Specifically, the text removal problem is decoupled into text stroke detection and stroke removal; we design separate networks to solve these two subproblems, the latter being a generative network. These two networks are combined as a processing unit, which is cascaded to obtain our final model for text removal. Experimental results demonstrate that the proposed method substantially outperforms the state-of-the-art for locating and erasing scene text. A new large-scale real-world dataset with 12,120 images has been constructed and is being made available to facilitate research, as current publicly available datasets are mainly synthetic so cannot properly measure the performance of different methods.

Hierarchically Structured Reinforcement Learning for Topically Coherent Visual Story Generation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33018465 ◽

2019 ◽

Vol 33 ◽

pp. 8465-8472 ◽

Cited By ~ 8

Author(s):

Qiuyuan Huang ◽

Zhe Gan ◽

Asli Celikyilmaz ◽

Dapeng Wu ◽

Jianfeng Wang ◽

...

Keyword(s):

Reinforcement Learning ◽

Learning Approach ◽

Semantic Concept ◽

Sentence Generation ◽

Visual Storytelling ◽

Empirical Results ◽

Low Level ◽

Story Generation ◽

End To End ◽

High Level

We propose a hierarchically structured reinforcement learning approach to address the challenges of planning for generating coherent multi-sentence stories for the visual storytelling task. Within our framework, the task of generating a story given a sequence of images is divided across a two-level hierarchical decoder. The high-level decoder constructs a plan by generating a semantic concept (i.e., topic) for each image in sequence. The low-level decoder generates a sentence for each image using a semantic compositional network, which effectively grounds the sentence generation conditioned on the topic. The two decoders are jointly trained end-to-end using reinforcement learning. We evaluate our model on the visual storytelling (VIST) dataset. Empirical results from both automatic and human evaluations demonstrate that the proposed hierarchically structured reinforced training achieves significantly better performance compared to a strong flat deep reinforcement learning baseline.

A Topic-Aware Reinforced Model for Weakly Supervised Stance Detection

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33017249 ◽

2019 ◽

Vol 33 ◽

pp. 7249-7256

Author(s):

Penghui Wei ◽

Wenji Mao ◽

Guandan Chen

Keyword(s):

Reinforcement Learning ◽

Opinion Mining ◽

State Of The Art ◽

Public Attitudes ◽

Representation Learning ◽

Experimental Results ◽

Training Data ◽

Policy Network ◽

Proposed Model ◽

Weakly Supervised

Analyzing public attitudes plays an important role in opinion mining systems. Stance detection aims to determine from a text whether its author is in favor of, against, or neutral towards a given target. One challenge of this task is that a text may not explicitly express an attitude towards the target, but existing approaches utilize target content alone to build models. Moreover, although weakly supervised approaches have been proposed to ease the burden of manually annotating large-scale training data, such approaches are confronted with the noisy labeling problem. To address the above two issues, in this paper, we propose a Topic-Aware Reinforced Model (TARM) for weakly supervised stance detection. Our model consists of two complementary components: (1) a detection network that incorporates target-related topic information into representation learning for identifying stance effectively; (2) a policy network that learns to eliminate noisy instances from auto-labeled data based on off-policy reinforcement learning. The two networks are alternately optimized to improve each other’s performance. Experimental results demonstrate that our proposed model TARM outperforms the state-of-the-art approaches.

AAR-CNNs: Auto Adaptive Regularized Convolutional Neural Networks

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/348 ◽

2018 ◽

Author(s):

Yao Lu ◽

Guangming Lu ◽

Yuanrong Xu ◽

Bob Zhang

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

State Of The Art ◽

Experimental Results ◽

Training Phase ◽

Low Resolution ◽

Adaptive Regularization ◽

End To End ◽

Overfitting Problem

To address the overfitting problem caused by small or simple training datasets and large model size in Convolutional Neural Networks (CNNs), a novel Auto Adaptive Regularization (AAR) method is proposed in this paper. The relevant networks can be called AAR-CNNs. AAR is the first method using the “abstraction extent” (predicted by AE net) and a tiny learnable module (SE net) to auto adaptively predict more accurate and individualized regularization information. The AAR module can be directly inserted into every stage of any popular network and trained end to end to improve the network’s flexibility. This method can not only regularize the network in both the forward and the backward processes in the training phase, but also regularize the network on a more refined level (channel or pixel level) depending on the abstraction extent’s form. Comparative experiments are performed on low-resolution ImageNet, CIFAR and SVHN datasets. Experimental results show that the AAR-CNNs can achieve state-of-the-art performance on these datasets.
