scholarly journals Towards High-Level Intrinsic Exploration in Reinforcement Learning

Author(s):  
Nicolas Bougie ◽  
Ryutaro Ichise

Deep reinforcement learning (DRL) methods traditionally struggle with tasks where environment rewards are sparse or delayed, which entails that exploration remains one of the key challenges of DRL. Instead of solely relying on extrinsic rewards, many state-of-the-art methods use intrinsic curiosity as exploration signal. While they hold promise of better local exploration, discovering global exploration strategies is beyond the reach of current methods. We propose a novel end-to-end intrinsic reward formulation that introduces high-level exploration in reinforcement learning. Our curiosity signal is driven by a fast reward that deals with local exploration and a slow reward that incentivizes long-time horizon exploration strategies. We formulate curiosity as the error in an agent’s ability to reconstruct the observations given their contexts. Experimental results show that this high-level exploration enables our agents to outperform prior work in several Atari games.

Author(s):  
Nicolas Bougie ◽  
Ryutaro Ichise

Abstract Deep reinforcement learning (DRL) algorithms rely on carefully designed environment rewards that are extrinsic to the agent. However, in many real-world scenarios rewards are sparse or delayed, motivating the need for discovering efficient exploration strategies. While intrinsically motivated agents hold promise of better local exploration, solving problems that require coordinated decisions over long-time horizons remains an open problem. We postulate that to discover such strategies, a DRL agent should be able to combine local and high-level exploration behaviors. To this end, we introduce the concept of fast and slow curiosity that aims to incentivize long-time horizon exploration. Our method decomposes the curiosity bonus into a fast reward that deals with local exploration and a slow reward that encourages global exploration. We formulate this bonus as the error in an agent’s ability to reconstruct the observations given their contexts. We further propose to dynamically weight local and high-level strategies by measuring state diversity. We evaluate our method on a variety of benchmark environments, including Minigrid, Super Mario Bros, and Atari games. Experimental results show that our agent outperforms prior approaches in most tasks in terms of exploration efficiency and mean scores.


2021 ◽  
Vol 11 (15) ◽  
pp. 6975
Author(s):  
Tao Zhang ◽  
Lun He ◽  
Xudong Li ◽  
Guoqing Feng

Lipreading aims to recognize sentences being spoken by a talking face. In recent years, the lipreading method has achieved a high level of accuracy on large datasets and made breakthrough progress. However, lipreading is still far from being solved, and existing methods tend to have high error rates on the wild data and have the defects of disappearing training gradient and slow convergence. To overcome these problems, we proposed an efficient end-to-end sentence-level lipreading model, using an encoder based on a 3D convolutional network, ResNet50, Temporal Convolutional Network (TCN), and a CTC objective function as the decoder. More importantly, the proposed architecture incorporates TCN as a feature learner to decode feature. It can partly eliminate the defects of RNN (LSTM, GRU) gradient disappearance and insufficient performance, and this yields notable performance improvement as well as faster convergence. Experiments show that the training and convergence speed are 50% faster than the state-of-the-art method, and improved accuracy by 2.4% on the GRID dataset.


2020 ◽  
Vol 10 (4) ◽  
pp. 1290
Author(s):  
Xia Fang ◽  
Yang Wang ◽  
Yong Li ◽  
Jie Wang ◽  
Libin Zhou

With the continuous progress of machine vision technology, crack detection in pipelines has been greatly improved. For crack detection in deep holes, inner tubes, and other environments, it is not only necessary to detect the existence of cracks, but also to collect important information regarding the crack detection direction for further analysis. Because shooting with a frontal field of view causes the real side wall images to produce certain distortions, the detection and calibration of cracks requires a certain amount of professional technology and time. It usually takes a long time to collect the image to eliminate the distortion, and then to identify the crack and mark the direction according to the data line. Therefore, a simple and efficient end-to-end neural network model for crack recognition and three-dimensional visualization are proposed by using a cascade network and simple recognition technology in conjunction with inertial navigation equipment. In addition, we screen the crack data via pixel calibration and eliminate the ambiguous data to make the visualization more accurate. Experiments in pipelines and burrows show that the accuracy, performance, and efficiency of the proposed method reached a high level.


Author(s):  
Chu-Xiong Qin ◽  
Wen-Lin Zhang ◽  
Dan Qu

Abstract A method called joint connectionist temporal classification (CTC)-attention-based speech recognition has recently received increasing focus and has achieved impressive performance. A hybrid end-to-end architecture that adds an extra CTC loss to the attention-based model could force extra restrictions on alignments. To explore better the end-to-end models, we propose improvements to the feature extraction and attention mechanism. First, we introduce a joint model trained with nonnegative matrix factorization (NMF)-based high-level features. Then, we put forward a hybrid attention mechanism by incorporating multi-head attentions and calculating attention scores over multi-level outputs. Experiments on TIMIT indicate that the new method achieves state-of-the-art performance with our best model. Experiments on WSJ show that our method exhibits a word error rate (WER) that is only 0.2% worse in absolute value than the best referenced method, which is trained on a much larger dataset, and it beats all present end-to-end methods. Further experiments on LibriSpeech show that our method is also comparable to the state-of-the-art end-to-end system in WER.


Author(s):  
Michael Dann ◽  
Fabio Zambetta ◽  
John Thangarajah

Sparse reward games, such as the infamous Montezuma’s Revenge, pose a significant challenge for Reinforcement Learning (RL) agents. Hierarchical RL, which promotes efficient exploration via subgoals, has shown promise in these games. However, existing agents rely either on human domain knowledge or slow autonomous methods to derive suitable subgoals. In this work, we describe a new, autonomous approach for deriving subgoals from raw pixels that is more efficient than competing methods. We propose a novel intrinsic reward scheme for exploiting the derived subgoals, applying it to three Atari games with sparse rewards. Our agent’s performance is comparable to that of state-of-the-art methods, demonstrating the usefulness of the subgoals found.


Author(s):  
Ziming Li ◽  
Julia Kiseleva ◽  
Maarten De Rijke

The performance of adversarial dialogue generation models relies on the quality of the reward signal produced by the discriminator. The reward signal from a poor discriminator can be very sparse and unstable, which may lead the generator to fall into a local optimum or to produce nonsense replies. To alleviate the first problem, we first extend a recently proposed adversarial dialogue generation method to an adversarial imitation learning solution. Then, in the framework of adversarial inverse reinforcement learning, we propose a new reward model for dialogue generation that can provide a more accurate and precise reward signal for generator training. We evaluate the performance of the resulting model with automatic metrics and human evaluations in two annotation settings. Our experimental results demonstrate that our model can generate more high-quality responses and achieve higher overall performance than the state-of-the-art.


2021 ◽  
Vol 8 (2) ◽  
pp. 273-287
Author(s):  
Xuewei Bian ◽  
Chaoqun Wang ◽  
Weize Quan ◽  
Juntao Ye ◽  
Xiaopeng Zhang ◽  
...  

AbstractRecent learning-based approaches show promising performance improvement for the scene text removal task but usually leave several remnants of text and provide visually unpleasant results. In this work, a novel end-to-end framework is proposed based on accurate text stroke detection. Specifically, the text removal problem is decoupled into text stroke detection and stroke removal; we design separate networks to solve these two subproblems, the latter being a generative network. These two networks are combined as a processing unit, which is cascaded to obtain our final model for text removal. Experimental results demonstrate that the proposed method substantially outperforms the state-of-the-art for locating and erasing scene text. A new large-scale real-world dataset with 12,120 images has been constructed and is being made available to facilitate research, as current publicly available datasets are mainly synthetic so cannot properly measure the performance of different methods.


Author(s):  
Qiuyuan Huang ◽  
Zhe Gan ◽  
Asli Celikyilmaz ◽  
Dapeng Wu ◽  
Jianfeng Wang ◽  
...  

We propose a hierarchically structured reinforcement learning approach to address the challenges of planning for generating coherent multi-sentence stories for the visual storytelling task. Within our framework, the task of generating a story given a sequence of images is divided across a two-level hierarchical decoder. The high-level decoder constructs a plan by generating a semantic concept (i.e., topic) for each image in sequence. The low-level decoder generates a sentence for each image using a semantic compositional network, which effectively grounds the sentence generation conditioned on the topic. The two decoders are jointly trained end-to-end using reinforcement learning. We evaluate our model on the visual storytelling (VIST) dataset. Empirical results from both automatic and human evaluations demonstrate that the proposed hierarchically structured reinforced training achieves significantly better performance compared to a strong flat deep reinforcement learning baseline.


Author(s):  
Penghui Wei ◽  
Wenji Mao ◽  
Guandan Chen

Analyzing public attitudes plays an important role in opinion mining systems. Stance detection aims to determine from a text whether its author is in favor of, against, or neutral towards a given target. One challenge of this task is that a text may not explicitly express an attitude towards the target, but existing approaches utilize target content alone to build models. Moreover, although weakly supervised approaches have been proposed to ease the burden of manually annotating largescale training data, such approaches are confronted with noisy labeling problem. To address the above two issues, in this paper, we propose a Topic-Aware Reinforced Model (TARM) for weakly supervised stance detection. Our model consists of two complementary components: (1) a detection network that incorporates target-related topic information into representation learning for identifying stance effectively; (2) a policy network that learns to eliminate noisy instances from auto-labeled data based on off-policy reinforcement learning. Two networks are alternately optimized to improve each other’s performances. Experimental results demonstrate that our proposed model TARM outperforms the state-of-the-art approaches.


Author(s):  
Yao Lu ◽  
Guangming Lu ◽  
Yuanrong Xu ◽  
Bob Zhang

In order to address the overfitting problem caused by the small or simple training datasets and the large model’s size in Convolutional Neural Networks (CNNs), a novel Auto Adaptive Regularization (AAR) method is proposed in this paper. The relevant networks can be called AAR-CNNs. AAR is the first method using the “abstraction extent” (predicted by AE net) and a tiny learnable module (SE net) to auto adaptively predict more accurate and individualized regularization information. The AAR module can be directly inserted into every stage of any popular networks and trained end to end to improve the networks’ flexibility. This method can not only regularize the network at both the forward and the backward processes in the training phase, but also regularize the network on a more refined level (channel or pixel level) depending on the abstraction extent’s form. Comparative experiments are performed on low resolution ImageNet, CIFAR and SVHN datasets. Experimental results show that the AAR-CNNs can achieve state-of-the-art performances on these datasets.


Sign in / Sign up

Export Citation Format

Share Document