scholarly journals Deriving Subgoals Autonomously to Accelerate Learning in Sparse Reward Domains

Author(s):  
Michael Dann ◽  
Fabio Zambetta ◽  
John Thangarajah

Sparse reward games, such as the infamous Montezuma’s Revenge, pose a significant challenge for Reinforcement Learning (RL) agents. Hierarchical RL, which promotes efficient exploration via subgoals, has shown promise in these games. However, existing agents rely either on human domain knowledge or slow autonomous methods to derive suitable subgoals. In this work, we describe a new, autonomous approach for deriving subgoals from raw pixels that is more efficient than competing methods. We propose a novel intrinsic reward scheme for exploiting the derived subgoals, applying it to three Atari games with sparse rewards. Our agent’s performance is comparable to that of state-of-the-art methods, demonstrating the usefulness of the subgoals found.

Author(s):  
Nicolas Bougie ◽  
Ryutaro Ichise

Deep reinforcement learning (DRL) methods traditionally struggle with tasks where environment rewards are sparse or delayed, which entails that exploration remains one of the key challenges of DRL. Instead of solely relying on extrinsic rewards, many state-of-the-art methods use intrinsic curiosity as exploration signal. While they hold promise of better local exploration, discovering global exploration strategies is beyond the reach of current methods. We propose a novel end-to-end intrinsic reward formulation that introduces high-level exploration in reinforcement learning. Our curiosity signal is driven by a fast reward that deals with local exploration and a slow reward that incentivizes long-time horizon exploration strategies. We formulate curiosity as the error in an agent’s ability to reconstruct the observations given their contexts. Experimental results show that this high-level exploration enables our agents to outperform prior work in several Atari games.


2019 ◽  
Vol 63 (7) ◽  
pp. 995-1003
Author(s):  
Z Xu ◽  
L Cao ◽  
X Chen

Abstract Simple and efficient exploration remains a core challenge in deep reinforcement learning. While many exploration methods can be applied to high-dimensional tasks, these methods manually adjust exploration parameters according to domain knowledge. This paper proposes a novel method that can automatically balance exploration and exploitation, as well as combine on-policy and off-policy update targets through a dynamic weighted way based on value difference. The proposed method does not directly affect the probability of a selected action but utilizes the value difference produced during the learning process to adjust update target for guiding the direction of agent’s learning. We demonstrate the performance of the proposed method on CartPole-v1, MountainCar-v0, and LunarLander-v2 classic control tasks from the OpenAI Gym. Empirical evaluation results show that by integrating on-policy and off-policy update targets dynamically, this method exhibits superior performance and stability than does the exclusive use of the update target.


2019 ◽  
Vol 109 (3) ◽  
pp. 493-512 ◽  
Author(s):  
Nicolas Bougie ◽  
Ryutaro Ichise

Abstract Reinforcement learning methods rely on rewards provided by the environment that are extrinsic to the agent. However, many real-world scenarios involve sparse or delayed rewards. In such cases, the agent can develop its own intrinsic reward function called curiosity to enable the agent to explore its environment in the quest of new skills. We propose a novel end-to-end curiosity mechanism for deep reinforcement learning methods, that allows an agent to gradually acquire new skills. Our method scales to high-dimensional problems, avoids the need of directly predicting the future, and, can perform in sequential decision scenarios. We formulate the curiosity as the ability of the agent to predict its own knowledge about the task. We base the prediction on the idea of skill learning to incentivize the discovery of new skills, and guide exploration towards promising solutions. To further improve data efficiency and generalization of the agent, we propose to learn a latent representation of the skills. We present a variety of sparse reward tasks in MiniGrid, MuJoCo, and Atari games. We compare the performance of an augmented agent that uses our curiosity reward to state-of-the-art learners. Experimental evaluation exhibits higher performance compared to reinforcement learning models that only learn by maximizing extrinsic rewards.


Author(s):  
Jinwei Qi ◽  
Yuxin Peng

The inconsistent distribution and representation of image and text make it quite challenging to measure their similarity, and construct correlation between them. Inspired by neural machine translation to establish a corresponding relationship between two entirely different languages, we attempt to treat images as a special kind of language to provide visual descriptions, so that translation can be conduct between bilingual pair of image and text to effectively explore cross-modal correlation. Thus, we propose Cross-modal Bidirectional Translation (CBT) approach, and further explore the utilization of reinforcement learning to improve the translation process. First, a cross-modal translation mechanism is proposed, where image and text are treated as bilingual pairs, and cross-modal correlation can be effectively captured in both feature spaces of image and text by bidirectional translation training. Second, cross-modal reinforcement learning is proposed to perform a bidirectional game between image and text, which is played as a round to promote the bidirectional translation process. Besides, both inter-modality and intra-modality reward signals can be extracted to provide complementary clues for boosting cross-modal correlation learning. Experiments are conducted to verify the performance of our proposed approach on cross-modal retrieval, compared with 11 state-of-the-art methods on 3 datasets.


Author(s):  
Mathieu Seurin ◽  
Florian Strub ◽  
Philippe Preux ◽  
Olivier Pietquin

Sparse rewards are double-edged training signals in reinforcement learning: easy to design but hard to optimize. Intrinsic motivation guidances have thus been developed toward alleviating the resulting exploration problem. They usually incentivize agents to look for new states through novelty signals. Yet, such methods encourage exhaustive exploration of the state space rather than focusing on the environment's salient interaction opportunities. We propose a new exploration method, called Don't Do What Doesn't Matter (DoWhaM), shifting the emphasis from state novelty to state with relevant actions. While most actions consistently change the state when used, e.g. moving the agent, some actions are only effective in specific states, e.g., opening a door, grabbing an object. DoWhaM detects and rewards actions that seldom affect the environment. We evaluate DoWhaM on the procedurally-generated environment MiniGrid against state-of-the-art methods. Experiments consistently show that DoWhaM greatly reduces sample complexity, installing the new state-of-the-art in MiniGrid.


2018 ◽  
Vol 7 (4) ◽  
pp. 603-622 ◽  
Author(s):  
Leonardo Gutiérrez-Gómez ◽  
Jean-Charles Delvenne

Abstract Several social, medical, engineering and biological challenges rely on discovering the functionality of networks from their structure and node metadata, when it is available. For example, in chemoinformatics one might want to detect whether a molecule is toxic based on structure and atomic types, or discover the research field of a scientific collaboration network. Existing techniques rely on counting or measuring structural patterns that are known to show large variations from network to network, such as the number of triangles, or the assortativity of node metadata. We introduce the concept of multi-hop assortativity, that captures the similarity of the nodes situated at the extremities of a randomly selected path of a given length. We show that multi-hop assortativity unifies various existing concepts and offers a versatile family of ‘fingerprints’ to characterize networks. These fingerprints allow in turn to recover the functionalities of a network, with the help of the machine learning toolbox. Our method is evaluated empirically on established social and chemoinformatic network benchmarks. Results reveal that our assortativity based features are competitive providing highly accurate results often outperforming state of the art methods for the network classification task.


Electronics ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 325
Author(s):  
Zhihao Wu ◽  
Baopeng Zhang ◽  
Tianchen Zhou ◽  
Yan Li ◽  
Jianping Fan

In this paper, we developed a practical approach for automatic detection of discrimination actions from social images. Firstly, an image set is established, in which various discrimination actions and relations are manually labeled. To the best of our knowledge, this is the first work to create a dataset for discrimination action recognition and relationship identification. Secondly, a practical approach is developed to achieve automatic detection and identification of discrimination actions and relationships from social images. Thirdly, the task of relationship identification is seamlessly integrated with the task of discrimination action recognition into one single network called the Co-operative Visual Translation Embedding++ network (CVTransE++). We also compared our proposed method with numerous state-of-the-art methods, and our experimental results demonstrated that our proposed methods can significantly outperform state-of-the-art approaches.


2021 ◽  
Vol 11 (8) ◽  
pp. 3636
Author(s):  
Faria Zarin Subah ◽  
Kaushik Deb ◽  
Pranab Kumar Dhar ◽  
Takeshi Koshiba

Autism spectrum disorder (ASD) is a complex and degenerative neuro-developmental disorder. Most of the existing methods utilize functional magnetic resonance imaging (fMRI) to detect ASD with a very limited dataset which provides high accuracy but results in poor generalization. To overcome this limitation and to enhance the performance of the automated autism diagnosis model, in this paper, we propose an ASD detection model using functional connectivity features of resting-state fMRI data. Our proposed model utilizes two commonly used brain atlases, Craddock 200 (CC200) and Automated Anatomical Labelling (AAL), and two rarely used atlases Bootstrap Analysis of Stable Clusters (BASC) and Power. A deep neural network (DNN) classifier is used to perform the classification task. Simulation results indicate that the proposed model outperforms state-of-the-art methods in terms of accuracy. The mean accuracy of the proposed model was 88%, whereas the mean accuracy of the state-of-the-art methods ranged from 67% to 85%. The sensitivity, F1-score, and area under receiver operating characteristic curve (AUC) score of the proposed model were 90%, 87%, and 96%, respectively. Comparative analysis on various scoring strategies show the superiority of BASC atlas over other aforementioned atlases in classifying ASD and control.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Iram Tazim Hoque ◽  
Nabil Ibtehaz ◽  
Saumitra Chakravarty ◽  
M. Saifur Rahman ◽  
M. Sohel Rahman

Abstract Background Segmentation of nuclei in cervical cytology pap smear images is a crucial stage in automated cervical cancer screening. The task itself is challenging due to the presence of cervical cells with spurious edges, overlapping cells, neutrophils, and artifacts. Methods After the initial preprocessing steps of adaptive thresholding, in our approach, the image passes through a convolution filter to filter out some noise. Then, contours from the resultant image are filtered by their distinctive contour properties followed by a nucleus size recovery procedure based on contour average intensity value. Results We evaluate our method on a public (benchmark) dataset collected from ISBI and also a private real dataset. The results show that our algorithm outperforms other state-of-the-art methods in nucleus segmentation on the ISBI dataset with a precision of 0.978 and recall of 0.933. A promising precision of 0.770 and a formidable recall of 0.886 on the private real dataset indicate that our algorithm can effectively detect and segment nuclei on real cervical cytology images. Tuning various parameters, the precision could be increased to as high as 0.949 with an acceptable decrease of recall to 0.759. Our method also managed an Aggregated Jaccard Index of 0.681 outperforming other state-of-the-art methods on the real dataset. Conclusion We have proposed a contour property-based approach for segmentation of nuclei. Our algorithm has several tunable parameters and is flexible enough to adapt to real practical scenarios and requirements.


Author(s):  
Matteo Chiara ◽  
Federico Zambelli ◽  
Marco Antonio Tangaro ◽  
Pietro Mandreoli ◽  
David S Horner ◽  
...  

Abstract Summary While over 200 000 genomic sequences are currently available through dedicated repositories, ad hoc methods for the functional annotation of SARS-CoV-2 genomes do not harness all currently available resources for the annotation of functionally relevant genomic sites. Here, we present CorGAT, a novel tool for the functional annotation of SARS-CoV-2 genomic variants. By comparisons with other state of the art methods we demonstrate that, by providing a more comprehensive and rich annotation, our method can facilitate the identification of evolutionary patterns in the genome of SARS-CoV-2. Availabilityand implementation Galaxy   http://corgat.cloud.ba.infn.it/galaxy; software: https://github.com/matteo14c/CorGAT/tree/Revision_V1; docker: https://hub.docker.com/r/laniakeacloud/galaxy_corgat. Supplementary information Supplementary data are available at Bioinformatics online.


Sign in / Sign up

Export Citation Format

Share Document