Combined Reinforcement Learning via Abstract Representations

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33013582 ◽

2019 ◽

Vol 33 ◽

pp. 3582-3589 ◽

Cited By ~ 4

Author(s):

Vincent Francois-Lavet ◽

Yoshua Bengio ◽

Doina Precup ◽

Joelle Pineau

Keyword(s):

Reinforcement Learning ◽

State Space ◽

Transfer Learning ◽

Computationally Efficient ◽

Dimensional Representation ◽

Learning Methods ◽

Model Free ◽

Abstract Representations ◽

Low Dimensional ◽

New Strategies

In the quest for efficient and robust reinforcement learning methods, both model-free and model-based approaches offer advantages. In this paper we propose a new way of explicitly bridging both approaches via a shared low-dimensional learned encoding of the environment, meant to capture summarizing abstractions. We show that the modularity brought by this approach leads to good generalization while being computationally efficient, with planning happening in a smaller latent state space. In addition, this approach recovers a sufficient low-dimensional representation of the environment, which opens up new strategies for interpretable AI, exploration and transfer learning.

Download Full-text

Methods and Algorithms for Knowledge Reuse in Multiagent Reinforcement Learning

10.5753/ctd.2020.11360 ◽

2020 ◽

Author(s):

Felipe Leno Da Silva ◽

Anna Helena Reali Costa

Keyword(s):

Reinforcement Learning ◽

Transfer Learning ◽

Learning Process ◽

Trial And Error ◽

Knowledge Reuse ◽

Previous Knowledge ◽

Learning Methods ◽

Types Of Knowledge ◽

Learning Agent ◽

Multiagent Reinforcement Learning

Reinforcement Learning (RL) is a powerful tool that has been used to solve increasingly complex tasks. RL operates through repeated interactions of the learning agent with the environment, via trial and error. However, this learning process is extremely slow, requiring many interactions. In this thesis, we leverage previous knowledge so as to accelerate learning in multiagent RL problems. We propose knowledge reuse both from previous tasks and from other agents. Several flexible methods are introduced so that each of these two types of knowledge reuse is possible. This thesis adds important steps towards more flexible and broadly applicable multiagent transfer learning methods.

Download Full-text

Generalized Representation Learning Methods for Deep Reinforcement Learning

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/748 ◽

2020 ◽

Author(s):

Hanhua Zhu

Keyword(s):

Reinforcement Learning ◽

State Space ◽

Representation Learning ◽

Learning Methods ◽

New Methods ◽

Compact State

Deep reinforcement learning (DRL) increases the successful applications of reinforcement learning (RL) techniques but also brings challenges such as low sample efficiency. In this work, I propose generalized representation learning methods to obtain compact state space suitable for RL from a raw observation state. I expect my new methods will increase sample efficiency of RL by understandable representations of state and therefore improve the performance of RL.

Download Full-text

Hierarchical Neuro-Fuzzy Systems Part II

Encyclopedia of Artificial Intelligence ◽

10.4018/978-1-59904-849-9.ch121 ◽

2011 ◽

pp. 817-824

Author(s):

Marley Vellasco ◽

Marco Pacheco ◽

Karla Figueiredo ◽

Flavio Souza

Keyword(s):

Reinforcement Learning ◽

State Space ◽

Learning Process ◽

Fuzzy Systems ◽

Space Partitioning ◽

Learning Methods ◽

New Class ◽

Neuro Fuzzy ◽

Binary Space Partitioning ◽

Priori Information

This paper describes a new class of neuro-fuzzy models, called Reinforcement Learning Hierarchical Neuro- Fuzzy Systems (RL-HNF). These models employ the BSP (Binary Space Partitioning) and Politree partitioning of the input space [Chrysanthou,1992] and have been developed in order to bypass traditional drawbacks of neuro-fuzzy systems: the reduced number of allowed inputs and the poor capacity to create their own structure and rules (ANFIS [Jang,1997], NEFCLASS [Kruse,1995] and FSOM [Vuorimaa,1994]). These new models, named Reinforcement Learning Hierarchical Neuro-Fuzzy BSP (RL-HNFB) and Reinforcement Learning Hierarchical Neuro-Fuzzy Politree (RL-HNFP), descend from the original HNFB that uses Binary Space Partitioning (see Hierarchical Neuro-Fuzzy Systems Part I). By using hierarchical partitioning, together with the Reinforcement Learning (RL) methodology, a new class of Neuro-Fuzzy Systems (SNF) was obtained, which executes, in addition to automatically learning its structure, the autonomous learning of the actions to be taken by an agent, dismissing a priori information (number of rules, fuzzy rules and sets) relative to the learning process. These characteristics represent an important differential when compared with existing intelligent agents learning systems, because in applications involving continuous environments and/or environments considered to be highly dimensional, the use of traditional Reinforcement Learning methods based on lookup tables (a table that stores value functions for a small or discrete state space) is no longer possible, since the state space becomes too large. This second part of hierarchical neuro-fuzzy systems focus on the use of reinforcement learning process. The first part presented HNFB models based on supervised learning methods. The RL-HNFB and RL-HNFP models were evaluated in a benchmark control application and a simulated Khepera robot environment with multiple obstacles.

Download Full-text

A novel state space representation for the solution of 2D-HP protein folding problem using reinforcement learning methods

Applied Soft Computing ◽

10.1016/j.asoc.2014.09.047 ◽

2015 ◽

Vol 26 ◽

pp. 213-223 ◽

Cited By ~ 3

Author(s):

Berat Doğan ◽

Tamer Ölmez

Keyword(s):

Protein Folding ◽

Reinforcement Learning ◽

State Space ◽

Space Representation ◽

Learning Methods ◽

State Space Representation

Download Full-text

Dictionary learning allows model-free pseudotime estimation of transcriptomic data

BMC Genomics ◽

10.1186/s12864-021-08276-9 ◽

2022 ◽

Vol 23 (1) ◽

Author(s):

Mona Rams ◽

Tim O.F. Conrad

Keyword(s):

Dimension Reduction ◽

Dictionary Learning ◽

Real World ◽

Estimation Methods ◽

Dynamic Processes ◽

Dimensional Representation ◽

Transcriptomic Data ◽

Model Free ◽

Real World Datasets ◽

Low Dimensional

Abstract Background Pseudotime estimation from dynamic single-cell transcriptomic data enables characterisation and understanding of the underlying processes, for example developmental processes. Various pseudotime estimation methods have been proposed during the last years. Typically, these methods start with a dimension reduction step because the low-dimensional representation is usually easier to analyse. Approaches such as PCA, ICA or t-SNE belong to the most widely used methods for dimension reduction in pseudotime estimation methods. However, these methods usually make assumptions on the derived dimensions, which can result in important dataset properties being missed. In this paper, we suggest a new dictionary learning based approach, dynDLT, for dimension reduction and pseudotime estimation of dynamic transcriptomic data. Dictionary learning is a matrix factorisation approach that does not restrict the dependence of the derived dimensions. To evaluate the performance, we conduct a large simulation study and analyse 8 real-world datasets. Results The simulation studies reveal that firstly, dynDLT preserves the simulated patterns in low-dimension and the pseudotimes can be derived from the low-dimensional representation. Secondly, the results show that dynDLT is suitable for the detection of genes exhibiting the simulated dynamic patterns, thereby facilitating the interpretation of the compressed representation and thus the dynamic processes. For the real-world data analysis, we select datasets with samples that are taken at different time points throughout an experiment. The pseudotimes found by dynDLT have high correlations with the experimental times. We compare the results to other approaches used in pseudotime estimation, or those that are method-wise closely connected to dictionary learning: ICA, NMF, PCA, t-SNE, and UMAP. DynDLT has the best overall performance for the simulated and real-world datasets. Conclusions We introduce dynDLT, a method that is suitable for pseudotime estimation. Its main advantages are: (1) It presents a model-free approach, meaning that it does not restrict the dependence of the derived dimensions; (2) Genes that are relevant in the detected dynamic processes can be identified from the dictionary matrix; (3) By a restriction of the dictionary entries to positive values, the dictionary atoms are highly interpretable.

Download Full-text

Policy Optimization with Model-Based Explorations

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33014675 ◽

2019 ◽

Vol 33 ◽

pp. 4675-4682 ◽

Cited By ~ 2

Author(s):

Feiyang Pan ◽

Qingpeng Cai ◽

An-Xiang Zeng ◽

Chun-Xiang Pan ◽

Qing Da ◽

...

Keyword(s):

Reinforcement Learning ◽

Optimization Method ◽

Monte Carlo Sampling ◽

New Technique ◽

Learning Methods ◽

Model Based ◽

Model Free ◽

Hand Model ◽

Target Values ◽

Policy Optimization

Model-free reinforcement learning methods such as the Proximal Policy Optimization algorithm (PPO) have successfully applied in complex decision-making problems such as Atari games. However, these methods suffer from high variances and high sample complexity. On the other hand, model-based reinforcement learning methods that learn the transition dynamics are more sample efficient, but they often suffer from the bias of the transition estimation. How to make use of both model-based and model-free learning is a central problem in reinforcement learning.In this paper, we present a new technique to address the tradeoff between exploration and exploitation, which regards the difference between model-free and model-based estimations as a measure of exploration value. We apply this new technique to the PPO algorithm and arrive at a new policy optimization method, named Policy Optimization with Modelbased Explorations (POME). POME uses two components to predict the actions’ target values: a model-free one estimated by Monte-Carlo sampling and a model-based one which learns a transition model and predicts the value of the next state. POME adds the error of these two target estimations as the additional exploration value for each state-action pair, i.e, encourages the algorithm to explore the states with larger target errors which are hard to estimate. We compare POME with PPO on Atari 2600 games, and it shows that POME outperforms PPO on 33 games out of 49 games.

Download Full-text

Dynamic tuning of PI-controllers based on model-free Reinforcement Learning methods

ICCAS 2010 ◽

10.1109/iccas.2010.5669655 ◽

2010 ◽

Author(s):

Lena Abbasi Brujeni ◽

Jong Min Lee ◽

Sirish L. Shah

Keyword(s):

Reinforcement Learning ◽

Learning Methods ◽

Dynamic Tuning ◽

Model Free ◽

Pi Controllers

Download Full-text

Task complexity interacts with state-space uncertainty in the arbitration process between model-based and model-free reinforcement-learning at both behavioral and neural levels

10.1101/393983 ◽

2018 ◽

Cited By ~ 2

Author(s):

Dongjae Kim ◽

Geon Yeong Park ◽

John P. O’Doherty ◽

Sang Wan Lee

Keyword(s):

Prefrontal Cortex ◽

Reinforcement Learning ◽

State Space ◽

Task Complexity ◽

Model Based ◽

Model Free ◽

Arbitration Process ◽

Two Systems ◽

The Brain

SUMMARYA major open question concerns how the brain governs the allocation of control between two distinct strategies for learning from reinforcement: model-based and model-free reinforcement learning. While there is evidence to suggest that the reliability of the predictions of the two systems is a key variable responsible for the arbitration process, another key variable has remained relatively unexplored: the role of task complexity. By using a combination of novel task design, computational modeling, and model-based fMRI analysis, we examined the role of task complexity alongside state-space uncertainty in the arbitration process between model-based and model-free RL. We found evidence to suggest that task complexity plays a role in influencing the arbitration process alongside state-space uncertainty. Participants tended to increase model-based RL control in response to increasing task complexity. However, they resorted to model-free RL when both uncertainty and task complexity were high, suggesting that these two variables interact during the arbitration process. Computational fMRI revealed that task complexity interacts with neural representations of the reliability of the two systems in the inferior prefrontal cortex bilaterally. These findings provide insight into how the inferior prefrontal cortex negotiates the trade-off between model-based and model-free RL in the presence of uncertainty and complexity, and more generally, illustrates how the brain resolves uncertainty and complexity in dynamically changing environments.SUMMARY OF FINDINGS- Elucidated the role of state-space uncertainty and complexity in model-based and model-free RL.- Found behavioral and neural evidence for complexity-sensitive prefrontal arbitration.- High task complexity induces explorative model-based RL.

Download Full-text

Learning a Low-Dimensional Representation of a Safe Region for Safe Reinforcement Learning on Dynamical Systems

IEEE Transactions on Neural Networks and Learning Systems ◽

10.1109/tnnls.2021.3106818 ◽

2021 ◽

pp. 1-15

Author(s):

Zhehua Zhou ◽

Ozgur S. Oguz ◽

Marion Leibold ◽

Martin Buss

Keyword(s):

Reinforcement Learning ◽

Dynamical Systems ◽

Dimensional Representation ◽

Safe Region ◽

Low Dimensional

Download Full-text

Graph Laplacian Based Transfer Learning Methods in Reinforcement Learning

Autonomous Agents ◽

10.5772/9654 ◽

2010 ◽

Author(s):

Yi-Ting Tsao ◽

Ke-Ting Xiao ◽

Von-Wun Soo ◽

Chung-Cheng Chiu

Keyword(s):

Reinforcement Learning ◽

Transfer Learning ◽

Graph Laplacian ◽

Learning Methods

Download Full-text