Using deep reinforcement learning to reveal how the brain encodes abstract state-space representations in high-dimensional environments

Neuron ◽  
2020 ◽  
Author(s):  
Logan Cross ◽  
Jeff Cockburn ◽  
Yisong Yue ◽  
John P. O’Doherty


Author(s):  
Jarryd Martin ◽  
Suraj Narayanan S. ◽  
Tom Everitt ◽  
Marcus Hutter

We introduce a new count-based optimistic exploration algorithm for Reinforcement Learning (RL) that is feasible in environments with high-dimensional state-action spaces. The success of RL algorithms in these domains depends crucially on generalisation from limited training experience. Function approximation techniques enable RL agents to generalise in order to estimate the value of unvisited states, but at present few methods enable generalisation regarding uncertainty. This has prevented the combination of scalable RL algorithms with efficient exploration strategies that drive the agent to reduce its uncertainty. We present a new method for computing a generalised state visit-count, which allows the agent to estimate the uncertainty associated with any state. Our φ-pseudocount achieves generalisation by exploiting the same feature representation of the state space that is used for value function approximation. States that have less frequently observed features are deemed more uncertain. The φ-Exploration-Bonus algorithm rewards the agent for exploring in feature space rather than in the untransformed state space. The method is simpler and less computationally expensive than some previous proposals, and achieves near state-of-the-art results on high-dimensional RL benchmarks.
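As a rough illustration of the idea, the sketch below derives a generalised visit-count from per-feature statistics and converts it into an optimistic reward bonus. It assumes binary feature vectors and a naive product-of-features density model; the class name, the Laplace smoothing, and the bonus form β/√N̂ are illustrative choices, not the paper's exact construction.

```python
import numpy as np

class PhiPseudocountBonus:
    """Minimal sketch of a feature-based exploration bonus.

    Assumes binary feature vectors phi(s) and models the visit density of a
    state as a product of independent per-feature densities. This is a
    simplified stand-in for the phi-pseudocount idea, not the authors' code.
    """

    def __init__(self, num_features, beta=0.05):
        self.on_counts = np.zeros(num_features)   # times each feature was active
        self.total = 0                            # total states observed
        self.beta = beta                          # bonus scale

    def update(self, phi):
        """Record one observed feature vector (0/1 entries)."""
        self.on_counts += phi
        self.total += 1

    def pseudocount(self, phi):
        """Generalised visit-count: high if the state's features are common."""
        if self.total == 0:
            return 0.0
        # Per-feature probability of the observed value (Laplace-smoothed).
        p_on = (self.on_counts + 1) / (self.total + 2)
        p = np.where(phi > 0, p_on, 1.0 - p_on)
        density = np.prod(p)
        # Turn the density into an approximate count.
        return density * self.total

    def bonus(self, phi):
        """Optimistic exploration bonus added to the environment reward."""
        n_hat = self.pseudocount(phi)
        return self.beta / np.sqrt(n_hat + 0.01)
```

In use, the agent would augment each transition's reward, e.g. `r_aug = r + bonus.bonus(phi_s)`, so that states with rarely seen features are visited more often.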


Author(s):  
Rahul Ramesh ◽  
Manan Tomar ◽  
Balaraman Ravindran

The options framework in reinforcement learning models the notion of a skill or a temporally extended sequence of actions. The discovery of a reusable set of skills has typically entailed building options that navigate to bottleneck states. In this work, we instead adopt a complementary approach, where we attempt to discover options that navigate to landmark states. These states are prototypical representatives of well-connected regions and can hence access the associated region with relative ease. We propose Successor Options, which leverages Successor representations to build a model of the state space. The intra-option policies are learnt using a novel pseudo-reward, and the model scales to high-dimensional spaces since it does not construct an explicit graph of the entire state space. Additionally, we propose an Incremental Successor Options model that iterates between constructing Successor representations and building options, which is useful when robust Successor representations cannot be built solely from primitive actions. We demonstrate the efficacy of our approach on a collection of grid-worlds, and on the high-dimensional robotic control environment of Fetch.
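The sketch below illustrates the general recipe in tabular form: learn a successor representation (SR) with TD updates, cluster its rows to pick landmark states, and shape intra-option behaviour with a pseudo-reward that moves the agent toward a landmark. The k-means clustering and the particular pseudo-reward shown here are illustrative stand-ins, not the paper's exact formulation.

```python
import numpy as np
from sklearn.cluster import KMeans

def learn_successor_representation(transitions, num_states, gamma=0.95, alpha=0.1):
    """Tabular TD learning of the SR from (s, s_next) pairs collected under
    some exploration policy. M[s] estimates discounted future occupancies."""
    M = np.zeros((num_states, num_states))
    for s, s_next in transitions:
        target = np.eye(num_states)[s] + gamma * M[s_next]
        M[s] += alpha * (target - M[s])
    return M

def find_landmarks(M, num_landmarks=4, seed=0):
    """Cluster SR rows; the state whose SR is closest to each cluster centre
    is taken as that region's landmark."""
    km = KMeans(n_clusters=num_landmarks, n_init=10, random_state=seed).fit(M)
    landmarks = []
    for c in range(num_landmarks):
        members = np.where(km.labels_ == c)[0]
        centre = km.cluster_centers_[c]
        landmarks.append(members[np.argmin(np.linalg.norm(M[members] - centre, axis=1))])
    return landmarks

def pseudo_reward(M, landmark, s, s_next):
    """Hypothetical intra-option pseudo-reward: increase in SR similarity to
    the landmark (an illustrative stand-in for the paper's pseudo-reward)."""
    return float(M[s_next] @ M[landmark] - M[s] @ M[landmark])
```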


2018 ◽  
Author(s):  
Dongjae Kim ◽  
Geon Yeong Park ◽  
John P. O’Doherty ◽  
Sang Wan Lee

SUMMARY
A major open question concerns how the brain governs the allocation of control between two distinct strategies for learning from reinforcement: model-based and model-free reinforcement learning. While there is evidence to suggest that the reliability of the predictions of the two systems is a key variable responsible for the arbitration process, another key variable has remained relatively unexplored: the role of task complexity. By using a combination of novel task design, computational modeling, and model-based fMRI analysis, we examined the role of task complexity alongside state-space uncertainty in the arbitration process between model-based and model-free RL. We found evidence to suggest that task complexity plays a role in influencing the arbitration process alongside state-space uncertainty. Participants tended to increase model-based RL control in response to increasing task complexity. However, they resorted to model-free RL when both uncertainty and task complexity were high, suggesting that these two variables interact during the arbitration process. Computational fMRI revealed that task complexity interacts with neural representations of the reliability of the two systems in the inferior prefrontal cortex bilaterally. These findings provide insight into how the inferior prefrontal cortex negotiates the trade-off between model-based and model-free RL in the presence of uncertainty and complexity, and more generally, illustrate how the brain resolves uncertainty and complexity in dynamically changing environments.
SUMMARY OF FINDINGS
- Elucidated the role of state-space uncertainty and complexity in model-based and model-free RL.
- Found behavioral and neural evidence for complexity-sensitive prefrontal arbitration.
- High task complexity induces explorative model-based RL.
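A toy sketch of reliability-and-complexity-based arbitration is given below: model-based and model-free action values are mixed with a weight that grows with the model-based system's relative reliability and, when that reliability is adequate, with task complexity. The sigmoid form, the parameter `kappa`, and the function name are assumptions for illustration only, not the authors' computational model.

```python
import numpy as np

def arbitrate(q_mb, q_mf, rel_mb, rel_mf, complexity, kappa=1.0):
    """Illustrative arbitration between model-based (MB) and model-free (MF)
    action values.

    The MB weight rises with MB's relative reliability and with task
    complexity, but the complexity term is scaled by MB reliability, so when
    both uncertainty (low rel_mb) and complexity are high, control falls back
    toward MF -- a rough rendering of the behavioural finding, not the model.
    """
    # Relative reliability of the model-based system, in [0, 1].
    rel_ratio = rel_mb / (rel_mb + rel_mf + 1e-8)
    # Complexity nudges control toward MB only when MB is reliable.
    w_mb = 1.0 / (1.0 + np.exp(-kappa * (rel_ratio - 0.5 + complexity * rel_mb)))
    return w_mb * np.asarray(q_mb) + (1.0 - w_mb) * np.asarray(q_mf)
```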


2019 ◽  
Author(s):  
Aurelio Cortese ◽  
Hakwan Lau ◽  
Mitsuo Kawato

Abstract
Can humans be trained to make strategic use of unconscious representations in their own brains? We investigated how one can derive reward-maximizing choices from latent high-dimensional information represented stochastically in neural activity. In a novel decision-making task, reinforcement learning contingencies were defined in real time by fMRI multivoxel pattern analysis; optimal action policies thereby depended on multidimensional brain activity that took place below the threshold of consciousness. We found that subjects could solve the task when their reinforcement learning processes were boosted by implicit metacognition to estimate the relevant brain states. With these results, we identified a frontal-striatal mechanism by which the brain can untangle tasks of great dimensionality, and can do so much more flexibly than current artificial intelligence.


2021 ◽  
Vol 3 (6) ◽  
Author(s):  
Ogbonnaya Anicho ◽  
Philip B. Charlesworth ◽  
Gurvinder S. Baicher ◽  
Atulya K. Nagar

Abstract
This work analyses the performance of Reinforcement Learning (RL) versus Swarm Intelligence (SI) for coordinating multiple unmanned High Altitude Platform Stations (HAPS) for communications area coverage. It builds upon previous work which looked at various elements of both algorithms. The main aim of this paper is to address the continuous state-space challenge within this work by using partitioning to manage the high-dimensionality problem. This enabled a comparison of the classical cases of both RL and SI, establishing a baseline for future comparisons of improved versions. From previous work, SI was observed to perform better across various key performance indicators. However, after tuning parameters and empirically choosing a suitable partitioning ratio for the RL state space, it was observed that the SI algorithm still maintained superior coordination capability by achieving higher mean overall user coverage (about 20% better than the RL algorithm), in addition to faster convergence rates. Though the RL technique showed better average peak user coverage, its unpredictable coverage dip was a key weakness, making SI a more suitable algorithm within the context of this work.
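To make the partitioning step concrete, the sketch below maps a continuous observation onto a discrete cell of a uniform grid so that a tabular Q-learner can index it. The uniform grid, the bin counts, and the helper name are illustrative assumptions; the paper chose its partitioning ratio empirically for its HAPS coordination task.

```python
import numpy as np

def partition_state(obs, lows, highs, bins_per_dim):
    """Map a continuous observation to a discrete cell index by uniformly
    partitioning each dimension -- a simple way to tame a continuous state
    space for tabular RL."""
    obs = np.asarray(obs, dtype=float)
    ratios = (obs - np.asarray(lows)) / (np.asarray(highs) - np.asarray(lows))
    idx = np.clip((ratios * bins_per_dim).astype(int), 0, np.asarray(bins_per_dim) - 1)
    # Flatten the per-dimension indices into a single table index.
    return int(np.ravel_multi_index(idx, bins_per_dim))

# Example: a 2-D position partitioned into a 10x10 grid for a Q-table.
bins = (10, 10)
q_table = np.zeros((np.prod(bins), 4))          # 4 discrete actions
cell = partition_state([3.2, -1.7], lows=(-5, -5), highs=(5, 5), bins_per_dim=bins)
```

Coarser or finer grids trade off generalisation against resolution, which is why the partitioning ratio had to be tuned before the RL baseline could be compared fairly against SI.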


Sensors ◽  
2018 ◽  
Vol 18 (12) ◽  
pp. 4112 ◽  
Author(s):  
Se-Min Lim ◽  
Hyeong-Cheol Oh ◽  
Jaein Kim ◽  
Juwon Lee ◽  
Jooyoung Park

Recently, wearable devices have become a prominent health care application domain by incorporating a growing number of sensors and adopting smart machine learning technologies. One closely related topic is the strategy of combining wearable device technology with skill assessment, which can be used in wearable device apps for coaching and/or personal training. Particularly pertinent to skill assessment based on high-dimensional time series data from wearable sensors are classifying whether a player is an expert or a beginner, identifying which skills the player is exercising, and extracting low-dimensional representations useful for coaching. In this paper, we present a deep learning-based coaching assistant method, which can provide useful information in support of table tennis practice. Our method uses a combination of LSTM (long short-term memory) with a deep state-space model and probabilistic inference. More precisely, we use the expressive power of the LSTM to handle high-dimensional time series data, and the state-space model with probabilistic inference to extract low-dimensional latent representations useful for coaching. Experimental results show that our method can yield promising results for characterizing high-dimensional time series patterns and for providing useful information when working with wearable IMU (inertial measurement unit) sensors for table tennis coaching.
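A minimal sketch of this kind of pipeline is shown below: an LSTM encoder summarises a multi-channel IMU window into a low-dimensional latent vector, from which an expert-versus-beginner label is predicted. Layer sizes, the plain linear head, and the class name are assumptions for illustration; the paper's model additionally couples the LSTM with a deep state-space model and probabilistic inference rather than a simple classifier.

```python
import torch
import torch.nn as nn

class IMUSkillEncoder(nn.Module):
    """Sketch of an LSTM-based encoder for wearable IMU sequences: compresses
    a high-dimensional time series into a low-dimensional latent vector and
    predicts a skill label from it."""

    def __init__(self, num_channels=6, hidden_size=64, latent_dim=3, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(num_channels, hidden_size, batch_first=True)
        self.to_latent = nn.Linear(hidden_size, latent_dim)   # low-dim representation
        self.classifier = nn.Linear(latent_dim, num_classes)  # expert vs beginner

    def forward(self, x):
        # x: (batch, time, channels), e.g. accelerometer + gyroscope streams.
        _, (h_n, _) = self.lstm(x)
        z = self.to_latent(h_n[-1])          # latent summary useful for coaching
        logits = self.classifier(z)
        return z, logits

# Example: a batch of 8 two-second windows sampled at 100 Hz from a 6-axis IMU.
model = IMUSkillEncoder()
z, logits = model(torch.randn(8, 200, 6))
```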

