Robustly Learning Composable Options in Deep Reinforcement Learning

Author(s):  
Akhil Bagaria ◽  
Jason Senthil ◽  
Matthew Slivinski ◽  
George Konidaris

Hierarchical reinforcement learning (HRL) is only effective for long-horizon problems when high-level skills can be reliably sequentially executed. Unfortunately, learning reliably composable skills is difficult, because all the components of every skill are constantly changing during learning. We propose three methods for improving the composability of learned skills: representing skill initiation regions using a combination of pessimistic and optimistic classifiers; learning re-targetable policies that are robust to non-stationary subgoal regions; and learning robust option policies using model-based RL. We test these improvements on four sparse-reward maze navigation tasks involving a simulated quadrupedal robot. Each method successively improves the robustness of a baseline skill discovery method, substantially outperforming state-of-the-art flat and hierarchical methods.
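To make the first of these ideas concrete, here is a minimal Python sketch of an option initiation set backed by a pessimistic and an optimistic classifier. The class name, the choice of SVM classifiers, and the explore-only fallback are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.svm import SVC, OneClassSVM

class InitiationSet:
    """Hypothetical initiation region combining a strict and a broad classifier."""

    def __init__(self):
        self.pessimistic = SVC(kernel="rbf", gamma="scale")   # needs evidence of success
        self.optimistic = OneClassSVM(kernel="rbf", nu=0.1)   # any state the option tried from
        self._fitted = False

    def fit(self, start_states, success_labels):
        """start_states: (N, d) array; success_labels: (N,) array of 0/1 outcomes."""
        X = np.asarray(start_states)
        y = np.asarray(success_labels)
        self.pessimistic.fit(X, y)
        self.optimistic.fit(X)
        self._fitted = True

    def can_initiate(self, state, explore=False):
        if not self._fitted:
            return True                      # before any data, allow execution
        x = np.asarray(state).reshape(1, -1)
        if self.pessimistic.predict(x)[0] == 1:
            return True
        # fall back to the broader optimistic set only when exploring
        return explore and self.optimistic.predict(x)[0] == 1
```

The pessimistic set keeps chained execution reliable, while the optimistic set lets the agent keep gathering data from states the strict classifier would otherwise rule out.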

2020 ◽  
Author(s):  
Clay B. Holroyd ◽  
Tom Verguts

Despite continual debate for the past thirty years about the function of anterior cingulate cortex (ACC), its key contribution to neurocognition remains unknown. Here we review computational models that illustrate three core principles of ACC function (related to hierarchy, world models, and cost), as well as four constraints on the neural implementation of these principles (related to modularity, binding, encoding, and learning and regulation). These observations suggest a role for ACC in model-based hierarchical reinforcement learning, which instantiates a mechanism for motivating the execution of high-level plans.


2022 ◽  
pp. 1-12
Author(s):  
Shuailong Li ◽  
Wei Zhang ◽  
Huiwen Zhang ◽  
Xin Zhang ◽  
Yuquan Leng

Model-free reinforcement learning methods have been successfully applied to practical applications such as decision-making problems in Atari games. However, these methods have inherent shortcomings, such as high variance and low sample efficiency. To improve the policy performance and sample efficiency of model-free reinforcement learning, we propose proximal policy optimization with model-based methods (PPOMM), a fusion of model-based and model-free reinforcement learning. PPOMM considers not only information from past experience but also predictive information about the future state. PPOMM adds information about the next state to the objective function of the proximal policy optimization (PPO) algorithm through a model-based method. This method uses two components to optimize the policy: the error of PPO and the error of model-based reinforcement learning. We use the latter to optimize a latent transition model and predict information about the next state. Evaluated across 49 Atari games in the Arcade Learning Environment (ALE), this method outperforms the state-of-the-art PPO algorithm in most games; the experimental results show that PPOMM performs at least as well as the original algorithm in 33 games.
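The abstract describes a combined objective: the PPO surrogate loss plus a model-based prediction error. A minimal Python/PyTorch sketch of that combination follows; `policy`, `encoder`, `transition_model`, and `model_coef` are assumed, illustrative interfaces rather than the authors' code.

```python
import torch
import torch.nn.functional as F

def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Standard clipped PPO surrogate loss (to be minimized)."""
    ratio = torch.exp(new_logp - old_logp)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return -torch.min(ratio * advantages, clipped * advantages).mean()

def combined_loss(batch, policy, encoder, transition_model, model_coef=0.5):
    """PPO loss plus an auxiliary loss for predicting the next state's latent."""
    new_logp = policy.log_prob(batch["obs"], batch["actions"])
    policy_loss = ppo_clip_loss(new_logp, batch["old_logp"], batch["advantages"])

    # model-based term: a latent transition model predicts the next observation's encoding
    z = encoder(batch["obs"])
    z_next_pred = transition_model(z, batch["actions"])
    with torch.no_grad():
        z_next = encoder(batch["next_obs"])
    model_loss = F.mse_loss(z_next_pred, z_next)

    return policy_loss + model_coef * model_loss
```

Minimizing this single objective updates the policy with PPO while also shaping a latent space in which the next state is predictable, which is the spirit of the fusion described above.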


Author(s):  
Nicolas Bougie ◽  
Ryutaro Ichise

Deep reinforcement learning (DRL) methods traditionally struggle with tasks where environment rewards are sparse or delayed, so exploration remains one of the key challenges of DRL. Instead of relying solely on extrinsic rewards, many state-of-the-art methods use intrinsic curiosity as an exploration signal. While such methods hold promise for better local exploration, discovering global exploration strategies is beyond the reach of current methods. We propose a novel end-to-end intrinsic reward formulation that introduces high-level exploration in reinforcement learning. Our curiosity signal is driven by a fast reward that deals with local exploration and a slow reward that incentivizes long-horizon exploration strategies. We formulate curiosity as the error in an agent’s ability to reconstruct the observations given their contexts. Experimental results show that this high-level exploration enables our agents to outperform prior work in several Atari games.
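As a rough illustration of the fast/slow decomposition, the sketch below assumes two hypothetical reconstruction models, one updated frequently and one updated rarely, and combines their errors into a single intrinsic reward. The interfaces (`reconstruct`, `beta`) are assumptions, not the paper's definitions.

```python
import numpy as np

def intrinsic_reward(obs, context, fast_model, slow_model, beta=0.5):
    """Curiosity as reconstruction error of the observation given its context.

    fast_model: updated often, so its error decays quickly (local novelty).
    slow_model: updated rarely, so its error persists (long-horizon novelty).
    """
    fast_err = np.mean((fast_model.reconstruct(obs, context) - obs) ** 2)
    slow_err = np.mean((slow_model.reconstruct(obs, context) - obs) ** 2)
    return fast_err + beta * slow_err
```

Because the slow model lags behind, regions the agent has only recently mastered keep paying a bonus, which is one way such a signal can encourage longer-horizon exploration.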


Author(s):  
Aijun Bai ◽  
Stuart Russell

In the context of hierarchical reinforcement learning, the idea of hierarchies of abstract machines (HAMs) is to write a partial policy as a set of hierarchical finite-state machines with unspecified choice states, and to use reinforcement learning to learn an optimal completion of this partial policy. Given a HAM with potentially deep hierarchical structure, there often exist many internal transitions where a machine calls another machine with the environment state unchanged. In this paper, we propose a new hierarchical reinforcement learning algorithm that discovers such internal transitions automatically and short-circuits them recursively in the computation of Q values. The resulting HAMQ-INT algorithm significantly outperforms the state of the art on the benchmark Taxi domain and on a much more complex RoboCup Keepaway domain.
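A small Python sketch of the short-circuiting idea follows: when a machine call is observed to leave the environment state unchanged, the landing machine state is cached, and cached internal transitions are followed before bootstrapping Q-values at the next choice point. This is an illustrative simplification, not the paper's algorithm.

```python
from collections import defaultdict

class ShortCircuitQ:
    """Tabular SMDP-style Q-learning over choice points with cached internal transitions."""

    def __init__(self, alpha=0.1, gamma=0.99):
        self.q = defaultdict(float)   # key: (env_state, machine_state, choice)
        self.shortcut = {}            # (env_state, machine_state) -> machine_state
        self.alpha, self.gamma = alpha, gamma

    def record_internal(self, env_state, src_machine, dst_machine):
        """Called when a machine call left the environment state unchanged."""
        self.shortcut[(env_state, src_machine)] = dst_machine

    def resolve(self, env_state, machine_state):
        """Follow cached internal transitions until one would change the env state."""
        seen = set()
        while (env_state, machine_state) in self.shortcut and machine_state not in seen:
            seen.add(machine_state)
            machine_state = self.shortcut[(env_state, machine_state)]
        return machine_state

    def update(self, env_state, machine_state, choice, reward, tau,
               next_env_state, next_machine_state, next_choices):
        """SMDP Q-update; tau is the number of primitive steps taken by the choice."""
        next_machine_state = self.resolve(next_env_state, next_machine_state)
        best_next = max((self.q[(next_env_state, next_machine_state, c)]
                         for c in next_choices), default=0.0)
        key = (env_state, machine_state, choice)
        target = reward + (self.gamma ** tau) * best_next
        self.q[key] += self.alpha * (target - self.q[key])
```

Skipping over zero-effect internal transitions in this way avoids redundant choice points and keeps the Q-table focused on decisions that actually affect the environment.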


2014 ◽  
Vol 369 (1655) ◽  
pp. 20130480 ◽  
Author(s):  
Matthew Botvinick ◽  
Ari Weinstein

Recent work has reawakened interest in goal-directed or ‘model-based’ choice, where decisions are based on prospective evaluation of potential action outcomes. Concurrently, there has been growing attention to the role of hierarchy in decision-making and action control. We focus here on the intersection between these two areas of interest, considering the topic of hierarchical model-based control. To characterize this form of action control, we draw on the computational framework of hierarchical reinforcement learning, using this to interpret recent empirical findings. The resulting picture reveals how hierarchical model-based mechanisms might play a special and pivotal role in human decision-making, dramatically extending the scope and complexity of human behaviour.


2019 ◽  
Vol 4 (26) ◽  
pp. eaau5872 ◽  
Author(s):  
Jemin Hwangbo ◽  
Joonho Lee ◽  
Alexey Dosovitskiy ◽  
Dario Bellicoso ◽  
Vassilios Tsounis ◽  
...  

Legged robots pose one of the greatest challenges in robotics. Dynamic and agile maneuvers of animals cannot be imitated by existing methods that are crafted by humans. A compelling alternative is reinforcement learning, which requires minimal craftsmanship and promotes the natural evolution of a control policy. However, so far, reinforcement learning research for legged robots has mainly been limited to simulation, and only a few comparably simple examples have been deployed on real systems. The primary reason is that training with real robots, particularly with dynamically balancing systems, is complicated and expensive. In the present work, we introduce a method for training a neural network policy in simulation and transferring it to a state-of-the-art legged system, thereby leveraging fast, automated, and cost-effective data generation schemes. The approach is applied to the ANYmal robot, a sophisticated medium-dog-sized quadrupedal system. Using policies trained in simulation, the quadrupedal machine achieves locomotion skills that go beyond what had been achieved with prior methods: ANYmal is capable of precisely and energy-efficiently following high-level body velocity commands, running faster than before, and recovering from falling even in complex configurations.


2021 ◽  
Vol 8 ◽  
Author(s):  
Chen Yu ◽  
Andre Rosendo

Model-Based Reinforcement Learning (MBRL) algorithms have been shown to have an advantage in data efficiency, but they are often overshadowed by state-of-the-art model-free methods in performance, especially when facing high-dimensional and complex problems. In this work, we propose a novel MBRL method called Risk-Aware Model-Based Control (RAMCO). It combines uncertainty-aware deep dynamics models with the risk-assessment technique Conditional Value at Risk (CVaR). This mechanism is appropriate for real-world applications since it takes epistemic risk into consideration. In addition, we use a model-free solver to produce warm-up training data; this setting improves performance in low-dimensional environments and compensates for the inherent weaknesses of MBRL in high-dimensional scenarios. In comparison with other state-of-the-art reinforcement learning algorithms, we show that RAMCO produces superior results on a walking-robot model. We also evaluate the method in the Eidos environment, a novel experimental setting that uses multi-dimensional, randomly initialized deep neural networks to measure the performance of any reinforcement learning algorithm, where the advantages of RAMCO are again highlighted.
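The risk-aware selection step can be illustrated with a short Python sketch: an ensemble of learned dynamics models yields a distribution of predicted returns for each candidate action sequence, and each candidate is scored by its CVaR (the mean of the worst alpha-fraction of returns) instead of its mean. The function names and the `ensemble_rollout` interface are assumptions for illustration.

```python
import numpy as np

def cvar(returns, alpha=0.1):
    """Mean of the worst alpha-fraction of sampled returns (lower tail)."""
    returns = np.sort(np.asarray(returns, dtype=float))
    k = max(1, int(np.ceil(alpha * len(returns))))
    return returns[:k].mean()

def select_action_sequence(candidates, ensemble_rollout, alpha=0.1):
    """candidates: list of action sequences.
    ensemble_rollout(seq) -> array of predicted returns, one per ensemble member
    (or per sampled trajectory), capturing epistemic uncertainty in the model."""
    scores = [cvar(ensemble_rollout(seq), alpha) for seq in candidates]
    return candidates[int(np.argmax(scores))]
```

Maximizing CVaR rather than the expected return makes the planner prefer action sequences whose worst plausible outcomes, according to the disagreeing ensemble members, are still acceptable.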


Author(s):  
Shihong Song ◽  
Jiayi Weng ◽  
Hang Su ◽  
Dong Yan ◽  
Haosheng Zou ◽  
...  

Learning rational behaviors in first-person shooter (FPS) games is a challenging task for reinforcement learning (RL), the primary difficulties being a huge action space and insufficient exploration. To address this, we propose a hierarchical agent based on combined options with intrinsic rewards to drive exploration. Specifically, we present a hierarchical model that works in a manager-worker fashion over two levels of hierarchy. The high-level manager learns a policy over options, and the low-level workers, motivated by intrinsic reward, learn to execute the options. Performance is further improved by appropriately harnessing environmental signals. Extensive experiments demonstrate that our trained bot significantly outperforms alternative RL-based models on FPS games requiring maze solving and combat skills. Notably, we achieved first place in VDAIC 2018 Track (1).
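The manager-worker loop can be summarized in a schematic Python sketch. The `manager`, `workers`, `intrinsic_reward`, and `env` objects are hypothetical stand-ins (old-style gym interface assumed), meant only to show how extrinsic reward trains the high level while intrinsic reward trains the low level.

```python
def run_episode(env, manager, workers, intrinsic_reward, option_len=20):
    """One episode of a two-level manager-worker agent."""
    obs, done = env.reset(), False
    while not done:
        option = manager.select_option(obs)          # high level: policy over options
        start_obs, extrinsic = obs, 0.0
        for _ in range(option_len):
            action = workers[option].act(obs)        # low level: executes the chosen option
            next_obs, reward, done, _ = env.step(action)
            # workers learn from an intrinsic signal tied to their option
            workers[option].learn(obs, action,
                                  intrinsic_reward(obs, next_obs, option), next_obs)
            extrinsic += reward
            obs = next_obs
            if done:
                break
        # the manager learns from the accumulated extrinsic (environment) reward
        manager.learn(start_obs, option, extrinsic, obs)
    return obs
```

Separating the two reward streams in this way lets the workers stay motivated even when the environment reward is sparse, while the manager remains grounded in the task objective.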


Author(s):  
Yuhang Song ◽  
Jianyi Wang ◽  
Thomas Lukasiewicz ◽  
Zhenghua Xu ◽  
Mai Xu

Hierarchical reinforcement learning (HRL) has recently shown promising advances in speeding up learning, improving exploration, and discovering inter-task transferable skills. Most recent works focus on HRL with two levels, i.e., a master policy manipulates subpolicies, which in turn manipulate primitive actions. However, HRL with multiple levels is needed in many real-world scenarios, whose ultimate goals are highly abstract while their actions are very primitive. Therefore, in this paper, we propose diversity-driven extensible HRL (DEHRL), in which an extensible and scalable framework is built and learned level-wise to realize HRL with multiple levels. DEHRL follows a popular assumption: diverse subpolicies are useful, i.e., subpolicies are believed to be more useful if they are more diverse. However, existing implementations of this diversity assumption usually have their own drawbacks, which make them inapplicable to HRL with multiple levels. Consequently, we further propose a novel diversity-driven solution to achieve this assumption in DEHRL. Experimental studies evaluate DEHRL against nine baselines from four perspectives in two domains; the results show that DEHRL outperforms the state-of-the-art baselines in all four aspects.
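One common way to turn the diversity assumption into a training signal is to reward a subpolicy for reaching outcomes far from those of its siblings. The Python sketch below is a generic stand-in for such a bonus, not DEHRL's specific objective.

```python
import numpy as np

def diversity_bonus(outcomes, index):
    """Generic diversity reward for one subpolicy.

    outcomes: dict mapping subpolicy id -> resulting state (np.ndarray) reached
              from the same start state.
    Returns the mean distance between subpolicy `index`'s outcome and the others',
    so subpolicies are rewarded for producing distinguishable behavior.
    """
    target = outcomes[index]
    others = [o for k, o in outcomes.items() if k != index]
    if not others:
        return 0.0
    return float(np.mean([np.linalg.norm(target - o) for o in others]))
```

A bonus of this form can be added to each subpolicy's reward at every level of the hierarchy, which is the role the diversity objective plays in the framework described above.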


Author(s):  
Jing Zhang ◽  
Bowen Hao ◽  
Bo Chen ◽  
Cuiping Li ◽  
Hong Chen ◽  
...  

The proliferation of massive open online courses (MOOCs) demands an effective way of making personalized course recommendations. Recent attention-based recommendation models can distinguish the effects of different historical courses when recommending different target courses. However, when a user is interested in many different courses, the attention mechanism performs poorly, as the effects of the contributing courses are diluted by diverse historical courses. To address this challenge, we propose a hierarchical reinforcement learning algorithm to revise the user profiles and tune the course recommendation model on the revised profiles. We systematically evaluate the proposed model on a real dataset consisting of 1,302 courses, 82,535 users, and 458,454 user enrollment behaviors, collected from XuetangX, one of the largest MOOC platforms in China. Experimental results show that the proposed model significantly outperforms state-of-the-art recommendation models (improving HR@10 by 5.02% to 18.95%).
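A schematic Python sketch of the profile-revision idea follows: a high-level agent decides whether a user's history needs revision at all, a low-level agent proposes individual courses to drop, and removals are kept only when the recommender's score for the ground-truth target course does not degrade. All interfaces (`high_agent`, `low_agent`, `recommender`) are hypothetical and the greedy acceptance rule is an illustrative simplification.

```python
def revise_profile(history, target_course, high_agent, low_agent, recommender):
    """Return a revised course history for one (user, target) training instance."""
    if not high_agent.should_revise(history, target_course):
        return list(history)                      # high level: keep the profile as-is

    revised = list(history)
    base = recommender.score(revised, target_course)
    for course in list(revised):
        # low level: decide whether this historical course looks like noise
        if low_agent.should_remove(course, revised, target_course):
            candidate = [c for c in revised if c != course]
            score = recommender.score(candidate, target_course)
            if score >= base:                     # keep removals that help (or do no harm)
                revised, base = candidate, score
    return revised
```

Training the recommendation model on profiles revised this way concentrates the attention mechanism on the courses that actually explain the target enrollment, which is the failure mode the abstract identifies for diverse histories.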

