Overcoming model bias for robust offline deep reinforcement learning

Engineering Applications of Artificial Intelligence ◽

10.1016/j.engappai.2021.104366 ◽

2021 ◽

Vol 104 ◽

pp. 104366

Author(s):

Phillip Swazinna ◽

Steffen Udluft ◽

Thomas Runkler

Keyword(s):

Reinforcement Learning ◽

Download Full-text

Deterministic Value-Policy Gradients

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5732 ◽

2020 ◽

Vol 34 (04) ◽

pp. 3316-3323

Author(s):

Qingpeng Cai ◽

Ling Pan ◽

Pingzhong Tang

Keyword(s):

Reinforcement Learning ◽

State Of The Art ◽

Learning Algorithms ◽

Infinite Horizon ◽

Gradient Algorithm ◽

Continuous Control ◽

Policy Gradient ◽

Analytical Gradients

Reinforcement learning algorithms such as the deep deterministic policy gradient algorithm (DDPG) has been widely used in continuous control tasks. However, the model-free DDPG algorithm suffers from high sample complexity. In this paper we consider the deterministic value gradients to improve the sample efficiency of deep reinforcement learning algorithms. Previous works consider deterministic value gradients with the finite horizon, but it is too myopic compared with infinite horizon. We firstly give a theoretical guarantee of the existence of the value gradients in this infinite setting. Based on this theoretical guarantee, we propose a class of the deterministic value gradient algorithm (DVG) with infinite horizon, and different rollout steps of the analytical gradients by the learned model trade off between the variance of the value gradients and the model bias. Furthermore, to better combine the model-based deterministic value gradient estimators with the model-free deterministic policy gradient estimator, we propose the deterministic value-policy gradient (DVPG) algorithm. We finally conduct extensive experiments comparing DVPG with state-of-the-art methods on several standard continuous control benchmarks. Results demonstrate that DVPG substantially outperforms other baselines.

Download Full-text

Supplemental Material for Reconciling Reinforcement Learning Models With Behavioral Extinction and Renewal: Implications for Addiction, Relapse, and Problem Gambling

Psychological Review ◽

10.1037/0033-295x.114.3.784.supp ◽

2007 ◽

Keyword(s):

Reinforcement Learning ◽

Problem Gambling ◽

Learning Models ◽

Behavioral Extinction ◽

Reinforcement Learning Models

Download Full-text

Bayes factors for reinforcement-learning models of the Iowa gambling task.

Decision ◽

10.1037/dec0000040 ◽

2016 ◽

Vol 3 (2) ◽

pp. 115-131 ◽

Author(s):

Helen Steingroever ◽

Ruud Wetzels ◽

Eric-Jan Wagenmakers

Keyword(s):

Reinforcement Learning ◽

Iowa Gambling Task ◽

Bayes Factors ◽

Gambling Task ◽

Learning Models ◽

Reinforcement Learning Models

Download Full-text

Analogical Reinforcement Learning With Two-Stage Memory Retrieval

PsycEXTRA Dataset ◽

10.1037/e528942014-705 ◽

2014 ◽

Author(s):

James Foster ◽

Matt Jones

Keyword(s):

Reinforcement Learning ◽

Memory Retrieval ◽

Download Full-text

Effects of Working Memory Capacity on the Speed and Accuracy of Learning in Reinforcement Learning Models

PsycEXTRA Dataset ◽

10.1037/e528942014-552 ◽

2014 ◽

Author(s):

Adnane Ez-Zizi ◽

Simon Farrell ◽

David Leslie

Keyword(s):

Working Memory ◽

Reinforcement Learning ◽

Working Memory Capacity ◽

Memory Capacity ◽

Learning Models ◽

Reinforcement Learning Models ◽

Speed And Accuracy

Download Full-text

Supplemental Material for Reinforcement Learning Models of Risky Choice and the Promotion of Risk-Taking by Losses Disguised as Wins in Rats

Journal of Experimental Psychology Animal Learning and Cognition ◽

10.1037/xan0000141.supp ◽

2017 ◽

Keyword(s):

Reinforcement Learning ◽

Risk Taking ◽

Risky Choice ◽

Learning Models ◽

Losses Disguised As Wins ◽

Reinforcement Learning Models

Download Full-text

Reinforcement learning of irrelevant stimulus-response associations modulates cognitive control.

Journal of Experimental Psychology Learning Memory and Cognition ◽

10.1037/xlm0000850 ◽

2020 ◽

Author(s):

Jinglu Chen ◽

Ling Tan ◽

Lu Liu ◽

Ling Wang

Keyword(s):

Reinforcement Learning ◽

Cognitive Control ◽

Irrelevant Stimulus ◽

Stimulus Response

Download Full-text

A Collaborative Scheduling Lane Changing Model for Intelligent Connected Vehicles Based on Deep Reinforcement Learning

10.1061/9780784483053.178 ◽

2020 ◽

Author(s):

Zheyu Cui ◽

Jianming Hu

Keyword(s):

Reinforcement Learning ◽

Connected Vehicles ◽

Lane Changing ◽

Collaborative Scheduling

Download Full-text

Multi-Agent Deep Reinforcement Learning for Decentralized Cooperative Traffic Signal Control

10.1061/9780784483053.039 ◽

2020 ◽

Author(s):

Yang Zhao ◽

Jian-Ming Hu ◽

Ming-Yang Gao ◽

Zuo Zhang

Keyword(s):

Reinforcement Learning ◽

Traffic Signal ◽

Signal Control ◽

Traffic Signal Control ◽

Download Full-text

Research on Signal Control Method of Single Intersection Based on Reinforcement Learning

10.1061/9780784483053.015 ◽

2020 ◽

Author(s):

Yilong Ren ◽

Le Zhang ◽

Han Jiang ◽

Chengsheng Liu

Keyword(s):

Reinforcement Learning ◽

Control Method ◽

Download Full-text