Reinforcement Learning for Control Using Value Function Approximation

Author(s):
Xi-liang Chen ◽ Lei Cao ◽ Chen-xi Li ◽ Zhi-xiong Xu ◽ Jun Lai

2018 ◽ Vol 2018 ◽ pp. 1-6

The popular deep Q-learning algorithm is known to be unstable because of oscillating Q-value estimates and the overestimation of action values under certain conditions. These issues can severely degrade its performance. In this paper, we develop an ensemble network architecture for deep reinforcement learning based on value function approximation. The temporal ensemble stabilizes the training process by reducing the variance of the target approximation error, and the ensemble of target values reduces overestimation and improves performance by producing more accurate Q-value estimates. Our results show that this architecture leads to statistically significantly better value evaluation and more stable, stronger performance on several classical control tasks in the OpenAI Gym environment.
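To make the idea concrete, here is a minimal sketch (not the paper's implementation) of the kind of ensemble target the abstract describes: the TD target is computed from the average of the Q-value estimates produced by several recent snapshots of the target network, which lowers the variance of the target approximation error and tempers the overestimation introduced by the max operator. The linear feature model, the snapshot count K, and all names below are assumptions made for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    n_features, n_actions, K = 8, 4, 5  # K = number of target snapshots (assumed)

    def q_values(theta, phi):
        """Linear value function approximation: Q(s, a) = theta[a] . phi(s)."""
        return theta @ phi

    # K snapshots of the target network's weights, taken at earlier training steps
    # (the "temporal ensemble"); here they are random stand-ins.
    target_snapshots = [rng.normal(size=(n_actions, n_features)) for _ in range(K)]

    def ensemble_target(phi_next, reward, done, gamma=0.99):
        """TD target built from the mean of the K snapshots' value estimates."""
        if done:
            return reward
        # Average next-state action values across snapshots, then take the max.
        q_next = np.mean([q_values(theta, phi_next) for theta in target_snapshots],
                         axis=0)
        return reward + gamma * np.max(q_next)

    # Example transition: random features stand in for a Gym observation.
    phi_next = rng.normal(size=n_features)
    print(ensemble_target(phi_next, reward=1.0, done=False))

Averaging the snapshots' estimates before applying the max is one simple way to reduce both the variance and the upward bias of the target; in practice the averaged network would replace the single target network inside a standard deep Q-learning update.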

