LC-Learning: Phased Method for Average Reward Reinforcement Learning (Preliminary Results)

Author(s): Taro Konda, Shinjiro Tensyo, Tomohiro Yamaguchi
Author(s): Yoshihiro Ichikawa, Keiki Takadama

This paper proposes a reinforcement learning agent that estimates internal rewards from external rewards in order to avoid conflict in multi-step dilemma problems. Intensive simulation results reveal that the agent succeeds in avoiding local convergence and obtains a behavior policy that reaches a higher reward by updating the Q-value with the value obtained by subtracting the average reward from the external reward.
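As a rough illustration of this kind of update, the sketch below implements a tabular, R-learning-style rule in which the Q-value is updated with the external reward minus a running average-reward estimate. The variable names, hyperparameters, and greedy-step condition are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Minimal sketch of an average-reward Q-update: a running average-reward
# estimate (rho) is subtracted from the external reward before the Q-value
# is updated. Sizes and step sizes here are illustrative assumptions.

n_states, n_actions = 10, 4
alpha, beta = 0.1, 0.01             # step sizes for Q and for rho
Q = np.zeros((n_states, n_actions))
rho = 0.0                           # running estimate of the average reward

def update(s, a, r_ext, s_next):
    """One tabular update using (external reward - average reward)."""
    global rho
    was_greedy = Q[s, a] == Q[s].max()            # chosen action was greedy?
    td_error = (r_ext - rho) + Q[s_next].max() - Q[s, a]
    Q[s, a] += alpha * td_error
    if was_greedy:
        # R-learning-style update of the average-reward estimate on greedy steps.
        rho += beta * (r_ext + Q[s_next].max() - Q[s].max() - rho)
```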


2019, Vol. 103 (1), pp. 003685041987902
Author(s): Ronglei Xie, Zhijun Meng, Yaoming Zhou, Yunpeng Ma, Zhe Wu

To address the difficulty that existing reinforcement learning algorithms have in converging over the excessively large state space of three-dimensional path planning for unmanned aerial vehicles, this article proposes a reinforcement learning algorithm based on a heuristic function and an experience replay mechanism driven by the maximum average reward. Knowledge of track performance is introduced to construct the heuristic function, which guides the unmanned aerial vehicle's action selection and reduces useless exploration. The experience replay mechanism based on the maximum average reward increases the utilization rate of excellent samples and accelerates convergence. Simulation results show that the proposed three-dimensional path planning algorithm learns efficiently and significantly improves convergence speed and training performance.
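As a non-authoritative sketch of the two mechanisms the abstract names, the code below combines heuristically biased epsilon-greedy action selection with a replay buffer that prefers episodes of high average reward. The class name, the heuristic weight xi, and the buffer layout are assumptions for illustration; in the paper, the heuristic encodes track-performance knowledge.

```python
import random
from collections import defaultdict, deque

# Illustrative sketch (not the authors' code): heuristic-biased action
# selection plus a replay buffer that favours episodes with the highest
# average reward. h(state, action) stands in for the track-performance
# heuristic; xi weights it against the learned Q-values.

class HeuristicReplayAgent:
    def __init__(self, heuristic, n_actions, capacity=10_000):
        self.q = defaultdict(float)            # (state, action) -> value
        self.h = heuristic                     # heuristic bonus h(s, a)
        self.n_actions = n_actions
        self.buffer = deque(maxlen=capacity)   # (episode, avg_reward) pairs

    def select_action(self, state, eps=0.1, xi=1.0):
        """Epsilon-greedy over Q + xi * h, cutting down useless exploration."""
        if random.random() < eps:
            return random.randrange(self.n_actions)
        return max(range(self.n_actions),
                   key=lambda a: self.q[(state, a)] + xi * self.h(state, a))

    def store_episode(self, transitions):
        """Store a finished episode (list of (s, a, r, s_next)) with its average reward."""
        avg_r = sum(r for _, _, r, _ in transitions) / len(transitions)
        self.buffer.append((transitions, avg_r))

    def sample_batch(self, k=4):
        """Replay transitions from the k episodes with the highest average reward."""
        best = sorted(self.buffer, key=lambda item: item[1], reverse=True)[:k]
        return [t for episode, _ in best for t in episode]
```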

