Weighted Double Q-learning

Author(s):  
Zongzhang Zhang ◽  
Zhiyuan Pan ◽  
Mykel J. Kochenderfer

Q-learning is a popular reinforcement learning algorithm, but it can perform poorly in stochastic environments due to overestimating action values. Overestimation is due to the use of a single estimator that uses the maximum action value as an approximation for the maximum expected action value. To avoid overestimation in Q-learning, the double Q-learning algorithm was recently proposed, which uses the double estimator method. It uses two estimators from independent sets of experiences, with one estimator determining the maximizing action and the other providing the estimate of its value. Double Q-learning sometimes underestimates the action values. This paper introduces a weighted double Q-learning algorithm, which is based on the construction of the weighted double estimator, with the goal of balancing between the overestimation in the single estimator and the underestimation in the double estimator. Empirically, the new algorithm is shown to perform well on several MDP problems.
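The weighted double estimator can be sketched in tabular form as follows. This is a minimal reading of the construction described above, not the paper's reference implementation: the spread-based weight `beta`, the choice of the minimizing action `a_low`, and the free parameter `c` (which controls how quickly `beta` approaches 1) are assumptions that should be checked against the paper.

```python
import random
import numpy as np

def weighted_double_q_update(QA, QB, s, a, r, s_next,
                             alpha=0.1, gamma=0.95, c=1.0, done=False):
    """One weighted double Q-learning update (sketch).

    With probability 0.5 we update QA (with QB helping to evaluate), and
    vice versa. The target blends the single-estimator value and the
    double-estimator value with a weight beta in [0, 1).
    """
    if random.random() < 0.5:
        Qu, Qv = QA, QB            # Qu is updated; Qv is the other estimator
    else:
        Qu, Qv = QB, QA
    if done:
        target = r
    else:
        a_star = int(np.argmax(Qu[s_next]))   # maximizing action under Q^U
        a_low = int(np.argmin(Qu[s_next]))    # minimizing action under Q^U
        spread = abs(Qv[s_next][a_star] - Qv[s_next][a_low])
        beta = spread / (c + spread)
        # beta -> 1: target approaches the single estimator (Q-learning)
        # beta -> 0: target approaches the double estimator (double Q-learning)
        target = r + gamma * (beta * Qu[s_next][a_star]
                              + (1 - beta) * Qv[s_next][a_star])
    Qu[s][a] += alpha * (target - Qu[s][a])
```

Setting `beta = 1` recovers ordinary Q-learning and `beta = 0` recovers double Q-learning, which is the balancing act the abstract describes.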

2012 ◽  
Vol 19 (Special) ◽  
pp. 31-36 ◽  
Author(s):  
Andrzej Rak ◽  
Witold Gierusz

ABSTRACT This paper presents the application of reinforcement learning algorithms to the task of autonomously determining a ship's trajectory during in-harbour and harbour-approach manoeuvres. The authors use the Markov decision process formalism as the framework for presenting the algorithms. Two versions of RL algorithms were tested in simulation: a discrete one (Q-learning) and a continuous one (Least-Squares Policy Iteration). The results show that in both cases a ship trajectory can be found. However, the discrete Q-learning algorithm suffered from many limitations (mainly the curse of dimensionality) and is not practically applicable to the examined task. LSPI, on the other hand, gave promising results. To be fully operational, the proposed solution should be extended to take ship heading and velocity into account, and coupled with an advanced multivariable controller.
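The continuous-form algorithm mentioned above, LSPI, repeatedly solves a least-squares fixed-point problem (LSTDQ) for the weights of a linear Q-function approximation, then improves the policy greedily. A minimal sketch of that inner solve, with feature-matrix names of my own choosing:

```python
import numpy as np

def lstdq(phi, phi_next, rewards, gamma=0.95):
    """One LSTDQ solve: the inner step of Least-Squares Policy Iteration.

    phi:      (n, k) features of sampled (state, action) pairs
    phi_next: (n, k) features of (next_state, current_policy(next_state))
    rewards:  (n,)   observed rewards
    Returns the weight vector w with Q(s, a) ~= phi(s, a) @ w.
    """
    A = phi.T @ (phi - gamma * phi_next)
    b = phi.T @ rewards
    return np.linalg.solve(A, b)
```

LSPI would then redefine the greedy policy from `w`, recompute `phi_next`, and resolve until the weights stop changing; because the solve works on a batch of features rather than a table indexed by every state, it sidesteps the dimensionality problem that defeated tabular Q-learning here.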


2012 ◽  
Vol 588-589 ◽  
pp. 1515-1518
Author(s):  
Yong Song ◽  
Bing Liu ◽  
Yi Bin Li

Reinforcement learning for multiple robots can become very slow as the number of robots increases, because the state space grows exponentially. A sequential Q-learning algorithm based on knowledge sharing is presented. The rule repository of robot behaviors is first initialized in the process of reinforcement learning. Mobile robots obtain the present environmental state from their sensors. The state is then matched against the database to determine whether a relevant behavior rule has already been stored. If the rule is present, an action is chosen in accordance with the knowledge and the rules, and the matching weight is refined; otherwise, the new rule is added to the database. The robots learn in a given sequence and share the behavior database. We examine the algorithm on a multi-robot following-surrounding behavior task and find that the improved algorithm can effectively accelerate convergence.
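The lookup-then-learn loop described above might be sketched as follows. The rule representation (a state-indexed Q-table shared by all robots) and the weight-refinement step are illustrative assumptions, not the authors' exact scheme:

```python
import random

class SharedRuleRepository:
    """Behavior rules shared by all robots: state -> action values + match weight.

    A minimal sketch of the knowledge-sharing scheme; every robot in the
    sequence reads from and writes to the same repository instance.
    """
    def __init__(self, n_actions, alpha=0.1, gamma=0.9):
        self.rules = {}              # state -> list of action values
        self.weights = {}            # state -> matching weight
        self.n_actions = n_actions
        self.alpha, self.gamma = alpha, gamma

    def act(self, state, epsilon=0.1):
        if state not in self.rules:              # no matching rule: add a new one
            self.rules[state] = [0.0] * self.n_actions
            self.weights[state] = 1.0
        else:                                    # matched: refine its weight
            self.weights[state] += 1.0
        if random.random() < epsilon:            # occasional exploration
            return random.randrange(self.n_actions)
        q = self.rules[state]
        return q.index(max(q))

    def update(self, s, a, r, s_next):
        # Standard Q-learning backup against the shared table
        q_next = max(self.rules.get(s_next, [0.0] * self.n_actions))
        self.rules[s][a] += self.alpha * (r + self.gamma * q_next - self.rules[s][a])
```

Because later robots in the learning sequence inherit rules discovered by earlier ones, a matched state skips re-exploration, which is the source of the claimed speed-up.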


2020 ◽  
pp. 78-87
Author(s):  
Vijayalakshmi Anand ◽  
Chittaranjan Hota

Crowdsourcing is a model in which individuals or organizations receive services, such as ideas, funding, or the completion of a complex task, from a large group of Internet users. Several crowdsourcing websites have failed due to lack of user participation; the success of a crowdsourcing platform therefore rests on mass user participation. However, motivating users to participate in a crowdsourcing platform remains challenging. We propose a new approach: a reinforcement learning-based gamification method to motivate users. Gamification has proven a practical approach to engaging users in many fields, but it still needs improvement on crowdsourcing platforms. In this paper, the gamification approach is strengthened by a reinforcement learning algorithm. We create an intelligent agent using a reinforcement learning algorithm (Q-learning). This agent suggests an optimal action plan that yields maximum reward points to users for their active participation in the crowdsourcing application. Its performance is also compared with the SARSA algorithm (on-policy learning), another reinforcement learning algorithm.
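The difference between the two algorithms compared above comes down to the bootstrap target: Q-learning (off-policy) backs up the value of the greedy action in the next state, while SARSA (on-policy) backs up the value of the action the agent actually takes next. A minimal tabular sketch of the two update rules:

```python
def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy: bootstrap from the greedy (max-value) action in s_next."""
    Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy: bootstrap from the action a_next the agent actually takes."""
    Q[s][a] += alpha * (r + gamma * Q[s_next][a_next] - Q[s][a])
```

Under an exploratory behavior policy the two can learn different action plans, which is presumably what the paper's comparison measures.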

