Weighted Double Q-learning

Author(s):  
Zongzhang Zhang ◽  
Zhiyuan Pan ◽  
Mykel J. Kochenderfer

Q-learning is a popular reinforcement learning algorithm, but it can perform poorly in stochastic environments due to overestimating action values. Overestimation is due to the use of a single estimator that uses the maximum action value as an approximation for the maximum expected action value. To avoid overestimation in Q-learning, the double Q-learning algorithm was recently proposed, which uses the double estimator method. It uses two estimators from independent sets of experiences, with one estimator determining the maximizing action and the other providing the estimate of its value. Double Q-learning sometimes underestimates the action values. This paper introduces a weighted double Q-learning algorithm, which is based on the construction of the weighted double estimator, with the goal of balancing between the overestimation in the single estimator and the underestimation in the double estimator. Empirically, the new algorithm is shown to perform well on several MDP problems.
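The weighted double estimator can be sketched in tabular form as follows. This is a minimal reading of the construction described above, not the paper's reference implementation: the spread-based weight `beta`, the choice of the minimizing action `a_low`, and the free parameter `c` (which controls how quickly `beta` approaches 1) are assumptions that should be checked against the paper.

```python
import random
import numpy as np

def weighted_double_q_update(QA, QB, s, a, r, s_next,
                             alpha=0.1, gamma=0.95, c=1.0, done=False):
    """One weighted double Q-learning update (sketch).

    With probability 0.5 we update QA (with QB helping to evaluate), and
    vice versa. The target blends the single-estimator value and the
    double-estimator value with a weight beta in [0, 1).
    """
    if random.random() < 0.5:
        Qu, Qv = QA, QB            # Qu is updated; Qv is the other estimator
    else:
        Qu, Qv = QB, QA
    if done:
        target = r
    else:
        a_star = int(np.argmax(Qu[s_next]))   # maximizing action under Q^U
        a_low = int(np.argmin(Qu[s_next]))    # minimizing action under Q^U
        spread = abs(Qv[s_next][a_star] - Qv[s_next][a_low])
        beta = spread / (c + spread)
        # beta -> 1: target approaches the single estimator (Q-learning)
        # beta -> 0: target approaches the double estimator (double Q-learning)
        target = r + gamma * (beta * Qu[s_next][a_star]
                              + (1 - beta) * Qv[s_next][a_star])
    Qu[s][a] += alpha * (target - Qu[s][a])
```

Setting `beta = 1` recovers ordinary Q-learning and `beta = 0` recovers double Q-learning, which is the balancing act the abstract describes.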

2012 ◽  
Vol 19 (Special) ◽  
pp. 31-36 ◽  
Author(s):  
Andrzej Rak ◽  
Witold Gierusz

ABSTRACT This paper presents the application of reinforcement learning algorithms to the task of autonomously determining a ship's trajectory during in-harbour and harbour-approach manoeuvres. The authors use the Markov decision process formalism as the framework for presenting the algorithms. Two versions of RL algorithms were tested in simulation: a discrete one (Q-learning) and a continuous one (Least-Squares Policy Iteration). The results show that in both cases a ship trajectory can be found. However, the discrete Q-learning algorithm suffered from many limitations (mainly the curse of dimensionality) and is not practically applicable to the examined task. LSPI, on the other hand, gave promising results. To be fully operational, the proposed solution should be extended to take ship heading and velocity into account, and coupled with an advanced multivariable controller.
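The continuous-form algorithm mentioned above, LSPI, repeatedly solves a least-squares fixed-point problem (LSTDQ) for the weights of a linear Q-function approximation, then improves the policy greedily. A minimal sketch of that inner solve, with feature-matrix names of my own choosing:

```python
import numpy as np

def lstdq(phi, phi_next, rewards, gamma=0.95):
    """One LSTDQ solve: the inner step of Least-Squares Policy Iteration.

    phi:      (n, k) features of sampled (state, action) pairs
    phi_next: (n, k) features of (next_state, current_policy(next_state))
    rewards:  (n,)   observed rewards
    Returns the weight vector w with Q(s, a) ~= phi(s, a) @ w.
    """
    A = phi.T @ (phi - gamma * phi_next)
    b = phi.T @ rewards
    return np.linalg.solve(A, b)
```

LSPI would then redefine the greedy policy from `w`, recompute `phi_next`, and resolve until the weights stop changing; because the solve works on a batch of features rather than a table indexed by every state, it sidesteps the dimensionality problem that defeated tabular Q-learning here.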


2012 ◽  
Vol 588-589 ◽  
pp. 1515-1518
Author(s):  
Yong Song ◽  
Bing Liu ◽  
Yi Bin Li

Reinforcement learning for multiple robots can become very slow as the number of robots increases, because the state space grows exponentially. A sequential Q-learning algorithm based on knowledge sharing is presented. The rule repository of robot behaviors is first initialized in the process of reinforcement learning. Mobile robots obtain the present environmental state from their sensors. The state is then matched against the database to determine whether a relevant behavior rule has already been stored. If the rule is present, an action is chosen in accordance with the knowledge and the rules, and the matching weight is refined; otherwise, the new rule is added to the database. The robots learn in a given sequence and share the behavior database. We examine the algorithm on a multi-robot following-surrounding behavior task and find that the improved algorithm can effectively accelerate convergence.
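The lookup-then-learn loop described above might be sketched as follows. The rule representation (a state-indexed Q-table shared by all robots) and the weight-refinement step are illustrative assumptions, not the authors' exact scheme:

```python
import random

class SharedRuleRepository:
    """Behavior rules shared by all robots: state -> action values + match weight.

    A minimal sketch of the knowledge-sharing scheme; every robot in the
    sequence reads from and writes to the same repository instance.
    """
    def __init__(self, n_actions, alpha=0.1, gamma=0.9):
        self.rules = {}              # state -> list of action values
        self.weights = {}            # state -> matching weight
        self.n_actions = n_actions
        self.alpha, self.gamma = alpha, gamma

    def act(self, state, epsilon=0.1):
        if state not in self.rules:              # no matching rule: add a new one
            self.rules[state] = [0.0] * self.n_actions
            self.weights[state] = 1.0
        else:                                    # matched: refine its weight
            self.weights[state] += 1.0
        if random.random() < epsilon:            # occasional exploration
            return random.randrange(self.n_actions)
        q = self.rules[state]
        return q.index(max(q))

    def update(self, s, a, r, s_next):
        # Standard Q-learning backup against the shared table
        q_next = max(self.rules.get(s_next, [0.0] * self.n_actions))
        self.rules[s][a] += self.alpha * (r + self.gamma * q_next - self.rules[s][a])
```

Because later robots in the learning sequence inherit rules discovered by earlier ones, a matched state skips re-exploration, which is the source of the claimed speed-up.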


2020 ◽  
pp. 78-87
Author(s):  
Vijayalakshmi Anand ◽  
Chittaranjan Hota

Crowdsourcing is a model in which individuals or organizations receive services, such as ideas, funding, or the completion of a complex task, from a large group of Internet users. Several crowdsourcing websites have failed due to lack of user participation; the success of a crowdsourcing platform therefore rests on mass user participation. However, motivating users to participate in a crowdsourcing platform remains challenging. We propose a new approach: a reinforcement learning-based gamification method to motivate users. Gamification has proven a practical approach to engaging users in many fields, but it still needs improvement on crowdsourcing platforms. In this paper, the gamification approach is strengthened by a reinforcement learning algorithm. We create an intelligent agent using a reinforcement learning algorithm (Q-learning). This agent suggests an optimal action plan that yields maximum reward points to users for their active participation in the crowdsourcing application. Its performance is also compared with the SARSA algorithm (on-policy learning), another reinforcement learning algorithm.
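The difference between the two algorithms compared above comes down to the bootstrap target: Q-learning (off-policy) backs up the value of the greedy action in the next state, while SARSA (on-policy) backs up the value of the action the agent actually takes next. A minimal tabular sketch of the two update rules:

```python
def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy: bootstrap from the greedy (max-value) action in s_next."""
    Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy: bootstrap from the action a_next the agent actually takes."""
    Q[s][a] += alpha * (r + gamma * Q[s_next][a_next] - Q[s][a])
```

Under an exploratory behavior policy the two can learn different action plans, which is presumably what the paper's comparison measures.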

