Minibatch Recursive Least Squares Q-Learning

2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Chunyuan Zhang ◽  
Qi Song ◽  
Zeng Meng

The deep Q-network (DQN) is one of the most successful reinforcement learning algorithms, but it has drawbacks such as slow convergence and instability. In contrast, traditional reinforcement learning algorithms with linear function approximation usually converge faster and are more stable, although they easily suffer from the curse of dimensionality. In recent years, many improvements to DQN have been made, but they seldom exploit this advantage of the traditional algorithms. In this paper, we propose a novel Q-learning algorithm with linear function approximation, called minibatch recursive least squares Q-learning (MRLS-Q). Unlike the traditional Q-learning algorithm with linear function approximation, the learning mechanism and model structure of MRLS-Q resemble those of a DQN with only one input layer and one linear output layer. It uses experience replay and minibatch training, and it takes the agent’s states rather than state-action pairs as inputs. As a result, it can be used alone for low-dimensional problems and can also be seamlessly integrated into a DQN as the last layer for high-dimensional problems. In addition, MRLS-Q uses our proposed average RLS optimization technique, so it achieves better convergence performance whether used alone or integrated with a DQN. At the end of the paper, we demonstrate the effectiveness of MRLS-Q on the CartPole problem and four Atari games and experimentally investigate the influence of its hyperparameters.
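
A minimal sketch of how a minibatch recursive least squares update for a linear Q-function can look is given below. It is only an illustration of the general idea: the per-action weight vectors, the plain RLS gain, and the hyperparameter names are assumptions for this sketch, not the published MRLS-Q or its average RLS technique.

import numpy as np

class MinibatchRLSQ:
    """Illustrative sketch: linear Q-learning where each minibatch sample is
    folded into a per-action recursive least squares (RLS) update.
    Feature map phi, gamma, forgetting factor lam and the P-matrix
    initialisation are assumed values, not the published MRLS-Q."""

    def __init__(self, n_features, n_actions, gamma=0.99, lam=1.0, p0=1e3):
        self.w = np.zeros((n_actions, n_features))         # one weight vector per action
        self.P = np.stack([np.eye(n_features) * p0         # one inverse covariance per action
                           for _ in range(n_actions)])
        self.gamma, self.lam = gamma, lam

    def q_values(self, phi):
        return self.w @ phi                                 # Q(s, .) for feature vector phi

    def update(self, batch):
        # batch: list of (phi_s, a, r, phi_s_next, done) transitions from a replay buffer
        for phi, a, r, phi_next, done in batch:
            target = r if done else r + self.gamma * np.max(self.w @ phi_next)
            P, w = self.P[a], self.w[a]
            k = P @ phi / (self.lam + phi @ P @ phi)        # RLS gain
            w += k * (target - w @ phi)                     # correct toward the TD target
            self.P[a] = (P - np.outer(k, phi @ P)) / self.lam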

2012 ◽  
Vol 19 (Special) ◽  
pp. 31-36 ◽  
Author(s):  
Andrzej Rak ◽  
Witold Gierusz

This paper presents the application of reinforcement learning algorithms to the task of autonomously determining the ship trajectory during in-harbour and harbour-approaching manoeuvres. The authors use the Markov decision process formalism to build up the background of the algorithm presentation. Two versions of RL algorithms were tested in simulation: a discrete form (Q-learning) and a continuous form (Least-Squares Policy Iteration). The results show that in both cases a ship trajectory can be found. However, the discrete Q-learning algorithm suffered from many limitations (mainly the curse of dimensionality) and is practically inapplicable to the examined task. On the other hand, LSPI gave promising results. To be fully operational, the proposed solution should be extended to take ship heading and velocity into account and be coupled with an advanced multivariable controller.
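
For context, a minimal sketch of the least-squares policy iteration scheme used in the continuous case is shown below. The state-action feature map, sample format, and regularisation are illustrative assumptions and stand in for the authors' specific ship-manoeuvring formulation.

import numpy as np

def lstdq(samples, phi, policy, n_features, gamma=0.95, reg=1e-6):
    """One LSTD-Q evaluation step: fit linear weights for Q under `policy`
    from a fixed batch of (s, a, r, s_next) samples; phi(s, a) is an assumed
    state-action feature map returning a vector of length n_features."""
    A = reg * np.eye(n_features)
    b = np.zeros(n_features)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        f_next = phi(s_next, policy(s_next))
        A += np.outer(f, f - gamma * f_next)
        b += f * r
    return np.linalg.solve(A, b)

def lspi(samples, phi, actions, n_features, n_iters=20, gamma=0.95):
    """Alternate LSTD-Q policy evaluation with greedy policy improvement."""
    w = np.zeros(n_features)
    greedy = lambda s: max(actions, key=lambda a: w @ phi(s, a))
    for _ in range(n_iters):
        w = lstdq(samples, phi, greedy, n_features, gamma)
    return w, greedy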


1999 ◽  
Vol 11 (8) ◽  
pp. 2017-2060 ◽  
Author(s):  
Csaba Szepesvári ◽  
Michael L. Littman

Reinforcement learning is the problem of generating optimal behavior in a sequential decision-making environment given the opportunity of interacting with it. Many algorithms for solving reinforcement-learning problems work by computing improved estimates of the optimal value function. We extend prior analyses of reinforcement-learning algorithms and present a powerful new theorem that can provide a unified analysis of such value-function-based reinforcement-learning algorithms. The usefulness of the theorem lies in how it allows the convergence of a complex asynchronous reinforcement-learning algorithm to be proved by verifying that a simpler synchronous algorithm converges. We illustrate the application of the theorem by analyzing the convergence of Q-learning, model-based reinforcement learning, Q-learning with multistate updates, Q-learning for Markov games, and risk-sensitive reinforcement learning.
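
For concreteness, the sample-based asynchronous Q-learning update that this kind of analysis covers, together with the simpler synchronous full-sweep update whose convergence is easier to establish, can be written in the standard form:

\[
Q_{t+1}(s_t, a_t) = (1 - \alpha_t)\, Q_t(s_t, a_t) + \alpha_t \Big[ r_t + \gamma \max_{a'} Q_t(s_{t+1}, a') \Big],
\qquad Q_{t+1}(s, a) = Q_t(s, a) \text{ elsewhere},
\]
\[
Q_{t+1}(s, a) = \sum_{s'} P(s' \mid s, a) \Big[ R(s, a, s') + \gamma \max_{a'} Q_t(s', a') \Big] \quad \text{for all } (s, a).
\]

In the spirit of the abstract, the theorem lets the convergence of the first, asynchronous form be inferred from that of the second, synchronous one under the usual step-size conditions.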


2020 ◽  
Vol 8 (6) ◽  
pp. 4333-4338

This paper presents a thorough comparative analysis of reinforcement learning algorithms used by autonomous mobile robots for optimal path finding, and we propose a new algorithm, Iterative SARSA, for the same task. The main objective of the paper is to differentiate between Q-learning and SARSA and to modify the latter. These algorithms use either the on-policy or the off-policy method of reinforcement learning: for the on-policy method we use the SARSA algorithm, and for the off-policy method the Q-learning algorithm. The choice of algorithm also affects how well the robot finds the shortest possible path. Based on the results obtained, we conclude how our algorithm improves on the current standard reinforcement learning algorithms.
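
To make the on-policy/off-policy distinction concrete, the two standard tabular update rules are sketched below. Only the textbook updates are shown; the behaviour-policy helper is an assumed detail, and the paper's Iterative SARSA modification is not reproduced here.

import numpy as np

def epsilon_greedy(Q, s, n_actions, eps, rng):
    # Behaviour policy shared by both methods (assumed, for illustration)
    return rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))

def q_learning_step(Q, s, a, r, s_next, alpha, gamma):
    # Off-policy: bootstrap from the greedy action in s_next,
    # regardless of which action the behaviour policy takes next
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])

def sarsa_step(Q, s, a, r, s_next, a_next, alpha, gamma):
    # On-policy: bootstrap from the action actually chosen by the behaviour policy
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])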


2012 ◽  
Vol 22 ◽  
pp. 113-118 ◽  
Author(s):  
Víctor Ricardo Cruz-Álvarez ◽  
Enrique Hidalgo-Peña ◽  
Hector-Gabriel Acosta-Mesa

A common problem when working with mobile robots is that the programming phase can be a long, expensive, and demanding process for programmers. Reinforcement learning algorithms offer one of the most general frameworks for learning tasks. This work presents an approach using the Q-learning algorithm on a Lego robot so that it learns by itself how to follow a black line drawn on a white surface, using Matlab [5] as the programming environment.
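
A minimal sketch of the kind of discretised Q-learning loop a line-following task can use is given below. The five-state sensor discretisation, three-action set, reward shaping, and toy simulator are illustrative guesses; the original work runs in Matlab against the Lego hardware.

import numpy as np

# Illustrative sketch only: a toy stand-in for the Lego line-follower, with the
# robot's offset from the black line binned into 5 states and 3 motor actions
# (turn left, go straight, turn right).
N_STATES, N_ACTIONS = 5, 3
ALPHA, GAMMA, EPS = 0.2, 0.9, 0.1
rng = np.random.default_rng(0)
Q = np.zeros((N_STATES, N_ACTIONS))

def simulate(state, action):
    """Toy dynamics in place of the real sensor/motor interface:
    turning shifts the offset, and the line itself drifts a little."""
    drift = rng.integers(-1, 2)                       # -1, 0, or +1
    next_state = int(np.clip(state + (action - 1) + drift, 0, N_STATES - 1))
    reward = 1.0 if next_state == 2 else -abs(next_state - 2)   # centred is best
    return next_state, reward

for episode in range(200):
    s = 2                                             # start roughly on the line
    for _ in range(100):
        a = rng.integers(N_ACTIONS) if rng.random() < EPS else int(np.argmax(Q[s]))
        s_next, r = simulate(s, a)
        Q[s, a] += ALPHA * (r + GAMMA * np.max(Q[s_next]) - Q[s, a])
        s = s_next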

