Least Squares Temporal Difference Methods: An Analysis under General Conditions

SIAM Journal on Control and Optimization ◽

10.1137/100807879 ◽

2012 ◽

Vol 50 (6) ◽

pp. 3310-3343 ◽

Author(s):

Huizhen Yu

Keyword(s):

Least Squares ◽

Temporal Difference ◽

Difference Methods ◽

Temporal Difference Methods ◽

General Conditions

Convergence Results for Some Temporal Difference Methods Based on Least Squares

IEEE Transactions on Automatic Control ◽

10.1109/tac.2009.2022097 ◽

2009 ◽

Vol 54 (7) ◽

pp. 1515-1531 ◽

Author(s):

Huizhen Yu ◽

D.P. Bertsekas

Keyword(s):

Least Squares ◽

Temporal Difference ◽

Convergence Results ◽

Difference Methods ◽

Temporal Difference Methods

Sequential Detection Using Least Squares Temporal Difference Methods

2006 IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings ◽

10.1109/icassp.2006.1661372 ◽

2006 ◽

Author(s):

A. Kuh ◽

D. Mandic

Keyword(s):

Least Squares ◽

Sequential Detection ◽

Temporal Difference ◽

Difference Methods ◽

Temporal Difference Methods

Glucose level control using Temporal Difference methods

2017 Iranian Conference on Electrical Engineering (ICEE) ◽

10.1109/iraniancee.2017.7985166 ◽

2017 ◽

Author(s):

Amin Noori ◽

Mohammad Ali Sadrnia ◽

Mohammad Bagher Naghibi Sistani

Keyword(s):

Glucose Level ◽

Temporal Difference ◽

Level Control ◽

Difference Methods ◽

Temporal Difference Methods

Learning a Strategy with Neural Approximated Temporal-Difference Methods in English Draughts

2010 20th International Conference on Pattern Recognition ◽

10.1109/icpr.2010.717 ◽

2010 ◽

Author(s):

Stefan Fausser ◽

Friedhelm Schwenker

Keyword(s):

Temporal Difference ◽

Difference Methods ◽

Temporal Difference Methods

Proximal algorithms and temporal difference methods for solving fixed point problems

Computational Optimization and Applications ◽

10.1007/s10589-018-9990-5 ◽

2018 ◽

Vol 70 (3) ◽

pp. 709-736 ◽

Author(s):

Dimitri P. Bertsekas

Keyword(s):

Fixed Point ◽

Fixed Point Problems ◽

Temporal Difference ◽

Proximal Algorithms ◽

Difference Methods ◽

Temporal Difference Methods

Improved Temporal Difference Methods with Linear Function Approximation

Handbook of Learning and Approximate Dynamic Programming ◽

10.1109/9780470544785.ch9 ◽

2009 ◽

Keyword(s):

Linear Function ◽

Function Approximation ◽

Temporal Difference ◽

Linear Function Approximation ◽

Difference Methods ◽

Temporal Difference Methods

A unified framework for temporal difference methods

2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning ◽

10.1109/adprl.2009.4927518 ◽

2009 ◽

Author(s):

Dimitri P. Bertsekas

Keyword(s):

Temporal Difference ◽

Unified Framework ◽

Difference Methods ◽

Temporal Difference Methods

Comparing evolutionary and temporal difference methods in a reinforcement learning domain

Proceedings of the 8th annual conference on Genetic and evolutionary computation - GECCO '06 ◽

10.1145/1143997.1144202 ◽

2006 ◽

Author(s):

Matthew E. Taylor ◽

Shimon Whiteson ◽

Peter Stone

Keyword(s):

Reinforcement Learning ◽

Temporal Difference ◽

Difference Methods ◽

Temporal Difference Methods

Learning to Evaluate Go Positions via Temporal Difference Methods

Computational Intelligence in Games - Studies in Fuzziness and Soft Computing ◽

10.1007/978-3-7908-1833-8_4 ◽

2001 ◽

pp. 77-98 ◽

Author(s):

N. N. Schraudolph ◽

P. Dayan ◽

T. J. Sejnowski

Keyword(s):

Temporal Difference ◽

Difference Methods ◽

Temporal Difference Methods

A Tale of Two-Timescale Reinforcement Learning with the Tightest Finite-Time Bound

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5779 ◽

2020 ◽

Vol 34 (04) ◽

pp. 3701-3708

Author(s):

Gal Dalal ◽

Balazs Szorenyi ◽

Gugan Thoppe

Keyword(s):

Reinforcement Learning ◽

Convergence Rate ◽

Policy Evaluation ◽

Finite Time ◽

High Probability ◽

Temporal Difference ◽

Time Analysis ◽

Difference Methods ◽

Temporal Difference Methods ◽

Two Timescale Stochastic Approximation

Policy evaluation in reinforcement learning is often conducted using two-timescale stochastic approximation, which results in various gradient temporal difference methods such as GTD(0), GTD2, and TDC. Here, we provide convergence rate bounds for this suite of algorithms. Algorithms such as these have two iterates, $\theta_n$ and $w_n$, which are updated using two distinct stepsize sequences, $\alpha_n$ and $\beta_n$, respectively. Assuming $\alpha_n = n^{-\alpha}$ and $\beta_n = n^{-\beta}$ with $1 > \alpha > \beta > 0$, we show that, with high probability, the two iterates converge to their respective solutions $\theta^*$ and $w^*$ at rates given by $\|\theta_n - \theta^*\| = \tilde{O}(n^{-\alpha/2})$ and $\|w_n - w^*\| = \tilde{O}(n^{-\beta/2})$; here, $\tilde{O}$ hides logarithmic terms. Via comparable lower bounds, we show that these bounds are, in fact, tight. To the best of our knowledge, ours is the first finite-time analysis which achieves these rates. While it was known that the two timescale components decouple asymptotically, our results depict this phenomenon more explicitly by showing that it in fact happens from some finite time onwards. Lastly, compared to existing works, our result applies to a broader family of stepsizes, including non-square summable ones.
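To make the two-iterate structure in the abstract above concrete, the following is a minimal sketch of a TDC-style update with the polynomial stepsizes it describes. It is not taken from the paper: the 3-state chain, the reward vector, the one-hot features, the discount factor 0.5, and the exponents 0.75 and 0.5 are all illustrative assumptions, and the function name `tdc` is hypothetical.

```python
import numpy as np


def tdc(P, r, Phi, gamma, alpha_exp, beta_exp, n_steps, seed=0):
    """TDC with stepsizes alpha_n = n**-alpha_exp and beta_n = n**-beta_exp.

    theta is the slow iterate (value-function weights); w is the fast
    auxiliary iterate. Requires 1 > alpha_exp > beta_exp > 0.
    """
    rng = np.random.default_rng(seed)
    n_states, d = Phi.shape
    theta = np.zeros(d)
    w = np.zeros(d)
    s = 0
    for n in range(1, n_steps + 1):
        s_next = rng.choice(n_states, p=P[s])
        phi, phi_next = Phi[s], Phi[s_next]
        # Temporal-difference error for the observed transition.
        delta = r[s] + gamma * (phi_next @ theta) - phi @ theta
        alpha_n = n ** -alpha_exp  # smaller stepsize, slow timescale
        beta_n = n ** -beta_exp    # larger stepsize, fast timescale
        # Both updates use the values of theta and w from before this step.
        theta_new = theta + alpha_n * (delta * phi - gamma * (phi @ w) * phi_next)
        w = w + beta_n * (delta - phi @ w) * phi
        theta = theta_new
        s = s_next
    return theta, w


# Assumed toy problem: from each state, jump to one of the other two uniformly.
P = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
r = np.array([1.0, 0.0, 0.0])   # reward depends only on the current state
Phi = np.eye(3)                 # one-hot features, so theta* equals the true values
gamma = 0.5

# With one-hot features the TD fixed point solves (I - gamma * P) theta = r.
theta_star = np.linalg.solve(np.eye(3) - gamma * P, r)

theta, w = tdc(P, r, Phi, gamma, alpha_exp=0.75, beta_exp=0.5, n_steps=20000)
print("||theta_n - theta*|| =", np.linalg.norm(theta - theta_star))
print("||w_n - w*||         =", np.linalg.norm(w))  # w* = 0 at the TD fixed point
```

Because $\alpha > \beta$, the stepsize for $\theta_n$ decays faster than the one for $w_n$, which is what places the two iterates on different timescales. A single run like this only illustrates the update rule; it says nothing about the rates or their tightness, which are the paper's subject.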
