linear function approximation Latest Research Papers

The exponential explosion of joint actions and massive data collection are two main challenges in multiagent reinforcement learning algorithms with centralized training. To overcome these problems, in this paper, we propose a model-free and fully decentralized actor-critic multiagent reinforcement learning algorithm based on message diffusion. To this end, the agents are assumed to be placed in a time-varying communication network. Each agent makes limited observations regarding the global state and joint actions; therefore, it needs to obtain and share information with others over the network. In the proposed algorithm, agents hold local estimations of the global state and joint actions and update them with local observations and the messages received from neighbors. Under the hypothesis of the global value decomposition, the gradient of the global objective function to an individual agent is derived. The convergence of the proposed algorithm with linear function approximation is guaranteed according to the stochastic approximation theory. In the experiments, the proposed algorithm was applied to a passive location task multiagent environment and achieved superior performance compared to state-of-the-art algorithms.

Download Full-text

Minibatch Recursive Least Squares Q-Learning

Computational Intelligence and Neuroscience ◽

10.1155/2021/5370281 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Chunyuan Zhang ◽

Qi Song ◽

Zeng Meng

Keyword(s):

Reinforcement Learning ◽

Least Squares ◽

Linear Function ◽

Function Approximation ◽

Learning Algorithm ◽

Learning Algorithms ◽

Optimization Technique ◽

Recursive Least Squares ◽

Q Learning ◽

Linear Function Approximation

The deep Q-network (DQN) is one of the most successful reinforcement learning algorithms, but it has some drawbacks such as slow convergence and instability. In contrast, the traditional reinforcement learning algorithms with linear function approximation usually have faster convergence and better stability, although they easily suffer from the curse of dimensionality. In recent years, many improvements to DQN have been made, but they seldom make use of the advantage of traditional algorithms to improve DQN. In this paper, we propose a novel Q-learning algorithm with linear function approximation, called the minibatch recursive least squares Q-learning (MRLS-Q). Different from the traditional Q-learning algorithm with linear function approximation, the learning mechanism and model structure of MRLS-Q are more similar to those of DQNs with only one input layer and one linear output layer. It uses the experience replay and the minibatch training mode and uses the agent’s states rather than the agent’s state-action pairs as the inputs. As a result, it can be used alone for low-dimensional problems and can be seamlessly integrated into DQN as the last layer for high-dimensional problems as well. In addition, MRLS-Q uses our proposed average RLS optimization technique, so that it can achieve better convergence performance whether it is used alone or integrated with DQN. At the end of this paper, we demonstrate the effectiveness of MRLS-Q on the CartPole problem and four Atari games and investigate the influences of its hyperparameters experimentally.

Download Full-text

Byzantine-Resilient Decentralized TD Learning with Linear Function Approximation

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp39728.2021.9413992 ◽

2021 ◽

Author(s):

Zhaoxian Wu ◽

Han Shen ◽

Tianyi Chen ◽

Qing Ling

Keyword(s):

Linear Function ◽

Function Approximation ◽

Linear Function Approximation

Download Full-text

Gaussian Based Non-linear Function Approximation for Reinforcement Learning

SN Computer Science ◽

10.1007/s42979-021-00642-4 ◽

2021 ◽

Vol 2 (3) ◽

Author(s):

Abbas Haider ◽

Glenn Hawe ◽

Hui Wang ◽

Bryan Scotney

Keyword(s):

Reinforcement Learning ◽

Linear Function ◽

Function Approximation ◽

World Market ◽

Information Loss ◽

State Spaces ◽

State Information ◽

Linear Function Approximation ◽

Non Linear ◽

Tile Coding

AbstractReinforcement learning (RL) problems with continuous states and discrete actions (CSDA) can be found in classic examples such as Cart Pole and Puck World, as well as real world applications such as Market Making. Solutions to CSDA problems typically involve a function approximation (FA) of the mapping from states to actions and can be linear or nonlinear. Linear FAs such as tile-coding (Sutton and Barto in Reinforcement learning, 2nd ed, 2009) suffer from state information loss due to state discretization, whilst non-linear FAs such as DQN (Mnih et al. in Playing atari with deep reinforcement learning, https://arxiv.org/abs/1312.5602, 2013) are practically infeasible in infinitely large state spaces due to their cubic time complexity ($$O(n^3)$$ O ( n 3 ) ). In this paper, we propose a novel, general solution to CSDA problems, called Gaussian distribution based non-linear function approximation (GBNLFA). Experimentation on three CSDA RL problems (Cart Pole, Puck World, Market Marking) demonstrates the superiority of GBNLFA over state-of-the-art FAs, namely tile-coding and DQN. In particular, GBNLFA resolves the state information loss problem with linear FAs and provides an asymptotically faster algorithm (O(n)) than linear FAs ($$O(n^2)$$ O ( n 2 ) ) and neural network based nonlinear FAs ($$O(n^3)$$ O ( n 3 ) ).

Download Full-text

A Finite Time Analysis of Temporal Difference Learning with Linear Function Approximation

Operations Research ◽

10.1287/opre.2020.2024 ◽

2021 ◽

Author(s):

Jalaj Bhandari ◽

Daniel Russo ◽

Raghav Singal

Keyword(s):

Linear Function ◽

Finite Time ◽

Function Approximation ◽

Gradient Descent ◽

Convergence Rates ◽

Temporal Difference ◽

Temporal Difference Learning ◽

Convergence Results ◽

Linear Function Approximation ◽

Markov Reward

Temporal difference learning (TD) is a simple iterative algorithm widely used for policy evaluation in Markov reward processes. Bhandari et al. prove finite time convergence rates for TD learning with linear function approximation. The analysis follows using a key insight that establishes rigorous connections between TD updates and those of online gradient descent. In a model where observations are corrupted by i.i.d. noise, convergence results for TD follow by essentially mirroring the analysis for online gradient descent. Using an information-theoretic technique, the authors also provide results for the case when TD is applied to a single Markovian data stream where the algorithm’s updates can be severely biased. Their analysis seamlessly extends to the study of TD learning with eligibility traces and Q-learning for high-dimensional optimal stopping problems.

Download Full-text

Concentration bounds for temporal difference learning with linear function approximation: the case of batch data and uniform sampling

Machine Learning ◽

10.1007/s10994-020-05912-5 ◽

2021 ◽

Author(s):

L. A. Prashanth ◽

Nathaniel Korda ◽

Rémi Munos

Keyword(s):

Linear Function ◽

Function Approximation ◽

Temporal Difference ◽

Temporal Difference Learning ◽

Uniform Sampling ◽

Linear Function Approximation ◽

Concentration Bounds ◽

Batch Data

Download Full-text

Reinforcement learning vs. rule-based adaptive traffic signal control: A Fourier basis linear function approximation for traffic signal control

AI Communications ◽

10.3233/aic-201580 ◽

2021 ◽

pp. 1-15

Author(s):

Theresa Ziemke ◽

Lucas N. Alegre ◽

Ana L.C. Bazzan

Keyword(s):

Reinforcement Learning ◽

State Space ◽

Function Approximation ◽

Traffic Signals ◽

The State ◽

Signal Control ◽

Traffic Signal Control ◽

Rule Based ◽

Fourier Basis ◽

Linear Function Approximation

Reinforcement learning is an efficient, widely used machine learning technique that performs well when the state and action spaces have a reasonable size. This is rarely the case regarding control-related problems, as for instance controlling traffic signals. Here, the state space can be very large. In order to deal with the curse of dimensionality, a rough discretization of such space can be employed. However, this is effective just up to a certain point. A way to mitigate this is to use techniques that generalize the state space such as function approximation. In this paper, a linear function approximation is used. Specifically, SARSA ( λ ) with Fourier basis features is implemented to control traffic signals in the agent-based transport simulation MATSim. The results are compared not only to trivial controllers such as fixed-time, but also to state-of-the-art rule-based adaptive methods. It is concluded that SARSA ( λ ) with Fourier basis features is able to outperform such methods, especially in scenarios with varying traffic demands or unexpected events.

Download Full-text

Using Reinforcement Learning to Control Traffic Signals in a Real-World Scenario: An Approach Based on Linear Function Approximation

IEEE Transactions on Intelligent Transportation Systems ◽

10.1109/tits.2021.3091014 ◽

2021 ◽

pp. 1-10

Author(s):

Lucas N. Alegre ◽

Theresa Ziemke ◽

Ana L. C. Bazzan

Keyword(s):

Reinforcement Learning ◽

Linear Function ◽

Real World ◽

Function Approximation ◽

Traffic Signals ◽

Linear Function Approximation ◽

Control Traffic

Download Full-text

Finite-Time Performance of Distributed Temporal-Difference Learning with Linear Function Approximation

SIAM Journal on Mathematics of Data Science ◽

10.1137/20m1311971 ◽

2021 ◽

Vol 3 (1) ◽

pp. 298-320

Author(s):

Thinh T. Doan ◽

Siva Theja Maguluri ◽

Justin Romberg

Keyword(s):

Linear Function ◽

Finite Time ◽

Function Approximation ◽

Temporal Difference ◽

Temporal Difference Learning ◽

Time Performance ◽

Linear Function Approximation

Download Full-text

Byzantine-Resilient Decentralized TD Learning with Linear Function Approximation

IEEE Transactions on Signal Processing ◽

10.1109/tsp.2021.3090952 ◽

2021 ◽

pp. 1-1

Author(s):

Zhaoxian Wu ◽

Han Shen ◽

Tianyi Chen ◽

Qing Ling

Keyword(s):

Linear Function ◽

Function Approximation ◽

Linear Function Approximation

Download Full-text

linear function approximation
Recently Published Documents

TOTAL DOCUMENTS

H-INDEX

Decentralized Multiagent Actor-Critic Algorithm Based on Message Diffusion

Minibatch Recursive Least Squares Q-Learning

Byzantine-Resilient Decentralized TD Learning with Linear Function Approximation

Gaussian Based Non-linear Function Approximation for Reinforcement Learning

A Finite Time Analysis of Temporal Difference Learning with Linear Function Approximation

Concentration bounds for temporal difference learning with linear function approximation: the case of batch data and uniform sampling

Reinforcement learning vs. rule-based adaptive traffic signal control: A Fourier basis linear function approximation for traffic signal control

Using Reinforcement Learning to Control Traffic Signals in a Real-World Scenario: An Approach Based on Linear Function Approximation

Finite-Time Performance of Distributed Temporal-Difference Learning with Linear Function Approximation

Byzantine-Resilient Decentralized TD Learning with Linear Function Approximation

Export Citation Format

linear function approximationRecently Published Documents

TOTAL DOCUMENTS

H-INDEX

Decentralized Multiagent Actor-Critic Algorithm Based on Message Diffusion

Minibatch Recursive Least Squares Q-Learning

Byzantine-Resilient Decentralized TD Learning with Linear Function Approximation

Gaussian Based Non-linear Function Approximation for Reinforcement Learning

A Finite Time Analysis of Temporal Difference Learning with Linear Function Approximation

Concentration bounds for temporal difference learning with linear function approximation: the case of batch data and uniform sampling

Reinforcement learning vs. rule-based adaptive traffic signal control: A Fourier basis linear function approximation for traffic signal control

Using Reinforcement Learning to Control Traffic Signals in a Real-World Scenario: An Approach Based on Linear Function Approximation

Finite-Time Performance of Distributed Temporal-Difference Learning with Linear Function Approximation

Byzantine-Resilient Decentralized TD Learning with Linear Function Approximation

linear function approximation
Recently Published Documents