FoLaR: Foggy Latent Representations for Reinforcement Learning with Partial Observability

Bangkok is notorious for chronic traffic congestion, a result of rapid urbanization and a haphazard city plan. The Sathorn Road network is one of the most critical areas, where gridlock is a normal occurrence during rush hours. This stems from the high demand imposed by three large educational institutions located close together and from insufficient link capacity with strict routes. Current solutions rely heavily on the expertise of human traffic controllers, who prevent and disentangle gridlocks by releasing each queue spillback in turn through inter-junction coordination. A calibrated dataset of the Sathorn Road network area for the microscopic road traffic simulation package SUMO (Simulation of Urban MObility) is provided by the Chula-Sathorn SUMO Simulator (Chula-SSS). In this paper, we use the Chula-SSS dataset, extended with additional vehicle flows and gridlocks, to further optimize the present traffic signal control policies with a reinforcement learning agent. Reinforcement learning has been successful in a variety of domains over the past few years. Although a number of studies apply reinforcement learning to adaptive traffic light control, they often lack pragmatic consideration of deployment in the physical world, especially for traffic infrastructure in developing countries, which is constrained by economic factors. The resulting limitation, that the agent can only partially observe the state of the whole network at any given time, is significant and cannot be overlooked. Under such partial observability constraints, this paper reports an investigation of the Ape-X Deep Q-Network agent applied at the critical junction during the morning rush hours from 6 AM to 9 AM, with gridlocks occasionally present.
The results show that the agent is able to learn despite the physical limitations of traffic light control at the considered intersection within the Sathorn gridlock area. This suggests that further investigation of the agent's applicability to mitigating complex, interconnected gridlocks is worthwhile.


On Overfitting and Asymptotic Bias in Batch Reinforcement Learning with Partial Observability (Extended Abstract)

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽ DOI: 10.24963/ijcai.2020/706 ◽ 2020

Author(s): Vincent Francois-Lavet ◽ Guillaume Rabusseau ◽ Joelle Pineau ◽ Damien Ernst ◽ Raphael Fonteneau

Keyword(s): Reinforcement Learning ◽ Theoretical Analysis ◽ Asymptotic Bias ◽ Limited Information ◽ Limited Data ◽ State Representation ◽ Error Sources ◽ Partial Observability ◽ Batch Reinforcement Learning

When an agent has limited information on its environment, the suboptimality of an RL algorithm can be decomposed into the sum of two terms: a term related to an asymptotic bias (suboptimality with unlimited data) and a term due to overfitting (additional suboptimality due to limited data). In the context of reinforcement learning with partial observability, this paper provides an analysis of the tradeoff between these two error sources. In particular, our theoretical analysis formally characterizes how a smaller state representation increases the asymptotic bias while decreasing the risk of overfitting.
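The decomposition described above can be written as a simple telescoping identity. The notation below is an assumption made for illustration and is not taken from the paper: $\pi^*$ denotes an optimal policy given full information, $\pi_\infty$ the policy the algorithm would return with unlimited data under the chosen state representation, and $\pi_D$ the policy it returns from a finite dataset $D$.

```latex
V^{\pi^*}(s) - V^{\pi_D}(s)
  = \underbrace{V^{\pi^*}(s) - V^{\pi_\infty}(s)}_{\text{asymptotic bias}}
  + \underbrace{V^{\pi_\infty}(s) - V^{\pi_D}(s)}_{\text{overfitting}}
```

Read this way, the tradeoff stated in the abstract is that a smaller state representation tends to enlarge the first term while shrinking the second.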


Cooperative Multi-Agent Reinforcement Learning with Hierarchical Relation Graph under Partial Observability

2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI) ◽ DOI: 10.1109/ictai50040.2020.00011 ◽ 2020

Author(s): Yang Li ◽ Xinzhi Wang ◽ Jianshu Wang ◽ Wei Wang ◽ Xiangfeng Luo ◽ ...

Keyword(s): Reinforcement Learning ◽ Partial Observability ◽ Multi Agent ◽ Relation Graph ◽ Hierarchical Relation


On Overfitting and Asymptotic Bias in Batch Reinforcement Learning with Partial Observability

Journal of Artificial Intelligence Research ◽ DOI: 10.1613/jair.1.11478 ◽ 2019 ◽ Vol 65 ◽ pp. 1-30 ◽ Cited By ~ 2

Author(s): Vincent Francois-Lavet ◽ Guillaume Rabusseau ◽ Joelle Pineau ◽ Damien Ernst ◽ Raphael Fonteneau

Keyword(s): Reinforcement Learning ◽ Large Scale ◽ Asymptotic Bias ◽ State Representation ◽ Real World Data ◽ Partial Observability ◽ History Of ◽ Batch Reinforcement Learning ◽ Partially Observable ◽ Belief States

This paper provides an analysis of the tradeoff between asymptotic bias (suboptimality with unlimited data) and overfitting (additional suboptimality due to limited data) in the context of reinforcement learning with partial observability. Our theoretical analysis shows formally that a smaller state representation decreases the risk of overfitting, although it may increase the asymptotic bias. This analysis relies on expressing the quality of a state representation by bounding $L_1$ error terms of the associated belief states. The theoretical results are illustrated empirically for the case where the state representation is a truncated history of observations, both on synthetic POMDPs and on a large-scale POMDP in the context of smart grids, with real-world data. Finally, as with known results in the fully observable setting, we also briefly discuss and empirically illustrate how using function approximators and adapting the discount factor may improve the tradeoff between asymptotic bias and overfitting in the partially observable context.
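To make the idea of a truncated history of observations concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the class `TruncatedHistory`, the helper `count_distinct_states`, and the parameter name `h` are all hypothetical, and actions are left out of the history for brevity.

```python
from collections import deque
from typing import Deque, Hashable, Iterable, Tuple


class TruncatedHistory:
    """Hypothetical helper: represents the state as the last `h` observations."""

    def __init__(self, h: int) -> None:
        if h < 1:
            raise ValueError("h must be at least 1")
        self.h = h
        # A deque with maxlen drops the oldest observation automatically.
        self._buffer: Deque[Hashable] = deque(maxlen=h)

    def reset(self) -> None:
        """Clear the history at the start of an episode."""
        self._buffer.clear()

    def update(self, observation: Hashable) -> Tuple[Hashable, ...]:
        """Append a new observation and return the current representation."""
        self._buffer.append(observation)
        return tuple(self._buffer)


def count_distinct_states(observations: Iterable[Hashable], h: int) -> int:
    """Count the distinct truncated histories seen along one trajectory."""
    rep = TruncatedHistory(h)
    return len({rep.update(o) for o in observations})


if __name__ == "__main__":
    trajectory = [0, 1, 1, 0, 1, 1, 0, 1]
    for h in (1, 2, 3):
        print(h, count_distinct_states(trajectory, h))
```

A longer history distinguishes more situations, which is in line with a lower asymptotic bias, but it also spreads a fixed amount of data over more distinct states, which is where the risk of overfitting comes from.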
