Reinforcement learning (RL) is an attractive way to implement high-level decision-making policies for autonomous driving, but learning directly from a real vehicle or a high-fidelity simulator is infeasible for various reasons. We therefore consider the problem of transfer reinforcement learning and study how a policy learned in a simple environment using WiseMove can be transferred to our high-fidelity simulator, WiseSim. WiseMove is a framework to study safety and other aspects of RL for autonomous driving. WiseSim accurately reproduces the dynamics and software stack of our real vehicle. We find that the accurately modelled perception errors in WiseSim contribute the most to the transfer problem. Modelling these errors in WiseMove, even naively, yields an RL policy that performs better in WiseSim than a hand-crafted rule-based policy. Applying domain randomization to the environment in WiseMove yields an even better policy. The final RL policy reduces the failures due to perception errors from 10% to 2.75%. We also observe that the RL policy relies significantly less on velocity than the rule-based policy, having learned that velocity measurements are unreliable.
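
As an illustration of the two ideas named above, naive perception-error modelling and domain randomization, the following is a minimal sketch and not the method used in this work. The names `NoiseParams`, `sample_noise_params` and `perceive`, the choice of Gaussian noise with missed detections, and all numeric ranges are assumptions made for the example; the abstract does not specify the actual error model.

```python
import random
from dataclasses import dataclass


@dataclass
class NoiseParams:
    """Hypothetical per-episode perception-error parameters (illustrative only)."""
    velocity_std: float  # std. dev. of Gaussian noise on measured velocity (m/s)
    position_std: float  # std. dev. of Gaussian noise on measured position (m)
    miss_prob: float     # probability that a vehicle is not detected in a step


def sample_noise_params(rng: random.Random) -> NoiseParams:
    """Domain randomization: draw fresh error parameters for each training episode.

    The ranges below are assumed values, not taken from the paper.
    """
    return NoiseParams(
        velocity_std=rng.uniform(0.0, 2.0),
        position_std=rng.uniform(0.0, 0.5),
        miss_prob=rng.uniform(0.0, 0.1),
    )


def perceive(true_position: float, true_velocity: float,
             params: NoiseParams, rng: random.Random):
    """Return a noisy (position, velocity) observation, or None on a missed detection."""
    if rng.random() < params.miss_prob:
        return None
    return (
        rng.gauss(true_position, params.position_std),
        rng.gauss(true_velocity, params.velocity_std),
    )


# Example: one episode with its own randomly drawn error parameters.
rng = random.Random(0)
params = sample_noise_params(rng)
observation = perceive(true_position=10.0, true_velocity=5.0, params=params, rng=rng)
```

Resampling the parameters at the start of every episode exposes the policy to a range of error magnitudes rather than a single fixed one, which is the general intent of domain randomization.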