High-level Decision Making for Safe and Reasonable Autonomous Lane Changing using Reinforcement Learning

Author(s):
Branka Mirchevska, Christian Pek, Moritz Werling, Matthias Althoff, Joschka Boedecker


2021, Vol 31 (3), pp. 1-26
Author(s):
Aravind Balakrishnan, Jaeyoung Lee, Ashish Gaurav, Krzysztof Czarnecki, Sean Sedwards

Reinforcement learning (RL) is an attractive way to implement high-level decision-making policies for autonomous driving, but learning directly from a real vehicle or a high-fidelity simulator is variously infeasible. We therefore consider the problem of transfer reinforcement learning and study how a policy learned in a simple environment using WiseMove can be transferred to our high-fidelity simulator, WiseSim. WiseMove is a framework to study safety and other aspects of RL for autonomous driving. WiseSim accurately reproduces the dynamics and software stack of our real vehicle. We find that the accurately modelled perception errors in WiseSim contribute the most to the transfer problem. These errors, even when naively modelled in WiseMove, provide an RL policy that performs better in WiseSim than a hand-crafted rule-based policy. Applying domain randomization to the environment in WiseMove yields an even better policy. The final RL policy reduces the failures due to perception errors from 10% to 2.75%. We also observe that the RL policy relies significantly less on velocity than the rule-based policy does, having learned that its measurement is unreliable.
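
The abstract does not spell out how the perception errors are modelled or randomized; the sketch below is one plausible reading, assuming a gym-style lane-changing environment whose observations carry a "velocities" array. The class name, noise parameters, and observation fields are illustrative assumptions, not the WiseMove or WiseSim API.

```python
import numpy as np


class RandomizedPerceptionNoise:
    """Domain-randomization wrapper (illustrative): each episode samples a new
    noise level for the measured velocities of surrounding vehicles, so the
    learned policy cannot come to rely on exact velocity readings."""

    def __init__(self, env, vel_noise_range=(0.0, 2.0), dropout_prob=0.1):
        self.env = env                          # low-fidelity env with reset()/step() (assumed interface)
        self.vel_noise_range = vel_noise_range  # bounds on the noise std-dev in m/s (assumed values)
        self.dropout_prob = dropout_prob        # chance a measurement is dropped entirely
        self._sigma = 0.0

    def reset(self):
        # Resample the perception-noise level at the start of every episode.
        self._sigma = np.random.uniform(*self.vel_noise_range)
        return self._corrupt(self.env.reset())

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return self._corrupt(obs), reward, done, info

    def _corrupt(self, obs):
        # Assume obs is a dict holding a "velocities" array for nearby vehicles.
        noisy = dict(obs)
        vel = np.asarray(obs["velocities"], dtype=float)
        vel = vel + np.random.normal(0.0, self._sigma, size=vel.shape)
        vel[np.random.random(vel.shape) < self.dropout_prob] = 0.0  # crude missed detection
        noisy["velocities"] = vel
        return noisy
```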


Author(s):  
Rey Pocius, Lawrence Neal, Alan Fern

Commonly used sequential decision-making tasks, such as the games in the Arcade Learning Environment (ALE), provide rich observation spaces suitable for deep reinforcement learning. However, they consist mostly of low-level control tasks, which are of limited use for the development of explainable artificial intelligence (XAI) due to their fine temporal resolution. Many of these domains also lack built-in high-level abstractions and symbols. Existing tasks that provide both strategic decision-making and rich observation spaces are either difficult to simulate or intractable. We provide a set of new strategic decision-making tasks specialized for the development and evaluation of explainable AI methods, built as constrained mini-games within the StarCraft II Learning Environment.
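
As a toy illustration of what such a constrained, symbol-level task can look like, the sketch below defines a mini-game whose observations are a handful of human-readable symbols and whose steps are coarse strategic decisions. The game rules, state fields, and action names are invented for illustration and are not the actual StarCraft II mini-games or the SC2LE API.

```python
from dataclasses import dataclass


@dataclass
class SymbolicState:
    """Human-readable symbols an explanation method can reference directly."""
    minerals: int = 50
    marines: int = 0
    enemy_strength: int = 2
    step: int = 0


ACTIONS = ("build_marine", "attack", "wait")


class MiniStrategyGame:
    """Each step is one strategic choice rather than a low-level control tick."""

    def reset(self):
        self.state = SymbolicState()
        return self.state

    def step(self, action):
        s = self.state
        s.step += 1
        reward, done = 0.0, False
        if action == "build_marine" and s.minerals >= 50:
            s.minerals -= 50
            s.marines += 1
        elif action == "attack":                 # committing to attack ends the episode
            done = True
            reward = 1.0 if s.marines > s.enemy_strength else -1.0
        s.minerals += 20                         # passive income per decision point
        if s.step >= 20:                         # time limit
            done = True
        return s, reward, done


# Example rollout: save up for marines, then attack.
env = MiniStrategyGame()
obs = env.reset()
for _ in range(6):
    obs, r, d = env.step("build_marine")
obs, r, d = env.step("attack")
print(obs.marines, r, d)  # 3 1.0 True
```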


Author(s):  
Daoming Lyu, Fangkai Yang, Bo Liu, Daesub Yoon

Deep reinforcement learning (DRL) has achieved great success by learning directly from high-dimensional sensory inputs, yet it is notorious for its lack of interpretability. Interpretability of subtasks is critical in hierarchical decision-making, as it increases the transparency of black-box-style DRL approaches and helps RL practitioners better understand the high-level behavior of the system. In this paper, we introduce symbolic planning into DRL and propose a framework of Symbolic Deep Reinforcement Learning (SDRL) that can handle both high-dimensional sensory inputs and symbolic planning. Task-level interpretability is enabled by relating symbolic actions to options. The framework features a planner-controller-meta-controller architecture, in which the three components take charge of subtask scheduling, data-driven subtask learning, and subtask evaluation, respectively. The components cross-fertilize each other and eventually converge to an optimal symbolic plan along with the learned subtasks, bringing together the long-term planning capability of symbolic knowledge and end-to-end reinforcement learning directly from high-dimensional sensory input. Experimental results validate the interpretability of the subtasks, along with improved data efficiency compared with state-of-the-art approaches.
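
A minimal skeleton of how the three components might interact is sketched below; the method names (make_plan, act, update, evaluate, reset_to_subtask, subtask_values) are hypothetical placeholders standing in for the authors' implementation, not reproductions of it.

```python
def sdrl_loop(planner, controllers, meta_controller, env, iterations=100):
    """Sketch of an SDRL-style planner / controller / meta-controller loop.

    planner         : maps current subtask values to a symbolic plan
                      (a list of symbolic actions)
    controllers     : dict mapping each symbolic action to an option policy
                      with act(obs) and update(transition)
    meta_controller : evaluates how well each option achieved its subtask and
                      maintains the values the planner consults
    All interfaces here are hypothetical placeholders.
    """
    for _ in range(iterations):
        plan = planner.make_plan(meta_controller.subtask_values())
        for symbolic_action in plan:
            option = controllers[symbolic_action]
            obs, done = env.reset_to_subtask(symbolic_action), False
            while not done:
                action = option.act(obs)                    # low-level DRL control
                next_obs, reward, done, info = env.step(action)
                option.update((obs, action, reward, next_obs, done))
                obs = next_obs
            # Subtask evaluation: did the option achieve the symbolic effect?
            meta_controller.evaluate(symbolic_action, info)
    return planner, controllers
```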


2018
Author(s):
Samuel D. McDougle, Peter A. Butcher, Darius Parvin, Faisal Mushtaq, Yael Niv, ...

Decisions must be implemented through actions, and actions are prone to error. As such, when an expected outcome is not obtained, an individual should not only be sensitive to whether the choice itself was suboptimal, but also whether the action required to indicate that choice was executed successfully. The intelligent assignment of credit to action execution versus action selection has clear ecological utility for the learner. To explore this scenario, we used a modified version of a classic reinforcement learning task in which feedback indicated if negative prediction errors were, or were not, associated with execution errors. Using fMRI, we asked if prediction error computations in the human striatum, a key substrate in reinforcement learning and decision making, are modulated when a failure in action execution results in the negative outcome. Participants were more tolerant of non-rewarded outcomes when these resulted from execution errors versus when execution was successful but the reward was withheld. Consistent with this behavior, a model-driven analysis of neural activity revealed an attenuation of the signal associated with negative reward prediction error in the striatum following execution failures. These results converge with other lines of evidence suggesting that prediction errors in the mesostriatal dopamine system integrate high-level information during the evaluation of instantaneous reward outcomes.
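
The model-driven account implies a value update in which negative prediction errors carry less weight when the outcome followed an execution failure. A worked toy version of that gating is given below; the parameter values are illustrative, not fitted values from the study.

```python
def update_value(v, reward, execution_error, alpha=0.2, kappa=0.4):
    """One Rescorla-Wagner style update with gated credit assignment.

    v               : current value estimate of the chosen option
    reward          : obtained outcome (e.g., 0 or 1)
    execution_error : True if the required movement missed the target
    kappa           : attenuation applied to negative prediction errors after
                      execution failures (illustrative, not a fitted value)
    """
    delta = reward - v                   # reward prediction error
    if execution_error and delta < 0:
        delta *= kappa                   # "my aim failed, not my choice"
    return v + alpha * delta


# A missed reach that earns no reward moves the value estimate less than an
# accurate reach that simply goes unrewarded.
v = 0.6
print(round(update_value(v, reward=0.0, execution_error=True), 3))   # 0.552
print(round(update_value(v, reward=0.0, execution_error=False), 3))  # 0.48
```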


Author(s):  
Jun Xu, Zeyang Lei, Haifeng Wang, Zheng-Yu Niu, Hua Wu, ...

Generating informative, coherent, and sustainable open-domain conversations is a non-trivial task. Previous work on knowledge-grounded conversation generation focuses on improving dialog informativeness, with little attention paid to dialog coherence. In this paper, to enhance multi-turn dialog coherence, we propose leveraging event chains to help determine the sketch of a multi-turn dialog. We first extract event chains from narrative texts and connect them into a graph. We then present a novel event-graph-grounded reinforcement learning (RL) framework. It conducts high-level planning of response content (simply an event) by learning to walk over the graph, and then produces a response conditioned on the planned content. In particular, we devise a novel multi-policy decision-making mechanism to foster a coherent dialog with both appropriate content ordering and high contextual relevance. Experimental results indicate the effectiveness of this framework in terms of dialog coherence and informativeness.
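
A minimal sketch of the graph-walk idea for content planning is shown below; the toy event graph and the word-overlap scorer are stand-ins for the learned multi-policy mechanism, which is not reproduced here.

```python
# Toy event graph: nodes are events, edges point to plausible follow-up events.
event_graph = {
    "move to a new city": ["look for an apartment", "feel homesick"],
    "look for an apartment": ["sign a lease", "complain about rent"],
    "feel homesick": ["call an old friend"],
}


def overlap_score(event, history):
    """Stand-in relevance scorer (word overlap); an RL-trained walking policy
    would replace this heuristic."""
    return len(set(event.split()) & set(" ".join(history).split()))


def plan_next_event(current_event, history, score_fn=overlap_score):
    """Pick the neighbouring event with the highest contextual relevance; the
    chosen event then conditions the response generator (not shown)."""
    candidates = event_graph.get(current_event, [])
    if not candidates:
        return None
    return max(candidates, key=lambda e: score_fn(e, history))


history = ["I just moved to a new city and I'm looking for an apartment downtown."]
print(plan_next_event("move to a new city", history))  # look for an apartment
```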


2020, Vol 34 (09), pp. 13608-13609
Author(s):
Zihang Gao, Fangzhen Lin, Yi Zhou, Hao Zhang, Kaishun Wu, ...

Deep reinforcement learning has been successfully applied in many decision-making scenarios. However, its slow training process and the difficulty of explaining its decisions limit its application. In this paper, we attempt to address some of these problems by proposing a framework of Rule-interposing Learning (RIL) that embeds knowledge into deep reinforcement learning. In this framework, the rules dynamically affect the training progress and accelerate learning. The embedded knowledge, in the form of rules, not only improves learning efficiency but also prevents unnecessary or disastrous exploration in the early stages of training. Moreover, the modularity of the framework makes it straightforward to transfer high-level knowledge among similar tasks.
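
One way such rule interposing can be wired into action selection is sketched below: rules may forbid actions (pruning exploration) or prescribe one outright, and the learned Q-values choose among whatever remains. The rule format and the example rule are assumptions for illustration, not the paper's exact mechanism.

```python
import random


def select_action(q_values, state, rules, epsilon):
    """Rule-interposed epsilon-greedy selection (illustrative).

    q_values : dict mapping action -> estimated return from the DQN
    rules    : list of (condition(state), verdict) pairs, where verdict is
               ("forbid", action) or ("prescribe", action)
    """
    allowed = set(q_values)
    for condition, (kind, action) in rules:
        if condition(state):
            if kind == "prescribe":
                return action                    # the rule decides directly
            if kind == "forbid":
                allowed.discard(action)          # the rule prunes exploration
    if random.random() < epsilon:                # epsilon-greedy over the
        return random.choice(sorted(allowed))    # rule-pruned action set
    return max(allowed, key=q_values.get)


# Toy usage: never accelerate when an obstacle is close.
rules = [(lambda s: s["obstacle_distance"] < 5.0, ("forbid", "accelerate"))]
q = {"accelerate": 1.2, "brake": 0.3, "keep_speed": 0.8}
print(select_action(q, {"obstacle_distance": 3.0}, rules, epsilon=0.0))  # keep_speed
```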


2021, Vol 11 (22), pp. 10595
Author(s):
Wenlong Zhao, Zhijun Meng, Kaipeng Wang, Jiahui Zhang, Shaoze Lu

Active tracking control is essential for UAVs performing autonomous operations in GPS-denied environments. In the active tracking task, UAVs take high-dimensional raw images as input and execute motor actions to actively follow a dynamic target. Most research focuses on three-stage methods: perception first, followed by high-level decision-making based on the extracted spatial information of the dynamic target, and finally UAV movement control using a low-level dynamic controller. Perception methods based on deep neural networks are powerful but require considerable effort for manual ground-truth labeling. Instead, we unify the perception and decision-making stages in a high-level controller and leverage deep reinforcement learning to learn the mapping from raw images to high-level action commands in a V-REP-based environment, where simulation data are infinite and inexpensive. This end-to-end method also has the advantages of a small parameter size and reduced effort for parameter tuning in the decision-making stage. The high-level controller, which has a novel architecture, explicitly encodes the spatial and temporal features of the dynamic target. Auxiliary segmentation and motion-in-depth losses are introduced to generate denser training signals for the high-level controller's fast and stable training. The high-level controller and a conventional low-level PID controller constitute our hierarchical active tracking control framework for the UAVs' active tracking task. Simulation experiments show that our controller, trained with several augmentation techniques, generalizes well to dynamic targets with random appearances and velocities and achieves significantly better performance than three-stage methods.
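
A schematic sketch of the hierarchy is given below: the learned high-level controller is assumed to emit a coarse command (desired forward speed and yaw rate, an assumption about the command interface), which conventional PID loops then track; the auxiliary losses are shown simply as extra terms added to the RL objective. Gains, weights, and the command format are illustrative.

```python
class PID:
    """Conventional low-level controller used to track high-level commands."""

    def __init__(self, kp, ki, kd, dt=0.05):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral, self.prev_err = 0.0, 0.0

    def control(self, target, measured):
        err = target - measured
        self.integral += err * self.dt
        deriv = (err - self.prev_err) / self.dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv


def total_loss(rl_loss, seg_loss, motion_in_depth_loss, w_seg=0.5, w_mid=0.5):
    """Training objective sketch: RL loss plus auxiliary segmentation and
    motion-in-depth terms that densify the training signal (weights assumed)."""
    return rl_loss + w_seg * seg_loss + w_mid * motion_in_depth_loss


# One control tick: the learned high-level policy (not shown) outputs a coarse
# command, and separate PID loops turn it into low-level adjustments.
speed_pid, yaw_pid = PID(0.8, 0.1, 0.05), PID(1.2, 0.0, 0.1)
command = {"speed": 2.0, "yaw_rate": 0.3}          # assumed command interface
throttle = speed_pid.control(command["speed"], measured=1.6)
yaw_cmd = yaw_pid.control(command["yaw_rate"], measured=0.1)
print(round(throttle, 3), round(yaw_cmd, 3))        # 0.722 0.64
```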

