Deep Reinforcement Learning Based High-level Driving Behavior Decision-making Model in Heterogeneous Traffic

Author(s):  
Zhengwei Bai ◽  
Wei Shangguan ◽  
Baigen Cai ◽  
Linguo Chai
2021 ◽  
Vol 31 (3) ◽  
pp. 1-26
Author(s):  
Aravind Balakrishnan ◽  
Jaeyoung Lee ◽  
Ashish Gaurav ◽  
Krzysztof Czarnecki ◽  
Sean Sedwards

Reinforcement learning (RL) is an attractive way to implement high-level decision-making policies for autonomous driving, but learning directly from a real vehicle or a high-fidelity simulator is variously infeasible. We therefore consider the problem of transfer reinforcement learning and study how a policy learned in a simple environment using WiseMove can be transferred to our high-fidelity simulator, WiseSim. WiseMove is a framework to study safety and other aspects of RL for autonomous driving. WiseSim accurately reproduces the dynamics and software stack of our real vehicle. We find that the accurately modelled perception errors in WiseSim contribute the most to the transfer problem. These errors, even when naively modelled in WiseMove, yield an RL policy that performs better in WiseSim than a hand-crafted rule-based policy. Applying domain randomization to the environment in WiseMove yields an even better policy. The final RL policy reduces the failures due to perception errors from 10% to 2.75%. We also observe that the RL policy relies significantly less on velocity than the rule-based policy does, having learned that its measurement is unreliable.
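
The abstract does not expose WiseMove's API, so the following is only a minimal sketch of the domain-randomization idea it describes: wrap a generic gym-style driving environment and resample a perception-noise level each episode, so the learned policy stops trusting any single sensor model. All names here (NoisyPerceptionWrapper, the env interface) are hypothetical, not taken from the paper.

    import numpy as np

    class NoisyPerceptionWrapper:
        """Corrupts observations with per-episode randomized sensor noise,
        so a policy trained in a simple simulator learns to tolerate the
        perception errors of a higher-fidelity one."""

        def __init__(self, env, noise_low=0.0, noise_high=0.2, rng=None):
            self.env = env                      # assumed gym-style environment
            self.noise_low = noise_low          # bounds of the randomized
            self.noise_high = noise_high        # noise standard deviation
            self.rng = rng or np.random.default_rng()
            self.noise_std = noise_low

        def reset(self):
            # Resample the noise level each episode: domain randomization.
            self.noise_std = self.rng.uniform(self.noise_low, self.noise_high)
            return self._corrupt(self.env.reset())

        def step(self, action):
            obs, reward, done, info = self.env.step(action)
            return self._corrupt(obs), reward, done, info

        def _corrupt(self, obs):
            # Additive Gaussian noise on continuous features such as measured
            # velocities and relative distances (numpy array observations assumed).
            return obs + self.rng.normal(0.0, self.noise_std, size=obs.shape)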


Author(s):  
Rey Pocius ◽  
Lawrence Neal ◽  
Alan Fern

Commonly used sequential decision-making tasks, such as the games in the Arcade Learning Environment (ALE), provide rich observation spaces suitable for deep reinforcement learning. However, they consist mostly of low-level control tasks whose fine temporal resolution makes them of limited use for the development of explainable artificial intelligence (XAI). Many of these domains also lack built-in high-level abstractions and symbols. Existing tasks that provide both strategic decision-making and rich observation spaces are either difficult to simulate or intractable. We provide a set of new strategic decision-making tasks specialized for the development and evaluation of explainable AI methods, built as constrained mini-games within the StarCraft II Learning Environment.
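
The mini-games themselves are not specified in the abstract; purely as an illustration of the constraint they describe, the sketch below restricts a rich environment to a handful of named strategic macro-actions, so each agent decision is temporally coarse and easy to explain. Every name here (StrategicActionWrapper, the macro list, the env interface) is hypothetical rather than drawn from the paper.

    class StrategicActionWrapper:
        """Exposes only a few named high-level actions; each one expands
        into a scripted sequence of low-level environment actions."""

        MACROS = ["expand", "build_army", "attack", "defend"]  # illustrative

        def __init__(self, env, macro_impl):
            self.env = env
            # macro_impl maps a macro name to its low-level action sequence.
            self.macro_impl = macro_impl

        def step(self, macro_index):
            macro = self.MACROS[macro_index]
            obs, total_reward, done, info = None, 0.0, False, {}
            # One strategic decision drives many low-level steps, coarsening
            # the fine temporal resolution that makes ALE-style tasks hard
            # to explain.
            for low_level_action in self.macro_impl[macro]:
                obs, reward, done, info = self.env.step(low_level_action)
                total_reward += reward
                if done:
                    break
            return obs, total_reward, done, info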


Author(s):  
Daoming Lyu ◽  
Fangkai Yang ◽  
Bo Liu ◽  
Daesub Yoon

Deep reinforcement learning (DRL) has achieved great success by learning directly from high-dimensional sensory inputs, yet is notorious for its lack of interpretability. Interpretability of subtasks is critical in hierarchical decision-making, as it increases the transparency of black-box-style DRL approaches and helps RL practitioners better understand the high-level behavior of the system. In this paper, we introduce symbolic planning into DRL and propose a framework of Symbolic Deep Reinforcement Learning (SDRL) that can handle both high-dimensional sensory inputs and symbolic planning. Task-level interpretability is enabled by relating symbolic actions to options. The framework features a planner-controller-meta-controller architecture, whose components take charge of subtask scheduling, data-driven subtask learning, and subtask evaluation, respectively. The three components cross-fertilize each other and eventually converge to an optimal symbolic plan along with the learned subtasks, bringing together the long-term planning capability of symbolic knowledge and end-to-end reinforcement learning directly from high-dimensional sensory input. Experimental results validate the interpretability of the subtasks, along with improved data efficiency compared with state-of-the-art approaches.
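
The authors' implementation is not given in the abstract; the skeleton below is a hypothetical rendering of the planner-controller-meta-controller loop it describes, with all interfaces (make_plan, act, learn, update, last_return) invented for illustration.

    def sdrl_loop(planner, meta_controller, controllers, env, episodes):
        """Planner proposes a symbolic plan; each symbolic action is an
        option learned by a DRL controller; the meta-controller evaluates
        finished subtasks and feeds the values back to the planner."""
        for _ in range(episodes):
            # Subtask scheduling: plan over symbols, informed by how well
            # each subtask has been learned so far.
            plan = planner.make_plan(meta_controller.subtask_values)
            state = env.reset()
            for symbol in plan:
                option = controllers[symbol]     # data-driven subtask learning
                done_subtask = False
                while not done_subtask:
                    action = option.act(state)
                    next_state, reward, done_subtask, _ = env.step(action)
                    option.learn(state, action, reward, next_state)
                    state = next_state
                # Subtask evaluation: score the finished option so the
                # planner can converge toward an optimal symbolic plan.
                meta_controller.update(symbol, option.last_return)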


2018 ◽  
Author(s):  
Samuel D. McDougle ◽  
Peter A. Butcher ◽  
Darius Parvin ◽  
Faisal Mushtaq ◽  
Yael Niv ◽  
...  

Decisions must be implemented through actions, and actions are prone to error. As such, when an expected outcome is not obtained, an individual should be sensitive not only to whether the choice itself was suboptimal, but also to whether the action required to indicate that choice was executed successfully. The intelligent assignment of credit to action execution versus action selection has clear ecological utility for the learner. To explore this scenario, we used a modified version of a classic reinforcement learning task in which feedback indicated whether negative prediction errors were, or were not, associated with execution errors. Using fMRI, we asked if prediction error computations in the human striatum, a key substrate in reinforcement learning and decision making, are modulated when a failure in action execution results in a negative outcome. Participants were more tolerant of non-rewarded outcomes when these resulted from execution errors than when execution was successful but the reward was withheld. Consistent with this behavior, a model-driven analysis of neural activity revealed an attenuation of the signal associated with negative reward prediction error in the striatum following execution failures. These results converge with other lines of evidence suggesting that prediction errors in the mesostriatal dopamine system integrate high-level information during the evaluation of instantaneous reward outcomes.
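
The paper's fitted computational model is not reproduced in the abstract, but the reported attenuation suggests a gated value update of roughly the following form. This is a minimal Rescorla-Wagner-style sketch with hypothetical parameters (alpha, gate), not the authors' model.

    def update_value(value, reward, execution_error, alpha=0.1, gate=0.3):
        """One trial of value learning in which the negative reward
        prediction error is attenuated after execution failures."""
        delta = reward - value              # reward prediction error (RPE)
        if execution_error and delta < 0:
            # Credit the miss to motor execution, not to the chosen option:
            # gate < 1 scales down the negative RPE and slows unlearning.
            delta *= gate
        return value + alpha * delta

With gate = 1 this reduces to standard prediction-error learning; gate < 1 reproduces the greater tolerance of non-rewarded outcomes following execution errors.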


Author(s):  
Jun Xu ◽  
Zeyang Lei ◽  
Haifeng Wang ◽  
Zheng-Yu Niu ◽  
Hua Wu ◽  
...  

How to generate informative, coherent, and sustainable open-domain conversations is a non-trivial task. Previous work on knowledge-grounded conversation generation focuses on improving dialog informativeness, with little attention paid to dialog coherence. In this paper, to enhance multi-turn dialog coherence, we propose to leverage event chains to determine the sketch of a multi-turn dialog. We first extract event chains from narrative texts and connect them into a graph. We then present a novel event-graph-grounded Reinforcement Learning (RL) framework. It conducts high-level planning of response content (simply, an event) by learning to walk over the graph, and then produces a response conditioned on the planned content. In particular, we devise a novel multi-policy decision-making mechanism to foster coherent dialog with both appropriate content ordering and high contextual relevance. Experimental results demonstrate the effectiveness of this framework in terms of dialog coherence and informativeness.
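
As a rough sketch of the graph-walking step described above (the learned multi-policy mechanism is reduced here to a single scoring callable, and all names are illustrative rather than the paper's):

    def plan_dialog_events(event_graph, start_event, score, context, max_turns=5):
        """Walks an event graph to plan the content of successive turns.

        event_graph: dict mapping an event to its successor events
        score:       callable rating (context, candidate_event) pairs,
                     standing in for the learned RL policies."""
        plan, current = [], start_event
        for _ in range(max_turns):
            candidates = event_graph.get(current, [])
            if not candidates:
                break
            # Choose the successor that best balances content ordering
            # (graph structure) and contextual relevance (the score).
            current = max(candidates, key=lambda e: score(context, e))
            plan.append(current)
            context = context + [current]
        return plan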


2020 ◽  
Vol 34 (09) ◽  
pp. 13608-13609
Author(s):  
Zihang Gao ◽  
Fangzhen Lin ◽  
Yi Zhou ◽  
Hao Zhang ◽  
Kaishun Wu ◽  
...  

Deep reinforcement learning has been successfully applied in many decision-making scenarios, but slow training and the difficulty of explaining its decisions limit its application. In this paper, we attempt to address some of these problems by proposing a framework of Rule-interposing Learning (RIL) that embeds knowledge into deep reinforcement learning. In this framework, the rules dynamically affect the training progress and accelerate learning. The embedded knowledge, in the form of rules, not only improves learning efficiency but also prevents unnecessary or disastrous exploration in the early stages of training. Moreover, the modularity of the framework makes it straightforward to transfer high-level knowledge among similar tasks.
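
A minimal sketch of the interposition step, assuming rules arrive as condition/action pairs (a hypothetical interface, not the paper's implementation):

    def interposed_action(state, agent, rules):
        """Let embedded rules override the learned policy's choice.

        rules: list of (condition, forced_action) pairs encoding prior
        knowledge that blocks disastrous early exploration."""
        for condition, forced_action in rules:
            if condition(state):
                return forced_action    # a rule fires: interpose its action
        return agent.act(state)         # otherwise defer to the DRL policy

Because the rules sit outside the learned policy, the same rule set can be reused when moving to a similar task, which is the modularity the authors highlight.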


Sensors ◽  
2019 ◽  
Vol 19 (18) ◽  
pp. 4055 ◽  
Author(s):  
Zhang ◽  
Wang ◽  
Liu ◽  
Chen

This research focuses on the adaptive navigation of maritime autonomous surface ships (MASSs) in an uncertain environment. To achieve intelligent obstacle avoidance of MASSs in a port, an autonomous navigation decision-making model based on hierarchical deep reinforcement learning is proposed. The model is composed of two layers: a scene division layer and an autonomous navigation decision-making layer. The scene division layer quantifies the sub-scenarios according to the International Regulations for Preventing Collisions at Sea (COLREG). This research divides the navigational situation of a ship into entities and attributes based on the ontology model and the Protégé language. In the decision-making layer, we designed a deep Q-learning algorithm, comprising an environment model, ship motion space, reward function, and search strategy, that learns the environmental state in a quantized sub-scenario to train the navigation strategy. Finally, two sets of verification experiments with the deep reinforcement learning (DRL) and improved DRL algorithms were designed with Rizhao port as a case study, and the experimental data were analyzed in terms of convergence trend, iterative path, and collision avoidance effect. The results indicate that the improved DRL algorithm effectively improves navigation safety and collision avoidance.
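
A rough sketch of a single decision in the two-layer model (scene division first, then that sub-scenario's Q-network), with scene_classifier and q_networks as hypothetical stand-ins for the components named above:

    import numpy as np

    def navigate_step(state, scene_classifier, q_networks, epsilon, rng):
        """Classify the COLREG sub-scenario, then pick a maneuver with
        that scenario's deep Q-network using an epsilon-greedy search."""
        scenario = scene_classifier(state)      # scene division layer
        q_net = q_networks[scenario]            # decision-making layer
        if rng.random() < epsilon:              # explore: random maneuver
            return int(rng.integers(q_net.num_actions))
        return int(np.argmax(q_net(state)))     # exploit learned Q-values

Here rng would be, for example, np.random.default_rng(), and q_net any callable returning one Q-value per maneuver in the ship motion space.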

