Modeling human-like longitudinal driver model for intelligent vehicles based on reinforcement learning

<div>Recently, barrier function-based safe reinforcement learning (RL) with the actor-critic structure for continuous control tasks has received increasing attention. It is still challenging to learn a near-optimal control policy with safety and convergence guarantees. Also, few works have addressed the safe RL algorithm design under time-varying safety constraints. This paper proposes a model-based safe RL algorithm for optimal control of nonlinear systems with time-varying state and control constraints. In the proposed approach, we construct a novel barrier-based control policy structure that can guarantee control safety. A multi-step policy evaluation mechanism is proposed to predict the policy's safety risk under time-varying safety constraints and guide the policy to update safely. Theoretical results on stability and robustness are proven. Also, the convergence of the actor-critic learning algorithm is analyzed. The performance of the proposed algorithm outperforms several state-of-the-art RL algorithms in the simulated Safety Gym environment. Furthermore, the approach is applied to the integrated path following and collision avoidance problem for two real-world intelligent vehicles. A differential-drive vehicle and an Ackermann-drive one are used to verify the offline deployment performance and the online learning performance, respectively. Our approach shows an impressive sim-to-real transfer capability and a satisfactory online control performance in the experiment.</div>

Download Full-text

Model-Based Safe Reinforcement Learning with Time-Varying State and Control Constraints: An Application to Intelligent Vehicles

10.36227/techrxiv.17205740 ◽

2021 ◽

Author(s):

Xinglong Zhang ◽

Yaoqian Peng ◽

Biao Luo ◽

Wei Pan ◽

Xin Xu ◽

...

Keyword(s):

Optimal Control ◽

Reinforcement Learning ◽

Control Policy ◽

Intelligent Vehicles ◽

Time Varying ◽

Control Constraints ◽

Model Based ◽

Safety Constraints ◽

And Control ◽

State And Control Constraints

<div>Recently, barrier function-based safe reinforcement learning (RL) with the actor-critic structure for continuous control tasks has received increasing attention. It is still challenging to learn a near-optimal control policy with safety and convergence guarantees. Also, few works have addressed the safe RL algorithm design under time-varying safety constraints. This paper proposes a model-based safe RL algorithm for optimal control of nonlinear systems with time-varying state and control constraints. In the proposed approach, we construct a novel barrier-based control policy structure that can guarantee control safety. A multi-step policy evaluation mechanism is proposed to predict the policy's safety risk under time-varying safety constraints and guide the policy to update safely. Theoretical results on stability and robustness are proven. Also, the convergence of the actor-critic learning algorithm is analyzed. The performance of the proposed algorithm outperforms several state-of-the-art RL algorithms in the simulated Safety Gym environment. Furthermore, the approach is applied to the integrated path following and collision avoidance problem for two real-world intelligent vehicles. A differential-drive vehicle and an Ackermann-drive one are used to verify the offline deployment performance and the online learning performance, respectively. Our approach shows an impressive sim-to-real transfer capability and a satisfactory online control performance in the experiment.</div>

Download Full-text

Modeling adaptive preview time of driver model for intelligent vehicles based on deep learning

Proceedings of the Institution of Mechanical Engineers Part I Journal of Systems and Control Engineering ◽

10.1177/09596518211028372 ◽

2021 ◽

pp. 095965182110283

Author(s):

Ju Xie ◽

Xing Xu ◽

Feng Wang ◽

Long Chen

Keyword(s):

Closed Loop ◽

Path Following ◽

Intelligent Vehicles ◽

Loop System ◽

Driver Model ◽

Intelligent Vehicle ◽

Training Procedure ◽

Closed Loop System ◽

Time Model ◽

Driving Conditions

In order to improve the adaptability and tracking performance of intelligent vehicles under complex driving conditions, and simulate the manipulation characteristics of the real driver in the driver–vehicle–road closed-loop system, a kind of adaptive preview time model for intelligent vehicle driver model is proposed. This article builds the intelligent vehicle driver model based on optimal preview control theory and the basic preview time is identified to minimize path error under various conditions based on particle swarm optimization. Then, the ideal compensation preview time is constructed in various conditions and the appropriate factors affecting compensation preview time are filtered out according to correlation analysis. Moreover, the architecture and training procedure of deep network is specified for compensation preview time prediction. Finally, the adaptive preview time is modeled by combining the basic preview time with the compensation preview time and the validity of adaptive preview time model is verified by the driver–vehicle–road closed-loop system under normal and aggressive driving conditions. The results show that the proposed adaptive preview time model can help intelligent vehicles better adapt complex driving conditions and effectively improve the path-following performance.

Download Full-text

Data Efficient Reinforcement Learning for Integrated Lateral Planning and Control in Automated Parking System

Sensors ◽

10.3390/s20247297 ◽

2020 ◽

Vol 20 (24) ◽

pp. 7297

Author(s):

Shaoyu Song ◽

Hui Chen ◽

Hongwei Sun ◽

Meicen Liu

Keyword(s):

Reinforcement Learning ◽

Path Following ◽

Monte Carlo Tree Search ◽

Integrated Method ◽

Model Based ◽

Planning And Control ◽

Model Free ◽

Data Efficiency ◽

And Control ◽

Path Following Methods

Reinforcement learning (RL) is a promising direction in automated parking systems (APSs), as integrating planning and tracking control using RL can potentially maximize the overall performance. However, commonly used model-free RL requires many interactions to achieve acceptable performance, and model-based RL in APS cannot continuously learn. In this paper, a data-efficient RL method is constructed to learn from data by use of a model-based method. The proposed method uses a truncated Monte Carlo tree search to evaluate parking states and select moves. Two artificial neural networks are trained to provide the search probability of each tree branch and the final reward for each state using self-trained data. The data efficiency is enhanced by weighting exploration with parking trajectory returns, an adaptive exploration scheme, and experience augmentation with imaginary rollouts. Without human demonstrations, a novel training pipeline is also used to train the initial action guidance network and the state value network. Compared with path planning and path-following methods, the proposed integrated method can flexibly co-ordinate the longitudinal and lateral motion to park a smaller parking space in one maneuver. Its adaptability to changes in the vehicle model is verified by joint Carsim and MATLAB simulation, demonstrating that the algorithm converges within a few iterations. Finally, experiments using a real vehicle platform are used to further verify the effectiveness of the proposed method. Compared with obtaining rewards using simulation, the proposed method achieves a better final parking attitude and success rate.

Download Full-text

Model-Based Safe Reinforcement Learning with Time-Varying State and Control Constraints: An Application to Intelligent Vehicles

10.36227/techrxiv.17205740.v1 ◽

2021 ◽

Author(s):

Xinglong Zhang ◽

Yaoqian Peng ◽

Biao Luo ◽

Wei Pan ◽

Xin Xu ◽

...

Keyword(s):

Optimal Control ◽

Reinforcement Learning ◽

Control Policy ◽

Intelligent Vehicles ◽

Time Varying ◽

Control Constraints ◽

Model Based ◽

Safety Constraints ◽

And Control ◽

State And Control Constraints

<div>Recently, barrier function-based safe reinforcement learning (RL) with the actor-critic structure for continuous control tasks has received increasing attention. It is still challenging to learn a near-optimal control policy with safety and convergence guarantees. Also, few works have addressed the safe RL algorithm design under time-varying safety constraints. This paper proposes a model-based safe RL algorithm for optimal control of nonlinear systems with time-varying state and control constraints. In the proposed approach, we construct a novel barrier-based control policy structure that can guarantee control safety. A multi-step policy evaluation mechanism is proposed to predict the policy's safety risk under time-varying safety constraints and guide the policy to update safely. Theoretical results on stability and robustness are proven. Also, the convergence of the actor-critic learning algorithm is analyzed. The performance of the proposed algorithm outperforms several state-of-the-art RL algorithms in the simulated Safety Gym environment. Furthermore, the approach is applied to the integrated path following and collision avoidance problem for two real-world intelligent vehicles. A differential-drive vehicle and an Ackermann-drive one are used to verify the offline deployment performance and the online learning performance, respectively. Our approach shows an impressive sim-to-real transfer capability and a satisfactory online control performance in the experiment.</div>

Download Full-text

Modeling and Control for Plant Dynamics Based on Reinforcement Learning

IEEJ Transactions on Industry Applications ◽

10.1541/ieejias.129.363 ◽

2009 ◽

Vol 129 (4) ◽

pp. 363-367

Author(s):

Tomoyuki Maeda ◽

Makishi Nakayama ◽

Hiroshi Narazaki ◽

Akira Kitamura

Keyword(s):

Reinforcement Learning ◽

Modeling And Control ◽

Plant Dynamics ◽

And Control

Download Full-text

Learning and control

10.1093/oso/9780199674923.003.0026 ◽

2018 ◽

Author(s):

Ivan Herreros

Keyword(s):

Machine Learning ◽

Reinforcement Learning ◽

Brain Function ◽

Control Strategies ◽

Learning Problems ◽

Animal Learning ◽

Feed Forward Control ◽

Machine Learning Applications ◽

And Control

This chapter discusses basic concepts from control theory and machine learning to facilitate a formal understanding of animal learning and motor control. It first distinguishes between feedback and feed-forward control strategies, and later introduces the classification of machine learning applications into supervised, unsupervised, and reinforcement learning problems. Next, it links these concepts with their counterparts in the domain of the psychology of animal learning, highlighting the analogies between supervised learning and classical conditioning, reinforcement learning and operant conditioning, and between unsupervised and perceptual learning. Additionally, it interprets innate and acquired actions from the standpoint of feedback vs anticipatory and adaptive control. Finally, it argues how this framework of translating knowledge between formal and biological disciplines can serve us to not only structure and advance our understanding of brain function but also enrich engineering solutions at the level of robot learning and control with insights coming from biology.

Download Full-text

An Evaluation Methodology for Interactive Reinforcement Learning with Simulated Users

Biomimetics ◽

10.3390/biomimetics6010013 ◽

2021 ◽

Vol 6 (1) ◽

pp. 13

Author(s):

Adam Bignold ◽

Francisco Cruz ◽

Richard Dazeley ◽

Peter Vamplew ◽

Cameron Foale

Keyword(s):

Reinforcement Learning ◽

Information Source ◽

Human Interaction ◽

Evaluation Methodology ◽

External Information ◽

Preliminary Evaluation ◽

Learning Agents ◽

Learning Agent ◽

Knowledge Bias ◽

The Impact

Interactive reinforcement learning methods utilise an external information source to evaluate decisions and accelerate learning. Previous work has shown that human advice could significantly improve learning agents’ performance. When evaluating reinforcement learning algorithms, it is common to repeat experiments as parameters are altered or to gain a sufficient sample size. In this regard, to require human interaction every time an experiment is restarted is undesirable, particularly when the expense in doing so can be considerable. Additionally, reusing the same people for the experiment introduces bias, as they will learn the behaviour of the agent and the dynamics of the environment. This paper presents a methodology for evaluating interactive reinforcement learning agents by employing simulated users. Simulated users allow human knowledge, bias, and interaction to be simulated. The use of simulated users allows the development and testing of reinforcement learning agents, and can provide indicative results of agent performance under defined human constraints. While simulated users are no replacement for actual humans, they do offer an affordable and fast alternative for evaluative assisted agents. We introduce a method for performing a preliminary evaluation utilising simulated users to show how performance changes depending on the type of user assisting the agent. Moreover, we describe how human interaction may be simulated, and present an experiment illustrating the applicability of simulating users in evaluating agent performance when assisted by different types of trainers. Experimental results show that the use of this methodology allows for greater insight into the performance of interactive reinforcement learning agents when advised by different users. The use of simulated users with varying characteristics allows for evaluation of the impact of those characteristics on the behaviour of the learning agent.

Download Full-text