Continuous reinforcement learning based ramp jump control for single-track two-wheeled robots

Author(s):  
Qingyuan Zheng ◽  
Duo Wang ◽  
Zhang Chen ◽  
Yiyong Sun ◽  
Bin Liang

Single-track two-wheeled robots have become an important research topic in recent years owing to their simple structure, low energy consumption, and ability to run on narrow roads. However, the ramp jump remains a challenging task. In this study, we propose a control method that enables a single-track two-wheeled robot to perform a ramp jump, using continuous-action reinforcement learning. We design a novel reward function, reduce the dimension of the action space, and train the controller with the deep deterministic policy gradient (DDPG) algorithm. Finally, we validate the control method in simulation experiments and successfully complete the ramp jump task. Simulation results show that the method is effective and outperforms control with a high-dimensional action space, reinforcement learning with a sparse reward function, and discrete-action reinforcement learning.
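The abstract does not spell out the reward terms, so the following minimal Python sketch only illustrates the general shape of a dense, shaped reward for a ramp-jump task; every term and coefficient here is a hypothetical stand-in, not the authors' function.

```python
def ramp_jump_reward(pitch, pitch_rate, height, landed_upright):
    """Hypothetical dense reward for one step of a ramp-jump episode.

    Penalizes attitude error in flight, rewards gained height, and
    pays a terminal bonus for landing upright.
    """
    attitude_cost = 0.5 * pitch ** 2 + 0.1 * pitch_rate ** 2
    height_bonus = 2.0 * max(height, 0.0)
    landing_bonus = 100.0 if landed_upright else 0.0
    return height_bonus + landing_bonus - attitude_cost
```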

Author(s):  
Feng Pan ◽  
Hong Bao

This paper proposes a new approach that uses reinforcement learning (RL) to train an agent to follow a preceding vehicle with human driving characteristics. Drawing on the idea of inverse reinforcement learning, we design the reward function of the RL model: the factors that must be weighed in vehicle following are vectorized into a reward vector, and the reward is defined as the inner product of this reward vector and a weight vector. Driving data from human drivers were collected and analyzed to obtain the true reward function. Because the state and action spaces are continuous, the RL model was trained with the deterministic policy gradient algorithm. We adjusted the weight vector of the reward function so that the value vector of the RL model progressively approached that of a human driver. After dozens of rounds of training, we selected the policy whose value vector was nearest to the human driver's and tested it in the PanoSim simulation environment. The results showed the desired performance: the agent follows the preceding vehicle safely and smoothly.
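Since the abstract defines the reward explicitly as an inner product of a reward (feature) vector and a weight vector, a minimal sketch is straightforward; the three features and the target gap below are illustrative assumptions, not the paper's actual factors.

```python
import numpy as np

DESIRED_GAP = 25.0  # metres; illustrative target following distance

def following_reward(gap, rel_speed, jerk, weights):
    # Reward vector: each entry scores one factor weighed in vehicle
    # following (gap error, relative speed, ride comfort) -- assumed here.
    features = np.array([-abs(gap - DESIRED_GAP), -abs(rel_speed), -abs(jerk)])
    # Reward = inner product of the reward vector and the weight vector,
    # matching the formulation in the abstract.
    return float(np.dot(features, weights))
```

Tuning `weights` so that the learned value vector approaches the human driver's is then the adjustment loop the abstract describes.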


2021 ◽  
Vol 103 (4) ◽  
Author(s):  
Bartomeu Rubí ◽  
Bernardo Morcego ◽  
Ramon Pérez

A deep reinforcement learning approach for solving the quadrotor path-following and obstacle-avoidance problem is proposed in this paper. The problem is solved with two agents: one for the path-following task and another for the obstacle-avoidance task. A novel structure is proposed in which the action computed by the obstacle-avoidance agent becomes part of the state of the path-following agent, as sketched below. Compared with traditional deep reinforcement learning approaches, the proposed method makes the training outcomes interpretable, is faster, and can be safely trained on the real quadrotor. Both agents implement the Deep Deterministic Policy Gradient algorithm. The path-following agent was developed in a previous work. The obstacle-avoidance agent uses the information provided by a low-cost LIDAR to detect obstacles around the vehicle. Since the LIDAR has a narrow field of view, an approach is developed for providing the agent with a memory of previously seen obstacles. A detailed description of the process of defining this agent's state vector, reward function, and action is given. The agents are programmed in python/tensorflow and are trained and tested on the RotorS/gazebo platform. Simulation results prove the validity of the proposed approach.
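A minimal sketch of the cascaded two-agent structure described above, with assumed agent interfaces (an `act` method) and state layouts:

```python
import numpy as np

def cascaded_step(oa_agent, pf_agent, lidar_state, path_state):
    """One control step: the obstacle-avoidance (OA) agent's action
    becomes part of the path-following (PF) agent's state, as in the
    structure proposed in the paper. Interfaces are assumptions."""
    oa_action = oa_agent.act(lidar_state)             # e.g., a commanded path deviation
    pf_state = np.concatenate([path_state, oa_action])
    return pf_agent.act(pf_state)                     # low-level tracking command
```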


2021 ◽  
Author(s):  
Xiaoliang Zheng ◽  
Gongping Wu

Robot intelligence comprises motion intelligence and cognitive intelligence. Targeting motion intelligence, a hierarchical reinforcement learning architecture that accounts for stochastic wind disturbance is proposed for the decision-making of an autonomously operating power line maintenance robot. The architecture uses prior information from mechanism knowledge and empirical data to improve the safety and efficiency of robot operation, and it jointly considers high-level policy selection and low-level motion control at the global and local levels under stochastic wind disturbance. First, the operation task is decomposed into three sub-policies: global obstacle avoidance, local approach, and local tightening, and each sub-policy is learned. Then, a master policy is learned to select the appropriate operation sub-policy in the current state, as sketched below. The dual deep Q-network algorithm is used for the master policy, while the deep deterministic policy gradient algorithm is used for the operation sub-policies. To improve training efficiency, the global obstacle-avoidance sub-policy uses a random forest composed of dynamic environmental decision trees as the expert algorithm for imitation learning. The architecture is applied to a power line maintenance scenario, the state function and reward function of each policy are designed, and all policies are trained in an asynchronous, parallel computing environment. The results show that this architecture achieves stable and safe autonomous operating decisions for a power line maintenance robot subjected to stochastic wind disturbance.
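A minimal sketch of the two-level decision loop: the master policy (a Q-network in the paper) scores the three sub-policies, and the selected one (trained with DDPG in the paper) emits the continuous motion command. All interfaces here are assumptions.

```python
import numpy as np

SUB_POLICY_NAMES = ["global_obstacle_avoidance", "local_approach", "local_tightening"]

def hierarchical_step(master_q, sub_policies, state):
    """Select an operation sub-policy with the master Q-function, then
    delegate motion control to it. `master_q` returns one Q-value per
    sub-policy; each sub-policy exposes an `act` method (both assumed)."""
    q_values = master_q(state)
    chosen = int(np.argmax(q_values))
    action = sub_policies[chosen].act(state)
    return action, SUB_POLICY_NAMES[chosen]
```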


2021 ◽  
Vol 9 ◽  
Author(s):  
Jiawen Li ◽  
Yaping Li ◽  
Tao Yu

To improve the stability of the proton exchange membrane fuel cell (PEMFC) output voltage, this paper proposes a data-driven output voltage control strategy based on regulating the duty cycle of the DC-DC converter. Specifically, an imitation-oriented twin-delayed deep deterministic (IO-TD3) policy gradient algorithm is presented, which offers a more robust voltage control strategy. The proposed output voltage control method is a distributed deep reinforcement learning training framework whose design is guided by the pedagogic concept of imitation learning. The effectiveness of the proposed control strategy is demonstrated experimentally.
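The abstract names a twin-delayed (TD3-style) algorithm without giving details; the sketch below shows only the generic TD3 target computation, the part that distinguishes it from plain DDPG, with hypothetical target actor/critic callables and a normalized duty-cycle action. It is not the IO-TD3 design itself.

```python
import numpy as np

def td3_target(reward, next_state, actor_t, critic1_t, critic2_t,
               gamma=0.99, noise_std=0.2, noise_clip=0.5):
    """Generic TD3 target: smooth the target action with clipped noise,
    then take the minimum of two target critics to curb overestimation.
    Callables and action bounds are assumptions."""
    noise = np.clip(np.random.normal(0.0, noise_std), -noise_clip, noise_clip)
    next_action = np.clip(actor_t(next_state) + noise, -1.0, 1.0)  # normalized duty cycle
    q_min = min(critic1_t(next_state, next_action),
                critic2_t(next_state, next_action))
    return reward + gamma * q_min
```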


2019 ◽  
Vol 2019 ◽  
pp. 1-8
Author(s):  
Xi-liang Chen ◽  
Lei Cao ◽  
Zhi-xiong Xu ◽  
Jun Lai ◽  
Chen-xi Li

The core assumption of inverse reinforcement learning (IRL) is that demonstrations come from an agent acting optimally in an environment. In the past, most IRL work needed to compute optimal policies for many candidate reward functions, a requirement that is difficult to satisfy in tasks with large or continuous state spaces, let alone continuous action spaces. We propose a continuous maximum entropy deep inverse reinforcement learning algorithm for continuous state and action spaces, which achieves a deep understanding of the environment model by reconstructing the reward function from demonstrations, together with a demonstration-based hot start mechanism that makes training faster and better, as sketched below. We compare this new approach with well-known baselines, including Maximum Entropy IRL, DDPG, and hot-start DDPG. Empirical results on the classical OpenAI Gym control environment MountainCarContinuous-v0 show that our approach learns policies faster and better.
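A minimal sketch of the hot-start idea, under the assumption (not stated in the abstract) that it amounts to seeding the learner's replay buffer with demonstration transitions before training begins:

```python
from collections import deque

def hot_start_buffer(demonstrations, capacity=100_000):
    """Pre-fill a DDPG-style replay buffer with expert transitions so
    that early updates are driven by demonstration data. The
    (state, action, reward, next_state, done) format is an assumption."""
    buffer = deque(maxlen=capacity)
    for transition in demonstrations:
        buffer.append(transition)
    return buffer
```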


2021 ◽  
Vol 11 (20) ◽  
pp. 9367
Author(s):  
Usman Ahmad Usmani ◽  
Junzo Watada ◽  
Jafreezal Jaafar ◽  
Izzatdin Abdul Aziz ◽  
Arunava Roy

Skin cancers are increasing at an alarming rate, and detection in the early stages is essential for advanced treatment. Current segmentation methods struggle to match the ground-truth labels because of the numerous noisy expert annotations present in the datasets, yet precise boundary segmentation is essential to correctly locate and diagnose the various skin lesions. In this work, lesion segmentation is formulated as a Markov decision process, solved by training an agent to segment the region with a deep reinforcement learning algorithm, in a manner similar to a physician delineating a region of interest. The agent follows a series of actions to delineate the region, and the action space is defined as a set of continuous action parameters. The segmentation model learns in this continuous action space using the deep deterministic policy gradient algorithm, which enables continuous improvement from coarse segmentation results to finer ones, as sketched below. Finally, the proposed model is evaluated on the International Skin Imaging Collaboration (ISIC) 2017 image dataset, the Human Against Machine (HAM10000) dataset, and the PH2 dataset. On the ISIC 2017 dataset, the algorithm achieves an accuracy of 96.33% for naevus cases, 95.39% for melanoma cases, and 94.27% for seborrheic keratosis cases. The other metrics evaluated on these datasets also rank higher than those of current state-of-the-art lesion segmentation algorithms.
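A minimal sketch of one delineation step under the MDP framing, assuming (the abstract does not specify) that the continuous action parameters displace the current boundary points:

```python
import numpy as np

def delineation_step(agent, image_patch, boundary_points):
    """One refinement step: the agent observes the image region and the
    current contour and emits continuous displacements that move the
    coarse contour toward a finer one. State/action layout is assumed."""
    state = np.concatenate([image_patch.ravel(), boundary_points.ravel()])
    displacements = agent.act(state).reshape(boundary_points.shape)
    return boundary_points + displacements
```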


2016 ◽  
Vol 10 (1) ◽  
pp. 69-79 ◽  
Author(s):  
Juan Yan ◽  
Huibin Yang

Self-balancing control is the basis for applications of two-wheeled robots. To improve the self-balancing of two-wheeled robots, we propose a hierarchical reinforcement learning algorithm for controlling their balance. After describing the subgoals of hierarchical reinforcement learning, we extract features for the subgoals, define a feature value vector and its corresponding weight vector, and propose a reward function augmented with a subgoal reward term, as sketched below. Finally, we give a hierarchical reinforcement learning algorithm for finding the optimal strategy. Simulation experiments show that the proposed algorithm converges faster than the traditional reinforcement learning algorithm, so the robots in our system achieve self-balance very quickly.
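A minimal sketch of a subgoal-augmented reward of the kind the abstract describes; the tilt-based features and the weight vector are illustrative assumptions:

```python
import numpy as np

def subgoal_features(tilt, tilt_rate, subgoal_tilt):
    # Hypothetical feature value vector: progress toward an intermediate
    # tilt angle on the way to upright balance, plus tilt-rate damping.
    return np.array([-abs(tilt - subgoal_tilt), -abs(tilt_rate)])

def balance_reward(base_reward, tilt, tilt_rate, subgoal_tilt, weights):
    # Total reward = environment reward + weighted subgoal reward,
    # mirroring the "additional subgoal reward" in the abstract.
    features = subgoal_features(tilt, tilt_rate, subgoal_tilt)
    return base_reward + float(np.dot(features, weights))
```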


Symmetry ◽  
2019 ◽  
Vol 11 (2) ◽  
pp. 290 ◽  
Author(s):  
SeungYoon Choi ◽  
Tuyen Le ◽  
Quang Nguyen ◽  
Md Layek ◽  
SeungGwan Lee ◽  
...  

In this paper, we propose a controller for a bicycle using the DDPG (Deep Deterministic Policy Gradient) algorithm, a state-of-the-art deep reinforcement learning algorithm. We use a reward function and a deep neural network to build the controller. With the proposed controller, a bicycle can not only balance stably but also travel to any specified location. We confirm that the DDPG controller outperforms baselines such as Normalized Advantage Function (NAF) and Proximal Policy Optimization (PPO). For the performance evaluation, we ran the proposed algorithm in various settings, with fixed and random speeds, start locations, and destination locations.
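For context, a minimal sketch of the kind of deterministic policy network DDPG trains (state in, bounded continuous action out); the sizes, single hidden layer, and action meaning are illustrative, not the paper's architecture:

```python
import numpy as np

class DDPGActor:
    """Toy deterministic policy: maps a state vector to a continuous
    action in [-1, 1] (e.g., a normalized handlebar torque)."""

    def __init__(self, state_dim, action_dim, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (state_dim, hidden))
        self.w2 = rng.normal(0.0, 0.1, (hidden, action_dim))

    def act(self, state):
        h = np.tanh(state @ self.w1)
        return np.tanh(h @ self.w2)  # squashed to the action bounds
```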


2019 ◽  
Vol 8 (3) ◽  
pp. 8527-8531

The Power consumption-Traffic aware-Improved Resource Intensity Aware Load balancing (PT-IRIAL) method was proposed to balance load in cloud computing by choosing the virtual machines (VMs) to migrate and the destination physical machines (PMs). In this paper, an Artificial Intelligence (AI) technique called Reinforcement Learning (RL) is introduced to determine the optimal time to migrate a selected VM to its selected destination PM. RL enables an agent to find the most appropriate time for VM migration based on resource utilization, power consumption, temperature, and traffic demand. RL is incorporated into the cloud environment by defining state and action spaces: the state space is obtained by computing the resource utilization, power consumption, temperature, and traffic of the selected VMs, while the action space consists of two actions, wait and migrate, learned through a reward function, as sketched below. Based on the learned policy, each selected VM either waits or migrates to its selected destination PM.
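Since the abstract does not name the exact RL algorithm, the sketch below uses plain tabular Q-learning over the two actions as an illustration; states are assumed to be discretized from the utilization, power, temperature, and traffic measurements:

```python
import numpy as np

WAIT, MIGRATE = 0, 1

def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One Q-learning update for the wait/migrate decision. `state` and
    `next_state` are discretized indices (an assumption); the reward
    would reflect utilization, power, temperature, and traffic."""
    best_next = np.max(q_table[next_state])
    q_table[state, action] += alpha * (reward + gamma * best_next
                                       - q_table[state, action])
    return q_table
```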

