An Improved Reinforcement Learning Algorithm for Cooperative Behaviors of Mobile Robots

2014 ◽  
Vol 2014 ◽  
pp. 1-8 ◽  
Author(s):  
Yong Song ◽  
Yibin Li ◽  
Xiaoli Wang ◽  
Xin Ma ◽  
Jiuhong Ruan

Reinforcement learning for multirobot systems becomes very slow as the number of robots grows, because the state space increases exponentially. A sequential Q-learning algorithm based on knowledge sharing is presented. The rule repository of robot behaviors is first initialized in the reinforcement learning process. Mobile robots obtain the current environmental state from their sensors, and the state is matched against the repository to determine whether a relevant behavior rule has already been stored. If the rule is present, an action is chosen according to the stored knowledge and rules, and the matching weight is refined; otherwise, the new rule is appended to the repository. The robots learn according to a given sequence and share the behavior database. We evaluate the algorithm on a multirobot following-surrounding behavior task and find that the improved algorithm effectively accelerates convergence.
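A minimal sketch of the rule-repository step described above, assuming discrete, hashable states and a user-supplied environment step function. The names (RuleRepository, sequential_q_step, step_fn) are illustrative, not from the paper; in the described scheme each robot in the given learning sequence would call the step function against the same shared repository.

```python
import random

class RuleRepository:
    """Shared behavior-rule database: state -> {"q": {action: value}, "weight": matching weight}."""
    def __init__(self):
        self.rules = {}

    def match(self, state):
        return self.rules.get(state)

    def append(self, state, actions):
        self.rules[state] = {"q": {a: 0.0 for a in actions}, "weight": 1.0}
        return self.rules[state]

def sequential_q_step(repo, state, actions, step_fn, alpha=0.1, gamma=0.9, epsilon=0.1):
    """One learning step of one robot against the shared repository."""
    rule = repo.match(state)
    if rule is None:                       # no stored rule for this state: add a new one
        rule = repo.append(state, actions)
    if random.random() < epsilon:          # otherwise act on the stored knowledge
        action = random.choice(actions)
    else:
        action = max(rule["q"], key=rule["q"].get)
    next_state, reward = step_fn(state, action)
    next_rule = repo.match(next_state) or repo.append(next_state, actions)
    best_next = max(next_rule["q"].values())
    # standard Q-learning backup applied to the shared rule entry
    rule["q"][action] += alpha * (reward + gamma * best_next - rule["q"][action])
    rule["weight"] += 1.0                  # refine the matching weight of the rule
    return next_state
```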

2012 ◽  
Vol 588-589 ◽  
pp. 1515-1518
Author(s):  
Yong Song ◽  
Bing Liu ◽  
Yi Bin Li

Reinforcement learning for multi-robot systems may become very slow as the number of robots grows, because the state space increases exponentially. A sequential Q-learning algorithm based on knowledge sharing is presented. The rule repository of robot behaviors is first initialized in the reinforcement learning process. Mobile robots obtain the current environmental state from their sensors, and the state is matched against the repository to determine whether a relevant behavior rule has already been stored in the database. If the rule is present, an action is chosen according to the stored knowledge and rules, and the matching weight is refined; otherwise, the new rule is added to the database. The robots learn according to a given sequence and share the behavior database. We evaluate the algorithm on a multi-robot following-surrounding behavior task and find that the improved algorithm effectively accelerates convergence.


Symmetry ◽  
2022 ◽  
Vol 14 (1) ◽  
pp. 132
Author(s):  
Jianfeng Zheng ◽  
Shuren Mao ◽  
Zhenyu Wu ◽  
Pengcheng Kong ◽  
Hao Qiang

To address the poor exploration ability and slow convergence of traditional deep reinforcement learning in the navigation task of a patrol robot following specified indoor routes, this paper proposes an improved deep reinforcement learning algorithm based on Pan/Tilt/Zoom (PTZ) image information. The obtained symmetric image information and target position information are taken as the input of the network, the speed of the robot is taken as the output of the next action, and a bounded circular route is used as the test case. An improved reward and punishment function is designed to speed up convergence and optimize the path, so that the robot prioritizes obstacle avoidance and plans a safer path. Compared with the Deep Q Network (DQN) algorithm, the improved algorithm converges about 40% faster and its loss function is more stable.
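A minimal sketch of a reward-and-punishment function of the kind the abstract describes (progress toward the target plus a safety penalty near obstacles). The weights, radii, and terminal rewards are illustrative assumptions, not the paper's values.

```python
import numpy as np

def shaped_reward(robot_pos, target_pos, prev_dist, obstacle_dists,
                  w_progress=1.0, w_safety=0.5, safe_radius=0.5,
                  goal_radius=0.2, collision_radius=0.1):
    """Return (reward, new_distance_to_target) for one control step."""
    dist = float(np.linalg.norm(np.asarray(target_pos) - np.asarray(robot_pos)))
    if dist < goal_radius:
        return 10.0, dist                              # reached the target
    nearest = min(obstacle_dists)
    if nearest < collision_radius:
        return -10.0, dist                             # collision: large punishment
    progress = prev_dist - dist                        # positive when moving toward the goal
    safety_penalty = max(0.0, safe_radius - nearest)   # grows as the robot enters the safe radius
    return w_progress * progress - w_safety * safety_penalty, dist
```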


2018 ◽  
Vol 9 (1) ◽  
pp. 235-253 ◽  
Author(s):  
George Velentzas ◽  
Theodore Tsitsimis ◽  
Iñaki Rañó ◽  
Costas Tzafestas ◽  
Mehdi Khamassi

Using assistive robots for educational applications requires robots to adapt their behavior specifically to each child with whom they interact. Among relevant signals, non-verbal cues such as the child's gaze can provide the robot with important information about the child's current engagement in the task, and whether the robot should continue its current behavior or not. Here we propose a reinforcement learning algorithm extended with active state-specific exploration and show its applicability to child engagement maximization as well as to more classical tasks such as maze navigation. We first demonstrate its adaptive nature on a continuous maze problem, an enhancement of the classic grid world. There, parameterized actions enable the agent to learn single moves to the end of a corridor, similarly to "options" but without explicit hierarchical representations. We then apply the algorithm to a series of simulated scenarios, such as an extended Tower of Hanoi in which the robot should find the appropriate speed of movement for the interacting child, and a pointing task in which the robot should find the child-specific appropriate level of expressivity of action. We show that the algorithm copes with both global and local non-stationarities in the state space while preserving stable behavior in the stationary portions of the state space. Altogether, these results suggest a promising way to enable robot learning from non-verbal cues under the high degree of non-stationarity that can occur during interaction with children.
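A minimal sketch of state-specific exploration in the spirit of the abstract: each state keeps its own softmax inverse temperature, which drops (more exploration) when the unsigned reward prediction error for that state is high, suggesting local non-stationarity, and rises (more exploitation) otherwise. The class name and the adaptation rule are illustrative assumptions, not the authors' exact meta-learning scheme.

```python
import numpy as np
from collections import defaultdict

class ActiveStateExplorer:
    def __init__(self, n_actions, alpha=0.1, gamma=0.95,
                 beta_min=1.0, beta_max=10.0, eta=0.2):
        self.n_actions = n_actions
        self.q = defaultdict(lambda: np.zeros(n_actions))
        self.beta = defaultdict(lambda: beta_max)       # per-state inverse temperature
        self.alpha, self.gamma, self.eta = alpha, gamma, eta
        self.beta_min, self.beta_max = beta_min, beta_max

    def act(self, state):
        # softmax policy with a state-specific inverse temperature
        prefs = self.beta[state] * self.q[state]
        p = np.exp(prefs - prefs.max())
        p /= p.sum()
        return int(np.random.choice(self.n_actions, p=p))

    def update(self, state, action, reward, next_state):
        target = reward + self.gamma * self.q[next_state].max()
        delta = target - self.q[state][action]
        self.q[state][action] += self.alpha * delta
        # large surprise -> lower beta (explore more in this state);
        # small surprise -> higher beta (exploit more)
        surprise = min(abs(delta), 1.0)
        goal_beta = (1.0 - surprise) * self.beta_max + surprise * self.beta_min
        self.beta[state] += self.eta * (goal_beta - self.beta[state])
```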


Electronics ◽  
2021 ◽  
Vol 10 (16) ◽  
pp. 1929
Author(s):  
Huan Shen ◽  
Yao Zhang ◽  
Jianguo Mao ◽  
Zhiwei Yan ◽  
Linwei Wu

To address the flight-endurance problem of Unmanned Aerial Vehicles (UAVs), this paper proposes a set of energy management strategies based on reinforcement learning for a hybrid agricultural UAV. The battery is used to optimize the operating point of the internal combustion engine as far as possible, while meeting the UAV's high power demand and compensating for the engine's slow response. Firstly, a decision-oriented hybrid powertrain model and a UAV dynamic model are established. Because the energy management strategy (EMS) is based on reinforcement learning (RL), an intelligent optimization approach that has emerged in recent years, complex theoretical formula derivation is avoided in the modeling process. For the EMS, a double Q-learning algorithm with strong convergence is adopted. The algorithm separates the state-action value table used for decision making from the state-action value table updated by those decisions, so as to avoid the delay and oscillation in the convergence process caused by maximization bias. After this improvement, offline training is carried out with a large amount of previously recorded flight data. The simulation results demonstrate that, owing to the search strategy proposed in this paper, the improved algorithm performs better with less learning cost than before. In the state space, time-based and residual-fuel-based selection are evaluated in turn, and their convergence rates and application effects are compared and analyzed. The results show that, with an appropriate choice of state space, the learning algorithm is more robust and converges faster under different types of operating cycles. After 120,000 training cycles, the fuel economy of the improved algorithm reaches more than 90% of that of the optimal solution, and it performs stably in actual flight.
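The double Q-learning update the abstract refers to can be sketched in tabular form as below: two value tables are kept, one selects the greedy next action and the other evaluates it, which suppresses the maximization bias of ordinary Q-learning. The state and action encodings of the hybrid powertrain (e.g., power split, remaining fuel or elapsed time) are not shown here and would be assumptions.

```python
import random
from collections import defaultdict

def double_q_update(q_a, q_b, state, action, reward, next_state, actions,
                    alpha=0.1, gamma=0.95):
    """One tabular double Q-learning backup for the transition (s, a, r, s')."""
    # Randomly decide which table is updated this step; the other one evaluates.
    if random.random() < 0.5:
        update_tab, eval_tab = q_a, q_b
    else:
        update_tab, eval_tab = q_b, q_a
    # The updated table selects the greedy next action ...
    best_next = max(actions, key=lambda a: update_tab[(next_state, a)])
    # ... and the other table evaluates it, avoiding the overestimated maximum.
    target = reward + gamma * eval_tab[(next_state, best_next)]
    update_tab[(state, action)] += alpha * (target - update_tab[(state, action)])

# Two separate value tables, as the abstract describes: one is consulted for
# decisions while the other provides the evaluation used in the backup.
q_a = defaultdict(float)
q_b = defaultdict(float)
```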

