An Improved Reinforcement Learning Algorithm for Cooperative Behaviors of Mobile Robots

2014 ◽  
Vol 2014 ◽  
pp. 1-8 ◽  
Author(s):  
Yong Song ◽  
Yibin Li ◽  
Xiaoli Wang ◽  
Xin Ma ◽  
Jiuhong Ruan

Reinforcement learning for multirobot systems becomes very slow as the number of robots grows, because the state space increases exponentially. A sequential Q-learning algorithm based on knowledge sharing is presented. The rule repository of robot behaviors is first initialized in the reinforcement learning process. Mobile robots obtain the current environmental state from their sensors, and the state is matched against the repository to determine whether a relevant behavior rule has already been stored. If the rule is present, an action is chosen according to the stored knowledge and rules, and the matching weight is refined; otherwise, the new rule is appended to the repository. The robots learn according to a given sequence and share the behavior database. We evaluate the algorithm on a multirobot following-surrounding behavior task and find that the improved algorithm effectively accelerates convergence.
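A minimal sketch of the rule-repository step described above, assuming discrete, hashable states and a user-supplied environment step function. The names (RuleRepository, sequential_q_step, step_fn) are illustrative, not from the paper; in the described scheme each robot in the given learning sequence would call the step function against the same shared repository.

```python
import random

class RuleRepository:
    """Shared behavior-rule database: state -> {"q": {action: value}, "weight": matching weight}."""
    def __init__(self):
        self.rules = {}

    def match(self, state):
        return self.rules.get(state)

    def append(self, state, actions):
        self.rules[state] = {"q": {a: 0.0 for a in actions}, "weight": 1.0}
        return self.rules[state]

def sequential_q_step(repo, state, actions, step_fn, alpha=0.1, gamma=0.9, epsilon=0.1):
    """One learning step of one robot against the shared repository."""
    rule = repo.match(state)
    if rule is None:                       # no stored rule for this state: add a new one
        rule = repo.append(state, actions)
    if random.random() < epsilon:          # otherwise act on the stored knowledge
        action = random.choice(actions)
    else:
        action = max(rule["q"], key=rule["q"].get)
    next_state, reward = step_fn(state, action)
    next_rule = repo.match(next_state) or repo.append(next_state, actions)
    best_next = max(next_rule["q"].values())
    # standard Q-learning backup applied to the shared rule entry
    rule["q"][action] += alpha * (reward + gamma * best_next - rule["q"][action])
    rule["weight"] += 1.0                  # refine the matching weight of the rule
    return next_state
```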

2012 ◽  
Vol 588-589 ◽  
pp. 1515-1518
Author(s):  
Yong Song ◽  
Bing Liu ◽  
Yi Bin Li

Reinforcement learning for multi-robot systems may become very slow as the number of robots grows, because the state space increases exponentially. A sequential Q-learning algorithm based on knowledge sharing is presented. The rule repository of robot behaviors is first initialized in the reinforcement learning process. Mobile robots obtain the current environmental state from their sensors, and the state is matched against the repository to determine whether a relevant behavior rule has already been stored in the database. If the rule is present, an action is chosen according to the stored knowledge and rules, and the matching weight is refined; otherwise, the new rule is added to the database. The robots learn according to a given sequence and share the behavior database. We evaluate the algorithm on a multi-robot following-surrounding behavior task and find that the improved algorithm effectively accelerates convergence.


Symmetry ◽  
2022 ◽  
Vol 14 (1) ◽  
pp. 132
Author(s):  
Jianfeng Zheng ◽  
Shuren Mao ◽  
Zhenyu Wu ◽  
Pengcheng Kong ◽  
Hao Qiang

To address the poor exploration ability and slow convergence of traditional deep reinforcement learning in the navigation task of a patrol robot following specified indoor routes, this paper proposes an improved deep reinforcement learning algorithm based on Pan/Tilt/Zoom (PTZ) image information. The obtained symmetric image information and target position information are taken as the input of the network, the speed of the robot is taken as the output of the next action, and a bounded circular route is used as the test case. An improved reward and punishment function is designed to speed up convergence and optimize the path, so that the robot prioritizes obstacle avoidance and plans a safer path. Compared with the Deep Q Network (DQN) algorithm, the improved algorithm converges about 40% faster and its loss function is more stable.
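A minimal sketch of a reward-and-punishment function of the kind the abstract describes (progress toward the target plus a safety penalty near obstacles). The weights, radii, and terminal rewards are illustrative assumptions, not the paper's values.

```python
import numpy as np

def shaped_reward(robot_pos, target_pos, prev_dist, obstacle_dists,
                  w_progress=1.0, w_safety=0.5, safe_radius=0.5,
                  goal_radius=0.2, collision_radius=0.1):
    """Return (reward, new_distance_to_target) for one control step."""
    dist = float(np.linalg.norm(np.asarray(target_pos) - np.asarray(robot_pos)))
    if dist < goal_radius:
        return 10.0, dist                              # reached the target
    nearest = min(obstacle_dists)
    if nearest < collision_radius:
        return -10.0, dist                             # collision: large punishment
    progress = prev_dist - dist                        # positive when moving toward the goal
    safety_penalty = max(0.0, safe_radius - nearest)   # grows as the robot enters the safe radius
    return w_progress * progress - w_safety * safety_penalty, dist
```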


2018 ◽  
Vol 9 (1) ◽  
pp. 235-253 ◽  
Author(s):  
George Velentzas ◽  
Theodore Tsitsimis ◽  
Iñaki Rañó ◽  
Costas Tzafestas ◽  
Mehdi Khamassi

Using assistive robots for educational applications requires robots to adapt their behavior specifically to each child with whom they interact. Among relevant signals, non-verbal cues such as the child's gaze can provide the robot with important information about the child's current engagement in the task, and whether the robot should continue its current behavior or not. Here we propose a reinforcement learning algorithm extended with active state-specific exploration and show its applicability to child engagement maximization as well as to more classical tasks such as maze navigation. We first demonstrate its adaptive nature on a continuous maze problem, an enhancement of the classic grid world. There, parameterized actions enable the agent to learn single moves to the end of a corridor, similarly to "options" but without explicit hierarchical representations. We then apply the algorithm to a series of simulated scenarios, such as an extended Tower of Hanoi in which the robot should find the appropriate speed of movement for the interacting child, and a pointing task in which the robot should find the child-specific appropriate level of expressivity of action. We show that the algorithm copes with both global and local non-stationarities in the state space while preserving stable behavior in the stationary portions of the state space. Altogether, these results suggest a promising way to enable robot learning from non-verbal cues under the high degree of non-stationarity that can occur during interaction with children.
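A minimal sketch of state-specific exploration in the spirit of the abstract: each state keeps its own softmax inverse temperature, which drops (more exploration) when the unsigned reward prediction error for that state is high, suggesting local non-stationarity, and rises (more exploitation) otherwise. The class name and the adaptation rule are illustrative assumptions, not the authors' exact meta-learning scheme.

```python
import numpy as np
from collections import defaultdict

class ActiveStateExplorer:
    def __init__(self, n_actions, alpha=0.1, gamma=0.95,
                 beta_min=1.0, beta_max=10.0, eta=0.2):
        self.n_actions = n_actions
        self.q = defaultdict(lambda: np.zeros(n_actions))
        self.beta = defaultdict(lambda: beta_max)       # per-state inverse temperature
        self.alpha, self.gamma, self.eta = alpha, gamma, eta
        self.beta_min, self.beta_max = beta_min, beta_max

    def act(self, state):
        # softmax policy with a state-specific inverse temperature
        prefs = self.beta[state] * self.q[state]
        p = np.exp(prefs - prefs.max())
        p /= p.sum()
        return int(np.random.choice(self.n_actions, p=p))

    def update(self, state, action, reward, next_state):
        target = reward + self.gamma * self.q[next_state].max()
        delta = target - self.q[state][action]
        self.q[state][action] += self.alpha * delta
        # large surprise -> lower beta (explore more in this state);
        # small surprise -> higher beta (exploit more)
        surprise = min(abs(delta), 1.0)
        goal_beta = (1.0 - surprise) * self.beta_max + surprise * self.beta_min
        self.beta[state] += self.eta * (goal_beta - self.beta[state])
```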


Electronics ◽  
2021 ◽  
Vol 10 (16) ◽  
pp. 1929
Author(s):  
Huan Shen ◽  
Yao Zhang ◽  
Jianguo Mao ◽  
Zhiwei Yan ◽  
Linwei Wu

To address the flight-endurance problem of Unmanned Aerial Vehicles (UAVs), this paper proposes a set of energy management strategies based on reinforcement learning for a hybrid agricultural UAV. The battery is used to optimize the operating point of the internal combustion engine as far as possible, while meeting the UAV's high power demand and compensating for the engine's slow response. Firstly, a decision-oriented hybrid powertrain model and a UAV dynamic model are established. Because the energy management strategy (EMS) is based on reinforcement learning (RL), an intelligent optimization approach that has emerged in recent years, complex theoretical formula derivation is avoided in the modeling process. For the EMS, a double Q-learning algorithm with strong convergence is adopted. The algorithm separates the state-action value table used for decision making from the state-action value table updated by those decisions, so as to avoid the delay and oscillation in the convergence process caused by maximization bias. After this improvement, offline training is carried out with a large amount of previously recorded flight data. The simulation results demonstrate that, owing to the search strategy proposed in this paper, the improved algorithm performs better with less learning cost than before. In the state space, time-based and residual-fuel-based selection are evaluated in turn, and their convergence rates and application effects are compared and analyzed. The results show that, with an appropriate choice of state space, the learning algorithm is more robust and converges faster under different types of operating cycles. After 120,000 training cycles, the fuel economy of the improved algorithm reaches more than 90% of that of the optimal solution, and it performs stably in actual flight.
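The double Q-learning update the abstract refers to can be sketched in tabular form as below: two value tables are kept, one selects the greedy next action and the other evaluates it, which suppresses the maximization bias of ordinary Q-learning. The state and action encodings of the hybrid powertrain (e.g., power split, remaining fuel or elapsed time) are not shown here and would be assumptions.

```python
import random
from collections import defaultdict

def double_q_update(q_a, q_b, state, action, reward, next_state, actions,
                    alpha=0.1, gamma=0.95):
    """One tabular double Q-learning backup for the transition (s, a, r, s')."""
    # Randomly decide which table is updated this step; the other one evaluates.
    if random.random() < 0.5:
        update_tab, eval_tab = q_a, q_b
    else:
        update_tab, eval_tab = q_b, q_a
    # The updated table selects the greedy next action ...
    best_next = max(actions, key=lambda a: update_tab[(next_state, a)])
    # ... and the other table evaluates it, avoiding the overestimated maximum.
    target = reward + gamma * eval_tab[(next_state, best_next)]
    update_tab[(state, action)] += alpha * (target - update_tab[(state, action)])

# Two separate value tables, as the abstract describes: one is consulted for
# decisions while the other provides the evaluation used in the backup.
q_a = defaultdict(float)
q_b = defaultdict(float)
```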

