Energy Management of Hybrid UAV Based on Reinforcement Learning

Huan Shen; Yao Zhang; Jianguo Mao; Zhiwei Yan; Linwei Wu

doi:10.3390/electronics10161929

Electronics ◽

2021 ◽

Vol 10 (16) ◽

pp. 1929

Keyword(s):

Reinforcement Learning ◽

State Space ◽

Energy Management ◽

Internal Combustion Engines ◽

Value Function ◽

Learning Algorithm ◽

The State ◽

Combustion Engines ◽

State Action ◽

Action Value

In order to solve the flight time problem of Unmanned Aerial Vehicles (UAV), this paper proposes a set of energy management strategies based on reinforcement learning for a hybrid agricultural UAV. The battery is used to keep the internal combustion engine at its optimal working point to the greatest extent possible, while also meeting the high power demand of the UAV and compensating for the limited response of the internal combustion engine. First, a decision-making-oriented hybrid model and a UAV dynamic model are established. Owing to the characteristics of an energy management strategy (EMS) based on reinforcement learning (RL), an intelligent optimization approach that has emerged in recent years, complex theoretical derivations are avoided in the modeling process. For the EMS, a double Q-learning algorithm with strong convergence is adopted. The algorithm separates the state-action value table used to derive decisions from the state-action value table that is updated as a result of each decision, so as to avoid the delay and oscillation in convergence caused by maximization bias. After this improvement, offline training is carried out with a large amount of previously generated flight data. Simulation results demonstrate that, by virtue of the search function strategy proposed in this paper, the improved algorithm achieves better performance at a lower learning cost than before. For the state space, time-based and remaining-fuel-based selections are carried out in turn, and their convergence rates and application effects are compared and analyzed. The results show that, with an appropriately selected state space, the learning algorithm achieves stronger robustness and faster convergence under different types of operating cycles. After 120,000 training cycles, the fuel economy of the improved algorithm reaches more than 90% of that of the optimal solution, and the algorithm performs stably in actual flight.
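Separating the table that selects an action from the table that is updated is the core idea of tabular double Q-learning (van Hasselt, 2010). The sketch below shows that update as it is commonly formulated. It is an illustration under assumptions, not the authors' implementation: states and actions are hypothetical discretized values (for example a time-step or remaining-fuel bin paired with a discrete engine power level), the reward would be something like negative fuel consumption, and the paper's search function strategy is not reproduced.

```python
import random
from collections import defaultdict


class DoubleQAgent:
    """Tabular double Q-learning with two value tables, q_a and q_b.

    States and actions are any hashable values; the encoding used here is a
    hypothetical stand-in for the paper's (unspecified) discretization.
    """

    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q_a = defaultdict(float)  # first table, keyed by (state, action)
        self.q_b = defaultdict(float)  # second table, keyed by (state, action)
        self.rng = random.Random(seed)

    def act(self, state):
        # Epsilon-greedy selection on the sum of both tables.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions,
                   key=lambda a: self.q_a[(state, a)] + self.q_b[(state, a)])

    def update(self, state, action, reward, next_state, done):
        # Flip a coin: one table picks the greedy next action, the other
        # evaluates it, which counters the maximization (overestimation) bias.
        if self.rng.random() < 0.5:
            select, evaluate = self.q_a, self.q_b
        else:
            select, evaluate = self.q_b, self.q_a
        if done:
            target = reward
        else:
            best = max(self.actions, key=lambda a: select[(next_state, a)])
            target = reward + self.gamma * evaluate[(next_state, best)]
        key = (state, action)
        select[key] += self.alpha * (target - select[key])
```

Because each transition updates only one table, the two estimates stay decorrelated; acting on their sum uses both without reintroducing the bias.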

A solution for the Elevators Group Dispatch by Multiagent Reinforcement Learning

10.5753/eniac.2019.9322 ◽

2019 ◽

Author(s):

Jordão Memória ◽

José Maia

Keyword(s):

Reinforcement Learning ◽

Function Approximation ◽

Value Function ◽

The State ◽

Evaluation Function ◽

State Action ◽

Traffic Pattern ◽

Multiagent Reinforcement Learning ◽

Multi Agent ◽

Action Value

In this work, a model and an algorithm based on multiagent reinforcement learning are developed for the elevator group dispatch problem. The main advantage is that, combined with function approximation, this multiagent solution reduces the state space, allowing complex states to be handled by a synthesizing evaluation function. Each elevator is treated as an agent that must choose between two actions: answer or ignore a new call. Over a number of iterations, the agents learn the weights of an evaluation function that approximates the state-action value function. The performance of the solution, measured as average waiting time (AWT) while varying the traffic pattern, passenger flow, number of elevators, and number of floors, is comparable to that of other current proposals reported in the literature.
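A weighted evaluation function over two actions can be sketched as a linear approximation of the state-action value function, trained with semi-gradient Q-learning. This is a minimal illustration under assumptions: the feature vector (for example distance to the calling floor or passengers on board), the reward (for example negative waiting time), and the learning rule are hypothetical choices, not details given in the abstract.

```python
ACTIONS = ("answer", "ignore")


def dot(w, phi):
    """Inner product of a weight vector and a feature vector."""
    return sum(wi * xi for wi, xi in zip(w, phi))


class ElevatorAgent:
    """One elevator: Q(s, a) = w[a] . phi(s), with one weight vector per action.

    phi(s) is a hypothetical hand-made feature vector describing the state.
    """

    def __init__(self, n_features, alpha=0.01, gamma=0.9):
        self.w = [[0.0] * n_features for _ in ACTIONS]
        self.alpha, self.gamma = alpha, gamma

    def q(self, phi, a):
        return dot(self.w[a], phi)

    def greedy(self, phi):
        # Index into ACTIONS of the action with the highest estimated value.
        return max(range(len(ACTIONS)), key=lambda a: self.q(phi, a))

    def update(self, phi, a, reward, phi_next):
        # Semi-gradient Q-learning: the gradient of w[a] . phi w.r.t. w[a] is phi.
        best_next = max(self.q(phi_next, b) for b in range(len(ACTIONS)))
        td_error = reward + self.gamma * best_next - self.q(phi, a)
        self.w[a] = [wi + self.alpha * td_error * xi
                     for wi, xi in zip(self.w[a], phi)]
```

Only a handful of weights per agent are learned instead of a value per joint state, which is where the state-space reduction comes from.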

An Improved Reinforcement Learning Algorithm for Cooperative Behaviors of Mobile Robots

Journal of Control Science and Engineering ◽

10.1155/2014/270548 ◽

2014 ◽

Vol 2014 ◽

pp. 1-8 ◽

Author(s):

Yong Song ◽

Yibin Li ◽

Xiaoli Wang ◽

Xin Ma ◽

Jiuhong Ruan

Keyword(s):

Reinforcement Learning ◽

Mobile Robots ◽

Knowledge Sharing ◽

State Space ◽

Learning Algorithm ◽

The State ◽

Convergence Speed ◽

Exponential Increase ◽

Cooperative Behaviors ◽

Reinforcement Learning Algorithm

Reinforcement learning for multirobot systems becomes very slow as the number of robots increases, because the state space grows exponentially. A sequential Q-learning algorithm based on knowledge sharing is presented. First, a rule repository of robot behaviors is initialized during reinforcement learning. The mobile robots obtain the current environmental state through their sensors. This state is then matched against the database to determine whether a relevant behavior rule has already been stored. If such a rule exists, an action is chosen according to the stored knowledge and rules, and the matching weight is refined. Otherwise, a new rule is appended to the database. The robots learn in a given sequence and share the behavior database. We evaluate the algorithm on a multirobot following-surrounding task and find that the improved algorithm effectively accelerates convergence.
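The match-or-append loop with a shared database can be sketched as follows. This is a simplified illustration under assumptions: a "rule" is reduced to a discretized state mapped to per-action values, the robot interface (`sense`, `execute`) is hypothetical, and the "matching weight" refinement is approximated by a one-step Q-learning update.

```python
class SharedRuleBase:
    """Behavior-rule database shared by all robots (hypothetical simplification).

    Each rule maps a discretized environmental state to a value per action.
    """

    def __init__(self, actions):
        self.actions = list(actions)
        self.rules = {}  # state -> {action: value}

    def match(self, state):
        """Return the stored rule for `state`, appending a new one if none matches."""
        rule = self.rules.get(state)
        if rule is None:
            rule = {a: 0.0 for a in self.actions}
            self.rules[state] = rule
        return rule


def sequential_step(rule_base, robots, alpha=0.1, gamma=0.9):
    """Let each robot act and learn in a fixed order, all using one rule base.

    Robots are assumed to expose sense() -> state and
    execute(action) -> (reward, next_state); this interface is hypothetical.
    Action selection is greedy for brevity; exploration is omitted.
    """
    for robot in robots:
        state = robot.sense()
        rule = rule_base.match(state)
        action = max(rule, key=rule.get)
        reward, next_state = robot.execute(action)
        best_next = max(rule_base.match(next_state).values())
        # Refine the shared rule's value with a one-step Q-learning update.
        rule[action] += alpha * (reward + gamma * best_next - rule[action])
```

Because robots run in sequence, a later robot in the same step already sees the values refined by earlier ones, which is how sharing can speed up convergence.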

Boosting Offline Reinforcement Learning with Residual Generative Modeling

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/492 ◽

2021 ◽

Author(s):

Hua Wei ◽

Deheng Ye ◽

Zhao Liu ◽

Hao Wu ◽

Bo Yuan ◽

...

Keyword(s):

Reinforcement Learning ◽

Value Function ◽

Approximation Error ◽

The State ◽

Training Data ◽

Action Function ◽

Q Learning ◽

State Action ◽

Generative Modeling ◽

Benchmark Datasets

Offline reinforcement learning (RL) tries to learn a near-optimal policy from recorded offline experience without online exploration. Current offline RL research covers two parts: 1) generative modeling, i.e., approximating a policy using fixed data; and 2) learning the state-action value function. While most research focuses on the state-action value function, reducing the bootstrapping error in value function approximation induced by the distribution shift of the training data, the effects of error propagation in generative modeling have been neglected. In this paper, we analyze the error in generative modeling. We propose AQL (action-conditioned Q-learning), a residual generative model that reduces policy approximation error for offline RL. We show that our method learns more accurate policy approximations on different benchmark datasets. In addition, we show that the proposed offline RL method learns more competitive AI agents for complex control tasks in the multiplayer online battle arena (MOBA) game Honor of Kings.
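The two parts named above (a policy model fitted to fixed data, and a value function learned from the same data) can be illustrated with a tabular toy. This is explicitly not AQL, which uses a residual generative model: here a simple count-based behavior policy stands in for the generative model, and it restricts the bootstrap maximum to actions actually seen in the data, so out-of-distribution actions cannot inject bootstrapping error. All names and parameters are illustrative assumptions.

```python
from collections import Counter, defaultdict


def fit_behavior_policy(dataset):
    """Estimate pi_b(a | s) by counting actions in logged (s, a, r, s', done) tuples."""
    counts = defaultdict(Counter)
    for s, a, _, _, _ in dataset:
        counts[s][a] += 1
    return {s: {a: n / sum(c.values()) for a, n in c.items()}
            for s, c in counts.items()}


def offline_q_learning(dataset, gamma=0.9, alpha=0.5, epochs=200, min_prob=0.0):
    """Tabular Q-learning over a fixed dataset, with no new environment interaction.

    The bootstrap max ranges only over actions the behavior model gives
    probability > min_prob, so unseen actions never contribute value estimates.
    """
    behavior = fit_behavior_policy(dataset)
    q = defaultdict(float)
    for _ in range(epochs):
        for s, a, r, s_next, done in dataset:
            supported = [b for b, p in behavior.get(s_next, {}).items() if p > min_prob]
            if done or not supported:
                bootstrap = 0.0
            else:
                bootstrap = max(q[(s_next, b)] for b in supported)
            q[(s, a)] += alpha * (r + gamma * bootstrap - q[(s, a)])
    return q, behavior
```

Raising `min_prob` trusts the policy model more and the value function less; errors in that policy model then propagate directly into the learned values, which is the effect the abstract says has been neglected.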

Reinforcement Learning for Optimizing Driving Policies on Cruising Taxis Services

Sustainability ◽

10.3390/su12218883 ◽

2020 ◽

Vol 12 (21) ◽

pp. 8883

Author(s):

Kun Jin ◽

Wei Wang ◽

Xuedong Hua ◽

Wei Zhou

Keyword(s):

Reinforcement Learning ◽

Value Function ◽

State Action ◽

Future Reward ◽

Long Run ◽

Markov Decision ◽

Action Value ◽

Data Expansion ◽

Taking Action ◽

The Value Function

As a key element of urban transportation, taxi services provide residents with significant convenience and comfort in travel. In practice, however, their efficiency has been limited. Previous research mainly aimed to optimize policies through order dispatch on ride-hailing services, an approach that cannot be applied to cruising taxi services. This paper developed a reinforcement learning (RL) framework to optimize driving policies for cruising taxi services. First, we formulated drivers’ behaviours as a Markov decision process (MDP), taking into account the long-run influence of each action. An RL framework using dynamic programming and data expansion was employed to calculate the state-action value function. Using the value function, drivers can determine the best choice and quantify the expected future reward at a particular state. Using historical order data from Chengdu, we analysed the spatial distribution of the value function and demonstrated how the model can optimize driving policies. Finally, a realistic simulation of the on-demand platform was built. Compared with other benchmark methods, the new model performed better, increasing total revenue by up to 4.8% and answer rate by up to 6.2%, and decreasing waiting time by up to 27.27%.
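Computing a state-action value function by dynamic programming can be illustrated with backward induction on a toy cruising-taxi MDP. Everything below is an assumption for illustration: states are (time slot, zone), an action is the zone to cruise to, trips last one slot, waiting is modelled as a move to the same zone, and all inputs are hypothetical. The paper's data expansion step and its actual state and reward definitions are not reproduced.

```python
def taxi_values(p, fare, dest, moves, move_cost, horizon):
    """Backward-induction DP for a toy cruising-taxi MDP (all inputs hypothetical).

    p[t][z]    probability of getting an order in zone z during slot t
    fare[z]    average fare of an order starting in zone z
    dest[z]    destination zone of such an order (trips take one slot)
    moves[z]   zones reachable from z in one slot (include z to allow waiting)
    move_cost  cruising cost per slot, also charged for waiting in place
    Returns q[t][z][z2]: expected future revenue of cruising from z to z2 at slot t.
    """
    n = len(fare)
    v_idle = [[0.0] * n for _ in range(horizon + 1)]  # idle in zone z at slot t
    w = [[0.0] * n for _ in range(horizon + 1)]       # arriving in zone z at slot t
    q = [[{} for _ in range(n)] for _ in range(horizon)]
    for t in range(horizon - 1, -1, -1):
        for z in range(n):
            for z2 in moves[z]:
                q[t][z][z2] = -move_cost + w[t + 1][z2]
            v_idle[t][z] = max(q[t][z].values())
        for z in range(n):
            # Either an order is found (earn fare, end up idle at dest next slot)
            # or none is found and the driver chooses where to cruise from here.
            served = fare[z] + v_idle[t + 1][dest[z]]
            w[t][z] = p[t][z] * served + (1 - p[t][z]) * v_idle[t][z]
    return q
```

A driver idle in zone `z` at slot `t` would pick `max(q[t][z], key=q[t][z].get)`, which mirrors how the value function lets drivers "determine the best choice" at a given state.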

Adaptive reinforcement learning with active state-specific exploration for engagement maximization during simulated child-robot interaction

Paladyn Journal of Behavioral Robotics ◽

10.1515/pjbr-2018-0016 ◽

2018 ◽

Vol 9 (1) ◽

pp. 235-253 ◽

Author(s):

George Velentzas ◽

Theodore Tsitsimis ◽

Iñaki Rañó ◽

Costas Tzafestas ◽

Mehdi Khamassi

Keyword(s):

Reinforcement Learning ◽

State Space ◽

Learning Algorithm ◽

The State ◽

Active State ◽

Verbal Cues ◽

Assistive Robots ◽

Maze Navigation ◽

Educational Applications ◽

Global And Local

Using assistive robots for educational applications requires robots to be able to adapt their behavior specifically to each child with whom they interact. Among relevant signals, non-verbal cues such as the child’s gaze can provide the robot with important information about the child’s current engagement in the task, and about whether the robot should continue its current behavior. Here we propose a reinforcement learning algorithm extended with active state-specific exploration and show its applicability to child engagement maximization as well as to more classical tasks such as maze navigation. We first demonstrate its adaptive nature on a continuous maze problem, an enhancement of the classic grid world. There, parameterized actions enable the agent to learn single moves that run to the end of a corridor, similar to “options” but without explicit hierarchical representations. We then apply the algorithm to a series of simulated scenarios, such as an extended Tower of Hanoi task in which the robot should find the appropriate speed of movement for the interacting child, and a pointing task in which the robot should find the appropriate child-specific level of expressivity of action. We show that the algorithm copes with both global and local non-stationarities in the state space while preserving stable behavior in the stationary portions of the state space. Altogether, these results suggest a promising way to enable robot learning based on non-verbal cues despite the high degree of non-stationarity that can occur during interaction with children.
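One way to make exploration state-specific is to give each state its own softmax inverse temperature and adapt it from the gap between short-term and long-term average rewards received in that state. The sketch below shows that idea; whether the authors use exactly this rule is not stated in the abstract, so the update and all parameter names (`beta0`, `tau_short`, `tau_long`, `eta`) are assumptions.

```python
import math
import random


class StateSpecificSoftmax:
    """Softmax action selection with a separate inverse temperature per state.

    beta[s] rises when recent rewards in s exceed their long-run average
    (exploit more) and falls when they drop below it (explore more), so a
    local change in one state does not disturb exploration in other states.
    """

    def __init__(self, actions, beta0=1.0, tau_short=0.5, tau_long=0.05,
                 eta=1.0, beta_min=0.01, beta_max=20.0, seed=0):
        self.actions = list(actions)
        self.beta, self.r_short, self.r_long = {}, {}, {}
        self.beta0 = beta0
        self.tau_short, self.tau_long, self.eta = tau_short, tau_long, eta
        self.beta_min, self.beta_max = beta_min, beta_max
        self.rng = random.Random(seed)

    def probs(self, state, q_values):
        """Softmax probabilities for q_values (aligned with self.actions)."""
        b = self.beta.get(state, self.beta0)
        m = max(q_values)
        exps = [math.exp(b * (q - m)) for q in q_values]  # shift by max for stability
        total = sum(exps)
        return [e / total for e in exps]

    def choose(self, state, q_values):
        return self.rng.choices(self.actions, weights=self.probs(state, q_values))[0]

    def observe(self, state, reward):
        # Update short- and long-term reward averages for this state only.
        rs = self.r_short.get(state, reward)
        rl = self.r_long.get(state, reward)
        rs += self.tau_short * (reward - rs)
        rl += self.tau_long * (reward - rl)
        self.r_short[state], self.r_long[state] = rs, rl
        b = self.beta.get(state, self.beta0) + self.eta * (rs - rl)
        self.beta[state] = min(self.beta_max, max(self.beta_min, b))
```

Keeping the statistics keyed by state is what lets a non-stationary region (for example a child losing interest at one step of a task) trigger exploration there while the rest of the policy stays stable.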

Intelligent Energy Management Strategy Based on an Improved Reinforcement Learning Algorithm With Exploration Factor for a Plug-in PHEV

IEEE Transactions on Intelligent Transportation Systems ◽

10.1109/tits.2021.3085710 ◽

2021 ◽

pp. 1-11

Author(s):

Xinyou Lin ◽

Kuncheng Zhou ◽

Liping Mo ◽

Hailin Li

Keyword(s):

Reinforcement Learning ◽

Energy Management ◽

Management Strategy ◽

Learning Algorithm ◽

Energy Management Strategy ◽

Reinforcement Learning Algorithm

Solving flow-shop scheduling problem with a reinforcement learning algorithm that generalizes the value function with neural network

Alexandria Engineering Journal ◽

10.1016/j.aej.2021.01.030 ◽

2021 ◽

Vol 60 (3) ◽

pp. 2787-2800

Author(s):

Jianfeng Ren ◽

Chunming Ye ◽

Feng Yang

Keyword(s):

Neural Network ◽

Reinforcement Learning ◽

Value Function ◽

Flow Shop ◽

Learning Algorithm ◽

Flow Shop Scheduling ◽

Scheduling Problem ◽

Shop Scheduling ◽

The Value Function ◽

Reinforcement Learning Algorithm

Overview of engine oil replacement technologies in internal combustion engines of agricultural machinery

10.33920/pro-2-2009-01 ◽

2020 ◽

pp. 10-16

Author(s):

S.A. Belov ◽

I.V. Busin

Keyword(s):

Economic Efficiency ◽

Internal Combustion Engines ◽

Internal Combustion ◽

The State ◽

Engine Oil ◽

Combustion Engines ◽

Lubrication System ◽

Agricultural Machinery ◽

High Quality

The article reviews four existing technologies for replacing engine oil and a method for determining the oil's suitability, with the aim of improving economic efficiency. It is established that with condition-based replacement, the oil is replaced as needed according to defect indicators. This condition-based technology makes more complete use of the oil's service life. The replacement interval is determined by condition indicators, which are monitored by special sensors built into the engine lubrication system. However, this technology is difficult to apply because of the lack of high-quality devices for monitoring the condition of the oil while the engine is running.

Study and Investigation of Energy Management Techniques Used in Electric/Hybrid Electric Vehicles

Journal Européen des Systèmes Automatisés ◽

10.18280/jesa.540409 ◽

2021 ◽

Vol 54 (4) ◽

pp. 599-606

Author(s):

Punyavathi Ramineni ◽

Alagappan Pandian

Keyword(s):

Energy Management ◽

Electric Vehicles ◽

Internal Combustion Engines ◽

Hybrid Electric Vehicles ◽

Vital Role ◽

Combustion Engines ◽

Key Factor ◽

Hybrid Electric ◽

And Control ◽

Intelligent Controllers

Many pollution-related issues are arising from the use of conventional internal combustion engine (ICE) vehicles. Electric vehicles and hybrid electric vehicles (EVs/HEVs) are among the best solutions to the problems associated with ICE-based vehicles. EVs with a single energy source (SES) have not been fully successful, especially under transient driving conditions. EVs with multiple energy sources (MES) have been introduced to achieve better performance than SES vehicles by combining two sources, such as batteries, fuel cells, and ultracapacitors. In this context, energy management (EMNG) plays a vital role in sharing the load among the sources according to the EV's requirements. In MES-based EVs, the controller plays a significant role in the EMNG system because it is a key factor in improving vehicle efficiency. This article reviews several conventional and intelligent controllers and control algorithms for proper EMNG among the sources in an EV.
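As one example of a conventional controller that shares load between two sources, a first-order low-pass filter can assign the slowly varying part of the power demand to a battery and the fast transients to an ultracapacitor. This is a generic sketch, not a controller taken from the article; the function name and parameters are illustrative, and state-of-charge and power limits are deliberately omitted.

```python
def low_pass_split(power_demand, dt, time_constant):
    """Split a power-demand profile between a battery and an ultracapacitor.

    A discrete first-order low-pass filter gives the battery the slowly
    varying part of the demand; the ultracapacitor covers the residual
    (the fast transients). No state-of-charge or power limits are applied.
    """
    alpha = dt / (time_constant + dt)  # discrete first-order filter gain
    battery, ucap = [], []
    p_bat = power_demand[0] if power_demand else 0.0
    for p in power_demand:
        p_bat += alpha * (p - p_bat)
        battery.append(p_bat)
        ucap.append(p - p_bat)
    return battery, ucap
```

A larger `time_constant` smooths the battery current more and pushes more of each transient onto the ultracapacitor; a practical controller would add limits and a state-of-charge correction on top of this split.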

A Multi-Step Reinforcement Learning Algorithm

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.44-47.3611 ◽

2010 ◽

Vol 44-47 ◽

pp. 3611-3615 ◽

Author(s):

Zhi Cong Zhang ◽

Kai Shun Hu ◽

Hui Yu Huang ◽

Shuai Li ◽

Shao Yong Zhao

Keyword(s):

Reinforcement Learning ◽

Markov Decision Process ◽

Decision Process ◽

Large Scale ◽

Learning Algorithm ◽

Machine Learning Method ◽

Learning Method ◽

K Value ◽

Markov Decision ◽

Action Value

Reinforcement learning (RL) is a machine learning method based on state or action values that approximately solves large-scale Markov decision processes (MDPs) or semi-Markov decision processes (SMDPs). A multi-step RL algorithm called Sarsa(λ, k) is proposed, which is a compromise between Sarsa and Sarsa(λ). It is equivalent to Sarsa when k = 1 and to Sarsa(λ) when k is infinite. Sarsa(λ, k) adjusts its performance through the value of k. Two forms of Sarsa(λ, k), forward-view Sarsa(λ, k) and backward-view Sarsa(λ, k), are constructed and proved equivalent under offline updating.
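The abstract does not give the Sarsa(λ, k) update, but one natural forward-view target with the stated limits is a truncated λ-return that mixes the 1- to k-step Sarsa returns with weights (1 − λ)λ^(n−1) and puts the remaining weight λ^(k−1) on the k-step return. With k = 1 this is the one-step Sarsa target, and as k grows it approaches the Sarsa(λ) λ-return. The sketch below computes that target; treat the exact weighting as an assumption rather than the authors' definition.

```python
def n_step_return(rewards, q_next, n, gamma):
    """G^(n) = sum_{i<n} gamma^i * r_{t+i+1} + gamma^n * Q(s_{t+n}, a_{t+n}).

    rewards[i] is r_{t+i+1}; q_next[i] is Q(s_{t+i+1}, a_{t+i+1})
    (use 0.0 for a terminal state).
    """
    g = sum(gamma ** i * rewards[i] for i in range(n))
    return g + gamma ** n * q_next[n - 1]


def truncated_lambda_return(rewards, q_next, lam, k, gamma):
    """Forward-view target mixing the 1..k step Sarsa returns.

    Weights (1 - lam) * lam**(n - 1) for n < k and lam**(k - 1) for n = k
    sum to 1. k = 1 gives the one-step Sarsa target; larger k (on a long
    enough trajectory) approaches the Sarsa(lambda) lambda-return.
    """
    k = min(k, len(rewards))
    g = sum((1 - lam) * lam ** (n - 1) * n_step_return(rewards, q_next, n, gamma)
            for n in range(1, k))
    return g + lam ** (k - 1) * n_step_return(rewards, q_next, k, gamma)
```

Because only k future steps are needed, the target becomes available after a fixed delay of k steps, which is the kind of trade-off between Sarsa and Sarsa(λ) that the parameter k controls.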