scholarly journals Energy Management of Hybrid UAV Based on Reinforcement Learning

Electronics ◽  
2021 ◽  
Vol 10 (16) ◽  
pp. 1929
Author(s):  
Huan Shen ◽  
Yao Zhang ◽  
Jianguo Mao ◽  
Zhiwei Yan ◽  
Linwei Wu

In order to solve the flight time problem of Unmanned Aerial Vehicles (UAV), this paper proposes a set of energy management strategies based on reinforcement learning for hybrid agricultural UAV. The battery is used to optimize the working point of internal combustion engines to the greatest extent while solving the high power demand issues of UAV and the response problem of internal combustion engines. Firstly, the decision-making oriented hybrid model and UAV dynamic model are established. Owing to the characteristics of the energy management strategy (EMS) based on reinforcement learning (RL), which is an intelligent optimization algorithm that has emerged in recent years, the complex theoretical formula derivation is avoided in the modeling process. In terms of the EMS, a double Q learning algorithm with strong convergence is adopted. The algorithm separates the state action value function database used in derivation decisions and the state action value function-updated database brought by the decision, so as to avoid delay and shock within the convergence process caused by maximum deviation. After the improvement, the off-line training is carried out with a large number of flight data generated in the past. The simulation results demonstrate that the improved algorithm can show better performance with less learning cost than before by virtue of the search function strategy proposed in this paper. In the state space, time-based and residual fuel-based selection are carried out successively, and the convergence rate and application effect are compared and analyzed. The results show that the learning algorithm has stronger robustness and convergence speed due to the appropriate selection of state space under different types of operating cycles. After 120,000 cycles of training, the fuel economy of the improved algorithm in this paper can reach more than 90% of that of the optimal solution, and can perform stably in actual flight.


2019 ◽  
Author(s):  
Jordão Memória ◽  
José Maia

In this work, a modeling and algorithm based on multiagent reinforcement learning is developed for the problem of elevator group dispatch. The main advantage is that, along with the function approximation, this multi-agent solution leads to reduction of the state space, allowing complex states to be addressed with a synthesizing evaluation function. Each elevator is considered an agent that have to decide about two actions: answer or ignore the new call. With some iterations, the agents learn the weights of an evaluation function which approximate the state-action value function. The performance of solution (average waiting time - AWT), shown varying the traffic pattern, flow of people, number of elevators and number of floors, is comparable to other current proposals reported in the literature.



2014 ◽  
Vol 2014 ◽  
pp. 1-8 ◽  
Author(s):  
Yong Song ◽  
Yibin Li ◽  
Xiaoli Wang ◽  
Xin Ma ◽  
Jiuhong Ruan

Reinforcement learning algorithm for multirobot will become very slow when the number of robots is increasing resulting in an exponential increase of state space. A sequentialQ-learning based on knowledge sharing is presented. The rule repository of robots behaviors is firstly initialized in the process of reinforcement learning. Mobile robots obtain present environmental state by sensors. Then the state will be matched to determine if the relevant behavior rule has been stored in the database. If the rule is present, an action will be chosen in accordance with the knowledge and the rules, and the matching weight will be refined. Otherwise the new rule will be appended to the database. The robots learn according to a given sequence and share the behavior database. We examine the algorithm by multirobot following-surrounding behavior, and find that the improved algorithm can effectively accelerate the convergence speed.



Author(s):  
Hua Wei ◽  
Deheng Ye ◽  
Zhao Liu ◽  
Hao Wu ◽  
Bo Yuan ◽  
...  

Offline reinforcement learning (RL) tries to learn the near-optimal policy with recorded offline experience without online exploration.Current offline RL research includes: 1) generative modeling, i.e., approximating a policy using fixed data; and 2) learning the state-action value function. While most research focuses on the state-action function part through reducing the bootstrapping error in value function approximation induced by the distribution shift of training data, the effects of error propagation in generative modeling have been neglected. In this paper, we analyze the error in generative modeling. We propose AQL (action-conditioned Q-learning), a residual generative model to reduce policy approximation error for offline RL. We show that our method can learn more accurate policy approximations in different benchmark datasets. In addition, we show that the proposed offline RL method can learn more competitive AI agents in complex control tasks under the multiplayer online battle arena (MOBA) game, Honor of Kings.



2020 ◽  
Vol 12 (21) ◽  
pp. 8883
Author(s):  
Kun Jin ◽  
Wei Wang ◽  
Xuedong Hua ◽  
Wei Zhou

As the key element of urban transportation, taxis services significantly provide convenience and comfort for residents’ travel. However, the reality has not shown much efficiency. Previous researchers mainly aimed to optimize policies by order dispatch on ride-hailing services, which cannot be applied in cruising taxis services. This paper developed the reinforcement learning (RL) framework to optimize driving policies on cruising taxis services. Firstly, we formulated the drivers’ behaviours as the Markov decision process (MDP) progress, considering the influences after taking action in the long run. The RL framework using dynamic programming and data expansion was employed to calculate the state-action value function. Following the value function, drivers can determine the best choice and then quantify the expected future reward at a particular state. By utilizing historic orders data in Chengdu, we analysed the function value’s spatial distribution and demonstrated how the model could optimize the driving policies. Finally, the realistic simulation of the on-demand platform was built. Compared with other benchmark methods, the results verified that the new model performs better in increasing total revenue, answer rate and decreasing waiting time, with the relative percentages of 4.8%, 6.2% and −27.27% at most.



2018 ◽  
Vol 9 (1) ◽  
pp. 235-253 ◽  
Author(s):  
George Velentzas ◽  
Theodore Tsitsimis ◽  
Iñaki Rañó ◽  
Costas Tzafestas ◽  
Mehdi Khamassi

Abstract Using assistive robots for educational applications requires robots to be able to adapt their behavior specifically for each child with whom they interact.Among relevant signals, non-verbal cues such as the child’s gaze can provide the robot with important information about the child’s current engagement in the task, and whether the robot should continue its current behavior or not. Here we propose a reinforcement learning algorithm extended with active state-specific exploration and show its applicability to child engagement maximization as well as more classical tasks such as maze navigation. We first demonstrate its adaptive nature on a continuous maze problem as an enhancement of the classic grid world. There, parameterized actions enable the agent to learn single moves until the end of a corridor, similarly to “options” but without explicit hierarchical representations.We then apply the algorithm to a series of simulated scenarios, such as an extended Tower of Hanoi where the robot should find the appropriate speed of movement for the interacting child, and to a pointing task where the robot should find the child-specific appropriate level of expressivity of action. We show that the algorithm enables to cope with both global and local non-stationarities in the state space while preserving a stable behavior in other stationary portions of the state space. Altogether, these results suggest a promising way to enable robot learning based on non-verbal cues and the high degree of non-stationarities that can occur during interaction with children.



2020 ◽  
pp. 10-16
Author(s):  
S.A. Belov ◽  
I.V. Busin

The article reviews four existing technologies for replacing engine oil and a method for determining its suitability for improving economic efficiency. It is established that the oil is replaced according to the need in accordance with the defect indicators. This technology of oil condition is characterized by a more complete use of its resource. The frequency of replacement is determined by the indicators of condition, which is monitored by special sensors built into the engine lubrication system. However, the difficulty of using this technology is due to the lack of high-quality devices for monitoring the state of running engine oil in the engine.



2021 ◽  
Vol 54 (4) ◽  
pp. 599-606
Author(s):  
Punyavathi Ramineni ◽  
Alagappan Pandian

Many pollution-related issues are raising due to the usage of conventional internal combustion engines (ICEs) vehicles. Electric Vehicles/ Hybrid electric vehicles (EVs/HEVs) are the finest solutions to overcome those problems associated with ICE-based vehicles. The EVs are introduced with a signal energy source (SES), which is not a successful attempt, especially during transient vehicles, driving, etc. Multiple energy sources (MES) EVs are introduced to attain better performance than the SES vehicles, which is obtained by combining two sources like battery/fuel cells, ultracapacitor. In this contest, energy management (EMNG) plays a vital role in sharing the load to the sources as per the EVs requirement. In the case of MES-based EVs, the controller always plays a significant role in the related EMNG system because it is the key factor in improving vehicle efficiency. In this article, a study has mainly been done related to several conventional, intelligent controllers and control algorithms to do the proper EMNG between sources present in the EV.



2010 ◽  
Vol 44-47 ◽  
pp. 3611-3615 ◽  
Author(s):  
Zhi Cong Zhang ◽  
Kai Shun Hu ◽  
Hui Yu Huang ◽  
Shuai Li ◽  
Shao Yong Zhao

Reinforcement learning (RL) is a state or action value based machine learning method which approximately solves large-scale Markov Decision Process (MDP) or Semi-Markov Decision Process (SMDP). A multi-step RL algorithm called Sarsa(,k) is proposed, which is a compromised variation of Sarsa and Sarsa(). It is equivalent to Sarsa if k is 1 and is equivalent to Sarsa() if k is infinite. Sarsa(,k) adjust its performance by setting k value. Two forms of Sarsa(,k), forward view Sarsa(,k) and backward view Sarsa(,k), are constructed and proved equivalent in off-line updating.



Sign in / Sign up

Export Citation Format

Share Document