Improved Q-Learning Algorithm Based on Approximate State Matching in Agricultural Plant Protection Environment

Entropy ◽  
2021 ◽  
Vol 23 (6) ◽  
pp. 737
Author(s):  
Fengjie Sun ◽  
Xianchang Wang ◽  
Rui Zhang

An Unmanned Aerial Vehicle (UAV) can greatly reduce the manpower required for agricultural plant protection tasks such as watering, sowing, and pesticide spraying. It is essential to develop a Decision-making Support System (DSS) for UAVs to help them choose the correct action in each state according to the policy. In an unknown environment, formulating rules for UAVs to help them choose actions is not applicable, and obtaining the optimal policy through reinforcement learning is a feasible solution. However, experiments show that existing reinforcement learning algorithms cannot obtain the optimal policy for a UAV in the agricultural plant protection environment. In this work we propose an improved Q-learning algorithm based on similar state matching, and we prove theoretically that, in the agricultural plant protection environment, a UAV following the policy learned by the proposed algorithm has a greater probability of choosing the optimal action than one following the policy learned by the classic Q-learning algorithm. The proposed algorithm is implemented and tested on evenly distributed datasets based on real UAV parameters and real farm information. The performance evaluation of the algorithm is discussed in detail. Experimental results show that the proposed algorithm can efficiently learn the optimal policy for UAVs in the agricultural plant protection environment.

Sensors ◽  
2019 ◽  
Vol 19 (18) ◽  
pp. 4055 ◽  
Author(s):  
Zhang ◽  
Wang ◽  
Liu ◽  
Chen

This research focuses on the adaptive navigation of maritime autonomous surface ships (MASSs) in an uncertain environment. To achieve intelligent obstacle avoidance of MASSs in a port, an autonomous navigation decision-making model based on hierarchical deep reinforcement learning is proposed. The model is mainly composed of two layers: the scene division layer and an autonomous navigation decision-making layer. The scene division layer mainly quantifies the sub-scenarios according to the International Regulations for Preventing Collisions at Sea (COLREG). This research divides the navigational situation of a ship into entities and attributes based on the ontology model and Protégé language. In the decision-making layer, we designed a deep Q-learning algorithm utilizing the environmental model, ship motion space, reward function, and search strategy to learn the environmental state in a quantized sub-scenario to train the navigation strategy. Finally, two sets of verification experiments of the deep reinforcement learning (DRL) and improved DRL algorithms were designed with Rizhao port as a study case. Moreover, the experimental data were analyzed in terms of the convergence trend, iterative path, and collision avoidance effect. The results indicate that the improved DRL algorithm could effectively improve the navigation safety and collision avoidance.


2020 ◽  
pp. 78-87
Author(s):  
Vijayalakshmi Anand ◽  
Chittaranjan Hota

Crowdsourcing is a model in which individuals or organizations receive services from a large group of Internet users, including ideas, finances, and the completion of complex tasks. Several crowdsourcing websites have failed due to a lack of user participation; hence, the success of a crowdsourcing platform depends on mass user participation. However, motivating users to participate in crowdsourcing platforms remains challenging. We propose a new approach, a reinforcement learning-based gamification method, to motivate users. Gamification has been a practical approach to engaging users in many fields, but it still needs improvement on crowdsourcing platforms. In this paper, the gamification approach is strengthened by a reinforcement learning algorithm. We have created an intelligent agent using a reinforcement learning algorithm (Q-learning). This agent suggests an optimal action plan that yields maximum reward points to users for their active participation in the crowdsourcing application. Its performance is also compared with the SARSA algorithm (on-policy learning), which is another reinforcement learning algorithm.
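
The abstract above contrasts Q-learning with SARSA. As a hedged illustration of that distinction (not the authors' implementation), the following minimal sketch shows the standard textbook update targets; the function names and the parameters alpha (learning rate) and gamma (discount factor) are assumptions introduced here for illustration.

```python
def q_learning_target(reward, next_q_values, gamma):
    """Off-policy target: bootstrap from the best action in the next state."""
    return reward + gamma * max(next_q_values)


def sarsa_target(reward, next_q_values, next_action, gamma):
    """On-policy target: bootstrap from the action actually taken next."""
    return reward + gamma * next_q_values[next_action]


def td_update(q_old, target, alpha):
    """Move the current estimate a fraction alpha toward the target."""
    return q_old + alpha * (target - q_old)
```

The two methods differ only in the target: Q-learning uses the maximum next-state value regardless of what the agent does next, while SARSA uses the value of the next action chosen by the current policy.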


Sensors ◽  
2021 ◽  
Vol 21 (6) ◽  
pp. 2233 ◽  
Author(s):  
Ke Li ◽  
Kun Zhang ◽  
Zhenchong Zhang ◽  
Zekun Liu ◽  
Shuai Hua ◽  
...  

Operating an unmanned aerial vehicle (UAV) safely and efficiently in an interactive environment is challenging. A large amount of research has been devoted to improving the intelligence of a UAV while it performs a mission, and finding an optimal maneuver decision-making policy has become one of the key issues in enabling UAV autonomy. In this paper, we propose a maneuver decision-making algorithm based on deep reinforcement learning, which generates efficient maneuvers for a UAV agent to execute an airdrop mission autonomously in an interactive environment. In particular, the training set of the learning algorithm is constructed using Prioritized Experience Replay, which can accelerate the convergence of decision network training. Extensive experimental results show that a desirable and effective maneuver decision-making policy can be found.
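
The abstract above mentions Prioritized Experience Replay without giving details. A minimal sketch of one common variant, proportional prioritization, is shown below; it assumes each stored transition carries a non-negative priority (typically the magnitude of its TD error). The function name, the exponent alpha, and its default value are assumptions for illustration and are not taken from the paper.

```python
import random


def sample_prioritized(buffer, priorities, batch_size, alpha=0.6, rng=None):
    """Sample transitions with probability proportional to priority ** alpha.

    alpha = 0 gives uniform sampling; larger alpha favors high-priority items.
    """
    if rng is None:
        rng = random.Random()
    weights = [p ** alpha for p in priorities]
    # random.choices samples with replacement according to the given weights.
    return rng.choices(buffer, weights=weights, k=batch_size)
```

Transitions with larger priorities are replayed more often, which is the mechanism usually credited with faster convergence; a full implementation would also apply importance-sampling corrections, which this sketch omits.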


Aerospace ◽  
2021 ◽  
Vol 8 (4) ◽  
pp. 113
Author(s):  
Pedro Andrade ◽  
Catarina Silva ◽  
Bernardete Ribeiro ◽  
Bruno F. Santos

This paper presents a Reinforcement Learning (RL) approach to optimize the long-term scheduling of maintenance for an aircraft fleet. The problem considers fleet status, maintenance capacity, and other maintenance constraints to schedule hangar checks for a specified time horizon. The checks are scheduled within an interval, and the goal is to schedule them as close as possible to their due date. In doing so, the number of checks is reduced, and the fleet availability increases. A Deep Q-learning algorithm is used to optimize the scheduling policy. The model is validated in a real scenario using maintenance data from 45 aircraft. The maintenance plan that is generated with our approach is compared with a previous study, which presented a Dynamic Programming (DP) based approach and airline estimations for the same period. The results show a reduction in the number of checks scheduled, which indicates the potential of RL in solving this problem. The adaptability of RL is also tested by introducing small disturbances in the initial conditions. After training the model with these simulated scenarios, the results show the robustness of the RL approach and its ability to generate efficient maintenance plans in only a few seconds.


2012 ◽  
Vol 566 ◽  
pp. 572-579
Author(s):  
Abdolkarim Niazi ◽  
Norizah Redzuan ◽  
Raja Ishak Raja Hamzah ◽  
Sara Esfandiari

In this paper, a new algorithm based on case-based reasoning and reinforcement learning (RL) is proposed to increase the convergence rate of reinforcement learning algorithms. RL algorithms are very useful for solving a wide variety of decision problems in which models are not available and correct decisions must be made in every state of the system, such as multi-agent systems, artificial control systems, robotics, and tool condition monitoring. In the proposed method, we investigate how to improve action selection in RL algorithms. A new combined model using case-based reasoning systems and a new optimized function is proposed to select the action, which led to an increase in the convergence rate of algorithms based on Q-learning. The algorithm was used to solve the problem of cooperative Markov games, one of the models of Markov-based multi-agent systems. The experimental results indicated that the proposed algorithms perform better than existing algorithms in terms of the speed and accuracy of reaching the optimal policy.


Electronics ◽  
2018 ◽  
Vol 7 (11) ◽  
pp. 279 ◽  
Author(s):  
Xianbing Zhang ◽  
Guoqing Liu ◽  
Chaojie Yang ◽  
Jiang Wu

With the development of information technology, the degree of intelligence in air combat is increasing, and the demand for automated intelligent decision-making systems is becoming more intense. Based on the characteristics of over-the-horizon air combat, this paper constructs an over-the-horizon air combat training environment, which includes aircraft modeling, air combat scene design, enemy aircraft strategy design, and reward and punishment signal design. To improve the efficiency with which the reinforcement learning algorithm explores the strategy space, this paper proposes a heuristic Q-Network method that integrates expert experience, using it as a heuristic signal to guide the search process. At the same time, heuristic exploration and random exploration are combined. For the over-the-horizon air combat maneuver decision problem, the heuristic Q-Network method is used to train the neural network model in the over-the-horizon air combat training environment. Through continuous interaction with the environment, self-learning of the air combat maneuver strategy is realized. The efficiency of the heuristic Q-Network method and the effectiveness of the air combat maneuver strategy are verified by simulation experiments.
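
The abstract above says heuristic exploration and random exploration are combined but does not say how. One plausible scheme, sketched below purely as an assumption, splits the exploration budget between an expert-suggested action and a uniformly random action, and otherwise acts greedily on the Q-values. The names select_action, expert_action, p_expert, and p_random are hypothetical and do not come from the paper.

```python
import random


def select_action(q_values, expert_action, p_expert, p_random, rng):
    """Choose an action index by mixing expert, random, and greedy choices.

    With probability p_expert follow the expert suggestion, with probability
    p_random pick uniformly at random, otherwise pick the greedy action.
    """
    u = rng.random()
    if u < p_expert:
        return expert_action
    if u < p_expert + p_random:
        return rng.randrange(len(q_values))
    # Greedy choice: index of the largest Q-value.
    return max(range(len(q_values)), key=q_values.__getitem__)
```

In such a scheme both probabilities would typically be decayed during training so that the agent relies increasingly on its learned Q-values.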


Author(s):  
Taichi Chujo ◽  
Kosei Nishida ◽  
Tatsushi Nishi

In modern large-scale fabrication, hundreds of vehicles are used for transportation. Since traffic conditions change rapidly, the routing of automated guided vehicles (AGVs) needs to adapt to those changes. We propose a conflict-free routing method for AGVs using reinforcement learning in dynamic transportation. An advantage of the proposed method is that a change in the state can be obtained as an evaluation function. Therefore, the action can be selected according to the states. A deadlock avoidance method for bidirectional transport systems is developed using reinforcement learning. The effectiveness of the proposed method is demonstrated by comparing its performance with that of the conventional Q-learning algorithm in computational results.


1995 ◽  
Vol 4 (1) ◽  
pp. 3-28 ◽  
Author(s):  
Mance E. Harmon ◽  
Leemon C. Baird ◽  
A. Harry Klopf

An application of reinforcement learning to a linear-quadratic differential game is presented. The reinforcement learning system uses a recently developed algorithm, the residual-gradient form of advantage updating. The game is a Markov decision process with continuous time, states, and actions, linear dynamics, and a quadratic cost function. The game consists of two players, a missile and a plane; the missile pursues the plane and the plane evades the missile. Although a missile and plane scenario was the chosen test bed, the reinforcement learning approach presented here is equally applicable to biologically based systems, such as a predator pursuing prey. The reinforcement learning algorithm for optimal control is modified for differential games to find the minimax point rather than the maximum. Simulation results are compared to the analytical solution, demonstrating that the simulated reinforcement learning system converges to the optimal answer. The performance of both the residual-gradient and non-residual-gradient forms of advantage updating and Q-learning is compared, demonstrating that advantage updating converges faster than Q-learning in all simulations. Advantage updating is also demonstrated to converge regardless of the time step duration; Q-learning is unable to converge as the time step duration grows small.


2020 ◽  
Author(s):  
Josias G. Batista ◽  
Felipe J. S. Vasconcelos ◽  
Kaio M. Ramos ◽  
Darielson A. Souza ◽  
José L. N. Silva

The use of industrial robots has grown over the years, making production systems more and more efficient and creating a need for efficient trajectory generation algorithms that optimize and, if possible, generate collision-free trajectories without interrupting the production process. This work presents the use of Reinforcement Learning (RL), based on the Q-Learning algorithm, for the trajectory generation of a robotic manipulator, and compares its use with and without the manipulator's kinematic constraints for generating collision-free trajectories. The simulation results are presented with respect to the efficiency of the algorithm and its use in trajectory generation; a comparison of the computational cost of using the constraints is also presented.


2009 ◽  
Vol 10 (4) ◽  
pp. 329-341 ◽  
Author(s):  
Aleksandras Vytautas Rutkauskas ◽  
Tomas Ramanauskas

In this paper we propose an artificial stock market model based on the interaction of heterogeneous agents whose forward-looking behaviour is driven by a reinforcement-learning algorithm combined with an evolutionary selection mechanism. We use the model to analyse market self-regulation abilities, market efficiency, and the determinants of emergent properties of the financial market. Distinctive and novel features of the model include a strong emphasis on the economic content of individual decision-making, application of the Q-learning algorithm to drive individual behaviour, and a rich market setup. In addition, a parallel version of the model is presented, which is based mainly on research into current changes in the market and on the search for newly emerged consistent patterns, and which has been used repeatedly in experiments searching for optimal decisions in various capital markets.

