Obtaining Human Experience for Intelligent Dredger Control: A Reinforcement Learning Approach

2019, Vol 9 (9), pp. 1769
Author(s): Changyun Wei, Fusheng Ni, Xiujing Chen

This work presents a reinforcement learning approach for intelligent decision-making of a Cutter Suction Dredger (CSD), which is a special type of vessel for deepening harbors, constructing ports or navigational channels, and reclaiming landfills. Currently, CSDs are usually controlled by human operators, and the production rate is mainly determined by the so-called cutting process (i.e., cutting the underwater soil into fragments). Long-term manual operation is likely to cause driving fatigue, resulting in operational accidents and inefficiencies. To reduce the labor intensity of the operator, we seek an intelligent controller that can manipulate the cutting process in place of human operators. To this end, our proposed reinforcement learning approach consists of two parts. In the first part, we employ a neural network model to construct a virtual environment based on historical dredging data. In the second part, we develop a reinforcement learning model that can learn the optimal control policy by interacting with the virtual environment to obtain human experience. The results show that the proposed learning approach can successfully imitate the dredging behavior of an experienced human operator. Moreover, the learning approach can outperform the operator by responding more quickly to changes in uncertain environments.
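
The abstract gives the two-part structure but not the concrete models, so here is a minimal sketch of that structure under our own assumptions: a hand-coded `virtual_env_step` stands in for the paper's neural-network environment fit on historical data, and a tabular Q-learning agent (rather than the paper's unspecified learner) trains entirely inside it. The soil states, swing-speed actions, and production-rate reward are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the paper's two parts. The "virtual environment" here is
# a hand-coded transition/reward function; in the paper it is a neural
# network fit on historical dredging logs. State: coarse soil-hardness level.
# Action: discrete swing-speed set-point. Reward: a toy production rate.
N_STATES, N_ACTIONS = 5, 3

def virtual_env_step(state, action):
    next_state = int(np.clip(state + rng.integers(-1, 2), 0, N_STATES - 1))
    reward = (action + 1) / (1.0 + abs(state - action))  # peaks when speed suits soil
    return next_state, reward

# Part two: a Q-learning agent trained entirely inside the virtual
# environment (the deep model in the paper plays the same role).
q = np.zeros((N_STATES, N_ACTIONS))
alpha, gamma, eps = 0.1, 0.95, 0.1
state = 2
for _ in range(20000):
    action = int(rng.integers(N_ACTIONS)) if rng.random() < eps else int(q[state].argmax())
    state_next, reward = virtual_env_step(state, action)
    q[state, action] += alpha * (reward + gamma * q[state_next].max() - q[state, action])
    state = state_next

print("greedy swing speed per soil state:", q.argmax(axis=1))
```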

2020, Vol 12 (1), pp. 8
Author(s): Ibrahim Ahmed, Marcos Quiñones-Grueiro, Gautam Biswas

Faults are endemic to all systems. Adaptive fault-tolerant control accepts degraded performance under faults in exchange for continued operation. In systems with abrupt faults and strict time constraints, it is imperative for control to adapt quickly to system changes. We present a meta-reinforcement learning approach that rapidly adapts the control policy. The approach builds upon model-agnostic meta-learning (MAML). The controller maintains a complement of prior policies learned under system faults. This "library" is evaluated on the system after a new fault to initialize the new policy. This contrasts with MAML, where the controller samples new policies from a distribution of similar systems at each update step to arrive at the new policy. Our approach improves the sample efficiency of the reinforcement learning process. We evaluate it on a model of fuel tanks under abrupt faults.
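
As a rough illustration of the library idea, assuming a toy quadratic return in place of a real closed-loop rollout, the sketch below scores each prior policy on the newly faulted system, warm-starts from the best one, and then adapts for a few gradient steps. Every name and dynamic here is hypothetical, not the paper's benchmark.

```python
import numpy as np

rng = np.random.default_rng(1)

# Each library entry is a policy parameter vector learned under a past fault;
# the quadratic "return" below is a stand-in for rolling the policy out on
# the faulted plant. Higher is better; the optimum sits at the fault params.
def episode_return(policy, fault):
    return -float(np.sum((policy - fault) ** 2))

library = [rng.normal(size=4) for _ in range(5)]   # policies from past faults
new_fault = rng.normal(size=4)                     # unseen abrupt fault

# Step 1: evaluate the whole library on the new fault, warm-start from best.
scores = [episode_return(p, new_fault) for p in library]
theta = library[int(np.argmax(scores))].copy()
init_score = episode_return(theta, new_fault)

# Step 2: few-shot adaptation from the warm start (analytic gradient of the
# toy return; a real controller would use policy-gradient estimates instead).
lr = 0.1
for _ in range(20):
    theta += lr * (-2.0 * (theta - new_fault))

print("return before/after adaptation:", init_score, episode_return(theta, new_fault))
```

Because the warm start already sits near a good policy for the new fault, the adaptation loop needs far fewer samples than learning from scratch, which is the sample-efficiency argument the abstract makes.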


Electronics, 2020, Vol 9 (10), pp. 1668
Author(s): Yuxiang Sun, Bo Yuan, Tao Zhang, Bojian Tang, Wanwen Zheng, ...

The reinforcement learning problem of complex action control in a multi-player wargame has been a hot research topic in recent years. In this paper, a game system based on turn-based confrontation is designed and implemented with state-of-the-art deep reinforcement learning models. Specifically, we first design a Q-learning algorithm to achieve intelligent decision-making, based on the DQN (Deep Q-Network), to model complex game behaviors. Then, a priori-knowledge-based algorithm, PK-DQN (Prior Knowledge-Deep Q-Network), is introduced to improve the DQN algorithm, accelerating its convergence and improving its stability. The experiments validate the PK-DQN algorithm and show that its performance surpasses that of the conventional DQN algorithm. Furthermore, the PK-DQN algorithm proves effective at defeating high-level rule-based opponents, which provides promising results for the exploration of smart chess and intelligent game deduction.
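
The abstract does not say how the prior knowledge enters PK-DQN, so the sketch below shows one plausible mechanism only: seeding the replay buffer with transitions from a rule-based player before the usual DQN loop begins, so early updates are driven by sensible moves. The tabular Q array stands in for the deep network, and the one-dimensional "board" is purely illustrative.

```python
import random
from collections import deque
import numpy as np

rng = np.random.default_rng(2)

N_STATES, N_ACTIONS = 8, 4               # toy board: reach cell 0 to score

def env_step(s, a):
    s2 = (s + a + 1) % N_STATES          # action a advances a+1 cells
    return s2, (1.0 if s2 == 0 else 0.0)

def rule_based_action(s):                # prior knowledge: head straight for 0
    return min(N_ACTIONS - 1, (N_STATES - s - 1) % N_STATES)

buffer = deque(maxlen=5000)
s = 3
for _ in range(500):                     # seed replay with rule-based play
    a = rule_based_action(s)
    s2, r = env_step(s, a)
    buffer.append((s, a, r, s2))
    s = s2

q = np.zeros((N_STATES, N_ACTIONS))      # stands in for the deep Q-network
alpha, gamma, eps = 0.1, 0.9, 0.1
for _ in range(5000):                    # ordinary DQN-style loop thereafter
    a = int(rng.integers(N_ACTIONS)) if rng.random() < eps else int(q[s].argmax())
    s2, r = env_step(s, a)
    buffer.append((s, a, r, s2))
    s = s2
    bs, ba, br, bs2 = random.choice(buffer)          # "minibatch" of size one
    q[bs, ba] += alpha * (br + gamma * q[bs2].max() - q[bs, ba])

print("greedy action per cell:", q.argmax(axis=1))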


Sensors, 2021, Vol 21 (3), pp. 1019
Author(s): Shengluo Yang, Zhigang Xu, Junyi Wang

Dynamic scheduling problems have been receiving increasing attention in recent years due to their practical implications. To realize real-time, intelligent decision-making in dynamic scheduling, we studied the dynamic permutation flowshop scheduling problem (PFSP) with new job arrivals using deep reinforcement learning (DRL). A system architecture for solving the dynamic PFSP using DRL is proposed, and a mathematical model to minimize total tardiness cost is established. Additionally, an intelligent scheduling system based on DRL is modeled, with its state features, actions, and reward designed. Moreover, the advantage actor-critic (A2C) algorithm is adapted to train the scheduling agent. The learning curve indicates that the scheduling agent learned to generate better solutions efficiently during training. Extensive experiments were carried out to compare the A2C-based scheduling agent with every single action, other DRL algorithms, and meta-heuristics. The results show that the A2C-based scheduling agent performs well in terms of solution quality, CPU time, and generalization. Notably, the trained agent generates a scheduling action in only 2.16 ms on average, which is almost instantaneous and suitable for real-time scheduling. Our work can help to build a self-learning, real-time-optimizing, intelligent scheduling system.
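
A minimal sketch of the design the abstract describes: actions are dispatching rules, the state is a coarse queue feature, and the reward is negative tardiness. The flowshop is collapsed to a single machine and the update is a Monte-Carlo flavor of advantage actor-critic rather than the paper's exact A2C setup; all names and numbers are our own.

```python
import numpy as np

rng = np.random.default_rng(3)

RULES = ["SPT", "LPT", "FIFO"]           # actions: candidate dispatching rules

def run_episode(logits, value, alpha_pi=0.05, alpha_v=0.1, gamma=0.99):
    # Toy single-machine stand-in for the flowshop: jobs are (proc, due_date).
    jobs = [(int(rng.integers(1, 10)), int(rng.integers(5, 40))) for _ in range(8)]
    t, traj = 0, []
    while jobs:
        state = min(len(jobs) - 1, 4)                 # coarse queue-length bucket
        probs = np.exp(logits[state]); probs /= probs.sum()
        a = int(rng.choice(len(RULES), p=probs))
        if RULES[a] == "SPT":
            jobs.sort(key=lambda j: j[0])
        elif RULES[a] == "LPT":
            jobs.sort(key=lambda j: -j[0])
        proc, due = jobs.pop(0)                       # FIFO: pop without sorting
        t += proc
        traj.append((state, a, -max(0, t - due)))     # reward: negative tardiness
    g = 0.0                                           # Monte-Carlo A2C-style update
    for state, a, r in reversed(traj):
        g = r + gamma * g
        adv = g - value[state]                        # advantage vs. learned baseline
        value[state] += alpha_v * adv
        probs = np.exp(logits[state]); probs /= probs.sum()
        grad = -probs; grad[a] += 1.0                 # grad of log-softmax policy
        logits[state] += alpha_pi * adv * grad

logits, value = np.zeros((5, len(RULES))), np.zeros(5)
for _ in range(2000):
    run_episode(logits, value)
print("preferred rule per queue bucket:", [RULES[i] for i in logits.argmax(axis=1)])
```

Acting by choosing among dispatching rules, as here, is what makes millisecond-scale decisions possible: a forward pass picks a rule, and the rule orders the queue.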


Author(s): Shuangxia Bai, Shaomei Song, Shiyang Liang, Jianmei Wang, Bo Li, ...

Aiming at intelligent decision-making for UAVs based on situational information in air combat, a novel maneuvering decision method based on deep reinforcement learning is proposed in this paper. The autonomous maneuvering model of the UAV is formulated as a Markov Decision Process. The Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm and the Deep Deterministic Policy Gradient (DDPG) algorithm are used to train the model, and the experimental results of the two algorithms are analyzed and compared. The simulation results show that, compared with the DDPG algorithm, the TD3 algorithm has stronger decision-making performance and faster convergence, and is more suitable for solving air combat problems. The proposed algorithm enables UAVs to autonomously make maneuvering decisions based on situational information such as position, speed, and relative azimuth, adjusting their actions to approach and successfully strike the enemy, and provides a new method for intelligent maneuvering decisions in air combat.
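
For readers comparing the two algorithms, the sketch below isolates the three standard ingredients TD3 adds over DDPG (target policy smoothing, clipped double-Q targets, delayed actor updates), using toy callables in place of real actor/critic networks over the situation vector; the networks and numbers are placeholders, not the paper's models.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy target networks as plain callables; real ones would be neural networks
# over the situation vector (position, speed, relative azimuth, ...).
pi_target = lambda s: 0.5 * s                        # target actor
q1_target = lambda s, a: -(s - a) ** 2               # twin target critics
q2_target = lambda s, a: -1.1 * (s - a) ** 2

def td3_target(s2, r, gamma=0.99, noise_std=0.2, noise_clip=0.5, act_limit=1.0):
    # (1) Target policy smoothing: clipped noise on the target action.
    noise = float(np.clip(rng.normal(0.0, noise_std), -noise_clip, noise_clip))
    a2 = float(np.clip(pi_target(s2) + noise, -act_limit, act_limit))
    # (2) Clipped double-Q: the min of twin critics curbs value
    # overestimation, the main source of DDPG's instability.
    return r + gamma * min(q1_target(s2, a2), q2_target(s2, a2))

# (3) Delayed policy updates: critics update every step, the actor less often.
POLICY_DELAY = 2
for step in range(6):
    y = td3_target(s2=0.3, r=1.0)    # regression target for the critic update
    if step % POLICY_DELAY == 0:
        pass                         # actor and target-network updates go here

print("sample TD3 target:", round(td3_target(s2=0.3, r=1.0), 3))
```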


Author(s): Bailin Song, Hua Xu, Lei Jiang, Ning Rao

To solve the problem of intelligent anti-jamming decision-making in battlefield communication, this paper designs a communication anti-jamming decision-making method based on deep reinforcement learning. Introducing experience replay and a PHC-based dynamic epsilon mechanism under the framework of the DQN algorithm, a dynamic epsilon-DQN decision-making method is proposed. The algorithm selects the value of epsilon according to the state of the decision network, improving the convergence speed and decision success rate. During decision-making, the jamming signals on all communication frequencies are detected and fed into the algorithm as jamming-discriminant information, so that jamming can be effectively avoided even without prior jamming information. The experimental results show that the proposed method adapts to various communication models, makes decisions quickly, and achieves an average success rate of more than 95% after convergence, a clear advantage over existing decision-making methods.
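
The abstract does not give the epsilon update rule, so the following is only a guess at a PHC-flavored scheme: nudge epsilon down while recent anti-jamming decisions succeed and up while they fail. The success-rate threshold, step size, and window length are all assumptions.

```python
from collections import deque

class DynamicEpsilon:
    """PHC-flavored epsilon schedule driven by recent decision outcomes
    (an assumed mechanism, not the paper's published rule)."""

    def __init__(self, eps=1.0, eps_min=0.02, eps_max=1.0, delta=0.01, window=100):
        self.eps, self.eps_min, self.eps_max = eps, eps_min, eps_max
        self.delta = delta                    # small hill-climbing step size
        self.recent = deque(maxlen=window)    # 1 = jamming avoided, 0 = jammed

    def update(self, success):
        self.recent.append(1 if success else 0)
        rate = sum(self.recent) / len(self.recent)
        # Exploit more while decisions mostly succeed; explore more while the
        # link is being jammed often. The 0.9 threshold is an assumption.
        step = -self.delta if rate > 0.9 else self.delta
        self.eps = min(self.eps_max, max(self.eps_min, self.eps + step))
        return self.eps

sched = DynamicEpsilon()
for outcome in [1, 1, 0, 1, 1, 1, 1, 1, 1, 1]:
    eps = sched.update(outcome)
print("epsilon after warm-up:", round(eps, 3))
```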

