Risk Perception Oriented Autonomous Ship Navigation in AIS Environment

Volume 1: Offshore Technology ◽

10.1115/omae2020-18003 ◽

2020 ◽

Author(s):

Ruolan Zhang ◽

Masao Furusho

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Risk Perception ◽

Adaptive Systems ◽

Learning Algorithm ◽

Automatic Identification ◽

Identification System ◽

Rule Based ◽

Ship Navigation ◽

Frame Motion

Abstract Due to the quality and error of the data itself, historical automatic identification system (AIS) data was insufficient used to predict navigation risk at sea, but it adequately used to train decision-making neural networks. This paper presents a real AIS ship navigation environment with a rule-based and a neural-based decision processes with frame motion and training the decision network using a deep reinforcement learning algorithm. Rule-based decision-making has several applications in the field of adaptive systems, expert systems, and decision support systems, it also including general ship navigation which regulated by the convention on the international regulations for preventing collisions at sea (COLREGs). However, if someone intend to achieve full unmanned ship navigation without any remote control at the open sea, a rule-based decision-making system cannot be implemented alone. With the growing amount of data, complex sea environment, different collision scenarios, the agent-based decision has become an important role in transportation. For ships, combined rule-based and neural-based decision-making is the only option. It has become progressively challenging to satisfy autonomous decision-making development requirements. This study uses deep reinforcement learning to evaluate the performance of decision-making efficiency under different AIS data input shapes. The results show that the decision neural network trained with AIS data has good robustness and a high ability to achieve collision avoidance. Furthermore, using the same methodology, include instructive guidance for processing radar, camera, ENC, etc., respond to different risk perception tasks in different scenarios. It has important implications for fully unmanned navigation.

Download Full-text

Transfer Reinforcement Learning for Autonomous Driving

ACM Transactions on Modeling and Computer Simulation ◽

10.1145/3449356 ◽

2021 ◽

Vol 31 (3) ◽

pp. 1-26

Author(s):

Aravind Balakrishnan ◽

Jaeyoung Lee ◽

Ashish Gaurav ◽

Krzysztof Czarnecki ◽

Sean Sedwards

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Transfer Problem ◽

Autonomous Driving ◽

High Fidelity ◽

Rule Based ◽

High Level ◽

Real Vehicle

Reinforcement learning (RL) is an attractive way to implement high-level decision-making policies for autonomous driving, but learning directly from a real vehicle or a high-fidelity simulator is variously infeasible. We therefore consider the problem of transfer reinforcement learning and study how a policy learned in a simple environment using WiseMove can be transferred to our high-fidelity simulator, W ise M ove . WiseMove is a framework to study safety and other aspects of RL for autonomous driving. W ise M ove accurately reproduces the dynamics and software stack of our real vehicle. We find that the accurately modelled perception errors in W ise M ove contribute the most to the transfer problem. These errors, when even naively modelled in WiseMove , provide an RL policy that performs better in W ise M ove than a hand-crafted rule-based policy. Applying domain randomization to the environment in WiseMove yields an even better policy. The final RL policy reduces the failures due to perception errors from 10% to 2.75%. We also observe that the RL policy has significantly less reliance on velocity compared to the rule-based policy, having learned that its measurement is unreliable.

Download Full-text

Improved Q-Learning Algorithm Based on Approximate State Matching in Agricultural Plant Protection Environment

Entropy ◽

10.3390/e23060737 ◽

2021 ◽

Vol 23 (6) ◽

pp. 737

Author(s):

Fengjie Sun ◽

Xianchang Wang ◽

Rui Zhang

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Optimal Policy ◽

Feasible Solution ◽

Learning Algorithm ◽

Plant Protection ◽

Agricultural Plant ◽

Q Learning ◽

Aerial Vehicle ◽

Optimal Action

An Unmanned Aerial Vehicle (UAV) can greatly reduce manpower in the agricultural plant protection such as watering, sowing, and pesticide spraying. It is essential to develop a Decision-making Support System (DSS) for UAVs to help them choose the correct action in states according to the policy. In an unknown environment, the method of formulating rules for UAVs to help them choose actions is not applicable, and it is a feasible solution to obtain the optimal policy through reinforcement learning. However, experiments show that the existing reinforcement learning algorithms cannot get the optimal policy for a UAV in the agricultural plant protection environment. In this work we propose an improved Q-learning algorithm based on similar state matching, and we prove theoretically that there has a greater probability for UAV choosing the optimal action according to the policy learned by the algorithm we proposed than the classic Q-learning algorithm in the agricultural plant protection environment. This proposed algorithm is implemented and tested on datasets that are evenly distributed based on real UAV parameters and real farm information. The performance evaluation of the algorithm is discussed in detail. Experimental results show that the algorithm we proposed can efficiently learn the optimal policy for UAVs in the agricultural plant protection environment.

Download Full-text

Research on Air Combat Maneuver Decision-Making Method Based on Reinforcement Learning

Electronics ◽

10.3390/electronics7110279 ◽

2018 ◽

Vol 7 (11) ◽

pp. 279 ◽

Cited By ~ 6

Author(s):

Xianbing Zhang ◽

Guoqing Liu ◽

Chaojie Yang ◽

Jiang Wu

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Learning Algorithm ◽

Signal Design ◽

Training Environment ◽

Strategy Space ◽

Intelligent Decision Making ◽

Combat Training ◽

Network Method ◽

Air Combat

With the development of information technology, the degree of intelligence in air combat is increasing, and the demand for automated intelligent decision-making systems is becoming more intense. Based on the characteristics of over-the-horizon air combat, this paper constructs a super-horizon air combat training environment, which includes aircraft model modeling, air combat scene design, enemy aircraft strategy design, and reward and punishment signal design. In order to improve the efficiency of the reinforcement learning algorithm for the exploration of strategy space, this paper proposes a heuristic Q-Network method that integrates expert experience, and uses expert experience as a heuristic signal to guide the search process. At the same time, heuristic exploration and random exploration are combined. Aiming at the over-the-horizon air combat maneuver decision problem, the heuristic Q-Network method is adopted to train the neural network model in the over-the-horizon air combat training environment. Through continuous interaction with the environment, self-learning of the air combat maneuver strategy is realized. The efficiency of the heuristic Q-Network method and effectiveness of the air combat maneuver strategy are verified by simulation experiments.

Download Full-text

Ship-Collision Avoidance Decision-Making Learning of Unmanned Surface Vehicles with Automatic Identification System Data Based on Encoder—Decoder Automatic-Response Neural Networks

Journal of Marine Science and Engineering ◽

10.3390/jmse8100754 ◽

2020 ◽

Vol 8 (10) ◽

pp. 754

Author(s):

Miao Gao ◽

Guo-You Shi

Keyword(s):

Neural Networks ◽

Decision Making ◽

Time Series ◽

Collision Avoidance ◽

Time Series Data ◽

Series Data ◽

Automatic Identification ◽

Identification System ◽

Automatic Response ◽

Ship Collision

Intelligent unmanned surface vehicle (USV) collision avoidance is a complex inference problem based on current navigation status. This requires simultaneous processing of the input sequences and generation of the response sequences. The automatic identification system (AIS) encounter data mainly include the time-series data of two AIS sets, which exhibit a one-to-one mapping relation. Herein, an encoder–decoder automatic-response neural network is designed and implemented based on the sequence-to-sequence (Seq2Seq) structure to simultaneously process the two AIS encounter trajectory sequences. Furthermore, this model is combined with the bidirectional long short-term memory recurrent neural networks (Bi-LSTM RNN) to obtain a network framework for processing the time-series data to obtain ship-collision avoidance decisions based on big data. The encoder–decoder neural networks were trained based on the AIS data obtained in 2018 from Zhoushan Port to achieve ship collision avoidance decision-making learning. The results indicated that the encoder–decoder neural networks can be used to effectively formulate the sequence of the collision avoidance decision of the USV. Thus, this study significantly contributes to the increased efficiency and safety of maritime transportation. The proposed method can potentially be applied to the USV technology and intelligent collision-avoidance systems.

Download Full-text

Information unfitness as a factor constraining Automatic Identification System (AIS) application to anti-collision manoeuvring

Polish Maritime Research ◽

10.2478/v10012-012-0032-4 ◽

2012 ◽

Vol 19 (3) ◽

pp. 60-64 ◽

Cited By ~ 8

Author(s):

Andrzej Felski ◽

Krzysztof Jaskólski

Keyword(s):

Decision Making ◽

Collision Avoidance ◽

Significant Role ◽

Technical Specification ◽

Supplementary Information ◽

Automatic Identification ◽

Identification System ◽

Radar Measurement ◽

Dynamic Data ◽

Automatic Identification System

ABSTRACT Common use of shipboard AIS creates conditions for the use of a new kind of dynamic data in the situation of the risk of collision. AIS position report is a source of supplementary information derived from error leveraged radar measurement. However, in view of the results of the studies there are opinions with regard to inconsistent AIS dynamic data in the process of decision-making by the officer of the watch. By taking into consideration the recordings of the studies and technical specification of AIS it can be concluded that the results of inconsistent data have significant role in collision avoidance manoeuvring.

Download Full-text

A Novel Adaptive Sampling Strategy for Deep Reinforcement Learning

International Journal of Computational Intelligence and Applications ◽

10.1142/s1469026821500115 ◽

2021 ◽

Vol 20 (02) ◽

pp. 2150011

Author(s):

Xingxing Liang ◽

Li Chen ◽

Yanghe Feng ◽

Zhong Liu ◽

Yang Ma ◽

...

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Adaptive Sampling ◽

Learning Algorithm ◽

Sampling Strategy ◽

Sequential Decision ◽

Fixed Temperature ◽

Sample Distribution ◽

Intelligent Decision Making ◽

Experience Replay

Reinforcement learning, as an effective method to solve complex sequential decision-making problems, plays an important role in areas such as intelligent decision-making and behavioral cognition. It is well known that the sample experience replay mechanism contributes to the development of current deep reinforcement learning by reusing past samples to improve the efficiency of samples. However, the existing priority experience replay mechanism changes the sample distribution in the sample set due to the higher sampling frequency assigned to a specific transition, and it cannot be applied to actor-critic and other on-policy reinforcement learning algorithm. To address this, we propose an adaptive factor based on TD-error, which further increases sample utilization by giving more attention weight to samples of larger TD-error, and embeds it flexibly into the original Deep Q Network and Advantage Actor-Critic algorithm to improve their performance. Then we carried out the performance evaluation for the proposed architecture in the context of CartPole-V1 and 6 environments of Atari game experiments, respectively, and the obtained results either on the conditions of fixed temperature or annealing temperature, when compared to those produced by the vanilla DQN and original A2C, highlight the advantages in cumulative rewards and climb speed of the improved algorithms.

Download Full-text

Driver-like decision-making method for vehicle longitudinal autonomous driving based on deep reinforcement learning

Proceedings of the Institution of Mechanical Engineers Part D Journal of Automobile Engineering ◽

10.1177/09544070211063081 ◽

2021 ◽

pp. 095440702110630

Author(s):

Zhenhai Gao ◽

Xiangtong Yan ◽

Fei Gao ◽

Lei He

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Learning Algorithm ◽

Autonomous Driving ◽

Decision Strategies ◽

Reward Function ◽

Human Driver ◽

Reward Functions ◽

A Current ◽

Better Than

Decision-making is one of the key parts of the research on vehicle longitudinal autonomous driving. Considering the behavior of human drivers when designing autonomous driving decision-making strategies is a current research hotspot. In longitudinal autonomous driving decision-making strategies, traditional rule-based decision-making strategies are difficult to apply to complex scenarios. Current decision-making methods that use reinforcement learning and deep reinforcement learning construct reward functions designed with safety, comfort, and economy. Compared with human drivers, the obtained decision strategies still have big gaps. Focusing on the above problems, this paper uses the driver’s behavior data to design the reward function of the deep reinforcement learning algorithm through BP neural network fitting, and uses the deep reinforcement learning DQN algorithm and the DDPG algorithm to establish two driver-like longitudinal autonomous driving decision-making models. The simulation experiment compares the decision-making effect of the two models with the driver curve. The results shows that the two algorithms can realize driver-like decision-making, and the consistency of the DDPG algorithm and human driver behavior is higher than that of the DQN algorithm, the effect of the DDPG algorithm is better than the DQN algorithm.

Download Full-text

Knowledge Acquisition of Self-organizing Systems with Deep Multiagent Reinforcement Learning

Journal of Computing and Information Science in Engineering ◽

10.1115/1.4052800 ◽

2021 ◽

pp. 1-47

Author(s):

Hao Ji ◽

Yan Jin

Keyword(s):

Reinforcement Learning ◽

Knowledge Acquisition ◽

Learning Algorithm ◽

Task Complexity ◽

Functional Differentiation ◽

Simple Task ◽

Social Rules ◽

Rule Based ◽

Multi Agent ◽

Self Organizing

Abstract Self-organizing systems (SOS) can perform complex tasks in unforeseen situations with adaptability. Previous work has introduced field-based approaches and rule-based social structuring for individual agents to not only comprehend the task situations but also take advantage of the social rule-based agent relations to accomplish their tasks without a centralized controller. Although the task fields and social rules can be predefined for relatively simple task situations, when the task complexity increases and the task environment changes, having a priori knowledge about these fields and the rules may not be feasible. In this paper, a multi-agent reinforcement learning based model is proposed as a design approach to solving the rule generation problem with complex SOS tasks. A deep multi-agent reinforcement learning algorithm was devised as a mechanism to train SOS agents for knowledge acquisition of the task field and social rules. Learning stability, functional differentiation and robustness properties of this learning approach were investigated with respect to the changing team sizes and task variations. Through computer simulation studies of a box-pushing problem, the results have shown that there is an optimal range of number of agents that achieves good learning stability; agents in a team learn to differentiate from other agents with changing team sizes and box dimensions; and the robustness of the learned knowledge shows to be stronger to the external noises than with changing task constraints.

Download Full-text

Virtual Electronic Aids to Navigation for Remote and Ecologically Sensitive Regions

Journal of Navigation ◽

10.1017/s0373463316000527 ◽

2016 ◽

Vol 70 (2) ◽

pp. 225-241 ◽

Cited By ~ 2

Author(s):

R. Glenn Wright ◽

Michael Baldauf

Keyword(s):

Denial Of Service ◽

Satellite System ◽

The Arctic ◽

Automatic Identification ◽

Identification System ◽

Digital Information ◽

Hostile Environment ◽

Ship Navigation ◽

Potential Methods ◽

Global Navigation Satellite

Vessel traffic in the Arctic is expanding in volume both within and transiting the region, yet the infrastructure necessary to support modern ship navigation is lacking. This includes aids to navigation such as buoys and beacons that can be difficult to place and maintain in this hostile environment that stretches across vast distances. The results of research are described which determine whether virtual electronic Aids to Navigation (eAtoN) existing entirely as digital information objects can overcome the practical limitations of physical aids to navigation (AtoN) and Automatic Identification System (AIS) radio eAtoN. Capabilities unique to virtual eAtoN that are not available using either physical or AIS radio technologies are also examined including dynamic and real time properties and immunity to Global Navigation Satellite System (GNSS) and AIS spoofing, aliasing, denial of service attacks and service outages. Conclusions are provided describing potential methods of deployment based upon similar concepts already in use.

Download Full-text

Actor–critic-based decision-making method for the artificial intelligence commander in tactical wargames

The Journal of Defense Modeling and Simulation Applications Methodology Technology ◽

10.1177/1548512920954542 ◽

2020 ◽

pp. 154851292095454

Author(s):

Junfeng Zhang ◽

Qing Xue

Keyword(s):

Neural Network ◽

Artificial Intelligence ◽

Decision Making ◽

Reinforcement Learning ◽

Convolutional Neural Network ◽

Difficult Problem ◽

Learning Method ◽

Rule Based ◽

Autonomous Decision ◽

Decision Making Problem

In a tactical wargame, the decisions of the artificial intelligence (AI) commander are critical to the final combat result. Due to the existence of fog-of-war, AI commanders are faced with unknown and invisible information on the battlefield and lack of understanding of the situation, and it is difficult to make appropriate tactical strategies. The traditional knowledge rule-based decision-making method lacks flexibility and autonomy. How to make flexible and autonomous decision-making when facing complex battlefield situations is a difficult problem. This paper aims to solve the decision-making problem of the AI commander by using the deep reinforcement learning (DRL) method. We develop a tactical wargame as the research environment, which contains built-in script AI and supports the machine–machine combat mode. On this basis, an end-to-end actor–critic framework for commander decision making based on the convolutional neural network is designed to represent the battlefield situation and the reinforcement learning method is used to try different tactical strategies. Finally, we carry out a combat experiment between a DRL-based agent and a rule-based agent in a jungle terrain scenario. The result shows that the AI commander who adopts the actor–critic method successfully learns how to get a higher score in the tactical wargame, and the DRL-based agent has a higher winning ratio than the rule-based agent.

Download Full-text