Enhanced Reinforcement Learning Method Combining One-Hot Encoding-Based Vectors for CNN-Based Alternative High-Level Decisions

2021, Vol 11 (3), pp. 1291
Author(s): Bonwoo Gu, Yunsick Sung

Gomoku is a two-player board game that originated in ancient China. Gomoku has been implemented with various artificial intelligence techniques, such as genetic algorithms and tree search algorithms. Alpha-Gomoku, a Gomoku AI built with Alpha-Go’s algorithm, defines all possible situations on the Gomoku board using Monte-Carlo tree search (MCTS), and minimizes the probability of learning other correct answers in a duplicated Gomoku board situation. However, in the tree search algorithm, accuracy drops because the classification criteria are set manually. In this paper, we propose an improved reinforcement learning-based high-level decision approach using convolutional neural networks (CNN). The proposed algorithm expresses each state as One-Hot Encoding-based vectors and determines the state of the Gomoku board by combining similar states of One-Hot Encoding-based vectors. For cases where the stone position determined by the CNN is already occupied or cannot be played, we suggest a method for selecting an alternative. We verify the proposed Gomoku AI method in GuPyEngine, a Python-based 3D simulation platform.
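The two ideas named in the abstract above, one-hot encoding of the board state and choosing an alternative when the preferred cell is unavailable, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the abstract does not specify the encoding layout or the alternative-selection rule, so the names `one_hot_encode` and `select_move` are hypothetical, the `scores` grid merely stands in for a CNN's per-cell output, and the fallback simply takes the highest-scoring empty cell.

```python
# Cell states (hypothetical labels, not taken from the paper).
EMPTY, BLACK, WHITE = 0, 1, 2


def one_hot_encode(board):
    """Flatten the board into three indicator values (empty, black, white) per cell."""
    encoded = []
    for row in board:
        for cell in row:
            encoded.extend(1 if cell == state else 0 for state in (EMPTY, BLACK, WHITE))
    return encoded


def select_move(board, scores):
    """Return the highest-scoring empty cell as (row, col), or None if the board is full.

    `scores` has the same shape as `board` and stands in for a CNN's output.
    Occupied cells are skipped, so the next-best empty cell becomes the alternative.
    """
    best, best_score = None, float("-inf")
    for r, row in enumerate(board):
        for c, cell in enumerate(row):
            if cell == EMPTY and scores[r][c] > best_score:
                best, best_score = (r, c), scores[r][c]
    return best


# A 3x3 board is used only to keep the example small.
board = [[BLACK, EMPTY, EMPTY],
         [EMPTY, WHITE, EMPTY],
         [EMPTY, EMPTY, EMPTY]]
scores = [[0.9, 0.1, 0.2],
          [0.3, 0.8, 0.7],
          [0.0, 0.1, 0.2]]
move = select_move(board, scores)  # the two top-scoring cells are occupied
```

In this sketch the two highest scores fall on occupied cells, so the selection falls through to the best remaining empty cell. The paper's actual method may rank or combine candidates differently.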

2021, Vol 3 (6)
Author(s): John Akagi, T. Devon Morris, Brady Moon, Xingguang Chen, Cameron K. Peterson

Abstract

Directing groups of unmanned air vehicles (UAVs) is a task that typically requires the full attention of several operators. This can be prohibitive in situations where an operator must pay attention to their surroundings. In this paper we present a gesture device that assists operators in commanding UAVs in focus-constrained environments. The operator influences the UAVs’ behavior by using intuitive hand gesture movements. Gestures are captured using an accelerometer and gyroscope and then classified using a logistic regression model. Ten gestures were chosen to provide behaviors for a group of fixed-wing UAVs. These behaviors specified various searching, following, and tracking patterns that could be used in a dynamic environment. A novel variant of the Monte Carlo Tree Search algorithm was developed to autonomously plan the paths of the cooperating UAVs. These autonomy algorithms were executed when their corresponding gesture was recognized by the gesture device. The gesture device was trained to classify the ten gestures and accurately identified them 95% of the time. Each of the behaviors associated with the gestures was tested in hardware-in-the-loop simulations and the ability to dynamically switch between them was demonstrated. The results show that the system can be used as a natural interface to assist an operator in directing a fleet of UAVs.

Article highlights

A gesture device was created that enables operators to command a group of UAVs in focus-constrained environments.
Each gesture triggers high-level commands that direct a UAV group to execute complex behaviors.
Software simulations and hardware-in-the-loop testing show the device is effective in directing UAV groups.
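The gesture classification step mentioned in the abstract above can be sketched as a multiclass logistic regression prediction. This is only an illustration under stated assumptions: the abstract does not say whether the model is multinomial or one-vs-rest, nor which features are extracted from the accelerometer and gyroscope, so the softmax formulation, the function names, and the tiny weight matrix below are all hypothetical.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, weights, biases):
    """Return (predicted class index, class probabilities) for one feature vector.

    `weights` holds one row of coefficients per gesture class; in a real system
    these would come from training, not be written by hand.
    """
    scores = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(scores)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs


# Hypothetical model: three gesture classes, two features (the paper uses ten gestures).
weights = [[1.0, 0.0],
           [0.0, 1.0],
           [-1.0, -1.0]]
biases = [0.0, 0.0, 0.0]
label, probs = classify([0.2, 1.5], weights, biases)
```

A recognized class index would then be mapped to its corresponding UAV behavior; that mapping is not shown because the abstract gives no detail on it.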

