Cognitive Electronic Jamming Decision-Making Method Based on Improved Q-Learning Algorithm

2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Huiqin Li ◽  
Yanling Li ◽  
Chuan He ◽  
Jianwei Zhan ◽  
Hui Zhang

In this paper, a cognitive electronic jamming decision-making method based on improved Q-learning is proposed to improve the efficiency of radar jamming decision-making. First, the method adopts the simulated annealing (SA) algorithm's Metropolis criterion to enhance the exploration strategy, balancing the trade-off between exploration and exploitation to avoid falling into local optima. At the same time, the idea of stochastic gradient descent with warm restarts (SGDR) is introduced to schedule the learning rate, which reduces oscillation and improves convergence speed in the later stage of the algorithm's iterations. Then, a cognitive electronic jamming decision-making model is constructed, and the specific steps of the improved Q-learning algorithm are given. The simulation experiment takes a multifunctional radar as an example to analyze the influence of the exploration strategy and the learning rate on decision-making performance. The results reveal that, compared with the traditional Q-learning algorithm, the improved Q-learning algorithm proposed in this paper explores more fully, exploits more efficiently, and converges to a better solution at a faster speed. The number of iterations can be reduced by more than 50%, which proves the feasibility and effectiveness of the method as applied to cognitive electronic jamming decision-making.
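
A minimal sketch (Python) of the two ingredients described above, assuming a small tabular setting: an SA-style Metropolis acceptance rule for action selection and an SGDR-style cosine warm-restart schedule for the learning rate. The state/action sizes, temperature schedule, and stand-in environment are illustrative assumptions, not the paper's radar model.

```python
import numpy as np

def metropolis_action(q_row, temperature, rng):
    """Greedy action, but accept a random alternative with probability
    exp(-(Q_best - Q_alt) / T), in the spirit of the SA Metropolis criterion."""
    best = int(np.argmax(q_row))
    alt = int(rng.integers(len(q_row)))
    if alt == best:
        return best
    accept_p = np.exp(-(q_row[best] - q_row[alt]) / max(temperature, 1e-8))
    return alt if rng.random() < accept_p else best

def sgdr_learning_rate(step, lr_min=0.01, lr_max=0.5, cycle=200):
    """Cosine-annealed learning rate with periodic warm restarts (SGDR-style)."""
    t = step % cycle
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + np.cos(np.pi * t / cycle))

# Tabular Q-learning loop using both pieces (sizes and rewards are stand-ins).
rng = np.random.default_rng(0)
Q = np.zeros((16, 4))                 # 16 jamming states x 4 jamming actions
gamma, T, s = 0.9, 1.0, 0
for step in range(1000):
    a = metropolis_action(Q[s], T, rng)
    s_next, r = int(rng.integers(16)), rng.random()   # stand-in environment
    lr = sgdr_learning_rate(step)
    Q[s, a] += lr * (r + gamma * Q[s_next].max() - Q[s, a])
    s, T = s_next, T * 0.995          # cool the exploration temperature
```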

2021 ◽  
Author(s):  
Justin Sirignano ◽  
Konstantinos Spiliopoulos

We prove that a single-layer neural network trained with the Q-learning algorithm converges in distribution to a random ordinary differential equation as the size of the model and the number of training steps become large. Analysis of the limit differential equation shows that it has a unique stationary solution that is the solution of the Bellman equation, thus giving the optimal control for the problem. In addition, we study the convergence of the limit differential equation to the stationary solution. As a by-product of our analysis, we obtain the limiting behavior of single-layer neural networks when trained on independent and identically distributed data with stochastic gradient descent under the widely used Xavier initialization.
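
The setting analyzed can be pictured with a short sketch: a single hidden layer with Xavier-style initialization and 1/N output scaling, trained by plain SGD on i.i.d. data. The dimensions, target function, and step size below are illustrative assumptions, not the paper's construction.

```python
import numpy as np

# Single hidden layer with Xavier-style initialization, trained by plain SGD
# on i.i.d. samples; minimizes a squared loss on a stand-in regression target.
rng = np.random.default_rng(1)
d, N, lr = 4, 64, 0.05                        # input dim, hidden units, step size
W = rng.normal(0, np.sqrt(1.0 / d), (N, d))   # Xavier-style input weights
c = rng.normal(0, np.sqrt(1.0 / N), N)        # output weights

def f(x):
    return c @ np.tanh(W @ x) / N             # 1/N output scaling

for step in range(2000):
    x = rng.normal(size=d)                    # i.i.d. data point
    y = np.sin(x).sum()                       # stand-in target
    h = np.tanh(W @ x)
    err = f(x) - y                            # gradients of 0.5 * err**2 below
    c -= lr * err * h / N
    W -= lr * err * np.outer(c * (1 - h**2), x) / N
```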


Author(s):  
Bowen Weng ◽  
Huaqing Xiong ◽  
Yingbin Liang ◽  
Wei Zhang

Existing convergence analyses of Q-learning mostly focus on the vanilla stochastic gradient descent (SGD) type of updates. Although Adaptive Moment Estimation (Adam) has been commonly used in practical Q-learning algorithms, no convergence guarantee has been provided for Q-learning with this type of update. In this paper, we first characterize the convergence rate for Q-AMSGrad, which is the Q-learning algorithm with the AMSGrad update (a commonly adopted alternative to Adam for theoretical analysis). To further improve the performance, we propose to incorporate a momentum restart scheme into Q-AMSGrad, resulting in the so-called Q-AMSGradR algorithm. The convergence rate of Q-AMSGradR is also established. Our experiments on a linear quadratic regulator problem demonstrate that the two proposed Q-learning algorithms outperform vanilla Q-learning with SGD updates. The two algorithms also exhibit significantly better performance than the DQN learning method over a batch of Atari 2600 games.
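
A rough sketch of the update studied: an AMSGrad step (Adam with a max-tracked second moment) applied to the parameters of a Q-function, plus a periodic momentum restart in the spirit of Q-AMSGradR. The linear Q-function, restart period, and stand-in features are assumptions for illustration only.

```python
import numpy as np

def amsgrad_step(theta, grad, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One AMSGrad update: Adam moments with a max-tracked second moment."""
    m, v, vhat = state
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    vhat = np.maximum(vhat, v)                # the key difference from Adam
    return theta - lr * m / (np.sqrt(vhat) + eps), (m, v, vhat)

# Q-learning on a linear Q-function Q(s, a) = phi(s, a)^T theta (stand-in task).
rng = np.random.default_rng(0)
dim, gamma, restart_period = 8, 0.95, 100
theta = np.zeros(dim)
state = (np.zeros(dim), np.zeros(dim), np.zeros(dim))
for t in range(1000):
    phi = rng.normal(size=dim)                # features of (s, a); stand-in
    phi_next = rng.normal(size=dim)           # features of the next pair
    r = rng.random()
    td_error = r + gamma * phi_next @ theta - phi @ theta
    grad = -td_error * phi                    # semi-gradient of 0.5 * td_error**2
    theta, state = amsgrad_step(theta, grad, state)
    if (t + 1) % restart_period == 0:         # momentum restart (Q-AMSGradR idea)
        state = (np.zeros(dim), state[1], state[2])
```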


Sensors ◽  
2019 ◽  
Vol 19 (18) ◽  
pp. 4055 ◽  
Author(s):  
Zhang ◽  
Wang ◽  
Liu ◽  
Chen

This research focuses on the adaptive navigation of maritime autonomous surface ships (MASSs) in an uncertain environment. To achieve intelligent obstacle avoidance of MASSs in a port, an autonomous navigation decision-making model based on hierarchical deep reinforcement learning is proposed. The model is mainly composed of two layers: a scene division layer and an autonomous navigation decision-making layer. The scene division layer quantifies the sub-scenarios according to the International Regulations for Preventing Collisions at Sea (COLREG). This research divides the navigational situation of a ship into entities and attributes based on the ontology model and the Protégé language. In the decision-making layer, we designed a deep Q-learning algorithm that uses the environmental model, ship motion space, reward function, and search strategy to learn the environmental state in each quantized sub-scenario and train the navigation strategy. Finally, two sets of verification experiments with the deep reinforcement learning (DRL) and improved DRL algorithms were designed with Rizhao port as a study case. Moreover, the experimental data were analyzed in terms of the convergence trend, iterative path, and collision avoidance effect. The results indicate that the improved DRL algorithm can effectively improve navigation safety and collision avoidance.
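
To make the decision-making layer's ingredients concrete, the sketch below shows a discrete ship motion space, a hand-crafted reward with near-collision and COLREG-violation penalties, epsilon-greedy search, and a replay buffer feeding a linear stand-in for the deep Q-network. The state encoding, reward weights, and action set are assumptions, not the paper's model.

```python
import random
from collections import deque
import numpy as np

ACTIONS = ["port", "starboard", "keep_course", "slow_down"]   # ship motion space

def reward(dist_to_goal, dist_to_obstacle, colreg_compliant):
    r = -0.01 * dist_to_goal                  # progress toward the berth
    if dist_to_obstacle < 0.5:
        r -= 10.0                             # near-collision penalty
    if not colreg_compliant:
        r -= 1.0                              # COLREG-violation penalty
    return r

rng = np.random.default_rng(0)
dim, n_a, gamma, eps, lr = 6, len(ACTIONS), 0.98, 0.1, 1e-2
W = np.zeros((n_a, dim))                      # linear stand-in for the deep Q-net
replay = deque(maxlen=10_000)                 # experience replay buffer

def act(state_vec):
    if rng.random() < eps:                    # epsilon-greedy search strategy
        return int(rng.integers(n_a))
    return int(np.argmax(W @ state_vec))

def train_batch(batch_size=32):
    batch = random.sample(list(replay), min(batch_size, len(replay)))
    for s, a, r, s_next in batch:
        target = r + gamma * (W @ s_next).max()
        W[a] += lr * (target - W[a] @ s) * s  # TD update on the chosen action
```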


2018 ◽  
Vol 7 (4.27) ◽  
pp. 57
Author(s):  
Ee Soong Low ◽  
Pauline Ong ◽  
Cheng Yee Low

In path planning for mobile robots, the classical Q-learning algorithm requires a high iteration count and a long time to converge. This is because the beginning stage of classical Q-learning for path planning consists mostly of exploration, involving random direction decisions. This paper proposes adding a distance aspect to the direction decision making in Q-learning. This feature is used to reduce the time taken for the Q-learning to fully converge. Meanwhile, random direction decision making is added and activated when the mobile robot gets trapped in a local optimum. This strategy enables the mobile robot to escape from the local optimal trap. The results show that the time taken for the improved Q-learning with distance guiding to converge is longer than that of the classical Q-learning; however, the total number of steps used is lower than in the classical Q-learning.
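
A minimal sketch of the two ideas: bias the action score toward moves that shrink the distance to the goal, and fall back to a purely random move when the robot appears trapped. The grid moves, distance weight, and "trapped" test are illustrative assumptions, not the paper's exact rule.

```python
import numpy as np

MOVES = {0: (0, 1), 1: (0, -1), 2: (1, 0), 3: (-1, 0)}   # E, W, S, N on a grid

def choose_action(Q, pos, goal, visit_count, rng, w_dist=0.5):
    """Distance-guided action choice with a random escape from local traps."""
    if visit_count.get(pos, 0) > 3:           # revisited too often: likely trapped
        return int(rng.integers(len(MOVES)))  # random move to escape
    q_row = Q.get(pos, np.zeros(len(MOVES)))
    scores = np.empty(len(MOVES))
    for a, (dr, dc) in MOVES.items():
        nxt = (pos[0] + dr, pos[1] + dc)
        dist = np.hypot(nxt[0] - goal[0], nxt[1] - goal[1])
        scores[a] = q_row[a] - w_dist * dist  # prefer moves that close the distance
    return int(np.argmax(scores))
```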


Entropy ◽  
2021 ◽  
Vol 23 (6) ◽  
pp. 737
Author(s):  
Fengjie Sun ◽  
Xianchang Wang ◽  
Rui Zhang

An Unmanned Aerial Vehicle (UAV) can greatly reduce manpower in agricultural plant protection tasks such as watering, sowing, and pesticide spraying. It is essential to develop a Decision-making Support System (DSS) for UAVs to help them choose the correct action in each state according to the policy. In an unknown environment, formulating rules for UAVs to help them choose actions is not applicable, and obtaining the optimal policy through reinforcement learning is a feasible solution. However, experiments show that the existing reinforcement learning algorithms cannot obtain the optimal policy for a UAV in the agricultural plant protection environment. In this work we propose an improved Q-learning algorithm based on similar state matching, and we prove theoretically that a UAV has a greater probability of choosing the optimal action under the policy learned by the proposed algorithm than under the classic Q-learning algorithm in the agricultural plant protection environment. The proposed algorithm is implemented and tested on datasets that are evenly distributed based on real UAV parameters and real farm information. The performance evaluation of the algorithm is discussed in detail. Experimental results show that the proposed algorithm can efficiently learn the optimal policy for UAVs in the agricultural plant protection environment.
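
One way to picture the similar-state-matching idea: when the UAV reaches a state with no learned entry in the Q-table, reuse the Q-values of the most similar previously visited state instead of starting from zeros. The state representation and Euclidean similarity below are assumptions for illustration.

```python
import numpy as np

def most_similar_state(state, visited_states):
    """Return the visited state closest to `state` in Euclidean distance."""
    dists = [np.linalg.norm(np.asarray(state) - np.asarray(s)) for s in visited_states]
    return visited_states[int(np.argmin(dists))]

def q_values_for(state, q_table, visited_states, n_actions=4):
    """Look up Q-values; fall back to the most similar visited state if unseen."""
    key = tuple(state)
    if key in q_table:
        return q_table[key]
    if visited_states:
        return q_table[tuple(most_similar_state(state, visited_states))]
    return np.zeros(n_actions)                # nothing similar learned yet
```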


2021 ◽  
Vol 11 (13) ◽  
pp. 6237
Author(s):  
Azharul Islam ◽  
KyungHi Chang

Unstructured data from the internet constitute large sources of information, which need to be formatted in a user-friendly way. This research develops a model that classifies unstructured data from data mining into labeled data, and builds an informational and decision-making support system (DMSS). We often have assortments of information collected by mining data from various sources, where the key challenge is to extract valuable information. We observe substantial classification accuracy enhancement for our datasets with both machine learning and deep learning algorithms. The highest classification accuracy (99% in training, 96% in testing) was achieved on a Covid corpus processed using a long short-term memory (LSTM) network. Furthermore, we conducted tests on large datasets relevant to the Disaster corpus, with an LSTM classification accuracy of 98%. In addition, random forest (RF), a machine learning algorithm, provides a reasonable 84% accuracy. This research’s main objective is to increase the application’s robustness by integrating intelligence into the developed DMSS, which provides insight into the user’s intent despite dealing with a noisy dataset. Our model compares the random forest and stochastic gradient descent (SGD) algorithms by F1 score, where the RF method outperforms, improving accuracy by 2% (from 81% to 83%) compared with a conventional method.
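
A compact sketch of the classical-ML branch of such a pipeline: TF-IDF features with a random forest and an SGD (linear) classifier compared by F1 score. The toy texts and labels stand in for the mined Covid/Disaster corpora, and the LSTM branch is omitted.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Toy stand-ins for the mined corpora; real data would be the labeled documents.
texts = ["flood warning issued", "vaccine rollout update",
         "earthquake relief effort", "new covid variant reported"]
labels = ["disaster", "covid", "disaster", "covid"]
X_tr, X_te, y_tr, y_te = train_test_split(texts, labels, test_size=0.5, random_state=0)

for name, clf in [("RF", RandomForestClassifier(n_estimators=100, random_state=0)),
                  ("SGD", SGDClassifier(random_state=0))]:
    model = make_pipeline(TfidfVectorizer(), clf)   # TF-IDF features + classifier
    model.fit(X_tr, y_tr)
    print(name, "F1:", f1_score(y_te, model.predict(X_te), average="macro"))
```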


2021 ◽  
Author(s):  
Kun-Cheng Ke ◽  
Ming-Shyan Huang

Injection molding has been broadly used in the mass production of plastic parts and must meet the requirements of efficiency and quality consistency. Machine learning can effectively predict the quality of injection molded parts. However, the performance of machine learning models largely depends on the accuracy of the training. Hyperparameters such as activation functions, momentum, and learning rate are crucial to the accuracy and efficiency of model training. This research further analyzed the influence of hyperparameters on testing accuracy, explored the corresponding optimal learning rate, and provided the optimal training model for predicting the quality of injection molded parts. In this study, stochastic gradient descent (SGD) and stochastic gradient descent with momentum were used to optimize the artificial neural network model. Through optimization of these training hyperparameters, the testing accuracy for the width of the injection-molded product improved. The experimental results indicated that in the absence of momentum effects, all five activation functions can achieve more than 90% training accuracy with a learning rate of 0.1. Moreover, when optimized with SGD, the learning rate of the Sigmoid activation function was 0.1, and the testing accuracy reached 95.8%. Although momentum had the least influence on accuracy, it affected the convergence speed of the Sigmoid function, reducing the number of required learning iterations (an 82.4% reduction rate). Optimizing hyperparameter settings can improve the accuracy of model testing and markedly reduce training time.
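
The two optimizers compared above reduce to simple update rules; the sketch below writes them out and runs both on a toy quadratic loss. The learning rate of 0.1 echoes the value discussed, while the momentum coefficient and the loss itself are assumptions.

```python
import numpy as np

def sgd_step(w, grad, lr=0.1):
    """Plain stochastic gradient descent update."""
    return w - lr * grad

def sgd_momentum_step(w, grad, velocity, lr=0.1, momentum=0.9):
    """SGD with momentum: accumulate a velocity term across updates."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

# Toy quadratic loss (w - 3)^2 just to exercise both update rules.
grad_fn = lambda w: 2.0 * (w - 3.0)
w_plain = w_mom = 10.0
velocity = 0.0
for _ in range(50):
    w_plain = sgd_step(w_plain, grad_fn(w_plain))
    w_mom, velocity = sgd_momentum_step(w_mom, grad_fn(w_mom), velocity)
print(w_plain, w_mom)                         # both approach the minimum at 3.0
```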


Author(s):  
Yanan Wang ◽  
Haoyu Niu ◽  
Tiebiao Zhao ◽  
Xiaozhong Liao ◽  
Lei Dong ◽  
...  

This paper proposes a contactless voltage classification method for Lithium-ion batteries (LIBs). With a three-dimensional radio-frequency based sensor called Walabot, voltage data of LIBs can be collected in a contactless way. Then three machine learning algorithms, namely principal component analysis (PCA), linear discriminant analysis (LDA), and stochastic gradient descent (SGD) classifiers, are employed for data processing. Experiments and comparisons have been conducted to verify the proposed method. The colormaps of the results and the prediction accuracy show that LDA may be the most suitable for LIB voltage classification.
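
A short sketch of the data-processing step using the three named methods on synthetic stand-ins for the Walabot feature vectors; the feature dimension, class count, and data are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins: 300 RF scans with 40 features each, three voltage classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 40))
y = rng.integers(0, 3, size=300)

pca = PCA(n_components=5).fit(X)                      # unsupervised projection
lda = LinearDiscriminantAnalysis().fit(X, y)          # supervised projection/classifier
sgd = make_pipeline(StandardScaler(), SGDClassifier(random_state=0)).fit(X, y)

print("PCA explained variance:", pca.explained_variance_ratio_.sum())
print("LDA training accuracy:", lda.score(X, y))
print("SGD training accuracy:", sgd.score(X, y))
```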


2018 ◽  
Author(s):  
Kazunori D Yamada

In the deep learning era, stochastic gradient descent is the most common method used for optimizing neural network parameters. Among the various mathematical optimization methods, the gradient descent method is the most naive. Adjustment of the learning rate is necessary for quick convergence, and with plain gradient descent this is normally done manually. Many optimizers have been developed to control the learning rate and increase convergence speed. Generally, these optimizers adjust the learning rate automatically in response to learning status. These optimizers have been gradually improved by incorporating the effective aspects of earlier methods. In this study, we developed a new optimizer: YamAdam. Our optimizer is based on Adam, which utilizes the first and second moments of previous gradients. In addition to the moment estimation system, we incorporated an advantageous part of AdaDelta, namely its unit correction system, into YamAdam. According to benchmark tests on some common datasets, our optimizer showed similar or faster convergence compared to the existing methods. YamAdam is an alternative optimizer option for deep learning.
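
To illustrate the ingredients named above, the toy optimizer below combines Adam-style first and second moment estimates with an AdaDelta-style unit correction (the RMS of past updates takes the place of a fixed learning rate). It is an assumption-laden sketch, not the published YamAdam update rule.

```python
import numpy as np

class MomentUnitOptimizer:
    """Toy optimizer: Adam-style moments plus an AdaDelta-style unit correction."""

    def __init__(self, b1=0.9, b2=0.999, eps=1e-8):
        self.b1, self.b2, self.eps = b1, b2, eps
        self.m = self.v = self.u = 0.0        # 1st moment, 2nd moment, update RMS

    def step(self, w, grad):
        self.m = self.b1 * self.m + (1 - self.b1) * grad
        self.v = self.b2 * self.v + (1 - self.b2) * grad ** 2
        # Unit correction: scale by RMS of past updates instead of a fixed rate.
        delta = -np.sqrt(self.u + self.eps) / np.sqrt(self.v + self.eps) * self.m
        self.u = self.b2 * self.u + (1 - self.b2) * delta ** 2
        return w + delta

opt = MomentUnitOptimizer()
w = 5.0
for _ in range(2000):                         # apply updates to the gradient of (w - 1)^2
    w = opt.step(w, 2.0 * (w - 1.0))
print(w)
```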


2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Huipeng Lv

Strategy is the main body of modern Chinese martial arts competition, while fighting has only recently entered sports competition. Strategy and action correspond to each other and are practiced as a set. Therefore, constructing a decision-making algorithm for Chinese martial arts competition and perfecting the competition are intuitive and essential. The formulation of martial arts competition strategies requires scientific analysis of athletic data and more accurate predictions. Based on this observation, this paper draws on popular neural network technology to propose a BP neural network with additional momentum, elastic gradient descent, and an adaptive learning rate. The algorithm improves on the traditional BP neural network with respect to difficulties such as selecting the learning step length, determining the size and direction of the weight updates, and controlling the learning rate. The experimental results show that this paper’s algorithm improves both network scale and running time and can predict martial arts competition routines and support the formulation of scientific strategies.
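
A rough sketch of the training tweaks described: a momentum term added to the backpropagation weight update and a learning rate that grows when the error falls and shrinks when it rises. The network size, data, and adaptation factors are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))                  # stand-in competition statistics
y = (X.sum(axis=1) > 0).astype(float).reshape(-1, 1)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
W1 = rng.normal(size=(3, 8)) * 0.5            # input-to-hidden weights
W2 = rng.normal(size=(8, 1)) * 0.5            # hidden-to-output weights
V1, V2 = np.zeros_like(W1), np.zeros_like(W2) # momentum (velocity) terms
lr, momentum, prev_err = 0.1, 0.8, np.inf

for epoch in range(500):
    H = sigmoid(X @ W1)                       # forward pass
    out = sigmoid(H @ W2)
    err = np.mean((out - y) ** 2)
    lr *= 1.05 if err < prev_err else 0.7     # adaptive learning rate
    prev_err = err
    d_out = (out - y) * out * (1 - out)       # backpropagated errors
    d_hid = (d_out @ W2.T) * H * (1 - H)
    V2 = momentum * V2 - lr * (H.T @ d_out)   # additional momentum in the update
    V1 = momentum * V1 - lr * (X.T @ d_hid)
    W2 += V2
    W1 += V1
```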

