Cognitive Electronic Jamming Decision-Making Method Based on Improved Q-Learning Algorithm

2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Huiqin Li ◽  
Yanling Li ◽  
Chuan He ◽  
Jianwei Zhan ◽  
Hui Zhang

In this paper, a cognitive electronic jamming decision-making method based on improved Q-learning is proposed to improve the efficiency of radar jamming decision-making. First, the method adopts the simulated annealing (SA) algorithm's Metropolis criterion to enhance the exploration strategy, balancing the trade-off between exploration and exploitation to avoid falling into local optima. At the same time, the idea of stochastic gradient descent with warm restarts (SGDR) is introduced to schedule the learning rate, which reduces oscillation and improves convergence speed in the later stage of the algorithm's iterations. Then, a cognitive electronic jamming decision-making model is constructed, and the specific steps of the improved Q-learning algorithm are given. The simulation experiment takes a multifunctional radar as an example to analyze the influence of the exploration strategy and the learning rate on decision-making performance. The results reveal that, compared with the traditional Q-learning algorithm, the improved Q-learning algorithm proposed in this paper explores more fully, exploits more efficiently, and converges to a better solution at a faster speed. The number of iterations can be reduced by more than 50%, which proves the feasibility and effectiveness of the method as applied to cognitive electronic jamming decision-making.
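
A minimal sketch (Python) of the two ingredients described above, assuming a small tabular setting: an SA-style Metropolis acceptance rule for action selection and an SGDR-style cosine warm-restart schedule for the learning rate. The state/action sizes, temperature schedule, and stand-in environment are illustrative assumptions, not the paper's radar model.

```python
import numpy as np

def metropolis_action(q_row, temperature, rng):
    """Greedy action, but accept a random alternative with probability
    exp(-(Q_best - Q_alt) / T), in the spirit of the SA Metropolis criterion."""
    best = int(np.argmax(q_row))
    alt = int(rng.integers(len(q_row)))
    if alt == best:
        return best
    accept_p = np.exp(-(q_row[best] - q_row[alt]) / max(temperature, 1e-8))
    return alt if rng.random() < accept_p else best

def sgdr_learning_rate(step, lr_min=0.01, lr_max=0.5, cycle=200):
    """Cosine-annealed learning rate with periodic warm restarts (SGDR-style)."""
    t = step % cycle
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + np.cos(np.pi * t / cycle))

# Tabular Q-learning loop using both pieces (sizes and rewards are stand-ins).
rng = np.random.default_rng(0)
Q = np.zeros((16, 4))                 # 16 jamming states x 4 jamming actions
gamma, T, s = 0.9, 1.0, 0
for step in range(1000):
    a = metropolis_action(Q[s], T, rng)
    s_next, r = int(rng.integers(16)), rng.random()   # stand-in environment
    lr = sgdr_learning_rate(step)
    Q[s, a] += lr * (r + gamma * Q[s_next].max() - Q[s, a])
    s, T = s_next, T * 0.995          # cool the exploration temperature
```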

2021 ◽  
Author(s):  
Justin Sirignano ◽  
Konstantinos Spiliopoulos

We prove that a single-layer neural network trained with the Q-learning algorithm converges in distribution to a random ordinary differential equation as the size of the model and the number of training steps become large. Analysis of the limit differential equation shows that it has a unique stationary solution that is the solution of the Bellman equation, thus giving the optimal control for the problem. In addition, we study the convergence of the limit differential equation to the stationary solution. As a by-product of our analysis, we obtain the limiting behavior of single-layer neural networks when trained on independent and identically distributed data with stochastic gradient descent under the widely used Xavier initialization.
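
The setting analyzed can be pictured with a short sketch: a single hidden layer with Xavier-style initialization and 1/N output scaling, trained by plain SGD on i.i.d. data. The dimensions, target function, and step size below are illustrative assumptions, not the paper's construction.

```python
import numpy as np

# Single hidden layer with Xavier-style initialization, trained by plain SGD
# on i.i.d. samples; minimizes a squared loss on a stand-in regression target.
rng = np.random.default_rng(1)
d, N, lr = 4, 64, 0.05                        # input dim, hidden units, step size
W = rng.normal(0, np.sqrt(1.0 / d), (N, d))   # Xavier-style input weights
c = rng.normal(0, np.sqrt(1.0 / N), N)        # output weights

def f(x):
    return c @ np.tanh(W @ x) / N             # 1/N output scaling

for step in range(2000):
    x = rng.normal(size=d)                    # i.i.d. data point
    y = np.sin(x).sum()                       # stand-in target
    h = np.tanh(W @ x)
    err = f(x) - y                            # gradients of 0.5 * err**2 below
    c -= lr * err * h / N
    W -= lr * err * np.outer(c * (1 - h**2), x) / N
```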


Author(s):  
Bowen Weng ◽  
Huaqing Xiong ◽  
Yingbin Liang ◽  
Wei Zhang

Existing convergence analyses of Q-learning mostly focus on the vanilla stochastic gradient descent (SGD) type of updates. Although Adaptive Moment Estimation (Adam) has been commonly used in practical Q-learning algorithms, no convergence guarantee has been provided for Q-learning with this type of update. In this paper, we first characterize the convergence rate for Q-AMSGrad, which is the Q-learning algorithm with the AMSGrad update (a commonly adopted alternative to Adam for theoretical analysis). To further improve the performance, we propose to incorporate a momentum restart scheme into Q-AMSGrad, resulting in the so-called Q-AMSGradR algorithm. The convergence rate of Q-AMSGradR is also established. Our experiments on a linear quadratic regulator problem demonstrate that the two proposed Q-learning algorithms outperform vanilla Q-learning with SGD updates. The two algorithms also exhibit significantly better performance than the DQN learning method over a batch of Atari 2600 games.
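
A rough sketch of the update studied: an AMSGrad step (Adam with a max-tracked second moment) applied to the parameters of a Q-function, plus a periodic momentum restart in the spirit of Q-AMSGradR. The linear Q-function, restart period, and stand-in features are assumptions for illustration only.

```python
import numpy as np

def amsgrad_step(theta, grad, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One AMSGrad update: Adam moments with a max-tracked second moment."""
    m, v, vhat = state
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    vhat = np.maximum(vhat, v)                # the key difference from Adam
    return theta - lr * m / (np.sqrt(vhat) + eps), (m, v, vhat)

# Q-learning on a linear Q-function Q(s, a) = phi(s, a)^T theta (stand-in task).
rng = np.random.default_rng(0)
dim, gamma, restart_period = 8, 0.95, 100
theta = np.zeros(dim)
state = (np.zeros(dim), np.zeros(dim), np.zeros(dim))
for t in range(1000):
    phi = rng.normal(size=dim)                # features of (s, a); stand-in
    phi_next = rng.normal(size=dim)           # features of the next pair
    r = rng.random()
    td_error = r + gamma * phi_next @ theta - phi @ theta
    grad = -td_error * phi                    # semi-gradient of 0.5 * td_error**2
    theta, state = amsgrad_step(theta, grad, state)
    if (t + 1) % restart_period == 0:         # momentum restart (Q-AMSGradR idea)
        state = (np.zeros(dim), state[1], state[2])
```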


Sensors ◽  
2019 ◽  
Vol 19 (18) ◽  
pp. 4055 ◽  
Author(s):  
Zhang ◽  
Wang ◽  
Liu ◽  
Chen

This research focuses on the adaptive navigation of maritime autonomous surface ships (MASSs) in an uncertain environment. To achieve intelligent obstacle avoidance of MASSs in a port, an autonomous navigation decision-making model based on hierarchical deep reinforcement learning is proposed. The model is mainly composed of two layers: a scene division layer and an autonomous navigation decision-making layer. The scene division layer quantifies the sub-scenarios according to the International Regulations for Preventing Collisions at Sea (COLREG). This research divides the navigational situation of a ship into entities and attributes based on the ontology model and the Protégé language. In the decision-making layer, we designed a deep Q-learning algorithm that uses the environmental model, ship motion space, reward function, and search strategy to learn the environmental state in each quantized sub-scenario and train the navigation strategy. Finally, two sets of verification experiments with the deep reinforcement learning (DRL) and improved DRL algorithms were designed with Rizhao port as a study case. Moreover, the experimental data were analyzed in terms of the convergence trend, iterative path, and collision avoidance effect. The results indicate that the improved DRL algorithm can effectively improve navigation safety and collision avoidance.
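
To make the decision-making layer's ingredients concrete, the sketch below shows a discrete ship motion space, a hand-crafted reward with near-collision and COLREG-violation penalties, epsilon-greedy search, and a replay buffer feeding a linear stand-in for the deep Q-network. The state encoding, reward weights, and action set are assumptions, not the paper's model.

```python
import random
from collections import deque
import numpy as np

ACTIONS = ["port", "starboard", "keep_course", "slow_down"]   # ship motion space

def reward(dist_to_goal, dist_to_obstacle, colreg_compliant):
    r = -0.01 * dist_to_goal                  # progress toward the berth
    if dist_to_obstacle < 0.5:
        r -= 10.0                             # near-collision penalty
    if not colreg_compliant:
        r -= 1.0                              # COLREG-violation penalty
    return r

rng = np.random.default_rng(0)
dim, n_a, gamma, eps, lr = 6, len(ACTIONS), 0.98, 0.1, 1e-2
W = np.zeros((n_a, dim))                      # linear stand-in for the deep Q-net
replay = deque(maxlen=10_000)                 # experience replay buffer

def act(state_vec):
    if rng.random() < eps:                    # epsilon-greedy search strategy
        return int(rng.integers(n_a))
    return int(np.argmax(W @ state_vec))

def train_batch(batch_size=32):
    batch = random.sample(list(replay), min(batch_size, len(replay)))
    for s, a, r, s_next in batch:
        target = r + gamma * (W @ s_next).max()
        W[a] += lr * (target - W[a] @ s) * s  # TD update on the chosen action
```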


2018 ◽  
Vol 7 (4.27) ◽  
pp. 57
Author(s):  
Ee Soong Low ◽  
Pauline Ong ◽  
Cheng Yee Low

In path planning for mobile robots, the classical Q-learning algorithm requires a high iteration count and a long time to converge. This is because the beginning stage of classical Q-learning for path planning consists mostly of exploration, involving random direction decisions. This paper proposes adding a distance aspect to the direction decision making in Q-learning. This feature is used to reduce the time taken for the Q-learning to fully converge. Meanwhile, random direction decision making is added and activated when the mobile robot gets trapped in a local optimum. This strategy enables the mobile robot to escape from the local optimal trap. The results show that the time taken for the improved Q-learning with distance guiding to converge is longer than that of the classical Q-learning; however, the total number of steps used is lower than in the classical Q-learning.
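
A minimal sketch of the two ideas: bias the action score toward moves that shrink the distance to the goal, and fall back to a purely random move when the robot appears trapped. The grid moves, distance weight, and "trapped" test are illustrative assumptions, not the paper's exact rule.

```python
import numpy as np

MOVES = {0: (0, 1), 1: (0, -1), 2: (1, 0), 3: (-1, 0)}   # E, W, S, N on a grid

def choose_action(Q, pos, goal, visit_count, rng, w_dist=0.5):
    """Distance-guided action choice with a random escape from local traps."""
    if visit_count.get(pos, 0) > 3:           # revisited too often: likely trapped
        return int(rng.integers(len(MOVES)))  # random move to escape
    q_row = Q.get(pos, np.zeros(len(MOVES)))
    scores = np.empty(len(MOVES))
    for a, (dr, dc) in MOVES.items():
        nxt = (pos[0] + dr, pos[1] + dc)
        dist = np.hypot(nxt[0] - goal[0], nxt[1] - goal[1])
        scores[a] = q_row[a] - w_dist * dist  # prefer moves that close the distance
    return int(np.argmax(scores))
```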


Entropy ◽  
2021 ◽  
Vol 23 (6) ◽  
pp. 737
Author(s):  
Fengjie Sun ◽  
Xianchang Wang ◽  
Rui Zhang

An Unmanned Aerial Vehicle (UAV) can greatly reduce manpower in agricultural plant protection tasks such as watering, sowing, and pesticide spraying. It is essential to develop a Decision-making Support System (DSS) for UAVs to help them choose the correct action in each state according to the policy. In an unknown environment, formulating rules for UAVs to help them choose actions is not applicable, and obtaining the optimal policy through reinforcement learning is a feasible solution. However, experiments show that the existing reinforcement learning algorithms cannot obtain the optimal policy for a UAV in the agricultural plant protection environment. In this work we propose an improved Q-learning algorithm based on similar state matching, and we prove theoretically that a UAV has a greater probability of choosing the optimal action under the policy learned by the proposed algorithm than under the classic Q-learning algorithm in the agricultural plant protection environment. The proposed algorithm is implemented and tested on datasets that are evenly distributed based on real UAV parameters and real farm information. The performance evaluation of the algorithm is discussed in detail. Experimental results show that the proposed algorithm can efficiently learn the optimal policy for UAVs in the agricultural plant protection environment.
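
One way to picture the similar-state-matching idea: when the UAV reaches a state with no learned entry in the Q-table, reuse the Q-values of the most similar previously visited state instead of starting from zeros. The state representation and Euclidean similarity below are assumptions for illustration.

```python
import numpy as np

def most_similar_state(state, visited_states):
    """Return the visited state closest to `state` in Euclidean distance."""
    dists = [np.linalg.norm(np.asarray(state) - np.asarray(s)) for s in visited_states]
    return visited_states[int(np.argmin(dists))]

def q_values_for(state, q_table, visited_states, n_actions=4):
    """Look up Q-values; fall back to the most similar visited state if unseen."""
    key = tuple(state)
    if key in q_table:
        return q_table[key]
    if visited_states:
        return q_table[tuple(most_similar_state(state, visited_states))]
    return np.zeros(n_actions)                # nothing similar learned yet
```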


2021 ◽  
Vol 11 (13) ◽  
pp. 6237
Author(s):  
Azharul Islam ◽  
KyungHi Chang

Unstructured data from the internet constitute large sources of information, which need to be formatted in a user-friendly way. This research develops a model that classifies unstructured data from data mining into labeled data, and builds an informational and decision-making support system (DMSS). We often have assortments of information collected by mining data from various sources, where the key challenge is to extract valuable information. We observe substantial classification accuracy enhancement for our datasets with both machine learning and deep learning algorithms. The highest classification accuracy (99% in training, 96% in testing) was achieved on a Covid corpus processed using a long short-term memory (LSTM) network. Furthermore, we conducted tests on large datasets relevant to the Disaster corpus, with an LSTM classification accuracy of 98%. In addition, random forest (RF), a machine learning algorithm, provides a reasonable 84% accuracy. This research’s main objective is to increase the application’s robustness by integrating intelligence into the developed DMSS, which provides insight into the user’s intent despite dealing with a noisy dataset. Our model compares the random forest and stochastic gradient descent (SGD) algorithms by F1 score, where the RF method outperforms, improving accuracy by 2% (from 81% to 83%) compared with a conventional method.
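
A compact sketch of the classical-ML branch of such a pipeline: TF-IDF features with a random forest and an SGD (linear) classifier compared by F1 score. The toy texts and labels stand in for the mined Covid/Disaster corpora, and the LSTM branch is omitted.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Toy stand-ins for the mined corpora; real data would be the labeled documents.
texts = ["flood warning issued", "vaccine rollout update",
         "earthquake relief effort", "new covid variant reported"]
labels = ["disaster", "covid", "disaster", "covid"]
X_tr, X_te, y_tr, y_te = train_test_split(texts, labels, test_size=0.5, random_state=0)

for name, clf in [("RF", RandomForestClassifier(n_estimators=100, random_state=0)),
                  ("SGD", SGDClassifier(random_state=0))]:
    model = make_pipeline(TfidfVectorizer(), clf)   # TF-IDF features + classifier
    model.fit(X_tr, y_tr)
    print(name, "F1:", f1_score(y_te, model.predict(X_te), average="macro"))
```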


2021 ◽  
Author(s):  
Kun-Cheng Ke ◽  
Ming-Shyan Huang

Injection molding has been broadly used in the mass production of plastic parts and must meet the requirements of efficiency and quality consistency. Machine learning can effectively predict the quality of injection molded parts. However, the performance of machine learning models largely depends on the accuracy of the training. Hyperparameters such as activation functions, momentum, and learning rate are crucial to the accuracy and efficiency of model training. This research further analyzed the influence of hyperparameters on testing accuracy, explored the corresponding optimal learning rate, and provided the optimal training model for predicting the quality of injection molded parts. In this study, stochastic gradient descent (SGD) and stochastic gradient descent with momentum were used to optimize the artificial neural network model. Through optimization of these training hyperparameters, the testing accuracy for the width of the injection-molded product improved. The experimental results indicated that in the absence of momentum effects, all five activation functions can achieve more than 90% training accuracy with a learning rate of 0.1. Moreover, when optimized with SGD, the learning rate of the Sigmoid activation function was 0.1, and the testing accuracy reached 95.8%. Although momentum had the least influence on accuracy, it affected the convergence speed of the Sigmoid function, reducing the number of required learning iterations (an 82.4% reduction rate). Optimizing hyperparameter settings can improve the accuracy of model testing and markedly reduce training time.
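
The two optimizers compared above reduce to simple update rules; the sketch below writes them out and runs both on a toy quadratic loss. The learning rate of 0.1 echoes the value discussed, while the momentum coefficient and the loss itself are assumptions.

```python
import numpy as np

def sgd_step(w, grad, lr=0.1):
    """Plain stochastic gradient descent update."""
    return w - lr * grad

def sgd_momentum_step(w, grad, velocity, lr=0.1, momentum=0.9):
    """SGD with momentum: accumulate a velocity term across updates."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

# Toy quadratic loss (w - 3)^2 just to exercise both update rules.
grad_fn = lambda w: 2.0 * (w - 3.0)
w_plain = w_mom = 10.0
velocity = 0.0
for _ in range(50):
    w_plain = sgd_step(w_plain, grad_fn(w_plain))
    w_mom, velocity = sgd_momentum_step(w_mom, grad_fn(w_mom), velocity)
print(w_plain, w_mom)                         # both approach the minimum at 3.0
```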


Author(s):  
Yanan Wang ◽  
Haoyu Niu ◽  
Tiebiao Zhao ◽  
Xiaozhong Liao ◽  
Lei Dong ◽  
...  

This paper proposes a contactless voltage classification method for Lithium-ion batteries (LIBs). With a three-dimensional radio-frequency based sensor called Walabot, voltage data of LIBs can be collected in a contactless way. Then three machine learning algorithms, namely principal component analysis (PCA), linear discriminant analysis (LDA), and stochastic gradient descent (SGD) classifiers, are employed for data processing. Experiments and comparisons have been conducted to verify the proposed method. The colormaps of the results and the prediction accuracy show that LDA may be the most suitable for LIB voltage classification.
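
A short sketch of the data-processing step using the three named methods on synthetic stand-ins for the Walabot feature vectors; the feature dimension, class count, and data are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins: 300 RF scans with 40 features each, three voltage classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 40))
y = rng.integers(0, 3, size=300)

pca = PCA(n_components=5).fit(X)                      # unsupervised projection
lda = LinearDiscriminantAnalysis().fit(X, y)          # supervised projection/classifier
sgd = make_pipeline(StandardScaler(), SGDClassifier(random_state=0)).fit(X, y)

print("PCA explained variance:", pca.explained_variance_ratio_.sum())
print("LDA training accuracy:", lda.score(X, y))
print("SGD training accuracy:", sgd.score(X, y))
```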


2018 ◽  
Author(s):  
Kazunori D Yamada

In the deep learning era, stochastic gradient descent is the most common method used for optimizing neural network parameters. Among the various mathematical optimization methods, the gradient descent method is the most naive. Adjustment of the learning rate is necessary for quick convergence, and with plain gradient descent this is normally done manually. Many optimizers have been developed to control the learning rate and increase convergence speed. Generally, these optimizers adjust the learning rate automatically in response to learning status. These optimizers have been gradually improved by incorporating the effective aspects of earlier methods. In this study, we developed a new optimizer: YamAdam. Our optimizer is based on Adam, which utilizes the first and second moments of previous gradients. In addition to the moment estimation system, we incorporated an advantageous part of AdaDelta, namely its unit correction system, into YamAdam. According to benchmark tests on some common datasets, our optimizer showed similar or faster convergence compared to the existing methods. YamAdam is an alternative optimizer option for deep learning.
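
To illustrate the ingredients named above, the toy optimizer below combines Adam-style first and second moment estimates with an AdaDelta-style unit correction (the RMS of past updates takes the place of a fixed learning rate). It is an assumption-laden sketch, not the published YamAdam update rule.

```python
import numpy as np

class MomentUnitOptimizer:
    """Toy optimizer: Adam-style moments plus an AdaDelta-style unit correction."""

    def __init__(self, b1=0.9, b2=0.999, eps=1e-8):
        self.b1, self.b2, self.eps = b1, b2, eps
        self.m = self.v = self.u = 0.0        # 1st moment, 2nd moment, update RMS

    def step(self, w, grad):
        self.m = self.b1 * self.m + (1 - self.b1) * grad
        self.v = self.b2 * self.v + (1 - self.b2) * grad ** 2
        # Unit correction: scale by RMS of past updates instead of a fixed rate.
        delta = -np.sqrt(self.u + self.eps) / np.sqrt(self.v + self.eps) * self.m
        self.u = self.b2 * self.u + (1 - self.b2) * delta ** 2
        return w + delta

opt = MomentUnitOptimizer()
w = 5.0
for _ in range(2000):                         # apply updates to the gradient of (w - 1)^2
    w = opt.step(w, 2.0 * (w - 1.0))
print(w)
```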


2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Huipeng Lv

Strategy is the main body of modern Chinese martial arts competition, while fighting has only recently entered sports competition. Strategy and action correspond to each other and are practiced as a set. Therefore, constructing a decision-making algorithm for Chinese martial arts competition and perfecting the competition are intuitive and essential. The formulation of martial arts competition strategies requires scientific analysis of athletic data and more accurate predictions. Based on this observation, this paper draws on popular neural network technology to propose a BP neural network with additional momentum, elastic gradient descent, and an adaptive learning rate. The algorithm improves on the traditional BP neural network with respect to difficulties such as selecting the learning step length, determining the size and direction of the weight updates, and controlling the learning rate. The experimental results show that this paper’s algorithm improves both network scale and running time and can predict martial arts competition routines and support the formulation of scientific strategies.
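
A rough sketch of the training tweaks described: a momentum term added to the backpropagation weight update and a learning rate that grows when the error falls and shrinks when it rises. The network size, data, and adaptation factors are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))                  # stand-in competition statistics
y = (X.sum(axis=1) > 0).astype(float).reshape(-1, 1)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
W1 = rng.normal(size=(3, 8)) * 0.5            # input-to-hidden weights
W2 = rng.normal(size=(8, 1)) * 0.5            # hidden-to-output weights
V1, V2 = np.zeros_like(W1), np.zeros_like(W2) # momentum (velocity) terms
lr, momentum, prev_err = 0.1, 0.8, np.inf

for epoch in range(500):
    H = sigmoid(X @ W1)                       # forward pass
    out = sigmoid(H @ W2)
    err = np.mean((out - y) ** 2)
    lr *= 1.05 if err < prev_err else 0.7     # adaptive learning rate
    prev_err = err
    d_out = (out - y) * out * (1 - out)       # backpropagated errors
    d_hid = (d_out @ W2.T) * H * (1 - H)
    V2 = momentum * V2 - lr * (H.T @ d_out)   # additional momentum in the update
    V1 = momentum * V1 - lr * (X.T @ d_hid)
    W2 += V2
    W1 += V1
```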

