Jamming Strategy Optimization through Dual Q-Learning Model against Adaptive Radar

Modern adaptive radars can switch work modes to perform various missions and simultaneously use pulse parameter agility in each mode to improve survivability, which leads to a multiplicative increase in the decision-making complexity and declining performance of the existing jamming methods. In this paper, a two-level jamming decision-making framework is developed, based on which a dual Q-learning (DQL) model is proposed to optimize the jamming strategy and a dynamic method for jamming effectiveness evaluation is designed to update the model. Specifically, the jamming procedure is modeled as a finite Markov decision process. On this basis, the high-dimensional jamming action space is disassembled into two low-dimensional subspaces containing jamming mode and pulse parameters respectively, then two specialized Q-learning models with interaction are built to obtain the optimal solution. Moreover, the jamming effectiveness is evaluated through indicator vector distance measuring to acquire the feedback for the DQL model, where indicators are dynamically weighted to adapt to the environment. The experiments demonstrate the advantage of the proposed method in learning radar joint strategy of mode switching and parameter agility, shown as improving the average jamming-to-signal radio (JSR) by 4.05% while reducing the convergence time by 34.94% compared with the normal Q-learning method.

Download Full-text

Optimal Policies for Quantum Markov Decision Processes

International Journal of Automation and Computing ◽

10.1007/s11633-021-1278-z ◽

2021 ◽

Author(s):

Ming-Sheng Ying ◽

Yuan Feng ◽

Sheng-Gang Ying

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Quantum Systems ◽

Sequential Decision Making ◽

Mathematical Framework ◽

Sequential Decision ◽

Learning Techniques ◽

Optimal Policies ◽

Markov Decision ◽

Programming Algorithms

AbstractMarkov decision process (MDP) offers a general framework for modelling sequential decision making where outcomes are random. In particular, it serves as a mathematical framework for reinforcement learning. This paper introduces an extension of MDP, namely quantum MDP (qMDP), that can serve as a mathematical model of decision making about quantum systems. We develop dynamic programming algorithms for policy evaluation and finding optimal policies for qMDPs in the case of finite-horizon. The results obtained in this paper provide some useful mathematical tools for reinforcement learning techniques applied to the quantum world.

Download Full-text

A two-sided matching decision-making approach based on regret theory under intuitionistic fuzzy environment

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-202720 ◽

2021 ◽

pp. 1-18

Author(s):

Xiang Jia ◽

Xinfan Wang ◽

Yuanfang Zhu ◽

Lang Zhou ◽

Huan Zhou

Keyword(s):

Decision Making ◽

Hamming Distance ◽

Optimal Solution ◽

Intuitionistic Fuzzy Sets ◽

Regret Theory ◽

Multi Objective ◽

Intuitionistic Fuzzy ◽

Fuzzy Environment ◽

Objective Model ◽

Single Objective

This study proposes a two-sided matching decision-making (TSMDM) approach by combining the regret theory under the intuitionistic fuzzy environment. At first, according to the Hamming distance of intuitionistic fuzzy sets and regret theory, superior and inferior flows are defined to describe the comparative preference of subjects. Hereafter, the satisfaction degrees are obtained by integrating the superior and inferior flows of the subjects. The comprehensive satisfaction degrees are calculated by aggregating the satisfaction degrees, based on which, a multi-objective TSMDM model is built. Furthermore, the multi-objective TSMDM model is converted to a single-objective model, the optimal solution of the latter is derived. Finally, an illustrative example and several analyses are provided to verify the feasibility and the effectiveness of the proposed approach.

Download Full-text

Improved Q-Learning Algorithm Based on Approximate State Matching in Agricultural Plant Protection Environment

Entropy ◽

10.3390/e23060737 ◽

2021 ◽

Vol 23 (6) ◽

pp. 737

Author(s):

Fengjie Sun ◽

Xianchang Wang ◽

Rui Zhang

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Optimal Policy ◽

Feasible Solution ◽

Learning Algorithm ◽

Plant Protection ◽

Agricultural Plant ◽

Q Learning ◽

Aerial Vehicle ◽

Optimal Action

An Unmanned Aerial Vehicle (UAV) can greatly reduce manpower in the agricultural plant protection such as watering, sowing, and pesticide spraying. It is essential to develop a Decision-making Support System (DSS) for UAVs to help them choose the correct action in states according to the policy. In an unknown environment, the method of formulating rules for UAVs to help them choose actions is not applicable, and it is a feasible solution to obtain the optimal policy through reinforcement learning. However, experiments show that the existing reinforcement learning algorithms cannot get the optimal policy for a UAV in the agricultural plant protection environment. In this work we propose an improved Q-learning algorithm based on similar state matching, and we prove theoretically that there has a greater probability for UAV choosing the optimal action according to the policy learned by the algorithm we proposed than the classic Q-learning algorithm in the agricultural plant protection environment. This proposed algorithm is implemented and tested on datasets that are evenly distributed based on real UAV parameters and real farm information. The performance evaluation of the algorithm is discussed in detail. Experimental results show that the algorithm we proposed can efficiently learn the optimal policy for UAVs in the agricultural plant protection environment.

Download Full-text

Temporal concatenation for Markov decision processes

Probability in the Engineering and Informational Sciences ◽

10.1017/s0269964821000206 ◽

2021 ◽

pp. 1-28

Author(s):

Ruiyang Song ◽

Kuang Xu

Keyword(s):

Markov Decision Processes ◽

Large Scale ◽

Optimal Solution ◽

Upper Bounds ◽

Black Box ◽

Decision Processes ◽

Optimal Solutions ◽

Wide Range ◽

Markov Decision ◽

Speed Up

We propose and analyze a temporal concatenation heuristic for solving large-scale finite-horizon Markov decision processes (MDP), which divides the MDP into smaller sub-problems along the time horizon and generates an overall solution by simply concatenating the optimal solutions from these sub-problems. As a “black box” architecture, temporal concatenation works with a wide range of existing MDP algorithms. Our main results characterize the regret of temporal concatenation compared to the optimal solution. We provide upper bounds for general MDP instances, as well as a family of MDP instances in which the upper bounds are shown to be tight. Together, our results demonstrate temporal concatenation's potential of substantial speed-up at the expense of some performance degradation.

Download Full-text

Economic Development Based on a Mathematical Model: An Optimal Solution Method for the Fuel Supply of International Road Transport Activity

Energies ◽

10.3390/en14102963 ◽

2021 ◽

Vol 14 (10) ◽

pp. 2963

Author(s):

Melinda Timea Fülöp ◽

Miklós Gubán ◽

György Kovács ◽

Mihály Avornicului

Keyword(s):

Decision Making ◽

Ad Hoc ◽

Market Competition ◽

Optimal Solution ◽

Cost Savings ◽

Cost Effective ◽

Optimization Method ◽

Optimal Decision ◽

Environmental Damage ◽

Fuel Cost

Due to globalization and increased market competition, forwarding companies must focus on the optimization of their international transport activities and on cost reduction. The minimization of the amount and cost of fuel results in increased competition and profitability of the companies as well as the reduction of environmental damage. Nowadays, these aspects are particularly important. This research aims to develop a new optimization method for road freight transport costs in order to reduce the fuel costs and determine optimal fueling stations and to calculate the optimal quantity of fuel to refill. The mathematical method developed in this research has two phases. In the first phase the optimal, most cost-effective fuel station is determined based on the potential fuel stations. The specific fuel prices differ per fuel station, and the stations are located at different distances from the main transport way. The method developed in this study supports drivers’ decision-making regarding whether to refuel at a farther but cheaper fuel station or at a nearer but more expensive fuel station based on the more economical choice. Thereafter, it is necessary to determine the optimal fuel volume, i.e., the exact volume required including a safe amount to cover stochastic incidents (e.g., road closures). This aspect of the optimization method supports drivers’ optimal decision-making regarding optimal fuel stations and how much fuel to obtain in order to reduce the fuel cost. Therefore, the application of this new method instead of the recently applied ad-hoc individual decision-making of the drivers results in significant fuel cost savings. A case study confirmed the efficiency of the proposed method.

Download Full-text

The Global Sustainability Index: An Instrument For Assessing The Progress Towards The Sustainable Organization

ACTA Universitatis Cibiniensis ◽

10.1515/aucts-2015-0093 ◽

2015 ◽

Vol 67 (1) ◽

pp. 215-220 ◽

Cited By ~ 2

Author(s):

Valentin Grecu

Keyword(s):

Decision Making ◽

Sustainable Development ◽

Optimal Solution ◽

Decision Makers ◽

Decision Making Process ◽

Sustainability Index ◽

Global Sustainability ◽

Multiple Indicators ◽

Mathematical Algorithms ◽

General Indicator

Abstract There is rarely an optimal solution in sustainable development but most frequently a need to build compromises between conflicting aspects such as economic, social and environmental ones and different expectations of stakeholders. Moreover, information is rarely available and precise. This paper will focus on how to use indicators to monitor sustainable development, integrating the information provided by many of them into a complex general sustainability index. Having this general indicator is essential for decision makers as it is very complicated to evaluate the performance of the organization based on multiple indicators. The objective of this paper is to find mathematical algorithms for simplifying the decision-making process by offering an instrument for the evaluation of the sustainability progress.

Download Full-text

Markov decision processes for integrating life cycle dynamics into fab-level decision making

IFAC Proceedings Volumes ◽

10.1016/s1474-6670(17)56827-5 ◽

1999 ◽

Vol 32 (2) ◽

pp. 4852-4857

Author(s):

Shalabh Bhatnagar ◽

Michael C. Fu ◽

Steven I. Marcus ◽

Ying He

Keyword(s):

Decision Making ◽

Life Cycle ◽

Markov Decision Processes ◽

Decision Processes ◽

Markov Decision

Download Full-text

Autonomous Decision-Making While Drilling

Energies ◽

10.3390/en14040969 ◽

2021 ◽

Vol 14 (4) ◽

pp. 969

Author(s):

Eric Cayeux ◽

Benoît Daireaux ◽

Adrian Ambrus ◽

Rodica Mihai ◽

Liv Carlsen

Keyword(s):

Decision Making ◽

Internal State ◽

Drilling Process ◽

Autonomous Decision ◽

Internal States ◽

Markov Decision ◽

Drilling Operations ◽

Drilling System ◽

Drilling Conditions ◽

Erratic Behavior

The drilling process is complex because unexpected situations may occur at any time. Furthermore, the drilling system is extremely long and slender, therefore prone to vibrations and often being dominated by long transient periods. Adding the fact that measurements are not well distributed along the drilling system, with the majority of real-time measurements only available at the top side and having only access to very sparse data from downhole, the drilling process is poorly observed therefore making it difficult to use standard control methods. Therefore, to achieve completely autonomous drilling operations, it is necessary to utilize a method that is capable of estimating the internal state of the drilling system from parsimonious information while being able to make decisions that will keep the operation safe but effective. A solution enabling autonomous decision-making while drilling has been developed. It relies on an optimization of the time to reach the section total depth (TD). The estimated time to reach the section TD is decomposed into the effective time spent in conducting the drilling operation and the likely time lost to solve unexpected drilling events. This optimization problem is solved by using a Markov decision process method. Several example scenarios have been run in a virtual rig environment to test the validity of the concept. It is found that the system is capable to adapt itself to various drilling conditions, as for example being aggressive when the operation runs smoothly and the estimated uncertainty of the internal states is low, but also more cautious when the downhole drilling conditions deteriorate or when observations tend to indicate more erratic behavior, which is often observed prior to a drilling event.

Download Full-text

Operational State Evaluation and Maintenance Decision-making Method for Multi-state CNC Machine Tools based on Partially Observable Markov Decision Process

2020 International Conference on Sensing, Diagnostics, Prognostics, and Control (SDPC) ◽

10.1109/sdpc49476.2020.9353134 ◽

2020 ◽

Author(s):

Fang Zixuan ◽

Wang Xiaodong ◽

Wang Lifang

Keyword(s):

Decision Making ◽

Markov Decision Process ◽

Decision Process ◽

Machine Tools ◽

Cnc Machine Tools ◽

Cnc Machine ◽

State Evaluation ◽

Markov Decision ◽

Maintenance Decision ◽

Partially Observable

Download Full-text

Deterministic policies based on maximum regrets in MDPs with imprecise rewards

AI Communications ◽

10.3233/aic-190632 ◽

2021 ◽

pp. 1-16

Author(s):

Pegah Alizadeh ◽

Emiliano Traversi ◽

Aomar Osmani

Keyword(s):

Decision Making ◽

Decision Process ◽

Process Models ◽

Sequential Decision Making ◽

Sequential Decision ◽

Exact Procedure ◽

Markov Decision ◽

Intuitive Idea ◽

First Time ◽

Maximum Regret

Markov Decision Process Models (MDPs) are a powerful tool for planning tasks and sequential decision-making issues. In this work we deal with MDPs with imprecise rewards, often used when dealing with situations where the data is uncertain. In this context, we provide algorithms for finding the policy that minimizes the maximum regret. To the best of our knowledge, all the regret-based methods proposed in the literature focus on providing an optimal stochastic policy. We introduce for the first time a method to calculate an optimal deterministic policy using optimization approaches. Deterministic policies are easily interpretable for users because for a given state they provide a unique choice. To better motivate the use of an exact procedure for finding a deterministic policy, we show some (theoretical and experimental) cases where the intuitive idea of using a deterministic policy obtained after “determinizing” the optimal stochastic policy leads to a policy far from the exact deterministic policy.

Download Full-text