Research on Efficient Reinforcement Learning for Adaptive Frequency-Agility Radar

Modern radar jamming scenarios are complex and changeable. In order to improve the adaptability of frequency-agile radar under complex environmental conditions, reinforcement learning (RL) is introduced into the radar anti-jamming research. There are two aspects of the radar system that do not obey with the Markov decision process (MDP), which is the basic theory of RL: Firstly, the radar cannot confirm the interference rules of the jammer in advance, resulting in unclear environmental boundaries; secondly, the radar has frequency-agility characteristics, which does not meet the sequence change requirements of the MDP. As the existing RL algorithm is directly applied to the radar system, there would be problems, such as low sample utilization rate, poor computational efficiency and large error oscillation amplitude. In this paper, an adaptive frequency agile radar anti-jamming efficient RL model is proposed. First, a radar-jammer system model based on Markov game (MG) established, and the Nash equilibrium point determined and set as a dynamic environment boundary. Subsequently, the state and behavioral structure of RL model is improved to be suitable for processing frequency-agile data. Experiments that our proposal effectively the anti-jamming performance and efficiency of frequency-agile radar.

Download Full-text

Injection Mold Production Sustainable Scheduling Using Deep Reinforcement Learning

Sustainability ◽

10.3390/su12208718 ◽

2020 ◽

Vol 12 (20) ◽

pp. 8718 ◽

Cited By ~ 1

Author(s):

Seunghoon Lee ◽

Yongju Cho ◽

Young Hoon Lee

Keyword(s):

Reinforcement Learning ◽

Production Scheduling ◽

Dynamic Environment ◽

Total Weighted Tardiness ◽

Injection Mold ◽

Weighted Tardiness ◽

Scheduling Policy ◽

Delivery Date ◽

Process Framework ◽

Markov Decision

In the injection mold industry, it is important for manufacturers to satisfy the delivery date for the products that customers order. The mold products are diverse, and each product has a different manufacturing process. Owing to the nature of mold, mold manufacturing is a complex and dynamic environment. To meet the delivery date of the customers, the scheduling of mold production is important and is required to be sustainable and intelligent even in the complicated system and dynamic situation. To address this, in this paper, deep reinforcement learning (RL) is proposed for injection mold production scheduling. Before presenting the RL algorithm, a mathematical model for the mold scheduling problem is presented, and a Markov decision process framework is proposed for RL. The deep Q-network, which is an algorithm for RL, is employed to find the scheduling policy to minimize the total weighted tardiness. The results of experiments demonstrate that the proposed deep RL method outperforms the dispatching rules that are presented for minimizing the total weighted tardiness.

Download Full-text

Reinforcement Learning: A Survey

Journal of Artificial Intelligence Research ◽

10.1613/jair.301 ◽

1996 ◽

Vol 4 ◽

pp. 237-285 ◽

Cited By ~ 2897

Author(s):

L. P. Kaelbling ◽

M. L. Littman ◽

A. W. Moore

Keyword(s):

Reinforcement Learning ◽

Dynamic Environment ◽

Delayed Reinforcement ◽

Trial And Error ◽

Exploration And Exploitation ◽

Markov Decision Theory ◽

Markov Decision ◽

Historical Basis ◽

And Coping ◽

Selection Of

This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word ``reinforcement.'' The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.

Download Full-text

Dual Dynamic Scheduling for Hierarchical QoS in Uplink-NOMA: A Reinforcement Learning Approach

Sensors ◽

10.3390/s21134404 ◽

2021 ◽

Vol 21 (13) ◽

pp. 4404

Author(s):

Xiangjun Li ◽

Qimei Cui ◽

Jinli Zhai ◽

Xueqing Huang

Keyword(s):

Reinforcement Learning ◽

Power Control ◽

Interference Cancellation ◽

Dynamic Scheduling ◽

Dynamic Environment ◽

Spectrum Efficiency ◽

System Capacity ◽

Power Domain ◽

Markov Decision ◽

Fierce Competition

The demand for bandwidth-intensive and delay-sensitive services is surging daily with the development of 5G technology, resulting in fierce competition for scarce radio resources. Power domain Nonorthogonal Multiple Access (NOMA) technologies can dramatically improve system capacity and spectrum efficiency. Unlike existing NOMA scheduling that mainly focuses on fairness, this paper proposes a power control solution for uplink hybrid OMA and PD-NOMA in dual dynamic environments: dynamic and imperfect channel information together with the random user-specific hierarchical quality of service (QoS). This paper models the power control problem as a nonconvex stochastic, which aims to maximize system energy efficiency while guaranteeing hierarchical user QoS requirements. Then, the problem is formulated as a partially observable Markov decision process (POMDP). Owing to the difficulty of modeling time-varying scenes, the urgency of fast convergency, the adaptability in a dynamic environment, and the continuity of the variables, a Deep Reinforcement Learning (DRL)-based method is proposed. This paper also transforms the hierarchical QoS constraint under the NOMA serial interference cancellation (SIC) scene to fit DRL. The simulation results verify the effectiveness and robustness of the proposed algorithm under a dual uncertain environment. As compared with the baseline Particle Swarm Optimization algorithm (PSO), the proposed DRL-based method has demonstrated satisfying performance.

Download Full-text

Improving Emergency Department Efficiency by Patient Scheduling Using Deep Reinforcement Learning

Healthcare ◽

10.3390/healthcare8020077 ◽

2020 ◽

Vol 8 (2) ◽

pp. 77 ◽

Cited By ~ 4

Author(s):

Seunghoon Lee ◽

Young Hoon Lee

Keyword(s):

Reinforcement Learning ◽

Treatment Process ◽

Waiting Times ◽

Dynamic Environment ◽

Decision Makers ◽

Dispatching Rules ◽

Successful Implementation ◽

Patient Scheduling ◽

Markov Decision ◽

The Mathematical Model

Emergency departments (ED) in hospitals usually suffer from crowdedness and long waiting times for treatment. The complexity of the patient’s path flows and their controls come from the patient’s diverse acute level, personalized treatment process, and interconnected medical staff and resources. One of the factors, which has been controlled, is the dynamic situation change such as the patient’s composition and resources’ availability. The patient’s scheduling is thus complicated in consideration of various factors to achieve ED efficiency. To address this issue, a deep reinforcement learning (RL) is designed and applied in an ED patients’ scheduling process. Before applying the deep RL, the mathematical model and the Markov decision process (MDP) for the ED is presented and formulated. Then, the algorithm of the RL based on deep Q -networks (DQN) is designed to determine the optimal policy for scheduling patients. To evaluate the performance of the deep RL, it is compared with the dispatching rules presented in the study. The deep RL is shown to outperform the dispatching rules in terms of minimizing the weighted waiting time of the patients and the penalty of emergent patients in the suggested scenarios. This study demonstrates the successful implementation of the deep RL for ED applications, particularly in assisting decision-makers under the dynamic environment of an ED.

Download Full-text

Inverse reinforcement learning in contextual MDPs

Machine Learning ◽

10.1007/s10994-021-05984-x ◽

2021 ◽

Author(s):

Stav Belogolovsky ◽

Philip Korsunsky ◽

Shie Mannor ◽

Chen Tessler ◽

Tom Zahavy

Keyword(s):

Reinforcement Learning ◽

Optimization Problem ◽

Decision Processes ◽

Inverse Reinforcement Learning ◽

Convex Optimization Problem ◽

Reward Function ◽

Dynamic Treatment Regime ◽

Markov Decision ◽

Dynamic Treatment ◽

Recorded Data

AbstractWe consider the task of Inverse Reinforcement Learning in Contextual Markov Decision Processes (MDPs). In this setting, contexts, which define the reward and transition kernel, are sampled from a distribution. In addition, although the reward is a function of the context, it is not provided to the agent. Instead, the agent observes demonstrations from an optimal policy. The goal is to learn the reward mapping, such that the agent will act optimally even when encountering previously unseen contexts, also known as zero-shot transfer. We formulate this problem as a non-differential convex optimization problem and propose a novel algorithm to compute its subgradients. Based on this scheme, we analyze several methods both theoretically, where we compare the sample complexity and scalability, and empirically. Most importantly, we show both theoretically and empirically that our algorithms perform zero-shot transfer (generalize to new and unseen contexts). Specifically, we present empirical experiments in a dynamic treatment regime, where the goal is to learn a reward function which explains the behavior of expert physicians based on recorded data of them treating patients diagnosed with sepsis.

Download Full-text

Optimal Policies for Quantum Markov Decision Processes

International Journal of Automation and Computing ◽

10.1007/s11633-021-1278-z ◽

2021 ◽

Author(s):

Ming-Sheng Ying ◽

Yuan Feng ◽

Sheng-Gang Ying

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Quantum Systems ◽

Sequential Decision Making ◽

Mathematical Framework ◽

Sequential Decision ◽

Learning Techniques ◽

Optimal Policies ◽

Markov Decision ◽

Programming Algorithms

AbstractMarkov decision process (MDP) offers a general framework for modelling sequential decision making where outcomes are random. In particular, it serves as a mathematical framework for reinforcement learning. This paper introduces an extension of MDP, namely quantum MDP (qMDP), that can serve as a mathematical model of decision making about quantum systems. We develop dynamic programming algorithms for policy evaluation and finding optimal policies for qMDPs in the case of finite-horizon. The results obtained in this paper provide some useful mathematical tools for reinforcement learning techniques applied to the quantum world.

Download Full-text