Spatiotemporally Constrained Action Space Attacks on Deep Reinforcement Learning Agents

Xian Yeow Lee; Sambit Ghadai; Kai Liang Tan; Chinmay Hegde; Soumik Sarkar

doi:10.1609/aaai.v34i04.5887

Spatiotemporally Constrained Action Space Attacks on Deep Reinforcement Learning Agents

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5887 ◽

2020 ◽

Vol 34 (04) ◽

pp. 4577-4584

Author(s):

Xian Yeow Lee ◽

Sambit Ghadai ◽

Kai Liang Tan ◽

Chinmay Hegde ◽

Soumik Sarkar

Keyword(s):

Reinforcement Learning ◽

Optimization Problem ◽

Limited Resource ◽

Action Space ◽

Engineering Systems ◽

Physical Systems ◽

Learning Agents ◽

Look Ahead ◽

Real World Applications ◽

Temporal Dimensions

Robustness of Deep Reinforcement Learning (DRL) algorithms towards adversarial attacks in real world applications such as those deployed in cyber-physical systems (CPS) are of increasing concern. Numerous studies have investigated the mechanisms of attacks on the RL agent's state space. Nonetheless, attacks on the RL agent's action space (corresponding to actuators in engineering systems) are equally perverse, but such attacks are relatively less studied in the ML literature. In this work, we first frame the problem as an optimization problem of minimizing the cumulative reward of an RL agent with decoupled constraints as the budget of attack. We propose the white-box Myopic Action Space (MAS) attack algorithm that distributes the attacks across the action space dimensions. Next, we reformulate the optimization problem above with the same objective function, but with a temporally coupled constraint on the attack budget to take into account the approximated dynamics of the agent. This leads to the white-box Look-ahead Action Space (LAS) attack algorithm that distributes the attacks across the action and temporal dimensions. Our results showed that using the same amount of resources, the LAS attack deteriorates the agent's performance significantly more than the MAS attack. This reveals the possibility that with limited resource, an adversary can utilize the agent's dynamics to malevolently craft attacks that causes the agent to fail. Additionally, we leverage these attack strategies as a possible tool to gain insights on the potential vulnerabilities of DRL agents.

Download Full-text

A View on Deep Reinforcement Learning in Imperfect Information Games

Studia Universitatis Babeș-Bolyai Informatica ◽

10.24193/subbi.2020.2.03 ◽

2020 ◽

Vol 65 (2) ◽

pp. 31

Author(s):

T.V. Pricope

Keyword(s):

Reinforcement Learning ◽

Imperfect Information ◽

Large Scale ◽

Traditional Approach ◽

Search Space ◽

Fictitious Play ◽

Learning Agents ◽

Real World Applications ◽

Imperfect Information Games ◽

Human Player

Many real-world applications can be described as large-scale games of imperfect information. This kind of games is particularly harder than the deterministic one as the search space is even more sizeable. In this paper, I want to explore the power of reinforcement learning in such an environment; that is why I take a look at one of the most popular game of such type, no limit Texas Hold’em Poker, yet unsolved, developing multiple agents with different learning paradigms and techniques and then comparing their respective performances. When applied to no-limit Hold’em Poker, deep reinforcement learning agents clearly outperform agents with a more traditional approach. Moreover, if these last agents rival a human beginner level of play, the ones based on reinforcement learning compare to an amateur human player. The main algorithm uses Fictitious Play in combination with ANNs and some handcrafted metrics. We also applied the main algorithm to another game of imperfect information, less complex than Poker, in order to show the scalability of this solution and the increase in performance when put neck in neck with established classical approaches from the reinforcement learning literature.

Download Full-text

Robustifying Reinforcement Learning Agents via Action Space Adversarial Training

2020 American Control Conference (ACC) ◽

10.23919/acc45564.2020.9147846 ◽

2020 ◽

Cited By ~ 2

Author(s):

Kai Liang Tan ◽

Yasaman Esfandiari ◽

Xian Yeow Lee ◽

Aakanksha ◽

Soumik Sarkar

Keyword(s):

Reinforcement Learning ◽

Action Space ◽

Learning Agents ◽

Adversarial Training

Download Full-text

Query-based targeted action-space adversarial policies on deep reinforcement learning agents

Proceedings of the ACM/IEEE 12th International Conference on Cyber-Physical Systems ◽

10.1145/3450267.3450537 ◽

2021 ◽

Author(s):

Xian Yeow Lee ◽

Yasaman Esfandiari ◽

Kai Liang Tan ◽

Soumik Sarkar

Keyword(s):

Reinforcement Learning ◽

Action Space ◽

Learning Agents

Download Full-text

Analyzing Real Options and Flexibility in Engineering Systems Design using Decision Rules and Deep Reinforcement Learning

Journal of Mechanical Design ◽

10.1115/1.4052299 ◽

2021 ◽

pp. 1-31

Author(s):

Cesare Caputo ◽

Michel-Alexandre Cardin

Keyword(s):

Reinforcement Learning ◽

Real Options ◽

Decision Rules ◽

Systems Design ◽

Computational Design ◽

Risk Tolerance ◽

Action Space ◽

Waste To Energy ◽

Engineering Systems ◽

Standard Design

Abstract Engineering systems provide essential services to society e.g., power generation, transportation. Their performance, however, is directly affected by their ability to cope with uncertainty, especially given the realities of climate change and pandemics. Standard design methods often fail to recognize uncertainty in early conceptual activities, leading to rigid systems that are vulnerable to change. Real Options and Flexibility in Design are important paradigms to improve a system's ability to adapt and respond to unforeseen conditions. Existing approaches to analyze flexibility, however, do not leverage sufficiently recent developments in machine learning enabling deeper exploration of the computational design space. There is untapped potential for new solutions that are not readily accessible using existing methods. Here, a novel approach to analyze flexibility is proposed based on Deep Reinforcement Learning (DRL). It explores available datasets systematically and considers a wider range of adaptability strategies. The methodology is evaluated on an example waste-to-energy system. Low and high flexibility DRL models are compared against stochastically optimal inflexible and flexible solutions using decision rules. The results show highly dynamic solutions, with action space parametrized via artificial neural network. They show improved expected economic value up to 69% compared to previous solutions. Combining information from action space probability distributions along expert insights and risk tolerance helps make better decisions in real-world design and system operations. Out of sample testing shows that the policies are generalizable, but subject to tradeoffs between flexibility and inherent limitations of the learning process.

Download Full-text

Solving Online Threat Screening Games using Constrained Action Space Reinforcement Learning

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i02.5599 ◽

2020 ◽

Vol 34 (02) ◽

pp. 2226-2235

Author(s):

Sanket Shah ◽

Sinha Arunesh ◽

Varakantham Pradeep ◽

Perrault Andrew ◽

Tambe Milind

Keyword(s):

Reinforcement Learning ◽

Large Scale ◽

Stackelberg Game ◽

Time Window ◽

Time Windows ◽

Inequality Constraints ◽

Wait Time ◽

Limited Resource ◽

Fixed Time ◽

Action Space

Large-scale screening for potential threats with limited resources and capacity for screening is a problem of interest at airports, seaports, and other ports of entry. Adversaries can observe screening procedures and arrive at a time when there will be gaps in screening due to limited resource capacities. To capture this game between ports and adversaries, this problem has been previously represented as a Stackelberg game, referred to as a Threat Screening Game (TSG). Given the significant complexity associated with solving TSGs and uncertainty in arrivals of customers, existing work has assumed that screenees arrive and are allocated security resources at the beginning of the time-window. In practice, screenees such as airport passengers arrive in bursts correlated with flight time and are not bound by fixed time-windows. To address this, we propose an online threat screening model in which the screening strategy is determined adaptively as a passenger arrives while satisfying a hard bound on acceptable risk of not screening a threat. To solve the online problem, we first reformulate it as a Markov Decision Process (MDP) in which the hard bound on risk translates to a constraint on the action space and then solve the resultant MDP using Deep Reinforcement Learning (DRL). To this end, we provide a novel way to efficiently enforce linear inequality constraints on the action output in DRL. We show that our solution allows us to significantly reduce screenee wait time without compromising on the risk.

Download Full-text

An Evaluation Methodology for Interactive Reinforcement Learning with Simulated Users

Biomimetics ◽

10.3390/biomimetics6010013 ◽

2021 ◽

Vol 6 (1) ◽

pp. 13

Author(s):

Adam Bignold ◽

Francisco Cruz ◽

Richard Dazeley ◽

Peter Vamplew ◽

Cameron Foale

Keyword(s):

Reinforcement Learning ◽

Information Source ◽

Human Interaction ◽

Evaluation Methodology ◽

External Information ◽

Preliminary Evaluation ◽

Learning Agents ◽

Learning Agent ◽

Knowledge Bias ◽

The Impact

Interactive reinforcement learning methods utilise an external information source to evaluate decisions and accelerate learning. Previous work has shown that human advice could significantly improve learning agents’ performance. When evaluating reinforcement learning algorithms, it is common to repeat experiments as parameters are altered or to gain a sufficient sample size. In this regard, to require human interaction every time an experiment is restarted is undesirable, particularly when the expense in doing so can be considerable. Additionally, reusing the same people for the experiment introduces bias, as they will learn the behaviour of the agent and the dynamics of the environment. This paper presents a methodology for evaluating interactive reinforcement learning agents by employing simulated users. Simulated users allow human knowledge, bias, and interaction to be simulated. The use of simulated users allows the development and testing of reinforcement learning agents, and can provide indicative results of agent performance under defined human constraints. While simulated users are no replacement for actual humans, they do offer an affordable and fast alternative for evaluative assisted agents. We introduce a method for performing a preliminary evaluation utilising simulated users to show how performance changes depending on the type of user assisting the agent. Moreover, we describe how human interaction may be simulated, and present an experiment illustrating the applicability of simulating users in evaluating agent performance when assisted by different types of trainers. Experimental results show that the use of this methodology allows for greater insight into the performance of interactive reinforcement learning agents when advised by different users. The use of simulated users with varying characteristics allows for evaluation of the impact of those characteristics on the behaviour of the learning agent.

Download Full-text

Testing self-healing cyber-physical systems under uncertainty with reinforcement learning: an empirical study

Empirical Software Engineering ◽

10.1007/s10664-021-09941-z ◽

2021 ◽

Vol 26 (3) ◽

Author(s):

Tao Ma ◽

Shaukat Ali ◽

Tao Yue

Keyword(s):

Reinforcement Learning ◽

Empirical Study ◽

Cyber Physical Systems ◽

Self Healing ◽

Physical Systems

Download Full-text

Inverse reinforcement learning in contextual MDPs

Machine Learning ◽

10.1007/s10994-021-05984-x ◽

2021 ◽

Author(s):

Stav Belogolovsky ◽

Philip Korsunsky ◽

Shie Mannor ◽

Chen Tessler ◽

Tom Zahavy

Keyword(s):

Reinforcement Learning ◽

Optimization Problem ◽

Decision Processes ◽

Inverse Reinforcement Learning ◽

Convex Optimization Problem ◽

Reward Function ◽

Dynamic Treatment Regime ◽

Markov Decision ◽

Dynamic Treatment ◽

Recorded Data

AbstractWe consider the task of Inverse Reinforcement Learning in Contextual Markov Decision Processes (MDPs). In this setting, contexts, which define the reward and transition kernel, are sampled from a distribution. In addition, although the reward is a function of the context, it is not provided to the agent. Instead, the agent observes demonstrations from an optimal policy. The goal is to learn the reward mapping, such that the agent will act optimally even when encountering previously unseen contexts, also known as zero-shot transfer. We formulate this problem as a non-differential convex optimization problem and propose a novel algorithm to compute its subgradients. Based on this scheme, we analyze several methods both theoretically, where we compare the sample complexity and scalability, and empirically. Most importantly, we show both theoretically and empirically that our algorithms perform zero-shot transfer (generalize to new and unseen contexts). Specifically, we present empirical experiments in a dynamic treatment regime, where the goal is to learn a reward function which explains the behavior of expert physicians based on recorded data of them treating patients diagnosed with sepsis.

Download Full-text

FPGA Acceleration of ROS2-Based Reinforcement Learning Agents

2020 Eighth International Symposium on Computing and Networking Workshops (CANDARW) ◽

10.1109/candarw51189.2020.00031 ◽

2020 ◽

Author(s):

Daniel Pinheiro Leal ◽

Midori Sugaya ◽

Hideharu Amano ◽

Takeshi Ohkawa

Keyword(s):

Reinforcement Learning ◽

Learning Agents ◽

Fpga Acceleration

Download Full-text

A game strategy model in the digital curling system based on NFSP

Complex & Intelligent Systems ◽

10.1007/s40747-021-00345-6 ◽

2021 ◽

Author(s):

Yuntao Han ◽

Qibin Zhou ◽

Fuqing Duan

Keyword(s):

Reinforcement Learning ◽

Nash Equilibrium ◽

Action Space ◽

Learning Networks ◽

Game Tree ◽

Continuous Action ◽

Extensive Game ◽

Strategy Model ◽

Zero Sum ◽

Tree Searching

AbstractThe digital curling game is a two-player zero-sum extensive game in a continuous action space. There are some challenging problems that are still not solved well, such as the uncertainty of strategy, the large game tree searching, and the use of large amounts of supervised data, etc. In this work, we combine NFSP and KR-UCT for digital curling games, where NFSP uses two adversary learning networks and can automatically produce supervised data, and KR-UCT can be used for large game tree searching in continuous action space. We propose two reward mechanisms to make reinforcement learning converge quickly. Experimental results validate the proposed method, and show the strategy model can reach the Nash equilibrium.

Download Full-text