Solving Online Threat Screening Games using Constrained Action Space Reinforcement Learning

2020 ◽  
Vol 34 (02) ◽  
pp. 2226-2235
Author(s):  
Sanket Shah ◽  
Arunesh Sinha ◽  
Pradeep Varakantham ◽  
Andrew Perrault ◽  
Milind Tambe

Large-scale screening for potential threats with limited screening resources and capacity is a problem of interest at airports, seaports, and other ports of entry. Adversaries can observe screening procedures and arrive at a time when there will be gaps in screening due to limited resource capacities. To capture this game between ports and adversaries, this problem has previously been represented as a Stackelberg game, referred to as a Threat Screening Game (TSG). Given the significant complexity of solving TSGs and the uncertainty in customer arrivals, existing work has assumed that screenees arrive and are allocated security resources at the beginning of the time window. In practice, screenees such as airport passengers arrive in bursts correlated with flight times and are not bound by fixed time windows. To address this, we propose an online threat screening model in which the screening strategy is determined adaptively as each passenger arrives, while satisfying a hard bound on the acceptable risk of not screening a threat. To solve the online problem, we first reformulate it as a Markov Decision Process (MDP) in which the hard bound on risk translates to a constraint on the action space, and then solve the resultant MDP using Deep Reinforcement Learning (DRL). To this end, we provide a novel way to efficiently enforce linear inequality constraints on the action output in DRL. We show that our solution significantly reduces screenee wait time without compromising on risk.
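
The constraint-enforcement step can be pictured as projecting the raw policy output onto the feasible polytope. Below is a minimal sketch of that idea, assuming a constraint set of the form {a : Ga ≤ h}; the matrices, dimensions, and use of cvxpy are illustrative assumptions, not the paper's exact mechanism.

```python
# Minimal sketch: project a raw policy-network output onto the polytope
# {a : G @ a <= h} so that every emitted action satisfies the hard linear
# inequality constraints (e.g., a bound on unscreened-threat risk).
# G, h, and the dimensions below are illustrative assumptions.
import numpy as np
import cvxpy as cp

def project_action(raw_action: np.ndarray, G: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Euclidean projection of raw_action onto {a : G a <= h}."""
    a = cp.Variable(raw_action.shape[0])
    problem = cp.Problem(cp.Minimize(cp.sum_squares(a - raw_action)), [G @ a <= h])
    problem.solve()
    return a.value

# Example: allocations for 3 screening resources, each in [0, 1], with a
# hypothetical hard risk bound 0.5*a1 + 0.3*a2 + 0.2*a3 <= 0.2.
G = np.vstack([np.eye(3), -np.eye(3), [[0.5, 0.3, 0.2]]])
h = np.concatenate([np.ones(3), np.zeros(3), [0.2]])
print(project_action(np.array([0.9, 0.8, 0.7]), G, h))
```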

2020 ◽  
Vol 34 (04) ◽  
pp. 4577-4584
Author(s):  
Xian Yeow Lee ◽  
Sambit Ghadai ◽  
Kai Liang Tan ◽  
Chinmay Hegde ◽  
Soumik Sarkar

The robustness of Deep Reinforcement Learning (DRL) algorithms to adversarial attacks in real-world applications, such as those deployed in cyber-physical systems (CPS), is of increasing concern. Numerous studies have investigated the mechanisms of attacks on the RL agent's state space. Attacks on the RL agent's action space (corresponding to actuators in engineering systems) are equally pernicious, but such attacks are relatively less studied in the ML literature. In this work, we first frame the problem as an optimization problem of minimizing the cumulative reward of an RL agent, with decoupled constraints serving as the attack budget. We propose the white-box Myopic Action Space (MAS) attack algorithm, which distributes the attack across the action-space dimensions. Next, we reformulate the optimization problem with the same objective function but with a temporally coupled constraint on the attack budget, taking into account the approximated dynamics of the agent. This leads to the white-box Look-ahead Action Space (LAS) attack algorithm, which distributes the attack across the action and temporal dimensions. Our results show that, using the same amount of resources, the LAS attack degrades the agent's performance significantly more than the MAS attack. This reveals that, even with limited resources, an adversary can exploit the agent's dynamics to craft attacks that cause the agent to fail. Additionally, we leverage these attack strategies as a tool to gain insight into the potential vulnerabilities of DRL agents.
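
A myopic action-space attack of this flavor can be sketched as projected gradient descent on the agent's value estimate, with the perturbation confined to a budget ball. The sketch below is a hedged illustration under assumed interfaces (a scalar-valued critic `q_net(state, action)`); it is not the paper's exact implementation.

```python
# Hedged sketch of an MAS-style attack: iteratively perturb the action to
# decrease the critic's value estimate, projecting the perturbation back
# onto an L2 budget ball after each step. Interfaces are assumptions.
import torch

def mas_attack(q_net, state, action, budget=0.1, steps=10, lr=0.05):
    delta = torch.zeros_like(action, requires_grad=True)
    for _ in range(steps):
        value = q_net(state, action + delta)          # scalar value estimate
        grad, = torch.autograd.grad(value, delta)     # d(value)/d(delta)
        with torch.no_grad():
            delta -= lr * grad                        # descend on the value
            norm = delta.norm()
            if norm > budget:                         # project onto budget ball
                delta *= budget / norm
        delta.requires_grad_(True)
    return (action + delta).detach()
```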


Author(s):  
Zhen-Jia Pang ◽  
Ruo-Ze Liu ◽  
Zhou-Yu Meng ◽  
Yi Zhang ◽  
Yang Yu ◽  
...  

StarCraft II poses a grand challenge for reinforcement learning. The main difficulties include a huge state space, a varying action space, and a long horizon. In this paper, we investigate a set of reinforcement learning techniques for the full-length game of StarCraft II. We investigate a hierarchical approach in which the hierarchy involves two levels of abstraction. One is the macro-actions extracted from expert demonstration trajectories, which reduce the action space by an order of magnitude yet remain effective. The other is a two-layer hierarchical architecture, which is modular and easy to scale. We also investigate a curriculum transfer learning approach that trains the agent against opponents of increasing difficulty, from the simplest to harder ones. On a 64×64 map with restricted units, we train the agent on a single machine with 4 GPUs and 48 CPU threads. We achieve a winning rate of more than 99% against the difficulty level-1 built-in AI. Through the curriculum transfer learning algorithm and a mixture of combat models, we achieve a winning rate of over 93% against the most difficult non-cheating built-in AI (level-7) within days. We hope this study sheds some light on future research in large-scale reinforcement learning.
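
The two-level idea can be illustrated compactly: a learned controller chooses among macro-actions, and each macro-action expands into a fixed micro-action sequence, so the RL agent explores a far smaller action space. The sketch below is illustrative only; the macro-action names and environment interface are hypothetical stand-ins, not the authors' actual action set.

```python
# Illustrative sketch of the two-level hierarchy: a top-level controller
# picks a macro-action (mined from expert replays), and the bottom level
# replays its micro-action sequence in the environment.
import random

MACRO_ACTIONS = {
    "build_workers": ["select_base", "train_scv", "train_scv"],
    "expand":        ["select_scv", "build_command_center"],
    "attack":        ["select_army", "attack_move_enemy_base"],
}

class ControllerPolicy:
    """Top level: maps an abstract state to a macro-action name."""
    def act(self, state) -> str:
        return random.choice(list(MACRO_ACTIONS))  # placeholder for a learned policy

def run_episode(env, controller, horizon=100):
    state = env.reset()
    for _ in range(horizon):
        macro = controller.act(state)
        for micro in MACRO_ACTIONS[macro]:  # bottom level executes the sequence
            state, reward, done, _ = env.step(micro)
            if done:
                return
```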


2016 ◽  
Vol 23 (1) ◽  
pp. 111-118 ◽  
Author(s):  
Yuming Feng ◽  
Junzhi Yu ◽  
Chuandong Li ◽  
Tingwen Huang ◽  
Hangjun Che

We formulate linear impulsive control systems with impulse time windows. Unlike most impulsive systems, where impulses occur at fixed times or when the system state hits a certain hyperplane, the impulse times in the presented systems may be uncertain, but they are confined to a small time interval, i.e., a time window. Compared with existing impulsive systems, systems with impulse time windows are of practical importance. We then study the asymptotic stability of the linear case and obtain several stability criteria. Numerical examples are given to verify the effectiveness of the theoretical results.
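
A representative form of such a system (with notation assumed here for illustration, not necessarily the authors' exact formulation) is

```latex
% Linear impulsive system whose k-th impulse time \tau_k is not fixed but
% may fall anywhere inside a window around the nominal instant kT.
\begin{aligned}
  \dot{x}(t)  &= A\,x(t),                      && t \neq \tau_k,\\
  x(\tau_k^+) &= x(\tau_k^-) + B\,x(\tau_k^-), && \tau_k \in [\,kT-\delta,\ kT+\delta\,],
\end{aligned}
```

where the window half-width δ quantifies the uncertainty in the impulse instant.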


Sensors ◽  
2018 ◽  
Vol 18 (10) ◽  
pp. 3548 ◽  
Author(s):  
Ke Cui ◽  
Zhongjie Ren ◽  
Jieyu Qian ◽  
Wenjun Peng ◽  
Rihong Zhu

Interferometric fiber-optic sensors are often organized into large-scale arrays using the technique of time-division multiplexing (TDM) to reduce system cost. Discriminating the time windows of the different sensor units is a prerequisite for successfully demodulating the sensing signal, but this traditionally calls for a very time-consuming manual calibration process. To address this problem, a novel automatic time-window locating method is proposed in this paper. It introduces the concept of a shape function and carries out a cross-correlation operation between the shape function and the sensor signal. The shape function is defined as a function whose curve profile reflects the main data characteristics of the sensor signal. The time-window information is then extracted from the correlation result. The whole process is carried out automatically by the interrogation controller of the sensor system, without any manual intervention. Experiments are conducted to validate the method. The proposed method can greatly reduce the complexity of locating time windows in large-scale TDM sensor arrays and makes practical use of the TDM scheme much more convenient.
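
The core of the method is ordinary template matching: cross-correlate the shape function with the raw TDM stream and take the correlation peak as the window location. A minimal sketch, with signal shapes and sizes as illustrative assumptions:

```python
# Hedged sketch of the correlation-based locating idea: the shape function
# acts as a template whose profile matches the sensor signal; the peak of
# the cross-correlation marks the start of that sensor's time window.
import numpy as np

def locate_time_window(stream: np.ndarray, shape_fn: np.ndarray) -> int:
    """Return the sample index where the shape function best matches the stream."""
    corr = np.correlate(stream, shape_fn, mode="valid")
    return int(np.argmax(corr))

# Example: a pulse-like shape function buried at sample 300 of a noisy stream.
rng = np.random.default_rng(0)
shape_fn = np.hanning(50)
stream = 0.1 * rng.standard_normal(1000)
stream[300:350] += shape_fn
print(locate_time_window(stream, shape_fn))  # ≈ 300
```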


2020 ◽  
Vol 10 (21) ◽  
pp. 7431
Author(s):  
Wanyuan Wang ◽  
Hansi Tao ◽  
Yichuan Jiang

Delivery service sharing (DSS) has made an important contribution to optimizing daily order-delivery applications. Existing DSS algorithms have two major limitations. First, for computational reasons, most DSS algorithms focus on the fixed pickup/drop-off time scenario, which is inconvenient in real-world settings where customers can choose pickup/drop-off times flexibly. Second, to address the intractable DSS with flexible time windows (DSS-Fle), local-search-based heuristics are widely employed; however, they offer no theoretical results on the advantage of order sharing. Against this background, this paper designs a novel algorithm for DSS-Fle that is efficient in both time complexity and system throughput. Inspired by the efficiency of the shareability network on the delivery service routing (DSR) variant, where orders cannot be shared and have fixed time windows, we first consider the variant of DSR with flexible time windows (DSR-Fle). For DSR-Fle, each order's flexible time window is split into multiple virtual fixed time windows, one of which is chosen by the shareability network as the order's service time. In turn, inspired by the efficiency of local-search heuristics, we further consider the variant of DSS with fixed time windows (DSS-Fix). For DSS-Fix, beneficial sharing orders are searched for and inserted into the shareability network. Finally, by combining the splitting mechanism proposed for DSR-Fle and the insertion mechanism proposed for DSS-Fix, an efficient algorithm for DSS-Fle is obtained. Simulation results show that the proposed algorithm scales to city-scale scenarios with thousands of regions, orders, and couriers, and has a significant advantage in improving system throughput.
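
The splitting mechanism can be illustrated directly: a flexible window [earliest, latest] is discretized into overlapping virtual fixed windows of the service duration, from which the shareability network later picks one. A minimal sketch; the step size and interface are assumptions, not values from the paper.

```python
# Minimal sketch of the splitting mechanism: enumerate candidate fixed
# service windows inside a flexible time window.
def split_time_window(earliest, latest, service_len, step):
    windows, start = [], earliest
    while start + service_len <= latest:
        windows.append((start, start + service_len))
        start += step
    return windows

print(split_time_window(9.0, 12.0, service_len=1.0, step=0.5))
# [(9.0, 10.0), (9.5, 10.5), (10.0, 11.0), (10.5, 11.5), (11.0, 12.0)]
```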


Author(s):  
Hongguang Wu ◽  
Yuelin Gao ◽  
Wanting Wang ◽  
Ziyu Zhang

In this paper, we study a vehicle routing problem with time windows (TWVRP). In this problem, we consider a hard time constraint: the fleet can serve customers only within their specified time windows. To solve this problem, a hybrid ant colony (HACO) algorithm is proposed, based on the ant colony algorithm and a mutation operation. The proposed HACO algorithm has three innovations: first, it updates pheromones with a new method; second, it introduces adaptive parameters; and third, it adds a mutation operation. The well-known Solomon instances are used to evaluate the performance of the proposed algorithm. Experimental results show that the HACO algorithm is effective in solving the vehicle routing problem with time windows. Moreover, the proposed algorithm has practical implications for vehicle routing, and the results show that it is applicable and effective in practical problems.
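
To make the first two innovations concrete, the sketch below shows one plausible shape of a pheromone update with evaporation, a best-tour deposit, and an adaptive evaporation rate; the specific update rule and constants are illustrative assumptions, not the exact HACO rules.

```python
# Hedged sketch of an ant-colony pheromone update: adaptive evaporation
# (explore early, exploit late) plus a deposit on the best tour's edges.
import numpy as np

def update_pheromone(tau, best_tour, best_len, iteration, max_iter):
    rho = 0.1 + 0.5 * (1.0 - iteration / max_iter)  # adaptive evaporation rate
    tau = (1.0 - rho) * tau                         # evaporate all edges
    for i, j in zip(best_tour, best_tour[1:]):      # reinforce the best tour
        tau[i, j] += 1.0 / best_len
        tau[j, i] += 1.0 / best_len
    return tau
```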


Author(s):  
Yuntao Han ◽  
Qibin Zhou ◽  
Fuqing Duan

The digital curling game is a two-player zero-sum extensive game with a continuous action space. Several challenging problems remain unsolved, such as the uncertainty of strategy, searching the large game tree, and the need for large amounts of supervised data. In this work, we combine NFSP and KR-UCT for digital curling games, where NFSP uses two adversarial learning networks and can automatically produce supervised data, and KR-UCT can be used to search the large game tree in a continuous action space. We propose two reward mechanisms to make the reinforcement learning converge quickly. Experimental results validate the proposed method and show that the strategy model can approach a Nash equilibrium.
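
KR-UCT's selection step can be approximated as follows: visit counts and value estimates of sampled actions are smoothed with a kernel over the continuous action space, and the action maximizing a UCB score over the smoothed estimates is chosen. The kernel, constants, and interfaces below are assumptions for illustration, not the paper's exact rule.

```python
# Hedged sketch of a KR-UCT-style selection rule: kernel-regress values and
# visit counts of sampled continuous actions, then pick the action with the
# highest UCB score. Bandwidth and exploration constant are assumptions.
import numpy as np

def kr_uct_select(actions, values, visits, bandwidth=0.5, c=1.4):
    def kernel(a, b):
        return np.exp(-np.sum((a - b) ** 2) / (2 * bandwidth ** 2))
    scores = []
    for a in actions:
        w = np.array([kernel(a, b) for b in actions])
        n_eff = w @ visits                                   # smoothed visit count
        v_eff = (w @ (values * visits)) / max(n_eff, 1e-9)   # smoothed value
        scores.append(v_eff + c * np.sqrt(np.log(visits.sum()) / max(n_eff, 1e-9)))
    return actions[int(np.argmax(scores))]
```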


Author(s):  
Zhouyang Lin ◽  
Kai Li ◽  
Yang Yang ◽  
Fanglei Sun ◽  
Liantao Wu ◽  
...  
