Solving Online Threat Screening Games using Constrained Action Space Reinforcement Learning

2020 ◽  
Vol 34 (02) ◽  
pp. 2226-2235
Author(s):  
Sanket Shah ◽  
Arunesh Sinha ◽  
Pradeep Varakantham ◽  
Andrew Perrault ◽  
Milind Tambe

Large-scale screening for potential threats with limited screening resources and capacity is a problem of interest at airports, seaports, and other ports of entry. Adversaries can observe screening procedures and arrive at a time when there will be gaps in screening due to limited resource capacities. To capture this game between ports and adversaries, this problem has previously been represented as a Stackelberg game, referred to as a Threat Screening Game (TSG). Given the significant complexity of solving TSGs and the uncertainty in customer arrivals, existing work has assumed that screenees arrive and are allocated security resources at the beginning of the time window. In practice, screenees such as airport passengers arrive in bursts correlated with flight times and are not bound by fixed time windows. To address this, we propose an online threat screening model in which the screening strategy is determined adaptively as each passenger arrives, while satisfying a hard bound on the acceptable risk of not screening a threat. To solve the online problem, we first reformulate it as a Markov Decision Process (MDP) in which the hard bound on risk translates to a constraint on the action space, and then solve the resultant MDP using Deep Reinforcement Learning (DRL). To this end, we provide a novel way to efficiently enforce linear inequality constraints on the action output in DRL. We show that our solution significantly reduces screenee wait time without compromising on risk.
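
The constraint-enforcement step can be pictured as projecting the raw policy output onto the feasible polytope. Below is a minimal sketch of that idea, assuming a constraint set of the form {a : Ga ≤ h}; the matrices, dimensions, and use of cvxpy are illustrative assumptions, not the paper's exact mechanism.

```python
# Minimal sketch: project a raw policy-network output onto the polytope
# {a : G @ a <= h} so that every emitted action satisfies the hard linear
# inequality constraints (e.g., a bound on unscreened-threat risk).
# G, h, and the dimensions below are illustrative assumptions.
import numpy as np
import cvxpy as cp

def project_action(raw_action: np.ndarray, G: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Euclidean projection of raw_action onto {a : G a <= h}."""
    a = cp.Variable(raw_action.shape[0])
    problem = cp.Problem(cp.Minimize(cp.sum_squares(a - raw_action)), [G @ a <= h])
    problem.solve()
    return a.value

# Example: allocations for 3 screening resources, each in [0, 1], with a
# hypothetical hard risk bound 0.5*a1 + 0.3*a2 + 0.2*a3 <= 0.2.
G = np.vstack([np.eye(3), -np.eye(3), [[0.5, 0.3, 0.2]]])
h = np.concatenate([np.ones(3), np.zeros(3), [0.2]])
print(project_action(np.array([0.9, 0.8, 0.7]), G, h))
```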

2020 ◽  
Vol 34 (04) ◽  
pp. 4577-4584
Author(s):  
Xian Yeow Lee ◽  
Sambit Ghadai ◽  
Kai Liang Tan ◽  
Chinmay Hegde ◽  
Soumik Sarkar

The robustness of Deep Reinforcement Learning (DRL) algorithms to adversarial attacks in real-world applications, such as those deployed in cyber-physical systems (CPS), is of increasing concern. Numerous studies have investigated the mechanisms of attacks on the RL agent's state space. Attacks on the RL agent's action space (corresponding to actuators in engineering systems) are equally pernicious, but such attacks are relatively less studied in the ML literature. In this work, we first frame the problem as an optimization problem of minimizing the cumulative reward of an RL agent, with decoupled constraints serving as the attack budget. We propose the white-box Myopic Action Space (MAS) attack algorithm, which distributes the attack across the action-space dimensions. Next, we reformulate the optimization problem with the same objective function but with a temporally coupled constraint on the attack budget, taking into account the approximated dynamics of the agent. This leads to the white-box Look-ahead Action Space (LAS) attack algorithm, which distributes the attack across the action and temporal dimensions. Our results show that, using the same amount of resources, the LAS attack degrades the agent's performance significantly more than the MAS attack. This reveals that, even with limited resources, an adversary can exploit the agent's dynamics to craft attacks that cause the agent to fail. Additionally, we leverage these attack strategies as a tool to gain insight into the potential vulnerabilities of DRL agents.
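
A myopic action-space attack of this flavor can be sketched as projected gradient descent on the agent's value estimate, with the perturbation confined to a budget ball. The sketch below is a hedged illustration under assumed interfaces (a scalar-valued critic `q_net(state, action)`); it is not the paper's exact implementation.

```python
# Hedged sketch of an MAS-style attack: iteratively perturb the action to
# decrease the critic's value estimate, projecting the perturbation back
# onto an L2 budget ball after each step. Interfaces are assumptions.
import torch

def mas_attack(q_net, state, action, budget=0.1, steps=10, lr=0.05):
    delta = torch.zeros_like(action, requires_grad=True)
    for _ in range(steps):
        value = q_net(state, action + delta)          # scalar value estimate
        grad, = torch.autograd.grad(value, delta)     # d(value)/d(delta)
        with torch.no_grad():
            delta -= lr * grad                        # descend on the value
            norm = delta.norm()
            if norm > budget:                         # project onto budget ball
                delta *= budget / norm
        delta.requires_grad_(True)
    return (action + delta).detach()
```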


Author(s):  
Zhen-Jia Pang ◽  
Ruo-Ze Liu ◽  
Zhou-Yu Meng ◽  
Yi Zhang ◽  
Yang Yu ◽  
...  

StarCraft II poses a grand challenge for reinforcement learning. The main difficulties include a huge state space, a varying action space, and a long horizon. In this paper, we investigate a set of reinforcement learning techniques for the full-length game of StarCraft II. We investigate a hierarchical approach in which the hierarchy involves two levels of abstraction. One is the macro-actions extracted from expert demonstration trajectories, which reduce the action space by an order of magnitude yet remain effective. The other is a two-layer hierarchical architecture, which is modular and easy to scale. We also investigate a curriculum transfer learning approach that trains the agent against opponents of increasing difficulty, from the simplest to harder ones. On a 64×64 map with restricted units, we train the agent on a single machine with 4 GPUs and 48 CPU threads. We achieve a winning rate of more than 99% against the difficulty level-1 built-in AI. Through the curriculum transfer learning algorithm and a mixture of combat models, we achieve a winning rate of over 93% against the most difficult non-cheating built-in AI (level-7) within days. We hope this study sheds some light on future research in large-scale reinforcement learning.
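
The two-level idea can be illustrated compactly: a learned controller chooses among macro-actions, and each macro-action expands into a fixed micro-action sequence, so the RL agent explores a far smaller action space. The sketch below is illustrative only; the macro-action names and environment interface are hypothetical stand-ins, not the authors' actual action set.

```python
# Illustrative sketch of the two-level hierarchy: a top-level controller
# picks a macro-action (mined from expert replays), and the bottom level
# replays its micro-action sequence in the environment.
import random

MACRO_ACTIONS = {
    "build_workers": ["select_base", "train_scv", "train_scv"],
    "expand":        ["select_scv", "build_command_center"],
    "attack":        ["select_army", "attack_move_enemy_base"],
}

class ControllerPolicy:
    """Top level: maps an abstract state to a macro-action name."""
    def act(self, state) -> str:
        return random.choice(list(MACRO_ACTIONS))  # placeholder for a learned policy

def run_episode(env, controller, horizon=100):
    state = env.reset()
    for _ in range(horizon):
        macro = controller.act(state)
        for micro in MACRO_ACTIONS[macro]:  # bottom level executes the sequence
            state, reward, done, _ = env.step(micro)
            if done:
                return
```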


2016 ◽  
Vol 23 (1) ◽  
pp. 111-118 ◽  
Author(s):  
Yuming Feng ◽  
Junzhi Yu ◽  
Chuandong Li ◽  
Tingwen Huang ◽  
Hangjun Che

We formulate linear impulsive control systems with impulse time windows. Unlike most impulsive systems, where impulses occur at fixed times or when the system state hits a certain hyperplane, the impulse times in the presented systems may be uncertain, but they are confined to a small time interval, i.e., a time window. Compared with existing impulsive systems, systems with impulse time windows are of practical importance. We then study the asymptotic stability of the linear case and obtain several stability criteria. Numerical examples are given to verify the effectiveness of the theoretical results.
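
A representative form of such a system (with notation assumed here for illustration, not necessarily the authors' exact formulation) is

```latex
% Linear impulsive system whose k-th impulse time \tau_k is not fixed but
% may fall anywhere inside a window around the nominal instant kT.
\begin{aligned}
  \dot{x}(t)  &= A\,x(t),                      && t \neq \tau_k,\\
  x(\tau_k^+) &= x(\tau_k^-) + B\,x(\tau_k^-), && \tau_k \in [\,kT-\delta,\ kT+\delta\,],
\end{aligned}
```

where the window half-width δ quantifies the uncertainty in the impulse instant.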


Sensors ◽  
2018 ◽  
Vol 18 (10) ◽  
pp. 3548 ◽  
Author(s):  
Ke Cui ◽  
Zhongjie Ren ◽  
Jieyu Qian ◽  
Wenjun Peng ◽  
Rihong Zhu

Interferometric fiber-optic sensors are often organized into large-scale arrays using the technique of time-division multiplexing (TDM) to reduce system cost. Discriminating the time windows of the different sensor units is a prerequisite for successfully demodulating the sensing signal, but this traditionally calls for a very time-consuming manual calibration process. To address this problem, a novel automatic time-window locating method is proposed in this paper. It introduces the concept of a shape function and carries out a cross-correlation operation between the shape function and the sensor signal. The shape function is defined as a function whose curve profile reflects the main data characteristics of the sensor signal. The time-window information is then extracted from the correlation result. The whole process is carried out automatically by the interrogation controller of the sensor system, without any manual intervention. Experiments are conducted to validate the method. The proposed method can greatly reduce the complexity of locating time windows in large-scale TDM sensor arrays and makes practical use of the TDM scheme much more convenient.
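
The core of the method is ordinary template matching: cross-correlate the shape function with the raw TDM stream and take the correlation peak as the window location. A minimal sketch, with signal shapes and sizes as illustrative assumptions:

```python
# Hedged sketch of the correlation-based locating idea: the shape function
# acts as a template whose profile matches the sensor signal; the peak of
# the cross-correlation marks the start of that sensor's time window.
import numpy as np

def locate_time_window(stream: np.ndarray, shape_fn: np.ndarray) -> int:
    """Return the sample index where the shape function best matches the stream."""
    corr = np.correlate(stream, shape_fn, mode="valid")
    return int(np.argmax(corr))

# Example: a pulse-like shape function buried at sample 300 of a noisy stream.
rng = np.random.default_rng(0)
shape_fn = np.hanning(50)
stream = 0.1 * rng.standard_normal(1000)
stream[300:350] += shape_fn
print(locate_time_window(stream, shape_fn))  # ≈ 300
```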


2020 ◽  
Vol 10 (21) ◽  
pp. 7431
Author(s):  
Wanyuan Wang ◽  
Hansi Tao ◽  
Yichuan Jiang

Delivery service sharing (DSS) has made an important contribution to optimizing daily order-delivery applications. Existing DSS algorithms have two major limitations. First, for computational reasons, most DSS algorithms focus on the fixed pickup/drop-off time scenario, which is inconvenient in real-world settings where customers can choose pickup/drop-off times flexibly. Second, to address the intractable DSS with flexible time windows (DSS-Fle), local-search-based heuristics are widely employed; however, they offer no theoretical results on the advantage of order sharing. Against this background, this paper designs a novel algorithm for DSS-Fle that is efficient in both time complexity and system throughput. Inspired by the efficiency of the shareability network on the delivery service routing (DSR) variant, where orders cannot be shared and have fixed time windows, we first consider the variant of DSR with flexible time windows (DSR-Fle). For DSR-Fle, each order's flexible time window is split into multiple virtual fixed time windows, one of which is chosen by the shareability network as the order's service time. In turn, inspired by the efficiency of local-search heuristics, we further consider the variant of DSS with fixed time windows (DSS-Fix). For DSS-Fix, beneficial sharing orders are searched for and inserted into the shareability network. Finally, by combining the splitting mechanism proposed for DSR-Fle and the insertion mechanism proposed for DSS-Fix, an efficient algorithm for DSS-Fle is obtained. Simulation results show that the proposed algorithm scales to city-scale scenarios with thousands of regions, orders, and couriers, and has a significant advantage in improving system throughput.
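
The splitting mechanism can be illustrated directly: a flexible window [earliest, latest] is discretized into overlapping virtual fixed windows of the service duration, from which the shareability network later picks one. A minimal sketch; the step size and interface are assumptions, not values from the paper.

```python
# Minimal sketch of the splitting mechanism: enumerate candidate fixed
# service windows inside a flexible time window.
def split_time_window(earliest, latest, service_len, step):
    windows, start = [], earliest
    while start + service_len <= latest:
        windows.append((start, start + service_len))
        start += step
    return windows

print(split_time_window(9.0, 12.0, service_len=1.0, step=0.5))
# [(9.0, 10.0), (9.5, 10.5), (10.0, 11.0), (10.5, 11.5), (11.0, 12.0)]
```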


Author(s):  
Hongguang Wu ◽  
Yuelin Gao ◽  
Wanting Wang ◽  
Ziyu Zhang

In this paper, we study a vehicle routing problem with time windows (TWVRP). In this problem, we consider a hard time constraint: the fleet can serve customers only within their specified time windows. To solve this problem, a hybrid ant colony (HACO) algorithm is proposed, based on the ant colony algorithm and a mutation operation. The proposed HACO algorithm has three innovations: first, it updates pheromones with a new method; second, it introduces adaptive parameters; and third, it adds a mutation operation. The well-known Solomon instances are used to evaluate the performance of the proposed algorithm. Experimental results show that the HACO algorithm is effective in solving the vehicle routing problem with time windows. Moreover, the proposed algorithm has practical implications for vehicle routing, and the results show that it is applicable and effective in practical problems.
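
To make the first two innovations concrete, the sketch below shows one plausible shape of a pheromone update with evaporation, a best-tour deposit, and an adaptive evaporation rate; the specific update rule and constants are illustrative assumptions, not the exact HACO rules.

```python
# Hedged sketch of an ant-colony pheromone update: adaptive evaporation
# (explore early, exploit late) plus a deposit on the best tour's edges.
import numpy as np

def update_pheromone(tau, best_tour, best_len, iteration, max_iter):
    rho = 0.1 + 0.5 * (1.0 - iteration / max_iter)  # adaptive evaporation rate
    tau = (1.0 - rho) * tau                         # evaporate all edges
    for i, j in zip(best_tour, best_tour[1:]):      # reinforce the best tour
        tau[i, j] += 1.0 / best_len
        tau[j, i] += 1.0 / best_len
    return tau
```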


Author(s):  
Yuntao Han ◽  
Qibin Zhou ◽  
Fuqing Duan

The digital curling game is a two-player zero-sum extensive game with a continuous action space. Several challenging problems remain unsolved, such as the uncertainty of strategy, searching the large game tree, and the need for large amounts of supervised data. In this work, we combine NFSP and KR-UCT for digital curling games, where NFSP uses two adversarial learning networks and can automatically produce supervised data, and KR-UCT can be used to search the large game tree in a continuous action space. We propose two reward mechanisms to make the reinforcement learning converge quickly. Experimental results validate the proposed method and show that the strategy model can approach a Nash equilibrium.
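
KR-UCT's selection step can be approximated as follows: visit counts and value estimates of sampled actions are smoothed with a kernel over the continuous action space, and the action maximizing a UCB score over the smoothed estimates is chosen. The kernel, constants, and interfaces below are assumptions for illustration, not the paper's exact rule.

```python
# Hedged sketch of a KR-UCT-style selection rule: kernel-regress values and
# visit counts of sampled continuous actions, then pick the action with the
# highest UCB score. Bandwidth and exploration constant are assumptions.
import numpy as np

def kr_uct_select(actions, values, visits, bandwidth=0.5, c=1.4):
    def kernel(a, b):
        return np.exp(-np.sum((a - b) ** 2) / (2 * bandwidth ** 2))
    scores = []
    for a in actions:
        w = np.array([kernel(a, b) for b in actions])
        n_eff = w @ visits                                   # smoothed visit count
        v_eff = (w @ (values * visits)) / max(n_eff, 1e-9)   # smoothed value
        scores.append(v_eff + c * np.sqrt(np.log(visits.sum()) / max(n_eff, 1e-9)))
    return actions[int(np.argmax(scores))]
```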


Author(s):  
Zhouyang Lin ◽  
Kai Li ◽  
Yang Yang ◽  
Fanglei Sun ◽  
Liantao Wu ◽  
...  
