Reinforcement Learning vs Genetic Algorithms in Game-Theoretic Cyber-Security

Mapping Intimacies ◽

10.31237/osf.io/nxzep ◽

2018 ◽

Cited By ~ 1

Author(s):

Stefan Niculae

Keyword(s):

Reinforcement Learning ◽

Cyber Security ◽

Large Scale ◽

Human Performance ◽

Learning Approaches ◽

Classifier Systems ◽

Penetration Testing ◽

Q Learning ◽

Game Theoretic ◽

Security Game

Penetration testing is the practice of performing a simulated attack on a computer system in order to reveal its vulnerabilities. The most common approach is to gain information and then plan and execute the attack manually, by a security expert. This manual method cannot meet the speed and frequency required for efficient, large-scale secu- rity solutions development. To address this, we formalize penetration testing as a security game between an attacker who tries to compro- mise a network and a defending adversary actively protecting it. We compare multiple algorithms for finding the attacker’s strategy, from fixed-strategy to Reinforcement Learning, namely Q-Learning (QL), Extended Classifier Systems (XCS) and Deep Q-Networks (DQN). The attacker’s strength is measured in terms of speed and stealthi- ness, in the specific environment used in our simulations. The results show that QL surpasses human performance, XCS yields worse than human performance but is more stable, and the slow convergence of DQN keeps it from achieving exceptional performance, in addition, we find that all of these Machine Learning approaches outperform fixed-strategy attackers.

Download Full-text

A Conflict-Free Routing Method for Automated Guided Vehicles Using Reinforcement Learning

2020 International Symposium on Flexible Automation ◽

10.1115/isfa2020-9620 ◽

2020 ◽

Author(s):

Taichi Chujo ◽

Kosei Nishida ◽

Tatsushi Nishi

Keyword(s):

Reinforcement Learning ◽

Large Scale ◽

Learning Algorithm ◽

Automated Guided Vehicles ◽

Transport Systems ◽

Deadlock Avoidance ◽

Q Learning ◽

Traffic Conditions ◽

Bidirectional Transport ◽

Routing Method

Abstract In a modern large-scale fabrication, hundreds of vehicles are used for transportation. Since traffic conditions are changing rapidly, the routing of automated guided vehicles (AGV) needs to be changed according to the change in traffic conditions. We propose a conflict-free routing method for AGVs using reinforcement learning in dynamic transportation. An advantage of the proposed method is that a change in the state can be obtained as an evaluation function. Therefore, the action can be selected according to the states. A deadlock avoidance method in bidirectional transport systems is developed using reinforcement learning. The effectiveness of the proposed method is demonstrated by comparing the performance with the conventional Q learning algorithm from computational results.

Download Full-text

Analyzing Strength-Based Classifier System from Reinforcement Learning Perspective

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2009.p0631 ◽

2009 ◽

Vol 13 (6) ◽

pp. 631-639

Author(s):

Atsushi Wada ◽

◽

Keiki Takadama ◽

◽

Keyword(s):

Reinforcement Learning ◽

Adaptive Systems ◽

Classifier Systems ◽

Q Learning ◽

State Action ◽

Classifier System ◽

Learning Classifier ◽

Value Estimation ◽

On Line ◽

On Line Learning

Learning Classifier Systems (LCSs) are rule-based adaptive systems that have both Reinforcement Learning (RL) and rule-discovery mechanisms for effective and practical on-line learning. With the aim of establishing a common theoretical basis between LCSs and RL algorithms to share each field's findings, a detailed analysis was performed to compare the learning processes of these two approaches. Based on our previous work on deriving an equivalence between the Zeroth-level Classifier System (ZCS) and Q-learning with Function Approximation (FA), this paper extends the analysis to the influence of actually applying the conditions for this equivalence. Comparative experiments have revealed interesting implications: (1) ZCS's original parameter, the deduction rate, plays a role in stabilizing the action selection, but (2) from the Reinforcement Learning perspective, such a process inhibits the ability to accurately estimate values for the entire state-action space, thus limiting the performance of ZCS in problems requiring accurate value estimation.

Download Full-text

Neighborhood Cognition Consistent Multi-Agent Reinforcement Learning

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6212 ◽

2020 ◽

Vol 34 (05) ◽

pp. 7219-7226

Author(s):

Hangyu Mao ◽

Wulong Liu ◽

Jianye Hao ◽

Jun Luo ◽

Dong Li ◽

...

Keyword(s):

Reinforcement Learning ◽

Large Scale ◽

Football Player ◽

Superior Performance ◽

Human Society ◽

Packet Routing ◽

Q Learning ◽

Cognitive Consistency ◽

Challenging Tasks ◽

Multi Agent

Social psychology and real experiences show that cognitive consistency plays an important role to keep human society in order: if people have a more consistent cognition about their environments, they are more likely to achieve better cooperation. Meanwhile, only cognitive consistency within a neighborhood matters because humans only interact directly with their neighbors. Inspired by these observations, we take the first step to introduce neighborhood cognitive consistency (NCC) into multi-agent reinforcement learning (MARL). Our NCC design is quite general and can be easily combined with existing MARL methods. As examples, we propose neighborhood cognition consistent deep Q-learning and Actor-Critic to facilitate large-scale multi-agent cooperations. Extensive experiments on several challenging tasks (i.e., packet routing, wifi configuration and Google football player control) justify the superior performance of our methods compared with state-of-the-art MARL approaches.

Download Full-text

Selective auditory attention detection using dynamic learning systems: The study of RNN and reinforcement learning

10.1101/2021.02.18.431748 ◽

2021 ◽

Author(s):

Masoud Geravanchizadeh ◽

Hossein Roushan

Keyword(s):

Reinforcement Learning ◽

Detection System ◽

Auditory Attention ◽

Final Decision ◽

Learning Approaches ◽

Cocktail Party ◽

Dynamic Learning ◽

Learning Stage ◽

Q Learning ◽

Markov Decision

AbstractThe cocktail party phenomenon describes the ability of the human brain to focus auditory attention on a particular stimulus while ignoring other acoustic events. Selective auditory attention detection (SAAD) is an important issue in the development of brain-computer interface systems and cocktail party processors. This paper proposes a new dynamic attention detection system to process the temporal evolution of the input signal. In the proposed dynamic system, after preprocessing of the input signals, the probabilistic state space of the system is formed. Then, in the learning stage, different dynamic learning methods, including recurrent neural network (RNN) and reinforcement learning (Markov decision process (MDP) and deep Q-learning) are applied to make the final decision as to the attended speech. Among different dynamic learning approaches, the evaluation results show that the deep Q-learning approach (MDP+RNN) provides the highest classification accuracy (94.2%) with the least detection delay. The proposed SAAD system is advantageous, in the sense that the detection of attention is performed dynamically for the sequential inputs. Also, the system has the potential to be used in scenarios, where the attention of the listener might be switched in time in the presence of various acoustic events.

Download Full-text

A game theoretic approach to cyber security risk management

The Journal of Defense Modeling and Simulation Applications Methodology Technology ◽

10.1177/1548512917699724 ◽

2017 ◽

Vol 15 (2) ◽

pp. 127-146 ◽

Cited By ~ 8

Author(s):

Scott Musman ◽

Andrew Turner

Keyword(s):

Cyber Security ◽

Theoretic Approach ◽

Security Risk ◽

Defense Strategies ◽

Point Of Sale ◽

Investment Level ◽

Game Theoretic ◽

Theoretic Solution ◽

Security Game ◽

Cyber Risk

This paper describes the Cyber Security Game (CSG). Cyber Security Game is a method that has been implemented in software that quantitatively identifies cyber security risks and uses this metric to determine the optimal employment of security methods for any given investment level. Cyber Security Game maximizes a system’s ability to operate in today’s contested cyber environment by minimizing its mission risk. The risk score is calculated by using a mission impact model to compute the consequences of cyber incidents and combining that with the likelihood that attacks will succeed. The likelihood of attacks succeeding is computed by applying a threat model to a system topology model and defender model. Cyber Security Game takes into account the widespread interconnectedness of cyber systems, where defenders must defend all multi-step attack paths and an attacker only needs one to succeed. It employs a game theoretic solution using a game formulation that identifies defense strategies to minimize the maximum cyber risk (MiniMax). This paper discusses the methods and models that compose Cyber Security Game . A limited example of a Point of Sale system is used to provide specific demonstrations of Cyber Security Game models and analyses.

Download Full-text

A Game Theoretic Framework for Modeling Adversarial Cyber Security Game Among Attackers, Defenders, and Users

Security and Trust Management - Lecture Notes in Computer Science ◽

10.1007/978-3-319-24858-5_18 ◽

2015 ◽

pp. 274-282 ◽

Cited By ~ 2

Author(s):

Tatyana Ryutov ◽

Michael Orosz ◽

James Blythe ◽

Detlof von Winterfeldt

Keyword(s):

Cyber Security ◽

Game Theoretic ◽

Security Game

Download Full-text

Averaged Soft Actor-Critic for Deep Reinforcement Learning

Complexity ◽

10.1155/2021/6658724 ◽

2021 ◽

Vol 2021 ◽

pp. 1-16

Author(s):

Feng Ding ◽

Guanfeng Ma ◽

Zhikui Chen ◽

Jing Gao ◽

Peng Li

Keyword(s):

Artificial Intelligence ◽

Reinforcement Learning ◽

Large Scale ◽

Experimental Results ◽

High Dimensional ◽

Training Process ◽

Value Network ◽

Q Learning ◽

Important Impact ◽

The Stability

With the advent of the era of artificial intelligence, deep reinforcement learning (DRL) has achieved unprecedented success in high-dimensional and large-scale artificial intelligence tasks. However, the insecurity and instability of the DRL algorithm have an important impact on its performance. The Soft Actor-Critic (SAC) algorithm uses advanced functions to update the policy and value network to alleviate some of these problems. However, SAC still has some problems. In order to reduce the error caused by the overestimation of SAC, we propose a new SAC algorithm called Averaged-SAC. By averaging the previously learned action-state estimates, it reduces the overestimation problem of soft Q-learning, thereby contributing to a more stable training process and improving performance. We evaluate the performance of Averaged-SAC through some games in the MuJoCo environment. The experimental results show that the Averaged-SAC algorithm effectively improves the performance of the SAC algorithm and the stability of the training process.

Download Full-text

Comparing reinforcement learning approaches for solving game theoretic models: a dynamic airline pricing game example

Journal of the Operational Research Society ◽

10.1057/jors.2011.94 ◽

2012 ◽

Vol 63 (8) ◽

pp. 1165-1173 ◽

Cited By ~ 7

Author(s):

A Collins ◽

L Thomas

Keyword(s):

Reinforcement Learning ◽

Learning Approaches ◽

Airline Pricing ◽

Game Theoretic

Download Full-text

A State-of-the-Art Survey on Deep Learning Theory and Architectures

Electronics ◽

10.3390/electronics8030292 ◽

2019 ◽

Vol 8 (3) ◽

pp. 292 ◽

Cited By ~ 157

Author(s):

Md Zahangir Alom ◽

Tarek M. Taha ◽

Chris Yakopcic ◽

Stefan Westberg ◽

Paheding Sidike ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Deep Learning ◽

Reinforcement Learning ◽

Language Processing ◽

Large Scale ◽

Medical Information ◽

State Of The Art ◽

Generative Models ◽

Learning Approaches

In recent years, deep learning has garnered tremendous success in a variety of application domains. This new field of machine learning has been growing rapidly and has been applied to most traditional application domains, as well as some new areas that present more opportunities. Different methods have been proposed based on different categories of learning, including supervised, semi-supervised, and un-supervised learning. Experimental results show state-of-the-art performance using deep learning when compared to traditional machine learning approaches in the fields of image processing, computer vision, speech recognition, machine translation, art, medical imaging, medical information processing, robotics and control, bioinformatics, natural language processing, cybersecurity, and many others. This survey presents a brief survey on the advances that have occurred in the area of Deep Learning (DL), starting with the Deep Neural Network (DNN). The survey goes on to cover Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), including Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU), Auto-Encoder (AE), Deep Belief Network (DBN), Generative Adversarial Network (GAN), and Deep Reinforcement Learning (DRL). Additionally, we have discussed recent developments, such as advanced variant DL techniques based on these DL approaches. This work considers most of the papers published after 2012 from when the history of deep learning began. Furthermore, DL approaches that have been explored and evaluated in different application domains are also included in this survey. We also included recently developed frameworks, SDKs, and benchmark datasets that are used for implementing and evaluating deep learning approaches. There are some surveys that have been published on DL using neural networks and a survey on Reinforcement Learning (RL). However, those papers have not discussed individual advanced techniques for training large-scale deep learning models and the recently developed method of generative models.

Download Full-text

Research on Orbit Assembly Strategy of Large-scale Space Truss Structure

Recent Patents on Engineering ◽

10.2174/1872212116666211230121623 ◽

2021 ◽

Vol 16 ◽

Author(s):

Ye Dai ◽

Chao-Fang Xiang ◽

Yu-Dong Bao ◽

Yun-Shan Qi ◽

Wen-Yin Qu ◽

...

Keyword(s):

Reinforcement Learning ◽

Large Scale ◽

Learning Algorithm ◽

Scale Space ◽

Assembly Sequence ◽

Q Learning ◽

Assembly Strategy ◽

Periodic Module ◽

Space Trusses ◽

Optimal Assembly

Background: With the rapid development of spatial technology and mankind's continuous exploration of the space domain, expandable space trusses play an important role in the construction of space station piggyback platforms. Therefore, the study of the in-orbit assembly strategy for space trusses has become increasingly important in recent years. The spatial truss assembly strategy proposed in this paper is fast and effective, and it is applied for the construction of future large-scale space facilities effectively. Objective: The four-prismatic truss periodic module is taken as the research object, and the assembly process of the truss and the assembly behaviors of the spatial cellular robot serving for on-orbit assembly are expressed. Methods: The article uses a reinforcement learning algorithm to study the coupling of truss assembly sequence and robot action sequence, then uses a q-learning algorithm to plan the strategy of the truss cycle module. Results: The robot is trained through the greedy strategy and avoids the failure problem caused by assembly uncertainty. The simulation experiment proves that the Q-learning algorithm of reinforcement learning used for planning the on-orbit assembly sequence of the truss periodic module structures is feasible, and the optimal assembly sequence with the least number of assembly steps obtained by this strategy. Conclusion: In order to address the on-orbit assembly issues of large spatial truss structures in the space environment, we trained the robots through greedy strategy to prevent failure due to the uncertainty conditions both in the strategy analysis and in the simulation study.Finally, the Q-learning algorithm in reinforcement learning is used to plan the on-orbit assembly sequence in the truss cycle module, which can obtain the optimal assembly sequence in the minimum number of assembly steps.

Download Full-text