Navigational Behavior of Humans and Deep Reinforcement Learning Agents

2021 ◽  
Vol 12 ◽  
Author(s):  
Lillian M. Rigoli ◽  
Gaurav Patil ◽  
Hamish F. Stening ◽  
Rachel W. Kallen ◽  
Michael J. Richardson

Rapid advances in the field of Deep Reinforcement Learning (DRL) over the past several years have led to artificial agents (AAs) capable of producing behavior that meets or exceeds human-level performance in a wide variety of tasks. However, research on DRL frequently lacks adequate discussion of the low-level dynamics of the behavior itself and instead focuses on meta-level or global-level performance metrics. As a result, the current literature lacks perspective on the qualitative nature of AA behavior, leaving questions about the spatiotemporal patterning of that behavior largely unanswered. The current study explored the degree to which the navigation and route-selection trajectories of DRL agents (i.e., AAs trained using DRL) through simple obstacle-ridden virtual environments were equivalent to (or different from) those produced by human agents. The second, related aim was to determine whether a task-dynamical model of human route navigation could not only capture both human and DRL navigational behavior, but also help identify whether any observed differences in the navigational trajectories of humans and DRL agents were a function of differences in the dynamical environmental couplings.
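
The training setup is not detailed in the abstract; as a hedged illustration, the sketch below shows a minimal 2D obstacle-ridden navigation task of the kind DRL route-selection agents are commonly trained and evaluated in, with the trajectory recorded for comparison against human paths. The environment layout, reward shaping, and the naive goal-seeking policy are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

class ObstacleNav:
    """Minimal 2D navigation task: reach the goal, avoid circular obstacles.
    A hypothetical stand-in for the obstacle-ridden virtual environments
    described in the abstract; not the authors' actual task."""

    def __init__(self, seed=0):
        rng = np.random.default_rng(seed)
        self.goal = np.array([9.0, 9.0])
        self.obstacles = rng.uniform(2, 8, size=(5, 2))  # circle centres
        self.radius = 0.8
        self.pos = np.zeros(2)

    def reset(self):
        self.pos = np.zeros(2)
        return self.pos.copy()

    def step(self, action):
        # action: heading vector, normalized; fixed step length 0.2
        self.pos += 0.2 * action / (np.linalg.norm(action) + 1e-8)
        hit = any(np.linalg.norm(self.pos - o) < self.radius for o in self.obstacles)
        done = np.linalg.norm(self.pos - self.goal) < 0.3
        reward = 1.0 if done else (-1.0 if hit else -0.01)  # shaped reward (assumption)
        return self.pos.copy(), reward, done or hit

# Record a trajectory from a naive goal-seeking policy; a study like this
# would substitute a trained DRL policy and compare against human paths.
env = ObstacleNav()
obs, traj = env.reset(), []
for _ in range(200):
    obs, r, done = env.step(env.goal - obs)  # head straight for the goal
    traj.append(obs)
    if done:
        break
```

Analyzing the recorded trajectories (e.g., curvature, clearance from obstacles) is the kind of low-level spatiotemporal comparison the abstract argues is missing from DRL research.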


Author(s):  
László Erdődi ◽  
Fabio Massimo Zennaro

Website hacking is a frequent attack type used by malicious actors to obtain confidential information, modify the integrity of web pages, or make websites unavailable. The tools used by attackers are becoming more automated and sophisticated, and malicious machine learning agents seem to be the next development in this line. In order to provide ethical hackers with similar tools, and to understand the impact and the limitations of artificial agents, we present in this paper a model that formalizes web hacking tasks for reinforcement learning agents. Our model, named Agent Web Model, treats web hacking as a capture-the-flag style challenge, and it defines reinforcement learning problems at seven different levels of abstraction. We discuss the complexity of these problems in terms of the actions and states an agent has to deal with, and we show that such a model can represent most of the relevant web vulnerabilities. Aware that the driver of advances in reinforcement learning is the availability of standardized challenges, we provide an implementation for the first three abstraction layers, in the hope that the community will take up these challenges to develop intelligent web hacking agents.
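
The paper's released environments are not reproduced here; the following is a minimal sketch, under stated assumptions, of how a capture-the-flag web task might be cast as a tabular Q-learning problem in the spirit of a low abstraction layer, where actions are requests for files on the target. The file list, flag location, and rewards are hypothetical.

```python
import random

# Hypothetical toy CTF: the flag sits in one file; actions request files.
FILES = ["index.html", "robots.txt", "admin.php", "backup.zip"]
FLAG_FILE = "backup.zip"  # assumed flag location, for illustration only

def episode(q, eps=0.2, alpha=0.5, gamma=0.9):
    state = frozenset()                       # files requested so far
    while FLAG_FILE not in state:
        actions = [f for f in FILES if f not in state]
        if random.random() < eps:             # epsilon-greedy exploration
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda f: q.get((state, f), 0.0))
        nxt = state | {a}
        r = 10.0 if a == FLAG_FILE else -1.0  # reward finding the flag quickly
        best_next = max((q.get((nxt, f), 0.0) for f in FILES if f not in nxt),
                        default=0.0)
        old = q.get((state, a), 0.0)
        q[(state, a)] = old + alpha * (r + gamma * best_next - old)
        state = nxt

q = {}
for _ in range(500):
    episode(q)
# After training, the greedy policy requests backup.zip first.
```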


PLoS ONE ◽  
2021 ◽  
Vol 16 (4) ◽  
pp. e0247818
Author(s):  
Alexander Muryy ◽  
Andrew Glennerster

The way people choose routes through unfamiliar environments provides clues about the underlying representation they use. One way to test the nature of observers' representation is to manipulate the structure of the scene as they move through it and measure which aspects of performance are significantly affected and which are not. We recorded the routes that participants took in virtual mazes to reach previously viewed targets. The mazes were either physically realizable or impossible (the latter contained 'wormholes' that altered the layout of the scene without any visible change at that moment). We found that participants could usually find the shortest route between remembered objects even in physically impossible environments, despite the gross pointing errors that an earlier study showed are evident in such environments. In the physically impossible conditions, the choice made at a junction was influenced to a greater extent by whether that choice had, in the past, led to the discovery of a target (compared to a shortest-distance prediction). In the physically realizable mazes, on the other hand, junction choices were determined more by the shortest distance to the target. This pattern of results is compatible with the idea of a graph-like representation of space that can include information about previous success or failure in traversing each edge as well as information about the distance between nodes. Our results suggest that the complexity of the maze may dictate which of these factors is more important in influencing navigational choices.
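
As a hedged illustration of that graph-like representation, the sketch below stores both metric length and a previous-success flag on each edge, and scores junction choices by a weighted combination of the two. The graph, the scoring rule, and its weight are illustrative assumptions, not the authors' model (networkx is assumed to be available).

```python
import networkx as nx

# Nodes are junctions; each edge records metric length plus whether
# traversing it previously led to the discovery of a target.
g = nx.Graph()
g.add_edge("A", "B", length=5.0, found_target=True)
g.add_edge("A", "C", length=2.0, found_target=False)
g.add_edge("C", "D", length=3.0, found_target=False)
g.add_edge("B", "D", length=1.0, found_target=False)

def junction_choice(g, node, goal, w_success=2.0):
    """Pick the next junction by shortest remaining distance, discounted
    when the chosen edge previously led to a target."""
    def score(nbr):
        d = g.edges[node, nbr]["length"] + nx.shortest_path_length(
            g, nbr, goal, weight="length")
        bonus = w_success if g.edges[node, nbr]["found_target"] else 0.0
        return d - bonus
    return min(g.neighbors(node), key=score)

print(junction_choice(g, "A", "D"))  # -> "B": past success beats distance
```

With the success bonus active, the agent at A prefers the longer route via B because that edge previously yielded a target, mirroring the impossible-maze regime; setting w_success=0 recovers pure shortest-distance choice, as in the realizable mazes.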


Biomimetics ◽  
2021 ◽  
Vol 6 (1) ◽  
pp. 13
Author(s):  
Adam Bignold ◽  
Francisco Cruz ◽  
Richard Dazeley ◽  
Peter Vamplew ◽  
Cameron Foale

Interactive reinforcement learning methods utilise an external information source to evaluate decisions and accelerate learning. Previous work has shown that human advice can significantly improve a learning agent's performance. When evaluating reinforcement learning algorithms, it is common to repeat experiments as parameters are altered or to gain a sufficient sample size. In this regard, requiring human interaction every time an experiment is restarted is undesirable, particularly when the expense of doing so can be considerable. Additionally, reusing the same people across experiments introduces bias, as they will learn the behaviour of the agent and the dynamics of the environment. This paper presents a methodology for evaluating interactive reinforcement learning agents by employing simulated users, which allow human knowledge, bias, and interaction to be modelled. Their use permits the development and testing of reinforcement learning agents, and can provide indicative results of agent performance under defined human constraints. While simulated users are no replacement for actual humans, they do offer an affordable and fast alternative for evaluating assisted agents. We introduce a method for performing a preliminary evaluation using simulated users, showing how performance changes depending on the type of user assisting the agent. Moreover, we describe how human interaction may be simulated, and present an experiment illustrating the applicability of simulated users in evaluating agent performance when assisted by different types of trainers. Experimental results show that this methodology allows for greater insight into the performance of interactive reinforcement learning agents when advised by different users. The use of simulated users with varying characteristics allows for evaluation of the impact of those characteristics on the behaviour of the learning agent.
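
The paper's user model is not specified in the abstract; below is a minimal sketch, under stated assumptions, of a simulated trainer whose two characteristics (availability: how often advice is given; accuracy: how often it is correct) can be varied to study their effect on the learning agent. All names and parameters are illustrative.

```python
import random

class SimulatedUser:
    """Hypothetical simulated trainer for interactive RL: advice is given
    with probability `availability` and is correct with probability
    `accuracy`. Parameter names are illustrative, not from the paper."""

    def __init__(self, oracle, accuracy=0.8, availability=0.5):
        self.oracle = oracle            # maps state -> truly best action
        self.accuracy = accuracy
        self.availability = availability

    def advise(self, state, actions):
        if random.random() > self.availability:
            return None                 # user chose not to interact
        if random.random() < self.accuracy:
            return self.oracle(state)   # correct advice
        return random.choice(actions)   # mistaken advice

# Inside a learning loop, advice (when present) can override exploration:
#   a = user.advise(s, actions) or agent.select_action(s)
# Sweeping accuracy/availability then reproduces "different types of
# trainers" without recruiting new human participants for every run.
```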


2021 ◽  
Vol 21 (4) ◽  
pp. 1-22
Author(s):  
Safa Otoum ◽  
Burak Kantarci ◽  
Hussein Mouftah

Volunteer computing, in which owners of Internet-connected devices (laptops, PCs, smart devices, etc.) volunteer them as storage and computing resources, has become an essential mechanism for resource management in numerous applications. The growth of the volume and variety of data traffic on the Internet raises concerns about the robustness of cyberphysical systems, especially for critical infrastructures. Therefore, implementing an efficient Intrusion Detection System (IDS) for systems that gather such sensory data has gained vital importance. In this article, we present a comparative study of Artificial Intelligence (AI)-driven intrusion detection systems for wirelessly connected sensors that track crucial applications. Specifically, we present an in-depth analysis of the use of machine learning, deep learning, and reinforcement learning solutions to recognise intrusive behavior in the collected traffic. We evaluate the proposed mechanisms using the KDD'99 real attack dataset in our simulations. Results present the performance metrics of three different IDSs, namely the Adaptively Supervised and Clustered Hybrid IDS (ASCH-IDS), the Restricted Boltzmann Machine-based Clustered IDS (RBC-IDS), and the Q-learning based IDS (Q-IDS), in detecting malicious behaviors. We also present the performance of different reinforcement learning techniques, such as State-Action-Reward-State-Action learning (SARSA) and Temporal Difference learning (TD). Through simulations, we show that Q-IDS performs with detection rate while SARSA-IDS and TD-IDS perform at the order of .
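
As a hedged sketch of what distinguishes Q-IDS from SARSA-IDS in such a comparison, the code below contrasts the two standard update rules: Q-learning bootstraps off the greedy next action (off-policy), while SARSA bootstraps off the action actually taken (on-policy). The state/action encoding of traffic records and the reward design are assumptions.

```python
def q_learning_update(q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    # Off-policy target: value of the greedy action in the next state.
    best = max(q.get((s_next, a2), 0.0) for a2 in actions)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (r + gamma * best - old)

def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    # On-policy target: value of the action the agent actually takes next.
    target = r + gamma * q.get((s_next, a_next), 0.0)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (target - old)

# Illustrative IDS framing: actions = ("flag_normal", "flag_intrusion"),
# with reward +1 for correctly labelling a KDD'99 record and -1 otherwise.
```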


2021 ◽  
pp. 1420326X2199241
Author(s):  
Hanlin Li ◽  
Dan Wu ◽  
Yanping Yuan ◽  
Lijun Zuo

Over the past 30 years, tubular daylight guide systems (TDGSs) have become one of the most popular ways to bring outdoor natural light into interior spaces in building design. However, TDGSs are not widely used, partly because of the lack of methods for evaluating their suitability. This study therefore summarizes the daylight performance metrics of TDGSs and presents estimation methods based on field measurements, simulation, and empirical formulae. The study focuses on the daylight performance and potential energy savings of TDGSs, and should help building designers create healthy, comfortable, and energy-saving indoor environments.
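
The abstract does not enumerate the metrics it surveys; as one common example, the sketch below computes the daylight factor (the ratio of indoor to outdoor horizontal illuminance under an overcast sky, expressed as a percentage) from paired field measurements. The sample readings are invented for illustration.

```python
def daylight_factor(e_indoor_lux, e_outdoor_lux):
    """Daylight factor DF = 100 * E_indoor / E_outdoor (percent)."""
    return 100.0 * e_indoor_lux / e_outdoor_lux

# Field-measurement style usage: simultaneous indoor/outdoor readings
# (illustrative values, in lux).
readings = [(450.0, 10000.0), (320.0, 9800.0), (510.0, 10200.0)]
dfs = [daylight_factor(i, o) for i, o in readings]
print(f"mean DF = {sum(dfs) / len(dfs):.2f}%")  # ~4% for these readings
```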


2021 ◽  
Vol 11 (4) ◽  
pp. 1514 ◽  
Author(s):  
Quang-Duy Tran ◽  
Sang-Hoon Bae

To reduce the impact of congestion, it is necessary to improve our overall understanding of the influence of autonomous vehicles. Recently, deep reinforcement learning has become an effective means of solving complex control tasks. Accordingly, we present an advanced deep reinforcement learning approach to investigate how leading autonomous vehicles affect an urban network in a mixed-traffic environment, and we suggest a set of hyperparameters for achieving better performance. Firstly, we feed this set of hyperparameters into our deep reinforcement learning agents. Secondly, we run the leading-autonomous-vehicle experiment in the urban network at different autonomous vehicle penetration rates. Thirdly, we evaluate the advantage of leading autonomous vehicles against experiments with entirely manual vehicles and with leading manual vehicles. Finally, proximal policy optimization with a clipped objective is compared to proximal policy optimization with an adaptive Kullback–Leibler penalty to verify the superiority of the proposed hyperparameters. We demonstrate that fully automated traffic increased the average speed 1.27-fold compared with the all-manual-vehicle experiment. Our proposed method becomes significantly more effective at higher autonomous vehicle penetration rates, and the leading autonomous vehicles help to mitigate traffic congestion.
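
The paper's exact settings are not given in the abstract; the sketch below shows the standard PPO clipped surrogate objective being compared, L^CLIP = E[min(r_t A_t, clip(r_t, 1-eps, 1+eps) A_t)], where r_t is the probability ratio between the new and old policies and A_t the advantage. The epsilon value and toy inputs are illustrative.

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Clipped surrogate: ratio = pi_new(a|s) / pi_old(a|s).
    Returned negated so a gradient-descent optimiser maximises it."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return -np.mean(np.minimum(unclipped, clipped))

ratios = np.array([0.9, 1.1, 1.5])   # the large last ratio gets clipped
advs = np.array([1.0, -0.5, 2.0])
print(ppo_clip_loss(ratios, advs))   # ~ -0.917
```

The adaptive-KL variant instead penalises the loss by beta * KL(pi_old || pi_new) and adjusts beta between updates; the clipping above achieves a similar trust-region effect without tuning beta.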


2021 ◽  
Vol 54 (5) ◽  
pp. 1-35
Author(s):  
Shubham Pateria ◽  
Budhitama Subagdja ◽  
Ah-hwee Tan ◽  
Chai Quek

Hierarchical Reinforcement Learning (HRL) enables autonomous decomposition of challenging long-horizon decision-making tasks into simpler subtasks. In recent years, the landscape of HRL research has grown considerably, resulting in a wealth of approaches. A comprehensive overview of this landscape is necessary to study HRL in an organized manner. We provide a survey of the diverse HRL approaches to the challenges of learning hierarchical policies, subtask discovery, transfer learning, and multi-agent learning using HRL. The survey is organized according to a novel taxonomy of the approaches. Based on the survey, we propose a set of important open problems to motivate future research in HRL. Furthermore, we outline a few suitable task domains for evaluating HRL approaches and a few interesting examples of practical applications of HRL in the Supplementary Material.
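
As a hedged sketch of the core idea, the code below implements a minimal two-level (options-style) control loop: a high-level policy selects a subtask, and a low-level policy acts until that subtask terminates. The names and the toy task are illustrative, not a specific surveyed algorithm.

```python
class Option:
    """A subtask: a low-level policy plus a termination condition."""
    def __init__(self, name, policy, is_done):
        self.name, self.policy, self.is_done = name, policy, is_done

def run_hierarchy(env_step, state, choose_option, max_steps=100):
    steps = 0
    while steps < max_steps:
        opt = choose_option(state)          # high-level decision
        if opt.is_done(state):              # overall goal already reached
            break
        while not opt.is_done(state) and steps < max_steps:
            state = env_step(state, opt.policy(state))  # low-level actions
            steps += 1
    return state

# Toy usage: walk along a line to position 10 via a subgoal at 5.
opts = [Option("to-5", lambda s: 1, lambda s: s >= 5),
        Option("to-10", lambda s: 1, lambda s: s >= 10)]
choose = lambda s: opts[0] if s < 5 else opts[1]
print(run_hierarchy(lambda s, a: s + a, 0, choose, max_steps=20))  # -> 10
```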

