Leveraging human knowledge in tabular reinforcement learning: a study of human subjects

Author(s):  
Ariel Rosenfeld ◽  
Moshe Cohen ◽  
Matthew E. Taylor ◽  
Sarit Kraus

Abstract Reinforcement learning (RL) can be extremely effective in solving complex, real-world problems. However, injecting human knowledge into an RL agent may require extensive effort and expertise on the human designer’s part. To date, human factors are generally not considered in the development and evaluation of possible RL approaches. In this article, we set out to investigate how different methods for injecting human knowledge are applied, in practice, by human designers of varying levels of knowledge and skill. We perform the first empirical evaluation of several methods, including a newly proposed method named State Action Similarity Solutions (SASS), which is based on the notion of similarities in the agent’s state–action space. Through this human study, consisting of 51 human participants, we shed new light on the human factors that play a key role in RL. We find that the classical reward shaping technique seems to be the most natural method for most designers, both expert and non-expert, to speed up RL. However, we further find that our proposed method SASS can be effectively and efficiently combined with reward shaping, providing a beneficial alternative to using only a single speed-up method with minimal additional human designer effort.
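A minimal tabular sketch of how reward shaping and SASS-style similarity updates could be combined (the Gym-style environment interface, the designer-supplied similar_pairs and shaping callables, and the kappa weight are all assumptions, since the abstract does not specify an implementation):

```python
import random
from collections import defaultdict

def sass_q_learning(env, similar_pairs, shaping, episodes=500,
                    alpha=0.1, gamma=0.99, epsilon=0.1, kappa=0.5):
    """Tabular Q-learning with reward shaping and SASS-style similarity updates.

    similar_pairs(s, a) -> iterable of (s', a') pairs the designer deems similar.
    shaping(s, a, s2)   -> designer-provided bonus added to the environment reward.
    Both are hypothetical designer-supplied callables; kappa scales how strongly
    each TD update is propagated to similar pairs. States are assumed discrete
    and hashable (tabular setting).
    """
    Q = defaultdict(float)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection over the discrete action set
            if random.random() < epsilon:
                a = env.action_space.sample()
            else:
                a = max(range(env.action_space.n), key=lambda b: Q[(s, b)])
            s2, r, done, _ = env.step(a)
            r += shaping(s, a, s2)                      # reward shaping
            best_next = 0.0 if done else max(Q[(s2, b)] for b in range(env.action_space.n))
            td = r + gamma * best_next - Q[(s, a)]
            Q[(s, a)] += alpha * td
            # SASS: propagate a scaled-down update to similar state-action pairs
            for (ss, aa) in similar_pairs(s, a):
                Q[(ss, aa)] += alpha * kappa * td
            s = s2
    return Q
```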

Author(s):  
Ariel Rosenfeld ◽  
Matthew E. Taylor ◽  
Sarit Kraus

Reinforcement Learning (RL) can be extremely effective in solving complex, real-world problems. However, injecting human knowledge into an RL agent may require extensive effort on the human designer's part. To date, human factors are generally not considered in the development and evaluation of possible approaches. In this paper, we propose and evaluate a novel method, based on human psychology literature, which we show to be both effective and efficient, for both expert and non-expert designers, in injecting human knowledge for speeding up tabular RL.


2019 ◽  
Vol 34 ◽  
Author(s):  
Mao Li ◽  
Tim Brys ◽  
Daniel Kudenko

Abstract One challenge faced by reinforcement learning (RL) agents is that in many environments the reward signal is sparse, leading to slow improvement of the agent’s performance in early learning episodes. Potential-based reward shaping can help to resolve the aforementioned issue of sparse reward by incorporating an expert’s domain knowledge into the learning through a potential function. Past work on reinforcement learning from demonstration (RLfD) directly mapped (sub-optimal) human expert demonstration to a potential function, which can speed up RL. In this paper we propose an introspective RL agent that significantly further speeds up the learning. An introspective RL agent records its state–action decisions and experience during learning in a priority queue. Good quality decisions, according to a Monte Carlo estimation, will be kept in the queue, while poorer decisions will be rejected. The queue is then used as demonstration to speed up RL via reward shaping. A human expert’s demonstration can be used to initialize the priority queue before the learning process starts. Experimental validation in the 4-dimensional CartPole domain and the 27-dimensional Super Mario AI domain shows that our approach significantly outperforms non-introspective RL and state-of-the-art approaches in RLfD in both domains.
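A rough sketch of the mechanism described above (the queue capacity, the use of the best Monte Carlo return as the potential, and the scale factor are illustrative assumptions): good decisions are kept in a bounded priority queue and exposed as a state–action potential for shaping, and a human demonstration can seed the queue by calling record on each demonstrated decision before learning begins.

```python
import heapq
import itertools

class IntrospectiveShaper:
    """Keeps the best (state, action) decisions seen so far, ranked by their
    Monte Carlo return estimate, and exposes them as a shaping potential.
    Capacity and the potential scale are illustrative choices."""

    def __init__(self, capacity=1000, scale=1.0):
        self.capacity = capacity
        self.scale = scale
        self.queue = []                 # min-heap of (return, tie-breaker, state, action)
        self.best = {}                  # (state, action) -> best return kept so far
        self._counter = itertools.count()

    def record(self, state, action, mc_return):
        """Insert a decision; reject the poorest one once capacity is exceeded."""
        heapq.heappush(self.queue, (mc_return, next(self._counter), state, action))
        key = (state, action)
        self.best[key] = max(self.best.get(key, mc_return), mc_return)
        if len(self.queue) > self.capacity:
            ret, _, s, a = heapq.heappop(self.queue)   # drop the lowest-return decision
            if self.best.get((s, a)) == ret:
                del self.best[(s, a)]

    def potential(self, state, action):
        """Shaping potential: the best Monte Carlo return kept for this pair, else 0."""
        return self.scale * self.best.get((state, action), 0.0)

    def shaping_reward(self, s, a, s2, a2, gamma=0.99):
        """Potential-based shaping term F = gamma * phi(s2, a2) - phi(s, a)."""
        return gamma * self.potential(s2, a2) - self.potential(s, a)
```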


2021 ◽  
Vol 21 (4) ◽  
pp. 1-22
Author(s):  
Safa Otoum ◽  
Burak Kantarci ◽  
Hussein Mouftah

Volunteer computing, in which owners of Internet-connected devices (laptops, PCs, smart devices, etc.) volunteer them as storage and computing resources, has become an essential mechanism for resource management in numerous applications. The growth in the volume and variety of data traffic on the Internet raises concerns about the robustness of cyber-physical systems, especially for critical infrastructures. Therefore, implementing an efficient Intrusion Detection System for gathering such sensory data has gained vital importance. In this article, we present a comparative study of Artificial Intelligence (AI)-driven intrusion detection systems for wirelessly connected sensors that track crucial applications. Specifically, we present an in-depth analysis of the use of machine learning, deep learning, and reinforcement learning solutions to recognise intrusive behavior in the collected traffic. We evaluate the proposed mechanisms using KDD’99 as a real attack dataset in our simulations. Results present the performance metrics for three different IDSs, namely the Adaptively Supervised and Clustered Hybrid IDS (ASCH-IDS), the Restricted Boltzmann Machine-based Clustered IDS (RBC-IDS), and the Q-learning based IDS (Q-IDS), in detecting malicious behaviors. We also present the performance of different reinforcement learning techniques such as State-Action-Reward-State-Action learning (SARSA) and Temporal Difference learning (TD). Through simulations, we show that Q-IDS performs with detection rate while SARSA-IDS and TD-IDS perform at the order of .
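The abstract does not describe the internals of Q-IDS or SARSA-IDS; purely as an illustration of the tabular techniques named above, the following sketch applies a SARSA update to a stream of labelled traffic samples, with a hypothetical state encoding (discretized traffic features) and action set (allow vs. flag):

```python
import random
from collections import defaultdict

def sarsa_ids(samples, episodes=10, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Illustrative tabular SARSA loop for intrusion detection.

    samples: list of (state, label) pairs, where state is a tuple of
    discretized traffic features and label is 1 for attack, 0 for benign.
    Actions: 0 = allow, 1 = flag as intrusion. The reward is +1 for a
    correct decision and -1 otherwise. This encoding is assumed, not
    taken from the paper.
    """
    Q = defaultdict(float)
    actions = (0, 1)

    def policy(state):
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])

    for _ in range(episodes):
        random.shuffle(samples)
        state, label = samples[0]
        action = policy(state)
        for next_state, next_label in samples[1:]:
            reward = 1.0 if action == label else -1.0
            next_action = policy(next_state)
            # SARSA update: bootstrap on the action actually taken next
            Q[(state, action)] += alpha * (
                reward + gamma * Q[(next_state, next_action)] - Q[(state, action)]
            )
            state, label, action = next_state, next_label, next_action
    return Q
```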


Author(s):  
Ziyao Zhang ◽  
Liang Ma ◽  
Kin K. Leung ◽  
Konstantinos Poularakis ◽  
Mudhakar Srivatsa

Author(s):  
Salman Ahmed ◽  
Mihir Sunil Gawand ◽  
Lukman Irshad ◽  
H. Onan Demirel

Computational human factors tools are often not fully integrated during the early phases of product design. Conventional ergonomic practices often require physical prototypes and human subjects, which are costly in terms of both finances and time. Ergonomics evaluations executed on physical prototypes also increase overall rework, since more iterations are required to incorporate human-factors-related design changes discovered late in the design stage, which raises the overall cost of product development. This paper proposes a design methodology based on a Digital Human Modeling (DHM) approach to inform designers about the ergonomic adequacy of products during the early stages of the design process. This proactive ergonomics approach has the potential to allow designers to identify significant design variables that affect human performance before full-scale prototypes are built. The design method utilizes a surrogate model that represents human-product interaction; optimizing the surrogate model yields design concepts that optimize human performance. The efficacy of the proposed design method is demonstrated with a cockpit design study.
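One plausible reading of the surrogate step, sketched below under stated assumptions (a quadratic regression surrogate fit to DHM simulation samples and a simple grid search over the design-variable bounds; neither is prescribed by the abstract):

```python
import numpy as np

def fit_quadratic_surrogate(X, y):
    """Fit a quadratic surrogate to DHM simulation samples by least squares.
    X: (n_samples, n_vars) design variables (e.g., seat height, reach distance);
    y: human-performance scores from the digital human model. The quadratic
    form and the example variables are illustrative assumptions."""
    n, d = X.shape
    # Design matrix: intercept, linear terms, and pairwise products
    cols = [np.ones(n)] + [X[:, i] for i in range(d)]
    cols += [X[:, i] * X[:, j] for i in range(d) for j in range(i, d)]
    A = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)

    def surrogate(x):
        x = np.asarray(x, dtype=float)
        feats = [1.0] + list(x) + [x[i] * x[j] for i in range(d) for j in range(i, d)]
        return float(np.dot(coef, feats))

    return surrogate

def optimize_on_grid(surrogate, bounds, steps=25):
    """Exhaustive grid search over the design space for the best predicted score."""
    grids = [np.linspace(lo, hi, steps) for lo, hi in bounds]
    mesh = np.meshgrid(*grids, indexing="ij")
    pts = np.stack([m.ravel() for m in mesh], axis=1)
    scores = np.array([surrogate(p) for p in pts])
    return pts[scores.argmax()], scores.max()
```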


Author(s):  
Guiliang Liu ◽  
Oliver Schulte

A variety of machine learning models have been proposed to assess the performance of players in professional sports. However, they have only a limited ability to model how player performance depends on the game context. This paper proposes a new approach to capturing game context: we apply Deep Reinforcement Learning (DRL) to learn an action-value Q function from 3M play-by-play events in the National Hockey League (NHL). The neural network representation integrates both continuous context signals and game history, using a possession-based LSTM. The learned Q-function is used to value players’ actions under different game contexts. To assess a player’s overall performance, we introduce a novel Game Impact Metric (GIM) that aggregates the values of the player’s actions. Empirical evaluation shows that GIM is consistent throughout a playing season and correlates highly with standard success measures and future salary.
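A simplified sketch of how such an aggregate metric could be computed (the exact impact formula is not given in the abstract; here an action's impact is assumed to be the change in the acting team's learned Q-value, and q_value stands in for the trained possession-based LSTM network):

```python
from collections import defaultdict

def game_impact_metric(events, q_value):
    """Aggregate per-player action values into a season-level score.

    events: ordered play-by-play records, each a dict with keys
            'player', 'team', 'state', 'action'.
    q_value(state, action, team): the learned action-value function.
    Impact here is defined as the change in the acting team's Q-value
    produced by the event -- an assumption, since the abstract does not
    spell out the exact formula.
    """
    gim = defaultdict(float)
    prev_q = {}                                  # last Q-value seen per team
    for e in events:
        q = q_value(e['state'], e['action'], e['team'])
        impact = q - prev_q.get(e['team'], q)    # zero impact for a team's first event
        gim[e['player']] += impact
        prev_q[e['team']] = q
    return dict(gim)
```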


Author(s):  
Peng Zhang ◽  
Jianye Hao ◽  
Weixun Wang ◽  
Hongyao Tang ◽  
Yi Ma ◽  
...  

Reinforcement learning agents usually learn from scratch, which requires a large number of interactions with the environment. This is quite different from the learning process of humans. When faced with a new task, humans naturally draw on common sense and use prior knowledge to derive an initial policy and to guide the learning process afterwards. Although the prior knowledge may not be fully applicable to the new task, the learning process is significantly sped up, since the initial policy ensures a quick start and intermediate guidance avoids unnecessary exploration. Taking this inspiration, we propose the knowledge-guided policy network (KoGuN), a novel framework that combines suboptimal human prior knowledge with reinforcement learning. Our framework consists of a fuzzy rule controller to represent human knowledge and a refine module to fine-tune the suboptimal prior knowledge. The proposed framework is end-to-end and can be combined with existing policy-based reinforcement learning algorithms. We conduct experiments on several control tasks. The empirical results show that our approach, which combines suboptimal human knowledge and RL, achieves significant improvements in the learning efficiency of flat RL algorithms, even with very low-performance human prior knowledge.
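A minimal sketch of the idea, with a toy CartPole-style fuzzy rule set and a simple blending rule standing in for the paper's refine module (both are assumptions; the actual KoGuN architecture is trained end-to-end):

```python
import numpy as np

def fuzzy_prior(obs):
    """Toy fuzzy rule controller for a CartPole-like task (assumed, not from
    the paper): if the pole leans right, prefer pushing right, and vice versa.
    Membership degrees are simple saturating functions of the pole angle."""
    angle = obs[2]
    lean_right = np.clip(angle / 0.2, 0.0, 1.0)       # degree of "pole leans right"
    lean_left = np.clip(-angle / 0.2, 0.0, 1.0)       # degree of "pole leans left"
    prefs = np.array([lean_left, lean_right]) + 1e-3  # [push left, push right]
    return prefs / prefs.sum()

def kogun_policy(obs, policy_logits, refine_weight):
    """Blend the suboptimal fuzzy prior with the learned policy.

    policy_logits: output of the trainable policy network for this observation.
    refine_weight: scalar in [0, 1]; as learning progresses it can down-weight
    the prior. The blending rule is an illustrative choice."""
    prior = fuzzy_prior(obs)
    learned = np.exp(policy_logits - policy_logits.max())
    learned /= learned.sum()
    mixed = refine_weight * learned + (1.0 - refine_weight) * prior
    return mixed / mixed.sum()
```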


Author(s):  
Caleb Scheffer Sponheim ◽  
Vasileios Papadourakis ◽  
Jennifer Collinger ◽  
John Downey ◽  
Jeffrey M Weiss ◽  
...  

Abstract Objective. Microelectrode arrays are standard tools for conducting chronic electrophysiological experiments, allowing researchers to record simultaneously from large numbers of neurons. Specifically, Utah electrode arrays (UEAs) have been used by scientists in many species, including rodents, rhesus macaques, marmosets, and human participants. The field of clinical human brain-computer interfaces currently relies on the UEA, as a number of research groups have FDA clearance for this device through the investigational device exemption pathway. Despite its widespread use in systems neuroscience, few studies have comprehensively evaluated the reliability and signal quality of the Utah array over long periods of time in a large dataset. Approach. We collected and analyzed over 6000 recorded datasets from various cortical areas spanning almost 9 years of experiments, covering 17 rhesus macaques (Macaca mulatta), 2 human subjects, and 55 separate Utah microelectrode arrays. The scale of this dataset allowed us to evaluate the average life of these arrays, based primarily on the signal-to-noise ratio of each electrode over time. Main Results. Using implants in primary motor, premotor, prefrontal, and somatosensory cortices, we found that the average lifespan of available recordings from UEAs was 622 days, although we provide several examples of UEAs lasting over 1000 days and one lasting up to 9 years; human implants were also shown to last longer than non-human primate implants. We also found that electrode length did not affect longevity or quality, but that iridium oxide metallization on the electrode tip exhibited superior yield compared to platinum metallization.
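For illustration only, a common per-electrode SNR computation of the kind such longevity analyses rely on (the paper's exact formula and yield threshold are not given in the abstract):

```python
import numpy as np

def electrode_snr(waveforms):
    """One common per-electrode SNR definition (an assumption, not the paper's
    stated formula): peak-to-peak amplitude of the mean spike waveform divided
    by twice the standard deviation of the residual noise.

    waveforms: array of shape (n_spikes, n_samples) of threshold-crossing
    snippets recorded on a single electrode in one session."""
    mean_wf = waveforms.mean(axis=0)
    noise = waveforms - mean_wf          # residual after removing the mean waveform
    return np.ptp(mean_wf) / (2.0 * noise.std())

def array_yield(session_waveforms, snr_threshold=1.5):
    """Fraction of electrodes in a session whose SNR exceeds a threshold;
    tracking this fraction across sessions gives a longevity curve."""
    snrs = [electrode_snr(w) for w in session_waveforms]
    return float(np.mean([s > snr_threshold for s in snrs]))
```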


Author(s):  
Abdelghafour Harraz ◽  
Mostapha Zbakh

Artificial Intelligence makes it possible to create engines that explore and learn about environments and thereby derive policies that control them in real time with no human intervention. Through its Reinforcement Learning component, using frameworks such as temporal-difference learning, State-Action-Reward-State-Action (SARSA), and Q-learning, to name a few, it can be applied to any system that can be perceived as a Markov Decision Process. This opens the door to applying Reinforcement Learning to Cloud Load Balancing, so that load can be dispatched dynamically across a given Cloud System. The authors describe different techniques that can be used to implement a Reinforcement Learning based engine in a cloud system.
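As a concrete, hedged example of the framing above, the sketch below implements a tabular Q-learning dispatcher; the state encoding (bucketed per-server queue lengths) and the reward (negative response time) are illustrative assumptions rather than the chapter's prescription:

```python
import random
from collections import defaultdict

class QLearningDispatcher:
    """Illustrative tabular Q-learning load balancer. The state encoding
    (discretized per-server queue lengths) and the reward (negative response
    time) are assumptions; the chapter frames load balancing as an MDP
    solvable with SARSA / Q-learning style updates."""

    def __init__(self, n_servers, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.n_servers = n_servers
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.Q = defaultdict(float)

    def state(self, queue_lengths):
        # Bucket each server's queue length so the state space stays tabular
        return tuple(min(q // 5, 4) for q in queue_lengths)

    def dispatch(self, queue_lengths):
        """Pick the server to receive the incoming request (epsilon-greedy)."""
        s = self.state(queue_lengths)
        if random.random() < self.epsilon:
            return random.randrange(self.n_servers)
        return max(range(self.n_servers), key=lambda a: self.Q[(s, a)])

    def update(self, queue_lengths, action, response_time, next_queue_lengths):
        """Q-learning update with reward = -response_time for the dispatched request."""
        s, s2 = self.state(queue_lengths), self.state(next_queue_lengths)
        best_next = max(self.Q[(s2, a)] for a in range(self.n_servers))
        target = -response_time + self.gamma * best_next
        self.Q[(s, action)] += self.alpha * (target - self.Q[(s, action)])
```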

