Leveraging human knowledge in tabular reinforcement learning: a study of human subjects

Author(s):  
Ariel Rosenfeld ◽  
Moshe Cohen ◽  
Matthew E. Taylor ◽  
Sarit Kraus

Abstract Reinforcement learning (RL) can be extremely effective in solving complex, real-world problems. However, injecting human knowledge into an RL agent may require extensive effort and expertise on the human designer’s part. To date, human factors are generally not considered in the development and evaluation of possible RL approaches. In this article, we set out to investigate how different methods for injecting human knowledge are applied, in practice, by human designers of varying levels of knowledge and skill. We perform the first empirical evaluation of several methods, including a newly proposed method named State Action Similarity Solutions (SASS), which is based on the notion of similarities in the agent’s state–action space. Through this human study, consisting of 51 human participants, we shed new light on the human factors that play a key role in RL. We find that the classical reward shaping technique seems to be the most natural method for most designers, both expert and non-expert, to speed up RL. However, we further find that our proposed method SASS can be effectively and efficiently combined with reward shaping, providing a beneficial alternative to using only a single speed-up method with minimal additional human designer effort.
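A minimal tabular sketch of how reward shaping and SASS-style similarity updates could be combined (the Gym-style environment interface, the designer-supplied similar_pairs and shaping callables, and the kappa weight are all assumptions, since the abstract does not specify an implementation):

```python
import random
from collections import defaultdict

def sass_q_learning(env, similar_pairs, shaping, episodes=500,
                    alpha=0.1, gamma=0.99, epsilon=0.1, kappa=0.5):
    """Tabular Q-learning with reward shaping and SASS-style similarity updates.

    similar_pairs(s, a) -> iterable of (s', a') pairs the designer deems similar.
    shaping(s, a, s2)   -> designer-provided bonus added to the environment reward.
    Both are hypothetical designer-supplied callables; kappa scales how strongly
    each TD update is propagated to similar pairs. States are assumed discrete
    and hashable (tabular setting).
    """
    Q = defaultdict(float)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection over the discrete action set
            if random.random() < epsilon:
                a = env.action_space.sample()
            else:
                a = max(range(env.action_space.n), key=lambda b: Q[(s, b)])
            s2, r, done, _ = env.step(a)
            r += shaping(s, a, s2)                      # reward shaping
            best_next = 0.0 if done else max(Q[(s2, b)] for b in range(env.action_space.n))
            td = r + gamma * best_next - Q[(s, a)]
            Q[(s, a)] += alpha * td
            # SASS: propagate a scaled-down update to similar state-action pairs
            for (ss, aa) in similar_pairs(s, a):
                Q[(ss, aa)] += alpha * kappa * td
            s = s2
    return Q
```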

Author(s):  
Ariel Rosenfeld ◽  
Matthew E. Taylor ◽  
Sarit Kraus

Reinforcement Learning (RL) can be extremely effective in solving complex, real-world problems. However, injecting human knowledge into an RL agent may require extensive effort on the human designer's part. To date, human factors are generally not considered in the development and evaluation of possible approaches. In this paper, we propose and evaluate a novel method, based on human psychology literature, which we show to be both effective and efficient, for both expert and non-expert designers, in injecting human knowledge for speeding up tabular RL.


2019 ◽  
Vol 34 ◽  
Author(s):  
Mao Li ◽  
Tim Brys ◽  
Daniel Kudenko

Abstract One challenge faced by reinforcement learning (RL) agents is that in many environments the reward signal is sparse, leading to slow improvement of the agent’s performance in early learning episodes. Potential-based reward shaping can help to resolve the aforementioned issue of sparse reward by incorporating an expert’s domain knowledge into the learning through a potential function. Past work on reinforcement learning from demonstration (RLfD) directly mapped (sub-optimal) human expert demonstration to a potential function, which can speed up RL. In this paper we propose an introspective RL agent that significantly further speeds up the learning. An introspective RL agent records its state–action decisions and experience during learning in a priority queue. Good quality decisions, according to a Monte Carlo estimation, will be kept in the queue, while poorer decisions will be rejected. The queue is then used as demonstration to speed up RL via reward shaping. A human expert’s demonstration can be used to initialize the priority queue before the learning process starts. Experimental validation in the 4-dimensional CartPole domain and the 27-dimensional Super Mario AI domain shows that our approach significantly outperforms non-introspective RL and state-of-the-art approaches in RLfD in both domains.
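A rough sketch of the mechanism described above (the queue capacity, the use of the best Monte Carlo return as the potential, and the scale factor are illustrative assumptions): good decisions are kept in a bounded priority queue and exposed as a state–action potential for shaping, and a human demonstration can seed the queue by calling record on each demonstrated decision before learning begins.

```python
import heapq
import itertools

class IntrospectiveShaper:
    """Keeps the best (state, action) decisions seen so far, ranked by their
    Monte Carlo return estimate, and exposes them as a shaping potential.
    Capacity and the potential scale are illustrative choices."""

    def __init__(self, capacity=1000, scale=1.0):
        self.capacity = capacity
        self.scale = scale
        self.queue = []                 # min-heap of (return, tie-breaker, state, action)
        self.best = {}                  # (state, action) -> best return kept so far
        self._counter = itertools.count()

    def record(self, state, action, mc_return):
        """Insert a decision; reject the poorest one once capacity is exceeded."""
        heapq.heappush(self.queue, (mc_return, next(self._counter), state, action))
        key = (state, action)
        self.best[key] = max(self.best.get(key, mc_return), mc_return)
        if len(self.queue) > self.capacity:
            ret, _, s, a = heapq.heappop(self.queue)   # drop the lowest-return decision
            if self.best.get((s, a)) == ret:
                del self.best[(s, a)]

    def potential(self, state, action):
        """Shaping potential: the best Monte Carlo return kept for this pair, else 0."""
        return self.scale * self.best.get((state, action), 0.0)

    def shaping_reward(self, s, a, s2, a2, gamma=0.99):
        """Potential-based shaping term F = gamma * phi(s2, a2) - phi(s, a)."""
        return gamma * self.potential(s2, a2) - self.potential(s, a)
```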


2021 ◽  
Vol 21 (4) ◽  
pp. 1-22
Author(s):  
Safa Otoum ◽  
Burak Kantarci ◽  
Hussein Mouftah

Volunteer computing, in which owners of Internet-connected devices (laptops, PCs, smart devices, etc.) volunteer them as storage and computing resources, has become an essential mechanism for resource management in numerous applications. The growth in the volume and variety of data traffic on the Internet raises concerns about the robustness of cyber-physical systems, especially for critical infrastructures. Therefore, implementing an efficient Intrusion Detection System for gathering such sensory data has gained vital importance. In this article, we present a comparative study of Artificial Intelligence (AI)-driven intrusion detection systems for wirelessly connected sensors that track crucial applications. Specifically, we present an in-depth analysis of the use of machine learning, deep learning, and reinforcement learning solutions to recognise intrusive behavior in the collected traffic. We evaluate the proposed mechanisms using KDD’99 as a real attack dataset in our simulations. Results present the performance metrics for three different IDSs, namely the Adaptively Supervised and Clustered Hybrid IDS (ASCH-IDS), the Restricted Boltzmann Machine-based Clustered IDS (RBC-IDS), and the Q-learning based IDS (Q-IDS), in detecting malicious behaviors. We also present the performance of different reinforcement learning techniques such as State-Action-Reward-State-Action learning (SARSA) and Temporal Difference learning (TD). Through simulations, we show that Q-IDS performs with detection rate while SARSA-IDS and TD-IDS perform at the order of .
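The abstract does not describe the internals of Q-IDS or SARSA-IDS; purely as an illustration of the tabular techniques named above, the following sketch applies a SARSA update to a stream of labelled traffic samples, with a hypothetical state encoding (discretized traffic features) and action set (allow vs. flag):

```python
import random
from collections import defaultdict

def sarsa_ids(samples, episodes=10, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Illustrative tabular SARSA loop for intrusion detection.

    samples: list of (state, label) pairs, where state is a tuple of
    discretized traffic features and label is 1 for attack, 0 for benign.
    Actions: 0 = allow, 1 = flag as intrusion. The reward is +1 for a
    correct decision and -1 otherwise. This encoding is assumed, not
    taken from the paper.
    """
    Q = defaultdict(float)
    actions = (0, 1)

    def policy(state):
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])

    for _ in range(episodes):
        random.shuffle(samples)
        state, label = samples[0]
        action = policy(state)
        for next_state, next_label in samples[1:]:
            reward = 1.0 if action == label else -1.0
            next_action = policy(next_state)
            # SARSA update: bootstrap on the action actually taken next
            Q[(state, action)] += alpha * (
                reward + gamma * Q[(next_state, next_action)] - Q[(state, action)]
            )
            state, label, action = next_state, next_label, next_action
    return Q
```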


Author(s):  
Ziyao Zhang ◽  
Liang Ma ◽  
Kin K. Leung ◽  
Konstantinos Poularakis ◽  
Mudhakar Srivatsa

Author(s):  
Salman Ahmed ◽  
Mihir Sunil Gawand ◽  
Lukman Irshad ◽  
H. Onan Demirel

Computational human factors tools are often not fully integrated during the early phases of product design. Conventional ergonomic practices often require physical prototypes and human subjects, which are costly in terms of both finances and time. Ergonomics evaluations executed on physical prototypes also increase overall rework, since more iterations are required to incorporate human-factors-related design changes discovered late in the design stage, which raises the overall cost of product development. This paper proposes a design methodology based on a Digital Human Modeling (DHM) approach to inform designers about the ergonomic adequacy of products during the early stages of the design process. This proactive ergonomics approach has the potential to allow designers to identify significant design variables that affect human performance before full-scale prototypes are built. The design method utilizes a surrogate model that represents human-product interaction; optimizing the surrogate model yields design concepts that optimize human performance. The efficacy of the proposed design method is demonstrated with a cockpit design study.
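One plausible reading of the surrogate step, sketched below under stated assumptions (a quadratic regression surrogate fit to DHM simulation samples and a simple grid search over the design-variable bounds; neither is prescribed by the abstract):

```python
import numpy as np

def fit_quadratic_surrogate(X, y):
    """Fit a quadratic surrogate to DHM simulation samples by least squares.
    X: (n_samples, n_vars) design variables (e.g., seat height, reach distance);
    y: human-performance scores from the digital human model. The quadratic
    form and the example variables are illustrative assumptions."""
    n, d = X.shape
    # Design matrix: intercept, linear terms, and pairwise products
    cols = [np.ones(n)] + [X[:, i] for i in range(d)]
    cols += [X[:, i] * X[:, j] for i in range(d) for j in range(i, d)]
    A = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)

    def surrogate(x):
        x = np.asarray(x, dtype=float)
        feats = [1.0] + list(x) + [x[i] * x[j] for i in range(d) for j in range(i, d)]
        return float(np.dot(coef, feats))

    return surrogate

def optimize_on_grid(surrogate, bounds, steps=25):
    """Exhaustive grid search over the design space for the best predicted score."""
    grids = [np.linspace(lo, hi, steps) for lo, hi in bounds]
    mesh = np.meshgrid(*grids, indexing="ij")
    pts = np.stack([m.ravel() for m in mesh], axis=1)
    scores = np.array([surrogate(p) for p in pts])
    return pts[scores.argmax()], scores.max()
```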


Author(s):  
Guiliang Liu ◽  
Oliver Schulte

A variety of machine learning models have been proposed to assess the performance of players in professional sports. However, they have only a limited ability to model how player performance depends on the game context. This paper proposes a new approach to capturing game context: we apply Deep Reinforcement Learning (DRL) to learn an action-value Q function from 3M play-by-play events in the National Hockey League (NHL). The neural network representation integrates both continuous context signals and game history, using a possession-based LSTM. The learned Q-function is used to value players’ actions under different game contexts. To assess a player’s overall performance, we introduce a novel Game Impact Metric (GIM) that aggregates the values of the player’s actions. Empirical evaluation shows that GIM is consistent throughout a playing season and correlates highly with standard success measures and future salary.
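A simplified sketch of how such an aggregate metric could be computed (the exact impact formula is not given in the abstract; here an action's impact is assumed to be the change in the acting team's learned Q-value, and q_value stands in for the trained possession-based LSTM network):

```python
from collections import defaultdict

def game_impact_metric(events, q_value):
    """Aggregate per-player action values into a season-level score.

    events: ordered play-by-play records, each a dict with keys
            'player', 'team', 'state', 'action'.
    q_value(state, action, team): the learned action-value function.
    Impact here is defined as the change in the acting team's Q-value
    produced by the event -- an assumption, since the abstract does not
    spell out the exact formula.
    """
    gim = defaultdict(float)
    prev_q = {}                                  # last Q-value seen per team
    for e in events:
        q = q_value(e['state'], e['action'], e['team'])
        impact = q - prev_q.get(e['team'], q)    # zero impact for a team's first event
        gim[e['player']] += impact
        prev_q[e['team']] = q
    return dict(gim)
```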


Author(s):  
Peng Zhang ◽  
Jianye Hao ◽  
Weixun Wang ◽  
Hongyao Tang ◽  
Yi Ma ◽  
...  

Reinforcement learning agents usually learn from scratch, which requires a large number of interactions with the environment. This is quite different from the learning process of humans. When faced with a new task, humans naturally draw on common sense and use prior knowledge to derive an initial policy and to guide the learning process afterwards. Although the prior knowledge may not be fully applicable to the new task, the learning process is significantly sped up, since the initial policy ensures a quick start and intermediate guidance avoids unnecessary exploration. Taking this inspiration, we propose the knowledge-guided policy network (KoGuN), a novel framework that combines suboptimal human prior knowledge with reinforcement learning. Our framework consists of a fuzzy rule controller to represent human knowledge and a refine module to fine-tune the suboptimal prior knowledge. The proposed framework is end-to-end and can be combined with existing policy-based reinforcement learning algorithms. We conduct experiments on several control tasks. The empirical results show that our approach, which combines suboptimal human knowledge and RL, achieves significant improvements in the learning efficiency of flat RL algorithms, even with very low-performance human prior knowledge.
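A minimal sketch of the idea, with a toy CartPole-style fuzzy rule set and a simple blending rule standing in for the paper's refine module (both are assumptions; the actual KoGuN architecture is trained end-to-end):

```python
import numpy as np

def fuzzy_prior(obs):
    """Toy fuzzy rule controller for a CartPole-like task (assumed, not from
    the paper): if the pole leans right, prefer pushing right, and vice versa.
    Membership degrees are simple saturating functions of the pole angle."""
    angle = obs[2]
    lean_right = np.clip(angle / 0.2, 0.0, 1.0)       # degree of "pole leans right"
    lean_left = np.clip(-angle / 0.2, 0.0, 1.0)       # degree of "pole leans left"
    prefs = np.array([lean_left, lean_right]) + 1e-3  # [push left, push right]
    return prefs / prefs.sum()

def kogun_policy(obs, policy_logits, refine_weight):
    """Blend the suboptimal fuzzy prior with the learned policy.

    policy_logits: output of the trainable policy network for this observation.
    refine_weight: scalar in [0, 1]; as learning progresses it can down-weight
    the prior. The blending rule is an illustrative choice."""
    prior = fuzzy_prior(obs)
    learned = np.exp(policy_logits - policy_logits.max())
    learned /= learned.sum()
    mixed = refine_weight * learned + (1.0 - refine_weight) * prior
    return mixed / mixed.sum()
```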


Author(s):  
Caleb Scheffer Sponheim ◽  
Vasileios Papadourakis ◽  
Jennifer Collinger ◽  
John Downey ◽  
Jeffrey M Weiss ◽  
...  

Abstract Objective. Microelectrode arrays are standard tools for conducting chronic electrophysiological experiments, allowing researchers to record simultaneously from large numbers of neurons. Specifically, Utah electrode arrays (UEAs) have been used by scientists in many species, including rodents, rhesus macaques, marmosets, and human participants. The field of clinical human brain-computer interfaces currently relies on the UEA, as a number of research groups have FDA clearance for this device through the investigational device exemption pathway. Despite its widespread use in systems neuroscience, few studies have comprehensively evaluated the reliability and signal quality of the Utah array over long periods of time in a large dataset. Approach. We collected and analyzed over 6000 recorded datasets from various cortical areas spanning almost 9 years of experiments, covering 17 rhesus macaques (Macaca mulatta), 2 human subjects, and 55 separate Utah microelectrode arrays. The scale of this dataset allowed us to evaluate the average life of these arrays, based primarily on the signal-to-noise ratio of each electrode over time. Main Results. Using implants in primary motor, premotor, prefrontal, and somatosensory cortices, we found that the average lifespan of available recordings from UEAs was 622 days, although we provide several examples of UEAs lasting over 1000 days and one lasting up to 9 years; human implants were also shown to last longer than non-human primate implants. We also found that electrode length did not affect longevity or quality, but that iridium oxide metallization on the electrode tip exhibited superior yield compared to platinum metallization.
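For illustration only, a common per-electrode SNR computation of the kind such longevity analyses rely on (the paper's exact formula and yield threshold are not given in the abstract):

```python
import numpy as np

def electrode_snr(waveforms):
    """One common per-electrode SNR definition (an assumption, not the paper's
    stated formula): peak-to-peak amplitude of the mean spike waveform divided
    by twice the standard deviation of the residual noise.

    waveforms: array of shape (n_spikes, n_samples) of threshold-crossing
    snippets recorded on a single electrode in one session."""
    mean_wf = waveforms.mean(axis=0)
    noise = waveforms - mean_wf          # residual after removing the mean waveform
    return np.ptp(mean_wf) / (2.0 * noise.std())

def array_yield(session_waveforms, snr_threshold=1.5):
    """Fraction of electrodes in a session whose SNR exceeds a threshold;
    tracking this fraction across sessions gives a longevity curve."""
    snrs = [electrode_snr(w) for w in session_waveforms]
    return float(np.mean([s > snr_threshold for s in snrs]))
```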


Author(s):  
Abdelghafour Harraz ◽  
Mostapha Zbakh

Artificial Intelligence makes it possible to create engines that explore and learn about environments and thereby derive policies that control them in real time with no human intervention. Through its Reinforcement Learning component, using frameworks such as temporal-difference learning, State-Action-Reward-State-Action (SARSA), and Q-learning, to name a few, it can be applied to any system that can be perceived as a Markov Decision Process. This opens the door to applying Reinforcement Learning to Cloud Load Balancing, so that load can be dispatched dynamically across a given Cloud System. The authors describe different techniques that can be used to implement a Reinforcement Learning based engine in a cloud system.
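As a concrete, hedged example of the framing above, the sketch below implements a tabular Q-learning dispatcher; the state encoding (bucketed per-server queue lengths) and the reward (negative response time) are illustrative assumptions rather than the chapter's prescription:

```python
import random
from collections import defaultdict

class QLearningDispatcher:
    """Illustrative tabular Q-learning load balancer. The state encoding
    (discretized per-server queue lengths) and the reward (negative response
    time) are assumptions; the chapter frames load balancing as an MDP
    solvable with SARSA / Q-learning style updates."""

    def __init__(self, n_servers, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.n_servers = n_servers
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.Q = defaultdict(float)

    def state(self, queue_lengths):
        # Bucket each server's queue length so the state space stays tabular
        return tuple(min(q // 5, 4) for q in queue_lengths)

    def dispatch(self, queue_lengths):
        """Pick the server to receive the incoming request (epsilon-greedy)."""
        s = self.state(queue_lengths)
        if random.random() < self.epsilon:
            return random.randrange(self.n_servers)
        return max(range(self.n_servers), key=lambda a: self.Q[(s, a)])

    def update(self, queue_lengths, action, response_time, next_queue_lengths):
        """Q-learning update with reward = -response_time for the dispatched request."""
        s, s2 = self.state(queue_lengths), self.state(next_queue_lengths)
        best_next = max(self.Q[(s2, a)] for a in range(self.n_servers))
        target = -response_time + self.gamma * best_next
        self.Q[(s, action)] += self.alpha * (target - self.Q[(s, action)])
```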

