Super-Human Performance in Gran Turismo Sport Using Deep Reinforcement Learning

2021 ◽  
Vol 6 (3) ◽  
pp. 4257-4264
Author(s):  
Florian Fuchs ◽  
Yunlong Song ◽  
Elia Kaufmann ◽  
Davide Scaramuzza ◽  
Peter Dürr

2017 ◽  
Author(s):  
Martin Dinov ◽  
Robert Leech

Reinforcement learning (RL) is a powerful, general-purpose machine learning framework within which we can model various deterministic, non-deterministic and complex environments. We applied RL to the problem of tracking and improving human sustained attention during a simple sustained attention to response task (SART) in a proof-of-concept study with two subjects, using state-of-the-art deep neural network-based RL in the form of Deep Q-Networks (DQNs). While others have used RL in EEG settings previously, none have applied it in a neurofeedback (NFB) setting, which seems a natural problem within Brain-Computer Interfaces (BCIs) to tackle using end-to-end RL in the form of DQNs, due to both the problem's non-stationarity and the ability of RL to learn in a continuous setting. Furthermore, while many have explored phasic alerting previously, learning optimal alerting in a personalized way in real time is a less explored field, for which we believe RL to be a most suitable solution. First, we used empirically derived simulated data of EEG and reaction times, and subsequent parameter/algorithmic exploration within this simulated model, to pick parameters for the DQN that are more likely to be optimal for the experimental setup and to explore the behavior of DQNs in this task setting. We then applied the method to two subjects and show that we obtain different but plausible results for both, suggesting something about the behavior of DQNs in this setting. For this experimental part, we used parameters suggested by the simulation results. The RL-based behavioral- and neuro-feedback BCI method we have developed here is input-feature agnostic and allows complex continuous actions to be learned in other, more complex closed-loop behavioral or neuro-feedback approaches.
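The abstract describes an end-to-end DQN driving a closed-loop neurofeedback/alerting decision. The sketch below is a minimal, generic DQN loop, not the authors' implementation: the state dimensionality, the two-action alerting choice, the reward, and all hyperparameters are illustrative assumptions.

```python
import random
from collections import deque

import torch
import torch.nn as nn
import torch.optim as optim

STATE_DIM = 8   # assumed: e.g. EEG band-power features plus recent reaction times
N_ACTIONS = 2   # assumed: 0 = do nothing, 1 = present an alerting stimulus
GAMMA = 0.99
EPSILON = 0.1

class QNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, N_ACTIONS),
        )

    def forward(self, x):
        return self.net(x)

q_net = QNet()
optimizer = optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)   # filled with (state, action, reward, next_state) tuples

def act(state):
    """Epsilon-greedy action selection over the Q-values."""
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return q_net(torch.tensor(state, dtype=torch.float32)).argmax().item()

def learn(batch_size=32):
    """One TD(0) update on a random minibatch drawn from the replay buffer."""
    if len(replay) < batch_size:
        return
    batch = random.sample(replay, batch_size)
    s, a, r, s2 = (torch.tensor(x, dtype=torch.float32) for x in zip(*batch))
    q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + GAMMA * q_net(s2).max(1).values
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```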


2018 ◽  
Author(s):  
Stefan Niculae

Penetration testing is the practice of performing a simulated attack on a computer system in order to reveal its vulnerabilities. The most common approach is for a security expert to gather information and then plan and execute the attack manually. This manual method cannot meet the speed and frequency required for efficient, large-scale security solutions development. To address this, we formalize penetration testing as a security game between an attacker who tries to compromise a network and a defending adversary actively protecting it. We compare multiple algorithms for finding the attacker's strategy, from fixed-strategy to Reinforcement Learning, namely Q-Learning (QL), Extended Classifier Systems (XCS) and Deep Q-Networks (DQN). The attacker's strength is measured in terms of speed and stealthiness, in the specific environment used in our simulations. The results show that QL surpasses human performance, XCS yields worse-than-human performance but is more stable, and the slow convergence of DQN keeps it from achieving exceptional performance. In addition, we find that all of these Machine Learning approaches outperform fixed-strategy attackers.
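For the simplest of the compared learners, the attacker's strategy can be represented as a tabular Q-function updated with the standard Q-learning rule. The action names and parameter values below are placeholders, not the paper's actual game formalization.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
ACTIONS = ["scan", "exploit", "escalate", "exfiltrate"]   # illustrative action set

Q = defaultdict(float)   # Q[(state, action)] -> estimated value

def choose_action(state):
    """Epsilon-greedy choice over the attacker's action set."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    """Standard Q-learning temporal-difference update."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```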


2017 ◽  
Author(s):  
Carolina Feher da Silva ◽  
Camila Gomes Victorino ◽  
Nestor Caticha ◽  
Marcus Vinícius Chrysóstomo Baldo

Research has not yet reached a consensus on why humans match probabilities instead of maximising in a probability learning task. The most influential explanation is that they search for patterns in the random sequence of outcomes. Other explanations, such as expectation matching, are plausible, but do not consider how reinforcement learning shapes people's choices.

We aimed to quantify how human performance in a probability learning task is affected by pattern search and reinforcement learning. We collected behavioural data from 84 young adult participants who performed a probability learning task wherein the majority outcome was rewarded with 0.7 probability, and analysed the data using a reinforcement learning model that searches for patterns. Model simulations indicated that pattern search, exploration, recency (discounting early experiences), and forgetting may impair performance.

Our analysis estimated that 85% (95% HDI [76, 94]) of participants searched for patterns and believed that each trial outcome depended on one or two previous ones. The estimated impact of pattern search on performance was, however, only 6%, while those of exploration and recency were 19% and 13%, respectively. This suggests that probability matching is caused by uncertainty about how outcomes are generated, which leads to pattern search, exploration, and recency.
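The abstract attributes most of the performance loss to exploration and recency. A toy version of such a learner is sketched below: action values are updated with a constant learning rate (recency, i.e. discounting early experiences) and choices are drawn from a softmax (exploration). Pattern search and forgetting are omitted, and the parameter values are illustrative rather than the fitted ones.

```python
import numpy as np

ALPHA = 0.3    # recency: a constant learning rate discounts early experiences
BETA = 3.0     # inverse temperature: lower values mean more exploration
rng = np.random.default_rng(0)

values = np.zeros(2)   # expected reward for the two choice options

def choose():
    """Softmax (exploratory) choice between the two options."""
    p = np.exp(BETA * values)
    p /= p.sum()
    return rng.choice(2, p=p)

def update(choice, reward):
    """Recency-weighted prediction-error update for the chosen option."""
    values[choice] += ALPHA * (reward - values[choice])
```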


Author(s):  
Matteo Hessel ◽  
Hubert Soyer ◽  
Lasse Espeholt ◽  
Wojciech Czarnecki ◽  
Simon Schmitt ◽  
...  

The reinforcement learning (RL) community has made great strides in designing algorithms capable of exceeding human performance on specific tasks. These algorithms are mostly trained one task at a time, with each new task requiring a brand-new agent instance to be trained. This means the learning algorithm is general, but each solution is not; each agent can only solve the one task it was trained on. In this work, we study the problem of learning to master not one but multiple sequential-decision tasks at once. A general issue in multi-task learning is that a balance must be found between the needs of multiple tasks competing for the limited resources of a single learning system. Many learning algorithms can get distracted by certain tasks in the set of tasks to solve. Such tasks appear more salient to the learning process, for instance because of the density or magnitude of the in-task rewards. This causes the algorithm to focus on those salient tasks at the expense of generality. We propose to automatically adapt the contribution of each task to the agent's updates, so that all tasks have a similar impact on the learning dynamics. This resulted in state-of-the-art performance on learning to play all games in a set of 57 diverse Atari games. Excitingly, our method learned a single trained policy (with a single set of weights) that exceeds median human performance. To our knowledge, this was the first time a single agent surpassed human-level performance on this multi-task domain. The same approach also demonstrated state-of-the-art performance on a set of 30 tasks in the 3D reinforcement learning platform DeepMind Lab.
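The key idea is to rescale each task's learning signal so that no single task dominates the shared updates. The sketch below shows a simplified per-task running normalisation of returns in that spirit; it illustrates the principle only and is not the paper's full algorithm.

```python
import numpy as np

class TaskNormalizer:
    """Running mean/std of one task's returns, used to rescale its learning signal."""

    def __init__(self, beta=3e-4):
        self.mean, self.sq_mean, self.beta = 0.0, 1.0, beta

    def update(self, ret):
        self.mean = (1 - self.beta) * self.mean + self.beta * ret
        self.sq_mean = (1 - self.beta) * self.sq_mean + self.beta * ret ** 2

    @property
    def std(self):
        return np.sqrt(max(self.sq_mean - self.mean ** 2, 1e-8))

    def normalize(self, ret):
        """Express this task's return in its own normalised units."""
        return (ret - self.mean) / self.std

# One normalizer per task: updates are then computed on normalised returns, so
# tasks with dense or high-magnitude rewards no longer dominate the shared network.
normalizers = {task_id: TaskNormalizer() for task_id in range(57)}
```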


2021 ◽  
Vol 8 ◽  
Author(s):  
Michael S. Lee ◽  
Henny Admoni ◽  
Reid Simmons

As robots continue to acquire useful skills, their ability to teach their expertise will provide humans the two-fold benefit of learning from robots and collaborating fluently with them. For example, robot tutors could teach handwriting to individual students and delivery robots could convey their navigation conventions to better coordinate with nearby human workers. Because humans naturally communicate their behaviors through selective demonstrations, and comprehend others’ through reasoning that resembles inverse reinforcement learning (IRL), we propose a method of teaching humans based on demonstrations that are informative for IRL. But unlike prior work that optimizes solely for IRL, this paper incorporates various human teaching strategies (e.g. scaffolding, simplicity, pattern discovery, and testing) to better accommodate human learners. We assess our method with user studies and find that our measure of test difficulty corresponds well with human performance and confidence, and also find that favoring simplicity and pattern discovery increases human performance on difficult tests. However, we did not find a strong effect for our method of scaffolding, revealing shortcomings that indicate clear directions for future work.
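One way to read "demonstrations that are informative for IRL" is that a good demonstration rules out as many wrong reward hypotheses as possible. The toy selection procedure below illustrates that idea under the assumption of linear reward functions over trajectory features; it is not the authors' method, and the scaffolding, simplicity, and pattern-discovery strategies from the paper are not modelled.

```python
import numpy as np

def hypotheses_ruled_out(demo_features, alt_features, reward_hypotheses):
    """Count reward hypotheses under which some alternative beats the demonstration."""
    ruled_out = 0
    for w in reward_hypotheses:
        demo_return = float(np.dot(w, demo_features))
        if any(np.dot(w, alt) > demo_return for alt in alt_features):
            ruled_out += 1
    return ruled_out

def pick_demonstration(candidates, reward_hypotheses):
    """Greedily pick the demonstration that eliminates the most wrong hypotheses.

    Each candidate is a dict holding the demonstrated trajectory's feature counts
    ("demo") and the feature counts of the alternatives it was chosen over
    ("alternatives"); both are illustrative placeholders.
    """
    return max(
        candidates,
        key=lambda c: hypotheses_ruled_out(c["demo"], c["alternatives"], reward_hypotheses),
    )
```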


2021 ◽  
Vol 8 ◽  
Author(s):  
Pablo Barros ◽  
Anne C. Bloem ◽  
Inge M. Hootsmans ◽  
Lena M. Opheij ◽  
Romain H. A. Toebosch ◽  
...  

Reinforcement learning simulation environments provide an important experimental test bed and facilitate data collection for developing AI-based robot applications. Most of them, however, focus on single-agent tasks, which limits their application to the development of social agents. This study proposes the Chef's Hat simulation environment, which implements a multi-agent competitive card game, a complete reproduction of the homonymous board game, designed to provoke competitive strategies and emotional responses in humans. The game was shown to be ideal for developing personalized reinforcement learning in an online-learning, closed-loop scenario, as its state representation is extremely dynamic and directly related to each of the opponents' actions. To adapt current reinforcement learning agents to this scenario, we also developed the COmPetitive Prioritized Experience Replay (COPPER) algorithm. With the help of COPPER and the Chef's Hat simulation environment, we evaluated the following: (1) 12 experimental learning agents, trained via four different regimens (self-play, play against a naive baseline, PER, or COPPER) with three algorithms based on different state-of-the-art learning paradigms (PPO, DQN, and ACER), and two "dummy" baseline agents that take random actions; (2) the performance difference between COPPER and PER agents trained using the PPO algorithm and playing against different agents (PPO, DQN, and ACER) or all DQN agents; and (3) human performance when playing against two different collections of agents. Our experiments demonstrate that COPPER helps agents learn to adapt to different types of opponents, improving performance compared to offline learning models. An additional contribution of the study is the formalization of the Chef's Hat competitive game and the implementation of the Chef's Hat Player Club, a collection of trained and assessed agents that enables embedding human competitive strategies in social, continual, and competitive reinforcement learning.
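COPPER builds on prioritized experience replay (PER), in which transitions are replayed with probability proportional to a TD-error-based priority. The sketch below shows only that generic PER mechanism; COPPER's competitive, opponent-dependent prioritization is not described in enough detail in the abstract to reproduce here.

```python
import numpy as np

class PrioritizedReplay:
    """Transitions are sampled with probability proportional to their priority."""

    def __init__(self, capacity=10_000, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.buffer, self.priorities = [], []
        self.rng = np.random.default_rng(0)

    def add(self, transition, td_error):
        if len(self.buffer) >= self.capacity:   # overwrite the oldest entry
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append((abs(td_error) + 1e-6) ** self.alpha)

    def sample(self, batch_size):
        probs = np.array(self.priorities)
        probs /= probs.sum()
        idx = self.rng.choice(len(self.buffer), size=batch_size, p=probs)
        return [self.buffer[i] for i in idx], idx

    def update_priorities(self, indices, td_errors):
        """Refresh priorities after the sampled transitions have been replayed."""
        for i, err in zip(indices, td_errors):
            self.priorities[i] = (abs(err) + 1e-6) ** self.alpha
```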


2008 ◽  
Vol 44 ◽  
pp. 11-26 ◽  
Author(s):  
Ralph Beneke ◽  
Dieter Böning

Human performance, defined by mechanical resistance and distance per time, includes human, task and environmental factors, all interrelated. It requires metabolic energy provided by anaerobic and aerobic metabolic energy sources. These sources have specific limitations in the capacity and rate at which they provide re-phosphorylation energy, which determines individual ratios of aerobic and anaerobic metabolic power and their sustainability. In healthy athletes, the limits to providing and utilizing metabolic energy are multifactorial and carefully matched, and include a safety margin imposed in order to protect the integrity of the human organism under maximal effort. Perception of afferent input associated with effort leads to conscious or unconscious decisions to modulate or terminate performance; however, the underlying mechanisms of cerebral control are not fully understood. The idea of moving the borders of performance with the help of biochemicals is two millennia old. Biochemical findings have resulted in highly effective substances that are widely used to increase performance in daily life, during preparation for sport events and during competition, but many of them must be considered doping and are therefore illegal. Supplements and food have ergogenic potential; however, numerous concepts are discussed controversially with respect to legality and, particularly, to the evidence of their usefulness and risks. The effect of evidence-based nutritional strategies on the adaptations in gene and protein expression that occur in skeletal muscle during and after exercise training sessions is widely unknown. Biochemical research is essential for a better understanding of the basic mechanisms causing fatigue and of the regulation of the dynamic adaptation to physical and mental training.


2004 ◽  
Vol 171 (4S) ◽  
pp. 496-497
Author(s):  
Edward D. Matsumoto ◽  
George V. Kondraske ◽  
Lucas Jacomides ◽  
Kenneth Ogan ◽  
Margaret S. Pearle ◽  
...  

2015 ◽  
Vol 31 (1) ◽  
pp. 20-30 ◽  
Author(s):  
William S. Helton ◽  
Katharina Näswall

Conscious appraisals of stress, or stress states, are an important aspect of human performance. This article presents evidence supporting the validity and measurement characteristics of a short multidimensional self-report measure of stress state, the Short Stress State Questionnaire (SSSQ; Helton, 2004). The SSSQ measures task engagement, distress, and worry. A confirmatory factor analysis of the SSSQ using data pooled from multiple samples suggests the SSSQ has a three-factor structure and that post-task changes are not due to changes in factor structure, but to mean-level changes (state changes). In addition, the SSSQ demonstrates sensitivity to task stressors in line with hypotheses. Different task conditions elicited unique patterns of stress state on the three factors of the SSSQ, in line with prior predictions. The 24-item SSSQ is a valid measure of stress state that may be useful to researchers interested in conscious appraisals of task-related stress.
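For illustration, scoring the SSSQ amounts to averaging the items belonging to each of the three factors and comparing pre- and post-task subscale means. The item-to-factor assignment below is a placeholder (eight items per factor), not the published scoring key.

```python
import numpy as np

# Placeholder item-to-factor assignment; the published SSSQ scoring key should
# be used in practice.
FACTOR_ITEMS = {
    "engagement": range(0, 8),
    "distress": range(8, 16),
    "worry": range(16, 24),
}

def subscale_scores(responses):
    """Mean response per factor for one 24-item administration."""
    responses = np.asarray(responses, dtype=float)
    return {factor: responses[list(items)].mean() for factor, items in FACTOR_ITEMS.items()}

def state_change(pre, post):
    """Post-task minus pre-task mean-level change on each factor."""
    pre_s, post_s = subscale_scores(pre), subscale_scores(post)
    return {factor: post_s[factor] - pre_s[factor] for factor in FACTOR_ITEMS}
```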

