Super-Human Performance in Gran Turismo Sport Using Deep Reinforcement Learning

2021 ◽  
Vol 6 (3) ◽  
pp. 4257-4264
Author(s):  
Florian Fuchs ◽  
Yunlong Song ◽  
Elia Kaufmann ◽  
Davide Scaramuzza ◽  
Peter Dürr

2017 ◽  
Author(s):  
Martin Dinov ◽  
Robert Leech

Reinforcement learning (RL) is a powerful, general-purpose machine learning framework within which we can model various deterministic, non-deterministic and complex environments. We applied RL to the problem of tracking and improving human sustained attention during a simple sustained attention to response task (SART) in a proof-of-concept study with two subjects, using state-of-the-art deep neural network-based RL in the form of Deep Q-Networks (DQNs). While others have used RL in EEG settings previously, none have applied it in a neurofeedback (NFB) setting, which seems a natural problem within Brain-Computer Interfaces (BCIs) to tackle using end-to-end RL in the form of DQNs, due to both the problem's non-stationarity and the ability of RL to learn in a continuous setting. Furthermore, while many have explored phasic alerting previously, learning optimal alerting in a personalized way in real time is a less explored field, for which we believe RL to be a most suitable solution. First, we used empirically derived simulated data of EEG and reaction times, and subsequent parameter/algorithmic exploration within this simulated model, to pick parameters for the DQN that are more likely to be optimal for the experimental setup and to explore the behavior of DQNs in this task setting. We then applied the method to two subjects and show that we obtain different but plausible results for both, suggesting something about the behavior of DQNs in this setting. For this experimental part, we used parameters suggested by the simulation results. The RL-based behavioral- and neuro-feedback BCI method we have developed here is input-feature agnostic and allows complex continuous actions to be learned in other, more complex closed-loop behavioral or neuro-feedback approaches.
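The abstract describes an end-to-end DQN driving a closed-loop neurofeedback/alerting decision. The sketch below is a minimal, generic DQN loop, not the authors' implementation: the state dimensionality, the two-action alerting choice, the reward, and all hyperparameters are illustrative assumptions.

```python
import random
from collections import deque

import torch
import torch.nn as nn
import torch.optim as optim

STATE_DIM = 8   # assumed: e.g. EEG band-power features plus recent reaction times
N_ACTIONS = 2   # assumed: 0 = do nothing, 1 = present an alerting stimulus
GAMMA = 0.99
EPSILON = 0.1

class QNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, N_ACTIONS),
        )

    def forward(self, x):
        return self.net(x)

q_net = QNet()
optimizer = optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)   # filled with (state, action, reward, next_state) tuples

def act(state):
    """Epsilon-greedy action selection over the Q-values."""
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return q_net(torch.tensor(state, dtype=torch.float32)).argmax().item()

def learn(batch_size=32):
    """One TD(0) update on a random minibatch drawn from the replay buffer."""
    if len(replay) < batch_size:
        return
    batch = random.sample(replay, batch_size)
    s, a, r, s2 = (torch.tensor(x, dtype=torch.float32) for x in zip(*batch))
    q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + GAMMA * q_net(s2).max(1).values
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```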


2018 ◽  
Author(s):  
Stefan Niculae

Penetration testing is the practice of performing a simulated attack on a computer system in order to reveal its vulnerabilities. The most common approach is for a security expert to gather information and then plan and execute the attack manually. This manual method cannot meet the speed and frequency required for efficient, large-scale security solutions development. To address this, we formalize penetration testing as a security game between an attacker who tries to compromise a network and a defending adversary actively protecting it. We compare multiple algorithms for finding the attacker's strategy, from fixed-strategy to Reinforcement Learning, namely Q-Learning (QL), Extended Classifier Systems (XCS) and Deep Q-Networks (DQN). The attacker's strength is measured in terms of speed and stealthiness, in the specific environment used in our simulations. The results show that QL surpasses human performance, XCS yields worse-than-human performance but is more stable, and the slow convergence of DQN keeps it from achieving exceptional performance. In addition, we find that all of these Machine Learning approaches outperform fixed-strategy attackers.
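For the simplest of the compared learners, the attacker's strategy can be represented as a tabular Q-function updated with the standard Q-learning rule. The action names and parameter values below are placeholders, not the paper's actual game formalization.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
ACTIONS = ["scan", "exploit", "escalate", "exfiltrate"]   # illustrative action set

Q = defaultdict(float)   # Q[(state, action)] -> estimated value

def choose_action(state):
    """Epsilon-greedy choice over the attacker's action set."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    """Standard Q-learning temporal-difference update."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```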


2017 ◽  
Author(s):  
Carolina Feher da Silva ◽  
Camila Gomes Victorino ◽  
Nestor Caticha ◽  
Marcus Vinícius Chrysóstomo Baldo

Research has not yet reached a consensus on why humans match probabilities instead of maximising in a probability learning task. The most influential explanation is that they search for patterns in the random sequence of outcomes. Other explanations, such as expectation matching, are plausible, but do not consider how reinforcement learning shapes people's choices.

We aimed to quantify how human performance in a probability learning task is affected by pattern search and reinforcement learning. We collected behavioural data from 84 young adult participants who performed a probability learning task wherein the majority outcome was rewarded with 0.7 probability, and analysed the data using a reinforcement learning model that searches for patterns. Model simulations indicated that pattern search, exploration, recency (discounting early experiences), and forgetting may impair performance.

Our analysis estimated that 85% (95% HDI [76, 94]) of participants searched for patterns and believed that each trial outcome depended on one or two previous ones. The estimated impact of pattern search on performance was, however, only 6%, while those of exploration and recency were 19% and 13%, respectively. This suggests that probability matching is caused by uncertainty about how outcomes are generated, which leads to pattern search, exploration, and recency.
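The abstract attributes most of the performance loss to exploration and recency. A toy version of such a learner is sketched below: action values are updated with a constant learning rate (recency, i.e. discounting early experiences) and choices are drawn from a softmax (exploration). Pattern search and forgetting are omitted, and the parameter values are illustrative rather than the fitted ones.

```python
import numpy as np

ALPHA = 0.3    # recency: a constant learning rate discounts early experiences
BETA = 3.0     # inverse temperature: lower values mean more exploration
rng = np.random.default_rng(0)

values = np.zeros(2)   # expected reward for the two choice options

def choose():
    """Softmax (exploratory) choice between the two options."""
    p = np.exp(BETA * values)
    p /= p.sum()
    return rng.choice(2, p=p)

def update(choice, reward):
    """Recency-weighted prediction-error update for the chosen option."""
    values[choice] += ALPHA * (reward - values[choice])
```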


Author(s):  
Matteo Hessel ◽  
Hubert Soyer ◽  
Lasse Espeholt ◽  
Wojciech Czarnecki ◽  
Simon Schmitt ◽  
...  

The reinforcement learning (RL) community has made great strides in designing algorithms capable of exceeding human performance on specific tasks. These algorithms are mostly trained one task at a time, with each new task requiring a brand-new agent instance to be trained. This means the learning algorithm is general, but each solution is not; each agent can only solve the one task it was trained on. In this work, we study the problem of learning to master not one but multiple sequential-decision tasks at once. A general issue in multi-task learning is that a balance must be found between the needs of multiple tasks competing for the limited resources of a single learning system. Many learning algorithms can get distracted by certain tasks in the set of tasks to solve. Such tasks appear more salient to the learning process, for instance because of the density or magnitude of the in-task rewards. This causes the algorithm to focus on those salient tasks at the expense of generality. We propose to automatically adapt the contribution of each task to the agent's updates, so that all tasks have a similar impact on the learning dynamics. This resulted in state-of-the-art performance on learning to play all games in a set of 57 diverse Atari games. Excitingly, our method learned a single trained policy (with a single set of weights) that exceeds median human performance. To our knowledge, this was the first time a single agent surpassed human-level performance on this multi-task domain. The same approach also demonstrated state-of-the-art performance on a set of 30 tasks in the 3D reinforcement learning platform DeepMind Lab.
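The key idea is to rescale each task's learning signal so that no single task dominates the shared updates. The sketch below shows a simplified per-task running normalisation of returns in that spirit; it illustrates the principle only and is not the paper's full algorithm.

```python
import numpy as np

class TaskNormalizer:
    """Running mean/std of one task's returns, used to rescale its learning signal."""

    def __init__(self, beta=3e-4):
        self.mean, self.sq_mean, self.beta = 0.0, 1.0, beta

    def update(self, ret):
        self.mean = (1 - self.beta) * self.mean + self.beta * ret
        self.sq_mean = (1 - self.beta) * self.sq_mean + self.beta * ret ** 2

    @property
    def std(self):
        return np.sqrt(max(self.sq_mean - self.mean ** 2, 1e-8))

    def normalize(self, ret):
        """Express this task's return in its own normalised units."""
        return (ret - self.mean) / self.std

# One normalizer per task: updates are then computed on normalised returns, so
# tasks with dense or high-magnitude rewards no longer dominate the shared network.
normalizers = {task_id: TaskNormalizer() for task_id in range(57)}
```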


2021 ◽  
Vol 8 ◽  
Author(s):  
Michael S. Lee ◽  
Henny Admoni ◽  
Reid Simmons

As robots continue to acquire useful skills, their ability to teach their expertise will provide humans the two-fold benefit of learning from robots and collaborating fluently with them. For example, robot tutors could teach handwriting to individual students and delivery robots could convey their navigation conventions to better coordinate with nearby human workers. Because humans naturally communicate their behaviors through selective demonstrations, and comprehend others’ through reasoning that resembles inverse reinforcement learning (IRL), we propose a method of teaching humans based on demonstrations that are informative for IRL. But unlike prior work that optimizes solely for IRL, this paper incorporates various human teaching strategies (e.g. scaffolding, simplicity, pattern discovery, and testing) to better accommodate human learners. We assess our method with user studies and find that our measure of test difficulty corresponds well with human performance and confidence, and also find that favoring simplicity and pattern discovery increases human performance on difficult tests. However, we did not find a strong effect for our method of scaffolding, revealing shortcomings that indicate clear directions for future work.
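One way to read "demonstrations that are informative for IRL" is that a good demonstration rules out as many wrong reward hypotheses as possible. The toy selection procedure below illustrates that idea under the assumption of linear reward functions over trajectory features; it is not the authors' method, and the scaffolding, simplicity, and pattern-discovery strategies from the paper are not modelled.

```python
import numpy as np

def hypotheses_ruled_out(demo_features, alt_features, reward_hypotheses):
    """Count reward hypotheses under which some alternative beats the demonstration."""
    ruled_out = 0
    for w in reward_hypotheses:
        demo_return = float(np.dot(w, demo_features))
        if any(np.dot(w, alt) > demo_return for alt in alt_features):
            ruled_out += 1
    return ruled_out

def pick_demonstration(candidates, reward_hypotheses):
    """Greedily pick the demonstration that eliminates the most wrong hypotheses.

    Each candidate is a dict holding the demonstrated trajectory's feature counts
    ("demo") and the feature counts of the alternatives it was chosen over
    ("alternatives"); both are illustrative placeholders.
    """
    return max(
        candidates,
        key=lambda c: hypotheses_ruled_out(c["demo"], c["alternatives"], reward_hypotheses),
    )
```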


2021 ◽  
Vol 8 ◽  
Author(s):  
Pablo Barros ◽  
Anne C. Bloem ◽  
Inge M. Hootsmans ◽  
Lena M. Opheij ◽  
Romain H. A. Toebosch ◽  
...  

Reinforcement learning simulation environments provide an important experimental test bed and facilitate data collection for developing AI-based robot applications. Most of them, however, focus on single-agent tasks, which limits their application to the development of social agents. This study proposes the Chef's Hat simulation environment, which implements a multi-agent competitive card game, a complete reproduction of the homonymous board game, designed to provoke competitive strategies and emotional responses in humans. The game was shown to be ideal for developing personalized reinforcement learning in an online-learning, closed-loop scenario, as its state representation is extremely dynamic and directly related to each of the opponents' actions. To adapt current reinforcement learning agents to this scenario, we also developed the COmPetitive Prioritized Experience Replay (COPPER) algorithm. With the help of COPPER and the Chef's Hat simulation environment, we evaluated the following: (1) 12 experimental learning agents, trained via four different regimens (self-play, play against a naive baseline, PER, or COPPER) with three algorithms based on different state-of-the-art learning paradigms (PPO, DQN, and ACER), and two "dummy" baseline agents that take random actions; (2) the performance difference between COPPER and PER agents trained using the PPO algorithm and playing against different agents (PPO, DQN, and ACER) or all DQN agents; and (3) human performance when playing against two different collections of agents. Our experiments demonstrate that COPPER helps agents learn to adapt to different types of opponents, improving performance compared to offline learning models. An additional contribution of the study is the formalization of the Chef's Hat competitive game and the implementation of the Chef's Hat Player Club, a collection of trained and assessed agents that enables embedding human competitive strategies in social, continual, and competitive reinforcement learning.
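COPPER builds on prioritized experience replay (PER), in which transitions are replayed with probability proportional to a TD-error-based priority. The sketch below shows only that generic PER mechanism; COPPER's competitive, opponent-dependent prioritization is not described in enough detail in the abstract to reproduce here.

```python
import numpy as np

class PrioritizedReplay:
    """Transitions are sampled with probability proportional to their priority."""

    def __init__(self, capacity=10_000, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.buffer, self.priorities = [], []
        self.rng = np.random.default_rng(0)

    def add(self, transition, td_error):
        if len(self.buffer) >= self.capacity:   # overwrite the oldest entry
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append((abs(td_error) + 1e-6) ** self.alpha)

    def sample(self, batch_size):
        probs = np.array(self.priorities)
        probs /= probs.sum()
        idx = self.rng.choice(len(self.buffer), size=batch_size, p=probs)
        return [self.buffer[i] for i in idx], idx

    def update_priorities(self, indices, td_errors):
        """Refresh priorities after the sampled transitions have been replayed."""
        for i, err in zip(indices, td_errors):
            self.priorities[i] = (abs(err) + 1e-6) ** self.alpha
```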


2008 ◽  
Vol 44 ◽  
pp. 11-26 ◽  
Author(s):  
Ralph Beneke ◽  
Dieter Böning

Human performance, defined by mechanical resistance and distance per time, includes human, task and environmental factors, all interrelated. It requires metabolic energy provided by anaerobic and aerobic metabolic energy sources. These sources have specific limitations in the capacity and rate at which they provide re-phosphorylation energy, which determines individual ratios of aerobic and anaerobic metabolic power and their sustainability. In healthy athletes, the limits to providing and utilizing metabolic energy are multifactorial and carefully matched, and include a safety margin imposed in order to protect the integrity of the human organism under maximal effort. Perception of afferent input associated with effort leads to conscious or unconscious decisions to modulate or terminate performance; however, the underlying mechanisms of cerebral control are not fully understood. The idea of moving the borders of performance with the help of biochemicals is two millennia old. Biochemical findings have resulted in highly effective substances that are widely used to increase performance in daily life, during preparation for sport events and during competition, but many of them must be considered doping and are therefore illegal. Supplements and food have ergogenic potential; however, numerous concepts are discussed controversially with respect to legality and, particularly, to the evidence of their usefulness and risks. The effect of evidence-based nutritional strategies on the adaptations in gene and protein expression that occur in skeletal muscle during and after exercise training sessions is widely unknown. Biochemical research is essential for a better understanding of the basic mechanisms causing fatigue and of the regulation of the dynamic adaptation to physical and mental training.


2004 ◽  
Vol 171 (4S) ◽  
pp. 496-497
Author(s):  
Edward D. Matsumoto ◽  
George V. Kondraske ◽  
Lucas Jacomides ◽  
Kenneth Ogan ◽  
Margaret S. Pearle ◽  
...  

2015 ◽  
Vol 31 (1) ◽  
pp. 20-30 ◽  
Author(s):  
William S. Helton ◽  
Katharina Näswall

Conscious appraisals of stress, or stress states, are an important aspect of human performance. This article presents evidence supporting the validity and measurement characteristics of a short multidimensional self-report measure of stress state, the Short Stress State Questionnaire (SSSQ; Helton, 2004). The SSSQ measures task engagement, distress, and worry. A confirmatory factor analysis of the SSSQ using data pooled from multiple samples suggests the SSSQ has a three-factor structure and that post-task changes are not due to changes in factor structure, but to mean-level changes (state changes). In addition, the SSSQ demonstrates sensitivity to task stressors in line with hypotheses. Different task conditions elicited unique patterns of stress state on the three factors of the SSSQ, in line with prior predictions. The 24-item SSSQ is a valid measure of stress state that may be useful to researchers interested in conscious appraisals of task-related stress.
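For illustration, scoring the SSSQ amounts to averaging the items belonging to each of the three factors and comparing pre- and post-task subscale means. The item-to-factor assignment below is a placeholder (eight items per factor), not the published scoring key.

```python
import numpy as np

# Placeholder item-to-factor assignment; the published SSSQ scoring key should
# be used in practice.
FACTOR_ITEMS = {
    "engagement": range(0, 8),
    "distress": range(8, 16),
    "worry": range(16, 24),
}

def subscale_scores(responses):
    """Mean response per factor for one 24-item administration."""
    responses = np.asarray(responses, dtype=float)
    return {factor: responses[list(items)].mean() for factor, items in FACTOR_ITEMS.items()}

def state_change(pre, post):
    """Post-task minus pre-task mean-level change on each factor."""
    pre_s, post_s = subscale_scores(pre), subscale_scores(post)
    return {factor: post_s[factor] - pre_s[factor] for factor in FACTOR_ITEMS}
```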

