Humans perseverate on punishment avoidance goals in multigoal reinforcement learning

2021
Author(s):
Paul B. Sharp
Evan Russek
Quentin JM Huys
Raymond J Dolan
Eran Eldar

Managing multiple goals is essential to adaptation, yet we are only beginning to understand the computations by which we navigate the resource demands this entails. Here, we sought to elucidate how humans balance reward-seeking and punishment-avoidance goals, and to relate variation in this balance to anxiety. To do so, we developed a novel multigoal pursuit task that includes trial-specific instructed goals to either pursue reward (without risk of punishment) or avoid punishment (without the opportunity for reward). We constructed a computational model of multigoal pursuit to quantify the degree to which participants could disengage from each goal when instructed to pursue the other, as well as devote fewer model-based resources to goals that were less abundant. In general, participants (n = 192) were less flexible in avoiding punishment than in pursuing reward. Thus, when instructed to pursue reward, participants often persisted in avoiding features that had previously been associated with punishment, even though at decision time these features were unambiguously benign. In a similar vein, participants showed no significant downregulation of avoidance when punishment avoidance goals were less abundant in the task. Importantly, individuals with chronic worry had particular difficulty disengaging from punishment avoidance under an instructed reward-seeking goal. Taken together, the findings demonstrate that people avoid punishment less flexibly than they pursue reward, a difference that is more pronounced in individuals with chronic worry.
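A minimal sketch of the kind of goal-conditioned learner such a task could be modeled with (illustrative only, not the authors' model): separate reward and punishment values are learned per feature, and goal-specific weights capture how completely the currently irrelevant values are disengaged from. The class name and the parameters eta_reward and eta_punish are hypothetical stand-ins for the paper's flexibility measures.

```python
# Illustrative sketch, not the published model: a learner that tracks separate
# reward and punishment values per feature and weights them by the instructed goal.
import numpy as np

class MultiGoalLearner:
    def __init__(self, n_features, alpha=0.1, eta_reward=1.0, eta_punish=1.0):
        self.Q_reward = np.zeros(n_features)   # learned reward value per feature
        self.Q_punish = np.zeros(n_features)   # learned punishment value per feature
        self.alpha = alpha                     # learning rate
        self.eta_reward = eta_reward           # weight on reward values under a punishment goal
        self.eta_punish = eta_punish           # weight on punishment values under a reward goal

    def value(self, features, goal):
        """Net value of an option given the instructed goal ('reward' or 'punish')."""
        r = self.Q_reward[features].sum()
        p = self.Q_punish[features].sum()
        if goal == "reward":
            # A fully flexible agent would set eta_punish = 0 and ignore punishment history here.
            return r - self.eta_punish * p
        else:
            return -p + self.eta_reward * r

    def update(self, features, reward, punishment):
        """Delta-rule update of both value systems after feedback."""
        self.Q_reward[features] += self.alpha * (reward - self.Q_reward[features])
        self.Q_punish[features] += self.alpha * (punishment - self.Q_punish[features])
```

Under this toy parameterization, perseveration on punishment avoidance corresponds to eta_punish staying near 1 even on instructed reward trials.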

2017
Author(s):
Jiaming Cao
Pulkit Grover

Using a systematic computational and modeling framework, we provide a novel Spatio-Temporal Interference-based stiMULation focUsing Strategy (STIMULUS) for high spatial precision noninvasive neurostimulation deep inside the brain. To do so, we first replicate the results of the recently proposed temporal interference (TI) stimulation (which had only been tested in vivo) in a computational model that combines Hodgkin-Huxley neuron dynamics with a model of current dispersion in the head. Using this computational model, we obtain a nontrivial extension of the 2-electrode-pair TI proposed originally to multielectrode TI (> 2 electrode pairs) that yields significantly higher spatial precision. To further improve precision, we develop STIMULUS techniques for generating spatial interference patterns in conjunction with temporal interference, and demonstrate strict and significant improvements over multielectrode TI. Finally, we utilize the adaptivity inherent in STIMULUS to create multisite neurostimulation patterns that can be dynamically steered over time.
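For intuition about the temporal-interference principle that STIMULUS builds on, the sketch below (an illustration under stated assumptions, not the authors' code) sums two carriers at slightly different kilohertz frequencies and recovers the low-frequency beat envelope that neurons at the interference focus can follow; the frequencies and amplitudes are arbitrary.

```python
# Illustrative sketch of temporal interference: two carriers at 2.00 kHz and
# 2.01 kHz sum to a waveform whose envelope oscillates at the 10 Hz difference
# frequency, which is the component neurons at the focus can follow.
import numpy as np

f1, f2 = 2000.0, 2010.0          # carrier frequencies (Hz); values are illustrative
A1, A2 = 1.0, 1.0                # carrier amplitudes at one spatial location
t = np.arange(0.0, 0.5, 1e-5)    # 0.5 s sampled at 10 microsecond resolution

signal = A1 * np.sin(2 * np.pi * f1 * t) + A2 * np.sin(2 * np.pi * f2 * t)

# Beat envelope of two equal-amplitude carriers: peaks where the carriers align,
# vanishes where they cancel, repeating at |f1 - f2| = 10 Hz.
envelope = np.abs(2 * np.minimum(A1, A2) * np.cos(np.pi * (f1 - f2) * t))

print("envelope modulation frequency (Hz):", abs(f1 - f2))
```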


2016
Author(s):
Evan M. Russek
Ida Momennejad
Matthew M. Botvinick
Samuel J. Gershman
Nathaniel D. Daw

Humans and animals are capable of evaluating actions by considering their long-run future rewards through a process described using model-based reinforcement learning (RL) algorithms. The mechanisms by which neural circuits perform the computations prescribed by model-based RL remain largely unknown; however, multiple lines of evidence suggest that neural circuits supporting model-based behavior are structurally homologous to and overlapping with those thought to carry out model-free temporal difference (TD) learning. Here, we lay out a family of approaches by which model-based computation may be built upon a core of TD learning. The foundation of this framework is the successor representation, a predictive state representation that, when combined with TD learning of value predictions, can produce a subset of the behaviors associated with model-based learning, while requiring less decision-time computation than dynamic programming. Using simulations, we delineate the precise behavioral capabilities enabled by evaluating actions using this approach, and compare them to those demonstrated by biological organisms. We then introduce two new algorithms that build upon the successor representation while progressively mitigating its limitations. Because this framework can account for the full range of observed putatively model-based behaviors while still utilizing a core TD framework, we suggest that it represents a neurally plausible family of mechanisms for model-based evaluation.

Author Summary: According to standard models, when confronted with a choice, animals and humans rely on two separate, distinct processes to come to a decision. One process deliberatively evaluates the consequences of each candidate action and is thought to underlie the ability to flexibly come up with novel plans. The other process gradually increases the propensity to perform behaviors that were previously successful and is thought to underlie automatically executed, habitual reflexes. Although computational principles and animal behavior support this dichotomy, at the neural level there is little evidence supporting a clean segregation. For instance, although dopamine (famously implicated in drug addiction and Parkinson's disease) currently has a well-defined role only in the automatic process, evidence suggests that it also plays a role in the deliberative process. In this work, we present a computational framework for resolving this mismatch. We show that the types of behaviors associated with either process could result from a common learning mechanism applied to different strategies for how populations of neurons could represent candidate actions. In addition to demonstrating that this account can produce the full range of flexible behavior observed in the empirical literature, we suggest experiments that could detect the various approaches within this framework.
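A minimal tabular sketch of the framework's core ingredient (an illustration, not the paper's implementation): a successor representation M is learned by TD, a reward vector w is learned by a delta rule, and state values are their product, so revaluation of rewards propagates to values without decision-time dynamic programming. The tabular states, learning rates, and fixed-policy assumption are all simplifications.

```python
# Sketch of successor-representation learning with a TD core, assuming tabular
# states and a fixed behavioral policy.
import numpy as np

def sr_td_update(M, w, s, s_next, r, alpha=0.1, gamma=0.95):
    """One TD update of the successor matrix M and reward weights w
    after observing transition s -> s_next with reward r."""
    onehot = np.zeros(M.shape[0])
    onehot[s] = 1.0
    # SR prediction error: expected discounted future occupancy of every state
    M[s] += alpha * (onehot + gamma * M[s_next] - M[s])
    # Reward weights learned by a simple delta rule
    w[s] += alpha * (r - w[s])
    return M, w

def state_values(M, w):
    """State values are the successor representation weighted by per-state rewards."""
    return M @ w

# Usage: n = 10 states; M = np.eye(n); w = np.zeros(n); then call sr_td_update
# on each observed transition and read off values with state_values(M, w).
```

Because the reward vector w enters only at the final product, changing a state's reward immediately shifts the values of all states that predict it, which is the class of revaluation behavior the abstract attributes to this approach.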


2018
Author(s):
Stefano Palminteri
Laura Fontanesi
Maël Lebreton

When humans and animals learn by trial and error to select the most advantageous action, the progressive increase in action selection accuracy due to learning is typically accompanied by a decrease in the time needed to execute this action. Both choice and response time (RT) data can thus provide information about decision and learning processes. However, traditional reinforcement learning (RL) models focus exclusively on the increase in choice accuracy and ignore RTs. Consequently, they neither decompose the interactions between choices and RTs, nor investigate how these interactions are influenced by contextual factors. Yet, at least in the field of perceptual decision-making, such interactions have proven important for dissociating the underlying processes. Here, we analyzed such interactions in behavioral data from four experiments, which feature manipulations of two factors: outcome valence (gains vs. losses) and feedback information (partial vs. complete feedback). A Bayesian meta-analysis revealed that these contextual factors differently affect RTs and accuracy. To disentangle the processes underlying the observed behavioral patterns, we jointly fitted choices and RTs across all experiments with a single, Bayesian, hierarchical diffusion decision model (DDM). In punishment-avoidance contexts, compared to reward-seeking contexts, participants consistently slowed down without any loss of accuracy. The DDM explained these effects by shifts in the non-decision time and threshold parameters. The reduced motor facilitation may represent the basis of Pavlovian-to-instrumental transfer biases, while the increased cautiousness might be induced by the expectation of losses and be consistent with the loss attention framework.
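As an illustration of how a DDM can attribute the observed slowing, the sketch below simulates a basic diffusion process and compares two parameterizations: a higher threshold and a longer non-decision time lengthen RTs without reducing accuracy. The parameter values are arbitrary, and this is not the authors' hierarchical Bayesian fit.

```python
# Illustrative diffusion decision model simulation (not the fitted model):
# evidence accumulates with noise until it hits +threshold (correct) or
# -threshold (error); non-decision time is added to every RT.
import numpy as np

def simulate_ddm(drift, threshold, ndt, n_trials=2000, dt=0.001, noise=1.0, seed=0):
    rng = np.random.default_rng(seed)
    rts, correct = [], []
    for _ in range(n_trials):
        x, t = 0.0, 0.0
        while abs(x) < threshold:
            x += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
            t += dt
        rts.append(t + ndt)            # add non-decision time (encoding + motor)
        correct.append(x > 0)          # upper boundary coded as the correct response
    return np.mean(rts), np.mean(correct)

# Reward-seeking-like vs punishment-avoidance-like parameterizations (illustrative values)
print("reward-like    (mean RT, accuracy):", simulate_ddm(drift=1.0, threshold=1.0, ndt=0.30))
print("avoidance-like (mean RT, accuracy):", simulate_ddm(drift=1.0, threshold=1.2, ndt=0.38))
```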


2019
Author(s):
Leor M Hackel
Jeffrey Jordan Berg
Björn Lindström
David Amodio

Do habits play a role in our social impressions? To investigate the contribution of habits to the formation of social attitudes, we examined the roles of model-free and model-based reinforcement learning in social interactions—computations linked in past work to habit and planning, respectively. Participants in this study learned about novel individuals in a sequential reinforcement learning paradigm, choosing financial advisors who led them to high- or low-paying stocks. Results indicated that participants relied on both model-based and model-free learning, such that each independently predicted choice during the learning task and self-reported liking in a post-task assessment. Specifically, participants liked advisors who could provide large future rewards as well as advisors who had provided them with large rewards in the past. Moreover, participants varied in their use of model-based and model-free learning strategies, and this individual difference influenced the way in which learning related to self-reported attitudes: among participants who relied more on model-free learning, model-free social learning related more to post-task attitudes. We discuss implications for attitudes, trait impressions, and social behavior, as well as the role of habits in a memory systems model of social cognition.
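A compact sketch of the hybrid valuation such analyses typically assume (names and structure are illustrative, not the authors' exact model): model-based values are computed prospectively from a task model, model-free values are cached from past outcomes, and a weighting parameter w mixes them; w is the kind of individual-difference measure the study relates to attitudes.

```python
# Illustrative hybrid model-free / model-based valuation with softmax choice.
import numpy as np

def hybrid_values(q_mf, transition_probs, q_stage2, w):
    """
    q_mf:             cached model-free values of first-stage choices (advisors), shape (n_choices,)
    transition_probs: P(second-stage state | choice), shape (n_choices, n_states)
    q_stage2:         learned values of second-stage states (stocks), shape (n_states,)
    w:                weight on model-based values; 0 = purely habit-like, 1 = purely planned
    """
    q_mb = transition_probs @ q_stage2          # model-based: prospective value via the task model
    return w * q_mb + (1.0 - w) * q_mf          # mixture that predicts both choice and liking

def softmax_choice(values, beta=3.0, rng=None):
    """Sample a choice with inverse temperature beta."""
    rng = rng or np.random.default_rng()
    p = np.exp(beta * values - np.max(beta * values))
    p /= p.sum()
    return rng.choice(len(values), p=p)
```

In this toy scheme, a participant who relies more on the cached q_mf term (low w) would tend to like advisors who delivered large rewards in the past, mirroring the model-free contribution to attitudes described above.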


2018
Vol 40
pp. 23-32
Author(s):
Vedrana Baličević
Hrvoje Kalinić
Sven Lončarić
Maja Čikeš
Bart Bijnens

Author(s):
Zhenhuan Rao
Yuechen Wu
Zifei Yang
Wei Zhang
Shijian Lu
...

2020
Vol 68 (8)
pp. 612-624
Author(s):
Max Pritzkoleit
Robert Heedt
Carsten Knoll
Klaus Röbenack

In this contribution, we use artificial neural networks (ANNs) to approximate the dynamics of nonlinear (mechanical) systems. These iteratively approximated neural system models are used in an offline trajectory planning procedure to determine an optimal feedback controller, which is then applied to the real system. This model-based reinforcement learning (RL) approach is first evaluated in simulation on the swing-up of a single pendulum on a cart and shows a significant improvement in data efficiency compared to model-free RL approaches. We further present experimental results from a test rig, where the proposed algorithm is able, within a few trials, to approximate a feedback controller that is sufficiently close to optimal for the system.
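A rough Python sketch of the model-based RL loop the abstract describes, under loud assumptions: a small neural network is fit to observed transitions and then used for planning. Planning here is done by random shooting rather than the offline trajectory optimization and feedback design used in the paper, and the network size, environment interface, and cost function are placeholders.

```python
# Illustrative model-based RL ingredients: a learned dynamics network plus a
# random-shooting planner. Not the authors' algorithm or toolchain.
import numpy as np

class DynamicsNet:
    """One-hidden-layer network predicting the next state from (state, action)."""
    def __init__(self, state_dim, action_dim, hidden=64, lr=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        d_in = state_dim + action_dim
        self.W1 = rng.normal(0, 0.1, (d_in, hidden)); self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0, 0.1, (hidden, state_dim)); self.b2 = np.zeros(state_dim)
        self.lr = lr

    def forward(self, x):
        self.h = np.tanh(x @ self.W1 + self.b1)
        return self.h @ self.W2 + self.b2

    def train_step(self, x, y):
        """One gradient step on squared prediction error; x: (batch, d_in), y: (batch, state_dim)."""
        err = self.forward(x) - y
        dW2 = self.h.T @ err / len(x); db2 = err.mean(0)
        dh = err @ self.W2.T * (1 - self.h ** 2)
        dW1 = x.T @ dh / len(x); db1 = dh.mean(0)
        self.W1 -= self.lr * dW1; self.b1 -= self.lr * db1
        self.W2 -= self.lr * dW2; self.b2 -= self.lr * db2
        return float((err ** 2).mean())

def plan_random_shooting(model, state, cost_fn, horizon=20, n_candidates=500, rng=None):
    """Return the first action of the lowest-cost random action sequence under the learned model."""
    rng = rng or np.random.default_rng()
    actions = rng.uniform(-1, 1, (n_candidates, horizon, 1))   # assumes a 1-D, bounded action
    costs = np.zeros(n_candidates)
    for i in range(n_candidates):
        s = state.copy()
        for a in actions[i]:
            s = model.forward(np.concatenate([s, a])[None])[0]  # roll the model forward
            costs[i] += cost_fn(s, a)                           # user-supplied stage cost
    return actions[np.argmin(costs), 0]

# Usage idea (cart-pole-like system): model = DynamicsNet(state_dim=4, action_dim=1);
# alternate between fitting the model on logged transitions and planning with it.
```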

