Planning Complexity Registers as a Cost in Metacontrol

2018 ◽  
Vol 30 (10) ◽  
pp. 1391-1404 ◽  
Author(s):  
Wouter Kool ◽  
Samuel J. Gershman ◽  
Fiery A. Cushman

Decision-making algorithms face a basic tradeoff between accuracy and effort (i.e., computational demands). It is widely agreed that humans can choose between multiple decision-making processes that embody different solutions to this tradeoff: Some are computationally cheap but inaccurate, whereas others are computationally expensive but accurate. Recent progress in understanding this tradeoff has been catalyzed by formalizing it in terms of model-free (i.e., habitual) versus model-based (i.e., planning) approaches to reinforcement learning. Intuitively, if two tasks offer the same rewards for accuracy but one of them is much more demanding, we might expect people to rely on habit more in the difficult task: Devoting significant computation to achieve slight marginal accuracy gains would not be “worth it.” We test and verify this prediction in a sequential reinforcement learning task. Because our paradigm is amenable to formal analysis, it contributes to the development of a computational model of how people balance the costs and benefits of different decision-making processes in a task-specific manner; in other words, how we decide when hard thinking is worth it.
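The cost-benefit arbitration described here can be sketched schematically: pick the controller whose expected reward, net of its computational cost, is higher. The function name and the treatment of model-free control as computationally free are illustrative assumptions, not the paper's model:

```python
def select_controller(expected_reward_mb, expected_reward_mf, planning_cost):
    """Choose between decision systems by expected reward net of effort.
    Model-free (habitual) control is treated as computationally free here."""
    net_mb = expected_reward_mb - planning_cost  # planning pays a cost
    net_mf = expected_reward_mf
    return "model-based" if net_mb > net_mf else "model-free"
```

When the marginal accuracy gain from planning is small relative to its cost, the habit wins, which is the qualitative prediction the study tests.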

Author(s):  
Thomas Boraud

This chapter assesses alternative approaches to reinforcement learning developed in machine learning. The initial goal of this branch of artificial intelligence, which appeared in the middle of the twentieth century, was to develop and implement algorithms that allow a machine to learn. Originally, these machines were computers or more-or-less autonomous robotic automata. As artificial intelligence has developed and cross-fertilized with neuroscience, it has begun to be used to model the learning and decision-making processes of biological agents, broadening the meaning of the word ‘machine’. Theoreticians of this discipline define several categories of learning, but this chapter deals only with those related to reinforcement learning. To understand how these algorithms work, it is first necessary to explain the Markov chain and the Markov decision process. The chapter then goes on to examine model-free reinforcement learning algorithms, the actor-critic model, and finally model-based reinforcement learning algorithms.
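A model-based planner, in contrast to the model-free learners contrasted with it here, exploits an explicit model of the Markov decision process. A minimal value-iteration sketch, assuming a deterministic toy MDP (the transition table `P` and reward table `R` are illustrative):

```python
def value_iteration(P, R, gamma=0.9, tol=1e-6):
    """Compute optimal state values for a deterministic MDP.
    P[s][a] gives the successor state, R[s][a] the immediate reward."""
    V = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in range(len(P)):
            # Bellman backup: best action under the current value estimate
            v = max(R[s][a] + gamma * V[P[s][a]] for a in range(len(P[s])))
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V

# Single state with a self-loop; action 1 pays 1 per step, action 0 pays 0.
# The optimal value is 1 / (1 - gamma) = 10.
values = value_iteration(P=[[0, 0]], R=[[0.0, 1.0]])
```

Because the planner consults the model `P` directly, it needs no trial-and-error experience; this is the defining contrast with the model-free algorithms the chapter covers.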


Author(s):  
Andreas Heinz

While dopaminergic neurotransmission has largely been implicated in reinforcement learning and model-based versus model-free decision making, serotonergic neurotransmission has been implicated in encoding aversive outcomes. Accordingly, serotonin dysfunction has been observed in disorders characterized by negative affect including depression, anxiety and addiction. Serotonin dysfunction in these mental disorders is described and its association with negative affect is discussed.


2019 ◽  
Author(s):  
Leor M Hackel ◽  
Jeffrey Jordan Berg ◽  
Björn Lindström ◽  
David Amodio

Do habits play a role in our social impressions? To investigate the contribution of habits to the formation of social attitudes, we examined the roles of model-free and model-based reinforcement learning in social interactions—computations linked in past work to habit and planning, respectively. Participants in this study learned about novel individuals in a sequential reinforcement learning paradigm, choosing financial advisors who led them to high- or low-paying stocks. Results indicated that participants relied on both model-based and model-free learning, such that each independently predicted choice during the learning task and self-reported liking in a post-task assessment. Specifically, participants liked advisors who could provide large future rewards as well as advisors who had provided them with large rewards in the past. Moreover, participants varied in their use of model-based and model-free learning strategies, and this individual difference influenced the way in which learning related to self-reported attitudes: among participants who relied more on model-free learning, model-free social learning related more to post-task attitudes. We discuss implications for attitudes, trait impressions, and social behavior, as well as the role of habits in a memory systems model of social cognition.
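The individual differences in reliance on the two systems described here are conventionally captured by a weighted mixture of model-based and model-free action values; the sketch below is a generic version of such a mixture, not the authors' fitted model:

```python
def hybrid_value(q_mb, q_mf, w):
    """Weighted mixture of model-based (q_mb) and model-free (q_mf) action
    values; w in [0, 1] indexes an individual's reliance on planning."""
    return w * q_mb + (1.0 - w) * q_mf
```

A participant with low `w` weights cached past rewards heavily, so model-free learning dominates both choice and, per the findings above, post-task attitudes.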


2020 ◽  
Vol 10 (8) ◽  
pp. 508
Author(s):  
Hiroyoshi Ogishima ◽  
Shunta Maeda ◽  
Yuki Tanaka ◽  
Hironori Shimada

Background: In this study, we examined the relationships between reward-based decision-making, in terms of learning rate, memory rate, and exploration rate, and depression-related subjective emotional experience, in terms of interoception and feelings, to understand how reward-based decision-making is impaired in depression. Methods: In all, 52 university students were randomly assigned to an experimental group and a control group. To manipulate interoception, participants in the experimental group were instructed to tune their internal somatic sense to the skin-conductance-response waveform presented on a display; participants in the control group were only instructed to stay relaxed. Before and after the manipulation, participants completed a probabilistic reversal-learning task, which assessed reward-based decision-making via reinforcement learning modeling. Participants also completed a probe-detection task, a heartbeat-detection task, and self-rated scales. Results: The experimental manipulation of interoception was not successful. At baseline, reinforcement learning modeling indicated a marginally significant correlation between the exploration rate and depressive symptoms. However, the exploration rate was significantly associated with lower interoceptive attention and stronger depressive feelings. Conclusions: The findings suggest that situational characteristics may be closely involved in reward exploration and highlight the clinically meaningful possibility that interventions targeting affective processes may impact reward-based decision-making in people with depression.
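The parameters named here (learning rate, memory rate, exploration) map onto a standard delta-rule model with softmax choice. The sketch below is a generic version of such a model, not the authors' exact specification; parameter values are illustrative:

```python
import math

def softmax_prob(values, beta):
    """Probability of choosing option 0. The inverse temperature beta
    controls exploration: lower beta means more random choices."""
    exps = [math.exp(beta * v) for v in values]
    return exps[0] / sum(exps)

def update_values(values, choice, reward, alpha, decay):
    """Delta-rule update with learning rate alpha for the chosen option;
    the unchosen option's value decays by the memory-rate parameter."""
    new = list(values)
    new[choice] += alpha * (reward - new[choice])  # prediction-error update
    new[1 - choice] *= (1.0 - decay)               # forgetting of unchosen value
    return new
```

Fitting `alpha`, `beta`, and `decay` to trial-by-trial choices in a reversal-learning task is how per-participant estimates like those reported above are typically obtained.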


2018 ◽  
Author(s):  
Nura Sidarus ◽  
Stefano Palminteri ◽  
Valérian Chambon

Value-based decision-making involves trading off the cost associated with an action against its expected reward. Research has shown that both physical and mental effort constitute such subjective costs, biasing choices away from effortful actions, and discounting the value of obtained rewards. Facing conflicts between competing action alternatives is considered aversive, as recruiting cognitive control to overcome conflict is effortful. Yet, it remains unclear whether conflict is also perceived as a cost in value-based decisions. The present study investigated this question by embedding irrelevant distractors (flanker arrows) within a reversal-learning task, with intermixed free and instructed trials. Results showed that participants learned to adapt their choices to maximize rewards, but were nevertheless biased to follow the suggestions of irrelevant distractors. Thus, the perceived cost of being in conflict with an external suggestion could sometimes trump internal value representations. By adapting computational models of reinforcement learning, we assessed the influence of conflict at both the decision and learning stages. Modelling the decision showed that conflict was avoided when evidence for either action alternative was weak, demonstrating that the cost of conflict was traded off against expected rewards. During the learning phase, we found that learning rates were reduced in instructed, relative to free, choices. Learning rates were further reduced by conflict between an instruction and subjective action values, whereas learning was not robustly influenced by conflict between one’s actions and external distractors. Our results show that the subjective cost of conflict factors into value-based decision-making, and highlight that different types of conflict may have different effects on learning about action outcomes.
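One way to express the decision-stage distractor effect described above is to add a congruence term to a softmax over learned action values. The coding of `distractor` and the parameter names below are assumptions for illustration, not the authors' fitted model:

```python
import math

def p_choose_a(q_a, q_b, distractor, beta=3.0, kappa=0.5):
    """Probability of choosing action A over B. `distractor` is +1 if the
    irrelevant flanker points to A, -1 if it points to B; kappa scales
    its pull on the choice."""
    logit = beta * (q_a - q_b) + kappa * distractor
    return 1.0 / (1.0 + math.exp(-logit))
```

When evidence is weak (`q_a` close to `q_b`), the distractor term dominates the logit, reproducing the finding that conflict is avoided precisely when value-based evidence for either alternative is weak.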


Author(s):  
Todd M. Gureckis ◽  
Bradley C. Love

Reinforcement learning (RL) refers to the scientific study of how animals and machines adapt their behavior in order to maximize reward. The history of RL research can be traced to early work in psychology on instrumental learning behavior. However, the modern field of RL is a highly interdisciplinary area that lies at the intersection of ideas in computer science, machine learning, psychology, and neuroscience. This chapter summarizes the key mathematical ideas underlying the field, including the exploration/exploitation dilemma, temporal-difference (TD) learning, Q-learning, and model-based versus model-free learning. In addition, a broad survey of open questions in psychology and neuroscience is provided.
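The TD and Q-learning ideas summarized here can be shown in a tabular sketch on a toy five-state chain (the environment and hyperparameters are illustrative assumptions):

```python
import random

def step(state, action):
    """Deterministic 5-state chain: action 1 moves right, action 0 moves left.
    Reaching state 4 yields reward 1 and ends the episode."""
    next_state = min(state + 1, 4) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == 4 else 0.0
    return next_state, reward, next_state == 4

def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(5)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Exploration/exploitation: random action with prob. epsilon
            a = rng.randrange(2) if rng.random() < epsilon else max((0, 1), key=lambda x: q[s][x])
            s2, r, done = step(s, a)
            # TD update: move Q(s,a) toward the bootstrapped target
            target = r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
    return q
```

This is model-free in the chapter's sense: the agent never consults `step`'s transition rule directly, only samples from it; a model-based planner would instead back values up through an explicit transition model.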


2021 ◽  
Author(s):  
Monja P. Neuser ◽  
Franziska Kräutlein ◽  
Anne Kühnel ◽  
Vanessa Teckentrup ◽  
Jennifer Svaldi ◽  
...  

Reinforcement learning is a core facet of motivation, and alterations in it have been associated with various mental disorders. To build better models of individual learning, repeated measurement of value-based decision-making is crucial. However, the focus on lab-based assessment of reward learning has limited the number of measurements, and the test-retest reliability of many decision-related parameters is therefore unknown. Here, we developed Influenca, an open-source, cross-platform application that provides a novel reward learning task complemented by ecological momentary assessment (EMA) for repeated measurement over weeks. In this task, players have to identify the most effective medication by selecting the best option after integrating the offered points with changing reward probabilities (which follow random Gaussian walks). Participants can complete up to 31 levels with 150 trials each. To encourage replay on their preferred device, in-game screens provide feedback on progress. Using an initial validation sample of 127 players (2904 runs), we found that reinforcement learning parameters such as the learning rate and reward sensitivity show low to medium intra-class correlations (ICC: 0.22-0.52), indicating substantial within- and between-subject variance. Notably, state items showed ICCs comparable to those of the reinforcement learning parameters. To conclude, our innovative and openly customizable app framework provides a gamified task that optimizes repeated assessment of reward learning to better quantify intra- and inter-individual differences in value-based decision-making over time.
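The drifting reward probabilities described here ("random Gaussian walks") can be generated as below; the drift standard deviation and the bounds are illustrative assumptions, not the app's actual parameters:

```python
import random

def gaussian_walk_probs(n_trials=150, start=0.5, sd=0.05, lo=0.1, hi=0.9, seed=1):
    """Reward probability drifts by Gaussian noise each trial and is
    clamped into [lo, hi] so the option never becomes fully (un)rewarding."""
    rng = random.Random(seed)
    p, probs = start, []
    for _ in range(n_trials):
        p += rng.gauss(0.0, sd)      # Gaussian drift
        p = min(max(p, lo), hi)      # keep probability inside the bounds
        probs.append(p)
    return probs
```

Because the walk is non-stationary, a learner must keep updating its value estimates across all 150 trials, which is what makes per-run estimates of learning rate and reward sensitivity identifiable.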


2018 ◽  
Author(s):  
Xiaoxue Gao ◽  
Hongbo Yu ◽  
Ignacio Saez ◽  
Philip R. Blue ◽  
Lusha Zhu ◽  
...  

Humans are capable of integrating social contextual information into decision-making processes to adjust their attitudes towards inequity. This context-dependency emerges both when the individual is better off (i.e., advantageous inequity) and worse off (i.e., disadvantageous inequity) than others. It is not clear, however, whether the context-dependent processing of advantageous and disadvantageous inequity relies on dissociable or shared neural mechanisms. Here, by combining an interpersonal interactive game that gave rise to interpersonal guilt with different versions of the dictator game that enabled us to characterize individual weights on aversion to advantageous and disadvantageous inequity, we investigated the neural mechanisms underlying the two forms of inequity aversion in the interpersonal guilt context. In each round, participants played a dot-estimation task with an anonymous co-player. The co-player received pain stimulation with 50% probability whenever either player responded incorrectly. At the end of each round, the participant completed a dictator game, which determined the payoffs for him/herself and the co-player. Both computational model-based and model-free analyses demonstrated that when participants had inflicted pain upon co-players (i.e., the guilt context), they cared more about advantageous inequity and became less sensitive to disadvantageous inequity, compared with other social contexts. The contextual effects on the two forms of inequity aversion were uncorrelated with each other at the behavioral level. Neuroimaging results revealed that the context-dependent representation of inequity aversion exhibited a spatial gradient in activity within the insula, with anterior parts predominantly involved in aversion to advantageous inequity and posterior parts predominantly involved in aversion to disadvantageous inequity. The dissociable mechanisms underlying the two forms of inequity aversion are further supported by the involvement of the right dorsolateral prefrontal cortex and dorsomedial prefrontal cortex in advantageous inequity processing, and of the right amygdala and dorsal anterior cingulate cortex in disadvantageous inequity processing. These results extend our understanding of decision-making involving inequity and of the social functions of inequity aversion.
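Individual weights on advantageous and disadvantageous inequity of the kind characterized by these dictator games are commonly modeled with a Fehr-Schmidt-style utility; a minimal sketch, with illustrative parameter values (the specific weights are not from this study):

```python
def inequity_utility(own, other, alpha=0.8, beta=0.3):
    """Utility = own payoff minus weighted inequity terms: alpha penalizes
    disadvantageous inequity (other ahead), beta advantageous (self ahead)."""
    disadvantageous = max(other - own, 0.0)
    advantageous = max(own - other, 0.0)
    return own - alpha * disadvantageous - beta * advantageous
```

A context effect such as guilt can then be expressed as a shift in `alpha` and `beta`, which is the sense in which participants "cared more" about advantageous inequity after inflicting pain.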

