Multiplexed action-outcome representation by striatal striosome-matrix compartments detected with a novel cost-benefit foraging task

2021 ◽  
Author(s):  
Bernard Bloem ◽  
Rafiq Huda ◽  
Ken-ichi Amemori ◽  
Alexander Abate ◽  
Gaya Krishna ◽  
...  

ABSTRACT Learning about positive and negative outcomes of actions is crucial for survival and underpinned by conserved circuits including the striatum. How associations between actions and outcomes are formed is not fully understood, particularly when the outcomes have mixed positive and negative features. We developed a novel foraging (‘bandit’) task requiring mice to maximize rewards while minimizing punishments. By 2-photon Ca++ imaging, we monitored activity of 5831 identified anterodorsal striatal striosomal and matrix neurons. Surprisingly, we found that action-outcome associations for reward and punishment were combinatorially encoded rather than being integrated as overall outcome value. Single neurons could, for one action, encode outcomes of opposing valence. Striosome compartments consistently exhibited stronger representations of reinforcement outcomes than matrix, especially for high reward or punishment prediction errors. These findings demonstrate a remarkable multiplexing of action-outcome contingencies by single identified striatal neurons and suggest that striosomal neurons are differentially important in action-outcome learning.

eLife ◽  
2016 ◽  
Vol 5 ◽  
Author(s):  
Hideyuki Matsumoto ◽  
Ju Tian ◽  
Naoshige Uchida ◽  
Mitsuko Watabe-Uchida

Dopamine is thought to regulate learning from appetitive and aversive events. Here we examined how optogenetically-identified dopamine neurons in the lateral ventral tegmental area of mice respond to aversive events in different conditions. In low reward contexts, most dopamine neurons were exclusively inhibited by aversive events, and expectation reduced dopamine neurons’ responses to reward and punishment. When a single odor predicted both reward and punishment, dopamine neurons’ responses to that odor reflected the integrated value of both outcomes. Thus, in low reward contexts, dopamine neurons signal value prediction errors (VPEs) integrating information about both reward and aversion in a common currency. In contrast, in high reward contexts, dopamine neurons acquired a short-latency excitation to aversive events that masked their VPE signaling. Our results demonstrate the importance of considering the contexts to examine the representation in dopamine neurons and uncover different modes of dopamine signaling, each of which may be adaptive for different environments.


2021 ◽  
Author(s):  
Stella Voulgaropoulou ◽  
Fasya Fauzani ◽  
Janine Pfirrmann ◽  
Claudia Vingerhoets ◽  
Thérèse van Amelsvoort ◽  
...  

Abstract Stressful events trigger a complex physiological reaction – the fight-or-flight response – that can hamper flexible decision-making. Inspired by key neural and peripheral characteristics of the fight-or-flight response, here we ask whether acute stress changes how humans learn about costs and benefits. Participants were randomly exposed to an acute stress or no-stress control condition after which they completed a cost-benefit reinforcement learning task. Acute stress improved learning to maximize benefits (monetary rewards) relative to minimising energy expenditure (grip force). Using computational modelling, we demonstrate that costs and benefits can exert asymmetric effects on decisions when prediction errors that convey information about the reward value and cost of actions receive inappropriate importance; a process associated with distinct alterations in pupil size fluctuations. These results provide new insights into learning strategies under acute stress – which, depending on the context, may be maladaptive or beneficial – and candidate neuromodulatory mechanisms that could underlie such behaviour.


2022 ◽  
pp. 398-417
Author(s):  
Sean Fitzpatrick ◽  
Timothy Marsh

While gamification represents one of the largest technology trends of the last decade, only a limited selection of literature exists that explores the negative outcomes of contemporary gamified services, applications, and systems. This chapter explores the consequences of gamified systems and services, investigating contemporary implementations of gamification and acknowledging the ethical concerns raised by researchers towards contemporary gamified services. This chapter further explores these ethical concerns through a critical instance case study of China's Social Credit System and arrives at informed observations on the potential for gamified cycles of reward and punishment to encourage unethical activity within organisations as well as legitimise ideological objectives that violate fundamental human rights. Recommendations are then made for researchers to explore this potential further, while recognising how gamification may justify the authority and practices of organisations, particularly those engaged in unethical and dehumanising behaviour.


2019 ◽  
Vol 8 (1) ◽  
pp. 155-168
Author(s):  
Allison M. Stuppy-Sullivan ◽  
Joshua W. Buckholtz ◽  
Arielle Baskin-Sommers

Aberrant cost–benefit decision making is a key factor related to individual differences in the expression of substance use disorders (SUDs). Previous research highlights how delay-cost sensitivity affects variability in SUDs; however, other forms of cost–benefit decision making, such as effort-based choice, have received less attention. We administered the Effort Expenditure for Rewards Task (EEfRT) in an SUD-enriched community sample (N = 80). Individuals with more severe SUDs were less likely to use information about expected value when deciding between high-effort, high-reward and low-effort, low-reward options. Furthermore, individuals whose severity of use was primarily related to avoiding aversive affective states and individuals with heightened sensitivity to delay costs during intertemporal decision making were the least sensitive to expected value signals when making decisions to engage in effortful behavior. Together, these findings suggest that individuals with more severe SUDs have difficulty integrating multiple decision variables to guide behavior during effort-based decision making.


2019 ◽  
Vol 73 (2) ◽  
pp. 249-259 ◽  
Author(s):  
Yanlong Song ◽  
Siyuan Lu ◽  
Ann L Smiley-Oyen

Visuomotor adaptation involves multiple processes such as explicit learning, implicit learning from sensory prediction errors, and model-free mechanisms like use-dependent plasticity. Recent findings show that reward and punishment differently affect visuomotor adaptation. This study examined whether punishment and reward had distinct effects on explicit learning. When participants practised adapting to a large, abrupt visual rotation during reaching for a virtual visual target, visual feedback of the cursor was not provided. Only performance-based scalar reward or punishment feedback (money gained or lost) was used, thereby emphasising explicit processes during adaptation. The results revealed that punishment, compared with reward, induced faster adaptation and greater variability of reaching in the initial phase of adaptation. We interpret these findings as reflecting enhanced explicit learning, likely due to loss aversion.


2021 ◽  
pp. 316-337
Author(s):  
Denis Mareschal ◽  
Sam Blakeman

In this chapter we review the extent to which rapid one-shot learning or fast-mapping exists in human learning. We find that it exists in both children and adults, but that it is almost always accompanied by slow consolidated learning in which new knowledge is integrated with existing knowledge-bases. Rapid learning is also present in a broad range of non-human species, particularly in the context of high reward values. We argue that reward prediction errors guide the extent to which fast or slow learning dominates, and present a Complementary Learning Systems neural network model (CTDL) of cortical/hippocampal learning that uses reward prediction errors to adjudicate between learning in the two systems. Developing human-like artificial intelligence will require implementing multiple learning and inference systems governed by a flexible control system with an equal capacity to that of human control systems.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Maëlle C. M. Gueguen ◽  
Alizée Lopez-Persem ◽  
Pablo Billeke ◽  
Jean-Philippe Lachaux ◽  
Sylvain Rheims ◽  
...  

Abstract Whether maximizing rewards and minimizing punishments rely on distinct brain systems remains debated, given inconsistent results coming from human neuroimaging and animal electrophysiology studies. Bridging the gap across techniques, we recorded intracerebral activity from twenty participants while they performed an instrumental learning task. We found that both reward and punishment prediction errors (PE), estimated from computational modeling of choice behavior, correlate positively with broadband gamma activity (BGA) in several brain regions. In all cases, BGA scaled positively with the outcome (reward or punishment versus nothing) and negatively with the expectation (predictability of reward or punishment). However, reward PE were better signaled in some regions (such as the ventromedial prefrontal and lateral orbitofrontal cortex), and punishment PE in other regions (such as the anterior insula and dorsolateral prefrontal cortex). These regions might therefore belong to brain systems that differentially contribute to the repetition of rewarded choices and the avoidance of punished choices.
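The abstract above describes a signal that scales positively with the outcome and negatively with the expectation, which is the usual shape of a prediction error. A minimal sketch of that relationship follows, assuming a simple delta-rule formulation; the function names and the learning-rate value are hypothetical and are not taken from the study.

```python
def prediction_error(outcome: float, expectation: float) -> float:
    """Delta-rule prediction error: positive in outcome, negative in expectation."""
    return outcome - expectation


def update_expectation(expectation: float, outcome: float,
                       learning_rate: float = 0.5) -> float:
    """Move the expectation toward the outcome by a fraction of the error.

    The learning rate of 0.5 is an illustrative assumption only.
    """
    return expectation + learning_rate * prediction_error(outcome, expectation)


if __name__ == "__main__":
    expectation = 0.25  # hypothetical prior expectation of an outcome
    outcome = 1.0       # outcome delivered (1.0) versus nothing (0.0)
    print(prediction_error(outcome, expectation))
    print(update_expectation(expectation, outcome))
```

Under this sketch, a fully expected outcome yields an error of zero, while the same outcome arriving unexpectedly yields the largest error.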


Author(s):  
Christina E. Wierenga ◽  
Erin Reilly ◽  
Amanda Bischoff-Grethe ◽  
Walter H. Kaye ◽  
Gregory G. Brown

ABSTRACT Objectives: Anorexia nervosa (AN) is associated with altered sensitivity to reward and punishment. Few studies have investigated whether this results in aberrant learning. The ability to learn from rewarding and aversive experiences is essential for flexibly adapting to changing environments, yet individuals with AN tend to demonstrate cognitive inflexibility, difficulty set-shifting and altered decision-making. Deficient reinforcement learning may contribute to repeated engagement in maladaptive behavior. Methods: This study investigated learning in AN using a probabilistic associative learning task that separated learning of stimuli via reward from learning via punishment. Forty-two individuals with Diagnostic and Statistical Manual of Mental Disorders (DSM)-5 restricting-type AN were compared to 38 healthy controls (HCs). We applied computational models of reinforcement learning to assess group differences in learning, thought to be driven by violations in expectations, or prediction errors (PEs). Linear regression analyses examined whether learning parameters predicted BMI at discharge. Results: AN had lower learning rates than HC following both positive and negative PE (p < .02), and were less likely to exploit what they had learned. Negative PE on punishment trials predicted lower discharge BMI (p < .001), suggesting individuals with more negative expectancies about avoiding punishment had the poorest outcome. Conclusions: This is the first study to show lower rates of learning in AN following both positive and negative outcomes, with worse punishment learning predicting less weight gain. An inability to modify expectations about avoiding punishment might explain persistence of restricted eating despite negative consequences, and suggests that treatments that modify negative expectancy might be effective in reducing food avoidance in AN.
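To illustrate what separate learning rates after positive and negative prediction errors, and a tendency to exploit learned values, can mean in a reinforcement learning model, here is a minimal sketch. It assumes a generic delta-rule update with two learning rates and a two-option softmax choice rule; the abstract does not specify the model used, so every name and parameter value below is hypothetical.

```python
import math


def update_value(value: float, outcome: float,
                 alpha_pos: float, alpha_neg: float) -> float:
    """Delta-rule update with separate learning rates by prediction-error sign."""
    pe = outcome - value
    # Assumption: a zero error is handled by the positive-PE learning rate.
    alpha = alpha_pos if pe >= 0 else alpha_neg
    return value + alpha * pe


def choice_probability(value_a: float, value_b: float,
                       inverse_temperature: float) -> float:
    """Two-option softmax: probability of choosing option A.

    A larger inverse temperature means more exploitation of learned values.
    """
    return 1.0 / (1.0 + math.exp(-inverse_temperature * (value_a - value_b)))


if __name__ == "__main__":
    v = update_value(0.0, 1.0, alpha_pos=0.5, alpha_neg=0.25)  # positive PE
    v = update_value(v, 0.0, alpha_pos=0.5, alpha_neg=0.25)    # negative PE
    print(v)
    print(choice_probability(v, 0.0, inverse_temperature=3.0))
```

In a model of this kind, lower learning rates slow the change in `value` after each outcome, and a lower inverse temperature makes choices depend less on the learned difference between options.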


2020 ◽  
Author(s):  
Maëlle CM Gueguen ◽  
Pablo Billeke ◽  
Jean-Philippe Lachaux ◽  
Sylvain Rheims ◽  
Philippe Kahane ◽  
...  

Summary Whether maximizing rewards and minimizing punishments rely on distinct brain systems remains debated, given inconsistent results coming from human neuroimaging and animal electrophysiology studies. Bridging the gap across species and techniques, we recorded intracerebral activity from twenty patients with epilepsy while they performed an instrumental learning task. We found that both reward and punishment prediction errors (PE), estimated from computational modeling of choice behavior, correlated positively with broadband gamma activity (BGA) in several brain regions. In all cases, BGA increased with both outcome (reward or punishment versus nothing) and surprise (how unexpected the outcome is). However, some regions (such as the ventromedial prefrontal and lateral orbitofrontal cortex) were more sensitive to reward PE, whereas others (such as the anterior insula and dorsolateral prefrontal cortex) were more sensitive to punishment PE. Thus, opponent systems in the human brain might mediate the repetition of rewarded choices and the avoidance of punished choices.


2017 ◽  
Vol 145 ◽  
pp. 135-142 ◽  
Author(s):  
Sara Karimi ◽  
Azam Mesdaghinia ◽  
Zahra Farzinpour ◽  
Gholamali Hamidi ◽  
Abbas Haghparast
