Parallel reward and punishment control in humans and robots: Safe reinforcement learning using the MaxPain algorithm

Author(s): Stefan Elfwing, Ben Seymour

2019
Author(s): Jennifer R Sadler, Grace Elisabeth Shearrer, Nichollette Acosta, Kyle Stanley Burger

BACKGROUND: Dietary restraint represents an individual's intent to limit their food intake and has been associated with impaired passive food reinforcement learning. However, the impact of dietary restraint on active, response-dependent learning is poorly understood. In this study, we tested the relationship between dietary restraint and food reinforcement learning using an active, instrumental conditioning task. METHODS: A sample of ninety adults completed a response-dependent instrumental conditioning task with reward and punishment using sweet and bitter tastes. Brain response was measured with functional MRI during the task. Participants also completed anthropometric measures, reward- and motivation-related questionnaires, and a working memory task. Dietary restraint was assessed with the Dutch Restrained Eating Scale. RESULTS: Two groups were selected from the sample: high restraint (n=29, score >2.5) and low restraint (n=30, score <1.85). High restraint was associated with significantly higher BMI (p=0.003) and lower N-back accuracy (p=0.045). The high restraint group was also marginally better at the instrumental conditioning task (p=0.066, r=0.37). High restraint was also associated with significantly greater brain response in the intracalcarine cortex (MNI: 15, -69, 12; k=35, pFWE<0.05) to bitter taste compared with neutral taste. CONCLUSIONS: High restraint was associated with improved performance on an instrumental task testing how individuals learn from reward and punishment. This may be mediated by greater brain response in the primary visual cortex, which has been associated with mental representation. Results suggest that dietary restraint does not impair response-dependent reinforcement learning.


2018, Vol 83 (9), pp. S157
Author(s): Christina Wierenga, Amanda Bischoff-Grethe, Emily Romero, Danika Peterson, Tiffany Brown, ...

2019, Vol 7 (6), pp. 1372-1388
Author(s): Miranda L. Beltzer, Stephen Adams, Peter A. Beling, Bethany A. Teachman

Adaptive social behavior requires learning probabilities of social reward and punishment and updating these probabilities when they change. Given prior research on aberrant reinforcement learning in affective disorders, this study examines how social anxiety affects probabilistic social reinforcement learning and dynamic updating of learned probabilities in a volatile environment. Two hundred and twenty-two online participants completed questionnaires and a computerized ball-catching game with changing probabilities of reward and punishment. Dynamic learning rates were estimated to assess the relative importance ascribed to new information in response to volatility. Mixed-effects regression was used to analyze throw patterns as a function of social anxiety symptoms. Higher social anxiety predicted fewer throws to the previously punishing avatar and different learning rates after certain role changes, suggesting that social anxiety may be characterized by difficulty updating learned social probabilities. Socially anxious individuals may miss the chance to learn that a once-punishing situation no longer poses a threat.
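The abstract does not say how the dynamic learning rates were estimated. As a hedged illustration only, the sketch below uses a hybrid Rescorla-Wagner/Pearce-Hall rule, one common way to let a learning rate grow after surprising outcomes such as a role change. The class name `DynamicRateLearner`, the parameter `eta`, and the coding of outcomes as 1.0 (reward) and 0.0 (punishment) are assumptions, not details from the study.

```python
"""Hypothetical sketch: delta-rule learning with a Pearce-Hall style dynamic
learning rate. Not the model fitted in the study."""
from dataclasses import dataclass


@dataclass
class DynamicRateLearner:
    value: float = 0.5   # estimated probability that an avatar is rewarding
    alpha: float = 0.3   # current learning rate (rises after surprises)
    eta: float = 0.3     # how quickly alpha tracks recent surprise (hypothetical)

    def update(self, outcome: float) -> float:
        """outcome = 1.0 for a rewarding throw, 0.0 for a punishing one."""
        pe = outcome - self.value               # prediction error
        self.value += self.alpha * pe           # delta-rule value update
        # Pearce-Hall: learning rate moves toward the size of the latest surprise
        self.alpha = self.eta * abs(pe) + (1 - self.eta) * self.alpha
        return pe


if __name__ == "__main__":
    learner = DynamicRateLearner()
    for _ in range(20):
        learner.update(1.0)      # avatar reliably rewarding
    print(f"before reversal: value={learner.value:.2f}, alpha={learner.alpha:.3f}")
    learner.update(0.0)          # hypothetical role change: avatar now punishes
    print(f"after reversal:  value={learner.value:.2f}, alpha={learner.alpha:.3f}")
```

In this sketch the learning rate shrinks while outcomes are predictable and jumps after a reversal, so new information is weighted more heavily right after the environment changes.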


2018
Author(s): Mark K Ho, Fiery Andrews Cushman, Michael L. Littman, Joseph L. Austerweil

Carrots and sticks motivate behavior, and people can teach new behaviors to other organisms, such as children or non-human animals, by tapping into their reward learning mechanisms. But how people teach with reward and punishment depends on their expectations about the learner. We examine how people teach using reward and punishment by contrasting two hypotheses. The first is evaluative feedback as reinforcement, where rewards and punishments are used to shape learner behavior through reinforcement learning mechanisms. The second is evaluative feedback as communication, where rewards and punishments are used to signal target behavior to a learning agent reasoning about a teacher’s pedagogical goals. We present formalizations of learning from these two teaching strategies based on computational frameworks for reinforcement learning. Our analysis based on these models motivates a simple interactive teaching paradigm that distinguishes between the two teaching hypotheses. Across three sets of experiments, we find that people are strongly biased to use evaluative feedback communicatively rather than as reinforcement.
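To make the contrast between the two hypotheses concrete, here is a toy sketch, assuming a learner choosing among discrete actions and a teacher who gives feedback of +1 or -1. It is a simplification for illustration, not the authors' formalization; the function names, the learning rate `lr`, and the teacher error rate `eps` are hypothetical.

```python
"""Hypothetical sketch: two ways a learner might use a teacher's +1/-1 feedback.
A toy simplification, not the formal models from the study."""
import math
from typing import Dict


def reinforcement_update(q: Dict[str, float], action: str, feedback: float,
                         lr: float = 0.5) -> None:
    """Feedback as reinforcement: treat feedback as reward in a delta-rule
    update of the chosen action's value only."""
    q[action] += lr * (feedback - q[action])


def communicative_update(log_post: Dict[str, float], action: str,
                         feedback: float, eps: float = 0.1) -> None:
    """Feedback as communication: treat feedback as evidence about which action
    the teacher wants. Assumes the teacher rewards the target action and
    punishes any other action, erring with probability eps. log_post holds
    unnormalized log posterior scores, one per candidate target."""
    says_correct = feedback > 0
    for target in log_post:
        consistent = says_correct == (action == target)
        log_post[target] += math.log(1 - eps if consistent else eps)
```

In this toy setup, a single punishment of action `"a"` lowers only the value of `"a"` for the reinforcement learner, whereas the communicative learner also becomes more confident that the target is some other action.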


Author(s): Yoshihisa Fujita, Sho Yagishita, Haruo Kasai, Shin Ishii

Generalization enables past experience to be applied to similar but nonidentical situations, so it may be essential for adaptive behavior. Recent neurobiological observations indicate that the striatal dopamine system achieves generalization and subsequent discrimination by updating corticostriatal synaptic connections in differential response to reward and punishment. To analyze how the computational characteristics of this system affect behavior, we proposed a novel reinforcement learning model with multilayer neural networks in which only the synaptic weights of the last layer are updated according to the prediction error. We fixed the connections between the input and hidden layers so that the hidden-layer representation preserves the similarity of inputs. This network enabled fast generalization and thereby facilitated safe and efficient exploration in reinforcement learning tasks, compared with algorithms that do not generalize. However, disturbances in the network induced aberrant valuation. In conclusion, the unique computation suggested by corticostriatal plasticity has the advantage of providing safe and quick adaptation to unknown environments, but it also has a potential defect that can induce maladaptive behaviors such as the delusional symptoms of psychiatric disorders.

Author summary: The brain can generalize knowledge obtained from reward- and punishment-related learning. Animals trained to associate a stimulus with subsequent reward or punishment respond not only to that stimulus but also to similar ones. How does generalization affect behavior when individuals must adapt to unknown environments? It may enable efficient learning and promote adaptive behavior, but inappropriate generalization may disrupt behavior by associating reward or punishment with irrelevant stimuli.
The effect of generalization should depend on the computational characteristics of its biological basis in the brain, namely the striatal dopamine system. In this research, we built a novel computational model based on the characteristics of the striatal dopamine system. Our model enabled fast generalization and showed an advantage in providing safe and quick adaptation to unknown environments. By contrast, disturbing the model induced abnormal behaviors. The results suggest both the advantage and the shortcoming of generalization by the striatal dopamine system.
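The core idea, fixed input-to-hidden connections with only the last layer trained by the prediction error, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' model: a one-dimensional stimulus, Gaussian hidden units standing in for the fixed similarity-preserving connections, and a simple reward prediction error without temporal-difference bootstrapping. The class name `FixedHiddenValueNet` and all parameter values are hypothetical.

```python
"""Hypothetical sketch: fixed hidden layer, plastic readout only.
Not the network used in the study."""
import numpy as np


class FixedHiddenValueNet:
    def __init__(self, n_hidden: int = 21, width: float = 0.1, lr: float = 0.1):
        # Fixed (never trained) hidden units: Gaussian tuning curves over a
        # 1-D stimulus in [0, 1], so similar inputs get overlapping codes.
        self.centers = np.linspace(0.0, 1.0, n_hidden)
        self.width = width
        self.w = np.zeros(n_hidden)   # only these readout weights learn
        self.lr = lr

    def hidden(self, x: float) -> np.ndarray:
        """Hidden-layer activity for stimulus x (fixed mapping)."""
        return np.exp(-((x - self.centers) ** 2) / (2 * self.width ** 2))

    def value(self, x: float) -> float:
        """Predicted value of stimulus x from the linear readout."""
        return float(self.w @ self.hidden(x))

    def update(self, x: float, outcome: float) -> float:
        """Delta-rule update of the last layer from the prediction error."""
        h = self.hidden(x)
        pe = outcome - self.w @ h
        self.w += self.lr * pe * h
        return float(pe)


if __name__ == "__main__":
    net = FixedHiddenValueNet()
    for _ in range(50):
        net.update(0.5, 1.0)          # stimulus 0.5 is repeatedly rewarded
    for x in (0.5, 0.55, 0.0):
        print(f"value({x}) = {net.value(x):.3f}")
```

Because similar stimuli activate overlapping hidden units, training the readout on one stimulus also raises the predicted value of nearby stimuli, which is the kind of generalization the abstract describes; distant stimuli are barely affected.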


2013, Vol 150 (2-3), pp. 592-593
Author(s): Gagan Fervaha, Ofer Agid, George Foussias, Gary Remington

2020, Vol 87 (9), pp. S141
Author(s): Christina Wierenga, Erin Reilly, Amanda Bischoff-Grethe, Walter Kaye, Gregory Brown

Author(s): Christina E. Wierenga, Erin Reilly, Amanda Bischoff-Grethe, Walter H. Kaye, Gregory G. Brown

Objectives: Anorexia nervosa (AN) is associated with altered sensitivity to reward and punishment. Few studies have investigated whether this results in aberrant learning. The ability to learn from rewarding and aversive experiences is essential for flexibly adapting to changing environments, yet individuals with AN tend to demonstrate cognitive inflexibility, difficulty set-shifting, and altered decision-making. Deficient reinforcement learning may contribute to repeated engagement in maladaptive behavior. Methods: This study investigated learning in AN using a probabilistic associative learning task that separated learning of stimuli via reward from learning via punishment. Forty-two individuals with Diagnostic and Statistical Manual of Mental Disorders (DSM)-5 restricting-type AN were compared with 38 healthy controls (HCs). We applied computational models of reinforcement learning to assess group differences in learning, which is thought to be driven by violations of expectations, or prediction errors (PEs). Linear regression analyses examined whether learning parameters predicted BMI at discharge. Results: Individuals with AN had lower learning rates than HCs following both positive and negative PEs (p < .02) and were less likely to exploit what they had learned. Negative PE on punishment trials predicted lower discharge BMI (p < .001), suggesting that individuals with more negative expectancies about avoiding punishment had the poorest outcome. Conclusions: This is the first study to show lower rates of learning in AN following both positive and negative outcomes, with worse punishment learning predicting less weight gain. An inability to modify expectations about avoiding punishment might explain the persistence of restricted eating despite negative consequences; this suggests that treatments that modify negative expectancies might be effective in reducing food avoidance in AN.
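A common way to model separate learning from positive and negative prediction errors, together with a tendency to exploit learned values, is a delta rule with two learning rates plus a softmax choice rule. The sketch below assumes this standard form; the abstract does not specify the exact models fitted, and the function names and parameters (`alpha_pos`, `alpha_neg`, `beta`) are illustrative, as is the outcome coding in the usage example.

```python
"""Hypothetical sketch: asymmetric-learning-rate delta rule with softmax choice.
Not the specific models fitted in the study."""
import math
from typing import List, Sequence


def update_value(q: float, outcome: float, alpha_pos: float, alpha_neg: float) -> float:
    """Return the updated value after one outcome.

    Positive prediction errors are scaled by alpha_pos and negative ones by
    alpha_neg, so learning from better- and worse-than-expected outcomes can differ.
    """
    pe = outcome - q                      # prediction error (PE)
    alpha = alpha_pos if pe > 0 else alpha_neg
    return q + alpha * pe


def softmax_choice_prob(q_values: Sequence[float], beta: float) -> List[float]:
    """Choice probabilities under a softmax; larger beta = more exploitation."""
    m = max(q_values)                     # subtract max for numerical stability
    exps = [math.exp(beta * (q - m)) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Hypothetical punishment trials: outcome 0 (avoided) or -1 (punished).
    q = 0.0
    for outcome in [-1.0, 0.0, -1.0, -1.0]:
        q = update_value(q, outcome, alpha_pos=0.3, alpha_neg=0.2)
    print("value after punishment trials:", round(q, 3))
    print("choice probs:", softmax_choice_prob([q, 0.0], beta=3.0))
```

In this form, lower `alpha_pos` and `alpha_neg` correspond to slower learning after positive and negative PEs, and a lower `beta` corresponds to choosing the higher-valued option less consistently, i.e. less exploitation.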

