average rewards
Recently Published Documents


Total documents: 18 (five years: 3)
H-index: 8 (five years: 0)

2020 ◽ Author(s): Kevin Berlemont, Jean-Pierre Nadal

In experiments on perceptual decision-making, individuals learn a categorization task through trial-and-error protocols. We explore the capacity of a decision-making attractor network to learn a categorization task through reward-based, Hebbian-type modifications of the weights incoming from the stimulus-encoding layer. For the latter, we assume a standard layer of a large number of stimulus-specific neurons. Within the general framework of Hebbian learning, authors have hypothesized that the learning rate is modulated by the reward at each trial. Surprisingly, we find that, when the coding layer has been optimized in view of the categorization task, such reward-modulated Hebbian learning (RMHL) fails to extract the category membership efficiently. In a previous work, we showed that the nonlinear dynamics of the attractor neural network account for behavioral confidence in sequences of decision trials. Taking advantage of these findings, we propose that learning is controlled by confidence, as computed from the neural activity of the decision-making attractor network. Here we show that this confidence-controlled, reward-based Hebbian learning efficiently extracts categorical information from the optimized coding layer. The proposed learning rule is local and, in contrast to RMHL, does not require storing the average rewards obtained on previous trials. In addition, we find that the confidence-controlled learning rule achieves near-optimal performance.


2018 ◽ Author(s): Hilary Don, A Ross Otto, Astin Cornwall, Tyler Davis, Darrell A. Worthy

Learning about reward and expected values of choice alternatives is critical for adaptive behavior. Although human choice is affected by the presentation frequency of reward-related alternatives, this is overlooked by some dominant models of value learning. For instance, the delta rule learns average rewards, whereas the decay rule learns cumulative rewards for each option. In a binary-outcome choice task, participants selected between pairs of options that had reward probabilities of .65 (A) versus .35 (B) or .75 (C) versus .25 (D). Crucially, during training there were twice as many AB trials as CD trials, so option A was associated with higher cumulative reward, while option C gave higher average reward. Participants then decided between novel combinations of options (e.g., AC). Participants preferred option A, a result predicted by the Decay model but not by the Delta model. This suggests that expected values are based more on total reward than on average reward.
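
The contrast between the delta and decay rules described above can be illustrated with a minimal sketch. This is not the authors' model code: the learning rate `alpha`, the `retention` factor, the number of training cycles, and the helper names are all hypothetical. To keep the run deterministic, each binary outcome is replaced by its expected value, and both options of a presented pair are updated on every trial, which simplifies the actual task (where only the chosen option yields an outcome).

```python
# Illustrative sketch only: alpha, retention, n_cycles, and the use of
# expected rewards instead of sampled 0/1 outcomes are assumptions.

REWARD_PROB = {"A": 0.65, "B": 0.35, "C": 0.75, "D": 0.25}


def delta_update(values, outcomes, alpha=0.1):
    """Delta rule: move each updated value toward its reward (tracks the average)."""
    for option, reward in outcomes.items():
        values[option] += alpha * (reward - values[option])


def decay_update(values, outcomes, retention=0.9):
    """Decay rule: shrink all values once per trial, then add the new rewards
    (tracks a recency-weighted cumulative total)."""
    for option in values:
        values[option] *= retention
    for option, reward in outcomes.items():
        values[option] += reward


def train(n_cycles=200):
    """Run both rules on a schedule with twice as many AB trials as CD trials."""
    delta = {option: 0.0 for option in REWARD_PROB}
    decay = {option: 0.0 for option in REWARD_PROB}
    schedule = [("A", "B"), ("A", "B"), ("C", "D")]
    for _ in range(n_cycles):
        for pair in schedule:
            # Expected reward stands in for a sampled binary outcome.
            outcomes = {option: REWARD_PROB[option] for option in pair}
            delta_update(delta, outcomes)
            decay_update(decay, outcomes)
    return delta, decay


if __name__ == "__main__":
    delta, decay = train()
    print("delta rule prefers:", max("AC", key=lambda o: delta[o]))
    print("decay rule prefers:", max("AC", key=lambda o: decay[o]))
```

Under these assumptions the delta-rule values settle near the reward probabilities, so C is valued above A, while the decay-rule value of A ends up higher than that of C because A is presented twice as often.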


2011 ◽ Vol 25 (4) ◽ pp. 537-560 ◽ Author(s): Eugene A. Feinberg, Fenghsu Yang

In this article we study optimal admission to an M/M/k/N queue with several customer types. The reward structure consists of revenues collected from admitted customers and holding costs, both of which depend on customer types. This article studies average rewards per unit time and describes the structures of stationary optimal, canonical, bias optimal, and Blackwell optimal policies. Similar to the case without holding costs, bias optimal and Blackwell optimal policies are unique, coincide, and have a trunk reservation form with the largest optimal control level for each customer type. Problems with one holding cost rate have been studied previously in the literature.
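
As a rough illustration of what a trunk reservation form means, the sketch below assumes the common formulation in which each customer type has a control level and an arriving customer is admitted only while the number of customers in the system is below that level. The function name, the type labels, and the numeric levels are hypothetical and are not taken from the article, which derives the optimal levels rather than assuming them.

```python
# Illustrative sketch only: control levels and type names are made up.

def admit(n_in_system, customer_type, control_levels, capacity):
    """Trunk reservation admission rule for a finite-capacity queue.

    A customer of the given type is admitted only if the system is not full
    and the current number of customers is below that type's control level.
    """
    if n_in_system >= capacity:
        return False  # system capacity N reached, nobody can be admitted
    return n_in_system < control_levels[customer_type]


if __name__ == "__main__":
    levels = {"high_revenue": 10, "low_revenue": 4}  # hypothetical control levels
    for n in (0, 4, 9, 10):
        decisions = {t: admit(n, t, levels, capacity=10) for t in levels}
        print(n, decisions)
```

With these made-up levels, low-revenue customers are turned away once four customers are present, which reserves the remaining places for high-revenue arrivals.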


2009 ◽ Vol 20 (8) ◽ pp. 955-962 ◽ Author(s): Dean Mobbs, Demis Hassabis, Ben Seymour, Jennifer L. Marchant, Nikolaus Weiskopf, ...

A pernicious paradox in human motivation is the occasional reduced performance associated with tasks and situations that involve larger-than-average rewards. Three broad explanations that might account for such performance decrements are attentional competition (distraction theories), inhibition by conscious processes (explicit-monitoring theories), and excessive drive and arousal (overmotivation theories). Here, we report incentive-dependent performance decrements in humans in a reward-pursuit task; subjects were less successful in capturing a more valuable reward in a computerized maze. Concurrent functional magnetic resonance imaging revealed that increased activity in ventral midbrain, a brain area associated with incentive motivation and basic reward responding, correlated with both reduced number of captures and increased number of near-misses associated with imminent high rewards. These data cast light on the neurobiological basis of choking under pressure and are consistent with overmotivation accounts.

