future reward
Recently Published Documents


TOTAL DOCUMENTS: 32 (FIVE YEARS: 8)

H-INDEX: 9 (FIVE YEARS: 1)

Neuron ◽  
2021 ◽  
Author(s):  
Xiang Mou ◽  
Abhishekh Pokhrel ◽  
Prakul Suresh ◽  
Daoyun Ji

Author(s):  
Oliver Härmson ◽  
Laura L. Grima ◽  
Marios C. Panayi ◽  
Masud Husain ◽  
Mark E. Walton

The serotonin (5-HT) system, particularly the 5-HT2C receptor, has consistently been implicated in behavioural control. However, while some studies have focused on the role 5-HT2C receptors play in regulating motivation to work for reward, others have highlighted its importance in response restraint. To date, it is unclear how 5-HT transmission at this receptor regulates the balance of response invigoration and restraint in anticipation of future reward. In addition, it remains to be established how 5-HT2C receptors gate the influence of internal versus cue-driven processes over reward-guided actions. To elucidate these issues, we investigated the effects of administering the 5-HT2C receptor antagonist SB242084, both systemically and directly into the nucleus accumbens core (NAcC), in rats performing a Go/No-Go task for small or large rewards. The results were compared to the administration of d-amphetamine into the NAcC, which has previously been shown to promote behavioural activation. Systemic perturbation of 5-HT2C receptors—but crucially not intra-NAcC infusions—consistently boosted rats’ performance and instrumental vigour on Go trials when they were required to act. Concomitantly, systemic administration also reduced their ability to withhold responding for rewards on No-Go trials, particularly late in the holding period. Notably, these effects were often apparent only when the reward on offer was small. By contrast, inducing a hyperdopaminergic state in the NAcC with d-amphetamine strongly impaired response restraint on No-Go trials both early and late in the holding period, as well as speeding action initiation. Together, these findings suggest that 5-HT2C receptor transmission, outside the NAcC, shapes the vigour of ongoing goal-directed action as well as the likelihood of responding as a function of expected reward.


2021 ◽  
Author(s):  
Aenne Brielmann ◽  
Peter Dayan

People invest precious time and resources in sensory experiences such as watching movies or listening to music. Yet, we still have a poor understanding of how sensory experiences gain aesthetic value. We propose a model of aesthetic value that integrates existing theories with literature on conventional primary and secondary rewards such as food and money. We assume that the states of observers' sensory and cognitive systems adapt to process stimuli effectively in both the present and the future. These system states collectively comprise a probabilistic generative model of stimuli in the environment. Two interlinked components generate value: immediate sensory reward and the change in expected future reward. Immediate sensory reward is taken as the fluency with which a stimulus is processed, quantified by the likelihood of that stimulus given an observer's state. The change in expected future reward is taken as the change in fluency with which likely future stimuli will be processed. It is quantified by the change in the divergence between the observer's system state and the distribution of stimuli that the observer expects to see over the long term. Simulations show that a simple version of the model can account for empirical data on the effects of exposure, complexity, and symmetry on aesthetic value judgments. Taken together, our model melds processing fluency theories (immediate reward) and learning theories (change in expected future reward). Its application offers insight as to how the interplay of immediate processing fluency and learning gives rise to aesthetic value judgments.
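To make the two components concrete, here is a minimal, hypothetical Python sketch using 1-D Gaussians for the observer's system state and for the long-term expected stimulus distribution. The decomposition (immediate processing fluency plus the change in expected future reward, measured as a change in divergence) follows the abstract, but the specific distributions, adaptation rule, and parameter values are invented for illustration and are not the authors' implementation.

```python
import numpy as np

# Toy illustration of the two value components described above, using 1-D
# Gaussians for the observer's system state and for the long-term expected
# stimulus distribution. The decomposition follows the abstract; the
# distributions, adaptation rule, and parameters are invented for this sketch.

mu_obs, sigma_obs = 0.0, 1.0      # observer's current generative model (system state)
mu_env, sigma_env = 0.5, 1.0      # stimuli the observer expects over the long term
alpha = 0.1                       # adaptation (learning) rate of the system state

def gauss_logpdf(x, mu, sigma):
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

def kl_gauss(mu_p, sig_p, mu_q, sig_q):
    """D_KL(p || q) between two 1-D Gaussians."""
    return (np.log(sig_q / sig_p)
            + (sig_p**2 + (mu_p - mu_q)**2) / (2 * sig_q**2) - 0.5)

def aesthetic_value(x):
    # Immediate sensory reward: processing fluency, quantified as the
    # (log-)likelihood of the stimulus under the observer's current state.
    immediate = gauss_logpdf(x, mu_obs, sigma_obs)

    # Change in expected future reward: the observer's state adapts toward the
    # stimulus, and value accrues to the extent that this reduces the divergence
    # between the long-term stimulus distribution and the system state.
    mu_new = mu_obs + alpha * (x - mu_obs)
    delta_future = (kl_gauss(mu_env, sigma_env, mu_obs, sigma_obs)
                    - kl_gauss(mu_env, sigma_env, mu_new, sigma_obs))
    return immediate + delta_future

print(aesthetic_value(0.4))    # fluent stimulus that also pulls the state toward the environment
print(aesthetic_value(-3.0))   # surprising stimulus: disfluent, and adapts the state away
```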


2020 ◽  
Author(s):  
Oliver Haermson ◽  
Laura Lucy Grima ◽  
Marios Panayi ◽  
Masud Husain ◽  
Mark Edwin Walton

The serotonin (5-HT) system, particularly the 5-HT2C receptor, has consistently been implicated in behavioural control. However, while some studies have focused on the role 5-HT2C receptors play in regulating motivation to work for reward, others have highlighted its importance in response restraint. To date, it is unclear how 5-HT transmission at this receptor regulates the balance of response invigoration and restraint in anticipation of future reward. In addition, it remains to be established how 5-HT2C receptors gate the influence of internal versus cue-driven processes over reward-guided actions. To elucidate these issues, we investigated the effects of administering the 5-HT2C receptor antagonist SB242084, both systemically and directly into the nucleus accumbens core (NAcC), in rats performing a Go/No-Go task for small or large rewards. The results were compared to the administration of d-amphetamine into the NAcC, which has previously been shown to promote behavioural activation. Systemic perturbation of 5-HT2C receptors - but crucially not intra-NAcC infusions - consistently boosted rats' performance and instrumental vigour on Go trials when they were required to act. Concomitantly, systemic administration also reduced their ability to withhold responding for rewards on No-Go trials, particularly late in the holding period. Notably, these effects were often apparent only when the reward on offer was small. By contrast, inducing a hyperdopaminergic state in the NAcC with d-amphetamine strongly impaired response restraint on No-Go trials both early and late in the holding period, as well as speeding action initiation. Together, these findings suggest that 5-HT2C receptor transmission, outside the NAcC, shapes the vigour of ongoing goal-directed action as well as the likelihood of responding as a function of expected reward.


2020 ◽  
Vol 12 (21) ◽  
pp. 8883
Author(s):  
Kun Jin ◽  
Wei Wang ◽  
Xuedong Hua ◽  
Wei Zhou

As a key element of urban transportation, taxi services provide significant convenience and comfort for residents' travel. In practice, however, cruising taxi services remain inefficient. Previous research has mainly optimized policies through order dispatch on ride-hailing platforms, an approach that cannot be applied to cruising taxi services. This paper develops a reinforcement learning (RL) framework to optimize driving policies for cruising taxi services. Firstly, we formulated drivers' behaviour as a Markov decision process (MDP), taking into account the long-run consequences of each action. An RL framework using dynamic programming and data expansion was employed to calculate the state-action value function. Using the value function, drivers can quantify the expected future reward in a particular state and choose the best action. By utilizing historical order data from Chengdu, we analysed the spatial distribution of the value function and demonstrated how the model could optimize driving policies. Finally, a realistic simulation of the on-demand platform was built. Compared with other benchmark methods, the results verified that the new model performs better, increasing total revenue and answer rate by up to 4.8% and 6.2%, respectively, and decreasing waiting time by up to 27.27%.
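The state-action value computation the abstract describes can be illustrated with a small backward dynamic-programming sketch. The discretisation into zones and time slots, the reward table, and every number below are hypothetical placeholders, not the paper's data or code.

```python
import numpy as np

# Hypothetical illustration of the kind of state-action value computation the
# abstract describes: states are (time slot, zone) pairs, actions are target
# zones, and values are backed up with dynamic programming. All names and
# numbers here are invented for the sketch, not taken from the paper.

n_zones, n_slots = 5, 24            # toy spatial and temporal discretisation
gamma = 0.9                         # discount factor on future reward

# r[t, z, a]: expected immediate reward (fare minus cruising cost) for a driver
# in zone z during slot t who heads toward zone a, as would be estimated from
# historical order data.
rng = np.random.default_rng(0)
r = rng.uniform(0.0, 10.0, size=(n_slots, n_zones, n_zones))

# Backward dynamic programming over the day:
#   Q(t, z, a) = r(t, z, a) + gamma * V(t+1, a),   V(t, z) = max_a Q(t, z, a)
Q = np.zeros((n_slots, n_zones, n_zones))
V = np.zeros((n_slots + 1, n_zones))
for t in reversed(range(n_slots)):
    Q[t] = r[t] + gamma * V[t + 1][None, :]   # expected future reward of ending up in zone a
    V[t] = Q[t].max(axis=1)                   # assume the driver then acts greedily

policy = Q.argmax(axis=2)   # policy[t, z]: best zone to cruise toward
print(policy[8])            # e.g. recommended target zone for each zone at slot 8
```

In a real system the reward table would be estimated from historical orders, and the resulting policy would tell an idle driver which zone to cruise toward at each time of day.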


Blood ◽  
2019 ◽  
Vol 134 (Supplement_1) ◽  
pp. 5511-5511
Author(s):  
Kenneth H. Shain ◽  
Daniel Hart ◽  
Ariosto Siqueira Silva ◽  
Raghunandanreddy Alugubelli ◽  
Gabriel De Avila ◽  
...  

Over the last decade we have witnessed an explosion in the number of therapeutic options available to patients with multiple myeloma (MM). In spite of the marked improvements in patient outcomes paralleling these approvals, MM remains an incurable malignancy for the vast majority of patients following a course of therapeutic successes and failures. As such, there remains a dire need to develop new tools to improve the management of MM patients. A number of groups are leading efforts to combine big data and artificial intelligence to better inform patient care via precision medicine. At Moffitt, in collaboration with M2Gen/ORIEN (Oncology Research Information Exchange Network), we have begun to accumulate big data in MM. Patients opt in (consent) to the collection of rich clinical data (demographics, staging, risk, complete disease-course treatment data) and, in the setting of bone marrow biopsy, to the allocation of CD138-selected cells for molecular analysis (whole exome sequencing (WES) and RNA sequencing), as well as peripheral blood mononuclear cells for WES. To date, we have collected over 1000 samples from over 800 individual patients with plasma cell disorders. In the setting of oncology, the ultimate goal of the model will be the selection of ideal treatments. We expect that AI analysis may help validate patient responses to treatments and enable cohort selection, as real patient cohorts can be selected from those predicted by the model. One approach is to utilize reinforcement learning (RL). In RL, the algorithm attempts to learn which action to take in a defined state, weighing any tradeoffs, so as to maximize reward. Our initial utilization of RL involved a relatively small cohort of 402 patients with treatment medication data. This encompassed 1692 lines of treatment, with a mean of 4.21 lines of therapy per patient (median of 4), and included 132 combinations of 22 myeloma therapeutics. The heterogeneity in treatment is highlighted by the fact that no pathways overlap after line 4. Each Q-value in the Q-table is the current reward for an action in a state plus the discounted anticipated future reward for taking that action. Iteration converges on the actual values of future reward (and can be model-free). The end result is a policy, P(s), that specifies the ideal action in a given state. There is a near-infinite number of possible states, considering treatment history, age, GEP, cytogenetics, comorbidities, staging and others. We presume that the action is most intuitively the medication (treatment) only and that the reward should be some form of treatment response. We have begun the iterative process of trying different state and reward functions. Median imputation shows a 5% improvement in response accuracy over listwise deletion, but median imputation throws off practical accuracy in the binary reward case. While we found that the exercise has great potential, there are possible improvements (e.g., multiple imputation), and we will need to expand the covariate analysis. Combinatorics need to be considered when applying machine learning to medium-sized data sets, and model-free machine learning is limited on medium-sized data. As such, combined resources and/or utilization of large networks such as ORIEN will be critical for the successful integration of RL or other AI tools in MM. We also learned that adding variables to the model does not necessarily increase accuracy. Future work will involve continued application of alternate state/reward functions: loosening the iQ-learning framework to allow for better covariate selection for the state/reward functions, improving imputation techniques to include more covariates and give more certainty in model accuracy, and refining the accuracy metric to allow for prediction of bucketed response and temporal disease burden (M-spike vs. time). Updated data on a larger cohort will be presented at the annual meeting.

Disclosures: Shain: Adaptive Biotechnologies: Consultancy; Celgene: Membership on an entity's Board of Directors or advisory committees; Bristol-Myers Squibb: Membership on an entity's Board of Directors or advisory committees; Amgen: Membership on an entity's Board of Directors or advisory committees; Takeda: Membership on an entity's Board of Directors or advisory committees; Sanofi Genzyme: Membership on an entity's Board of Directors or advisory committees; AbbVie: Research Funding; Janssen: Membership on an entity's Board of Directors or advisory committees. Dai: M2Gen: Employment. Nishihori: Novartis: Research Funding; Karyopharm: Research Funding. Brayer: Janssen: Consultancy, Speakers Bureau; BMS: Consultancy, Speakers Bureau. Alsina: Bristol-Myers Squibb: Research Funding; Janssen: Speakers Bureau; Amgen: Speakers Bureau. Baz: Celgene: Membership on an entity's Board of Directors or advisory committees, Research Funding; Karyopharm: Membership on an entity's Board of Directors or advisory committees, Research Funding; AbbVie: Research Funding; Merck: Research Funding; Sanofi: Research Funding; Bristol-Myers Squibb: Research Funding. Dalton: MILLENNIUM PHARMACEUTICALS, INC.: Honoraria.
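The Q-table update described in the abstract above (current reward plus discounted anticipated future reward) can be sketched generically as tabular, model-free Q-learning. The states, actions, and reward below are invented placeholders rather than the actual clinical covariates or response definitions used in the study.

```python
from collections import defaultdict

# Generic tabular, model-free Q-learning update of the kind described above:
# each Q-value is the immediate reward for an action in a state plus the
# discounted anticipated future reward. States, actions, and the reward signal
# are invented placeholders, not the study's clinical covariates or responses.

gamma, alpha = 0.9, 0.1
Q = defaultdict(float)                        # Q[(state, action)]

def q_update(state, action, reward, next_state, available_actions):
    """One Q-learning backup from an observed treatment transition."""
    best_next = max((Q[(next_state, a)] for a in available_actions), default=0.0)
    target = reward + gamma * best_next       # reward now + discounted future reward
    Q[(state, action)] += alpha * (target - Q[(state, action)])

def policy(state, available_actions):
    """P(s): the action with the highest learned Q-value in this state."""
    return max(available_actions, key=lambda a: Q[(state, a)])

# Illustrative transition: a hypothetical patient state treated with a
# hypothetical regimen, achieving a response (reward 1.0) and moving on.
q_update("2_prior_lines_high_risk", "regimen_A", 1.0,
         "3_prior_lines_high_risk", ["regimen_A", "regimen_B"])
print(policy("2_prior_lines_high_risk", ["regimen_A", "regimen_B"]))
```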


2019 ◽  
Vol 31 (4) ◽  
pp. 681-709 ◽  
Author(s):  
Zoran Tiganj ◽  
Samuel J. Gershman ◽  
Per B. Sederberg ◽  
Marc W. Howard

Natural learners must compute an estimate of future outcomes that follow from a stimulus in continuous time. Widely used reinforcement learning algorithms discretize continuous time and estimate either transition functions from one step to the next (model-based algorithms) or a scalar value of exponentially discounted future reward using the Bellman equation (model-free algorithms). An important drawback of model-based algorithms is that computational cost grows linearly with the amount of time to be simulated. An important drawback of model-free algorithms is the need to select a timescale required for exponential discounting. We present a computational mechanism, developed based on work in psychology and neuroscience, for computing a scale-invariant timeline of future outcomes. This mechanism efficiently computes an estimate of inputs as a function of future time on a logarithmically compressed scale and can be used to generate a scale-invariant power-law-discounted estimate of expected future reward. The representation of future time retains information about what will happen when. The entire timeline can be constructed in a single parallel operation that generates concrete behavioral and neural predictions. This computational mechanism could be incorporated into future reinforcement learning algorithms.
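The contrast between single-timescale exponential discounting and a scale-invariant alternative can be made concrete with a toy numerical sketch. This mixture-of-timescales construction only illustrates how power-law discounting can arise without committing to one timescale; it is not the Laplace-transform timeline the authors develop, and all parameter values are arbitrary.

```python
import numpy as np

# Toy comparison (not the authors' implementation) between exponential
# discounting at a single timescale and a scale-free alternative built from a
# logarithmically spaced set of timescales. Weighting each exponential kernel
# exp(-t/tau) by 1/tau and averaging over log-spaced tau gives approximately
# 1/t (power-law) discounting within the range of represented timescales, so
# no single discounting timescale has to be chosen in advance.

t = np.arange(1, 200)                      # future time steps

gamma = 0.95                               # single-timescale exponential discount
w_exp = gamma ** t

taus = np.logspace(0, 2.5, 40)             # log-spaced timescales, ~1 to ~300 steps
w_scale_free = (np.exp(-t[:, None] / taus[None, :]) / taus[None, :]).mean(axis=1)
w_scale_free /= w_scale_free[0]            # normalise to 1 at t = 1 for comparison

# Exponential weights become negligible beyond a few multiples of 1/(1-gamma)
# steps, whereas the scale-free weights decay roughly as 1/t and therefore
# retain weight at long delays.
print(np.round(w_exp[[0, 9, 99]], 4))          # weights at t = 1, 10, 100
print(np.round(w_scale_free[[0, 9, 99]], 4))   # weights at t = 1, 10, 100
```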


Games ◽  
2018 ◽  
Vol 9 (4) ◽  
pp. 102
Author(s):  
Fabrizio Adriani ◽  
Silvia Sonderegger

We formally explore the idea that punishment of norm-breakers may be a vehicle for the older generation to teach youngsters about social norms. We show that this signaling role provides sufficient incentives to sustain costly punishing behavior. People punish norm-breakers to pass information about past history to the younger generation. This creates a link between past, present, and future punishment. Information about the past is important for youngsters, because the past shapes the future. Reward-based mechanisms may also work and are welfare superior to punishment-based ones. However, reward-based mechanisms are fragile, since punishment is a more compelling signaling device (in a sense that we make precise).


2018 ◽  
Vol 84 (3) ◽  
pp. 706-712
Author(s):  
Helen Tibboel ◽  
Baptist Liefooghe
