Flexibility to contingency changes distinguishes habitual and goal-directed strategies in humans

2017 ◽  
Author(s):  
Julie J. Lee ◽  
Mehdi Keramati

Abstract

Decision-making in the real world presents the challenge of requiring flexible yet prompt behavior, a balance that has been characterized in terms of a trade-off between a slower, prospective goal-directed model-based (MB) strategy and a fast, retrospective habitual model-free (MF) strategy. Theory predicts that flexibility to changes in both reward values and transition contingencies can determine the relative influence of the two systems in reinforcement learning, but few studies have manipulated the latter. Therefore, we developed a novel two-level contingency change task in which transition contingencies between states change every few trials; MB and MF control predict different responses following these contingency changes, allowing their relative influence to be inferred. Additionally, we manipulated the rate of contingency changes in order to determine whether contingency change volatility would play a role in shifting subjects between a MB and MF strategy. We found that human subjects employed a hybrid MB/MF strategy on the task, corroborating the parallel contribution of MB and MF systems in reinforcement learning. Further, subjects did not remain at one level of MB/MF behavior but rather displayed a shift towards more MB behavior over the first two blocks that was not attributable to the rate of contingency changes but rather to the extent of training. We demonstrate that flexibility to contingency changes can distinguish MB and MF strategies, with human subjects utilizing a hybrid strategy that shifts towards more MB behavior over blocks, consequently corresponding to a higher payoff.

Author Summary

To make good decisions, we must learn to associate actions with their true outcomes. Flexibility to changes in action/outcome relationships, therefore, is essential for optimal decision-making. For example, actions can lead to outcomes that change in value – one day, your favorite food is poorly made and thus less pleasant. Alternatively, changes can occur in terms of contingencies – ordering a dish of one kind and instead receiving another. How we respond to such changes is indicative of our decision-making strategy; habitual learners will continue to choose their favorite food even if the quality has gone down, whereas goal-directed learners will soon learn it is better to choose another dish. A popular paradigm probes the effect of value changes on decision making, but the effect of contingency changes is still unexplored. Therefore, we developed a novel task to study the latter. We find that humans used a mixed habitual/goal-directed strategy in which they became more goal-directed over the course of the task, and also earned more rewards with increasing goal-directed behavior. This shows that flexibility to contingency changes is adaptive for learning from rewards, and indicates that flexibility to contingency changes can reveal which decision-making strategy is used.
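
The contrast between MB and MF predictions after a contingency change can be illustrated with a toy calculation. The sketch below is not the authors' task or model: the two-action, two-state layout, the reward values, the mixing weight `w`, and all function names are assumptions chosen for illustration. It also assumes the MB learner already knows the new transition structure, while the MF learner still holds values cached before the change.

```python
import numpy as np

def model_based_values(transition, state_values):
    """Q_MB(a) = sum over s of T[a, s] * V[s], computed from the current model."""
    return transition @ state_values

def hybrid_values(q_mb, q_mf, w):
    """Weighted mixture of MB and MF action values (w = 1 is purely MB)."""
    return w * q_mb + (1.0 - w) * q_mf

# Hypothetical second-level state values: state 0 is rewarded, state 1 is not.
state_values = np.array([1.0, 0.0])

# Hypothetical deterministic transitions: action 0 -> state 0, action 1 -> state 1.
transition_before = np.array([[1.0, 0.0],
                              [0.0, 1.0]])
# Contingency change: the two actions swap their destination states.
transition_after = transition_before[::-1]

# MF values are cached from experience before the change (assumed fully learned).
q_mf = model_based_values(transition_before, state_values)
# MB values are recomputed from the updated transition model.
q_mb_after = model_based_values(transition_after, state_values)

for w in (0.0, 0.3, 0.7, 1.0):
    q = hybrid_values(q_mb_after, q_mf, w)
    print(f"w={w:.1f}  Q={q}  preferred action={int(np.argmax(q))}")
```

Under these assumptions, a low `w` keeps the agent on the previously rewarded action, while a high `w` switches it to the other action immediately after the change, which is the kind of behavioral difference the abstract says allows the relative influence of the two systems to be inferred.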

2020 ◽  
Author(s):  
Claire Rosalie Smid ◽  
Wouter Kool ◽  
Tobias U. Hauser ◽  
Nikolaus Steinbeis

Human decision-making is underpinned by distinct systems that differ in their flexibility and associated computational cost. A widely accepted dichotomy distinguishes a flexible but costly model-based system and a cheap but rigid model-free system. Optimal decision-making requires adaptive arbitration between these two systems depending on environmental demands. Previous developmental studies suggest that model-based decision-making only emerges in adolescence. Here, we show that when using a paradigm more conducive to model-based decision-making, children as young as 5 years show contributions from a model-based system to their behaviour. Furthermore, we find that between the ages 5 to 11, children demonstrate increasing metacontrol, which is the engagement of cost-benefit arbitration over decision-making systems on a trial-by-trial basis. Our results suggest that model-based decision-making emerges much earlier than previously believed, while adaptive arbitration between computationally cheap and costly systems continues to undergo developmental changes during childhood.


2011 ◽  
Vol 23 (4) ◽  
pp. 817-851 ◽  
Author(s):  
Rafal Bogacz ◽  
Tobias Larsen

This article seeks to integrate two sets of theories describing action selection in the basal ganglia: reinforcement learning theories describing learning which actions to select to maximize reward and decision-making theories proposing that the basal ganglia selects actions on the basis of sensory evidence accumulated in the cortex. In particular, we present a model that integrates the actor-critic model of reinforcement learning and a model assuming that the cortico-basal-ganglia circuit implements a statistically optimal decision-making procedure. The values of cortico-striatal weights required for optimal decision making in our model differ from those provided by standard reinforcement learning models. Nevertheless, we show that an actor-critic model converges to the weights required for optimal decision making when biologically realistic limits on synaptic weights are introduced. We also describe the model's predictions concerning reaction times and neural responses during learning, and we discuss directions required for further integration of reinforcement learning and optimal decision-making theories.


Stat ◽  
2021 ◽  
Author(s):  
Hengrui Cai ◽  
Rui Song ◽  
Wenbin Lu

Kybernetes ◽  
2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Guangsheng Zhang ◽  
Xiao Wang ◽  
Zhiqing Meng ◽  
Qirui Zhang ◽  
Kexin Wu

Purpose

To remedy the inherent defect in current research that focuses only on a single type of participants, this paper endeavors to look into the situation as an evolutionary game between a representative Logistics Service Integrator (LSI) and a representative Functional Logistics Service Provider (FLSP) in an environment with sudden crisis and tries to analyze how LSI supervises FLSP's operations and how FLSP responds in a recurrent pattern with different interruption probabilities.

Design/methodology/approach

Regarding the risks of supply chain interruption in emergencies, this paper develops a two-level model of single LSI and single FLSP, using Evolutionary Game theory to analyze their optimal decision-making, as well as their strategic behaviors on different risk levels regarding the interruption probability to achieve the optimal return with bounded rationality.

Findings

The results show that on a low-risk level, if LSI increases the degree of punishment, it will fail to enhance FLSP's operational activeness in the long term; when the risk rises to an intermediate level, a circular game occurs between LSI and FLSP; and on a high level of risk, FLSP will actively take actions, and its functional probability further impacts LSI's strategic choices. Finally, this paper analyzes the moderating impact of punishment intensity and social reputation loss on the evolutionary model in emergencies and provides relevant managerial implications.

Originality/value

First, by taking both interruption probability and emergencies into consideration, this paper explores the interactions among the factors relevant to LSI's and FLSP's optimal decision-making. Second, this paper analyzes the optimal evolutionary game strategies of LSI and FLSP with different interruption probability and the range of their optimal strategies. Third, the findings of this paper provide valuable implications for relevant practices, such that the punishment intensity and social reputation loss determine the optimal strategies of LSI and FLSP, and thus it is an effective vehicle for LSSC system administrator to achieve the maximum efficiency of the system.

