Adaptive integration of habits into depth-limited planning defines a habitual–goal-directed spectrum

2016 ◽  
Vol 113 (45) ◽  
pp. 12868-12873 ◽  
Author(s):  
Mehdi Keramati ◽  
Peter Smittenaar ◽  
Raymond J. Dolan ◽  
Peter Dayan

Behavioral and neural evidence reveal a prospective goal-directed decision process that relies on mental simulation of the environment, and a retrospective habitual process that caches returns previously garnered from available choices. Artificial systems combine the two by simulating the environment up to some depth and then exploiting habitual values as proxies for consequences that may arise in the further future. Using a three-step task, we provide evidence that human subjects use such a normative plan-until-habit strategy, implying a spectrum of approaches that interpolates between habitual and goal-directed responding. We found that increasing time pressure led to shallower goal-directed planning, suggesting that a speed-accuracy tradeoff controls the depth of planning, with deeper search leading to more accurate evaluation at the cost of slower decision-making. We conclude that subjects integrate habit-based cached values directly into goal-directed evaluations in a normative manner.
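
A minimal sketch of the plan-until-habit idea described above, on an assumed toy deterministic MDP with made-up cached values (not the authors' three-step task or model code): simulate forward to a fixed depth, then substitute habitual cached values for everything beyond the horizon.

```python
# Toy deterministic MDP: state -> action -> (next_state, reward). Purely illustrative.
TRANSITIONS = {
    "s0": {"a": ("s1", 0.0), "b": ("s2", 0.0)},
    "s1": {"a": ("s3", 1.0), "b": ("s4", 0.0)},
    "s2": {"a": ("s5", 0.0), "b": ("s6", 2.0)},
}
# Cached "habitual" state values, e.g. learned model-free; the numbers are assumptions.
CACHED_VALUE = {"s0": 0.0, "s1": 0.8, "s2": 1.0,
                "s3": 0.5, "s4": 0.1, "s5": 0.2, "s6": 1.5}

def plan_until_habit(state, depth, gamma=1.0):
    """Evaluate `state` by simulating forward up to `depth` steps,
    then falling back on the cached habitual value beyond the horizon."""
    if depth == 0 or state not in TRANSITIONS:
        return CACHED_VALUE.get(state, 0.0)   # habit takes over past the planning horizon
    return max(reward + gamma * plan_until_habit(next_state, depth - 1)
               for next_state, reward in TRANSITIONS[state].values())

# Deeper planning can change the evaluation (and hence the choice) at the root:
for depth in (0, 1, 2):
    print(f"depth {depth}: V(s0) = {plan_until_habit('s0', depth):.2f}")
```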

eLife ◽  
2015 ◽  
Vol 4 ◽  
Author(s):  
Jan Drugowitsch ◽  
Gregory C DeAngelis ◽  
Dora E Angelaki ◽  
Alexandre Pouget

For decisions made under time pressure, effective decision making based on uncertain or ambiguous evidence requires efficient accumulation of evidence over time, as well as appropriately balancing speed and accuracy, known as the speed/accuracy trade-off. For simple unimodal stimuli, previous studies have shown that human subjects set their speed/accuracy trade-off to maximize reward rate. We extend this analysis to situations in which information is provided by multiple sensory modalities. Analyzing previously collected data (Drugowitsch et al., 2014), we show that human subjects adjust their speed/accuracy trade-off to produce near-optimal reward rates. This trade-off can change rapidly across trials according to the sensory modalities involved, suggesting that it is represented by neural population codes rather than implemented by slow neuronal mechanisms such as gradual changes in synaptic weights. Furthermore, we show that deviations from the optimal speed/accuracy trade-off can be explained by assuming an incomplete gradient-based learning of these trade-offs.
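
For intuition about reward-rate maximization, here is a small sketch using the standard closed-form accuracy and mean decision-time expressions for a symmetric drift-diffusion model; the parameter values are illustrative assumptions, not estimates from the reanalyzed dataset.

```python
import numpy as np

# Standard closed-form expressions for a symmetric drift-diffusion model
# (drift v, noise sigma, bounds at +/-a, unbiased start).
def accuracy(a, v, sigma=1.0):
    return 1.0 / (1.0 + np.exp(-2.0 * v * a / sigma**2))

def mean_decision_time(a, v, sigma=1.0):
    return (a / v) * np.tanh(v * a / sigma**2)

def reward_rate(a, v, reward=1.0, t_nondecision=0.3, t_iti=1.0):
    # Expected reward per unit time, the quantity subjects are assumed to maximize.
    return accuracy(a, v) * reward / (mean_decision_time(a, v) + t_nondecision + t_iti)

bounds = np.linspace(0.05, 3.0, 300)
v = 0.8   # higher drift ~ more reliable (e.g. multimodal) evidence; assumed value
best = bounds[np.argmax([reward_rate(a, v) for a in bounds])]
print(f"reward-rate-maximizing bound for drift {v}: {best:.2f}")
```

Sweeping the drift rate in this sketch shows why the optimal bound, and hence the speed/accuracy trade-off, should shift when the reliability of the evidence changes across trials.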


Author(s):  
Gerard Derosiere ◽  
David Thura ◽  
Paul Cisek ◽  
Julie Duqué

Humans and other animals often need to balance the desire to gather sensory information (to make the best choice) with the urgency to act, facing a speed-accuracy tradeoff (SAT). Given the ubiquity of SAT across species, extensive research has been devoted to understanding the computational mechanisms allowing its regulation at different timescales, including from one context to another and from one decision to another. However, animals must frequently change their SAT on even shorter timescales, i.e., over the course of an ongoing decision, and little is known about the mechanisms that allow such rapid adaptations. The present study addressed this issue. Human subjects performed a decision task with changing evidence, in which they received rewards for correct answers but incurred penalties for mistakes. An increase or a decrease in penalty occurring halfway through the trial promoted rapid SAT shifts, favoring speeded decisions in either the early or the late stage of the trial. Importantly, these shifts were associated with stage-specific adjustments in the accuracy criterion exploited for committing to a choice. Subjects who decreased their accuracy criterion the most at a given decision stage exhibited the largest gain in speed, but also the largest cost in accuracy, at that stage. Altogether, the current findings extend previous work by suggesting that dynamic changes in the accuracy criterion allow regulation of the SAT within the timescale of a single decision.
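
A rough sketch of the within-trial criterion shift the abstract describes, using a generic accumulation-to-bound simulation in which the bound drops halfway through the trial; all parameters are invented for illustration and do not come from the study.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_trial(drift=0.5, sigma=1.0, dt=0.01, t_max=3.0,
                   early_bound=1.5, late_bound=0.8, switch_time=1.5):
    """One accumulation-to-bound trial with an accuracy criterion that drops
    halfway through the trial (illustrative numbers, not the study's parameters)."""
    x, t = 0.0, 0.0
    while t < t_max:
        bound = early_bound if t < switch_time else late_bound
        if abs(x) >= bound:
            return t, x > 0          # RT and choice (True = correct direction)
        x += drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        t += dt
    return t_max, x > 0              # forced response at the deadline

trials = [simulate_trial() for _ in range(2000)]
rts = np.array([rt for rt, _ in trials])
acc = np.mean([c for _, c in trials])
print(f"mean RT {rts.mean():.2f} s, accuracy {acc:.2f}")
```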


2018 ◽  
Author(s):  
Hector Palada ◽  
Rachel A Searston ◽  
Annabel Persson ◽  
Timothy Ballard ◽  
Matthew B Thompson

Evidence accumulation models have been used to describe the cognitive processes underlying performance in tasks involving two-choice decisions about unidimensional stimuli, such as motion or orientation. Given the multidimensionality of natural stimuli, however, we might expect qualitatively different patterns of evidence accumulation in more applied perceptual tasks. One domain that relies heavily on human decisions about complex natural stimuli is fingerprint discrimination. We know little about the ability of evidence accumulation models to account for the dynamic decision process of a fingerprint examiner resolving whether two prints belong to the same finger. Here, we apply a dynamic decision-making model, the linear ballistic accumulator (LBA), to fingerprint discrimination decisions in order to gain insight into the cognitive processes underlying these complex perceptual judgments. Across three experiments, we show that the LBA provides an accurate description of the fingerprint discrimination decision process under manipulations of visual noise, speed-accuracy emphasis, and training. Our results demonstrate that the LBA is a promising model for furthering our understanding of applied decision-making with naturally varying visual stimuli.
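
As a reminder of what fitting an LBA entails, here is a minimal simulation of a single LBA trial under standard assumptions (uniform start points, normally distributed drift rates, a fixed threshold); the parameter values are placeholders, not fits to fingerprint data.

```python
import numpy as np

rng = np.random.default_rng(1)

def lba_trial(drifts=(1.2, 0.8), b=1.0, A=0.5, s=0.3, t0=0.2):
    """One linear ballistic accumulator trial.
    drifts: mean drift rate per response accumulator (e.g. "same finger" vs "different").
    Parameter values are illustrative assumptions only."""
    starts = rng.uniform(0.0, A, size=len(drifts))   # uniform start points in [0, A]
    rates = rng.normal(drifts, s)                    # trial-to-trial drift variability
    rates = np.where(rates > 0, rates, 1e-6)         # keep rates positive for simplicity
    finish = (b - starts) / rates                    # linear, noise-free rise to threshold
    winner = int(np.argmin(finish))                  # first accumulator to reach threshold
    return t0 + finish[winner], winner               # RT and chosen response

sims = [lba_trial() for _ in range(5000)]
rts = np.array([rt for rt, _ in sims])
p_first = np.mean([w == 0 for _, w in sims])
print(f"P(choice 1) = {p_first:.2f}, median RT = {np.median(rts):.2f} s")
```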


Author(s):  
Hossein Esfandiari ◽  
MohammadTaghi HajiAghayi ◽  
Brendan Lucier ◽  
Michael Mitzenmacher

We consider online variations of the Pandora’s box problem (Weitzman 1979), a standard model for understanding issues related to the cost of acquiring information for decision-making. Our problem generalizes both the classic Pandora’s box problem and the prophet inequality framework. Boxes are presented online, each with a random value and cost drawn jointly from some known distribution. Pandora chooses online whether to open each box given its cost, and then chooses irrevocably whether to keep the revealed prize or pass on it. We aim for approximation algorithms against adversaries that can choose the largest prize over any opened box, and use optimal offline policies to decide which boxes to open (without knowledge of the value inside). We consider variations where Pandora can collect multiple prizes subject to feasibility constraints, such as cardinality, matroid, or knapsack constraints. We also consider variations related to classic multi-armed bandit problems from reinforcement learning. Our results use a reduction-based framework where we separate the issues of the cost of acquiring information from the online decision process of which prizes to keep. Our work shows that in many scenarios, Pandora can achieve a good approximation to the best possible performance.
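
For context, a sketch of the classic offline policy this work generalizes, Weitzman's reservation-value (index) rule, run on assumed discrete value distributions; the online variants studied in the paper are more involved.

```python
import random

random.seed(0)

def reservation_value(values, probs, cost, lo=0.0, hi=100.0):
    """Weitzman index: the sigma solving E[max(V - sigma, 0)] = cost (via bisection)."""
    for _ in range(60):
        mid = (lo + hi) / 2
        gain = sum(p * max(v - mid, 0.0) for v, p in zip(values, probs))
        lo, hi = (mid, hi) if gain > cost else (lo, mid)
    return (lo + hi) / 2

# Each box: (possible values, probabilities, opening cost). Illustrative numbers.
boxes = [([0, 10], [0.5, 0.5], 1.0),
         ([0, 30], [0.9, 0.1], 1.5),
         ([5, 6],  [0.5, 0.5], 0.2)]

def run_pandora(boxes):
    """Open boxes in decreasing index order; stop once the best prize in hand
    exceeds every remaining reservation value. Returns net payoff for one run."""
    indexed = sorted(boxes, key=lambda b: -reservation_value(*b))
    best, payoff = 0.0, 0.0
    for values, probs, cost in indexed:
        if best >= reservation_value(values, probs, cost):
            break
        payoff -= cost
        best = max(best, random.choices(values, probs)[0])
    return payoff + best

print(sum(run_pandora(boxes) for _ in range(10000)) / 10000)  # average net payoff
```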


2017 ◽  
Vol 29 (8) ◽  
pp. 1433-1444 ◽  
Author(s):  
Tuğçe Tosun ◽  
Dilara Berkay ◽  
Alexander T. Sack ◽  
Yusuf Ö. Çakmak ◽  
Fuat Balcı

Decisions are made based on the integration of available evidence. The noise in evidence accumulation leads to a particular speed–accuracy tradeoff in decision-making, which can be modulated and optimized by adaptive decision threshold setting. Given the effect of pre-SMA activity on striatal excitability, we hypothesized that the inhibition of pre-SMA would lead to higher decision thresholds and an increased accuracy bias. We used offline continuous theta burst stimulation to assess the effect of transient inhibition of the right pre-SMA on the decision processes in a free-response two-alternative forced-choice task within the drift diffusion model framework. Participants became more cautious and set higher decision thresholds following right pre-SMA inhibition compared with inhibition of the control site (vertex). Increased decision thresholds were accompanied by an accuracy bias with no effects on post-error choice behavior. Participants also exhibited higher drift rates as a result of pre-SMA inhibition compared with vertex inhibition. These results, in line with the striatal theory of speed–accuracy tradeoff, provide evidence for the functional role of pre-SMA activity in decision threshold modulation. Our results also suggest that the pre-SMA might be part of the brain network associated with sensory evidence integration.
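
A toy drift-diffusion simulation illustrating the interpretation offered here, that a higher decision threshold yields slower but more accurate choices; the two threshold values stand in for the vertex and pre-SMA conditions and are purely illustrative, not fits to the cTBS data.

```python
import numpy as np

rng = np.random.default_rng(2)

def ddm_condition(threshold, drift=0.6, sigma=1.0, dt=0.005, n=3000):
    """Simulate a symmetric drift-diffusion process at a given decision threshold
    and return mean decision time and accuracy. Illustrative parameters only."""
    rts, correct = [], []
    for _ in range(n):
        x, t = 0.0, 0.0
        while abs(x) < threshold:
            x += drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()
            t += dt
        rts.append(t)
        correct.append(x > 0)
    return np.mean(rts), np.mean(correct)

for label, a in [("control (vertex)", 0.8), ("pre-SMA inhibited", 1.2)]:
    rt, acc = ddm_condition(a)
    print(f"{label}: mean decision time {rt:.2f} s, accuracy {acc:.2f}")
```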


Author(s):  
Victor Mittelstädt ◽  
Jeff Miller ◽  
Hartmut Leuthold ◽  
Ian Grant Mackenzie ◽  
Rolf Ulrich

The cognitive processes underlying the ability of human performers to trade speed for accuracy are often conceptualized within evidence accumulation models, but it is not yet clear whether and how these models can account for decision-making in the presence of various sources of conflicting information. In the present study, we provide evidence that speed-accuracy tradeoffs (SATs) can have opposing effects on performance across two different conflict tasks. Specifically, in a single preregistered experiment, the mean reaction time (RT) congruency effect in the Simon task increased, whereas the mean RT congruency effect in the Eriksen task decreased, when the focus was put on response speed versus accuracy. Critically, distributional RT analyses revealed distinct delta plot patterns across tasks, thus indicating that the unfolding of distractor-based response activation in time is sufficient to explain the opposing pattern of congruency effects. In addition, a recent evidence accumulation model with the notion of time-varying conflicting information was successfully fitted to the experimental data. These fits revealed task-specific time-courses of distractor-based activation and suggested that time pressure substantially decreases decision boundaries in addition to reducing the duration of non-decision processes and the rate of evidence accumulation. Overall, the present results suggest that time pressure can have multiple effects in decision-making under conflict, but that strategic adjustments of decision boundaries in conjunction with different time-courses of distractor-based activation can produce counteracting effects on task performance with different types of distracting sources of information.
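
A small sketch of the distributional (delta plot) analysis mentioned above: the congruency effect is computed at matched RT quantiles. The RT distributions below are simulated stand-ins, not the experimental data.

```python
import numpy as np

rng = np.random.default_rng(3)

def delta_plot(rt_congruent, rt_incongruent, quantiles=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Delta plot: congruency effect (incongruent minus congruent RT) at matched quantiles."""
    qc = np.quantile(rt_congruent, quantiles)
    qi = np.quantile(rt_incongruent, quantiles)
    return (qc + qi) / 2, qi - qc   # x = mean quantile RT, y = congruency effect

# Assumed RT distributions (in seconds) for illustration only.
congruent = rng.gamma(shape=4.0, scale=0.10, size=5000) + 0.25
incongruent = rng.gamma(shape=4.0, scale=0.12, size=5000) + 0.27

for x, d in zip(*delta_plot(congruent, incongruent)):
    print(f"mean RT {x:.3f} s -> congruency effect {d * 1000:.0f} ms")
```

Whether the resulting delta plot rises or falls with RT is the diagnostic pattern used to distinguish the time-courses of distractor-based activation across tasks.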


PLoS ONE ◽  
2008 ◽  
Vol 3 (7) ◽  
pp. e2635 ◽  
Author(s):  
Jason Ivanoff ◽  
Philip Branning ◽  
René Marois

2020 ◽  
Author(s):  
Gregory Edward Cox ◽  
Gordon D. Logan ◽  
Jeffrey Schall ◽  
Thomas Palmeri

Evidence accumulation is a computational framework that accounts for behavior as well as the dynamics of individual neurons involved in decision making. Linking these two levels of description reveals a scaling paradox: How do choices and response times (RT) explained by models assuming single accumulators arise from a large ensemble of idiosyncratic accumulator neurons? We created a simulation model that makes decisions by aggregating across ensembles of accumulators, thereby instantiating the essential structure of neural ensembles that make decisions. Across different levels of simulated choice difficulty and speed-accuracy emphasis, choice proportions and RT distributions simulated by the ensembles are invariant to ensemble size, and the accumulated evidence at RT is invariant across RT, when the accumulators are at least moderately correlated in either baseline evidence or rates of accumulation and when RT is not governed by the most extreme accumulators. To explore the relationship between the low-level ensemble accumulators and high-level cognitive models, we fit simulated ensemble behavior with a standard LBA model. The standard LBA model generally recovered the core accumulator parameters (particularly drift rates and residual time) of individual ensemble accumulators with high accuracy, with the variability parameters of the standard LBA varying as a function of various ensemble parameters. Ensembles of accumulators also provide an alternative conception of speed-accuracy tradeoff without relying on varying thresholds of individual accumulators, instead adjusting how ensembles of accumulators are aggregated or how accumulators are correlated within ensembles. These results clarify relationships between neural and computational accounts of decision making.
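
A minimal sketch of the ensemble idea: many accumulators with correlated drift rates are aggregated (here, simply by averaging) into a single decision variable. This is an illustrative aggregation scheme with assumed parameters, not the paper's exact simulation model.

```python
import numpy as np

rng = np.random.default_rng(4)

def ensemble_trial(n_units=50, mean_rate=0.8, rate_sd=0.3, rho=0.5,
                   sigma=1.0, dt=0.005, bound=1.0, t_max=10.0):
    """One trial of an ensemble of accumulators whose drift rates share a common
    component (correlation rho); the decision is made when the ensemble average
    crosses the bound. Illustrative parameters only."""
    shared = rng.standard_normal()
    rates = mean_rate + rate_sd * (np.sqrt(rho) * shared
                                   + np.sqrt(1 - rho) * rng.standard_normal(n_units))
    x = np.zeros(n_units)
    t = 0.0
    while np.mean(x) < bound and t < t_max:   # cap trial length for safety
        x += rates * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_units)
        t += dt
    return t

for n in (10, 50, 200):
    rts = [ensemble_trial(n_units=n) for _ in range(300)]
    print(f"{n:4d} accumulators: mean RT {np.mean(rts):.2f} s")
```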


2020 ◽  
Author(s):  
Kobe Desender ◽  
Luc Vermeylen ◽  
Tom Verguts

Humans differ in their capability to judge the accuracy of their own choices via confidence judgments. Signal detection theory has been used to quantify the extent to which confidence tracks accuracy via M-ratio, often referred to as metacognitive efficiency. This measure, however, is static in that it does not consider the dynamics of decision making. This could be problematic because humans may shift their level of response caution to alter the tradeoff between speed and accuracy. Such shifts could induce unaccounted-for sources of variation in the assessment of metacognition. Instead, evidence accumulation frameworks consider decision making, including the computation of confidence, as a dynamic process unfolding over time. We draw on evidence accumulation frameworks to examine the influence of response caution on metacognition. Simulation results demonstrate that response caution has an influence on M-ratio. We then tested and confirmed that this was also the case in human participants who were explicitly instructed to either focus on speed or accuracy. We next demonstrated that this association between M-ratio and response caution was also present in an experiment without any reference towards speed. The latter finding was replicated in an independent dataset. In contrast, when the data were analyzed with a novel dynamic measure of metacognition, which we refer to as v-ratio, there was no effect of speed-accuracy tradeoff in any of the three studies. These findings have important implications for research on metacognition, such as questions about domain-generality, individual differences in metacognition, and its neural correlates.
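
To illustrate why response caution can contaminate static metacognition measures, here is a toy accumulation model with post-decisional evidence; instead of the full meta-d'/M-ratio fit, it uses a crude confidence-resolution statistic. All parameters and the statistic itself are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

def confidence_resolution(bound, drift=0.4, sigma=1.0, dt=0.01, post_time=0.3, n=4000):
    """Accumulate to a symmetric bound, then keep accumulating for a fixed
    post-decisional window; confidence = final evidence signed by the choice."""
    conf, correct = [], []
    for _ in range(n):
        x = 0.0
        while abs(x) < bound:
            x += drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        choice = np.sign(x)
        for _ in range(int(post_time / dt)):          # post-decision accumulation
            x += drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        conf.append(choice * x)
        correct.append(choice > 0)
    conf = np.array(conf)
    correct = np.array(correct, dtype=float)
    # Crude resolution statistic (not meta-d'/M-ratio): standardized confidence
    # difference between correct and error trials.
    return (conf[correct == 1].mean() - conf[correct == 0].mean()) / conf.std()

for bound in (0.6, 1.2):   # lower bound = less response caution
    print(f"bound {bound}: confidence resolution {confidence_resolution(bound):.2f}")
```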


2017 ◽  
Author(s):  
Falk Lieder ◽  
Tom Griffiths ◽  
Quentin J.M. Huys ◽  
Noah D. Goodman

People’s estimates of numerical quantities are systematically biased towards their initial guess. This anchoring bias is usually interpreted as a sign of human irrationality, but it has recently been suggested that it instead results from people’s rational use of their finite time and limited cognitive resources. If this were true, then adjustment should decrease with the relative cost of time. To test this hypothesis, we designed a new numerical estimation paradigm that controls people’s knowledge and varies the cost of time and error independently, while allowing people to invest as much or as little time and effort into refining their estimate as they wish. Two experiments confirmed the prediction that adjustment decreases with time cost but increases with error cost, regardless of whether the anchor was self-generated or provided. These results support the hypothesis that people rationally adapt their number of adjustments to achieve a near-optimal speed-accuracy tradeoff. This suggests that the anchoring bias might be a signature of the rational use of finite time and limited cognitive resources rather than a sign of human irrationality.
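
A small sketch of the resource-rational logic: if each adjustment step shrinks the remaining anchor-to-truth distance by a fixed factor, the cost-minimizing number of steps falls as time becomes more expensive and rises as errors become more expensive. The decay model and all numbers below are assumptions, not the authors' model.

```python
import numpy as np

def expected_total_cost(k_steps, anchor_error=10.0, step_decay=0.8,
                        time_cost=0.05, error_cost=1.0):
    """Expected cost of stopping after k adjustment steps, assuming each step
    shrinks the remaining anchor-to-truth distance by a fixed factor."""
    remaining_error = anchor_error * step_decay ** k_steps
    return time_cost * k_steps + error_cost * remaining_error

def optimal_steps(**kwargs):
    costs = [expected_total_cost(k, **kwargs) for k in range(200)]
    return int(np.argmin(costs))

# Adjustment should shrink when time is expensive and grow when errors are expensive:
print(optimal_steps(time_cost=0.05, error_cost=1.0))   # baseline
print(optimal_steps(time_cost=0.50, error_cost=1.0))   # costlier time -> fewer adjustments
print(optimal_steps(time_cost=0.05, error_cost=5.0))   # costlier errors -> more adjustments
```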

