Competing cognitive pressures on human exploration in the absence of trade-off with exploitation

2022 ◽  
Author(s):  
Clemence Almeras ◽  
Valerian Chambon ◽  
Valentin Wyart

Exploring novel environments through sequential sampling is essential for efficient decision-making under uncertainty. In the laboratory, human exploration has been studied in situations where exploration is traded against reward maximisation. By design, these ‘explore-exploit’ dilemmas confound the behavioural characteristics of exploration with those of the trade-off itself. Here we designed a sequential sampling task where exploration can be studied and compared in the presence and absence of trade-off with exploitation. Detailed model-based analyses of choice behaviour revealed specific exploration patterns arising in situations where information seeking is not traded against reward seeking. Human choices are directed toward the most uncertain option available, but only after an initial sampling phase consisting of choice streaks from each novel option. These findings outline competing cognitive pressures on information seeking: the repeated sampling of the current option (for hypothesis testing), and the directed sampling of the most uncertain option available (for structure mapping).
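A toy rule can mimic the two-phase pattern described above: streaks on each novel option, followed by sampling of the most uncertain option. This is a minimal sketch under assumptions, not the authors' model. The streak length `STREAK_LEN`, the function `next_choice`, and the use of the raw sample count as a proxy for uncertainty are all hypothetical choices made for illustration.

```python
from collections import Counter

STREAK_LEN = 3  # hypothetical length of the initial streak on each novel option


def next_choice(history: list[int], n_options: int, streak_len: int = STREAK_LEN) -> int:
    """Pick the next option to sample (toy heuristic, assumed for illustration).

    Phase 1: sample each novel option in a streak of `streak_len` draws.
    Phase 2: once every option has its streak, pick the least-sampled option,
    using the sample count as a crude stand-in for uncertainty.
    """
    counts = Counter(history)
    # Phase 1: stay on the current option until its streak is complete.
    if history and counts[history[-1]] < streak_len:
        return history[-1]
    # Phase 1 (continued): move on to the first option never sampled.
    for option in range(n_options):
        if counts[option] == 0:
            return option
    # Phase 2: directed sampling of the least-sampled (most uncertain) option;
    # ties are broken by the lowest option index.
    return min(range(n_options), key=lambda o: (counts[o], o))


history: list[int] = []
for _ in range(12):
    history.append(next_choice(history, n_options=3))
print(history)  # [0, 0, 0, 1, 1, 1, 2, 2, 2, 0, 1, 2]
```

Because this sketch proxies uncertainty only by sample counts, its second phase reduces to cycling through the options. A closer model would track outcome-based uncertainty for each option.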

eLife ◽  
2020 ◽  
Vol 9 ◽  
Author(s):  
Valentina Vellani ◽  
Lianne P de Vries ◽  
Anne Gaule ◽  
Tali Sharot

Humans are motivated to seek information from their environment. How the brain motivates this behavior is unknown. One speculation is that the brain employs neuromodulatory systems implicated in primary reward-seeking, in particular dopamine, to instruct information-seeking. However, there has been no causal test for the role of dopamine in information-seeking. Here, we show that administration of a drug that enhances dopamine function (dihydroxy-L-phenylalanine; L-DOPA) reduces the impact of valence on information-seeking. Specifically, while participants under placebo sought more information about potential gains than losses, under L-DOPA this difference was not observed. The results provide new insight into the neurobiology of information-seeking and generate the prediction that abnormal dopaminergic function (such as in Parkinson’s disease) will result in valence-dependent changes to information-seeking.


2019 ◽  
Author(s):  
Hashem Sadeghiyeh ◽  
Siyu Wang ◽  
Maxwell R. Alberhasky ◽  
Hannah M. Kyllo ◽  
Amitai Shenhav ◽  
...  

The explore-exploit dilemma describes the trade-off that occurs any time we must choose between exploring unknown options and exploiting options we know well. Implicit in this trade-off is how we value future rewards: exploiting is usually better in the short term, but in the longer term the benefits of exploration can be huge. Thus, in theory there should be a tight connection between how much people value future rewards, i.e. how much they discount future rewards relative to immediate rewards, and how likely they are to explore, with less ‘temporal discounting’ associated with more exploration. By measuring individual differences in temporal discounting and correlating them with explore-exploit behavior, we tested whether this theoretical prediction holds in practice. We used the 27-item Delay-Discounting Questionnaire to estimate temporal discounting and the Horizon Task to quantify two strategies of explore-exploit behavior: directed exploration, where information drives exploration by choice, and random exploration, where behavioral variability drives exploration by chance. We find a clear correlation between temporal discounting and directed exploration, with more temporal discounting associated with less directed exploration. By contrast, we find no relationship between temporal discounting and random exploration. Unexpectedly, the relationship with directed exploration appears to be driven by a correlation between temporal discounting and uncertainty seeking at short time horizons, rather than information seeking at long horizons. Taken together, our results suggest a nuanced relationship between temporal discounting and explore-exploit behavior that may be mediated by multiple factors.
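The sketch below makes the two quantities being related more concrete. It is a minimal illustration under stated assumptions. It uses the standard hyperbolic discounting form V = A / (1 + kD), which is how questionnaire-based discount rates k are usually defined. It also uses a logistic choice rule commonly applied to the Horizon Task, in which an information bonus captures directed exploration and a noise term captures random exploration. The function and parameter names are hypothetical, and the exact models fitted in the study may differ.

```python
import math


def hyperbolic_value(amount: float, delay: float, k: float) -> float:
    """Present value of a reward delayed by `delay` units: V = A / (1 + k * D).

    A larger discount rate k means steeper temporal discounting.
    """
    return amount / (1.0 + k * delay)


def p_choose_right(delta_r: float, delta_i: float, info_bonus: float,
                   noise: float, bias: float = 0.0) -> float:
    """Logistic choice rule separating directed and random exploration (assumed form).

    delta_r:    mean observed reward of the right option minus the left option
    delta_i:    +1 if the right option is the less-sampled (more informative) one,
                -1 if the left one is, 0 if both were sampled equally
    info_bonus: weight on information; larger means more directed exploration
    noise:      decision noise (> 0); larger means more random exploration
    """
    x = (delta_r + info_bonus * delta_i + bias) / noise
    # Numerically stable logistic: avoid overflow in exp() for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


# Hypothetical numbers, for illustration only.
print(hyperbolic_value(100.0, 12.0, 0.25))  # 25.0
print(p_choose_right(0.0, 0.0, info_bonus=5.0, noise=2.0))  # 0.5
```

The prediction tested in the study implies that a participant with a larger fitted k would show a smaller `info_bonus`. The abstract reports this association for directed exploration, but not for the noise term.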


2021 ◽  
Author(s):  
Vasilios Pallikaras ◽  
Francis Carter ◽  
David Natanael Velázquez Martínez ◽  
Andreas Arvanitogiannis ◽  
Peter Shizgal

Background: Optogenetic experiments reveal functional roles of specific neurons. However, such inferences have been restricted by widespread adoption of a fixed set of stimulation parameters. Broader exploration of the parameter space can deepen insight into the mapping between selective neural activity and behavior. In this way, characteristics of the activated neurons, such as temporal integration, can be inferred.
Objective: To determine whether an equal-energy principle accounts for the interaction of pulse duration and optical power in optogenetic excitation.
Methods: Six male TH::Cre rats worked for optogenetic (channelrhodopsin-2) stimulation of ventral tegmental area dopamine neurons. We used a within-subject design to describe the trade-off between pulse duration and optical power in determining reward seeking. Parameters were customized for each subject on the basis of behavioral effectiveness.
Results: Within a useful range of powers (~12.6–31.6 mW), the product of optical power and pulse duration required to produce a given level of reward seeking was roughly constant. Such reciprocity is consistent with Bloch’s law, which posits an equal-energy principle of temporal summation over short durations in human vision. The trade-off between pulse duration and power broke down at higher powers.
Conclusions: Optical power can be substituted for pulse duration to scale the region of neuronal excitation in behavioral optogenetic experiments. Power and duration can be adjusted reciprocally for brief durations and lower powers. The findings demonstrate the utility of within-subject and trade-off designs in optogenetics and of parameter adjustment based on functional endpoints instead of physical properties of the stimulation.
Highlights:
- We provide behaviorally derived intensity-duration curves for channelrhodopsin-2.
- Duration trades off almost perfectly with power within useful ranges.
- This trade-off breaks down at high optical powers.
- Pulse duration and optical power scale the area of neuronal excitation equivalently.
- Behaviorally derived trade-offs can reveal optogenetic excitation mechanisms.
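The equal-energy relation reported in the Results can be written as P · d ≈ E. Here P is optical power, d is pulse duration, and E is a criterion energy that produces a given level of reward seeking. The sketch below computes the duration needed at a given power under that reciprocity and flags powers outside the range where it held. The only value taken from the abstract is the 12.6–31.6 mW range. The criterion energy is subject-specific, so the 50 µJ used here is purely hypothetical, as are the function names. The abstract also restricts reciprocity to brief durations, which this sketch does not check.

```python
# Range of optical powers over which reciprocity held, as reported in the abstract.
RECIPROCITY_RANGE_MW = (12.6, 31.6)


def equal_energy_duration(power_mw: float, criterion_energy_uj: float) -> float:
    """Pulse duration (ms) that delivers a fixed criterion energy at a given power.

    Bloch-style reciprocity: power * duration = constant, so duration = energy / power.
    Units: 1 mW * 1 ms = 1 microjoule (uJ).
    """
    if power_mw <= 0:
        raise ValueError("power must be positive")
    return criterion_energy_uj / power_mw


def within_reciprocity_range(power_mw: float) -> bool:
    """True if `power_mw` lies inside the range where the trade-off was found to hold."""
    lo, hi = RECIPROCITY_RANGE_MW
    return lo <= power_mw <= hi


# Hypothetical criterion energy of 50 uJ (not a value from the study).
for p in (20.0, 25.0, 40.0):
    d = equal_energy_duration(p, 50.0)
    note = "" if within_reciprocity_range(p) else "  (outside tested range; reciprocity may fail)"
    print(f"{p} mW -> {d} ms{note}")
```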


2018 ◽  
Vol 35 (2) ◽  
pp. 529-564 ◽  
Author(s):  
Jinglai Wu ◽  
Zhen Luo ◽  
Nong Zhang ◽  
Wei Gao

Purpose: This paper studies sampling methods (or designs of experiments), which strongly influence the performance of surrogate models. To improve the adaptability of modelling, a new sequential sampling method, termed the sequential Chebyshev sampling method (SCSM), is proposed.
Design/methodology/approach: High-order polynomials are used to construct the global surrogate model, retaining the advantages of traditional low-order polynomial models while overcoming their lack of accuracy. First, the zeros of Chebyshev polynomials of the highest allowable order are used as sampling candidates, to improve the stability and accuracy of the high-order polynomial model. Second, initial sampling points are selected from the candidates using a coordinate alternation algorithm, which keeps the initial sampling set uniformly distributed. Third, a fast sequential sampling scheme based on the space-filling principle collects further samples from the candidates, and the order of the polynomial model is updated during this procedure. Once sequential sampling terminates, the final surrogate model is the polynomial with the largest adjusted R-squared.
Findings: The SCSM performs better in efficiency, accuracy and stability than several popular sequential sampling methods, e.g. the LOLA-Voronoi algorithm and the global Monte Carlo method from the SED toolbox, and than the Halton sequence.
Originality/value: The SCSM builds high-order surrogate models with high stability and accuracy, which may save considerable cost in solving complicated engineering design or optimisation problems.
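The three SCSM steps can be illustrated in one dimension. This is a simplified sketch under assumptions, not the authors' implementation. The candidates are the zeros x_k = cos((2k − 1)π / (2n)), k = 1, …, n, of a single Chebyshev polynomial T_n mapped to an interval. A greedy maximin (farthest-point) rule stands in both for the coordinate alternation algorithm used for the initial set and for the paper's space-filling criterion. The stopping rule is a fixed sample budget. All function names and default values are hypothetical. NumPy is required.

```python
import numpy as np


def chebyshev_candidates(n: int, lo: float = -1.0, hi: float = 1.0) -> np.ndarray:
    """Zeros of T_n, x_k = cos((2k - 1) * pi / (2n)), mapped from [-1, 1] to [lo, hi]."""
    k = np.arange(1, n + 1)
    x = np.cos((2 * k - 1) * np.pi / (2 * n))
    return np.sort(lo + (hi - lo) * (x + 1.0) / 2.0)


def maximin_pick(candidates: np.ndarray, chosen: list[int]) -> int:
    """Index of the unused candidate farthest from every already-chosen point."""
    unused = [i for i in range(len(candidates)) if i not in chosen]
    if not chosen:
        return unused[len(unused) // 2]  # start near the middle of the interval
    gaps = [min(abs(candidates[i] - candidates[j]) for j in chosen) for i in unused]
    return unused[int(np.argmax(gaps))]


def adjusted_r2(y: np.ndarray, y_hat: np.ndarray, degree: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1), with p = degree (1-D case).

    Assumes y is not constant (otherwise the total sum of squares is zero).
    """
    n = len(y)
    ss_res = float(np.sum((y - y_hat) ** 2))
    ss_tot = float(np.sum((y - np.mean(y)) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - degree - 1)


def best_polynomial(x: np.ndarray, y: np.ndarray, max_degree: int):
    """Fit degrees 1..max_degree that the data allow; keep the largest adjusted R^2.

    Returns (score, degree, coefficients), or None if there are too few points.
    """
    best = None
    for d in range(1, max_degree + 1):
        if len(x) <= d + 1:  # adjusted R^2 needs n > d + 1
            break
        coeffs = np.polyfit(x, y, d)
        score = adjusted_r2(y, np.polyval(coeffs, x), d)
        if best is None or score > best[0]:
            best = (score, d, coeffs)
    return best


def scsm_sketch(f, lo, hi, n_candidates=20, n_initial=4, n_total=10, max_degree=8):
    """Toy 1-D loop: Chebyshev candidates, initial set, then sequential additions."""
    cands = chebyshev_candidates(n_candidates, lo, hi)
    chosen: list[int] = []
    for _ in range(n_initial):  # maximin stands in for coordinate alternation
        chosen.append(maximin_pick(cands, chosen))
    while True:
        x = cands[chosen]
        best = best_polynomial(x, f(x), max_degree)  # order re-selected at each step
        if len(chosen) >= n_total:
            return best
        chosen.append(maximin_pick(cands, chosen))  # space-filling addition


score, degree, coeffs = scsm_sketch(lambda x: x**3 - x, -1.0, 1.0)
print(degree, round(score, 6))
```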


2019 ◽  
Author(s):  
Danielle Navarro ◽  
Ben R Newell ◽  
Christin Schulze

How do people solve the explore–exploit trade-off in a changing environment? In this paper we present experimental evidence from an “observe or bet” task, in which people have to determine when to engage in information-seeking behavior and when to switch to reward-taking actions. In particular, we focus on the comparison between people’s behavior in a changing environment and their behavior in an unchanging one. Our experimental work is motivated by a rational analysis of the problem that makes strong predictions about information search and reward seeking in static and changeable environments. Our results show striking agreement between human behavior and the optimal policy, but also highlight a number of systematic differences. In particular, we find that while people often employ suboptimal strategies the first time they encounter the learning problem, most people are able to approximate the correct strategy after minimal experience. To describe how people’s choices both resemble and differ slightly from an optimal standard, we introduce four process models for the observe-or-bet task and evaluate them as potential theories of human behavior.
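A small simulation can make the structure of the observe-or-bet task concrete. This is a hedged sketch resting on several assumptions. The task encoding is assumed here: observing shows which light came on but earns nothing, and betting earns a point when correct while keeping its outcome hidden. Only the static environment is simulated, and the decision rule is a simple evidence threshold chosen for illustration. The rule, the function names and the `threshold` parameter are hypothetical. This is neither the optimal policy nor any of the four process models in the paper.

```python
import random


def threshold_policy(n_left: int, n_right: int, threshold: int) -> str:
    """Observe until one light leads by `threshold` observations, then bet on it.

    Illustrative heuristic only (assumed); not one of the paper's process models.
    """
    diff = n_left - n_right
    if abs(diff) < threshold:
        return "observe"
    return "bet_left" if diff > 0 else "bet_right"


def simulate_static(p_left: float, n_trials: int, threshold: int,
                    seed: int = 0) -> tuple[int, int]:
    """Run one static-environment game; return (observations made, points won).

    Observing reveals which light came on but earns nothing; a bet earns a point
    if correct, but its outcome is not shown, so it carries no information.
    """
    rng = random.Random(seed)
    n_left = n_right = observations = points = 0
    for _ in range(n_trials):
        light_left = rng.random() < p_left  # which light turns on this trial
        action = threshold_policy(n_left, n_right, threshold)
        if action == "observe":
            observations += 1
            if light_left:
                n_left += 1
            else:
                n_right += 1
        elif (action == "bet_left") == light_left:  # correct bet
            points += 1
    return observations, points


print(simulate_static(p_left=0.75, n_trials=100, threshold=3, seed=1))
```

Because bets reveal nothing in this encoding, the rule never returns to observing once it starts betting. That only makes sense in a static environment. In a changing one, the counts would go stale and renewed observation would be needed.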


2021 ◽  
Author(s):  
Aaron L Wong ◽  
Audrey L Green ◽  
Mitchell W Isaacs

When faced with multiple potential movement options, individuals either reach directly to one of the options or initiate a reach intermediate between the options. It remains unclear why people generate these two types of behaviors. Using the go-before-you-know task (commonly used to study behavior under choice uncertainty), we examined two key questions. First, do these two types of responses reflect distinct movement strategies, or are they simply examples of a more general response to choice uncertainty? If the former, the relative desirability (i.e., weighing the likelihood of successfully hitting the target versus the attainable reward) of the two target options might be computed differently for direct versus intermediate reaches. We showed that, indeed, when exogenous reward and success likelihood (i.e., endogenous reward) differ between the two options, direct reaches were more strongly biased by likelihood whereas intermediate movements were more strongly biased by reward. Second, what drives individual differences in how people respond under uncertainty? We found that risk/reward-seeking individuals generated a larger proportion of intermediate reaches and were more sensitive to trial-to-trial changes in reward, suggesting these movements reflect a strategy to maximize reward. In contrast, risk-averse individuals tended to generate more direct reaches in an attempt to maximize success. Together, these findings suggest that when faced with choice uncertainty, individuals adopt movement strategies consistent with their risk/reward-seeking tendency, preferentially biasing behavior toward exogenous rewards or endogenous success and consequently modulating the relative desirability of the available options.


2019 ◽  
Vol 42 ◽  
Author(s):  
Maya Zhe Wang ◽  
Benjamin Y. Hayden

Information seeking, especially when motivated by strategic learning and intrinsic curiosity, could render the new mechanism “incentive hope” proposed by Anselme & Güntürkün sufficient, but not necessary, to explain how reward uncertainty promotes reward seeking and consumption. Naturalistic and foraging-like tasks can help parse motivational processes that bridge learning and foraging behaviors and identify their neural underpinnings.


2017 ◽  
Vol 44 (5) ◽  
pp. 1052-1067 ◽  
Author(s):  
Chen Wang ◽  
Yanliu Huang

This research examines how incidentally induced consumer curiosity influences subsequent indulgent decisions. Prior research has primarily focused on the effect of curiosity on information seeking in the present domain. The current research goes further, proposing that the curiosity effect can spill over and prompt consumers to prefer indulgent options in other, unrelated domains (e.g., food, money). This is likely to occur because curiosity motivates individuals to seek the missing information as a specific informational reward in the current domain. This desire to obtain the information reward primes a reward-seeking goal, which in turn increases preferences for indulgent options in subsequent, unrelated domains. Furthermore, the impact of curiosity on indulgent options possesses the goal-priming properties identified in the literature. That is, the effect should (1) persist after a time delay, and (2) diminish when the reward-seeking goal is satiated by obtaining a reward before the indulgent task. We conduct a series of studies to provide support for our hypotheses. This research contributes to both the curiosity and the indulgent decision-making literatures and offers important practical implications.


2018 ◽  
Author(s):  
Charles Findling ◽  
Vasilisa Skvortsova ◽  
Rémi Dromnelle ◽  
Stefano Palminteri ◽  
Valentin Wyart

When learning the value of actions in volatile environments, humans often make seemingly irrational decisions that fail to maximize expected value. We reasoned that these ‘non-greedy’ decisions, instead of reflecting information seeking during choice, may be caused by computational noise in the learning of action values. Here, using reinforcement learning (RL) models of behavior and multimodal neurophysiological data, we show that the majority of non-greedy decisions stem from this learning noise. The trial-to-trial variability of sequential learning steps and their impact on behavior could be predicted both by BOLD responses to obtained rewards in the dorsal anterior cingulate cortex (dACC) and by phasic pupillary dilation, suggestive of neuromodulatory fluctuations driven by the locus coeruleus-norepinephrine (LC-NE) system. Together, these findings indicate that most behavioral variability, rather than reflecting human exploration, is due to the limited computational precision of reward-guided learning.
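A toy two-armed bandit can illustrate the difference between exploration at choice time and noise during learning. This is a minimal sketch, not the authors' model. The learner always chooses greedily, with no softmax, so all variability enters through noise added to each delta-rule update. The noise standard deviation grows with the size of the update, a Weber-like scaling assumed here. A noise-free copy of the values, updated on the same outcomes, defines which choice would have been greedy. The function names, reward probabilities and the parameters `alpha` (learning rate) and `zeta` (noise scale) are hypothetical.

```python
import random


def noisy_update(q: float, reward: float, alpha: float, zeta: float,
                 rng: random.Random) -> float:
    """One delta-rule update with learning noise (assumed form).

    Noise s.d. is proportional to the size of the exact update (zeta * |alpha * delta|);
    zeta = 0 recovers the noise-free learner.
    """
    delta = reward - q  # reward prediction error
    step = alpha * delta
    return q + step + rng.gauss(0.0, zeta * abs(step))


def count_non_greedy(reward_probs: tuple[float, float], n_trials: int,
                     alpha: float, zeta: float, seed: int = 0) -> int:
    """Count choices that differ from what a noise-free learner would pick greedily."""
    rng = random.Random(seed)
    q_noisy = [0.5, 0.5]  # values the agent actually uses
    q_exact = [0.5, 0.5]  # counterfactual noise-free values, same experienced outcomes
    non_greedy = 0
    for _ in range(n_trials):
        choice = 0 if q_noisy[0] >= q_noisy[1] else 1  # greedy on noisy values
        exact_best = 0 if q_exact[0] >= q_exact[1] else 1
        non_greedy += int(choice != exact_best)
        reward = 1.0 if rng.random() < reward_probs[choice] else 0.0
        q_noisy[choice] = noisy_update(q_noisy[choice], reward, alpha, zeta, rng)
        q_exact[choice] = noisy_update(q_exact[choice], reward, alpha, 0.0, rng)
    return non_greedy


# Hypothetical parameters; with zeta = 0 there are no non-greedy choices by construction.
print(count_non_greedy((0.7, 0.3), 200, alpha=0.3, zeta=0.0))  # 0
print(count_non_greedy((0.7, 0.3), 200, alpha=0.3, zeta=0.5))
```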


2019 ◽  
Vol 42 ◽  
Author(s):  
Davood G. Gozli ◽  
Ci Jun Gao

The concepts of want, hope, and exploration cannot be organized in relation to a single type of motive (e.g., motive for food). They require, in addition, the motive for acquiring and maintaining a stable scheme that enables reward-directed activity. Facing unpredictability, the animal has to seek not only reward but also a new equilibrated state within which reward seeking is possible.

