Experience resetting in reinforcement learning facilitates exploration–exploitation transitions during a behavioral task for primates

2021 ◽  
Author(s):  
Kazuhiro Sakamoto ◽  
Hidetake Okuzaki ◽  
Akinori Sato ◽  
Hajime Mushiake

Abstract: The exploration–exploitation trade-off is a fundamental problem in reinforcement learning. To study the neural mechanisms involved in this problem, a target search task in which exploration and exploitation phases appear alternately is useful. Monkeys well trained in this task clearly understand that they have entered the exploratory phase and quickly acquire new experiences by resetting their previous experiences. In this study, we used a simple model to show that experience resetting in the exploratory phase improves performance rather than decreasing the greediness of action selection, and we then present a neural network-type model enabling experience resetting.
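The contrast drawn in this abstract can be illustrated with a toy simulation. What follows is a minimal sketch under stated assumptions, not the authors' model: the function `run_search`, the neutral initial value `q0`, the learning rate `alpha`, and the reset rule (clear all values when an action the agent had learned to value goes unrewarded) are all hypothetical choices made for illustration.

```python
from typing import Sequence


def run_search(targets: Sequence[int], n_actions: int, alpha: float = 0.5,
               reset: bool = True, q0: float = 0.5) -> float:
    """Greedy value learner on a target search task (illustrative only).

    targets[t] is the rewarded action on trial t. Returns the total reward.
    """
    q = [q0] * n_actions
    total = 0.0
    for target in targets:
        # Purely greedy choice; ties go to the lowest-numbered action.
        action = max(range(n_actions), key=lambda i: q[i])
        reward = 1.0 if action == target else 0.0
        # Assumed reset rule: an action valued above the neutral level
        # failed, so the agent infers that the target has moved and
        # discards everything it had learned.
        if reset and reward == 0.0 and q[action] > q0:
            q = [q0] * n_actions
        # Standard delta-rule update of the chosen action's value.
        q[action] += alpha * (reward - q[action])
        total += reward
    return total


if __name__ == "__main__":
    # Target is action 2 for six trials, then moves to action 0.
    schedule = [2] * 6 + [0] * 6
    print("with reset:   ", run_search(schedule, 4, reset=True))
    print("without reset:", run_search(schedule, 4, reset=False))
```

In this sketch both agents stay fully greedy; the only difference is whether stale values are discarded when the target moves, which is the comparison the abstract describes.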

Author(s):  
Hua Zhang ◽  
Youmin Xi

Previous studies on coordinating exploration and exploitation activities have paid much attention to network structures, while the role of actors’ strategic behavior has been largely ignored. In this paper, the authors extend March’s simulation model of parallel problem solving by adding structurally equivalent imitation. This makes it possible to examine how the interaction of network structure with agent behavior affects the knowledge process and ultimately influences group performance. The simulation experiment suggests that, under the condition of a regular network, the classical trade-off between exploration and exploitation will appear in the case of the preferentially attached network when agents adopt structural equivalence imitation. The whole organization would implicitly be divided into independent sub-groups that converge on different performance levels, leading the organization to a lower performance level. The authors also explore performance in the mixed organization and the management implications.


2017 ◽  
Author(s):  
Tommy C. Blanchard ◽  
Samuel J. Gershman

Abstract: Balancing exploration and exploitation is a fundamental problem in reinforcement learning. Previous neuroimaging studies of the exploration-exploitation dilemma could not completely disentangle these two processes, making it difficult to unambiguously identify their neural signatures. We overcome this problem using a task in which subjects can either observe (pure exploration) or bet (pure exploitation). Insula and dorsal anterior cingulate cortex showed significantly greater activity on observe trials compared to bet trials, suggesting that these regions play a role in driving exploration. A model-based analysis of task performance suggested that subjects chose to observe until a critical evidence threshold was reached. We observed a neural signature of this evidence accumulation process in ventromedial prefrontal cortex. These findings support theories positing an important role for anterior cingulate cortex in exploration, while also providing a new perspective on the roles of insula and ventromedial prefrontal cortex.

Significance Statement: Sitting down at a familiar restaurant, you may choose to order an old favorite or sample a new dish. In reinforcement learning theory, this is known as the exploration-exploitation dilemma. The optimal solution is known to be intractable; therefore, humans must use heuristic strategies. Behavioral studies have revealed several candidate strategies, but identifying the neural mechanisms underlying these strategies is complicated because exploration and exploitation are not perfectly dissociable in standard tasks. Using an “observe or bet” task, we identify for the first time pure neural correlates of exploration and exploitation in the human brain.
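The "observe until a critical evidence threshold is reached" strategy can be sketched in a few lines. This is an illustrative simplification, not the authors' fitted model: the function name `observe_until_threshold`, the coding of observations as +1 or -1 for two options, and the use of a simple running sum as the evidence variable are all assumptions.

```python
from typing import Optional, Sequence, Tuple


def observe_until_threshold(observations: Sequence[int],
                            threshold: int) -> Tuple[int, Optional[int]]:
    """Observe outcomes until accumulated evidence reaches the threshold.

    observations: +1 or -1 per observe trial, naming the option that paid off.
    Returns (number of observe trials used, option to bet on), where the
    option is +1 or -1, or None if the threshold was never reached.
    """
    evidence = 0
    for n_observed, outcome in enumerate(observations, start=1):
        evidence += outcome  # accumulate signed evidence
        if abs(evidence) >= threshold:
            # Enough evidence: switch from observing to betting.
            return n_observed, (1 if evidence > 0 else -1)
    return len(observations), None


if __name__ == "__main__":
    print(observe_until_threshold([1, -1, 1, 1, 1], threshold=2))
```

A higher `threshold` lengthens the pure-exploration phase before the first bet. The sketch leaves out anything that would make old evidence go stale, such as a change in the underlying payoffs.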


Entropy ◽  
2021 ◽  
Vol 23 (8) ◽  
pp. 977
Author(s):  
Mohammad Majid al-Rifaie

The trade-off between exploration and exploitation is one of the key challenges in evolutionary and swarm optimisers, which are led by guided and stochastic search. This work investigates the exploration and exploitation balance in a minimalist swarm optimiser in order to offer insights into the population’s behaviour. The minimalist and vector-stripped nature of the algorithm (dispersive flies optimisation, or DFO) reduces the challenges of understanding particles’ oscillation around constantly changing centres, their influence on one another, and their trajectory. The aim is to examine the population’s dimensional behaviour in each iteration and each defined exploration-exploitation zone, and to subsequently offer improvements to the working of the optimiser. The derived variants, titled unified DFO or uDFO, are successfully applied to an extensive set of test functions, as well as high-dimensional tomographic reconstruction, which is an important inverse problem in medical and industrial imaging.
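For readers unfamiliar with DFO, the sketch below shows the basic update as it is commonly described: each fly moves around its best ring neighbour, pulled toward the swarm's best fly, with an occasional random restart per dimension. It is an assumption-laden illustration, not the uDFO variants derived in the paper. The function name `dfo_minimise`, the synchronous update, the clipping to bounds, and the default parameter values are choices made here.

```python
import random
from typing import Callable, List, Tuple


def dfo_minimise(fitness: Callable[[List[float]], float], dim: int,
                 lower: float, upper: float, n_flies: int = 20,
                 iterations: int = 100, delta: float = 0.001,
                 seed: int = 0) -> Tuple[float, float]:
    """Minimise `fitness`; returns (initial best, final best) fitness."""
    rng = random.Random(seed)
    flies = [[rng.uniform(lower, upper) for _ in range(dim)]
             for _ in range(n_flies)]
    initial_best = min(fitness(f) for f in flies)
    for _ in range(iterations):
        scores = [fitness(f) for f in flies]
        best = min(range(n_flies), key=scores.__getitem__)
        new_flies = [f[:] for f in flies]
        for i in range(n_flies):
            if i == best:
                continue  # elitism: the swarm's best fly is left unchanged
            left, right = (i - 1) % n_flies, (i + 1) % n_flies
            neighbour = left if scores[left] < scores[right] else right
            for d in range(dim):
                if rng.random() < delta:
                    # Disturbance: restart this dimension (exploration).
                    new_flies[i][d] = rng.uniform(lower, upper)
                else:
                    # Centre on the best neighbour, spread by the distance
                    # to the swarm's best fly (exploitation).
                    step = rng.random() * (flies[best][d] - flies[i][d])
                    moved = flies[neighbour][d] + step
                    new_flies[i][d] = min(upper, max(lower, moved))
        flies = new_flies
    return initial_best, min(fitness(f) for f in flies)


if __name__ == "__main__":
    sphere = lambda x: sum(v * v for v in x)
    print(dfo_minimise(sphere, dim=3, lower=-5.0, upper=5.0))
```

In this sketch the disturbance threshold `delta` is the only explicit exploration control: raising it restarts more dimensions at random, while lowering it lets the swarm contract around its current centres.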


2019 ◽  
Vol 25 (7) ◽  
pp. 1515-1536 ◽  
Author(s):  
Pierluigi Rippa ◽  
Cristina Ponsiglione ◽  
Anca Bocanet ◽  
Guido Capaldo ◽  
Giuseppe Zollo

Purpose: The purpose of this paper is to contribute to the debate on the exploration–exploitation trade-off in the context of new venture creation, where, particularly at the empirical level, there is a limited understanding of whether and how this trade-off is achieved and how start-ups’ performance is affected by the way in which they face the exploration–exploitation dilemma. Design/methodology/approach: A qualitative case study approach has been adopted as a methodology to conduct the research. Six Italian innovative start-ups were selected and analyzed through in-depth interviews with founders and data collection to understand whether and how start-ups adopt exploration and exploitation solutions to face critical events in their business lives. Findings: The most evident result of this study is that start-ups more frequently adopt a temporal separation of exploration and exploitation activities as the preferred mode for balancing learning and innovation tension. They do not seem to exhibit a defined or a common path in the way they realize the temporal separation between exploration and exploitation. Instead, they mostly oscillate. The ambidextrous solution is selected in only a few cases and not consecutively. The pre-entry knowledge profile seems to influence the choice of start-ups at the beginning of their lives. Practical implications: This research has implications for the whole start-up ecosystem, comprising incubators/accelerators, advisors, intermediaries, venture capitalists, new venture founders and policymakers. For example, by knowing the typology of knowledge and competence gaps start-ups usually aim to fill when they face particular events, intermediaries (such as incubators) could better plan initiatives and strategies supporting new ventures in the process of growth and stabilization. Furthermore, venture capitalists can benefit from this research by planning specific interventions for each critical event based on specific resource and competency gaps, and by guiding more promising start-ups. Originality/value: This paper presents a novel application of the entrepreneurial learning approach in the context of new venture creation. To reach this aim, a classification of exploration/exploitation solutions has been developed.


2018 ◽  
Author(s):  
Ke Sang ◽  
Peter Martin Todd ◽  
Robert Goldstone ◽  
Thomas T. Hills

How, and how well, do people switch between exploration and exploitation to search for and accumulate resources? We study the decision processes underlying such exploration/exploitation tradeoffs by using a novel card selection task. With experience, participants learn to switch appropriately between exploration and exploitation and approach optimal performance. We model participants’ behavior on this task with random, threshold, and sampling strategies, and find that a linear decreasing threshold rule best fits participants’ results. Further evidence that participants use decreasing threshold-based strategies comes from reaction time differences between exploration and exploitation; however, participants themselves report non-decreasing thresholds. Decreasing threshold strategies that “front-load” exploration and switch quickly to exploitation are particularly effective in resource accumulation tasks, in contrast to optimal stopping problems like the Secretary Problem requiring longer exploration.
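A linear decreasing threshold rule of the kind described here can be sketched as follows. The task details are assumptions made for illustration, not the authors' card selection task: the function name `accumulate_with_threshold`, the payoff scheme (each explored card pays its face value once, and the kept card pays again on every remaining turn), and the parameters `start` and `slope` are all hypothetical.

```python
from typing import Optional, Sequence, Tuple


def accumulate_with_threshold(values: Sequence[int], start: int,
                              slope: int) -> Tuple[Optional[int], int]:
    """Explore until a card meets a linearly decreasing threshold.

    values[t] is the card that would be drawn on turn t if still exploring.
    The threshold on turn t is start - slope * t.
    Returns (turn of the switch to exploitation or None, total resources).
    """
    total = 0
    kept: Optional[int] = None
    switch_turn: Optional[int] = None
    for turn, value in enumerate(values):
        if kept is None:
            total += value  # exploring: collect the newly drawn card
            if value >= start - slope * turn:
                kept = value  # good enough for this turn: stop exploring
                switch_turn = turn
        else:
            total += kept  # exploiting: collect the kept card again
    return switch_turn, total


if __name__ == "__main__":
    print(accumulate_with_threshold([50, 60, 70, 80, 10, 10], start=90, slope=10))
```

Because the threshold falls on every turn, a card that would be rejected early can be accepted later, which captures how such a rule front-loads exploration and then switches quickly to exploitation.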


2021 ◽  
Author(s):  
Jeffrey Cockburn ◽  
Vincent Man ◽  
William A Cunningham ◽  
John P O'Doherty

Recent evidence suggests that both novelty and uncertainty act as potent features guiding exploration. However, these variables are often conflated with each other experimentally, and an understanding of how these attributes interact to regulate the balance between exploration and exploitation has proved elusive. Using a novel task designed to decouple stimulus novelty and estimation uncertainty, we identify separable behavioral and neural mechanisms by which exploration is colored. We show that uncertainty was avoided except when the information gained through exploration could be reliably exploited in the future. In contrast, and contrary to existing theory, novel options grew increasingly attractive relative to familiar counterparts irrespective of the opportunity to leverage their consequences and despite the uncertainty inherent to novel options. These findings led us to develop a formal computational framework in which uncertainty directed choice adapts to the prospective utility of exploration, while novel stimuli persistently draw favor as a result of inflated reward expectations biasing an exploitative strategy. Crucially, novelty is proposed to actively modulate uncertainty processing, effectively blunting the influence of uncertainty in shaping the subjective utility ascribed to novel stimuli. Both behavioral data and fMRI activity sampled from the ventromedial prefrontal cortex, frontopolar cortex and ventral striatum validate this model, thereby establishing a computational account that can not only explain behavior but also shed light on the functional contribution of these key brain regions to the exploration/exploitation trade-off. Our results point to multiple strategies and neural substrates charged with balancing the explore/exploit dilemma, with each targeting distinct aspects of the decision problem to foster a manageable decomposition of an otherwise intractable task.


1986 ◽  
Vol 55 (4) ◽  
pp. 696-714 ◽  
Author(s):  
J. van der Steen ◽  
I. S. Russell ◽  
G. O. James

We studied the effects of unilateral frontal eye-field (FEF) lesions on eye-head coordination in monkeys that were trained to perform a visual search task. Eye and head movements were recorded with the scleral search coil technique using phase angle detection in a homogeneous electromagnetic field. In the visual search task all three animals showed a neglect for stimuli presented in the field contralateral to the lesion. In two animals the neglect disappeared within 2-3 wk. One animal had a lasting deficit. We found that FEF lesions that are restricted to area 8 cause only temporary deficits in eye and head movements. Up to a week after the lesion the animals had a strong preference to direct gaze and head to the side ipsilateral to the lesion. Animals tracked objects in contralateral space with combined eye and head movements, but failed to do this with the eyes alone. It was found that within a few days after the lesion, eye and head movements in the direction of the target were initiated, but they were inadequate and had long latencies. Within 1 wk latencies had regained preoperative values. Parallel with the recovery on the behavioral task, head movements became more prominent than before the lesion. Four weeks after the lesion, peak velocity of the head movement had increased by a factor of two, whereas the duration showed a twofold decrease compared with head movements before the lesion. No effects were seen on the duration and peak velocity of gaze. After the recovery on the behavioral task had stabilized, a relative neglect in the hemifield contralateral to the lesion could still be demonstrated by simultaneously presenting two stimuli in the left and right visual hemifields. The neglect is not due to a sensory deficit, but to a disorder of programming. The recovery from unilateral neglect after a FEF lesion is the result of a different orienting behavior, in which head movements become more important. It is concluded that the FEF plays an important role in the organization and coordination of eye and head movements and that lesions of this area result in subtle but permanent changes in eye-head coordination.


2021 ◽  
Author(s):  
Alina Ferecatu ◽  
Arnaud De Bruyn

This paper develops a learning model to describe decision makers' exploration/exploitation trade-offs and their link to psychometric traits.

