Trade-Off Between Exploration and Exploitation

Author(s):  
Thomas T Hills
2020 ◽  
Author(s):  
Robert C Wilson ◽  
Elizabeth Bonawitz ◽  
Vincent Costa ◽  
Becket Ebitz

Explore-exploit decisions require us to trade off the benefits of exploring unknown options, to learn more about them, against the benefits of exploiting known options for immediate reward. Such decisions are ubiquitous in nature, but from a computational perspective they are notoriously hard. There is therefore much interest in how humans and animals make these decisions, and recently there has been an explosion of research in this area. Here we provide a biased and incomplete snapshot of this field, focusing on the major finding that many organisms use two distinct strategies to solve the explore-exploit dilemma: a bias for information ('directed exploration') and the randomization of choice ('random exploration'). We review evidence for the existence of these strategies, their computational properties, and their neural implementations, as well as how directed and random exploration vary over the lifespan. We conclude by highlighting open questions in this field that are ripe to both explore and exploit.
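
As a rough illustration of the two strategies named in this abstract, the sketch below assumes a simple multi-armed bandit setting. The form of the information bonus (inverse square root of the visit count) and the softmax temperature are common modelling choices assumed here, not details taken from the review, and all function and parameter names are hypothetical.

```python
import math
import random


def directed_values(means, counts, bonus_weight=1.0):
    """Directed exploration: add an information bonus to each option.

    The bonus shrinks as an option is sampled more often (assumed form).
    """
    return [m + bonus_weight / math.sqrt(n + 1) for m, n in zip(means, counts)]


def softmax_probs(values, temperature=1.0):
    """Random exploration: a higher temperature gives noisier choices."""
    top = max(values)  # subtract the maximum for numerical stability
    weights = [math.exp((v - top) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def choose(means, counts, bonus_weight=1.0, temperature=1.0, rng=random):
    """Combine both strategies: bonus-adjusted values, then a noisy choice."""
    values = directed_values(means, counts, bonus_weight)
    probs = softmax_probs(values, temperature)
    return rng.choices(range(len(values)), weights=probs, k=1)[0]
```

In this sketch, raising `bonus_weight` shifts choices toward rarely sampled options, while raising `temperature` makes choices more uniform regardless of how much is known about each option.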


Author(s):  
Hua Zhang ◽  
Youmin Xi

In previous studies on coordinating exploration-exploitation activities, much attention has been paid to network structures, while the role played by actors' strategic behavior has been largely ignored. In this paper, the authors extend March's simulation model of parallel problem solving by adding structurally equivalent imitation. In this way, one can examine how the interaction of network structure with agent behavior affects the knowledge process and ultimately influences group performance. The simulation experiment suggests that, under the condition of a regular network, the classical trade-off between exploration and exploitation will appear in the case of the preferentially attached network when agents adopt structural equivalence imitation. The whole organization is implicitly divided into independent sub-groups that converge on different performance levels and lead the organization to a lower performance level. The authors also explore performance in the mixed organization and the management implications.


2021 ◽  
Author(s):  
Kazuhiro Sakamoto ◽  
Hidetake Okuzaki ◽  
Akinori Sato ◽  
Hajime Mushiake

The exploration–exploitation trade-off is a fundamental problem in reinforcement learning. To study the neural mechanisms involved in this problem, a target search task in which exploration and exploitation phases appear alternately is useful. Monkeys well trained in this task clearly understand that they have entered the exploratory phase and quickly acquire new experiences by resetting their previous experiences. In this study, we used a simple model to show that experience resetting in the exploratory phase improves performance rather than decreasing the greediness of action selection, and we then present a neural network-type model enabling experience resetting.
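
The idea of experience resetting can be sketched with a toy value-learning agent. This is not the authors' model: the update rule, the neutral initial value of 0.5, the purely greedy choice rule, and every name below are assumptions made only for illustration. The sketch compares a greedy agent with and without a reset; it does not model the alternative of reducing greediness.

```python
def update(values, action, reward, lr=0.5):
    """Move the chosen action's estimate toward the observed reward."""
    new = list(values)
    new[action] += lr * (reward - new[action])
    return new


def reset(values, initial=0.5):
    """Experience resetting: discard learned estimates, restore a neutral prior."""
    return [initial] * len(values)


def greedy(values):
    """Pick the highest-valued action (the lowest index wins ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def trials_to_find(values, target, do_reset, lr=0.5, initial=0.5, max_trials=100):
    """Count greedy choices needed to hit a new target after a phase switch."""
    vals = reset(values, initial) if do_reset else list(values)
    for trial in range(1, max_trials + 1):
        action = greedy(vals)
        if action == target:
            return trial
        vals = update(vals, action, 0.0, lr)  # a miss yields zero reward
    return max_trials


# Estimates after exploiting target 0; the other targets were never tried.
learned = [1.0, 0.5, 0.5, 0.5]
with_reset = trials_to_find(learned, target=3, do_reset=True)
without_reset = trials_to_find(learned, target=3, do_reset=False)
```

Without a reset, the agent in this toy setting first has to unlearn its preference for the old target before it tries the others, so it needs more trials to find the new one.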

