Explore/exploit tradeoff strategies in a resource accumulation search task

Author(s):  
Ke Sang ◽  
Peter Martin Todd ◽  
Robert Goldstone ◽  
Thomas T. Hills

How, and how well, do people switch between exploration and exploitation to search for and accumulate resources? We study the decision processes underlying such exploration/exploitation tradeoffs by using a novel card selection task. With experience, participants learn to switch appropriately between exploration and exploitation and approach optimal performance. We model participants’ behavior on this task with random, threshold, and sampling strategies, and find that a linear decreasing threshold rule best fits participants’ results. Further evidence that participants use decreasing threshold-based strategies comes from reaction time differences between exploration and exploitation; however, participants themselves report non-decreasing thresholds. Decreasing threshold strategies that “front-load” exploration and switch quickly to exploitation are particularly effective in resource accumulation tasks, in contrast to optimal stopping problems like the Secretary Problem requiring longer exploration.
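To make the idea of a linearly decreasing threshold rule concrete, here is a minimal Python sketch. The toy task is an assumption for illustration and is not the card selection task used in the study: on each turn the agent either explores (draws a new card valued Uniform(0, 100) and earns it) or exploits (earns the best value seen so far again). The function names, the value range, and the start and end thresholds are all hypothetical.

```python
import random


def linear_threshold(t, horizon, start, end):
    """Threshold falling linearly from `start` at t=0 to `end` at t=horizon-1."""
    if horizon <= 1:
        return start
    return start + (end - start) * t / (horizon - 1)


def play_episode(horizon, start, end, rng):
    """Play one episode of the toy accumulation task (assumed, not the paper's).

    The agent explores while the best value seen so far is below the
    current threshold; otherwise it exploits that best value.
    Returns (total reward, number of exploration turns).
    """
    best = None
    total = 0.0
    explored = 0
    for t in range(horizon):
        if best is None or best < linear_threshold(t, horizon, start, end):
            card = rng.uniform(0, 100)  # explore: reveal and earn a new card
            best = card if best is None else max(best, card)
            total += card
            explored += 1
        else:
            total += best  # exploit: earn the best known value again
    return total, explored


def mean_score(horizon, start, end, episodes=5000, seed=0):
    """Average total reward over many simulated episodes."""
    rng = random.Random(seed)
    scores = (play_episode(horizon, start, end, rng)[0] for _ in range(episodes))
    return sum(scores) / episodes
```

Calling `mean_score(20, 90, 50)` and `mean_score(20, 0, 0)` compares a threshold that starts high and decays against an agent that settles on its first card. Under these assumptions a high early threshold concentrates exploration in the first few turns, which is the "front-loading" pattern the abstract describes.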

2021 ◽  
Author(s):  
Kazuhiro Sakamoto ◽  
Hidetake Okuzaki ◽  
Akinori Sato ◽  
Hajime Mushiake

Abstract The exploration–exploitation trade-off is a fundamental problem in reinforcement learning. To study the neural mechanisms involved in this problem, a target search task in which exploration and exploitation phases appear alternately is useful. Monkeys well trained in this task clearly understand that they have entered the exploratory phase and quickly acquire new experiences by resetting their previous experiences. In this study, we used a simple model to show that experience resetting in the exploratory phase improves performance rather than decreasing the greediness of action selection, and we then present a neural network-type model enabling experience resetting.


2021 ◽  
Author(s):  
Alina Ferecatu ◽  
Arnaud De Bruyn

This paper develops a learning model to describe decision makers' exploration/exploitation trade-offs and their link to psychometric traits.


2019 ◽  
Author(s):  
Nathaniel J. Blanco ◽  
Vladimir Sloutsky

Organisms need to constantly balance the competing demands of gathering information and using previously acquired information to obtain rewarding outcomes (i.e., the “exploration-exploitation” dilemma). Exploration is critical to obtain information to discover how the world works, which should be particularly important for young children. While studies have shown that young children explore in response to surprising events, little is known about how they balance exploration and exploitation across multiple decisions or about how this process changes with development. In this study we compare decision-making patterns of children and adults and evaluate the relative influences of reward-seeking, random exploration, and systematic switching (which approximates uncertainty-directed exploration). In a second experiment we directly test the effect of uncertainty on children’s choices. Influential models of decision-making generally describe systematic exploration as a computationally refined capacity that relies on top-down cognitive control. We demonstrate that (1) systematic patterns dominate young children’s behavior (facilitating exploration), despite protracted development of cognitive control, and (2) that uncertainty plays a major, but complicated, role in determining children’s choices. We conclude that while young children’s immature top-down control should hinder adult-like systematic exploration, other mechanisms may pick up the slack, facilitating broad information gathering in a systematic fashion to build a foundation of knowledge for use later in life.


2020 ◽  
Vol 44 (2) ◽  
Author(s):  
Ke Sang ◽  
Peter M. Todd ◽  
Robert L. Goldstone ◽  
Thomas T. Hills

2016 ◽  
Vol 48 (3) ◽  
pp. 726-743 ◽  
Author(s):  
Mitsushi Tamaki

Abstract The best-choice problem and the duration problem, known as versions of the secretary problem, are concerned with choosing an object from those that appear sequentially. Let (B,p) denote the best-choice problem and (D,p) the duration problem when the total number N of objects is a bounded random variable with prior p=(p1, p2,...,pn) for a known upper bound n. Gnedin (2005) discovered the correspondence relation between these two quite different optimal stopping problems. That is, for any given prior p, there exists another prior q such that (D,p) is equivalent to (B,q). In this paper, motivated by his discovery, we attempt to find the alternate correspondence {p(m),m≥0}, i.e. an infinite sequence of priors such that (D,p(m-1)) is equivalent to (B,p(m)) for all m≥1, starting with p(0)=(0,...,0,1). To be more precise, the duration problem is distinguished into (D1,p) or (D2,p), referred to as model 1 or model 2, depending on whether the planning horizon is N or n. The aforementioned problem is model 1. For model 2 as well, we can find the similar alternate correspondence {p[m],m≥ 0}. We treat both the no-information model and the full-information model and examine the limiting behaviors of their optimal rules and optimal values related to the alternate correspondences as n→∞. A generalization of the no-information model is given. It is worth mentioning that the alternate correspondences for model 1 and model 2 are respectively related to the urn sampling models without replacement and with replacement.


2015 ◽  
Vol 19 (01) ◽  
pp. 1550008 ◽  
Author(s):  
Rangga Almahendra ◽  
Björn Ambos

The exploration–exploitation tension has resonated in, and been applied to, diverse areas of management research. Its applications have deviated substantially from the scope of organisational learning as originally proposed by March [(1991). Exploration and exploitation in organizational learning. Organization Science, 2(1), 71–87]. Scholars have developed a set of definitions, new conceptualisations, and varied applications in rejuvenating the concept, yet the literature on this topic does not seem to offer a conclusive picture. It is also still unclear which antecedents and subsequent scientific breakthroughs may have led to the divergence of this construct. This study adds value as the first to apply a bibliometric analysis, combined with fine-grained content analysis, to attain a more comprehensive understanding of how the construct of exploration–exploitation has grown and evolved during the last 20 years. We attempt to grasp the structural pattern of citing behaviour and collective understanding among scholars by conducting an in-depth bibliographic review of a complete population of articles on this topic published in leading journals following March [(1991). Exploration and exploitation in organizational learning. Organization Science, 2(1), 71–87]. This study identifies the intellectual base articles which form the basis of the exploration–exploitation construct and the turning point articles that shift the discussion into different directions.


Organizacija ◽  
2015 ◽  
Vol 48 (2) ◽  
pp. 112-119 ◽  
Author(s):  
Mladenka Popadić ◽  
Matej Černe ◽  
Ines Milohnić

Abstract Background and Purpose: The construct of organizational ambidexterity (OA) has attracted growing attention in management research. Previous empirical research has investigated the effect of organisational ambidexterity on performance from various perspectives. This study aims to resolve the contradictory findings of previous research on the relationship between organisational ambidexterity and innovation performance. We unpack this construct using a combined dimension of ambidexterity, which relates to a combination of high levels of both exploration and exploitation (introduction of products or services that were new to the market and new to the firm). Methodology: We frame our ambidexterity hypothesis in terms of a firm’s innovation orientation. The hypothesis is tested using Community Innovation Survey (CIS) 2006 micro data at the organizational level in twelve countries. To operationalize ambidexterity and firms’ innovation outcomes, we used self-reported measures of innovativeness. Results: To test our hypothesis, we developed a set of models and tested them with multiple hierarchical linear regression analyses. The results indicate that exploration and exploitation are positively related to firms’ innovation performance, which supports our assumption that the two are complementary. Furthermore, we find that, over and above their independent effects, when they are combined into a single construct of organizational ambidexterity, this variable remains negatively and significantly related to innovation performance. Conclusion: These results provide managers with an idea of when managing trade-offs between exploration and exploitation would be more favorable versus detrimental. For firms with lower organizational ambidexterity, the relationship between exploration-exploitation and the firm’s innovation performance is a more positive one.


1996 ◽  
Vol 28 (3) ◽  
pp. 828-852 ◽  
Author(s):  
David Assaf ◽  
Ester Samuel-Cahn

n candidates, represented by n i.i.d. continuous random variables X1, …, Xn with known distribution, arrive sequentially, and one of them must be chosen, using a non-anticipating stopping rule. The objective is to minimize the expected rank (among the ranks of X1, …, Xn) of the candidate chosen, where the best candidate, i.e. the one with smallest X-value, has rank one, etc. Let the value of the optimal rule be Vn, and lim Vn = V. We prove that V > 1.85. Limiting consideration to the class of threshold rules of the form tn = min {k: Xk ≦ ak} for some constants ak, let Wn be the value of the expected rank for the optimal threshold rule, and lim Wn = W. We show 2.295 < W < 2.327.
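As a rough illustration of the threshold rules described in this abstract, the Python sketch below simulates tn = min {k: Xk ≦ ak} and estimates the expected rank of the chosen candidate by Monte Carlo. Several choices here are assumptions made only for the example: the values are drawn from Uniform(0, 1), and the thresholds ak = min(1, c / (n − k + 1)) are a hypothetical family, not the optimal constants studied in the paper. The function names are likewise illustrative.

```python
import random


def stop_index(xs, thresholds):
    """Return the first 0-based index k with xs[k] <= thresholds[k].

    The final threshold is expected to be large enough to force a stop.
    """
    for k, (x, a) in enumerate(zip(xs, thresholds)):
        if x <= a:
            return k
    return len(xs) - 1  # fall back to the last candidate


def rank_of(xs, k):
    """Rank of xs[k] among all values; the smallest value has rank 1."""
    return 1 + sum(1 for x in xs if x < xs[k])


def example_thresholds(n, c=2.0):
    """Hypothetical thresholds a_k = min(1, c / (n - k + 1)) for k = 1..n."""
    return [min(1.0, c / (n - k + 1)) for k in range(1, n + 1)]


def estimate_expected_rank(n, thresholds, trials=20000, seed=0):
    """Monte Carlo estimate of the expected rank under a threshold rule."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]  # assumed Uniform(0, 1) values
        total += rank_of(xs, stop_index(xs, thresholds))
    return total / trials
```

For example, `estimate_expected_rank(50, example_thresholds(50))` gives the estimated expected rank for this particular threshold family with n = 50. Because the thresholds are not optimized, the estimate should not be expected to match the bounds on W quoted above.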


2015 ◽  
Vol 21 (5) ◽  
pp. 1140-1161 ◽  
Author(s):  
Rita Lavikka ◽  
Riitta Smeds ◽  
Miia Jaatinen

Purpose – The purpose of this paper is to discover a three-step process for building contextual ambidexterity into inter-organizational IT-enabled service processes through developmental interventions. Design/methodology/approach – A longitudinal action research project was conducted. The empirical study consisted of three consecutive developmental interventions to support the collaborative development effort of an IT company and its customer network to efficiently serve their present and future customers. The data consists of process modeling and simulation workshop discussions, interviews, observation, and archival data. The development effort was studied for over a year. Findings – The study shows that the three developmental interventions acted as a process for balancing the exploration-exploitation tension in inter-organizational service processes. The sequential interventions facilitated the studied organizations in crossing the inter-organizational knowledge boundaries and creating shared domain knowledge, creating common understanding of the collaborative IT-enabled service processes, and co-developing the coordination mechanisms that are essential for the continuous exploration and exploitation of the new ideas in the future collaborative service processes. These three steps built capacity for the inter-organizational management system to achieve synergies between goals, resources, and activities in the inter-organizational collaboration. Originality/value – The study contributes to the understanding of the process of building inter-organizational ambidexterity. The study presents a three-step process for building inter-organizational contextual ambidexterity into the IT-enabled service processes through developmental interventions. Research on inter-organizational contextual ambidexterity is combined with research on coordination and knowledge management.


2021 ◽  
Author(s):  
Franziska Regnath ◽  
Sebastiaan Mathôt

Abstract The adaptive gain theory (AGT) posits that activity in the locus coeruleus (LC) is linked to two behavioral modes: exploitation, characterized by focused attention on a single task; and exploration, characterized by a lack of focused attention and frequent switching between tasks. Furthermore, pupil size correlates with LC activity, such that large pupils indicate increased LC firing, and by extension also exploration behavior. Most evidence for this correlation in humans comes from complex behavior in game-like tasks. However, predictions of the AGT naturally extend to a very basic form of behavior: eye movements. To test this, we used a visual-search task. Participants searched for a target among many distractors, while we measured their pupil diameter and eye movements. The display was divided into four randomly generated regions of different colors. Although these regions were irrelevant to the task, participants were sensitive to their boundaries, and dwelled within regions for longer than expected by chance. Crucially, pupil size increased before eye movements that carried gaze from one region to another. We propose that eye movements that stay within regions (or objects) correspond to exploitation behavior, whereas eye movements that switch between regions (or objects) correspond to exploration behavior.

Public Significance Statement: When people experience increased arousal, their pupils dilate. The adaptive-gain theory proposes that pupil size reflects neural activity in the locus coeruleus (LC), which in turn is associated with two behavioral modes: a vigilant, distractible mode (“exploration”), and a calm, focused mode (“exploitation”). During exploration, pupils are larger and LC activity is higher than during exploitation. Here we show that the predictions of this theory generalize to eye movements: smaller pupils coincide with eye movements indicative of exploitation, while pupils dilate slightly just before eye movements that are indicative of exploration.

