exploration versus exploitation
Recently Published Documents


TOTAL DOCUMENTS: 32 (FIVE YEARS: 8)

H-INDEX: 12 (FIVE YEARS: 2)

Complexity, 2021, Vol. 2021, pp. 1-18
Author(s): Wali Khan Mashwani, Ruqayya Haider, Samir Brahim Belhaouari

Constrained optimization plays an important role in many decision-making problems and various real-world applications. Over the last two decades, numerous evolutionary algorithms (EAs) have been developed, and new ones continue to emerge, under the umbrella of evolutionary computation. EAs are broadly categorized into nature-inspired and swarm-intelligence- (SI-) based paradigms, and each of these algorithms has its own merits and drawbacks. Particle swarm optimization (PSO), the firefly algorithm, ant colony optimization (ACO), and the bat algorithm (BA) have gained much popularity and have successfully tackled various test suites of benchmark functions and real-world problems. These SI-based algorithms follow social and interactive principles in their search process while approximating solutions to the given problems. In this paper, a multiswarm-intelligence-based algorithm (MSIA) is developed to cope with bound-constrained functions. The suggested algorithm integrates SI-based algorithms to evolve the population and to handle the exploration-versus-exploitation issue. Thirty bound-constrained benchmark functions are used to evaluate the performance of the proposed algorithm; this test suite was designed for the special session and competition on evolutionary algorithms at the IEEE Congress on Evolutionary Computation (IEEE-CEC′13). The suggested algorithm approximates promising solutions with good convergence and diversity maintenance for most of the bound-constrained single-objective optimization problems used.
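A minimal sketch of the multi-swarm idea described above, assuming a toy sphere objective, two illustrative sub-swarms (a PSO-style, exploitation-leaning update and a random-walk, exploration-leaning update), and a simple migration of the global best. The function names, parameters, and migration rule are hypothetical and do not reproduce the authors' MSIA.

```python
# Hypothetical multi-swarm search loop in the spirit of MSIA (illustrative only).
import numpy as np

def sphere(x):
    """Simple bound-constrained test function (assumed objective)."""
    return float(np.sum(x ** 2))

def pso_step(pos, vel, pbest, gbest, lb, ub, w=0.7, c1=1.5, c2=1.5):
    """One particle-swarm update: exploit personal and global bests."""
    r1, r2 = np.random.rand(*pos.shape), np.random.rand(*pos.shape)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    return np.clip(pos + vel, lb, ub), vel

def random_walk_step(gbest, shape, lb, ub, scale=0.1):
    """Exploration-oriented update: random perturbation around the swarm best."""
    return np.clip(gbest + scale * (ub - lb) * np.random.randn(*shape), lb, ub)

def msia_sketch(dim=10, n_per_swarm=20, iters=200, lb=-100.0, ub=100.0):
    swarms = [np.random.uniform(lb, ub, (n_per_swarm, dim)) for _ in range(2)]
    vel = np.zeros((n_per_swarm, dim))
    pbest = swarms[0].copy()
    gbest = min((x for s in swarms for x in s), key=sphere).copy()
    for _ in range(iters):
        # Sub-swarm 0: PSO-style, exploitation-leaning.
        swarms[0], vel = pso_step(swarms[0], vel, pbest, gbest, lb, ub)
        better = np.array([sphere(x) for x in swarms[0]]) < np.array([sphere(x) for x in pbest])
        pbest[better] = swarms[0][better]
        # Sub-swarm 1: random-walk around the best, exploration-leaning.
        swarms[1] = random_walk_step(gbest, swarms[1].shape, lb, ub)
        # Migration: all sub-swarms share the current global best.
        candidate = min((x for s in swarms for x in s), key=sphere)
        if sphere(candidate) < sphere(gbest):
            gbest = candidate.copy()
    return gbest, sphere(gbest)

best, best_val = msia_sketch()
print(best_val)
```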


2020, Vol. 68 (5), pp. 1585-1604
Author(s): Sajad Modaresi, Denis Sauré, Juan Pablo Vielma

When moving from the traditional to the combinatorial multiarmed bandit setting, addressing the classical exploration versus exploitation trade-off is a challenging task. In “Learning in Combinatorial Optimization: What and How to Explore,” Modaresi, Sauré, and Vielma show that the combinatorial setting has salient features that distinguish it from the traditional bandit. In particular, combinatorial structure induces correlation between the costs of different solutions, raising the questions of what parameters to estimate and how to collect and combine information. The authors answer these questions by developing a novel optimization problem called the lower-bound problem (LBP). They establish a fundamental limit on the asymptotic performance of any admissible policy and propose near-optimal LBP-based policies. Because LBP is likely intractable in practice, they propose policies that instead solve a proxy for LBP, which they call the optimality cover problem (OCP). They provide strong evidence of the practical tractability of OCP and numerically illustrate the markedly superior performance of OCP-based policies.
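To make the exploit/explore split concrete, here is a loose, hypothetical semi-bandit sketch: most rounds play the cheapest solution under current per-element estimates, and occasional rounds play solutions from a small covering set so every ground element keeps being observed. The instance, the greedy covering set, and the exploration schedule are illustrative assumptions introduced here, not the paper's LBP or OCP construction.

```python
# Hypothetical combinatorial semi-bandit sketch (illustrative assumptions only).
import itertools
import random

random.seed(0)
n_elements = 6
true_means = [random.uniform(0.2, 1.0) for _ in range(n_elements)]
# Assumed feasible solutions: all 3-element subsets of the ground set.
solutions = [set(s) for s in itertools.combinations(range(n_elements), 3)]
# A small set of solutions that jointly covers every element (a loose stand-in
# for an "optimality cover"), chosen greedily here.
cover, uncovered = [], set(range(n_elements))
for s in solutions:
    if s & uncovered:
        cover.append(s)
        uncovered -= s
    if not uncovered:
        break

sums = [0.0] * n_elements
counts = [0] * n_elements

def estimated_cost(sol):
    return sum(sums[e] / counts[e] if counts[e] else 0.0 for e in sol)

for t in range(1, 2001):
    # Periodic forced exploration over the covering set; exploit otherwise.
    if t % 50 == 0:
        sol = cover[(t // 50) % len(cover)]
    else:
        sol = min(solutions, key=estimated_cost)
    # Semi-bandit feedback: observe the cost of each element in the played solution.
    for e in sol:
        c = random.gauss(true_means[e], 0.1)
        sums[e] += c
        counts[e] += 1

print("best under true means:", min(solutions, key=lambda s: sum(true_means[e] for e in s)))
print("best under estimates:", min(solutions, key=estimated_cost))
```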


2020, Vol. 2020 (1), pp. 18629
Author(s): Yudi Hou, Yuchen Zhang, Shenghui Wang, Xue Wan, Kaicheng Liao

2019, Vol. 42
Author(s): Nader Chmait, David L. Dowe, David G. Green, Yuan-Fang Li

Abstract: For artificial agents trading off exploration (food seeking) against (short-term) exploitation (consumption), our experiments suggest that uncertainty (interpreted information-theoretically) magnifies food seeking. In more uncertain environments, with food distributed uniformly at random, exploration appears to be beneficial. In contrast, in biased (less uncertain) environments, with food concentrated in only one part, exploitation appears to be more advantageous. Agents also appear to do better in biased environments.
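A toy foraging simulation in the spirit of the uniform-versus-biased comparison above: an agent on a ring of cells either jumps to a random cell (exploration) or keeps consuming its current cell (exploitation). The ring-world environment, regrowth rule, and the two policies are assumptions introduced here for illustration, not the authors' experimental setup.

```python
# Hypothetical foraging sketch comparing exploration- and exploitation-heavy
# agents in uniform versus biased food environments (illustrative only).
import random

def run(explore_prob, biased, steps=5000, n_cells=50, seed=1):
    rng = random.Random(seed)
    # Uniform environment: food spread evenly; biased: concentrated in one region.
    if biased:
        food = [5.0 if i < n_cells // 5 else 0.2 for i in range(n_cells)]
    else:
        food = [1.0] * n_cells
    pos, eaten = rng.randrange(n_cells), 0.0
    for _ in range(steps):
        if rng.random() < explore_prob:
            pos = rng.randrange(n_cells)   # seek food elsewhere (explore)
        take = min(food[pos], 0.5)          # consume at the current cell (exploit)
        eaten += take
        food[pos] -= take
        food[pos] += 0.01                   # slow regrowth
    return eaten

for biased in (False, True):
    env = "biased" if biased else "uniform"
    print(env,
          "explorer:", round(run(0.5, biased), 1),
          "exploiter:", round(run(0.05, biased), 1))
```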


2018, Vol. 45 (8), pp. 3178-3203
Author(s): Brian L. Connelly, Wei Shi, Robert E. Hoskisson, Balaji R. Koka

In this study, we theorize about how different types of institutional investors influence firms’ choice of exploration versus exploitation for their joint ventures (JVs). Exploratory JVs engender risk, uncertain outcomes, and ex post contractual updating, whereas exploitative JVs allow for ex ante contracts. We argue that dedicated institutional investors (DIIs), who maintain concentrated holdings over time regardless of current earnings, offer tolerance for failure and reward for long-term success that encourages managerial choice of exploratory JVs. Transient institutional investors (TIIs), who trade frequently based on near-term performance metrics, prefer ex ante contracts and use exit to discipline managers who do not meet their short-term performance objectives. This suggests that TIIs may influence managers to reduce the extent to which they choose exploratory (as opposed to exploitative) JVs. Furthermore, we argue that the transactional governance of TIIs gives way to the relational monitoring of DIIs when both types of shareholders are present. As a result, the likelihood of choosing exploration, versus exploitation, as a JV formation strategy is greatest in the presence of high DII and TII ownership. We examine JVs among S&P 500 firms over the years 2000 to 2010, and results largely support our theory.


Author(s): Qiming Fu, Quan Liu, Shan Zhong, Heng Luo, Hongjie Wu, ...

In reinforcement learning (RL), the exploration/exploitation (E/E) dilemma is a crucial issue: it can be described as choosing between exploring the environment to find more profitable actions and exploiting the best empirically known action for the current state. We focus on the single-trajectory RL problem, where an agent interacts with a partially unknown MDP over a single trajectory, and address the E/E dilemma in this setting. Given the reward function, we seek a good E/E strategy for MDPs drawn from some MDP distribution. This is achieved by selecting, from a large set of candidate strategies, the strategy that performs best in mean over a potential MDP distribution, exploiting single trajectories drawn from many MDPs. In this paper, we make the following contributions: (1) we discuss the strategy-selector algorithm based on a formula set and a polynomial function; (2) we provide a theoretical and experimental regret analysis of the learned strategy under a given MDP distribution; and (3) we compare these methods experimentally with a state-of-the-art Bayesian RL method.
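The selection-in-mean idea can be sketched as follows, assuming epsilon-greedy rules as the candidate E/E strategies and small random tabular MDPs as the distribution. The MDP sampler, trajectory length, and Q-learning update are illustrative assumptions and not the paper's formula-set/polynomial strategy-selector.

```python
# Hypothetical sketch: pick the E/E strategy that is best in mean over an MDP
# distribution, evaluated by single-trajectory Q-learning (illustrative only).
import numpy as np

def sample_mdp(n_states=5, n_actions=3, rng=None):
    """Draw a random tabular MDP: transition probabilities and rewards."""
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))
    return P, R

def run_trajectory(P, R, epsilon, horizon=500, alpha=0.1, gamma=0.95, rng=None):
    """Single-trajectory Q-learning with an epsilon-greedy E/E strategy."""
    n_states, n_actions = R.shape
    Q = np.zeros((n_states, n_actions))
    s, total = 0, 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            a = int(rng.integers(n_actions))   # explore
        else:
            a = int(np.argmax(Q[s]))           # exploit
        r = R[s, a]
        s_next = int(rng.choice(n_states, p=P[s, a]))
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        total += r
        s = s_next
    return total

rng = np.random.default_rng(0)
candidates = [0.01, 0.05, 0.1, 0.3]       # candidate E/E strategies (epsilons)
mdps = [sample_mdp(rng=rng) for _ in range(30)]
scores = {eps: np.mean([run_trajectory(P, R, eps, rng=rng) for P, R in mdps])
          for eps in candidates}
best = max(scores, key=scores.get)         # best strategy in mean over the MDPs
print("mean return per candidate:", scores, "selected epsilon:", best)
```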

