How Active is Active Learning: Value Function Method Versus an Approximation Method

2020 ◽  
Vol 56 (3) ◽  
pp. 675-693
Author(s):  
Hans M. Amman ◽  
Marco P. Tucci

In a previous paper, Amman et al. (Macroecon Dyn, 2018) compare the two dominant approaches for solving models with optimal experimentation (also called active learning), i.e., the value function method and the approximation method. Using the same model and dataset as Beck and Wieland (J Econ Dyn Control 26:1359–1377, 2002), they find that the approximation method produces solutions close to those generated by the value function approach and identify some elements of the model specification that affect the difference between the two solutions. They conclude that the differences are small when the effects of learning are limited. However, the dataset used in that experiment describes a situation where the controller faces a nonstationary process and there is no penalty on the control. The goal of this paper is to see whether their conclusions hold in the more commonly studied case of a controller facing a stationary process and a positive penalty on the control.
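
To fix ideas, the class of problems behind this comparison can be sketched as a linear-quadratic control problem with an unknown policy multiplier. The schematic below is illustrative only and uses generic notation (α for state persistence, β for the unknown multiplier, ω for the control penalty, x̃ and ũ for targets), not necessarily the exact Beck and Wieland specification:

```latex
% Schematic only; notation and timing are illustrative, not the papers'.
\min_{\{u_t\}} \;\; \mathbb{E}\sum_{t=1}^{T}\Big[(x_t-\tilde{x})^2+\omega\,(u_t-\tilde{u})^2\Big]
\quad\text{subject to}\quad
x_t=\alpha\,x_{t-1}+\beta\,u_t+\varepsilon_t ,
```

with β unknown to the controller and learned from the observed data, which is what makes experimenting with u_t potentially valuable. In this notation, "no penalty on the control" corresponds to ω = 0, while a stationary process corresponds to |α| < 1.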

2018 ◽  
Vol 24 (5) ◽  
pp. 1073-1086 ◽  
Author(s):  
Hans M. Amman ◽  
David A. Kendrick ◽  
Marco P. Tucci

In the economics literature, there are two dominant approaches for solving models with optimal experimentation (also called active learning). The first approach is based on the value function and the second on an approximation method. In principle, the value function approach is the preferred method. However, it suffers from the curse of dimensionality and is only applicable to small problems with a limited number of policy variables. The approximation method allows for a computationally larger class of models, but may produce results that deviate from the optimal solution. Our simulations indicate that when the effects of learning are limited, the differences may be small. However, when there is sufficient scope for learning, the value function solution seems more aggressive in the use of the policy variable.
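
To make the contrast concrete, here is a minimal Python sketch (not the authors' code) of a one-state problem with an unknown control multiplier: the controller updates a Bayesian belief about the multiplier and applies either a certainty-equivalent (passive) rule or a rule with a crude value-of-information bonus for probing. This is not the value-function/approximation comparison itself, only an illustration of why scope for learning can make an optimal rule push the policy variable harder. All parameter values, the probing weight and the functional forms are assumptions.

```python
import numpy as np

# Minimal sketch (not the papers' algorithms): steer y_{t+1} = beta*u_t + eps
# toward a target while learning the unknown multiplier beta by Kalman-style
# Bayesian updating.  The "active" rule adds an ad hoc probing bonus for the
# information a larger |u_t| reveals; all numbers below are illustrative.
rng = np.random.default_rng(0)
beta_true, sigma = 0.6, 0.5      # unknown slope and noise std (assumed)
target, penalty = 1.0, 0.1       # desired y and penalty on the control
b_hat, p = 0.2, 1.0              # prior mean and variance of beta

def ce_control(b_hat, p):
    # Certainty-equivalent (passive) rule: ignore estimation uncertainty.
    return b_hat * target / (b_hat**2 + penalty)

def active_control(b_hat, p, probe_weight=2.0):
    # One-step "dual" rule: expected loss minus a value-of-information term
    # (reduction in the posterior variance of beta), minimized on a grid.
    u = np.linspace(-5.0, 5.0, 2001)
    exp_loss = (b_hat * u - target)**2 + p * u**2 + penalty * u**2
    info_gain = p - p * sigma**2 / (sigma**2 + p * u**2)
    return u[np.argmin(exp_loss - probe_weight * info_gain)]

for t in range(15):
    u_ce, u_act = ce_control(b_hat, p), active_control(b_hat, p)
    y = beta_true * u_act + sigma * rng.standard_normal()
    k = p * u_act / (sigma**2 + p * u_act**2)      # Kalman gain
    b_hat = b_hat + k * (y - b_hat * u_act)        # posterior mean of beta
    p = p * sigma**2 / (sigma**2 + p * u_act**2)   # posterior variance of beta
    print(f"t={t:2d}  passive u={u_ce:+.3f}  active u={u_act:+.3f}  "
          f"belief beta={b_hat:+.3f}  var={p:.4f}")
```

Early on, while the belief about the multiplier is imprecise, the probing rule typically chooses a larger |u| than the passive rule; as the belief variance shrinks, the two rules converge.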


2014 ◽  
Vol 51 (2) ◽  
pp. 436-452
Author(s):  
Shangzhen Luo

In this paper we study a reinsurance game between two insurers whose surplus processes are modeled by arithmetic Brownian motions. We assume a minimax criterion in the game. One insurer tries to maximize the probability of absolute dominance while the other tries to minimize it through reinsurance control. Here, absolute dominance is defined as the event that the limit inferior of the difference of the surplus levels equals −∞. Under suitable parameter conditions, the game is solved, with the value function and the Nash equilibrium strategy given in explicit form.
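
For rough intuition (this is not the paper's equilibrium analysis): under constant proportional reinsurance each surplus is an arithmetic Brownian motion, so the surplus difference is again an arithmetic Brownian motion, and its limit inferior equals −∞ with probability one precisely when its net drift is non-positive (given positive volatility). The Python sketch below, with made-up parameters and independent surpluses, shows how the two retention choices move that drift.

```python
import numpy as np

# Illustrative only (not the paper's equilibrium): constant proportional
# reinsurance with retained fractions a1, a2 scales each insurer's drift and
# volatility.  The surplus difference D_t is then an arithmetic Brownian
# motion whose liminf equals -infinity with probability 1 iff its drift <= 0
# (with positive volatility).  All parameter values are assumptions.
rng = np.random.default_rng(1)
mu1, sig1 = 1.0, 2.0          # insurer 1: drift and volatility per unit time
mu2, sig2 = 0.8, 1.5          # insurer 2: drift and volatility per unit time
a1, a2 = 0.7, 0.9             # retained proportions (the controls)

drift = a1 * mu1 - a2 * mu2               # drift of D_t = X1_t - X2_t
vol = np.hypot(a1 * sig1, a2 * sig2)      # volatility, assuming independence
dominated = drift <= 0                    # liminf D_t = -inf a.s. iff true
print(f"net drift {drift:+.3f}, volatility {vol:.3f}; the dominance event "
      f"(liminf of X1 - X2 equal to -inf) is "
      f"{'certain' if dominated else 'impossible'} under these controls")

# A short sample path of D_t for intuition.
dt, n = 0.01, 5000
D = np.cumsum(drift * dt + vol * np.sqrt(dt) * rng.standard_normal(n))
print(f"D_T after T={n*dt:.0f}: {D[-1]:+.2f}")
```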


2005 ◽  
Vol 17 (2) ◽  
pp. 335-359 ◽  
Author(s):  
Jun Morimoto ◽  
Kenji Doya

This letter proposes a new reinforcement learning (RL) paradigm that explicitly takes into account input disturbance as well as modeling errors. The use of environmental models in RL is quite popular for both off-line learning using simulations and for online action planning. However, the difference between the model and the real environment can lead to unpredictable, and often unwanted, results. Based on the theory of H∞ control, we consider a differential game in which a “disturbing” agent tries to make the worst possible disturbance while a “control” agent tries to make the best control input. The problem is formulated as finding a min-max solution of a value function that takes into account the amount of the reward and the norm of the disturbance. We derive online learning algorithms for estimating the value function and for calculating the worst disturbance and the best control in reference to the value function. We tested the paradigm, which we call robust reinforcement learning (RRL), on the control task of an inverted pendulum. In the linear domain, the policy and the value function learned by online algorithms coincided with those derived analytically by the linear H∞ control theory. For a fully nonlinear swing-up task, RRL achieved robust performance with changes in the pendulum weight and friction, while a standard reinforcement learning algorithm could not deal with these changes. We also applied RRL to the cart-pole swing-up task, and a robust swing-up policy was acquired.
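
In the linear domain referred to above, the saddle point of such a min-max value function can be obtained from a game-type (H∞) Riccati equation rather than by online learning. The Python sketch below is a generic illustration, not the letter's algorithm: it solves the game Riccati equation for a toy linearized pendulum via the stable invariant subspace of the Hamiltonian matrix; the system matrices and the attenuation level gamma are assumed values.

```python
import numpy as np

# Generic sketch (not the letter's learning rules): in the linear-quadratic
# case the controller u minimizes x'Qx + u'Ru while the disturbance w tries
# to spoil it, penalized by -gamma^2 |w|^2.  The saddle-point value is x'Px,
# with P solving the game Riccati equation
#   A'P + PA + Q - P (B R^{-1} B' - gamma^{-2} D D') P = 0,
# obtained here from the stable subspace of the Hamiltonian matrix.
A = np.array([[0.0, 1.0], [9.8, -0.1]])   # toy linearized inverted pendulum
B = np.array([[0.0], [1.0]])              # control input channel
D = np.array([[0.0], [0.5]])              # disturbance input channel
Q, R, gamma = np.eye(2), np.array([[0.1]]), 2.0

S = B @ np.linalg.solve(R, B.T) - (1.0 / gamma**2) * D @ D.T
H = np.block([[A, -S], [-Q, -A.T]])       # Hamiltonian matrix of the game

w, V = np.linalg.eig(H)                   # stable invariant subspace
stable = V[:, w.real < 0]
n = A.shape[0]
P = np.real(stable[n:] @ np.linalg.inv(stable[:n]))

K_u = np.linalg.solve(R, B.T @ P)         # control gain:  u = -K_u x
K_w = (1.0 / gamma**2) * D.T @ P          # worst-case disturbance: w = K_w x
print("P =\n", P.round(3),
      "\ncontrol gain K_u =", K_u.round(3),
      "\nworst-case disturbance gain K_w =", K_w.round(3))
```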


2011 ◽  
Author(s):  
Anouk Festjens ◽  
Siegfried Dewitte ◽  
Enrico Diecidue ◽  
Sabrina Bruyneel

2021 ◽  
Vol 14 (3) ◽  
pp. 130
Author(s):  
Jonas Al-Hadad ◽  
Zbigniew Palmowski

The main objective of this paper is to present an algorithm for pricing perpetual American put options with asset-dependent discounting. The value function of such an instrument can be described as $V^{\omega}_{\mathrm{APut}}(s)=\sup_{\tau\in\mathcal{T}}\mathbb{E}_s\big[e^{-\int_0^{\tau}\omega(S_w)\,dw}(K-S_\tau)^{+}\big]$, where $\mathcal{T}$ is a family of stopping times, $\omega$ is a discount function and $\mathbb{E}$ is an expectation taken with respect to a martingale measure. Moreover, we assume that the asset price process $S_t$ is a geometric Lévy process with negative exponential jumps, i.e., $S_t=s\,e^{\zeta t+\sigma B_t-\sum_{i=1}^{N_t}Y_i}$. The asset-dependent discounting is reflected in the $\omega$ function, so this approach is a generalisation of the classic case when $\omega$ is constant. It turns out that under certain conditions on the $\omega$ function, the value function $V^{\omega}_{\mathrm{APut}}(s)$ is convex and can be represented in closed form. We provide an option pricing algorithm in this scenario and present exact calculations for particular choices of $\omega$ for which $V^{\omega}_{\mathrm{APut}}(s)$ takes a simplified form.
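
As a quick illustration of what this value represents, one can price a fixed threshold exercise rule by Monte Carlo: stop the first time the asset falls to a level L and collect the payoff, discounted along the path at the state-dependent rate ω(S_t). The Python sketch below does this on an Euler grid; it is not the paper's closed-form algorithm, and the discount function, jump parameters and all numbers are assumptions (paths that never reach L before the time cap contribute zero).

```python
import numpy as np

# Illustrative Monte Carlo only (not the paper's closed-form algorithm):
# value the threshold rule tau_L = inf{t : S_t <= L} for the payoff
#   E_s[ exp(-int_0^tau omega(S_w) dw) * (K - S_tau)^+ ],
# where S_t = s*exp(zeta*t + sigma*B_t - sum_{i<=N_t} Y_i) has negative
# exponential jumps.  omega and every number below are assumed for the demo.
rng = np.random.default_rng(2)
s0, K, L = 1.0, 1.0, 0.7                  # spot, strike, exercise threshold
zeta, sigma = 0.03, 0.25                  # drift and diffusion of log S
lam, jump_mean = 0.5, 0.2                 # jump intensity, mean of Y_i ~ Exp
omega = lambda s: 0.02 + 0.03 * s         # asset-dependent discount rate
dt, T, n_paths = 0.01, 30.0, 20_000       # Euler step, time cap, sample size

x = np.full(n_paths, np.log(s0))          # log-prices
disc = np.zeros(n_paths)                  # accumulated int_0^t omega(S_w) dw
value = np.zeros(n_paths)                 # discounted payoff (0 if never hit)
alive = np.ones(n_paths, dtype=bool)      # paths that have not exercised yet

for _ in range(int(T / dt)):
    s = np.exp(x)
    hit = alive & (s <= L)                # threshold reached: exercise now
    value[hit] = np.exp(-disc[hit]) * np.maximum(K - s[hit], 0.0)
    alive &= ~hit
    if not alive.any():
        break
    m = alive.sum()
    disc[alive] += omega(s[alive]) * dt
    x[alive] += zeta * dt + sigma * np.sqrt(dt) * rng.standard_normal(m)
    x[alive] -= np.where(rng.random(m) < lam * dt,
                         rng.exponential(jump_mean, m), 0.0)

print(f"Monte Carlo value of the L={L} threshold rule: {value.mean():.4f}")
```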

