SPLINE APPROXIMATIONS TO VALUE FUNCTIONS

1997 ◽  
Vol 1 (1) ◽  
pp. 255-277 ◽  
Author(s):  
MICHAEL A. TRICK ◽  
STANLEY E. ZIN

We review the properties of algorithms that characterize the solution of the Bellman equation of a stochastic dynamic program as the solution to a linear program. The variables in this problem are the ordinates of the value function; hence, the number of variables grows with the state space. For situations in which this size becomes computationally burdensome, we suggest the use of low-dimensional cubic-spline approximations to the value function. We show that fitting this approximation through linear programming provides upper and lower bounds on the solution of the original large problem. The information contained in these bounds leads to inexpensive improvements in the accuracy of approximate solutions.
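As a concrete reference point for the LP characterization this abstract builds on, here is a minimal sketch (illustrative only; the state space, rewards, and transition probabilities are made up, and the paper's cubic-spline parameterization is not reproduced) of the classical linear program whose variables are the value-function ordinates of a small discounted MDP:

import numpy as np
from scipy.optimize import linprog

n_states, n_actions, beta = 5, 2, 0.95
rng = np.random.default_rng(0)
r = rng.uniform(size=(n_states, n_actions))                        # rewards r(s, a)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))   # transitions P(s' | s, a)

# One constraint per (s, a): V(s) >= r(s, a) + beta * sum_s' P(s' | s, a) V(s').
# linprog expects A_ub @ x <= b_ub, so both sides are negated.
A_ub, b_ub = [], []
for s in range(n_states):
    for a in range(n_actions):
        A_ub.append(-np.eye(n_states)[s] + beta * P[s, a])
        b_ub.append(-r[s, a])

# Minimizing the sum of the ordinates subject to these constraints recovers the
# optimal value function; the number of variables grows with the state space.
res = linprog(c=np.ones(n_states), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * n_states, method="highs")
print("value-function ordinates:", np.round(res.x, 3))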

Author(s):  
Yangchen Pan ◽  
Hengshuai Yao ◽  
Amir-massoud Farahmand ◽  
Martha White

Dyna is an architecture for model-based reinforcement learning (RL), in which simulated experience from a model is used to update policies or value functions. A key component of Dyna is search control, the mechanism that generates the states and actions from which the agent queries the model, and this mechanism remains largely unexplored. In this work, we propose to generate such states by using the trajectory obtained from Hill Climbing (HC) on the current estimate of the value function. This has the effect of propagating value from high-value regions and of preemptively updating value estimates of the regions that the agent is likely to visit next. We derive a noisy projected natural gradient algorithm for hill climbing and highlight a connection to Langevin dynamics. We provide an empirical demonstration on four classical domains that our algorithm, HC Dyna, can obtain significant sample-efficiency improvements. We study the properties of different sampling distributions for search control and find that there appears to be a benefit specifically from using the samples generated by climbing the current value estimates from low-value to high-value regions.
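As a rough illustration of the search-control idea above, the following sketch performs noisy gradient ascent on a made-up smooth value estimate to generate states for planning updates; it is a plain Langevin-style rule standing in for the authors' projected natural gradient algorithm, and none of the names come from their code.

import numpy as np

def value_estimate(s):
    # Hypothetical smooth value estimate over a 2-D state space (stand-in only).
    return -np.sum((s - 1.0) ** 2)

def grad_value(s, eps=1e-5):
    # Central finite-difference gradient of the value estimate.
    g = np.zeros_like(s)
    for i in range(len(s)):
        e = np.zeros_like(s)
        e[i] = eps
        g[i] = (value_estimate(s + e) - value_estimate(s - e)) / (2 * eps)
    return g

def hill_climb_states(s0, n_steps=20, step=0.05, noise=0.01, seed=0):
    # Climb the current value estimate with small Gaussian perturbations; the visited
    # states are the search-control queries handed to the model for Dyna updates.
    rng = np.random.default_rng(seed)
    s = np.array(s0, dtype=float)
    states = []
    for _ in range(n_steps):
        s = s + step * grad_value(s) + noise * rng.standard_normal(s.shape)
        states.append(s.copy())
    return states

print(hill_climb_states([0.0, 0.0])[:3])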


2020 ◽  
Vol 9 (2) ◽  
pp. 459-470
Author(s):  
Helin Wu ◽  
Yong Ren ◽  
Feng Hu

Abstract In this paper, we investigate a kind of Dynkin game under g-expectation induced by a backward stochastic differential equation (BSDE). The lower and upper value functions $$\underline{V}_t=\mathop{\mathrm{ess\,sup}}\nolimits_{\tau \in \mathcal{T}_t}\mathop{\mathrm{ess\,inf}}\nolimits_{\sigma \in \mathcal{T}_t}\mathcal{E}^g_t[R(\tau,\sigma)]$$ and $$\overline{V}_t=\mathop{\mathrm{ess\,inf}}\nolimits_{\sigma \in \mathcal{T}_t}\mathop{\mathrm{ess\,sup}}\nolimits_{\tau \in \mathcal{T}_t}\mathcal{E}^g_t[R(\tau,\sigma)]$$ are defined, respectively. Under suitable assumptions, a pair of saddle points is obtained and the value function of the Dynkin game $$V(t)=\underline{V}_t=\overline{V}_t$$ follows. Furthermore, we also consider the constrained case of the Dynkin game.


2020 ◽  
Vol 10 (1) ◽  
pp. 235-259
Author(s):  
Katharina Bata ◽  
Hanspeter Schmidli

Abstract We consider a risk model in discrete time with dividends and capital injections. The goal is to maximise the value of a dividend strategy. We show that the optimal strategy is of barrier type; that is, all capital above a certain threshold is paid as a dividend. A second problem adds tax on the dividends, but an injection leads to an exemption from tax. We show that the value function fulfils a Bellman equation. As a special case, we consider premia of size one. In this case, we show that the optimal strategy is a two-barrier strategy: there is one barrier if the next dividend of size one can be paid without tax and another barrier if the next dividend of size one will be taxed. In both models, we illustrate the findings by de Finetti's example.
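As a tiny illustration of the barrier rule described above (the numbers are arbitrary, not from the paper): whenever the surplus exceeds the barrier, the excess is paid out as a dividend and the surplus drops back to the barrier.

def barrier_dividend(surplus: float, barrier: float) -> tuple[float, float]:
    # Barrier strategy: pay out all capital above the barrier, keep the rest.
    dividend = max(surplus - barrier, 0.0)
    return dividend, surplus - dividend

print(barrier_dividend(12.0, 10.0))  # surplus 12 with barrier 10: pay 2, fall back to 10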


1984 ◽  
Vol 16 (1) ◽  
pp. 16-16
Author(s):  
Domokos Vermes

We consider the optimal control of deterministic processes with countably many (non-accumulating) random jumps. A necessary and sufficient optimality condition can be given in the form of a Hamilton-Jacobi-Bellman equation, which in the case considered is a functional differential equation with boundary conditions. Its solution, the value function, is continuously differentiable along the deterministic trajectories if only the random jumps are controllable, and it can be represented as a supremum of smooth subsolutions in the general case, i.e. when both the deterministic motion and the random jumps are controlled (cf. the survey by M. H. A. Davis (p. 14)).


2009 ◽  
Vol 9 (1) ◽  
Author(s):  
Axel Anderson

This paper characterizes the behavior of value functions in dynamic stochastic discounted programming models near fixed points of the state space. When the second derivative of the flow payoff function is bounded, the value function is proportional to a linear function plus a geometric term. A specific formula for the exponent of this geometric term is provided. This exponent falls continuously in the rate of patience. If the state variable is a martingale, the second derivative of the value function is unbounded. If the state variable is instead a strict local submartingale, then the same holds for the first derivative of the value function. Thus, the proposed approximation is more accurate than a Taylor series approximation. The approximation result is used to characterize locally optimal policies in several fundamental economic problems.
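Schematically, and only as a hedged paraphrase of the claim above (the paper's exact statement is not reproduced here), the local form near a fixed point x* of the state space reads $$V(x) \approx a + b\,(x - x^\ast) + c\,\lvert x - x^\ast\rvert^{\gamma},$$ where the exponent γ is given by an explicit formula in the paper and falls continuously in the rate of patience; it is this geometric term, rather than a quadratic one, that captures the unbounded curvature and makes the approximation sharper than a Taylor expansion.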


Author(s):  
Junlong Zhang ◽  
Osman Y. Özaltın

We develop an exact value function-based approach to solve a class of bilevel integer programs with stochastic right-hand sides. We first study structural properties and design two methods to efficiently construct the value function of a bilevel integer program. Most notably, we generalize the integer complementary slackness theorem to bilevel integer programs. We also show that the value function of a bilevel integer program can be characterized by its values on a set of so-called bilevel minimal vectors. We then solve the value function reformulation of the original bilevel integer program with stochastic right-hand sides using a branch-and-bound algorithm. We demonstrate the performance of our solution methods on a set of randomly generated instances. We also apply the proposed approach to a bilevel facility interdiction problem. Our computational experiments show that the proposed solution methods can efficiently solve large-scale instances. The performance of our value function-based approach is relatively insensitive to the number of scenarios, but it is sensitive to the number of constraints with stochastic right-hand sides. Summary of Contribution: Bilevel integer programs arise in many different application areas of operations research including supply chain, energy, defense, and revenue management. This paper derives structural properties of the value functions of bilevel integer programs. Furthermore, it proposes exact solution algorithms for a class of bilevel integer programs with stochastic right-hand sides. These algorithms extend the applicability of bilevel integer programs to a larger set of decision-making problems under uncertainty.


1996 ◽  
Vol 53 (1) ◽  
pp. 51-62 ◽  
Author(s):  
Shigeaki Koike

The value function is given by the minimisation of a cost functional over admissible controls. The associated first-order Bellman equations with varying control are treated. It turns out that the value function is a viscosity solution of the Bellman equation and that the comparison principle holds, which is an essential tool in obtaining the uniqueness of viscosity solutions.


Mathematics ◽  
2020 ◽  
Vol 8 (7) ◽  
pp. 1109 ◽  
Author(s):  
Agnieszka Wiszniewska-Matyszkiel ◽  
Rajani Singh

We study general classes of discrete-time dynamic optimization problems and dynamic games with feedback controls. In such problems, the solution is usually found by using the Bellman or Hamilton-Jacobi-Bellman equation for the value function in the case of dynamic optimization, and a set of such coupled equations in the case of dynamic games, which cannot always be done exactly. We derive general rules stating which kinds of errors in the calculation or computation of the value function do not result in errors in the calculation or computation of an optimal control or a Nash equilibrium along the corresponding trajectory. This general result concerns not only errors resulting from using numerical methods but also errors resulting from preliminary assumptions, such as replacing the actual value functions by some a priori assumed constraints for them on certain subsets. We illustrate the results with a motivating example, the Fish Wars, with singularities in payoffs.
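As a minimal numerical illustration of this kind of robustness (the numbers and the constant-shift error are assumptions for the sketch, not the paper's example), an error that shifts the computed value function by a constant leaves the control chosen by the Bellman equation unchanged:

import numpy as np

beta = 0.9
rng = np.random.default_rng(1)
r = rng.uniform(size=(5, 3))                  # rewards r(s, a): 5 states, 3 controls
P = rng.dirichlet(np.ones(5), size=(5, 3))    # transition kernel P(s' | s, a)

def greedy_control(V):
    # One-step Bellman lookahead: argmax_a { r(s, a) + beta * E[V(s') | s, a] }.
    Q = r + beta * P @ V
    return Q.argmax(axis=1)

V_true = rng.uniform(size=5)   # stand-in for the correctly computed value function
V_err = V_true + 0.3           # value function computed with a constant-shift error

# The shift adds the same constant beta * 0.3 to every Q(s, a), so the maximizer,
# and hence the control along the trajectory, is unchanged.
print(np.array_equal(greedy_control(V_true), greedy_control(V_err)))  # True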

