approximate value iteration
Recently Published Documents

TOTAL DOCUMENTS: 26 (FIVE YEARS: 4)

H-INDEX: 4 (FIVE YEARS: 0)

Author(s):  
Arunselvan Ramaswamy ◽  
Shalabh Bhatnagar

In this paper, we consider the stochastic iterative counterpart of the value iteration scheme wherein only noisy and possibly biased approximations of the Bellman operator are available. We call this counterpart the approximate value iteration (AVI) scheme. Neural networks are often used as function approximators, in order to counter Bellman’s curse of dimensionality. In this paper, they are used to approximate the Bellman operator. Because neural networks are typically trained using sample data, errors and biases may be introduced. The design of AVI accounts for implementations with biased approximations of the Bellman operator and sampling errors. We present verifiable sufficient conditions under which AVI is stable (almost surely bounded) and converges to a fixed point of the approximate Bellman operator. To ensure the stability of AVI, we present three different yet related sets of sufficient conditions that are based on the existence of an appropriate Lyapunov function. These Lyapunov function–based conditions are easily verifiable and new to the literature. The verifiability is enhanced by the fact that a recipe for the construction of the necessary Lyapunov function is also provided. We also show that the stability analysis of AVI can be readily extended to the general case of set-valued stochastic approximations. Finally, we show that AVI can also be used in more general circumstances, that is, for finding fixed points of contractive set-valued maps.
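The scheme described above can be sketched concretely. The following is a minimal illustration on a hypothetical two-state MDP with a synthetic noisy, biased Bellman operator (an assumed toy setting, not the paper's): the iterate is updated as v ← v + aₙ(T̂(v) − v) with tapering step sizes, and it settles near the fixed point of the approximate operator.

```python
import random

# Hypothetical 2-state, 2-action MDP used only to illustrate the scheme.
# P[s][a] = list of (prob, next_state); R[s][a] = expected reward.
GAMMA = 0.9
P = {0: {0: [(1.0, 0)], 1: [(0.5, 0), (0.5, 1)]},
     1: {0: [(1.0, 1)], 1: [(0.7, 0), (0.3, 1)]}}
R = {0: {0: 0.0, 1: 1.0},
     1: {0: 0.5, 1: 2.0}}

def bellman(v):
    """Exact Bellman optimality operator T."""
    return [max(R[s][a] + GAMMA * sum(p * v[t] for p, t in P[s][a])
                for a in (0, 1)) for s in (0, 1)]

def noisy_bellman(v, noise=0.05, bias=0.01):
    """T observed through zero-mean noise plus a small bias, mimicking
    a function approximator trained on sample data."""
    return [t + random.gauss(0.0, noise) + bias for t in bellman(v)]

def avi(iters=20000, seed=0):
    """Stochastic-approximation AVI: v <- v + a_n * (T_hat(v) - v)."""
    random.seed(seed)
    v = [0.0, 0.0]
    for n in range(1, iters + 1):
        a_n = n ** -0.6          # tapering (Robbins-Monro) step sizes
        t_hat = noisy_bellman(v)
        v = [v[s] + a_n * (t_hat[s] - v[s]) for s in (0, 1)]
    return v

v_avi = avi()
v_star = [0.0, 0.0]
for _ in range(500):             # exact value iteration for reference
    v_star = bellman(v_star)
# v_avi tracks v_star up to an offset of order bias / (1 - GAMMA)
```

Note that the bias in the approximate operator shifts the limit point by roughly bias/(1 − γ), which is why convergence here is to a fixed point of the *approximate* Bellman operator rather than of T itself.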


2021 ◽  
Vol 0 (0) ◽  
pp. 0
Author(s):  
Majid Khalilzadeh ◽  
Hossein Neghabi ◽  
Ramin Ahadi

Advertising has always been considered a key part of marketing strategy and has played a prominent role in the success or failure of products. This paper investigates multi-product, multi-period advertising budget allocation, determining the advertising budget for each product over the time horizon. Key factors, including life-cycle stage, BCG matrix class, competitors' reactions, and budget constraints, affect the joint chain of decisions across all products to maximize total profit. To this end, we define a stochastic sequential resource allocation problem and use an approximate dynamic programming (ADP) algorithm to cope with the huge size of the problem and the multi-dimensional uncertainties of the environment. These uncertainties comprise the reactions of competitors, which depend on the current status of the market and on our decisions, as well as the stochastic effectiveness (rewards) of the actions taken. We apply an approximate value iteration (AVI) algorithm to a numerical example and compare the results with four different policies to highlight our managerial contributions. Finally, the validity of the proposed approach is assessed against a genetic algorithm; for this comparison, we simplify the environment by fixing the competitor's reaction and assuming a deterministic environment.
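The flavor of such a budget-allocation AVI can be conveyed with a toy sketch (an assumed two-product model with made-up numbers, not the authors' formulation): per-period budget units are split between two products whose mean profitability shifts with life-cycle stage, rewards are sampled with noise, and a backward value-iteration pass estimates period values by Monte Carlo before acting greedily.

```python
import random

# Toy multi-period budget allocation solved by sampled value iteration.
# All quantities below are illustrative assumptions.
T = 4                 # planning periods
BUDGET = 6            # budget units available each period
# Expected profit per first budget unit, by product and period: product 0
# is profitable early in its life cycle, product 1 later.
MEAN_PROFIT = [[3.0, 2.5, 1.5, 1.0],   # product 0
               [1.0, 2.0, 3.0, 3.5]]   # product 1

def sample_profit(product, period, units, rng):
    """Stochastic effectiveness: noisy reward with diminishing returns
    (the k-th unit contributes mean/(k+1) on average)."""
    mean = MEAN_PROFIT[product][period]
    return sum(mean / (k + 1) + rng.gauss(0.0, 0.1) for k in range(units))

def avi_policy(n_samples=200, seed=1):
    """Backward approximate value iteration: estimate each period's
    value by Monte Carlo, then record the greedy budget split."""
    rng = random.Random(seed)
    V = [0.0] * (T + 1)       # fresh budget each period keeps the state simple
    best_split = [0] * T
    for t in reversed(range(T)):
        best = None
        for x in range(BUDGET + 1):        # x units to product 0, rest to 1
            est = sum(sample_profit(0, t, x, rng) +
                      sample_profit(1, t, BUDGET - x, rng)
                      for _ in range(n_samples)) / n_samples
            q = est + V[t + 1]
            if best is None or q > best:
                best, best_split[t] = q, x
        V[t] = best
    return best_split, V[0]

split, value = avi_policy()
# Early periods favor product 0, later periods product 1.
```

The greedy split shifts across the horizon exactly because the life-cycle stage changes each product's marginal return, which is the coupling the paper's richer model (competitor reactions, BCG class) elaborates.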


2018 ◽  
Vol 3 (2) ◽  
pp. 1330-1337 ◽  
Author(s):  
Julia Vinogradska ◽  
Bastian Bischoff ◽  
Jan Peters

Author(s):  
Timothy A. Mann ◽  
Shie Mannor ◽  
Doina Precup

The options framework provides a concrete way to implement and reason about temporally extended actions. Existing literature has demonstrated the value of planning with options empirically, but theoretical analysis formalizing when planning with options is more efficient than planning with primitive actions has been lacking. We provide a general analysis of the convergence rate of a popular Approximate Value Iteration (AVI) algorithm, Fitted Value Iteration (FVI), with options (OFVI). Our analysis reveals that longer-duration options and a pessimistic estimate of the value function both lead to faster convergence. Furthermore, options can improve convergence even when they are suboptimal and sparsely distributed throughout the state space. Next, we consider generating useful options for planning based on a subset of landmark states. This suggests a new algorithm, Landmark-based AVI (LAVI), that represents the value function only at landmark states. We analyze OFVI and LAVI using the proposed landmark-based options and compare the two algorithms. Our theoretical and experimental results demonstrate that options can play an important role in AVI by decreasing approximation error and inducing fast convergence.
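Why longer options speed up value iteration can be seen in a small sketch (an assumed chain-MDP toy, not the paper's OFVI/LAVI implementation): an option's backup bootstraps through γ^d at its stopping state, so value information propagates d states per sweep instead of one.

```python
# Illustrative comparison: value iteration on a 5-state chain with a goal
# at the right end, using primitive one-step backups versus backups that
# also use a "run right for up to d steps" option.
GAMMA = 0.95
N = 5                      # states 0..4; state 4 is terminal (value 0)

def primitive_backup(v):
    """One-step Bellman backup: 'right' (reward 1 on reaching the goal)
    or 'stay'."""
    out = []
    for s in range(N - 1):
        r = 1.0 if s + 1 == N - 1 else 0.0
        out.append(max(r + GAMMA * v[s + 1], GAMMA * v[s]))
    out.append(0.0)        # terminal goal state
    return out

def option_backup(v, d=3):
    """Backup over primitive actions plus a duration-d option whose
    model is its discounted cumulative reward and a gamma**d discounted
    bootstrap at its stopping state."""
    prim = primitive_backup(v)
    out = []
    for s in range(N - 1):
        steps_to_goal = (N - 1) - s
        if steps_to_goal <= d:                   # option reaches the goal
            opt = GAMMA ** (steps_to_goal - 1)   # reward on arrival, then done
        else:                                    # stops d states to the right
            opt = GAMMA ** d * v[s + d]
        out.append(max(prim[s], opt))
    out.append(0.0)
    return out

def sweeps_until_converged(backup, tol=1e-10, cap=1000):
    v = [0.0] * N
    for n in range(1, cap + 1):
        v2 = backup(v)
        if max(abs(a - b) for a, b in zip(v2, v)) < tol:
            return n, v2
        v = v2
    return cap, v

n_prim, v_prim = sweeps_until_converged(primitive_backup)
n_opt, v_opt = sweeps_until_converged(option_backup)
# Both reach the same fixed point; the option planner needs fewer sweeps.
```

Because the option's return is a legitimate multi-step return and primitive backups are still available, the combined operator keeps the same fixed point while contracting useful information across the chain faster, mirroring the duration effect the analysis formalizes.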


Automatica ◽  
2017 ◽  
Vol 78 ◽  
pp. 79-87 ◽  
Author(s):  
Yongqiang Li ◽  
Zhongsheng Hou ◽  
Yuanjing Feng ◽  
Ronghu Chi
