Finite Approximation Error-Based Value Iteration ADP

Author(s):  
Derong Liu ◽  
Qinglai Wei ◽  
Ding Wang ◽  
Xiong Yang ◽  
Hongliang Li

2015 ◽  
Vol 53 ◽  
pp. 375-438
Author(s):  
Timothy A. Mann ◽  
Shie Mannor ◽  
Doina Precup

Temporally extended actions have proven useful for reinforcement learning, but their duration also makes them valuable for efficient planning. The options framework provides a concrete way to implement and reason about temporally extended actions. Existing literature has demonstrated the value of planning with options empirically, but there is a lack of theoretical analysis formalizing when planning with options is more efficient than planning with primitive actions. We provide a general analysis of the convergence rate of a popular Approximate Value Iteration (AVI) algorithm called Fitted Value Iteration (FVI) with options. Our analysis reveals that longer duration options and a pessimistic estimate of the value function both lead to faster convergence. Furthermore, options can improve convergence even when they are suboptimal and sparsely distributed throughout the state-space. Next we consider the problem of generating useful options for planning based on a subset of landmark states. This suggests a new algorithm, Landmark-based AVI (LAVI), that represents the value function only at the landmark states. We analyze both FVI and LAVI using the proposed landmark-based options and compare the two algorithms. Our experimental results in three different domains demonstrate the key properties from the analysis. Our theoretical and experimental results demonstrate that options can play an important role in AVI by decreasing approximation error and inducing fast convergence.
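The claim that longer options combined with a pessimistic value estimate speed up convergence can be illustrated on a toy problem. The sketch below is a simplification, not the authors' FVI or LAVI: it runs exact tabular value iteration on a hypothetical deterministic chain with a cost of 1 per step, where an "option" simply moves right for up to `d` steps and stops at the goal. The function name `iterations_to_converge`, the chain size, and the discount factor are all assumptions made for illustration.

```python
import numpy as np


def iterations_to_converge(durations, n_states=21, gamma=0.95,
                           tol=1e-9, max_iters=1000):
    """Count synchronous value-iteration sweeps needed to reach V* on a chain.

    Hypothetical setup: states 0..n_states-1, goal at the right end, reward -1
    per primitive step. Each entry of `durations` is an option that moves right
    for that many steps (or until the goal is reached).
    """
    goal = n_states - 1
    dist = goal - np.arange(n_states)                 # steps to the goal
    v_star = -(1.0 - gamma ** dist) / (1.0 - gamma)   # exact optimal values

    # Pessimistic initial estimate: the value of never reaching the goal.
    v = np.full(n_states, -1.0 / (1.0 - gamma))
    v[goal] = 0.0

    for k in range(1, max_iters + 1):
        new_v = v.copy()
        for s in range(goal):
            backups = []
            for d in durations:
                m = min(d, goal - s)                  # option stops at the goal
                reward = -(1.0 - gamma ** m) / (1.0 - gamma)  # discounted cost
                backups.append(reward + gamma ** m * v[s + m])
            new_v[s] = max(backups)
        v = new_v
        if np.max(np.abs(v - v_star)) < tol:
            return k
    return max_iters


print("primitive actions only:", iterations_to_converge([1]))
print("with a 5-step option:  ", iterations_to_converge([1, 5]))
```

On this 21-state chain, the primitive-only run needs 20 sweeps, while adding the five-step option brings it down to 4, because each backup carries exact values five states further from the goal. Function approximation, stochastic transitions, and suboptimal options are all left out of this sketch.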


Author(s):  
Timothy A. Mann ◽  
Shie Mannor ◽  
Doina Precup

The options framework provides a concrete way to implement and reason about temporally extended actions. Existing literature has demonstrated the value of planning with options empirically, but there is a lack of theoretical analysis formalizing when planning with options is more efficient than planning with primitive actions. We provide a general analysis of the convergence rate of a popular Approximate Value Iteration (AVI) algorithm called Fitted Value Iteration (FVI) with options. Our analysis reveals that longer duration options and a pessimistic estimate of the value function both lead to faster convergence. Furthermore, options can improve convergence even when they are suboptimal and sparsely distributed throughout the state space. Next, we consider generating useful options for planning based on a subset of landmark states. This suggests a new algorithm, Landmark-based AVI (LAVI), that represents the value function only at landmark states. We analyze FVI with options (OFVI) and LAVI using the proposed landmark-based options and compare the two algorithms. Our theoretical and experimental results demonstrate that options can play an important role in AVI by decreasing approximation error and inducing fast convergence.


2021 ◽  
Author(s):  
Benjamin J. Rothaupt ◽  
Stefan Notter ◽  
Walter Fichter

2020 ◽  
Vol 2 (7) ◽  
pp. 91-99
Author(s):  
E. V. KOSTYRIN ◽  
M. S. SINODSKAYA

The article analyzes the impact of certain factors on the volume of environmental investment. Regression equations are constructed that describe the relationship between the volume of environmental investment and each influencing factor, and Pearson pairwise correlation coefficients are calculated between the dependent variable and each factor, as well as between pairs of factors. The average approximation error is determined for each regression equation. A correlation matrix is constructed and conclusions are drawn. The developed econometric model is implemented in the program for separate collection of municipal solid waste (MSW) in Moscow. The efficiency of the environmental investment management model is evaluated using the example of growth in planned investments in the activities of companies specializing in the export and processing of solid waste.
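The quantities named in this abstract (a pairwise regression equation, the Pearson correlation coefficient, and the average approximation error) can be computed as in the following minimal sketch. It assumes that "average approximation error" means the mean absolute relative deviation of the fitted values, expressed as a percentage. The function name `fit_and_score` and the sample numbers are hypothetical and are not data from the article.

```python
import numpy as np


def fit_and_score(x, y):
    """Fit y = intercept + slope * x and report fit-quality measures.

    Returns (slope, intercept, pearson_r, mean_error_pct), where
    mean_error_pct is the mean of |y - y_hat| / |y| in percent
    (an assumed reading of "average approximation error").
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)   # least-squares straight line
    pearson_r = np.corrcoef(x, y)[0, 1]      # off-diagonal entry of 2x2 matrix
    y_hat = intercept + slope * x
    mean_error_pct = 100.0 * np.mean(np.abs((y - y_hat) / y))
    return slope, intercept, pearson_r, mean_error_pct


# Made-up illustrative values: one influencing factor and investment volume.
factor = [10.0, 12.0, 15.0, 19.0, 24.0]
investment = [31.0, 35.0, 44.0, 52.0, 66.0]

slope, intercept, r, err = fit_and_score(factor, investment)
print(f"y = {intercept:.2f} + {slope:.2f} x, r = {r:.3f}, mean error = {err:.2f}%")
```

With several factors, passing them as rows of one array to `np.corrcoef` yields the full correlation matrix, which also shows the pairwise correlations between the factors themselves.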

