value iteration Latest Research Papers

In this paper, we consider the stochastic iterative counterpart of the value iteration scheme wherein only noisy and possibly biased approximations of the Bellman operator are available. We call this counterpart the approximate value iteration (AVI) scheme. Neural networks are often used as function approximators, in order to counter Bellman’s curse of dimensionality. In this paper, they are used to approximate the Bellman operator. Because neural networks are typically trained using sample data, errors and biases may be introduced. The design of AVI accounts for implementations with biased approximations of the Bellman operator and sampling errors. We present verifiable sufficient conditions under which AVI is stable (almost surely bounded) and converges to a fixed point of the approximate Bellman operator. To ensure the stability of AVI, we present three different yet related sets of sufficient conditions that are based on the existence of an appropriate Lyapunov function. These Lyapunov function–based conditions are easily verifiable and new to the literature. The verifiability is enhanced by the fact that a recipe for the construction of the necessary Lyapunov function is also provided. We also show that the stability analysis of AVI can be readily extended to the general case of set-valued stochastic approximations. Finally, we show that AVI can also be used in more general circumstances, that is, for finding fixed points of contractive set-valued maps.

Download Full-text

Value Iteration Networks with Double Estimator for Planetary Rover Path Planning

Sensors ◽

10.3390/s21248418 ◽

2021 ◽

Vol 21 (24) ◽

pp. 8418

Author(s):

Xiang Jin ◽

Wei Lan ◽

Tianlin Wang ◽

Pengyao Yu

Keyword(s):

Path Planning ◽

Large Scale ◽

Computational Cost ◽

Poor Performance ◽

Value Iteration ◽

Training Strategy ◽

Large Size ◽

Planning Algorithm ◽

Value Estimation ◽

Path Planning Algorithm

Path planning technology is significant for planetary rovers that perform exploration missions in unfamiliar environments. In this work, we propose a novel global path planning algorithm, based on the value iteration network (VIN), which is embedded within a differentiable planning module, built on the value iteration (VI) algorithm, and has emerged as an effective method to learn to plan. Despite the capability of learning environment dynamics and performing long-range reasoning, the VIN suffers from several limitations, including sensitivity to initialization and poor performance in large-scale domains. We introduce the double value iteration network (dVIN), which decouples action selection and value estimation in the VI module, using the weighted double estimator method to approximate the maximum expected value, instead of maximizing over the estimated action value. We have devised a simple, yet effective, two-stage training strategy for VI-based models to address the problem of high computational cost and poor performance in large-size domains. We evaluate the dVIN on planning problems in grid-world domains and realistic datasets, generated from terrain images of a moon landscape. We show that our dVIN empirically outperforms the baseline methods and generalize better to large-scale environments.

Download Full-text

Neural-network-based discounted optimal control via an integrated value iteration with accuracy guarantee

Neural Networks ◽

10.1016/j.neunet.2021.08.025 ◽

2021 ◽

Vol 144 ◽

pp. 176-186

Author(s):

Mingming Ha ◽

Ding Wang ◽

Derong Liu

Keyword(s):

Neural Network ◽

Optimal Control ◽

Value Iteration

Download Full-text

Data‐driven robust value iteration control with application to wind turbine pitch control

Optimal Control Applications and Methods ◽

10.1002/oca.2834 ◽

2021 ◽

Author(s):

Yang Liu ◽

Zhanpeng Jiang ◽

Lichao Hao ◽

Zuoxia Xing ◽

Mingyang Chen ◽

...

Keyword(s):

Wind Turbine ◽

Data Driven ◽

Value Iteration ◽

Pitch Control

Download Full-text

Distributionally Robust Markov Decision Processes and Their Connection to Risk Measures

Mathematics of Operations Research ◽

10.1287/moor.2021.1187 ◽

2021 ◽

Author(s):

Nicole Bäuerle ◽

Alexander Glauner

Keyword(s):

Robust Optimization ◽

Markov Decision Processes ◽

Optimization Problem ◽

Stackelberg Game ◽

Risk Measure ◽

Minimax Theorem ◽

Decision Processes ◽

Value Iteration ◽

Markov Decision ◽

Robust Optimization Problem

We consider robust Markov decision processes with Borel state and action spaces, unbounded cost, and finite time horizon. Our formulation leads to a Stackelberg game against nature. Under integrability, continuity, and compactness assumptions, we derive a robust cost iteration for a fixed policy of the decision maker and a value iteration for the robust optimization problem. Moreover, we show the existence of deterministic optimal policies for both players. This is in contrast to classical zero-sum games. In case the state space is the real line, we show under some convexity assumptions that the interchange of supremum and infimum is possible with the help of Sion’s minimax theorem. Further, we consider the problem with special ambiguity sets. In particular, we are able to derive some cases where the robust optimization problem coincides with the minimization of a coherent risk measure. In the final section, we discuss two applications: a robust linear-quadratic problem and a robust problem for managing regenerative energy.

Download Full-text

Accelerated value iteration via Anderson mixing

Science China Information Sciences ◽

10.1007/s11432-019-2889-x ◽

2021 ◽

Vol 64 (12) ◽

Author(s):

Yujun Li

Keyword(s):

Value Iteration

Download Full-text

Generalized Value Iteration for Nonaffine Adaptive Tracking Control with Application to a Wastewater Treatment System

10.1109/icrae53653.2021.9657807 ◽

2021 ◽

Author(s):

Ning Gao ◽

Ding Wang ◽

Mingming Zhao ◽

Lingzhi Hu

Keyword(s):

Wastewater Treatment ◽

Tracking Control ◽

Treatment System ◽

Value Iteration ◽

Adaptive Tracking Control ◽

Wastewater Treatment System ◽

Adaptive Tracking

Download Full-text

A Lyapunov-based version of the Value Iteration algorithm formulated as a discrete-time switched affine system

International Journal of Control ◽

10.1080/00207179.2021.2005260 ◽

2021 ◽

pp. 1-0

Author(s):

Raffaele Iervolino ◽

Massimo Tipaldi ◽

Ali Forootani

Keyword(s):

Discrete Time ◽

Iteration Algorithm ◽

Value Iteration ◽

Affine System ◽

Value Iteration Algorithm

Download Full-text

value iteration
Recently Published Documents

TOTAL DOCUMENTS

H-INDEX

Value iteration and adaptive optimal output regulation with assured convergence rate

System Stability of Learning-Based Linear Optimal Control With General Discounted Value Iteration

Analyzing Approximate Value Iteration Algorithms

Value Iteration Networks with Double Estimator for Planetary Rover Path Planning

Neural-network-based discounted optimal control via an integrated value iteration with accuracy guarantee

Data‐driven robust value iteration control with application to wind turbine pitch control

Distributionally Robust Markov Decision Processes and Their Connection to Risk Measures

Accelerated value iteration via Anderson mixing

Generalized Value Iteration for Nonaffine Adaptive Tracking Control with Application to a Wastewater Treatment System

A Lyapunov-based version of the Value Iteration algorithm formulated as a discrete-time switched affine system

Export Citation Format

value iterationRecently Published Documents

TOTAL DOCUMENTS

H-INDEX

Value iteration and adaptive optimal output regulation with assured convergence rate

System Stability of Learning-Based Linear Optimal Control With General Discounted Value Iteration

Analyzing Approximate Value Iteration Algorithms

Value Iteration Networks with Double Estimator for Planetary Rover Path Planning

Neural-network-based discounted optimal control via an integrated value iteration with accuracy guarantee

Data‐driven robust value iteration control with application to wind turbine pitch control

Distributionally Robust Markov Decision Processes and Their Connection to Risk Measures

Accelerated value iteration via Anderson mixing

Generalized Value Iteration for Nonaffine Adaptive Tracking Control with Application to a Wastewater Treatment System

A Lyapunov-based version of the Value Iteration algorithm formulated as a discrete-time switched affine system

value iteration
Recently Published Documents