Efficient Actor-Critic Algorithm with Hierarchical Model Learning and Planning

To improve the convergence rate and the sample efficiency, two efficient learning methods, AC-HMLP and RAC-HMLP (AC-HMLP with l2-regularization), are proposed by combining the actor-critic algorithm with hierarchical model learning and planning. The hierarchical model consists of a local model and a global model, which are learned at the same time as the value function and the policy and are approximated by local linear regression (LLR) and linear function approximation (LFA), respectively. Both the local model and the global model are applied to generate samples for planning; the former is used at each time step only if the state-prediction error does not surpass the threshold, while the latter is utilized at the end of each episode. The purpose of using both models is to improve the sample efficiency and accelerate the convergence rate of the whole algorithm by fully utilizing the local and global information. Experimentally, AC-HMLP and RAC-HMLP are compared with three representative algorithms on two Reinforcement Learning (RL) benchmark problems. The results demonstrate that the two methods perform best in terms of convergence rate and sample efficiency.
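The planning schedule described in this abstract can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: the function names, the data layout, and the use of a Euclidean norm for the state-prediction error are all assumptions.

```python
import math


def prediction_error(predicted, observed):
    # Euclidean distance between predicted and observed next state (assumed metric).
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)))


def planning_schedule(episode, threshold):
    # episode: list of (predicted_next_state, observed_next_state) pairs, one per step.
    # Returns the planning events in order: local-model planning only at steps
    # where the local model's prediction error stays within the threshold, and
    # one global-model planning pass after the last step of the episode.
    events = []
    for t, (predicted, observed) in enumerate(episode):
        if prediction_error(predicted, observed) <= threshold:
            events.append(("local", t))
    events.append(("global", len(episode)))
    return events


# Hypothetical two-step episode: an accurate prediction, then a poor one.
episode = [((0.0, 0.0), (0.0, 0.05)), ((1.0, 1.0), (2.0, 2.0))]
events = planning_schedule(episode, threshold=0.1)
```

In this toy episode the local model would be trusted only at the first step, and the global model would be used once at the end.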


Conditions on Features for Temporal Difference-Like Methods to Converge

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/357 ◽

2019 ◽

Author(s):

Marcus Hutter ◽

Samuel Yang-Zhao ◽

Sultan Javed Majeed

Keyword(s):

Reinforcement Learning ◽

Function Approximation ◽

Bellman Equation ◽

Complete Characterization ◽

Value Functions ◽

Approximation Space ◽

State Aggregation ◽

Linear Function Approximation ◽

The Value Function

The convergence of many reinforcement learning (RL) algorithms with linear function approximation has been investigated extensively but most proofs assume that these methods converge to a unique solution. In this paper, we provide a complete characterization of non-uniqueness issues for a large class of reinforcement learning algorithms, simultaneously unifying many counter-examples to convergence in a theoretical framework. We achieve this by proving a new condition on features that can determine whether the convergence assumptions are valid or non-uniqueness holds. We consider a general class of RL methods, which we call natural algorithms, whose solutions are characterized as the fixed point of a projected Bellman equation. Our main result proves that natural algorithms converge to the correct solution if and only if all the value functions in the approximation space satisfy a certain shape. This implies that natural algorithms are, in general, inherently prone to converge to the wrong solution for most feature choices even if the value function can be represented exactly. Given our results, we show that state aggregation-based features are a safe choice for natural algorithms and also provide a condition for finding convergent algorithms under other feature constructions.
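As a rough illustration of what a fixed point of a projected Bellman equation looks like, the sketch below iterates the standard expected TD(0) update for linear features on a tiny Markov chain. It is not the paper's framework of natural algorithms; the chain, rewards, weighting, step size, and all function names are assumptions chosen for the example.

```python
def projected_td_fixed_point(P, r, phi, d, gamma, step=0.1, iters=5000):
    # P: transition matrix, r: rewards, phi: feature rows, d: state weighting.
    # Repeats theta <- theta + step * Phi^T D (T v - v) with v = Phi theta.
    n, k = len(phi), len(phi[0])
    theta = [0.0] * k
    for _ in range(iters):
        v = [sum(phi[s][j] * theta[j] for j in range(k)) for s in range(n)]
        # Bellman backup: (T v)(s) = r(s) + gamma * sum_t P[s][t] * v(t)
        tv = [r[s] + gamma * sum(P[s][t] * v[t] for t in range(n)) for s in range(n)]
        for j in range(k):
            theta[j] += step * sum(d[s] * phi[s][j] * (tv[s] - v[s]) for s in range(n))
    return theta


# Hypothetical two-state chain with uniform transitions and weighting.
P = [[0.5, 0.5], [0.5, 0.5]]
r = [1.0, 0.0]
d = [0.5, 0.5]
gamma = 0.9

# One feature per state: the value function is represented exactly.
theta_tabular = projected_td_fixed_point(P, r, [[1.0, 0.0], [0.0, 1.0]], d, gamma)

# State aggregation: both states share a single feature.
theta_aggregated = projected_td_fixed_point(P, r, [[1.0], [1.0]], d, gamma)
```

With one feature per state the iteration recovers the true values of this chain; with both states aggregated it settles on a single shared value.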


Concentration bounds for temporal difference learning with linear function approximation: the case of batch data and uniform sampling

Machine Learning ◽

10.1007/s10994-020-05912-5 ◽

2021 ◽

Author(s):

L. A. Prashanth ◽

Nathaniel Korda ◽

Rémi Munos

Keyword(s):

Linear Function ◽

Function Approximation ◽

Temporal Difference ◽

Temporal Difference Learning ◽

Uniform Sampling ◽

Linear Function Approximation ◽

Concentration Bounds ◽

Batch Data


Studying Inertia Effects in Open Channel Flow Using Saint-Venant Equations

Water ◽

10.3390/w10111652 ◽

2018 ◽

Vol 10 (11) ◽

pp. 1652

Author(s):

Dong-Sin Shih ◽

Gour-Tsyh Yeh

Keyword(s):

Stokes Equations ◽

Field Application ◽

Benchmark Problems ◽

Algebraic Equations ◽

Step Size ◽

Time Step ◽

Cross Sectional ◽

Inertia Effects ◽

Time Step Size ◽

Saint Venant Equations

One-dimensional (1D) Saint-Venant equations, which originated from the Navier–Stokes equations, are usually applied to express transient stream flow. The governing equations are based on mass continuity and momentum equivalence. The momentum equation, comprising the inertia, pressure, gravity, and friction-induced momentum loss terms, can be expressed as kinematic wave (KIW), diffusion wave (DIW), and fully dynamic wave (DYW) flow. In this study, the method of characteristics (MOC) is used for solving the diagonalized Saint-Venant equations. A computer model, CAMP1DF, including KIW, DIW, and DYW approximations, is developed. Benchmark problems from MacDonald et al. (1997) are examined to study the accuracy of the CAMP1DF model. The simulations revealed that CAMP1DF produces almost identical results that are valid for various fluvial conditions. The proposed scheme not only allows a large time step size but also solves only half of the simultaneous algebraic equations, so both the accuracy and the efficiency of the simulations are improved. Based on the physical relevance, the simulations clearly showed that the DYW approximation has the best performance, whereas the KIW approximation results in the largest errors. Moreover, the non-prismatic field case of the Zhuoshui River in central Taiwan is studied. The simulations indicate that the DYW approach does not ensure a better simulation result than the other two approximations. The investigated cross-sectional geometries play an important role in stream routing. Because the acceleration terms are considered, the simulated hydrograph of a DYW reveals more physical characteristics, particularly regarding the rising and recession limbs. Note that the KIW does not require assignment of a downstream boundary condition, making it more convenient for field application.


KERNEL CONVERGENCE ESTIMATES FOR DIFFUSIONS WITH CONTINUOUS COEFFICIENTS

International Journal of Theoretical and Applied Finance ◽

10.1142/s0219024911006619 ◽

2011 ◽

Vol 14 (07) ◽

pp. 979-1004

Author(s):

CLAUDIO ALBANESE

Keyword(s):

Convergence Rate ◽

Convergence Rates ◽

Fourier Transforms ◽

Uniform Continuity ◽

Time Step ◽

Uniform Bounds ◽

Kernel Convergence ◽

Discretization Schemes ◽

A New Technique ◽

Derivatives Of

Bidirectional valuation models are based on numerical methods to obtain kernels of parabolic equations. Here we address the problem of robustness of kernel calculations vis-à-vis floating point errors from a theoretical standpoint. We are interested in kernels of one-dimensional diffusion equations with continuous coefficients as evaluated by means of explicit discretization schemes of uniform step h > 0 in the limit as h → 0. We consider both semidiscrete triangulations with continuous time and explicit Euler schemes with time step so small that the Courant condition is satisfied. We find uniform bounds for the convergence rate as a function of the degree of smoothness. We conjecture these bounds are indeed sharp. The bounds also apply to the time derivatives of the kernel and its first two space derivatives. The proof is constructive and is based on a new technique of path conditioning for Markov chains and a renormalization group argument. We make the simplifying assumption of time-independence and use longitudinal Fourier transforms in the time direction. Convergence rates depend on the degree of smoothness and Hölder differentiability of the coefficients. We find that the fastest convergence rate is of order O(h^2) and is achieved if the coefficients have a bounded second derivative. Otherwise, explicit schemes still converge for any degree of Hölder differentiability, except that the convergence rate is slower. Hölder continuity itself is not strictly necessary and can be relaxed to a hypothesis of uniform continuity.
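To make the setting concrete, here is a minimal sketch of an explicit Euler scheme on a uniform grid with the time step chosen to satisfy a Courant condition. It assumes a drift-free equation u_t = a(x) u_xx with fixed boundary values, which is a simplification of the diffusions the abstract covers; the coefficient, grid, and function names are hypothetical.

```python
def courant_time_step(diffusion, h, safety=0.9):
    # Largest step with dt * max(a) / h**2 <= 1/2, shrunk by a safety factor.
    return safety * h * h / (2.0 * max(diffusion))


def explicit_euler_step(u, diffusion, h, dt):
    # One explicit Euler step of u_t = a(x) u_xx; the two boundary values are held fixed.
    new = list(u)
    for i in range(1, len(u) - 1):
        second_difference = (u[i - 1] - 2.0 * u[i] + u[i + 1]) / (h * h)
        new[i] = u[i] + dt * diffusion[i] * second_difference
    return new


h = 0.1
grid = [i * h for i in range(21)]
diffusion = [1.0 + 0.5 * x for x in grid]  # hypothetical continuous coefficient
u = [0.0] * len(grid)
u[10] = 1.0 / h  # discrete point mass approximating a delta function
dt = courant_time_step(diffusion, h)
for _ in range(50):
    u = explicit_euler_step(u, diffusion, h, dt)
```

Under the Courant condition each update is a weighted average of neighboring values with nonnegative weights, so the discrete kernel stays nonnegative and never exceeds its initial peak.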


Convergence of Q-learning with linear function approximation

2007 European Control Conference (ECC) ◽

10.23919/ecc.2007.7068926 ◽

2007 ◽

Cited By ~ 6

Author(s):

Francisco S. Melo ◽

M. Isabel Ribeiro

Keyword(s):

Linear Function ◽

Function Approximation ◽

Q Learning ◽

Linear Function Approximation


Local Discontinuous Galerkin Method for Nonlinear Time-Space Fractional Subdiffusion/Superdiffusion Equations

Mathematical Problems in Engineering ◽

10.1155/2020/6954239 ◽

2020 ◽

Vol 2020 ◽

pp. 1-21

Author(s):

Meilan Qiu ◽

Dewang Li ◽

Yanyun Wu

Keyword(s):

Convergence Rate ◽

Fractional Derivative ◽

Discontinuous Galerkin ◽

Numerical Schemes ◽

Step Size ◽

Time Step ◽

Local Discontinuous Galerkin ◽

Time Space ◽

Backward Euler ◽

Heterogeneous Rocks

Fractional partial differential equations with time-space fractional derivatives describe some important physical phenomena. For example, the subdiffusion equation (time order 0<α<1) is more suitable for describing charge carrier transport in amorphous semiconductors, nuclear magnetic resonance (NMR) diffusometry in percolative systems, Rouse or reptation dynamics in polymeric systems, the diffusion of a scalar tracer in an array of convection rolls, the dynamics of a bead in a polymeric network, and so on. The superdiffusion case (1<α<2), by contrast, more accurately describes special domains of rotating flows, collective slip diffusion on solid surfaces, layered velocity fields, Richardson turbulent diffusion, bulk-surface exchange controlled dynamics in porous glasses, the transport in micelle systems and heterogeneous rocks, quantum optics, single molecule spectroscopy, the transport in turbulent plasma, bacterial motion, and even the flight of an albatross (for more physical applications of fractional sub-super diffusion equations, see Metzler and Klafter, 2000). In this work, we establish two fully discrete numerical schemes for solving a class of nonlinear time-space fractional subdiffusion/superdiffusion equations by using a backward Euler difference (0<α<1) or a second-order central difference (1<α<2) combined with a local discontinuous Galerkin (LDG) finite element mixed method. Using mathematical induction, we analyze the stability and the convergence rate of the two LDG schemes under the L2 norm. Finally, several numerical experiments validate the proposed model and demonstrate the features of the two numerical schemes: the optimal convergence rate in the space direction is close to O(h^{k+1}), and the convergence rate in the time direction reaches O(τ^{2−α}) when the fractional derivative order satisfies 0<α<1. If the fractional derivative order satisfies 1<α<2 and the step sizes are related by h=C′τ (h denotes the space step size, C′ is a constant, and τ is the time step size), then the time convergence rate reaches O(τ^{3−α}). The experimental results illustrate that the proposed method is effective in solving nonlinear time-space fractional subdiffusion/superdiffusion equations.


Q-Learning with Linear Function Approximation

Learning Theory - Lecture Notes in Computer Science ◽

10.1007/978-3-540-72927-3_23 ◽

2007 ◽

pp. 308-322 ◽

Cited By ~ 18

Author(s):

Francisco S. Melo ◽

M. Isabel Ribeiro

Keyword(s):

Linear Function ◽

Function Approximation ◽

Q Learning ◽

Linear Function Approximation


A current-mode piecewise-linear function approximation circuit based on fuzzy-logic

ISCAS '98. Proceedings of the 1998 IEEE International Symposium on Circuits and Systems (Cat. No.98CH36187) ◽

10.1109/iscas.1998.703925 ◽

2002 ◽

Cited By ~ 2

Author(s):

N. Manaresi ◽

R. Rovatti ◽

E. Franchi ◽

G. Baccarani

Keyword(s):

Fuzzy Logic ◽

Linear Function ◽

Function Approximation ◽

Piecewise Linear ◽

Current Mode ◽

Piecewise Linear Function ◽

Linear Function Approximation ◽

A Current


A piecewise-linear function approximation using current mode circuits

[Proceedings] 1992 IEEE International Symposium on Circuits and Systems ◽

10.1109/iscas.1992.230378 ◽

2003 ◽

Cited By ~ 18

Author(s):

J. Ramirez-Angulo ◽

E. Sanchez-Sinencio ◽

A. Rodriguez-Vazquez

Keyword(s):

Linear Function ◽

Function Approximation ◽

Piecewise Linear ◽

Current Mode ◽

Piecewise Linear Function ◽

Current Mode Circuits ◽

Linear Function Approximation


Performance Analyses of IDEAL Algorithm on Highly Skewed Grid System

Advances in Mechanical Engineering ◽

10.1155/2014/813510 ◽

2014 ◽

Vol 6 ◽

pp. 813510

Author(s):

Dongliang Sun ◽

Jinliang Xu ◽

Peng Ding

Keyword(s):

Convergence Rate ◽

Curvilinear Coordinates ◽

Grid System ◽

Time Step ◽

Fine Grid ◽

Simpler Algorithm ◽

Inclined Cavity ◽

Transfer Problems ◽

Ideal Algorithm ◽

The Ideal

IDEAL is an efficient segregated algorithm for fluid flow and heat transfer problems. This algorithm has now been extended to 3D nonorthogonal curvilinear coordinates. Highly skewed grids in nonorthogonal curvilinear coordinates can decrease the convergence rate and degrade computational stability. In this study, the feasibility of the IDEAL algorithm on highly skewed grid systems is analyzed by investigating the lid-driven flow in an inclined cavity. It can be concluded that the IDEAL algorithm is more robust and more efficient than the traditional SIMPLER algorithm, especially for highly skewed and fine grid systems. For example, at θ = 5° and grid number = 70 × 70 × 70, the convergence rate of the IDEAL algorithm is 6.3 times faster than that of the SIMPLER algorithm, and the IDEAL algorithm can converge at almost any time step multiple.
