policy iteration algorithm
Recently Published Documents


Total documents: 65 (five years: 12)
H-index: 14 (five years: 1)

2020, Vol 4 (3), pp. 686-691
Author(s): Navid Moshtaghi Yazdani, Reihaneh Kardehi Moghaddam, Bahare Kiumarsi, Hamidreza Modares

2020, Vol 144, pp. 01001
Author(s): Iram Parvez, Jianjian Shen

In hydro scheduling, unit commitment is a complex sub-problem. This paper proposes a new approximate dynamic programming technique to solve it. A method called the Least-Squares Policy Iteration (LSPI) algorithm is introduced, which is efficient and converges quickly. It builds on the widely used least-squares temporal difference (LSTD) algorithm and extends it to make it useful for optimization problems. First, the value function of a fixed policy is found with the least-squares temporal difference Q (LSTDQ) algorithm, which is similar to LSTD; LSPI then performs policy iteration using the results of LSTDQ. It combines the data efficiency of LSTDQ with the policy-search efficiency of policy iteration.
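
The LSTDQ evaluation step and the LSPI outer loop described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the four-state chain, its reward, the one-hot features, the ridge term, and all function names are assumptions chosen to keep the example small, and it does not model hydro unit commitment.

```python
import numpy as np

# Hypothetical toy problem (not from the paper): a 4-state chain, 2 actions.
N_STATES, N_ACTIONS, GAMMA = 4, 2, 0.9


def step(s, a):
    """Deterministic chain: action 0 moves left, action 1 moves right."""
    s_next = max(s - 1, 0) if a == 0 else min(s + 1, N_STATES - 1)
    reward = 1.0 if s_next == N_STATES - 1 else 0.0
    return reward, s_next


def phi(s, a):
    """One-hot feature vector over (state, action) pairs (an assumed basis)."""
    f = np.zeros(N_STATES * N_ACTIONS)
    f[s * N_ACTIONS + a] = 1.0
    return f


def greedy(w, s):
    """Action with the largest approximate Q-value phi(s, a) . w."""
    return int(np.argmax([phi(s, a) @ w for a in range(N_ACTIONS)]))


def lstdq(samples, w_policy):
    """Policy evaluation: solve A w = b for the policy greedy in w_policy."""
    k = N_STATES * N_ACTIONS
    A = 1e-6 * np.eye(k)  # small ridge term (an assumption) keeps A invertible
    b = np.zeros(k)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        f_next = phi(s_next, greedy(w_policy, s_next))
        A += np.outer(f, f - GAMMA * f_next)
        b += r * f
    return np.linalg.solve(A, b)


def lspi(samples, max_iter=20, tol=1e-8):
    """Policy iteration: re-evaluate with LSTDQ until the weights stop changing."""
    w = np.zeros(N_STATES * N_ACTIONS)
    for _ in range(max_iter):
        w_new = lstdq(samples, w)
        converged = np.linalg.norm(w_new - w) < tol
        w = w_new
        if converged:
            break
    return w


# One sample per (state, action) pair; the same batch is reused every iteration.
samples = [(s, a, *step(s, a)) for s in range(N_STATES) for a in range(N_ACTIONS)]
weights = lspi(samples)
policy = [greedy(weights, s) for s in range(N_STATES)]
print(policy)
```

On this toy chain the learned greedy policy moves right in every state. The same batch of samples is reused for every evaluation, which is the data-efficiency property the abstract attributes to LSTDQ; the policy improvement step is implicit in the greedy action taken at each sampled next state.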


Author(s): Pierre-Loïc Garoche

This chapter considers configurations other than the direct synthesis of invariants as bound templates. A first case arises when the methods shown in the previous chapter synthesize only the template but not the bound. A second arises when one wants to analyze a system with multiple templates. The chapter looks at bounds on each variable and considers the templates $p(x) = x_i^2$ for each variable $x_i$ in the state characterization $x \in \Sigma$. It thus proposes a policy iteration algorithm, based on sum-of-squares (SOS) optimization, to refine such template bounds. In practice, the algorithm is used by combining a Lyapunov-based template obtained with one of the previous methods with additional templates that encode bounds on some variables or property-specific templates.

