Two-Stage Reinforcement Learning Policy Search for Grid-Interactive Building Control

2022 ◽  
pp. 1-1
Author(s):  
Xiangyu Zhang ◽  
Yue Chen ◽  
Andrey Bernstein ◽  
Rohit Chintala ◽  
Peter Graf ◽  
...  
2021 ◽  
Vol 6 (2) ◽  
pp. 1950-1957
Author(s):  
Zhe Hu ◽  
Yu Zheng ◽  
Jia Pan

Aerospace ◽  
2021 ◽  
Vol 8 (10) ◽  
pp. 299
Author(s):  
Bin Yang ◽  
Pengxuan Liu ◽  
Jinglang Feng ◽  
Shuang Li

This paper presents a novel and robust two-stage pursuit strategy for incomplete-information impulsive space pursuit-evasion missions under the J2 perturbation. The strategy first divides the impulsive pursuit-evasion game into a far-distance rendezvous stage and a close-distance game stage, according to the perception range of the evader. The far-distance rendezvous stage is transformed into a rendezvous trajectory optimization problem, and a new objective function is proposed to obtain the pursuit trajectory with the optimal terminal pursuit capability. For the close-distance game stage, a closed-loop pursuit approach based on a reinforcement learning algorithm, the deep deterministic policy gradient (DDPG) algorithm, is proposed to solve and update the pursuit trajectory. The feasibility of the strategy, and its robustness to different initial states of the pursuer and evader and to different evasion strategies, are demonstrated in sun-synchronous orbit pursuit-evasion game scenarios. Monte Carlo tests show that the successful pursuit ratio of the proposed method is over 91% for all the given scenarios.
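
The split into two stages by the evader's perception range can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `select_stage`, the Cartesian position tuples, the kilometre units, and the choice to treat the boundary distance as already inside the perception range are all assumptions made for illustration.

```python
import math

def select_stage(pursuer_pos, evader_pos, perception_range_km):
    """Pick the active stage from the pursuer-evader separation.

    Hypothetical helper: positions are (x, y, z) tuples in km, and a
    separation equal to the perception range counts as "close" (assumption).
    """
    distance_km = math.dist(pursuer_pos, evader_pos)
    if distance_km <= perception_range_km:
        # The evader can perceive the pursuer: closed-loop game stage.
        return "close_distance_game"
    # The evader cannot perceive the pursuer: trajectory optimization stage.
    return "far_distance_rendezvous"
```

For example, `select_stage((0, 0, 0), (100, 0, 0), 50)` selects the far-distance rendezvous stage, because the 100 km separation exceeds the assumed 50 km perception range.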


2021 ◽  
Author(s):  
N. Snehal ◽  
W. Pooja ◽  
K. Sonam ◽  
S. R. Wagh ◽  
N. M. Singh

Author(s):  
Abhinav Verma

We study the problem of generating interpretable and verifiable policies for Reinforcement Learning (RL). Unlike the popular Deep Reinforcement Learning (DRL) paradigm, in which the policy is represented by a neural network, the aim of this work is to find policies that can be represented in high-level programming languages. Such programmatic policies have several benefits, including being more easily interpreted than neural networks and being amenable to verification by scalable symbolic methods. The generation methods for programmatic policies also provide a mechanism for systematically using domain knowledge to guide the policy search. The interpretability and verifiability of these policies provide the opportunity to deploy RL-based solutions in safety-critical environments. This thesis draws on, and extends, work from both the machine learning and formal methods communities.
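
As a rough illustration of what a programmatic policy might look like, the sketch below writes a policy as a few readable lines of code and checks a simple safety property over a grid of states. Everything here is hypothetical: the one-dimensional tracking-error state, the 0.1 deadband, the three discrete actions, and the function names are assumptions, and the exhaustive grid check is only a stand-in for the symbolic verification methods the abstract refers to.

```python
def programmatic_policy(error):
    """A policy expressed as a short program rather than a neural network.

    Hypothetical example: `error` is a 1-D tracking error, and the action
    is -1 (push down), +1 (push up), or 0 (do nothing).
    """
    if error > 0.1:
        return -1
    if error < -0.1:
        return 1
    return 0

def never_pushes_away(policy, errors):
    """Check that the action never has the same sign as the error."""
    return all(policy(e) * e <= 0 for e in errors)

# Discretized state space: errors from -2.0 to 2.0 in steps of 0.1.
error_grid = [i / 10 for i in range(-20, 21)]
property_holds = never_pushes_away(programmatic_policy, error_grid)
```

Because the policy is ordinary code, a reader can inspect each branch directly, which is the kind of interpretability the abstract contrasts with neural network policies.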

