Two-Stage Reinforcement Learning Policy Search for Grid-Interactive Building Control

This paper presents a novel and robust two-stage pursuit strategy for incomplete-information impulsive space pursuit-evasion missions under the J2 perturbation. The strategy first divides the impulsive pursuit-evasion game into a far-distance rendezvous stage and a close-distance game stage according to the perception range of the evader. The far-distance rendezvous stage is cast as a rendezvous trajectory optimization problem, and a new objective function is proposed to obtain the pursuit trajectory with the optimal terminal pursuit capability. For the close-distance game stage, a closed-loop pursuit approach based on a reinforcement learning algorithm, the deep deterministic policy gradient (DDPG) algorithm, is proposed to solve and update the pursuit trajectory. The feasibility of the strategy, and its robustness to different initial states of the pursuer and evader and to different evasion strategies, are demonstrated in sun-synchronous orbit pursuit-evasion game scenarios. Monte Carlo tests show that the successful pursuit ratio of the proposed method exceeds 91% in all the given scenarios.
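As a rough illustration of the stage split described in this abstract, the following sketch picks a stage by comparing the pursuer-evader separation with the evader's perception range. The function name, the stage labels, the units, and the choice to treat the boundary as part of the close stage are all assumptions made for illustration; the abstract does not specify them.

```python
import math


def select_stage(pursuer_pos, evader_pos, perception_range_km):
    """Pick the pursuit stage from the current separation.

    Hypothetical helper: positions are (x, y, z) tuples in km, and the
    boundary case (separation equal to the range) is assigned to the
    close-distance game stage by assumption.
    """
    separation = math.dist(pursuer_pos, evader_pos)  # Euclidean distance
    if separation <= perception_range_km:
        return "close_game"      # evader can perceive the pursuer
    return "far_rendezvous"      # evader is unaware; optimize the approach
```

In a two-stage scheme of this kind, the far-distance branch would hand off to a trajectory optimizer and the close-distance branch to the learned closed-loop policy.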

Postural Control of Two-Stage Inverted Pendulum Using Reinforcement Learning and Self-organizing Map

Adaptive and Natural Computing Algorithms - Lecture Notes in Computer Science ◽

10.1007/978-3-540-71629-7_81 ◽

2007 ◽

pp. 722-729

Author(s):

Jae-kang Lee ◽

Tae-seok Oh ◽

Yun-su Shin ◽

Tae-jun Yoon ◽

Il-hwan Kim

Keyword(s):

Reinforcement Learning ◽

Postural Control ◽

Inverted Pendulum ◽

Self Organizing Map ◽

Two Stage ◽

Self Organizing

Control of an Acrobot system using reinforcement learning with probabilistic policy search

10.1109/anzcc53563.2021.9628194 ◽

2021 ◽

Author(s):

N. Snehal ◽

W. Pooja ◽

K. Sonam ◽

S. R. Wagh ◽

N. M. Singh

Keyword(s):

Reinforcement Learning ◽

Policy Search

Autonomous helicopter control using reinforcement learning policy search methods

Proceedings 2001 ICRA. IEEE International Conference on Robotics and Automation (Cat. No.01CH37164) ◽

10.1109/robot.2001.932842 ◽

2002 ◽

Cited By ~ 72

Author(s):

J.A. Bagnell ◽

J.G. Schneider

Keyword(s):

Reinforcement Learning ◽

Search Methods ◽

Helicopter Control ◽

Policy Search ◽

Autonomous Helicopter

A Policy Search Method For Temporal Logic Specified Reinforcement Learning Tasks

2018 Annual American Control Conference (ACC) ◽

10.23919/acc.2018.8431181 ◽

2018 ◽

Cited By ~ 9

Author(s):

Xiao Li ◽

Yao Ma ◽

Calin Belta

Keyword(s):

Reinforcement Learning ◽

Temporal Logic ◽

Search Method ◽

Policy Search ◽

Learning Tasks

Verifiable and Interpretable Reinforcement Learning through Program Synthesis

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33019902 ◽

2019 ◽

Vol 33 ◽

pp. 9902-9903

Author(s):

Abhinav Verma

Keyword(s):

Neural Network ◽

Machine Learning ◽

Neural Networks ◽

Reinforcement Learning ◽

Programming Languages ◽

Formal Methods ◽

Domain Knowledge ◽

Policy Search ◽

Safety Critical ◽

Symbolic Methods

We study the problem of generating interpretable and verifiable policies for Reinforcement Learning (RL). Unlike the popular Deep Reinforcement Learning (DRL) paradigm, in which the policy is represented by a neural network, the aim of this work is to find policies that can be represented in high-level programming languages. Such programmatic policies have several benefits, including being more easily interpreted than neural networks and being amenable to verification by scalable symbolic methods. The generation methods for programmatic policies also provide a mechanism for systematically using domain knowledge to guide the policy search. The interpretability and verifiability of these policies provide the opportunity to deploy RL-based solutions in safety-critical environments. This thesis draws on, and extends, work from both the machine learning and formal methods communities.
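To make the idea of a programmatic policy concrete, here is a minimal sketch of a policy written as a short, readable program, with a simple property check. The balancing rule, the 0.5 coefficient, and both function names are hypothetical and not taken from the thesis. The exhaustive check over a finite grid is only a stand-in for the symbolic verification the abstract refers to.

```python
def programmatic_policy(angle, angular_velocity):
    """Hypothetical rule-based policy for a balancing task.

    Pushes against the predicted direction of fall; the 0.5 weighting
    on angular velocity is an illustrative assumption.
    """
    if angle + 0.5 * angular_velocity > 0.0:
        return -1.0  # push left
    return 1.0       # push right


def check_actions_bounded(policy, angles, velocities, limit=1.0):
    """Check that |action| <= limit on every sampled state.

    This enumerates a finite grid; it does not prove the property for
    all states the way a symbolic verifier would.
    """
    return all(abs(policy(a, w)) <= limit for a in angles for w in velocities)


grid = [i / 10 for i in range(-10, 11)]  # sample states from -1.0 to 1.0
bounded = check_actions_bounded(programmatic_policy, grid, grid)
```

Because the policy is a two-branch program, a reader can see directly which states lead to which action, which is the interpretability benefit the abstract describes.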

Policy Search in Infinite-Horizon Discounted Reinforcement Learning: Advances through Connections to Non-Convex Optimization : Invited Presentation

2019 53rd Annual Conference on Information Sciences and Systems (CISS) ◽

10.1109/ciss.2019.8693017 ◽

2019 ◽

Author(s):

Kaiqing Zhang ◽

Alec Koppel ◽

Hao Zhu ◽

Tamer Başar

Keyword(s):

Reinforcement Learning ◽

Convex Optimization ◽

Infinite Horizon ◽

Policy Search
