Autonomous helicopter control using reinforcement learning policy search methods

Author(s):  
J.A. Bagnell ◽  
J.G. Schneider

2021 ◽  
Author(s):  
N. Snehal ◽  
W. Pooja ◽  
K. Sonam ◽  
S. R. Wagh ◽  
N. M. Singh

2019 ◽  
Vol 92 (2) ◽  
pp. 172-179
Author(s):  
Firat Sal

Purpose: The purpose of this study is to examine the effect of passive and active morphing of blade root chord length and blade taper on the control effort of the flight control system (FCS) of a helicopter.

Design/methodology/approach: Physics-based helicopter models, which are functions of passive and active morphing, are created and applied in helicopter FCS design to determine the control effort.

Findings: Helicopters with both passively and actively morphing blade root chord length and blade taper require less control effort than those with only passively morphing blade root chord length or blade taper, or only actively morphing blade root chord length and blade taper.

Practical implications: Both passively and actively morphing blade root chord length and blade taper can be implemented for more economical autonomous helicopter flight.

Originality/value: The main novelty of this article is the simultaneous application of passive and active morphing to the helicopter blade root chord length and blade taper. The study also demonstrates that applying both passive and active morphing to the helicopter blade root chord and blade taper consumes considerably less energy than applying either passive or active morphing alone. This reduces fuel consumption and makes the environment cleaner.


Author(s):  
Abhinav Verma

We study the problem of generating interpretable and verifiable policies for Reinforcement Learning (RL). Unlike the popular Deep Reinforcement Learning (DRL) paradigm, in which the policy is represented by a neural network, the aim of this work is to find policies that can be represented in high-level programming languages. Such programmatic policies have several benefits, including being more easily interpreted than neural networks and being amenable to verification by scalable symbolic methods. The generation methods for programmatic policies also provide a mechanism for systematically using domain knowledge to guide the policy search. The interpretability and verifiability of these policies provide the opportunity to deploy RL-based solutions in safety-critical environments. This thesis draws on, and extends, work from both the machine learning and formal methods communities.
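To make the idea of a programmatic policy concrete, here is a minimal sketch (my own illustrative example, not taken from the thesis): a policy for a cart-pole-style balancing task written as a short, human-readable program rather than a neural network. Every branch and coefficient can be read directly, which is what makes such policies amenable to symbolic verification.

```python
# Hypothetical programmatic policy for a cart-pole-style task.
# The rule and its coefficients are illustrative assumptions.

def programmatic_policy(angle: float, angular_velocity: float) -> int:
    """Return action 1 (push right) or 0 (push left).

    A PD-like rule: push in the direction the pole is falling,
    weighting the current angle and its rate of change.
    """
    score = 2.0 * angle + 1.0 * angular_velocity  # interpretable linear rule
    return 1 if score > 0.0 else 0

# Because the policy is just arithmetic plus a comparison, a symbolic
# verifier can check properties such as: "if angle > 0 and
# angular_velocity > 0, the action is always 1".
assert programmatic_policy(0.1, 0.05) == 1
assert programmatic_policy(-0.1, -0.05) == 0
```

A neural-network policy of equal quality would bury the same decision rule in thousands of weights; the programmatic form trades some expressiveness for transparency.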


2011 ◽  
Vol 23 (11) ◽  
pp. 2798-2832 ◽  
Author(s):  
Hirotaka Hachiya ◽  
Jan Peters ◽  
Masashi Sugiyama

Direct policy search is a promising reinforcement learning framework, in particular for controlling continuous, high-dimensional systems. Policy search often requires a large number of samples to obtain a stable policy update estimator, which is prohibitive when sampling is expensive. In this letter, we extend an expectation-maximization-based policy search method so that previously collected samples can be efficiently reused. The usefulness of the proposed method, reward-weighted regression with sample reuse (R³), is demonstrated through robot learning experiments. (This letter is an extended version of our earlier conference paper: Hachiya, Peters, & Sugiyama, 2009.)
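The core of an EM-style reward-weighted regression update can be sketched in a few lines. The toy problem, function names, and the simple exponentiated-reward weighting below are my own simplifications; the R³ method described in the abstract additionally importance-weights previously collected samples, which is omitted here.

```python
import numpy as np

# Minimal sketch of an EM-style reward-weighted regression update for a
# Gaussian policy over a 1-D action. All names and the toy reward are
# illustrative assumptions, not the paper's implementation.

rng = np.random.default_rng(0)

def rwr_update(mean: float, std: float, reward_fn, n: int = 200) -> float:
    """One update: refit the policy mean to reward-weighted action samples."""
    actions = rng.normal(mean, std, size=n)           # exploration samples
    rewards = np.array([reward_fn(a) for a in actions])
    weights = np.exp(rewards - rewards.max())         # exponentiated reward
    return float(np.sum(weights * actions) / np.sum(weights))

# Toy task: reward peaks at action = 2.0.
reward = lambda a: -(a - 2.0) ** 2

mean = 0.0
for _ in range(20):
    mean = rwr_update(mean, std=0.5, reward_fn=reward)
# mean has now moved close to the reward-maximizing action 2.0
```

Because the update is a weighted regression rather than a gradient step, each iteration reuses all samples drawn in that batch; the sample-reuse extension in the letter goes further and also recycles samples from earlier policies.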

