Multiconstrained Gliding Guidance Based on Optimal and Reinforcement Learning Method

2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Luo Zhe ◽  
Li Xinsan ◽  
Wang Lixin ◽  
Shen Qiang

In order to improve the autonomy of gliding guidance for complex flight missions, this paper proposes a multiconstrained intelligent gliding guidance strategy based on optimal guidance and reinforcement learning (RL). Three-dimensional optimal guidance is introduced to meet the terminal latitude, longitude, altitude, and flight-path-angle constraints. A velocity control strategy based on lateral sinusoidal maneuvers is proposed, and an analytical terminal velocity prediction method that accounts for maneuvering flight is studied. To address the problem that the maneuvering amplitude in velocity control cannot be determined offline, an intelligent parameter-adjustment method based on RL is studied. This method formulates parameter determination as a Markov Decision Process (MDP), designs a state space based on terminal speed and an action space based on maneuvering amplitude, constructs a reward function that integrates the terminal velocity error with the gliding guidance tasks, and uses Q-Learning to adjust the maneuvering amplitude online. The simulation results show that the intelligent gliding guidance method can meet various terminal constraints with high accuracy and can effectively improve autonomous decision-making ability in complex tasks.
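As a rough illustration of the Q-Learning loop described above, the sketch below implements tabular Q-Learning over a discretised terminal-velocity-error state space and a discrete set of maneuvering amplitudes; the bin ranges, hyperparameters, and interface names are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Illustrative sketch (not the authors' code): tabular Q-Learning that adjusts the
# lateral-maneuver amplitude online. The state is a discretised predicted
# terminal-velocity error; the action is a discrete maneuvering amplitude.

ERROR_BINS = np.linspace(-200.0, 200.0, 21)   # m/s, assumed terminal-velocity-error grid
AMPLITUDES = np.linspace(0.0, 1.0, 11)        # assumed normalised maneuver amplitudes
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1            # assumed learning hyperparameters

q_table = np.zeros((len(ERROR_BINS), len(AMPLITUDES)))

def to_state(v_err):
    """Map a continuous terminal-velocity error to a discrete state index."""
    return int(np.clip(np.digitize(v_err, ERROR_BINS), 0, len(ERROR_BINS) - 1))

def choose_action(s):
    """Epsilon-greedy selection of the maneuvering-amplitude index."""
    if np.random.rand() < EPS:
        return np.random.randint(len(AMPLITUDES))
    return int(np.argmax(q_table[s]))

def q_update(s, a, reward, s_next):
    """Standard one-step Q-Learning update."""
    td_target = reward + GAMMA * np.max(q_table[s_next])
    q_table[s, a] += ALPHA * (td_target - q_table[s, a])
```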

2021 ◽  
Author(s):  
Stav Belogolovsky ◽  
Philip Korsunsky ◽  
Shie Mannor ◽  
Chen Tessler ◽  
Tom Zahavy

We consider the task of Inverse Reinforcement Learning in Contextual Markov Decision Processes (MDPs). In this setting, contexts, which define the reward and transition kernel, are sampled from a distribution. Although the reward is a function of the context, it is not provided to the agent; instead, the agent observes demonstrations from an optimal policy. The goal is to learn the reward mapping so that the agent acts optimally even when encountering previously unseen contexts, also known as zero-shot transfer. We formulate this problem as a non-differentiable convex optimization problem and propose a novel algorithm to compute its subgradients. Based on this scheme, we analyze several methods both theoretically, where we compare sample complexity and scalability, and empirically. Most importantly, we show both theoretically and empirically that our algorithms perform zero-shot transfer (generalize to new and unseen contexts). Specifically, we present empirical experiments in a dynamic treatment regime, where the goal is to learn a reward function that explains the behavior of expert physicians based on recorded data from their treatment of patients diagnosed with sepsis.
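A minimal sketch of one projected-subgradient step for a linear context-to-reward mapping is given below; the helpers `solve_mdp_feature_expectations` and `expert_feature_expectations` are hypothetical stand-ins for an MDP solver and the recorded demonstrations, and the actual loss and projection used in the paper may differ.

```python
import numpy as np

# Hedged sketch: one projected-subgradient step for a linear mapping W from context
# features to reward weights (reward weights = W @ context_features). The feature-
# expectation mismatch between the policy optimal under the current reward and the
# expert yields a subgradient with respect to W.

def subgradient_step(W, contexts, solve_mdp_feature_expectations,
                     expert_feature_expectations, step_size=0.01):
    g = np.zeros_like(W)
    for c in contexts:
        reward_weights = W @ c                              # reward parameters for this context
        mu_opt = solve_mdp_feature_expectations(reward_weights, c)
        mu_exp = expert_feature_expectations(c)
        g += np.outer(mu_opt - mu_exp, c)                   # accumulate the subgradient
    W_new = W - step_size * g / len(contexts)
    # Project back onto a bounded set (here: unit Frobenius ball) to keep iterates well-posed.
    return W_new / max(1.0, np.linalg.norm(W_new))
```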


Author(s):  
Huiqiao Fu ◽  
Kaiqiang Tang ◽  
Peng Li ◽  
Wenqi Zhang ◽  
Xinpeng Wang ◽  
...  

Legged locomotion in a complex environment requires careful planning of the footholds of legged robots. In this paper, a novel Deep Reinforcement Learning (DRL) method is proposed to implement multi-contact motion planning for hexapod robots moving on uneven plum-blossom piles. First, the motion of hexapod robots is formulated as a Markov Decision Process (MDP) with a specified reward function. Second, a transition feasibility model is proposed for hexapod robots, which describes whether a state transition satisfies the kinematic and dynamic constraints and in turn determines the rewards. Third, the foothold and Center-of-Mass (CoM) sequences are sampled from a diagonal Gaussian distribution, and the sequences are optimized by learning the optimal policies with the designed DRL algorithm. Both the simulation results and the experimental results on physical systems demonstrate the feasibility and efficiency of the proposed method. Videos are shown at https://videoviewpage.wixsite.com/mcrl.
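The sketch below illustrates the sampling step in such a pipeline: an action (a foothold/CoM displacement) is drawn from a diagonal Gaussian policy and scored with a reward gated by a transition-feasibility check. The helper functions and the penalty value are assumptions for illustration only, not the authors' implementation.

```python
import numpy as np

# Hedged sketch: sample one action from a diagonal Gaussian policy and score it with a
# reward that is gated by a transition-feasibility check.

def sample_action(mean, log_std, rng=np.random.default_rng()):
    """Draw one action from N(mean, diag(exp(log_std)^2))."""
    std = np.exp(log_std)
    return mean + std * rng.standard_normal(mean.shape)

def reward(state, action, transition_feasible, progress):
    """Reward forward progress if the transition satisfies kinematics and dynamics,
    otherwise return an assumed fixed penalty."""
    return progress(state, action) if transition_feasible(state, action) else -1.0
```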


2019 ◽  
Vol 59 (5) ◽  
pp. 518-526
Author(s):  
Michael Vetter

Finding potential security weaknesses in any complex IT system is an important and often challenging task best started in the early stages of the development process. We present a method that transforms this task for FPGA designs into a reinforcement learning (RL) problem. This paper introduces a method to generate a Markov Decision Process-based RL model from a formal, high-level system description (formulated in a domain-specific language) of the system under review and from different, quantified assumptions about the system's security. Probabilistic transitions and the reward function can be used to model the varying resilience of different elements against attacks and the capabilities of an attacker. This information is then used to determine a plausible data exfiltration strategy. An example with multiple scenarios illustrates the workflow. A discussion of supplementary techniques like hierarchical learning and deep neural networks concludes this paper.
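A toy sketch of the idea, under assumed element names and success probabilities, is shown below: resilience assumptions become transition probabilities, successful exfiltration yields the only reward, and value iteration recovers a plausible attack strategy. The generated models described in the paper are of course far richer.

```python
# Toy sketch (hypothetical, not the paper's generator): a tiny attack MDP in which
# transition probabilities encode assumed resilience of design elements and the reward
# marks successful data exfiltration. Solved here with value iteration.

STATES = ["outside", "debug_port", "config_logic", "key_storage", "exfiltrated"]

# P[state][action] -> list of (probability, next_state); probabilities are assumed attack success rates.
P = {
    "outside":      {"probe_debug":   [(0.6, "debug_port"), (0.4, "outside")]},
    "debug_port":   {"glitch_config": [(0.3, "config_logic"), (0.7, "debug_port")]},
    "config_logic": {"read_key":      [(0.5, "key_storage"), (0.5, "config_logic")]},
    "key_storage":  {"send_out":      [(0.9, "exfiltrated"), (0.1, "key_storage")]},
    "exfiltrated":  {},                # absorbing terminal state
}
R = {"exfiltrated": 1.0}               # reward granted on entering the exfiltrated state
GAMMA = 0.95

V = {s: 0.0 for s in STATES}
for _ in range(200):                   # value iteration
    for s in STATES:
        if not P[s]:
            continue
        V[s] = max(
            sum(p * (R.get(s2, 0.0) + GAMMA * V[s2]) for p, s2 in outcomes)
            for outcomes in P[s].values()
        )
# V["outside"] now reflects how plausible exfiltration is under the stated assumptions.
```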


2013 ◽  
Vol 30 (05) ◽  
pp. 1350014 ◽  
Author(s):  
ZHICONG ZHANG ◽  
WEIPING WANG ◽  
SHOUYAN ZHONG ◽  
KAISHUN HU

Reinforcement learning (RL) is a state- or action-value-based machine learning method that solves large-scale multi-stage decision problems such as Markov Decision Process (MDP) and Semi-Markov Decision Process (SMDP) problems. We minimize the makespan of flow shop scheduling problems with an RL algorithm. We convert flow shop scheduling problems into SMDPs by constructing elaborate state features, actions, and the reward function. Minimizing the accumulated reward is equivalent to minimizing the schedule objective function. We apply an on-line TD(λ) algorithm with linear gradient-descent function approximation to solve the SMDPs. To examine the performance of the proposed RL algorithm, computational experiments are conducted on benchmark problems in comparison with other scheduling methods. The experimental results support the efficiency of the proposed algorithm and illustrate that the RL approach is a promising computational approach for flow shop scheduling problems worthy of further investigation.
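A minimal sketch of on-line TD(λ) with linear gradient-descent function approximation and accumulating eligibility traces is given below; the environment interface and the `features` mapping are hypothetical placeholders for the elaborate state features described above.

```python
import numpy as np

# Minimal sketch of on-line TD(lambda) with linear function approximation and
# accumulating eligibility traces. `env` and `features` are hypothetical interfaces.

def td_lambda_episode(env, features, w, alpha=0.01, gamma=1.0, lam=0.8):
    """Run one episode and update the weight vector w in place."""
    state = env.reset()
    z = np.zeros_like(w)                        # eligibility trace
    done = False
    while not done:
        action = env.greedy_action(state, w)    # hypothetical: act greedily w.r.t. current values
        next_state, reward, done = env.step(action)
        x, x_next = features(state), features(next_state)
        v = w @ x
        v_next = 0.0 if done else w @ x_next
        delta = reward + gamma * v_next - v     # TD error
        z = gamma * lam * z + x                 # accumulate traces
        w += alpha * delta * z                  # gradient-descent update
        state = next_state
    return w
```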


2018 ◽  
Vol 9 (1) ◽  
pp. 277-294 ◽  
Author(s):  
Rupam Bhattacharyya ◽  
Shyamanta M. Hazarika

Within human Intent Recognition (IR), a popular approach to learning from demonstration is Inverse Reinforcement Learning (IRL). IRL extracts an unknown reward function from samples of observed behaviour. Traditional IRL systems require large datasets to recover the underlying reward function. Object affordances have been used for IR, but the existing literature on recognizing intents through object affordances falls short of utilizing their true potential. In this paper, we seek to develop an IRL system that drives human intent recognition and can handle high-dimensional demonstrations by exploiting object affordances. An architecture for recognizing human intent is presented which consists of an extended Maximum Likelihood Inverse Reinforcement Learning agent. Inclusion of a Symbolic Conceptual Abstraction Engine (SCAE) along with an advisor allows the agent to work on a Conceptually Abstracted Markov Decision Process. The agent recovers an object-affordance-based reward function from high-dimensional demonstrations. This function drives a Human Intent Recognizer through identification of probable intents. Performance of the resulting system on the standard CAD-120 dataset shows encouraging results.
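At the core of Maximum Likelihood IRL is the log-likelihood of the expert demonstrations under a Boltzmann policy induced by the current reward estimate. The sketch below shows that quantity for a linear, affordance-based reward, with `soft_q_values` as a hypothetical solver over the abstracted MDP; the full method ascends this likelihood with respect to the reward weights.

```python
import numpy as np
from scipy.special import logsumexp

# Hedged sketch: log-likelihood of expert demonstrations under a Boltzmann (softmax)
# policy induced by reward weights w. `soft_q_values(w)` is a hypothetical solver
# returning Q(s, a) for all states and actions of the (abstracted) MDP.

def demo_log_likelihood(w, demonstrations, soft_q_values, beta=1.0):
    """demonstrations: list of (state_index, action_index) pairs from the expert."""
    Q = soft_q_values(w)                                        # shape: (n_states, n_actions)
    log_pi = beta * Q - logsumexp(beta * Q, axis=1, keepdims=True)
    return sum(log_pi[s, a] for s, a in demonstrations)
```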


Author(s):  
Alessandro Ronca ◽  
Giuseppe De Giacomo

Recently, regular decision processes have been proposed as a well-behaved form of non-Markov decision process. Regular decision processes are characterised by a transition function and a reward function that depend on the whole history, though regularly (as in regular languages). In practice, both the transition and the reward functions can be seen as finite transducers. We study reinforcement learning in regular decision processes. Our main contribution is to show that a near-optimal policy can be PAC-learned in polynomial time in a set of parameters that describe the underlying decision process. We argue that the identified set of parameters is minimal and reasonably captures the difficulty of a regular decision process.
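As a toy illustration (not from the paper) of a reward that depends on the whole history only regularly, the snippet below computes the reward through a finite automaton reading the observation history, so the relevant part of the history is summarised by the automaton state.

```python
# Toy illustration: a history-dependent but "regular" reward, computable by a finite
# automaton. The agent is rewarded only when the observation history contains an 'a'
# followed later by a 'b'.

AUTOMATON = {            # (automaton_state, observation) -> next automaton_state
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q1", ("q1", "b"): "q2",
    ("q2", "a"): "q2", ("q2", "b"): "q2",
}

def regular_reward(history):
    """Reward 1.0 iff the whole history drives the automaton into accepting state q2."""
    state = "q0"
    for obs in history:
        state = AUTOMATON[(state, obs)]
    return 1.0 if state == "q2" else 0.0

assert regular_reward(["a", "a", "b"]) == 1.0
assert regular_reward(["b", "a"]) == 0.0
```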


Sensors ◽  
2020 ◽  
Vol 20 (21) ◽  
pp. 5983
Author(s):  
Ming Xie ◽  
Zhenduo Zhang ◽  
Wenbo Zheng ◽  
Ying Li ◽  
Kai Cao

Mixed Poisson–Gaussian noise exists in star images and is difficult to suppress effectively via the maximum likelihood estimation (MLE) method because of its complicated likelihood function. In this article, the MLE method is combined with a state-of-the-art machine learning algorithm in order to achieve accurate restoration results. By using the mixed Poisson–Gaussian likelihood function as the reward function of a reinforcement learning algorithm, an agent is able to form a restored image that maximizes the complex likelihood function through a Markov Decision Process (MDP). In order to provide appropriate parameter settings for the denoising model, the key hyperparameters of the model and their influence on the denoising results are tested through simulated experiments. The model is then compared with two existing star image denoising methods to verify its performance. The experimental results indicate that this reinforcement-learning-based algorithm suppresses the mixed Poisson–Gaussian noise in star images more accurately than the traditional MLE method, as well as a method based on a deep convolutional neural network (DCNN).
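For concreteness, a hedged sketch of the per-pixel mixed Poisson–Gaussian log-likelihood is given below, evaluated by truncating the sum over latent photon counts; using it as the reward lets an agent score a candidate restored image against the observation. The noise parameters and truncation limit are illustrative assumptions, not values from the paper.

```python
import numpy as np
from scipy.stats import poisson, norm

# Hedged sketch: per-pixel mixed Poisson-Gaussian log-likelihood with a truncated sum
# over latent photon counts. y = z + n, with z ~ Poisson(x) and n ~ N(0, sigma^2).

def mixed_pg_loglik(y, x, sigma=1.0, k_max=200):
    """Return log p(y | x) summed over pixels; x must be strictly positive."""
    k = np.arange(k_max)
    # p(y|x) = sum_k Poisson(k; x) * N(y; k, sigma^2), broadcast over all pixels.
    pk = poisson.pmf(k[:, None], np.ravel(x)[None, :])
    py = norm.pdf(np.ravel(y)[None, :], loc=k[:, None], scale=sigma)
    return np.sum(np.log(np.sum(pk * py, axis=0) + 1e-300))
```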


2019 ◽  
Vol 2019 ◽  
pp. 1-12 ◽  
Author(s):  
Chunyu Nie ◽  
Zewei Zheng ◽  
Ming Zhu

This paper proposes an adaptive three-dimensional (3D) path-following control design for a robotic airship based on reinforcement learning. The airship 3D path-following control is decomposed into altitude control and planar path-following control, and the Markov decision process (MDP) models of the control problems are established, in which the scale of the state space is reduced by parameter simplification and coordinate transformation. To ensure control adaptability without dependence on an accurate airship dynamic model, a Q-Learning algorithm is directly adopted for learning the action policy of actuator commands, and the controller is trained online based on actual motion. A cerebellar model articulation controller (CMAC) neural network is employed for experience generalization to accelerate the training process. Simulation results demonstrate that the proposed controllers can achieve performance comparable to well-tuned proportional-integral-derivative (PID) controllers and exhibit more intelligent decision-making ability.
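A minimal sketch of the CMAC idea is shown below: Q-values are approximated by tile coding so that experience generalises to nearby states, and a one-step Q-Learning update adjusts only the active tiles. The tiling layout and dimensions are assumptions, not the paper's design.

```python
import numpy as np

# Hedged sketch: CMAC / tile-coding approximation of Q(state, action) with a one-step
# Q-Learning update. States are assumed to be 2-D and scaled to [0, 1).

N_TILINGS, TILES_PER_DIM, STATE_DIM, N_ACTIONS = 8, 10, 2, 5
weights = np.zeros((N_ACTIONS, N_TILINGS * TILES_PER_DIM ** STATE_DIM))

def active_tiles(state):
    """Return one active tile index per tiling for a state in [0, 1)^2."""
    idx = []
    for t in range(N_TILINGS):
        offset = t / (N_TILINGS * TILES_PER_DIM)                  # shift each tiling slightly
        coords = np.minimum(((state + offset) * TILES_PER_DIM).astype(int), TILES_PER_DIM - 1)
        flat = coords[0] * TILES_PER_DIM + coords[1]
        idx.append(t * TILES_PER_DIM ** STATE_DIM + flat)
    return idx

def q_value(state, action):
    return weights[action, active_tiles(state)].sum()

def q_learning_update(state, action, reward, next_state, alpha=0.1, gamma=0.99):
    target = reward + gamma * max(q_value(next_state, a) for a in range(N_ACTIONS))
    delta = target - q_value(state, action)
    weights[action, active_tiles(state)] += alpha / N_TILINGS * delta
```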


Author(s):  
Syed Ihtesham Hussain Shah ◽  
Giuseppe De Pietro

In decision-making problems, the reward function plays an important role in finding the best policy. Reinforcement Learning (RL) provides a solution for decision-making problems under uncertainty in an Intelligent Environment (IE). However, it is difficult to specify the reward function for RL agents in large and complex problems. To counter this difficulty, an extension of the RL problem named Inverse Reinforcement Learning (IRL) was introduced, in which the reward function is learned from expert demonstrations. IRL is appealing for its potential to build autonomous agents capable of modeling others without compromising performance on the task. This approach of learning from demonstrations relies on the framework of the Markov Decision Process (MDP). This article elaborates on the original IRL algorithms along with their close variants that mitigate these challenges. The purpose of this paper is to provide an overview and theoretical background of IRL in the fields of Machine Learning (ML) and Artificial Intelligence (AI). We also present a brief comparison between different variants of IRL.


2021 ◽  
Author(s):  
Bahareh Nikpour ◽  
Narges Armanfard

Skeleton-based human activity recognition has attracted much attention due to its wide range of applications. Skeleton data include two- or three-dimensional coordinates of body joints. Not all body joints are effective in recognizing different activities, so finding the key joints within a video and across different activities plays a significant role in improving performance. In this paper, we propose a novel framework that performs joint selection in skeleton video frames for the purpose of human activity recognition. To this end, we formulate the joint selection problem as a Markov Decision Process (MDP), where we employ deep reinforcement learning to find the most informative joints per frame. The proposed joint selection method is a general framework that can be employed to improve human activity classification methods. Experimental results on two benchmark activity recognition datasets using three different classifiers demonstrate the effectiveness of the proposed joint selection method.
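A hedged sketch of how per-frame joint selection can be cast as an MDP is given below: the state combines the frame's joint coordinates with the current selection mask, an action toggles one joint, and the reward is the change in a downstream classifier's confidence. The interfaces are hypothetical, not the authors' code.

```python
import numpy as np

# Hedged sketch: an MDP-style environment for per-frame joint selection. The
# `classifier_score` callable is a hypothetical stand-in for the downstream activity
# classifier's confidence on the masked skeleton.

class JointSelectionEnv:
    def __init__(self, frame_joints, classifier_score):
        self.joints = frame_joints                     # shape: (n_joints, 3)
        self.score = classifier_score                  # (joints, mask) -> confidence
        self.mask = np.ones(len(frame_joints), dtype=bool)

    def state(self):
        """State: flattened joint coordinates concatenated with the selection mask."""
        return np.concatenate([self.joints.ravel(), self.mask.astype(float)])

    def step(self, joint_index):
        """Action: toggle one joint; reward: change in classifier confidence."""
        before = self.score(self.joints, self.mask)
        self.mask[joint_index] = ~self.mask[joint_index]
        after = self.score(self.joints, self.mask)
        return self.state(), after - before
```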

