Multiconstrained Gliding Guidance Based on Optimal and Reinforcement Learning Method

2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Luo Zhe ◽  
Li Xinsan ◽  
Wang Lixin ◽  
Shen Qiang

In order to improve the autonomy of gliding guidance for complex flight missions, this paper proposes a multiconstrained intelligent gliding guidance strategy based on optimal guidance and reinforcement learning (RL). Three-dimensional optimal guidance is introduced to meet the terminal latitude, longitude, altitude, and flight-path-angle constraints. A velocity control strategy based on lateral sinusoidal maneuvers is proposed, and an analytical terminal velocity prediction method that accounts for maneuvering flight is studied. To address the problem that the maneuvering amplitude in velocity control cannot be determined offline, an intelligent parameter-adjustment method based on RL is studied. This method formulates parameter determination as a Markov Decision Process (MDP), designs a state space based on terminal speed and an action space based on maneuvering amplitude, constructs a reward function that integrates the terminal velocity error with the gliding guidance tasks, and uses Q-Learning to adjust the maneuvering amplitude online. The simulation results show that the intelligent gliding guidance method can meet various terminal constraints with high accuracy and can effectively improve autonomous decision-making ability in complex tasks.
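As a rough illustration of the Q-Learning loop described above, the sketch below implements tabular Q-Learning over a discretised terminal-velocity-error state space and a discrete set of maneuvering amplitudes; the bin ranges, hyperparameters, and interface names are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Illustrative sketch (not the authors' code): tabular Q-Learning that adjusts the
# lateral-maneuver amplitude online. The state is a discretised predicted
# terminal-velocity error; the action is a discrete maneuvering amplitude.

ERROR_BINS = np.linspace(-200.0, 200.0, 21)   # m/s, assumed terminal-velocity-error grid
AMPLITUDES = np.linspace(0.0, 1.0, 11)        # assumed normalised maneuver amplitudes
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1            # assumed learning hyperparameters

q_table = np.zeros((len(ERROR_BINS), len(AMPLITUDES)))

def to_state(v_err):
    """Map a continuous terminal-velocity error to a discrete state index."""
    return int(np.clip(np.digitize(v_err, ERROR_BINS), 0, len(ERROR_BINS) - 1))

def choose_action(s):
    """Epsilon-greedy selection of the maneuvering-amplitude index."""
    if np.random.rand() < EPS:
        return np.random.randint(len(AMPLITUDES))
    return int(np.argmax(q_table[s]))

def q_update(s, a, reward, s_next):
    """Standard one-step Q-Learning update."""
    td_target = reward + GAMMA * np.max(q_table[s_next])
    q_table[s, a] += ALPHA * (td_target - q_table[s, a])
```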

2021 ◽  
Author(s):  
Stav Belogolovsky ◽  
Philip Korsunsky ◽  
Shie Mannor ◽  
Chen Tessler ◽  
Tom Zahavy

We consider the task of Inverse Reinforcement Learning in Contextual Markov Decision Processes (MDPs). In this setting, contexts, which define the reward and transition kernel, are sampled from a distribution. Although the reward is a function of the context, it is not provided to the agent; instead, the agent observes demonstrations from an optimal policy. The goal is to learn the reward mapping so that the agent acts optimally even when encountering previously unseen contexts, also known as zero-shot transfer. We formulate this problem as a non-differentiable convex optimization problem and propose a novel algorithm to compute its subgradients. Based on this scheme, we analyze several methods both theoretically, where we compare sample complexity and scalability, and empirically. Most importantly, we show both theoretically and empirically that our algorithms perform zero-shot transfer (generalize to new and unseen contexts). Specifically, we present empirical experiments in a dynamic treatment regime, where the goal is to learn a reward function that explains the behavior of expert physicians based on recorded data from their treatment of patients diagnosed with sepsis.
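A minimal sketch of one projected-subgradient step for a linear context-to-reward mapping is given below; the helpers `solve_mdp_feature_expectations` and `expert_feature_expectations` are hypothetical stand-ins for an MDP solver and the recorded demonstrations, and the actual loss and projection used in the paper may differ.

```python
import numpy as np

# Hedged sketch: one projected-subgradient step for a linear mapping W from context
# features to reward weights (reward weights = W @ context_features). The feature-
# expectation mismatch between the policy optimal under the current reward and the
# expert yields a subgradient with respect to W.

def subgradient_step(W, contexts, solve_mdp_feature_expectations,
                     expert_feature_expectations, step_size=0.01):
    g = np.zeros_like(W)
    for c in contexts:
        reward_weights = W @ c                              # reward parameters for this context
        mu_opt = solve_mdp_feature_expectations(reward_weights, c)
        mu_exp = expert_feature_expectations(c)
        g += np.outer(mu_opt - mu_exp, c)                   # accumulate the subgradient
    W_new = W - step_size * g / len(contexts)
    # Project back onto a bounded set (here: unit Frobenius ball) to keep iterates well-posed.
    return W_new / max(1.0, np.linalg.norm(W_new))
```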


Author(s):  
Huiqiao Fu ◽  
Kaiqiang Tang ◽  
Peng Li ◽  
Wenqi Zhang ◽  
Xinpeng Wang ◽  
...  

Legged locomotion in a complex environment requires careful planning of the footholds of legged robots. In this paper, a novel Deep Reinforcement Learning (DRL) method is proposed to implement multi-contact motion planning for hexapod robots moving on uneven plum-blossom piles. First, the motion of hexapod robots is formulated as a Markov Decision Process (MDP) with a specified reward function. Second, a transition feasibility model is proposed for hexapod robots, which describes whether a state transition satisfies the kinematic and dynamic constraints and in turn determines the rewards. Third, the foothold and Center-of-Mass (CoM) sequences are sampled from a diagonal Gaussian distribution, and the sequences are optimized by learning the optimal policies with the designed DRL algorithm. Both the simulation results and the experimental results on physical systems demonstrate the feasibility and efficiency of the proposed method. Videos are shown at https://videoviewpage.wixsite.com/mcrl.
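The sketch below illustrates the sampling step in such a pipeline: an action (a foothold/CoM displacement) is drawn from a diagonal Gaussian policy and scored with a reward gated by a transition-feasibility check. The helper functions and the penalty value are assumptions for illustration only, not the authors' implementation.

```python
import numpy as np

# Hedged sketch: sample one action from a diagonal Gaussian policy and score it with a
# reward that is gated by a transition-feasibility check.

def sample_action(mean, log_std, rng=np.random.default_rng()):
    """Draw one action from N(mean, diag(exp(log_std)^2))."""
    std = np.exp(log_std)
    return mean + std * rng.standard_normal(mean.shape)

def reward(state, action, transition_feasible, progress):
    """Reward forward progress if the transition satisfies kinematics and dynamics,
    otherwise return an assumed fixed penalty."""
    return progress(state, action) if transition_feasible(state, action) else -1.0
```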


2019 ◽  
Vol 59 (5) ◽  
pp. 518-526
Author(s):  
Michael Vetter

Finding potential security weaknesses in any complex IT system is an important and often challenging task best started in the early stages of the development process. We present a method that transforms this task for FPGA designs into a reinforcement learning (RL) problem. This paper introduces a method to generate a Markov Decision Process-based RL model from a formal, high-level system description (formulated in a domain-specific language) of the system under review and from different, quantified assumptions about the system's security. Probabilistic transitions and the reward function can be used to model the varying resilience of different elements against attacks and the capabilities of an attacker. This information is then used to determine a plausible data exfiltration strategy. An example with multiple scenarios illustrates the workflow. A discussion of supplementary techniques like hierarchical learning and deep neural networks concludes this paper.
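A toy sketch of the idea, under assumed element names and success probabilities, is shown below: resilience assumptions become transition probabilities, successful exfiltration yields the only reward, and value iteration recovers a plausible attack strategy. The generated models described in the paper are of course far richer.

```python
# Toy sketch (hypothetical, not the paper's generator): a tiny attack MDP in which
# transition probabilities encode assumed resilience of design elements and the reward
# marks successful data exfiltration. Solved here with value iteration.

STATES = ["outside", "debug_port", "config_logic", "key_storage", "exfiltrated"]

# P[state][action] -> list of (probability, next_state); probabilities are assumed attack success rates.
P = {
    "outside":      {"probe_debug":   [(0.6, "debug_port"), (0.4, "outside")]},
    "debug_port":   {"glitch_config": [(0.3, "config_logic"), (0.7, "debug_port")]},
    "config_logic": {"read_key":      [(0.5, "key_storage"), (0.5, "config_logic")]},
    "key_storage":  {"send_out":      [(0.9, "exfiltrated"), (0.1, "key_storage")]},
    "exfiltrated":  {},                # absorbing terminal state
}
R = {"exfiltrated": 1.0}               # reward granted on entering the exfiltrated state
GAMMA = 0.95

V = {s: 0.0 for s in STATES}
for _ in range(200):                   # value iteration
    for s in STATES:
        if not P[s]:
            continue
        V[s] = max(
            sum(p * (R.get(s2, 0.0) + GAMMA * V[s2]) for p, s2 in outcomes)
            for outcomes in P[s].values()
        )
# V["outside"] now reflects how plausible exfiltration is under the stated assumptions.
```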


2013 ◽  
Vol 30 (05) ◽  
pp. 1350014 ◽  
Author(s):  
ZHICONG ZHANG ◽  
WEIPING WANG ◽  
SHOUYAN ZHONG ◽  
KAISHUN HU

Reinforcement learning (RL) is a state- or action-value-based machine learning method that solves large-scale multi-stage decision problems such as Markov Decision Process (MDP) and Semi-Markov Decision Process (SMDP) problems. We minimize the makespan of flow shop scheduling problems with an RL algorithm. We convert flow shop scheduling problems into SMDPs by constructing elaborate state features, actions, and the reward function. Minimizing the accumulated reward is equivalent to minimizing the schedule objective function. We apply an on-line TD(λ) algorithm with linear gradient-descent function approximation to solve the SMDPs. To examine the performance of the proposed RL algorithm, computational experiments are conducted on benchmark problems in comparison with other scheduling methods. The experimental results support the efficiency of the proposed algorithm and illustrate that the RL approach is a promising computational approach for flow shop scheduling problems worthy of further investigation.
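A minimal sketch of on-line TD(λ) with linear gradient-descent function approximation and accumulating eligibility traces is given below; the environment interface and the `features` mapping are hypothetical placeholders for the elaborate state features described above.

```python
import numpy as np

# Minimal sketch of on-line TD(lambda) with linear function approximation and
# accumulating eligibility traces. `env` and `features` are hypothetical interfaces.

def td_lambda_episode(env, features, w, alpha=0.01, gamma=1.0, lam=0.8):
    """Run one episode and update the weight vector w in place."""
    state = env.reset()
    z = np.zeros_like(w)                        # eligibility trace
    done = False
    while not done:
        action = env.greedy_action(state, w)    # hypothetical: act greedily w.r.t. current values
        next_state, reward, done = env.step(action)
        x, x_next = features(state), features(next_state)
        v = w @ x
        v_next = 0.0 if done else w @ x_next
        delta = reward + gamma * v_next - v     # TD error
        z = gamma * lam * z + x                 # accumulate traces
        w += alpha * delta * z                  # gradient-descent update
        state = next_state
    return w
```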


2018 ◽  
Vol 9 (1) ◽  
pp. 277-294 ◽  
Author(s):  
Rupam Bhattacharyya ◽  
Shyamanta M. Hazarika

Within human Intent Recognition (IR), a popular approach to learning from demonstration is Inverse Reinforcement Learning (IRL). IRL extracts an unknown reward function from samples of observed behaviour. Traditional IRL systems require large datasets to recover the underlying reward function. Object affordances have been used for IR, but the existing literature on recognizing intents through object affordances falls short of utilizing their true potential. In this paper, we seek to develop an IRL system that drives human intent recognition and can handle high-dimensional demonstrations by exploiting object affordances. An architecture for recognizing human intent is presented which consists of an extended Maximum Likelihood Inverse Reinforcement Learning agent. Inclusion of a Symbolic Conceptual Abstraction Engine (SCAE) along with an advisor allows the agent to work on a Conceptually Abstracted Markov Decision Process. The agent recovers an object-affordance-based reward function from high-dimensional demonstrations. This function drives a Human Intent Recognizer through identification of probable intents. Performance of the resulting system on the standard CAD-120 dataset shows encouraging results.
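At the core of Maximum Likelihood IRL is the log-likelihood of the expert demonstrations under a Boltzmann policy induced by the current reward estimate. The sketch below shows that quantity for a linear, affordance-based reward, with `soft_q_values` as a hypothetical solver over the abstracted MDP; the full method ascends this likelihood with respect to the reward weights.

```python
import numpy as np
from scipy.special import logsumexp

# Hedged sketch: log-likelihood of expert demonstrations under a Boltzmann (softmax)
# policy induced by reward weights w. `soft_q_values(w)` is a hypothetical solver
# returning Q(s, a) for all states and actions of the (abstracted) MDP.

def demo_log_likelihood(w, demonstrations, soft_q_values, beta=1.0):
    """demonstrations: list of (state_index, action_index) pairs from the expert."""
    Q = soft_q_values(w)                                        # shape: (n_states, n_actions)
    log_pi = beta * Q - logsumexp(beta * Q, axis=1, keepdims=True)
    return sum(log_pi[s, a] for s, a in demonstrations)
```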


Author(s):  
Alessandro Ronca ◽  
Giuseppe De Giacomo

Recently, regular decision processes have been proposed as a well-behaved form of non-Markov decision process. Regular decision processes are characterised by a transition function and a reward function that depend on the whole history, though regularly (as in regular languages). In practice, both the transition and the reward functions can be seen as finite transducers. We study reinforcement learning in regular decision processes. Our main contribution is to show that a near-optimal policy can be PAC-learned in polynomial time in a set of parameters that describe the underlying decision process. We argue that the identified set of parameters is minimal and reasonably captures the difficulty of a regular decision process.
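As a toy illustration (not from the paper) of a reward that depends on the whole history only regularly, the snippet below computes the reward through a finite automaton reading the observation history, so the relevant part of the history is summarised by the automaton state.

```python
# Toy illustration: a history-dependent but "regular" reward, computable by a finite
# automaton. The agent is rewarded only when the observation history contains an 'a'
# followed later by a 'b'.

AUTOMATON = {            # (automaton_state, observation) -> next automaton_state
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q1", ("q1", "b"): "q2",
    ("q2", "a"): "q2", ("q2", "b"): "q2",
}

def regular_reward(history):
    """Reward 1.0 iff the whole history drives the automaton into accepting state q2."""
    state = "q0"
    for obs in history:
        state = AUTOMATON[(state, obs)]
    return 1.0 if state == "q2" else 0.0

assert regular_reward(["a", "a", "b"]) == 1.0
assert regular_reward(["b", "a"]) == 0.0
```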


Sensors ◽  
2020 ◽  
Vol 20 (21) ◽  
pp. 5983
Author(s):  
Ming Xie ◽  
Zhenduo Zhang ◽  
Wenbo Zheng ◽  
Ying Li ◽  
Kai Cao

Mixed Poisson–Gaussian noise exists in star images and is difficult to suppress effectively via the maximum likelihood estimation (MLE) method because of its complicated likelihood function. In this article, the MLE method is combined with a state-of-the-art machine learning algorithm in order to achieve accurate restoration results. By using the mixed Poisson–Gaussian likelihood function as the reward function of a reinforcement learning algorithm, an agent is able to form a restored image that maximizes the complex likelihood function through a Markov Decision Process (MDP). In order to provide appropriate parameter settings for the denoising model, the key hyperparameters of the model and their influence on the denoising results are tested through simulated experiments. The model is then compared with two existing star image denoising methods to verify its performance. The experimental results indicate that this reinforcement-learning-based algorithm suppresses the mixed Poisson–Gaussian noise in star images more accurately than the traditional MLE method, as well as a method based on a deep convolutional neural network (DCNN).
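For concreteness, a hedged sketch of the per-pixel mixed Poisson–Gaussian log-likelihood is given below, evaluated by truncating the sum over latent photon counts; using it as the reward lets an agent score a candidate restored image against the observation. The noise parameters and truncation limit are illustrative assumptions, not values from the paper.

```python
import numpy as np
from scipy.stats import poisson, norm

# Hedged sketch: per-pixel mixed Poisson-Gaussian log-likelihood with a truncated sum
# over latent photon counts. y = z + n, with z ~ Poisson(x) and n ~ N(0, sigma^2).

def mixed_pg_loglik(y, x, sigma=1.0, k_max=200):
    """Return log p(y | x) summed over pixels; x must be strictly positive."""
    k = np.arange(k_max)
    # p(y|x) = sum_k Poisson(k; x) * N(y; k, sigma^2), broadcast over all pixels.
    pk = poisson.pmf(k[:, None], np.ravel(x)[None, :])
    py = norm.pdf(np.ravel(y)[None, :], loc=k[:, None], scale=sigma)
    return np.sum(np.log(np.sum(pk * py, axis=0) + 1e-300))
```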


2019 ◽  
Vol 2019 ◽  
pp. 1-12 ◽  
Author(s):  
Chunyu Nie ◽  
Zewei Zheng ◽  
Ming Zhu

This paper proposes an adaptive three-dimensional (3D) path-following control design for a robotic airship based on reinforcement learning. The airship 3D path-following control is decomposed into altitude control and planar path-following control, and the Markov decision process (MDP) models of the control problems are established, in which the scale of the state space is reduced by parameter simplification and coordinate transformation. To ensure control adaptability without dependence on an accurate airship dynamic model, a Q-Learning algorithm is directly adopted for learning the action policy of actuator commands, and the controller is trained online based on actual motion. A cerebellar model articulation controller (CMAC) neural network is employed for experience generalization to accelerate the training process. Simulation results demonstrate that the proposed controllers can achieve performance comparable to well-tuned proportional-integral-derivative (PID) controllers and exhibit more intelligent decision-making ability.
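A minimal sketch of the CMAC idea is shown below: Q-values are approximated by tile coding so that experience generalises to nearby states, and a one-step Q-Learning update adjusts only the active tiles. The tiling layout and dimensions are assumptions, not the paper's design.

```python
import numpy as np

# Hedged sketch: CMAC / tile-coding approximation of Q(state, action) with a one-step
# Q-Learning update. States are assumed to be 2-D and scaled to [0, 1).

N_TILINGS, TILES_PER_DIM, STATE_DIM, N_ACTIONS = 8, 10, 2, 5
weights = np.zeros((N_ACTIONS, N_TILINGS * TILES_PER_DIM ** STATE_DIM))

def active_tiles(state):
    """Return one active tile index per tiling for a state in [0, 1)^2."""
    idx = []
    for t in range(N_TILINGS):
        offset = t / (N_TILINGS * TILES_PER_DIM)                  # shift each tiling slightly
        coords = np.minimum(((state + offset) * TILES_PER_DIM).astype(int), TILES_PER_DIM - 1)
        flat = coords[0] * TILES_PER_DIM + coords[1]
        idx.append(t * TILES_PER_DIM ** STATE_DIM + flat)
    return idx

def q_value(state, action):
    return weights[action, active_tiles(state)].sum()

def q_learning_update(state, action, reward, next_state, alpha=0.1, gamma=0.99):
    target = reward + gamma * max(q_value(next_state, a) for a in range(N_ACTIONS))
    delta = target - q_value(state, action)
    weights[action, active_tiles(state)] += alpha / N_TILINGS * delta
```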


Author(s):  
Syed Ihtesham Hussain Shah ◽  
Giuseppe De Pietro

In decision-making problems, the reward function plays an important role in finding the best policy. Reinforcement Learning (RL) provides a solution for decision-making problems under uncertainty in an Intelligent Environment (IE). However, it is difficult to specify the reward function for RL agents in large and complex problems. To counter this difficulty, an extension of the RL problem named Inverse Reinforcement Learning (IRL) was introduced, in which the reward function is learned from expert demonstrations. IRL is appealing for its potential to build autonomous agents capable of modeling others without compromising performance on the task. This approach of learning from demonstrations relies on the framework of the Markov Decision Process (MDP). This article elaborates on the original IRL algorithms along with their close variants that mitigate these challenges. The purpose of this paper is to provide an overview and theoretical background of IRL in the fields of Machine Learning (ML) and Artificial Intelligence (AI). We also present a brief comparison between different variants of IRL.


2021 ◽  
Author(s):  
Bahareh Nikpour ◽  
Narges Armanfard

Skeleton-based human activity recognition has attracted much attention due to its wide range of applications. Skeleton data include two- or three-dimensional coordinates of body joints. Not all body joints are effective in recognizing different activities, so finding the key joints within a video and across different activities plays a significant role in improving performance. In this paper, we propose a novel framework that performs joint selection in skeleton video frames for the purpose of human activity recognition. To this end, we formulate the joint selection problem as a Markov Decision Process (MDP), where we employ deep reinforcement learning to find the most informative joints per frame. The proposed joint selection method is a general framework that can be employed to improve human activity classification methods. Experimental results on two benchmark activity recognition datasets using three different classifiers demonstrate the effectiveness of the proposed joint selection method.
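A hedged sketch of how per-frame joint selection can be cast as an MDP is given below: the state combines the frame's joint coordinates with the current selection mask, an action toggles one joint, and the reward is the change in a downstream classifier's confidence. The interfaces are hypothetical, not the authors' code.

```python
import numpy as np

# Hedged sketch: an MDP-style environment for per-frame joint selection. The
# `classifier_score` callable is a hypothetical stand-in for the downstream activity
# classifier's confidence on the masked skeleton.

class JointSelectionEnv:
    def __init__(self, frame_joints, classifier_score):
        self.joints = frame_joints                     # shape: (n_joints, 3)
        self.score = classifier_score                  # (joints, mask) -> confidence
        self.mask = np.ones(len(frame_joints), dtype=bool)

    def state(self):
        """State: flattened joint coordinates concatenated with the selection mask."""
        return np.concatenate([self.joints.ravel(), self.mask.astype(float)])

    def step(self, joint_index):
        """Action: toggle one joint; reward: change in classifier confidence."""
        before = self.score(self.joints, self.mask)
        self.mask[joint_index] = ~self.mask[joint_index]
        after = self.score(self.joints, self.mask)
        return self.state(), after - before
```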

