Dynamic regret convergence analysis and an adaptive regularization algorithm for on-policy robot imitation learning

2021, pp. 027836492098587
Author(s): Jonathan N. Lee, Michael Laskey, Ajay Kumar Tanwani, Anil Aswani, Ken Goldberg

On-policy imitation learning algorithms such as DAgger evolve a robot control policy by executing it, measuring performance (loss), obtaining corrective feedback from a supervisor, and generating the next policy. As the loss between iterations can vary unpredictably, a fundamental question is under what conditions this process will eventually achieve a converged policy. If one assumes the underlying trajectory distribution is static (stationary), it is possible to prove convergence for DAgger. However, in more realistic models for robotics, the underlying trajectory distribution is dynamic because it is a function of the policy. Recent results show it is possible to prove convergence of DAgger when a regularity condition on the rate of change of the trajectory distributions is satisfied. In this article, we reframe this result using dynamic regret theory from the field of online optimization and show that dynamic regret can be applied to any on-policy algorithm to analyze its convergence and optimality. These results inspire a new algorithm, Adaptive On-Policy Regularization (AOR), that ensures the conditions for convergence. We present simulation results with cart–pole balancing and locomotion benchmarks that suggest AOR can significantly decrease dynamic regret and chattering as the robot learns. To the best of the authors’ knowledge, this is the first application of dynamic regret theory to imitation learning.
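To make the loop described above concrete, the following is a minimal, self-contained sketch of on-policy imitation with regularization toward the previous policy, in the spirit of DAgger and AOR. The toy 1-D system, the supervisor gain, the ridge-style fit, and the decay schedule for the regularization weight are illustrative assumptions, not the authors' implementation; AOR's contribution is to adapt that weight so the convergence conditions hold.

```python
import numpy as np

rng = np.random.default_rng(0)
K_sup = -0.8                       # supervisor policy u = K_sup * x (assumed known)

def rollout(K, T=50):
    """Execute the current linear policy u = K*x on a noisy 1-D system."""
    xs, x = [], 1.0
    for _ in range(T):
        xs.append(x)
        x = 0.9 * x + K * x + 0.01 * rng.normal()
    return np.array(xs)

K, K_prev, lam = 0.0, 0.0, 1.0     # learner gain, previous gain, regularization weight
X_all, U_all = [], []
for it in range(20):
    xs = rollout(K)                # execute current policy (on-policy data)
    us = K_sup * xs                # corrective supervisor labels on visited states
    X_all.append(xs); U_all.append(us)
    X = np.concatenate(X_all); U = np.concatenate(U_all)
    # Ridge-style fit regularized toward the previous gain: minimizes
    # sum (u - K*x)^2 + lam * (K - K_prev)^2 over the aggregated data.
    K_new = (X @ U + lam * K_prev) / (X @ X + lam)
    K_prev, K = K, K_new
    # Placeholder schedule; AOR instead adapts this weight online so the
    # trajectory distribution changes slowly enough for convergence.
    lam *= 0.9

print(f"learned gain {K:.3f} vs supervisor {K_sup:.3f}")
```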

2019, Vol. 39 (2-3), pp. 286-302
Author(s): Yunpeng Pan, Ching-An Cheng, Kamil Saigol, Keuntaek Lee, Xinyan Yan, ...

We present an end-to-end imitation learning system for agile, off-road autonomous driving using only low-cost on-board sensors. By imitating a model predictive controller equipped with advanced sensors, we train a deep neural network control policy to map raw, high-dimensional observations to continuous steering and throttle commands. Compared with recent approaches to similar tasks, our method requires neither state estimation nor on-the-fly planning to navigate the vehicle. Our approach relies on, and experimentally validates, recent imitation learning theory. Empirically, we show that policies trained with online imitation learning overcome well-known challenges related to covariate shift and generalize better than policies trained with batch imitation learning. Built on these insights, our autonomous driving system demonstrates successful high-speed off-road driving, matching state-of-the-art performance.
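As a rough illustration of the kind of policy described above (not the authors' architecture or training code), the sketch below defines a small convolutional network that maps a raw camera image plus a wheel-speed reading to continuous steering and throttle commands, then performs one online-imitation regression step toward stand-in MPC labels. Input sizes, layer widths, and sensor choices are assumptions.

```python
import torch
import torch.nn as nn

class DrivingPolicy(nn.Module):
    """Maps a raw image and wheel-speed reading to [steering, throttle]."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(            # assumed 3x80x160 camera image
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        feat_dim = self.encoder(torch.zeros(1, 3, 80, 160)).shape[1]
        self.head = nn.Sequential(
            nn.Linear(feat_dim + 1, 64), nn.ReLU(),   # +1 for wheel speed
            nn.Linear(64, 2), nn.Tanh(),              # continuous [steering, throttle]
        )

    def forward(self, image, wheel_speed):
        z = self.encoder(image)
        return self.head(torch.cat([z, wheel_speed], dim=1))

# One online-imitation update: observations visited by the learner are
# relabeled with the expert controller's actions and regressed onto.
policy = DrivingPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
images = torch.randn(8, 3, 80, 160)        # stand-in for on-policy camera frames
speeds = torch.randn(8, 1)                 # stand-in wheel-speed readings
expert_actions = torch.rand(8, 2) * 2 - 1  # stand-in MPC action labels
loss = nn.functional.mse_loss(policy(images, speeds), expert_actions)
opt.zero_grad(); loss.backward(); opt.step()
print(f"imitation loss: {loss.item():.4f}")
```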


2020
Author(s): Gabriel Moraes Barros, Esther Colombini

In robotics, the ultimate goal of reinforcement learning is to endow robots with the ability to learn, improve, adapt, and reproduce tasks with dynamically changing constraints based on exploration and autonomous learning. Reinforcement Learning (RL) aims to address this problem by enabling a robot to learn behaviors through trial and error. With RL, a neural network can be trained as a function approximator to map states directly to actuator commands, so no predefined control structure is needed for training. However, the knowledge required for these methods to converge is usually built from scratch. Learning may take a long time, and RL algorithms also require an explicitly stated reward function, which is sometimes not trivial to define. Often it is easier for a teacher, human or intelligent agent, to demonstrate the desired behavior or how to accomplish a given task. Humans and other animals have a natural ability to learn skills from observation, often from merely seeing these skills’ effects, without direct knowledge of the underlying actions. The same principle underlies Imitation Learning, a practical approach for autonomous systems to acquire control policies when an explicit reward function is unavailable, using supervision provided as demonstrations from an expert, typically a human operator. In this scenario, this work’s primary objective is to design an agent that can successfully imitate a previously acquired control policy using Imitation Learning. The chosen algorithm is GAIL, since we consider it well suited to this problem because it learns from expert (state, action) trajectories. As reference expert trajectories, we implement the state-of-the-art on- and off-policy methods PPO and SAC. Results show that the learned policies for all three methods can solve the task of low-level control of a quadrotor and that all can account for generalization on the original tasks.
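A minimal GAIL-style sketch is given below to illustrate the mechanism this work relies on; it is not the chapter's code. A discriminator is trained to separate expert (state, action) pairs from pairs generated by the imitating policy, and its output is converted into a surrogate reward that a policy-gradient learner such as PPO would then maximize. State and action dimensions, network sizes, and the particular reward form are assumptions.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 12, 4      # assumed quadrotor state and rotor-command sizes

class Discriminator(nn.Module):
    """Outputs a logit for 'this (state, action) pair came from the expert'."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.Tanh(),
            nn.Linear(64, 1),
        )

    def forward(self, states, actions):
        return self.net(torch.cat([states, actions], dim=1))

disc = Discriminator()
opt = torch.optim.Adam(disc.parameters(), lr=3e-4)
bce = nn.BCEWithLogitsLoss()

# Stand-in batches; in practice these come from the PPO/SAC expert's recorded
# trajectories and from rollouts of the imitating policy.
expert_s, expert_a = torch.randn(64, STATE_DIM), torch.randn(64, ACTION_DIM)
policy_s, policy_a = torch.randn(64, STATE_DIM), torch.randn(64, ACTION_DIM)

# Discriminator step: expert pairs labeled 1, policy pairs labeled 0.
logits_e = disc(expert_s, expert_a)
logits_p = disc(policy_s, policy_a)
loss = bce(logits_e, torch.ones_like(logits_e)) + \
       bce(logits_p, torch.zeros_like(logits_p))
opt.zero_grad(); loss.backward(); opt.step()

# Surrogate reward for the policy update (one common GAIL form): larger when
# the discriminator thinks the learner's pair looks like the expert's.
with torch.no_grad():
    reward = -torch.log(1.0 - torch.sigmoid(disc(policy_s, policy_a)) + 1e-8)
print(f"mean surrogate reward: {reward.mean().item():.3f}")
```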


Author(s): M Rizk

Hydraulic pumps/motors can be connected to different systems that utilize non-conventional energy. These systems produce unsteady output energy even when a control policy is adopted to stabilize it. Therefore, in such systems, hydraulic pumps/motors work under dynamic conditions. This paper presents examples of wind energy conversion systems with different control policies. The available daily wind-power distributions at El-Minia, Wadi El-Natrun and Hurghada in Egypt are calculated. A suitable cut-in wind power is chosen for each site. The maximum rate of change of the available power is determined. The starting input power required to drive a hydraulic gear pump and a hydraulic gear motor, experimentally measured, is compared with the cut-in wind power at the studied sites. An example of the speed dynamic response of the pump and motor is illustrated. The study reveals that specifications of hydraulic pumps/motors should include not only their static performance but also their frictional torque and dynamic behaviour.
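For orientation, the sketch below performs the kind of comparison the paper describes: computing available wind power over a day from a wind-speed profile, choosing a cut-in power, finding the maximum rate of change of the available power, and checking whether the pump's starting power lies below the cut-in power. Every number used (rotor area, power coefficient, speed profile, starting power) is an illustrative assumption, not the paper's measured data.

```python
import numpy as np

rho, area, cp = 1.225, 20.0, 0.35      # air density [kg/m^3], rotor area [m^2], power coefficient
hours = np.arange(24)
wind_speed = 5.0 + 3.0 * np.sin(2 * np.pi * (hours - 6) / 24)   # toy daily profile [m/s]

# Available wind power P = 0.5 * rho * A * Cp * v^3, expressed in kW.
power = 0.5 * rho * area * cp * wind_speed**3 / 1000.0

cut_in_power = 0.5                     # assumed cut-in power for the site [kW]
max_rate = np.max(np.abs(np.diff(power)))   # max change of available power [kW/h]
starting_power_pump = 0.3              # assumed starting power of the gear pump [kW]

print(f"daily available power: {power.min():.2f}-{power.max():.2f} kW")
print(f"maximum rate of change: {max_rate:.2f} kW/h")
print(f"pump can start at cut-in power: {starting_power_pump <= cut_in_power}")
```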


2008, Vol. 17 (3), pp. 87-92
Author(s): Leonard L. LaPointe

Loss of implicit linguistic competence assumes a loss of linguistic rules, necessary linguistic computations, or representations. In aphasia, the inherent neurological damage is frequently assumed by some to be a loss of implicit linguistic competence that has damaged or wiped out neural centers or pathways that are necessary for maintenance of the language rules and representations needed to communicate. Not everyone agrees with this view of language use in aphasia. The measurement of implicit language competence, although apparently necessary and satisfying for theoretic linguistics, is complexly interwoven with performance factors. Transience, stimulability, and variability in aphasia language use provide evidence for an access deficit model that supports performance loss. Advances in understanding linguistic competence and performance may be informed by careful study of bilingual language acquisition and loss, the language of savants, the language of feral children, and advances in neuroimaging. Social models of aphasia treatment, coupled with an access deficit view of aphasia, can salve our restless minds and allow pursuit of maximum interactive communication goals even without a comfortable explanation of implicit linguistic competence in aphasia.

