Compositional RL Agents That Follow Language Commands in Temporal Logic

2021 ◽  
Vol 8 ◽  
Author(s):  
Yen-Ling Kuo ◽  
Boris Katz ◽  
Andrei Barbu

We demonstrate how a reinforcement learning agent can use compositional recurrent neural networks to learn to carry out commands specified in linear temporal logic (LTL). Our approach takes as input an LTL formula, structures a deep network according to the parse of the formula, and determines satisfying actions. This compositional structure of the network enables zero-shot generalization to significantly more complex unseen formulas. We demonstrate this ability in multiple problem domains with both discrete and continuous state-action spaces. In a symbolic domain, the agent finds a sequence of letters that satisfy a specification. In a Minecraft-like environment, the agent finds a sequence of actions that conform to a formula. In the Fetch environment, the robot finds a sequence of arm configurations that move blocks on a table to fulfill the commands. While most prior work can learn to execute one formula reliably, we develop a novel form of multi-task learning for RL agents that allows them to learn from a diverse set of tasks and generalize to a new set of diverse tasks without any additional training. The compositional structures presented here are not specific to LTL, thus opening the path to RL agents that perform zero-shot generalization in other compositional domains.
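
The paper's key mechanism, structuring a network according to the formula's parse, can be illustrated with a short sketch. The fragment below is a hypothetical illustration, not the authors' code: small feedforward scorers stand in for the paper's recurrent subnetworks, and all module and function names are made up. It assembles one module per operator and predicate by recursing over the parse tree:

```python
# A minimal sketch of tree-structured network assembly from an LTL parse.
# Feedforward modules stand in for the paper's recurrent subnetworks.
import torch
import torch.nn as nn

class PredicateModule(nn.Module):
    """Scores how well the current state satisfies an atomic predicate."""
    def __init__(self, state_dim, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))
    def forward(self, state):
        return self.net(state)

class BinaryOpModule(nn.Module):
    """Combines the scores of two subformula networks (e.g., 'and', 'until')."""
    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))
    def forward(self, left, right):
        return self.net(torch.cat([left, right], dim=-1))

def build(parse, ops, preds):
    """Recursively assemble a network mirroring the formula's parse tree."""
    if isinstance(parse, str):                  # atomic predicate leaf
        return lambda s: preds[parse](s)
    op, lhs, rhs = parse                        # (operator, left, right)
    left, right = build(lhs, ops, preds), build(rhs, ops, preds)
    return lambda s: ops[op](left(s), right(s))

# Hypothetical usage: score states against "(a until b) and c".
state_dim = 8
preds = {p: PredicateModule(state_dim) for p in ("a", "b", "c")}
ops = {op: BinaryOpModule() for op in ("until", "and")}
formula = ("and", ("until", "a", "b"), "c")
score = build(formula, ops, preds)(torch.randn(1, state_dim))
```

Because the per-operator and per-predicate modules are shared across formulas, a network for an unseen formula reuses trained components, which is the structural source of the zero-shot generalization the abstract describes.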

Author(s):  
Wonryong Ryou ◽  
Jiayu Chen ◽  
Mislav Balunovic ◽  
Gagandeep Singh ◽  
Andrei Dan ◽  
...  

We present a scalable and precise verifier for recurrent neural networks, called Prover, based on two novel ideas: (i) a method to compute a set of polyhedral abstractions for the non-convex and non-linear recurrent update functions by combining sampling, optimization, and Fermat’s theorem, and (ii) a gradient-descent-based algorithm for abstraction refinement, guided by the certification problem, that combines multiple abstractions for each neuron. Using Prover, we present the first study of certifying a non-trivial use case of recurrent neural networks, namely speech classification. To achieve this, we additionally develop custom abstractions for the non-linear speech preprocessing pipeline. Our evaluation shows that Prover successfully verifies several challenging recurrent models in computer vision, speech, and motion sensor data classification, beyond the reach of prior work.
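
The sampling-and-optimization idea behind the polyhedral abstractions can be sketched briefly. The example below is an assumption-laden illustration, not Prover's implementation: it fits two parallel planes around the non-convex LSTM update term sigmoid(x)·tanh(y) over a box. Prover additionally uses Fermat’s theorem to make such bounds sound over the entire input region, whereas this sketch is only guaranteed over the sampled points:

```python
# Sketch: bound sigmoid(x)*tanh(y) over a box with two parallel planes
# fitted to samples (sound only w.r.t. the samples, unlike Prover).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def plane_bounds(xlo, xhi, ylo, yhi, n=50):
    xs, ys = np.meshgrid(np.linspace(xlo, xhi, n), np.linspace(ylo, yhi, n))
    pts = np.column_stack([xs.ravel(), ys.ravel()])
    f = sigmoid(pts[:, 0]) * np.tanh(pts[:, 1])
    # Least-squares plane a*x + b*y + c through the sampled surface.
    A = np.column_stack([pts, np.ones(len(pts))])
    coef, *_ = np.linalg.lstsq(A, f, rcond=None)
    resid = f - A @ coef
    # Shift the plane's offset so it lies above/below every sample.
    upper = coef.copy(); upper[2] += resid.max()
    lower = coef.copy(); lower[2] += resid.min()
    return lower, upper

lower, upper = plane_bounds(-1.0, 1.0, -1.0, 1.0)
print("lower plane:", lower, "upper plane:", upper)
```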


2012 ◽  
Vol 45 ◽  
pp. 515-564 ◽  
Author(s):  
J. Garcia ◽  
F. Fernandez

In this paper, we consider the important problem of safe exploration in reinforcement learning. While reinforcement learning is well-suited to domains with complex transition dynamics and high-dimensional state-action spaces, an additional challenge is posed by the need for safe and efficient exploration. Traditional exploration techniques are not particularly useful for dangerous tasks, where the trial-and-error process may select actions whose execution in some states damages the learning system (or other systems). Consequently, when an agent begins interacting with a dangerous, high-dimensional state-action space, an important question arises: how can the damage caused by exploration be avoided, or at least minimized? We introduce the PI-SRL algorithm, which safely improves suboptimal yet robust behaviors in continuous state and action control tasks and learns efficiently from the experience gained from the environment. We evaluate the proposed method in four complex tasks: automatic car parking, pole-balancing, helicopter hovering, and business management.
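
The case-based flavor of safe exploration that PI-SRL builds on can be sketched compactly. The class below is a hypothetical illustration, not the paper's algorithm: it perturbs a known-safe baseline policy with small Gaussian noise, stores visited cases, and replays the nearest stored safe action when the current state is too far from anything seen before:

```python
# Sketch: explore near a known-safe baseline; fall back to the closest
# stored safe case when the current state looks unfamiliar (risky).
import numpy as np

class SafeExplorer:
    def __init__(self, baseline_policy, risk_threshold=1.0, noise=0.05):
        self.baseline = baseline_policy   # known-safe suboptimal policy
        self.cases = []                   # memory of (state, action) pairs
        self.theta = risk_threshold       # max distance to a known case
        self.noise = noise

    def act(self, state):
        if self.cases:
            states = np.array([s for s, _ in self.cases])
            dists = np.linalg.norm(states - state, axis=1)
            i = int(dists.argmin())
            if dists[i] > self.theta:     # unfamiliar state: do not explore
                return self.cases[i][1]   # replay the closest safe action
            base = self.cases[i][1]
        else:
            base = self.baseline(state)
        action = base + np.random.normal(0.0, self.noise, size=np.shape(base))
        self.cases.append((state.copy(), action))
        return action

# Hypothetical usage with a trivial baseline controller.
explorer = SafeExplorer(lambda s: -0.5 * s, risk_threshold=0.8)
action = explorer.act(np.array([0.2, -0.1]))
```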


2021 ◽  
Author(s):  
Nanda Kishore Sreenivas ◽  
Shrisha Rao

In toy environments like video games, a reinforcement learning agent is deployed and operates within the same state space in which it was trained. In robotics applications such as industrial systems or autonomous vehicles, however, this cannot be guaranteed: an unforeseen perturbation can push a robot out of its training space into an unknown state from which it has not been trained to reach its goal. While most prior work on RL safety focuses on ensuring safety during training, this paper focuses on the safe deployment of a robot that has already been trained to operate within a safe space. We define a condition on the state and action spaces that, if satisfied, guarantees the robot can recover to safety on its own. We also propose a strategy and design that accomplish this recovery within a finite number of steps after a perturbation. The approach is implemented and tested against a standard RL model, and the results indicate much-improved performance.
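
A greedy instance of such a recovery strategy can be sketched as follows. This is a hypothetical construction under stated assumptions, not the paper's design: if every reachable state admits an action that shrinks the distance to the safe set by at least some fixed amount, repeatedly choosing the distance-minimizing action returns the robot to safety in finitely many steps:

```python
# Sketch: outside the trained-safe region, greedily pick the action that
# most reduces the distance to an axis-aligned safe box.
import numpy as np

def distance_to_safe(state, safe_lo, safe_hi):
    """Distance from `state` to the safe box [safe_lo, safe_hi]."""
    clipped = np.clip(state, safe_lo, safe_hi)
    return np.linalg.norm(state - clipped)

def recover(state, step_fn, actions, safe_lo, safe_hi, max_steps=1000):
    """Greedily descend the distance-to-safety until inside the box."""
    for _ in range(max_steps):
        if distance_to_safe(state, safe_lo, safe_hi) == 0.0:
            return state                  # back inside the safe set
        successors = [step_fn(state, a) for a in actions]
        state = min(successors,
                    key=lambda s: distance_to_safe(s, safe_lo, safe_hi))
    raise RuntimeError("recovery condition violated: no finite-step return")

# Hypothetical dynamics: each action is a small bounded displacement.
step = lambda s, a: s + a
acts = [np.array(a) for a in ([0.1, 0], [-0.1, 0], [0, 0.1], [0, -0.1])]
final = recover(np.array([1.7, -0.9]), step, acts,
                np.array([-1.0, -1.0]), np.array([1.0, 1.0]))
```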


Algorithms ◽  
2018 ◽  
Vol 11 (10) ◽  
pp. 148 ◽  
Author(s):  
Panagiotis Kofinas ◽  
Anastasios I. Dounis

This paper proposes a hybrid Ziegler-Nichols (Z-N) reinforcement learning approach for online tuning of the parameters of a Proportional-Integral-Derivative (PID) controller for the speed of a DC motor. The PID gains are initialized by the Z-N method and then adapted online by a fuzzy Q-learning agent. The fuzzy Q-learning agent is used instead of conventional Q-learning in order to handle the continuous state-action space. The agent defines its state from the value of the error, and its output consists of three variables, each of which specifies the percentage change of one gain. Each gain can be increased or decreased by up to 50% of its initial value. Through this method, the controller's gains are adjusted online through interaction with the environment, and no expert knowledge is needed during setup. The simulation results highlight the performance of the proposed control strategy: after the exploration phase, the settling time is reduced in steady states, and in transient states the response oscillates with smaller amplitude and reaches the equilibrium point faster than with a conventional PID controller.
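
The hybrid scheme can be sketched compactly. In the fragment below, a coarse tabular Q-learning agent is a simplified stand-in for the paper's fuzzy Q-learning agent, and the ultimate gain and period are invented for illustration; gains start at their Ziegler-Nichols values and each learning step nudges one gain by a percentage of its initial value, clamped to the ±50% range the abstract describes:

```python
# Sketch: Z-N initialization plus Q-learning gain adaptation.
# The reward driving update() would be, e.g., negative tracking error.
import random

Ku, Tu = 5.0, 0.8                      # hypothetical ultimate gain/period
Kp0, Ki0, Kd0 = 0.6 * Ku, 1.2 * Ku / Tu, 0.075 * Ku * Tu   # classic Z-N PID

gains = [Kp0, Ki0, Kd0]
deltas = [-0.1, 0.0, 0.1]              # +/-10% of the initial value per step
actions = [(i, d) for i in range(3) for d in deltas]
Q = {}                                 # Q[(state, action)] -> value
alpha, gamma, eps = 0.1, 0.9, 0.2

def error_state(e):
    """Coarse discretization of the error (fuzzy sets in the paper)."""
    return 0 if abs(e) < 0.05 else (1 if abs(e) < 0.5 else 2)

def choose(state):
    """Epsilon-greedy choice of which gain to nudge, and by how much."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))

def update(state, action, reward, next_state):
    """Standard tabular Q-learning backup."""
    best = max(Q.get((next_state, a), 0.0) for a in actions)
    q = Q.get((state, action), 0.0)
    Q[(state, action)] = q + alpha * (reward + gamma * best - q)

def apply(action):
    """Nudge one gain, clamped to 50%..150% of its Z-N value."""
    i, d = action
    init = (Kp0, Ki0, Kd0)[i]
    gains[i] = min(max(gains[i] + d * init, 0.5 * init), 1.5 * init)
```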

