Compositional RL Agents That Follow Language Commands in Temporal Logic

2021 ◽  
Vol 8 ◽  
Author(s):  
Yen-Ling Kuo ◽  
Boris Katz ◽  
Andrei Barbu

We demonstrate how a reinforcement learning agent can use compositional recurrent neural networks to learn to carry out commands specified in linear temporal logic (LTL). Our approach takes as input an LTL formula, structures a deep network according to the parse of the formula, and determines satisfying actions. This compositional structure of the network enables zero-shot generalization to significantly more complex unseen formulas. We demonstrate this ability in multiple problem domains with both discrete and continuous state-action spaces. In a symbolic domain, the agent finds a sequence of letters that satisfy a specification. In a Minecraft-like environment, the agent finds a sequence of actions that conform to a formula. In the Fetch environment, the robot finds a sequence of arm configurations that move blocks on a table to fulfill the commands. While most prior work can learn to execute one formula reliably, we develop a novel form of multi-task learning for RL agents that allows them to learn from a diverse set of tasks and generalize to a new set of diverse tasks without any additional training. The compositional structures presented here are not specific to LTL, thus opening the path to RL agents that perform zero-shot generalization in other compositional domains.
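
The paper's key mechanism, structuring a network according to the formula's parse, can be illustrated with a short sketch. The fragment below is a hypothetical illustration, not the authors' code: small feedforward scorers stand in for the paper's recurrent subnetworks, and all module and function names are made up. It assembles one module per operator and predicate by recursing over the parse tree:

```python
# A minimal sketch of tree-structured network assembly from an LTL parse.
# Feedforward modules stand in for the paper's recurrent subnetworks.
import torch
import torch.nn as nn

class PredicateModule(nn.Module):
    """Scores how well the current state satisfies an atomic predicate."""
    def __init__(self, state_dim, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))
    def forward(self, state):
        return self.net(state)

class BinaryOpModule(nn.Module):
    """Combines the scores of two subformula networks (e.g., 'and', 'until')."""
    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))
    def forward(self, left, right):
        return self.net(torch.cat([left, right], dim=-1))

def build(parse, ops, preds):
    """Recursively assemble a network mirroring the formula's parse tree."""
    if isinstance(parse, str):                  # atomic predicate leaf
        return lambda s: preds[parse](s)
    op, lhs, rhs = parse                        # (operator, left, right)
    left, right = build(lhs, ops, preds), build(rhs, ops, preds)
    return lambda s: ops[op](left(s), right(s))

# Hypothetical usage: score states against "(a until b) and c".
state_dim = 8
preds = {p: PredicateModule(state_dim) for p in ("a", "b", "c")}
ops = {op: BinaryOpModule() for op in ("until", "and")}
formula = ("and", ("until", "a", "b"), "c")
score = build(formula, ops, preds)(torch.randn(1, state_dim))
```

Because the per-operator and per-predicate modules are shared across formulas, a network for an unseen formula reuses trained components, which is the structural source of the zero-shot generalization the abstract describes.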

Author(s):  
Wonryong Ryou ◽  
Jiayu Chen ◽  
Mislav Balunovic ◽  
Gagandeep Singh ◽  
Andrei Dan ◽  
...  

We present a scalable and precise verifier for recurrent neural networks, called Prover, based on two novel ideas: (i) a method to compute a set of polyhedral abstractions for the non-convex and non-linear recurrent update functions by combining sampling, optimization, and Fermat’s theorem, and (ii) a gradient-descent-based algorithm for abstraction refinement, guided by the certification problem, that combines multiple abstractions for each neuron. Using Prover, we present the first study of certifying a non-trivial use case of recurrent neural networks, namely speech classification. To achieve this, we additionally develop custom abstractions for the non-linear speech preprocessing pipeline. Our evaluation shows that Prover successfully verifies several challenging recurrent models in computer vision, speech, and motion sensor data classification, beyond the reach of prior work.
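
The sampling-and-optimization idea behind the polyhedral abstractions can be sketched briefly. The example below is an assumption-laden illustration, not Prover's implementation: it fits two parallel planes around the non-convex LSTM update term sigmoid(x)·tanh(y) over a box. Prover additionally uses Fermat’s theorem to make such bounds sound over the entire input region, whereas this sketch is only guaranteed over the sampled points:

```python
# Sketch: bound sigmoid(x)*tanh(y) over a box with two parallel planes
# fitted to samples (sound only w.r.t. the samples, unlike Prover).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def plane_bounds(xlo, xhi, ylo, yhi, n=50):
    xs, ys = np.meshgrid(np.linspace(xlo, xhi, n), np.linspace(ylo, yhi, n))
    pts = np.column_stack([xs.ravel(), ys.ravel()])
    f = sigmoid(pts[:, 0]) * np.tanh(pts[:, 1])
    # Least-squares plane a*x + b*y + c through the sampled surface.
    A = np.column_stack([pts, np.ones(len(pts))])
    coef, *_ = np.linalg.lstsq(A, f, rcond=None)
    resid = f - A @ coef
    # Shift the plane's offset so it lies above/below every sample.
    upper = coef.copy(); upper[2] += resid.max()
    lower = coef.copy(); lower[2] += resid.min()
    return lower, upper

lower, upper = plane_bounds(-1.0, 1.0, -1.0, 1.0)
print("lower plane:", lower, "upper plane:", upper)
```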


2012 ◽  
Vol 45 ◽  
pp. 515-564 ◽  
Author(s):  
J. Garcia ◽  
F. Fernandez

In this paper, we consider the important problem of safe exploration in reinforcement learning. While reinforcement learning is well-suited to domains with complex transition dynamics and high-dimensional state-action spaces, an additional challenge is posed by the need for safe and efficient exploration. Traditional exploration techniques are not particularly useful for dangerous tasks, where the trial-and-error process may select actions whose execution in some states damages the learning system (or other systems). Consequently, when an agent begins interacting with a dangerous, high-dimensional state-action space, an important question arises: how can the damage caused by exploration be avoided, or at least minimized? We introduce the PI-SRL algorithm, which safely improves suboptimal yet robust behaviors in continuous state and action control tasks and learns efficiently from the experience gained from the environment. We evaluate the proposed method in four complex tasks: automatic car parking, pole-balancing, helicopter hovering, and business management.
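
The case-based flavor of safe exploration that PI-SRL builds on can be sketched compactly. The class below is a hypothetical illustration, not the paper's algorithm: it perturbs a known-safe baseline policy with small Gaussian noise, stores visited cases, and replays the nearest stored safe action when the current state is too far from anything seen before:

```python
# Sketch: explore near a known-safe baseline; fall back to the closest
# stored safe case when the current state looks unfamiliar (risky).
import numpy as np

class SafeExplorer:
    def __init__(self, baseline_policy, risk_threshold=1.0, noise=0.05):
        self.baseline = baseline_policy   # known-safe suboptimal policy
        self.cases = []                   # memory of (state, action) pairs
        self.theta = risk_threshold       # max distance to a known case
        self.noise = noise

    def act(self, state):
        if self.cases:
            states = np.array([s for s, _ in self.cases])
            dists = np.linalg.norm(states - state, axis=1)
            i = int(dists.argmin())
            if dists[i] > self.theta:     # unfamiliar state: do not explore
                return self.cases[i][1]   # replay the closest safe action
            base = self.cases[i][1]
        else:
            base = self.baseline(state)
        action = base + np.random.normal(0.0, self.noise, size=np.shape(base))
        self.cases.append((state.copy(), action))
        return action

# Hypothetical usage with a trivial baseline controller.
explorer = SafeExplorer(lambda s: -0.5 * s, risk_threshold=0.8)
action = explorer.act(np.array([0.2, -0.1]))
```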


2021 ◽  
Author(s):  
Nanda Kishore Sreenivas ◽  
Shrisha Rao

In toy environments like video games, a reinforcement learning agent is deployed and operates within the same state space in which it was trained. In robotics applications such as industrial systems or autonomous vehicles, however, this cannot be guaranteed: an unforeseen perturbation can push a robot out of its training space into an unknown state from which it has not been trained to reach its goal. While most prior work on RL safety focuses on ensuring safety during training, this paper focuses on the safe deployment of a robot that has already been trained to operate within a safe space. We define a condition on the state and action spaces that, if satisfied, guarantees the robot can recover to safety on its own. We also propose a strategy and design that accomplish this recovery within a finite number of steps after a perturbation. The approach is implemented and tested against a standard RL model, and the results indicate much-improved performance.
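
A greedy instance of such a recovery strategy can be sketched as follows. This is a hypothetical construction under stated assumptions, not the paper's design: if every reachable state admits an action that shrinks the distance to the safe set by at least some fixed amount, repeatedly choosing the distance-minimizing action returns the robot to safety in finitely many steps:

```python
# Sketch: outside the trained-safe region, greedily pick the action that
# most reduces the distance to an axis-aligned safe box.
import numpy as np

def distance_to_safe(state, safe_lo, safe_hi):
    """Distance from `state` to the safe box [safe_lo, safe_hi]."""
    clipped = np.clip(state, safe_lo, safe_hi)
    return np.linalg.norm(state - clipped)

def recover(state, step_fn, actions, safe_lo, safe_hi, max_steps=1000):
    """Greedily descend the distance-to-safety until inside the box."""
    for _ in range(max_steps):
        if distance_to_safe(state, safe_lo, safe_hi) == 0.0:
            return state                  # back inside the safe set
        successors = [step_fn(state, a) for a in actions]
        state = min(successors,
                    key=lambda s: distance_to_safe(s, safe_lo, safe_hi))
    raise RuntimeError("recovery condition violated: no finite-step return")

# Hypothetical dynamics: each action is a small bounded displacement.
step = lambda s, a: s + a
acts = [np.array(a) for a in ([0.1, 0], [-0.1, 0], [0, 0.1], [0, -0.1])]
final = recover(np.array([1.7, -0.9]), step, acts,
                np.array([-1.0, -1.0]), np.array([1.0, 1.0]))
```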


Algorithms ◽  
2018 ◽  
Vol 11 (10) ◽  
pp. 148 ◽  
Author(s):  
Panagiotis Kofinas ◽  
Anastasios I. Dounis

This paper proposes a hybrid Ziegler-Nichols (Z-N) reinforcement learning approach for online tuning of the parameters of a Proportional-Integral-Derivative (PID) controller for the speed of a DC motor. The PID gains are initialized by the Z-N method and then adapted online by a fuzzy Q-learning agent. The fuzzy Q-learning agent is used instead of conventional Q-learning in order to handle the continuous state-action space. The agent defines its state from the value of the error, and its output consists of three variables, each of which specifies the percentage change of one gain. Each gain can be increased or decreased by up to 50% of its initial value. Through this method, the controller's gains are adjusted online through interaction with the environment, and no expert knowledge is needed during setup. The simulation results highlight the performance of the proposed control strategy: after the exploration phase, the settling time is reduced in steady states, and in transient states the response oscillates with smaller amplitude and reaches the equilibrium point faster than with a conventional PID controller.
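
The hybrid scheme can be sketched compactly. In the fragment below, a coarse tabular Q-learning agent is a simplified stand-in for the paper's fuzzy Q-learning agent, and the ultimate gain and period are invented for illustration; gains start at their Ziegler-Nichols values and each learning step nudges one gain by a percentage of its initial value, clamped to the ±50% range the abstract describes:

```python
# Sketch: Z-N initialization plus Q-learning gain adaptation.
# The reward driving update() would be, e.g., negative tracking error.
import random

Ku, Tu = 5.0, 0.8                      # hypothetical ultimate gain/period
Kp0, Ki0, Kd0 = 0.6 * Ku, 1.2 * Ku / Tu, 0.075 * Ku * Tu   # classic Z-N PID

gains = [Kp0, Ki0, Kd0]
deltas = [-0.1, 0.0, 0.1]              # +/-10% of the initial value per step
actions = [(i, d) for i in range(3) for d in deltas]
Q = {}                                 # Q[(state, action)] -> value
alpha, gamma, eps = 0.1, 0.9, 0.2

def error_state(e):
    """Coarse discretization of the error (fuzzy sets in the paper)."""
    return 0 if abs(e) < 0.05 else (1 if abs(e) < 0.5 else 2)

def choose(state):
    """Epsilon-greedy choice of which gain to nudge, and by how much."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))

def update(state, action, reward, next_state):
    """Standard tabular Q-learning backup."""
    best = max(Q.get((next_state, a), 0.0) for a in actions)
    q = Q.get((state, action), 0.0)
    Q[(state, action)] = q + alpha * (reward + gamma * best - q)

def apply(action):
    """Nudge one gain, clamped to 50%..150% of its Z-N value."""
    i, d = action
    init = (Kp0, Ki0, Kd0)[i]
    gains[i] = min(max(gains[i] + d * init, 0.5 * init), 1.5 * init)
```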

