Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning

2022 ◽  
Vol 73 ◽  
pp. 173-208
Author(s):  
Rodrigo Toro Icarte ◽  
Toryn Q. Klassen ◽  
Richard Valenzano ◽  
Sheila A. McIlraith

Reinforcement learning (RL) methods usually treat reward functions as black boxes. As such, these methods must extensively interact with the environment in order to discover rewards and optimal policies. In most RL applications, however, users have to program the reward function and, hence, there is the opportunity to make the reward function visible: to show the reward function’s code to the RL agent so that it can exploit the function’s internal structure to learn optimal policies in a more sample-efficient manner. In this paper, we show how to accomplish this idea in two steps. First, we propose reward machines, a type of finite state machine that supports the specification of reward functions while exposing reward function structure. We then describe different methodologies to exploit this structure to support learning, including automated reward shaping, task decomposition, and counterfactual reasoning with off-policy learning. Experiments on tabular and continuous domains, across different tasks and RL agents, show the benefits of exploiting reward structure with respect to sample efficiency and the quality of resultant policies. Finally, by virtue of being a form of finite state machine, reward machines have the expressive power of a regular language and, as such, support loops, sequences, and conditionals, as well as the expression of temporally extended properties typical of linear temporal logic and non-Markovian reward specification.
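To make the idea concrete, the following is a minimal sketch of a reward machine as a finite state machine whose transitions are triggered by propositional events observed in the environment and whose edges carry rewards. The class name, the events `c`/`o`, and the "get coffee, then deliver it to the office" task are illustrative assumptions, not the paper's exact formalization.

```python
class RewardMachine:
    """A toy reward machine: transitions map (state, event) -> (next state, reward)."""

    def __init__(self, transitions, initial_state, terminal_states):
        self.transitions = transitions
        self.initial_state = initial_state
        self.terminal_states = terminal_states

    def step(self, state, event):
        # Events with no listed transition self-loop and give zero reward.
        return self.transitions.get((state, event), (state, 0.0))


# Task: observe the coffee machine ('c'), then the office ('o').
rm = RewardMachine(
    transitions={
        ("u0", "c"): ("u1", 0.0),     # picked up coffee
        ("u1", "o"): ("u_acc", 1.0),  # delivered coffee: task complete
    },
    initial_state="u0",
    terminal_states={"u_acc"},
)

u = rm.initial_state
for event in ["o", "c", "o"]:          # example event trace from the environment
    u, r = rm.step(u, event)
    print(u, r)                        # u0 0.0 -> u1 0.0 -> u_acc 1.0
```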

Author(s):  
Alberto Camacho ◽  
Rodrigo Toro Icarte ◽  
Toryn Q. Klassen ◽  
Richard Valenzano ◽  
Sheila A. McIlraith

In Reinforcement Learning (RL), an agent is guided by the rewards it receives from the reward function. Unfortunately, it may take many interactions with the environment to learn from sparse rewards, and it can be challenging to specify reward functions that reflect complex reward-worthy behavior. We propose using reward machines (RMs), which are automata-based representations that expose reward function structure, as a normal form representation for reward functions. We show how specifications of reward in various formal languages, including LTL and other regular languages, can be automatically translated into RMs, easing the burden of complex reward function specification. We then show how the exposed structure of the reward function can be exploited by tailored Q-learning algorithms and automated reward shaping techniques in order to improve the sample efficiency of reinforcement learning methods. Experiments show that these RM-tailored techniques significantly outperform state-of-the-art (deep) RL algorithms, solving problems that otherwise cannot reasonably be solved by existing approaches.
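The sketch below illustrates the counterfactual-update idea behind RM-tailored Q-learning: a single environment transition and its observed event are replayed through every RM state, so each RM state's Q-values learn from experience collected while the agent was actually elsewhere in the machine. The toy RM, the tabular setting, and all names here are illustrative assumptions rather than the paper's exact algorithm.

```python
from collections import defaultdict

# Toy reward machine: (state, event) -> (next state, reward); unlisted events self-loop.
RM_TRANSITIONS = {("u0", "c"): ("u1", 0.0), ("u1", "o"): ("u_acc", 1.0)}
RM_STATES, TERMINAL = ["u0", "u1"], {"u_acc"}
GAMMA, ALPHA = 0.9, 0.1


def rm_step(u, event):
    return RM_TRANSITIONS.get((u, event), (u, 0.0))


def counterfactual_update(Q, s, a, s_next, event, actions):
    """Replay one environment transition through every RM state (counterfactual updates)."""
    for u in RM_STATES:                       # not just the RM state the agent was actually in
        u_next, r = rm_step(u, event)
        target = r
        if u_next not in TERMINAL:
            target += GAMMA * max(Q[(u_next, s_next, b)] for b in actions)
        Q[(u, s, a)] += ALPHA * (target - Q[(u, s, a)])


Q = defaultdict(float)                        # tabular Q over (RM state, env state, action)
counterfactual_update(Q, s=(0, 0), a="right", s_next=(0, 1),
                      event="c", actions=["up", "down", "left", "right"])
```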


Author(s):  
N. V. Brovka ◽  
P. P. Dyachuk ◽  
M. V. Noskov ◽  
I. P. Peregudova

The problem and the goal. The urgency of the problem of mathematically describing dynamic adaptive testing stems from the need to diagnose the cognitive abilities of students for independent learning activity. The goal of the article is to develop a Markov mathematical model of the interaction of an active agent (AA) with a "Liquidator" finite state machine that cancels incorrect actions, which will make it possible to mathematically describe dynamic adaptive testing with evaluative feedback. The research methodology consists of an analysis of the results of research by domestic and foreign scientists on dynamic adaptive testing in education, namely: an activity approach that implements developmental problem-solving training for the AA; an organizational and technological approach to managing the actions of the AA by means of evaluative feedback; and the theory of Markov chains and reinforcement learning. Results. On the basis of the theory of Markov processes, a Markov mathematical model of the interaction of an active agent with a finite state machine that cancels incorrect actions was developed. This makes it possible to build a model for diagnosing the procedural characteristics of students' learning activity, including: constructing axiograms of the total reward for students' actions; computing the probability distribution over the states of the problem of identifying the elements of the structure of a complex object; calculating the number of AA actions required to reach the target state depending on the number of elements that need to be identified; and constructing a scatter plot of active agents by target states in the space (R, k), where R is the total reward of the AA and k is the number of actions performed. Conclusion. The Markov mathematical model of the interaction of an active agent with a finite state machine that cancels incorrect actions makes it possible to design dynamic adaptive tests and to diagnose changes in the procedural characteristics of learning activity. The results and conclusions allow the formulation of principles of dynamic adaptive testing based on evaluative feedback.
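A hedged simulation sketch of the interaction described above: an active agent must identify n elements of a complex object, a "Liquidator" finite state machine cancels each incorrect action (the task state does not advance), and evaluative feedback contributes to the total reward R while k counts the actions taken. The success probability and the +1/-1 reward scheme are illustrative assumptions, not the paper's parameters.

```python
import random


def run_agent(n_elements, p_correct=0.6, seed=None):
    """Simulate one active agent until all elements are identified (target state)."""
    rng = random.Random(seed)
    identified, total_reward, n_actions = 0, 0, 0
    while identified < n_elements:
        n_actions += 1
        if rng.random() < p_correct:          # correct action: progress and positive feedback
            identified += 1
            total_reward += 1
        else:                                 # incorrect action: canceled by the Liquidator
            total_reward -= 1
    return total_reward, n_actions


# Scatter of agents in (R, k) space: total reward R vs. number of actions k.
points = [run_agent(n_elements=10, seed=i) for i in range(100)]
print(points[:5])
```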


2018 ◽  
Vol 3 (1) ◽  
pp. 1
Author(s):  
Mustofa Mustofa ◽  
Sidiq Sidiq ◽  
Eva Rahmawati

The dynamic development of the world is accelerating the growth of technology and information. Driven by this, computers, which were originally built only to assist human work, have evolved into a medium for entertainment, games, communication, and more. Within the entertainment sector, one industry currently at the center of attention is the video game industry. The large number of foreign video game products entering this country poses a challenge to the nation: foreign video games entering the country naturally carry many cultural elements of other countries, further displacing Nusantara culture under the influx of foreign culture through various media. The researchers therefore attempt to apply a Finite State Machine in designing an RPG (Role-Playing Game) video game that introduces this culture. In designing the video game, the researchers use the GDLC (Game Development Life Cycle) method so that the research proceeds systematically. A video game design involves many elements; in this study the authors focus on controlling the animation of the character played in the video game. From the design carried out, it is concluded that a Finite State Machine can be used for good animation control in an RPG video game. It is hoped that this video game can become one of the media for introducing Nusantara culture.
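A minimal sketch of a finite state machine for controlling the playable character's animations, as outlined above. The states, inputs, and transition table are illustrative assumptions, not the game's actual design.

```python
# Transition table: (current animation state, player input) -> next animation state.
ANIMATION_FSM = {
    ("idle",   "move"):   "walk",
    ("idle",   "attack"): "attack",
    ("walk",   "stop"):   "idle",
    ("walk",   "attack"): "attack",
    ("attack", "finish"): "idle",
}


def next_animation(state, player_input):
    """Return the next animation state; unknown inputs keep the current state."""
    return ANIMATION_FSM.get((state, player_input), state)


state = "idle"
for player_input in ["move", "attack", "finish"]:
    state = next_animation(state, player_input)
    print(state)    # walk -> attack -> idle
```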


2013 ◽  
Vol 18 (2-3) ◽  
pp. 49-60 ◽  
Author(s):  
Damian Dudziński ◽  
Tomasz Kryjak ◽  
Zbigniew Mikrut

In this paper, a human action recognition algorithm is described that uses background generation with shadow elimination, silhouette description based on simple geometrical features, and a finite state machine for recognizing particular actions. The performed tests indicate that this approach achieves an 81% correct recognition rate while allowing real-time processing of a 360 × 288 video stream.
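The sketch below illustrates the final recognition stage of such a pipeline: per-frame silhouette descriptors (here only the bounding-box aspect ratio) are quantized into pose symbols, and a small finite state machine fires when a symbol sequence characteristic of an action is observed. The "fall" action, the thresholds, and the states are assumptions for illustration, not the paper's actual features or action set.

```python
def pose_symbol(width, height):
    """Quantize a silhouette bounding box into a coarse pose symbol."""
    ratio = height / max(width, 1e-6)
    if ratio > 1.5:
        return "standing"
    if ratio > 0.8:
        return "bending"
    return "lying"


# FSM recognizing the sequence standing -> bending -> lying as a fall.
FALL_FSM = {
    ("start",      "standing"): "upright",
    ("upright",    "bending"):  "descending",
    ("descending", "lying"):    "fall_detected",
}

state = "start"
for w, h in [(40, 120), (60, 70), (110, 40)]:   # example silhouette bounding boxes per frame
    state = FALL_FSM.get((state, pose_symbol(w, h)), state)
print(state)   # fall_detected
```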


2013 ◽  
Vol 33 (1) ◽  
pp. 149-152
Author(s):  
Jianjun LI ◽  
Yixiang JIANG ◽  
Jie QIAN ◽  
Wei LI ◽  
Yu LI

Author(s):  
Muffie Wiebe Waterman ◽  
Dennis C. Frezzo ◽  
Michael X. Wang

Modelling ◽  
2021 ◽  
Vol 2 (1) ◽  
pp. 43-62
Author(s):  
Kshirasagar Naik ◽  
Mahesh D. Pandey ◽  
Anannya Panda ◽  
Abdurhman Albasir ◽  
Kunal Taneja

Accurate modelling and simulation of a nuclear power plant are important factors in the strategic planning and maintenance of the plant. Several nonlinearities and multivariable couplings are associated with real-world plants. Therefore, it is quite challenging to model such cyber-physical systems using conventional mathematical equations. A visual analytics approach is introduced which addresses these limitations and models both the short-term and long-term behaviour of the system. Principal Component Analysis (PCA) followed by Linear Discriminant Analysis (LDA) is used to extract features from the data, and k-means clustering is applied to label the data instances. A finite state machine representation formulated from the clustered data is then used to model the behaviour of the cyber-physical system in terms of system states and state transitions. In this paper, the indicated methodology is applied to time-series data collected from a nuclear power plant over nine years. It is observed that this approach of combining machine learning principles with finite state machine capabilities facilitates feature exploration, visual analysis, pattern discovery, and effective modelling of nuclear power plant data. In addition, the finite state machine representation supports identification of normal and abnormal operation of the plant, suggesting that the given approach captures the anomalous behaviour of the plant.
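A hedged sketch of the modelling pipeline described above: reduce the multivariate time series with PCA, label the instances with k-means, and derive a finite-state-machine view by counting transitions between consecutive cluster labels. The synthetic data, dimensions, and cluster count are illustrative, and the sketch omits the LDA step the paper applies after PCA.

```python
import numpy as np
from collections import Counter
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 12))                 # stand-in for multivariate sensor time-series data

features = PCA(n_components=3).fit_transform(X)                      # feature extraction
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(features)  # label instances

# System states = clusters; state transitions = consecutive label pairs in time order.
transitions = Counter(zip(labels[:-1], labels[1:]))
for (u, v), count in sorted(transitions.items()):
    print(f"state {u} -> state {v}: {count} transitions")
```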

