Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning

2022 ◽  
Vol 73 ◽  
pp. 173-208
Author(s):  
Rodrigo Toro Icarte ◽  
Toryn Q. Klassen ◽  
Richard Valenzano ◽  
Sheila A. McIlraith

Reinforcement learning (RL) methods usually treat reward functions as black boxes. As such, these methods must extensively interact with the environment in order to discover rewards and optimal policies. In most RL applications, however, users have to program the reward function and, hence, there is the opportunity to make the reward function visible: to show the reward function’s code to the RL agent so that it can exploit the function’s internal structure to learn optimal policies in a more sample-efficient manner. In this paper, we show how to accomplish this idea in two steps. First, we propose reward machines, a type of finite state machine that supports the specification of reward functions while exposing reward function structure. We then describe different methodologies to exploit this structure to support learning, including automated reward shaping, task decomposition, and counterfactual reasoning with off-policy learning. Experiments on tabular and continuous domains, across different tasks and RL agents, show the benefits of exploiting reward structure with respect to sample efficiency and the quality of resultant policies. Finally, by virtue of being a form of finite state machine, reward machines have the expressive power of a regular language and, as such, support loops, sequences, and conditionals, as well as the expression of temporally extended properties typical of linear temporal logic and non-Markovian reward specification.
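To make the idea concrete, the following is a minimal sketch of a reward machine as a finite state machine whose transitions are triggered by propositional events observed in the environment and whose edges carry rewards. The class name, the events `c`/`o`, and the "get coffee, then deliver it to the office" task are illustrative assumptions, not the paper's exact formalization.

```python
class RewardMachine:
    """A toy reward machine: transitions map (state, event) -> (next state, reward)."""

    def __init__(self, transitions, initial_state, terminal_states):
        self.transitions = transitions
        self.initial_state = initial_state
        self.terminal_states = terminal_states

    def step(self, state, event):
        # Events with no listed transition self-loop and give zero reward.
        return self.transitions.get((state, event), (state, 0.0))


# Task: observe the coffee machine ('c'), then the office ('o').
rm = RewardMachine(
    transitions={
        ("u0", "c"): ("u1", 0.0),     # picked up coffee
        ("u1", "o"): ("u_acc", 1.0),  # delivered coffee: task complete
    },
    initial_state="u0",
    terminal_states={"u_acc"},
)

u = rm.initial_state
for event in ["o", "c", "o"]:          # example event trace from the environment
    u, r = rm.step(u, event)
    print(u, r)                        # u0 0.0 -> u1 0.0 -> u_acc 1.0
```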

Author(s):  
Alberto Camacho ◽  
Rodrigo Toro Icarte ◽  
Toryn Q. Klassen ◽  
Richard Valenzano ◽  
Sheila A. McIlraith

In Reinforcement Learning (RL), an agent is guided by the rewards it receives from the reward function. Unfortunately, it may take many interactions with the environment to learn from sparse rewards, and it can be challenging to specify reward functions that reflect complex reward-worthy behavior. We propose using reward machines (RMs), which are automata-based representations that expose reward function structure, as a normal form representation for reward functions. We show how specifications of reward in various formal languages, including LTL and other regular languages, can be automatically translated into RMs, easing the burden of complex reward function specification. We then show how the exposed structure of the reward function can be exploited by tailored Q-learning algorithms and automated reward shaping techniques in order to improve the sample efficiency of reinforcement learning methods. Experiments show that these RM-tailored techniques significantly outperform state-of-the-art (deep) RL algorithms, solving problems that otherwise cannot reasonably be solved by existing approaches.
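The sketch below illustrates the counterfactual-update idea behind RM-tailored Q-learning: a single environment transition and its observed event are replayed through every RM state, so each RM state's Q-values learn from experience collected while the agent was actually elsewhere in the machine. The toy RM, the tabular setting, and all names here are illustrative assumptions rather than the paper's exact algorithm.

```python
from collections import defaultdict

# Toy reward machine: (state, event) -> (next state, reward); unlisted events self-loop.
RM_TRANSITIONS = {("u0", "c"): ("u1", 0.0), ("u1", "o"): ("u_acc", 1.0)}
RM_STATES, TERMINAL = ["u0", "u1"], {"u_acc"}
GAMMA, ALPHA = 0.9, 0.1


def rm_step(u, event):
    return RM_TRANSITIONS.get((u, event), (u, 0.0))


def counterfactual_update(Q, s, a, s_next, event, actions):
    """Replay one environment transition through every RM state (counterfactual updates)."""
    for u in RM_STATES:                       # not just the RM state the agent was actually in
        u_next, r = rm_step(u, event)
        target = r
        if u_next not in TERMINAL:
            target += GAMMA * max(Q[(u_next, s_next, b)] for b in actions)
        Q[(u, s, a)] += ALPHA * (target - Q[(u, s, a)])


Q = defaultdict(float)                        # tabular Q over (RM state, env state, action)
counterfactual_update(Q, s=(0, 0), a="right", s_next=(0, 1),
                      event="c", actions=["up", "down", "left", "right"])
```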


Author(s):  
N. V. Brovka ◽  
P. P. Dyachuk ◽  
M. V. Noskov ◽  
I. P. Peregudova

The problem and the goal. The urgency of the problem of mathematically describing dynamic adaptive testing stems from the need to diagnose the cognitive abilities of students for independent learning activity. The goal of the article is to develop a Markov mathematical model of the interaction of an active agent (AA) with a "Liquidator" finite state machine that cancels incorrect actions, which will make it possible to mathematically describe dynamic adaptive testing with evaluative feedback. The research methodology consists of an analysis of the results of research by domestic and foreign scientists on dynamic adaptive testing in education, namely: an activity approach that implements developmental problem-solving training for the AA; an organizational and technological approach to managing the actions of the AA by means of evaluative feedback; and the theory of Markov chains and reinforcement learning. Results. On the basis of the theory of Markov processes, a Markov mathematical model of the interaction of an active agent with a finite state machine that cancels incorrect actions was developed. This makes it possible to build a model for diagnosing the procedural characteristics of students' learning activity, including: constructing axiograms of the total reward for students' actions; computing the probability distribution over the states of the problem of identifying the elements of the structure of a complex object; calculating the number of AA actions required to reach the target state depending on the number of elements that need to be identified; and constructing a scatter plot of active agents by target states in the space (R, k), where R is the total reward of the AA and k is the number of actions performed. Conclusion. The Markov mathematical model of the interaction of an active agent with a finite state machine that cancels incorrect actions makes it possible to design dynamic adaptive tests and to diagnose changes in the procedural characteristics of learning activity. The results and conclusions allow the formulation of principles of dynamic adaptive testing based on evaluative feedback.
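A hedged simulation sketch of the interaction described above: an active agent must identify n elements of a complex object, a "Liquidator" finite state machine cancels each incorrect action (the task state does not advance), and evaluative feedback contributes to the total reward R while k counts the actions taken. The success probability and the +1/-1 reward scheme are illustrative assumptions, not the paper's parameters.

```python
import random


def run_agent(n_elements, p_correct=0.6, seed=None):
    """Simulate one active agent until all elements are identified (target state)."""
    rng = random.Random(seed)
    identified, total_reward, n_actions = 0, 0, 0
    while identified < n_elements:
        n_actions += 1
        if rng.random() < p_correct:          # correct action: progress and positive feedback
            identified += 1
            total_reward += 1
        else:                                 # incorrect action: canceled by the Liquidator
            total_reward -= 1
    return total_reward, n_actions


# Scatter of agents in (R, k) space: total reward R vs. number of actions k.
points = [run_agent(n_elements=10, seed=i) for i in range(100)]
print(points[:5])
```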


2018 ◽  
Vol 3 (1) ◽  
pp. 1
Author(s):  
Mustofa Mustofa ◽  
Sidiq Sidiq ◽  
Eva Rahmawati

The dynamic development of the world is accelerating the growth of technology and information. Driven by this, computers, which were originally built only to assist human work, have evolved into a medium for entertainment, games, communication, and more. Within the entertainment sector, one industry currently at the center of attention is the video game industry. The large number of foreign video game products entering this country poses a challenge to the nation: foreign video games entering the country naturally carry many cultural elements of other countries, further displacing Nusantara culture under the influx of foreign culture through various media. The researchers therefore attempt to apply a Finite State Machine in designing an RPG (Role-Playing Game) video game that introduces this culture. In designing the video game, the researchers use the GDLC (Game Development Life Cycle) method so that the research proceeds systematically. A video game design involves many elements; in this study the authors focus on controlling the animation of the character played in the video game. From the design carried out, it is concluded that a Finite State Machine can be used for good animation control in an RPG video game. It is hoped that this video game can become one of the media for introducing Nusantara culture.
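A minimal sketch of a finite state machine for controlling the playable character's animations, as outlined above. The states, inputs, and transition table are illustrative assumptions, not the game's actual design.

```python
# Transition table: (current animation state, player input) -> next animation state.
ANIMATION_FSM = {
    ("idle",   "move"):   "walk",
    ("idle",   "attack"): "attack",
    ("walk",   "stop"):   "idle",
    ("walk",   "attack"): "attack",
    ("attack", "finish"): "idle",
}


def next_animation(state, player_input):
    """Return the next animation state; unknown inputs keep the current state."""
    return ANIMATION_FSM.get((state, player_input), state)


state = "idle"
for player_input in ["move", "attack", "finish"]:
    state = next_animation(state, player_input)
    print(state)    # walk -> attack -> idle
```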


2013 ◽  
Vol 18 (2-3) ◽  
pp. 49-60 ◽  
Author(s):  
Damian Dudziński ◽  
Tomasz Kryjak ◽  
Zbigniew Mikrut

In this paper, a human action recognition algorithm is described that uses background generation with shadow elimination, silhouette description based on simple geometrical features, and a finite state machine for recognizing particular actions. The performed tests indicate that this approach achieves an 81% correct recognition rate while allowing real-time processing of a 360 × 288 video stream.
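The sketch below illustrates the final recognition stage of such a pipeline: per-frame silhouette descriptors (here only the bounding-box aspect ratio) are quantized into pose symbols, and a small finite state machine fires when a symbol sequence characteristic of an action is observed. The "fall" action, the thresholds, and the states are assumptions for illustration, not the paper's actual features or action set.

```python
def pose_symbol(width, height):
    """Quantize a silhouette bounding box into a coarse pose symbol."""
    ratio = height / max(width, 1e-6)
    if ratio > 1.5:
        return "standing"
    if ratio > 0.8:
        return "bending"
    return "lying"


# FSM recognizing the sequence standing -> bending -> lying as a fall.
FALL_FSM = {
    ("start",      "standing"): "upright",
    ("upright",    "bending"):  "descending",
    ("descending", "lying"):    "fall_detected",
}

state = "start"
for w, h in [(40, 120), (60, 70), (110, 40)]:   # example silhouette bounding boxes per frame
    state = FALL_FSM.get((state, pose_symbol(w, h)), state)
print(state)   # fall_detected
```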


2013 ◽  
Vol 33 (1) ◽  
pp. 149-152
Author(s):  
Jianjun LI ◽  
Yixiang JIANG ◽  
Jie QIAN ◽  
Wei LI ◽  
Yu LI

Author(s):  
Muffie Wiebe Waterman ◽  
Dennis C. Frezzo ◽  
Michael X. Wang

Modelling ◽  
2021 ◽  
Vol 2 (1) ◽  
pp. 43-62
Author(s):  
Kshirasagar Naik ◽  
Mahesh D. Pandey ◽  
Anannya Panda ◽  
Abdurhman Albasir ◽  
Kunal Taneja

Accurate modelling and simulation of a nuclear power plant are important factors in the strategic planning and maintenance of the plant. Several nonlinearities and multivariable couplings are associated with real-world plants. Therefore, it is quite challenging to model such cyber-physical systems using conventional mathematical equations. A visual analytics approach is introduced which addresses these limitations and models both the short-term and long-term behaviour of the system. Principal Component Analysis (PCA) followed by Linear Discriminant Analysis (LDA) is used to extract features from the data, and k-means clustering is applied to label the data instances. A finite state machine representation formulated from the clustered data is then used to model the behaviour of the cyber-physical system in terms of system states and state transitions. In this paper, the indicated methodology is applied to time-series data collected from a nuclear power plant over nine years. It is observed that this approach of combining machine learning principles with finite state machine capabilities facilitates feature exploration, visual analysis, pattern discovery, and effective modelling of nuclear power plant data. In addition, the finite state machine representation supports identification of normal and abnormal operation of the plant, suggesting that the given approach captures the anomalous behaviour of the plant.
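A hedged sketch of the modelling pipeline described above: reduce the multivariate time series with PCA, label the instances with k-means, and derive a finite-state-machine view by counting transitions between consecutive cluster labels. The synthetic data, dimensions, and cluster count are illustrative, and the sketch omits the LDA step the paper applies after PCA.

```python
import numpy as np
from collections import Counter
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 12))                 # stand-in for multivariate sensor time-series data

features = PCA(n_components=3).fit_transform(X)                      # feature extraction
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(features)  # label instances

# System states = clusters; state transitions = consecutive label pairs in time order.
transitions = Counter(zip(labels[:-1], labels[1:]))
for (u, v), count in sorted(transitions.items()):
    print(f"state {u} -> state {v}: {count} transitions")
```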

