Restraining Bolts for Reinforcement Learning Agents

2020 ◽  
Vol 34 (09) ◽  
pp. 13659-13662
Author(s):  
Giuseppe De Giacomo ◽  
Luca Iocchi ◽  
Marco Favorito ◽  
Fabio Patrizi

In this work we have investigated the concept of “restraining bolt”, inspired by Science Fiction. We have two distinct sets of features extracted from the world, one by the agent and one by the authority imposing some restraining specifications on the behaviour of the agent (the “restraining bolt”). The two sets of features, and hence the models of the world attainable from them, are apparently unrelated, since they are of interest to independent parties. However, they both account for (aspects of) the same world. We have considered the case in which the agent is a reinforcement learning agent over a set of low-level (subsymbolic) features, while the restraining bolt is specified logically, using linear temporal logic over finite traces (LTLf/LDLf), over a set of high-level symbolic features. We show formally, and illustrate with examples, that, under general circumstances, the agent can learn while shaping its goals to suitably conform (as much as possible) to the restraining bolt specifications.
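A minimal sketch (not the authors' code) of how such a restraining bolt can be combined with tabular Q-learning: the LTLf/LDLf specification is assumed to have been compiled into a deterministic finite automaton, the learner's state is augmented with the automaton state, and an extra reward is granted when the automaton reaches an accepting state. The names BoltDFA, env.reset(), env.step(), env.actions() and label() are illustrative assumptions.

    import random
    from collections import defaultdict

    class BoltDFA:
        """Hypothetical DFA compiled from an LTLf/LDLf restraining specification."""
        def __init__(self, transitions, initial, accepting):
            self.transitions = transitions          # dict: (state, symbol) -> next state
            self.initial = initial
            self.accepting = accepting

        def step(self, q, symbol):
            return self.transitions.get((q, symbol), q)

    def q_learning_with_bolt(env, dfa, label, episodes=1000,
                             alpha=0.1, gamma=0.99, eps=0.1, bolt_reward=1.0):
        """Tabular Q-learning over the product state (environment state, DFA state)."""
        Q = defaultdict(float)
        for _ in range(episodes):
            s, q = env.reset(), dfa.initial
            done = False
            while not done:
                acts = env.actions(s)
                a = (random.choice(acts) if random.random() < eps
                     else max(acts, key=lambda b: Q[(s, q, b)]))
                s2, r, done = env.step(a)            # low-level agent reward
                q2 = dfa.step(q, label(s2))          # bolt observes high-level symbols only
                r += bolt_reward if q2 in dfa.accepting else 0.0
                best_next = 0.0 if done else max(Q[(s2, q2, b)] for b in env.actions(s2))
                Q[(s, q, a)] += alpha * (r + gamma * best_next - Q[(s, q, a)])
                s, q = s2, q2
        return Q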

Biomimetics ◽  
2021 ◽  
Vol 6 (1) ◽  
pp. 13
Author(s):  
Adam Bignold ◽  
Francisco Cruz ◽  
Richard Dazeley ◽  
Peter Vamplew ◽  
Cameron Foale

Interactive reinforcement learning methods utilise an external information source to evaluate decisions and accelerate learning. Previous work has shown that human advice can significantly improve a learning agent’s performance. When evaluating reinforcement learning algorithms, it is common to repeat experiments as parameters are altered or to gain a sufficient sample size. In this regard, requiring human interaction every time an experiment is restarted is undesirable, particularly when the expense of doing so can be considerable. Additionally, reusing the same people for the experiment introduces bias, as they learn the behaviour of the agent and the dynamics of the environment. This paper presents a methodology for evaluating interactive reinforcement learning agents by employing simulated users. Simulated users allow human knowledge, bias, and interaction to be modelled. Their use allows the development and testing of reinforcement learning agents, and can provide indicative results of agent performance under defined human constraints. While simulated users are no replacement for actual humans, they do offer an affordable and fast alternative for evaluating assisted agents. We introduce a method for performing a preliminary evaluation using simulated users to show how performance changes depending on the type of user assisting the agent. Moreover, we describe how human interaction may be simulated, and present an experiment illustrating the applicability of simulated users in evaluating agent performance when assisted by different types of trainers. Experimental results show that the use of this methodology allows for greater insight into the performance of interactive reinforcement learning agents when advised by different users. The use of simulated users with varying characteristics allows for evaluation of the impact of those characteristics on the behaviour of the learning agent.
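As an illustration only, a simulated trainer of the kind described above can be reduced to two parameters, accuracy and availability; the names SimulatedUser, oracle_policy and advise() are assumptions rather than the paper's exact design.

    import random

    class SimulatedUser:
        """Simulated trainer that sometimes answers, and sometimes answers wrongly."""
        def __init__(self, oracle_policy, accuracy=0.9, availability=0.5):
            self.oracle_policy = oracle_policy   # known-good action for each state
            self.accuracy = accuracy             # probability the advice is correct
            self.availability = availability     # probability the user responds at all

        def advise(self, state, actions):
            if random.random() > self.availability:
                return None                      # no advice this step
            if random.random() < self.accuracy:
                return self.oracle_policy(state) # correct advice
            return random.choice(actions)        # mistaken advice

Sweeping accuracy and availability over a grid and rerunning the same learning agent then gives the kind of comparison across trainer types described in the paper, without recruiting new human participants for every run.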


Author(s):  
Yoshihiro Ichikawa ◽  
Keiki Takadama

This paper proposes a reinforcement learning agent that estimates internal rewards from external rewards in order to avoid conflict in the multi-step dilemma problem. Intensive simulation results reveal that the agent succeeds in avoiding local convergence and obtains a behavior policy that reaches a higher reward by updating the Q-value with the value obtained by subtracting the average reward from the external reward.
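Read literally, the described update amounts to driving a standard Q-learning step with an internal reward equal to the external reward minus a running average of past rewards; the sketch below is an interpretation with assumed variable names, not the authors' code.

    from collections import defaultdict

    Q = defaultdict(float)
    avg_reward = 0.0
    BETA = 0.01                      # step size for the running average of rewards

    def update(s, a, s2, external_reward, actions, alpha=0.1, gamma=0.9):
        global avg_reward
        avg_reward += BETA * (external_reward - avg_reward)
        internal_reward = external_reward - avg_reward        # subtract the average reward
        target = internal_reward + gamma * max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += alpha * (target - Q[(s, a)])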


Following the trend of mobile applications, people now use many applications for various personal and professional purposes. In every domain, including cutting-edge technologies such as machine learning and the IoT (Internet of Things), presenting data to the user in a clear and understandable manner is very important. This is where mobile applications come into use. Mobile applications can be used to resolve many communication issues, especially when communication flows between low and high levels and vice versa. This application is made to serve as one of the best ways of communication between faculty and students, especially when the faculty member is not available in the cabin and the student wishes to meet them at that time. The mobile application uses the Dart language with the Flutter UI Software Development Kit (SDK).


Author(s):  
Qiuyuan Huang ◽  
Zhe Gan ◽  
Asli Celikyilmaz ◽  
Dapeng Wu ◽  
Jianfeng Wang ◽  
...  

We propose a hierarchically structured reinforcement learning approach to address the challenges of planning for generating coherent multi-sentence stories for the visual storytelling task. Within our framework, the task of generating a story given a sequence of images is divided across a two-level hierarchical decoder. The high-level decoder constructs a plan by generating a semantic concept (i.e., topic) for each image in sequence. The low-level decoder generates a sentence for each image using a semantic compositional network, which effectively grounds the sentence generation conditioned on the topic. The two decoders are jointly trained end-to-end using reinforcement learning. We evaluate our model on the visual storytelling (VIST) dataset. Empirical results from both automatic and human evaluations demonstrate that the proposed hierarchically structured reinforced training achieves significantly better performance compared to a strong flat deep reinforcement learning baseline.
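The two-level decoding can be summarised in a short sketch; the module names (encoder, topic_decoder, sentence_decoder) and their interfaces are illustrative assumptions, not the paper's actual architecture.

    def generate_story(images, encoder, topic_decoder, sentence_decoder):
        """High-level decoder plans one topic per image; low-level decoder writes the sentence."""
        story, h = [], topic_decoder.initial_state()
        for img in images:
            feat = encoder(img)                                 # visual features for this image
            topic, h = topic_decoder.step(feat, h)              # high-level plan: a semantic topic
            sentence = sentence_decoder.generate(feat, topic)   # sentence grounded in the topic
            story.append(sentence)
        return story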


2020 ◽  
Vol 13 (4) ◽  
pp. 78
Author(s):  
Nico Zengeler ◽  
Uwe Handmann

We present a deep reinforcement learning framework for the automatic trading of contracts for difference (CfD) on indices at high frequency. Our contribution proves that reinforcement learning agents with recurrent long short-term memory (LSTM) networks can learn from recent market history and outperform the market. Usually, such approaches depend on low latency; in a real-world example, we show that an increased model size may compensate for higher latency. As the noisy nature of economic trends complicates predictions, especially for speculative assets, our approach does not predict prices but instead uses a reinforcement learning agent to learn an overall lucrative trading policy. To this end, we simulate a virtual market environment based on historical trading data. Our environment provides a partially observable Markov decision process (POMDP) to reinforcement learners and allows the training of various strategies.
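A hedged sketch of such a virtual market environment built from historical data is given below; the action set (short, flat, long) and the return-based reward are common choices assumed here, not necessarily the authors' exact setup.

    import numpy as np

    class CfdTradingEnv:
        """POMDP-style environment: the agent only sees a recent window of returns."""
        ACTIONS = (-1, 0, 1)                     # short, flat, long

        def __init__(self, prices, window=32):
            self.prices, self.window = np.asarray(prices, dtype=float), window

        def reset(self):
            self.t = self.window
            return self._observe()

        def _observe(self):
            w = self.prices[self.t - self.window:self.t]
            return np.diff(w) / w[:-1]           # recent relative price changes

        def step(self, action):
            self.t += 1
            ret = (self.prices[self.t] - self.prices[self.t - 1]) / self.prices[self.t - 1]
            reward = action * ret                # profit or loss of the held position
            done = self.t >= len(self.prices) - 1
            return self._observe(), reward, done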


Author(s):  
Ruohan Zhang ◽  
Faraz Torabi ◽  
Lin Guan ◽  
Dana H. Ballard ◽  
Peter Stone

Reinforcement learning agents can learn to solve sequential decision tasks by interacting with the environment. Human knowledge of how to solve these tasks can be incorporated using imitation learning, in which the agent learns to imitate human-demonstrated decisions. However, human guidance is not limited to demonstrations. Other types of guidance could be more suitable for certain tasks and require less human effort. This survey provides a high-level overview of five recent learning frameworks that primarily rely on human guidance other than conventional, step-by-step action demonstrations. We review the motivation, assumptions, and implementation of each framework. We then discuss possible future research directions.


Author(s):  
Cristina I. Font-Julian ◽  
Raúl Compés-López ◽  
Enrique Orduna-Malea

The aim of this work is to determine to what extent Robert Parker has lost his influence as a prescriber in the world of wine, through a webometric analysis comparing Parker’s web influence with that of a competitor who represents an antithetical vision of the world of wine (Alice Feiring). To do this, we carried out a comparative analysis of Parker’s (@wine_advocate) and Alice Feiring’s (@alicefeiring) official Twitter accounts, including a broad set of metrics (productivity, age, social activity, number of followees, etc.) and paying special attention to specific follower features (age, gender, location, and bio text). The results show that Parker’s Twitter profile exhibits an overall higher impact, which denotes not only a different online strategy but also a high level of engagement and popularity. The low number of followers shared by Parker and Feiring (1,898 users) offers prima facie evidence of an online gap between these followings, which can indicate the existence of divided groups of supporters corresponding to the visions that Parker and Feiring represent. Finally, distinctive features are noted for Feiring’s followers in gender (more women), language (more English-speaking followers), and country (more followers from the United States).


Author(s):  
N. Botteghi ◽  
R. Schulte ◽  
B. Sirmacek ◽  
M. Poel ◽  
C. Brune

Abstract. Autonomously exploring and mapping is one of the open challenges of robotics and artificial intelligence. Especially when the environments are unknown, choosing the optimal navigation directive is not straightforward. In this paper, we propose a reinforcement learning framework for navigating, exploring, and mapping unknown environments. The reinforcement learning agent is in charge of selecting the commands for steering the mobile robot, while a SLAM algorithm estimates the robot pose and maps the environment. To select optimal actions, the agent is trained to be curious about the world. This concept translates into the introduction of a curiosity-driven reward function that encourages the agent to steer the mobile robot towards unknown and unseen areas of the world and the map. We test our approach on exploration challenges in different indoor environments. The agent trained with the proposed reward function outperforms agents trained with reward functions commonly used in the literature for solving such tasks.
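A curiosity-driven exploration reward of the kind described can be illustrated as a bonus for newly mapped cells of the SLAM occupancy grid; the constant UNKNOWN, the grid encoding and the penalty values are assumptions for this sketch and may differ from the paper's reward.

    import numpy as np

    UNKNOWN = -1    # value assumed to mark unexplored cells in the occupancy grid

    def curiosity_reward(grid_before, grid_after, collision=False,
                         bonus_per_cell=0.01, collision_penalty=-1.0):
        """Reward cells mapped during the last step; penalise collisions. Grids are numpy arrays."""
        if collision:
            return collision_penalty
        newly_mapped = np.sum((grid_before == UNKNOWN) & (grid_after != UNKNOWN))
        return bonus_per_cell * float(newly_mapped)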


2019 ◽  
Author(s):  
Theodore L. Willke ◽  
Seng Bum M. Yoo ◽  
Mihai Capotă ◽  
Sebastian Musslick ◽  
Benjamin Y. Hayden ◽  
...  

Abstract. We compare the performance of non-human primates and deep reinforcement learning agents in a virtual pursuit-avoidance task, as part of an effort to understand the role that cognitive control plays in the deeply evolved skill of chase-and-escape behavior. Here we train two agents, a deep Q network and an actor-critic model, on a video game in which the player must capture prey while avoiding a predator. A previously trained rhesus macaque performed well on this task, and in a manner that obeyed basic principles of Newtonian physics. We sought to compare the principles learned by the artificial agents with those followed by the animal, as determined by the ability of one to predict the other. Our findings suggest that the agents learn primarily first-order physics of motion, while the animal exhibited abilities consistent with second-order physics of motion. We identify scenarios in which the actions taken by the animal and the agents were consistent, as well as ones in which they differed, including some surprising strategies exhibited by the agents. Finally, we remark on how the differences in how the agents and the macaque learn the task may affect their peak performance as well as their ability to generalize to other tasks.


2019 ◽  
Vol 4 (37) ◽  
pp. eaay6276 ◽  
Author(s):  
Xiao Li ◽  
Zachary Serlin ◽  
Guang Yang ◽  
Calin Belta

Growing interest in reinforcement learning approaches to robotic planning and control raises concerns of predictability and safety of robot behaviors realized solely through learned control policies. In addition, formally defining reward functions for complex tasks is challenging, and faulty rewards are prone to exploitation by the learning agent. Here, we propose a formal methods approach to reinforcement learning that (i) provides a formal specification language that integrates high-level, rich, task specifications with a priori, domain-specific knowledge; (ii) makes the reward generation process easily interpretable; (iii) guides the policy generation process according to the specification; and (iv) guarantees the satisfaction of the (critical) safety component of the specification. The main ingredients of our computational framework are a predicate temporal logic specifically tailored for robotic tasks and an automaton-guided, safe reinforcement learning algorithm based on control barrier functions. Although the proposed framework is quite general, we motivate it and illustrate it experimentally for a robotic cooking task, in which two manipulators worked together to make hot dogs.
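One common way to realise the kind of safety guarantee mentioned in point (iv) is a control-barrier-function filter between the learned policy and the robot; the sketch below shows that generic idea under assumed names (h, dynamics, candidate_actions) and is not the authors' implementation.

    import numpy as np

    def safe_action(state, rl_action, h, dynamics, candidate_actions, alpha=0.5):
        """Keep the policy's action only if it satisfies a discrete-time CBF condition
        h(x') >= (1 - alpha) * h(x); otherwise fall back to a safe alternative."""
        def cbf_ok(a):
            return h(dynamics(state, a)) >= (1.0 - alpha) * h(state)

        if cbf_ok(rl_action):
            return rl_action
        safe = [a for a in candidate_actions if cbf_ok(a)]
        if safe:       # choose the safe action closest to what the policy wanted
            return min(safe, key=lambda a: np.linalg.norm(np.asarray(a) - np.asarray(rl_action)))
        return max(candidate_actions, key=lambda a: h(dynamics(state, a)))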

