Restraining Bolts for Reinforcement Learning Agents

2020 ◽  
Vol 34 (09) ◽  
pp. 13659-13662
Author(s):  
Giuseppe De Giacomo ◽  
Luca Iocchi ◽  
Marco Favorito ◽  
Fabio Patrizi

In this work we have investigated the concept of “restraining bolt”, inspired by Science Fiction. We have two distinct sets of features extracted from the world, one by the agent and one by the authority imposing some restraining specifications on the behaviour of the agent (the “restraining bolt”). The two sets of features, and hence the models of the world attainable from them, are apparently unrelated, since they are of interest to independent parties. However, they both account for (aspects of) the same world. We have considered the case in which the agent is a reinforcement learning agent over a set of low-level (subsymbolic) features, while the restraining bolt is specified logically, using linear temporal logic over finite traces (LTLf/LDLf), over a set of high-level symbolic features. We show formally, and illustrate with examples, that, under general circumstances, the agent can learn while shaping its goals to suitably conform (as much as possible) to the restraining bolt specifications.
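A minimal sketch (not the authors' code) of how such a restraining bolt can be combined with tabular Q-learning: the LTLf/LDLf specification is assumed to have been compiled into a deterministic finite automaton, the learner's state is augmented with the automaton state, and an extra reward is granted when the automaton reaches an accepting state. The names BoltDFA, env.reset(), env.step(), env.actions() and label() are illustrative assumptions.

    import random
    from collections import defaultdict

    class BoltDFA:
        """Hypothetical DFA compiled from an LTLf/LDLf restraining specification."""
        def __init__(self, transitions, initial, accepting):
            self.transitions = transitions          # dict: (state, symbol) -> next state
            self.initial = initial
            self.accepting = accepting

        def step(self, q, symbol):
            return self.transitions.get((q, symbol), q)

    def q_learning_with_bolt(env, dfa, label, episodes=1000,
                             alpha=0.1, gamma=0.99, eps=0.1, bolt_reward=1.0):
        """Tabular Q-learning over the product state (environment state, DFA state)."""
        Q = defaultdict(float)
        for _ in range(episodes):
            s, q = env.reset(), dfa.initial
            done = False
            while not done:
                acts = env.actions(s)
                a = (random.choice(acts) if random.random() < eps
                     else max(acts, key=lambda b: Q[(s, q, b)]))
                s2, r, done = env.step(a)            # low-level agent reward
                q2 = dfa.step(q, label(s2))          # bolt observes high-level symbols only
                r += bolt_reward if q2 in dfa.accepting else 0.0
                best_next = 0.0 if done else max(Q[(s2, q2, b)] for b in env.actions(s2))
                Q[(s, q, a)] += alpha * (r + gamma * best_next - Q[(s, q, a)])
                s, q = s2, q2
        return Q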

Biomimetics ◽  
2021 ◽  
Vol 6 (1) ◽  
pp. 13
Author(s):  
Adam Bignold ◽  
Francisco Cruz ◽  
Richard Dazeley ◽  
Peter Vamplew ◽  
Cameron Foale

Interactive reinforcement learning methods utilise an external information source to evaluate decisions and accelerate learning. Previous work has shown that human advice can significantly improve a learning agent’s performance. When evaluating reinforcement learning algorithms, it is common to repeat experiments as parameters are altered or to gain a sufficient sample size. In this regard, requiring human interaction every time an experiment is restarted is undesirable, particularly when the expense of doing so can be considerable. Additionally, reusing the same people for the experiment introduces bias, as they learn the behaviour of the agent and the dynamics of the environment. This paper presents a methodology for evaluating interactive reinforcement learning agents by employing simulated users. Simulated users allow human knowledge, bias, and interaction to be modelled. Their use allows the development and testing of reinforcement learning agents, and can provide indicative results of agent performance under defined human constraints. While simulated users are no replacement for actual humans, they do offer an affordable and fast alternative for evaluating assisted agents. We introduce a method for performing a preliminary evaluation using simulated users to show how performance changes depending on the type of user assisting the agent. Moreover, we describe how human interaction may be simulated, and present an experiment illustrating the applicability of simulated users in evaluating agent performance when assisted by different types of trainers. Experimental results show that the use of this methodology allows for greater insight into the performance of interactive reinforcement learning agents when advised by different users. The use of simulated users with varying characteristics allows for evaluation of the impact of those characteristics on the behaviour of the learning agent.
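As an illustration only, a simulated trainer of the kind described above can be reduced to two parameters, accuracy and availability; the names SimulatedUser, oracle_policy and advise() are assumptions rather than the paper's exact design.

    import random

    class SimulatedUser:
        """Simulated trainer that sometimes answers, and sometimes answers wrongly."""
        def __init__(self, oracle_policy, accuracy=0.9, availability=0.5):
            self.oracle_policy = oracle_policy   # known-good action for each state
            self.accuracy = accuracy             # probability the advice is correct
            self.availability = availability     # probability the user responds at all

        def advise(self, state, actions):
            if random.random() > self.availability:
                return None                      # no advice this step
            if random.random() < self.accuracy:
                return self.oracle_policy(state) # correct advice
            return random.choice(actions)        # mistaken advice

Sweeping accuracy and availability over a grid and rerunning the same learning agent then gives the kind of comparison across trainer types described in the paper, without recruiting new human participants for every run.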


Author(s):  
Yoshihiro Ichikawa ◽  
Keiki Takadama

This paper proposes a reinforcement learning agent that estimates internal rewards from external rewards in order to avoid conflict in the multi-step dilemma problem. Intensive simulation results reveal that the agent succeeds in avoiding local convergence and obtains a behavior policy that reaches a higher reward by updating the Q-value with the value obtained by subtracting the average reward from the external reward.
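Read literally, the described update amounts to driving a standard Q-learning step with an internal reward equal to the external reward minus a running average of past rewards; the sketch below is an interpretation with assumed variable names, not the authors' code.

    from collections import defaultdict

    Q = defaultdict(float)
    avg_reward = 0.0
    BETA = 0.01                      # step size for the running average of rewards

    def update(s, a, s2, external_reward, actions, alpha=0.1, gamma=0.9):
        global avg_reward
        avg_reward += BETA * (external_reward - avg_reward)
        internal_reward = external_reward - avg_reward        # subtract the average reward
        target = internal_reward + gamma * max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += alpha * (target - Q[(s, a)])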


Following the trend of mobile applications, people now use many applications for various personal and professional purposes. In every domain, including cutting-edge technologies such as machine learning and the IoT (Internet of Things), presenting data to the user in a clear and understandable manner is very important. This is where mobile applications come into use. Mobile applications can be used to resolve many communication issues, especially when communication flows between low and high levels and vice versa. This application is made to serve as one of the best ways of communication between faculty and students, especially when the faculty member is not available in the cabin and the student wishes to meet them at that time. The mobile application uses the Dart language with the Flutter UI Software Development Kit (SDK).


Author(s):  
Qiuyuan Huang ◽  
Zhe Gan ◽  
Asli Celikyilmaz ◽  
Dapeng Wu ◽  
Jianfeng Wang ◽  
...  

We propose a hierarchically structured reinforcement learning approach to address the challenges of planning for generating coherent multi-sentence stories for the visual storytelling task. Within our framework, the task of generating a story given a sequence of images is divided across a two-level hierarchical decoder. The high-level decoder constructs a plan by generating a semantic concept (i.e., topic) for each image in sequence. The low-level decoder generates a sentence for each image using a semantic compositional network, which effectively grounds the sentence generation conditioned on the topic. The two decoders are jointly trained end-to-end using reinforcement learning. We evaluate our model on the visual storytelling (VIST) dataset. Empirical results from both automatic and human evaluations demonstrate that the proposed hierarchically structured reinforced training achieves significantly better performance compared to a strong flat deep reinforcement learning baseline.
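The two-level decoding can be summarised in a short sketch; the module names (encoder, topic_decoder, sentence_decoder) and their interfaces are illustrative assumptions, not the paper's actual architecture.

    def generate_story(images, encoder, topic_decoder, sentence_decoder):
        """High-level decoder plans one topic per image; low-level decoder writes the sentence."""
        story, h = [], topic_decoder.initial_state()
        for img in images:
            feat = encoder(img)                                 # visual features for this image
            topic, h = topic_decoder.step(feat, h)              # high-level plan: a semantic topic
            sentence = sentence_decoder.generate(feat, topic)   # sentence grounded in the topic
            story.append(sentence)
        return story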


2020 ◽  
Vol 13 (4) ◽  
pp. 78
Author(s):  
Nico Zengeler ◽  
Uwe Handmann

We present a deep reinforcement learning framework for the automatic trading of contracts for difference (CfD) on indices at high frequency. Our contribution proves that reinforcement learning agents with recurrent long short-term memory (LSTM) networks can learn from recent market history and outperform the market. Usually, such approaches depend on low latency; in a real-world example, we show that an increased model size may compensate for higher latency. As the noisy nature of economic trends complicates predictions, especially for speculative assets, our approach does not predict prices but instead uses a reinforcement learning agent to learn an overall lucrative trading policy. To this end, we simulate a virtual market environment based on historical trading data. Our environment provides a partially observable Markov decision process (POMDP) to reinforcement learners and allows the training of various strategies.
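A hedged sketch of such a virtual market environment built from historical data is given below; the action set (short, flat, long) and the return-based reward are common choices assumed here, not necessarily the authors' exact setup.

    import numpy as np

    class CfdTradingEnv:
        """POMDP-style environment: the agent only sees a recent window of returns."""
        ACTIONS = (-1, 0, 1)                     # short, flat, long

        def __init__(self, prices, window=32):
            self.prices, self.window = np.asarray(prices, dtype=float), window

        def reset(self):
            self.t = self.window
            return self._observe()

        def _observe(self):
            w = self.prices[self.t - self.window:self.t]
            return np.diff(w) / w[:-1]           # recent relative price changes

        def step(self, action):
            self.t += 1
            ret = (self.prices[self.t] - self.prices[self.t - 1]) / self.prices[self.t - 1]
            reward = action * ret                # profit or loss of the held position
            done = self.t >= len(self.prices) - 1
            return self._observe(), reward, done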


Author(s):  
Ruohan Zhang ◽  
Faraz Torabi ◽  
Lin Guan ◽  
Dana H. Ballard ◽  
Peter Stone

Reinforcement learning agents can learn to solve sequential decision tasks by interacting with the environment. Human knowledge of how to solve these tasks can be incorporated using imitation learning, in which the agent learns to imitate human-demonstrated decisions. However, human guidance is not limited to demonstrations. Other types of guidance could be more suitable for certain tasks and require less human effort. This survey provides a high-level overview of five recent learning frameworks that primarily rely on human guidance other than conventional, step-by-step action demonstrations. We review the motivation, assumptions, and implementation of each framework. We then discuss possible future research directions.


Author(s):  
Cristina I. Font-Julian ◽  
Raúl Compés-López ◽  
Enrique Orduna-Malea

The aim of this work is to determine to what extent Robert Parker has lost his influence as a prescriber in the world of wine, through a webometric analysis comparing Parker’s web influence with that of a competitor who represents an antithetical vision of the world of wine (Alice Feiring). To do this, we carried out a comparative analysis of Parker’s (@wine_advocate) and Alice Feiring’s (@alicefeiring) official Twitter accounts, including a broad set of metrics (productivity, age, social activity, number of followees, etc.) and paying special attention to specific follower features (age, gender, location, and bio text). The results show that Parker’s Twitter profile exhibits an overall higher impact, which denotes not only a different online strategy but also a high level of engagement and popularity. The low number of followers shared by Parker and Feiring (1,898 users) offers prima facie evidence of an online gap between these followings, which can indicate the existence of divided groups of supporters corresponding to the visions that Parker and Feiring represent. Finally, distinctive features are noted for Feiring’s followers in gender (more women), language (more English-speaking followers), and country (more followers from the United States).


Author(s):  
N. Botteghi ◽  
R. Schulte ◽  
B. Sirmacek ◽  
M. Poel ◽  
C. Brune

Abstract. Autonomously exploring and mapping is one of the open challenges of robotics and artificial intelligence. Especially when the environments are unknown, choosing the optimal navigation directive is not straightforward. In this paper, we propose a reinforcement learning framework for navigating, exploring, and mapping unknown environments. The reinforcement learning agent is in charge of selecting the commands for steering the mobile robot, while a SLAM algorithm estimates the robot pose and maps the environment. To select optimal actions, the agent is trained to be curious about the world. This concept translates into the introduction of a curiosity-driven reward function that encourages the agent to steer the mobile robot towards unknown and unseen areas of the world and the map. We test our approach on exploration challenges in different indoor environments. The agent trained with the proposed reward function outperforms agents trained with reward functions commonly used in the literature for solving such tasks.
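A curiosity-driven exploration reward of the kind described can be illustrated as a bonus for newly mapped cells of the SLAM occupancy grid; the constant UNKNOWN, the grid encoding and the penalty values are assumptions for this sketch and may differ from the paper's reward.

    import numpy as np

    UNKNOWN = -1    # value assumed to mark unexplored cells in the occupancy grid

    def curiosity_reward(grid_before, grid_after, collision=False,
                         bonus_per_cell=0.01, collision_penalty=-1.0):
        """Reward cells mapped during the last step; penalise collisions. Grids are numpy arrays."""
        if collision:
            return collision_penalty
        newly_mapped = np.sum((grid_before == UNKNOWN) & (grid_after != UNKNOWN))
        return bonus_per_cell * float(newly_mapped)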


2019 ◽  
Author(s):  
Theodore L. Willke ◽  
Seng Bum M. Yoo ◽  
Mihai Capotă ◽  
Sebastian Musslick ◽  
Benjamin Y. Hayden ◽  
...  

Abstract. We compare the performance of non-human primates and deep reinforcement learning agents in a virtual pursuit-avoidance task, as part of an effort to understand the role that cognitive control plays in the deeply evolved skill of chase-and-escape behavior. Here we train two agents, a deep Q network and an actor-critic model, on a video game in which the player must capture prey while avoiding a predator. A previously trained rhesus macaque performed well on this task, and in a manner that obeyed basic principles of Newtonian physics. We sought to compare the principles learned by the artificial agents with those followed by the animal, as determined by the ability of one to predict the other. Our findings suggest that the agents learn primarily first-order physics of motion, while the animal exhibited abilities consistent with second-order physics of motion. We identify scenarios in which the actions taken by the animal and the agents were consistent, as well as ones in which they differed, including some surprising strategies exhibited by the agents. Finally, we remark on how the differences in how the agents and the macaque learn the task may affect their peak performance as well as their ability to generalize to other tasks.


2019 ◽  
Vol 4 (37) ◽  
pp. eaay6276 ◽  
Author(s):  
Xiao Li ◽  
Zachary Serlin ◽  
Guang Yang ◽  
Calin Belta

Growing interest in reinforcement learning approaches to robotic planning and control raises concerns of predictability and safety of robot behaviors realized solely through learned control policies. In addition, formally defining reward functions for complex tasks is challenging, and faulty rewards are prone to exploitation by the learning agent. Here, we propose a formal methods approach to reinforcement learning that (i) provides a formal specification language that integrates high-level, rich, task specifications with a priori, domain-specific knowledge; (ii) makes the reward generation process easily interpretable; (iii) guides the policy generation process according to the specification; and (iv) guarantees the satisfaction of the (critical) safety component of the specification. The main ingredients of our computational framework are a predicate temporal logic specifically tailored for robotic tasks and an automaton-guided, safe reinforcement learning algorithm based on control barrier functions. Although the proposed framework is quite general, we motivate it and illustrate it experimentally for a robotic cooking task, in which two manipulators worked together to make hot dogs.
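One common way to realise the kind of safety guarantee mentioned in point (iv) is a control-barrier-function filter between the learned policy and the robot; the sketch below shows that generic idea under assumed names (h, dynamics, candidate_actions) and is not the authors' implementation.

    import numpy as np

    def safe_action(state, rl_action, h, dynamics, candidate_actions, alpha=0.5):
        """Keep the policy's action only if it satisfies a discrete-time CBF condition
        h(x') >= (1 - alpha) * h(x); otherwise fall back to a safe alternative."""
        def cbf_ok(a):
            return h(dynamics(state, a)) >= (1.0 - alpha) * h(state)

        if cbf_ok(rl_action):
            return rl_action
        safe = [a for a in candidate_actions if cbf_ok(a)]
        if safe:       # choose the safe action closest to what the policy wanted
            return min(safe, key=lambda a: np.linalg.norm(np.asarray(a) - np.asarray(rl_action)))
        return max(candidate_actions, key=lambda a: h(dynamics(state, a)))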

