CURIOSITY-DRIVEN REINFORCEMENT LEARNING AGENT FOR MAPPING UNKNOWN INDOOR ENVIRONMENTS

Author(s):  
N. Botteghi ◽  
R. Schulte ◽  
B. Sirmacek ◽  
M. Poel ◽  
C. Brune

Abstract. Autonomous exploration and mapping is one of the open challenges of robotics and artificial intelligence. Especially when the environment is unknown, choosing the optimal navigation directive is not straightforward. In this paper, we propose a reinforcement learning framework for navigating, exploring, and mapping unknown environments. The reinforcement learning agent is in charge of selecting the commands for steering the mobile robot, while a SLAM algorithm estimates the robot pose and maps the environment. To select optimal actions, the agent is trained to be curious about the world. This concept translates into the introduction of a curiosity-driven reward function that encourages the agent to steer the mobile robot towards unknown and unseen areas of the world and the map. We test our approach on exploration challenges in different indoor environments. The agent trained with the proposed reward function outperforms agents trained with reward functions commonly used in the literature for solving such tasks.
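The abstract does not spell out the reward formula, but a common way to realize such a curiosity- or coverage-driven signal is to reward the agent for every occupancy-grid cell the SLAM map reveals for the first time. The sketch below is a minimal illustration under that assumption; the function name, cell encoding (-1 for unknown), and scale factor are all hypothetical.

```python
import numpy as np

# Hypothetical curiosity-style exploration reward: the agent is rewarded in
# proportion to the number of occupancy-grid cells that the SLAM map reveals
# for the first time at this step. Cell values follow a common convention:
# -1 = unknown, any other value = observed (free or occupied).
def exploration_reward(prev_map: np.ndarray, curr_map: np.ndarray,
                       scale: float = 0.01) -> float:
    newly_seen = np.logical_and(prev_map == -1, curr_map != -1)
    return scale * float(newly_seen.sum())

# Example: on a 100x100 grid, revealing 40 new cells yields reward 0.4.
prev_map = -np.ones((100, 100), dtype=int)
curr_map = prev_map.copy()
curr_map[:4, :10] = 0  # 40 cells observed as free space
print(exploration_reward(prev_map, curr_map))  # 0.4
```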

Author(s):  
N. Botteghi ◽  
B. Sirmacek ◽  
R. Schulte ◽  
M. Poel ◽  
C. Brune

Abstract. In this research, we investigate the use of Reinforcement Learning (RL) as an effective and robust solution for exploring unknown indoor environments and reconstructing their maps. We benefit from a Simultaneous Localization and Mapping (SLAM) algorithm for real-time robot localization and mapping. Three different reward functions are compared and tested in environments of growing complexity. The performance of the three RL-based path planners is assessed not only on the training environments, but also on an environment unseen during training, to test the generalization properties of the policies. The results indicate that RL-based planners trained to maximize the coverage of the map are able to consistently explore and construct the maps of different indoor environments.


Author(s):  
Thomas Recchia ◽  
Jae Chung ◽  
Kishore Pochiraju

As robotic systems become more prevalent, it is highly desirable for them to be able to operate in dynamic environments. A common approach is to use reinforcement learning to allow an agent controlling the robot to learn and adapt its behavior based on a reward function. This paper presents a novel multi-agent system that cooperates to control a single robot battle tank in a melee battle scenario, with no prior knowledge of its opponents' strategies. The agents learn through reinforcement learning and are loosely coupled by their reward functions. Each agent controls a different aspect of the robot's behavior. In addition, the problem of delayed reward is addressed by applying a time-averaged reward to several sequential actions at once. This system was evaluated in a simulated melee combat scenario and was shown to improve its performance over time. This was accomplished by each agent learning to pick specific battle strategies for each opponent it faced.
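The time-averaging idea lends itself to a compact sketch: when a delayed reward finally arrives, split it evenly over the last few (state, action) pairs and apply one tabular Q-learning update per pair. Everything below (class name, window size, update details) is an illustrative assumption, not the authors' implementation.

```python
from collections import deque

# Hedged sketch of time-averaged credit assignment for delayed rewards:
# a delayed reward is divided over the last N recorded (state, action)
# pairs, and each pair receives one Q-learning update with that share.
class TimeAveragedQLearner:
    def __init__(self, alpha=0.1, gamma=0.9, window=5):
        self.q = {}                          # (state, action) -> value
        self.alpha, self.gamma = alpha, gamma
        self.recent = deque(maxlen=window)   # recent (state, action) pairs

    def act_and_record(self, state, action):
        self.recent.append((state, action))

    def apply_delayed_reward(self, reward, next_state, actions):
        share = reward / max(len(self.recent), 1)  # time-averaged share
        best_next = max((self.q.get((next_state, a), 0.0) for a in actions),
                        default=0.0)
        for s, a in self.recent:
            old = self.q.get((s, a), 0.0)
            self.q[(s, a)] = old + self.alpha * (share + self.gamma * best_next - old)
        self.recent.clear()
```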


2020 ◽  
Vol 34 (05) ◽  
pp. 7969-7976
Author(s):  
Junjie Hu ◽  
Yu Cheng ◽  
Zhe Gan ◽  
Jingjing Liu ◽  
Jianfeng Gao ◽  
...  

Previous storytelling approaches mostly focused on optimizing traditional metrics such as BLEU, ROUGE and CIDEr. In this paper, we re-examine this problem from a different angle, by looking deep into what defines a natural and topically-coherent story. To this end, we propose three assessment criteria: relevance, coherence and expressiveness, which we observe through empirical analysis could constitute a “high-quality” story to the human eye. We further propose a reinforcement learning framework, ReCo-RL, with reward functions designed to capture the essence of these quality criteria. Experiments on the Visual Storytelling Dataset (VIST) with both automatic and human evaluation demonstrate that our ReCo-RL model achieves better performance than state-of-the-art baselines on both traditional metrics and the proposed new criteria.
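One plausible way to read this reward design is as a weighted combination of per-criterion scorers, one each for relevance, coherence, and expressiveness. The sketch below assumes that shape; the scorer callables and weights are placeholders, since the paper defines its own estimators for each criterion.

```python
from typing import Callable

# Hedged sketch of a composite sentence-level reward in the spirit of
# ReCo-RL: a weighted sum of relevance, coherence, and expressiveness
# scores. The scorers and weights are stand-ins, not the paper's models.
def story_reward(sentence: str, image_features, history: list[str],
                 relevance: Callable, coherence: Callable,
                 expressiveness: Callable,
                 w_rel: float = 1.0, w_coh: float = 1.0,
                 w_exp: float = 1.0) -> float:
    return (w_rel * relevance(sentence, image_features)
            + w_coh * coherence(sentence, history)
            + w_exp * expressiveness(sentence))

# Toy usage with stand-in scorers:
r = story_reward("A dog runs on the beach.", None, [],
                 relevance=lambda s, f: 0.7,
                 coherence=lambda s, h: 0.8,
                 expressiveness=lambda s: 0.5)
print(r)  # 2.0
```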


Author(s):  
K. A. A. Mustafa ◽  
N. Botteghi ◽  
B. Sirmacek ◽  
M. Poel ◽  
S. Stramigioli

Abstract. We introduce a new autonomous path planning algorithm for mobile robots for reaching target locations in an unknown environment where the robot relies on its on-board sensors. In particular, we describe the design and evaluation of a deep reinforcement learning motion planner with continuous linear and angular velocities, based on deep deterministic policy gradient (DDPG), that navigates to a desired target location. Additionally, the algorithm is enhanced by using the knowledge of the environment provided by a grid-based SLAM with Rao-Blackwellized particle filter algorithm to shape the reward function, in an attempt to improve the convergence rate, escape local optima, and reduce the number of collisions with obstacles. We compare a reward function shaped on the basis of the map provided by the SLAM algorithm against a reward function with no knowledge of the map. Results show that the proposed approach decreases the required learning time, converging in 560 episodes compared to 1450 episodes for the standard RL algorithm, and reduces the number of obstacle collisions, achieving a success ratio of 83% compared to 56% for the standard RL algorithm. The results are validated in a simulated experiment on a skid-steering mobile robot.
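The abstract says the SLAM map is used to shape the reward but not how. One standard, policy-invariant way to inject such map knowledge is potential-based reward shaping (Ng et al., 1999), r' = r + γ·φ(s') − φ(s). The sketch below uses a hypothetical potential that penalizes distance to the goal and proximity to mapped obstacles; this is an assumption, not the authors' exact formulation.

```python
import numpy as np

# Hedged sketch of map-informed, potential-based reward shaping. The
# potential phi penalizes distance to the goal plus the count of occupied
# cells (value 1 in the occupancy grid) adjacent to the robot's position.
def potential(pos, goal, occupancy_grid, obstacle_weight=0.5):
    dist_to_goal = np.linalg.norm(np.asarray(goal) - np.asarray(pos))
    x, y = int(pos[0]), int(pos[1])
    near = occupancy_grid[max(x - 1, 0):x + 2, max(y - 1, 0):y + 2]
    obstacle_penalty = obstacle_weight * float((near == 1).sum())
    return -dist_to_goal - obstacle_penalty

def shaped_reward(r, pos, next_pos, goal, grid, gamma=0.99):
    # r' = r + gamma * phi(s') - phi(s): preserves the optimal policy.
    return r + gamma * potential(next_pos, goal, grid) - potential(pos, goal, grid)
```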


Author(s):  
Alberto Camacho ◽  
Rodrigo Toro Icarte ◽  
Toryn Q. Klassen ◽  
Richard Valenzano ◽  
Sheila A. McIlraith

In Reinforcement Learning (RL), an agent is guided by the rewards it receives from the reward function. Unfortunately, it may take many interactions with the environment to learn from sparse rewards, and it can be challenging to specify reward functions that reflect complex reward-worthy behavior. We propose using reward machines (RMs), which are automata-based representations that expose reward function structure, as a normal form representation for reward functions. We show how specifications of reward in various formal languages, including LTL and other regular languages, can be automatically translated into RMs, easing the burden of complex reward function specification. We then show how the exposed structure of the reward function can be exploited by tailored Q-learning algorithms and automated reward shaping techniques in order to improve the sample efficiency of reinforcement learning methods. Experiments show that these RM-tailored techniques significantly outperform state-of-the-art (deep) RL algorithms, solving problems that otherwise cannot reasonably be solved by existing approaches.
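To make the idea concrete, here is a minimal reward machine for a toy "fetch coffee, then deliver it to the office" task: a finite automaton whose transitions fire on high-level propositions observed in the environment and carry rewards. The task, state names, and propositions are illustrative, not taken from the paper.

```python
# Hedged sketch of a reward machine (RM). Transitions are triggered by
# propositions that hold in the current environment state and are annotated
# with rewards; this exposed structure is what RM-tailored algorithms exploit.
class RewardMachine:
    def __init__(self):
        # (rm_state, proposition) -> (next_rm_state, reward)
        self.delta = {
            ("u0", "coffee"): ("u1", 0.0),   # picked up the coffee
            ("u1", "office"): ("u2", 1.0),   # delivered it: task reward
        }
        self.state = "u0"

    def step(self, true_props: set[str]) -> float:
        for prop in true_props:
            if (self.state, prop) in self.delta:
                self.state, reward = self.delta[(self.state, prop)]
                return reward
        return 0.0  # no matching transition: stay put, no reward

rm = RewardMachine()
print(rm.step({"coffee"}))  # 0.0, RM advances to u1
print(rm.step({"office"}))  # 1.0, task complete
```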


2020 ◽  
Vol 13 (4) ◽  
pp. 78
Author(s):  
Nico Zengeler ◽  
Uwe Handmann

We present a deep reinforcement learning framework for automatic high-frequency trading of contracts for difference (CfDs) on indices. Our contribution shows that reinforcement learning agents with recurrent long short-term memory (LSTM) networks can learn from recent market history and outperform the market. Usually, such approaches depend on low latency; in a real-world example, we show that an increased model size may compensate for higher latency. As the noisy nature of economic trends complicates predictions, especially for speculative assets, our approach does not predict prices but instead uses a reinforcement learning agent to learn an overall lucrative trading policy. To this end, we simulate a virtual market environment based on historical trading data. Our environment provides a partially observable Markov decision process (POMDP) to reinforcement learners and allows the training of various strategies.
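A minimal version of such a historical-replay environment might look as follows: the agent observes only a short window of recent prices (the source of partial observability) and is rewarded with the mark-to-market profit of its position. The class name, action encoding, and the omission of transaction costs are all simplifying assumptions.

```python
import numpy as np

# Hedged sketch of a replay-based market environment: the observation is a
# fixed-size window of recent prices, so the full market state is hidden
# (a POMDP), and the reward is the profit of the chosen position over the
# next price step. Costs, slippage, and position sizing are omitted.
class CfdReplayEnv:
    ACTIONS = (-1, 0, 1)  # short, flat, long

    def __init__(self, prices: np.ndarray, window: int = 32):
        self.prices, self.window = prices, window
        self.t = window

    def reset(self):
        self.t = self.window
        return self.prices[self.t - self.window:self.t]

    def step(self, action: int):
        position = self.ACTIONS[action]
        reward = position * (self.prices[self.t] - self.prices[self.t - 1])
        self.t += 1
        done = self.t >= len(self.prices)
        obs = self.prices[self.t - self.window:self.t]
        return obs, reward, done
```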


2020 ◽  
Vol 34 (01) ◽  
pp. 1250-1257 ◽  
Author(s):  
Haoxi Zhong ◽  
Yuzhong Wang ◽  
Cunchao Tu ◽  
Tianyang Zhang ◽  
Zhiyuan Liu ◽  
...  

Legal Judgment Prediction (LJP) aims to predict judgment results according to the facts of cases. In recent years, LJP has rapidly drawn increasing attention from both academia and the legal industry, as it can provide references for legal practitioners and is expected to promote judicial justice. However, the research to date usually suffers from a lack of interpretability, which may lead to ethical issues like inconsistent judgments or gender bias. In this paper, we present QAjudge, a model based on reinforcement learning that visualizes the prediction process and gives interpretable judgments. QAjudge follows two essential principles of legal systems across the world: Presumption of Innocence and Elemental Trial. During inference, a Question Net selects questions from the given set and an Answer Net answers each question according to the fact description. Finally, a Predict Net produces judgment results based on the answers. Reward functions are designed to minimize the number of questions asked. We conduct extensive experiments on several real-world datasets. Experimental results show that QAjudge can provide interpretable judgments while maintaining performance comparable to other state-of-the-art LJP models. The code can be found at https://github.com/thunlp/QAjudge.
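A reward that "minimizes the number of questions asked" can be sketched as a terminal correctness reward minus a per-question penalty, so shorter, more informative question sequences score higher. The function name, reward scale, and penalty value below are illustrative assumptions, not the paper's exact design.

```python
# Hedged sketch of a question-minimizing reward: the agent earns a terminal
# reward for a correct judgment and pays a small cost for every question
# asked, so it is pushed toward few, highly informative questions.
def qajudge_reward(correct: bool, num_questions: int,
                   penalty_per_question: float = 0.1) -> float:
    terminal = 1.0 if correct else -1.0
    return terminal - penalty_per_question * num_questions

print(qajudge_reward(True, 3))   # 0.7: correct with three questions
print(qajudge_reward(True, 10))  # 0.0: correct but too many questions
```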


2012 ◽  
pp. 695-703
Author(s):  
George Tzanis ◽  
Christos Berberidis ◽  
Ioannis Vlahavas

Machine learning is one of the oldest subfields of artificial intelligence and is concerned with the design and development of computational systems that can adapt themselves and learn. The most common machine learning algorithms can be either supervised or unsupervised. Supervised learning algorithms generate a function that maps inputs to desired outputs, based on a set of examples with known output (labeled examples). Unsupervised learning algorithms find patterns and relationships over a given set of inputs (unlabeled examples). Other categories of machine learning are semi-supervised learning, where an algorithm uses both labeled and unlabeled examples, and reinforcement learning, where an algorithm learns a policy of how to act given an observation of the world.
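As a minimal illustration of the two main paradigms, the scikit-learn snippet below fits a supervised classifier on labeled examples and then runs an unsupervised clustering algorithm on the same inputs without the labels; the toy data is, of course, hypothetical.

```python
# Supervised vs. unsupervised learning on the same toy inputs.
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

X = [[0, 0], [0, 1], [5, 5], [5, 6]]   # inputs
y = [0, 0, 1, 1]                        # known outputs (supervised only)

clf = DecisionTreeClassifier().fit(X, y)     # learns a mapping X -> y
print(clf.predict([[5, 4]]))                 # -> [1]

km = KMeans(n_clusters=2, n_init=10).fit(X)  # finds structure without y
print(km.labels_)                            # e.g. [0 0 1 1] (cluster ids arbitrary)
```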



