Exploration-Exploitation Problem in Policy-Based Deep Reinforcement Learning for Episodic and Continuous Environments

Author(s):
Vedang Naik,
Rohit Sahoo,
Sameer Mahajan,
Saurabh Singh,
...

Reinforcement learning is an artificial intelligence paradigm in which intelligent agents accrue environmental rewards to achieve superior results. It is concerned with sequential decision-making problems that offer only limited feedback. Reinforcement learning has roots in cybernetics and in research in statistics, psychology, neuroscience, and computer science. Over the last five to ten years it has piqued the interest of the machine learning and artificial intelligence communities, because it promises a way to train agents using rewards and penalties without specifying how the task is to be accomplished. The RL problem may be described as an agent that must make decisions in a given environment so as to maximize a specified notion of cumulative reward. The learner is not told which actions to take but must experiment to discover which actions yield the greatest reward; it must therefore actively choose between exploring its environment and exploiting its existing knowledge. This exploration-exploitation dilemma is one of the most common issues encountered when working with reinforcement learning algorithms. Deep reinforcement learning combines reinforcement learning (RL) with deep learning. In this study we describe how to apply several deep reinforcement learning algorithms to control a cart-pole system, which represents episodic environments, and to stock market trading, which represents continuous environments. We explain and demonstrate the effects of different RL techniques, such as Deep Q-Networks (DQN), Double DQN, and Dueling DQN, on learning performance. We also examine the fundamental distinctions between episodic and continuous tasks and how the exploration-exploitation problem is addressed in each context.
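To make the exploration-exploitation trade-off and the three value-estimation variants discussed above concrete, here is a minimal Python sketch; the function names, hyperparameters, and annealing schedule are illustrative assumptions, not the study's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_values, epsilon):
    """Explore with probability epsilon, otherwise exploit the greedy action."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def anneal_epsilon(step, eps_start=1.0, eps_end=0.01, decay_steps=10_000):
    """Linearly shift the agent from exploration toward exploitation."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)

def dqn_target(r, q_target_next, gamma=0.99):
    # Vanilla DQN: the target network both selects and evaluates the
    # next action, which is known to overestimate action values.
    return r + gamma * np.max(q_target_next)

def double_dqn_target(r, q_online_next, q_target_next, gamma=0.99):
    # Double DQN: the online network selects the action, the target
    # network evaluates it, reducing the overestimation bias.
    a_star = int(np.argmax(q_online_next))
    return r + gamma * q_target_next[a_star]

def dueling_combine(v, advantages):
    # Dueling DQN head: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a').
    return v + advantages - np.mean(advantages)
```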

AI Magazine
2016
Vol. 37 (2)
pp. 85-90
Author(s):
Nisar Ahmed,
Paul Bello,
Selmer Bringsjord,
Micah Clark,
Bradley Hayes,
...

The Association for the Advancement of Artificial Intelligence presented the 2015 Fall Symposium Series on Thursday through Saturday, November 12-14, at the Westin Arlington Gateway in Arlington, Virginia. The titles of the six symposia were as follows: AI for Human-Robot Interaction, Cognitive Assistance in Government and Public Sector Applications, Deceptive and Counter-Deceptive Machines, Embedded Machine Learning, Self-Confidence in Autonomous Systems, and Sequential Decision Making for Intelligent Agents. This article contains the reports from four of the symposia.


Author(s):
Ming-Sheng Ying,
Yuan Feng,
Sheng-Gang Ying

The Markov decision process (MDP) offers a general framework for modelling sequential decision making where outcomes are random; in particular, it serves as a mathematical framework for reinforcement learning. This paper introduces an extension of the MDP, the quantum MDP (qMDP), that can serve as a mathematical model of decision making about quantum systems. We develop dynamic programming algorithms for policy evaluation and for finding optimal policies for qMDPs in the finite-horizon case. The results obtained in this paper provide useful mathematical tools for reinforcement learning techniques applied to the quantum world.
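The qMDP algorithms themselves are not reproduced here, but as a point of reference, below is a minimal sketch of the classical finite-horizon dynamic programming (backward induction) that the quantum extension generalizes; the toy transition and reward numbers are made up.

```python
import numpy as np

def finite_horizon_backward_induction(P, R, H):
    """Optimal values and policy for a finite-horizon MDP.

    P: (A, S, S) array, P[a, s, s'] = transition probability.
    R: (S, A) array of immediate rewards.
    H: horizon length.
    Returns V of shape (H+1, S) and a nonstationary policy of shape (H, S).
    """
    A, S, _ = P.shape
    V = np.zeros((H + 1, S))            # no reward is collected after step H
    pi = np.zeros((H, S), dtype=int)
    for t in range(H - 1, -1, -1):      # sweep backwards from the final step
        # Q[s, a] = R[s, a] + sum_s' P[a, s, s'] * V[t+1, s']
        Q = R + np.einsum('asn,n->sa', P, V[t + 1])
        pi[t] = np.argmax(Q, axis=1)
        V[t] = np.max(Q, axis=1)
    return V, pi

# Tiny 2-state, 2-action example with made-up numbers.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],    # action 0
              [[0.5, 0.5], [0.0, 1.0]]])   # action 1
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
V, pi = finite_horizon_backward_induction(P, R, H=5)
print(V[0], pi[0])
```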


Author(s):  
Grzegorz Musiolik

Artificial intelligence is evolving rapidly and will have a great impact on society in the future. One important question that still cannot be answered satisfactorily is whether the decisions of an intelligent agent can be predicted. From this follows the more general question of whether such agents can be controlled and whether future robotic applications can be safe. This chapter shows that unpredictable systems are very common in mathematics and physics, even when the underlying mathematical structure is very simple. It also shows that such unpredictability can emerge for intelligent agents in reinforcement learning, especially for complex tasks with many input parameters. An observer would not be able to distinguish this unpredictability from free will on the agent's part. This raises ethical questions and safety issues, which are briefly presented.
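The chapter's point that trivially simple rules can be unpredictable is usually illustrated with chaotic maps; the logistic map below is one standard example, chosen here for illustration rather than taken from the chapter itself.

```python
# Logistic map: x_{t+1} = r * x_t * (1 - x_t). At r = 4 the dynamics are
# chaotic: two nearly identical starting points diverge exponentially,
# so long-horizon prediction fails despite the one-line update rule.
def logistic_trajectory(x0, r=4.0, steps=50):
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

a = logistic_trajectory(0.400000)
b = logistic_trajectory(0.400001)   # perturbed by one part in a million
print(abs(a[-1] - b[-1]))           # typically of order 1: the gap has blown up
```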


Author(s):  
Rey Pocius ◽  
Lawrence Neal ◽  
Alan Fern

Commonly used sequential decision-making tasks, such as the games in the Arcade Learning Environment (ALE), provide rich observation spaces suitable for deep reinforcement learning. However, they consist mostly of low-level control tasks whose fine temporal resolution makes them of limited use for the development of explainable artificial intelligence (XAI). Many of these domains also lack built-in high-level abstractions and symbols. Existing tasks that provide both strategic decision-making and rich observation spaces are either difficult to simulate or intractable. We provide a set of new strategic decision-making tasks specialized for the development and evaluation of explainable AI methods, built as constrained mini-games within the StarCraft II Learning Environment.


2017
Vol. 29 (12)
pp. 2103-2113
Author(s):
Samuel J. Gershman,
Jimmy Zhou,
Cody Kommers

Imagination enables us not only to transcend reality but also to learn about it. In the context of reinforcement learning, an agent can rationally update its value estimates by simulating an internal model of the environment, provided that the model is accurate. In a series of sequential decision-making experiments, we investigated the impact of imaginative simulation on subsequent decisions. We found that imagination can cause people to pursue imagined paths, even when these paths are suboptimal. This bias is systematically related to participants' optimism about how much reward they expect to receive along imagined paths; providing feedback strongly attenuates the effect. The imagination effect can be captured by a reinforcement learning model that includes a bonus added onto imagined rewards. Using fMRI, we show that a network of regions associated with valuation is predictive of the imagination effect. These results suggest that imagination, although a powerful tool for learning, is also susceptible to motivational biases.
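The abstract specifies the model only as including "a bonus added onto imagined rewards"; the sketch below shows one plausible reading as a temporal-difference-style update, with the bonus form and all parameter values being assumptions rather than the paper's fitted model.

```python
def td_update(V, s, r, s_next, alpha=0.1, gamma=0.95,
              imagined=False, optimism_bonus=0.5):
    """One temporal-difference update of a state-value table V (a dict).

    For transitions experienced only in imagination, an optimism bonus is
    added onto the reward, biasing the agent toward imagined paths as the
    abstract describes. Feedback corresponds to replacing imagined
    transitions with real ones (imagined=False), attenuating the bias.
    """
    reward = r + (optimism_bonus if imagined else 0.0)
    delta = reward + gamma * V.get(s_next, 0.0) - V.get(s, 0.0)
    V[s] = V.get(s, 0.0) + alpha * delta
    return V
```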


eLife
2019
Vol. 8
Author(s):
Marco P Lehmann,
He A Xu,
Vasiliki Liakoni,
Michael H Herzog,
Wulfram Gerstner,
...

In many daily tasks, we make multiple decisions before reaching a goal. In order to learn such sequences of decisions, a mechanism linking earlier actions to later reward is necessary. Reinforcement learning (RL) theory suggests two classes of algorithms solving this credit assignment problem: in classic temporal-difference learning, earlier actions receive reward information only after multiple repetitions of the task, whereas models with eligibility traces reinforce entire sequences of actions from a single experience (one-shot). Here, we show one-shot learning of sequences. We developed a novel paradigm to directly observe which actions and states along a multi-step sequence are reinforced after a single reward. By focusing our analysis on those states for which RL with and without eligibility traces makes qualitatively distinct predictions, we find direct behavioral (choice probability) and physiological (pupil dilation) signatures of reinforcement learning with eligibility traces across multiple sensory modalities.
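A compact way to see the two algorithm classes the abstract contrasts is tabular TD(λ): with λ = 0 it reduces to classic temporal-difference learning, where credit reaches early actions only over repeated episodes, while λ near 1 propagates a single terminal reward along the whole sequence in one shot. A minimal sketch follows; parameter values are illustrative.

```python
import numpy as np

def td_lambda_episode(V, episode, alpha=0.1, gamma=0.95, lam=0.9):
    """Tabular TD(lambda) with accumulating eligibility traces.

    V: (n_states,) float array of value estimates, updated in place.
    episode: list of (state, reward, next_state) index triples.
    lam=0 recovers TD(0); lam near 1 gives one-shot credit assignment.
    """
    e = np.zeros_like(V)            # eligibility trace per state
    for s, r, s_next in episode:
        delta = r + gamma * V[s_next] - V[s]
        e[s] += 1.0                 # mark s as eligible for credit
        V += alpha * delta * e      # every eligible state shares the error
        e *= gamma * lam            # traces fade with temporal distance
    return V
```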


2021
Author(s):
Amjad Yousef Majid,
Serge Saaybi,
Tomas van Rietbergen,
Vincent Francois-Lavet,
R Venkatesha Prasad,
...

Deep Reinforcement Learning (DRL) and Evolution Strategies (ESs) have surpassed human-level control in many sequential decision-making problems, yet many open challenges remain. To gain insight into the strengths and weaknesses of DRL versus ESs, an analysis of their respective capabilities and limitations is provided. After presenting their fundamental concepts and algorithms, the two are compared on key aspects such as scalability, exploration, adaptation to dynamic environments, and multi-agent learning. The benefits of hybrid algorithms that combine concepts from DRL and ESs are then highlighted. Finally, to indicate how they compare in real-world applications, a survey of the literature on the set of applications they support is provided.
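As a concrete reference point for the ES side of the comparison, here is a minimal sketch of a simple Gaussian-perturbation evolution strategy in the spirit of OpenAI's ES; the population size, learning rate, and toy fitness function are illustrative choices, not drawn from the survey.

```python
import numpy as np

rng = np.random.default_rng(0)

def es_step(theta, fitness, pop=50, sigma=0.1, lr=0.02):
    """One iteration of a simple Gaussian evolution strategy.

    Sample noise around theta, score each perturbed candidate with the
    black-box fitness function, and move theta along the score-weighted
    average noise direction. No gradient through the policy is needed,
    which is the key contrast with DRL.
    """
    noise = rng.standard_normal((pop, theta.size))
    scores = np.array([fitness(theta + sigma * n) for n in noise])
    scores = (scores - scores.mean()) / (scores.std() + 1e-8)  # normalize
    return theta + (lr / (pop * sigma)) * noise.T @ scores

# Toy fitness: negative squared distance to a target (stands in for return).
target = np.array([1.0, -2.0, 0.5])
theta = np.zeros(3)
for _ in range(300):
    theta = es_step(theta, lambda th: -np.sum((th - target) ** 2))
print(theta)   # converges toward target
```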


Author(s):
Dongliang He,
Xiang Zhao,
Jizhou Huang,
Fu Li,
Xiao Liu,
...

The task of video grounding, which temporally localizes a natural language description in a video, plays an important role in understanding videos. Existing studies have adopted strategies of sliding a window over the entire video or exhaustively ranking all possible clip-sentence pairs in a pre-segmented video, both of which inevitably suffer from exhaustively enumerated candidates. To alleviate this problem, we formulate the task as a sequential decision-making problem: we learn an agent that progressively regulates the temporal grounding boundaries according to its policy. Specifically, we propose a reinforcement learning based framework improved by multi-task learning, which shows steady performance gains from considering additional supervised boundary information during training. Our proposed framework achieves state-of-the-art performance on the ActivityNet'18 DenseCaption dataset (Krishna et al. 2017) and the Charades-STA dataset (Sigurdsson et al. 2016; Gao et al. 2017) while observing only 10 or fewer clips per video.
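The abstract describes the agent as progressively regulating the grounding boundaries; the hypothetical sketch below shows what one such boundary-adjustment step could look like, where the action set and step size are assumptions for illustration rather than the paper's exact design.

```python
# Hypothetical action set for progressive temporal grounding: the agent
# nudges a normalized [start, end] window until it chooses to stop.
ACTIONS = ("shift_left", "shift_right", "expand", "shrink", "stop")

def apply_action(start, end, action, delta=0.05):
    """Apply one boundary-adjustment action to a window within [0, 1]."""
    if action == "shift_left":
        start, end = start - delta, end - delta
    elif action == "shift_right":
        start, end = start + delta, end + delta
    elif action == "expand":
        start, end = start - delta, end + delta
    elif action == "shrink":
        start, end = start + delta, end - delta
    # clamp to the video extent and keep the window non-degenerate
    start = min(max(start, 0.0), 1.0 - delta)
    end = min(max(end, start + delta), 1.0)
    return start, end

print(apply_action(0.2, 0.6, "shift_right"))   # -> (0.25, 0.65)
```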

