temporal abstractions
Recently Published Documents


TOTAL DOCUMENTS

37
(FIVE YEARS 6)

H-INDEX

10
(FIVE YEARS 0)

2020 ◽  
Vol 34 (10) ◽  
pp. 13777-13778
Author(s):  
Akshay Dharmavaram ◽  
Matthew Riemer ◽  
Shalabh Bhatnagar

Option-critic learning is a general-purpose reinforcement learning (RL) framework that aims to address long-term credit assignment by leveraging temporal abstractions. However, when dealing with extended timescales, discounting future rewards can lead to incorrect credit assignment. In this work, we address this issue by extending the hierarchical option-critic policy gradient theorem to the average reward criterion. Our proposed framework aims to maximize the long-term reward obtained in the steady state of the Markov chain defined by the agent's policy. Furthermore, we use an ordinary differential equation based approach for our convergence analysis and prove that the parameters of the intra-option policies, termination functions, and value functions converge to their corresponding optimal values with probability one. Finally, we illustrate the competitive advantage of learning options in the average reward setting on a grid-world environment with sparse rewards.
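
For orientation, the two criteria contrasted in this abstract are usually written as below. These are the standard textbook definitions, not formulas quoted from the paper; the symbols $J_\gamma$, $\rho$, $\gamma$, and $r_t$ are notation assumed here.

```latex
% Discounted return: rewards far in the future are down-weighted by \gamma^t.
J_\gamma(\pi) = \mathbb{E}_\pi\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_{t+1}\right],
\qquad 0 \le \gamma < 1

% Average reward: every time step carries equal weight in the long run.
\rho(\pi) = \lim_{T \to \infty} \frac{1}{T}\,
            \mathbb{E}_\pi\!\left[\sum_{t=1}^{T} r_{t}\right]
```

Under the discounted criterion a reward received $k$ steps ahead is scaled by $\gamma^{k}$, which is why credit for distant outcomes shrinks on long timescales; the average reward criterion has no such factor.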


2020 ◽  
Vol 34 (04) ◽  
pp. 4444-4451
Author(s):  
Khimya Khetarpal ◽  
Martin Klissarov ◽  
Maxime Chevalier-Boisvert ◽  
Pierre-Luc Bacon ◽  
Doina Precup

Temporal abstraction refers to the ability of an agent to use behaviours of controllers which act for a limited, variable amount of time. The options framework describes such behaviours as consisting of a subset of states in which they can initiate, an internal policy, and a stochastic termination condition. However, much of the subsequent work on option discovery has ignored the initiation set because of the difficulty of learning it from data. We provide a generalization of initiation sets suitable for general function approximation by defining an interest function associated with an option. We derive a gradient-based learning algorithm for interest functions, leading to a new interest-option-critic architecture. We investigate how interest functions can be leveraged to learn interpretable and reusable temporal abstractions. We demonstrate the efficacy of the proposed approach through quantitative and qualitative results in both discrete and continuous environments.
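
The three components of an option named in this abstract can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not the authors' code: the names `Option`, `run_option`, `step`, and the corridor environment are all hypothetical, and states and actions are plain integers.

```python
import random
from dataclasses import dataclass
from typing import Callable, FrozenSet, Tuple

State = int
Action = int


@dataclass(frozen=True)
class Option:
    """An option: initiation set, internal policy, stochastic termination."""
    initiation_set: FrozenSet[State]             # states where the option may start
    policy: Callable[[State], Action]            # internal (intra-option) policy
    termination_prob: Callable[[State], float]   # probability of stopping in a state

    def can_initiate(self, state: State) -> bool:
        return state in self.initiation_set


def run_option(option: Option, state: State,
               step: Callable[[State, Action], State],
               rng: random.Random, max_steps: int = 100) -> Tuple[State, int]:
    """Follow the option's policy until its termination condition fires."""
    if not option.can_initiate(state):
        raise ValueError(f"option cannot be initiated in state {state}")
    steps = 0
    while steps < max_steps:
        state = step(state, option.policy(state))
        steps += 1
        # Sample the stochastic termination condition in the new state.
        if rng.random() < option.termination_prob(state):
            break
    return state, steps


# Hypothetical corridor with states 0..5; action +1 moves right, capped at 5.
def step(state: State, action: Action) -> State:
    return max(0, min(5, state + action))


go_right = Option(
    initiation_set=frozenset(range(5)),                   # may start in states 0..4
    policy=lambda s: 1,                                   # always move right
    termination_prob=lambda s: 1.0 if s == 5 else 0.0,    # stop only at the end
)

final_state, duration = run_option(go_right, 2, step, random.Random(0))
# final_state == 5 and duration == 3: the option acts for a variable
# number of steps that depends on where it was started.
```

The option runs for a variable number of primitive steps, which is the sense in which it is temporally extended.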


Author(s):  
Khimya Khetarpal

Learning temporal abstractions which are partial solutions to a task and could be reused for other similar or even more complicated tasks is intuitively an ingredient which can help agents to plan, learn and reason efficiently at multiple resolutions of perception and time. Just as humans acquire skills and build on top of existing skills to solve more complicated tasks, AI agents should be able to learn and develop skills continually, hierarchically and incrementally over time. In my research, I aim to answer the following question: How should an agent efficiently represent, learn and use knowledge of the world in continual tasks? My work builds on the options framework, but provides novel extensions driven by this question. We introduce the notion of interest functions. Analogous to temporally extended actions, we propose learning temporally extended perception. The key idea is to learn temporal abstractions unifying both action and perception.


Author(s):  
Khimya Khetarpal ◽  
Doina Precup

Learning temporal abstractions which are partial solutions to a task and could be reused for solving other tasks is an ingredient that can help agents to plan and learn efficiently. In this work, we tackle this problem in the options framework. We aim to autonomously learn options which are specialized in different state space regions by proposing a notion of interest functions, which generalizes initiation sets from the options framework for function approximation. We build on the option-critic framework to derive policy gradient theorems for interest functions, leading to a new interest-option-critic architecture.
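
One way to picture how an interest function can generalize an initiation set is as a soft weighting on the policy over options. The sketch below assumes the interest values multiply the option probabilities and are then renormalized; the abstract does not spell out this form, so treat it as an assumption. The function name `interest_weighted_policy` and the example numbers are hypothetical.

```python
from typing import List, Sequence


def interest_weighted_policy(interest: Sequence[float],
                             pi_omega: Sequence[float]) -> List[float]:
    """Reweight a policy over options by per-option interest in one state.

    interest[i]  -- interest of option i in the current state, in [0, 1]
    pi_omega[i]  -- base probability of choosing option i in that state
    Assumed form: probability proportional to interest[i] * pi_omega[i].
    """
    if len(interest) != len(pi_omega):
        raise ValueError("interest and pi_omega must have the same length")
    weighted = [i * p for i, p in zip(interest, pi_omega)]
    total = sum(weighted)
    if total <= 0.0:
        raise ValueError("no option has positive interest in this state")
    return [w / total for w in weighted]


# Hypothetical state with three options.
base_policy = [0.5, 0.25, 0.25]

# Binary interest behaves like a hard initiation set: option 1 is excluded.
hard = interest_weighted_policy([1.0, 0.0, 1.0], base_policy)

# Graded interest only down-weights option 2 instead of excluding it.
soft = interest_weighted_policy([1.0, 0.0, 0.5], base_policy)
```

With interest restricted to 0 or 1, an option with zero interest can never be selected in that state, which matches the behaviour of a hard initiation set; real-valued interest is what makes the quantity differentiable and learnable by gradient methods.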


2019 ◽  
Vol 22 (02) ◽  
pp. 1950001
Author(s):  
ALPER DEMİR ◽  
ERKİN ÇİLDEN ◽  
FARUK POLAT

The options framework is one of the prominent models used to improve learning speed by means of temporal abstractions. An option is composed of three main elements: an initiation set, the option's local policy, and a termination condition. Although various attempts have focused on deriving high-quality termination conditions for a given problem, the impact of initiation set generation is relatively unexplored. In this work, we propose an effective goal-oriented heuristic method to derive useful initiation set elements via an analysis of the recent history of events. The effectiveness of the method is evaluated on various benchmark problems, and the results are discussed.


AI Magazine ◽  
2018 ◽  
Vol 39 (1) ◽  
pp. 39-50 ◽  
Author(s):  
Pierre-Luc Bacon ◽  
Doina Precup

The idea of temporal abstraction, i.e. learning, planning and representing the world at multiple time scales, has been a constant thread in AI research, spanning sub-fields from classical planning and search to control and reinforcement learning. For example, programming a robot typically involves making decisions over a set of controllers, rather than working at the level of motor torques. While temporal abstraction is a very natural concept, learning such abstractions with no human input has proved quite daunting. In this paper, we present a general architecture, called option-critic, which allows learning temporal abstractions automatically, end-to-end, simply from the agent’s experience. This approach allows continual learning and provides interesting qualitative and quantitative results in several tasks.


2017 ◽  
Vol 7 (1.1) ◽  
pp. 150 ◽  
Author(s):  
K. Gomathi ◽  
D. Shanmuga Priyaa

Several data mining techniques have been developed to enhance prediction accuracy and to analyze events in Coronary Heart Disease (CHD). One of them is the Extended Dynamic Bayesian Network (EDBN), which integrates temporal abstractions with a Dynamic Bayesian Network (DBN). EDBN was then extended to the Optimized Semi parametric Extended Dynamic Bayesian Network (OSEDBN) to handle complex temporal abstractions in irregular-interval time series data. A deep learning network generates the various time points at the next level to improve the analysis and prediction of CHD. In this paper, the Optimized Semi parametric Extended Deep Dynamic Bayesian Network (OSEDDBN) is proposed by integrating a deep learning architecture with OSEDBN to improve the ability to extract more important data and to support complex structures from various types of input sources. Additionally, the Fuzzy Analytic Hierarchy Process (FAHP) is used to compute global weights for the attributes based on their individual contributions. The global weights obtained by FAHP are used for training OSEDDBN to further improve the prediction of CHD risk. The performance of EDBN, OSEDBN, OSEDDBN, and OSEDDBN-FAHP is evaluated in terms of precision, recall, and F-measure.
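
The three evaluation measures named at the end of this abstract have standard definitions in terms of true positives, false positives, and false negatives. The sketch below uses those standard definitions; the function name and the example counts are hypothetical and are not results from the paper.

```python
from typing import Tuple


def precision_recall_f_measure(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and the balanced F-measure (F1).

    tp -- true positives (at-risk cases correctly flagged)
    fp -- false positives (healthy cases wrongly flagged)
    fn -- false negatives (at-risk cases missed)
    Ratios with a zero denominator are reported as 0.0 by convention here.
    """
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall > 0.0:
        # Harmonic mean of precision and recall.
        f_measure = 2.0 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return precision, recall, f_measure


# Hypothetical confusion counts, chosen only to illustrate the arithmetic.
precision, recall, f_measure = precision_recall_f_measure(tp=40, fp=10, fn=20)
```

Because the F-measure is a harmonic mean, it stays low unless both precision and recall are reasonably high, which is why it is often reported alongside the two individual scores.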

