Exploiting the Structural Properties of the Underlying Markov Decision Problem in the Q-Learning Algorithm

2008 ◽  
Vol 20 (2) ◽  
pp. 288-301 ◽  
Author(s):  
Sumit Kunnumkal ◽  
Huseyin Topaloglu

2017 ◽  
Vol 7 (1.5) ◽  
pp. 269 ◽
Author(s):  
D. Ganesha ◽  
Vijayakumar Maragal Venkatamuni

This research introduces a self-learning modified Q-learning technique within EMCAP (Enhanced Mind Cognitive Architecture of Pupils). Q-learning is a model-free reinforcement learning (RL) technique; in particular, it can be used to find an optimal action-selection policy for any given Markov decision process. The EMCAP architecture [1] provides various agent control strategies for static and dynamic environments. Experiments were conducted to evaluate the performance of each agent individually, and the same statistics were collected across agents so that their results could be compared. The experimental analysis considered various kinds of agents at different levels of the architecture. The Fungus World testbed, implemented in SWI-Prolog 5.4.6, was used for the experiments; fixed obstacles were placed to give the environment a specific spatial layout, and various parameters were introduced to test an agent's performance. The modified Q-learning algorithm is well suited to the EMCAP architecture: in the experiments conducted, the modified Q-learning system obtained more reward than the existing Q-learning.
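For concreteness, the following is a minimal sketch of the tabular Q-learning update that the modified algorithm builds on. It is not the EMCAP implementation (which the abstract says is in SWI-Prolog); the reset/step callables, the state encoding, and all parameter values are hypothetical placeholders for a testbed such as Fungus World.

    import random
    from collections import defaultdict

    def q_learning(reset, step, actions, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
        # Tabular Q-learning. reset() -> initial state; step(s, a) -> (s2, reward, done).
        # Both callables are placeholders for an environment such as Fungus World.
        Q = defaultdict(float)  # Q[(state, action)] -> estimated return

        def greedy(s):
            return max(actions, key=lambda a: Q[(s, a)])

        for _ in range(episodes):
            s, done = reset(), False
            while not done:
                # Epsilon-greedy exploration.
                a = random.choice(actions) if random.random() < epsilon else greedy(s)
                s2, r, done = step(s, a)
                # Model-free temporal-difference update toward r + gamma * max_a' Q(s2, a').
                target = r if done else r + gamma * Q[(s2, greedy(s2))]
                Q[(s, a)] += alpha * (target - Q[(s, a)])
                s = s2
        return Q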


2018 ◽  
Vol 2018 ◽  
pp. 1-10 ◽  
Author(s):  
Agostino Nuzzolo ◽  
Antonio Comi

A behavioural modelling framework with a dynamic travel strategy path choice approach is presented for unreliable multiservice transit networks. The modelling framework is especially suitable for dynamic run-oriented simulation models that use subjective strategy-based path choice models. After an analysis of the travel strategy approach in unreliable transit networks with the related hyperpaths, the search for the optimal strategy as a Markov decision problem solution is considered. The new modelling framework is then presented and applied to a real network. The paper concludes with an overview of the benefits of the new behavioural framework and outlines scope for further research.
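For orientation, the following is a minimal value-iteration sketch of solving such a Markov decision problem, assuming a finite abstraction in which stops/decision nodes are states and boarding choices are actions; the transition probabilities P, costs c, and the absorbing destination are hypothetical placeholders, not the paper's model.

    def value_iteration(states, actions, P, c, tol=1e-8, max_iter=10000):
        # Successive approximation of V(s) = min_a [ c[s][a] + sum_s2 P[s][a][s2] * V(s2) ].
        # c[s][a] is an expected cost (e.g. waiting plus riding time); the destination
        # is assumed absorbing with a single zero-cost self-loop action so the
        # iteration converges and every state has at least one action.
        V = {s: 0.0 for s in states}
        for _ in range(max_iter):
            V_new = {s: min(c[s][a] + sum(p * V[s2] for s2, p in P[s][a].items())
                            for a in actions[s])
                     for s in states}
            if max(abs(V_new[s] - V[s]) for s in states) < tol:
                V = V_new
                break
            V = V_new
        # The optimal strategy picks, at each decision node, the action attaining the minimum.
        policy = {s: min(actions[s],
                         key=lambda a: c[s][a] + sum(p * V[s2] for s2, p in P[s][a].items()))
                  for s in states}
        return V, policy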


1995 ◽  
Vol 4 (1) ◽  
pp. 3-28 ◽  
Author(s):  
Mance E. Harmon ◽  
Leemon C. Baird ◽  
A. Harry Klopf

An application of reinforcement learning to a linear-quadratic, differential game is presented. The reinforcement learning system uses a recently developed algorithm, the residual-gradient form of advantage updating. The game is a Markov decision process with continuous time, states, and actions, linear dynamics, and a quadratic cost function. The game consists of two players, a missile and a plane; the missile pursues the plane and the plane evades the missile. Although a missile and plane scenario was the chosen test bed, the reinforcement learning approach presented here is equally applicable to biologically based systems, such as a predator pursuing prey. The reinforcement learning algorithm for optimal control is modified for differential games to find the minimax point rather than the maximum. Simulation results are compared to the analytical solution, demonstrating that the simulated reinforcement learning system converges to the optimal answer. The performance of both the residual-gradient and non-residual-gradient forms of advantage updating and Q-learning is compared, demonstrating that advantage updating converges faster than Q-learning in all simulations. Advantage updating is also demonstrated to converge regardless of the time step duration; Q-learning is unable to converge as the time step duration grows small.
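For illustration, here is the residual-gradient idea in its simplest setting: a single update for Q-learning with linear function approximation. The paper applies the same principle to advantage updating (separate value and advantage estimates) with a minimax operator for the two-player game; the names and parameters below are illustrative, not the paper's code.

    import numpy as np

    def residual_gradient_q_step(w, phi_sa, phi_next, r, gamma, alpha):
        # One gradient step on the squared Bellman residual delta^2 / 2 for
        # linear Q(s, a) = w . phi(s, a). Unlike the usual (direct) form, the
        # gradient flows through BOTH the current and next-state feature terms,
        # which is what gives residual-gradient methods their convergence guarantee.
        delta = r + gamma * np.dot(w, phi_next) - np.dot(w, phi_sa)  # Bellman residual
        grad = gamma * phi_next - phi_sa                             # d(delta)/dw
        return w - alpha * delta * grad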


2012 ◽  
Vol 39 (4) ◽  
pp. 38 ◽
Author(s):  
Eric V. Denardo ◽  
Eugene A. Feinberg ◽  
Uriel G. Rothblum

1979 ◽  
Vol 16 (2) ◽  
pp. 305-318 ◽  
Author(s):  
P. Whittle

A simple condition (the ‘bridging condition’) is given for a Markov decision problem with non-negative costs to enjoy the regularity properties enunciated in Theorem 1. The bridging condition is sufficient for regularity, and is not far from being necessary, in a sense explained in Section 2. In Section 8 we consider the different classes of terminal loss functions (domains of attraction) associated with different solutions of (14). Some conjectures concerning these domains of attraction are either proved, or disproved by counter-example.
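The bridging condition itself and equation (14) are stated in the paper and not reproduced in this abstract. For orientation only, such regularity results concern solutions F of the standard optimality equation for a Markov decision problem with non-negative costs (notation assumed here, not Whittle's):

    F(x) = \min_u \Big[ c(x, u) + \sum_y P(y \mid x, u)\, F(y) \Big], \qquad c(x, u) \ge 0.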

