Implementation of modified Q-learning technique in EMCAP control architecture

2017 ◽  
Vol 7 (1.5) ◽  
pp. 269
Author(s):  
D. Ganesha ◽  
Vijayakumar Maragal Venkatamuni

This research introduces a self-learning, modified Q-learning technique in EMCAP (Enhanced Mind Cognitive Architecture of Pupils). Q-learning is a model-free reinforcement learning (RL) technique; specifically, it can be applied to establish an optimal action-selection strategy for any given Markov decision process. The EMCAP architecture [1] provides various agent control strategies for static and dynamic environments. Experiments are conducted to evaluate the performance of each agent individually, and the same statistics were collected so that results could be compared across the different agents. This work considered varied kinds of agents at different levels of the architecture for the experimental analysis. The Fungus World testbed, implemented in SWI-Prolog 5.4.6, was used for the experiments. Fixed obstructions are used to make locations specific to the Fungus World testbed environment, and various parameters are introduced into the environment to test an agent's performance. The modified Q-learning algorithm proves well suited to the EMCAP architecture: in the experiments, the modified Q-learning system obtains more rewards than the existing Q-learning.
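As a point of reference for the model-free method the abstract builds on, the following is a minimal tabular Q-learning sketch in Python. The learning rate, discount factor, exploration rate, and the environment interface (a stand-in for the Fungus World testbed) are assumptions; the EMCAP-specific modification is not detailed here.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # assumed learning parameters

def q_learning(env, actions, episodes=500):
    Q = defaultdict(float)  # maps (state, action) -> estimated value, default 0.0

    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection over the current Q estimates
            if random.random() < EPSILON:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])

            next_state, reward, done = env.step(action)

            # Q-learning target bootstraps on the greedy (max) value of the next state
            best_next = max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
            state = next_state
    return Q
```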

1995 ◽  
Vol 4 (1) ◽  
pp. 3-28 ◽  
Author(s):  
Mance E. Harmon ◽  
Leemon C. Baird ◽  
A. Harry Klopf

An application of reinforcement learning to a linear-quadratic differential game is presented. The reinforcement learning system uses a recently developed algorithm, the residual-gradient form of advantage updating. The game is a Markov decision process with continuous time, states, and actions, linear dynamics, and a quadratic cost function. The game consists of two players, a missile and a plane; the missile pursues the plane and the plane evades the missile. Although a missile and plane scenario was the chosen test bed, the reinforcement learning approach presented here is equally applicable to biologically based systems, such as a predator pursuing prey. The reinforcement learning algorithm for optimal control is modified for differential games to find the minimax point rather than the maximum. Simulation results are compared to the analytical solution, demonstrating that the simulated reinforcement learning system converges to the optimal answer. The performances of the residual-gradient and non-residual-gradient forms of advantage updating and of Q-learning are compared, demonstrating that advantage updating converges faster than Q-learning in all simulations. Advantage updating is also demonstrated to converge regardless of the time step duration, whereas Q-learning is unable to converge as the time step duration grows small.
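The key modification described, replacing the maximization in the learning target with a minimax over the two players' actions, can be sketched roughly as below. The tabular form, discrete action sets, and parameters are illustrative assumptions only; the paper itself works with continuous states and advantage updating rather than a lookup table.

```python
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.95          # assumed step size and discount factor
Q = defaultdict(float)            # (state, pursuer_action, evader_action) -> value

def minimax_value(state, pursuer_actions, evader_actions):
    # Zero-sum pursuit-evasion: the pursuer maximizes the value that the
    # evader is simultaneously trying to minimize.
    return max(
        min(Q[(state, ap, ae)] for ae in evader_actions)
        for ap in pursuer_actions
    )

def update(s, a_p, a_e, reward, s_next, pursuer_actions, evader_actions):
    # Q-learning-style update with the usual max replaced by the minimax point,
    # mirroring the modification described for differential games.
    target = reward + GAMMA * minimax_value(s_next, pursuer_actions, evader_actions)
    Q[(s, a_p, a_e)] += ALPHA * (target - Q[(s, a_p, a_e)])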


1999 ◽  
Vol 09 (03) ◽  
pp. 243-249 ◽  
Author(s):  
CARLOS H.C. RIBEIRO ◽  
ELDER M. HEMERLY

Autonomous learning techniques are based on experience acquisition. In most realistic applications, experience is time-consuming: it implies sensor reading, actuator control and algorithmic update, constrained by the learning system dynamics. The crudeness of the information upon which classical learning algorithms operate makes such problems too difficult and unrealistic. Nonetheless, additional information for facilitating the learning process ideally should be embedded in such a way that the structural, well-studied characteristics of these fundamental algorithms are maintained. We investigate in this article a more general formulation of the Q-learning method that allows for a spreading of information derived from single updates towards a neighbourhood of the instantly visited state and converges to optimality. We show how this new formulation can be used as a mechanism to safely embed prior knowledge about the structure of the state space, and demonstrate it in a modified implementation of a reinforcement learning algorithm in a real robot navigation task.
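A rough sketch of the spreading idea follows: a single temporal-difference correction is propagated, with decaying weight, to a neighbourhood of the visited state. The grid-like state representation, the particular spreading weights, and the parameters are assumptions used to illustrate the general mechanism, not the authors' exact formulation.

```python
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9           # assumed parameters
Q = defaultdict(float)            # (state, action) -> value

def spread(s_visited, s_other, radius=1):
    # Assumed spreading function for grid-like states given as (x, y) tuples:
    # full weight for the visited state, half weight for immediate neighbours,
    # zero elsewhere. This is where prior structural knowledge is embedded.
    d = abs(s_visited[0] - s_other[0]) + abs(s_visited[1] - s_other[1])
    return 1.0 if d == 0 else (0.5 if d <= radius else 0.0)

def spreading_update(states, actions, s, a, reward, s_next):
    # Standard Q-learning temporal-difference error at the visited pair...
    best_next = max(Q[(s_next, b)] for b in actions)
    td_error = reward + GAMMA * best_next - Q[(s, a)]
    # ...then the same correction is applied, scaled by the spreading function,
    # to every state in the neighbourhood of the visited state.
    for z in states:
        w = spread(s, z)
        if w > 0.0:
            Q[(z, a)] += ALPHA * w * td_error
```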


2000 ◽  
Vol 10 (01) ◽  
pp. 71-78
Author(s):  
CARLOS H. C. RIBEIRO ◽  
ELDER M. HEMERLY

Autonomous learning techniques are based on experience acquisition. In most realistic applications, experience is time-consuming: it implies sensor reading, actuator control and algorithmic update, constrained by the learning system dynamics. The crudeness of the information upon which classical learning algorithms operate makes such problems too difficult and unrealistic. Nonetheless, additional information for facilitating the learning process ideally should be embedded in such a way that the structural, well-studied characteristics of these fundamental algorithms are maintained. We investigate in this article a more general formulation of the Q-learning method that allows for a spreading of information derived from single updates towards a neighbourhood of the instantly visited state and converges to optimality. We show how this new formulation can be used as a mechanism to safely embed prior knowledge about the structure of the state space, and demonstrate it in a modified implementation of a reinforcement learning algorithm in a real robot navigation task.


Author(s):  
Abdelghafour Harraz ◽  
Mostapha Zbakh

Artificial intelligence allows the creation of engines that can explore and learn environments and thereby derive policies to control them in real time with no human intervention. Through its reinforcement learning component, using frameworks such as temporal differences, State-Action-Reward-State-Action (SARSA), and Q-learning, to name a few, it can be applied to any system that can be perceived as a Markov decision process. This opens the door to applying reinforcement learning to cloud load balancing, so that load can be dispatched dynamically to a given cloud system. The authors describe different techniques that can be used to implement a reinforcement-learning-based engine in a cloud system.
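As a loose illustration of casting load balancing as a Markov decision process for such an engine, the sketch below assumes a state built from bucketized server queue lengths, an action that names the server receiving the next request, and a reward equal to the negative response time; all three choices are assumptions made only to keep the example concrete.

```python
import random

EPSILON = 0.1  # assumed exploration rate

def encode_state(queue_lengths, bucket=5):
    # Assumed state encoding: bucketized queue length of each server,
    # so similar load patterns map to the same discrete MDP state.
    return tuple(q // bucket for q in queue_lengths)

def reward(response_time_ms):
    # Assumed reward signal: faster responses earn higher reward.
    return -response_time_ms

def dispatch(Q, state, n_servers):
    # epsilon-greedy choice of the server that receives the next request,
    # based on the current value estimates in Q.
    if random.random() < EPSILON:
        return random.randrange(n_servers)
    return max(range(n_servers), key=lambda s: Q.get((state, s), 0.0))
```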


2017 ◽  
Vol 7 (1.5) ◽  
pp. 274
Author(s):  
D. Ganesha ◽  
Vijayakumar Maragal Venkatamuni

This research work presents an analysis of a modified SARSA learning algorithm. State-Action-Reward-State-Action (SARSA) is a technique for learning a Markov decision process (MDP) policy, used for reinforcement learning in the fields of artificial intelligence (AI) and machine learning (ML). The modified SARSA algorithm selects better actions to obtain better rewards. Experiments are conducted to evaluate the performance of each agent individually, and the same statistics were collected so that results could be compared across the different agents. This work considered varied kinds of agents at different levels of the architecture for the experimental analysis. The Fungus World testbed, implemented in SWI-Prolog 5.4.6, was used for the experiments. Fixed obstructions are used to make locations specific to the Fungus World testbed environment, and various parameters are introduced into the environment to test an agent's performance. The modified SARSA learning algorithm proves well suited to the EMCAP architecture: in the experiments, the modified SARSA learning system obtains more rewards than the existing SARSA algorithm.
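For reference, a minimal on-policy SARSA loop is sketched below; it is not the authors' modified variant, whose specific change is not detailed in the abstract, and the environment interface and parameters are assumed.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # assumed parameters

def epsilon_greedy(Q, state, actions):
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def sarsa(env, actions, episodes=500):
    Q = defaultdict(float)
    for _ in range(episodes):
        state = env.reset()
        action = epsilon_greedy(Q, state, actions)
        done = False
        while not done:
            next_state, reward, done = env.step(action)
            next_action = epsilon_greedy(Q, next_state, actions)
            # On-policy target: bootstrap on the action actually selected next,
            # unlike Q-learning, which bootstraps on the greedy action.
            Q[(state, action)] += ALPHA * (
                reward + GAMMA * Q[(next_state, next_action)] - Q[(state, action)]
            )
            state, action = next_state, next_action
    return Q
```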


Sensors ◽  
2018 ◽  
Vol 18 (11) ◽  
pp. 3606 ◽  
Author(s):  
Wanli Xue ◽  
Zhiyong Feng ◽  
Chao Xu ◽  
Zhaopeng Meng ◽  
Chengwei Zhang

Although tracking research has achieved excellent performance from a mathematical standpoint, it is still meaningful to analyze tracking problems from multiple perspectives. This motivation not only promotes the independence of tracking research but also increases the flexibility of practical applications. This paper presents a tracking framework based on multi-dimensional state–action space reinforcement learning, termed multi-angle analysis collaboration tracking (MACT). MACT comprises a basic tracking framework and a strategic framework that assists the former. In particular, the strategic framework is extensible and currently includes a feature selection strategy (FSS) and a movement trend strategy (MTS). These strategies are abstracted from a multi-angle analysis of tracking problems (the observer's attention and the object's motion), and the content of the analysis corresponds to specific actions in the multi-dimensional action space. Concretely, the tracker, regarded as an agent, is trained with the Q-learning algorithm and an ϵ-greedy exploration strategy, where we adopt a customized rewarding function to encourage robust object tracking. Numerous contrast experimental evaluations on the OTB50 benchmark demonstrate the effectiveness of the strategies and the improvement in speed and accuracy of the MACT tracker.
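The training setup described (Q-learning with ϵ-greedy exploration and a customized reward that encourages robust tracking) could look roughly like the sketch below. The overlap-based (IoU) reward and the box representation are assumptions used only to make the example concrete; the paper's actual rewarding function is not specified in the abstract.

```python
import random

EPSILON = 0.1  # assumed exploration rate

def custom_reward(pred_box, gt_box):
    # Assumed rewarding function: intersection-over-union between the predicted
    # and ground-truth boxes (x1, y1, x2, y2), so higher overlap earns more reward.
    ax1, ay1, ax2, ay2 = pred_box
    bx1, by1, bx2, by2 = gt_box
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def select_action(q_values, actions):
    # epsilon-greedy selection over the multi-dimensional action space
    # (e.g. feature-selection and movement-trend actions).
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: q_values.get(a, 0.0))
```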


Author(s):  
Sheng-Xue He ◽  
Jian-Jia He ◽  
Shi-Dong Liang ◽  
June Qiong Dong ◽  
Peng-Cheng Yuan

The unreliable service and unstable operation of a high-frequency bus line manifest as bus bunching and an uneven distribution of headways along the bus line. Although many control strategies, such as static and dynamic holding strategies, have been proposed to solve these problems, many of them rest on oversimplified assumptions about real bus line operation, so it is hard for them to continuously adapt to the evolving complex system. In view of this dynamic setting, we present an adaptive holding method that combines classic approximate dynamic programming (ADP) with a multistage look-ahead mechanism. The holding time, the only control means used in this study, is determined by estimating its impact on the operational stability of the bus line system over the remaining observation period. The multistage look-ahead mechanism introduced into the classic Q-learning algorithm of the ADP model helps the algorithm get through its early unstable phase more quickly and easily. During implementation of the new holding approach, past experiences of holding operations are accumulated effectively into an artificial neural network used to approximate the unavailable Q-factor. The use of a detailed simulation system in the new approach makes it possible to take into account most of the possible causes of instability. The numerical experiments show that the new holding approach can stabilize the system by producing evenly distributed headways and removing bus bunching thoroughly. Compared with terminal station holding strategies, the new method yields a more reliable bus line with shorter waiting times for passengers.
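The combination of a multistage look-ahead with a neural-network approximation of the Q-factor can be illustrated roughly as follows. The simulator interface, the candidate holding times, the feature encoding, and the `q_network.predict` interface are all assumptions for the sake of the sketch, not the authors' implementation.

```python
import numpy as np

GAMMA = 0.99       # assumed discount factor
LOOKAHEAD = 3      # assumed number of look-ahead stages

def featurize(state):
    # Assumed feature vector: headways along the line, as a NumPy array.
    return np.asarray(state, dtype=float)

def greedy_holding(q_network, state, candidates=(0, 30, 60, 90)):
    # Assumed discrete candidate holding times (seconds); pick the best Q-factor.
    values = [q_network.predict(np.concatenate([featurize(state), [h]]))
              for h in candidates]
    return candidates[int(np.argmax(values))]

def multistage_target(simulator, q_network, state, holding_time):
    # Roll the simulated bus line forward LOOKAHEAD stages under the current
    # policy and bootstrap the tail with the neural-network Q-factor estimate.
    total, discount = 0.0, 1.0
    s, h = state, holding_time
    for _ in range(LOOKAHEAD):
        s_next, reward, done = simulator.step(s, h)   # simulated transition
        total += discount * reward
        discount *= GAMMA
        if done:
            return total
        h = greedy_holding(q_network, s_next)          # next holding decision
        s = s_next
    return total + discount * q_network.predict(np.concatenate([featurize(s), [h]]))
```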


2020 ◽  
Vol 170 ◽  
pp. 1198-1203
Author(s):  
Mohamed Boussakssou ◽  
Bader Hssina ◽  
Mohammed Erittali
