A Q-Learning Hyperheuristic Binarization Framework to Balance Exploration and Exploitation

Author(s):  
Diego Tapia ◽  
Broderick Crawford ◽  
Ricardo Soto ◽  
Felipe Cisternas-Caneo ◽  
José Lemus-Romani ◽  
...  

Author(s):  
Qiming Fu ◽  
Zhengxia Yang ◽  
You Lu ◽  
Hongjie Wu ◽  
Fuyuan Hu ◽  
...  

We propose an improved variational Bayesian exploration-based active Sarsa (VBE-ASAR) algorithm that addresses the exploration-exploitation dilemma and speeds up convergence. First, during learning, a variational Bayesian method is adopted to measure information gain, which serves as an exploration factor in an internal reward function for heuristic exploration. In addition, before learning, transfer learning is used to initialize the value function so as to improve exploration performance, with a bisimulation metric measuring the distance between states of the source MDP and the target MDP. Finally, we apply the proposed algorithm to the cliff-walking problem and compare it with the Sarsa, Q-Learning, VFT-Sarsa, and Bayesian Sarsa (BS) algorithms. Experimental results show that the VBE-ASAR algorithm learns faster.
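A minimal sketch of the intrinsic-reward idea behind VBE-ASAR is given below, assuming a small tabular environment; the `env` interface (`n_states`, `n_actions`, `reset`, `step`) is hypothetical, and the count-based `info_gain` is only a stand-in for the paper's variational Bayesian information-gain estimator.

```python
# Sketch: on-policy Sarsa with an intrinsic exploration bonus added to the
# extrinsic reward. The bonus and its weight `beta` are placeholders, not the
# paper's variational Bayesian construction.
import numpy as np

def sarsa_with_bonus(env, episodes=500, alpha=0.1, gamma=0.99,
                     epsilon=0.1, beta=0.05):
    Q = np.zeros((env.n_states, env.n_actions))
    visits = np.zeros((env.n_states, env.n_actions))  # crude visitation statistic

    def info_gain(s, a):
        # Placeholder: count-based proxy for the information gained by taking (s, a).
        return 1.0 / np.sqrt(1.0 + visits[s, a])

    def policy(s):
        if np.random.rand() < epsilon:
            return np.random.randint(env.n_actions)
        return int(np.argmax(Q[s]))

    for _ in range(episodes):
        s = env.reset()
        a = policy(s)
        done = False
        while not done:
            s2, r_ext, done = env.step(a)          # assumed env interface
            visits[s, a] += 1
            r_total = r_ext + beta * info_gain(s, a)  # extrinsic + intrinsic reward
            a2 = policy(s2)
            target = r_total + (0.0 if done else gamma * Q[s2, a2])
            Q[s, a] += alpha * (target - Q[s, a])     # standard Sarsa update
            s, a = s2, a2
    return Q
```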


2021 ◽  
Vol 1 (1) ◽  
Author(s):  
Xinglin Yu ◽  
Yuhu Wu ◽  
Xi-Ming Sun ◽  
Wenya Zhou

Abstract Balancing exploration and exploitation in reinforcement learning is a common dilemma and a time-consuming task. In this paper, a novel exploration policy for Q-Learning, called the memory-greedy policy, is proposed to accelerate learning. Through memory storage and playback, the probability of selecting random actions can be effectively reduced, which speeds up learning. The principle of the policy is analyzed in a maze scenario, and its theoretical convergence is established via dynamic programming.
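A rough sketch of a memory-greedy-style selection rule is shown below; the `MemoryGreedyPolicy` class, its dictionary-based memory, and the `memorize` criterion are illustrative assumptions, not the authors' exact policy.

```python
# Sketch: states whose good actions have already been memorized are replayed
# greedily, so random exploration is spent only on unfamiliar states.
import random

class MemoryGreedyPolicy:
    def __init__(self, n_actions, epsilon=0.2):
        self.n_actions = n_actions
        self.epsilon = epsilon
        self.memory = {}  # state -> remembered action to replay

    def select(self, state, q_values):
        if state in self.memory:
            return self.memory[state]              # playback: skip random exploration
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)  # explore unfamiliar states
        return max(range(self.n_actions), key=lambda a: q_values[a])

    def memorize(self, state, action, success):
        if success:                                 # e.g. the action moved toward the goal
            self.memory[state] = action
```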


Author(s):  
Arpita Chakraborty ◽  
Jyoti Sekhar Banerjee

The goal of this paper is to improve the performance of the well-known Q-learning algorithm, a robust machine learning technique, to facilitate path planning in an environment. To date, Q-learning algorithms such as the Classical Q-learning (CQL) algorithm and the Improved Q-learning (IQL) algorithm have dealt with obstacle-free environments, whereas in a real environment an agent faces obstacles very frequently. Hence this paper considers an environment with a number of obstacles and introduces a new parameter, called 'immediate penalty', incurred on collision with an obstacle. Further, the proposed technique replaces the scalar 'immediate reward' function with an 'effective immediate reward' function consisting of two fuzzy parameters, 'immediate reward' and 'immediate penalty'. The fuzzification of these two parameters not only improves the learning technique but also strikes a balance between exploration and exploitation, the most challenging problem in reinforcement learning. The proposed algorithm stores the Q value for the best possible action at each state, and it saves significant path-planning time by suggesting the best action to take at each state to move to the next state. Eventually, the agent becomes more intelligent, as it can plan a collision-free path that avoids obstacles from a distance. The algorithm is validated through computer simulation in a maze-like environment and in real time on the Khepera II platform. The analysis reveals that the Q table obtained by the proposed Advanced Q-learning (AQL) algorithm, when used for path-planning applications of mobile robots, outperforms classical and improved Q-learning.
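The sketch below illustrates how an 'effective immediate reward' combining a reward term and a penalty term could drive a standard Q-learning update; the linear membership functions, distance ranges, and weights are placeholder assumptions, not the paper's fuzzy formulation.

```python
# Sketch: an effective immediate reward built from a fuzzy reward (closeness to
# the goal) and a fuzzy penalty (closeness to an obstacle), fed into a standard
# Q-learning update. All shapes and weights are illustrative.
import numpy as np

def membership(x, lo, hi):
    """Simple linear membership in [0, 1] over the range [lo, hi]."""
    return float(np.clip((x - lo) / (hi - lo), 0.0, 1.0))

def effective_reward(dist_to_goal, dist_to_obstacle,
                     goal_range=(0.0, 10.0), obstacle_range=(0.0, 2.0),
                     w_reward=1.0, w_penalty=1.0):
    # Fuzzy 'immediate reward': larger when the agent is closer to the goal.
    mu_reward = 1.0 - membership(dist_to_goal, *goal_range)
    # Fuzzy 'immediate penalty': larger when the agent is closer to an obstacle.
    mu_penalty = 1.0 - membership(dist_to_obstacle, *obstacle_range)
    return w_reward * mu_reward - w_penalty * mu_penalty

def q_update(Q, s, a, s_next, r_eff, alpha=0.1, gamma=0.9):
    # Standard Q-learning update driven by the effective immediate reward.
    Q[s, a] += alpha * (r_eff + gamma * np.max(Q[s_next]) - Q[s, a])
    return Q
```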


2007 ◽  
Author(s):  
Lucy R. Ford ◽  
Erika Harden ◽  
Anson Seers
