Exploration Entropy for Reinforcement Learning

The training process analysis and termination condition of the training process of a Reinforcement Learning (RL) system have always been the key issues to train an RL agent. In this paper, a new approach based on State Entropy and Exploration Entropy is proposed to analyse the training process. The concept of State Entropy is used to denote the uncertainty for an RL agent to select the action at every state that the agent will traverse, while the Exploration Entropy denotes the action selection uncertainty of the whole system. Actually, the action selection uncertainty of a certain state or the whole system reflects the degree of exploration and the stage of the learning process for an agent. The Exploration Entropy is a new criterion to analyse and manage the training process of RL. The theoretical analysis and experiment results illustrate that the curve of Exploration Entropy contains more information than the existing analytical methods.

Download Full-text

A New Criterion Using Information Gain for Action Selection Strategy in Reinforcement Learning

IEEE Transactions on Neural Networks ◽

10.1109/tnn.2004.828760 ◽

2004 ◽

Vol 15 (4) ◽

pp. 792-799 ◽

Cited By ~ 12

Author(s):

K. Iwata ◽

K. Ikeda ◽

H. Sakai

Keyword(s):

Reinforcement Learning ◽

Information Gain ◽

Action Selection ◽

Selection Strategy ◽

New Criterion

Download Full-text

Deep Reinforcement Learning via Past-Success Directed Exploration

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33019979 ◽

2019 ◽

Vol 33 ◽

pp. 9979-9980

Author(s):

Xiaoming Liu ◽

Zhixiong Xu ◽

Lei Cao ◽

Xiliang Chen ◽

Kai Kang

Keyword(s):

Online Learning ◽

Adaptive Control ◽

Reinforcement Learning ◽

Learning Process ◽

Control Method ◽

Learning Algorithms ◽

Action Selection ◽

Experimental Results ◽

Continuous Control ◽

Exploration And Exploitation

The balance between exploration and exploitation has always been a core challenge in reinforcement learning. This paper proposes “past-success exploration strategy combined with Softmax action selection”(PSE-Softmax) as an adaptive control method for taking advantage of the characteristics of the online learning process of the agent to adapt exploration parameters dynamically. The proposed strategy is tested on OpenAI Gym with discrete and continuous control tasks, and the experimental results show that PSE-Softmax strategy delivers better performance than deep reinforcement learning algorithms with basic exploration strategies.

Download Full-text

Applying and Augmenting Deep Reinforcement Learning in Serious Games through Interaction

Periodica Polytechnica Electrical Engineering and Computer Science ◽

10.3311/ppee.10313 ◽

2017 ◽

Vol 61 (2) ◽

pp. 198 ◽

Cited By ~ 4

Author(s):

Aline Dobrovsky ◽

Uwe M. Borghoff ◽

Marko Hofmann

Keyword(s):

Reinforcement Learning ◽

Learning Process ◽

Serious Games ◽

Adaptive Behaviour ◽

Cognitive Systems ◽

New Approach ◽

Domain Experts ◽

E Learning ◽

Short Horizon ◽

And Training

Serious games belong to the most important future e-learning trends and are frequently used in recruitment and training. Their development, however, is still a demanding and tedious process, especially when regarding reasonable non-player character behaviour. Serious games can generally profit from diverse, adaptive behaviour to increase learning effectiveness. Deep reinforcement learning has already shown considerable results in automatically generating successful AI behaviour, but its past applications were mainly focused on optimization and short-horizon games. To expand the underlying ideas to serious games, we introduce a new approach of augmenting the application of deep reinforcement learning methods by interactively making use of domain experts’ knowledge to guide the learning process. Thereby, we aim to establish a synergistic combination of experts and emergent cognitive systems to create adaptive and more human behaviour. We call this approach interactive deep reinforcement learning and point out important aspects regarding realization within a novel framework.

Download Full-text

Forced ε-Greedy, an Expansion to the ε-Greedy Action Selection Method

10.3233/faia210070 ◽

2021 ◽

Author(s):

George Angelopoulos ◽

Dimitris Metafas

Keyword(s):

Reinforcement Learning ◽

Action Selection ◽

Selection Method ◽

Board Game ◽

Selection Methods ◽

Training Process ◽

Greedy Method ◽

Learning Methods ◽

Q Learning ◽

Time Required

Reinforcement Learning methods such as Q Learning, make use of action selection methods, in order to train an agent to perform a task. As the complexity of the task grows, so does the time required to train the agent. In this paper Q Learning is applied onto the board game Dominion, and Forced ε-greedy, an expansion to the ε-greedy action selection method is introduced. As shown in this paper the Forced ε-greedy method achieves to accelerate the training process and optimize its results, especially as the complexity of the task grows.

Download Full-text

Goal-driven active learning

Autonomous Agents and Multi-Agent Systems ◽

10.1007/s10458-021-09527-5 ◽

2021 ◽

Vol 35 (2) ◽

Author(s):

Nicolas Bougie ◽

Ryutaro Ichise

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Learning Process ◽

Real World ◽

Imitation Learning ◽

Learning Approaches ◽

Wide Range ◽

Fixed Set ◽

Complex Decision Making ◽

Complex Decision

AbstractDeep reinforcement learning methods have achieved significant successes in complex decision-making problems. In fact, they traditionally rely on well-designed extrinsic rewards, which limits their applicability to many real-world tasks where rewards are naturally sparse. While cloning behaviors provided by an expert is a promising approach to the exploration problem, learning from a fixed set of demonstrations may be impracticable due to lack of state coverage or distribution mismatch—when the learner’s goal deviates from the demonstrated behaviors. Besides, we are interested in learning how to reach a wide range of goals from the same set of demonstrations. In this work we propose a novel goal-conditioned method that leverages very small sets of goal-driven demonstrations to massively accelerate the learning process. Crucially, we introduce the concept of active goal-driven demonstrations to query the demonstrator only in hard-to-learn and uncertain regions of the state space. We further present a strategy for prioritizing sampling of goals where the disagreement between the expert and the policy is maximized. We evaluate our method on a variety of benchmark environments from the Mujoco domain. Experimental results show that our method outperforms prior imitation learning approaches in most of the tasks in terms of exploration efficiency and average scores.

Download Full-text

Diversity oriented Deep Reinforcement Learning for targeted molecule generation

Journal of Cheminformatics ◽

10.1186/s13321-021-00498-z ◽

2021 ◽

Vol 13 (1) ◽

Author(s):

Tiago Pereira ◽

Maryam Abbasi ◽

Bernardete Ribeiro ◽

Joel P. Arrais

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Reinforcement Learning ◽

Deep Neural Networks ◽

Chemical Space ◽

Biological Properties ◽

Training Process ◽

Training Strategy ◽

Inhibitory Power ◽

Exploratory Strategy

AbstractIn this work, we explore the potential of deep learning to streamline the process of identifying new potential drugs through the computational generation of molecules with interesting biological properties. Two deep neural networks compose our targeted generation framework: the Generator, which is trained to learn the building rules of valid molecules employing SMILES strings notation, and the Predictor which evaluates the newly generated compounds by predicting their affinity for the desired target. Then, the Generator is optimized through Reinforcement Learning to produce molecules with bespoken properties. The innovation of this approach is the exploratory strategy applied during the reinforcement training process that seeks to add novelty to the generated compounds. This training strategy employs two Generators interchangeably to sample new SMILES: the initially trained model that will remain fixed and a copy of the previous one that will be updated during the training to uncover the most promising molecules. The evolution of the reward assigned by the Predictor determines how often each one is employed to select the next token of the molecule. This strategy establishes a compromise between the need to acquire more information about the chemical space and the need to sample new molecules, with the experience gained so far. To demonstrate the effectiveness of the method, the Generator is trained to design molecules with an optimized coefficient of partition and also high inhibitory power against the Adenosine $$A_{2A}$$ A 2 A and $$\kappa$$ κ opioid receptors. The results reveal that the model can effectively adjust the newly generated molecules towards the wanted direction. More importantly, it was possible to find promising sets of unique and diverse molecules, which was the main purpose of the newly implemented strategy.

Download Full-text

Methods for acceleration of learning process of Reinforcement Learning Neuro-Fuzzy Hierarchical Politree model

2010 International Conference on Autonomous and Intelligent Systems, AIS 2010 ◽

10.1109/ais.2010.5547027 ◽

2010 ◽

Author(s):

Fabio Martins ◽

Karla Figueiredo ◽

Marley Vellasco

Keyword(s):

Reinforcement Learning ◽

Learning Process ◽

Neuro Fuzzy

Download Full-text

A new approach for structural credit assignment in distributed reinforcement learning systems

2003 IEEE International Conference on Robotics and Automation (Cat. No.03CH37422) ◽

10.1109/robot.2003.1241758 ◽

2004 ◽

Author(s):

Zhong Yu ◽

Gu Guochang ◽

Zhang Rubo

Keyword(s):

Reinforcement Learning ◽

Learning Systems ◽

Credit Assignment ◽

New Approach ◽

Distributed Reinforcement

Download Full-text

A strategy learning model for autonomous agents based on classification

International Journal of Applied Mathematics and Computer Science ◽

10.1515/amcs-2015-0035 ◽

2015 ◽

Vol 25 (3) ◽

pp. 471-482 ◽

Cited By ~ 7

Author(s):

Bartłomiej Śnieżyński

Keyword(s):

Reinforcement Learning ◽

Supervised Learning ◽

Learning Process ◽

Autonomous Agents ◽

Good Alternative ◽

Learning Model ◽

Learning Method ◽

Complex Environments ◽

Agent Based ◽

Proposed Model

AbstractIn this paper we propose a strategy learning model for autonomous agents based on classification. In the literature, the most commonly used learning method in agent-based systems is reinforcement learning. In our opinion, classification can be considered a good alternative. This type of supervised learning can be used to generate a classifier that allows the agent to choose an appropriate action for execution. Experimental results show that this model can be successfully applied for strategy generation even if rewards are delayed. We compare the efficiency of the proposed model and reinforcement learning using the farmer-pest domain and configurations of various complexity. In complex environments, supervised learning can improve the performance of agents much faster that reinforcement learning. If an appropriate knowledge representation is used, the learned knowledge may be analyzed by humans, which allows tracking the learning process

Download Full-text

Simulation and Research of Dynamic Voltage Detection and Regulator Based on MATLAB

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.536-537.1527 ◽

2014 ◽

Vol 536-537 ◽

pp. 1527-1531

Author(s):

Ya Feng Li ◽

Zi Wei Zheng

Keyword(s):

Theoretical Analysis ◽

Reactive Power ◽

Voltage Regulator ◽

New Approach ◽

Dynamic Voltage ◽

Harmonic Voltage ◽

Simulation Results ◽

Harmonic Source ◽

Detecting Method ◽

Dynamic Voltage Regulator

The Series Dynamic Voltage Regulator can compensate the harmonics distortion caused by voltage type harmonic source This paper presents a new approach of detecting harmonic voltage in dq0 coordinates, based on the generalized instantaneous reactive power ,and used in the series dynamic voltage regulator successfully. It is demonstrated by theoretical analysis and simulation results that the proposed detecting method of harmonic voltage is correct and valid.

Download Full-text