Catastrophic Interference in Reinforcement Learning: A Solution Based on Context Division and Knowledge Distillation
2021 ◽  
Author(s):  
Tiantian Zhang ◽  
Xueqian Wang ◽  
Bin Liang ◽  
Bo Yuan

The powerful learning ability of deep neural networks enables reinforcement learning (RL) agents to learn competent control policies directly from high-dimensional and continuous environments. In theory, to achieve stable performance, neural networks assume i.i.d. inputs, which unfortunately does not hold in the general RL paradigm, where the training data is temporally correlated and non-stationary. This issue can lead to "catastrophic interference" (a.k.a. "catastrophic forgetting") and a collapse in performance, as later training is likely to overwrite and interfere with previously learned good policies. In this paper, we introduce the concept of "context" into single-task RL and develop a novel scheme, termed Context Division and Knowledge Distillation (CDaKD) driven RL, which divides all states experienced during training into a series of contexts. Its motivation is to mitigate the aforementioned catastrophic interference in deep RL, thereby improving the stability and plasticity of RL models. At the heart of CDaKD is a value function parameterized by a neural network feature extractor shared across all contexts and a set of output heads, each specializing in an individual context. In CDaKD, we exploit online clustering to achieve context division, and interference is further alleviated by a knowledge distillation regularization term on the output layers of learned contexts. In addition, to obtain an effective context division in high-dimensional state spaces (e.g., image inputs), we perform clustering in the lower-dimensional representation space of a randomly initialized convolutional encoder that is fixed throughout training. Our results show that, with various replay memory capacities, CDaKD consistently improves the performance of existing RL algorithms on classic OpenAI Gym tasks and the more complex high-dimensional Atari tasks, incurring only moderate computational overhead.
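As a rough illustration of the architecture described above (not the authors' implementation), the PyTorch sketch below pairs a shared feature extractor with per-context output heads, assigns contexts by nearest centroid in a feature space, and adds a distillation penalty on the heads of previously learned contexts. The network sizes, the replay-batch layout, and the `distill_weight` coefficient are assumptions; for image inputs, `state_feature` would come from a fixed, randomly initialized convolutional encoder as the abstract describes.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadQNet(nn.Module):
    """Shared feature extractor with one Q-value head per context."""
    def __init__(self, state_dim, num_actions, num_contexts, hidden=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, num_actions) for _ in range(num_contexts)])

    def forward(self, state, context_id):
        return self.heads[context_id](self.backbone(state))

def assign_context(state_feature, centroids):
    """Online-clustering style assignment: index of the nearest context centroid."""
    return int(torch.cdist(state_feature.unsqueeze(0), centroids).argmin())

def cdakd_loss(qnet, frozen_qnet, batch, context_id, old_contexts,
               gamma=0.99, distill_weight=1.0):
    s, a, r, s2, done = batch   # a: (B, 1) long; r, done: (B, 1) float
    # TD loss on the head of the current context.
    q = qnet(s, context_id).gather(1, a)
    with torch.no_grad():
        target = r + gamma * (1 - done) * qnet(s2, context_id).max(1, keepdim=True).values
    loss = F.mse_loss(q, target)
    # Distillation: keep outputs on previously learned contexts close to a frozen copy,
    # e.g. frozen_qnet = copy.deepcopy(qnet).eval() taken when a context is finished.
    with torch.no_grad():
        frozen_out = {c: frozen_qnet(s, c) for c in old_contexts}
    for c in old_contexts:
        loss = loss + distill_weight * F.mse_loss(qnet(s, c), frozen_out[c])
    return loss
```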


Author(s):  
Søren Ager Meldgaard ◽  
Jonas Köhler ◽  
Henrik Lund Mortensen ◽  
Mads-Peter Verner Christiansen ◽  
Frank Noé ◽  
...  

Chemical space is routinely explored by machine learning methods to discover interesting molecules before time-consuming experimental synthesis is attempted. However, these methods often rely on a graph representation, ignoring the 3D information necessary for determining the stability of the molecules. We propose a reinforcement learning approach for generating molecules in Cartesian coordinates, allowing for quantum chemical prediction of their stability. To improve sample efficiency, we learn basic chemical rules by imitation learning on the GDB-11 database to create an initial model applicable to all stoichiometries. We then deploy multiple copies of the model, each conditioned on a specific stoichiometry, in a reinforcement learning setting. The models correctly identify low-energy molecules in the database and produce novel isomers not found in the training set. Finally, we apply the model to larger molecules to show how reinforcement learning further refines the imitation learning model in domains far from the training data.
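The two-stage scheme (imitation pretraining, then reinforcement-learning fine-tuning against a stability signal) can be pictured with a sketch like the one below. The feature representation of the partial molecule, the Gaussian atom-placement policy, and `energy_fn` (standing in for a quantum-chemistry energy evaluation) are all illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class AtomPlacementPolicy(nn.Module):
    """Placeholder policy: given features of the partial molecule, propose a 3D position."""
    def __init__(self, feat_dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 6))

    def forward(self, state_feat):
        out = self.net(state_feat)
        mean, log_std = out[..., :3], out[..., 3:]
        return torch.distributions.Normal(mean, log_std.exp())

def imitation_step(policy, state_feat, expert_position, optimizer):
    """Stage 1: behavior cloning on atom placements taken from GDB-11 molecules."""
    loss = -policy(state_feat).log_prob(expert_position).sum(-1).mean()
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

def reinforce_step(policy, state_feat, energy_fn, optimizer):
    """Stage 2: policy-gradient fine-tuning with reward = -energy of the proposal."""
    dist = policy(state_feat)
    position = dist.sample()
    reward = -energy_fn(position).detach()   # placeholder quantum-chemistry call, per-sample energies
    loss = -(dist.log_prob(position).sum(-1) * reward).mean()
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return reward.mean().item()
```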


Author(s):  
Bo Liu ◽  
Ying Wei ◽  
Yu Zhang ◽  
Qiang Yang

Deep neural networks (DNNs) have achieved breakthroughs in applications with large sample sizes. However, when facing high-dimension, low-sample-size (HDLSS) data, such as the phenotype prediction problem using genetic data in bioinformatics, DNNs suffer from overfitting and high-variance gradients. In this paper, we propose a DNN model tailored for HDLSS data, named Deep Neural Pursuit (DNP). DNP selects a subset of high-dimensional features to alleviate overfitting and averages over multiple dropouts to calculate gradients with low variance. As the first DNN method applied to HDLSS data, DNP enjoys the advantages of high nonlinearity, robustness to high dimensionality, the capability of learning from a small number of samples, stability in feature selection, and end-to-end training. We demonstrate these advantages of DNP via empirical results on both synthetic and real-world biological datasets.
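The two ingredients highlighted above (greedy feature selection and gradient averaging over dropout masks) could look roughly like the following sketch; the tiny network, the binary-classification loss, and the greedy selection rule are assumptions rather than the paper's exact algorithm.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallNet(nn.Module):
    """Tiny network whose first-layer weights expose per-feature gradient magnitudes."""
    def __init__(self, in_dim, hidden=32, p_drop=0.5):
        super().__init__()
        self.input_layer = nn.Linear(in_dim, hidden)
        self.body = nn.Sequential(nn.ReLU(), nn.Dropout(p_drop), nn.Linear(hidden, 1))

    def forward(self, x):
        return self.body(self.input_layer(x))

def averaged_input_gradients(model, x, y, num_dropout_samples=10):
    """Average input-layer gradients over several dropout masks to reduce variance."""
    grads = []
    model.train()                                   # keep dropout stochastic
    for _ in range(num_dropout_samples):
        model.zero_grad()
        loss = F.binary_cross_entropy_with_logits(model(x).squeeze(-1), y)
        loss.backward()
        grads.append(model.input_layer.weight.grad.detach().clone())
    return torch.stack(grads).mean(0)               # shape: (hidden, in_dim)

def select_next_feature(avg_grad, selected):
    """Greedily add the unselected feature with the largest averaged gradient norm."""
    scores = avg_grad.norm(dim=0)
    if selected:
        scores[list(selected)] = float("-inf")
    return int(scores.argmax())
```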


Author(s):  
Xiang Deng ◽  
Zhongfei Zhang

Knowledge distillation (KD) transfers knowledge from a teacher network to a student by training the student to mimic the outputs of the pretrained teacher on the training data. However, data samples are not always accessible, due to large data sizes, privacy, or confidentiality. Much effort has been made to address this problem for convolutional neural networks (CNNs), whose inputs lie in a grid domain within a continuous space, such as images and videos, but graph neural networks (GNNs), which handle non-grid data with different topology structures within a discrete space, have largely been overlooked. The inherent differences between their inputs make these CNN-based approaches inapplicable to GNNs. In this paper, we propose, to the best of our knowledge, the first dedicated approach to distilling knowledge from a GNN without graph data. The proposed graph-free KD (GFKD) learns graph topology structures for knowledge transfer by modeling them with a multinomial distribution. We then introduce a gradient estimator to optimize this framework. Essentially, the gradients w.r.t. graph structures are obtained using only GNN forward propagation without back-propagation, which means that GFKD is compatible with modern GNN libraries such as DGL and Geometric. Moreover, we provide strategies for handling different types of prior knowledge in the graph data or the GNNs. Extensive experiments demonstrate that GFKD achieves state-of-the-art performance for distilling knowledge from GNNs without training data.
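One way to read the gradient-estimator idea is sketched below: the topology is drawn from a learnable edge distribution (simplified here to independent Bernoulli edges rather than the paper's multinomial modeling), and the structure parameters are updated with a score-function (REINFORCE-style) estimator, so the teacher GNN is only ever run forward. The `teacher(node_feats, adj)` calling convention, the confidence-based reward, and the optimizers are placeholders, not the authors' code.

```python
import torch
import torch.nn.functional as F

def graph_free_kd_step(teacher, student, node_feats, edge_logits,
                       struct_opt, student_opt, num_samples=4):
    """edge_logits: learnable (N, N) parameters defining the edge distribution."""
    struct_loss, kd_loss = 0.0, 0.0
    for _ in range(num_samples):
        probs = torch.sigmoid(edge_logits)
        adj = torch.bernoulli(probs).detach()                     # sampled topology
        log_prob = (adj * torch.log(probs + 1e-8)
                    + (1 - adj) * torch.log(1 - probs + 1e-8)).sum()
        with torch.no_grad():
            t_out = teacher(node_feats, adj)                      # forward pass only
        s_out = student(node_feats, adj)
        kd_loss = kd_loss + F.kl_div(F.log_softmax(s_out, dim=-1),
                                     F.softmax(t_out, dim=-1),
                                     reduction="batchmean")
        # Score-function estimator: graphs the teacher responds to confidently are
        # up-weighted, without back-propagating through the teacher itself.
        reward = F.softmax(t_out, dim=-1).max(dim=-1).values.mean()
        struct_loss = struct_loss - reward.detach() * log_prob
    struct_opt.zero_grad(); (struct_loss / num_samples).backward(); struct_opt.step()
    student_opt.zero_grad(); (kd_loss / num_samples).backward(); student_opt.step()
```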


Entropy ◽  
2019 ◽  
Vol 21 (8) ◽  
pp. 773 ◽  
Author(s):  
Xueting Wang ◽  
Jun Cheng ◽  
Lei Wang

Understanding or estimating co-evolution processes is critical in ecology but very challenging. Traditional methods struggle to deal with the complex processes of evolution and to predict their consequences in nature. In this paper, we use deep reinforcement learning algorithms to endow organisms with learning ability and simulate their evolution process with a Monte Carlo simulation algorithm in a large-scale ecosystem. The combination of the two algorithms allows organisms to use experience to determine their behavior through interaction with the environment and to pass on that experience to their offspring. Our research shows that the predators' reinforcement learning ability contributed to the stability of the ecosystem and helped predators obtain a more reasonable behavior pattern of coexistence with their prey. The reinforcement learning effect of prey on their own population was not as good as that of the predators and increased the risk of extinction of the predators. The inconsistent learning periods and speeds of prey and predators aggravated that risk. The co-evolution of the two species resulted in smaller population sizes due to their potentially antagonistic evolutionary networks. If learnable predators and prey invade an ecosystem at the same time, the prey have an advantage. Thus, the proposed model illustrates the influence of the learning mechanism on a predator–prey ecosystem and demonstrates the feasibility of predicting behavior evolution in a predator–prey ecosystem using AI approaches.
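A toy sketch of the simulated setup (illustrative assumptions only): predators and prey act with simple Q-learning policies inside a Monte Carlo rollout over many generations, and a learned Q-table can be copied to offspring as "inherited experience". The environment API (`env.observe`, `env.step`) is a stand-in; the abstract's agents use deep RL rather than tabular Q-learning.

```python
import random
from collections import defaultdict

class QAgent:
    """Tabular Q-learning animal; the Q-table can be copied to offspring."""
    def __init__(self, actions, lr=0.1, gamma=0.9, eps=0.1):
        self.q = defaultdict(float)
        self.actions, self.lr, self.gamma, self.eps = actions, lr, gamma, eps

    def act(self, state):
        if random.random() < self.eps:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def learn(self, s, a, r, s2):
        best_next = max(self.q[(s2, a2)] for a2 in self.actions)
        self.q[(s, a)] += self.lr * (r + self.gamma * best_next - self.q[(s, a)])

def run_generation(env, predators, prey, steps=1000):
    """One Monte Carlo rollout of the ecosystem; every animal learns online."""
    for _ in range(steps):
        for animal in predators + prey:
            s = env.observe(animal)                 # placeholder environment API
            a = animal.act(s)
            r, s2 = env.step(animal, a)
            animal.learn(s, a, r, s2)
```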


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-16 ◽  
Author(s):  
Feng Ding ◽  
Guanfeng Ma ◽  
Zhikui Chen ◽  
Jing Gao ◽  
Peng Li

With the advent of the era of artificial intelligence, deep reinforcement learning (DRL) has achieved unprecedented success in high-dimensional and large-scale artificial intelligence tasks. However, the insecurity and instability of DRL algorithms have an important impact on their performance. The Soft Actor-Critic (SAC) algorithm uses advanced functions to update the policy and value networks, which alleviates some of these problems, but SAC still has shortcomings. To reduce the error caused by overestimation in SAC, we propose a new SAC algorithm called Averaged-SAC. By averaging previously learned state-action value estimates, it reduces the overestimation problem of soft Q-learning, thereby contributing to a more stable training process and improved performance. We evaluate the performance of Averaged-SAC on several continuous-control tasks in the MuJoCo environment. The experimental results show that Averaged-SAC effectively improves the performance of the SAC algorithm and the stability of the training process.
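A minimal sketch of one reading of the averaging idea (not the authors' code): keep the last k critic snapshots and average their soft Q-values when forming the Bellman target, which damps overestimation. The critic/policy interfaces, the entropy coefficient `alpha`, and the snapshot schedule are assumptions.

```python
import copy
from collections import deque
import torch

class AveragedCritic:
    """Keep the last k critic snapshots and average their Q-values."""
    def __init__(self, critic, k=5):
        self.snapshots = deque([copy.deepcopy(critic).eval()], maxlen=k)

    def update(self, critic):
        self.snapshots.append(copy.deepcopy(critic).eval())

    def averaged_q(self, state, action):
        with torch.no_grad():
            return torch.stack([snap(state, action) for snap in self.snapshots]).mean(0)

def soft_target(avg_critic, policy, reward, next_state, done, alpha=0.2, gamma=0.99):
    """Soft Bellman target built from the averaged Q estimate."""
    next_action, log_prob = policy.sample(next_state)    # assumed policy interface
    q_avg = avg_critic.averaged_q(next_state, next_action)
    return reward + gamma * (1.0 - done) * (q_avg - alpha * log_prob)
```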

