An Active Exploration Method for Data Efficient Reinforcement Learning

Abstract: Reinforcement learning (RL) is an effective method for controlling dynamic systems without prior knowledge. One of its most important and difficult problems is improving data efficiency. Probabilistic inference for learning control (PILCO) is a state-of-the-art data-efficient framework that uses a Gaussian process to model the system dynamics. However, PILCO focuses only on optimizing cumulative rewards and does not consider the accuracy of the dynamic model, which is an important factor in controller learning. To further improve the data efficiency of PILCO, we propose an active exploration version (AEPILCO) that uses information entropy to characterize how informative samples are. In the policy evaluation stage, we incorporate an information entropy criterion into long-term sample prediction. Using the resulting informative policy evaluation function, the algorithm obtains informative policy parameters in the policy improvement stage. Executing the policy with these parameters on the actual system produces an informative sample set, which helps in learning an accurate dynamic model. Thus, AEPILCO improves data efficiency by actively selecting informative samples based on the information entropy criterion and thereby learning an accurate dynamic model. We demonstrate the validity and efficiency of the proposed algorithm on several challenging control problems involving a cart pole, a pendubot, a double pendulum, and a cart double pendulum. AEPILCO learns a controller in fewer trials than PILCO, as verified through theoretical analysis and experimental results.
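
The abstract does not state the exact form of the entropy criterion, so the following is only a minimal sketch under stated assumptions. It assumes that each predicted state along the long-term rollout is Gaussian (as in a Gaussian-process dynamics model), uses the standard differential entropy of a Gaussian, H = 0.5 * (d * ln(2*pi*e) + ln det(Sigma)), and subtracts a weighted sum of these entropies from the cumulative expected cost. The function names, the additive combination, and the `weight` parameter are hypothetical illustrations, not the authors' formulation.

```python
import math


def log_det_spd(cov):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(cov)
    chol = [[0.0] * n for _ in range(n)]
    log_det = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = cov[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("covariance must be positive definite")
                chol[i][j] = math.sqrt(s)
                log_det += 2.0 * math.log(chol[i][j])
            else:
                chol[i][j] = s / chol[j][j]
    return log_det


def gaussian_entropy(cov):
    """Differential entropy (in nats) of a Gaussian with covariance `cov`."""
    d = len(cov)
    return 0.5 * (d * math.log(2.0 * math.pi * math.e) + log_det_spd(cov))


def informative_cost(expected_costs, predicted_covs, weight=0.1):
    """Hypothetical objective: cumulative expected cost minus a weighted
    entropy bonus over the predicted state distributions of one rollout.

    `weight` is an assumed trade-off parameter; it is not from the source.
    """
    total_cost = sum(expected_costs)
    total_entropy = sum(gaussian_entropy(c) for c in predicted_covs)
    return total_cost - weight * total_entropy
```

Under these assumptions, minimizing `informative_cost` over policy parameters favors rollouts that are both low-cost and predicted with high uncertainty, which is one plausible way to steer execution toward samples that improve the dynamic model. With `weight=0` the objective reduces to the plain cumulative expected cost.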

AN INFORMATION ENTROPY CRITERION OF RADIO SIGNAL RECEPTION QUALITY

Telecommunications and Radio Engineering ◽ 10.1615/telecomradeng.v70.i5.30 ◽ 2011 ◽ Vol 70 (5) ◽ pp. 413-423

Author(s): E. V. Kravtsov ◽ S. N. Panychev

Keyword(s): Information Entropy ◽ Radio Signal ◽ Entropy Criterion

Reinforcement learning control of constrained dynamic systems with uniformly ultimate boundedness stability guarantee

Automatica ◽ 10.1016/j.automatica.2021.109689 ◽ 2021 ◽ Vol 129 ◽ pp. 109689

Author(s): Minghao Han ◽ Yuan Tian ◽ Lixian Zhang ◽ Jun Wang ◽ Wei Pan

Keyword(s): Reinforcement Learning ◽ Dynamic Systems ◽ Learning Control ◽ Ultimate Boundedness ◽ Uniformly Ultimate Boundedness

Dynamic Model and Equilibrium Stability of an Inverted Double Pendulum System

Journal of Physics Conference Series ◽ 10.1088/1742-6596/1858/1/012004 ◽ 2021 ◽ Vol 1858 (1) ◽ pp. 012004

Author(s): Erwin Susanto ◽ Sigit Yuwono

Keyword(s): Dynamic Model ◽ Double Pendulum ◽ Equilibrium Stability ◽ Pendulum System

Reinforcement Learning Tracking Control for Unknown Continuous Dynamic Systems

2021 IEEE 10th Data Driven Control and Learning Systems Conference (DDCLS) ◽ 10.1109/ddcls52934.2021.9455473 ◽ 2021

Author(s): Linqi Ye ◽ Jiayi Li ◽ Changliang Wang ◽ Houde Liu ◽ Bin Liang

Keyword(s): Reinforcement Learning ◽ Dynamic Systems ◽ Tracking Control ◽ Continuous Dynamic

Reinforcement Learning Tracking Control for Robotic Manipulator With Kernel-Based Dynamic Model

IEEE Transactions on Neural Networks and Learning Systems ◽ 10.1109/tnnls.2019.2945019 ◽ 2020 ◽ Vol 31 (9) ◽ pp. 3570-3578 ◽ Cited By ~ 1

Author(s): Yazhou Hu ◽ Wenxue Wang ◽ Hao Liu ◽ Lianqing Liu

Keyword(s): Reinforcement Learning ◽ Dynamic Model ◽ Tracking Control ◽ Robotic Manipulator

Active exploration for robot parameter selection in episodic reinforcement learning

2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL) ◽ 10.1109/adprl.2011.5967378 ◽ 2011 ◽ Cited By ~ 3

Author(s): Oliver Kroemer ◽ Jan Peters

Keyword(s): Reinforcement Learning ◽ Parameter Selection ◽ Active Exploration

Instance-Dependent ℓ∞-Bounds for Policy Evaluation in Tabular Reinforcement Learning

IEEE Transactions on Information Theory ◽ 10.1109/tit.2020.3027316 ◽ 2021 ◽ Vol 67 (1) ◽ pp. 566-585

Author(s): Ashwin Pananjady ◽ Martin J. Wainwright

Keyword(s): Reinforcement Learning ◽ Policy Evaluation

Logic-Based Sequential Decision-Making

Proceedings of the AAAI Conference on Artificial Intelligence ◽ 10.1609/aaai.v33i01.33019995 ◽ 2019 ◽ Vol 33 ◽ pp. 9995-9996

Author(s): Daoming Lyu ◽ Fangkai Yang ◽ Bo Liu ◽ Daesub Yoon

Keyword(s): Decision Making ◽ Reinforcement Learning ◽ High Dimensional ◽ Great Success ◽ Sequential Decision ◽ Sensory Inputs ◽ Hierarchical Decision ◽ High Level ◽ Data Efficiency ◽ Symbolic Planning

Deep reinforcement learning (DRL) has achieved great success by learning directly from high-dimensional sensory inputs, yet it is notorious for its lack of interpretability. Interpretability of the subtasks is critical in hierarchical decision-making because it increases the transparency of black-box-style DRL approaches and helps RL practitioners better understand the high-level behavior of the system. In this paper, we introduce symbolic planning into DRL and propose a framework, Symbolic Deep Reinforcement Learning (SDRL), that can handle both high-dimensional sensory inputs and symbolic planning. Task-level interpretability is enabled by relating symbolic actions to options. The framework features a planner, controller, and meta-controller architecture, whose components take charge of subtask scheduling, data-driven subtask learning, and subtask evaluation, respectively. The three components cross-fertilize each other and eventually converge to an optimal symbolic plan along with the learned subtasks, combining the long-term planning capability of symbolic knowledge with end-to-end reinforcement learning directly from high-dimensional sensory input. Experimental results validate the interpretability of the subtasks and show improved data efficiency compared with state-of-the-art approaches.

A Reinforcement Learning Strategy for the Swing-Up of the Double Pendulum on a Cart

Procedia Manufacturing ◽ 10.1016/j.promfg.2018.06.004 ◽ 2018 ◽ Vol 24 ◽ pp. 15-20 ◽ Cited By ~ 1

Author(s): Michael Hesse ◽ Julia Timmermann ◽ Eyke Hüllermeier ◽ Ansgar Trächtler

Keyword(s): Reinforcement Learning ◽ Learning Strategy ◽ Double Pendulum

JAZZ MELODY GENERATION USING RECURRENT NETWORKS AND REINFORCEMENT LEARNING

International Journal of Artificial Intelligence Tools ◽ 10.1142/s0218213006002849 ◽ 2006 ◽ Vol 15 (04) ◽ pp. 623-650

Author(s): JUDY A. FRANKLIN

Keyword(s): Reinforcement Learning ◽ Dynamic Systems ◽ Recurrent Neural Networks ◽ Short Term Memory ◽ State Of The Art ◽ Recurrent Network ◽ Recurrent Networks ◽ Short Term ◽ Long Short Term Memory ◽ LSTM Network

Recurrent (neural) networks have been deployed as models for learning musical processes by computational scientists who study processes such as dynamic systems. Over time, more intricate music has been learned as the state of the art in recurrent networks has improved. One particular recurrent network, the Long Short-Term Memory (LSTM) network, shows promise for learning long songs and generating new ones. We are experimenting with a module containing two inter-recurrent LSTM networks that cooperatively learn several human melodies, based on the songs' harmonic structures and on the feedback inherent in the network. We show that these networks can learn to reproduce four human melodies. We then present new harmonizations as input in order to generate new songs. We describe the reharmonizations and show the resulting melodies. We also present a hierarchical structure for using reinforcement learning to choose LSTM modules during melody generation.
