Deep Reinforcement Learning via Past-Success Directed Exploration

Author(s):  
Xiaoming Liu ◽  
Zhixiong Xu ◽  
Lei Cao ◽  
Xiliang Chen ◽  
Kai Kang

The balance between exploration and exploitation has always been a core challenge in reinforcement learning. This paper proposes a "past-success exploration strategy combined with Softmax action selection" (PSE-Softmax), an adaptive control method that exploits characteristics of the agent's online learning process to adjust exploration parameters dynamically. The proposed strategy is tested on OpenAI Gym with discrete and continuous control tasks, and the experimental results show that the PSE-Softmax strategy delivers better performance than deep reinforcement learning algorithms with basic exploration strategies.
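
The abstract does not reproduce the update rule, so the following is a minimal Python sketch of the general idea: Boltzmann (Softmax) action selection whose temperature shrinks as recent performance improves. The temperature schedule, the `target` return, and the decay constants are illustrative assumptions, not the paper's PSE rule.

```python
import numpy as np

def softmax_action(q_values, temperature):
    """Boltzmann (Softmax) action selection over estimated action values."""
    prefs = np.asarray(q_values, dtype=float) / max(temperature, 1e-8)
    prefs -= prefs.max()                           # numerical stability
    probs = np.exp(prefs) / np.exp(prefs).sum()
    return np.random.choice(len(q_values), p=probs)

def adapt_temperature(recent_returns, t_min=0.05, t_max=1.0, target=200.0):
    """Illustrative adaptation: shrink the temperature (explore less)
    as recent episode returns approach an assumed target score."""
    if not recent_returns:
        return t_max
    success = np.clip(np.mean(recent_returns) / target, 0.0, 1.0)
    return t_max - (t_max - t_min) * success

# usage: pick an action for one state given its Q-value estimates
q = [1.2, 0.7, 0.9]
temp = adapt_temperature(recent_returns=[180.0, 150.0, 195.0])
a = softmax_action(q, temp)
```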

2014 ◽  
Vol 571-572 ◽  
pp. 105-108
Author(s):  
Lin Xu

This paper proposes a new framework that combines reinforcement learning with a cloud computing digital library. Unified self-learning algorithms, which include reinforcement learning, artificial intelligence, and related techniques, have led to many essential advances. Given the current status of highly available models, analysts urgently desire the deployment of write-ahead logging. In this paper we examine how DNS can be applied to the investigation of superblocks, and introduce reinforcement learning to improve the quality of the current cloud computing digital library. The experimental results show that the method works more efficiently.


2020 ◽  
Vol 34 (04) ◽  
pp. 3316-3323
Author(s):  
Qingpeng Cai ◽  
Ling Pan ◽  
Pingzhong Tang

Reinforcement learning algorithms such as the deep deterministic policy gradient algorithm (DDPG) have been widely used in continuous control tasks. However, the model-free DDPG algorithm suffers from high sample complexity. In this paper we consider deterministic value gradients to improve the sample efficiency of deep reinforcement learning algorithms. Previous works consider deterministic value gradients with a finite horizon, which is too myopic compared with the infinite horizon. We first give a theoretical guarantee of the existence of the value gradients in this infinite-horizon setting. Based on this guarantee, we propose a class of deterministic value gradient algorithms (DVG) with infinite horizon, in which different rollout steps of the analytical gradients through the learned model trade off between the variance of the value gradients and the model bias. Furthermore, to better combine the model-based deterministic value gradient estimators with the model-free deterministic policy gradient estimator, we propose the deterministic value-policy gradient (DVPG) algorithm. We finally conduct extensive experiments comparing DVPG with state-of-the-art methods on several standard continuous control benchmarks. Results demonstrate that DVPG substantially outperforms other baselines.
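
As a rough illustration of the final step, combining the model-based deterministic value gradient with the model-free deterministic policy gradient, the sketch below blends two gradient estimates with a fixed weight. The weight `alpha` and the gradient values are placeholders; the paper's DVPG defines its own combination and rollout-step schedule.

```python
import numpy as np

def blended_policy_gradient(model_based_grad, model_free_grad, alpha):
    """Weighted combination of a model-based deterministic value gradient
    (computed by rolling the learned model forward k steps; larger k trades
    higher variance for lower model bias) and a model-free deterministic
    policy gradient as in DDPG.  alpha is an illustrative mixing weight."""
    return alpha * np.asarray(model_based_grad) + (1.0 - alpha) * np.asarray(model_free_grad)

# usage: two gradient estimates for the same policy parameters
g_model = np.array([0.12, -0.03, 0.40])   # from short rollouts of a learned model
g_free  = np.array([0.10,  0.01, 0.35])   # from the critic, DDPG-style
update_direction = blended_policy_gradient(g_model, g_free, alpha=0.3)
```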


2012 ◽  
Vol 182-183 ◽  
pp. 427-430
Author(s):  
Li Feng Wei ◽  
Liang Cheng ◽  
Xing Man Yang

An adaptive control method for the pulse demagnetizer is presented, which automatically adjusts the strength of the charge current according to changes in the magnetic content so that the magnetic field remains constant. The experimental results show that, compared with conventional demagnetizers, it offers low power consumption, strong anti-interference capability, stable and reliable operation, long service life, and a good demagnetizing effect.
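
A minimal sketch of the kind of feedback adjustment described, assuming a simple proportional correction of the charge current toward a target field; the gain and current limits are illustrative, not values from the paper.

```python
def adjust_charge_current(current, field_measured, field_target,
                          gain=0.1, i_min=0.0, i_max=10.0):
    """Illustrative proportional correction: raise the charge current when the
    measured magnetic field falls below the target and lower it when above,
    clamped to assumed hardware limits."""
    error = field_target - field_measured
    return min(max(current + gain * error, i_min), i_max)

# usage: one control step with placeholder readings
new_current = adjust_charge_current(current=5.0, field_measured=0.8, field_target=1.0)
```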


Author(s):  
Richard Cheng ◽  
Gábor Orosz ◽  
Richard M. Murray ◽  
Joel W. Burdick

Reinforcement Learning (RL) algorithms have found limited success beyond simulated applications, and one main reason is the absence of safety guarantees during the learning process. Real-world systems would realistically fail or break before an optimal controller can be learned. To address this issue, we propose a controller architecture that combines (1) a model-free RL-based controller with (2) model-based controllers utilizing control barrier functions (CBFs) and (3) online learning of the unknown system dynamics, in order to ensure safety during learning. Our general framework leverages the success of RL algorithms to learn high-performance controllers, while the CBF-based controllers both guarantee safety and guide the learning process by constraining the set of explorable policies. We utilize Gaussian Processes (GPs) to model the system dynamics and its uncertainties. Our novel controller synthesis algorithm, RL-CBF, guarantees safety with high probability during the learning process, regardless of the RL algorithm used, and demonstrates greater policy exploration efficiency. We test our algorithm on (1) control of an inverted pendulum and (2) autonomous car-following with wireless vehicle-to-vehicle communication, and show that our algorithm attains much greater sample efficiency in learning than other state-of-the-art algorithms and maintains safety during the entire learning process.
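
A toy illustration of the CBF safety-filter idea for a one-dimensional system; the real RL-CBF method solves an optimization over GP-modelled dynamics, so the dynamics, barrier, and clipping used here are assumptions for exposition only.

```python
def cbf_filter(u_rl, x, x_max=1.0, gamma=2.0):
    """Minimal CBF safety filter for the toy system x_dot = u with the
    barrier h(x) = x_max - x >= 0.  The CBF condition h_dot + gamma*h >= 0
    reduces to u <= gamma * (x_max - x), so the RL action is minimally
    modified (here, clipped) to satisfy it."""
    u_safe_max = gamma * (x_max - x)
    return min(u_rl, u_safe_max)

# usage: the RL policy proposes a large action near the safety boundary
u_safe = cbf_filter(u_rl=1.5, x=0.9)   # filtered down to 0.2
```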


2006 ◽  
Vol 04 (06) ◽  
pp. 1071-1083 ◽  
Author(s):  
C. L. CHEN ◽  
D. Y. DONG ◽  
Z. H. CHEN

This paper proposes a novel action selection method based on quantum computation and reinforcement learning (RL). Inspired by the advantages of quantum computation, the state/action in an RL system is represented as a quantum superposition state. The probability of each action eigenvalue is denoted by a probability amplitude, which is updated according to rewards, and action selection is carried out by observing the quantum state according to the collapse postulate of quantum measurement. The results of simulated experiments show that quantum computation can be effectively applied to action selection and decision making by speeding up learning. This method also makes a good tradeoff between exploration and exploitation for RL using the probability characteristics of quantum theory.
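
A small Python sketch of the amplitude-based selection idea: the selection probability equals the squared amplitude, and amplitudes of rewarded actions are amplified and renormalised. The update rule here is an illustrative stand-in for the paper's amplitude-amplification scheme.

```python
import numpy as np

class QuantumInspiredSelector:
    """Each action carries a probability amplitude; selection samples an
    action with probability equal to the squared amplitude (an analogue of
    the collapse postulate), and rewarded actions are amplified."""

    def __init__(self, n_actions):
        self.amp = np.ones(n_actions) / np.sqrt(n_actions)  # uniform superposition

    def select(self):
        probs = self.amp ** 2
        return np.random.choice(len(self.amp), p=probs / probs.sum())

    def update(self, action, reward, lr=0.1):
        self.amp[action] *= (1.0 + lr * reward)              # amplify if rewarded
        self.amp /= np.linalg.norm(self.amp)                 # keep unit norm

# usage
sel = QuantumInspiredSelector(n_actions=4)
a = sel.select()
sel.update(a, reward=1.0)
```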


2020 ◽  
Vol 17 (2) ◽  
pp. 172988142091995 ◽  
Author(s):  
Yushan Sun ◽  
Xiangrui Ran ◽  
Jian Cao ◽  
Yueming Li

In view of the difficulties in determining the attitude of a wrecked submarine and in automatic attitude matching of deep submergence rescue vehicles during docking and guidance, this study proposes a docking method based on parameter adaptive control with acoustic and visual guidance. This method omits the process of obtaining information about the wrecked submarine in advance, thus saving considerable detection time and improving rescue efficiency. A parameter adaptive controller based on reinforcement learning is designed: the S-plane and proportional-integral-derivative controllers are trained through reinforcement learning to obtain their control parameters, improving the environmental adaptability and anti-current ability of deep submergence rescue vehicles. The effectiveness of the proposed method is demonstrated by simulation and pool tests. The comparison experiment shows that the parameter adaptive controller based on reinforcement learning achieves better control performance, accuracy, and stability than the untrained control method.
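
As a sketch of the controller being tuned, the following discrete PID implementation takes its gains from an external tuner; the hard-coded gains stand in for the RL-learned parameters and it is not the paper's S-plane controller.

```python
class PID:
    """Discrete PID controller whose gains are supplied externally, e.g. by a
    reinforcement-learning-based parameter tuner as described above."""

    def __init__(self, kp, ki, kd, dt=0.1):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# usage: gains produced by the (hypothetical) RL tuner
controller = PID(kp=2.0, ki=0.5, kd=0.1)
u = controller.step(setpoint=1.0, measurement=0.8)
```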


2019 ◽  
Vol 63 (7) ◽  
pp. 995-1003
Author(s):  
Z Xu ◽  
L Cao ◽  
X Chen

Simple and efficient exploration remains a core challenge in deep reinforcement learning. While many exploration methods can be applied to high-dimensional tasks, these methods manually adjust exploration parameters according to domain knowledge. This paper proposes a novel method that automatically balances exploration and exploitation, and combines on-policy and off-policy update targets through dynamic weighting based on the value difference. The proposed method does not directly affect the probability of a selected action; instead, it uses the value difference produced during the learning process to adjust the update target, thereby guiding the direction of the agent's learning. We demonstrate the performance of the proposed method on the CartPole-v1, MountainCar-v0, and LunarLander-v2 classic control tasks from the OpenAI Gym. Empirical evaluation results show that, by integrating on-policy and off-policy update targets dynamically, this method achieves better performance and stability than the exclusive use of either update target.
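
A minimal tabular sketch of mixing on-policy (Sarsa) and off-policy (Q-learning) update targets; the weighting based on their difference is one illustrative choice, not the paper's dynamic scheme.

```python
import numpy as np

def mixed_update_target(q, s_next, a_next, reward, gamma=0.99):
    """Blend the on-policy (Sarsa) target with the off-policy (Q-learning)
    target.  The difference-based weight below is illustrative only."""
    on_policy = reward + gamma * q[s_next, a_next]   # Sarsa target
    off_policy = reward + gamma * q[s_next].max()    # Q-learning target
    diff = abs(off_policy - on_policy)
    beta = 1.0 / (1.0 + diff)                        # larger gap -> lean off-policy
    return beta * on_policy + (1.0 - beta) * off_policy

# usage with a small tabular value function
Q = np.zeros((5, 2))
target = mixed_update_target(Q, s_next=3, a_next=1, reward=1.0)
```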


PLoS ONE ◽  
2021 ◽  
Vol 16 (4) ◽  
pp. e0250040
Author(s):  
Nicola Milano ◽  
Stefano Nolfi

The efficacy of evolutionary or reinforcement learning algorithms for continuous control optimization can be enhanced by including an additional neural network dedicated to feature extraction, trained through self-supervision. In this paper we introduce a method that allows the feature-extraction network to continue training while the control network is trained. We demonstrate that the parallel training of the two networks is crucial for agents that operate on the basis of egocentric observations, and that feature extraction also provides an advantage in problems that do not benefit from dimensionality reduction. Finally, we compare different feature-extraction methods and show that sequence-to-sequence learning outperforms the alternative methods considered in previous studies.
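
A compact PyTorch sketch of the parallel-training idea: a feature-extraction network optimised with a self-supervised reconstruction loss alongside a control network that consumes its features. The reconstruction objective, layer sizes, and single joint optimiser are assumptions; the paper's self-supervision is sequence-to-sequence.

```python
import torch
import torch.nn as nn

obs_dim, feat_dim, act_dim = 12, 4, 2
encoder = nn.Sequential(nn.Linear(obs_dim, feat_dim), nn.Tanh())   # feature extractor
decoder = nn.Linear(feat_dim, obs_dim)                              # self-supervised head
policy = nn.Linear(feat_dim, act_dim)                               # control network

opt = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()) + list(policy.parameters()),
    lr=1e-3,
)

obs = torch.randn(32, obs_dim)                # stand-in observation batch
feats = encoder(obs)
recon_loss = nn.functional.mse_loss(decoder(feats), obs)
# placeholder control objective: a real agent would use an RL or evolutionary signal
control_loss = policy(feats).pow(2).mean()

opt.zero_grad()
(recon_loss + control_loss).backward()        # both networks updated in parallel
opt.step()
```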

