Controlling Agents by Constrained Policy Updates

SYSTEM THEORY, CONTROL AND COMPUTING JOURNAL ◽

10.52846/stccj.2021.1.2.24 ◽

2021 ◽

Vol 1 (2) ◽

pp. 33-39

Author(s):

Mónika Farsang ◽

Luca Szegletes

Keyword(s):

Gradient Methods ◽

Poor Performance ◽

High Dimensional ◽

Complex Behavior ◽

Clear Trend ◽

Learned Behavior ◽

Optimal Behavior ◽

Policy Gradient ◽

Low Dimensional ◽

Policy Optimization

Learning the optimal behavior is the ultimate goal in reinforcement learning. This can be achieved by many different approaches, the most successful of which are policy gradient methods. However, they can suffer from undesirably large policy updates, leading to poor performance. In recent years there has been a clear trend toward designing more reliable algorithms. This paper examines different restriction strategies applied to the widely used Proximal Policy Optimization (PPO-Clip) technique. We also ask whether the analyzed methods can adapt not only to low-dimensional tasks but also to complex, high-dimensional problems in control and robotic domains. The analysis of the learned behavior shows that these methods can lead to better performance than the original PPO-Clip algorithm. Moreover, they are also able to achieve complex behavior and policies in high-dimensional environments.
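For orientation, the clipped surrogate objective that PPO-Clip is generally described with can be sketched as follows. This is a minimal illustration of the standard baseline objective only, not of the restriction strategies the paper studies. The function name `ppo_clip_objective`, the plain-list inputs, and the default `eps=0.2` are assumptions made for this example.

```python
def ppo_clip_objective(ratios, advantages, eps=0.2):
    """Mean clipped surrogate objective (a quantity to maximize).

    ratios[i]     : pi_new(a_i | s_i) / pi_old(a_i | s_i)
    advantages[i] : advantage estimate for the same sample
    eps           : clip range (assumed default, not taken from the paper)
    """
    if len(ratios) != len(advantages) or not ratios:
        raise ValueError("ratios and advantages must be non-empty and equal in length")
    total = 0.0
    for r, adv in zip(ratios, advantages):
        # Restrict the probability ratio to [1 - eps, 1 + eps].
        clipped_r = max(1.0 - eps, min(r, 1.0 + eps))
        # Taking the minimum removes the incentive for overly large updates.
        total += min(r * adv, clipped_r * adv)
    return total / len(ratios)


value = ppo_clip_objective([1.5, 0.5, 1.0], [1.0, -1.0, 2.0], eps=0.2)
```

In the first sample the ratio 1.5 is clipped to 1.2, so pushing the ratio further would not increase the objective; this is the mechanism that limits the size of a policy update.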


Accelerating the training of deep reinforcement learning in autonomous driving

IAES International Journal of Artificial Intelligence (IJ-AI) ◽

10.11591/ijai.v10.i3.pp649-656 ◽

2021 ◽

Vol 10 (3) ◽

pp. 649

Author(s):

Emmanuel Ifeanyi Iroegbu ◽

Devaraj Madhavi

Keyword(s):

Reinforcement Learning ◽

Autonomous Vehicle ◽

Autonomous Driving ◽

High Dimensional ◽

Training Time ◽

Learning Agent ◽

Policy Gradient ◽

Low Dimensional ◽

Policy Optimization

Deep reinforcement learning has been successful in solving common autonomous driving tasks such as lane-keeping by simply using pixel data from the front view camera as input. However, raw pixel data is a very high-dimensional observation that affects the learning quality of the agent due to the complexity imposed by a 'realistic' urban environment. We therefore investigate how compressing the raw pixel data from a high-dimensional state to a low-dimensional latent space offline using a variational autoencoder can significantly improve the training of a deep reinforcement learning agent. We evaluated our method on a simulated autonomous vehicle in CARLA (Car Learning to Act) and compared our results with several baselines, including deep deterministic policy gradient, proximal policy optimization, and soft actor-critic. The results show that the method greatly accelerates training and markedly improves the quality of the deep reinforcement learning agent.


Policy Optimization with Second-Order Advantage Information

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/699 ◽

2018 ◽

Cited By ~ 1

Author(s):

Jiajin Li ◽

Baoxiang Wang ◽

Shengyu Zhang

Keyword(s):

Empirical Studies ◽

Second Order ◽

High Dimensional ◽

Continuous Control ◽

Unified Framework ◽

Performance Improvements ◽

Factorization Structure ◽

Policy Gradient ◽

Policy Optimization ◽

And Control

Policy optimization on high-dimensional continuous control tasks is difficult because of the large variance of the policy gradient estimators. We present the action subspace dependent gradient (ASDG) estimator, which incorporates the Rao-Blackwell theorem (RB) and Control Variates (CV) into a unified framework to reduce the variance. To invoke RB, our proposed algorithm (POSA) learns the underlying factorization structure of the action space based on the second-order advantage information. POSA captures the quadratic information explicitly and efficiently by utilizing the wide & deep architecture. Empirical studies show that our proposed approach improves performance on high-dimensional synthetic settings and OpenAI Gym's MuJoCo continuous control tasks.


A randomized block policy gradient algorithm with differential privacy in Content Centric Networks

International Journal of Distributed Sensor Networks ◽

10.1177/15501477211059934 ◽

2021 ◽

Vol 17 (12) ◽

pp. 155014772110599

Author(s):

Lin Wang ◽

Xingang Xu ◽

Xuhui Zhao ◽

Baozhu Li ◽

Ruijuan Zheng ◽

...

Keyword(s):

Privacy Protection ◽

Differential Privacy ◽

Effective Means ◽

High Dimensional Data ◽

Computational Cost ◽

Gradient Methods ◽

Multimedia Data ◽

Gradient Algorithm ◽

High Dimensional ◽

Policy Gradient

Policy gradient methods are an effective means of solving mobile multimedia data transmission problems in Content Centric Networks. Current policy gradient algorithms impose a high computational cost when processing high-dimensional data. Moreover, the issue of privacy disclosure has not been taken into account, although privacy protection is important in data training. Therefore, we propose a randomized block policy gradient algorithm with differential privacy. To reduce computational complexity when processing high-dimensional data, we randomly select a block coordinate to update the gradients at each round. To solve the privacy protection problem, we add a differential privacy protection mechanism to the algorithm, and we prove that it preserves the [Formula: see text]-privacy level. We conduct extensive simulations in four environments: CartPole, Walker, HalfCheetah, and Hopper. Compared with methods such as importance-sampling momentum-based policy gradient, Hessian-aided momentum-based policy gradient, and REINFORCE, our algorithm converges faster in the same environment.
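The idea of updating one randomly chosen coordinate block per round and perturbing it with noise can be sketched as below. This is only an illustration of the general pattern, not the authors' algorithm: the contiguous block layout, the norm clipping, the Gaussian noise, and every name (`private_block_step`, `clip_norm`, `noise_std`) are assumptions, and the sketch makes no claim about any particular privacy level.

```python
import math
import random


def private_block_step(params, grad, block_size, lr, clip_norm, noise_std, rng):
    """One gradient-ascent step on a single randomly selected block.

    Returns the new parameter list and the (start, stop) range that was updated.
    All hyperparameters here are hypothetical and chosen for illustration.
    """
    n = len(params)
    num_blocks = math.ceil(n / block_size)
    b = rng.randrange(num_blocks)               # pick one block uniformly at random
    start, stop = b * block_size, min((b + 1) * block_size, n)

    block = grad[start:stop]
    norm = math.sqrt(sum(g * g for g in block))
    # Bound the block gradient norm so the added noise has a fixed scale relative to it.
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0

    new_params = list(params)
    for i in range(start, stop):
        noisy_grad = grad[i] * scale + rng.gauss(0.0, noise_std)
        new_params[i] = params[i] + lr * noisy_grad
    return new_params, (start, stop)


rng = random.Random(0)
new_params, updated_range = private_block_step(
    [0.0] * 6, [1.0] * 6, block_size=2, lr=0.1, clip_norm=10.0, noise_std=0.0, rng=rng
)
```

Only the coordinates inside the selected block are touched, which is where the per-round cost saving for high-dimensional parameters comes from.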


Visual Navigation with Asynchronous Proximal Policy Optimization in Artificial Agents

Journal of Robotics ◽

10.1155/2020/8702962 ◽

2020 ◽

Vol 2020 ◽

pp. 1-7

Author(s):

Fanyu Zeng ◽

Chen Wang

Keyword(s):

Reinforcement Learning ◽

Gradient Descent ◽

Gradient Methods ◽

Visual Navigation ◽

Experimental Results ◽

Artificial Agents ◽

Policy Gradient ◽

Policy Optimization ◽

Navigation Method ◽

Better Than

Vanilla policy gradient methods suffer from high variance, leading to unstable policies during training, where the policy’s performance fluctuates drastically between iterations. To address this issue, we analyze the policy optimization process of a navigation method based on deep reinforcement learning (DRL) that uses asynchronous gradient descent for optimization. We present a navigation variant (asynchronous proximal policy optimization navigation, appoNav) that can guarantee monotonic policy improvement during policy optimization. Our experiments are run in DeepMind Lab, and the results show that artificial agents with appoNav perform better than the compared algorithm.


Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization

Operations Research ◽

10.1287/opre.2021.2151 ◽

2021 ◽

Author(s):

Shicong Cen ◽

Chen Cheng ◽

Yuxin Chen ◽

Yuting Wei ◽

Yuejie Chi

Keyword(s):

Reinforcement Learning ◽

Global Convergence ◽

Policy Evaluation ◽

Gradient Methods ◽

Convergence Result ◽

Learning Rates ◽

Wide Range ◽

Policy Gradient ◽

Markov Decision ◽

Policy Optimization

Preconditioning and Regularization Enable Faster Reinforcement Learning

Natural policy gradient (NPG) methods, in conjunction with entropy regularization to encourage exploration, are among the most popular policy optimization algorithms in contemporary reinforcement learning. Despite the empirical success, the theoretical underpinnings for NPG methods remain severely limited. In “Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization”, Cen, Cheng, Chen, Wei, and Chi develop nonasymptotic convergence guarantees for entropy-regularized NPG methods under softmax parameterization, focusing on tabular discounted Markov decision processes. Assuming access to exact policy evaluation, the authors demonstrate that the algorithm converges linearly at an astonishing rate that is independent of the dimension of the state-action space. Moreover, the algorithm is provably stable vis-à-vis inexactness of policy evaluation. Accommodating a wide range of learning rates, this convergence result highlights the role of preconditioning and regularization in enabling fast convergence.
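As a rough illustration, the entropy-regularized NPG update under softmax parameterization is commonly written, for a single state, as a multiplicative update on the action probabilities. The sketch below assumes that form, with `q` standing for exact entropy-regularized Q-values; the function name and argument layout are hypothetical and not taken from the article.

```python
import math


def entropy_npg_step(pi, q, eta, tau, gamma):
    """One entropy-regularized NPG update for a single state (tabular softmax policy).

    Assumed update form:
        pi_new(a) is proportional to
        pi(a) ** (1 - eta * tau / (1 - gamma)) * exp(eta * q[a] / (1 - gamma))
    pi    : current action probabilities (all strictly positive)
    q     : entropy-regularized Q-values for each action (assumed exact)
    eta   : learning rate, tau : regularization weight, gamma : discount factor
    """
    coef = 1.0 - eta * tau / (1.0 - gamma)
    logits = [coef * math.log(p) + eta * qa / (1.0 - gamma) for p, qa in zip(pi, q)]
    m = max(logits)                      # subtract the max for numerical stability
    weights = [math.exp(l - m) for l in logits]
    z = sum(weights)
    return [w / z for w in weights]


gamma, tau = 0.9, 0.5
eta = (1.0 - gamma) / tau                # learning rate at which the old policy drops out
new_pi = entropy_npg_step([0.2, 0.3, 0.5], [1.0, 2.0, 0.5], eta, tau, gamma)
```

With `eta = (1 - gamma) / tau` the exponent on the old policy vanishes and the step reduces to a softmax of `q / tau`, which is one way to see how the regularization weight and the learning rate interact.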


Classification of Brainwaves for Sleep Stages by High-Dimensional FFT Features from EEG Signals

Applied Sciences ◽

10.3390/app10051797 ◽

2020 ◽

Vol 10 (5) ◽

pp. 1797 ◽

Cited By ~ 2

Author(s):

Mera Kartika Delimayanti ◽

Bedy Purnama ◽

Ngoc Giang Nguyen ◽

Mohammad Reza Faisal ◽

Kunti Robiatul Mahmudah ◽

...

Keyword(s):

Machine Learning ◽

Sleep Stage ◽

Machine Learning Algorithms ◽

High Dimensional ◽

Sleep Stages ◽

Eeg Signals ◽

Stage Classification ◽

Sleep Stage Classification ◽

Low Dimensional

Manual classification of sleep stages is a time-consuming but necessary step in the diagnosis and treatment of sleep disorders, and its automation has been an area of active study. Previous works have applied low-dimensional fast Fourier transform (FFT) features together with many machine learning algorithms. In this paper, we demonstrate the use of features extracted from EEG signals via FFT to improve the performance of automated sleep stage classification through machine learning methods. Unlike previous works using FFT, we incorporated thousands of FFT features in order to classify the sleep stages into 2–6 classes. Using the expanded version of the Sleep-EDF dataset with 61 recordings, our method outperformed other state-of-the-art methods. This result indicates that high-dimensional FFT features in combination with simple feature selection are effective for improving automated sleep stage classification.
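To make "FFT features" concrete, the sketch below computes spectral magnitudes of one signal segment, which could then serve as a feature vector. It uses a naive discrete Fourier transform from the standard library for self-containment; a practical pipeline would use an FFT routine instead. The function name `dft_magnitudes` and the synthetic test signal are assumptions for illustration and do not reproduce the paper's feature extraction.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Magnitudes of the non-negative-frequency DFT bins of a real signal.

    Naive O(n^2) transform, kept simple on purpose; an FFT gives the same values faster.
    """
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)
        )
        mags.append(abs(coeff))
    return mags


# Synthetic segment: a sinusoid completing 4 cycles over 32 samples.
segment = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
features = dft_magnitudes(segment)
peak_bin = max(range(len(features)), key=features.__getitem__)
```

Each bin magnitude is one candidate feature, so longer segments or finer frequency resolution quickly yield the thousands of features the abstract mentions.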


A Nonlinear Maximum Correntropy Information Filter for High-Dimensional Neural Decoding

Entropy ◽

10.3390/e23060743 ◽

2021 ◽

Vol 23 (6) ◽

pp. 743

Author(s):

Xi Liu ◽

Shuhang Chen ◽

Xiang Shen ◽

Xiang Zhang ◽

Yiwen Wang

Keyword(s):

State Estimation ◽

Measurement Model ◽

High Dimensional ◽

Neural Firing ◽

The Neural Network ◽

Information Filter ◽

Critical Technology ◽

Dimensional Measurements ◽

Non Gaussian ◽

Low Dimensional

Neural signal decoding is a critical technology in brain machine interfaces (BMI) for interpreting movement intention from multi-neural activity collected from paralyzed patients. As a commonly used decoding algorithm, the Kalman filter is often applied to derive the movement states from high-dimensional neural firing observations. However, its performance is limited, and it is less effective for noisy nonlinear neural systems with high-dimensional measurements. In this paper, we propose a nonlinear maximum correntropy information filter, aiming at better state estimation in the filtering process for a noisy high-dimensional measurement system. We reconstruct the measurement model between the high-dimensional measurements and low-dimensional states using a neural network, and derive the state estimation using the correntropy criterion to cope with non-Gaussian noise and eliminate large initial uncertainty. Moreover, analyses of convergence and robustness are given. The effectiveness of the proposed algorithm is evaluated by applying it to multiple segments of neural spiking data from two rats to interpret the movement states when the subjects perform a two-lever discrimination task. Our results demonstrate better and more robust state estimation performance compared with other filters.


Policy gradient methods for free-electron laser and terahertz source optimization and stabilization at the FERMI free-electron laser at Elettra

Physical Review Accelerators and Beams ◽

10.1103/physrevaccelbeams.23.122802 ◽

2020 ◽

Vol 23 (12) ◽

Author(s):

F. H. O’Shea ◽

N. Bruchon ◽

G. Gaio

Keyword(s):

Free Electron ◽

Free Electron Laser ◽

Gradient Methods ◽

Terahertz Source ◽

Policy Gradient


PSS Business Case Map: Supporting Idea Generation in PSS Design

Volume 3: 38th Design Automation Conference, Parts A and B ◽

10.1115/detc2012-70692 ◽

2012 ◽

Cited By ~ 2

Author(s):

Fumiya Akasaka ◽

Kazuki Fujita ◽

Yoshiki Shimomura

Keyword(s):

Idea Generation ◽

Business Case ◽

Literature Survey ◽

The Self ◽

High Dimensional ◽

Self Organizing Map ◽

Two Dimensional ◽

Service Type ◽

Business Cases ◽

Low Dimensional

This paper proposes the PSS Business Case Map as a tool to support designers’ idea generation in PSS design. The map visualizes the similarities among PSS business cases in a two-dimensional diagram. To make the map, PSS business cases are first collected by conducting, for example, a literature survey. The collected business cases are then classified according to multiple aspects that characterize each case, such as its product type, service type, and target customer. Based on the results of this classification, the similarities among the cases are calculated and visualized using the Self-Organizing Map (SOM) technique. A SOM is a type of artificial neural network that is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional) view of high-dimensional data. The visualization result is offered to designers in the form of a two-dimensional map, which is called the PSS Business Case Map. By using the map, designers can locate the position of their current business and acquire ideas for the servitization of their business.
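A single training step of a Self-Organizing Map, in its usual textbook form, can be sketched as follows. The grid layout, the Gaussian neighborhood, the learning rate, and the function name `som_step` are assumptions for this example; how the business cases are encoded as numeric vectors is not specified in the abstract and is likewise assumed.

```python
import math


def som_step(weights, x, lr=0.5, sigma=1.0):
    """One SOM update: find the best matching unit (BMU), then pull nodes toward x.

    weights : dict mapping grid position (row, col) to a weight vector (list of floats)
    x       : input vector, e.g. a numerically encoded case (hypothetical encoding)
    Returns the BMU position and a new weights dict.
    """
    def sqdist(w):
        return sum((wi - xi) ** 2 for wi, xi in zip(w, x))

    # The BMU is the grid node whose weight vector is closest to the input.
    bmu = min(weights, key=lambda node: sqdist(weights[node]))

    updated = {}
    for node, w in weights.items():
        grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
        # Nodes near the BMU on the grid move more than distant ones.
        h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
        updated[node] = [wi + lr * h * (xi - wi) for wi, xi in zip(w, x)]
    return bmu, updated


grid = {(0, 0): [0.0, 0.0], (0, 1): [1.0, 0.0], (1, 0): [0.0, 1.0], (1, 1): [1.0, 1.0]}
bmu, grid = som_step(grid, [0.9, 0.1])
```

After many such steps over all inputs, similar inputs end up mapped to nearby grid positions, which is what allows the two-dimensional map to be read as a similarity diagram.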


Perspectives of the high-dimensional dynamics of neural microcircuits from the point of view of low-dimensional readouts

Complexity ◽

10.1002/cplx.10089 ◽

2003 ◽

Vol 8 (4) ◽

pp. 39-50 ◽

Cited By ~ 11

Author(s):

Stefan Häusler ◽

Henry Markram ◽

Wolfgang Maass

Keyword(s):

Point Of View ◽

High Dimensional ◽

Low Dimensional
