Inverse design of grating couplers using the policy gradient method from reinforcement learning

Abstract We present a proof-of-concept technique for the inverse design of electromagnetic devices motivated by the policy gradient method in reinforcement learning, named PHORCED (PHotonic Optimization using REINFORCE Criteria for Enhanced Design). This technique uses a probabilistic generative neural network interfaced with an electromagnetic solver to assist in the design of photonic devices, such as grating couplers. We show that PHORCED obtains better-performing grating coupler designs than local gradient-based inverse design via the adjoint method, while potentially providing faster convergence than competing state-of-the-art generative methods. As a further example of the benefits of this method, we implement transfer learning with PHORCED, demonstrating that a neural network trained to optimize 8° grating couplers can then be re-trained on grating couplers with alternate scattering angles while requiring >10× fewer simulations than control cases.
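As a rough illustration of the policy gradient idea this abstract refers to, the sketch below applies a REINFORCE-style score-function update to a Gaussian distribution over design parameters. It is not the PHORCED implementation: the generative neural network is reduced to a learnable mean vector, and `toy_merit` is a hypothetical stand-in for the electromagnetic solver. All names and hyperparameters are assumptions made for the example.

```python
import numpy as np

# Hypothetical optimum of the toy figure of merit (assumption for illustration).
TARGET = np.array([0.3, -0.2, 0.5])

def toy_merit(x):
    """Stand-in for an electromagnetic solver: higher is better.

    Accepts a batch of designs with shape (batch, dim).
    """
    return -np.sum((x - TARGET) ** 2, axis=-1)

def reinforce_optimize(merit, dim, steps=300, batch=64, sigma=0.1, lr=0.05, seed=0):
    """Maximize E[merit(x)] for x ~ N(mu, sigma^2 I) with a score-function gradient."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(dim)
    for _ in range(steps):
        # Sample a batch of candidate designs from the current distribution.
        x = mu + sigma * rng.standard_normal((batch, dim))
        r = merit(x)
        # Subtract the batch mean as a baseline to reduce gradient variance.
        adv = r - r.mean()
        # grad_mu log N(x; mu, sigma^2 I) = (x - mu) / sigma^2
        grad = (adv[:, None] * (x - mu) / sigma**2).mean(axis=0)
        mu = mu + lr * grad  # gradient ascent on the expected merit
    return mu

mu_final = reinforce_optimize(toy_merit, dim=3)
```

Only evaluations of `merit` are needed, never its derivative, which is why this kind of estimator can wrap a black-box solver.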


Reinforcement Learning based on MPC and the Stochastic Policy Gradient Method

2021 American Control Conference (ACC) ◽ 10.23919/acc50511.2021.9482765 ◽ 2021

Author(s): Sebastien Gros ◽ Mario Zanon

Keyword(s): Reinforcement Learning ◽ Gradient Method ◽ Policy Gradient


Bias Correction in Reinforcement Learning via the Deterministic Policy Gradient Method for MPC-Based Policies

2021 American Control Conference (ACC) ◽ 10.23919/acc50511.2021.9483016 ◽ 2021

Author(s): Sebastien Gros ◽ Mario Zanon

Keyword(s): Reinforcement Learning ◽ Gradient Method ◽ Bias Correction ◽ Policy Gradient


Policy Gradient-based Integral Reinforcement Learning for Optimal Control Design of Nonaffine Morphing Aircraft Systems

2020 28th Mediterranean Conference on Control and Automation (MED) ◽ 10.1109/med48518.2020.9183024 ◽ 2020

Author(s): Hanna Lee ◽ Seong-Hun Kim ◽ Youdan Kim

Keyword(s): Optimal Control ◽ Reinforcement Learning ◽ Control Design ◽ Morphing Aircraft ◽ Aircraft Systems ◽ Policy Gradient ◽ Gradient Based


Soft Policy Gradient Method for Maximum Entropy Deep Reinforcement Learning

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽ 10.24963/ijcai.2019/475 ◽ 2019

Cited By ~ 1

Author(s): Wenjie Shi ◽ Shiji Song ◽ Cheng Wu

Keyword(s): Reinforcement Learning ◽ Maximum Entropy ◽ Bellman Equation ◽ Value Functions ◽ Policy Actor ◽ Model Free ◽ Policy Gradient ◽ Gradient Based ◽ Continuous Actions ◽ Stable Learning

Maximum entropy deep reinforcement learning (RL) methods have been demonstrated on a range of challenging continuous tasks. However, existing methods either suffer from severe instability when training on large off-policy data or cannot scale to tasks with very high state and action dimensionality, such as 3D humanoid locomotion. In addition, the argument that the desired Boltzmann policy is optimal when it is defined with respect to a non-optimal soft value function is not fully persuasive. In this paper, we first derive a soft policy gradient based on an entropy-regularized expected reward objective for RL with continuous actions. Then, we present an off-policy, actor-critic, model-free maximum entropy deep RL algorithm called deep soft policy gradient (DSPG) by combining the soft policy gradient with the soft Bellman equation. To ensure stable learning while eliminating the need for two separate critics for soft value functions, we leverage a double sampling approach to make the soft Bellman equation tractable. The experimental results demonstrate that our method outperforms prior off-policy methods.
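For orientation, the entropy-regularized objective mentioned above is commonly written in the maximum entropy RL literature in the following form. This is the standard textbook statement, not necessarily the exact notation of the paper; the temperature $\alpha$, the state-action distribution $\rho_\pi$, and the other symbols are assumptions introduced here.

```latex
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
  \big[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \big],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s)\big) = -\,\mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[ \log \pi(a \mid s) \big]
```

Setting $\alpha = 0$ recovers the usual expected-reward objective; larger $\alpha$ rewards policies that stay more random.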


Diagnostic Evaluation of Policy-Gradient-Based Ranking

Electronics ◽ 10.3390/electronics11010037 ◽ 2021 ◽ Vol 11 (1) ◽ pp. 37

Author(s): Hai-Tao Yu ◽ Degen Huang ◽ Fuji Ren ◽ Lishuang Li

Keyword(s): Reinforcement Learning ◽ Learning To Rank ◽ Careful Examination ◽ Ranking Methods ◽ Adversarial Learning ◽ Wide Range ◽ Depth Analysis ◽ Policy Gradient ◽ Gradient Based ◽ The Impact

Learning-to-rank has been intensively studied and has shown increasing value in a wide range of domains, such as web search, recommender systems, dialogue systems, machine translation, and even computational biology, to name a few. In light of recent advances in neural networks, there has been a strong and continuing interest in exploring how to deploy popular techniques, such as reinforcement learning and adversarial learning, to solve ranking problems. However, most studies that use these techniques focus on showing how effective a new method is. A comprehensive comparison between techniques and an in-depth analysis of their deficiencies are often overlooked. This paper is motivated by the observation that recent ranking methods based on either reinforcement learning or adversarial learning boil down to policy-gradient-based optimization. Based on widely used benchmark collections with complete information (where relevance labels are known for all items), such as MSLR-WEB30K and Yahoo-Set1, we thoroughly investigate the extent to which policy-gradient-based ranking methods are effective. On one hand, we analytically identify the pitfalls of policy-gradient-based ranking. On the other hand, we experimentally compare a wide range of representative methods. The experimental results echo our analysis and show that policy-gradient-based ranking methods are, by a large margin, inferior to many conventional ranking methods. Regardless of whether we use reinforcement learning or adversarial learning, the failures are largely attributable to the gradient estimation based on sampled rankings, which significantly diverge from ideal rankings. In particular, the larger the number of documents per query and the more fine-grained the ground-truth labels, the more policy-gradient-based ranking suffers. Careful examination of this weakness is highly recommended for developing enhanced methods based on policy gradient.
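The gap between sampled and ideal rankings described above can be illustrated with a small experiment. The sketch below draws rankings from a Plackett-Luce model (via the Gumbel perturbation trick) with uninformative scores and compares their DCG with the ideal DCG. The labels, the scoring model, and the choice of DCG as the metric are assumptions made for the example; they are not taken from the paper.

```python
import numpy as np

def dcg(labels_in_rank_order):
    """Discounted cumulative gain with exponential gains and log2 discounts."""
    gains = 2.0 ** np.asarray(labels_in_rank_order, dtype=float) - 1.0
    discounts = np.log2(np.arange(2, len(gains) + 2))
    return float(np.sum(gains / discounts))

def sample_ranking(scores, rng):
    """Draw one ranking from a Plackett-Luce model with log-weights `scores`.

    Sorting scores perturbed by i.i.d. Gumbel noise in descending order
    yields a Plackett-Luce sample.
    """
    gumbel = rng.gumbel(size=len(scores))
    return np.argsort(-(scores + gumbel))

# Hypothetical graded relevance labels for one query with eight documents.
labels = np.array([2, 0, 1, 0, 0, 1, 0, 0])
scores = np.zeros(len(labels))  # an untrained policy: all rankings equally likely

rng = np.random.default_rng(0)
ideal = dcg(np.sort(labels)[::-1])
sampled = [dcg(labels[sample_ranking(scores, rng)]) for _ in range(1000)]
mean_ratio = float(np.mean(sampled)) / ideal  # average sampled DCG relative to ideal
```

With more documents per query or finer-grained labels, a random sample is even less likely to land near the ideal ordering, which is consistent with the trend the abstract reports.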


Application of Neural Network Controller and Policy Gradient Reinforcement Learning on Modular Multilevel Converter (MMC) - a Proof of Concept

10.1109/cieec50170.2021.9511045 ◽ 2021

Author(s): Haiyang Jiang ◽ Yu Chen ◽ Yong Kang

Keyword(s): Neural Network ◽ Reinforcement Learning ◽ Modular Multilevel Converter ◽ Proof Of Concept ◽ Multilevel Converter ◽ Neural Network Controller ◽ Network Controller ◽ Policy Gradient


Distributed neural network-based policy gradient reinforcement learning for multi-robot formations

2008 International Conference on Information and Automation ◽ 10.1109/icinfa.2008.4607978 ◽ 2008

Cited By ~ 2

Author(s): Wen Shang ◽ Dong Sun

Keyword(s): Neural Network ◽ Reinforcement Learning ◽ Policy Gradient ◽ Robot Formations ◽ Multi Robot


Constrained attractor selection using deep reinforcement learning

Journal of Vibration and Control ◽ 10.1177/1077546320930144 ◽ 2020 ◽ pp. 107754632093014

Author(s): Xue-She Wang ◽ James D Turner ◽ Brian P Mann

Keyword(s): Reinforcement Learning ◽ Gradient Method ◽ Nonlinear Dynamical Systems ◽ Nonlinear Dynamical System ◽ Learning Approaches ◽ Multiple Attractors ◽ Nonlinear Dynamical ◽ Cross Entropy Method ◽ Policy Gradient ◽ Attractor Selection

This study describes an approach for attractor selection (or multistability control) in nonlinear dynamical systems with constrained actuation. Attractor selection is obtained using two different deep reinforcement learning methods: (1) the cross-entropy method and (2) the deep deterministic policy gradient method. The framework and algorithms for applying these control methods are presented. Experiments were performed on a Duffing oscillator, as it is a classic nonlinear dynamical system with multiple attractors. Both methods achieve attractor selection under various control constraints. Although these methods have nearly identical success rates, the deep deterministic policy gradient method has the advantages of a high learning rate, low performance variance, and a smooth control approach. This study demonstrates the ability of two reinforcement learning approaches to achieve constrained attractor selection.
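The first of the two methods named above, the cross-entropy method, can be sketched in a few lines. The version below is a generic parameter-search form of the method applied to a toy score function. It does not reproduce the study's Duffing-oscillator setup or its constrained controller, and the function names, population size, and elite fraction are assumptions.

```python
import numpy as np

def cross_entropy_method(score, dim, iters=50, pop=100, elite_frac=0.2, seed=0):
    """Maximize `score` by repeatedly refitting a diagonal Gaussian to elite samples."""
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(dim), np.ones(dim)
    n_elite = int(pop * elite_frac)
    for _ in range(iters):
        # Sample a population of candidate parameter vectors.
        samples = mean + std * rng.standard_normal((pop, dim))
        scores = np.array([score(s) for s in samples])
        # Keep the highest-scoring candidates (argsort is ascending).
        elite = samples[np.argsort(scores)[-n_elite:]]
        # Refit the sampling distribution to the elites; the small constant
        # keeps the standard deviation strictly positive.
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean

# Hypothetical toy objective with its maximum at (1.0, -0.5).
OPTIMUM = np.array([1.0, -0.5])
best = cross_entropy_method(lambda p: -np.sum((p - OPTIMUM) ** 2), dim=2)
```

In a control setting, `score` would be the return of a rollout under the candidate controller parameters, so the method needs no gradient of the dynamics.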


Diversity-Inducing Policy Gradient: Using Maximum Mean Discrepancy to Find a Set of Diverse Policies

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽ 10.24963/ijcai.2019/821 ◽ 2019

Author(s): Muhammad Masood ◽ Finale Doshi-Velez

Keyword(s): Reinforcement Learning ◽ Optimization Technique ◽ Gradient Methods ◽ Domain Expert ◽ Learning Methods ◽ Maximum Mean Discrepancy ◽ Optimal Policies ◽ Policy Gradient ◽ Gradient Based ◽ The Difference

Standard reinforcement learning methods aim to master one way of solving a task, whereas there may exist multiple near-optimal policies. Being able to identify this collection of near-optimal policies can allow a domain expert to efficiently explore the space of reasonable solutions. Unfortunately, existing approaches that quantify uncertainty over policies are not ultimately relevant to finding policies with qualitatively distinct behaviors. In this work, we formalize the difference between policies as a difference between the distributions of trajectories induced by each policy, which encourages diversity with respect to both state visitation and action choices. We derive a gradient-based optimization technique that can be combined with existing policy gradient methods to identify diverse collections of well-performing policies. We demonstrate our approach on benchmarks and a healthcare task.
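The title names maximum mean discrepancy (MMD) as the measure of difference between trajectory distributions. A minimal sketch of a sample-based MMD estimate is given below, assuming each trajectory has already been summarized as a fixed-length feature vector and assuming an RBF kernel; the paper's actual trajectory features and kernel are not stated in the abstract.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth=1.0):
    """Pairwise RBF kernel matrix between the rows of `a` and the rows of `b`."""
    sq_dists = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth**2))

def mmd_squared(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    return float(
        rbf_kernel(x, x, bandwidth).mean()
        + rbf_kernel(y, y, bandwidth).mean()
        - 2.0 * rbf_kernel(x, y, bandwidth).mean()
    )

# Hypothetical trajectory features from three policies (assumption for illustration).
rng = np.random.default_rng(0)
policy_a = rng.standard_normal((200, 2))
policy_b = rng.standard_normal((200, 2))        # behaves like policy_a
policy_c = rng.standard_normal((200, 2)) + 3.0  # visits a different region

similar = mmd_squared(policy_a, policy_b)
different = mmd_squared(policy_a, policy_c)
```

A diversity-seeking objective would add a term that increases this quantity between a new policy and previously found ones, alongside the usual return.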


Reinforcing an Image Caption Generator Using Off-Line Human Feedback

Proceedings of the AAAI Conference on Artificial Intelligence ◽ 10.1609/aaai.v34i03.5655 ◽ 2020 ◽ Vol 34 (03) ◽ pp. 2693-2700

Author(s): Paul Hongsuck Seo ◽ Piyush Sharma ◽ Tomer Levinboim ◽ Bohyung Han ◽ Radu Soricut

Keyword(s): Reinforcement Learning ◽ Gradient Method ◽ Training Data ◽ Evaluation Procedure ◽ Image Captioning ◽ Human Evaluation ◽ Policy Gradient ◽ Evaluation Dataset ◽ Image Caption

Human ratings are currently the most accurate way to assess the quality of an image captioning model, yet most often the only outcome used from an expensive human rating evaluation is a few overall statistics over the evaluation dataset. In this paper, we show that the signal from instance-level human caption ratings can be leveraged to improve captioning models, even when the number of caption ratings is several orders of magnitude smaller than the caption training data. We employ a policy gradient method to maximize the human ratings as rewards in an off-policy reinforcement learning setting, where policy gradients are estimated by samples from a distribution that focuses on the captions in a caption ratings dataset. Our empirical evidence indicates that the proposed method learns to generalize the human raters' judgments to a previously unseen set of images, as judged by a different set of human judges, and additionally on a different, multi-dimensional side-by-side human evaluation procedure.
