Policy Gradient Method based Energy Efficient Task Scheduling in Mobile Edge Blockchain

Author(s):  
Yin Yufeng ◽  
Wu Wenjun ◽  
Dong Junyu ◽  
Gao Yang ◽  
Sun Yang ◽  
...

Nanophotonics ◽
2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Sean Hooten ◽  
Raymond G. Beausoleil ◽  
Thomas Van Vaerenbergh

Abstract We present a proof-of-concept technique for the inverse design of electromagnetic devices motivated by the policy gradient method in reinforcement learning, named PHORCED (PHotonic Optimization using REINFORCE Criteria for Enhanced Design). This technique uses a probabilistic generative neural network interfaced with an electromagnetic solver to assist in the design of photonic devices, such as grating couplers. We show that PHORCED obtains better performing grating coupler designs than local gradient-based inverse design via the adjoint method, while potentially providing faster convergence over competing state-of-the-art generative methods. As a further example of the benefits of this method, we implement transfer learning with PHORCED, demonstrating that a neural network trained to optimize 8° grating couplers can then be re-trained on grating couplers with alternate scattering angles while requiring >10× fewer simulations than control cases.
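
As a rough illustration of the policy-gradient idea behind PHORCED, the sketch below optimizes a vector of design parameters with a REINFORCE update. A Gaussian distribution stands in for the probabilistic generative neural network, and a toy quadratic figure of merit stands in for the electromagnetic solver; all names and numbers are illustrative assumptions, not details from the paper.

```python
# Minimal REINFORCE sketch in the spirit of PHORCED (illustrative only).
# A Gaussian over design parameters replaces the generative network, and
# figure_of_merit() is a toy stand-in for an electromagnetic solver call.
import numpy as np

rng = np.random.default_rng(0)

def figure_of_merit(x):
    # Stand-in for, e.g., a grating-coupler efficiency evaluation.
    return -np.sum((x - 1.5) ** 2, axis=-1)

dim, sigma, lr, batch = 8, 0.3, 0.05, 32
mu = np.zeros(dim)  # mean of the Gaussian "design policy"

for step in range(200):
    x = mu + sigma * rng.standard_normal((batch, dim))  # sample candidate designs
    r = figure_of_merit(x)                              # "simulate" each design
    b = r.mean()                                        # baseline for variance reduction
    # REINFORCE: grad_mu E[R] = E[(R - b) * grad_mu log N(x; mu, sigma^2 I)]
    #                         = E[(R - b) * (x - mu)] / sigma^2
    mu += lr * ((r - b)[:, None] * (x - mu)).mean(axis=0) / sigma**2

print("figure of merit at the learned mean design:", figure_of_merit(mu))
```

Transfer learning as described in the abstract would amount to re-using the trained distribution (here, mu) as the starting point for a new scattering angle, i.e. a shifted figure of merit.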


2006 ◽  
Vol 54 (11) ◽  
pp. 911-920 ◽  
Author(s):  
Takamitsu Matsubara ◽  
Jun Morimoto ◽  
Jun Nakanishi ◽  
Masa-aki Sato ◽  
Kenji Doya

2021 ◽  
Vol 104 ◽  
pp. 104398
Author(s):  
Andrija Petrović ◽  
Mladen Nikolić ◽  
Miloš Jovanović ◽  
Miloš Bijanić ◽  
Boris Delibašić

2020 ◽  
pp. 107754632093014
Author(s):  
Xue-She Wang ◽  
James D Turner ◽  
Brian P Mann

This study describes an approach for attractor selection (or multistability control) in nonlinear dynamical systems with constrained actuation. Attractor selection is achieved using two deep reinforcement learning methods: (1) the cross-entropy method and (2) the deep deterministic policy gradient method. The framework and algorithms for applying these control methods are presented. Experiments were performed on a Duffing oscillator, a classic nonlinear dynamical system with multiple attractors. Both methods achieve attractor selection under various control constraints, and their success rates are nearly identical, but the deep deterministic policy gradient method learns faster, exhibits lower performance variance, and yields smoother control. This study demonstrates that both reinforcement learning approaches can achieve attractor selection under actuation constraints.
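
To make the first of the two methods concrete, here is a hedged sketch of the cross-entropy method selecting the right-hand attractor of a twin-well Duffing oscillator under a hard bound on the forcing amplitude. The oscillator parameters, the piecewise-constant control parameterization, and the cost function are illustrative choices, not the paper's exact experimental setup.

```python
# Cross-entropy method for attractor selection in a twin-well Duffing
# oscillator, x'' + delta*x' + alpha*x + beta*x**3 = F(t), with |F| <= f_max.
# All constants are illustrative; the paper's setup differs in detail.
import numpy as np

rng = np.random.default_rng(1)
delta, alpha, beta = 0.3, -1.0, 1.0   # damping, linear, cubic stiffness
target = 1.0                          # stable equilibria sit at x = -1 and x = +1
f_max, horizon, dt, substeps = 0.4, 20, 0.01, 50  # constrained actuation

def rollout(forces):
    # Start at rest in the left well and apply a piecewise-constant force.
    x, v = -1.0, 0.0
    for f in forces:
        for _ in range(substeps):
            v += (f - delta * v - alpha * x - beta * x**3) * dt
            x += v * dt
    # Let transients decay unforced, then score distance to the target attractor.
    for _ in range(2000):
        v += (-delta * v - alpha * x - beta * x**3) * dt
        x += v * dt
    return (x - target) ** 2 + v**2

mu, sigma = np.zeros(horizon), np.full(horizon, f_max)
for it in range(30):
    pop = np.clip(mu + sigma * rng.standard_normal((64, horizon)), -f_max, f_max)
    costs = np.array([rollout(seq) for seq in pop])
    elite = pop[np.argsort(costs)[:8]]            # keep the 8 best force sequences
    mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-3

print("cost of the learned forcing sequence:", rollout(mu))
```

A deep deterministic policy gradient agent would instead learn a feedback policy F(x, v) rather than an open-loop force sequence, which plausibly underlies the smoother control the abstract reports.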

