Research on Proximal Policy Optimization Algorithm Based on N-step Update

Author(s):  
Zhao Guoqing ◽  
Xu Junming ◽  
Liu Aidong ◽  
Yu Jing
2021 ◽  
Vol 1754 (1) ◽  
pp. 012229
Author(s):  
Jinxiu Hou ◽  
Zhihong Yu ◽  
Qingping Zheng ◽  
Huating Xu ◽  
Shufang Li

2020 ◽  
Vol 2020 ◽  
pp. 1-11
Author(s):  
Weiguang Wang ◽  
Hui Li ◽  
Wenjie Zhang ◽  
Shanlin Wei

D2D communication improves cellular network performance by using proximity-based services between adjacent devices, which is considered an effective way to address the spectrum scarcity caused by tremendous mobile data traffic. If cache-enabled users are willing to send cached files to requesters, content delivery traffic can be offloaded onto the D2D link. In this paper, we strive to maximize the energy efficiency of the D2D caching network through the joint optimization of cache policy and content transmit power. Specifically, based on stochastic geometry-aided modeling of the network, we derive the data offloading rate in closed form, jointly accounting for the successful sensing probability and the successful transmission probability. Building on this offloading rate, we formulate a joint optimization problem over the cache policy and transmit power to maximize the system energy efficiency. To solve this problem, we propose two optimization algorithms: a cache policy optimization algorithm based on gradient updates and a joint optimization algorithm. Simulation results demonstrate that the joint optimization roughly doubles the energy efficiency of the D2D caching network compared with other schemes.
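As a rough illustration of the gradient-based cache policy optimization described above (not the authors' code), the following Python sketch runs projected-gradient ascent on a probabilistic cache policy under a toy energy-efficiency objective; the Zipf popularity model, the offloading-rate expression, and the power model are simplifying assumptions, not the closed-form results derived in the paper.

```python
# Hedged sketch: projected-gradient update of a probabilistic cache policy
# to maximize a simplified energy-efficiency objective.  The offloading-rate
# and power models are illustrative placeholders only.
import numpy as np

def zipf_popularity(num_files, gamma=0.8):
    ranks = np.arange(1, num_files + 1)
    p = ranks ** (-gamma)
    return p / p.sum()

def energy_efficiency(cache_prob, popularity, tx_power, base_power=0.1):
    # Offloaded traffic grows with the probability that a requested file
    # is cached by a nearby device (placeholder model).
    offload_rate = np.sum(popularity * cache_prob)
    total_power = base_power + tx_power * offload_rate
    return offload_rate / total_power

def optimize_cache_policy(popularity, tx_power, cache_size, lr=0.05, iters=500):
    q = np.full(popularity.size, cache_size / popularity.size)  # initial policy
    eps = 1e-5
    for _ in range(iters):
        # Numerical gradient of the EE objective w.r.t. the cache probabilities.
        grad = np.zeros_like(q)
        for i in range(q.size):
            q_plus = q.copy()
            q_plus[i] += eps
            grad[i] = (energy_efficiency(q_plus, popularity, tx_power)
                       - energy_efficiency(q, popularity, tx_power)) / eps
        q = np.clip(q + lr * grad, 0.0, 1.0)          # project onto [0, 1]
        if q.sum() > cache_size:                      # respect the cache budget
            q *= cache_size / q.sum()
    return q

popularity = zipf_popularity(num_files=50)
policy = optimize_cache_policy(popularity, tx_power=0.5, cache_size=5)
```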


Algorithms ◽  
2021 ◽  
Vol 14 (8) ◽  
pp. 240
Author(s):  
Zhandos Kegenbekov ◽  
Ilya Jackson

Adaptive and highly synchronized supply chains can avoid cascading rise-and-fall inventory dynamics and mitigate ripple effects caused by operational failures. This paper aims to demonstrate how a deep reinforcement learning agent based on the proximal policy optimization (PPO) algorithm can synchronize inbound and outbound flows and support business continuity in a stochastic and nonstationary environment, provided end-to-end visibility is available. The agent is built upon PPO, which requires neither a hardcoded action space nor exhaustive hyperparameter tuning. These features, complemented by a straightforward supply chain environment, give rise to a general, task-agnostic approach to adaptive control in multi-echelon supply chains. The proposed approach is compared with the base-stock policy, a well-known method in classic operations research and inventory control theory that is prevalent in continuous-review inventory systems. The paper concludes that the proposed solution can perform adaptive control in complex supply chains, and postulates fully fledged supply chain digital twins as a necessary infrastructural condition for scalable real-world applications.
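For readers unfamiliar with PPO, the sketch below shows its clipped surrogate objective, the core update an agent of this kind relies on. It is a minimal Python illustration with synthetic numbers; the policy network, the supply chain environment, and the advantage estimation are omitted, and it is not the authors' implementation.

```python
# Hedged sketch: the clipped surrogate objective at the core of Proximal
# Policy Optimization, evaluated for a batch of log-probabilities and
# advantage estimates.
import numpy as np

def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective L^CLIP (to be maximized)."""
    ratio = np.exp(new_log_probs - old_log_probs)                  # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return np.mean(np.minimum(unclipped, clipped))

# Toy usage with synthetic numbers, purely for illustration.
rng = np.random.default_rng(0)
old_lp = rng.normal(-1.0, 0.1, size=64)
new_lp = old_lp + rng.normal(0.0, 0.05, size=64)
adv = rng.normal(0.0, 1.0, size=64)
print(ppo_clipped_objective(new_lp, old_lp, adv))
```

Clipping the probability ratio keeps each update close to the policy that collected the data, which is why PPO tends to be robust without exhaustive hyperparameter tuning.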


2020 ◽  
Vol 13 (3) ◽  
pp. 93
Author(s):  
Shijun Wang ◽  
Baocheng Zhu ◽  
Chen Li ◽  
Mingzhe Wu ◽  
James Zhang ◽  
...  

In this paper, we propose a general Riemannian proximal optimization algorithm with guaranteed convergence for solving Markov decision process (MDP) problems. To model policy functions in the MDP, we employ a Gaussian mixture model (GMM) and formulate the task as a non-convex optimization problem in the Riemannian space of positive semidefinite matrices. For two given policy functions, we also provide a lower bound on policy improvement, derived from the Wasserstein distance between GMMs. Preliminary experiments show the efficacy of the proposed Riemannian proximal policy optimization algorithm.
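As a small, self-contained illustration of the geometric quantity involved: the 2-Wasserstein distance between two Gaussian components has a closed form, and component-wise terms of this kind are the usual building blocks when bounding distances between Gaussian mixtures. The Python sketch below computes only that Gaussian-to-Gaussian distance; how the paper assembles such terms into a policy-improvement lower bound is not reproduced here.

```python
# Hedged sketch: closed-form squared 2-Wasserstein distance between two
# Gaussian components N(mean1, cov1) and N(mean2, cov2).
import numpy as np
from scipy.linalg import sqrtm

def gaussian_w2_squared(mean1, cov1, mean2, cov2):
    """||m1 - m2||^2 + Tr(C1 + C2 - 2 (C2^1/2 C1 C2^1/2)^1/2)."""
    mean_term = np.sum((mean1 - mean2) ** 2)
    cov2_sqrt = sqrtm(cov2)
    cross = sqrtm(cov2_sqrt @ cov1 @ cov2_sqrt)
    cov_term = np.trace(cov1 + cov2 - 2.0 * np.real(cross))
    return mean_term + cov_term

# Toy usage: two 2-D Gaussians with different means and covariances.
m1, c1 = np.zeros(2), np.eye(2)
m2, c2 = np.ones(2), 2.0 * np.eye(2)
print(gaussian_w2_squared(m1, c1, m2, c2))
```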


2021 ◽  
Author(s):  
Chao Zhang ◽  
Peisi Zhong ◽  
Zhongyuan Liang ◽  
Mei Liu ◽  
Xiao Wang ◽  
...  
