Separated Trust Regions Policy Optimization Method

Author(s):  
Luobao Zou ◽  
Zhiwei Zhuang ◽  
Yin Cheng ◽  
Xuechun Wang ◽  
Weidong Zhang


Energies ◽  
2021 ◽  
Vol 14 (18) ◽  
pp. 5674
Author(s):  
Ágota Bányai

Optimal predictive, preventive, corrective and opportunistic maintenance policies play an important role in the success of sustainable maintenance operations. This study discusses a new energy efficiency-related maintenance policy optimization method, which is based on failure data and status information from both the physical system and a digital twin-based discrete event simulation. The study presents the functional model, the mathematical model and the solution algorithm. The proposed maintenance optimization method consists of four main phases: computation of energy consumption based on the levelized cost of energy, computation of GHG emissions, computation of the value determination equations, and application of Howard's policy iteration technique. The approach was tested with a scenario analysis in which different electricity generation sources were taken into consideration. The computational results validate the optimization method and show that optimized maintenance policies can lead to an average 38% reduction in energy consumption-related costs. The practical implication of the proposed model and method is the possibility of finding optimal maintenance policies that reduce the energy consumption and emissions arising from the operation and maintenance of manufacturing systems.
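
The value determination and policy improvement steps mentioned above follow Howard's classic policy iteration scheme. Below is a minimal, illustrative sketch of that scheme for a small tabular Markov decision process; the transition matrix, the energy/emission-related cost table and the discount factor are hypothetical placeholders, not the paper's digital twin-based model.

    import numpy as np

    def policy_iteration(P, c, gamma=0.95):
        """Howard's policy iteration for a tabular MDP (illustrative sketch).

        P: (S, A, S) transition probabilities of the assumed model
        c: (S, A) immediate costs, e.g. energy and emission related (hypothetical)
        Returns the cost-minimising policy and its value function.
        """
        S, A, _ = P.shape
        policy = np.zeros(S, dtype=int)
        while True:
            # Value determination: solve (I - gamma * P_pi) v = c_pi for the current policy.
            P_pi = P[np.arange(S), policy]          # (S, S)
            c_pi = c[np.arange(S), policy]          # (S,)
            v = np.linalg.solve(np.eye(S) - gamma * P_pi, c_pi)
            # Policy improvement: greedy one-step lookahead on the cost-to-go.
            q = c + gamma * P @ v                   # (S, A)
            new_policy = q.argmin(axis=1)
            if np.array_equal(new_policy, policy):
                return policy, v
            policy = new_policy

    # Toy example: two machine states (healthy, degraded), two actions (run, maintain).
    P = np.array([[[0.9, 0.1], [1.0, 0.0]],
                  [[0.0, 1.0], [0.8, 0.2]]])
    c = np.array([[1.0, 5.0],
                  [4.0, 6.0]])
    best_policy, cost_to_go = policy_iteration(P, c)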


Author(s):  
Feiyang Pan ◽  
Qingpeng Cai ◽  
An-Xiang Zeng ◽  
Chun-Xiang Pan ◽  
Qing Da ◽  
...  

Model-free reinforcement learning methods such as the Proximal Policy Optimization algorithm (PPO) have been successfully applied to complex decision-making problems such as Atari games. However, these methods suffer from high variance and high sample complexity. On the other hand, model-based reinforcement learning methods that learn the transition dynamics are more sample efficient, but they often suffer from bias in the transition estimation. How to make use of both model-based and model-free learning is a central problem in reinforcement learning. In this paper, we present a new technique to address the tradeoff between exploration and exploitation, which regards the difference between the model-free and model-based estimations as a measure of exploration value. We apply this technique to the PPO algorithm and arrive at a new policy optimization method, named Policy Optimization with Model-based Explorations (POME). POME uses two components to predict the actions' target values: a model-free one estimated by Monte-Carlo sampling and a model-based one which learns a transition model and predicts the value of the next state. POME adds the error between these two target estimations as an additional exploration value for each state-action pair, i.e., it encourages the algorithm to explore states with larger target errors, which are hard to estimate. We compare POME with PPO on the Atari 2600 games, and the results show that POME outperforms PPO on 33 out of 49 games.
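
As a rough illustration of the exploration value described above, the sketch below adds the gap between a Monte-Carlo (model-free) return and a one-step model-based target to the advantages fed into a PPO-style update. The function name, the bonus coefficient and the use of plain discounted returns are assumptions for illustration, not the authors' implementation.

    import numpy as np

    def pome_style_advantages(rewards, values, model_next_values, gamma=0.99, beta=0.01):
        """Blend model-free and model-based targets and use their disagreement
        as an exploration bonus (illustrative sketch, not the paper's code)."""
        rewards = np.asarray(rewards, dtype=float)
        values = np.asarray(values, dtype=float)
        model_next_values = np.asarray(model_next_values, dtype=float)
        T = len(rewards)
        # Model-free target: discounted Monte-Carlo return from each time step.
        mc_return = np.zeros(T)
        running = 0.0
        for t in reversed(range(T)):
            running = rewards[t] + gamma * running
            mc_return[t] = running
        # Model-based target: one-step bootstrap through the learned transition model.
        model_target = rewards + gamma * model_next_values
        # Exploration value: disagreement between the two target estimations.
        exploration = np.abs(mc_return - model_target)
        # Advantages for the PPO loss; the bonus favours hard-to-estimate state-action pairs.
        return mc_return - values + beta * exploration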


2020 ◽  
Vol 2020 ◽  
pp. 1-11
Author(s):  
Hongyi Li ◽  
Xinrui Che

In recent years, the volume of global video traffic has been increasing rapidly, and it has become highly important to offload traffic during video transmission and improve the user experience. In this paper, we propose a novel traffic offloading strategy to provide a feasible and efficient reference for the upcoming 2022 FIFA World Cup in Qatar. First, we present a system framework based on the Mobile Edge Computing (MEC) paradigm, which supports transferring the FIFA World Cup traffic to mobile edge servers. Then, Deep Reinforcement Learning (DRL) is used to provide the traffic scheduling method and minimize the scheduling time of application programs. The task scheduling operation is modeled as a Markov decision process, and the proximal policy optimization method is used to train the deep neural network in the DRL framework. We evaluate the proposed traffic offloading strategy through simulations based on two real datasets, and the experimental results show that it achieves shorter scheduling time, higher bandwidth utilization, and better user experience than two baseline methods.
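
To make the Markov decision process formulation above concrete, the toy environment below exposes the scheduling task through a reset/step interface of the kind a PPO agent would train against: the state bundles server loads and the current task size, the action assigns the task to an edge server, and the reward is the negative completion time. All names, state features and the reward shape are illustrative assumptions rather than the paper's formulation.

    import numpy as np

    class EdgeSchedulingEnv:
        """Toy MDP for offloading video tasks to mobile edge servers (illustrative only)."""

        def __init__(self, n_servers=4, n_tasks=50, seed=0):
            self.n_servers, self.n_tasks = n_servers, n_tasks
            self.rng = np.random.default_rng(seed)

        def reset(self):
            self.loads = np.zeros(self.n_servers)       # pending work per edge server
            self.remaining = self.n_tasks
            self.task = self.rng.uniform(1.0, 5.0)      # size of the next task
            return self._state()

        def _state(self):
            return np.concatenate([self.loads, [self.task, self.remaining]])

        def step(self, action):
            # Assign the current task to the chosen server; the negative completion
            # time rewards the agent for minimising the scheduling time.
            self.loads[action] += self.task
            reward = -self.loads[action]
            self.remaining -= 1
            done = self.remaining == 0
            self.task = self.rng.uniform(1.0, 5.0)
            return self._state(), reward, done, {}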


2020 ◽  
Vol 34 (04) ◽  
pp. 6941-6948
Author(s):  
Qi Zhou ◽  
HouQiang Li ◽  
Jie Wang

Model-based reinforcement learning algorithms tend to achieve higher sample efficiency than model-free methods. However, due to the inevitable errors of learned models, model-based methods struggle to achieve the same asymptotic performance as model-free methods. In this paper, we propose a Policy Optimization method with Model-Based Uncertainty (POMBU), a novel model-based approach that can effectively improve the asymptotic performance by using the uncertainty in Q-values. We derive an upper bound of the uncertainty, based on which we can approximate the uncertainty accurately and efficiently for model-based methods. We further propose an uncertainty-aware policy optimization algorithm that optimizes the policy conservatively to encourage performance improvement with high probability. This can significantly alleviate the overfitting of the policy to inaccurate models. Experiments show that POMBU outperforms existing state-of-the-art policy optimization algorithms in terms of both sample efficiency and asymptotic performance. Moreover, the experiments demonstrate the excellent robustness of POMBU compared to previous model-based approaches.
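
As a rough sketch of what conservative, uncertainty-aware optimization can look like, the snippet below penalises Q-value estimates by their spread across an ensemble of learned models and acts greedily on the resulting lower confidence bound. The ensemble-based uncertainty estimate and the penalty coefficient are assumptions for illustration; the paper derives its own uncertainty bound rather than relying on a plain ensemble standard deviation.

    import numpy as np

    def conservative_q(ensemble_q, kappa=1.0):
        """Uncertainty-penalised Q-values (illustrative sketch).

        ensemble_q: (n_models, n_states, n_actions) Q-values, one estimate per
        learned transition model in the ensemble (a stand-in for the paper's bound).
        """
        q_mean = ensemble_q.mean(axis=0)
        q_std = ensemble_q.std(axis=0)        # proxy for the Q-value uncertainty
        # Optimising against q_mean - kappa * q_std discourages the policy from
        # exploiting model errors, i.e. it encourages improvement with high probability.
        return q_mean - kappa * q_std

    # Example: greedy policy with respect to the conservative estimate.
    ensemble_q = np.random.default_rng(0).normal(size=(5, 10, 4))
    policy = conservative_q(ensemble_q).argmax(axis=1)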


2020 ◽  
Vol 34 (04) ◽  
pp. 4940-4947 ◽  
Author(s):  
Yongshuai Liu ◽  
Jiaxin Ding ◽  
Xin Liu

In this paper, we study reinforcement learning (RL) algorithms for solving real-world decision problems with the objective of maximizing the long-term reward while satisfying cumulative constraints. We propose a novel first-order policy optimization method, Interior-point Policy Optimization (IPO), which augments the objective with logarithmic barrier functions, inspired by the interior-point method. Our proposed method is easy to implement, comes with performance guarantees, and can handle general cumulative multi-constraint settings. We conduct extensive evaluations to compare our approach with state-of-the-art baselines, and our algorithm outperforms them in terms of both reward maximization and constraint satisfaction.
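
A minimal sketch of the augmented objective described above: the clipped PPO surrogate is combined with a logarithmic barrier on the slack of each cumulative constraint, so the penalty grows without bound as a constraint estimate approaches its limit. The clipping constant, the barrier coefficient and the infeasibility handling are illustrative choices, not the paper's exact formulation.

    import numpy as np

    def ipo_objective(ratio, advantage, constraint_values, limits, eps=0.2, t=20.0):
        """Interior-point-style policy objective (illustrative sketch).

        ratio: pi_new(a|s) / pi_old(a|s) for the sampled actions
        advantage: estimated advantages for the reward objective
        constraint_values: current estimates of each cumulative cost J_Ci
        limits: the corresponding constraint thresholds d_i
        """
        ratio, advantage = np.asarray(ratio), np.asarray(advantage)
        # Standard PPO clipped surrogate for the reward term.
        surrogate = np.minimum(ratio * advantage,
                               np.clip(ratio, 1 - eps, 1 + eps) * advantage).mean()
        # Logarithmic barrier: finite while J_Ci < d_i, diverges at the boundary.
        slack = np.asarray(limits) - np.asarray(constraint_values)
        if np.any(slack <= 0):
            return -np.inf                    # infeasible point, reject this update
        return surrogate + np.sum(np.log(slack)) / t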


Processes ◽  
2020 ◽  
Vol 8 (11) ◽  
pp. 1497
Author(s):  
Titus Quah ◽  
Derek Machalek ◽  
Kody M. Powell

One popular method for optimizing systems, referred to as ANN-PSO, uses an artificial neural network (ANN) to approximate the system and an optimization method such as particle swarm optimization (PSO) to select inputs. However, with recent developments in reinforcement learning, it is important to compare ANN-PSO to newer algorithms such as Proximal Policy Optimization (PPO). To investigate the performance and applicability of ANN-PSO and PPO, we compare their methodologies, apply them to the steady-state economic optimization of a chemical process, and compare their results to conventional first-principles modeling with nonlinear programming (FP-NLP). Our results show that ANN-PSO and PPO achieve profits nearly as high as FP-NLP, with PPO achieving slightly higher profits than ANN-PSO. We also find that PPO has the fastest computational times, 10 and 10,000 times faster than FP-NLP and ANN-PSO, respectively. However, PPO requires more training data than ANN-PSO to converge to an optimal policy. This case study suggests that PPO has better performance, as it achieves higher profits and faster online computational times, while ANN-PSO shows better applicability with its capability to train on historical operational data and its higher training efficiency.
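
As a rough sketch of the ANN-PSO loop described above, the code below runs a basic particle swarm over the process inputs and scores each candidate with a surrogate profit model; the predict_profit callable stands in for the trained ANN, and the swarm hyperparameters are illustrative defaults rather than the study's settings.

    import numpy as np

    def pso_maximize(predict_profit, bounds, n_particles=30, iters=200,
                     w=0.7, c1=1.5, c2=1.5, seed=0):
        """Particle swarm search over process inputs, scored by an ANN surrogate.

        predict_profit: callable mapping an input vector to predicted profit
        (a stand-in for the trained ANN); bounds: (low, high) arrays per input.
        """
        rng = np.random.default_rng(seed)
        low, high = map(np.asarray, bounds)
        dim = low.size
        x = rng.uniform(low, high, size=(n_particles, dim))
        v = np.zeros_like(x)
        pbest = x.copy()
        pbest_val = np.array([predict_profit(p) for p in x])
        gbest = pbest[pbest_val.argmax()].copy()
        for _ in range(iters):
            r1, r2 = rng.random((2, n_particles, dim))
            v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
            x = np.clip(x + v, low, high)
            vals = np.array([predict_profit(p) for p in x])
            improved = vals > pbest_val
            pbest[improved], pbest_val[improved] = x[improved], vals[improved]
            gbest = pbest[pbest_val.argmax()].copy()
        return gbest, pbest_val.max()

    # Example with a toy surrogate: predicted profit peaks at inputs (2.0, 3.0).
    best_x, best_profit = pso_maximize(
        lambda u: -((u[0] - 2.0) ** 2 + (u[1] - 3.0) ** 2),
        bounds=(np.array([0.0, 0.0]), np.array([5.0, 5.0])))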


CICTP 2019 ◽  
2019 ◽  
Author(s):  
Yuchen Wang ◽  
Tao Lu ◽  
Hongxing Zhao ◽  
Zhiying Bao

Author(s):  
Fachrudin Hunaini ◽  
Imam Robandi ◽  
Nyoman Sutantra

Fuzzy Logic Control (FLC) is a reliable control approach for nonlinear systems, but optimal membership function parameters are needed to obtain optimal fuzzy logic control results. Therefore, in this paper, Particle Swarm Optimization (PSO) is used as a fast and accurate optimization method to determine the membership function parameters. The optimized control system is simulated on the automatic steering system of a vehicle model, and the results show that the vehicle's lateral motion error can be minimized so that the movement of the vehicle is always maintained on the expected trajectory.
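
The sketch below shows one way the membership function parameters can be exposed to such an optimiser: triangular membership functions are parameterised by their breakpoints, and a fitness function runs the steering simulation and returns the accumulated lateral error for the swarm to minimise. The parameter layout and the simulate_lateral_error callable are hypothetical placeholders, not the authors' model.

    import numpy as np

    def triangular_mf(x, a, b, c):
        """Triangular membership function with breakpoints a <= b <= c."""
        return np.clip(np.minimum((x - a) / (b - a + 1e-12),
                                  (c - x) / (c - b + 1e-12)), 0.0, 1.0)

    def fitness(params, simulate_lateral_error):
        """Fitness evaluated by the PSO: lower accumulated lateral error is better.

        params: flat vector of membership-function breakpoints for the FLC
        simulate_lateral_error: placeholder callable that runs the vehicle's
        automatic steering simulation and returns the lateral-error trace.
        """
        error = np.asarray(simulate_lateral_error(params))
        return np.sum(np.abs(error))          # e.g. integral of absolute error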

