Separated Trust Regions Policy Optimization Method

Author(s):  
Luobao Zou ◽  
Zhiwei Zhuang ◽  
Yin Cheng ◽  
Xuechun Wang ◽  
Weidong Zhang


Energies ◽  
2021 ◽  
Vol 14 (18) ◽  
pp. 5674
Author(s):  
Ágota Bányai

Optimal predictive, preventive, corrective and opportunistic maintenance policies play an important role in the success of sustainable maintenance operations. This study discusses a new energy efficiency-related maintenance policy optimization method, which is based on failure data and status information from both the physical system and a digital twin-based discrete event simulation. The study presents the functional model, the mathematical model and the solution algorithm. The proposed maintenance optimization method consists of four main phases: computation of energy consumption based on the levelized cost of energy, computation of GHG emissions, computation of the value determination equations, and application of Howard's policy iteration technique. The approach was tested with a scenario analysis in which different electricity generation sources were taken into consideration. The computational results validate the optimization method and show that optimized maintenance policies can lead to an average 38% reduction in energy consumption-related costs. The practical implication of the proposed model and method is the possibility of finding optimal maintenance policies that reduce the energy consumption and emissions arising from the operation and maintenance of manufacturing systems.
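
The value determination and policy improvement steps mentioned above follow Howard's classic policy iteration scheme. Below is a minimal, illustrative sketch of that scheme for a small tabular Markov decision process; the transition matrix, the energy/emission-related cost table and the discount factor are hypothetical placeholders, not the paper's digital twin-based model.

    import numpy as np

    def policy_iteration(P, c, gamma=0.95):
        """Howard's policy iteration for a tabular MDP (illustrative sketch).

        P: (S, A, S) transition probabilities of the assumed model
        c: (S, A) immediate costs, e.g. energy and emission related (hypothetical)
        Returns the cost-minimising policy and its value function.
        """
        S, A, _ = P.shape
        policy = np.zeros(S, dtype=int)
        while True:
            # Value determination: solve (I - gamma * P_pi) v = c_pi for the current policy.
            P_pi = P[np.arange(S), policy]          # (S, S)
            c_pi = c[np.arange(S), policy]          # (S,)
            v = np.linalg.solve(np.eye(S) - gamma * P_pi, c_pi)
            # Policy improvement: greedy one-step lookahead on the cost-to-go.
            q = c + gamma * P @ v                   # (S, A)
            new_policy = q.argmin(axis=1)
            if np.array_equal(new_policy, policy):
                return policy, v
            policy = new_policy

    # Toy example: two machine states (healthy, degraded), two actions (run, maintain).
    P = np.array([[[0.9, 0.1], [1.0, 0.0]],
                  [[0.0, 1.0], [0.8, 0.2]]])
    c = np.array([[1.0, 5.0],
                  [4.0, 6.0]])
    best_policy, cost_to_go = policy_iteration(P, c)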


Author(s):  
Feiyang Pan ◽  
Qingpeng Cai ◽  
An-Xiang Zeng ◽  
Chun-Xiang Pan ◽  
Qing Da ◽  
...  

Model-free reinforcement learning methods such as the Proximal Policy Optimization algorithm (PPO) have been successfully applied to complex decision-making problems such as Atari games. However, these methods suffer from high variance and high sample complexity. On the other hand, model-based reinforcement learning methods that learn the transition dynamics are more sample efficient, but they often suffer from bias in the transition estimation. How to make use of both model-based and model-free learning is a central problem in reinforcement learning. In this paper, we present a new technique to address the tradeoff between exploration and exploitation, which regards the difference between the model-free and model-based estimations as a measure of exploration value. We apply this technique to the PPO algorithm and arrive at a new policy optimization method, named Policy Optimization with Model-based Explorations (POME). POME uses two components to predict the actions' target values: a model-free one estimated by Monte-Carlo sampling and a model-based one which learns a transition model and predicts the value of the next state. POME adds the error between these two target estimations as an additional exploration value for each state-action pair, i.e., it encourages the algorithm to explore states with larger target errors, which are hard to estimate. We compare POME with PPO on the Atari 2600 games, and the results show that POME outperforms PPO on 33 out of 49 games.
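
As a rough illustration of the exploration value described above, the sketch below adds the gap between a Monte-Carlo (model-free) return and a one-step model-based target to the advantages fed into a PPO-style update. The function name, the bonus coefficient and the use of plain discounted returns are assumptions for illustration, not the authors' implementation.

    import numpy as np

    def pome_style_advantages(rewards, values, model_next_values, gamma=0.99, beta=0.01):
        """Blend model-free and model-based targets and use their disagreement
        as an exploration bonus (illustrative sketch, not the paper's code)."""
        rewards = np.asarray(rewards, dtype=float)
        values = np.asarray(values, dtype=float)
        model_next_values = np.asarray(model_next_values, dtype=float)
        T = len(rewards)
        # Model-free target: discounted Monte-Carlo return from each time step.
        mc_return = np.zeros(T)
        running = 0.0
        for t in reversed(range(T)):
            running = rewards[t] + gamma * running
            mc_return[t] = running
        # Model-based target: one-step bootstrap through the learned transition model.
        model_target = rewards + gamma * model_next_values
        # Exploration value: disagreement between the two target estimations.
        exploration = np.abs(mc_return - model_target)
        # Advantages for the PPO loss; the bonus favours hard-to-estimate state-action pairs.
        return mc_return - values + beta * exploration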


2020 ◽  
Vol 2020 ◽  
pp. 1-11
Author(s):  
Hongyi Li ◽  
Xinrui Che

In recent years, the volume of global video traffic has been increasing rapidly, and it has become highly important to offload traffic during video transmission and improve the user experience. In this paper, we propose a novel traffic offloading strategy to provide a feasible and efficient reference for the upcoming 2022 FIFA World Cup in Qatar. First, we present a system framework based on the Mobile Edge Computing (MEC) paradigm, which supports transferring the FIFA World Cup traffic to mobile edge servers. Then, Deep Reinforcement Learning (DRL) is used to provide the traffic scheduling method and minimize the scheduling time of application programs. The task scheduling operation is modeled as a Markov decision process, and the proximal policy optimization method is used to train the deep neural network in the DRL framework. We evaluate the proposed traffic offloading strategy through simulations based on two real datasets, and the experimental results show that it achieves shorter scheduling time, higher bandwidth utilization, and better user experience than two baseline methods.
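
To make the Markov decision process formulation above concrete, the toy environment below exposes the scheduling task through a reset/step interface of the kind a PPO agent would train against: the state bundles server loads and the current task size, the action assigns the task to an edge server, and the reward is the negative completion time. All names, state features and the reward shape are illustrative assumptions rather than the paper's formulation.

    import numpy as np

    class EdgeSchedulingEnv:
        """Toy MDP for offloading video tasks to mobile edge servers (illustrative only)."""

        def __init__(self, n_servers=4, n_tasks=50, seed=0):
            self.n_servers, self.n_tasks = n_servers, n_tasks
            self.rng = np.random.default_rng(seed)

        def reset(self):
            self.loads = np.zeros(self.n_servers)       # pending work per edge server
            self.remaining = self.n_tasks
            self.task = self.rng.uniform(1.0, 5.0)      # size of the next task
            return self._state()

        def _state(self):
            return np.concatenate([self.loads, [self.task, self.remaining]])

        def step(self, action):
            # Assign the current task to the chosen server; the negative completion
            # time rewards the agent for minimising the scheduling time.
            self.loads[action] += self.task
            reward = -self.loads[action]
            self.remaining -= 1
            done = self.remaining == 0
            self.task = self.rng.uniform(1.0, 5.0)
            return self._state(), reward, done, {}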


2020 ◽  
Vol 34 (04) ◽  
pp. 6941-6948
Author(s):  
Qi Zhou ◽  
HouQiang Li ◽  
Jie Wang

Model-based reinforcement learning algorithms tend to achieve higher sample efficiency than model-free methods. However, due to the inevitable errors of learned models, model-based methods struggle to achieve the same asymptotic performance as model-free methods. In this paper, we propose a Policy Optimization method with Model-Based Uncertainty (POMBU), a novel model-based approach that can effectively improve the asymptotic performance by using the uncertainty in Q-values. We derive an upper bound of the uncertainty, based on which we can approximate the uncertainty accurately and efficiently for model-based methods. We further propose an uncertainty-aware policy optimization algorithm that optimizes the policy conservatively to encourage performance improvement with high probability. This can significantly alleviate the overfitting of the policy to inaccurate models. Experiments show that POMBU outperforms existing state-of-the-art policy optimization algorithms in terms of both sample efficiency and asymptotic performance. Moreover, the experiments demonstrate the excellent robustness of POMBU compared to previous model-based approaches.
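
As a rough sketch of what conservative, uncertainty-aware optimization can look like, the snippet below penalises Q-value estimates by their spread across an ensemble of learned models and acts greedily on the resulting lower confidence bound. The ensemble-based uncertainty estimate and the penalty coefficient are assumptions for illustration; the paper derives its own uncertainty bound rather than relying on a plain ensemble standard deviation.

    import numpy as np

    def conservative_q(ensemble_q, kappa=1.0):
        """Uncertainty-penalised Q-values (illustrative sketch).

        ensemble_q: (n_models, n_states, n_actions) Q-values, one estimate per
        learned transition model in the ensemble (a stand-in for the paper's bound).
        """
        q_mean = ensemble_q.mean(axis=0)
        q_std = ensemble_q.std(axis=0)        # proxy for the Q-value uncertainty
        # Optimising against q_mean - kappa * q_std discourages the policy from
        # exploiting model errors, i.e. it encourages improvement with high probability.
        return q_mean - kappa * q_std

    # Example: greedy policy with respect to the conservative estimate.
    ensemble_q = np.random.default_rng(0).normal(size=(5, 10, 4))
    policy = conservative_q(ensemble_q).argmax(axis=1)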


2020 ◽  
Vol 34 (04) ◽  
pp. 4940-4947 ◽  
Author(s):  
Yongshuai Liu ◽  
Jiaxin Ding ◽  
Xin Liu

In this paper, we study reinforcement learning (RL) algorithms for solving real-world decision problems with the objective of maximizing the long-term reward while satisfying cumulative constraints. We propose a novel first-order policy optimization method, Interior-point Policy Optimization (IPO), which augments the objective with logarithmic barrier functions, inspired by the interior-point method. Our proposed method is easy to implement, comes with performance guarantees, and can handle general cumulative multi-constraint settings. We conduct extensive evaluations to compare our approach with state-of-the-art baselines, and our algorithm outperforms them in terms of both reward maximization and constraint satisfaction.
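
A minimal sketch of the augmented objective described above: the clipped PPO surrogate is combined with a logarithmic barrier on the slack of each cumulative constraint, so the penalty grows without bound as a constraint estimate approaches its limit. The clipping constant, the barrier coefficient and the infeasibility handling are illustrative choices, not the paper's exact formulation.

    import numpy as np

    def ipo_objective(ratio, advantage, constraint_values, limits, eps=0.2, t=20.0):
        """Interior-point-style policy objective (illustrative sketch).

        ratio: pi_new(a|s) / pi_old(a|s) for the sampled actions
        advantage: estimated advantages for the reward objective
        constraint_values: current estimates of each cumulative cost J_Ci
        limits: the corresponding constraint thresholds d_i
        """
        ratio, advantage = np.asarray(ratio), np.asarray(advantage)
        # Standard PPO clipped surrogate for the reward term.
        surrogate = np.minimum(ratio * advantage,
                               np.clip(ratio, 1 - eps, 1 + eps) * advantage).mean()
        # Logarithmic barrier: finite while J_Ci < d_i, diverges at the boundary.
        slack = np.asarray(limits) - np.asarray(constraint_values)
        if np.any(slack <= 0):
            return -np.inf                    # infeasible point, reject this update
        return surrogate + np.sum(np.log(slack)) / t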


Processes ◽  
2020 ◽  
Vol 8 (11) ◽  
pp. 1497
Author(s):  
Titus Quah ◽  
Derek Machalek ◽  
Kody M. Powell

One popular method for optimizing systems, referred to as ANN-PSO, uses an artificial neural network (ANN) to approximate the system and an optimization method such as particle swarm optimization (PSO) to select inputs. However, with recent developments in reinforcement learning, it is important to compare ANN-PSO to newer algorithms such as Proximal Policy Optimization (PPO). To investigate the performance and applicability of ANN-PSO and PPO, we compare their methodologies, apply them to the steady-state economic optimization of a chemical process, and compare their results to conventional first-principles modeling with nonlinear programming (FP-NLP). Our results show that ANN-PSO and PPO achieve profits nearly as high as FP-NLP, with PPO achieving slightly higher profits than ANN-PSO. We also find that PPO has the fastest computational times, 10 and 10,000 times faster than FP-NLP and ANN-PSO, respectively. However, PPO requires more training data than ANN-PSO to converge to an optimal policy. This case study suggests that PPO has better performance, as it achieves higher profits and faster online computational times, while ANN-PSO shows better applicability with its capability to train on historical operational data and its higher training efficiency.
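
As a rough sketch of the ANN-PSO loop described above, the code below runs a basic particle swarm over the process inputs and scores each candidate with a surrogate profit model; the predict_profit callable stands in for the trained ANN, and the swarm hyperparameters are illustrative defaults rather than the study's settings.

    import numpy as np

    def pso_maximize(predict_profit, bounds, n_particles=30, iters=200,
                     w=0.7, c1=1.5, c2=1.5, seed=0):
        """Particle swarm search over process inputs, scored by an ANN surrogate.

        predict_profit: callable mapping an input vector to predicted profit
        (a stand-in for the trained ANN); bounds: (low, high) arrays per input.
        """
        rng = np.random.default_rng(seed)
        low, high = map(np.asarray, bounds)
        dim = low.size
        x = rng.uniform(low, high, size=(n_particles, dim))
        v = np.zeros_like(x)
        pbest = x.copy()
        pbest_val = np.array([predict_profit(p) for p in x])
        gbest = pbest[pbest_val.argmax()].copy()
        for _ in range(iters):
            r1, r2 = rng.random((2, n_particles, dim))
            v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
            x = np.clip(x + v, low, high)
            vals = np.array([predict_profit(p) for p in x])
            improved = vals > pbest_val
            pbest[improved], pbest_val[improved] = x[improved], vals[improved]
            gbest = pbest[pbest_val.argmax()].copy()
        return gbest, pbest_val.max()

    # Example with a toy surrogate: predicted profit peaks at inputs (2.0, 3.0).
    best_x, best_profit = pso_maximize(
        lambda u: -((u[0] - 2.0) ** 2 + (u[1] - 3.0) ** 2),
        bounds=(np.array([0.0, 0.0]), np.array([5.0, 5.0])))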


CICTP 2019 ◽  
2019 ◽  
Author(s):  
Yuchen Wang ◽  
Tao Lu ◽  
Hongxing Zhao ◽  
Zhiying Bao

Author(s):  
Fachrudin Hunaini ◽  
Imam Robandi ◽  
Nyoman Sutantra

Fuzzy Logic Control (FLC) is a reliable control approach for nonlinear systems, but optimal membership function parameters are needed to obtain optimal fuzzy logic control results. Therefore, in this paper, Particle Swarm Optimization (PSO) is used as a fast and accurate optimization method to determine the membership function parameters. The optimized control system is simulated on the automatic steering system of a vehicle model, and the results show that the vehicle's lateral motion error can be minimized so that the movement of the vehicle is always maintained on the expected trajectory.
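
The sketch below shows one way the membership function parameters can be exposed to such an optimiser: triangular membership functions are parameterised by their breakpoints, and a fitness function runs the steering simulation and returns the accumulated lateral error for the swarm to minimise. The parameter layout and the simulate_lateral_error callable are hypothetical placeholders, not the authors' model.

    import numpy as np

    def triangular_mf(x, a, b, c):
        """Triangular membership function with breakpoints a <= b <= c."""
        return np.clip(np.minimum((x - a) / (b - a + 1e-12),
                                  (c - x) / (c - b + 1e-12)), 0.0, 1.0)

    def fitness(params, simulate_lateral_error):
        """Fitness evaluated by the PSO: lower accumulated lateral error is better.

        params: flat vector of membership-function breakpoints for the FLC
        simulate_lateral_error: placeholder callable that runs the vehicle's
        automatic steering simulation and returns the lateral-error trace.
        """
        error = np.asarray(simulate_lateral_error(params))
        return np.sum(np.abs(error))          # e.g. integral of absolute error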

