Multi-objective reinforcement learning for acquiring all Pareto optimal policies simultaneously - Method of determining scalarization weights

Author(s): Hitoshi Iima, Yasuaki Kuroe

Author(s): Tomohiro Yamaguchi, Shota Nagahama, Yoshihiro Ichikawa, Yoshimichi Honma, Keiki Takadama

This chapter addresses multi-objective reinforcement learning (MORL) problems with multiple conflicting objectives whose weights are unknown. Previous model-free MORL methods require a large number of calculations to collect the Pareto optimal set of V/Q-value vectors. Model-based MORL can reduce this calculation cost compared with model-free MORL; however, the previous model-based method applies only to deterministic environments. To address these issues, this chapter proposes a novel model-based MORL method based on a reward occurrence probability (ROP) vector with unknown weights. Experimental results are reported for stochastic learning environments with up to 10 states, 3 actions, and 3 reward rules. They show that the proposed method collects all Pareto optimal policies, with a total learning time of about 214 seconds in the largest setting (10 states, 3 actions, 3 reward rules). As future research directions, ways to speed up the method and uses for non-optimal policies are discussed.
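As a rough illustration of why collecting every Pareto optimal policy pays off under linear scalarization with unknown weights, the sketch below (plain Python with made-up ROP vectors; the chapter itself does not provide this code) shows that once each policy's ROP vector is learned, the best policy for any later-specified weight vector reduces to a dot-product lookup:

```python
import numpy as np

# Hypothetical ROP vectors of three learned policies (rows) over three
# reward rules; the real values would be estimated during learning.
rop = np.array([[0.6, 0.1, 0.3],
                [0.2, 0.5, 0.3],
                [0.4, 0.4, 0.2]])

def best_policy(weights):
    """Pick the policy maximizing the scalarized value w . ROP.
    With all Pareto optimal ROP vectors stored, this is a lookup:
    no relearning is needed once the weights are finally revealed."""
    return int(np.argmax(rop @ np.asarray(weights)))

print(best_policy([0.8, 0.1, 0.1]))  # -> 0 (weights favour reward rule 1)
print(best_policy([0.1, 0.8, 0.1]))  # -> 1 (weights favour reward rule 2)
```

Under linear scalarization, any weight vector selects some stored Pareto optimal policy, which is why collecting the whole set up front suffices.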


2021, Vol. 70, pp. 319-349
Author(s): Yongcan Cao, Huixin Zhan

Solving multi-objective optimization problems is important in applications where users seek optimal policies subject to multiple, often conflicting, objectives. A typical approach is to construct a loss function by scalarizing the individual objectives and then derive optimal policies that minimize it. Although simple and efficient, this approach offers no insight into the optimization of multiple objectives because it cannot quantify the inter-objective relationship. To address this issue, we propose an efficient gradient-based multi-objective reinforcement learning approach that iteratively uncovers the quantitative inter-objective relationship by finding a minimum-norm point in the convex hull of the set of policy gradients when the impact of one objective on the others is unknown a priori. In particular, we first propose PAOLS, an algorithm that integrates pruning with the approximate optimistic linear support algorithm to efficiently discover the weight-vector sets of multiple gradients that quantify the inter-objective relationship. We then construct an actor and a multi-objective critic that co-learn the policy and the multi-objective vector value function. Finally, the weight-discovery process and the policy and vector-value-function learning process are executed iteratively until the weight-vector sets and policies stabilize. To validate the effectiveness of the proposed approach, we present a quantitative evaluation on three case studies.
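For intuition about the minimum-norm-point step, the two-objective case admits a well-known closed form. The sketch below is the standard multiple-gradient-descent computation, not code from the paper; for more than two gradients a solver (or the paper's PAOLS machinery) would be needed:

```python
import numpy as np

def min_norm_point_2d(g1, g2):
    """Minimum-norm point in the convex hull of two policy gradients.

    Minimizes ||a*g1 + (1-a)*g2|| over a in [0, 1]; the result is a
    common descent direction for both objectives. Illustrative sketch
    of the two-objective case only.
    """
    g1, g2 = np.asarray(g1, float), np.asarray(g2, float)
    diff = g1 - g2
    denom = diff @ diff
    if denom == 0.0:                        # identical gradients
        return g1
    alpha = np.clip((g2 - g1) @ g2 / denom, 0.0, 1.0)
    return alpha * g1 + (1.0 - alpha) * g2

# Two conflicting gradients: the min-norm combination balances them.
print(min_norm_point_2d([1.0, 0.0], [0.0, 1.0]))  # -> [0.5 0.5]
```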


Author(s): Yiguang Gong, Yunping Liu, Chuanyang Yin

Edge computing extends traditional cloud services to the edge of the network, closer to users, and is suitable for network services with low latency requirements. With the rise of edge computing, its security issues have also received increasing attention. In this paper, a novel two-phase cycle algorithm is proposed for effective cyber intrusion detection in edge computing, based on a multi-objective genetic algorithm (MOGA) and a modified back-propagation neural network (MBPNN), namely TPC-MOGA-MBPNN. In the first phase, the MOGA is employed to build a multi-objective optimization model that tries to find the Pareto optimal parameter set for the MBPNN. The Pareto optimal parameter set simultaneously minimizes the average false positive rate (Avg FPR), the mean squared error (MSE), and the negative average true positive rate (Avg TPR) on the dataset. In the second phase, several MBPNNs are created from the parameter set obtained by the MOGA and trained to search locally for a more optimal parameter set. The parameter set obtained in the second phase is fed back as input to the first phase, and the training process is repeated until the termination criteria are reached. A benchmark dataset, KDD Cup 1999, is used to demonstrate and validate the performance of the proposed approach for intrusion detection. The approach discovers a pool of MBPNN-based solutions, and combining these solutions significantly improves detection performance; a GA is used to find the optimal MBPNN combination. The results show that the proposed approach achieves an accuracy of 98.81% and a detection rate of 98.23%, outperforming most systems reported in the literature. In addition, the proposed approach is a generalized classification approach applicable to problems in any field with multiple conflicting objectives.
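A minimal sketch of the Pareto filtering underlying the first phase, assuming each candidate MBPNN parameter set is scored by the triple (Avg FPR, MSE, -Avg TPR), all to be minimized; the helper name and numbers are illustrative, not from the paper:

```python
import numpy as np

def non_dominated_min(objectives):
    """Return indices of candidates not dominated under simultaneous
    minimization: u dominates v if u <= v in every objective and
    u < v in at least one. Illustrative helper, not the paper's code."""
    obj = np.asarray(objectives)           # shape: (n_candidates, 3)
    keep = []
    for i, v in enumerate(obj):
        dominated = any(
            np.all(u <= v) and np.any(u < v)
            for j, u in enumerate(obj) if j != i
        )
        if not dominated:
            keep.append(i)
    return keep

# (Avg FPR, MSE, -Avg TPR) for three hypothetical MBPNN parameter sets;
# the second candidate dominates the third.
print(non_dominated_min([[0.02, 0.10, -0.98],
                         [0.01, 0.12, -0.97],
                         [0.03, 0.12, -0.97]]))  # -> [0, 1]
```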

