Multi-objective reinforcement learning for acquiring all Pareto optimal policies simultaneously - Method of determining scalarization weights

Author(s): Hitoshi Iima, Yasuaki Kuroe

Author(s): Tomohiro Yamaguchi, Shota Nagahama, Yoshihiro Ichikawa, Yoshimichi Honma, Keiki Takadama

This chapter addresses multi-objective reinforcement learning (MORL) problems with multiple conflicting objectives whose weights are unknown. Previous model-free MORL methods require a large number of calculations to collect the Pareto optimal set of V/Q-value vectors. Model-based MORL can reduce this calculation cost compared with model-free MORL; however, the previous model-based method applies only to deterministic environments. To address these issues, this chapter proposes a novel model-based MORL method based on a reward occurrence probability (ROP) vector with unknown weights. Experimental results are reported for stochastic learning environments with up to 10 states, 3 actions, and 3 reward rules. They show that the proposed method collects all Pareto optimal policies, with a total learning time of about 214 seconds in the largest setting (10 states, 3 actions, 3 reward rules). As future research directions, ways to speed up the method and uses for non-optimal policies are discussed.
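As a rough illustration of why collecting every Pareto optimal policy pays off under linear scalarization with unknown weights, the sketch below (plain Python with made-up ROP vectors; the chapter itself does not provide this code) shows that once each policy's ROP vector is learned, the best policy for any later-specified weight vector reduces to a dot-product lookup:

```python
import numpy as np

# Hypothetical ROP vectors of three learned policies (rows) over three
# reward rules; the real values would be estimated during learning.
rop = np.array([[0.6, 0.1, 0.3],
                [0.2, 0.5, 0.3],
                [0.4, 0.4, 0.2]])

def best_policy(weights):
    """Pick the policy maximizing the scalarized value w . ROP.
    With all Pareto optimal ROP vectors stored, this is a lookup:
    no relearning is needed once the weights are finally revealed."""
    return int(np.argmax(rop @ np.asarray(weights)))

print(best_policy([0.8, 0.1, 0.1]))  # -> 0 (weights favour reward rule 1)
print(best_policy([0.1, 0.8, 0.1]))  # -> 1 (weights favour reward rule 2)
```

Under linear scalarization, any weight vector selects some stored Pareto optimal policy, which is why collecting the whole set up front suffices.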


2021, Vol. 70, pp. 319-349
Author(s): Yongcan Cao, Huixin Zhan

Solving multi-objective optimization problems is important in applications where users seek optimal policies subject to multiple, often conflicting, objectives. A typical approach is to construct a loss function by scalarizing the individual objectives and then derive optimal policies that minimize it. Although simple and efficient, this approach offers no insight into the optimization of multiple objectives because it cannot quantify the inter-objective relationship. To address this issue, we propose an efficient gradient-based multi-objective reinforcement learning approach that iteratively uncovers the quantitative inter-objective relationship by finding a minimum-norm point in the convex hull of the set of policy gradients when the impact of one objective on the others is unknown a priori. In particular, we first propose PAOLS, an algorithm that integrates pruning with the approximate optimistic linear support algorithm to efficiently discover the weight-vector sets of multiple gradients that quantify the inter-objective relationship. We then construct an actor and a multi-objective critic that co-learn the policy and the multi-objective vector value function. Finally, the weight-discovery process and the policy and vector-value-function learning process are executed iteratively until the weight-vector sets and policies stabilize. To validate the effectiveness of the proposed approach, we present a quantitative evaluation on three case studies.
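For intuition about the minimum-norm-point step, the two-objective case admits a well-known closed form. The sketch below is the standard multiple-gradient-descent computation, not code from the paper; for more than two gradients a solver (or the paper's PAOLS machinery) would be needed:

```python
import numpy as np

def min_norm_point_2d(g1, g2):
    """Minimum-norm point in the convex hull of two policy gradients.

    Minimizes ||a*g1 + (1-a)*g2|| over a in [0, 1]; the result is a
    common descent direction for both objectives. Illustrative sketch
    of the two-objective case only.
    """
    g1, g2 = np.asarray(g1, float), np.asarray(g2, float)
    diff = g1 - g2
    denom = diff @ diff
    if denom == 0.0:                        # identical gradients
        return g1
    alpha = np.clip((g2 - g1) @ g2 / denom, 0.0, 1.0)
    return alpha * g1 + (1.0 - alpha) * g2

# Two conflicting gradients: the min-norm combination balances them.
print(min_norm_point_2d([1.0, 0.0], [0.0, 1.0]))  # -> [0.5 0.5]
```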


Author(s): Yiguang Gong, Yunping Liu, Chuanyang Yin

Edge computing extends traditional cloud services to the edge of the network, closer to users, and is suitable for network services with low latency requirements. With the rise of edge computing, its security issues have also received increasing attention. In this paper, a novel two-phase cycle algorithm is proposed for effective cyber intrusion detection in edge computing, based on a multi-objective genetic algorithm (MOGA) and a modified back-propagation neural network (MBPNN), namely TPC-MOGA-MBPNN. In the first phase, the MOGA is employed to build a multi-objective optimization model that tries to find the Pareto optimal parameter set for the MBPNN. The Pareto optimal parameter set simultaneously minimizes the average false positive rate (Avg FPR), the mean squared error (MSE), and the negative average true positive rate (Avg TPR) on the dataset. In the second phase, several MBPNNs are created from the parameter set obtained by the MOGA and trained to search locally for a more optimal parameter set. The parameter set obtained in the second phase is fed back as input to the first phase, and the training process is repeated until the termination criteria are reached. A benchmark dataset, KDD Cup 1999, is used to demonstrate and validate the performance of the proposed approach for intrusion detection. The approach discovers a pool of MBPNN-based solutions, and combining these solutions significantly improves detection performance; a GA is used to find the optimal MBPNN combination. The results show that the proposed approach achieves an accuracy of 98.81% and a detection rate of 98.23%, outperforming most systems reported in the literature. In addition, the proposed approach is a generalized classification approach applicable to problems in any field with multiple conflicting objectives.
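A minimal sketch of the Pareto filtering underlying the first phase, assuming each candidate MBPNN parameter set is scored by the triple (Avg FPR, MSE, -Avg TPR), all to be minimized; the helper name and numbers are illustrative, not from the paper:

```python
import numpy as np

def non_dominated_min(objectives):
    """Return indices of candidates not dominated under simultaneous
    minimization: u dominates v if u <= v in every objective and
    u < v in at least one. Illustrative helper, not the paper's code."""
    obj = np.asarray(objectives)           # shape: (n_candidates, 3)
    keep = []
    for i, v in enumerate(obj):
        dominated = any(
            np.all(u <= v) and np.any(u < v)
            for j, u in enumerate(obj) if j != i
        )
        if not dominated:
            keep.append(i)
    return keep

# (Avg FPR, MSE, -Avg TPR) for three hypothetical MBPNN parameter sets;
# the second candidate dominates the third.
print(non_dominated_min([[0.02, 0.10, -0.98],
                         [0.01, 0.12, -0.97],
                         [0.03, 0.12, -0.97]]))  # -> [0, 1]
```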

