Discounted Sampling Policy Gradient for Robot Multi-objective Visual Control

Author(s):  
Meng Xu ◽  
Qingfu Zhang ◽  
Jianping Wang


2016 ◽  
Vol 57 ◽  
pp. 187-227 ◽  
Author(s):  
Simone Parisi ◽  
Matteo Pirotta ◽  
Marcello Restelli

Many real-world control applications, from economics to robotics, are characterized by the presence of multiple conflicting objectives. In these problems, the standard concept of optimality is replaced by Pareto-optimality and the goal is to find the Pareto frontier, a set of solutions representing different compromises among the objectives. Despite recent advances in multi-objective optimization, achieving an accurate representation of the Pareto frontier is still an important challenge. In this paper, we propose a reinforcement learning policy gradient approach to learn a continuous approximation of the Pareto frontier in multi-objective Markov Decision Problems (MOMDPs). Unlike previous policy gradient algorithms, which execute n optimization routines to obtain n solutions, our approach performs a single gradient ascent run, generating at each step an improved continuous approximation of the Pareto frontier. The idea is to optimize the parameters of a function defining a manifold in the policy parameter space, so that the corresponding image in the objective space gets as close as possible to the true Pareto frontier. Besides deriving how to compute and estimate such a gradient, we also discuss the non-trivial issue of defining a metric to assess the quality of candidate Pareto frontiers. Finally, the properties of the proposed approach are empirically evaluated on two problems, a linear-quadratic Gaussian regulator and a water reservoir control task.
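A common choice for the frontier-quality metric discussed in this abstract is the hypervolume indicator. As an illustrative sketch only (the function name and the two-objective maximization setting are assumptions, not the paper's own formulation), a minimal hypervolume computation for a candidate frontier might look like:

```python
def hypervolume_2d(front, ref):
    """Hypervolume of a 2-objective front (both objectives maximized).

    front -- list of (f1, f2) points, each assumed to dominate ref
    ref   -- reference point bounding the dominated region from below
    """
    # Sweep points from largest to smallest first objective,
    # adding the rectangular slab each point contributes.
    pts = sorted(front, key=lambda p: p[0], reverse=True)
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        if f2 > prev_f2:  # dominated points contribute nothing
            hv += (f1 - ref[0]) * (f2 - prev_f2)
            prev_f2 = f2
    return hv
```

A larger hypervolume means the candidate frontier covers more of the objective space relative to the reference point, which is why it is popular for comparing approximations of the Pareto frontier.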


2019 ◽  
Vol 18 (03) ◽  
pp. 1045-1082 ◽  
Author(s):  
Javier García ◽  
Roberto Iglesias ◽  
Miguel A. Rodríguez ◽  
Carlos V. Regueiro

Usually, real-world problems involve the optimization of multiple, possibly conflicting, objectives. These problems may be addressed by Multi-objective Reinforcement Learning (MORL) techniques. MORL is a generalization of standard Reinforcement Learning (RL) where the single reward signal is extended to multiple signals, in particular, one for each objective. MORL is the process of learning policies that optimize multiple objectives simultaneously. In these problems, directional/gradient information can be useful to guide the exploration toward progressively better behaviors. However, traditional policy-gradient approaches have two main drawbacks: they require a batch of episodes to properly estimate the gradient information (thereby reducing learning speed), and they use stochastic policies, which could have a disastrous impact on the safety of the learning system. In this paper, we present a novel population-based MORL algorithm for problems in which the underlying objectives are reasonably smooth. It has two main characteristics: fast computation of the gradient information for each objective through the use of neighboring solutions, and the use of this information to carry out a geometric partition of the search space and thus direct the exploration to promising areas. Finally, the algorithm is evaluated and compared to policy gradient MORL algorithms on different multi-objective problems: the water reservoir and the biped walking problem (the latter both in simulation and on a real robot).
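The "gradient information from neighboring solutions" idea can be sketched, under the simplifying assumption that neighbors are axis-aligned perturbations of the policy parameters; this central finite-difference version is a hypothetical stand-in for the paper's population-based scheme, not its actual algorithm:

```python
import numpy as np

def neighbor_gradient(objective, theta, eps=1e-4):
    """Estimate the gradient of one smooth objective at parameter vector theta
    by evaluating neighboring solutions (central finite differences)."""
    theta = np.asarray(theta, dtype=float)
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        # Each pair of neighbors gives one partial derivative.
        grad[i] = (objective(theta + step) - objective(theta - step)) / (2 * eps)
    return grad
```

With one such estimate per objective, the per-objective gradients can then be compared to decide which region of the parameter space improves which objective, which is what motivates the geometric partition of the search space described above.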


2020 ◽  
Vol 39 (5) ◽  
pp. 6339-6350
Author(s):  
Esra Çakır ◽  
Ziya Ulukan

Due to the increase in energy demand, many countries suffer from energy poverty because of insufficient and expensive energy supply. Plans to use alternative power such as nuclear power for electricity generation are being revived among developing countries. Decisions on the installation of power plants need to be based on careful assessment of future energy supply and demand, economic and financial implications, and requirements for technology transfer. Since the problem involves many vague parameters, a fuzzy model is an appropriate approach for dealing with it. This study develops a Fuzzy Multi-Objective Linear Programming (FMOLP) model for solving the nuclear power plant installation problem in a fuzzy environment. The FMOLP approach is recommended for cases where the objective functions are imprecise and can only be stated within a certain threshold level. The proposed model attempts to minimize the total duration and total cost and to maximize the total crash time of the installation project. By using FMOLP, the weighted additive technique can also be applied in order to transform the model into Fuzzy Multiple Weighted-Objective Linear Programming (FMWOLP), controlling the objective values so that every decision maker's target on each criterion can be met. The optimal solutions and achievement levels of the two models (FMOLP and FMWOLP) are compared with each other. FMWOLP results in better performance, as the overall degree of satisfaction depends on the weights given to the objective functions. A numerical example demonstrates the feasibility of applying the proposed models to the nuclear power plant installation problem.
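The weighted additive technique mentioned above aggregates the fuzzy membership (satisfaction) degrees of the individual objectives into a single score. A minimal sketch, assuming linear membership functions and illustrative names (the thresholds and weights here are hypothetical, not the study's data):

```python
def membership(value, worst, best):
    """Linear fuzzy membership: 0 at the worst threshold, 1 at the best."""
    if worst == best:
        return 1.0
    mu = (worst - value) / (worst - best)
    return max(0.0, min(1.0, mu))  # clamp outside the threshold band

def weighted_additive(values, worsts, bests, weights):
    """Overall degree of satisfaction: weighted sum of memberships.

    weights are assumed to sum to 1, reflecting decision-maker priorities.
    """
    return sum(w * membership(v, lo, hi)
               for v, lo, hi, w in zip(values, worsts, bests, weights))
```

Because the aggregate is a weighted sum, shifting weight toward one objective raises the overall satisfaction score of solutions that favor that objective, which is why FMWOLP's result depends on the weights assigned by the decision makers.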


2020 ◽  
Vol 39 (3) ◽  
pp. 3259-3273
Author(s):  
Nasser Shahsavari-Pour ◽  
Najmeh Bahram-Pour ◽  
Mojde Kazemi

The location-routing problem is a research area that simultaneously solves location-allocation and vehicle routing issues. It is critical for delivering emergency goods to customers with high reliability. In this paper, reliability in location and routing problems was considered as the probability of failure in depots, vehicles, and routes. The problem has two objectives, minimizing the cost and maximizing the reliability, the latter expressed by minimizing the expected cost of failure. First, a mathematical model of the problem was presented and, due to its NP-hard nature, it was solved by a meta-heuristic approach using an NSGA-II algorithm and a discrete multi-objective firefly algorithm. The efficiency of these algorithms was studied through a complete set of examples, and it was found that the multi-objective discrete firefly algorithm has a better Diversification Metric (DM) index; the Mean Ideal Distance (MID) and Spacing Metric (SM) indexes are only suitable for small to medium problems, losing their effectiveness for large problems.
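Two of the three indexes cited above (MID and SM) can be sketched for a bi-objective front; the exact definitions used in the paper may differ slightly, so treat this as an illustrative version (the spacing formula follows the common Schott-style definition):

```python
import math

def mid(front, ideal=(0.0, 0.0)):
    """Mean Ideal Distance: average Euclidean distance of front points
    from the ideal point (lower is better for minimization problems)."""
    return sum(math.dist(p, ideal) for p in front) / len(front)

def spacing(front):
    """Spacing Metric: standard-deviation-like spread of nearest-neighbor
    distances along the front (0 means perfectly even spacing)."""
    d = [min(math.dist(p, q) for q in front if q is not p) for p in front]
    dbar = sum(d) / len(d)
    return math.sqrt(sum((di - dbar) ** 2 for di in d) / (len(d) - 1))
```

A Diversification Metric would additionally measure the extent of the front (e.g. the distance between its extreme points), which is why an algorithm can win on DM while the distance-based MID and SM indexes become less discriminating on large instances.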


2012 ◽  
Vol 3 (4) ◽  
pp. 1-6 ◽  
Author(s):  
M. Jayalakshmi ◽  
P. Pandian
