Reinforcement Learning for Route Optimization with Robustness Guarantees

Author(s):  
Tobias Jacobs ◽  
Francesco Alesiani ◽  
Gulcin Ermis

Application of deep learning to NP-hard combinatorial optimization problems is an emerging research trend, and a number of interesting approaches have been published over the last few years. In this work we address robust optimization, which is a more complex variant where a max-min problem is to be solved. We obtain robust solutions by solving the inner minimization problem exactly and apply Reinforcement Learning to learn a heuristic for the outer problem. The minimization term in the inner objective represents an obstacle to existing RL-based approaches, as its value depends on the full solution in a non-linear manner and cannot be evaluated for partial solutions constructed by the agent over the course of each episode. We overcome this obstacle by defining the reward in terms of the one-step advantage over a baseline policy whose role can be played by any fast heuristic for the given problem. The agent is trained to maximize the total advantage, which, as we show, is equivalent to the original objective. We validate our approach by solving min-max versions of standard benchmarks for the Capacitated Vehicle Routing and the Traveling Salesperson Problem, where our agents obtain near-optimal solutions and improve upon the baselines.
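The key device in the abstract above is a reward defined as the one-step advantage over a baseline policy, whose per-step values telescope so that maximizing total reward is equivalent to optimizing the original objective. A minimal sketch of that telescoping, with toy completion costs and tuple-valued states that are purely illustrative (this is not the paper's code):

```python
# Sketch of the advantage-based reward idea; all data here is a toy example.

def advantage_rewards(states, completion_cost):
    """One-step advantage rewards for an agent trajectory s_0, ..., s_T.

    completion_cost(s) is the objective value obtained by letting the fast
    baseline heuristic finish the partial solution s; for a complete
    solution it is simply that solution's own cost.
    """
    return [completion_cost(states[t]) - completion_cost(states[t + 1])
            for t in range(len(states) - 1)]

# The rewards telescope: their sum equals
#   completion_cost(s_0) - cost(agent's final solution),
# so maximizing the total reward minimizes the agent's objective.
toy_costs = {(): 10.0, ("a",): 9.0, ("a", "b"): 7.5}  # toy completion costs
trajectory = [(), ("a",), ("a", "b")]
rewards = advantage_rewards(trajectory, toy_costs.__getitem__)
```

Note that evaluating the reward only requires running the baseline heuristic from each partial state, which sidesteps the problem that the robust (min-max) objective itself cannot be evaluated on partial solutions.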

2014 ◽  
Vol 2014 ◽  
pp. 1-17 ◽  
Author(s):  
E. Osaba ◽  
F. Diaz ◽  
R. Carballedo ◽  
E. Onieva ◽  
A. Perallos

Nowadays, the development of new metaheuristics for solving optimization problems is a topic of interest in the scientific community, and a large number of techniques of this kind can be found in the literature, including recently proposed ones such as the artificial bee colony and the imperialist competitive algorithm. This paper focuses on one recently published technique, called Golden Ball (GB). GB is a multiple-population metaheuristic based on soccer concepts. Although it was designed to solve combinatorial optimization problems, until now it has only been tested on two simple routing problems: the traveling salesman problem and the capacitated vehicle routing problem. In this paper, GB is applied to four different combinatorial optimization problems. Two of them are routing problems more complex than the previously used ones: the asymmetric traveling salesman problem and the vehicle routing problem with backhauls. Additionally, one constraint satisfaction problem (the n-queens problem) and one combinatorial design problem (the one-dimensional bin packing problem) are used. The outcomes obtained by GB are compared with those obtained by two different genetic algorithms and two distributed genetic algorithms, and two statistical tests are conducted to compare these results.


Author(s):  
Lin Lan ◽  
Zhenguo Li ◽  
Xiaohong Guan ◽  
Pinghui Wang

Despite significant progress, deep reinforcement learning (RL) suffers from data inefficiency and limited generalization. Recent efforts apply meta-learning to learn a meta-learner from a set of RL tasks such that a novel but related task can be solved quickly. Although each task has its own specifics, different tasks in meta-RL are generally similar at a high level. However, most meta-RL methods do not explicitly and adequately model the specific and shared information among different tasks, which limits their ability both to learn the training tasks and to generalize to novel tasks. In this paper, we propose to capture the shared information on the one hand and to meta-learn how to quickly abstract the task-specific information on the other. Methodologically, we train an SGD meta-learner to quickly optimize a task encoder for each task, which generates a task embedding based on past experience. Meanwhile, we learn a policy that is shared across all tasks and conditioned on the task embeddings. Empirical results on four simulated tasks demonstrate that our method has better learning capacity on both training and novel tasks and attains up to 3 to 4 times higher returns than baselines.
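The architecture described above separates a per-task encoder (adapted quickly by an SGD meta-learner) from a policy shared across tasks. A minimal numpy sketch of that split, in which all names, shapes, and the mean-pooling encoder are illustrative assumptions rather than the paper's actual model:

```python
import numpy as np

# Illustrative sketch: per-task encoder + shared task-conditioned policy.
rng = np.random.default_rng(0)
state_dim, embed_dim, n_actions = 4, 3, 2

W_enc = rng.normal(size=(state_dim, embed_dim))             # adapted per task
W_pi = rng.normal(size=(state_dim + embed_dim, n_actions))  # shared across tasks

def task_embedding(transitions):
    """Abstract a task embedding from past experience (here: mean of encoded states)."""
    states = np.stack([s for s, _, _ in transitions])
    return (states @ W_enc).mean(axis=0)

def policy_logits(state, embedding):
    """Shared policy, conditioned on the task embedding."""
    return np.concatenate([state, embedding]) @ W_pi

# During meta-training, the SGD meta-learner would update W_enc for each
# task from a few transitions, while W_pi is trained jointly on all tasks.
experience = [(rng.normal(size=state_dim), 0, 0.0) for _ in range(5)]
e = task_embedding(experience)
logits = policy_logits(rng.normal(size=state_dim), e)
```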


Author(s):  
André L. C. Ottoni ◽  
Erivelton G. Nepomuceno ◽  
Marcos S. de Oliveira ◽  
Daniela C. R. de Oliveira

The traveling salesman problem (TSP) is one of the best-known combinatorial optimization problems. Many methods derived from the TSP have been applied to study autonomous vehicle route planning with fuel constraints. Nevertheless, less attention has been paid to reinforcement learning (RL) as a potential method for solving refueling problems. This paper employs RL to solve the traveling salesman problem with refueling (TSPWR). The proposed technique comprises a model (actions, states, reinforcements) and the RL-TSPWR algorithm. Focus is given to the analysis of RL parameters and to the influence of refueling on the learned routes' fuel cost. Two RL algorithms, Q-learning and SARSA, are compared. In addition, RL parameter estimation is performed via Response Surface Methodology, Analysis of Variance, and the Tukey test. The proposed method achieves the best solution in 15 out of 16 case studies.
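The two algorithms compared above differ only in their bootstrap target. Their standard tabular update rules are sketched below; the TSPWR-specific states, actions, and reinforcements are not reproduced here, and the toy transition is purely illustrative:

```python
# Standard tabular updates for the two RL algorithms compared in the paper.

def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    """Off-policy: bootstraps on the greedy (max-value) action in s_next."""
    Q[s][a] += alpha * (r + gamma * max(Q[s_next].values()) - Q[s][a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy: bootstraps on the action actually taken in s_next."""
    Q[s][a] += alpha * (r + gamma * Q[s_next][a_next] - Q[s][a])

# Same toy transition, different targets:
Q1 = {"s": {"go": 0.0}, "t": {"go": 1.0, "stop": 2.0}}
Q2 = {"s": {"go": 0.0}, "t": {"go": 1.0, "stop": 2.0}}
q_learning_update(Q1, "s", "go", 1.0, "t", alpha=0.5, gamma=0.9)   # target 1 + 0.9*2.0
sarsa_update(Q2, "s", "go", 1.0, "t", "go", alpha=0.5, gamma=0.9)  # target 1 + 0.9*1.0
```

On the same transition Q-learning moves toward the greedy target (2.8) while SARSA moves toward the target induced by the action actually taken (1.9), which is the behavioral difference the paper's comparison probes.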


2015 ◽  
Vol 7 (3) ◽  
pp. 280-284
Author(s):  
Rasa Giniūnaitė

Semidefinite programming (SDP) is a fairly recent approach to solving optimization problems, which are becoming more and more important in our fast-moving world. It is the minimization of a linear function over the intersection of the cone of positive semidefinite matrices with an affine space, i.e. over non-linear but convex constraints. All linear problems and many engineering and combinatorial optimization problems can be expressed as SDPs, so the method is highly applicable. Many packages implementing different algorithms are available for solving SDP problems; they can be downloaded from the internet and are easy to learn to use, two of them being SeDuMi and SDPT-3. In this paper, a truss structure optimization problem with the goal of minimizing the mass of the truss structure was solved. After some algebraic manipulation, the problem was formulated suitably for semidefinite programming, and the SeDuMi and SDPT-3 packages were used to solve it. The choice of the initial solution had a great impact on the result obtained with SeDuMi. The mass obtained using SDPT-3 was on average smaller than the one obtained using SeDuMi; moreover, SDPT-3 worked more efficiently. However, a comparison of this approach with two versions of a particle swarm optimization algorithm implied that semidefinite programming is in general more appropriate for solving such problems.
Semidefinite programming is a subfield of convex optimization in which the objective function is linear and the feasible region is the intersection of the cone of positive semidefinite matrices with an affine space. It is a fairly new way of solving optimization problems, but it is already widely applied to engineering and combinatorial optimization tasks. Many different packages applying various algorithms exist for such problems; in this work the SeDuMi and SDPT-3 packages were used, which, like most others, can be downloaded from the internet. The goal was to find the minimal mass of the truss structure subject to the prescribed constraints. The optimal mass obtained with SDPT-3 was on average smaller than with SeDuMi; SDPT-3 also worked more efficiently, and the choice of initial conditions did not influence its solution as strongly as it did with SeDuMi. Comparing the results with solutions obtained by a particle swarm optimization algorithm showed that semidefinite programming is more suitable for problems of this type.
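To make the SDP formulation above concrete, here is a toy instance whose solution is known in closed form (the truss problem and the SeDuMi/SDPT-3 solvers themselves are not reproduced; the matrix C is an arbitrary illustrative choice): minimize the linear function <C, X> subject to trace(X) = 1 and X positive semidefinite. The optimum is the smallest eigenvalue of C, attained at the rank-one matrix X = v vᵀ built from the corresponding unit eigenvector:

```python
import numpy as np

# Toy SDP:  minimize <C, X>  s.t.  trace(X) = 1,  X PSD.
# Linear objective, convex (PSD cone + affine) constraints; the optimum
# equals the smallest eigenvalue of C.
C = np.array([[2.0, 1.0],
              [1.0, 3.0]])

eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
v = eigvecs[:, 0]                      # unit eigenvector of the smallest eigenvalue
X_opt = np.outer(v, v)                 # rank-one, PSD, trace 1
value = np.trace(C @ X_opt)            # achieves the optimal objective
```

This closed-form check is only possible because the example is tiny; for general instances such as the truss problem, one hands the matrices to a solver like SeDuMi or SDPT-3.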


Author(s):  
Quentin Cappart ◽  
Emmanuel Goutierre ◽  
David Bergman ◽  
Louis-Martin Rousseau

Finding tight bounds on the optimal solution is a critical element of practical solution methods for discrete optimization problems. In the last decade, decision diagrams (DDs) have brought a new perspective on obtaining upper and lower bounds that can be significantly better than classical bounding mechanisms, such as linear relaxations. It is well known that the quality of the bounds achieved through this flexible bounding method is highly reliant on the ordering of the variables chosen for building the diagram, and finding an ordering that optimizes standard metrics is an NP-hard problem. In this paper, we propose an innovative and generic approach based on deep reinforcement learning for obtaining an ordering that tightens the bounds obtained with relaxed and restricted DDs. We apply the approach to both the Maximum Independent Set Problem and the Maximum Cut Problem. Experimental results on synthetic instances show that the deep reinforcement learning approach, by achieving tighter objective function bounds, generally outperforms ordering methods commonly used in the literature when the distribution of instances is known. To the best of the authors' knowledge, this is the first paper to apply machine learning to directly improve relaxation bounds obtained by general-purpose bounding mechanisms for combinatorial optimization problems.


2019 ◽  
Vol 139 (4) ◽  
pp. 401-408
Author(s):  
Shunya Tanabe ◽  
Zeyuan Sun ◽  
Masayuki Nakatani ◽  
Yutaka Uchimura
