Exploratory Combinatorial Optimization with Reinforcement Learning

2020 ◽  
Vol 34 (04) ◽  
pp. 3243-3250 ◽  
Author(s):  
Thomas Barrett ◽  
William Clements ◽  
Jakob Foerster ◽  
Alex Lvovsky

Many real-world problems can be reduced to combinatorial optimization on a graph, where the subset or ordering of vertices that maximizes some objective function must be found. With such tasks often NP-hard and analytically intractable, reinforcement learning (RL) has shown promise as a framework for learning efficient heuristic methods to tackle these problems. Previous works construct the solution subset incrementally, adding one element at a time; however, the irreversible nature of this approach prevents the agent from revising its earlier decisions, which may be necessary given the complexity of the optimization task. We instead propose that the agent should seek to continuously improve the solution by learning to explore at test time. Our approach of exploratory combinatorial optimization (ECO-DQN) is, in principle, applicable to any combinatorial problem that can be defined on a graph. Experimentally, we show our method to produce state-of-the-art RL performance on the Maximum Cut problem. Moreover, because ECO-DQN can start from any arbitrary configuration, it can be combined with other search methods to further improve performance, which we demonstrate using a simple random search.
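As a rough illustration of the reversible-action idea this approach builds on, the sketch below performs exploratory local search on Max-Cut, flipping one vertex per step from an arbitrary starting configuration. The immediate change in cut value stands in for ECO-DQN's learned Q-values, and the graph, step count, and exploration rule are all illustrative.

```python
import random

def cut_value(edges, side):
    # Sum of weights of edges crossing the partition.
    return sum(w for u, v, w in edges if side[u] != side[v])

def flip_gain(adj, side, v):
    # Change in cut value if vertex v switches sides.
    return sum(w if side[u] == side[v] else -w for u, w in adj[v])

def exploratory_max_cut(n, edges, steps=200, seed=0):
    rng = random.Random(seed)
    adj = {v: [] for v in range(n)}
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    side = [rng.randint(0, 1) for _ in range(n)]   # arbitrary start state
    best = cut_value(edges, side)
    for _ in range(steps):
        # Flip the vertex with the largest immediate gain; when no flip
        # helps, flip a random vertex instead (explore, don't terminate).
        gain, v = max((flip_gain(adj, side, v), v) for v in range(n))
        if gain <= 0:
            v = rng.randrange(n)
        side[v] ^= 1
        best = max(best, cut_value(edges, side))
    return best

edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1), (0, 2, 1)]
print(exploratory_max_cut(4, edges))   # optimal cut for this graph is 4
```

Because the search never commits irreversibly, it can be restarted from any configuration, which is what makes combining it with outer search methods (such as random restarts) straightforward.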

Author(s):  
William H. Guss ◽  
Brandon Houghton ◽  
Nicholay Topin ◽  
Phillip Wang ◽  
Cayden Codel ◽  
...  

The sample inefficiency of standard deep reinforcement learning methods precludes their application to many real-world problems. Methods that leverage human demonstrations require fewer samples but have been researched less. As demonstrated in the computer vision and natural language processing communities, large-scale datasets have the capacity to facilitate research by serving as an experimental and benchmarking platform for new methods. However, existing datasets compatible with reinforcement learning simulators do not have sufficient scale, structure, and quality to enable the further development and evaluation of methods focused on using human examples. Therefore, we introduce a comprehensive, large-scale, simulator-paired dataset of human demonstrations: MineRL. The dataset consists of over 60 million automatically annotated state-action pairs across a variety of related tasks in Minecraft, a dynamic, 3D, open-world environment. We present a novel data-collection scheme that allows for the ongoing introduction of new tasks and the gathering of complete state information suitable for a variety of methods. We demonstrate the hierarchical structure, diversity, and scale of the MineRL dataset. Further, we show the difficulty of the Minecraft domain along with the potential of MineRL for developing techniques to solve key research challenges within it.
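Since the dataset is meant to serve methods that learn from human examples, a minimal sketch of one such consumer is shown below: behavioural cloning by softmax regression over state-action pairs. The loader is a hypothetical stand-in, not the actual minerl package API, and all dimensions are arbitrary.

```python
import numpy as np

# Hypothetical stand-in for an iterator over annotated state-action pairs;
# the real MineRL dataset ships with its own loading utilities.
def demo_batches(num_batches=100, batch=32, obs_dim=64, n_actions=10, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(num_batches):
        obs = rng.normal(size=(batch, obs_dim)).astype(np.float32)
        act = rng.integers(0, n_actions, size=batch)
        yield obs, act

# Minimal behavioural cloning: fit a linear policy by softmax regression.
def behavioural_cloning(obs_dim=64, n_actions=10, lr=0.1):
    W = np.zeros((obs_dim, n_actions))
    for obs, act in demo_batches(obs_dim=obs_dim, n_actions=n_actions):
        logits = obs @ W
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        probs[np.arange(len(act)), act] -= 1.0        # d(cross-entropy)/d(logits)
        W -= lr * obs.T @ probs / len(act)
    return W

W = behavioural_cloning()
print(W.shape)
```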


Author(s):  
Yang Gao ◽  
Christian M. Meyer ◽  
Mohsen Mesgar ◽  
Iryna Gurevych

Document summarisation can be formulated as a sequential decision-making problem, which can be solved by Reinforcement Learning (RL) algorithms. The predominant RL paradigm for summarisation learns a cross-input policy, which requires considerable time, data, and parameter tuning due to the huge search spaces and the delayed rewards. Learning input-specific RL policies is a more efficient alternative, but so far it has depended on handcrafted rewards, which are difficult to design and yield poor performance. We propose RELIS, a novel RL paradigm that learns a reward function with Learning-to-Rank (L2R) algorithms at training time and uses this reward function to train an input-specific RL policy at test time. We prove that RELIS is guaranteed to generate near-optimal summaries with appropriate L2R and RL algorithms. Empirically, we evaluate our approach on extractive multi-document summarisation. We show that RELIS reduces the training time by two orders of magnitude compared to the state-of-the-art models while performing on par with them.
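A toy rendering of the two-stage recipe, with illustrative stand-ins: a pairwise perceptron plays the role of the L2R reward learner, and a greedy sentence-selection loop stands in for the input-specific test-time policy. Feature vectors and data here are synthetic.

```python
import numpy as np

# Stage 1: learn a linear reward from ranked summary pairs with a pairwise
# perceptron (a simple L2R method).
def learn_reward(pairs, dim, epochs=20, lr=0.1):
    w = np.zeros(dim)
    for _ in range(epochs):
        for better, worse in pairs:        # feature vectors of two summaries
            if w @ better <= w @ worse:    # misranked pair: nudge w
                w += lr * (better - worse)
    return w

# Stage 2: an input-specific "policy" guided by the learned reward. Greedy
# selection replaces the test-time RL policy here for brevity.
def greedy_policy(sent_feats, w, k=2):
    chosen, summary = [], np.zeros_like(w)
    for _ in range(k):
        scores = [w @ (summary + f) for f in sent_feats]
        best = int(np.argmax(scores))
        chosen.append(best)
        summary += sent_feats[best]
    return chosen

rng = np.random.default_rng(0)
pairs = [(rng.normal(size=5) + 1.0, rng.normal(size=5)) for _ in range(50)]
w = learn_reward(pairs, dim=5)
print(greedy_policy([rng.normal(size=5) for _ in range(6)], w))
```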


Mathematics ◽  
2020 ◽  
Vol 8 (2) ◽  
pp. 298 ◽  
Author(s):  
Shenshen Gu ◽  
Yue Yang

The Max-cut problem is a well-known combinatorial optimization problem with many real-world applications. However, the problem has been proven to be NP-hard (non-deterministic polynomial-time hard), which means exact algorithms are unsuitable for large-scale instances because obtaining a solution is too time-consuming. Designing heuristic algorithms is therefore a promising but challenging direction for effectively solving large-scale Max-cut problems. To address this challenge, we propose a method that combines a pointer network with two deep learning strategies: supervised learning and reinforcement learning. A pointer network is a sequence-to-sequence deep neural network that can extract data features in a purely data-driven way to discover the hidden patterns behind the data. Drawing on the characteristics of the Max-cut problem, we designed the input and output mechanisms of the pointer network model and trained it with both supervised learning and reinforcement learning to evaluate its performance. Our experiments show that the model can be effectively applied to large-scale Max-cut problems, and the results suggest that the method will further encourage broader exploration of deep neural networks for large-scale combinatorial optimization problems.
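For readers unfamiliar with pointer networks, the sketch below shows the pointing step in isolation: the decoder scores each encoder (vertex) state and "points" at one input position per step. The weights here are random placeholders; in the paper's setting they would come from supervised or reinforcement learning.

```python
import numpy as np

# One pointing step of a pointer network (Vinyals et al. style attention):
# score each encoder state against the current decoder state, then softmax
# over input positions to select a vertex.
rng = np.random.default_rng(0)
n, d = 6, 8                        # 6 graph vertices, hidden size 8
enc = rng.normal(size=(n, d))      # encoder state per vertex
dec = rng.normal(size=d)           # current decoder state
W1 = rng.normal(size=(d, d))
W2 = rng.normal(size=(d, d))
v = rng.normal(size=d)

scores = np.tanh(enc @ W1 + dec @ W2) @ v     # one score per vertex
probs = np.exp(scores - scores.max())
probs /= probs.sum()                          # distribution over vertices
print("pointed-at vertex:", int(np.argmax(probs)))
```

Repeating this step, with already-selected vertices masked out, yields the output sequence that defines the partition; the cut value of that partition supplies the training signal in the reinforcement learning setting.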


2021 ◽  
pp. 108466
Author(s):  
Dong Yan ◽  
Jiayi Weng ◽  
Shiyu Huang ◽  
Chongxuan Li ◽  
Yichi Zhou ◽  
...  

2008 ◽  
Vol 24 (03) ◽  
pp. 135-138
Author(s):  
Yasuhisa Okumoto

A ship hull block is generally composed of skin plates, longitudinals, and transverse webs as a grillage structure, where longitudinals and transverse webs are joined to the skin plate panel by fillet welds. Much labor time is therefore required for this welding work because many weld lines exist. Today, simple automatic welding machines using a truck system are widely applied, as are semiautomatic CO2 welding and gravity welding. Since the automatic welding machine requires workers' help for initial setting, turning, and shifting, efficient routing must be investigated to improve productivity. However, finding an optimal weld sequence becomes difficult as the number of weld lines, and hence the number of possible welding sequences, increases. This is a combinatorial optimization problem. This paper examines how to decrease the work time using reinforcement learning, a method that imitates the behavioral patterns of animals.
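A minimal sketch of the kind of learning the paper applies, under toy assumptions: tabular Q-learning that orders a handful of weld lines to reduce travel between them. The coordinates and scale are illustrative, not actual ship-block data.

```python
import random

lines = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 5)]    # weld-line start points

def dist(a, b):
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

Q, rng = {}, random.Random(0)
for episode in range(3000):
    pos, todo = 0, frozenset(range(1, len(lines)))
    while todo:
        state = (pos, todo)
        Q.setdefault(state, {a: 0.0 for a in todo})
        # Epsilon-greedy choice of the next weld line to set up.
        a = (rng.choice(sorted(todo)) if rng.random() < 0.2
             else max(Q[state], key=Q[state].get))
        reward = -dist(lines[pos], lines[a])   # shorter moves, higher reward
        nxt = (a, todo - {a})
        future = max(Q.get(nxt, {}).values(), default=0.0)
        Q[state][a] += 0.1 * (reward + future - Q[state][a])
        pos, todo = nxt

# Greedy rollout of the learned welding sequence.
pos, todo, order = 0, frozenset(range(1, len(lines))), [0]
while todo:
    q = Q.get((pos, todo), {a: 0.0 for a in todo})
    a = max(q, key=q.get)
    order.append(a)
    pos, todo = a, todo - {a}
print("weld sequence:", order)
```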


1992 ◽  
Vol 02 (04) ◽  
pp. 389-395 ◽  
Author(s):  
KIICHI URAHAMA

The author previously developed a new neural algorithm, effective for set-partitioning combinatorial optimization problems, by extending the logistic transformation used in the Hopfield algorithm into its multivariable version. In this letter, the performance of the algorithm is theoretically evaluated, and it is proved that the algorithm is 1/p-approximate for p-partitioning maximum-cut problems.
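A rough sketch of what such a multivariable (softmax) logistic iteration can look like for p-way max-cut: each vertex holds a probability vector over p groups and is pushed away from the groups its neighbours occupy. This is an interpretation of the general scheme, not the author's exact formulation; temperature and iteration counts are illustrative.

```python
import numpy as np

def p_cut_mean_field(n, edges, p=3, T=0.5, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    W = np.zeros((n, n))
    for u, v, w in edges:
        W[u, v] = W[v, u] = w
    x = rng.dirichlet(np.ones(p), size=n)     # soft group memberships
    for _ in range(iters):
        field = -W @ x                        # repel neighbours' groups
        e = np.exp((field - field.max(axis=1, keepdims=True)) / T)
        x = e / e.sum(axis=1, keepdims=True)  # multivariable logistic step
    return x.argmax(axis=1)                   # hard assignment

edges = [(0, 1, 1), (1, 2, 1), (2, 0, 1), (2, 3, 1)]
print(p_cut_mean_field(4, edges, p=3))
```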


1994 ◽  
Vol 05 (03) ◽  
pp. 229-239 ◽  
Author(s):  
KIICHI URAHAMA ◽  
TADASHI YAMADA

The Potts mean-field approach for solving combinatorial optimization problems subject to winner-takes-all constraints is extended to problems subject to additional constraints. Extra variables corresponding to Lagrange multipliers are incorporated into the Potts formulation so that the additional constraints are satisfied. The extended Potts equations are solved using constrained gradient-descent differential systems. This gradient system is proven theoretically to always produce a legal local optimum of the constrained combinatorial optimization problem. An analog electronic circuit implementing the present method is designed on the basis of the previous Potts electronic circuit. The performance of the present method is theoretically evaluated for constrained maximum-cut problems. The lower bound of the cut size obtained with the present method is proven to be the same as that of the basic Potts scheme for unconstrained maximum-cut problems.
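The sketch below illustrates the general mechanism under stated assumptions: a 2-way cut with an equal-size constraint, where a softmax mean-field step updates memberships while gradient ascent on a Lagrange multiplier enforces the constraint. It is a simplified, discrete-time analogue of the continuous gradient system, not the paper's exact equations; step sizes, temperature, and the small graph are illustrative.

```python
import numpy as np

def constrained_cut(n, edges, T=0.5, eta=0.2, iters=300, seed=0):
    rng = np.random.default_rng(seed)
    W = np.zeros((n, n))
    for u, v, w in edges:
        W[u, v] = W[v, u] = w
    x = rng.dirichlet(np.ones(2), size=n)   # membership over 2 groups
    lam = 0.0                               # multiplier for size balance
    for _ in range(iters):
        field = -W @ x                      # cut term: avoid neighbours' group
        field[:, 0] -= lam                  # constraint term penalises group 0
        field[:, 1] += lam
        e = np.exp((field - field.max(axis=1, keepdims=True)) / T)
        x = e / e.sum(axis=1, keepdims=True)
        violation = x[:, 0].sum() - n / 2   # imbalance between the groups
        lam += eta * violation              # gradient ascent on the dual
    return x.argmax(axis=1)

edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1)]
print(constrained_cut(4, edges))
```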

