Exploratory Combinatorial Optimization with Reinforcement Learning

2020 ◽  
Vol 34 (04) ◽  
pp. 3243-3250 ◽  
Author(s):  
Thomas Barrett ◽  
William Clements ◽  
Jakob Foerster ◽  
Alex Lvovsky

Many real-world problems can be reduced to combinatorial optimization on a graph, where the subset or ordering of vertices that maximizes some objective function must be found. With such tasks often NP-hard and analytically intractable, reinforcement learning (RL) has shown promise as a framework for learning efficient heuristic methods to tackle these problems. Previous works construct the solution subset incrementally, adding one element at a time; however, the irreversible nature of this approach prevents the agent from revising its earlier decisions, which may be necessary given the complexity of the optimization task. We instead propose that the agent should seek to continuously improve the solution by learning to explore at test time. Our approach of exploratory combinatorial optimization (ECO-DQN) is, in principle, applicable to any combinatorial problem that can be defined on a graph. Experimentally, we show our method to produce state-of-the-art RL performance on the Maximum Cut problem. Moreover, because ECO-DQN can start from any arbitrary configuration, it can be combined with other search methods to further improve performance, which we demonstrate using a simple random search.
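As a rough illustration of the reversible-action idea this approach builds on, the sketch below performs exploratory local search on Max-Cut, flipping one vertex per step from an arbitrary starting configuration. The immediate change in cut value stands in for ECO-DQN's learned Q-values, and the graph, step count, and exploration rule are all illustrative.

```python
import random

def cut_value(edges, side):
    # Sum of weights of edges crossing the partition.
    return sum(w for u, v, w in edges if side[u] != side[v])

def flip_gain(adj, side, v):
    # Change in cut value if vertex v switches sides.
    return sum(w if side[u] == side[v] else -w for u, w in adj[v])

def exploratory_max_cut(n, edges, steps=200, seed=0):
    rng = random.Random(seed)
    adj = {v: [] for v in range(n)}
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    side = [rng.randint(0, 1) for _ in range(n)]   # arbitrary start state
    best = cut_value(edges, side)
    for _ in range(steps):
        # Flip the vertex with the largest immediate gain; when no flip
        # helps, flip a random vertex instead (explore, don't terminate).
        gain, v = max((flip_gain(adj, side, v), v) for v in range(n))
        if gain <= 0:
            v = rng.randrange(n)
        side[v] ^= 1
        best = max(best, cut_value(edges, side))
    return best

edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1), (0, 2, 1)]
print(exploratory_max_cut(4, edges))   # optimal cut for this graph is 4
```

Because the search never commits irreversibly, it can be restarted from any configuration, which is what makes combining it with outer search methods (such as random restarts) straightforward.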

Author(s):  
William H. Guss ◽  
Brandon Houghton ◽  
Nicholay Topin ◽  
Phillip Wang ◽  
Cayden Codel ◽  
...  

The sample inefficiency of standard deep reinforcement learning methods precludes their application to many real-world problems. Methods that leverage human demonstrations require fewer samples but have been researched less. As demonstrated in the computer vision and natural language processing communities, large-scale datasets have the capacity to facilitate research by serving as an experimental and benchmarking platform for new methods. However, existing datasets compatible with reinforcement learning simulators do not have sufficient scale, structure, and quality to enable the further development and evaluation of methods focused on using human examples. Therefore, we introduce a comprehensive, large-scale, simulator-paired dataset of human demonstrations: MineRL. The dataset consists of over 60 million automatically annotated state-action pairs across a variety of related tasks in Minecraft, a dynamic, 3D, open-world environment. We present a novel data-collection scheme that allows for the ongoing introduction of new tasks and the gathering of complete state information suitable for a variety of methods. We demonstrate the hierarchical structure, diversity, and scale of the MineRL dataset. Further, we show the difficulty of the Minecraft domain along with the potential of MineRL for developing techniques to solve key research challenges within it.
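Since the dataset is meant to serve methods that learn from human examples, a minimal sketch of one such consumer is shown below: behavioural cloning by softmax regression over state-action pairs. The loader is a hypothetical stand-in, not the actual minerl package API, and all dimensions are arbitrary.

```python
import numpy as np

# Hypothetical stand-in for an iterator over annotated state-action pairs;
# the real MineRL dataset ships with its own loading utilities.
def demo_batches(num_batches=100, batch=32, obs_dim=64, n_actions=10, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(num_batches):
        obs = rng.normal(size=(batch, obs_dim)).astype(np.float32)
        act = rng.integers(0, n_actions, size=batch)
        yield obs, act

# Minimal behavioural cloning: fit a linear policy by softmax regression.
def behavioural_cloning(obs_dim=64, n_actions=10, lr=0.1):
    W = np.zeros((obs_dim, n_actions))
    for obs, act in demo_batches(obs_dim=obs_dim, n_actions=n_actions):
        logits = obs @ W
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        probs[np.arange(len(act)), act] -= 1.0        # d(cross-entropy)/d(logits)
        W -= lr * obs.T @ probs / len(act)
    return W

W = behavioural_cloning()
print(W.shape)
```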


Author(s):  
Yang Gao ◽  
Christian M. Meyer ◽  
Mohsen Mesgar ◽  
Iryna Gurevych

Document summarisation can be formulated as a sequential decision-making problem, which can be solved by Reinforcement Learning (RL) algorithms. The predominant RL paradigm for summarisation learns a cross-input policy, which requires considerable time, data, and parameter tuning due to the huge search spaces and the delayed rewards. Learning input-specific RL policies is a more efficient alternative, but so far it has depended on handcrafted rewards, which are difficult to design and yield poor performance. We propose RELIS, a novel RL paradigm that learns a reward function with Learning-to-Rank (L2R) algorithms at training time and uses this reward function to train an input-specific RL policy at test time. We prove that RELIS is guaranteed to generate near-optimal summaries with appropriate L2R and RL algorithms. Empirically, we evaluate our approach on extractive multi-document summarisation. We show that RELIS reduces the training time by two orders of magnitude compared to the state-of-the-art models while performing on par with them.
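A toy rendering of the two-stage recipe, with illustrative stand-ins: a pairwise perceptron plays the role of the L2R reward learner, and a greedy sentence-selection loop stands in for the input-specific test-time policy. Feature vectors and data here are synthetic.

```python
import numpy as np

# Stage 1: learn a linear reward from ranked summary pairs with a pairwise
# perceptron (a simple L2R method).
def learn_reward(pairs, dim, epochs=20, lr=0.1):
    w = np.zeros(dim)
    for _ in range(epochs):
        for better, worse in pairs:        # feature vectors of two summaries
            if w @ better <= w @ worse:    # misranked pair: nudge w
                w += lr * (better - worse)
    return w

# Stage 2: an input-specific "policy" guided by the learned reward. Greedy
# selection replaces the test-time RL policy here for brevity.
def greedy_policy(sent_feats, w, k=2):
    chosen, summary = [], np.zeros_like(w)
    for _ in range(k):
        scores = [w @ (summary + f) for f in sent_feats]
        best = int(np.argmax(scores))
        chosen.append(best)
        summary += sent_feats[best]
    return chosen

rng = np.random.default_rng(0)
pairs = [(rng.normal(size=5) + 1.0, rng.normal(size=5)) for _ in range(50)]
w = learn_reward(pairs, dim=5)
print(greedy_policy([rng.normal(size=5) for _ in range(6)], w))
```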


Mathematics ◽  
2020 ◽  
Vol 8 (2) ◽  
pp. 298 ◽  
Author(s):  
Shenshen Gu ◽  
Yue Yang

The Max-cut problem is a well-known combinatorial optimization problem with many real-world applications. However, the problem has been proven to be NP-hard (non-deterministic polynomial-time hard), which means exact algorithms are unsuitable for large-scale instances because obtaining a solution is too time-consuming. Designing heuristic algorithms is therefore a promising but challenging direction for effectively solving large-scale Max-cut problems. To address this challenge, we propose a method that combines a pointer network with two deep learning strategies: supervised learning and reinforcement learning. A pointer network is a sequence-to-sequence deep neural network that can extract data features in a purely data-driven way to discover the hidden patterns behind the data. Drawing on the characteristics of the Max-cut problem, we designed the input and output mechanisms of the pointer network model and trained it with both supervised learning and reinforcement learning to evaluate its performance. Our experiments show that the model can be effectively applied to large-scale Max-cut problems, and the results suggest that the method will further encourage broader exploration of deep neural networks for large-scale combinatorial optimization problems.
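For readers unfamiliar with pointer networks, the sketch below shows the pointing step in isolation: the decoder scores each encoder (vertex) state and "points" at one input position per step. The weights here are random placeholders; in the paper's setting they would come from supervised or reinforcement learning.

```python
import numpy as np

# One pointing step of a pointer network (Vinyals et al. style attention):
# score each encoder state against the current decoder state, then softmax
# over input positions to select a vertex.
rng = np.random.default_rng(0)
n, d = 6, 8                        # 6 graph vertices, hidden size 8
enc = rng.normal(size=(n, d))      # encoder state per vertex
dec = rng.normal(size=d)           # current decoder state
W1 = rng.normal(size=(d, d))
W2 = rng.normal(size=(d, d))
v = rng.normal(size=d)

scores = np.tanh(enc @ W1 + dec @ W2) @ v     # one score per vertex
probs = np.exp(scores - scores.max())
probs /= probs.sum()                          # distribution over vertices
print("pointed-at vertex:", int(np.argmax(probs)))
```

Repeating this step, with already-selected vertices masked out, yields the output sequence that defines the partition; the cut value of that partition supplies the training signal in the reinforcement learning setting.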


2021 ◽  
pp. 108466
Author(s):  
Dong Yan ◽  
Jiayi Weng ◽  
Shiyu Huang ◽  
Chongxuan Li ◽  
Yichi Zhou ◽  
...  

2008 ◽  
Vol 24 (03) ◽  
pp. 135-138
Author(s):  
Yasuhisa Okumoto

A ship hull block is generally composed of skin plates, longitudinals, and transverse webs as a grillage structure, where longitudinals and transverse webs are joined to the skin plate panel by fillet welds. Much labor time is therefore required for this welding work because many weld lines exist. Today, simple automatic welding machines using a truck system are widely applied, as are semiautomatic CO2 welding and gravity welding. Since the automatic welding machine requires workers' help for initial setting, turning, and shifting, efficient routing must be investigated to improve productivity. However, finding an optimal weld sequence becomes difficult as the number of weld lines, and hence the number of possible welding sequences, increases. This is a combinatorial optimization problem. This paper examines how to decrease the work time using reinforcement learning, a method that imitates the behavioral patterns of animals.
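A minimal sketch of the kind of learning the paper applies, under toy assumptions: tabular Q-learning that orders a handful of weld lines to reduce travel between them. The coordinates and scale are illustrative, not actual ship-block data.

```python
import random

lines = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 5)]    # weld-line start points

def dist(a, b):
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

Q, rng = {}, random.Random(0)
for episode in range(3000):
    pos, todo = 0, frozenset(range(1, len(lines)))
    while todo:
        state = (pos, todo)
        Q.setdefault(state, {a: 0.0 for a in todo})
        # Epsilon-greedy choice of the next weld line to set up.
        a = (rng.choice(sorted(todo)) if rng.random() < 0.2
             else max(Q[state], key=Q[state].get))
        reward = -dist(lines[pos], lines[a])   # shorter moves, higher reward
        nxt = (a, todo - {a})
        future = max(Q.get(nxt, {}).values(), default=0.0)
        Q[state][a] += 0.1 * (reward + future - Q[state][a])
        pos, todo = nxt

# Greedy rollout of the learned welding sequence.
pos, todo, order = 0, frozenset(range(1, len(lines))), [0]
while todo:
    q = Q.get((pos, todo), {a: 0.0 for a in todo})
    a = max(q, key=q.get)
    order.append(a)
    pos, todo = a, todo - {a}
print("weld sequence:", order)
```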


1992 ◽  
Vol 02 (04) ◽  
pp. 389-395 ◽  
Author(s):  
KIICHI URAHAMA

The author previously developed a new neural algorithm, effective for set-partitioning combinatorial optimization problems, by extending the logistic transformation used in the Hopfield algorithm into its multivariable version. In this letter, the performance of the algorithm is theoretically evaluated, and it is proved that the algorithm is 1/p-approximate for p-partitioning maximum-cut problems.
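A rough sketch of what such a multivariable (softmax) logistic iteration can look like for p-way max-cut: each vertex holds a probability vector over p groups and is pushed away from the groups its neighbours occupy. This is an interpretation of the general scheme, not the author's exact formulation; temperature and iteration counts are illustrative.

```python
import numpy as np

def p_cut_mean_field(n, edges, p=3, T=0.5, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    W = np.zeros((n, n))
    for u, v, w in edges:
        W[u, v] = W[v, u] = w
    x = rng.dirichlet(np.ones(p), size=n)     # soft group memberships
    for _ in range(iters):
        field = -W @ x                        # repel neighbours' groups
        e = np.exp((field - field.max(axis=1, keepdims=True)) / T)
        x = e / e.sum(axis=1, keepdims=True)  # multivariable logistic step
    return x.argmax(axis=1)                   # hard assignment

edges = [(0, 1, 1), (1, 2, 1), (2, 0, 1), (2, 3, 1)]
print(p_cut_mean_field(4, edges, p=3))
```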


1994 ◽  
Vol 05 (03) ◽  
pp. 229-239 ◽  
Author(s):  
KIICHI URAHAMA ◽  
TADASHI YAMADA

The Potts mean-field approach for solving combinatorial optimization problems subject to winner-takes-all constraints is extended to problems subject to additional constraints. Extra variables corresponding to Lagrange multipliers are incorporated into the Potts formulation so that the additional constraints are satisfied. The extended Potts equations are solved using constrained gradient-descent differential systems. This gradient system is proven theoretically to always produce a legal local optimum of the constrained combinatorial optimization problem. An analog electronic circuit implementing the present method is designed on the basis of the previous Potts electronic circuit. The performance of the present method is theoretically evaluated for constrained maximum-cut problems. The lower bound of the cut size obtained with the present method is proven to be the same as that of the basic Potts scheme for unconstrained maximum-cut problems.
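The sketch below illustrates the general mechanism under stated assumptions: a 2-way cut with an equal-size constraint, where a softmax mean-field step updates memberships while gradient ascent on a Lagrange multiplier enforces the constraint. It is a simplified, discrete-time analogue of the continuous gradient system, not the paper's exact equations; step sizes, temperature, and the small graph are illustrative.

```python
import numpy as np

def constrained_cut(n, edges, T=0.5, eta=0.2, iters=300, seed=0):
    rng = np.random.default_rng(seed)
    W = np.zeros((n, n))
    for u, v, w in edges:
        W[u, v] = W[v, u] = w
    x = rng.dirichlet(np.ones(2), size=n)   # membership over 2 groups
    lam = 0.0                               # multiplier for size balance
    for _ in range(iters):
        field = -W @ x                      # cut term: avoid neighbours' group
        field[:, 0] -= lam                  # constraint term penalises group 0
        field[:, 1] += lam
        e = np.exp((field - field.max(axis=1, keepdims=True)) / T)
        x = e / e.sum(axis=1, keepdims=True)
        violation = x[:, 0].sum() - n / 2   # imbalance between the groups
        lam += eta * violation              # gradient ascent on the dual
    return x.argmax(axis=1)

edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1)]
print(constrained_cut(4, edges))
```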

