Solving the Large-Scale TSP Problem in 1 h: Santa Claus Challenge 2020

2021 ◽  
Vol 8 ◽  
Author(s):  
Radu Mariescu-Istodor ◽  
Pasi Fränti

The scalability of traveling salesperson problem (TSP) algorithms for handling large-scale problem instances has been an open problem for a long time. We arranged a so-called Santa Claus challenge and invited people to submit their algorithms to solve a TSP instance with more than one million nodes given only 1 h of computing time. In this article, we analyze the results and show which design choices are decisive in providing the best solution to the problem with the given constraints. There were three valid submissions, all based on local search, including k-opt up to k = 5. The most important design choice turned out to be the localization of the operator using a neighborhood graph. The divide-and-merge strategy suffers a 2% loss of quality. However, via parallelization, the result can be obtained within less than 2 min, which can make a key difference in real-life applications.
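The decisive design choice noted above, restricting local search moves to a neighborhood graph, can be illustrated with a minimal 2-opt sketch in Python. This is a generic illustration rather than any submission's code (the submissions used k-opt moves up to k = 5), and the helper names and the brute-force neighbor construction are assumptions made for brevity.

```python
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def knn_lists(points, k=8):
    """Brute-force k-nearest-neighbor lists; a k-d tree would replace this at million-node scale."""
    nbrs = []
    for i, p in enumerate(points):
        order = sorted((j for j in range(len(points)) if j != i),
                       key=lambda j: dist(p, points[j]))
        nbrs.append(order[:k])
    return nbrs

def localized_two_opt(tour, points, nbrs):
    """2-opt restricted to candidate edges taken from the neighborhood lists."""
    n = len(tour)
    improved = True
    while improved:
        improved = False
        pos = {city: idx for idx, city in enumerate(tour)}
        for i in range(n):
            a, b = tour[i], tour[(i + 1) % n]
            for c in nbrs[a]:                 # only consider edges toward nearby cities
                j = pos[c]
                if j <= i:                    # each pair is examined once per sweep
                    continue
                d = tour[(j + 1) % n]
                if d == a:
                    continue
                # Gain of replacing edges (a,b) and (c,d) with (a,c) and (b,d).
                old = dist(points[a], points[b]) + dist(points[c], points[d])
                new = dist(points[a], points[c]) + dist(points[b], points[d])
                if new < old - 1e-12:
                    tour[i + 1:j + 1] = tour[i + 1:j + 1][::-1]   # reverse the segment b..c
                    improved = True
                    break                     # tour changed; refresh positions and continue
            if improved:
                break
    return tour
```

Restricting the inner loop to nbrs[a] is what turns each sweep from quadratic to roughly linear in the number of cities, which is the property that makes local search feasible within the one-hour budget.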

Algorithms ◽  
2019 ◽  
Vol 13 (1) ◽  
pp. 5 ◽  
Author(s):  
Víctor Pacheco-Valencia ◽  
José Alberto Hernández ◽  
José María Sigarreta ◽  
Nodari Vakhania

The Traveling Salesman Problem (TSP) aims at finding the shortest trip for a salesman, who has to visit each of the locations from a given set exactly once, starting and ending at the same location. Here, we consider the Euclidean version of the problem, in which the locations are points in the two-dimensional Euclidean space and the distances are correspondingly Euclidean distances. We propose simple, fast, and easily implementable heuristics that work well, in practice, for large real-life problem instances. The algorithm works in three phases: the constructive, the insertion, and the improvement phases. The first two phases run in time O(n²) and the number of repetitions in the improvement phase, in practice, is bounded by a small constant. We have tested the practical behavior of our heuristics on the available benchmark problem instances. The approximation provided by our algorithm for the tested benchmark problem instances did not beat the best known results. At the same time, comparing the CPU time used by our algorithm with that of the earlier known ones, in about 92% of the cases our algorithm required less computational time. Our algorithm is also memory efficient: for the largest tested problem instance with 744,710 cities, it used about 50 MiB, whereas the average memory usage for the remaining 217 instances was 1.6 MiB.
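As a rough illustration of what constructive and insertion phases with O(n²) total work can look like, here is a generic Python sketch (nearest-neighbor construction plus cheapest insertion). It is not the authors' exact heuristic; the function names are illustrative only.

```python
import math

def nearest_neighbor_tour(points, start=0):
    """Constructive phase sketch: repeatedly visit the closest unvisited point (O(n^2) overall)."""
    unvisited = set(range(len(points))) - {start}
    tour = [start]
    while unvisited:
        last = points[tour[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, points[j]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def cheapest_insertion(tour, points, city):
    """Insertion phase sketch: place a remaining city where it lengthens the tour the least."""
    best_pos, best_delta = None, float("inf")
    for i in range(len(tour)):
        a = points[tour[i]]
        b = points[tour[(i + 1) % len(tour)]]
        delta = math.dist(a, points[city]) + math.dist(points[city], b) - math.dist(a, b)
        if delta < best_delta:
            best_pos, best_delta = i + 1, delta
    tour.insert(best_pos, city)
    return tour
```

A subsequent improvement phase (for example, localized 2-opt as sketched earlier) would then run a small, practically constant number of sweeps over the tour produced by these two steps.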


2020 ◽  
Vol 45 (2) ◽  
pp. 184-200
Author(s):  
David Van Bulck ◽  
Dries Goossens ◽  
Jörn Schönberger ◽  
Mario Guajardo

The sports timetabling problem is a combinatorial optimization problem that consists of creating a timetable that defines against whom, when and where teams play games. This is a complex matter, since real-life sports timetabling applications are typically highly constrained. The vast amount and variety of constraints and the lack of generally accepted benchmark problem instances mean that timetabling algorithms proposed in the literature are often tested on just one or two specific seasons of the competition under consideration. This is problematic since only a few algorithmic insights are gained. To mitigate this issue, this article provides a problem instance repository containing over 40 different types of instances, covering both artificial and real-life problems. The construction of such a repository is not trivial, since there are dozens of constraints that need to be expressed in a standardized format. For this, our repository relies on RobinX, an XML-supported classification framework. The resulting repository provides a (non-exhaustive) overview of most real-life sports timetabling applications published over the last five decades. For every problem, a short description highlights its most distinguishing characteristics. The repository is publicly available and will be continuously updated as new instances or better solutions become available.


2020 ◽  
Author(s):  
Marco De Lucia ◽  
Robert Engelmann ◽  
Michael Kühn ◽  
Alexander Lindemann ◽  
Max Lübke ◽  
...  

A successful strategy for speeding up coupled reactive transport simulations, at the price of an acceptable accuracy loss, is to compute geochemistry, which represents the bottleneck of these simulations, through data-driven surrogates instead of 'full physics' equation-based models [1]. A surrogate is a multivariate regressor trained on a set of pre-calculated geochemical simulations or potentially even at runtime during the coupled simulations. Many algorithms and implementations are available from the thriving Machine Learning community: tree-based regressors such as Random Forests or xgboost, Artificial Neural Networks, Gaussian Processes and Support Vector Machines, just to name a few. Given the 'black-box' nature of the surrogates, however, they generally disregard physical constraints such as mass and charge balance, which are of course of paramount importance for coupled transport simulations. A runtime check of the balance errors in the surrogate outcomes is therefore necessary: predictions offending a given tolerance must be rejected and the full physics chemical simulations run instead. Thus the practical speedup of this strategy is a tradeoff between careful training of the surrogate and run-time efficiency.

In this contribution we demonstrate that the use of surrogates can lead to a dramatic decrease of the required computing time, with speedup factors in the order of 10 or even 100 in the most favorable cases. Thus, large scale simulations with some 10⁶ grid elements are feasible on common workstations without requiring computation on HPC clusters [2].

Furthermore, we showcase our implementation of Distributed Hash Tables (DHT) caching geochemical simulation results for reuse in subsequent time steps. The computational advantage here stems from the fact that query and retrieval from lookup tables is much faster than both full physics geochemical simulations and surrogate predictions. Another advantage of this algorithm is that virtually no loss of accuracy is introduced in the simulations. Enabling the caching of geochemical simulations through DHT speeds up large scale reactive transport simulations by up to a factor of four, even when computing on several hundred cores.

These algorithmic developments are demonstrated in comparison with published reactive transport benchmarks and on a real-life scenario of CO₂ storage.

[1] Jatnieks, J., De Lucia, M., Dransch, D., Sips, M. (2016): Data-driven surrogate model approach for improving the performance of reactive transport simulations. Energy Procedia 97, pp. 447-453. DOI: 10.1016/j.egypro.2016.10.047

[2] De Lucia, M., Kempka, T., Jatnieks, J., Kühn, M. (2017): Integrating surrogate models into subsurface simulation framework allows computation of complex reactive transport scenarios. Energy Procedia 125, pp. 580-587. DOI: 10.1016/j.egypro.2017.08.200
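The lookup-before-simulate idea behind the DHT caching described above can be reduced to a few lines. The sketch below uses an in-process dictionary as a stand-in for a distributed hash table; the solver callable and the rounding tolerance are assumptions made for illustration.

```python
def cached_geochemistry(full_physics_solver, decimals=6):
    """Wrap an expensive geochemistry solver with a result cache.

    A real setup would share the table across ranks via a distributed hash table;
    a plain dict is enough to show the lookup-before-simulate pattern.
    """
    cache = {}

    def solve(state):
        # Round the input vector so that nearly identical chemical states map to the same key.
        key = tuple(round(x, decimals) for x in state)
        if key not in cache:
            cache[key] = full_physics_solver(state)   # cache miss: run the full simulation
        return cache[key]

    return solve
```

In a coupled simulation loop, every transport step would call the wrapped solver; recurring chemical states in later time steps then hit the cache instead of the geochemistry code, which is where a speedup such as the reported factor of four can come from.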


Mathematics ◽  
2020 ◽  
Vol 8 (7) ◽  
pp. 1106
Author(s):  
S. Bhaskaran ◽  
Raja Marappan ◽  
B. Santhi

Nowadays, because of the tremendous amount of information that humans and machines produce every day, it has become increasingly hard to choose the most relevant content from a broad range of choices. This research focuses on the design of two intelligent optimization methods, using Artificial Intelligence and Machine Learning, for real-life applications that improve the process of generating recommendations. In the first method, a modified cluster-based intelligent collaborative filtering is applied with sequential clustering that operates on the values of the dataset, the user's neighborhood set, and the size of the recommendation list. This strategy splits the given dataset into subsets or clusters, and the recommendation list is extracted from each group to construct a better overall recommendation list. In the second method, a specific-features-based customized recommender works in training and recommendation steps by applying a split-and-conquer strategy to the problem datasets, which are clustered into a minimum number of clusters, and the better recommendation list is created from all the clusters. This strategy automatically tunes the parameter λ, which serves the role of supervised learning in generating a better recommendation list for large datasets. The quality of the proposed recommenders on several large-scale datasets is improved compared to well-known existing methods. The proposed methods work well when λ = 0.5 with the size of the recommendation list |L| = 30 and the size of the neighborhood |S| < 30. For a large value of |S|, the significant difference in root mean square error becomes smaller in the proposed methods. For large-scale datasets, simulations of the proposed methods with varying user sizes show that, when the user size exceeds 500, better values of the metrics are obtained and proposed method 2 performs better than proposed method 1. The significant differences between these methods arise because the structure of their computation depends on the number of user attributes, λ, the number of bipartite graph edges, and |L|. The better (Precision, Recall) values obtained with a size of 3000 for the large-scale Book-Crossing dataset are (0.0004, 0.0042) and (0.0004, 0.0046) for the two proposed methods, respectively. The average computational time of the proposed methods is under 10 seconds for the large-scale datasets, yielding better performance compared to well-known existing methods.
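The cluster-then-recommend idea common to both methods can be sketched as follows. This is a generic illustration using scikit-learn's KMeans, not the paper's algorithms; the rating matrix and parameter values are placeholders, with list_size=30 merely mirroring the |L| = 30 setting reported above.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_based_recommendations(ratings, n_clusters=5, list_size=30):
    """Group users by their rating vectors, then rank items by mean rating within each cluster."""
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(ratings)
    recs = {}
    for c in range(n_clusters):
        cluster = ratings[labels == c]                    # rating rows of users in cluster c
        masked = np.where(cluster > 0, cluster, np.nan)   # treat 0 as "not rated"
        item_scores = np.nan_to_num(np.nanmean(masked, axis=0))
        recs[c] = np.argsort(-item_scores)[:list_size]    # top-|L| items for this cluster
    return labels, recs
```

Splitting the users into clusters first keeps each per-cluster ranking cheap, which is consistent with the sub-10-second computation times reported for the large datasets.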


Author(s):  
Alice Tarzariol ◽  
Martin Gebser ◽  
Konstantin Schekotihin

Efficient omission of symmetric solution candidates is essential for combinatorial problem solving. Most of the existing approaches are instance-specific and focus on the automatic computation of Symmetry Breaking Constraints (SBCs) for each given problem instance. However, the application of such approaches to large-scale instances or advanced problem encodings might be problematic. Moreover, the computed SBCs are propositional and, therefore, can neither be meaningfully interpreted nor transferred to other instances. To overcome these limitations, we introduce a new model-oriented approach for Answer Set Programming that lifts the SBCs of small problem instances into a set of interpretable first-order constraints using the Inductive Logic Programming paradigm. Experiments demonstrate the ability of our framework to learn general constraints from instance-specific SBCs for a collection of combinatorial problems. The obtained results indicate that our approach significantly outperforms a state-of-the-art instance-specific method as well as the direct application of a solver.


2021 ◽  
Vol 68 (3) ◽  
pp. 16-40
Author(s):  
Grzegorz Koloch ◽  
Michał Lewandowski ◽  
Marcin Zientara ◽  
Grzegorz Grodecki ◽  
Piotr Matuszak ◽  
...  

We optimise a postal delivery problem with time and capacity constraints imposed on vehicles and nodes of the logistic network. Time constraints relate to the duration of routes, whereas capacity constraints concern technical characteristics of vehicles and postal operation outlets. We consider a method which can be applied to a brownfield scenario, in which capacities of outlets can be relaxed and prospective hubs identified. As a solution, we apply a genetic algorithm and test its properties both in small case studies and in a simulated problem instance of a larger (i.e. comparable with real-world instances) size. We show that the genetic operators we employ are capable of switching between solutions based on direct origin-to-destination routes and solutions based on transfer connections, depending on what is more beneficial in a given problem instance. Moreover, the algorithm correctly identifies cases in which volumes should be shipped directly, and those in which it is optimal to use transfer connections within a single problem instance, if an instance in question requires such a selection for optimality. The algorithm is thus suitable for determining hubs and satellite locations. All considerations presented in this paper are motivated by real-life problem instances experienced by the Polish Post, the largest postal service provider in Poland, in its daily plans of delivering postal packages, letters and pallets.
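The abstract does not spell out the genetic operators, so the snippet below is only a hedged illustration of the kind of mutation that can switch a shipment between a direct route and a transfer connection; the plan representation (a list of dicts with a "via" field) and the hub list are hypothetical.

```python
import random

def toggle_routing_mutation(plan, hubs, rate=0.05):
    """Flip randomly chosen shipments between direct routing and routing via a transfer hub."""
    for shipment in plan:
        if random.random() < rate:
            if shipment["via"] is None:
                shipment["via"] = random.choice(hubs)   # reroute through a randomly picked hub
            else:
                shipment["via"] = None                  # revert to a direct origin-to-destination route
    return plan
```

Selection pressure then decides, per shipment and per problem instance, whether the direct or the hub-based routing survives, which is the switching behavior the paper reports.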


Sensors ◽  
2021 ◽  
Vol 21 (12) ◽  
pp. 3972
Author(s):  
Takumi Takebayashi ◽  
Renato Miyagusuku ◽  
Koichi Ozaki

Localization is fundamental to enable the use of autonomous mobile robots. In this work, we use magnetic-based localization. As Earth’s geomagnetic field is stable in time and is not affected by nonmagnetic materials, such as a large number of people in the robot’s surroundings, magnetic-based localization is ideal for service robotics in supermarkets, hotels, etc. A common approach for magnetic-based localization is to first create a magnetic map of the environment where the robot will be deployed. For this, magnetic samples acquired a priori are used. To generate this map, the collected data is interpolated by training a Gaussian Process Regression model. Gaussian processes are nonparametric, data-driven models, where the most important design choice is the selection of an adequate kernel function. These models are flexible and generate mean predictions as well as the confidence of those predictions, making them ideal for use in probabilistic approaches. However, their computational and memory cost scales poorly when large datasets are used for training, making their use in large-scale environments challenging. The purpose of this study is to: (i) enable magnetic-based localization in large-scale environments by using a sparse representation of Gaussian processes, (ii) test the effect of several kernel functions on robot localization, and (iii) evaluate the accuracy of the approach experimentally in different large-scale environments.
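A dense Gaussian-process magnetic map of the kind described above can be built with scikit-learn as follows. The data are random placeholders, and the RBF-plus-noise kernel is just one common choice; the paper compares several kernels and uses a sparse approximation, which the standard GaussianProcessRegressor shown here does not provide.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
positions = rng.uniform(0.0, 10.0, size=(200, 2))   # placeholder (x, y) sample poses in metres
magnitude = rng.normal(45.0, 2.0, size=200)         # placeholder magnetic field readings

# Kernel selection is the key design choice; an RBF kernel plus a noise term is a common default.
gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0) + WhiteKernel(), normalize_y=True)
gpr.fit(positions, magnitude)

# Mean prediction and its uncertainty at a query pose, both usable in a probabilistic localizer.
mean, std = gpr.predict(np.array([[2.5, 7.0]]), return_std=True)
```

The returned mean and standard deviation are what make the map directly usable inside a probabilistic filter; the sparse representation studied in the paper addresses the cubic training cost that this dense formulation incurs on large datasets.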


2020 ◽  
Vol 10 (2) ◽  
pp. 103-106
Author(s):  
Astemir Zhurtov ◽  

Cruel and inhumane acts that harm human life and health, as well as humiliate human dignity, are prohibited in most countries of the world, and Russia is no exception in this regard. The article presents an analysis of the institution of responsibility for torture in the Russian Federation. The author comes to the conclusion that the current criminal law of Russia regulates liability for torture superficially and fragmentarily, and therefore formulates proposals to define such acts as an independent crime. In the context of modern globalization, the world community pays special attention to the protection of human rights, for which large-scale international standards were created long ago. The Universal Declaration of Human Rights and other international acts enshrine prohibitions of cruel and inhumane acts that harm human life and health, as well as degrade human dignity. Considering the historical experience of the past, these standards focus on the prohibition of any kind of torture, regardless of the purpose of its implementation.


2021 ◽  
Vol 55 (1) ◽  
pp. 1-2
Author(s):  
Bhaskar Mitra

Neural networks with deep architectures have demonstrated significant performance improvements in computer vision, speech recognition, and natural language processing. The challenges in information retrieval (IR), however, are different from these other application areas. A common form of IR involves ranking of documents---or short passages---in response to keyword-based queries. Effective IR systems must deal with the query-document vocabulary mismatch problem by modeling relationships between different query and document terms and how they indicate relevance. Models should also consider lexical matches when the query contains rare terms---such as a person's name or a product model number---not seen during training, and should avoid retrieving semantically related but irrelevant results. In many real-life IR tasks, the retrieval involves extremely large collections---such as the document index of a commercial Web search engine---containing billions of documents. Efficient IR methods should take advantage of specialized IR data structures, such as the inverted index, to efficiently retrieve from large collections. Given an information need, the IR system also mediates how much exposure an information artifact receives by deciding whether it should be displayed, and where it should be positioned, among other results. Exposure-aware IR systems may optimize for additional objectives, besides relevance, such as parity of exposure for retrieved items and content publishers. In this thesis, we present novel neural architectures and methods motivated by the specific needs and challenges of IR tasks. We ground our contributions with a detailed survey of the growing body of neural IR literature [Mitra and Craswell, 2018]. Our key contribution towards improving the effectiveness of deep ranking models is developing the Duet principle [Mitra et al., 2017], which emphasizes the importance of incorporating evidence based on both patterns of exact term matches and similarities between learned latent representations of query and document. To efficiently retrieve from large collections, we develop a framework to incorporate query term independence [Mitra et al., 2019] into any arbitrary deep model, which enables large-scale precomputation and the use of an inverted index for fast retrieval. In the context of stochastic ranking, we further develop optimization strategies for exposure-based objectives [Diaz et al., 2020]. Finally, this dissertation also summarizes our contributions towards benchmarking neural IR models in the presence of large training datasets [Craswell et al., 2019] and explores the application of neural methods to other IR tasks, such as query auto-completion.
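The query term independence assumption mentioned above (that a document's score decomposes into a sum of per-term contributions) is what enables offline precomputation and inverted-index retrieval. The sketch below simply states that decomposition, with term_score standing in for any learned per-term scoring function.

```python
from typing import Callable, Iterable

def qti_score(query_terms: Iterable[str], doc_id: str,
              term_score: Callable[[str, str], float]) -> float:
    """Score a document as a sum of independent per-term contributions.

    Because each term's contribution does not depend on the other query terms,
    term_score(t, d) can be precomputed offline with a deep model and stored in an
    inverted index; online scoring then reduces to lookups and additions.
    """
    return sum(term_score(t, doc_id) for t in query_terms)
```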


Algorithms ◽  
2021 ◽  
Vol 14 (6) ◽  
pp. 187
Author(s):  
Aaron Barbosa ◽  
Elijah Pelofske ◽  
Georg Hahn ◽  
Hristo N. Djidjev

Quantum annealers, such as the device built by D-Wave Systems, Inc., offer a way to compute solutions of NP-hard problems that can be expressed in Ising or quadratic unconstrained binary optimization (QUBO) form. Although such solutions are typically of very high quality, problem instances are usually not solved to optimality due to imperfections of the current generation of quantum annealers. In this contribution, we aim to understand some of the factors contributing to the hardness of a problem instance, and to use machine learning models to predict the accuracy of the D-Wave 2000Q annealer for solving specific problems. We focus on the maximum clique problem, a classic NP-hard problem with important applications in network analysis, bioinformatics, and computational chemistry. By training a machine learning classification model on basic problem characteristics, such as the number of edges in the graph, or annealing parameters, such as the D-Wave’s chain strength, we are able to rank certain features in the order of their contribution to the solution hardness, and present a simple decision tree which allows one to predict whether a problem will be solvable to optimality with the D-Wave 2000Q. We extend these results by training a machine learning regression model that predicts the clique size found by the D-Wave.
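The classification setup described above, predicting from basic instance features whether the annealer will find an optimal clique, can be reproduced in outline with scikit-learn. The feature columns and the tiny data matrix below are invented placeholders, not the paper's data.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Placeholder features per instance: [vertices, edges, graph density, chain strength].
X = np.array([[60,  885, 0.50, 1.0],
              [45,  495, 0.50, 2.0],
              [60, 1416, 0.80, 1.0],
              [30,  218, 0.50, 1.5]])
y = np.array([0, 1, 0, 1])          # 1 = instance solved to optimality by the annealer

clf = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(clf.predict([[50, 600, 0.49, 1.2]]))   # predicted outcome for a new instance
print(clf.feature_importances_)              # crude ranking of each feature's contribution
```

With real training data, the learned feature importances and the shallow tree structure give the kind of interpretable hardness ranking the paper reports.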

