Learning Optimal Classification Trees Using a Binary Linear Program Formulation

Author(s):  
Sicco Verwer ◽  
Yingqian Zhang

We provide a new formulation for the problem of learning the optimal classification tree of a given depth as a binary linear program. A limitation of previously proposed mathematical optimization formulations is that they create constraints and variables for every row in the training data. As a result, the running time of existing integer linear programming (ILP) formulations increases dramatically with the size of the data. In our new binary formulation, we aim to circumvent this problem by making the formulation size largely independent of the training data size. We show experimentally that our formulation achieves better performance than existing formulations on both small and large problem instances, and in shorter running time.

2018 ◽  
Vol 20 (4) ◽  
pp. 2085-2108 ◽  
Author(s):  
Hiba Yahyaoui ◽  
Islem Kaabachi ◽  
Saoussen Krichen ◽  
Abdulkader Dekdouk

We address in this paper a multi-compartment vehicle routing problem (MCVRP) that aims to plan the delivery of different products to a set of geographically dispersed customers. The MCVRP is encountered in many industries; our research has been motivated by the petrol station replenishment problem. The main objective of the delivery process is to minimize the total distance driven by the trucks used. The problem configuration is described by a predefined set of trucks with several compartments and a set of customers with demands and predefined deliveries. Given such inputs, the minimization of the total traveled distance is subject to assignment and routing constraints that express the capacity limitations of each truck’s compartments in terms of the pathways’ restrictions. Because the problem is NP-hard, we propose two algorithms aimed mainly at large problem instances: an adaptive variable neighborhood search (AVNS) and a genetic algorithm based on partially matched crossover (PMX), with the goal of ensuring better solution quality. We compare the proposed AVNS with the exact solution obtained using CPLEX, and we use a set of benchmark problem instances to analyze the performance of both proposed metaheuristics.
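The abstract names partially matched crossover (PMX) but does not describe how routes are encoded. As a rough illustration only, the sketch below applies the textbook PMX operator to two parent permutations, assuming (hypothetically) that a solution is a permutation of customer indices. The function name `pmx` and the explicit cut points are illustrative choices, not taken from the paper.

```python
from typing import List, Optional


def pmx(parent1: List[int], parent2: List[int], cut1: int, cut2: int) -> List[int]:
    """Textbook partially matched crossover on two permutations.

    The child copies parent1[cut1:cut2] and takes the remaining genes from
    parent2, using the position mapping to avoid duplicates.
    Assumes both parents are permutations of the same items.
    """
    size = len(parent1)
    child: List[Optional[int]] = [None] * size

    # Step 1: copy the segment between the cut points from parent1.
    child[cut1:cut2] = parent1[cut1:cut2]
    segment = set(parent1[cut1:cut2])

    # Position of every gene in parent2, used to follow the mapping.
    pos_in_p2 = {gene: i for i, gene in enumerate(parent2)}

    # Step 2: place parent2's segment genes that were not copied in step 1.
    for i in range(cut1, cut2):
        gene = parent2[i]
        if gene in segment:
            continue
        pos = i
        # Follow the mapping until we land outside the copied segment.
        while cut1 <= pos < cut2:
            pos = pos_in_p2[parent1[pos]]
        child[pos] = gene

    # Step 3: fill every remaining slot directly from parent2.
    for i in range(size):
        if child[i] is None:
            child[i] = parent2[i]

    return [gene for gene in child if gene is not None]


if __name__ == "__main__":
    p1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    p2 = [9, 3, 7, 8, 2, 6, 5, 1, 4]
    print(pmx(p1, p2, 3, 7))
```

In a genetic algorithm the cut points would normally be drawn at random for each mating; they are passed explicitly here so the behaviour is reproducible.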


Symmetry ◽  
2020 ◽  
Vol 12 (9) ◽  
pp. 1460
Author(s):  
Hamza Jouhari ◽  
Deming Lei ◽  
Mohammed A. A. Al-qaness ◽  
Mohamed Abd Elaziz ◽  
Robertas Damaševičius ◽  
...  

Scheduling can be described as a decision-making process. It is applied in various settings, such as manufacturing, airports, and information processing systems. Moreover, symmetry is common in certain types of scheduling problems. There are three types of parallel machine scheduling problems (PMSP): uniform, identical, and unrelated parallel machine scheduling problems (UPMSPs). Recently, UPMSPs with setup times have attracted more attention due to their applications in different industries and services. In this study, we present an efficient method to address UPMSPs using a modified Harris hawks optimizer (HHO). The new method, called MHHO, uses the salp swarm algorithm (SSA) as a local search for HHO in order to enhance its performance and decrease its computation time. To test the performance of MHHO, several experiments are carried out using small and large problem instances. Moreover, the proposed method is compared to several state-of-the-art approaches used for UPMSPs. The MHHO shows better performance in both small and large problem cases.


2015 ◽  
Vol 14 (03) ◽  
pp. 521-533
Author(s):  
M. Sariyar ◽  
A. Borg

Deterministic record linkage (RL) is frequently regarded as a rival to more sophisticated strategies like probabilistic RL. We investigate the effect of combining deterministic linkage with other linkage techniques. For this task, we use a simple deterministic linkage strategy as a preceding filter: a data pair is classified as ‘match’ if all values of the attributes considered agree exactly, otherwise as ‘nonmatch’. This strategy is separately combined with two probabilistic RL methods based on the Fellegi–Sunter model and with two classification tree methods (CART and Bagging). An empirical comparison was conducted on two real data sets. We used four different partitions into training data and test data to increase the validity of the results. In almost all cases, applying deterministic linkage as a preceding filter leads to better results than omitting such a pre-filter, and overall classification trees exhibited the best results. On all data sets, probabilistic RL only profited from deterministic linkage when the underlying probabilities were estimated before applying deterministic linkage. When using a pre-filter for subtracting definite cases, the underlying population of data pairs changes. It is crucial to take this into account for model-based probabilistic RL.
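The exact-agreement rule described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the record layout (dictionaries keyed by attribute name), the function names, and the idea of handing the remaining pairs to a second-stage classifier are assumptions made for the example.

```python
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, str]
Pair = Tuple[Record, Record]


def is_deterministic_match(a: Record, b: Record, attributes: Sequence[str]) -> bool:
    """Return True only if every considered attribute agrees exactly."""
    return all(a[attr] == b[attr] for attr in attributes)


def prefilter(pairs: Sequence[Pair], attributes: Sequence[str]) -> Tuple[List[Pair], List[Pair]]:
    """Split pairs into definite matches and pairs left for a later method.

    Assumption: pairs that fail the exact rule are passed on to a second-stage
    classifier (for example a probabilistic model or a classification tree).
    """
    matches: List[Pair] = []
    remaining: List[Pair] = []
    for a, b in pairs:
        if is_deterministic_match(a, b, attributes):
            matches.append((a, b))
        else:
            remaining.append((a, b))
    return matches, remaining


if __name__ == "__main__":
    r1 = {"first": "Anna", "last": "Meyer", "year": "1980"}
    r2 = {"first": "Anna", "last": "Meyer", "year": "1980"}
    r3 = {"first": "Anna", "last": "Meier", "year": "1980"}
    matched, rest = prefilter([(r1, r2), (r1, r3)], ["first", "last", "year"])
    print(len(matched), len(rest))
```

Because the filter removes the definite cases, any model fitted only on `remaining` sees a different population of pairs, which is the caveat the abstract raises for model-based probabilistic RL.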


2012 ◽  
Vol 2012 ◽  
pp. 1-23 ◽  
Author(s):  
Armin Jabbarzadeh ◽  
Seyed Gholamreza Jalali Naini ◽  
Hamid Davoudpour ◽  
Nader Azad

This paper studies a supply chain design problem with the risk of disruptions at facilities. At any point in time, the facilities are subject to various types of disruptions caused by natural disasters, man-made defections, and equipment breakdowns. We formulate the problem as a mixed-integer nonlinear program that maximizes the total profit for the whole system. The model simultaneously determines the number and location of facilities, the subset of customers to serve, the assignment of customers to facilities, and the cycle-order quantities at facilities. In order to obtain near-optimal solutions with reasonable computational requirements for large problem instances, two solution methods are developed, based on Lagrangian relaxation and on a genetic algorithm. The effectiveness of the proposed solution approaches is shown using numerical experiments. The computational results, in addition, demonstrate that the benefits of considering disruptions in the supply chain design model can be significant.


Author(s):  
Shaun P Wilkinson ◽  
Simon K Davy ◽  
Michael Bunce ◽  
Michael Stat

High-throughput sequencing of environmental DNA (eDNA) offers a simple and cost-effective solution for marine biodiversity assessments. Yet several analytical challenges remain, including the incorporation of statistical inference in the assignment of taxonomic identities. We developed a probabilistic method for DNA barcode classification that can be used for both eDNA and traditional single-source sampling. The pipeline involves: (1) compiling a primer-specific database of barcode sequences to be used as training data (obtained from GenBank and other sequence repositories), (2) generating a classification tree using an iterative learning algorithm that divisively sorts the training data into hierarchical clusters based on profile hidden Markov models, (3) assignment of each query sequence to a cluster using a recursive series of model-comparison tests, and (4) taxonomic identification of the query sequences based on the lowest common taxonomic rank of the training sequences within the cluster. This method compares favorably to other DNA classification methods when tested on benchmark datasets, and offers the added features of classifying at higher taxonomic ranks and returning interpretable confidence values in the form of the Akaike weight statistic. This bioinformatics pipeline is available as an open source R package called ‘insect’ (informatic sequence classification trees).
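The confidence values mentioned in this abstract are Akaike weights. The sketch below computes them from a list of AIC scores using the standard definition (each model's relative likelihood, normalized to sum to one). How the ‘insect’ package obtains its AIC values and applies the weights is not stated here, so the function name and the example inputs are illustrative assumptions.

```python
import math
from typing import List, Sequence


def akaike_weights(aic_values: Sequence[float]) -> List[float]:
    """Return Akaike weights for a set of candidate models.

    w_i = exp(-0.5 * delta_i) / sum_j exp(-0.5 * delta_j),
    where delta_i is the difference between model i's AIC and the lowest AIC.
    """
    if not aic_values:
        raise ValueError("at least one AIC value is required")
    best = min(aic_values)
    # Subtracting the best AIC first keeps the exponentials numerically stable.
    relative = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative)
    return [r / total for r in relative]


if __name__ == "__main__":
    # Hypothetical AIC scores for two competing models.
    for weight in akaike_weights([100.0, 102.0]):
        print(round(weight, 4))
```

A weight close to 1 means the corresponding model is strongly preferred over the alternatives considered, which is what makes the value readable as a confidence score.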


10.29007/pz3t ◽  
2018 ◽  
Author(s):  
Nikolaj Bjorner ◽  
Dejan Jovanović ◽  
Tancrède Lepoint ◽  
Philipp Rümmer ◽  
Martin Schäf

Crowdsourcing promises to quasi-automate tasks that cannot be automated otherwise. Success stories like natural language translation or recognition of cats in images show that carefully crafted crowdsourcing tasks solve large problem instances which could not be solved otherwise. To utilize crowdsourcing, one has to define the problem so that it is easy to split into small tasks, the tasks are easy for humans to solve and hard for a machine to solve, and the machine can efficiently check whether a solution is correct. In this paper we discuss a novel approach of using crowdsourcing to assist software verification. We argue that Horn clauses form a good base for crowdsourcing since they are easy to subdivide, and that logic abduction is a suitable task since it is hard to find abductive inferences for Horn clauses automatically, but it is easy to check if an inference makes a Horn clause valid. We describe a prototype implementation, show how crowdsourcing integrates into the verification process, and present preliminary results.


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Kathleen Kelley ◽  
Marielle Todd ◽  
Helene Hopfer ◽  
Michela Centinari

Purpose: This study aims to characterize several wine consumer segments who were “likely” to sample (i.e. taste before purchasing) wine from vineyards using cover crops, a sustainable production practice that reduces herbicide applications, and identify those with a greater probability of being a viable target market based on survey responses. Design/methodology/approach: A total of 956 wine consumers from the Mid-Atlantic and bordering US states were separated into segments based on an ECHAID (exhaustive Chi-square automatic interaction detector) classification tree built from internet survey responses. Findings: Out of the 12 created segments, 6 (n = 530, 72% of training data) contained participants who were at least 1.02 times (index score = 102%) more “likely” to try the wine compared to the overall sample and were willing to pay $18.99 for a 750-mL bottle of the wine, which included a $1 surcharge to cover associated production costs. Of these, three (n = 195, 26%) had the greatest potential for which a marketing plan could be developed (index scores of 109%–121%), with over half in each segment willing to pay $20.99 for the bottle of wine, which could motivate growers to consider implementing this sustainable strategy. Originality/value: Although several segments of participants were “likely” to sample the sustainably produced wine, an ECHAID classification tree allowed us to identify participants who would not pay $18.99 for a 750-mL bottle of wine, even after learning about the use of cover crops and the trade-off ($1 bottle surcharge). By narrowing the number of potential “likely” segments to those with a greater potential of sampling the wine, more purposeful marketing strategies can be developed based on demographics, attitudes, and behaviors defined in the model.

