Learning Optimal Classification Trees Using a Binary Linear Program Formulation

We provide a new formulation for the problem of learning the optimal classification tree of a given depth as a binary linear program. A limitation of previously proposed mathematical optimization formulations is that they create constraints and variables for every row in the training data. As a result, the running time of existing integer linear programming (ILP) formulations increases dramatically with the size of the data. Our new binary formulation aims to circumvent this problem by making the formulation size largely independent of the training data size. We show experimentally that our formulation achieves better performance than existing formulations on both small and large problem instances, within shorter running times.
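
To make "optimal classification tree of a given depth" concrete, here is a minimal brute-force sketch for depth 1 on binary (0/1) features. It is not the binary linear program described above; the function name `best_stump` and the 0/1 feature encoding are illustrative assumptions. It does show one relevant property: once rows are aggregated into per-branch label counts, choosing the leaf labels no longer depends on the individual rows.

```python
from collections import Counter
from typing import Dict, Hashable, List, Optional, Sequence, Tuple

Stump = Tuple[int, Optional[Hashable], Optional[Hashable], int]


def best_stump(X: Sequence[Sequence[int]], y: Sequence[Hashable]) -> Stump:
    """Exhaustively search all depth-1 trees (one split on a 0/1 feature).

    Returns (feature index, leaf label for value 0, leaf label for value 1,
    number of training errors). Ties keep the lowest feature index.
    Assumes every feature value is 0 or 1 (hypothetical encoding).
    """
    best: Optional[Stump] = None
    for f in range(len(X[0])):
        # Aggregate label counts per branch; the leaves only need these counts.
        counts: Dict[int, Counter] = {0: Counter(), 1: Counter()}
        for row, label in zip(X, y):
            counts[row[f]][label] += 1
        errors = 0
        leaves: List[Optional[Hashable]] = []
        for branch in (0, 1):
            branch_counts = counts[branch]
            if branch_counts:
                # Each leaf predicts its majority label; the rest are errors.
                label, hits = branch_counts.most_common(1)[0]
                errors += sum(branch_counts.values()) - hits
                leaves.append(label)
            else:
                leaves.append(None)  # empty branch: no prediction needed
        if best is None or errors < best[3]:
            best = (f, leaves[0], leaves[1], errors)
    if best is None:
        raise ValueError("need at least one feature")
    return best


X = [[0, 1], [0, 0], [1, 1], [1, 0]]
y = ["a", "a", "b", "b"]
print(best_stump(X, y))  # (0, 'a', 'b', 0)
```

Exhaustive search grows quickly with depth, which is why depth-limited optimal trees are usually attacked with integer programming formulations instead.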

Evaluating Psychiatric Hospital Admission Decisions for Children in Foster Care: An Optimal Classification Tree Analysis

Journal of Clinical Child & Adolescent Psychology ◽

10.1080/15374410709336564 ◽

2007 ◽

Vol 36 (1) ◽

pp. 8-18 ◽

Cited By ~ 5

Author(s):

Jessica A. Snowden ◽

Scott C. Leon ◽

Fred B. Bryant ◽

John S. Lyons

Keyword(s):

Foster Care ◽

Hospital Admission ◽

Psychiatric Hospital ◽

Classification Tree ◽

Classification Tree Analysis ◽

Tree Analysis ◽

Children In Foster Care ◽

Admission Decisions ◽

Optimal Classification

Two metaheuristic approaches for solving the multi-compartment vehicle routing problem

Operational Research ◽

10.1007/s12351-018-0403-4 ◽

2018 ◽

Vol 20 (4) ◽

pp. 2085-2108 ◽

Cited By ~ 4

Author(s):

Hiba Yahyaoui ◽

Islem Kaabachi ◽

Saoussen Krichen ◽

Abdulkader Dekdouk

Keyword(s):

Vehicle Routing ◽

Vehicle Routing Problem ◽

Variable Neighborhood Search ◽

Neighborhood Search ◽

Large Problem ◽

Solution Quality ◽

Routing Problem ◽

Petrol Station ◽

Capacity Limitations ◽

Problem Instances

This paper addresses a multi-compartment vehicle routing problem (MCVRP) that aims to plan the delivery of different products to a set of geographically dispersed customers. The MCVRP is encountered in many industries; our research was motivated by the petrol station replenishment problem. The main objective of the delivery process is to minimize the total distance driven by the trucks used. The problem configuration is described by a prefixed set of trucks with several compartments and a set of customers with demands and prefixed deliveries. Given these inputs, the total traveled distance is minimized subject to assignment and routing constraints that express the capacity limitations of each truck's compartments in terms of the pathways' restrictions. Because the problem is NP-hard, we propose two algorithms, mainly for large problem instances: an adaptive variable neighborhood search (AVNS) and a genetic algorithm based on partially matched crossover (PMX), both aimed at improving solution quality. We compare the proposed AVNS with exact solutions obtained using CPLEX, and we use a set of benchmark problem instances to analyze the performance of both proposed metaheuristics.
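
Partially matched crossover (PMX) is a standard crossover for permutation chromosomes. The sketch below is a minimal version, assuming each chromosome is a permutation of customer indices; the abstract does not state the encoding the authors use, and the function name `pmx` and the fixed cut points are illustrative.

```python
from typing import List, Optional, Sequence


def pmx(parent1: Sequence[int], parent2: Sequence[int], cut1: int, cut2: int) -> List[int]:
    """Partially matched crossover producing one child permutation.

    The child inherits parent1[cut1:cut2] unchanged; the remaining genes come
    from parent2, relocated through the parent1 <-> parent2 mapping so that
    the child stays a valid permutation.
    """
    n = len(parent1)
    child: List[Optional[int]] = [None] * n
    child[cut1:cut2] = parent1[cut1:cut2]
    copied = set(parent1[cut1:cut2])
    position_in_p2 = {gene: i for i, gene in enumerate(parent2)}
    for i in range(cut1, cut2):
        gene = parent2[i]
        if gene in copied:
            continue  # already present via parent1's segment
        pos = i
        # Follow the mapping until it leads to a position outside the segment.
        while cut1 <= pos < cut2:
            pos = position_in_p2[parent1[pos]]
        child[pos] = gene
    # Every still-empty position takes parent2's gene at that position.
    return [parent2[i] if g is None else g for i, g in enumerate(child)]


print(pmx([1, 2, 3, 4, 5, 6, 7, 8, 9], [9, 3, 7, 8, 2, 6, 5, 1, 4], 3, 7))
# [9, 3, 2, 4, 5, 6, 7, 1, 8]
```

In a genetic algorithm the cut points would normally be drawn at random, for example `cut1, cut2 = sorted(random.sample(range(n + 1), 2))`.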

Modified Harris Hawks Optimizer for Solving Machine Scheduling Problems

Symmetry ◽

10.3390/sym12091460 ◽

2020 ◽

Vol 12 (9) ◽

pp. 1460

Author(s):

Hamza Jouhari ◽

Deming Lei ◽

Mohammed A. A. Al-qaness ◽

Mohamed Abd Elaziz ◽

Robertas Damaševičius ◽

...

Keyword(s):

Computation Time ◽

Machine Scheduling ◽

Setup Time ◽

Parallel Machine Scheduling ◽

Parallel Machine ◽

Scheduling Problems ◽

Large Problem ◽

Unrelated Parallel Machine Scheduling ◽

Problem Instances ◽

Machine Scheduling Problems

Scheduling can be described as a decision-making process. It is applied in various settings, such as manufacturing, airports, and information processing systems. Moreover, symmetry is common in certain types of scheduling problems. There are three types of parallel machine scheduling problems (PMSPs): uniform, identical, and unrelated parallel machine scheduling problems (UPMSPs). Recently, UPMSPs with setup times have attracted more attention due to their applications in different industries and services. In this study, we present an efficient method to address UPMSPs using a modified Harris hawks optimizer (HHO). The new method, called MHHO, uses the salp swarm algorithm (SSA) as a local search for HHO to enhance its performance and decrease its computation time. To test the performance of MHHO, several experiments are conducted using small and large problem instances. Moreover, the proposed method is compared with several state-of-the-art approaches for UPMSPs. MHHO shows better performance on both small and large problem instances.
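
Metaheuristics such as MHHO repeatedly evaluate candidate schedules. A minimal sketch of such an evaluation, assuming a makespan objective and sequence-dependent setup times with no setup before the first job on a machine; the abstract does not specify either, and the names `makespan`, `proc`, and `setup` are hypothetical.

```python
from typing import Optional, Sequence


def makespan(schedule: Sequence[Sequence[int]],
             proc: Sequence[Sequence[float]],
             setup: Sequence[Sequence[Sequence[float]]]) -> float:
    """Makespan of an unrelated-parallel-machine schedule with setup times.

    schedule[m] : ordered list of jobs assigned to machine m
    proc[m][j]  : processing time of job j on machine m (machine-dependent)
    setup[m][i][j] : setup on machine m when job j directly follows job i
    Assumption: the first job on each machine needs no setup.
    """
    completion = []
    for m, jobs in enumerate(schedule):
        t = 0.0
        prev: Optional[int] = None
        for j in jobs:
            if prev is not None:
                t += setup[m][prev][j]
            t += proc[m][j]
            prev = j
        completion.append(t)
    return max(completion, default=0.0)


proc = [[2, 4, 3], [5, 1, 2]]
setup = [[[1] * 3 for _ in range(3)] for _ in range(2)]
print(makespan([[0, 2], [1]], proc, setup))  # 6.0 (machine 0: 2 + 1 + 3)
```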

Deterministic Linkage as a Preceding Filter for Other Record Linkage Methods

International Journal of Information Technology & Decision Making ◽

10.1142/s0219622015500108 ◽

2015 ◽

Vol 14 (03) ◽

pp. 521-533

Author(s):

M. Sariyar ◽

A. Borg

Keyword(s):

Record Linkage ◽

Classification Tree ◽

Real Data ◽

Training Data ◽

Data Sets ◽

Empirical Comparison ◽

Linkage Methods ◽

Data Pair ◽

Tree Methods ◽

Almost All

Deterministic record linkage (RL) is frequently regarded as a rival to more sophisticated strategies such as probabilistic RL. We investigate the effect of combining deterministic linkage with other linkage techniques. For this task, we use a simple deterministic linkage strategy as a preceding filter: a data pair is classified as ‘match’ if all values of the attributes considered agree exactly, and otherwise as ‘nonmatch’. This strategy is separately combined with two probabilistic RL methods based on the Fellegi–Sunter model and with two classification tree methods (CART and Bagging). An empirical comparison was conducted on two real data sets. We used four different partitions into training and test data to increase the validity of the results. In almost all cases, applying deterministic linkage as a preceding filter leads to better results than omitting such a pre-filter, and overall, classification trees exhibited the best results. On all data sets, probabilistic RL benefited from deterministic linkage only when the underlying probabilities were estimated before applying deterministic linkage. When a pre-filter is used to remove definite cases, the underlying population of data pairs changes; it is crucial to take this into account for model-based probabilistic RL.
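
A minimal sketch of the pre-filter idea, under the assumption that exact agreement on every attribute yields ‘match’ directly while all remaining pairs are handed, as one batch, to a second-stage method. The `second_stage` callable is a hypothetical stand-in for a Fellegi–Sunter model or a tree classifier, and treating missing values as disagreement is likewise an assumption. Passing the batch explicitly makes the abstract's caveat visible: the second stage sees a changed population of pairs.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Record = Dict[str, str]
Pair = Tuple[Record, Record]


def link_with_prefilter(
    pairs: Sequence[Pair],
    attributes: Sequence[str],
    second_stage: Callable[[Sequence[Pair]], List[str]],
) -> List[str]:
    """Label record pairs 'match'/'nonmatch' using a deterministic pre-filter."""
    labels: List[str] = [""] * len(pairs)
    undecided: List[int] = []
    for idx, (a, b) in enumerate(pairs):
        # Assumption: a missing value never counts as agreement.
        if all(a.get(k) is not None and a.get(k) == b.get(k) for k in attributes):
            labels[idx] = "match"  # definite case, removed from later stages
        else:
            undecided.append(idx)
    # The second stage only sees the remaining pairs, i.e. a changed population.
    decided = second_stage([pairs[i] for i in undecided])
    for idx, label in zip(undecided, decided):
        labels[idx] = label
    return labels


pairs = [
    ({"name": "Ann", "dob": "1990-01-01"}, {"name": "Ann", "dob": "1990-01-01"}),
    ({"name": "Ann", "dob": "1990-01-01"}, {"name": "Anne", "dob": "1990-01-01"}),
]
print(link_with_prefilter(pairs, ["name", "dob"], lambda batch: ["nonmatch"] * len(batch)))
# ['match', 'nonmatch']
```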

An efficient GPU implementation of a multi-start TSP solver for large problem instances

Proceedings of the fourteenth international conference on Genetic and evolutionary computation conference companion - GECCO Companion '12 ◽

10.1145/2330784.2330978 ◽

2012 ◽

Cited By ~ 3

Author(s):

Kamil Rocki ◽

Reiji Suda

Keyword(s):

Large Problem ◽

Problem Instances ◽

Gpu Implementation

Designing a Supply Chain Network under the Risk of Disruptions

Mathematical Problems in Engineering ◽

10.1155/2012/234324 ◽

2012 ◽

Vol 2012 ◽

pp. 1-23 ◽

Cited By ~ 36

Author(s):

Armin Jabbarzadeh ◽

Seyed Gholamreza Jalali Naini ◽

Hamid Davoudpour ◽

Nader Azad

Keyword(s):

Supply Chain ◽

Supply Chain Design ◽

Mixed Integer ◽

Nonlinear Program ◽

Supply Chain Network ◽

Large Problem ◽

Solution Methods ◽

Total Profit ◽

Order Quantities ◽

Problem Instances

This paper studies a supply chain design problem with the risk of disruptions at facilities. At any point in time, the facilities are subject to various types of disruptions caused by natural disasters, man-made defections, and equipment breakdowns. We formulate the problem as a mixed-integer nonlinear program that maximizes the total profit of the whole system. The model simultaneously determines the number and location of facilities, the subset of customers to serve, the assignment of customers to facilities, and the cycle-order quantities at facilities. To obtain near-optimal solutions with reasonable computational requirements for large problem instances, two solution methods, based on Lagrangian relaxation and a genetic algorithm, are developed. The effectiveness of the proposed solution approaches is shown through numerical experiments. The computational results also demonstrate that the benefits of considering disruptions in the supply chain design model can be significant.

Taxonomic identification of environmental DNA with informatic sequence classification trees

10.7287/peerj.preprints.26812v1 ◽

2018 ◽

Cited By ~ 5

Author(s):

Shaun P Wilkinson ◽

Simon K Davy ◽

Michael Bunce ◽

Michael Stat

Keyword(s):

Classification Tree ◽

Probabilistic Method ◽

Query Sequence ◽

Dna Barcode ◽

Classification Trees ◽

Environmental Dna ◽

Training Data ◽

Marine Biodiversity ◽

Taxonomic Identification ◽

Sequence Classification

High-throughput sequencing of environmental DNA (eDNA) offers a simple and cost-effective solution for marine biodiversity assessments. Yet several analytical challenges remain, including the incorporation of statistical inference in the assignment of taxonomic identities. We developed a probabilistic method for DNA barcode classification that can be used for both eDNA and traditional single-source sampling. The pipeline involves: (1) compiling a primer-specific database of barcode sequences to be used as training data (obtained from GenBank and other sequence repositories), (2) generating a classification tree using an iterative learning algorithm that divisively sorts the training data into hierarchical clusters based on profile hidden Markov models, (3) assigning each query sequence to a cluster using a recursive series of model-comparison tests, and (4) identifying each query sequence taxonomically based on the lowest common taxonomic rank of the training sequences within its cluster. This method compares favorably with other DNA classification methods when tested on benchmark datasets, and offers the added features of classifying at higher taxonomic ranks and returning interpretable confidence values in the form of the Akaike weight statistic. This bioinformatics pipeline is available as an open-source R package called ‘insect’ (informatic sequence classification trees).
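
Steps (4) and the confidence values can be illustrated in a few lines. This is a minimal Python sketch, not the R package's API: `lowest_common_rank` returns the deepest rank on which all training lineages in a cluster agree, assuming lineages are listed top-down in the hypothetical `RANKS` order, and `akaike_weights` uses the standard definition $w_i = e^{-\Delta_i/2} / \sum_j e^{-\Delta_j/2}$ with $\Delta_i = \mathrm{AIC}_i - \min_j \mathrm{AIC}_j$. How the package applies these weights internally is not described in the abstract.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Assumed rank order for illustration; real reference databases may differ.
RANKS = ("kingdom", "phylum", "class", "order", "family", "genus", "species")


def lowest_common_rank(lineages: Sequence[Sequence[str]]) -> Optional[Tuple[str, str]]:
    """Return (rank, taxon) of the deepest rank shared by all lineages, or None."""
    deepest: Optional[Tuple[str, str]] = None
    # zip(*lineages) walks the ranks top-down and stops at the shortest lineage.
    for rank, names in zip(RANKS, zip(*lineages)):
        if any(name != names[0] for name in names):
            break
        deepest = (rank, names[0])
    return deepest


def akaike_weights(aic: Sequence[float]) -> List[float]:
    """Standard Akaike weights from a list of AIC values."""
    best = min(aic)
    rel = [math.exp(-(a - best) / 2.0) for a in aic]
    total = sum(rel)
    return [r / total for r in rel]


yellowfin = ("Animalia", "Chordata", "Actinopterygii", "Scombriformes",
             "Scombridae", "Thunnus", "Thunnus albacares")
bigeye = yellowfin[:6] + ("Thunnus obesus",)
print(lowest_common_rank([yellowfin, bigeye]))  # ('genus', 'Thunnus')
print(akaike_weights([10.0, 10.0]))             # [0.5, 0.5]
```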

Abduction by Non-Experts

10.29007/pz3t ◽

2018 ◽

Author(s):

Nikolaj Bjorner ◽

Dejan Jovanović ◽

Tancrède Lepoint ◽

Philipp Rümmer ◽

Martin Schäf

Keyword(s):

Natural Language ◽

Software Verification ◽

Language Translation ◽

Horn Clause ◽

Large Problem ◽

Horn Clauses ◽

Success Stories ◽

Novel Approach ◽

Verification Process ◽

Problem Instances

Crowdsourcing promises to quasi-automate tasks that cannot be automated otherwise. Success stories such as natural language translation or recognition of cats in images show that carefully crafted crowdsourcing tasks can solve large problem instances that could not be solved otherwise. To utilize crowdsourcing, one has to define the problem so that it is easy to split into small tasks, the tasks are easy for humans but hard for a machine to solve, and the machine can efficiently check whether a solution is correct. In this paper we discuss a novel approach of using crowdsourcing to assist software verification. We argue that Horn clauses form a good basis for crowdsourcing since they are easy to subdivide, and that logical abduction is a suitable task since it is hard to find abductive inferences for Horn clauses automatically, but easy to check whether an inference makes a Horn clause valid. We describe a prototype implementation, show how crowdsourcing integrates into the verification process, and present preliminary results.
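
The argument rests on an asymmetry: finding an abductive hypothesis is hard, but checking one is cheap. A minimal sketch of the cheap direction, simplified to propositional Horn clauses and forward chaining. The paper works with Horn clauses arising in software verification, which this does not model, and the names `consequences` and `explains` are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# A propositional Horn clause: (body atoms, head atom). An empty body is a fact.
Clause = Tuple[FrozenSet[str], str]


def consequences(clauses: Iterable[Clause], facts: Iterable[str]) -> Set[str]:
    """Forward chaining: every atom derivable from the facts and clauses."""
    rules: List[Clause] = list(clauses)
    known: Set[str] = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in known and body <= known:
                known.add(head)
                changed = True
    return known


def explains(clauses: Iterable[Clause], facts: Iterable[str],
             hypothesis: Iterable[str], goal: str) -> bool:
    """Check a candidate abductive hypothesis: do facts plus hypothesis entail goal?"""
    return goal in consequences(clauses, set(facts) | set(hypothesis))


rules = [(frozenset({"a", "b"}), "c"), (frozenset({"c"}), "safe")]
print(explains(rules, {"a"}, {"b"}, "safe"))  # True
print(explains(rules, {"a"}, set(), "safe"))  # False
```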

Scaling techniques for parallel ant colony optimization on large problem instances

Proceedings of the Genetic and Evolutionary Computation Conference ◽

10.1145/3321707.3321832 ◽

2019 ◽

Author(s):

Joshua Peake ◽

Martyn Amos ◽

Paraskevas Yiapanis ◽

Huw Lloyd

Keyword(s):

Ant Colony Optimization ◽

Ant Colony ◽

Large Problem ◽

Problem Instances

Identifying wine consumers interested in environmentally sustainable production practices

International Journal of Wine Business Research ◽

10.1108/ijwbr-01-2021-0003 ◽

2021 ◽

Ahead-of-print ◽

Author(s):

Kathleen Kelley ◽

Marielle Todd ◽

Helene Hopfer ◽

Michela Centinari

Keyword(s):

Cover Crops ◽

Classification Tree ◽

Sustainable Production ◽

Production Costs ◽

Training Data ◽

Internet Survey ◽

Target Market ◽

Chi Square ◽

Content Type ◽

Survey Responses

Purpose: This study aims to characterize several wine consumer segments that were “likely” to sample (i.e. taste before purchasing) wine from vineyards using cover crops, a sustainable production practice that reduces herbicide applications, and to identify those with a greater probability of being a viable target market based on survey responses.

Design/methodology/approach: A total of 956 wine consumers from the Mid-Atlantic and bordering US states were separated into segments using an ECHAID (exhaustive chi-square automatic interaction detector) classification tree built from internet survey responses.

Findings: Of the 12 segments created, 6 (n = 530, 72% of the training data) contained participants who were at least 1.02 times as “likely” to try the wine as the overall sample (index score = 102%) and were willing to pay $18.99 for a 750-mL bottle of the wine, which included a $1 surcharge to cover associated production costs. Of these, three (n = 195, 26%) had the greatest potential for developing a marketing plan (index scores of 109%–121%), with over half of the participants in each segment willing to pay $20.99 for the bottle, which could motivate growers to consider implementing this sustainable strategy.

Originality/value: Although several segments of participants were “likely” to sample the sustainably produced wine, the ECHAID classification tree allowed us to identify participants who would not pay $18.99 for a 750-mL bottle, even after learning about the use of cover crops and the trade-off (the $1 bottle surcharge). By narrowing the potential “likely” segments to those with a greater potential of sampling the wine, more purposeful marketing strategies can be developed based on the demographics, attitudes, and behaviors defined in the model.
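
Index scores of this kind are conventionally computed in CHAID-style gain tables as a segment's rate of “likely” responses divided by the overall rate, times 100. The abstract does not spell out the formula, so treat this as an assumption; the function name and the counts below are hypothetical, not data from the study.

```python
def index_score(segment_likely: int, segment_size: int,
                overall_likely: int, overall_size: int) -> float:
    """Segment response rate relative to the overall rate, as a percentage."""
    segment_rate = segment_likely / segment_size
    overall_rate = overall_likely / overall_size
    return 100.0 * segment_rate / overall_rate


# Hypothetical counts: 51% "likely" in a segment vs 50% overall.
print(index_score(51, 100, 500, 1000))  # 102.0, i.e. 1.02 times the overall rate
```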