A Mixed-Integer Fractional Optimization Approach to Best Subset Selection

Author(s):  
Andrés Gómez ◽  
Oleg A. Prokopyev

We consider the best subset selection problem in linear regression, that is, finding a parsimonious subset of the regression variables that provides the best fit to the data according to some predefined criterion. We are primarily concerned with alternatives to cross-validation methods that do not require data partitioning and involve a range of information criteria extensively studied in the statistical literature. We show that the problem of interest can be modeled using fractional mixed-integer optimization, which can be tackled by leveraging recent advances in modern optimization solvers. The proposed algorithms involve solving a sequence of mixed-integer quadratic optimization problems (or their convexifications) and can be implemented with off-the-shelf solvers. We report encouraging results in our computational experiments, with respect to both the optimization and statistical performance.

Summary of Contribution: This paper considers feature selection problems with information criteria. We show that by adopting a fractional optimization perspective (a well-known field in nonlinear optimization and operations research), it is possible to leverage recent advances in mixed-integer quadratic optimization technology to tackle traditional statistical problems long considered intractable. We present extensive computational experiments, with both synthetic and real data, illustrating that the new fractional optimization approach is orders of magnitude faster than existing approaches in the literature.
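
To make the fractional view concrete: maximizing adjusted R² is equivalent to minimizing the ratio RSS(S)/(n - |S| - 1) over subsets S, because the total sum of squares is fixed. Dinkelbach's method, a classical fractional-programming scheme, turns such a ratio into a sequence of parametric problems min_S RSS(S) - λ(n - |S| - 1), resetting λ to the ratio of the latest solution until the parametric optimum reaches zero. The sketch below assumes this particular criterion and replaces the mixed-integer quadratic solver with brute-force enumeration, so it only suits tiny instances; the function names are hypothetical and it is not presented as the authors' algorithm.

```python
from itertools import combinations


def rss(X, y, subset):
    """Residual sum of squares of OLS with an intercept on the columns in `subset`."""
    n = len(y)
    # Design matrix: an intercept column followed by the selected features.
    A = [[1.0] + [float(X[i][j]) for j in subset] for i in range(n)]
    p = len(A[0])
    # Augmented normal equations [A^T A | A^T y], solved by Gauss-Jordan elimination.
    M = [[sum(A[i][r] * A[i][c] for i in range(n)) for c in range(p)]
         + [sum(A[i][r] * y[i] for i in range(n))] for r in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))  # partial pivoting
        M[col], M[piv] = M[piv], M[col]
        for r in range(p):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    beta = [M[r][p] / M[r][r] for r in range(p)]
    return sum((y[i] - sum(a * b for a, b in zip(A[i], beta))) ** 2 for i in range(n))


def dinkelbach_subset(X, y, max_k, tol=1e-12):
    """Minimize RSS(S) / (n - |S| - 1) over |S| <= max_k (requires n - max_k - 1 > 0)."""
    n, d = len(y), len(X[0])
    candidates = [s for k in range(max_k + 1) for s in combinations(range(d), k)]
    cache = {s: rss(X, y, s) for s in candidates}

    def f(s):  # numerator: residual sum of squares
        return cache[s]

    def g(s):  # denominator: residual degrees of freedom, always positive here
        return n - len(s) - 1

    s = ()  # start from the intercept-only model
    lam = f(s) / g(s)
    while True:
        # Parametric subproblem; a MIQP solver would handle this step at scale.
        s_new = min(candidates, key=lambda t: f(t) - lam * g(t))
        if f(s_new) - lam * g(s_new) >= -tol:
            return s, lam  # optimal value of the subproblem is zero: ratio is optimal
        s, lam = s_new, f(s_new) / g(s_new)
```

Each loop iteration corresponds to one subproblem that a solver would receive. Since the accepted subproblem value is negative and the denominator is positive, λ strictly decreases, so the loop terminates on the finite set of candidate subsets.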

Author(s):  
Enrico Civitelli ◽  
Matteo Lapucci ◽  
Fabio Schoen ◽  
Alessio Sortino

This paper addresses the problem of best subset selection in logistic regression. In particular, we consider formulations of the problem that adopt information criteria, such as AIC or BIC, as goodness-of-fit measures. Various methods exist to tackle this problem. Heuristic methods are computationally cheap but usually find only low-quality solutions, and methods based on local optimization suffer from similar limitations. Methods based on mixed-integer reformulations of the problem are much more effective, at the cost of higher computational requirements, which become unsustainable as the problem size grows. We therefore propose a new approach that combines mixed-integer programming and decomposition techniques to overcome these scalability issues. We provide a theoretical characterization of the properties of the proposed algorithm. The results of extensive numerical experiments on widely available datasets show that the proposed method outperforms state-of-the-art techniques.
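
For reference, a sketch of the kind of formulation meant here, assuming the standard definitions AIC = 2k - 2 log L and BIC = k log n - 2 log L, where L is the maximized likelihood, k the number of nonzero coefficients, and y_i ∈ {0,1} the binary responses. The big-M linking constraints with a hypothetical bound M are one common mixed-integer reformulation, not necessarily the one used in the paper.

```latex
\begin{aligned}
\min_{\beta \in \mathbb{R}^p,\; z \in \{0,1\}^p}\quad
  & -2\sum_{i=1}^{n}\Big[y_i\,x_i^{\top}\beta-\log\big(1+e^{x_i^{\top}\beta}\big)\Big]
    + c\sum_{j=1}^{p} z_j \\
\text{s.t.}\quad & -M z_j \le \beta_j \le M z_j, \qquad j=1,\dots,p,
\end{aligned}
\qquad
c=\begin{cases}2 & \text{(AIC)}\\ \log n & \text{(BIC)}\end{cases}
```

Setting z_j = 0 forces β_j = 0, so the sum of the z_j counts the selected variables; the difficulty lies in the binary vector z, which couples the choice of subset to the nonlinear log-likelihood.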


2017 ◽  
Author(s):  
Deniz Akdemir

Optimal subset selection is an important task with many application areas, and numerous algorithms have been designed for it. To search for solutions of generic optimal subset selection problems, STPGA uses a special genetic algorithm supplemented with a tabu memory, which keeps track of previously tried solutions and their fitness for a number of iterations, and with a look-ahead property: the fitness of the solutions is regressed on their coding, and this regression is used to form an ideal estimated solution. I initially developed the programs for the specific problem of selecting training populations for genomic prediction or association problems. I therefore discuss the theory of optimal design of experiments to explain the default optimization criteria in STPGA, and I illustrate the use of the programs for this task. Nevertheless, I have also picked a few other application areas: supervised and unsupervised variable selection based on kernel alignment, supervised variable selection with design criteria, identification of influential observations in regression, solving mixed-integer quadratic optimization problems, and balancing gains and inbreeding in a breeding population. Some of these illustrations involve new statistical approaches.
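
A minimal sketch of a genetic algorithm with a tabu memory for choosing k of n items, assuming a user-supplied fitness function over frozensets. The names (`ga_tabu_subset`, `tabu_len`, etc.) are hypothetical, this is not STPGA's actual interface, and the look-ahead regression step is omitted. The memory caches the fitness of recently tried solutions and forgets entries not revisited within `tabu_len` generations; a child found in the memory is treated as tabu and mutated again.

```python
import random


def ga_tabu_subset(fitness, n_items, k, pop_size=20, generations=50,
                   tabu_len=10, seed=0):
    """Maximize fitness(frozenset) over subsets of size k (requires k < n_items, pop_size >= 4)."""
    rng = random.Random(seed)
    memory = {}  # tabu memory: frozenset -> (fitness, generation last tried)

    def evaluate(sol, gen):
        score = memory[sol][0] if sol in memory else fitness(sol)  # reuse cached fitness
        memory[sol] = (score, gen)
        return score

    def mutate(sol):
        # Swap one selected item for one unselected item, keeping |sol| == k.
        out_item = rng.choice(sorted(sol))
        in_item = rng.choice([i for i in range(n_items) if i not in sol])
        return (sol - {out_item}) | {in_item}

    def crossover(a, b):
        # The child draws k items from the union of its two parents.
        return frozenset(rng.sample(sorted(a | b), k))

    pop = [frozenset(rng.sample(range(n_items), k)) for _ in range(pop_size)]
    scored = sorted(((evaluate(s, 0), s) for s in pop), key=lambda t: t[0], reverse=True)
    history = [scored[0][0]]
    for gen in range(1, generations + 1):
        # Forget solutions that have not been tried within the last `tabu_len` generations.
        for sol in [s for s, (_, g) in memory.items() if gen - g > tabu_len]:
            del memory[sol]
        parents = [s for _, s in scored[: pop_size // 2]]  # elitism: keep the top half
        children = []
        while len(children) < pop_size - len(parents):
            child = mutate(crossover(*rng.sample(parents, 2)))
            for _ in range(5):  # tabu rule: re-mutate children tried recently
                if child not in memory:
                    break
                child = mutate(child)
            children.append(child)
        scored = sorted(((evaluate(s, gen), s) for s in parents + children),
                        key=lambda t: t[0], reverse=True)
        history.append(scored[0][0])
    return scored[0][1], scored[0][0], history
```

Keeping the top half as parents makes the best fitness non-decreasing across generations, which the returned `history` lets one check; a real design criterion from optimal experimental design would replace the toy fitness.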


2021 ◽  
pp. 1-15
Author(s):  
Jinding Gao

To solve function optimization problems, the Population Dynamics Optimization Algorithm under Microbial Control in Contaminated Environment (PDO-MCCE) is proposed, based on a population dynamics model with microbial treatment in a polluted environment. In this algorithm, individuals are automatically divided into normal populations and mutant populations. The number of individuals in each category is automatically calculated and adjusted according to the population dynamics model, which removes the need to set these numbers manually. The algorithm has seven operators. They realize information exchange between individuals and within and between populations, diffuse the information of strong individuals, transmit environmental information to individuals, and increase or decrease the number of individuals to ensure that the algorithm converges globally. The periodic increase in the number of individuals in the mutant population greatly increases the probability that the search escapes local optima. During the iterations, the algorithm processes only 3/500 to 1/10 of the individuals' features at a time, which greatly reduces the time complexity. To assess the scalability, efficiency and robustness of the proposed algorithm, experiments were carried out on realistic, synthetic and random benchmarks of different dimensions. The test cases show that PDO-MCCE performs better and is suitable for solving optimization problems of higher dimension.
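
The claim that touching only a small fraction of features per step reduces time complexity can be illustrated with a generic sketch. It is not PDO-MCCE: the population dynamics model and the seven operators are not reproduced. It is a single-solution greedy search on the separable sphere function, where each move perturbs m = frac·dim coordinates and the fitness change is computed incrementally in O(m) rather than O(dim). The default `frac = 0.02` lies inside the reported 3/500 to 1/10 range; all names and parameters are assumptions.

```python
import random


def sphere(x):
    """Separable test function: sum of squares."""
    return sum(v * v for v in x)


def partial_update_search(dim=500, frac=0.02, iters=2000, step=0.5, seed=1):
    """Greedy search in which each move perturbs only a fraction of the coordinates."""
    rng = random.Random(seed)
    x = [rng.uniform(-5, 5) for _ in range(dim)]
    fx = sphere(x)
    m = max(1, int(frac * dim))  # number of coordinates touched per move
    for _ in range(iters):
        idx = rng.sample(range(dim), m)
        new_vals = {i: x[i] + rng.gauss(0.0, step) for i in idx}
        # Sphere is separable, so the fitness change costs O(m) instead of O(dim).
        delta = sum(v * v - x[i] * x[i] for i, v in new_vals.items())
        if delta < 0:  # greedy acceptance of improving moves only
            for i, v in new_vals.items():
                x[i] = v
            fx += delta
    return x, fx
```

The incremental update only works because the objective is separable; for non-separable functions the saving would depend on how cheaply a partial change can be re-evaluated.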


Energies ◽  
2021 ◽  
Vol 14 (8) ◽  
pp. 2261
Author(s):  
Evgeniy Ganev ◽  
Boyan Ivanov ◽  
Natasha Vaklieva-Bancheva ◽  
Elisaveta Kirilova ◽  
Yunzile Dzhelil

This study proposes a multi-objective approach for the optimal design of a sustainable Integrated Biodiesel/Diesel Supply Chain (IBDSC) based on first-generation (sunflower and rapeseed) and second-generation (waste cooking oil and animal fat) feedstocks with solid waste use. It includes mixed-integer linear programming (MILP) models of the economic, environmental and social impacts of the IBDSC, with the respective criteria defined in terms of costs. The purpose is to determine the optimal number, sizes and locations of bio-refineries and solid waste plants; the areas and amounts of feedstocks needed for biodiesel production; and the transportation mode. The approach is applied to a real case study covering the territory of Bulgaria and its 27 districts. Optimization problems are formulated for a 5-year period in which either the environmental or the economic criterion is optimized and the remaining criteria are defined as constraints. The results show that under the economic criterion, 14% of the agricultural land should be used for sunflower and 2% for rapeseed cultivation, while under the environmental criterion, 12% should be used for rapeseed and 3% for sunflower. In the environmental case, the price of biodiesel is 14% higher and the generated pollutants are 6.6% lower than in the economic case. The optimal transport mode in both cases is rail.
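
Optimizing one criterion while bounding the others is the ε-constraint method. The toy sketch below assumes hypothetical site data (cost, emissions, capacity) and a single demand level; it enumerates binary siting decisions instead of calling a MILP solver, then tightens the emission bound to trace the cost-emission trade-off. None of the numbers come from the study.

```python
from itertools import product

# Hypothetical candidate plants: (annual cost, annual emissions, capacity).
SITES = [(10.0, 8.0, 40), (14.0, 5.0, 40), (9.0, 9.0, 30), (12.0, 4.0, 30)]
DEMAND = 70


def solve_eps(eps):
    """Minimize cost subject to emissions <= eps and capacity >= DEMAND (None if infeasible)."""
    best = None
    for open_ in product((0, 1), repeat=len(SITES)):  # every binary siting decision
        cost = sum(o * s[0] for o, s in zip(open_, SITES))
        emis = sum(o * s[1] for o, s in zip(open_, SITES))
        cap = sum(o * s[2] for o, s in zip(open_, SITES))
        if cap >= DEMAND and emis <= eps and (best is None or cost < best[0]):
            best = (cost, emis, open_)
    return best


def pareto_sweep(step=1e-9):
    """Tighten the emission bound below each solution found to trace the trade-off."""
    front, eps = [], float("inf")
    while (sol := solve_eps(eps)) is not None:
        front.append(sol[:2])
        eps = sol[1] - step
    return front
```

On this toy data, `pareto_sweep()` returns the (cost, emissions) pairs `[(19.0, 17.0), (22.0, 12.0), (26.0, 9.0)]`: each tighter emission bound is met only at a higher cost.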


TECHNOLOGY ◽  
2018 ◽  
Vol 06 (02) ◽  
pp. 49-58
Author(s):  
Iman Dayarian ◽  
Timothy C.Y. Chan ◽  
David Jaffray ◽  
Teo Stanescu

Magnetic resonance imaging (MRI) is a powerful diagnostic tool that has become the imaging modality of choice for soft-tissue visualization in radiation therapy. Emerging technologies aim to integrate MRI with a medical linear accelerator to form novel cancer therapy systems (MR-linac), but the design of these systems to date relies on heuristic procedures. This paper develops an exact, optimization-based approach for magnet design that 1) incorporates the most accurate physics calculations to date, 2) determines precisely the relative spatial location, size, and current magnitude of the magnetic coils, 3) guarantees field homogeneity inside the imaging volume, and 4) produces configurations that satisfy, for the first time, the small-footprint feasibility constraints required for MR-linacs. Our approach leverages modern mixed-integer programming (MIP), enabling significant flexibility in magnet design generation, e.g., controlling the number of coils and enforcing symmetry between magnet poles. Our numerical results demonstrate the superiority of our method over current mainstream methods.
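
As an illustration of how MIP constraints can express the two design controls named above, here is a generic sketch, not the paper's model. It assumes a hypothetical discretization: candidate coil positions j ∈ J with binary y_j (coil present), current I_j bounded by a constant M, sample points i ∈ P in the imaging volume, a target value B_0 for a single field component with tolerance ε, a coil budget K, a per-coil cost c_j, and a mirror map σ pairing each candidate with its counterpart on the opposite pole. Because the Biot-Savart law is linear in current, the field at point i is the sum of A_ij I_j, where A_ij is the field of coil j at point i per unit current.

```latex
\begin{aligned}
\min_{y,\,I}\quad & \sum_{j\in J} c_j\,y_j \\
\text{s.t.}\quad
  & (1-\varepsilon)B_0 \le \sum_{j\in J} A_{ij} I_j \le (1+\varepsilon)B_0, && i\in P \\
  & -M y_j \le I_j \le M y_j, && j\in J \\
  & \sum_{j\in J} y_j \le K \\
  & y_j = y_{\sigma(j)},\quad I_j = I_{\sigma(j)}, && j\in J \\
  & y_j\in\{0,1\},\quad I_j\in\mathbb{R}, && j\in J
\end{aligned}
```

The homogeneity rows are linear in the currents, the cardinality row controls the number of coils, and the mirror rows enforce symmetry between poles; with A fixed by the candidate geometry, the model is a mixed-integer linear program.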


Author(s):  
Mostafa Elshahed ◽  
Mahmoud Dawod ◽  
Zeinab H. Osman

Integrating Distributed Generation (DG) units into distribution systems can affect the voltage profile, power flow, power losses, and voltage stability. In this paper, a new methodology for DG location and sizing is developed to minimize system losses and maximize the voltage stability index (VSI). The fuzzy ranking method is used to identify the best compromise solution, which determines a proper DG allocation that achieves maximum benefits. Synchronous machines are used, and their power factor is optimally determined via genetic optimization so that they inject reactive power to decrease system losses and improve the voltage profile and VSI. The Augmented Lagrangian Genetic Algorithm with nonlinear mixed-integer variables and the Non-dominated Sorting Genetic Algorithm have been implemented to solve both the single- and multi-objective optimization problems. To verify its effectiveness, the proposed methodology is tested on 33-bus and 69-bus radial distribution systems and compared with previous works.
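
A sketch of one common fuzzy decision-making rule for picking a best compromise from a Pareto front. Each objective gets a linear membership between its worst and best values on the front; memberships are summed per solution and normalized, and the highest normalized score wins. The paper's exact membership functions are not given here, so this form, the function name, and the sample front (losses in kW, VSI) are assumptions.

```python
def fuzzy_best_compromise(solutions, senses):
    """Return (index of best compromise, normalized memberships).

    solutions: list of objective vectors; senses[i] is "min" or "max" for objective i.
    """
    n_obj = len(senses)
    lo = [min(s[i] for s in solutions) for i in range(n_obj)]
    hi = [max(s[i] for s in solutions) for i in range(n_obj)]

    def mu(value, i):
        # Linear membership: 1 at the best value on the front, 0 at the worst.
        if hi[i] == lo[i]:
            return 1.0  # objective is constant across the front
        frac = (value - lo[i]) / (hi[i] - lo[i])
        return 1.0 - frac if senses[i] == "min" else frac

    scores = [sum(mu(s[i], i) for i in range(n_obj)) for s in solutions]
    total = sum(scores)  # positive: the best solution for each objective scores 1 on it
    normalized = [sc / total for sc in scores]
    best = max(range(len(solutions)), key=lambda k: normalized[k])
    return best, normalized
```

For a hypothetical front `[(120.0, 0.90), (100.0, 0.85), (90.0, 0.80), (140.0, 0.95)]` with senses `["min", "max"]`, the rule selects index 1: neither extreme point wins, because the compromise balances moderate losses against a moderate VSI.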

