Small-Data, Large-Scale Linear Optimization with Uncertain Objectives

Author(s):  
Vishal Gupta ◽  
Paat Rusmevichientong

Optimization applications often depend on a huge number of uncertain parameters. In many contexts, however, the amount of relevant data per parameter is small, and hence, we may only have imprecise estimates. We term this setting—in which the number of uncertainties is large but all estimates have low precision—the small-data, large-scale regime. We formalize a model for this new regime, focusing on optimization problems with uncertain linear objectives. We show that common data-driven methods, such as sample average approximation, data-driven robust optimization, and certain regularized policies, may perform poorly in this new setting. We then propose a novel framework for selecting a data-driven policy from a given policy class. As with the aforementioned data-driven methods, our new policy enjoys provably good performance in the large-sample regime. Unlike these methods, we show that in the small-data, large-scale regime, our data-driven policy performs comparably to an oracle best-in-class policy under some mild conditions. We strengthen this result for linear optimization problems and two natural policy classes, the first inspired by the empirical Bayes literature and the second by regularization techniques. For both classes, the suboptimality gap between our proposed policy and the oracle policy decays exponentially fast in the number of uncertain parameters even for a fixed amount of data. Thus, these policies retain the strong large-sample performance of traditional methods and additionally enjoy provably strong performance in the small-data, large-scale regime. Numerical experiments confirm the significant benefits of our methods. This paper was accepted by Yinyu Ye, optimization.
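
As a toy illustration of the regime (a minimal sketch with invented parameters, not the paper's estimator or policy class), consider maximizing an uncertain linear objective over the box [0,1]^n when each of n = 1000 coefficients is estimated from only four noisy samples. The sketch compares a plug-in SAA decision with a James-Stein-style shrinkage decision of the kind the empirical Bayes literature suggests.

```python
import numpy as np

rng = np.random.default_rng(0)

n, S, sigma = 1000, 4, 2.0            # many parameters, few noisy samples each (hypothetical)
mu = rng.normal(-0.5, 1.0, size=n)    # unknown true objective coefficients
muhat = (mu[:, None] + sigma * rng.normal(size=(n, S))).mean(axis=1)

def value(scores):
    """True value of max c'x over the box [0,1]^n when c is replaced by `scores`:
    the induced decision is x_i = 1{scores_i > 0}, evaluated against the true mu."""
    return mu[scores > 0].sum()

# SAA-style plug-in policy: act as if the sample means were the truth.
saa_val = value(muhat)

# Empirical-Bayes-style policy: shrink the noisy estimates toward their grand mean
# (a standard moment-based James-Stein weight; not the paper's proposed policy).
noise_var = sigma**2 / S
signal_var = max(muhat.var() - noise_var, 1e-12)
w = signal_var / (signal_var + noise_var)
shrunk = muhat.mean() + w * (muhat - muhat.mean())
eb_val = value(shrunk)

oracle_val = value(mu)                # full-information benchmark
print(f"SAA {saa_val:.1f} | shrinkage {eb_val:.1f} | oracle {oracle_val:.1f}")
```

Because each estimate has low precision, the plug-in rule includes many coefficients that only look positive by chance; shrinking toward the grand mean raises the effective inclusion threshold and typically recovers part of that lost value.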

2020 ◽  
Author(s):  
Dragos Florin Ciocan ◽  
Krishnamurthy Iyer

Given the scale of the sponsored search market, it is practically important yet technically difficult to understand the interplay between bidders and the ad network and its effect on the long-run state of the market. Although typical equilibrium models account for bidders strategizing over the individual bids they submit to the auctions, they ignore that bidders also strategically set their campaign budgets. In “Tractable Equilibria in Sponsored Search with Endogenous Budgets,” F. Ciocan and K. Iyer ask how this additional strategic layer affects market operation and prove that endogenizing budgets surprisingly yields simple and interpretable equilibria. Namely, these equilibria generate quasi-truthful bidding strategies guaranteeing bidders an ROI exceeding their cost per dollar of committed budget. Additionally, the ad network’s optimal allocation policy becomes greedy with high probability. Thus, in this equilibrium, the ad network need not solve computationally challenging, large-scale linear optimization problems typically required under exogenous budgets.
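
As a rough illustration of why a greedy allocation is attractive in this setting, the sketch below (a simplified stand-in, not the authors' model; bids, budgets, and the query stream are invented, and per-click quality weighting is omitted) assigns each arriving query to the highest-bidding advertiser with budget remaining, instead of solving a global linear program.

```python
from collections import defaultdict

# Hypothetical advertisers: bid per click and committed campaign budget.
advertisers = {
    "a1": {"bid": 2.0, "budget": 5.0},
    "a2": {"bid": 1.5, "budget": 8.0},
    "a3": {"bid": 1.0, "budget": 4.0},
}

queries = ["q"] * 12     # a stream of identical queries, one slot each

spend = defaultdict(float)
allocation = []

# Greedy policy: give each arriving query to the feasible advertiser with the highest bid.
for _ in queries:
    feasible = [a for a, info in advertisers.items()
                if spend[a] + info["bid"] <= info["budget"]]
    if not feasible:
        allocation.append(None)          # every budget is exhausted
        continue
    winner = max(feasible, key=lambda a: advertisers[a]["bid"])
    spend[winner] += advertisers[winner]["bid"]
    allocation.append(winner)

print(allocation)
print(dict(spend))
```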


Author(s):  
Martin Buhmann ◽  
Dirk Siegel

Abstract We consider Broyden class updates for large-scale optimization problems in n dimensions, restricting attention to the case when the initial second derivative approximation is the identity matrix. Under this assumption we present an implementation of the Broyden class based on a coordinate transformation on each iteration. It requires only $$2nk + O(k^{2}) + O(n)$$ multiplications on the kth iteration and stores $$nK + O(K^{2}) + O(n)$$ numbers, where K is the total number of iterations. We investigate a modification of this algorithm by a scaling approach and show a substantial improvement in performance over the BFGS method. We also study several adaptations of the new implementation to the limited-memory situation, presenting algorithms that work with a fixed amount of storage independent of the number of iterations. We show that one such algorithm retains the property of quadratic termination. The practical performance of the new methods is compared with that of Nocedal’s (Math Comput 35:773–782, 1980) method, which is considered the benchmark in limited-memory algorithms. The tests show that the new algorithms can be significantly more efficient than Nocedal’s method. Finally, we show how a scaling technique can significantly improve both Nocedal’s method and the new generalized conjugate gradient algorithm.
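
For context on the limited-memory baseline the authors compare against, here is a compact sketch of Nocedal's two-loop recursion as used in L-BFGS (a textbook formulation, not the new coordinate-transformation implementation), which applies the implicit inverse-Hessian approximation built from the last m curvature pairs in O(nm) work and storage.

```python
import numpy as np

def lbfgs_direction(grad, s_list, y_list):
    """Two-loop recursion: apply the implicit inverse-Hessian approximation to grad.

    s_list[i] = x_{k-i} - x_{k-i-1},  y_list[i] = g_{k-i} - g_{k-i-1}
    (most recent pair first); work and storage are O(n * m) for m stored pairs.
    """
    q = grad.copy()
    alphas = []
    rhos = [1.0 / np.dot(y, s) for s, y in zip(s_list, y_list)]
    for s, y, rho in zip(s_list, y_list, rhos):
        a = rho * np.dot(s, q)
        q -= a * y
        alphas.append(a)
    # Scaled initial matrix H0 = gamma * I, a standard choice when starting from the identity.
    if s_list:
        s0, y0 = s_list[0], y_list[0]
        q *= np.dot(s0, y0) / np.dot(y0, y0)
    for s, y, rho, a in reversed(list(zip(s_list, y_list, rhos, alphas))):
        b = rho * np.dot(y, q)
        q += (a - b) * s
    return -q   # quasi-Newton descent direction
```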


2012 ◽  
Vol 218 (12) ◽  
pp. 6851-6859 ◽  
Author(s):  
Marta I. Velazco Fontova ◽  
Aurelio R.L. Oliveira ◽  
Christiano Lyra

2021 ◽  
Author(s):  
Dimitris Bertsimas ◽  
Shimrit Shtern ◽  
Bradley Sturt

In “Two-Stage Sample Robust Optimization,” Bertsimas, Shtern, and Sturt investigate a simple approximation scheme, based on overlapping linear decision rules, for solving data-driven two-stage distributionally robust optimization problems with the type-infinity Wasserstein ambiguity set. Their main result establishes that this approximation scheme is asymptotically optimal for two-stage stochastic linear optimization problems; that is, under mild assumptions, the optimal cost and optimal first-stage decisions obtained by approximating the robust optimization problem converge to those of the underlying stochastic problem as the number of data points grows to infinity. These guarantees notably apply to two-stage stochastic problems that do not have relatively complete recourse, which arise frequently in applications. In this context, the authors show through numerical experiments that the approximation scheme is practically tractable and produces decisions that significantly outperform those obtained from state-of-the-art data-driven alternatives.
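
Before any decision-rule approximation, the worst-case expectation over the type-infinity Wasserstein ball decomposes sample by sample. In notation introduced here for illustration (data points $$\hat{\xi}_1,\dots,\hat{\xi}_N$$, radius $$\varepsilon$$, second-stage cost $$Q(x,\xi)$$), the sample robust objective reads

$$\sup_{\mathbb{Q}\in\mathcal{B}^{\infty}_{\varepsilon}(\hat{\mathbb{P}}_N)} \mathbb{E}_{\mathbb{Q}}\big[Q(x,\xi)\big] \;=\; \frac{1}{N}\sum_{i=1}^{N}\ \sup_{\|\xi-\hat{\xi}_i\|\le\varepsilon} Q(x,\xi),$$

so each data point contributes the worst case over an $$\varepsilon$$-neighbourhood around it; the overlapping linear decision rules then approximate the second-stage response within each neighbourhood.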


Author(s):  
Minglong Zhou ◽  
Gar Goei Loke ◽  
Chaithanya Bandi ◽  
Zi Qiang Glen Liau ◽  
Wilson Wang

Problem definition: We consider the intraday scheduling problem in a group of orthopaedic clinics where the planner schedules appointment times, given a sequence of appointments. We consider patient re-entry (patients may be required to go for an x-ray examination and then return to the same doctor they have seen) and variability in patient behaviours such as walk-ins, earliness, and no-shows, which lead to inefficiencies such as long patient waiting times and physician overtime. Academic/practical relevance: In our data set, 25% of the patients are required to go for an x-ray examination. We also found significant variability in patient behaviours. Hence, patient re-entry and variability in behaviours are common, yet we found little in the literature that can handle them. Methodology: We formulate the problem as a two-stage optimization problem in which scheduling decisions are made in the first stage. Queue dynamics in the second stage are modeled under a P-Queue paradigm, which minimizes a risk index representing the chance of violating performance targets, such as patient waiting times. The model reduces to a sequence of mixed-integer linear-optimization problems. Results: In comparative studies against a sample average approximation (SAA) model, our model achieves significant reductions in patient waiting times while keeping server overtime constant. Our simulations further characterize the types of uncertainties under which SAA performs poorly. Managerial insights: We present an optimization model that is easy to implement in practice and tractable to compute. Our simulations indicate that not accounting for patient re-entry or variability in patient behaviours leads to suboptimal policies, especially when these factors have specific structure that should be accounted for.
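
To see why re-entry and behavioural variability matter, even a crude discrete-event simulation of one doctor's session under a fixed schedule exposes long waits and overtime. The sketch below is a heavily simplified, hypothetical stand-in for the paper's P-Queue model; all probabilities, durations, and slot lengths are invented for illustration, and walk-ins could be added as extra arrival events in the same way.

```python
import heapq, random

random.seed(1)

SLOT = 15            # minutes between scheduled appointments (hypothetical)
N_PATIENTS = 20
CONSULT = (8, 16)    # uniform range for first-consult duration
RECHECK = (3, 6)     # duration of the post-x-ray review
XRAY_DELAY = 30      # time spent at x-ray before re-entering the queue
P_XRAY, P_NOSHOW = 0.25, 0.10
EARLINESS = (-10, 5) # arrival offset relative to the appointment time
SESSION_END = N_PATIENTS * SLOT

events = []          # (arrival_time, seq, stage) min-heap; stage 0 = consult, 1 = recheck
seq = 0
for i in range(N_PATIENTS):
    if random.random() < P_NOSHOW:
        continue
    arrival = max(0.0, i * SLOT + random.uniform(*EARLINESS))
    heapq.heappush(events, (arrival, seq, 0)); seq += 1

server_free, waits = 0.0, []
while events:
    arrival, _, stage = heapq.heappop(events)       # serve in first-come, first-served order
    start = max(arrival, server_free)
    waits.append(start - arrival)
    duration = random.uniform(*(CONSULT if stage == 0 else RECHECK))
    server_free = start + duration
    if stage == 0 and random.random() < P_XRAY:     # patient re-enters after the x-ray
        heapq.heappush(events, (server_free + XRAY_DELAY, seq, 1)); seq += 1

overtime = max(0.0, server_free - SESSION_END)
print(f"mean wait {sum(waits)/len(waits):.1f} min, overtime {overtime:.1f} min")
```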


2021 ◽  
Author(s):  
Michael F. Adamer ◽  
Sarah C. Brueningk ◽  
Alejandro Tejada-Arranz ◽  
Fabienne Estermann ◽  
Marek Basler ◽  
...  

With the steadily increasing abundance of omics data produced all over the world, sometimes decades apart and under vastly different experimental conditions, and residing in public databases, a crucial step in many data-driven bioinformatics applications is data integration. The challenge of batch effect removal for entire databases lies in the large number of batches and their coincidence with the desired biological variation, which results in design matrix singularity. This problem currently cannot be solved by any common batch correction algorithm. In this study, we present reComBat, a regularised version of the empirical Bayes method, to overcome this limitation. We demonstrate our approach for the harmonisation of public gene expression data of the human opportunistic pathogen Pseudomonas aeruginosa and study several metrics to empirically demonstrate that batch effects are successfully mitigated while biologically meaningful gene expression variation is retained. reComBat fills the gap in batch correction approaches applicable to large-scale public omics databases and opens up new avenues for data-driven analysis of complex biological processes beyond the scope of a single study.
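
The core idea, stated loosely, is to regularise the regression that estimates batch effects so the fit stays identifiable even when batch membership and biology are nearly confounded. The sketch below is a generic ridge-regularised batch adjustment, not the reComBat implementation or its API; the design matrices, penalty, and synthetic data are chosen purely for illustration.

```python
import numpy as np

def ridge_batch_correct(Y, X_bio, X_batch, lam=1.0):
    """Remove estimated batch contributions from a (samples x genes) matrix Y.

    Y        : expression matrix, samples in rows, genes in columns
    X_bio    : covariates encoding the biological variation to keep
    X_batch  : one-hot (dummy) encoding of batch membership
    lam      : ridge penalty; regularisation keeps the joint fit well-posed even
               when batch and biology are nearly collinear in the design.
    """
    X = np.hstack([X_bio, X_batch])
    p_bio = X_bio.shape[1]
    # Ridge solution B = (X'X + lam*I)^{-1} X'Y, one coefficient column per gene.
    B = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)
    B_batch = B[p_bio:]
    return Y - X_batch @ B_batch   # subtract the estimated batch component only

# Tiny synthetic example: 6 samples, 4 genes, 2 batches, 1 biological covariate.
rng = np.random.default_rng(0)
X_bio = rng.normal(size=(6, 1))
X_batch = np.repeat(np.eye(2), 3, axis=0)          # first 3 samples batch A, rest batch B
Y = X_bio @ rng.normal(size=(1, 4)) + X_batch @ rng.normal(size=(2, 4)) \
    + 0.1 * rng.normal(size=(6, 4))
Y_corrected = ridge_batch_correct(Y, X_bio, X_batch)
```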


MACRo 2015 ◽  
2015 ◽  
Vol 1 (1) ◽  
pp. 283-292
Author(s):  
Péter Böröcz ◽  
Péter Tar ◽  
István Maros

Abstract Sparse linear algebraic data structures are widely used in the solution of large-scale linear optimization problems, and the efficiency of the solver is significantly influenced by the data structures used. The implementation of such data structures is not trivial, and a performance analysis of the available options can provide valuable information for improving efficiency. In this talk we present our software that supports this task, as well as our new, special vector representation. We also report results on handling numerical issues that affect the performance of sparse linear algebraic operations.
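
As a point of reference for what such a vector representation involves, here is a minimal sparse vector in coordinate (index/value) form with a dot product against a dense array; it is a generic illustration, not the representation proposed in the talk.

```python
import numpy as np

class SparseVector:
    """Minimal coordinate-format sparse vector: parallel index and value arrays."""

    def __init__(self, n, indices, values):
        self.n = n
        self.indices = np.asarray(indices, dtype=np.int64)
        self.values = np.asarray(values, dtype=np.float64)

    def dot_dense(self, dense):
        # Only the stored nonzeros contribute, so this costs O(nnz), not O(n).
        return float(np.dot(self.values, dense[self.indices]))

    def to_dense(self):
        out = np.zeros(self.n)
        out[self.indices] = self.values
        return out

v = SparseVector(1_000_000, indices=[3, 17, 999_999], values=[1.5, -2.0, 4.0])
d = np.ones(1_000_000)
print(v.dot_dense(d))   # 3.5
```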


2013 ◽  
Vol 221 (3) ◽  
pp. 190-200 ◽  
Author(s):  
Jörg-Tobias Kuhn ◽  
Thomas Kiefer

Several techniques have been developed in recent years to generate optimal large-scale assessments (LSAs) of student achievement. These techniques often represent a blend of procedures from such diverse fields as experimental design, combinatorial optimization, particle physics, or neural networks. However, despite the theoretical advances in the field, there still exists a surprising scarcity of well-documented test designs in which all factors that have guided design decisions are explicitly and clearly communicated. This paper therefore has two goals. First, a brief summary of relevant key terms, as well as experimental designs and automated test assembly routines in LSA, is given. Second, conceptual and methodological steps in designing the assessment of the Austrian educational standards in mathematics are described in detail. The test design was generated using a two-step procedure, starting at the item block level and continuing at the item level. Initially, a partially balanced incomplete item block design was generated using simulated annealing, whereas in a second step, items were assigned to the item blocks using mixed-integer linear optimization in combination with a shadow-test approach.
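
To make the first step concrete, the following sketch shows a generic simulated-annealing loop that assigns item blocks to test booklets while balancing how often each pair of blocks co-occurs. It is a hypothetical toy, not the Austrian standards design; the sizes, energy function, and cooling schedule are invented for illustration.

```python
import itertools, math, random

random.seed(0)

N_BLOCKS, N_BOOKLETS, BLOCKS_PER_BOOKLET = 12, 18, 4   # hypothetical design sizes

def pair_counts(booklets):
    counts = {p: 0 for p in itertools.combinations(range(N_BLOCKS), 2)}
    for b in booklets:
        for p in itertools.combinations(sorted(b), 2):
            counts[p] += 1
    return counts

def energy(booklets):
    # Imbalance of pairwise block co-occurrence: variance of the pair counts.
    c = list(pair_counts(booklets).values())
    mean = sum(c) / len(c)
    return sum((x - mean) ** 2 for x in c)

# Start from random booklets and improve by annealed single-block swaps.
booklets = [random.sample(range(N_BLOCKS), BLOCKS_PER_BOOKLET) for _ in range(N_BOOKLETS)]
current = energy(booklets)
T = 5.0
for step in range(20000):
    i = random.randrange(N_BOOKLETS)
    out_block = random.choice(booklets[i])
    in_block = random.choice([b for b in range(N_BLOCKS) if b not in booklets[i]])
    candidate = [list(b) for b in booklets]
    candidate[i].remove(out_block)
    candidate[i].append(in_block)
    new = energy(candidate)
    # Metropolis rule: always accept improvements, sometimes accept worse moves.
    if new < current or random.random() < math.exp((current - new) / T):
        booklets, current = candidate, new
    T *= 0.9995                      # geometric cooling

print("final imbalance:", current)
```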

