Data Pooling in Stochastic Optimization

2021 ◽  
Author(s):  
Vishal Gupta ◽  
Nathan Kallus

Managing large-scale systems often involves simultaneously solving thousands of unrelated stochastic optimization problems, each with limited data. Intuition suggests that one can decouple these unrelated problems and solve them separately without loss of generality. We propose a novel data-pooling algorithm called Shrunken-SAA that disproves this intuition. In particular, we prove that combining data across problems can outperform decoupling, even when there is no a priori structure linking the problems and the data are drawn independently. Our approach does not require strong distributional assumptions and applies to constrained, possibly nonconvex, nonsmooth optimization problems such as vehicle routing, economic lot-sizing, or facility location. We compare and contrast our results with a similar phenomenon in statistics (Stein’s phenomenon), highlighting features unique to the optimization setting that are not present in estimation. We further prove that, as the number of problems grows large, Shrunken-SAA learns whether pooling can improve upon decoupling and the optimal amount to pool, even if the average amount of data per problem is fixed and bounded. Importantly, we offer a simple intuition based on stability that explains when and why data pooling offers a benefit, elucidating this perhaps surprising phenomenon. This intuition further suggests that data pooling offers the greatest benefit when there are many problems, each with a small amount of relevant data. Finally, we demonstrate the practical benefits of data pooling using real data from a chain of retail drug stores in the context of inventory management. This paper was accepted by Chung Piaw Teo, Special Issue on Data-Driven Prescriptive Analytics.
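The abstract does not give Shrunken-SAA's formulas, so the following is only a minimal sketch of the general pooling idea, under our own assumptions, on a toy newsvendor problem with a finite demand support. Each problem's empirical distribution is shrunk toward a common anchor (here, the average of all problems' empirical distributions) with weight alpha / (N + alpha), and the SAA order quantity is then read off the shrunken distribution. The helper names (`shrunk_pmf`, `newsvendor_order`, `pooled_anchor`), the anchor choice, and the costs are hypothetical, and `alpha` is a fixed input here rather than learned from data.

```python
from typing import List, Sequence


def shrunk_pmf(counts: Sequence[int], anchor: Sequence[float], alpha: float) -> List[float]:
    """Mix the empirical PMF with the anchor; the anchor gets weight alpha / (N + alpha)."""
    n = sum(counts)
    return [(c + alpha * a) / (n + alpha) for c, a in zip(counts, anchor)]


def newsvendor_order(pmf: Sequence[float], critical_ratio: float) -> int:
    """Newsvendor solution: smallest demand level whose CDF reaches the critical ratio."""
    cdf = 0.0
    for level, p in enumerate(pmf):
        cdf += p
        if cdf >= critical_ratio:
            return level
    return len(pmf) - 1


def pooled_anchor(all_counts: Sequence[Sequence[int]]) -> List[float]:
    """Average of the per-problem empirical PMFs (one simple, hypothetical anchor choice)."""
    k = len(all_counts)
    d = len(all_counts[0])
    return [sum(c[j] / sum(c) for c in all_counts) / k for j in range(d)]


# Three small problems, each with 4 observations of demand in {0, 1, 2, 3}.
problems = [[3, 1, 0, 0], [0, 0, 1, 3], [0, 2, 2, 0]]
anchor = pooled_anchor(problems)
ratio = 3 / (3 + 2)  # hypothetical backorder cost 3, holding cost 2

# alpha = 0 is plain SAA (decoupled); a huge alpha ignores the problem's own data.
orders = [newsvendor_order(shrunk_pmf(problems[0], anchor, a), ratio) for a in (0, 4, 10**6)]
print(orders)  # [0, 1, 2]
```

With `alpha = 0` the first problem orders its own SAA quantile, a very large `alpha` reproduces the anchor's quantile, and intermediate values interpolate between the two; choosing that amount of pooling well is the question the abstract says Shrunken-SAA resolves.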

2020 ◽  
Vol 186 (3) ◽  
pp. 985-1005
Author(s):  
Pierre Carpentier ◽  
Jean-Philippe Chancelier ◽  
Michel De Lara ◽  
François Pacaud

2020 ◽  
Vol 80 (5) ◽  
pp. 870-909 ◽  
Author(s):  
Maxwell Mansolf ◽  
Annabel Vreeker ◽  
Steven P. Reise ◽  
Nelson B. Freimer ◽  
David C. Glahn ◽  
...  

Large-scale studies spanning diverse project sites, populations, languages, and measurements are increasingly important for relating psychological to biological variables. National and international consortia are already collecting data and executing mega-analyses on aggregated data from individuals, with different measures collected on each person. In this research, we show that Asparouhov and Muthén’s alignment method can be adapted to align data from disparate item sets and response formats. We argue that, with these adaptations, the alignment method is well suited for combining data across multiple sites even when they use different measurement instruments. The approach is illustrated using data from the Whole Genome Sequencing in Psychiatric Disorders consortium, and a simulation based on real data is used to verify accurate parameter recovery. Factor alignment appears to increase the precision of measurement and the validity of scores with respect to external criteria. The resulting parameter estimates may further inform the development of more effective and efficient methods to assess the same constructs in prospectively designed studies.


Energies ◽  
2020 ◽  
Vol 14 (1) ◽  
pp. 23
Author(s):  
Vahab Rostampour ◽  
Tamás Keviczky

This paper presents a distributed computational framework for stochastic convex optimization problems using the so-called scenario approach. Such a problem arises, for example, in a large-scale network of interconnected linear systems with local and common uncertainties. Because of the large number of scenarios required to approximate the stochasticity of these problems, the stochastic optimization involves formulating a large-scale scenario program, which is in general computationally demanding. We present two novel ideas in this paper to address this issue. First, we develop a technique to decompose the large-scale scenario program into distributed scenario programs that exchange a certain number of scenarios with each other to compute local decisions using the alternating direction method of multipliers (ADMM). We show the exactness of the decomposition with a priori probabilistic guarantees for the desired level of constraint fulfillment for both local and common uncertainty sources. As our second contribution, we develop a so-called soft communication scheme based on a set parametrization technique together with the notion of probabilistically reliable sets to reduce the required communication between the subproblems. We show how to incorporate the probabilistic reliability notion into existing results and provide new guarantees for the desired level of constraint violation. Two simulation studies, one on dynamically coupled networks and one on networks with coupling constraints, illustrate the advantages of the proposed distributed framework.
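The ADMM coordination the abstract relies on can be illustrated with generic consensus ADMM. The sketch below is not the authors' scenario-exchange decomposition: it assumes a toy problem in which each subproblem i holds a private quadratic cost (x_i - a_i)^2 / 2 and all subproblems must agree on a common decision z, so the answer (the mean of the a_i) is known in advance. The function name, the penalty `rho`, and the iteration count are hypothetical choices.

```python
from typing import Sequence


def consensus_admm(targets: Sequence[float], rho: float = 1.0, iters: int = 200) -> float:
    """Consensus ADMM for min sum_i (x_i - a_i)^2 / 2 subject to x_i = z for all i."""
    n = len(targets)
    x = [0.0] * n  # local decisions, one per subproblem
    u = [0.0] * n  # scaled dual variables, one per subproblem
    z = 0.0        # shared (common) decision
    for _ in range(iters):
        # Local step: each subproblem minimizes its own cost plus a proximity term,
        # which has a closed form for a quadratic cost.
        x = [(a + rho * (z - ui)) / (1.0 + rho) for a, ui in zip(targets, u)]
        # Coordination step: average the local proposals (the only shared information).
        z = sum(xi + ui for xi, ui in zip(x, u)) / n
        # Dual step: accumulate each subproblem's disagreement with the consensus.
        u = [ui + xi - z for ui, xi in zip(u, x)]
    return z


print(consensus_admm([1.0, 2.0, 6.0]))  # converges to the mean, 3.0
```

Only the averaged proposals pass between subproblems in each round; reducing that kind of communication is what the abstract's soft communication scheme targets.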


Author(s):  
Zhengyuan Zhou ◽  
Panayotis Mertikopoulos ◽  
Nicholas Bambos ◽  
Peter Glynn ◽  
Yinyu Ye

The recent surge of breakthroughs in machine learning and artificial intelligence has sparked renewed interest in large-scale stochastic optimization problems that are universally considered hard. One of the most widely used methods for solving such problems is distributed asynchronous stochastic gradient descent (DASGD), a family of algorithms obtained by parallelizing stochastic gradient descent, possibly asynchronously, on distributed computing architectures. However, a key obstacle to the efficient implementation of DASGD is the issue of delays: when a computing node contributes a gradient update, the global model parameter may already have been updated by other nodes several times over, rendering this gradient information stale. These delays can quickly add up if the computational throughput of a node is saturated, so the convergence of DASGD may be compromised in the presence of large delays. Our first contribution shows that, with a carefully tuned step size, DASGD still converges to the critical set in mean square, even if the delays grow without bound at a polynomial rate. We also establish finer results for a broad class of structured optimization problems (called variationally coherent), where we show that DASGD converges to a global optimum with probability one under the same delay assumptions. Together, these results contribute to the broad landscape of large-scale nonconvex stochastic optimization by offering state-of-the-art theoretical guarantees and providing insights for algorithm design.
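A toy simulation can make the staleness issue concrete. The sketch below is our own illustration, not the paper's algorithm or step-size schedule: it runs stochastic gradient descent on f(x) = x^2 / 2, where each update uses a noisy gradient computed at an iterate that is floor(sqrt(t)) steps old, so the delays grow without bound at a polynomial rate, while the step size decays as (t + 1)^(-3/4). The delay pattern, noise level, step-size exponent, and function name are assumptions chosen for illustration.

```python
import math
import random


def delayed_sgd(x0: float = 5.0, steps: int = 20_000, noise: float = 0.1, seed: int = 0) -> float:
    """SGD on f(x) = x**2 / 2 where every update uses a stale iterate."""
    rng = random.Random(seed)
    history = [x0]  # history[t] holds the iterate x_t
    for t in range(steps):
        delay = math.isqrt(t)                    # delay grows like sqrt(t): unbounded, polynomial
        stale_x = history[t - delay]             # the gradient was computed at an old iterate
        grad = stale_x + rng.gauss(0.0, noise)   # noisy gradient of x**2 / 2 at that old point
        step = 1.0 / (t + 1) ** 0.75             # decaying step size (illustrative choice)
        history.append(history[t] - step * grad)
    return history[-1]


print(delayed_sgd())  # close to the minimizer x* = 0
```

In this toy model, a constant step size would eventually fail: as the delay grows, stale updates overshoot and the iteration becomes unstable, which is why the step size has to shrink in step with the delays.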


Author(s):  
Paul Cronin ◽  
Harry Woerde ◽  
Rob Vasbinder

2019 ◽  
Author(s):  
Chem Int

The objective of this work is to study the ageing state of a used reverse osmosis (RO) membrane taken from the Benisaf Water Company seawater desalination unit in Algeria. The study consists of an autopsy procedure used to perform a chain of analyses on a membrane sheet. Wear of the membrane is characterized by a degradation of its performance due to a significant increase in hydraulic permeability (25%) and pressure drop, as well as a decrease in salt retention (10% to 30%). In most cases, the local effects of ageing are poorly known, and global measurements (flux, transmembrane pressure, permeate flow, retention rate, etc.) cannot characterize them. Therefore, a used RO membrane was selected at the site to perform the membrane autopsy tests. These tests make it possible to identify the causes of degradation and to understand the links between the performance loss observed at the macroscopic scale and the scale at which ageing takes place. External and internal visual observations reveal the state of degradation. Microscopic analysis of the used membrane’s surface shows the extent of fouling. In addition, quantification and identification analyses reveal a high fouling rate in the used membrane, with foulants of both inorganic and organic nature. Moreover, the analyses revealed the presence of a biofilm composed of protein.


1999 ◽  
Vol 9 (3) ◽  
pp. 755-778 ◽  
Author(s):  
Paul T. Boggs ◽  
Anthony J. Kearsley ◽  
Jon W. Tolle

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Camilo Broc ◽  
Therese Truong ◽  
Benoit Liquet

Background: The increasing number of genome-wide association studies (GWAS) has revealed several loci that are associated with multiple distinct phenotypes, suggesting the existence of pleiotropic effects. Highlighting these cross-phenotype genetic associations could help to identify and understand common biological mechanisms underlying some diseases. Common approaches test the association between genetic variants and multiple traits at the SNP level. In this paper, we propose a novel gene- and pathway-level approach for the case where several independent GWAS on independent traits are available. The method is based on a generalization of sparse group Partial Least Squares (sgPLS) that takes groups of variables into account, combined with a Lasso penalization that links all the independent data sets. This method, called joint-sgPLS, can convincingly detect signal at both the variable level and the group level.

Results: Our method has the advantage of providing a globally readable model while accounting for the architecture of the data. It can outperform traditional methods and provides wider insight in terms of a priori information. We compared the performance of the proposed method with other benchmark methods on simulated data and gave an example of application on real data, with the aim of highlighting common susceptibility variants for breast and thyroid cancers.

Conclusion: The joint-sgPLS shows interesting properties for detecting a signal. As an extension of PLS, the method is suited to data with a large number of variables. The choice of Lasso penalization copes with architectures of groups of variables and sets of observations. Furthermore, although the method has been applied to a genetic study, its formulation is suited to data from other application fields that have a large number of variables and an a priori architecture.
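The abstract does not spell out the joint-sgPLS estimator. As background only, the sketch below shows the generic sparse-group thresholding step that sparse group penalties of the kind sgPLS builds on rely on: elementwise soft-thresholding (the Lasso part) followed by shrinking each group's Euclidean norm, which can set an entire group, such as a gene or pathway, to zero. It is the proximal operator of lam_l1 * ||v||_1 + lam_group * sum_g ||v_g||_2, in an unweighted form we chose for simplicity; it is not the authors' algorithm, and the function names, group layout, and penalty values are hypothetical.

```python
import math
from typing import List, Sequence


def soft_threshold(x: float, lam: float) -> float:
    """Lasso-style shrinkage of a single coefficient toward zero."""
    return math.copysign(max(abs(x) - lam, 0.0), x)


def sparse_group_threshold(v: Sequence[float], groups: Sequence[Sequence[int]],
                           lam_l1: float, lam_group: float) -> List[float]:
    """Prox of lam_l1 * ||v||_1 + lam_group * sum over groups of ||v_g||_2."""
    out = [0.0] * len(v)
    for g in groups:
        s = [soft_threshold(v[i], lam_l1) for i in g]  # within-group sparsity
        norm = math.sqrt(sum(x * x for x in s))
        if norm <= lam_group:
            continue  # the whole group (e.g. a gene) is dropped
        scale = 1.0 - lam_group / norm  # group-level shrinkage
        for i, x in zip(g, s):
            out[i] = scale * x
    return out


# Two hypothetical groups of two variables each: the weak second group is zeroed out.
print(sparse_group_threshold([3.0, -1.0, 0.5, 0.2], [[0, 1], [2, 3]], 0.5, 1.0))
```

The two penalty weights trade off selection of individual variables against selection of whole groups, which is the variable-level and group-level detection the abstract describes.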


1997 ◽  
Vol 84 (3) ◽  
pp. 1109-1112 ◽  
Author(s):  
M. B. Gitman ◽  
P. V. Trusov ◽  
S. A. Fedoseev
