Optimal Order Batching in Warehouse Management: A Data-Driven Robust Approach

Author(s):  
Vedat Bayram ◽  
Gohram Baloch ◽  
Fatma Gzara ◽  
Samir Elhedhli

Optimizing warehouse processes has a direct impact on supply chain responsiveness, timely order fulfillment, and customer satisfaction. In this work, we focus on the picking process in warehouse management and study it from a data perspective. Using historical data from an industrial partner, we introduce, model, and study the robust order batching problem (ROBP), which groups orders into batches to minimize total order processing time while accounting for uncertainty caused by system congestion and human behavior. We provide a generalizable, data-driven approach that overcomes the warehouse-specific assumptions characterizing most of the work in the literature. We analyze historical data to understand the processes in the warehouse, to predict processing times, and to improve order processing. We introduce the ROBP and develop an efficient learning-based branch-and-price algorithm based on simultaneous column and row generation, embedded with alternative prediction models, such as linear regression and random forest, that predict the processing time of a batch. We conduct extensive computational experiments to test the performance of the proposed approach and to derive managerial insights based on real data. The data-driven prescriptive analytics tool we propose achieves savings of seven to eight minutes per order, which translates into a 14.8% increase in the daily picking operations capacity of the warehouse.

2017 ◽  
Vol 4 (suppl_1) ◽  
pp. S403-S404
Author(s):  
Maggie Makar ◽  
Jeeheh Oh ◽  
Christopher Fusco ◽  
Joseph Marchesani ◽  
Robert McCaffrey ◽  
...  

Background: An estimated 293,300 healthcare-associated cases of Clostridium difficile infection (CDI) occur annually in the United States. Prior research on risk-prediction models for CDI has focused on a small number of risk factors with the goal of developing a model that works well across hospitals. We hypothesize that risk factors are, in part, hospital-specific. We applied a generalizable machine learning approach to discovering, or “learning”, hospital-specific risk-stratification models using electronic health record (EHR) data collected during the course of patient care from the Massachusetts General Hospital (MGH) and the University of Michigan Health System (UM). Methods: We utilized EHR data from 115,958 adult inpatient admissions from 2012–2014 (MGH) and 258,050 adult inpatient admissions from 2010–2016 (UM) (Fig 1). We extracted patient demographics, admission details, patient history, and daily hospitalization details, resulting in 2,964 and 4,739 features in the MGH and UM models, respectively. We used L2 regularized logistic regression to learn the models and measured the discriminative performance of the models on a year of held-out data from each hospital. Results: The MGH and UM models achieved AUROCs of 0.74 (CI: 0.73–0.75) and 0.77 (CI: 0.75–0.80), respectively. The relative importance of risk factors varied significantly across hospitals. In particular, in-hospital locations appeared in the set of top risk factors at one hospital and in the set of protective factors at the other. On average, both models were able to predict CDI five days in advance of clinical diagnosis (Fig 2). Conclusion: We used EHR data to generate a daily estimate of the risk of CDI for each inpatient hospitalization. We applied a generalizable data-driven approach to existing data from two large institutions with different patient populations and different data formats and content. In contrast to approaches that focus on learning models that apply generally across hospitals, our proposed approach yields risk stratification models tailored to an institution’s EHR system and patient population. In turn, these hospital-specific models could allow for earlier and more accurate identification of high-risk patients. Disclosures: All authors: No reported disclosures.
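The abstract above names L2 regularized logistic regression as the learning method. The following is a minimal sketch of that technique in plain Python, not the authors' implementation: the function names (`fit_l2_logistic`, `predict_risk`), the batch gradient-descent optimizer, and the hyperparameters `l2`, `lr`, and `epochs` are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_risk(w, b, x):
    """Predicted probability of the positive class for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_l2_logistic(X, y, l2=0.1, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent.

    Minimizes mean log-loss + (l2 / 2) * ||w||^2. The intercept b is not
    penalized (an assumption; conventions differ between libraries).
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_risk(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            # Data gradient plus the L2 penalty gradient l2 * w[j].
            w[j] -= lr * (grad_w[j] / n + l2 * w[j])
        b -= lr * grad_b / n
    return w, b
```

A larger `l2` shrinks the weights toward zero, which is the usual motivation for the penalty when the number of features is large relative to the number of positive cases.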


2018 ◽  
Vol 39 (4) ◽  
pp. 425-433 ◽  
Author(s):  
Jeeheh Oh ◽  
Maggie Makar ◽  
Christopher Fusco ◽  
Robert McCaffrey ◽  
Krishna Rao ◽  
...  

OBJECTIVE: An estimated 293,300 healthcare-associated cases of Clostridium difficile infection (CDI) occur annually in the United States. To date, research has focused on developing risk prediction models for CDI that work well across institutions. However, this one-size-fits-all approach ignores important hospital-specific factors. We focus on a generalizable method for building facility-specific models. We demonstrate the applicability of the approach using electronic health records (EHR) from the University of Michigan Hospitals (UM) and the Massachusetts General Hospital (MGH). METHODS: We utilized EHR data from 191,014 adult admissions to UM and 65,718 adult admissions to MGH. We extracted patient demographics, admission details, patient history, and daily hospitalization details, resulting in 4,836 features from patients at UM and 1,837 from patients at MGH. We used L2 regularized logistic regression to learn the models, and we measured the discriminative performance of the models on held-out data from each hospital. RESULTS: Using the UM and MGH test data, the models achieved area under the receiver operating characteristic curve (AUROC) values of 0.82 (95% confidence interval [CI], 0.80–0.84) and 0.75 (95% CI, 0.73–0.78), respectively. Some predictive factors were shared between the 2 models, but many of the top predictive factors differed between facilities. CONCLUSION: A data-driven approach to building models for estimating daily patient risk for CDI was used to build institution-specific models at 2 large hospitals with different patient populations and EHR systems. In contrast to traditional approaches that focus on developing models that apply across hospitals, our generalizable approach yields risk-stratification models tailored to an institution. These hospital-specific models allow for earlier and more accurate identification of high-risk patients and better targeting of infection prevention strategies. Infect Control Hosp Epidemiol 2018;39:425–433
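The discriminative performance reported above is an AUROC. As a hedged illustration of what that number measures, the sketch below computes it directly from its pairwise definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `auroc` is hypothetical, and the quadratic loop is chosen for clarity rather than speed.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 outcomes; scores: predicted risks, same order.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))
```

An AUROC of 0.5 corresponds to scores that carry no ranking information, and 1.0 to scores that rank every positive case above every negative one.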


2018 ◽  
Vol 2018 ◽  
pp. 1-8 ◽  
Author(s):  
Yuanyuan Zhang ◽  
Shudong Wang ◽  
Xinzeng Wang

Background. DNA methylation is essential for regulating gene expression, and changes in DNA methylation status are commonly found in disease. Therefore, identification of differential methylation patterns, especially differentially methylated regions (DMRs), between two different groups is important for understanding the mechanism of complex diseases. A few tools exist for DMR identification that consider features of methylation data, but current methods do not comprehensively integrate the characteristics of DNA methylation data. Results. Accounting for the characteristics of methylation data, such as the correlation of neighboring CpG sites and the high heterogeneity of DNA methylation data, we propose a data-driven approach for DMR identification that evaluates the energy of a single site using a modified 1D Ising model. Applied to both simulated and publicly available datasets, our approach is compared with other popular methods in terms of performance. Simulated results show that our method is more sensitive than competing methods. Applied to the real data, our method can identify more common DMRs than DMRcate, ProbeLasso, and Wang’s methods with a high overlapping ratio. Also, the necessity of integrating the heterogeneity and correlation characteristics in identifying DMRs is shown by comparing results obtained when considering only mean or variance signals and when ignoring the relationship of neighboring CpG sites, respectively. By analyzing the number of DMRs identified in real data located in different genomic regions, we find that about 90% of DMRs are located in CGI, which always regulates the expression of genes. It may help us understand the functional effect of DNA methylation on disease.


Water ◽  
2019 ◽  
Vol 11 (7) ◽  
pp. 1500
Author(s):  
Adrià Soldevila ◽  
Joaquim Blesa ◽  
Rosa M. Fernandez-Canti ◽  
Sebastian Tornil-Sin ◽  
Vicenç Puig

This paper presents a new data-driven method for leak localization in water distribution networks. The proposed method relies on the use of available pressure measurements in some selected internal network nodes and on the estimation of the pressure at the remaining nodes using Kriging spatial interpolation. Online leak localization is attained by comparing current pressure values with their reference values. Supported by Kriging, this comparison can be performed for all the network nodes, not only for those equipped with pressure sensors. On the one hand, reference pressure values in all nodes are obtained by applying Kriging to measurement data previously recorded under network operation without leaks. On the other hand, current pressure values at all nodes are obtained by applying Kriging to the current measured pressure values. The node that presents the maximum difference (residual) between current and reference pressure values is proposed as a leaky node candidate. Thereafter, a time horizon computation based on Bayesian reasoning is applied to consider the residual time evolution, resulting in improved leak localization accuracy. As a data-driven approach, the proposed method does not need a hydraulic model; only historical data from normal operation is required. This is an advantage with respect to most data-driven methods, which need historical data for the considered leak scenarios. Since, in practice, the obtained leak localization results will strongly depend on the number of available pressure measurements and their location, an optimal sensor placement procedure is also proposed in the paper. Three different case studies illustrate the performance of the proposed methodologies.
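The residual comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: inverse-distance weighting (IDW) is swapped in for Kriging to keep the example dependency-free, node coordinates are treated as planar points, and all names (`idw_interpolate`, `pressure_map`, `leak_candidate`) are hypothetical. The Bayesian time-horizon step is omitted.

```python
import math


def idw_interpolate(known, target, power=2.0):
    """Estimate a value at `target` from (x, y, value) samples.

    Inverse-distance weighting, used here as a simple stand-in for Kriging.
    """
    num = 0.0
    den = 0.0
    for x, y, value in known:
        d = math.hypot(target[0] - x, target[1] - y)
        if d == 0.0:
            return value  # target coincides with a sensor node
        weight = 1.0 / d ** power
        num += weight * value
        den += weight
    return num / den


def pressure_map(nodes, sensor_readings):
    """Interpolate pressure at every node from readings at sensor nodes."""
    known = [(nodes[n][0], nodes[n][1], p) for n, p in sensor_readings.items()]
    return {n: idw_interpolate(known, xy) for n, xy in nodes.items()}


def leak_candidate(nodes, reference_readings, current_readings):
    """Return the node with the largest absolute residual, plus all residuals.

    Residual = interpolated leak-free reference pressure minus interpolated
    current pressure, evaluated at every node (sensor-equipped or not).
    """
    ref = pressure_map(nodes, reference_readings)
    cur = pressure_map(nodes, current_readings)
    residuals = {n: ref[n] - cur[n] for n in nodes}
    return max(residuals, key=lambda n: abs(residuals[n])), residuals
```

With four nodes on a line and sensors at the two ends, a pressure drop at one end sensor makes that end the candidate, and the unsensed nodes receive residuals that decrease with distance from it.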


2021 ◽  
Author(s):  
Cecilia E Thomas ◽  
Leo Dahl ◽  
Sanna Byström ◽  
Yan Chen ◽  
Mathias Uhlén ◽  
...  

Background: Risk prediction is crucial for early detection and prognosis of breast cancer. Circulating plasma proteins could provide a valuable source to increase the validity of risk prediction models; however, no such markers have yet been identified for clinical use. Methods: EDTA plasma samples from 183 breast cancer cases and 366 age-matched controls were collected prior to diagnosis from the Swedish breast cancer cohort KARMA. The samples were profiled on 700 circulating proteins using an exploratory affinity proteomics approach. Linear association analyses were performed on case-control status and a data-driven analysis strategy was applied to cluster the women on their plasma proteome profiles in an unsupervised manner. The resulting clusters were subsequently annotated for the differences in phenotypic characteristics, clinical parameters, and genetic risk. Results: Using the data-driven approach, we identified five clusters with distinct proteomic plasma profiles. Women in a particular sub-group (cluster 1) were significantly more likely to have used menopausal hormonal therapy (MHT), more likely to get a breast cancer diagnosis, and were older compared to the remaining clusters. The levels of circulating proteins in cluster 1 were decreased for proteins related to DNA repair and cell replication and increased for proteins related to mammographic density and female tissues. In contrast, classical dichotomous case-control analyses did not reveal any proteins significantly associated with future breast cancer. Conclusion: Using a data-driven approach, we identified a subset of women with circulating proteins associated with previous use of MHT and risk of breast cancer. Our findings point to the potential long-lasting effects of MHT on the circulating proteome even after ending the treatment, and hence provide valuable insights concerning risk prediction of breast cancer.


2020 ◽  
Vol 10 (16) ◽  
pp. 5696 ◽  
Author(s):  
Samar A. Shilbayeh ◽  
Abdullah Abonamah ◽  
Ahmad A. Masri

Prediction models of coronavirus disease utilizing machine learning algorithms range from forecasting future suspect cases and predicting mortality rates to building a pattern for a country-specific pandemic end date. To predict future suspect infection and death cases, we categorized the approaches found in the literature into two groups. The first is a purely data-driven approach, whose goal is to build a mathematical model that relates the data variables, including outputs with inputs, to detect general patterns. The discovered patterns can then be used to predict future infected cases without any expert input. The second approach is partially data-driven; it uses historical data, but allows expert input such as the SIR epidemic algorithm. This approach assumes that the epidemic will end according to medical reasoning. In this paper, we compare the purely data-driven and partially data-driven approaches by applying them to data from three countries having different past pattern behavior. The countries are the US, Jordan, and Italy. It is found that the two prediction approaches yield significantly different results. The purely data-driven approach depends totally on past behavior and does not show any decline in the number of infected cases if the country did not experience any decline in the number of cases. On the other hand, a partially data-driven approach guarantees a timely decline of the infected curve to reach zero. Using the two approaches highlights the importance of human intervention in pandemic prediction to guide the learning process, as opposed to the purely data-driven approach that predicts future cases based on the pattern detected in the data.
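The contrast drawn above can be illustrated with a small sketch. It is not the authors' code: `simulate_sir` is a textbook discrete-time SIR model with one-day Euler steps, `linear_extrapolate` is a least-squares trend line standing in for a purely data-driven predictor, and the parameter values in the usage note are arbitrary assumptions.

```python
def simulate_sir(population, infected0, beta, gamma, days):
    """Daily infected counts from a discrete-time SIR model.

    beta: transmission rate per day; gamma: recovery rate per day.
    """
    s = float(population - infected0)
    i = float(infected0)
    history = [i]
    for _ in range(days):
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        history.append(i)
    return history


def linear_extrapolate(series, horizon):
    """Fit a least-squares line to `series` and extend it `horizon` steps.

    A trend fitted to rising data keeps rising: nothing in the model
    encodes that an epidemic eventually ends.
    """
    n = len(series)
    mean_x = (n - 1) / 2.0
    mean_y = sum(series) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(series))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [intercept + slope * x for x in range(n, n + horizon)]
```

For example, `simulate_sir(1_000_000, 10, 0.3, 0.1, 300)` rises to a peak and then decays toward zero because the susceptible pool is depleted, whereas `linear_extrapolate` applied to the rising part of that curve predicts ever larger values.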


2021 ◽  
Vol 11 (15) ◽  
pp. 6967
Author(s):  
Marco Cipriano ◽  
Luca Colomba ◽  
Paolo Garza

Mobility in cities is a fundamental asset and opens several problems in decision making and the creation of new services for citizens. In recent years, transportation sharing systems have been continuously growing. Among these, bike sharing systems have become commonly adopted. There are two categories of bike sharing systems: station-based systems and free-floating services. In this paper, we concentrate our analyses on station-based systems. Such systems require periodic rebalancing operations, which move bicycles from full stations to empty stations, to guarantee good quality of service and system usability. In particular, we propose a dynamic bicycle rebalancing methodology based on frequent pattern mining and its implementation. The extracted patterns represent frequent unbalanced situations among nearby stations. They are used to predict upcoming critical statuses and plan the most effective rebalancing operations using an entirely data-driven approach. Experiments performed on real data of the Barcelona bike sharing system show the effectiveness of the proposed approach.
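To make the idea of frequent unbalanced situations concrete, the sketch below counts how often sets of station statuses occur together across time slots and keeps those above a support threshold. It is a brute-force simplification, not the paper's mining algorithm: the item encoding (`"station:status"` strings), the function name `frequent_itemsets`, and the `max_size` cap are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return itemsets whose relative support is at least `min_support`.

    transactions: one collection of items per time slot, for example
    {"S1:full", "S2:empty"} for critical statuses observed together.
    Result maps each sorted item tuple to its support in [0, 1].
    """
    n = len(transactions)
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    return {
        itemset: count / n
        for itemset, count in counts.items()
        if count / n >= min_support
    }
```

A pair such as `("S1:full", "S2:empty")` with high support suggests that moving bicycles from S1 to S2 would often resolve both critical statuses at once. Enumerating all combinations is only practical for small `max_size`; dedicated algorithms such as Apriori or FP-Growth prune the search.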

