Historical Data Driven and Component Based Prediction Models for Predicting Preliminary Engineering Costs of Roadway Projects

Author(s):  
Asregedew Woldesenbet ◽  
"David" Hyung Seok Jeong
Author(s):  
Vedat Bayram ◽  
Gohram Baloch ◽  
Fatma Gzara ◽  
Samir Elhedhli

Optimizing warehouse processes has a direct impact on supply chain responsiveness, timely order fulfillment, and customer satisfaction. In this work, we focus on the picking process in warehouse management and study it from a data perspective. Using historical data from an industrial partner, we introduce, model, and study the robust order batching problem (ROBP), which groups orders into batches to minimize total order processing time while accounting for uncertainty caused by system congestion and human behavior. We provide a generalizable, data-driven approach that overcomes the warehouse-specific assumptions characterizing most of the work in the literature. We analyze historical data to understand the processes in the warehouse, to predict processing times, and to improve order processing. We introduce the ROBP and develop an efficient learning-based branch-and-price algorithm based on simultaneous column and row generation, embedded with alternative prediction models, such as linear regression and random forest, that predict the processing time of a batch. We conduct extensive computational experiments to test the performance of the proposed approach and to derive managerial insights based on real data. The data-driven prescriptive analytics tool we propose achieves savings of seven to eight minutes per order, which translates into a 14.8% increase in the daily picking operations capacity of the warehouse.


Author(s):  
Vijay Kumar Dwivedi ◽  
Manoj Madhava Gore

Background: Stock price prediction is a challenging task. Social, economic, political, and various other factors cause frequent, abrupt changes in the stock price. This article proposes a historical data-based ensemble system to predict the closing stock price with higher accuracy and consistency than existing stock price prediction systems. Objective: The primary objective of this article is to predict the closing price of a stock for the next trading day more accurately and consistently than the existing methods employed for stock price prediction. Method: The proposed system combines various machine learning-based prediction models using the least absolute shrinkage and selection operator (LASSO) regression regularization technique, to improve the accuracy of the stock price prediction system over that of any single base prediction model. Results: The analysis of results for all eleven stocks (listed under the Information Technology sector on the Bombay Stock Exchange, India) reveals that the proposed system performs best (on all defined metrics of the proposed system) for the training datasets and test datasets comprising all the stocks considered in the proposed system. Conclusion: The proposed ensemble model consistently predicts the stock price with a higher degree of accuracy than the existing methods used for the prediction.
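
The abstract does not give the details of how LASSO combines the base models, so the following is only a minimal sketch of one common reading: treat each base model's prediction as a feature and fit L1-penalized weights by coordinate descent. The function names (`soft_threshold`, `lasso_weights`, `ensemble_predict`), the penalty value, and the omission of an intercept are all assumptions made for illustration, not the authors' implementation.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty (the LASSO proximal step)."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_weights(base_preds, targets, alpha=0.1, n_iter=200):
    """Coordinate descent for (1/2n)*||y - Xw||^2 + alpha*||w||_1, no intercept.

    base_preds: one row per trading day, one column per base model (assumed layout).
    targets: the observed closing prices for the same days.
    """
    n, k = len(base_preds), len(base_preds[0])
    w = [0.0] * k
    for _ in range(n_iter):
        for j in range(k):
            rho = 0.0   # correlation of column j with the partial residual
            norm = 0.0  # squared norm of column j
            for row, y in zip(base_preds, targets):
                partial = sum(w[m] * row[m] for m in range(k) if m != j)
                rho += row[j] * (y - partial)
                norm += row[j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (norm / n) if norm else 0.0
    return w


def ensemble_predict(weights, row):
    """Weighted combination of the base models' predictions for one day."""
    return sum(w * p for w, p in zip(weights, row))
```

With a small penalty, a base model that tracks the target closely keeps a weight near one, while an uninformative model is driven to exactly zero, which is the selection behaviour that makes LASSO attractive for combining many base predictors.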


2019 ◽  
Vol 33 (3) ◽  
pp. 89-109 ◽  
Author(s):  
Ting (Sophia) Sun

SYNOPSIS This paper aims to promote the application of deep learning to audit procedures by illustrating how the capabilities of deep learning for text understanding, speech recognition, visual recognition, and structured data analysis fit into the audit environment. Based on these four capabilities, deep learning serves two major functions in supporting audit decision making: information identification and judgment support. The paper proposes a framework for applying these two deep learning functions to a variety of audit procedures in different audit phases. An audit data warehouse of historical data can be used to construct prediction models, providing suggested actions for various audit procedures. The data warehouse will be updated and enriched with new data instances through the application of deep learning and a human auditor's corrections. Finally, the paper discusses the challenges faced by the accounting profession, regulators, and educators when it comes to applying deep learning.


Energies ◽  
2021 ◽  
Vol 14 (8) ◽  
pp. 2135
Author(s):  
Marcin Witczak ◽  
Marcin Mrugalski ◽  
Bogdan Lipiec

The paper presents a new method of predicting the remaining useful life of technical devices. The proposed soft computing approach bridges the gap between analytical and data-driven health prognostic approaches. While the former are based on the classical exponential shape of degradation, the latter learn the degradation behavior from observed historical data. As a result of the proposed fusion, a practical method for calculating components’ remaining useful life is obtained. Contrary to the approaches presented in the literature, the proposed ensemble of analytical and data-driven approaches forms an uncertainty interval containing the expected remaining useful life. In particular, a Takagi–Sugeno multiple models-based framework is used as the data-driven approach, while an on-line exponential curve fitting approach serves as the analytical one. Unlike conventional data-driven methods, the proposed approach is designed on the basis of historical data that, apart from learning, are also applied to support the diagnostic decisions. The entire scheme is used to predict the health status of power Metal-Oxide-Semiconductor Field-Effect Transistors (MOSFETs). The status of the currently operating MOSFET is determined by taking into consideration the knowledge obtained from the preceding MOSFETs, which went through the run-to-failure process. Finally, the proposed approach is validated with real data obtained from the NASA Ames Prognostics Data Repository.
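
To make the analytical half of this idea concrete, here is a minimal sketch of exponential curve fitting for remaining useful life. It assumes a degradation indicator of the form a * exp(b * t) and a known failure threshold; the function names, the log-linear least-squares fit, and the threshold are illustrative assumptions, and the Takagi–Sugeno data-driven part and the uncertainty interval from the paper are not reproduced.

```python
import math


def fit_exponential(times, values):
    """Fit values ~ a * exp(b * t) by a least-squares line through log(values)."""
    n = len(times)
    logs = [math.log(v) for v in values]  # requires strictly positive values
    t_mean = sum(times) / n
    l_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (l - l_mean) for t, l in zip(times, logs))
    b = sxy / sxx
    a = math.exp(l_mean - b * t_mean)
    return a, b


def remaining_useful_life(a, b, threshold, now):
    """Time from `now` until a * exp(b * t) reaches `threshold` (assumes b > 0)."""
    t_fail = math.log(threshold / a) / b
    return max(0.0, t_fail - now)
```

In an on-line setting the fit would be repeated as each new measurement arrives, so the estimate of the failure time is refined over the life of the device.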


2020 ◽  
Author(s):  
Al-Ekram Elahee Hridoy ◽  
Mohammad Naim ◽  
Nazim Uddin Emon ◽  
Imrul Hasan Tipo ◽  
Safayet Alam ◽  
...  

Abstract On December 31, 2019, the World Health Organization (WHO) was informed that atypical pneumonia-like cases had emerged in Wuhan City, Hubei province, China. WHO identified the cause as a novel coronavirus and declared a global pandemic on March 11th, 2020. At the time of writing, COVID-19 had claimed more than 440 thousand lives worldwide and had pushed the global economy and social life to the edge of an abyss not seen in living memory. As of now, confirmed cases in Bangladesh have surpassed 100 thousand, with more than 1343 deaths, raising serious concern among policymakers and health professionals; prediction models are therefore necessary to forecast the possible number of future cases. In this paper, we present data-driven estimation methods, Long Short-Term Memory (LSTM) networks and the logistic curve method, to predict the possible number of COVID-19 cases in Bangladesh for the upcoming months. The logistic curve results suggest that Bangladesh passed the inflection point around 28-30 May 2020, that a plausible end date is the 2nd of January 2021, and that the total number of infected people is expected to be between 187 thousand and 193 thousand, under the assumption that stringent policies are in place. The logistic curve also suggests that Bangladesh would reach peak COVID-19 cases at the end of August, with more than 185 thousand total confirmed cases, and that around 6000 thousand daily new cases may be observed. Our findings recommend that containment strategies be implemented immediately to reduce the transmission and epidemic rate of COVID-19 in the coming days.
Highlights According to the logistic curve fitting analysis, the inflection point of the COVID-19 pandemic has recently passed, approximately between May 28, 2020, and May 30, 2020. It is estimated that the total number of confirmed cases will be around 187-193 thousand at the end of the epidemic. We expect that the actual number will most likely fall between these two values, under the assumption that the current transmission is stable and improved stringent policies will be in place to contain the spread of COVID-19. The estimated total death toll will be around 3600-4000 at the end of the epidemic. The epidemic of COVID-19 in Bangladesh will be mostly under control by the 2nd of January 2021 if stringent measures are taken immediately.
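
The role of the inflection point in a logistic curve can be illustrated with a short sketch. It assumes the standard three-parameter logistic form C(t) = K / (1 + exp(-r (t - t_mid))); the parameter names and any values passed in are hypothetical, and this is not the authors' fitted model.

```python
import math


def logistic_cases(t, capacity, rate, t_mid):
    """Cumulative cases under the logistic model K / (1 + exp(-r * (t - t_mid)))."""
    return capacity / (1.0 + math.exp(-rate * (t - t_mid)))


def daily_new_cases(t, capacity, rate, t_mid):
    """Derivative of the logistic curve: r * C(t) * (1 - C(t) / K)."""
    c = logistic_cases(t, capacity, rate, t_mid)
    return rate * c * (1.0 - c / capacity)
```

Under this form, the inflection point t_mid is the day on which cumulative cases reach half of the final size K and on which daily new cases peak at r * K / 4; this is why locating the inflection point also pins down an estimate of the final epidemic size.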


Author(s):  
Yunpeng Li ◽  
Utpal Roy ◽  
Y. Tina Lee ◽  
Sudarsan Rachuri

Rule-based expert systems such as CLIPS (C Language Integrated Production System) are 1) based on inductive (if-then) rules to elicit domain knowledge and 2) designed to infer new knowledge from existing knowledge and given inputs. Recently, data mining techniques have been advocated for discovering knowledge from massive historical or real-time sensor data. Combining top-down, expert-driven rule models with bottom-up, data-driven prediction models makes it possible to enrich and improve the predefined knowledge in an expert system with data-driven insights. However, this combination is possible only if there is a common and formal representation of these models, so that they can be exchanged, reused, and orchestrated among different authoring tools. This paper investigates the use of the open standard PMML (Predictive Model Markup Language) for integrating rule-based expert systems with data analytics tools, so that a decision maker has access to powerful tools for both reasoning-intensive and data-intensive tasks. We present a process planning use case in the manufacturing domain, which was originally implemented as a CLIPS-based expert system. Different paradigms for interpreting expert system facts and rules as PMML models (and vice versa), as well as challenges in representing and composing these models, are explored and discussed in detail.


2017 ◽  
Vol 4 (suppl_1) ◽  
pp. S403-S404
Author(s):  
Maggie Makar ◽  
Jeeheh Oh ◽  
Christopher Fusco ◽  
Joseph Marchesani ◽  
Robert McCaffrey ◽  
...  

Abstract Background An estimated 293,300 healthcare-associated cases of Clostridium difficile infection (CDI) occur annually in the United States. Prior research on risk-prediction models for CDI has focused on a small number of risk factors with the goal of developing a model that works well across hospitals. We hypothesize that risk factors are, in part, hospital-specific. We applied a generalizable machine learning approach to discovering, or “learning”, hospital-specific risk-stratification models using electronic health record (EHR) data collected during the course of patient care from the Massachusetts General Hospital (MGH) and the University of Michigan Health System (UM). Methods We utilized EHR data from 115,958 adult inpatient admissions from 2012–2014 (MGH) and 258,050 adult inpatient admissions from 2010–2016 (UM) (Fig 1). We extracted patient demographics, admission details, patient history, and daily hospitalization details, resulting in 2,964 and 4,739 features in the MGH and UM models, respectively. We used L2 regularized logistic regression to learn the models and measured the discriminative performance of the models on a year of held-out data from each hospital. Results The MGH and UM models achieved AUROCs of 0.74 (CI: 0.73–0.75) and 0.77 (CI: 0.75–0.80), respectively. The relative importance of risk factors varied significantly across hospitals. In particular, in-hospital locations appeared in the set of top risk factors at one hospital and in the set of protective factors at the other. On average, both models were able to predict CDI five days in advance of clinical diagnosis (Fig 2). Conclusion We used EHR data to generate a daily estimate of the risk of CDI for each inpatient hospitalization. We applied a generalizable data-driven approach to existing data from two large institutions with different patient populations and different data formats and content.
In contrast to approaches that focus on learning models that apply generally across hospitals, our proposed approach yields risk stratification models tailored to an institution’s EHR system and patient population. In turn, these hospital-specific models could allow for earlier and more accurate identification of high-risk patients. Disclosures All authors: No reported disclosures.
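
The discriminative performance reported above is measured by AUROC. As a minimal sketch of what that number means, the function below computes it directly from its pairwise definition: the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case. The function name and the toy inputs are illustrative assumptions; the quadratic pairwise loop is chosen for clarity, not for the dataset sizes in the study.

```python
def auroc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

For example, `auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` ranks three of the four positive-negative pairs correctly and so returns 0.75. A value of 0.5 corresponds to random ranking and 1.0 to perfect separation.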


2014 ◽  
Vol 635-637 ◽  
pp. 1618-1623
Author(s):  
Yue Dan Wang ◽  
Chun Xiang Li

With the rapid development of information science and technology, data-driven approaches have become a research trend in many fields. In this paper, a BP neural network (BPNN), a support vector machine (SVM), and a least squares support vector machine (LS-SVM) are introduced and adopted to simulate fluctuating time-series wind speeds. Regression-prediction models based on machine interpolation learning are established for each method. The original speeds used as learning and forecast samples for the data-driven approaches are obtained through autoregressive (AR) numerical modeling. A comparison of the evaluation indices shows that the fluctuating wind speeds simulated by SVM and LS-SVM are more accurate than those simulated by BPNN, while the simulation times of LS-SVM and BPNN are shorter than that of SVM.
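
As a rough illustration of how sample series can be produced by AR numerical modeling, the sketch below generates a series from the generic recursion x[t] = sum_i coeffs[i] * x[t-1-i] + e[t] with Gaussian noise. The function name, coefficients, noise level, and zero initial history are assumptions for illustration; the paper's actual AR order and wind-spectrum calibration are not given in the abstract.

```python
import random


def simulate_ar(coeffs, n_steps, noise_std=1.0, seed=0):
    """Generate n_steps values of an AR(p) process, where p = len(coeffs)."""
    rng = random.Random(seed)            # seeded for reproducibility
    series = [0.0] * len(coeffs)         # zero initial history (an assumption)
    for _ in range(n_steps):
        past = series[-1:-len(coeffs) - 1:-1]  # most recent value first
        value = sum(c * x for c, x in zip(coeffs, past))
        value += rng.gauss(0.0, noise_std)
        series.append(value)
    return series[len(coeffs):]          # drop the artificial initial history
```

A series generated this way, for instance with `simulate_ar([0.6, 0.2], 500)`, could then be split into learning and forecast samples for regression models such as those compared in the abstract.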


2020 ◽  
Author(s):  
Murat Sorkun ◽  
J. M. Koelman ◽  
Süleyman Er

Abstract Accurate prediction of the solubility of chemical substances in solvents remains a challenge. The sparsity of high-quality solubility data is recognized as the biggest hurdle in the development of robust data-driven methods for practical use. Nonetheless, the effects of the quality and quantity of data on aqueous solubility predictions have not yet been scrutinized. In this study, the roles of the size and the quality of datasets on the performances of the solubility prediction models are unraveled, and the concepts of actual and observed performances are introduced. In an effort to curtail the gap between actual and observed performances, a quality-oriented data selection method, which evaluates the quality of data and extracts the most accurate part of it through statistical validation, is designed. Applying this method on the largest publicly available solubility database and using a consensus machine learning approach, a top-performing solubility prediction model is achieved.

