Machine Learning in Updating Predictive Models of Planning and Scheduling Transportation Projects

Author(s):  
Liye Zhang ◽  
W. M. Kim Roddis

A method combining machine learning and regression analysis to automatically and intelligently update predictive models used in the Kansas Department of Transportation’s (KDOT’s) internal management system is presented. The predictive models used by KDOT consist of planning factors (mathematical functions) and base quantities (constants). The duration of a functional unit (defined as a subactivity) is determined by the product of a planning factor and its base quantity. The availability of a large database on projects executed over the past decade provided the opportunity to develop an automated process for updating predictive models by extracting information from historical data through machine learning. The learning process that updates the predictive models consists of three stages. The first stage derives the numerical relationship between the duration of a functional unit and the project attributes recorded in the database. The second stage finds the functional units with similar behavior, that is, functional units that can be described by the same shared planning factor scaled in terms of their own base quantities. The third stage generates new planning factors and base quantities. A system called PFactor, built on the three-stage learning process, shows good performance in updating KDOT’s predictive models.
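The relationship stated in this abstract (duration = planning factor × base quantity) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration only: the linear form of `planning_factor`, the attribute values, the durations, and the least-squares fit are not taken from the paper and do not represent the PFactor system.

```python
def planning_factor(attribute):
    """Hypothetical shared planning factor: a function of one project attribute."""
    return 1.0 + 0.5 * attribute


def fit_base_quantity(factor_values, durations):
    """Least-squares base quantity b minimizing sum((d - b * f) ** 2)."""
    numerator = sum(f * d for f, d in zip(factor_values, durations))
    denominator = sum(f * f for f in factor_values)
    if denominator == 0:
        raise ValueError("planning factor values are all zero")
    return numerator / denominator


def predict_duration(factor, base_quantity, attribute):
    """Duration of a functional unit = planning factor * base quantity."""
    return factor(attribute) * base_quantity


# Made-up historical records for one functional unit (not KDOT data).
attributes = [1.0, 2.0, 4.0]
durations = [15.0, 20.0, 30.0]

factor_values = [planning_factor(a) for a in attributes]
base = fit_base_quantity(factor_values, durations)
print(base)                                          # 10.0
print(predict_duration(planning_factor, base, 3.0))  # 25.0
```

With a shared planning factor fixed, each functional unit only needs its own scalar base quantity, which is why the closed-form one-parameter fit is enough in this toy setting.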

Author(s):  
Rasoul Hejazi ◽  
Andrew Grime ◽  
Mark Randolph ◽  
Mike Efthymiou

In-service integrity management (IM) of steel lazy wave risers (SLWRs) can benefit significantly from quantitative assessment of the overall risk of system failure, as it can provide an effective tool for decision making. SLWRs are prone to fatigue failure within their touchdown zone (TDZ). This failure mode needs to be evaluated rigorously in riser IM processes because fatigue is an ongoing degradation mechanism threatening the structural integrity of risers throughout their service life. However, accurately evaluating the probability of fatigue failure for riser systems within a useful time frame is challenging due to the need to run a large number of nonlinear, dynamic numerical time domain simulations. Applying the Bayesian framework for machine learning, through the use of Gaussian Processes (GP) for regression, offers an attractive solution to overcome the burden of prohibitive simulation run times. GPs are stochastic, data-driven predictive models which incorporate the underlying physics of the problem in the learning process and facilitate rapid probabilistic assessments with limited loss in accuracy. This paper proposes an efficient framework for practical implementation of a GP to create predictive models for the estimation of fatigue responses at SLWR hotspots. Such models are able to perform stochastic response prediction within a few milliseconds, thus enabling rapid prediction of the probability of SLWR fatigue failure. A realistic North West Shelf (NWS) case study is used to demonstrate the framework, comprising a 20” SLWR connected to a representative floating facility located in 950 m water depth. A full hindcast metocean dataset with associated statistical distributions is used for the riser long-term fatigue loading conditions. Numerical simulation and sampling techniques are adopted to generate a simulation-based dataset for training the data-driven model. In addition, a recently developed dimensionality reduction technique is employed to improve efficiency and reduce complexity of the learning process. The results show that the stochastic predictive models developed by the suggested framework can predict the long-term TDZ fatigue damage of SLWRs due to vessel motions with an acceptable level of accuracy for practical purposes.


2022 ◽  
Author(s):  
Alexandre Perez-Lebel ◽  
Gaël Varoquaux ◽  
Marine Le Morvan ◽  
Julie Josse ◽  
Jean-Baptiste Poline

BACKGROUND As databases grow larger, it becomes harder to fully control their collection, and they frequently come with missing values: incomplete observations. These large databases are well suited to train machine-learning models, for instance for forecasting or to extract biomarkers in biomedical settings. Such predictive approaches can use discriminative (rather than generative) modeling, and thus open the door to new missing-values strategies. Yet existing empirical evaluations of strategies to handle missing values have focused on inferential statistics. RESULTS Here we conduct a systematic benchmark of missing-values strategies in predictive models with a focus on large health databases: four electronic health record datasets, a population brain imaging one, a health survey and two intensive care ones. Using gradient-boosted trees, we compare native support for missing values with simple and state-of-the-art imputation prior to learning. We investigate prediction accuracy and computational time. For prediction after imputation, we find that adding an indicator to express which values have been imputed is important, suggesting that the data are missing not at random. Elaborate missing values imputation can improve prediction compared to simple strategies but requires longer computational time on large data. Learning trees that model missing values (with missing incorporated attribute) leads to robust, fast, and well-performing predictive modeling. CONCLUSIONS Native support for missing values in supervised machine learning predicts better than state-of-the-art imputation with much less computational cost. When using imputation, it is important to add indicator columns expressing which values have been imputed.
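The recommendation to add indicator columns when imputing can be sketched as a small preprocessing step. This is only an illustration under stated assumptions: column-mean imputation stands in for the "simple" strategy, `impute_with_indicator` is a hypothetical helper name, and the rows are made-up values rather than data from the benchmark.

```python
import math


def impute_with_indicator(rows):
    """Mean-impute NaNs per column and append one 0/1 indicator per column.

    Each output row is [imputed values..., missingness flags...].
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if not math.isnan(r[j])]
        # Fall back to 0.0 if a column has no observed values at all.
        means.append(sum(observed) / len(observed) if observed else 0.0)

    output = []
    for r in rows:
        values = [means[j] if math.isnan(v) else v for j, v in enumerate(r)]
        flags = [1.0 if math.isnan(v) else 0.0 for v in r]
        output.append(values + flags)
    return output


nan = float("nan")
table = [
    [1.0, nan],
    [3.0, 4.0],
    [nan, 6.0],
]
augmented = impute_with_indicator(table)
for row in augmented:
    print(row)
# [1.0, 5.0, 0.0, 1.0]
# [3.0, 4.0, 0.0, 0.0]
# [2.0, 6.0, 1.0, 0.0]
```

The flags let a downstream learner distinguish an observed value from a filled-in one, which matters when missingness itself carries information.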


2015 ◽  
Vol 115 ◽  
pp. S685
Author(s):  
R. Autorino ◽  
M.A. Gambacorta ◽  
L. Tagliaferri ◽  
M. Campitelli ◽  
E. Meldolesi ◽  
...  

2020 ◽  
Vol 20 (10) ◽  
pp. 6610-6621
Author(s):  
Dingyan Wang ◽  
Zeen Yang ◽  
Bingqing Zhu ◽  
Xuefeng Mei ◽  
Xiaomin Luo

2021 ◽  
Author(s):  
Norberto Sánchez-Cruz ◽  
Jose L. Medina-Franco

Epigenetic targets are a significant focus for drug discovery research, as demonstrated by the eight approved epigenetic drugs for treatment of cancer and the increasing availability of chemogenomic data related to epigenetics. These data represent a large number of structure-activity relationships that have not been exploited thus far for the development of predictive models to support medicinal chemistry efforts. Herein, we report the first large-scale study of 26318 compounds with a quantitative measure of biological activity for 55 protein targets with epigenetic activity. Through a systematic comparison of machine learning models trained on molecular fingerprints of different design, we built predictive models with high accuracy for the epigenetic target profiling of small molecules. The models were thoroughly validated, showing mean precisions up to 0.952 for the epigenetic target prediction task. Our results indicate that the herein reported models have considerable potential to identify small molecules with epigenetic activity. Therefore, our results were implemented as a freely accessible and easy-to-use web application.


2019 ◽  
Vol 21 (9) ◽  
pp. 662-669 ◽  
Author(s):  
Junnan Zhao ◽  
Lu Zhu ◽  
Weineng Zhou ◽  
Lingfeng Yin ◽  
Yuchen Wang ◽  
...  

Background: Thrombin is the central protease of the vertebrate blood coagulation cascade, which is closely related to cardiovascular diseases. The inhibitory constant Ki is the most significant property of thrombin inhibitors. Method: This study was carried out to predict Ki values of thrombin inhibitors based on a large data set by using machine learning methods. Because it can find non-intuitive regularities in high-dimensional datasets, machine learning can be used to build effective predictive models. A total of 6554 descriptors for each compound were collected and an efficient descriptor selection method was chosen to find the appropriate descriptors. Four different methods, including multiple linear regression (MLR), K Nearest Neighbors (KNN), Gradient Boosting Regression Tree (GBRT) and Support Vector Machine (SVM), were implemented to build prediction models with these selected descriptors. Results: The SVM model was the best one among these methods, with R2=0.84, MSE=0.55 for the training set and R2=0.83, MSE=0.56 for the test set. Several validation methods, such as the y-randomization test and applicability domain evaluation, were adopted to assess the robustness and generalization ability of the model. The final model shows excellent stability and predictive ability and can be employed for rapid estimation of the inhibitory constant, which is helpful for designing novel thrombin inhibitors.
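For readers unfamiliar with the two reported metrics, a minimal sketch of how R2 (coefficient of determination) and MSE (mean squared error) are conventionally computed is given below. The numbers are made up for illustration and are unrelated to the thrombin data set; the abstract does not state the exact formulas the authors used, so the standard definitions are assumed.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 - (residual sum of squares) / (total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


# Made-up measured and predicted values (not from the paper).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 3.0, 3.5]

print(mean_squared_error(measured, predicted))  # 0.125
print(r_squared(measured, predicted))           # 0.9
```

Similar R2 and MSE on training and test sets, as reported above, is the usual sign that a model is not overfitting.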


2021 ◽  
Vol 188 ◽  
pp. 105264
Author(s):  
M. Pilar Romero ◽  
Yu-Mei Chang ◽  
Lucy A. Brunton ◽  
Alison Prosser ◽  
Paul Upton ◽  
...  

Author(s):  
Juan A. Gómez-Pulido ◽  
José M. Gómez-Pulido ◽  
Diego Rodríguez-Puyol ◽  
María-Luz Polo-Luque ◽  
Miguel Vargas-Lombardo

A patient suffering from advanced chronic renal disease undergoes several dialysis sessions on different dates. Several clinical parameters are monitored hourly during each of these sessions. These parameters, together with the information provided by other parameters of an analytical nature, can be very useful to determine the probability that a patient may suffer from hypotension during the session, which should be closely watched since it is a proven risk factor for mortality. However, the analytical information is not always available to the healthcare personnel, or it was collected long before the session, so the clinical parameters monitored during the session become key to the prevention of hypotension. This article presents an investigation to predict the appearance of hypotension during a dialysis session, using predictive models trained on a large dialysis database, which contains the clinical information of 98,015 sessions corresponding to 758 patients. The prediction model takes into account up to 22 clinical parameters measured five times during the session, as well as the gender and age of the patient. This model was trained by means of machine learning classifiers, achieving a prediction success rate higher than 80%.


1976 ◽  
Vol 10 (4) ◽  
pp. 23-23 ◽  
Author(s):  
Tomas Lang
