Machine learning augmented predictive and generative model for rupture life in ferritic and austenitic steels

Osman Mamun; Madison Wenzlick; Arun Sathanur; Jeffrey Hawk; Ram Devanathan

doi:10.1038/s41529-021-00166-5

npj Materials Degradation ◽

10.1038/s41529-021-00166-5 ◽

2021 ◽

Vol 5 (1) ◽

Author(s):

Osman Mamun ◽

Madison Wenzlick ◽

Arun Sathanur ◽

Jeffrey Hawk ◽

Ram Devanathan

Keyword(s):

Pearson Correlation ◽

Rupture Life ◽

Model Performance ◽

Austenitic Stainless Steels ◽

Generative Model ◽

Austenitic Steels ◽

Gradient Boosting ◽

Variational Autoencoder ◽

Feature Importance ◽

Boosting Algorithm

Abstract The Larson–Miller parameter (LMP) offers an efficient and fast scheme to estimate the creep rupture life of alloy materials for high-temperature applications; however, poor generalizability and dependence on the constant C often result in sub-optimal performance. In this work, we show that direct rupture life parameterization, without intermediate LMP parameterization, using a gradient boosting algorithm can be used to train ML models for very accurate prediction of rupture life in a variety of alloys (Pearson correlation coefficient >0.9 for 9–12% Cr and >0.8 for austenitic stainless steels). In addition, the Shapley value was used to quantify feature importance, making the model interpretable by identifying the effect of various features on the model performance. Finally, a variational autoencoder-based generative model was built by conditioning on the experimental dataset to sample, from the learnt joint distribution, hypothetical synthetic candidate alloys that exist in neither the 9–12% Cr ferritic–martensitic alloy dataset nor the austenitic stainless steel dataset.
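The dependence on the constant C mentioned in this abstract can be illustrated with the textbook Larson–Miller relation, LMP = T (C + log10 t_r), with T in kelvin and t_r in hours. The sketch below is not the authors' code; the test condition (10,000 h at 873 K), the target temperature, and the two values of C are hypothetical and chosen only to show how the extrapolated rupture life shifts with C.

```python
import math


def larson_miller(temp_k: float, rupture_hours: float, c: float = 20.0) -> float:
    """Textbook Larson-Miller parameter: LMP = T * (C + log10(t_r))."""
    return temp_k * (c + math.log10(rupture_hours))


def rupture_life_from_lmp(lmp: float, temp_k: float, c: float = 20.0) -> float:
    """Invert the relation for rupture life: t_r = 10 ** (LMP / T - C)."""
    return 10.0 ** (lmp / temp_k - c)


# Hypothetical creep test: rupture after 10,000 h at 873 K.
test_temp_k, test_life_h = 873.0, 10_000.0
target_temp_k = 923.0  # hypothetical service temperature to extrapolate to

# Same test point, two assumed values of C, extrapolated to the target temperature.
life_c20 = rupture_life_from_lmp(
    larson_miller(test_temp_k, test_life_h, c=20.0), target_temp_k, c=20.0
)
life_c25 = rupture_life_from_lmp(
    larson_miller(test_temp_k, test_life_h, c=25.0), target_temp_k, c=25.0
)

print(f"C = 20: {life_c20:.0f} h, C = 25: {life_c25:.0f} h")
```

With these assumed inputs the two choices of C give extrapolated lives that differ by roughly a factor of two, which is the kind of sensitivity a direct parameterization of rupture life avoids.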

Uncertainty Quantification and Bayesian Active Learning for Rupture Life Prediction in Ferritic-Martensitic Steels

10.21203/rs.3.rs-887257/v1 ◽

2021 ◽

Author(s):

Osman Mamun ◽

M.F.N. Taufique ◽

Madison Wenzlick ◽

Jeffrey Hawk ◽

Ram Devanathan

Keyword(s):

Active Learning ◽

Epistemic Uncertainty ◽

Rupture Life ◽

Model Performance ◽

Gaussian Process Regression ◽

Processing Parameters ◽

Gradient Boosting ◽

Martensitic Steels ◽

Test Set ◽

Probabilistic Machine Learning

Abstract Three probabilistic methodologies are developed for predicting the long-term creep rupture life of 9–12 wt% Cr ferritic-martensitic steels using their chemical and processing parameters. The framework developed in this research strives to simultaneously make efficient inference along with associated risk, i.e., the uncertainty of estimation. The study highlights the limitations of applying probabilistic machine learning to model creep life and provides suggestions as to how this might be alleviated to make an efficient and accurate model with the evaluation of epistemic uncertainty of each prediction. Based on extensive experimentation, Gaussian Process Regression yielded more accurate inference (Pearson correlation coefficient > 0.95 for the holdout test set) in addition to meaningful uncertainty estimates (i.e., coverage ranges from 94–98% for the test set) as compared to quantile regression and the natural gradient boosting algorithm. Furthermore, the possibility of an active learning framework to iteratively explore the material space intelligently was demonstrated by simulating the experimental data collection process. This framework can be subsequently deployed to improve model performance or to explore new alloy domains with minimal experimental effort.
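The coverage figure quoted in this abstract is the fraction of test observations that fall inside the model's prediction intervals. A minimal sketch of that calculation follows; it assumes Gaussian predictive distributions and a nominal 95% interval (z = 1.96), and the predictions, standard deviations, and observed values are made-up numbers on a log10(hours) scale, not data from the paper.

```python
def gaussian_interval(mean: float, std: float, z: float = 1.96) -> tuple[float, float]:
    """Symmetric prediction interval mean +/- z * std (z = 1.96 for a nominal 95%)."""
    return mean - z * std, mean + z * std


def interval_coverage(y_true, lower, upper) -> float:
    """Fraction of observations that fall inside their [lower, upper] interval."""
    inside = sum(lo <= y <= hi for y, lo, hi in zip(y_true, lower, upper))
    return inside / len(y_true)


# Hypothetical test set: observed log10 rupture life, predictive mean, predictive std.
y_true = [3.2, 4.1, 2.8, 3.9, 4.5]
pred_mean = [3.0, 4.0, 3.0, 4.4, 4.5]
pred_std = [0.2, 0.1, 0.15, 0.2, 0.1]

bounds = [gaussian_interval(m, s) for m, s in zip(pred_mean, pred_std)]
lower = [lo for lo, _ in bounds]
upper = [hi for _, hi in bounds]

coverage = interval_coverage(y_true, lower, upper)
print(f"empirical coverage: {coverage:.0%}")
```

A well-calibrated model should give empirical coverage close to the nominal level of the interval; coverage far below it indicates overconfident uncertainty estimates.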

Mortality Prediction Using SaO2/FiO2 Ratio Based on eICU Database Analysis

Critical Care Research and Practice ◽

10.1155/2021/6672603 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Sharad Patel ◽

Gurkeerat Singh ◽

Samson Zarbiv ◽

Kia Ghiassi ◽

Jean-Sebastien Rachoin

Keyword(s):

Prediction Models ◽

Model Performance ◽

Predictive Ability ◽

Mortality Prediction ◽

Gradient Boosting ◽

Admission Diagnosis ◽

Research Database ◽

Icu Mortality ◽

Feature Importance ◽

Partial Dependence

Purpose. PaO2 to FiO2 ratio (P/F) is used to assess the degree of hypoxemia adjusted for oxygen requirements. The Berlin definition of Acute Respiratory Distress Syndrome (ARDS) includes P/F as a diagnostic criterion. P/F is invasive and cost-prohibitive for resource-limited settings. SaO2/FiO2 (S/F) ratio has the advantages of being easy to calculate, noninvasive, continuous, cost-effective, and reliable; it also lowers infection exposure potential for staff and avoids iatrogenic anemia. Previous work suggests that the SaO2/FiO2 ratio (S/F) correlates with P/F and can be used as a surrogate in ARDS. Quantitative correlation between S/F and P/F has been verified, but the data for the relative predictive ability for ICU mortality remains in question. We hypothesize that S/F is noninferior to P/F as a predictive feature for ICU mortality. Using a machine-learning approach, we hope to demonstrate the relative mortality predictive capacities of S/F and P/F. Methods. We extracted data from the eICU Collaborative Research Database. The features age, gender, SaO2, PaO2, FiO2, admission diagnosis, APACHE IV, mechanical ventilation (MV), and ICU mortality were extracted. Mortality was the dependent variable for our prediction models. Exploratory data analysis was performed in Python. Missing data was imputed with Sklearn Iterative Imputer. Random assignment of all the encounters, 80% to the training (n = 26690) and 20% to testing (n = 6741), was stratified by positive and negative classes to ensure a balanced distribution. We scaled the data using the Sklearn Standard Scaler. Categorical values were encoded using Target Encoding. We used a gradient boosting decision tree algorithm variant called XGBoost as our model. Model hyperparameters were tuned using the Sklearn RandomizedSearchCV with tenfold cross-validation. We used AUC as our metric for model performance.
Feature importance was assessed using SHAP, ELI5 (permutation importance), and a built-in XGBoost feature importance method. We constructed partial dependence plots to illustrate the relationship between mortality probability and S/F values. Results. The XGBoost hyperparameter optimized model had an AUC score of 0.85 on the test set. The hyperparameters selected to train the final models were as follows: colsample_bytree of 0.8, gamma of 1, max_depth of 3, subsample of 1, min_child_weight of 10, and scale_pos_weight of 3. The SHAP, ELI5, and XGBoost feature importance analysis demonstrates that the S/F ratio ranks as the strongest predictor for mortality amongst the physiologic variables. The partial dependence plots illustrate that mortality rises significantly above S/F values of 200. Conclusion. S/F was a stronger predictor of mortality than P/F based upon feature importance evaluation of our data. Our study is hypothesis-generating and a prospective evaluation is warranted. Take-Home Points. S/F ratio is a noninvasive continuous method of measuring hypoxemia as compared to P/F ratio. Our study shows that the S/F ratio is a better predictor of mortality than the more widely used P/F ratio to monitor and manage hypoxemia.
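The two features compared in this abstract are simple ratios, which the sketch below computes. It assumes the usual unit conventions (SaO2 in percent, PaO2 in mmHg, FiO2 as a fraction between 0.21 and 1.0); the abstract does not state units, and the function name and patient values are hypothetical.

```python
def oxygenation_ratio(numerator: float, fio2: float) -> float:
    """Return numerator / FiO2.

    numerator: SaO2 in percent (for S/F) or PaO2 in mmHg (for P/F).
    fio2: fraction of inspired oxygen, assumed to lie between 0.21 and 1.0.
    """
    if not 0.21 <= fio2 <= 1.0:
        raise ValueError("FiO2 must be a fraction between 0.21 and 1.0")
    return numerator / fio2


# Hypothetical patient on 50% oxygen.
sf_ratio = oxygenation_ratio(95.0, 0.5)  # SaO2 of 95%
pf_ratio = oxygenation_ratio(80.0, 0.5)  # PaO2 of 80 mmHg

print(f"S/F = {sf_ratio:.0f}, P/F = {pf_ratio:.0f}")
```

The range check guards against the common mistake of passing FiO2 as a percentage (for example 50 instead of 0.5), which would shrink the ratio by a factor of 100.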

A Study of Nitrogen Effect on the Characteristics of Creep-Rupture in 18Cr-9Ni Austenitic Steels

Key Engineering Materials ◽

10.4028/www.scientific.net/kem.297-300.409 ◽

2005 ◽

Vol 297-300 ◽

pp. 409-414 ◽

Cited By ~ 3

Author(s):

Jae Kyoung Shin ◽

Soo Woo Nam ◽

Soo Chan Lee

Keyword(s):

Grain Boundary ◽

Stainless Steels ◽

Rupture Life ◽

Austenitic Stainless Steels ◽

Creep Rupture ◽

Austenitic Steels ◽

Nitrogen Effect ◽

Transmission Electron ◽

The Difference ◽

Nitrogen Contents

To understand the effects of nitrogen at high temperature, creep-rupture tests were conducted at 973 and 1073 K on 18Cr-9Ni austenitic stainless steels with nitrogen contents of 0.14 and 0.08 wt%. The creep-rupture life of the 18Cr-9Ni-0.14N steel was observed to be longer than that of the 18Cr-9Ni-0.08N steel. To explain the difference in creep-rupture life between the two alloys, scanning electron microscopy and transmission electron microscopy were used to observe the microstructure. The observations show that Cr-rich carbides precipitate mainly at the grain boundaries. Comparing the ratio of the linear density of the precipitate particles shows that the higher the nitrogen content, the less carbide is precipitated. Nitrogen might retard the formation of carbides at the grain boundaries and reduce the density of cavity sites, which are one of the main forms of grain boundary damage.

Complex-Valued Variational Autoencoder: A Novel Deep Generative Model for Direct Representation of Complex Spectra

10.21437/interspeech.2020-1964 ◽

2020 ◽

Author(s):

Toru Nakashika

Keyword(s):

Generative Model ◽

Variational Autoencoder ◽

Complex Valued

ICD10Net: An Artificial Intelligence Algorithm with Medical Background Conducts ICD-10-CM Coding Task with Outstanding Performance (Preprint)

10.2196/preprints.13677 ◽

2019 ◽

Author(s):

Chin Lin ◽

Yu-Sheng Lou ◽

Chia-Cheng Lee ◽

Chia-Jung Hsu ◽

Ding-Chung Wu ◽

...

Keyword(s):

Artificial Intelligence ◽

General Hospital ◽

Pearson Correlation ◽

Model Performance ◽

International Classification Of Diseases ◽

Free Text ◽

Daily Work ◽

Medical Background ◽

Icd 10 ◽

F Measure

BACKGROUND An artificial intelligence-based algorithm has shown a powerful ability for coding the International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) in discharge notes. However, its performance still requires improvement compared with human experts. The major disadvantage of the previous algorithm is its lack of understanding of medical terminology. OBJECTIVE We propose some methods based on the human learning process and conduct a series of experiments to validate their improvements. METHODS We compared two data sources for training the word-embedding model: English Wikipedia and PubMed journal abstracts. Moreover, fixed, changeable, and double-channel embedding tables were used to test their performance. Some additional tricks were also applied to improve accuracy. We used these methods to identify the three-chapter-level ICD-10-CM diagnosis codes in a set of discharge notes. Subsequently, 94,483 labeled discharge notes from June 1, 2015 to June 30, 2017 from the Tri-Service General Hospital in Taipei, Taiwan, were used. To evaluate performance, 24,762 discharge notes from July 1, 2017 to December 31, 2017 from the same hospital were used. Moreover, 74,324 additional discharge notes collected from seven other hospitals were also tested. The F-measure is the major global measure of effectiveness. RESULTS In understanding medical terminologies, the PubMed-embedding model (Pearson correlation = 0.60/0.57) shows better performance than the Wikipedia-embedding model (Pearson correlation = 0.35/0.31). In the accuracy of ICD-10-CM coding, the changeable model using both the PubMed- and Wikipedia-embedding models has the highest testing mean F-measure (0.7311 and 0.6639 in the Tri-Service General Hospital and the seven other hospitals, respectively).
Moreover, a proposed method called a hybrid sampling method, an augmentation trick to avoid algorithms identifying negative terms, was found to additionally improve the model performance. CONCLUSIONS The proposed model architecture and training method is named ICD10Net, which is the first expert-level model practically applied to daily work. This model can also be applied to unstructured information extraction from free-text medical writing. We have developed a web app to demonstrate our work (https://linchin.ndmctsgh.edu.tw/app/ICD10/).

Patient-Specific Predictive Antibiogram in Decision Support for Empiric Antibiotic Treatment

Infection Control and Hospital Epidemiology ◽

10.1017/ice.2020.1205 ◽

2020 ◽

Vol 41 (S1) ◽

pp. s521-s522

Author(s):

Debarka Sengupta ◽

Vaibhav Singh ◽

Seema Singh ◽

Dinesh Tewari ◽

Mudit Kapoor ◽

...

Keyword(s):

Machine Learning ◽

Antimicrobial Resistance ◽

Model Building ◽

Medical Center ◽

Bacterial Species ◽

Model Performance ◽

The United States ◽

Patient Specific ◽

Gradient Boosting ◽

Comparative Performance

Background: The rising trend of antibiotic resistance imposes a heavy burden on healthcare both clinically and economically (US$55 billion), with 23,000 estimated annual deaths in the United States as well as increased length of stay and morbidity. Machine-learning–based methods have, of late, been used for leveraging patients’ clinical history and demographic information to predict antimicrobial resistance. We developed a machine-learning model ensemble that maximizes the accuracy of such a drug-sensitivity versus resistivity classification system compared to the existing best-practice methods. Methods: We first performed a comprehensive analysis of the association between infecting bacterial species and patient factors, including patient demographics, comorbidities, and certain healthcare-specific features. We leveraged the predictable nature of these complex associations to infer patient-specific antibiotic sensitivities. Various base-learners, including k-NN (k-nearest neighbors) and gradient boosting machine (GBM), were used to train an ensemble model for confident prediction of antimicrobial susceptibilities. Base learner selection and model performance evaluation were performed carefully using a variety of standard metrics, namely accuracy, precision, recall, F1 score, and Cohen κ. Results: We validated performance on the MIMIC-III database, which holds deidentified clinical data of 53,423 distinct patient admissions between 2001 and 2012 in the intensive care units (ICUs) of the Beth Israel Deaconess Medical Center in Boston, Massachusetts. From ~11,000 positive cultures, we used 4 major specimen types, namely urine, sputum, blood, and pus swab, for evaluation of the model performance. Figure 1 shows the receiver operating characteristic (ROC) curves obtained for bloodstream infection cases upon model building and prediction on a 70:30 split of the data.
We obtained area under the curve (AUC) values of 0.88, 0.92, 0.92, and 0.94 for urine, sputum, blood, and pus swab samples, respectively. Figure 2 shows the comparative performance of our proposed method as well as some off-the-shelf classification algorithms. Conclusions: Highly accurate, patient-specific predictive antibiogram (PSPA) data can aid clinicians significantly in antibiotic recommendation in the ICU, thereby accelerating patient recovery and curbing antimicrobial resistance. Funding: This study was supported by Circle of Life Healthcare Pvt. Ltd. Disclosures: None
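The evaluation metrics named in this abstract (accuracy, precision, recall, F1 score, and Cohen κ) can all be derived from a binary confusion matrix. The sketch below uses the standard definitions; the counts are invented for illustration and are not results from the study, and the function does not guard against empty classes (zero denominators).

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Standard binary classification metrics from confusion-matrix counts."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)

    # Cohen's kappa: observed agreement corrected for chance agreement p_e,
    # where p_e comes from the marginal rates of predicted and actual labels.
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_no = ((fn + tn) / n) * ((fp + tn) / n)
    p_e = p_yes + p_no
    kappa = (accuracy - p_e) / (1 - p_e)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "kappa": kappa,
    }


# Hypothetical counts: "positive" means the isolate is resistant to the antibiotic.
metrics = binary_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Cohen κ is useful alongside accuracy because it discounts the agreement a classifier would reach by chance when one class dominates.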

An intelligent evolutionary extreme gradient boosting algorithm development for modeling scour depths under submerged weir

Information Sciences ◽

10.1016/j.ins.2021.04.063 ◽

2021 ◽

Author(s):

Hai Tao ◽

Maria Habib ◽

Ibrahim Aljarah ◽

Hossam Faris ◽

Haitham Abdulmohsin Afan ◽

...

Keyword(s):

Gradient Boosting ◽

Algorithm Development ◽

Extreme Gradient Boosting ◽

Boosting Algorithm

Age Prediction of Human Based on DNA Methylation by Blood Tissues

Genes ◽

10.3390/genes12060870 ◽

2021 ◽

Vol 12 (6) ◽

pp. 870

Author(s):

Jiansheng Zhang ◽

Hongli Fu ◽

Yan Xu

Keyword(s):

Dna Methylation ◽

Pearson Correlation ◽

Close Correlation ◽

Gradient Boosting ◽

Tissue Samples ◽

Regression Methods ◽

Cpg Sites ◽

Testing Dataset ◽

Age Related ◽

Better Than

In recent years, scientists have found a close correlation between DNA methylation and aging in epigenetics. With in-depth research in the field of DNA methylation, researchers have established quantitative statistical relationships to predict individual ages. This work used human blood tissue samples to study the association between age and DNA methylation. We built two predictors based on healthy and disease data, respectively. For the healthy data, we retrieved a total of 1191 samples from four previous reports. By calculating the Pearson correlation coefficient between age and DNA methylation values, 111 age-related CpG sites were selected. Gradient boosting regression was utilized to build the predictive model and obtained an R2 value of 0.86 and a MAD of 3.90 years on the testing dataset, which were better than those of four other regression methods as well as Horvath’s results. For the disease data, 354 rheumatoid arthritis samples were retrieved from a previous study. Then, 45 CpG sites were selected to build the predictor, and the corresponding MAD and R2 were 3.11 years and 0.89 on the testing dataset, respectively, which showed the robustness of our predictor. Our results were better than those from four other regression methods. Finally, we also analyzed the twenty-four CpG sites common to both the healthy and disease datasets, which illustrated the functional relevance of the selected CpG sites.
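The site-selection step described in this abstract, ranking CpG sites by their Pearson correlation with age, can be sketched as follows. The abstract does not give the cutoff used, so the threshold of 0.8 is an assumption, and the site identifiers and methylation values are invented for illustration.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def select_sites(methylation: dict[str, list[float]], ages: list[float],
                 threshold: float = 0.8) -> list[str]:
    """Keep sites whose |r| with age meets the (assumed) threshold."""
    return [site for site, values in methylation.items()
            if abs(pearson(values, ages)) >= threshold]


# Hypothetical data: five samples, three made-up CpG sites.
ages = [20.0, 35.0, 50.0, 65.0, 80.0]
methylation = {
    "cg_a": [0.10, 0.20, 0.30, 0.40, 0.50],  # rises with age
    "cg_b": [0.90, 0.70, 0.55, 0.35, 0.20],  # falls with age
    "cg_c": [0.50, 0.40, 0.60, 0.45, 0.55],  # weak relationship
}

selected = select_sites(methylation, ages)
print(selected)
```

Taking the absolute value keeps sites that lose methylation with age as well as those that gain it; the selected sites would then serve as inputs to the regression model.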

Deep Learning for Monitoring Agricultural Drought in South Asia Using Remote Sensing Data

Remote Sensing ◽

10.3390/rs13091715 ◽

2021 ◽

Vol 13 (9) ◽

pp. 1715

Author(s):

Foyez Ahmed Prodhan ◽

Jiahua Zhang ◽

Fengmei Yao ◽

Lamei Shi ◽

Til Prasad Pangali Sharma ◽

...

Keyword(s):

Remote Sensing ◽

Deep Learning ◽

South Asia ◽

Model Performance ◽

Remote Sensing Data ◽

Agricultural Drought ◽

Gradient Boosting ◽

Complex Nature ◽

Sensing Data

Drought, a climate-related disaster impacting a variety of sectors, poses challenges for millions of people in South Asia. Accurate and complete drought information with a proper monitoring system is very important in revealing the complex nature of drought and its associated factors. In this regard, deep learning is a very promising approach for delineating the non-linear characteristics of drought factors. Therefore, this study aims to monitor drought by employing a deep learning approach with remote sensing data over South Asia from 2001–2016. We considered the precipitation, vegetation, and soil factors for the deep forwarded neural network (DFNN) as model input parameters. The study evaluated agricultural drought using the soil moisture deficit index (SMDI) as a response variable during three crop phenology stages. For a better comparison of deep learning model performance, we adopted two machine learning models, distributed random forest (DRF) and gradient boosting machine (GBM). Results show that the DFNN model outperformed the other two models for SMDI prediction. Furthermore, the results indicated that DFNN captured the drought pattern with high spatial variability across three phenology stages. Additionally, the DFNN model showed good stability with its cross-validated data in the training phase, and the estimated SMDI had high correlation, with R2 ranging from 0.57 to 0.90, 0.52 to 0.94, and 0.49 to 0.82 during the start of the season (SOS), length of the season (LOS), and end of the season (EOS), respectively. The comparison between inter-annual variability of estimated SMDI and in-situ SPEI (standardized precipitation evapotranspiration index) showed that the estimated SMDI closely matched the in-situ SPEI. The DFNN model provides comprehensive drought information by producing a consistent spatial distribution of SMDI, which establishes the applicability of the DFNN model for drought monitoring.

Gradient boosting for linear mixed models

The International Journal of Biostatistics ◽

10.1515/ijb-2020-0136 ◽

2021 ◽

Vol 0 (0) ◽

Author(s):

Colin Griesbach ◽

Benjamin Säfken ◽

Elisabeth Waldmann

Keyword(s):

Random Effects ◽

Mixed Models ◽

Selection Procedure ◽

Classification Theory ◽

Gradient Boosting ◽

Random Structure ◽

Boosting Algorithm ◽

The One ◽

Biased Estimates ◽

Selection Of

Abstract Gradient boosting from the field of statistical learning is widely known as a powerful framework for estimation and selection of predictor effects in various regression models by adapting concepts from classification theory. Current boosting approaches also offer methods accounting for random effects and thus enable prediction of mixed models for longitudinal and clustered data. However, these approaches include several flaws resulting in unbalanced effect selection with falsely induced shrinkage and a low convergence rate on the one hand and biased estimates of the random effects on the other hand. We therefore propose a new boosting algorithm which explicitly accounts for the random structure by excluding it from the selection procedure, properly correcting the random effects estimates and in addition providing likelihood-based estimation of the random effects variance structure. The new algorithm offers an organic and unbiased fitting approach, which is shown via simulations and data examples.
