Spatial Variability of Snow Density and Its Estimation in Different Periods of Snow Season in the Middle Tianshan Mountains, China

Snow density is one of the essential properties for describing snowpack characteristics. Obtaining the spatial variability of snow density and estimating it accurately in different periods of the snow season remain challenging, particularly in the mountains. This study analyzed the spatial variability of snow density with in-situ measurements in three different periods (i.e., accumulation, stable, and melt periods) of the snow seasons 2017/2018 and 2018/2019 in the middle Tianshan Mountains, China. The performance of a multiple linear regression model (MLR) and three machine learning models (i.e., Random Forest (RF), Extreme Gradient Boosting (XGB) and Light Gradient Boosting Machine (LGBM)) in simulating snow density was evaluated. The snow density in the melt period (0.27 g cm⁻³) was generally greater than that in the stable (0.20 g cm⁻³) and accumulation (0.18 g cm⁻³) periods, and the spatial variability of snow density in the melt period was slightly smaller than that in the other two periods. Snow density in mountainous areas was generally higher than that in plain or valley areas, and snow density increased significantly (p < 0.05) with elevation in the accumulation and stable periods. Besides elevation, latitude and ground surface temperature also had critical impacts on the spatial variability of snow density in the middle Tianshan Mountains. In this work, the machine learning models, especially RF, performed better than MLR at simulating snow density in all three periods. Compared with MLR, RF raised the coefficient of determination from 0.50, 0.1 and 0.52 to 0.61, 0.51 and 0.58 in the accumulation, stable and melt periods, respectively. This study provides a more accurate snow density simulation method for estimating regional snow mass and snow water equivalent, which allows a better understanding of regional snow resources.
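
The link between snow density and snow water equivalent (SWE) mentioned at the end of this abstract can be illustrated with the standard relation SWE = depth × (snow density / water density). The sketch below is not taken from the paper: the function name and the 50 cm snow depth are hypothetical, and only the three mean densities come from the abstract.

```python
# Minimal sketch: snow water equivalent (SWE) from snow depth and bulk density.
# The densities are the period means quoted in the abstract; the depth is a
# hypothetical value chosen only for illustration.

WATER_DENSITY_G_CM3 = 1.0  # density of liquid water


def swe_mm(depth_cm: float, density_g_cm3: float) -> float:
    """Return SWE in millimetres of water for a snowpack of given depth and density."""
    if depth_cm < 0 or density_g_cm3 < 0:
        raise ValueError("depth and density must be non-negative")
    water_depth_cm = depth_cm * density_g_cm3 / WATER_DENSITY_G_CM3
    return water_depth_cm * 10.0  # convert cm of water to mm


# Mean snow densities per period reported in the abstract (g cm^-3).
PERIOD_DENSITY = {"accumulation": 0.18, "stable": 0.20, "melt": 0.27}

ASSUMED_DEPTH_CM = 50.0  # hypothetical snow depth, not from the study

swe_by_period = {
    period: swe_mm(ASSUMED_DEPTH_CM, density)
    for period, density in PERIOD_DENSITY.items()
}

for period, swe in swe_by_period.items():
    print(f"{period}: {swe:.1f} mm")
```

For the same depth, an error in the estimated density carries over proportionally to SWE, which is why the density estimate matters for regional snow mass.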

Development and validation of a difficult laryngoscopy prediction model using machine learning of neck circumference and thyromental height

BMC Anesthesiology ◽

10.1186/s12871-021-01343-4 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Jong Ho Kim ◽

Haewon Kim ◽

Ji Su Jang ◽

Sung Mi Hwang ◽

So Young Lim ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Confidence Interval ◽

Neck Circumference ◽

Difficult Laryngoscopy ◽

Gradient Boosting ◽

Test Set ◽

Equal Distribution ◽

Light Gradient ◽

Extreme Gradient Boosting

Background: Predicting a difficult airway is challenging in patients with limited airway evaluation. The aim of this study is to develop and validate a model that predicts difficult laryngoscopy by machine learning of neck circumference and thyromental height as predictors that can be used even for patients with limited airway evaluation. Methods: Variables for prediction of difficult laryngoscopy included age, sex, height, weight, body mass index, neck circumference, and thyromental distance. Difficult laryngoscopy was defined as Grade 3 and 4 by the Cormack-Lehane classification. The preanesthesia and anesthesia data of 1677 patients who had undergone general anesthesia at a single center were collected. The data set was randomly stratified into a training set (80%) and a test set (20%), with equal distribution of difficult laryngoscopy. The training set was used to train five algorithms (logistic regression, multilayer perceptron, random forest, extreme gradient boosting, and light gradient boosting machine). The prediction models were validated on the test set. Results: The random forest model performed best (area under receiver operating characteristic curve = 0.79 [95% confidence interval: 0.72–0.86], area under precision-recall curve = 0.32 [95% confidence interval: 0.27–0.37]). Conclusions: Machine learning can predict difficult laryngoscopy through a combination of several predictors including neck circumference and thyromental height. The performance of the model can be improved with more data, a new variable and a combination of models.
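
The stratified 80/20 split described in the Methods can be sketched in plain Python as follows. This is an illustration of the general technique, not the authors' code: the function name, the seed, and the count of 200 difficult cases are hypothetical; only the cohort size of 1677 and the 80/20 ratio come from the abstract.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so each class keeps roughly the same share in both sets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # random assignment within each class
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical labels: 1677 patients as in the abstract, with an assumed
# 200 difficult-laryngoscopy cases (the true count is not given here).
labels = [1] * 200 + [0] * 1477
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=42)

train_rate = sum(labels[i] for i in train_idx) / len(train_idx)
test_rate = sum(labels[i] for i in test_idx) / len(test_idx)
print(len(train_idx), len(test_idx), round(train_rate, 3), round(test_rate, 3))
```

Splitting within each class keeps the share of the rare outcome nearly identical in both sets, which a plain random split does not guarantee.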

Interpretable Machine Learning for Early Neurological Deterioration Prediction in Atrial Fibrillation-Related Stroke

10.21203/rs.3.rs-446890/v1 ◽

2021 ◽

Author(s):

Seong Hwan Kim ◽

Eun-Tae Jeon ◽

Sungwook Yu ◽

Kyungmi O ◽

Chi Kyung Kim ◽

...

Keyword(s):

Machine Learning ◽

Atrial Fibrillation ◽

Neurological Deterioration ◽

Gradient Boosting ◽

Support Vector ◽

Light Gradient ◽

Interpretable Machine Learning ◽

Extreme Gradient Boosting ◽

Early Neurological Deterioration ◽

Feature Importance

We aimed to develop a novel prediction model for early neurological deterioration (END) based on an interpretable machine learning (ML) algorithm for atrial fibrillation (AF)-related stroke and to evaluate the prediction accuracy and feature importance of ML models. Data from multi-center prospective stroke registries in South Korea were collected. After stepwise data preprocessing, we utilized logistic regression, support vector machine, extreme gradient boosting, light gradient boosting machine (LightGBM), and multilayer perceptron models. We used the Shapley additive explanations (SHAP) method to evaluate feature importance. Of the 3,623 stroke patients, the 2,363 who had arrived at the hospital within 24 hours of symptom onset and had available information regarding END were included. Of these, 318 (13.5%) had END. The LightGBM model showed the highest area under the receiver operating characteristic curve (0.778; 95% CI, 0.726-0.830). The feature importance analysis revealed that fasting glucose level and the National Institutes of Health Stroke Scale score were the most influential factors. Among ML algorithms, the LightGBM model was particularly useful for predicting END, as it revealed new and diverse predictors. Additionally, the SHAP method can be adjusted to individualize the features’ effects on the predictive power of the model.

Modeling of nitrogen solubility in normal alkanes using machine learning methods compared with cubic and PC-SAFT equations of state

Scientific Reports ◽

10.1038/s41598-021-03643-8 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Seyed Ali Madani ◽

Mohammad-Reza Mohammadi ◽

Saeid Atashrouz ◽

Ali Abedi ◽

Abdolhossein Hemmati-Sarapardeh ◽

...

Keyword(s):

Machine Learning ◽

Molecular Weight ◽

Oil Recovery ◽

Equations Of State ◽

Coefficient Of Determination ◽

Gradient Boosting ◽

Operating Pressure ◽

Normal Alkanes ◽

Light Gradient ◽

Extreme Gradient Boosting

Accurate prediction of the solubility of gases in hydrocarbons is a crucial factor in designing enhanced oil recovery (EOR) operations by gas injection as well as separation and chemical reaction processes in a petroleum refinery. In this work, nitrogen (N2) solubility in normal alkanes, the major constituents of crude oil, was modeled using five representative machine learning (ML) models, namely gradient boosting with categorical features support (CatBoost), random forest, light gradient boosting machine (LightGBM), k-nearest neighbors (k-NN), and extreme gradient boosting (XGBoost). A large solubility databank containing 1982 data points was utilized to establish the models for predicting N2 solubility in normal alkanes as a function of pressure, temperature, and molecular weight of normal alkanes over broad ranges of operating pressure (0.0212–69.12 MPa) and temperature (91–703 K). The molecular weight range of normal alkanes was from 16 to 507 g/mol. Also, five equations of state (EOSs) including Redlich–Kwong (RK), Soave–Redlich–Kwong (SRK), Zudkevitch–Joffe (ZJ), Peng–Robinson (PR), and perturbed-chain statistical associating fluid theory (PC-SAFT) were compared with the ML models in estimating N2 solubility in normal alkanes. Results revealed that the CatBoost model is the most precise model in this work, with a root mean square error of 0.0147 and a coefficient of determination of 0.9943. Among the EOSs, the ZJ EOS provided the best estimates of N2 solubility in normal alkanes. Lastly, the results of relevancy factor analysis indicated that pressure has the greatest influence on N2 solubility in normal alkanes and that N2 solubility increases with increasing molecular weight of normal alkanes.

Protein pKa prediction by tree-based machine learning

10.26434/chemrxiv-2021-4d420 ◽

2021 ◽

Author(s):

Ada Y. Chen ◽

Juyong Lee ◽

Ana Damjanovic ◽

Bernard R. Brooks

Keyword(s):

Machine Learning ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Pka Prediction ◽

Light Gradient ◽

Structure Database ◽

Gradient Boosting Machine ◽

Extreme Gradient Boosting ◽

Better Than ◽

Protein Pka

We present four tree-based machine learning models for protein pKa prediction. The four models, Random Forest, Extra Trees, eXtreme Gradient Boosting (XGBoost) and Light Gradient Boosting Machine (LightGBM), were trained on three experimental PDB and pKa datasets, two of which included a notable portion of internal residues. We observed similar performance among the four machine learning algorithms. The best model trained on the largest dataset performs 37% better than the widely used empirical pKa prediction tool PROPKA. The overall RMSE for this model is 0.69, with surface and buried RMSE values being 0.56 and 0.78, respectively, considering six residue types (Asp, Glu, His, Lys, Cys and Tyr), and 0.63 when considering Asp, Glu, His and Lys only. We provide pKa predictions for proteins in the human proteome from the AlphaFold Protein Structure Database and observe that 1% of Asp/Glu/Lys residues have highly shifted pKa values close to the physiological pH.

Application of Machine-Learning-Based Fusion Model in Visibility Forecast: A Case Study of Shanghai, China

Remote Sensing ◽

10.3390/rs13112096 ◽

2021 ◽

Vol 13 (11) ◽

pp. 2096

Author(s):

Zhongqi Yu ◽

Yuanhao Qu ◽

Yunxin Wang ◽

Jinghui Ma ◽

Yu Cao

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Eastern China ◽

Prediction Method ◽

Sampling Technique ◽

Environmental Modeling ◽

Gradient Boosting ◽

Fusion Model ◽

Light Gradient ◽

Extreme Gradient Boosting

A visibility forecast model called a boosting-based fusion model (BFM) was established in this study. The model uses a fusion machine learning model based on multisource data, including air pollutants, meteorological observations, moderate resolution imaging spectroradiometer (MODIS) aerosol optical depth (AOD) data, and outputs of an operational regional atmospheric environmental modeling system for eastern China (RAEMS). Extreme gradient boosting (XGBoost), a light gradient boosting machine (LightGBM), and a numerical prediction method, i.e., RAEMS, were fused to establish this prediction model. Three sets of prediction models, that is, BFM, LightGBM based on multisource data (LGBM), and RAEMS, were used to conduct visibility prediction tasks. The training set was from 1 January 2015 to 31 December 2018 and used several data pre-processing methods, including synthetic minority over-sampling technique (SMOTE) data resampling, a loss function adjustment, and 10-fold cross-validation. Moreover, apart from the basic features (variables), more spatial and temporal gradient features were considered. The testing set was from 1 January to 31 December 2019 and was adopted to validate the feasibility of the BFM, LGBM, and RAEMS. Statistical indicators confirmed that the machine learning methods improved the RAEMS forecast significantly and consistently. The root mean square error and correlation coefficient of BFM for the next 24/48 h were 5.01/5.47 km and 0.80/0.77, respectively, which were much higher than those of RAEMS. The statistics and binary score analysis for different areas in Shanghai also proved the reliability and accuracy of using BFM, particularly in low-visibility forecasting. Overall, BFM is a suitable tool for predicting visibility. It provides a more accurate visibility forecast for the next 24 and 48 h in Shanghai than LGBM and RAEMS. The results of this study provide support for real-time operational visibility forecasts.
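
The SMOTE resampling step named in this abstract rests on a simple idea: a synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. The sketch below shows that core idea only, under stated assumptions; it is not the authors' pipeline, and the function name, the value of k, and the toy feature vectors are all hypothetical.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        # k nearest minority neighbours of the base point (excluding itself)
        others = [p for i, p in enumerate(minority) if i != base_idx]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (q - b) for b, q in zip(base, neighbour)))
    return synthetic


# Hypothetical minority-class samples (two features per sample).
minority_samples = [(0.8, 95.0), (1.2, 92.0), (0.5, 97.0), (1.0, 90.0), (0.9, 94.0)]
new_samples = smote_sketch(minority_samples, n_new=10, k=3, seed=1)
print(len(new_samples), new_samples[0])
```

Because each synthetic point is a convex combination of two real minority points, it always stays inside the range already spanned by the minority class.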

Buckling and ultimate load prediction models for perforated steel beams using machine learning algorithms

10.31224/osf.io/mezar ◽

2021 ◽

Author(s):

Vitaliy Degtyarev ◽

Konstantinos Daniel Tsavdaridis

Keyword(s):

Machine Learning ◽

Web Application ◽

Failure Modes ◽

Ultimate Load ◽

Prediction Models ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Elastic Buckling ◽

Light Gradient ◽

Extreme Gradient Boosting

Large web openings introduce complex structural behaviors and additional failure modes of steel cellular beams, which must be considered in the design using laborious calculations (e.g., exercising SCI P355). This paper presents seven machine learning (ML) models, including decision tree (DT), random forest (RF), k-nearest neighbor (KNN), gradient boosting regressor (GBR), extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), and gradient boosting with categorical features support (CatBoost), for predicting the elastic buckling and ultimate loads of steel cellular beams. Large datasets of finite element (FE) simulation results, validated against experimental data, were used to develop the models. The ML models were fine-tuned via an extensive hyperparameter search to obtain their best performance. The elastic buckling and ultimate loads predicted by the optimized ML models demonstrated excellent agreement with the numerical data. The accuracy of the ultimate load predictions by the ML models exceeded the accuracy provided by the existing design provisions for steel cellular beams published in SCI P355 and AISC Design Guide 31. The relative feature importance and feature dependence of the models were evaluated and discussed in the paper. An interactive Python-based notebook and a user-friendly web application for predicting the elastic buckling and ultimate loads of steel cellular beams using the developed optimized ML models were created and made publicly available. The web application deployed to the cloud allows for making predictions in any web browser on any device, including mobile. The source code of the application available on GitHub allows running the application locally and independently from the cloud service.

Artificial Intelligence for Risk Prediction of Rehospitalization with Acute Kidney Injury in Sepsis Survivors

Journal of Personalized Medicine ◽

10.3390/jpm12010043 ◽

2022 ◽

Vol 12 (1) ◽

pp. 43

Author(s):

Shuo-Ming Ou ◽

Kuo-Hua Lee ◽

Ming-Tsun Tsai ◽

Wei-Cheng Tseng ◽

Yuan-Chia Chu ◽

...

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Acute Kidney Injury ◽

Kidney Injury ◽

Gradient Boosting ◽

Light Gradient ◽

Cell Counts ◽

Tree Classifier ◽

Extreme Gradient Boosting ◽

Sepsis Survivors

Sepsis survivors have a higher risk of long-term complications. Acute kidney injury (AKI) may still be common among sepsis survivors after discharge. Therefore, our study utilized an artificial-intelligence-based machine learning approach to predict future risks of rehospitalization with AKI between 1 January 2008 and 31 December 2018. We included a total of 23,761 patients aged ≥ 20 years who were admitted due to sepsis and survived to discharge. We adopted a machine learning method using models based on logistic regression, random forest, extra tree classifier, gradient boosting decision tree (GBDT), extreme gradient boosting, and light gradient boosting machine (LGBM). The LGBM model exhibited the highest area under the receiver operating characteristic curve (AUC), 0.816, for predicting rehospitalization with AKI in sepsis survivors, followed by the GBDT model with an AUC of 0.813. The top five most important features in the LGBM model were C-reactive protein, white blood cell counts, use of inotropes, blood urea nitrogen and use of diuretics. We established machine learning models for predicting the risk of rehospitalization with AKI in sepsis survivors, and these models may set the stage for the broader use of clinical features in healthcare.
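
The AUC used to rank the models here has a direct interpretation: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. A minimal sketch of that pairwise definition follows; the function name and the toy labels and scores are hypothetical and unrelated to the study data.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) definition."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = event) and model scores.
toy_labels = [0, 0, 1, 1, 0, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
toy_auc = auroc(toy_labels, toy_scores)
print(round(toy_auc, 3))
```

The double loop is quadratic in the number of samples, so it is meant for illustration; on cohorts of the size reported above, a rank-based implementation would be the practical choice.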

Interpretable Machine Learning Model to Predict Rupture of Small Intracranial Aneurysms and Facilitate Clinical Decision

10.21203/rs.3.rs-1015315/v1 ◽

2021 ◽

Author(s):

WeiGen Xiong ◽

TingTing Chen ◽

ZhiHong Zhao ◽

XueMei Li ◽

YaJie Shan ◽

...

Keyword(s):

Machine Learning ◽

Intracranial Aneurysms ◽

External Validation ◽

Maximum Size ◽

Clinical Decision ◽

Gradient Boosting ◽

Support Vector ◽

Rupture Risk ◽

Light Gradient ◽

Extreme Gradient Boosting

Estimating the rupture risk of small intracranial aneurysms (IAs) to determine whether to treat is difficult but crucial. We aimed to construct and externally validate a convenient machine learning (ML) model for assessing the rupture risk of small IAs. A total of 1004 patients with small IAs recruited from two hospitals were included in our retrospective research. The patients at hospital 1 were randomly stratified into training (70%) and internal validation (30%) sets, and the patients at hospital 2 were used for external validation. We selected predictive features using the least absolute shrinkage and selection operator (LASSO) method, and constructed five ML models applying diverse algorithms including random forest classifier (RFC), categorical boosting (CatBoost), support vector machine (SVM) with linear kernel, light gradient boosting machine (LightGBM) and extreme gradient boosting (XGBoost). The Shapley Additive Explanations (SHAP) analysis provided interpretation for the best ML model. The training, internal and external validation cohorts included 658, 282, and 64 IAs, respectively. The best performance was achieved by SVM, with an AUC of 0.817 in the internal [95% confidence interval (CI), 0.769-0.866] and 0.893 in the external (95% CI, 0.808-0.979) validation cohorts, significantly outperforming the PHASES score (all P < 0.001). SHAP analysis showed that maximum size, location and irregular shape were the top three features for predicting rupture. Our SVM model, based on readily accessible features, showed satisfactory discrimination in predicting rupture of small IAs. Morphological parameters made important contributions to the prediction result.

Prediction of Radiation Pneumonitis With Machine Learning in Stage III Lung Cancer: A Pilot Study

Technology in Cancer Research & Treatment ◽

10.1177/15330338211016373 ◽

2021 ◽

Vol 20 ◽

pp. 153303382110163

Author(s):

Melek Yakar ◽

Durmus Etiz ◽

Muzaffer Metintas ◽

Guntulu Ak ◽

Ozer Celik

Keyword(s):

Machine Learning ◽

Lung Cancer ◽

Radiation Pneumonitis ◽

Stage Iii ◽

Gradient Boosting ◽

Support Vector ◽

Data Set ◽

Volume Number ◽

Light Gradient ◽

Extreme Gradient Boosting

Background: Radiation pneumonitis (RP) is a dose-limiting toxicity in lung cancer radiotherapy (RT). As risk factors in the development of RP, patient and tumor characteristics, dosimetric parameters, and treatment features are intertwined, and it is not always possible to associate RP with a single parameter. This study aimed to determine the machine learning algorithm that most accurately predicted RP development. Methods: Of the 197 patients who were diagnosed with stage III lung cancer and underwent RT and chemotherapy between 2014 and 2020, 193 were evaluated. The CTCAE 5.0 grading system was used for the RP evaluation. The synthetic minority oversampling technique was used to create a balanced data set. Logistic regression, artificial neural networks, eXtreme Gradient Boosting (XGB), Support Vector Machines, Random Forest, Gaussian Naive Bayes and Light Gradient Boosting Machine (LGBM) algorithms were used. After the correlation analysis, a permutation-based method was used for variable selection. Results: RP was seen in 51 of the 193 cases. The parameters affecting RP were total(t)V5, ipsilateral lung Dmax, contralateral lung Dmax, total lung Dmax, gross tumor volume, number of chemotherapy cycles before RT, tumor size, lymph node localization and asbestos exposure. LGBM was found to be the algorithm that best predicted RP, with 85% accuracy (confidence interval: 0.73-0.96), 97% sensitivity, and 50% specificity. Conclusion: When the clinical and dosimetric parameters were evaluated together, the LGBM algorithm had the highest accuracy in predicting RP. However, in order to use this algorithm in clinical practice, it is necessary to increase data diversity and the number of patients by sharing data between centers.
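
Accuracy, sensitivity and specificity as reported in the Results all derive from the four cells of a confusion matrix. The sketch below shows the standard definitions; the function name and the counts are hypothetical and are not the study's confusion matrix, which the abstract does not give.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    if total == 0 or (tp + fn) == 0 or (tn + fp) == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "accuracy": (tp + tn) / total,   # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),   # share of true positives detected
        "specificity": tn / (tn + fp),   # share of true negatives detected
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=18, fp=5, tn=15, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting all three together matters because accuracy alone can look favourable while specificity remains low, as the figures in this abstract illustrate.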

A comparative performance of machine learning algorithm to predict electric vehicles energy consumption: A path towards sustainability

Energy & Environment ◽

10.1177/0958305x211044998 ◽

2021 ◽

pp. 0958305X2110449

Author(s):

Irfan Ullah ◽

Kai Liu ◽

Toshiyuki Yamamoto ◽

Rabia Emhamed Al Mamlook ◽

Arshad Jamal

Keyword(s):

Machine Learning ◽

Energy Consumption ◽

Electric Vehicles ◽

Absolute Error ◽

Gradient Boosting ◽

Light Gradient ◽

Gradient Boosting Machine ◽

Extreme Gradient Boosting ◽

Energy Consumption Prediction ◽

Transport Emissions

The rapid growth of the transportation sector and its related emissions is attracting the attention of policymakers seeking to ensure environmental sustainability. Therefore, it is extremely important to understand the driving factors of transport emissions. The role of electric vehicles is imperative amid rising transport emissions. Electric vehicles pave the way towards a low-carbon economy and a sustainable environment. Successful deployment of electric vehicles relies heavily on energy consumption models that can predict energy consumption efficiently and reliably. Improving electric vehicles’ energy consumption efficiency will significantly help to alleviate driver anxiety and provide an essential framework for operation, planning, and management of the charging infrastructure. To tackle the challenge of predicting electric vehicles’ energy consumption, this study employs two advanced machine learning models, extreme gradient boosting and light gradient boosting machine, and compares them with two traditional models, multiple linear regression and artificial neural network. The electric vehicle energy consumption data in the analysis were collected in Aichi Prefecture, Japan. To evaluate the performance of the prediction models, three evaluation metrics were used: coefficient of determination (R2), root mean square error, and mean absolute error. The results show that extreme gradient boosting and light gradient boosting machine provided better and more robust results than multiple linear regression and artificial neural network. The models based on extreme gradient boosting and light gradient boosting machine yielded higher R2 values and lower mean absolute error and root mean square error values, and thus proved more accurate. Of the two, the light gradient boosting machine outperformed the extreme gradient boosting model. A detailed feature importance analysis was carried out to demonstrate the impact and relative influence of different input variables on electric vehicles’ energy consumption prediction. The results imply that an advanced machine learning model can enhance the prediction performance for electric vehicles’ energy consumption.
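
The three evaluation metrics named in this abstract (coefficient of determination, root mean square error, mean absolute error) can be computed as in the sketch below. The function name and the toy values are hypothetical and do not come from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE and MAE for paired observed and predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
    }


# Hypothetical observed and predicted energy consumption values.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
scores = regression_metrics(observed, predicted)
print({name: round(value, 4) for name, value in scores.items()})
```

A better model shows a higher R2 together with lower RMSE and MAE; RMSE penalizes large individual errors more strongly than MAE does.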
