Application of Machine-Learning-Based Fusion Model in Visibility Forecast: A Case Study of Shanghai, China

A visibility forecast model called a boosting-based fusion model (BFM) was established in this study. The model uses a fusion machine learning model based on multisource data, including air pollutants, meteorological observations, moderate resolution imaging spectroradiometer (MODIS) aerosol optical depth (AOD) data, and an operational regional atmospheric environmental modeling System for eastern China (RAEMS) outputs. Extreme gradient boosting (XGBoost), a light gradient boosting machine (LightGBM), and a numerical prediction method, i.e., RAEMS were fused to establish this prediction model. Three sets of prediction models, that is, BFM, LightGBM based on multisource data (LGBM), and RAEMS, were used to conduct visibility prediction tasks. The training set was from 1 January 2015 to 31 December 2018 and used several data pre-processing methods, including a synthetic minority over-sampling technique (SMOTE) data resampling, a loss function adjustment, and a 10-fold cross verification. Moreover, apart from the basic features (variables), more spatial and temporal gradient features were considered. The testing set was from 1 January to 31 December 2019 and was adopted to validate the feasibility of the BFM, LGBM, and RAEMS. Statistical indicators confirmed that the machine learning methods improved the RAEMS forecast significantly and consistently. The root mean square error and correlation coefficient of BFM for the next 24/48 h were 5.01/5.47 km and 0.80/0.77, respectively, which were much higher than those of RAEMS. The statistics and binary score analysis for different areas in Shanghai also proved the reliability and accuracy of using BFM, particularly in low-visibility forecasting. Overall, BFM is a suitable tool for predicting the visibility. It provides a more accurate visibility forecast for the next 24 and 48 h in Shanghai than LGBM and RAEMS. The results of this study provide support for real-time operational visibility forecasts.

Download Full-text

Buckling and ultimate load prediction models for perforated steel beams using machine learning algorithms

10.31224/osf.io/mezar ◽

2021 ◽

Author(s):

Vitaliy Degtyarev ◽

Konstantinos Daniel Tsavdaridis

Keyword(s):

Machine Learning ◽

Web Application ◽

Failure Modes ◽

Ultimate Load ◽

Prediction Models ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Elastic Buckling ◽

Light Gradient ◽

Extreme Gradient Boosting

Large web openings introduce complex structural behaviors and additional failure modes of steel cellular beams, which must be considered in the design using laborious calculations (e.g., exercising SCI P355). This paper presents seven machine learning (ML) models, including decision tree (DT), random forest (RF), k-nearest neighbor (KNN), gradient boosting regressor (GBR), extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), and gradient boosting with categorical features support (CatBoost), for predicting the elastic buckling and ultimate loads of steel cellular beams. Large datasets of finite element (FE) simulation results, validated against experimental data, were used to develop the models. The ML models were fine-tuned via an extensive hyperparameter search to obtain their best performance. The elastic buckling and ultimate loads predicted by the optimized ML models demonstrated excellent agreement with the numerical data. The accuracy of the ultimate load predictions by the ML models exceeded the accuracy provided by the existing design provisions for steel cellular beams published in SCI P355 and AISC Design Guide 31. The relative feature importance and feature dependence of the models were evaluated and discussed in the paper. An interactive Python-based notebook and a user-friendly web application for predicting the elastic buckling and ultimate loads of steel cellular beams using the developed optimized ML models were created and made publicly available. The web application deployed to the cloud allows for making predictions in any web browser on any device, including mobile. The source code of the application available on GitHub allows running the application locally and independently from the cloud service.

Download Full-text

Development and validation of a difficult laryngoscopy prediction model using machine learning of neck circumference and thyromental height

BMC Anesthesiology ◽

10.1186/s12871-021-01343-4 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Jong Ho Kim ◽

Haewon Kim ◽

Ji Su Jang ◽

Sung Mi Hwang ◽

So Young Lim ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Confidence Interval ◽

Neck Circumference ◽

Difficult Laryngoscopy ◽

Gradient Boosting ◽

Test Set ◽

Equal Distribution ◽

Light Gradient ◽

Extreme Gradient Boosting

Abstract Background Predicting difficult airway is challengeable in patients with limited airway evaluation. The aim of this study is to develop and validate a model that predicts difficult laryngoscopy by machine learning of neck circumference and thyromental height as predictors that can be used even for patients with limited airway evaluation. Methods Variables for prediction of difficulty laryngoscopy included age, sex, height, weight, body mass index, neck circumference, and thyromental distance. Difficult laryngoscopy was defined as Grade 3 and 4 by the Cormack-Lehane classification. The preanesthesia and anesthesia data of 1677 patients who had undergone general anesthesia at a single center were collected. The data set was randomly stratified into a training set (80%) and a test set (20%), with equal distribution of difficulty laryngoscopy. The training data sets were trained with five algorithms (logistic regression, multilayer perceptron, random forest, extreme gradient boosting, and light gradient boosting machine). The prediction models were validated through a test set. Results The model’s performance using random forest was best (area under receiver operating characteristic curve = 0.79 [95% confidence interval: 0.72–0.86], area under precision-recall curve = 0.32 [95% confidence interval: 0.27–0.37]). Conclusions Machine learning can predict difficult laryngoscopy through a combination of several predictors including neck circumference and thyromental height. The performance of the model can be improved with more data, a new variable and combination of models.

Download Full-text

Establishing a Credit Risk Evaluation System for SMEs Using the Soft Voting Fusion Model

Risks ◽

10.3390/risks9110202 ◽

2021 ◽

Vol 9 (11) ◽

pp. 202

Author(s):

Ge Gao ◽

Hongxin Wang ◽

Pengbin Gao

Keyword(s):

Credit Risk ◽

Evaluation System ◽

Predictive Accuracy ◽

Assessment System ◽

Gradient Boosting ◽

Support Vector ◽

Fusion Model ◽

Light Gradient ◽

Extreme Gradient Boosting ◽

The Government

In China, SMEs are facing financing difficulties, and commercial banks and financial institutions are the main financing channels for SMEs. Thus, a reasonable and efficient credit risk assessment system is important for credit markets. Based on traditional statistical methods and AI technology, a soft voting fusion model, which incorporates logistic regression, support vector machine (SVM), random forest (RF), eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM), is constructed to improve the predictive accuracy of SMEs’ credit risk. To verify the feasibility and effectiveness of the proposed model, we use data from 123 SMEs nationwide that worked with a Chinese bank from 2016 to 2020, including financial information and default records. The results show that the accuracy of the soft voting fusion model is higher than that of a single machine learning (ML) algorithm, which provides a theoretical basis for the government to control credit risk in the future and offers important references for banks to make credit decisions.

Download Full-text

Interpretable Machine Learning for Early Neurological Deterioration Prediction in Atrial Fibrillation-Related Stroke

10.21203/rs.3.rs-446890/v1 ◽

2021 ◽

Author(s):

Seong Hwan Kim ◽

Eun-Tae Jeon ◽

Sungwook Yu ◽

Kyungmi O ◽

Chi Kyung Kim ◽

...

Keyword(s):

Machine Learning ◽

Atrial Fibrillation ◽

Neurological Deterioration ◽

Gradient Boosting ◽

Support Vector ◽

Light Gradient ◽

Interpretable Machine Learning ◽

Extreme Gradient Boosting ◽

Early Neurological Deterioration ◽

Feature Importance

Abstract We aimed to develop a novel prediction model for early neurological deterioration (END) based on an interpretable machine learning (ML) algorithm for atrial fibrillation (AF)-related stroke and to evaluate the prediction accuracy and feature importance of ML models. Data from multi-center prospective stroke registries in South Korea were collected. After stepwise data preprocessing, we utilized logistic regression, support vector machine, extreme gradient boosting, light gradient boosting machine (LightGBM), and multilayer perceptron models. We used the Shapley additive explanations (SHAP) method to evaluate feature importance. Of the 3,623 stroke patients, the 2,363 who had arrived at the hospital within 24 hours of symptom onset and had available information regarding END were included. Of these, 318 (13.5%) had END. The LightGBM model showed the highest area under the receiver operating characteristic curve (0.778, 95% CI, 0.726 - 0.830). The feature importance analysis revealed that fasting glucose level and the National Institute of Health Stroke Scale score were the most influential factors. Among ML algorithms, the LightGBM model was particularly useful for predicting END, as it revealed new and diverse predictors. Additionally, the SHAP method can be adjusted to individualize the features’ effects on the predictive power of the model.

Download Full-text

Modeling of nitrogen solubility in normal alkanes using machine learning methods compared with cubic and PC-SAFT equations of state

Scientific Reports ◽

10.1038/s41598-021-03643-8 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Seyed Ali Madani ◽

Mohammad-Reza Mohammadi ◽

Saeid Atashrouz ◽

Ali Abedi ◽

Abdolhossein Hemmati-Sarapardeh ◽

...

Keyword(s):

Machine Learning ◽

Molecular Weight ◽

Oil Recovery ◽

Equations Of State ◽

Coefficient Of Determination ◽

Gradient Boosting ◽

Operating Pressure ◽

Normal Alkanes ◽

Light Gradient ◽

Extreme Gradient Boosting

AbstractAccurate prediction of the solubility of gases in hydrocarbons is a crucial factor in designing enhanced oil recovery (EOR) operations by gas injection as well as separation, and chemical reaction processes in a petroleum refinery. In this work, nitrogen (N2) solubility in normal alkanes as the major constituents of crude oil was modeled using five representative machine learning (ML) models namely gradient boosting with categorical features support (CatBoost), random forest, light gradient boosting machine (LightGBM), k-nearest neighbors (k-NN), and extreme gradient boosting (XGBoost). A large solubility databank containing 1982 data points was utilized to establish the models for predicting N2 solubility in normal alkanes as a function of pressure, temperature, and molecular weight of normal alkanes over broad ranges of operating pressure (0.0212–69.12 MPa) and temperature (91–703 K). The molecular weight range of normal alkanes was from 16 to 507 g/mol. Also, five equations of state (EOSs) including Redlich–Kwong (RK), Soave–Redlich–Kwong (SRK), Zudkevitch–Joffe (ZJ), Peng–Robinson (PR), and perturbed-chain statistical associating fluid theory (PC-SAFT) were used comparatively with the ML models to estimate N2 solubility in normal alkanes. Results revealed that the CatBoost model is the most precise model in this work with a root mean square error of 0.0147 and coefficient of determination of 0.9943. ZJ EOS also provided the best estimates for the N2 solubility in normal alkanes among the EOSs. Lastly, the results of relevancy factor analysis indicated that pressure has the greatest influence on N2 solubility in normal alkanes and the N2 solubility increases with increasing the molecular weight of normal alkanes.

Download Full-text

Machine Learning-Based Three-Month Outcome Prediction in Acute Ischemic Stroke: A Single Cerebrovascular-Specialty Hospital Study in South Korea

Diagnostics ◽

10.3390/diagnostics11101909 ◽

2021 ◽

Vol 11 (10) ◽

pp. 1909

Author(s):

Dougho Park ◽

Eunhwan Jeong ◽

Haejong Kim ◽

Hae Wook Pyun ◽

Haemin Kim ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Ischemic Stroke ◽

Acute Ischemic Stroke ◽

Functional Outcome ◽

Outcome Prediction ◽

Prediction Models ◽

Gradient Boosting ◽

Support Vector ◽

Extreme Gradient Boosting

Background: Functional outcomes after acute ischemic stroke are of great concern to patients and their families, as well as physicians and surgeons who make the clinical decisions. We developed machine learning (ML)-based functional outcome prediction models in acute ischemic stroke. Methods: This retrospective study used a prospective cohort database. A total of 1066 patients with acute ischemic stroke between January 2019 and March 2021 were included. Variables such as demographic factors, stroke-related factors, laboratory findings, and comorbidities were utilized at the time of admission. Five ML algorithms were applied to predict a favorable functional outcome (modified Rankin Scale 0 or 1) at 3 months after stroke onset. Results: Regularized logistic regression showed the best performance with an area under the receiver operating characteristic curve (AUC) of 0.86. Support vector machines represented the second-highest AUC of 0.85 with the highest F1-score of 0.86, and finally, all ML models applied achieved an AUC > 0.8. The National Institute of Health Stroke Scale at admission and age were consistently the top two important variables for generalized logistic regression, random forest, and extreme gradient boosting models. Conclusions: ML-based functional outcome prediction models for acute ischemic stroke were validated and proven to be readily applicable and useful.

Download Full-text

Prediction of Masked Hypertension and Masked Uncontrolled Hypertension Using Machine Learning

Frontiers in Cardiovascular Medicine ◽

10.3389/fcvm.2021.778306 ◽

2021 ◽

Vol 8 ◽

Author(s):

Ming-Hui Hung ◽

Ling-Chieh Shih ◽

Yu-Ching Wang ◽

Hsin-Bang Leu ◽

Po-Hsun Huang ◽

...

Keyword(s):

Machine Learning ◽

Clinical Characteristics ◽

Prediction Models ◽

External Validation ◽

Uncontrolled Hypertension ◽

Gradient Boosting ◽

Masked Hypertension ◽

Internal Validation ◽

Hypertensive Patients ◽

Extreme Gradient Boosting

Objective: This study aimed to develop machine learning-based prediction models to predict masked hypertension and masked uncontrolled hypertension using the clinical characteristics of patients at a single outpatient visit.Methods: Data were derived from two cohorts in Taiwan. The first cohort included 970 hypertensive patients recruited from six medical centers between 2004 and 2005, which were split into a training set (n = 679), a validation set (n = 146), and a test set (n = 145) for model development and internal validation. The second cohort included 416 hypertensive patients recruited from a single medical center between 2012 and 2020, which was used for external validation. We used 33 clinical characteristics as candidate variables to develop models based on logistic regression (LR), random forest (RF), eXtreme Gradient Boosting (XGboost), and artificial neural network (ANN).Results: The four models featured high sensitivity and high negative predictive value (NPV) in internal validation (sensitivity = 0.914–1.000; NPV = 0.853–1.000) and external validation (sensitivity = 0.950–1.000; NPV = 0.875–1.000). The RF, XGboost, and ANN models showed much higher area under the receiver operating characteristic curve (AUC) (0.799–0.851 in internal validation, 0.672–0.837 in external validation) than the LR model. Among the models, the RF model, composed of 6 predictor variables, had the best overall performance in both internal and external validation (AUC = 0.851 and 0.837; sensitivity = 1.000 and 1.000; specificity = 0.609 and 0.580; NPV = 1.000 and 1.000; accuracy = 0.766 and 0.721, respectively).Conclusion: An effective machine learning-based predictive model that requires data from a single clinic visit may help to identify masked hypertension and masked uncontrolled hypertension.

Download Full-text

Predicting Hard Rock Pillar Stability Using GBDT, XGBoost, and LightGBM Algorithms

Mathematics ◽

10.3390/math8050765 ◽

2020 ◽

Vol 8 (5) ◽

pp. 765 ◽

Cited By ~ 6

Author(s):

Weizhang Liang ◽

Suizhi Luo ◽

Guoyan Zhao ◽

Hao Wu

Keyword(s):

Large Scale ◽

Prediction Models ◽

Hard Rock ◽

Gradient Boosting ◽

Pillar Stability ◽

Rock Pillar ◽

Light Gradient ◽

Gradient Boosting Machine ◽

Extreme Gradient Boosting ◽

Hard Rock Mines

Predicting pillar stability is a vital task in hard rock mines as pillar instability can cause large-scale collapse hazards. However, it is challenging because the pillar stability is affected by many factors. With the accumulation of pillar stability cases, machine learning (ML) has shown great potential to predict pillar stability. This study aims to predict hard rock pillar stability using gradient boosting decision tree (GBDT), extreme gradient boosting (XGBoost), and light gradient boosting machine (LightGBM) algorithms. First, 236 cases with five indicators were collected from seven hard rock mines. Afterwards, the hyperparameters of each model were tuned using a five-fold cross validation (CV) approach. Based on the optimal hyperparameters configuration, prediction models were constructed using training set (70% of the data). Finally, the test set (30% of the data) was adopted to evaluate the performance of each model. The precision, recall, and F1 indexes were utilized to analyze prediction results of each level, and the accuracy and their macro average values were used to assess the overall prediction performance. Based on the sensitivity analysis of indicators, the relative importance of each indicator was obtained. In addition, the safety factor approach and other ML algorithms were adopted as comparisons. The results showed that GBDT, XGBoost, and LightGBM algorithms achieved a better comprehensive performance, and their prediction accuracies were 0.8310, 0.8310, and 0.8169, respectively. The average pillar stress and ratio of pillar width to pillar height had the most important influences on prediction results. The proposed methodology can provide a reliable reference for pillar design and stability risk management.

Download Full-text

Protein pKa prediction by tree-based machine learning

10.26434/chemrxiv-2021-4d420 ◽

2021 ◽

Author(s):

Ada Y. Chen ◽

Juyong Lee ◽

Ana Damjanovic ◽

Bernard R. Brooks

Keyword(s):

Machine Learning ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Pka Prediction ◽

Light Gradient ◽

Structure Database ◽

Gradient Boosting Machine ◽

Extreme Gradient Boosting ◽

Better Than ◽

Protein Pka

We present four tree-based machine learning models for protein pKa prediction. The four models, Random Forest, Extra Trees, eXtreme Gradient Boosting (XGBoost) and Light Gradient Boosting Machine (LightGBM), were trained on three experimental PDB and pKa datasets, two of which included a notable portion of internal residues. We observed similar performance among the four machine learning algorithms. The best model trained on the largest dataset performs 37% better than the widely used empirical pKa prediction tool PROPKA. The overall RMSE for this model is 0.69, with surface and buried RMSE values being 0.56 and 0.78, respectively, considering six residue types (Asp, Glu, His, Lys, Cys and Tyr), and 0.63 when considering Asp, Glu, His and Lys only. We provide pKa predictions for proteins in human proteome from the AlphaFold Protein Structure Database and observed that 1% of Asp/Glu/Lys residues have highly shifted pKa values close to the physiological pH.

Download Full-text

Computational analysis and prediction of lysine malonylation sites by exploiting informative features in an integrative machine-learning framework

Briefings in Bioinformatics ◽

10.1093/bib/bby079 ◽

2018 ◽

Vol 20 (6) ◽

pp. 2185-2199 ◽

Cited By ~ 32

Author(s):

Yanju Zhang ◽

Ruopeng Xie ◽

Jiawei Wang ◽

André Leier ◽

Tatiana T Marquez-Lago ◽

...

Keyword(s):

Machine Learning ◽

Computational Methods ◽

Prediction Models ◽

Gradient Boosting ◽

Support Vector ◽

Post Translational Modification ◽

K Nearest Neighbor ◽

Ensemble Models ◽

Light Gradient ◽

Optimal Ensemble

AbstractAs a newly discovered post-translational modification (PTM), lysine malonylation (Kmal) regulates a myriad of cellular processes from prokaryotes to eukaryotes and has important implications in human diseases. Despite its functional significance, computational methods to accurately identify malonylation sites are still lacking and urgently needed. In particular, there is currently no comprehensive analysis and assessment of different features and machine learning (ML) methods that are required for constructing the necessary prediction models. Here, we review, analyze and compare 11 different feature encoding methods, with the goal of extracting key patterns and characteristics from residue sequences of Kmal sites. We identify optimized feature sets, with which four commonly used ML methods (random forest, support vector machines, K-nearest neighbor and logistic regression) and one recently proposed [Light Gradient Boosting Machine (LightGBM)] are trained on data from three species, namely, Escherichia coli, Mus musculus and Homo sapiens, and compared using randomized 10-fold cross-validation tests. We show that integration of the single method-based models through ensemble learning further improves the prediction performance and model robustness on the independent test. When compared to the existing state-of-the-art predictor, MaloPred, the optimal ensemble models were more accurate for all three species (AUC: 0.930, 0.923 and 0.944 for E. coli, M. musculus and H. sapiens, respectively). Using the ensemble models, we developed an accessible online predictor, kmal-sp, available at http://kmalsp.erc.monash.edu/. We hope that this comprehensive survey and the proposed strategy for building more accurate models can serve as a useful guide for inspiring future developments of computational methods for PTM site prediction, expedite the discovery of new malonylation and other PTM types and facilitate hypothesis-driven experimental validation of novel malonylated substrates and malonylation sites.

Download Full-text