An ARDS Severity Recognition Model based on XGBoost

Abstract Given the subjectivity and non-real-time of disease scoring system and invasive parameters in evaluating the development of acute respiratory distress syndrome (ARDS), combined with noninvasive parameters, this paper proposed an ARDS severity recognition model based on extreme gradient boosting (XGBoost). Firstly, the physiological parameters of patients were extracted based on the MIMIC-III database for statistical analysis, and the outliers and unbalanced samples were processed by the interquartile range and synthetic minority oversampling technique. Then, Pearson correlation coefficient and random forest were used as hybrid feature selection to score the noninvasive parameters comprehensively, and essential parameters for identifying diseases were obtained. Finally, XGBoost combined with grid search cross-validation to determine the best hyper-parameters of the model to realize the accurate classification of disease degree. The experimental results show that the model’s area under the curve (AUC) is as high as 0.98, and the accuracy is 0.90; the total score of blood oxygen saturation (SpO2) is 0.625, which could be used as an essential parameter to evaluate the severity of ARDS. Compared with traditional methods, this model has excellent advantages in real-time and accuracy and could provide more accurate diagnosis and treatment suggestions for medical staff.

Download Full-text

Adaptive scheduling method for dynamic robotic cell based on pattern classification algorithm

International Journal of Modeling Simulation and Scientific Computing ◽

10.1142/s179396231850040x ◽

2018 ◽

Vol 09 (05) ◽

pp. 1850040 ◽

Cited By ~ 1

Author(s):

Chuyuan Wang ◽

Linxuan Zhang ◽

Chongdang Liu

Keyword(s):

Real Time ◽

Pattern Classification ◽

Feature Selection Method ◽

Classification Algorithm ◽

Adaptive Scheduling ◽

Gradient Boosting ◽

Production Environment ◽

Robotic Cell ◽

Extreme Gradient Boosting ◽

Scheduling Method

In order to deal with the dynamic production environment with frequent fluctuation of processing time, robotic cell needs an efficient scheduling strategy which meets the real-time requirements. This paper proposes an adaptive scheduling method based on pattern classification algorithm to guide the online scheduling process. The method obtains the scheduling knowledge of manufacturing system from the production data and establishes an adaptive scheduler, which can adjust the scheduling rules according to the current production status. In the process of establishing scheduler, how to choose essential attributes is the main difficulty. In order to solve the low performance and low efficiency problem of embedded feature selection method, based on the application of Extreme Gradient Boosting model (XGBoost) to obtain the adaptive scheduler, an improved hybrid optimization algorithm which integrates Gini impurity of XGBoost model into Particle Swarm Optimization (PSO) is employed to acquire the optimal subset of features. The results based on simulated robotic cell system show that the proposed PSO-XGBoost algorithm outperforms existing pattern classification algorithms and the newly learned adaptive model can improve the basic dispatching rules. At the same time, it can meet the demand of real-time scheduling.

Download Full-text

Classification of Hot Spots using XGBoost and LightGBM Algorithms

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.e9459.069520 ◽

2020 ◽

Vol 9 (5) ◽

pp. 722-724

Keyword(s):

Computational Methods ◽

Protein Interactions ◽

Hot Spots ◽

Cell Metabolism ◽

Pearson Correlation ◽

Classification Performance ◽

Gradient Boosting ◽

Support Vector ◽

Extreme Gradient Boosting ◽

Hub Proteins

Protein-Protein Interactions referred as PPIs perform significant role in biological functions like cell metabolism, immune response, signal transduction etc. Hot spots are small fractions of residues in interfaces and provide substantial binding energy in PPIs. Therefore, identification of hot spots is important to discover and analyze molecular medicines and diseases. The current strategy, alanine scanning isn't pertinent to enormous scope applications since the technique is very costly and tedious. The existing computational methods are poor in classification performance as well as accuracy in prediction. They are concerned with the topological structure and gene expression of hub proteins. The proposed system focuses on hot spots of hub proteins by eliminating redundant as well as highly correlated features using Pearson Correlation Coefficient and Support Vector Machine based feature elimination. Extreme Gradient boosting and LightGBM algorithms are used to ensemble a set of weak classifiers to form a strong classifier. The proposed system shows better accuracy than the existing computational methods. The model can also be used to predict accurate molecular inhibitors for specific PPIs

Download Full-text

Machine Learning Methods for Predicting Long-term Mortality in Patients after Cardiac Surgery

10.21203/rs.3.rs-1140660/v1 ◽

2021 ◽

Author(s):

Yue Yu ◽

Chi Peng ◽

Zhiyuan Zhang ◽

Kejia Shen ◽

Yufeng Zhang ◽

...

Keyword(s):

Machine Learning ◽

Cardiac Surgery ◽

Early Warning Systems ◽

Gradient Boosting ◽

Distribution Width ◽

Extreme Gradient Boosting ◽

Mimic Iii ◽

Long Term Mortality

Abstract Background Establishing a mortality prediction model of patients undergoing cardiac surgery might be useful for clinicians for alerting, judgment, and intervention, while few predictive tools for long-term mortality have been developed targeting patients post-cardiac surgery. Objective We aimed to construct and validate several machine learning (ML) algorithms to predict long-term mortality and identify risk factors in unselected patients after cardiac surgery during a 4-year follow-up. Methods The Medical Information Mart for Intensive Care (MIMIC-III) database was used to perform a retrospective administrative database study. Candidate predictors consisted of the demographics, comorbidity, vital signs, laboratory test results, prognostic scoring systems, and treatment information on the first day of ICU admission. 4-year mortality was set as the study outcome. We used the ML methods of logistic regression (LR), artificial neural network (NNET), naïve bayes (NB), gradient boosting machine (GBM), adapting boosting (Ada), random forest (RF), bagged trees (BT), and eXtreme Gradient Boosting (XGB). The prognostic capacity and clinical utility of these ML models were compared using the area under the receiver operating characteristic curves (AUC), calibration curves, and decision curve analysis (DCA). Results Of 7,368 patients in MIMIC-III included in the final cohort, a total of 1,337 (18.15%) patients died during a 4-year follow-up. Among 65 variables extracted from the database, a total of 25 predictors were selected using recursive feature elimination (RFE) and included in the subsequent analysis. The Ada model performed best among eight models in both discriminatory ability with the highest AUC of 0.801 and goodness of fit (visualized by calibration curve). Moreover, the DCA shows that the net benefit of the RF, Ada, and BT models surpassed that of other ML models for almost all threshold probability values. Additionally, through the Ada technique, we determined that red blood cell distribution width (RDW), blood urea nitrogen (BUN), SAPS II, anion gap (AG), age, urine output, chloride, creatinine, congestive heart failure, and SOFA were the Top 10 predictors in the feature importance rankings. Conclusions The Ada model performs best in predicting long-term mortality after cardiac surgery among the eight ML models. The ML-based algorithms might have significant application in the development of early warning systems for patients following operations.

Download Full-text

Cost-Sensitive Extreme Gradient Boosting for Imbalanced Classification of Breast Cancer Diagnosis

2020 10th IEEE International Conference on Control System, Computing and Engineering (ICCSCE) ◽

10.1109/iccsce50387.2020.9204948 ◽

2020 ◽

Author(s):

Manop Phankokkruad

Keyword(s):

Breast Cancer ◽

Cancer Diagnosis ◽

Breast Cancer Diagnosis ◽

Gradient Boosting ◽

Imbalanced Classification ◽

Extreme Gradient Boosting

Download Full-text

Clinical and Laboratory Predictors of In-hospital Mortality in Patients With Coronavirus Disease-2019: A Cohort Study in Wuhan, China

Clinical Infectious Diseases ◽

10.1093/cid/ciaa538 ◽

2020 ◽

Vol 71 (16) ◽

pp. 2079-2088 ◽

Cited By ~ 52

Author(s):

Kun Wang ◽

Peiyuan Zuo ◽

Yuwei Liu ◽

Meng Zhang ◽

Xiaofang Zhao ◽

...

Keyword(s):

Hospital Mortality ◽

Prediction Models ◽

Area Under The Curve ◽

Mortality Prediction ◽

Gradient Boosting ◽

Laboratory Model ◽

Training Cohort ◽

Clinical Model ◽

Extreme Gradient Boosting ◽

Mortality Prediction Models

Abstract Background This study aimed to develop mortality-prediction models for patients with coronavirus disease-2019 (COVID-19). Methods The training cohort included consecutive COVID-19 patients at the First People’s Hospital of Jiangxia District in Wuhan, China, from 7 January 2020 to 11 February 2020. We selected baseline data through the stepwise Akaike information criterion and ensemble XGBoost (extreme gradient boosting) model to build mortality-prediction models. We then validated these models by randomly collected COVID-19 patients in Union Hospital, Wuhan, from 1 January 2020 to 20 February 2020. Results A total of 296 COVID-19 patients were enrolled in the training cohort; 19 died during hospitalization and 277 discharged from the hospital. The clinical model developed using age, history of hypertension, and coronary heart disease showed area under the curve (AUC), 0.88 (95% confidence interval [CI], .80–.95); threshold, −2.6551; sensitivity, 92.31%; specificity, 77.44%; and negative predictive value (NPV), 99.34%. The laboratory model developed using age, high-sensitivity C-reactive protein, peripheral capillary oxygen saturation, neutrophil and lymphocyte count, d-dimer, aspartate aminotransferase, and glomerular filtration rate had a significantly stronger discriminatory power than the clinical model (P = .0157), with AUC, 0.98 (95% CI, .92–.99); threshold, −2.998; sensitivity, 100.00%; specificity, 92.82%; and NPV, 100.00%. In the subsequent validation cohort (N = 44), the AUC (95% CI) was 0.83 (.68–.93) and 0.88 (.75–.96) for the clinical model and laboratory model, respectively. Conclusions We developed 2 predictive models for the in-hospital mortality of patients with COVID-19 in Wuhan that were validated in patients from another center.

Download Full-text

An E-Commerce Coupon Target Population Positioning Model Based on Random Forest and eXtreme Gradient Boosting

2018 11th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI) ◽

10.1109/cisp-bmei.2018.8633247 ◽

2018 ◽

Cited By ~ 1

Author(s):

Zhang-Fa Yan ◽

Yu-Lin Shen ◽

Wei-Jun Liu ◽

Jie-Min Long ◽

Qingyang Wei

Keyword(s):

Random Forest ◽

Target Population ◽

Gradient Boosting ◽

Model Based ◽

Extreme Gradient Boosting

Download Full-text

Gully Erosion Susceptibility Mapping in Highly Complex Terrain Using Machine Learning Models

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10100680 ◽

2021 ◽

Vol 10 (10) ◽

pp. 680

Author(s):

Annan Yang ◽

Chunmei Wang ◽

Guowei Pang ◽

Yongqing Long ◽

Lei Wang ◽

...

Keyword(s):

Machine Learning ◽

Complex Terrain ◽

Large Scale ◽

Area Under The Curve ◽

Gully Erosion ◽

Susceptibility Mapping ◽

Weight Of Evidence ◽

Gradient Boosting ◽

Machine Learning Classification ◽

Extreme Gradient Boosting

Gully erosion is the most severe type of water erosion and is a major land degradation process. Gully erosion susceptibility mapping (GESM)’s efficiency and interpretability remains a challenge, especially in complex terrain areas. In this study, a WoE-MLC model was used to solve the above problem, which combines machine learning classification algorithms and the statistical weight of evidence (WoE) model in the Loess Plateau. The three machine learning (ML) algorithms utilized in this research were random forest (RF), gradient boosted decision trees (GBDT), and extreme gradient boosting (XGBoost). The results showed that: (1) GESM were well predicted by combining both machine learning regression models and WoE-MLC models, with the area under the curve (AUC) values both greater than 0.92, and the latter was more computationally efficient and interpretable; (2) The XGBoost algorithm was more efficient in GESM than the other two algorithms, with the strongest generalization ability and best performance in avoiding overfitting (averaged AUC = 0.947), followed by the RF algorithm (averaged AUC = 0.944), and GBDT algorithm (averaged AUC = 0.938); and (3) slope gradient, land use, and altitude were the main factors for GESM. This study may provide a possible method for gully erosion susceptibility mapping at large scale.

Download Full-text

Machine learning techniques to predict daily rainfall amount

Journal Of Big Data ◽

10.1186/s40537-021-00545-4 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Chalachew Muluken Liyew ◽

Haileyesus Amsaya Melese

Keyword(s):

Machine Learning ◽

Pearson Correlation ◽

Daily Rainfall ◽

Learning Model ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Correlation Technique ◽

Learning Techniques ◽

Machine Learning Model ◽

Extreme Gradient Boosting

AbstractPredicting the amount of daily rainfall improves agricultural productivity and secures food and water supply to keep citizens healthy. To predict rainfall, several types of research have been conducted using data mining and machine learning techniques of different countries’ environmental datasets. An erratic rainfall distribution in the country affects the agriculture on which the economy of the country depends on. Wise use of rainfall water should be planned and practiced in the country to minimize the problem of the drought and flood occurred in the country. The main objective of this study is to identify the relevant atmospheric features that cause rainfall and predict the intensity of daily rainfall using machine learning techniques. The Pearson correlation technique was used to select relevant environmental variables which were used as an input for the machine learning model. The dataset was collected from the local meteorological office at Bahir Dar City, Ethiopia to measure the performance of three machine learning techniques (Multivariate Linear Regression, Random Forest, and Extreme Gradient Boost). Root mean squared error and Mean absolute Error methods were used to measure the performance of the machine learning model. The result of the study revealed that the Extreme Gradient Boosting machine learning algorithm performed better than others.

Download Full-text

Smart Cardiac Framework for an Early Detection of Cardiac Arrest Condition and Risk

Frontiers in Public Health ◽

10.3389/fpubh.2021.762303 ◽

2021 ◽

Vol 9 ◽

Author(s):

Apeksha Shah ◽

Swati Ahirrao ◽

Sharnil Pandya ◽

Ketan Kotecha ◽

Suresh Rathod

Keyword(s):

Cardiac Arrest ◽

Real Time ◽

Cox Regression ◽

Risk Classification ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

Time Data ◽

Extreme Gradient Boosting ◽

Real Time Data

Cardiovascular disease (CVD) is considered to be one of the most epidemic diseases in the world today. Predicting CVDs, such as cardiac arrest, is a difficult task in the area of healthcare. The healthcare industry has a vast collection of datasets for analysis and prediction purposes. Somehow, the predictions made on these publicly available datasets may be erroneous. To make the prediction accurate, real-time data need to be collected. This study collected real-time data using sensors and stored it on a cloud computing platform, such as Google Firebase. The acquired data is then classified using six machine-learning algorithms: Artificial Neural Network (ANN), Random Forest Classifier (RFC), Gradient Boost Extreme Gradient Boosting (XGBoost) classifier, Support Vector Machine (SVM), Naïve Bayes (NB), and Decision Tree (DT). Furthermore, we have presented two novel gender-based risk classification and age-wise risk classification approach in the undertaken study. The presented approaches have used Kaplan-Meier and Cox regression survival analysis methodologies for risk detection and classification. The presented approaches also assist health experts in identifying the risk probability risk and the 10-year risk score prediction. The proposed system is an economical alternative to the existing system due to its low cost. The outcome obtained shows an enhanced level of performance with an overall accuracy of 98% using DT on our collected dataset for cardiac risk prediction. We also introduced two risk classification models for gender- and age-wise people to detect their survival probability. The outcome of the proposed model shows accurate probability in both classes.

Download Full-text

KDClassifier: A urinary proteomic spectra analysis tool based on machine learning for the classification of kidney diseases

Aging Pathobiology and Therapeutics ◽

10.31491/apt.2021.09.064 ◽

2021 ◽

Vol 3 (3) ◽

pp. 63-72

Author(s):

Wanjun Zhao ◽

Keyword(s):

Kidney Disease ◽

Kidney Diseases ◽

Confusion Matrix ◽

Gradient Boosting ◽

Support Vector ◽

Diagnostic Model ◽

Analysis Tool ◽

Data Set ◽

Extreme Gradient Boosting

Background: We aimed to establish a novel diagnostic model for kidney diseases by combining artificial intelligence with complete mass spectrum information from urinary proteomics. Methods: We enrolled 134 patients (IgA nephropathy, membranous nephropathy, and diabetic kidney disease) and 68 healthy participants as controls, with a total of 610,102 mass spectra from their urinary proteomic profiles. The training data set (80%) was used to create a diagnostic model using XGBoost, random forest (RF), a support vector machine (SVM), and artificial neural networks (ANNs). The diagnostic accuracy was evaluated using a confusion matrix with a test dataset (20%). We also constructed receiver operating-characteristic, Lorenz, and gain curves to evaluate the diagnostic model. Results: Compared with the RF, SVM, and ANNs, the modified XGBoost model, called Kidney Disease Classifier (KDClassifier), showed the best performance. The accuracy of the XGBoost diagnostic model was 96.03%. The area under the curve of the extreme gradient boosting (XGBoost) model was 0.952 (95% confidence interval, 0.9307–0.9733). The Kolmogorov-Smirnov (KS) value of the Lorenz curve was 0.8514. The Lorenz and gain curves showed the strong robustness of the developed model. Conclusions: The KDClassifier achieved high accuracy and robustness and thus provides a potential tool for the classification of kidney diseases

Download Full-text