extreme gradient boosting
Recently Published Documents





Subhra Swetanisha ◽  
Amiya Ranjan Panda ◽  
Dayal Kumar Behera

An ensemble model has been proposed in this work by combining the extreme gradient boosting classification (XGBoost) model with support vector machine (SVM) for land use and land cover classification (LULCC). We have used the multispectral Landsat-8 operational land imager sensor (OLI) data with six spectral bands in the electromagnetic spectrum (EM). The area of study is the administrative boundary of the twin cities of Odisha. Data collected in 2020 is classified into seven land use classes/labels: river, canal, pond, forest, urban, agricultural land, and sand. Comparative assessments of the results of ten machine learning models are accomplished by computing the overall accuracy, kappa coefficient, producer accuracy and user accuracy. An ensemble classifier model makes the classification more precise than the other state-of-the-art machine learning classifiers.
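The four accuracy measures named in this abstract can all be read off a confusion matrix. The sketch below is a minimal illustration, assuming rows hold reference classes and columns hold predicted classes; the function name `accuracy_report` and the three-class counts are hypothetical and are not taken from the study.

```python
def accuracy_report(matrix):
    """Compute LULC accuracy metrics from a square confusion matrix.

    Convention assumed here: rows are reference (ground-truth) classes,
    columns are predicted (map) classes.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    diagonal = sum(matrix[i][i] for i in range(n))
    row_totals = [sum(matrix[i]) for i in range(n)]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]

    overall = diagonal / total
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (overall - expected) / (1 - expected)
    # Producer's accuracy: correct / reference total (1 - omission error).
    producer = [matrix[i][i] / row_totals[i] for i in range(n)]
    # User's accuracy: correct / predicted total (1 - commission error).
    user = [matrix[i][i] / col_totals[i] for i in range(n)]
    return overall, kappa, producer, user


# Hypothetical 3-class counts (e.g. river, forest, urban), not study data.
example = [
    [50, 2, 3],
    [4, 40, 6],
    [1, 5, 39],
]
overall, kappa, producer, user = accuracy_report(example)
print(f"overall={overall:.3f} kappa={kappa:.3f}")
```

Kappa discounts the agreement that the class marginals would produce by chance, which is why it is usually reported alongside overall accuracy when class sizes differ.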

2022 ◽  
Vol 30 (7) ◽  
pp. 0-0

Enterprise financial risks are analyzed utilizing the theory of organizational behavior, and a financial risk management system is constructed to improve the design and algorithm of the enterprise risk management system. Based on the CCER (China Center for Economic Research) database, an early warning model for enterprise financial risk management containing five indices is proposed. Through Logistic regression analysis, the design principle of the financial risk management system based on AI (Artificial Intelligence) technology is explained. The proposed system innovatively introduces the AI integrated learning method, optimizes the objective function through the XGBoost (eXtreme Gradient Boosting) algorithm, and trains the model through a BP (Backpropagation) NN (Neural Network). Finally, following comparative analysis, the effectiveness of the proposed method is verified.
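For orientation, the objective that XGBoost minimizes is usually written as a training loss plus a per-tree complexity penalty. The abstract does not say which loss the authors chose, so the form below is the general one from the standard XGBoost formulation, not the paper's specific setup.

```latex
\mathcal{L}(\phi) = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega\left(f_k\right),
\qquad
\Omega(f) = \gamma T + \tfrac{1}{2}\,\lambda \lVert w \rVert^{2}
```

Here $l$ is a differentiable loss, $f_k$ is the $k$-th tree, $T$ is the number of leaves in a tree, $w$ is its vector of leaf weights, and $\gamma$ and $\lambda$ are regularization strengths.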

2022 ◽  
Vol 8 ◽  
Chien-Liang Liu ◽  
You-Lin Tain ◽  
Yun-Chun Lin ◽  
Chien-Ning Hsu

Objective: This study aimed to identify phenotypic clinical features associated with acute kidney injury (AKI) to predict non-recovery from AKI at hospital discharge using electronic health record data. Methods: Data for hospitalized patients in the AKI Recovery Evaluation Study were derived from a large healthcare delivery system in Taiwan between January 2011 and December 2017. Living patients with AKI non-recovery were used to derive and validate multiple predictive models. In total, 64 candidate variables, such as demographic characteristics, comorbidities, healthcare services utilization, laboratory values, and nephrotoxic medication use, were measured within 1 year before the index admission and during hospitalization for AKI. Results: Among the top 20 important features in the predictive model, 8 features had a positive effect on AKI non-recovery prediction: AKI during hospitalization, serum creatinine (SCr) level at admission, receipt of dialysis during hospitalization, baseline comorbidity of cancer, AKI at admission, baseline lymphocyte count, baseline potassium, and low-density lipoprotein cholesterol levels. The AKI non-recovery risk model using the eXtreme Gradient Boosting (XGBoost) algorithm achieved an area under the receiver operating characteristic curve (AUROC) of 0.807, with a sensitivity of 0.724 and a specificity of 0.738, in the temporal validation cohort. Conclusion: The machine learning model approach can accurately predict AKI non-recovery using routinely collected health data in clinical practice. These results suggest that multifactorial risk factors are involved in AKI non-recovery, requiring patient-centered risk assessments and promotion of post-discharge AKI care to prevent AKI complications.
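The three discrimination figures reported here (AUROC, sensitivity, specificity) can be computed directly from labels and predicted risks. The following is a minimal sketch, assuming a 0.5 decision threshold, which the abstract does not state; the helper names and the eight example records are hypothetical.

```python
def sensitivity_specificity(labels, scores, threshold=0.5):
    """Sensitivity and specificity at a given probability threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auroc(labels, scores):
    """AUROC as the probability that a positive outranks a negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = non-recovery) and predicted risks, not study data.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
sens, spec = sensitivity_specificity(y_true, y_score)
auc = auroc(y_true, y_score)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUROC={auc:.4f}")
```

Sensitivity and specificity depend on the chosen threshold, while AUROC summarizes ranking quality across all thresholds, so the two kinds of figure answer different questions.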

2022 ◽  
Vol 17 (1) ◽  
pp. 165-198
Kamil Matuszelański ◽  
Katarzyna Kopczewska

This study is a comprehensive and modern approach to predicting customer churn, using the example of an e-commerce retail store operating in Brazil. Our approach consists of three stages in which we combine and use three different datasets: numerical data on orders, textual after-purchase reviews and socio-geo-demographic data from the census. At the pre-processing stage, we find topics from text reviews using Latent Dirichlet Allocation, Dirichlet Multinomial Mixture and Gibbs sampling. In the spatial analysis, we apply DBSCAN to get rural/urban locations and analyse neighbourhoods of customers located by zip code. At the modelling stage, we apply machine learning extreme gradient boosting and logistic regression. The quality of models is verified with area-under-curve and lift metrics. Explainable artificial intelligence, represented by a permutation-based variable importance and a partial dependence profile, helps to discover the determinants of churn. We show that customers’ propensity to churn depends on: (i) payment value for the first order, number of items bought and shipping cost; (ii) categories of the products bought; (iii) demographic environment of the customer; and (iv) customer location. At the same time, customers’ propensity to churn is not influenced by: (i) population density in the customer’s area and division into rural and urban areas; (ii) quantitative review of the first purchase; and (iii) qualitative review summarised as a topic.
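Permutation-based variable importance, mentioned above, measures how much a model's score drops when one feature column is shuffled. The sketch below illustrates the idea with accuracy as the score; `toy_model` is a hypothetical stand-in for a fitted churn model, and the data are randomly generated rather than drawn from the study.

```python
import random


def accuracy(model, rows, labels):
    """Share of rows where the model's prediction matches the label."""
    hits = sum(1 for row, y in zip(rows, labels) if model(row) == y)
    return hits / len(rows)


def permutation_importance(model, rows, labels, column, repeats=20, seed=0):
    """Mean drop in accuracy after shuffling one feature column."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for _ in range(repeats):
        shuffled = [row[column] for row in rows]
        rng.shuffle(shuffled)
        permuted = [
            row[:column] + [value] + row[column + 1:]
            for row, value in zip(rows, shuffled)
        ]
        drops.append(baseline - accuracy(model, permuted, labels))
    return sum(drops) / repeats


# Hypothetical stand-in for a fitted model: it looks only at feature 0
# (say, first-order payment value) and ignores feature 1.
def toy_model(row):
    return 1 if row[0] > 0.5 else 0


data_rng = random.Random(42)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
labels = [1 if row[0] > 0.5 else 0 for row in rows]

imp_used = permutation_importance(toy_model, rows, labels, column=0)
imp_ignored = permutation_importance(toy_model, rows, labels, column=1)
print(f"feature 0: {imp_used:.3f}, feature 1: {imp_ignored:.3f}")
```

A feature the model never consults gets an importance of exactly zero, which mirrors the abstract's finding that some variables do not influence churn propensity.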

Diagnostics ◽  
2022 ◽  
Vol 12 (1) ◽  
pp. 203
I-Jung Tsai ◽  
Wen-Chi Shen ◽  
Chia-Ling Lee ◽  
Horng-Dar Wang ◽  
Ching-Yu Lin

Bladder cancer has been increasing globally. Urinary cytology is considered a major screening method for bladder cancer, but it has poor sensitivity. This study aimed to utilize clinical laboratory data and machine learning methods to build predictive models of bladder cancer. A total of 1336 patients with cystitis, bladder cancer, kidney cancer, uterus cancer, and prostate cancer were enrolled in this study. Two-step feature selection combined with WEKA and forward selection was performed. Furthermore, five machine learning models, including decision tree, random forest, support vector machine, extreme gradient boosting (XGBoost), and light gradient boosting machine (LightGBM), were applied. Features including calcium, alkaline phosphatase (ALP), albumin, urine ketone, urine occult blood, creatinine, alanine aminotransferase (ALT), and diabetes were selected. The LightGBM model obtained an accuracy of 84.8% to 86.9%, a sensitivity of 84% to 87.8%, a specificity of 82.9% to 86.7%, and an area under the curve (AUC) of 0.88 to 0.92 in discriminating bladder cancer from cystitis and other cancers. Our study provides a demonstration of utilizing clinical laboratory data to predict bladder cancer.

2022 ◽  
Vol 11 (1) ◽  
pp. 23
Raj Bridgelall

Knowing what perpetrators want can inform strategies to achieve safe, secure, and sustainable societies. To help advance the body of knowledge in counterterrorism, this research applied natural language processing and machine learning techniques to a comprehensive database of terrorism events. A specially designed empirical topic modeling technique provided a machine-aided human decision process to glean six categories of perpetrator aims from the motive text narrative. Subsequently, six different machine learning models validated the aim categories based on the accuracy of their association with a different narrative field, the event summary. The ROC-AUC scores of the classification ranged from 86% to 93%. The Extreme Gradient Boosting model provided the best predictive performance. The intelligence community can use the identified aim categories to help understand the incentive structure of terrorist groups and customize strategies for dealing with them.

2022 ◽  
Vol 15 (1) ◽  
pp. 35
Shekar Shetty ◽  
Mohamed Musa ◽  
Xavier Brédart

In this study, we apply several advanced machine learning techniques including extreme gradient boosting (XGBoost), support vector machine (SVM), and a deep neural network to predict bankruptcy using easily obtainable financial data of 3728 Belgian Small and Medium Enterprises (SMEs) during the period 2002–2012. Using the above-mentioned machine learning techniques, we predict bankruptcies with a global accuracy of 82–83% using only three easily obtainable financial ratios: the return on assets, the current ratio, and the solvency ratio. While the prediction accuracy is similar to several previous models in the literature, our model is very simple to implement and represents an accurate and user-friendly tool to discriminate between bankrupt and non-bankrupt firms.

Sarah Barber ◽  
Florian Hammer ◽  
Adrian Tica

Data-driven wind turbine performance predictions, such as power and loads, are important for planning and operation. Current methods do not take site-specific conditions such as turbulence intensity and shear into account, which could result in errors of up to 10%. In this work, four different machine learning models (k-nearest neighbors regression, random forest regression, extreme gradient boosting regression and artificial neural networks (ANN)) are trained and tested, firstly on a simulation dataset and then on a real dataset. It is found that machine learning methods that take site-specific conditions into account can improve prediction accuracy by a factor of two to three, depending on the error indicator chosen. Similar results are observed for multi-output ANNs for simulated in- and out-of-plane rotor blade tip deflection and root loads. Future work focuses on understanding transferability of results between different turbines within a wind farm and between different wind turbine types.

2022 ◽  
pp. ASN.2021040538
Arthur M. Lee ◽  
Jian Hu ◽  
Yunwen Xu ◽  
Alison G. Abraham ◽  
Rui Xiao ◽  

Background: Untargeted plasma metabolomic profiling combined with machine learning (ML) may lead to discovery of metabolic profiles that inform our understanding of pediatric CKD causes. We sought to identify metabolomic signatures in pediatric CKD based on diagnosis: FSGS, obstructive uropathy (OU), aplasia/dysplasia/hypoplasia (A/D/H), and reflux nephropathy (RN). Methods: Untargeted metabolomic quantification (GC-MS/LC-MS, Metabolon) was performed on plasma from 702 Chronic Kidney Disease in Children study participants (n: FSGS=63, OU=122, A/D/H=109, and RN=86). Lasso regression was used for feature selection, adjusting for clinical covariates. Four methods were then applied to stratify significance: logistic regression, support vector machine, random forest, and extreme gradient boosting. ML training was performed on 80% total cohort subsets and validated on 20% holdout subsets. Important features were selected based on being significant in at least two of the four modeling approaches. We additionally performed pathway enrichment analysis to identify metabolic subpathways associated with CKD cause. Results: ML models were evaluated on holdout subsets with receiver-operator and precision-recall area-under-the-curve, F1 score, and Matthews correlation coefficient. ML models outperformed no-skill prediction. Metabolomic profiles were identified based on cause. FSGS was associated with the sphingomyelin-ceramide axis. FSGS was also associated with individual plasmalogen metabolites and the subpathway. OU was associated with gut microbiome–derived histidine metabolites. Conclusion: ML models identified metabolomic signatures based on CKD cause. Using ML techniques in conjunction with traditional biostatistics, we demonstrated that sphingomyelin-ceramide and plasmalogen dysmetabolism are associated with FSGS and that gut microbiome–derived histidine metabolites are associated with OU.
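The 80%/20% holdout design and two of the evaluation metrics named above (F1 score and Matthews correlation coefficient) can be sketched in a few lines. This is an illustration under assumptions: the split here is a simple seeded shuffle, whereas the abstract does not describe how subsets were drawn, and the function names, sample IDs, and confusion counts are hypothetical.

```python
import math
import random


def holdout_split(items, holdout_fraction=0.2, seed=0):
    """Shuffle once, then keep the last `holdout_fraction` for validation."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * (1 - holdout_fraction))
    return shuffled[:cut], shuffled[cut:]


def f1_and_mcc(tp, fp, fn, tn):
    """F1 score and Matthews correlation coefficient from binary counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom
    return f1, mcc


# Hypothetical sample IDs and holdout confusion counts, not study data.
train, holdout = holdout_split(range(100))
f1, mcc = f1_and_mcc(tp=30, fp=10, fn=5, tn=55)
print(len(train), len(holdout), f"F1={f1:.3f} MCC={mcc:.3f}")
```

Unlike F1, the Matthews correlation coefficient uses all four cells of the confusion matrix, so it stays informative when the diagnosis groups are of unequal size, as they are in this cohort.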

Electronics ◽  
2022 ◽  
Vol 11 (2) ◽  
pp. 218
SaravanaKumar Venkatesan ◽  
Jonghyun Lim ◽  
Hoon Ko ◽  
Yongyun Cho

Context: Energy utilization is one of the factors most closely tied to many areas of the smart farm, including plant growth, crop production, device automation, and energy supply. Recently, 4th industrial revolution technologies such as IoT, artificial intelligence, and big data have been widely used in smart farm environments to efficiently use energy and control smart farms’ conditions. In particular, machine learning technologies with big data analysis are actively used as one of the most potent prediction methods supporting energy use in the smart farm. Purpose: This study proposes a machine learning-based prediction model for peak energy use by analyzing energy-related data collected from various environmental and growth devices in a smart paprika farm of the Jeonnam Agricultural Research and Extension Service in South Korea between 2019 and 2021. Scientific method: To find the best-performing prediction model, comparative evaluation tests are performed using representative ML algorithms such as artificial neural network, support vector regression, random forest, K-nearest neighbors, extreme gradient boosting and gradient boosting machine, and the time series algorithm ARIMA with binary classification for a different number of input features. Validate: This article can provide an effective and viable way for smart farm managers or greenhouse farmers to better manage the problem of agricultural energy economically and environmentally. Therefore, we hope that the recommended ML method will help improve the smart farm’s energy use or their energy policies in various fields related to agricultural energy. Conclusion: Seven performance metrics, including R-squared, root mean squared error, and mean absolute error, are associated with these two algorithms. It is concluded that the RF-based model is more successful than the others, with a prediction accuracy of 92%. Therefore, the proposed model may contribute to the development of various applications for environment energy usage in a smart farm, such as a notification service for energy usage peak time or an energy usage control for each device.
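Three of the performance metrics named in this abstract (R-squared, root mean squared error, and mean absolute error) have simple closed forms. The sketch below is a minimal illustration; `regression_metrics` is a hypothetical helper and the four readings are made-up values, not farm data.

```python
import math


def regression_metrics(actual, predicted):
    """Return (R-squared, RMSE, MAE) for paired actual/predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    # R-squared: share of variance in `actual` explained by the predictions.
    r2 = 1 - ss_res / ss_tot
    return r2, rmse, mae


# Hypothetical energy readings and model predictions, not study data.
actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 17.0]
r2, rmse, mae = regression_metrics(actual, predicted)
print(f"R2={r2:.3f} RMSE={rmse:.3f} MAE={mae:.3f}")
```

RMSE penalizes large misses more heavily than MAE does, which matters for peak-load prediction where a few large errors are costlier than many small ones.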
