gradient boosting machine
Recently Published Documents


Total documents: 193 (five years: 163)
H-index: 13 (five years: 7)

Author(s): Touria Hamim, Faouzia Benabbou, Nawal Sael

The student profile has become an important component of education systems. Many system objectives, such as e-recommendation, e-orientation, e-recruitment and dropout prediction, rely essentially on the profile for decision support. Machine learning plays an important role in this context, and several studies have been carried out for classification, prediction or clustering purposes. In this paper, the authors present a comparative study of different boosting algorithms, which have been used successfully in many fields and for many purposes. In addition, the authors applied the feature selection methods Fisher Score and Information Gain, combined with Recursive Feature Elimination, to enhance the preprocessing task and the models' performance. Using a multi-label dataset to predict the class of student performance in mathematics, the results show that the Light Gradient Boosting Machine (LightGBM) algorithm achieved the best performance among the boosting algorithms when using the Information Gain with Recursive Feature Elimination method.
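The Information Gain filter mentioned above scores each feature by how much it reduces the entropy of the class label. The sketch below is a minimal, stdlib-only illustration for discrete features, not the authors' pipeline: the function names and toy data are hypothetical, and the Recursive Feature Elimination stage that the study combines with it (available, for example, as scikit-learn's `RFE`) is not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - sum_v P(X = v) * H(Y | X = v) for a discrete feature X."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns, labels):
    """Return feature names (hypothetical keys of `columns`) by descending IG."""
    scores = {name: information_gain(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

A feature that perfectly separates the classes gets the full label entropy as its gain, while a feature independent of the label gets zero; a ranking like this is what a subsequent elimination step would start from.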


2022, pp. 80-127
Author(s): Viswanathan Rajagopalan, Houwei Cao

Despite significant advancements in diagnosis and disease management, cardiovascular (CV) disorders remain the No. 1 killer both in the United States and across the world, and innovative and transformative technologies such as artificial intelligence (AI) are increasingly employed in CV medicine. In this chapter, the authors introduce different AI and machine learning (ML) tools, including support vector machines (SVM), gradient boosting machines (GBM), and deep learning (DL) models, and their applicability to advancing CV diagnosis, disease classification, risk prediction, and patient management. The applications include, but are not limited to, electrocardiography, imaging, genomics, and drug research in different CV pathologies such as myocardial infarction (heart attack), heart failure, congenital heart disease, arrhythmias, and valvular abnormalities.


2022, Vol 15 (1), pp. 1-20
Author(s): Ravinder Kumar, Lokesh Kumar Shrivastav

Designing a system for analytics of high-frequency data (big data) is a very challenging and crucial task in data science. Big data analytics involves developing efficient machine learning algorithms together with big data processing techniques or frameworks. Today, data processing systems that can handle high-frequency data efficiently are in high demand. This paper proposes the processing and analytics of stochastic high-frequency stock market data using a modified version of a suitable Gradient Boosting Machine (GBM). The experimental results are compared with deep learning and Auto-Regressive Integrated Moving Average (ARIMA) methods. The modified GBM achieves the highest accuracy (R2 = 0.98) and the lowest error (RMSE = 0.85) of the three approaches.
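For readers unfamiliar with GBM, the following is a textbook sketch of gradient boosting for squared loss with depth-1 regression trees (stumps) on a single feature. It is not the modified GBM proposed in the paper, whose changes the abstract does not describe; all names and hyperparameter values are illustrative assumptions.

```python
def fit_stump(x, residuals):
    """Return (threshold, left_value, right_value) minimizing squared error."""
    best = None
    for threshold in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        if not right:  # the largest threshold leaves the right side empty
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum(
            (r - right_mean) ** 2 for r in right
        )
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all x values identical: use a constant prediction
        mean = sum(residuals) / len(residuals)
        return (float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, xi):
    threshold, left_value, right_value = stump
    return left_value if xi <= threshold else right_value


def fit_gbm(x, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared loss: each stump is fitted to the current
    residuals, i.e. the negative gradients of 0.5 * (y - F)^2 w.r.t. F."""
    base = sum(y) / len(y)  # initial model: the mean of the targets
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        preds = [pi + learning_rate * predict_stump(stump, xi) for pi, xi in zip(preds, x)]
    return base, learning_rate, stumps


def predict_gbm(model, xi):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, xi) for s in stumps)
```

Each round fits a stump to what the current ensemble still gets wrong and adds a shrunken copy of it (`learning_rate`), which is why boosting can be read as gradient descent in function space.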


2022, Vol 15 (1), pp. 1-19
Author(s): Ravinder Kumar, Lokesh Kumar Shrivastav

Stochastic time series analysis of high-frequency stock market data is a very challenging task for analysts due to the lack of efficient tools and techniques for big data analytics. This has opened the door for developers and researchers to build intelligent, machine learning based tools and techniques for data analytics. This paper proposes an ensemble for stock market data prediction using three of the most prominent machine learning based techniques. The stock market dataset has a raw size of 39,364 KB with all attributes and a processed size of 11,826 KB containing 872,435 instances. The proposed work implements an ensemble model comprising Deep Learning, Gradient Boosting Machine (GBM) and distributed Random Forest techniques. The performance of the ensemble model is compared with each of the individual methods, i.e. deep learning, GBM and Random Forest. The ensemble model performs better, achieving the highest accuracy of 0.99 and the lowest error (RMSE) of 0.1.
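The abstract does not state how the three members are combined. Assuming the simplest option, unweighted averaging of their predictions, the ensemble and the RMSE used to compare it with its members can be sketched as follows; the function names are hypothetical and any numbers used with them are made up.

```python
import math


def average_ensemble(*member_predictions):
    """Combine member models by unweighted averaging (an assumption, see text)."""
    return [sum(p) / len(p) for p in zip(*member_predictions)]


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```

When members err in opposite directions, for instance one overshooting while another undershoots, averaging cancels part of the error, so the ensemble RMSE can fall below that of every individual member.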


2021, Vol 11 (1), pp. 229
Author(s): Heekyoung Song, Seongeun Bak, Imhyeon Kim, Jae Yeon Woo, Eui Jin Cho, ...

This retrospective single-center study included patients diagnosed with epithelial ovarian cancer (EOC) who underwent preoperative pelvic magnetic resonance imaging (MRI). The apparent diffusion coefficient (ADC) was analysed on the axial MRI maps that included the largest solid portion of the ovarian mass. The mean ADC values (ADCmean) were derived from regions of interest (ROIs) placed on each largest solid portion. Logistic regression and three types of machine learning (ML) applications were used to analyse the ADCs and clinical factors. Of the 200 patients, 103 had high-grade serous ovarian cancer (HGSOC), and 97 had non-HGSOC (endometrioid carcinoma, clear cell carcinoma, mucinous carcinoma, and low-grade serous ovarian cancer). The median ADCmean of patients with HGSOC was significantly lower than that of patients with non-HGSOC. Low ADCmean and CA 19-9 levels were independent predictors of HGSOC over non-HGSOC. Compared with stage I disease, stage III disease was associated with HGSOC. The gradient boosting machine and extreme gradient boosting machine showed the highest accuracy in distinguishing HGSOC from non-HGSOC histology and in distinguishing among the five histological types of EOC. In conclusion, ADCmean, disease stage at diagnosis, and CA 19-9 level were significant factors for differentiating between EOC histological types.
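The abstract names both a gradient boosting machine and extreme gradient boosting (XGBoost). One practical difference is that XGBoost fits each tree to a second-order (gradient and Hessian) approximation of the loss with an L2 penalty on leaf values. The sketch below shows that leaf-value and split-gain computation for a binary log loss, such as an HGSOC (1) versus non-HGSOC (0) label; it is general background, not the study's code, and `reg_lambda`/`gamma` only mirror the names of XGBoost's `lambda` and `gamma` regularization parameters.

```python
import math


def logistic_grad_hess(y, raw_score):
    """Gradient and Hessian of the log loss w.r.t. the raw (log-odds) score."""
    p = 1.0 / (1.0 + math.exp(-raw_score))
    return p - y, p * (1.0 - p)


def leaf_weight(grads, hessians, reg_lambda=1.0):
    """Optimal leaf value w* = -G / (H + lambda) under the second-order objective."""
    return -sum(grads) / (sum(hessians) + reg_lambda)


def split_gain(left, right, reg_lambda=1.0, gamma=0.0):
    """Gain = 1/2 [G_L^2/(H_L+lambda) + G_R^2/(H_R+lambda) - G^2/(H+lambda)] - gamma.
    `left` and `right` are lists of (gradient, hessian) pairs."""

    def score(pairs):
        g = sum(p[0] for p in pairs)
        h = sum(p[1] for p in pairs)
        return g * g / (h + reg_lambda)

    return 0.5 * (score(left) + score(right) - score(left + right)) - gamma
```

Starting from a raw score of 0 (probability 0.5), a leaf holding only positive cases receives a positive log-odds update, and a split that cleanly separates the two classes yields positive gain; `gamma` acts as the minimum gain a split must exceed.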


2021, Vol 11 (24), pp. 12083
Author(s): Rasa Zalakeviciute, Yves Rybarczyk, Katiuska Alexandrino, Santiago Bonilla-Bedoya, Danilo Mejia, ...

Political and economic protests are building up as financial uncertainty and inequality spread throughout the world. In 2019, Latin America took center stage in a wave of protests. While the social side of protests is widely explored, this study focuses on the evolution of gaseous urban air pollutants during and after one of these events. Changes in concentrations of NO2, CO, O3 and SO2 during and after the strike were studied in Quito, Ecuador, using two approaches: (i) inter-period observational analysis; and (ii) comparison of the observations with a business-as-usual (BAU) scenario developed with a machine learning (ML) gradient boosting machine (GBM). During the strike, both methods showed a large reduction in the concentrations of NO2 (31.5–32.36%) and CO (15.55–19.85%) and a slight reduction for O3 and SO2. The GBM approach showed unique potential, especially over longer prediction periods, to estimate the strike's impact on air quality even after the strike was over. This supports the use of machine learning techniques to estimate the extended effect of changes in human activities on urban gaseous pollution.
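In a BAU comparison of this kind, a model is typically trained on data from normal periods and used to predict what concentrations would have been without the event; the observations are then expressed relative to those predictions. The sketch below shows only that final comparison step. It assumes the change is computed from period means, which the abstract does not specify, and the function name is hypothetical.

```python
def percent_change_vs_bau(observed, bau_predicted):
    """Percentage change of the observed mean concentration relative to the
    business-as-usual (BAU) mean predicted by a model; negative means a reduction."""
    obs_mean = sum(observed) / len(observed)
    bau_mean = sum(bau_predicted) / len(bau_predicted)
    return 100.0 * (obs_mean - bau_mean) / bau_mean
```

For invented values where observed NO2 averages 20 and the BAU prediction averages 30 over the same hours, the function returns about -33.3, i.e. a one-third reduction attributed to the event.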


Author(s): Naipeng Liu, Hui Gao, Zhen Zhao, Yule Hu, Longchen Duan

In gas drilling operations, the rate of penetration (ROP) has an important influence on drilling costs. Predicting ROP makes it possible to optimize drilling operational parameters and reduce overall cost. To predict ROP with satisfactory precision, a stacked generalization ensemble model is developed in this paper. Drilling data were collected from a shale gas survey well in Xinjiang, northwestern China. First, Pearson correlation analysis is used for feature selection. Then, a Savitzky-Golay smoothing filter is used to reduce noise in the dataset. Next, we propose a stacked generalization ensemble model that combines six machine learning models: support vector regression (SVR), extremely randomized trees (ET), random forest (RF), gradient boosting machine (GB), light gradient boosting machine (LightGBM) and extreme gradient boosting (XGB). The stacked model generates meta-data from five base models (SVR, ET, RF, GB, LightGBM), from which an XGB model computes the ROP predictions. The leave-one-out method is then used to verify modeling performance. The stacked model performs better than each single model, achieving R2 = 0.9568 and root mean square error = 0.4853 m/h on the testing dataset. Hence, the proposed approach will be useful in optimizing gas drilling. Finally, the particle swarm optimization (PSO) algorithm is used to optimize the relevant ROP parameters.
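Two of the preprocessing steps above are easy to state precisely. The sketch below pairs a Pearson-correlation filter with the classic 5-point, quadratic Savitzky-Golay smoothing window (coefficients -3, 12, 17, 12, -3 over 35). The correlation threshold, the function names, and the choice to leave the two end points on each side unsmoothed are assumptions for illustration; the abstract gives neither the window length nor the polynomial order actually used, and library routines such as SciPy's `savgol_filter` handle the edges more carefully.

```python
import math

# Savitzky-Golay convolution coefficients for a 5-point window and a
# quadratic (polyorder 2) fit; they sum to 35, hence the normalization.
SG_5_QUADRATIC = (-3, 12, 17, 12, -3)


def savgol_smooth(values):
    """Smooth a series with the 5-point quadratic Savitzky-Golay filter.
    For simplicity, the two points at each end are returned unchanged."""
    out = list(values)
    for i in range(2, len(values) - 2):
        window = values[i - 2 : i + 3]
        out[i] = sum(c * v for c, v in zip(SG_5_QUADRATIC, window)) / 35.0
    return out


def pearson(x, y):
    """Pearson correlation coefficient (undefined if either input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def select_by_correlation(features, target, threshold=0.3):
    """Keep features whose |Pearson r| with the target is at least `threshold`
    (a hypothetical cut-off; the paper's criterion is not given)."""
    return [name for name, col in features.items() if abs(pearson(col, target)) >= threshold]
```

Because the filter is an exact least-squares quadratic fit over each window, it leaves any quadratic trend untouched while damping point-to-point noise, which is what makes it attractive for noisy drilling logs.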


2021
Author(s): Ada Y. Chen, Juyong Lee, Ana Damjanovic, Bernard R. Brooks

We present four tree-based machine learning models for protein pKa prediction. The four models, Random Forest, Extra Trees, eXtreme Gradient Boosting (XGBoost) and Light Gradient Boosting Machine (LightGBM), were trained on three experimental PDB and pKa datasets, two of which included a notable portion of internal residues. We observed similar performance among the four machine learning algorithms. The best model, trained on the largest dataset, performs 37% better than the widely used empirical pKa prediction tool PROPKA. The overall RMSE for this model is 0.69, with surface and buried RMSE values of 0.56 and 0.78, respectively, considering six residue types (Asp, Glu, His, Lys, Cys and Tyr), and 0.63 when considering Asp, Glu, His and Lys only. We provide pKa predictions for proteins in the human proteome from the AlphaFold Protein Structure Database and observed that 1% of Asp/Glu/Lys residues have highly shifted pKa values close to physiological pH.
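The abstract reports an overall RMSE of 0.69 alongside surface and buried values of 0.56 and 0.78. Because squared errors, not RMSEs, add across subsets, the overall figure is a count-weighted quadratic mean of the group values and depends on how many residues fall in each group, which the abstract does not give. A small helper makes the relationship explicit; the counts it takes are inputs, not data from the study.

```python
import math


def pooled_rmse(groups):
    """Overall RMSE from per-group (rmse, count) pairs.

    Squared errors add across groups, so the pooled value is the square root
    of the count-weighted mean of the squared group RMSEs, not the plain mean
    of the group RMSEs."""
    total = sum(n for _, n in groups)
    return math.sqrt(sum(n * r * r for r, n in groups) / total)
```

With equal (hypothetical) group sizes, 0.56 and 0.78 pool to about 0.679, slightly above their arithmetic mean of 0.67; a larger share of buried residues pushes the pooled value toward 0.78.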


2021, Vol 13 (24), pp. 13782
Author(s): Soyoung Park, Sanghun Son, Jaegu Bae, Doi Lee, Jae-Jin Kim, ...

Particulate matter (PM) as an air pollutant is harmful to the human body as well as to the ecosystem. It is crucial to understand the spatiotemporal PM distribution in order to implement reduction methods effectively. However, ground-based air quality monitoring sites are limited in providing reliable concentration values owing to their patchy distribution. Here, we aimed to predict daily PM10 concentrations using boosting algorithms such as gradient boosting machine (GBM), extreme gradient boosting (XGB), and light gradient boosting machine (LightGBM). The three models performed well in estimating the spatial contrasts and temporal variability in daily PM10 concentrations. In particular, the LightGBM model outperformed the GBM and XGB models, with an adjusted R2 of 0.84, a root mean squared error of 12.108 μg/m³, a mean absolute error of 8.543 μg/m³, and a mean absolute percentage error of 16%. Despite its high overall performance, the LightGBM model showed low spatial prediction accuracy near the southwest part of the study area. Additionally, temporal differences were found between the observed and predicted values at high concentrations. These outcomes indicate that such methods can provide intuitive and reliable PM10 concentration values for the management, prevention, and mitigation of air pollution. In the future, accuracy could be improved by considering additional variables related to spatial and seasonal characteristics.
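The four scores reported above are standard regression metrics and can be computed as follows. This is a generic stdlib sketch, not the authors' evaluation code; `n_features` is the number of predictors p in the adjusted R² formula 1 - (1 - R²)(n - 1)/(n - p - 1), and MAPE assumes no observed concentration is zero.

```python
import math


def regression_metrics(y_true, y_pred, n_features):
    """RMSE, MAE, MAPE (%), R^2 and adjusted R^2 for a regression model."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)  # same units as the target, e.g. μg/m³
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean = sum(y_true) / n
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R^2 penalizes R^2 for the number of predictors used.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)
    return {"rmse": rmse, "mae": mae, "mape": mape, "r2": r2, "adj_r2": adj_r2}
```

RMSE and MAE carry the units of the concentration itself, MAPE is unitless, and RMSE weights large misses more heavily than MAE, which is consistent with the larger RMSE than MAE reported above when errors at high concentrations are large.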


2021, Vol 2021, pp. 1-8
Author(s): Yanwei Xu, Weiwei Cai, Liuyang Wang, Tancheng Xie

Most existing fault diagnosis models, such as those based on deep learning, support vector machines, and random forests, suffer from weak generalization ability and long training times. To address these problems, an intelligent rolling bearing fault diagnosis method based on an improved convolutional neural network and a light gradient boosting machine is proposed. First, convolutional layers are used to extract features from the original signal. Second, the generalization ability of the model is improved by replacing the fully connected layer with a global average pooling layer. Then, the extracted features are classified by a light gradient boosting machine. Finally, a verification experiment is carried out; the results show that the average training and diagnosis times of the model are only 39.73 s and 0.09 s, respectively, and that its average classification accuracy is 99.72% on the same-load test set and 95.62% on the variable-load test set, indicating that the proposed model outperforms the other comparison models in both diagnostic efficiency and classification accuracy.
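Replacing the fully connected layer with global average pooling collapses each convolutional channel to a single number, so the features handed to the LightGBM classifier form a fixed-length vector and the large weight matrix of a dense layer disappears. The stdlib sketch below illustrates both points for a 1-D signal; the channel count, sequence length and number of classes in the test values are hypothetical, and the resulting vectors could then be fed to a gradient boosting classifier such as `lightgbm.LGBMClassifier`.

```python
def global_average_pool(feature_maps):
    """Collapse each channel of a 1-D convolutional output to its mean.

    `feature_maps` is a list of channels, each a list of activations over time,
    so a (channels x length) output becomes a vector of size `channels`."""
    return [sum(channel) / len(channel) for channel in feature_maps]


def dense_params(channels, length, units):
    """Weights plus biases of a fully connected layer applied to the
    flattened (channels x length) maps; global average pooling has none."""
    return channels * length * units + units
```

For 64 channels of length 128 feeding 10 output units, a dense layer would need 81,930 trainable parameters, whereas pooling adds none; fewer parameters is the usual argument for the better generalization the authors report.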

