gradient boosting
Recently Published Documents





Touria Hamim ◽  
Faouzia Benabbou ◽  
Nawal Sael

The student profile has become an important component of education systems. Many system objectives, such as e-recommendation, e-orientation, e-recruitment and dropout prediction, rely on the profile for decision support. Machine learning plays an important role in this context, and several studies have been carried out for classification, prediction or clustering purposes. In this paper, the authors present a comparative study of different boosting algorithms that have been used successfully in many fields and for many purposes. In addition, the authors applied the feature selection methods Fisher Score and Information Gain combined with Recursive Feature Elimination to enhance the preprocessing task and the models' performance. Using a multi-label dataset to predict the class of student performance in mathematics, the results show that the Light Gradient Boosting Machine (LightGBM) algorithm achieved the best performance among the boosting algorithms compared when Information Gain was used with Recursive Feature Elimination.
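As a rough illustration of the information gain criterion mentioned in this abstract, the sketch below ranks categorical features by how much they reduce label entropy. It is not the authors' pipeline: the feature names and toy data are hypothetical, and the recursive feature elimination step is left out.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature, labels):
    """Label entropy minus the weighted label entropy after splitting on a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: two categorical features and a pass/fail label.
labels = ["pass", "pass", "fail", "fail"]
features = {
    "has_siblings": ["yes", "no", "yes", "no"],    # carries no information about the label
    "studies_daily": ["yes", "yes", "no", "no"],   # separates the classes perfectly
}

# Rank features from most to least informative.
ranking = sorted(features, key=lambda name: information_gain(features[name], labels), reverse=True)
print(ranking)
```

In a fuller workflow, a ranking like this would only pre-filter the features before a wrapper method such as recursive feature elimination refits the model on shrinking subsets.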

2022 ◽  
Vol 13 (1) ◽  
pp. 1-21
Zhihan Lv ◽  
Ranran Lou ◽  
Hailin Feng ◽  
Dongliang Chen ◽  
Haibin Lv

Two-dimensional arrays of bi-component structures made of cobalt and permalloy elliptical dots with thickness of 25 nm, length 1 mm and width of 225 nm, have been prepared by a self-aligned shadow deposition technique. Brillouin light scattering has been exploited to study the frequency dependence of thermally excited magnetic eigenmodes on the intensity of the external magnetic field, applied along the easy axis of the elements. Scientific information technology has developed rapidly. The aim here is to make people's lives more convenient and to support information management and classification. A machine learning algorithm is improved to obtain an optimized Light Gradient Boosting Machine (LightGBM) algorithm. Then, an Android-based intelligent support information management system is designed on top of LightGBM for big data analysis and classification management of the information it holds. The system takes the company as its subject and comprises modules for employee registration and login, company announcement notices, attendance and attendance management, self-service, and daily tools. Furthermore, the performance of the constructed information management system is analyzed through simulations. Results demonstrate that the training time of the optimized LightGBM algorithm stabilizes at about 100 s and the test time at 0.68 s. Its accuracy reaches 89.24%, which is at least 3.6% higher than that of the other machine learning algorithms. Moreover, the acceleration efficiency analysis of each algorithm suggests that the optimized LightGBM algorithm is suitable for processing large amounts of data: its acceleration effect is more apparent, and its acceleration ratio is higher than that of the other algorithms. Hence, the constructed intelligent support information management system can reach high accuracy while keeping the error in check, with an apparent acceleration effect. Therefore, this model can provide an experimental reference for information classification and management in various fields.

Ahmed Nasser ◽  
Huthaifa AL-Khazraji

Predictive maintenance (PdM) is a successful strategy for reducing cost by minimizing breakdown stoppages and production loss. The massive amount of data that results from integrating the physical and digital systems of the production process makes it possible to apply deep learning (DL) algorithms to fault prediction and diagnosis. This paper presents a hybrid approach to a predictive maintenance problem that combines a convolutional neural network with a long short-term memory network (CNN-LSTM). The proposed CNN-LSTM approach enhances the predictive accuracy and also reduces the complexity of the model. To evaluate the proposed model, it is compared with a regular LSTM and with gradient boosting decision tree (GBDT) methods using a freely available dataset. The PdM model based on the CNN-LSTM method demonstrates better prediction accuracy than the regular LSTM: the average F-Score increases from 93.34% for the regular LSTM to 97.48% for the proposed CNN-LSTM. Compared to related works, the proposed hybrid CNN-LSTM PdM approach achieved better results in terms of accuracy.

Ramsha Saeed ◽  
Hammad Afzal ◽  
Haider Abbas ◽  
Maheen Fatima

Increased connectivity has contributed greatly to facilitating rapid access to information and reliable communication. However, uncontrolled information dissemination has also resulted in the spread of fake news. Fake news might be spread by a group of people or organizations to serve ulterior motives such as political or financial gain, or to damage a country’s public image. Given the importance of timely detection of fake news, the research area has intrigued researchers from all over the world. Most of the work on detecting fake news focuses on the English language. However, automated detection of fake news is important irrespective of the language used for spreading false information. Recognizing the importance of boosting research on fake news detection for low-resource languages, this work proposes a novel semantically enriched technique to effectively detect fake news in Urdu, a low-resource language. A model based on deep contextual semantics learned from a convolutional neural network is proposed. The features learned from the convolutional neural network are combined with other n-gram-based features and are fed to a conventional majority voting ensemble classifier fitted with three base learners: Adaptive Boosting, Gradient Boosting, and Multi-Layer Perceptron. Experiments are performed with different models, and results show that enriching the traditional ensemble learner with deep contextual semantics along with other standard features gives the best results and outperforms the state-of-the-art Urdu fake news detection model.
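The majority voting step described in this abstract can be sketched in a few lines. This is only an illustration of hard voting: the per-article predictions below are made-up placeholders standing in for the outputs of the three fitted base learners, not results from the paper.

```python
from collections import Counter


def majority_vote(votes):
    """Return the label predicted by most base learners for one sample."""
    return Counter(votes).most_common(1)[0][0]


# Hypothetical predictions of the three base learners for four news articles.
base_predictions = {
    "adaptive_boosting": ["fake", "real", "fake", "real"],
    "gradient_boosting": ["fake", "fake", "real", "real"],
    "multi_layer_perceptron": ["real", "fake", "fake", "real"],
}

# Combine the learners sample by sample: zip(*...) groups the votes per article.
ensemble = [majority_vote(votes) for votes in zip(*base_predictions.values())]
print(ensemble)
```

With three voters and two classes there is always a strict majority, so no tie-breaking rule is needed in this setting.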

2022 ◽  
Vol 8 ◽  
Chien-Liang Liu ◽  
You-Lin Tain ◽  
Yun-Chun Lin ◽  
Chien-Ning Hsu

Objective: This study aimed to identify phenotypic clinical features associated with acute kidney injury (AKI) to predict non-recovery from AKI at hospital discharge using electronic health record data. Methods: Data for hospitalized patients in the AKI Recovery Evaluation Study were derived from a large healthcare delivery system in Taiwan between January 2011 and December 2017. Living patients with AKI non-recovery were used to derive and validate multiple predictive models. In total, 64 candidate variables, such as demographic characteristics, comorbidities, healthcare services utilization, laboratory values, and nephrotoxic medication use, were measured within 1 year before the index admission and during hospitalization for AKI. Results: Among the top 20 important features in the predictive model, 8 features had a positive effect on AKI non-recovery prediction: AKI during hospitalization, serum creatinine (SCr) level at admission, receipt of dialysis during hospitalization, baseline comorbidity of cancer, AKI at admission, baseline lymphocyte count, baseline potassium, and low-density lipoprotein cholesterol levels. The AKI non-recovery risk prediction model using the eXtreme Gradient Boosting (XGBoost) algorithm achieved an area under the receiver operating characteristic curve (AUROC) of 0.807, with a sensitivity of 0.724 and a specificity of 0.738, in the temporal validation cohort. Conclusion: The machine learning model approach can accurately predict AKI non-recovery using routinely collected health data in clinical practice. These results suggest that multifactorial risk factors are involved in AKI non-recovery, requiring patient-centered risk assessments and promotion of post-discharge AKI care to prevent AKI complications.
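For readers unfamiliar with the three metrics reported in this abstract, the following sketch computes them from scratch. The labels, scores and the 0.5 decision threshold are invented for illustration and have no connection to the study's cohort.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = non-recovery, 0 = recovery, with model risk scores.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = auroc(y_true, scores)
print(sens, spec, auc)
```

AUROC is threshold-free, whereas sensitivity and specificity depend on the chosen cut-off, which is why a paper typically reports all three.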

Cancers ◽  
2022 ◽  
Vol 14 (2) ◽  
pp. 439
Anetta Sulewska ◽  
Jacek Niklinski ◽  
Radoslaw Charkiewicz ◽  
Piotr Karabowicz ◽  
Przemyslaw Biecek ◽  

LncRNAs have arisen as new players in the world of non-coding RNA. Disrupted expression of these molecules can be tightly linked to the onset, promotion and progression of cancer. The present study estimated the usefulness of 14 lncRNAs (HAGLR, ADAMTS9-AS2, LINC00261, MCM3AP-AS1, TP53TG1, C14orf132, LINC00968, LINC00312, TP73-AS1, LOC344887, LINC00673, SOX2-OT, AFAP1-AS1, LOC730101) for early detection of non-small-cell lung cancer (NSCLC). Total RNA was isolated from paired fresh-frozen cancerous and noncancerous lung tissue from 92 NSCLC patients diagnosed with either adenocarcinoma (LUAD) or lung squamous cell carcinoma (LUSC). The expression level of the lncRNAs was evaluated by quantitative real-time PCR (qPCR). Based on Ct and delta Ct values, logistic regression and gradient boosting decision tree classifiers were built. The latter is a novel, advanced machine learning algorithm with great potential in medical science. The established predictive models showed that a set of 14 lncRNAs accurately discriminates cancerous from noncancerous lung tissues (AUC value of 0.98 ± 0.01) and NSCLC subtypes (AUC value of 0.84 ± 0.09), although the expression of a few molecules was statistically insignificant (SOX2-OT, AFAP1-AS1 and LOC730101 for tumor vs. normal tissue; and TP53TG1, C14orf132, LINC00968 and LOC730101 for LUAD vs. LUSC). However, for subtype discrimination, the simplified logistic regression model based on four variables (delta Ct AFAP1-AS1, Ct SOX2-OT, Ct LINC00261, and delta Ct LINC00673) had even stronger diagnostic potential than the original one (AUC value of 0.88 ± 0.07). Our results demonstrate that the 14-lncRNA signature can be an auxiliary tool to endorse and complement the histological diagnosis of non-small-cell lung cancer.
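To give a sense of how a gradient boosting decision tree model is built, here is a minimal sketch of the general idea: each round fits a small tree (here a one-split stump) to the residuals of the current ensemble and adds it with a small learning rate. It uses squared-error loss on a single made-up feature, so it is a simplification of the principle and not the classifier configuration used in the study.

```python
def fit_stump(xs, residuals):
    """Best single-threshold split of a 1-D feature under squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def stump_output(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_gradient_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    base = sum(ys) / len(ys)              # start from the mean prediction
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared-error loss, the negative gradient is the plain residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump_output(stump, x)
                       for x, p in zip(xs, predictions)]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_output(s, x) for s in stumps)


# Hypothetical toy data: the target jumps from 0 to 1 between x = 3 and x = 4.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]
model = fit_gradient_boosting(xs, ys)
print(predict(model, 2), predict(model, 5))
```

Production libraries follow the same additive scheme but use deeper trees, classification losses such as log-loss, and regularisation.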

2022 ◽  
Vol 17 (1) ◽  
pp. 165-198
Kamil Matuszelański ◽  
Katarzyna Kopczewska

This study presents a comprehensive and modern approach to predicting customer churn, using the example of an e-commerce retail store operating in Brazil. Our approach consists of three stages in which we combine and use three different datasets: numerical data on orders, textual after-purchase reviews and socio-geo-demographic data from the census. At the pre-processing stage, we find topics from text reviews using Latent Dirichlet Allocation, Dirichlet Multinomial Mixture and Gibbs sampling. In the spatial analysis, we apply DBSCAN to get rural/urban locations and analyse neighbourhoods of customers located with zip codes. At the modelling stage, we apply extreme gradient boosting and logistic regression. The quality of the models is verified with area-under-curve and lift metrics. Explainable artificial intelligence, represented by permutation-based variable importance and partial dependence profiles, helps to discover the determinants of churn. We show that customers’ propensity to churn depends on: (i) payment value for the first order, number of items bought and shipping cost; (ii) categories of the products bought; (iii) demographic environment of the customer; and (iv) customer location. At the same time, customers’ propensity to churn is not influenced by: (i) population density in the customer’s area and division into rural and urban areas; (ii) quantitative review of the first purchase; and (iii) qualitative review summarised as a topic.
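Permutation-based variable importance, as named in this abstract, can be sketched as follows: shuffle one column, re-score the model, and record how much the score drops. The churn rule, column layout and data below are hypothetical and only mimic the qualitative finding that payment value matters while population density does not.

```python
import random


def accuracy(model, rows, labels):
    return sum(model(row) == label for row, label in zip(rows, labels)) / len(labels)


def permutation_importance(model, rows, labels, column, n_repeats=20, seed=0):
    """Mean drop in accuracy when the values of one column are shuffled across rows."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for _ in range(n_repeats):
        shuffled = [row[column] for row in rows]
        rng.shuffle(shuffled)
        permuted = [row[:column] + (value,) + row[column + 1:]
                    for row, value in zip(rows, shuffled)]
        drops.append(baseline - accuracy(model, permuted, labels))
    return sum(drops) / n_repeats


# Hypothetical stand-in for a fitted churn model: column 0 is the first-order
# payment value, column 1 is population density (ignored by this rule).
def churn_model(row):
    return 1 if row[0] < 50 else 0


rows = [(20.0, 100), (30.0, 5000), (40.0, 100), (45.0, 5000),
        (60.0, 100), (80.0, 5000), (120.0, 100), (200.0, 5000)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

payment_importance = permutation_importance(churn_model, rows, labels, column=0)
density_importance = permutation_importance(churn_model, rows, labels, column=1)
print(payment_importance, density_importance)
```

A column the model never uses gets an importance of exactly zero, since shuffling it cannot change any prediction.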
