Bagging Ensemble: Recently Published Documents

Total documents: 97 (last five years: 53)
H-index: 13 (last five years: 3)

2021, Vol 14 (1)
Author(s): Mahyar Sharifi, Toktam Khatibi, Mohammad Hassan Emamian, Somayeh Sadat, Hassan Hashemi, et al.

Abstract
Objectives: To develop and propose a machine learning model for predicting glaucoma and identifying its risk factors.
Method: The data analysis pipeline for this study is based on the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology. The main steps of the pipeline are data sampling, preprocessing, classification, and evaluation and validation. Data sampling for the training dataset used balanced sampling based on over-sampling and under-sampling methods. Preprocessing consisted of missing-value imputation and normalization. For the classification step, several machine learning models were designed for predicting glaucoma, including Decision Trees (DTs), K-Nearest Neighbors (K-NN), Support Vector Machines (SVMs), Random Forests (RFs), Extra Trees (ETs), and bagging ensemble methods. In addition, a novel stacking ensemble model built from the superior classifiers is designed and proposed.
Results: The data came from the Shahroud Eye Cohort Study, which includes demographic and ophthalmological data for 5190 participants aged 40-64 living in Shahroud, northeast Iran. The dataset comprised 67 demographic, ophthalmologic, optometric, perimetry, and biometry features for 4561 people: 4474 non-glaucoma participants and 87 glaucoma patients. Experimental results show that DTs and RFs trained on the under-sampled training dataset outperform the compared single classifiers and bagging ensemble methods for predicting glaucoma, with average accuracies of 87.61 and 88.87, sensitivities of 73.80 and 72.35, specificities of 87.88 and 89.10, and areas under the curve (AUC) of 91.04 and 94.53, respectively. The proposed stacking ensemble has an average accuracy of 83.56, a sensitivity of 82.21, a specificity of 81.32, and an AUC of 88.54.
Conclusions: In this study, a machine learning model is proposed and developed to predict glaucoma among persons aged 40-64. The top features for discriminating glaucoma patients from non-glaucoma participants include the number of visual field defects on perimetry, vertical cup-to-disc ratio, white-to-white diameter, systolic blood pressure, pupil barycenter on the Y coordinate, age, and axial length.
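The under-sampling plus Random Forest step reported as the strongest performer above can be sketched as follows. This is a minimal illustration with synthetic data standing in for the Shahroud cohort; the class ratio mimics the 87/4561 imbalance, and all parameter choices are assumptions, not the authors' code.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Imbalanced toy data: roughly 2% positives, mirroring 87 glaucoma cases
# among 4561 participants.
X, y = make_classification(n_samples=4000, n_features=20, weights=[0.98],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Random under-sampling: keep every minority case and draw an equally
# sized subset of majority cases (without replacement).
rng = np.random.default_rng(0)
minority = np.where(y_tr == 1)[0]
majority = rng.choice(np.where(y_tr == 0)[0], size=len(minority),
                      replace=False)
idx = np.concatenate([minority, majority])

# Train a Random Forest on the balanced subset, evaluate AUC on the
# untouched (still imbalanced) test split.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_tr[idx], y_tr[idx])
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
```

Under-sampling discards majority data, which is why the paper also evaluates over-sampling and ensemble variants; on small minority classes like this one it tends to trade a little specificity for much better sensitivity.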


2021, Vol 28 (1)
Author(s): C.I. Ejiofor, L.C. Ochei

Breast cancer is associated with abnormal breast cells originating in the breast tissue, with the potential for malignancy or non-malignancy. Its causes can be linked to genetic or environmental factors. Reliable prediction is integral to the proper management and treatment of breast cancer, and researchers have accordingly placed a high priority on improving the accuracy of breast cancer prediction. This study employs the rich capability of the bagging ensemble machine learning technique for predicting breast cancer. The Heterogeneous Bagging Ensemble Model for Predicting Breast Cancer (HBEM-BC) was built using Decision Tree (DT) and Logistic Regression (LR) base learners. HBEM-BC was implemented in the Python programming language, with the resulting interfaces presented. Validation of HBEM-BC yielded an accuracy of 0.74 (74%), with Root Mean Square Errors (RMSE) of 0.41 (41%) for Logistic Regression (LR) and 0.51 (51%) for Decision Tree (DT), respectively.
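A heterogeneous bagging ensemble of the kind HBEM-BC describes can be sketched as below: bootstrap resamples of the training set, with Decision Tree and Logistic Regression base learners alternated across members and their predictions aggregated by vote. The dataset, member count, and aggregation rule here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
learners = []
for i in range(10):
    # Bootstrap sample (drawn with replacement), heterogeneous base
    # learners: even members get a DT, odd members an LR.
    idx = rng.integers(0, len(X_tr), size=len(X_tr))
    base = (DecisionTreeClassifier(random_state=i) if i % 2 == 0
            else LogisticRegression(max_iter=5000))
    learners.append(base.fit(X_tr[idx], y_tr[idx]))

# Aggregate the heterogeneous pool by majority vote.
votes = np.mean([m.predict(X_te) for m in learners], axis=0)
acc = accuracy_score(y_te, (votes >= 0.5).astype(int))
```

Mixing learner families in the pool adds diversity beyond what bootstrap resampling alone provides, which is the usual motivation for a heterogeneous rather than homogeneous bagging ensemble.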


Information, 2021, Vol 12 (8), pp. 291
Author(s): Moussa Diallo, Shengwu Xiong, Eshete Derb Emiru, Awet Fesseha, Aminu Onimisi Abdulsalami, et al.

Classification algorithms have shown exceptional prediction results in supervised learning. However, these algorithms are not always efficient on real-life datasets because of skewed class distributions: datasets for real-life applications are generally imbalanced. Several methods have been proposed to solve the class imbalance problem. In this paper, we propose a hybrid method combining preprocessing techniques with ensemble learning. The original training set is under-sampled by evaluating the samples with a stochastic measurement (SM) and then training a Multilayer Perceptron on the selected samples to return a balanced training set. The MLPUS (Multilayer Perceptron under-sampling) balanced training set is then aggregated using the bagging ensemble method. We applied our method to the real-life Niger_Rice dataset and forty-four other imbalanced datasets from the KEEL repository. We also compared our method with six existing methods from the literature: the MLP classifier on the original imbalanced dataset, MLPUS, UnderBagging (combining random under-sampling and bagging), RUSBoost, SMOTEBagging (Synthetic Minority Oversampling Technique and bagging), and SMOTEBoost. The results show that our method is competitive with the others. On the real-life Niger_Rice dataset, our proposed method achieves 75.6, 0.73, 0.76, and 0.86 for accuracy, F-measure, G-mean, and ROC, respectively, whereas the MLP classifier on the original imbalanced Niger_Rice dataset gives 72.44, 0.82, 0.59, and 0.76, respectively.
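The under-sampling plus MLP bagging combination can be sketched in the UnderBagging style below. Note the paper's stochastic-measurement (SM) sample selection is simplified here to plain random under-sampling, and the data is synthetic, so this shows only the ensemble structure, not the proposed method itself.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import balanced_accuracy_score

# Imbalanced toy data: about 10% minority class.
X, y = make_classification(n_samples=2000, weights=[0.9], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

rng = np.random.default_rng(1)
minority = np.where(y_tr == 1)[0]
ensemble = []
for i in range(5):
    # Each member trains on all minority samples plus a fresh random
    # majority subset of the same size, then the members are bagged.
    majority = rng.choice(np.where(y_tr == 0)[0], size=len(minority),
                          replace=False)
    idx = np.concatenate([minority, majority])
    mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                        random_state=i)
    ensemble.append(mlp.fit(X_tr[idx], y_tr[idx]))

# Average the members' probabilities and threshold at 0.5.
proba = np.mean([m.predict_proba(X_te)[:, 1] for m in ensemble], axis=0)
bal_acc = balanced_accuracy_score(y_te, (proba >= 0.5).astype(int))
```

Balanced accuracy (and the G-mean the paper reports) is the right lens here, since plain accuracy can look good while the minority class is entirely missed.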


2021, Vol 25 (4), pp. 825-846
Author(s): Ahmad Jaffar Khan, Basit Raza, Ahmad Raza Shahid, Yogan Jaya Kumar, Muhammad Faheem, et al.

Almost all real-world datasets contain missing values. Classifying data with missing values can adversely affect a classifier's performance if not handled correctly. A common approach to classification with incomplete data is imputation, which transforms incomplete data with missing values into complete data. Single imputation methods are mostly less accurate than multiple imputation methods, which in turn are often computationally much more expensive. This study proposes an imputed feature-selected bagging (IFBag) method, which uses multiple imputation, feature selection, and the bagging ensemble learning approach to construct a number of base classifiers that can classify new incomplete instances without any need for imputation in the testing phase. In the bagging ensemble learning approach, data is resampled multiple times with replacement, which introduces diversity into the data and thus yields more accurate classifiers. The experimental results show the proposed IFBag method is considerably fast and gives 97.26% accuracy for classification with incomplete data, as compared to common methods.
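The key trick, classifying incomplete test instances without test-time imputation, can be sketched as follows: each bagged member is trained on its own small feature subset, and an incomplete instance is voted on only by members whose features it has observed. This is a simplified reading of the IFBag idea (the multiple-imputation step on the training side is omitted), with illustrative data and member counts.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
members = []
for i in range(15):
    # Bagging resample (with replacement) plus a random feature subset
    # standing in for the paper's feature-selection step.
    boot = rng.integers(0, len(X_tr), size=len(X_tr))
    feats = rng.choice(X.shape[1], size=5, replace=False)
    clf = DecisionTreeClassifier(random_state=i)
    clf.fit(X_tr[boot][:, feats], y_tr[boot])
    members.append((feats, clf))

# Classify a test instance with missing entries using only the members
# whose selected features are fully observed: no test-time imputation.
x = X_te[0].copy()
x[:5] = np.nan                                  # simulate missing values
votes = [clf.predict(x[feats].reshape(1, -1))[0]
         for feats, clf in members
         if not np.isnan(x[feats]).any()]
pred = int(np.round(np.mean(votes))) if votes else None
```

With enough members and small feature subsets, some members almost always remain usable for any given missingness pattern; if none do, the sketch abstains rather than imputing.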


2021, Vol 11 (14), pp. 6322
Author(s): Zhibin Zhao, Jianfeng Xu, Yanlong Zang, Ran Hu

Diagnosing abnormal transformer oil temperature is of great significance for guaranteeing the normal operation of the transformer. Due to concept drift, abnormal oil temperature diagnosis of the oil-immersed main power transformer is usually unstable with classic data mining methods. Thus, this paper proposes an adaptive abnormal oil temperature diagnosis (AAOTD) method for transformers that accounts for concept drift. First, the bagging ensemble learning method is used to predict the oil temperature. Then, abnormal diagnosis is performed based on the difference between the predicted and the actual measured oil temperature. At the same time, based on a concept drift detection strategy and the AdaBoost ensemble learning method, adaptive updating of the base classifiers in the abnormal diagnosis model is realized. Experiments validated that the proposed algorithm significantly reduces the influence of concept drift and has higher oil temperature prediction accuracy. Furthermore, since this method realizes abnormal oil temperature diagnosis using only existing power grid data resources, without extra monitoring equipment, it is an economical and efficient solution for practical scenarios in the electric power industry.
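The predict-then-compare step described above can be sketched as a residual check: a bagging regressor learns normal oil temperature from operating features, and a reading is flagged when it deviates from the prediction by more than a threshold. The simplified thermal model, feature set, and 3-sigma threshold are illustrative assumptions, and the concept drift handling is omitted.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor

rng = np.random.default_rng(0)
n = 500
load = rng.uniform(0.3, 1.0, n)          # per-unit transformer load
ambient = rng.uniform(10.0, 35.0, n)     # ambient temperature (deg C)
# Simplified thermal model: oil temperature rises with load and ambient.
oil = 20.0 + 30.0 * load + 0.8 * ambient + rng.normal(0.0, 1.0, n)

X = np.column_stack([load, ambient])
# Default base estimator is a decision tree regressor.
model = BaggingRegressor(n_estimators=50, random_state=0).fit(X, oil)

# Flag a reading as abnormal when its residual against the bagging
# prediction exceeds three times the training residual spread.
threshold = 3.0 * (oil - model.predict(X)).std()
reading = 79.0                               # faulty reading, ~15 deg C hot
predicted = model.predict([[0.8, 25.0]])[0]  # model expects ~64 deg C here
is_abnormal = abs(reading - predicted) > threshold
```

Under concept drift (seasonal load patterns, aging insulation) the learned "normal" shifts, which is why the paper pairs this residual check with drift detection and adaptive model updates.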


2021, Vol 11 (1)
Author(s): Eugene Lin, Chieh-Hsin Lin, Hsien-Yuan Lane

Abstract
Genetic variants such as single nucleotide polymorphisms (SNPs) have been suggested as potential molecular biomarkers for predicting the functional outcome of psychiatric disorders. To assess schizophrenia's functional outcomes, such as the Quality of Life Scale (QLS) and the Global Assessment of Functioning (GAF), we leveraged a bagging ensemble machine learning method with a feature selection algorithm resulting from the analysis of 11 SNPs (AKT1 rs1130233, COMT rs4680, DISC1 rs821616, DRD3 rs6280, G72 rs1421292, G72 rs2391191, 5-HT2A rs6311, MET rs2237717, MET rs41735, MET rs42336, and TPH2 rs4570625) in 302 schizophrenia patients from the Taiwanese population. We compared our bagging ensemble machine learning algorithm with other state-of-the-art models such as linear regression, support vector machines, multilayer feedforward neural networks, and random forests. The analysis showed that the bagging ensemble algorithm with feature selection outperformed the other predictive algorithms in forecasting the QLS functional outcome of schizophrenia, using the G72 rs2391191 and MET rs2237717 SNPs. Furthermore, the bagging ensemble algorithm with feature selection surpassed the other predictive algorithms in forecasting the GAF functional outcome, using the AKT1 rs1130233 SNP. The study suggests that the bagging ensemble machine learning algorithm with feature selection may be an applicable approach for building software tools that forecast the functional outcomes of schizophrenia using molecular biomarkers.
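Bagging regression with feature selection over a small SNP panel can be sketched as below. The genotypes and the single causal SNP are simulated (the real QLS/GAF cohort data is not public), and the univariate F-test selector is an illustrative stand-in for the study's feature selection algorithm.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
# 302 patients x 11 SNPs, coded as minor-allele counts 0/1/2.
snps = rng.integers(0, 3, size=(302, 11))
# Simulated outcome: one causal SNP (index 5) with an additive effect,
# loosely standing in for a QLS-style functional score.
qls = 50.0 + 4.0 * snps[:, 5] + rng.normal(0.0, 2.0, 302)

# Feature selection (keep the 2 strongest SNPs by univariate F-test)
# feeding a bagging regressor, mirroring the paper's two-stage design.
model = make_pipeline(SelectKBest(f_regression, k=2),
                      BaggingRegressor(n_estimators=30, random_state=0))
model.fit(snps, qls)
selected = model.named_steps['selectkbest'].get_support(indices=True)
```

With an effect this strong the causal SNP is reliably among the selected features; in real genotype data, effects are far weaker and selection stability itself needs validating, e.g. across cross-validation folds.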

