Flight delay classification warning based on evolutionary under-sampling bagging ensemble learning

2021 ◽  
Author(s):  
Ying Yu ◽  
Haiyan Chen ◽  
Ligang Yuan ◽  
Bing Zhang
Author(s):  
Shaojian Qiu ◽  
Lu Lu ◽  
Siyu Jiang ◽  
Yang Guo

Machine-learning-based software defect prediction (SDP) methods are receiving great attention from researchers in intelligent software engineering. Most existing SDP methods operate in a within-project setting. However, a new SDP task usually has little or no within-project training data from which to learn a supervised prediction model. Therefore, cross-project defect prediction (CPDP), which uses labeled data from source projects to learn a defect predictor for a target project, was proposed as a practical SDP solution. In real CPDP tasks, the class imbalance problem is ubiquitous and has a great impact on the performance of CPDP models. Unlike previous studies that focus on subsampling and individual methods, this study investigated 15 imbalanced learning methods for CPDP tasks, with particular attention to the effectiveness of imbalanced ensemble learning (IEL) methods. We evaluated the 15 methods through extensive experiments on 31 open-source projects derived from five datasets. By analyzing a total of 37504 results, we found that in most cases the IEL method that combines under-sampling and bagging is more effective than the other investigated methods.
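
The combination of under-sampling and bagging mentioned in the abstract above can be illustrated with a minimal sketch. It assumes plain random under-sampling of every class down to the minority size, which is only one possible variant and not necessarily the configuration the study evaluated; the names `under_sampled_bags` and `majority_vote` are hypothetical.

```python
import random
from collections import Counter


def under_sampled_bags(labels, n_bags=10, seed=0):
    """Return n_bags index lists, each balanced by random under-sampling.

    Assumption: every class is reduced to the size of the smallest class,
    and indices are drawn with replacement, as in ordinary bagging.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    minority_size = min(len(indices) for indices in by_class.values())

    bags = []
    for _ in range(n_bags):
        bag = []
        for indices in by_class.values():
            bag.extend(rng.choices(indices, k=minority_size))
        bags.append(bag)
    return bags


def majority_vote(predictions):
    """Combine the base classifiers' predicted labels for one instance."""
    return Counter(predictions).most_common(1)[0][0]


# Toy data: 95 non-defective (0) and 5 defective (1) modules.
labels = [0] * 95 + [1] * 5
bags = under_sampled_bags(labels, n_bags=3)
for bag in bags:
    print(Counter(labels[i] for i in bag))
```

One base classifier would be trained per bag, and `majority_vote` would combine their predictions for each target-project instance.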


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 88322-88332
Author(s):  
Yuanyuan Wang ◽  
Yongsheng Guo ◽  
Xiangjun Zeng ◽  
Jun Chen ◽  
Yang Kong ◽  
...  

2021 ◽  
Vol 25 (4) ◽  
pp. 825-846
Author(s):  
Ahmad Jaffar Khan ◽  
Basit Raza ◽  
Ahmad Raza Shahid ◽  
Yogan Jaya Kumar ◽  
Muhammad Faheem ◽  
...  

Almost all real-world datasets contain missing values. Missing values can degrade the performance of a classifier if they are not handled correctly. A common approach to classification with incomplete data is imputation, which transforms incomplete data with missing values into complete data. Single imputation methods are mostly less accurate than multiple imputation methods, which in turn are often computationally much more expensive. This study proposes an imputed feature selected bagging (IFBag) method, which uses multiple imputation, feature selection and bagging ensemble learning to construct a number of base classifiers that classify new incomplete instances without any need for imputation in the testing phase. In bagging ensemble learning, the data is resampled multiple times with replacement, which can lead to diversity in the data and thus to more accurate classifiers. The experimental results show that the proposed IFBag method is fast and achieves 97.26% accuracy for classification with incomplete data, compared with commonly used methods.
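
The resampling step that bagging relies on can be sketched in a few lines. This is only an illustration of drawing a bootstrap sample with replacement (each draw may pick the same row again), not the IFBag method itself; `bootstrap_sample` is a hypothetical helper name.

```python
import random


def bootstrap_sample(n_rows, rng):
    """Draw n_rows row indices uniformly at random, with replacement."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


rng = random.Random(42)
sample = bootstrap_sample(1000, rng)

# Because rows can repeat, only part of the dataset appears in each sample;
# in expectation this fraction approaches 1 - 1/e (about 63%) for large n.
unique_fraction = len(set(sample)) / 1000
print(f"unique rows in this bootstrap sample: {unique_fraction:.1%}")
```

Each base classifier sees a different subset of rows, which is the source of the diversity the abstract refers to.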


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Xiao-Yan Gao ◽  
Abdelmegeid Amin Ali ◽  
Hassan Shaban Hassan ◽  
Eman M. Anwar

Heart disease is among the deadliest diseases and one of the leading causes of death worldwide. Machine learning is playing an essential role in the medical field. In this paper, ensemble learning methods are used to enhance the performance of heart disease prediction. Two feature extraction methods, linear discriminant analysis (LDA) and principal component analysis (PCA), are used to select essential features from the dataset. Machine learning algorithms and ensemble learning methods are then compared on the selected features. Several metrics are used to evaluate the models: accuracy, recall, precision, F-measure, and ROC. The results show that the bagging ensemble learning method with a decision tree achieved the best performance.


2020 ◽  
Author(s):  
Huibing Zhang ◽  
Junchao Dong

In recent years, with the rapid development of wireless communication networks, M-Commerce has achieved great success. Online shopping on mobile phones, tablets and other wireless communication devices has become a mainstream way for users to consume. Users leave a lot of historical behavior data when shopping on an M-Commerce platform. Using these data to predict users' future purchasing behaviors is of great significance for improving user experience and achieving a win-win result for merchants and users. Therefore, this study proposes a sample balance-based multi-perspective feature ensemble learning method for predicting user purchasing behaviors, which includes the following: 1) A sample balance method combining a "sliding window" with centroid under-sampling was used: the positive sample size was enlarged using the "sliding window", while centroid under-sampling was used to reduce the negative sample size within the "sliding window", so as to obtain balanced historical purchasing behavior data. 2) Features influencing user purchasing behaviors were extracted from three perspectives (user, commodity and interaction) in order to further enrich the feature dimensions. Meanwhile, feature selection was carried out using the XGBSFS algorithm. 3) An ensemble learning model, five-fold cross validation stacking, was proposed for predicting user purchasing behaviors. Three prediction models (XGBoost-Logistics, LightGBM-L2 and cascaded deep forest) were combined so that they could collaborate and improve the overall prediction ability of the ensemble learning model. 4) Experiments were conducted on large-scale real datasets from the Alibaba M-Commerce platform. The experimental results show that the proposed method achieves better prediction performance on various evaluation indexes such as precision and recall.
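
The five-fold cross validation stacking idea in item 3 can be sketched as follows. The sketch shows only how out-of-fold predictions from base models become input features for a meta-learner; the two toy base learners are stand-ins chosen for brevity, not the XGBoost-Logistics, LightGBM-L2 or deep forest models of the study, and all function names are hypothetical.

```python
def out_of_fold_features(X, y, fit_fns, k=5):
    """Build meta-features: each row is predicted by models that never saw it."""
    n = len(X)
    folds = [list(range(start, n, k)) for start in range(k)]
    meta = [[None] * len(fit_fns) for _ in range(n)]
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        X_train = [X[i] for i in train]
        y_train = [y[i] for i in train]
        for m, fit in enumerate(fit_fns):
            predict = fit(X_train, y_train)
            for i in held_out:
                meta[i][m] = predict(X[i])
    return meta


def fit_base_rate(X_train, y_train):
    """Toy base learner: always predicts the training positive rate."""
    rate = sum(y_train) / len(y_train)
    return lambda x: rate


def fit_threshold(X_train, y_train):
    """Toy base learner: thresholds a single numeric feature."""
    pos = [x for x, label in zip(X_train, y_train) if label == 1]
    neg = [x for x, label in zip(X_train, y_train) if label == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1.0 if x >= cut else 0.0


# Toy data: one numeric feature, label 1 for the larger values.
X = [float(v) for v in range(20)]
y = [0] * 10 + [1] * 10
meta = out_of_fold_features(X, y, [fit_base_rate, fit_threshold], k=5)
print(meta[0], meta[19])
```

A meta-learner would then be trained on `meta` and `y`; using out-of-fold predictions keeps the meta-learner from seeing base-model outputs on rows those models were trained on.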


Information ◽  
2021 ◽  
Vol 12 (8) ◽  
pp. 291
Author(s):  
Moussa Diallo ◽  
Shengwu Xiong ◽  
Eshete Derb Emiru ◽  
Awet Fesseha ◽  
Aminu Onimisi Abdulsalami ◽  
...  

Classification algorithms have shown exceptional prediction results in supervised learning. However, these algorithms are not always effective on real-life datasets, whose class distributions are generally imbalanced. Several methods have been proposed to solve the class imbalance problem. In this paper, we propose a hybrid method that combines preprocessing techniques with ensemble learning. The original training set is undersampled by evaluating the samples with a stochastic measurement (SM) and then training on the selected samples with a Multilayer Perceptron to obtain a balanced training set. The MLPUS (Multilayer Perceptron undersampling) balanced training set is aggregated using the bagging ensemble method. We applied our method to the real-life Niger_Rice dataset and to forty-four other imbalanced datasets from the KEEL repository. We also compared our method with six existing methods from the literature: the MLP classifier on the original imbalanced dataset, MLPUS, UnderBagging (combining random under-sampling and bagging), RUSBoost, SMOTEBagging (Synthetic Minority Oversampling Technique and bagging), and SMOTEBoost. The results show that our method is competitive with the other methods. On the real-life Niger_Rice dataset, our proposed method obtains 75.6, 0.73, 0.76, and 0.86 for accuracy, F-measure, G-mean, and ROC, respectively. In contrast, the MLP classifier on the original imbalanced Niger_Rice dataset obtains 72.44, 0.82, 0.59, and 0.76 for accuracy, F-measure, G-mean, and ROC, respectively.
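
The abstract above reports accuracy, F-measure and G-mean side by side, and the reason becomes clearer with a small worked example. The sketch below uses the standard definitions (G-mean as the geometric mean of sensitivity and specificity) and made-up confusion-matrix counts; it does not reproduce any number from the paper, and `imbalance_metrics` is a hypothetical helper name.

```python
import math


def imbalance_metrics(tp, fn, tn, fp):
    """Accuracy, F-measure and G-mean from binary confusion-matrix counts."""
    total = tp + fn + tn + fp
    recall = tp / (tp + fn) if tp + fn else 0.0          # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "accuracy": (tp + tn) / total,
        "f_measure": f_measure,
        "g_mean": math.sqrt(recall * specificity),
    }


# Hypothetical counts: 10 minority and 90 majority test instances.
always_majority = imbalance_metrics(tp=0, fn=10, tn=90, fp=0)
balanced_model = imbalance_metrics(tp=8, fn=2, tn=72, fp=18)
print(always_majority)
print(balanced_model)
```

A classifier that always predicts the majority class reaches high accuracy here while its G-mean is zero, which is why accuracy alone is a weak indicator on imbalanced data.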

