imbalance dataset
Recently Published Documents


Total documents: 25 (five years: 21)
H-index: 4 (five years: 1)

Author(s):  
Bechoo Lal ◽  
Suraj Kumar

This paper builds a predictive model of customer churn in the telecommunication industry using SMOTE and the XGBoost additive model alongside other machine learning techniques. Customer churn is a widely studied problem in the telecommunication industry. Customers may be dissatisfied with customer service, call rates, international plans, data packs, and other factors that significantly affect the service they receive. The researchers use SMOTE and XGBoost to handle the imbalanced dataset and to improve the accuracy with which the model identifies whether a customer will churn. They also compare logistic regression and random forest algorithms for classifying churn and non-churn customers. The predictive model is reported as verified at a 96% accuracy level and as capable of handling an imbalanced dataset. In the data analysis, the scores derived from the confusion matrix are as follows: accuracy is 94%; for the “did not leave” class, precision is 0.97, recall is 0.96, and F1-score is 0.97, with a support of 903; for the churn class, precision is 0.80, recall is 0.81, and F1-score is 0.80, with a support of 160. The report therefore shows that the model classifies 94% of customers correctly and misclassifies the remaining 6%. The researchers conclude that the predictive model is accurate and can handle an imbalanced dataset. They expect the model to help telecommunication companies categorize churn and non-churn customers and adjust their business plans and policies accordingly, to the benefit of their customers.
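The per-class figures quoted above can be cross-checked with a short calculation. The sketch below assumes the usual definitions (F1 as the harmonic mean of precision and recall, and overall accuracy as the support-weighted mean of per-class recall). The function names are illustrative and do not come from the paper. Because the inputs are already rounded to two decimals, small differences from the reported values are expected.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def accuracy_from_recalls(recalls, supports):
    """Overall accuracy as the support-weighted mean of per-class recall.

    recall_i * support_i is the number of correctly classified samples
    of class i, so the sum divided by the total support is accuracy.
    """
    correct = sum(r * s for r, s in zip(recalls, supports))
    return correct / sum(supports)


# Values as reported in the abstract (rounded to two decimals there).
stay = {"precision": 0.97, "recall": 0.96, "support": 903}
churn = {"precision": 0.80, "recall": 0.81, "support": 160}

accuracy = accuracy_from_recalls(
    [stay["recall"], churn["recall"]],
    [stay["support"], churn["support"]],
)

print(round(accuracy, 3))                                       # 0.937
print(round(f1_score(stay["precision"], stay["recall"]), 3))    # 0.965
print(round(f1_score(churn["precision"], churn["recall"]), 3))  # 0.805
```

The recomputed accuracy of about 0.937 agrees with the 94% figure from the confusion matrix, and the recomputed F1 values are within rounding distance of the reported 0.97 and 0.80.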


Information ◽  
2021 ◽  
Vol 12 (8) ◽  
pp. 291
Author(s):  
Moussa Diallo ◽  
Shengwu Xiong ◽  
Eshete Derb Emiru ◽  
Awet Fesseha ◽  
Aminu Onimisi Abdulsalami ◽  
...  

Classification algorithms have shown exceptional prediction results in supervised learning. However, they are not always effective on real-life datasets because of their class distributions: datasets from real-life applications are generally imbalanced. Several methods have been proposed to solve the class imbalance problem. In this paper, we propose a hybrid method that combines preprocessing techniques with ensemble learning. The original training set is undersampled by evaluating the samples with a stochastic measurement (SM) and then training a Multilayer Perceptron on the selected samples to return a balanced training set. The balanced training set produced by MLPUS (Multilayer Perceptron undersampling) is then aggregated using the bagging ensemble method. We applied our method to the real-life Niger_Rice dataset and to forty-four other imbalanced datasets from the KEEL repository. We also compared our method with six existing methods from the literature: the MLP classifier on the original imbalanced dataset, MLPUS, UnderBagging (random undersampling combined with bagging), RUSBoost, SMOTEBagging (Synthetic Minority Oversampling Technique combined with bagging), and SMOTEBoost. The results show that our method is competitive with the other methods. On the Niger_Rice real-life dataset, our proposed method achieves 75.6, 0.73, 0.76, and 0.86 for accuracy, F-measure, G-mean, and ROC, respectively. In contrast, the MLP classifier on the original imbalanced Niger_Rice dataset achieves 72.44, 0.82, 0.59, and 0.76 for accuracy, F-measure, G-mean, and ROC, respectively.
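The general idea of combining undersampling with bagging, and the G-mean metric reported above, can be illustrated with a minimal sketch. This is not the paper's method: it uses plain random undersampling (as in UnderBagging) in place of the stochastic-measurement and MLP-based sample selection, and it stops at building the balanced bags rather than training classifiers on them. The names `balanced_bags` and `g_mean` are hypothetical, and G-mean is assumed to be the geometric mean of sensitivity and specificity.

```python
import math
import random


def balanced_bags(X, y, n_bags=5, minority_label=1, seed=0):
    """Build n_bags balanced training sets by random undersampling.

    Each bag keeps every minority sample and draws an equally sized
    random subset of the majority class (assumption: plain random
    selection, not the stochastic-measurement selection of the paper).
    """
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    n_draw = min(len(minority), len(majority))
    bags = []
    for _ in range(n_bags):
        idx = minority + rng.sample(majority, n_draw)
        rng.shuffle(idx)
        bags.append(([X[i] for i in idx], [y[i] for i in idx]))
    return bags


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Toy data: 10 minority samples (label 1) and 90 majority samples (label 0).
X = [[float(i)] for i in range(100)]
y = [1] * 10 + [0] * 90

bags = balanced_bags(X, y, n_bags=3)
for _, bag_y in bags:
    print(len(bag_y), sum(bag_y))  # 20 samples per bag, 10 of them minority
```

In a full ensemble, one base classifier would be trained per bag and their predictions combined by voting. G-mean is useful here because it drops to zero whenever either class is entirely misclassified, which plain accuracy does not.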


Classification is a major challenge in machine learning in general, and particularly so when tackling the class imbalance problem. A dataset is said to be imbalanced if the class of interest is the minority class and has few examples compared with the majority class. The minority class is also known as the positive class, and the majority class as the negative class. Class imbalance has been a major bottleneck for machine learning practitioners because it often leads to choosing the wrong model for the task. This survey is intended to help researchers choose the right model and the best strategies for handling imbalanced datasets. Proper handling of class-imbalanced data can lead to accurate results. Handling class-imbalanced data in a conventional manner, especially when the level of imbalance is high, may lead to the accuracy paradox (for example, observing 99% accuracy during evaluation when the class distribution is highly imbalanced). Imbalanced class distributions therefore require special consideration. To this end, we deal extensively with approaches to the imbalanced class problem in machine learning, namely the data sampling approach, the cost-sensitive learning approach, and the ensemble approach.
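The accuracy paradox can be made concrete with a toy example. The sketch below uses made-up labels (990 negatives, 10 positives) and a trivial classifier that always predicts the majority class. The function name `evaluate` is illustrative, and balanced accuracy (the mean of per-class recall) is shown only as one common alternative metric, not as something the survey prescribes.

```python
def evaluate(y_true, y_pred, positive=1):
    """Return accuracy, minority-class recall, and balanced accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    recall_pos = tp / (tp + fn) if (tp + fn) else 0.0
    recall_neg = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "minority_recall": recall_pos,
        "balanced_accuracy": (recall_pos + recall_neg) / 2,
    }


# Hypothetical, highly imbalanced data: 1% positive (minority) class.
y_true = [1] * 10 + [0] * 990
# A useless classifier that always predicts the majority (negative) class.
y_pred = [0] * 1000

scores = evaluate(y_true, y_pred)
print(scores)  # accuracy 0.99, minority_recall 0.0, balanced_accuracy 0.5
```

The classifier scores 99% accuracy while detecting none of the minority examples, which is why accuracy alone is a misleading measure on highly imbalanced data.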


Author(s):  
Saiful Islam ◽  
Umme Sara ◽  
Abu Kawsar ◽  
Anichur Rahman ◽  
Dipanjali Kundu ◽  
...  
