The Support Vectors and Random Forest Methods Analysis in the Forecasting Customer Churn Problem in Banking Services

2021 ◽  
pp. 329-336
Author(s):  
Roman I. Dzerzhinsky ◽  
M. D. Trifonov ◽  
E. V. Ledovskaya
Author(s):  
Asia Mahdi Naser alzubaidi ◽  
Eman Salih Al-Shamery

Predicting customer churn is a major and demanding problem in the telecommunications industry. Churn describes customers who leave one telecom service provider for a competitor in search of better service offers. Companies in the telecom sector frequently run customer relationship management offices whose main objective is to win back defecting clients, because retaining long-term customers can be much more beneficial to a company than acquiring new ones. Researchers and practitioners are paying great attention to, and investing more in, developing robust customer churn prediction models, especially in the telecommunications business, and have proposed numerous machine learning approaches. Many classification approaches have been established, but the most effective in recent times are tree-based methods. The main contribution of this research is to predict churners and non-churners in the telecom sector using a projection pursuit Random Forest (PPForest), which uses discriminant feature analysis as a novel extension of the conventional Random Forest approach for learning oblique projection pursuit trees (PPtree). The proposed methodology leverages two discriminant analysis methods to calculate the projection index used in constructing a PPtree. The first method uses Support Vector Machines (SVM) as the classifier in the construction of PPForest to differentiate between churners and non-churners. The second method uses Linear Discriminant Analysis (LDA) to obtain a linear split of the variables at each node during oblique PPtree construction, producing individual classifiers that are robust and more diverse than those of the classical Random Forest. The proposed methods were found to achieve the best results on the performance measures considered, including accuracy, hit rate, ROC curve, Gini coefficient, Kolmogorov-Smirnov statistic, lift coefficient, H-measure, and AUC. Moreover, PPForest with LDA applied directly to the raw data delivers an effective evaluator for the customer churn prediction model.
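Three of the measures listed in this abstract (AUC, Gini coefficient, and Kolmogorov-Smirnov statistic) can be computed directly from a model's churn scores. The following is a minimal sketch rather than the authors' evaluation code. It assumes binary labels with 1 marking a churner, and it uses the common scoring convention Gini = 2 * AUC - 1, which the abstract itself does not define. The function name `auc_gini_ks` is hypothetical.

```python
def auc_gini_ks(scores, labels):
    """Return (AUC, Gini, KS) for churn scores and binary labels (1 = churner)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one churner and one non-churner")

    # AUC as the probability that a random churner outscores a random
    # non-churner, with ties counted as one half (Mann-Whitney formulation).
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))

    # Assumed convention: Gini coefficient derived from AUC.
    gini = 2.0 * auc - 1.0

    # KS statistic: largest gap between the empirical score CDFs
    # of churners and non-churners.
    ks = 0.0
    for t in sorted(set(scores)):
        cdf_pos = sum(1 for p in pos if p <= t) / len(pos)
        cdf_neg = sum(1 for n in neg if n <= t) / len(neg)
        ks = max(ks, abs(cdf_pos - cdf_neg))
    return auc, gini, ks
```

The pairwise AUC loop is quadratic in the number of customers, which is acceptable for a small illustration; a rank-based computation would be preferable on large data sets.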


Customer churn prediction has become a prominent topic in the telecom industry and has gained considerable research attention because of fierce competition; as a result, companies have focused on larger data sets for churn and upselling prediction. A customer churn prediction model detects and identifies customers who are likely to terminate their subscription, and both churn prediction and upselling prediction can be carried out through data mining. In this paper we introduce a model named MRF (Modified Random Forest), which helps improve accuracy and also helps avoid the regression issue. Our methodology was evaluated on the provided Orange datasets. To evaluate our algorithm, we carry out a comparative analysis between the existing and proposed methodologies in two scenarios, churn and upselling. Our model is then compared with various existing churn prediction models; the results indicate that it outperforms the existing methods, including the standard random forest, in terms of AUC and classification accuracy.


Author(s):  
Nana Suryana ◽  
Pratiwi Pratiwi ◽  
Rizki Tri Prasetio

The telecommunications industry faces stiff competition between service providers. This competition results in customer churn, the movement of customers from one service to another. Customer churn is a major problem because it can affect a company's revenue, profitability, survival, and service quality. Identifying early which customers are going to churn is therefore one of the more effective responses, because it helps companies make an effective plan to retain their customers. Customers who have withdrawn from a service usually make up only a small share of a company's data. This scarcity of churn examples makes customer churn difficult to predict and is a challenging issue in machine learning. The general purpose of this research is to predict which customers will switch to another service or withdraw from their current one. The specific purpose is to handle data imbalance in customer churn prediction using data-level optimization through a sampling method, the Synthetic Minority Over-sampling Technique (SMOTE), combined with algorithm-level optimization through Boosting. Several prediction algorithms (random forest, naïve Bayes, decision tree, k-nearest neighbor, and deep learning) are implemented to determine which performs best after optimization with SMOTE and Boosting. The research method is CRISP-DM, a cross-industry framework for data mining research. The results indicate that random forest produces the best accuracy after optimization with SMOTE and Boosting, at 89.19%.
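The data-level step mentioned in this abstract, SMOTE, creates synthetic minority-class (churner) samples by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below illustrates that idea using only the Python standard library. It is not the study's implementation: the function name `smote`, its parameters, and the default `k=5` are assumptions made for this example, and it handles numeric features only.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by neighbour interpolation.

    minority: list of numeric feature vectors (lists of floats).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point, excluding itself.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Random position on the segment from the base point to the neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic
```

Oversampling of this kind is normally applied to the training split only, so that synthetic points do not leak into the data used for evaluation.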


2019 ◽  
Vol 8 (4) ◽  
pp. 3178-3182

Technology in the banking sector has recently been improving rapidly. People all over the world use banking services to manage their financial and property assets, and technological advances are now applied across the sector to serve customers with operational excellence. Banks are therefore responsible for serving people with modern applications that save them time and money, and customer value analysis is needed for a bank to grow its marketing and turnover. Even so, predicting customer churn remains a challenging issue for banks analyzing profit growth. With this in view, we focus on predicting customer churn for banking applications. This paper uses the churn modeling data set extracted from the UCI Machine Learning Repository. Anaconda Navigator, along with the Spyder IDE, is used to implement the Python code. Our contribution is threefold. First, the data set is fitted to various classifiers (Logistic Regression, KNN, Kernel SVM, Naive Bayes, Decision Tree, and Random Forest) and their confusion matrices are analyzed; performance is compared using precision, recall, F-score, and accuracy. Second, the data set is subjected to dimensionality reduction using Principal Component Analysis (PCA) and then fitted to the same classifiers, and their performance is analyzed. Third, the metrics obtained with and without dimensionality reduction are compared. Experimental results show that before applying PCA, the Random Forest classifier is the most effective, with an accuracy of 86%, precision of 0.85, recall of 0.86, and F-score of 0.84. After applying dimensionality reduction, the 2-component PCA with the kernel SVM classifier is the most effective compared to the other classifiers, with an accuracy of 81%, precision of 0.81, recall of 0.81, and F-score of 0.74.
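The four metrics compared in this abstract all follow from the counts in a binary confusion matrix. As a minimal sketch, assuming labels where 1 marks a churner, they could be computed as below. The function name `binary_metrics` is hypothetical, and the abstract does not say whether its reported figures are per-class or averaged, so this shows only the positive-class definitions.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F-score for the positive class."""
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score}
```

On imbalanced churn data, accuracy alone can look high even when few churners are detected, which is one reason the abstract reports precision, recall, and F-score alongside it.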


2021 ◽  
Vol 9 (1) ◽  
pp. 37-45
Author(s):  
Anggito Wicaksono ◽  
Anita Anita ◽  
Tesa Nur Padilah

The telecommunications industry is developing very rapidly, as can be seen from how people use the internet to communicate. This behavior has led to a large number of telecommunications companies and a growing number of internet service providers, which creates competition among providers. Customers have the right to choose a suitable provider and may switch away from their previous one, which is referred to as customer churn. Such switching can reduce the revenue of telecommunications companies, so it is important to address. The aim of this study is to determine the best and most suitable classification algorithm for the customer churn problem. The study follows the CRISP-DM method as its research workflow and applies three classification algorithms, Logistic Regression, Decision Tree, and Random Forest, supported by the Backward Elimination feature selection method to remove insignificant variables. The results show that Logistic Regression with Backward Elimination is the best algorithm, with an accuracy of 82.23%, recall of 57.22%, and AUC of 0.853, which falls in the good classification category.
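Backward Elimination starts from the full set of variables and repeatedly drops the one that contributes least. The abstract does not state which removal criterion the study used (significance tests on coefficients are a common choice). The sketch below therefore assumes a wrapper-style variant driven by a caller-supplied `score` function, such as a cross-validated accuracy. The function name, the `score` callback, and the `tolerance` parameter are all hypothetical.

```python
def backward_elimination(features, score, tolerance=0.0):
    """Greedy backward elimination over a list of feature names.

    score: callable taking a list of feature names and returning a number,
           where higher is better (assumed, e.g. validation accuracy).
    A feature is dropped as long as the best reduced subset scores no more
    than `tolerance` below the current subset.
    """
    selected = list(features)
    best = score(selected)
    while len(selected) > 1:
        # Score every subset obtained by removing exactly one feature.
        trials = [(score([f for f in selected if f != drop]), drop)
                  for drop in selected]
        trial_score, drop = max(trials)
        if trial_score < best - tolerance:
            break  # every removal hurts too much, so stop here
        selected.remove(drop)
        best = trial_score
    return selected, best
```

Each round refits the model once per remaining feature, so the cost grows quadratically with the number of variables; that is usually manageable for the modest feature counts typical of churn data sets.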

