The Support Vectors and Random Forest Methods Analysis in the Forecasting Customer Churn Problem in Banking Services

Predicting customer churn is a major and demanding problem in the telecommunications industry. Churn describes customers who leave one telecom service provider for a competitor in search of better service offers. Telecom companies frequently have customer relationship management offices whose main objective is to win back defecting clients, because preserving long-term customers can be much more beneficial to a company than recruiting new ones. Researchers and practitioners are paying great attention to, and investing more in, robust customer churn prediction models, especially in the telecommunications business, and have proposed numerous machine learning approaches. Many classification approaches have been established, but the most effective in recent times are tree-based methods. The main contribution of this research is to predict churners and non-churners in the telecom sector with a projection pursuit Random Forest (PPForest), which uses discriminant feature analysis as a novel extension of the conventional Random Forest approach to learn oblique projection pursuit trees (PPtree). The proposed methodology leverages two discriminant analysis methods to calculate the projection index used in constructing the PPtree. The first method uses Support Vector Machines (SVM) as the classifier in the construction of PPForest to differentiate between churners and non-churners. The second method uses Linear Discriminant Analysis (LDA) to obtain a linear split over a combination of variables at each node during oblique PPtree construction, producing individual classifiers that are more robust and more diverse than those of a classical Random Forest. The proposed methods were found to achieve the best results on the performance measures considered: accuracy, hit rate, ROC curve, Gini coefficient, Kolmogorov-Smirnov statistic, lift coefficient, H-measure, and AUC. Moreover, PPForest based on LDA applied directly to the raw data delivers an effective evaluator for the customer churn prediction model.
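The LDA-based oblique split described in the abstract can be illustrated with a minimal sketch. It is not the authors' PPForest implementation: it assumes two classes, exactly two features, a toy data set invented here, and a split threshold placed at the midpoint of the projected class means. The function names are hypothetical.

```python
from statistics import mean

def lda_direction(X, y):
    """Fisher LDA direction w = Sw^-1 (mu1 - mu0) for two classes, two features."""
    groups = {c: [x for x, label in zip(X, y) if label == c] for c in (0, 1)}
    means = {c: (mean(p[0] for p in pts), mean(p[1] for p in pts))
             for c, pts in groups.items()}
    # Pooled within-class scatter matrix [[s00, s01], [s01, s11]].
    s00 = s01 = s11 = 0.0
    for c, pts in groups.items():
        for p in pts:
            d0, d1 = p[0] - means[c][0], p[1] - means[c][1]
            s00 += d0 * d0
            s01 += d0 * d1
            s11 += d1 * d1
    det = s00 * s11 - s01 * s01
    diff = (means[1][0] - means[0][0], means[1][1] - means[0][1])
    # Closed-form 2x2 inverse applied to the difference of class means.
    w = ((s11 * diff[0] - s01 * diff[1]) / det,
         (-s01 * diff[0] + s00 * diff[1]) / det)
    return w, means

def oblique_split(X, y):
    """Return (w, threshold): a node sends x right when w . x > threshold."""
    w, means = lda_direction(X, y)
    project = lambda p: w[0] * p[0] + w[1] * p[1]
    # Assumption: cut halfway between the two projected class means.
    threshold = (project(means[0]) + project(means[1])) / 2
    return w, threshold

def goes_right(point, w, threshold):
    return w[0] * point[0] + w[1] * point[1] > threshold

# Toy data (invented): class 1 = churner, class 0 = non-churner.
X = [(1, 2), (2, 1), (2, 3), (3, 2), (6, 7), (7, 6), (7, 8), (8, 7)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
w, threshold = oblique_split(X, y)
sides = [goes_right(p, w, threshold) for p in X]
```

Unlike an axis-aligned tree split, which tests a single variable, the split here tests a linear combination of both variables, which is what makes the tree oblique.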

Customer Churn Prediction and Upselling using MRF (Modified Random Forest) technique

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.c8392.019320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 475-482

Keyword(s):

Data Mining ◽

Comparative Analysis ◽

Random Forest ◽

Prediction Model ◽

Classification Accuracy ◽

Churn Prediction ◽

Customer Churn ◽

Customer Churn Prediction ◽

Telecom Industry ◽

Fierce Competition

Customer churn prediction has become one of the prominent topics in the telecom industry. It has gained a lot of research attention because of fierce competition, and companies have consequently focused on larger volumes of data for churn and upselling prediction. A customer churn prediction model identifies the customers who are likely to terminate their subscription; both churn and upselling prediction can be carried out through data mining. In this paper we introduce a model named MRF (Modified Random Forest), which helps to improve accuracy and to avoid the regression issue. Our methodology was evaluated on the provided Orange datasets. To evaluate the algorithm, we carry out a comparative analysis between the existing and proposed methodologies in two scenarios, churn and upselling. Our model is then compared with various existing churn prediction models; the results indicate that it outperforms the existing methods, including the standard random forest, in terms of AUC and classification accuracy.
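The two evaluation measures named in the abstract, AUC and classification accuracy, can be computed as in the following minimal sketch. The labels and scores are invented for illustration and are not results from the paper; the 0.5 decision threshold is an assumption.

```python
def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases classified correctly at the given score threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for l, p in zip(labels, predictions) if l == p)
    return correct / len(labels)

# Invented example: 1 = churner, scores are predicted churn probabilities.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
model_auc = auc(labels, scores)            # 8 of 9 positive/negative pairs ordered correctly
model_accuracy = accuracy(labels, scores)  # 4 of 6 cases correct at threshold 0.5
```

AUC depends only on how the scores rank the customers, while accuracy also depends on the chosen threshold, which is why the two measures are usually reported together.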

Handling imbalanced data in customer churn prediction using combined sampling and weighted random forest

2014 2nd International Conference on Information and Communication Technology (ICoICT) ◽

10.1109/icoict.2014.6914086 ◽

2014 ◽

Cited By ~ 9

Author(s):

Veronikha Effendy ◽

Adiwijaya ◽

Z. K. A. Baizal

Keyword(s):

Random Forest ◽

Imbalanced Data ◽

Churn Prediction ◽

Customer Churn ◽

Customer Churn Prediction

Application of Feature Extraction Method in Customer Churn Prediction Based on Random Forest and Transduction

Journal of Convergence Information Technology ◽

10.4156/jcit.vol5.issue3.11 ◽

2010 ◽

Vol 5 (3) ◽

pp. 73-78 ◽

Cited By ~ 1

Author(s):

Qiu Yihui ◽

Mi Hong

Keyword(s):

Feature Extraction ◽

Random Forest ◽

Extraction Method ◽

Churn Prediction ◽

Feature Extraction Method ◽

Customer Churn ◽

Customer Churn Prediction

Research on Ctrip Customer Churn Prediction Model Based on Random Forest

Business Intelligence and Information Technology - Lecture Notes on Data Engineering and Communications Technologies ◽

10.1007/978-3-030-92632-8_48 ◽

2021 ◽

pp. 511-523

Author(s):

Zhijie Zhao ◽

Wanting Zhou ◽

Zeguo Qiu ◽

Ang Li ◽

Jiaying Wang

Keyword(s):

Random Forest ◽

Prediction Model ◽

Churn Prediction ◽

Customer Churn ◽

Model Based ◽

Customer Churn Prediction

Customer churn prediction based on LASSO and Random Forest models

IOP Conference Series Materials Science and Engineering ◽

10.1088/1757-899x/631/5/052008 ◽

2019 ◽

Vol 631 ◽

pp. 052008

Author(s):

Qiannan Zhu ◽

Xinyi Yu ◽

Yuankang Zhao ◽

Deyi Li

Keyword(s):

Random Forest ◽

Churn Prediction ◽

Customer Churn ◽

Forest Models ◽

Random Forest Models ◽

Customer Churn Prediction

Penanganan Ketidakseimbangan Data pada Prediksi Customer Churn Menggunakan Kombinasi SMOTE dan Boosting [Handling Data Imbalance in Customer Churn Prediction Using a Combination of SMOTE and Boosting]

IJCIT (Indonesian Journal on Computer and Information Technology) ◽

10.31294/ijcit.v6i1.9545 ◽

2021 ◽

Vol 6 (1) ◽

Author(s):

Nana Suryana ◽

Pratiwi Pratiwi ◽

Rizki Tri Prasetio

Keyword(s):

Data Mining ◽

Deep Learning ◽

Random Forest ◽

Decision Tree ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

K Nearest Neighbor ◽

Customer Churn ◽

Number Of Customers

The telecommunications industry faces stiff competition between service providers. This competition results in customer churn, the movement of customers from one service to another. Customer churn is a major problem because it can affect a company's revenue, profitability, survival, and service quality. Identifying early which customers are going to churn is therefore one of the more effective responses, because it helps companies make an effective plan to retain their customers. Companies usually hold records of only a small number of customers who have withdrawn from their current services. This lack of data makes customer churn difficult to predict and turns the problem into a challenging machine learning issue. The general purpose of this research is to predict which customers will move to another service or withdraw from their current one. The specific purpose is to deal with data imbalance in customer churn prediction using optimization at the data level through a sampling method, namely the Synthetic Minority Over-sampling Technique (SMOTE), combined with optimization at the algorithm level through the Boosting technique. In this study, several prediction algorithms (random forest, naïve Bayes, decision tree, k-nearest neighbor, and deep learning) are implemented to find out which performs best after optimization using SMOTE and Boosting. The method used in this study is CRISP-DM, a cross-industry research framework for data mining. The results indicate that random forest produces the best accuracy after optimization with SMOTE and Boosting, at 89.19%.
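The data-level step described above, SMOTE, creates synthetic minority-class records by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below is a simplified variant under stated assumptions: base samples are drawn at random, distances are Euclidean, and the data and the function name `smote` are invented for illustration. The Boosting step from the abstract is not shown.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from a list of minority-class tuples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
        )[:k]
        neighbour = rng.choice(neighbours)
        # Place the new point at a random position on the connecting segment.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic

# Invented toy data: 8 non-churners (majority) and 3 churners (minority).
majority = [(float(i), float(i % 3)) for i in range(8)]
minority = [(1.0, 5.0), (2.0, 6.5), (3.5, 5.5)]

new_points = smote(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
```

Because each synthetic point lies on a segment between two real minority points, the minority class is enlarged without duplicating existing records exactly.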

Customer Churn Prediction In Telecommunication Industry Using Random Forest Classifier

2020 International Conference on System, Computation, Automation and Networking (ICSCAN) ◽

10.1109/icscan49426.2020.9262288 ◽

2020 ◽

Author(s):

V. Geetha ◽

A. Punitha ◽

A. Nandhini ◽

T. Nandhini ◽

S. Shakila ◽

...

Keyword(s):

Random Forest ◽

Random Forest Classifier ◽

Telecommunication Industry ◽

Churn Prediction ◽

Customer Churn ◽

Customer Churn Prediction

Postulation of Customer Retention in Banking Sector using Machine Learning and Principal Component

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d8020.118419 ◽

2019 ◽

Vol 8 (4) ◽

pp. 3178-3182

Keyword(s):

Machine Learning ◽

Random Forest ◽

Performance Analysis ◽

Dimensionality Reduction ◽

Banking Sector ◽

Principal Component ◽

Experimental Results ◽

Data Set ◽

Customer Churn ◽

Kernel Svm

Technological improvement in the banking sector has grown rapidly in recent years. People around the world use banking services to manage their financial and property assets, and technological advances are being applied across the sector to give customers operational excellence. Banks are therefore responsible for serving people with modern applications that save their time and wealth, and they need customer value analysis to grow their marketing and turnover. Even so, predicting customer churn remains a challenging issue for banks when analyzing profit growth. With this in view, we focus on predicting customer churn for banking applications. This paper uses the churn modeling data set extracted from the UCI Machine Learning Repository. The Python code is implemented in the Spyder IDE through Anaconda Navigator. Our contribution is threefold. First, the data set is fitted to several classifiers (Logistic Regression, KNN, Kernel SVM, Naive Bayes, Decision Tree, and Random Forest) and the confusion matrix of each is analyzed; performance is compared using Precision, Recall, FScore, and Accuracy. Second, the data set is subjected to dimensionality reduction using Principal Component Analysis (PCA) and then fitted to the same classifiers, and their performance is analyzed. Third, the metrics obtained with and without dimensionality reduction are compared. Experimental results show that before applying PCA, the Random Forest classifier is found to be effective compared with the other classifiers, with an accuracy of 86%, Precision of 0.85, Recall of 0.86, and FScore of 0.84. After applying dimensionality reduction, the two-component PCA with the kernel SVM classifier is found to be effective compared with the other classifiers, with an accuracy of 81%, Precision of 0.81, Recall of 0.81, and FScore of 0.74.
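The four metrics compared in the abstract all derive from the confusion matrix. The following minimal sketch shows the arithmetic for the positive (churn) class on invented labels. How the paper averaged its per-class figures is not stated in the abstract, so this is not a reproduction of its numbers.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F-score for the positive class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score}

# Invented example: 1 = customer churned, 0 = customer stayed.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
result = classification_metrics(y_true, y_pred)
# tp=3, fp=1, fn=1, tn=5 -> accuracy 0.8, precision 0.75, recall 0.75, f_score 0.75
```

On imbalanced churn data accuracy can look high while recall on churners stays low, so the metrics are best read together.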

Uji Performa Teknik Klasifikasi untuk Memprediksi Customer Churn [Performance Test of Classification Techniques for Predicting Customer Churn]

Bianglala Informatika ◽

10.31294/bi.v9i1.9992 ◽

2021 ◽

Vol 9 (1) ◽

pp. 37-45

Author(s):

Anggito Wicaksono ◽

Anita Anita ◽

Tesa Nur Padilah

Keyword(s):

Logistic Regression ◽

Feature Selection ◽

Random Forest ◽

Decision Tree ◽

Service Provider ◽

Internet Service Provider ◽

Internet Service ◽

Customer Churn ◽

Backward Elimination ◽

Good Classification

The telecommunications industry is developing very rapidly, as can be seen from the way people use the internet to communicate. This behavior has led to a large number of telecommunications companies and a growing number of internet service providers, which creates competition among providers. Customers have the right to choose a suitable provider and may switch away from their previous one, which is referred to as customer churn. Such switching can reduce a telecommunications company's revenue, so it is important to address. The aim of this study is to determine the best and most suitable classification algorithm for the customer churn problem. The study follows the CRISP-DM method as its research workflow and applies three classification algorithms, namely Logistic Regression, Decision Tree, and Random Forest, supported by the Backward Elimination feature selection method to remove insignificant variables. The results show that Logistic Regression with Backward Elimination is the best algorithm, with an accuracy of 82.23%, a recall of 57.22%, and an AUC of 0.853, which falls in the good classification category.
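Backward Elimination, as named in the abstract, starts from all variables and repeatedly drops the least significant one until every remaining variable passes a significance threshold. The sketch below shows only that loop. It assumes a p-value criterion with a 0.05 threshold, and the p-value function is a hypothetical stand-in with invented fixed values; a real run would refit the model (for example, a logistic regression) on the remaining variables at each step.

```python
def backward_elimination(features, fit_p_values, alpha=0.05):
    """Drop the least significant feature until all p-values are <= alpha.

    fit_p_values(selected) must return a dict mapping each selected
    feature name to its p-value for a model fitted on those features.
    """
    selected = list(features)
    while selected:
        p_values = fit_p_values(selected)
        worst = max(selected, key=lambda f: p_values[f])
        if p_values[worst] <= alpha:
            break  # every remaining feature is significant
        selected.remove(worst)
    return selected

# Hypothetical stand-in: fixed, invented p-values instead of a refitted model.
TOY_P_VALUES = {
    "tenure": 0.001,
    "monthly_charges": 0.03,
    "gender": 0.62,
    "paperless_billing": 0.20,
}

def toy_p_values(selected):
    return {name: TOY_P_VALUES[name] for name in selected}

kept = backward_elimination(TOY_P_VALUES, toy_p_values)
```

With these invented values the loop removes `gender` first, then `paperless_billing`, and stops once the largest remaining p-value (0.03) is below the threshold.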
