Prediction of Customer Churn in Telecom Sector using Clustering Technique

Data is now being produced at an unprecedented rate, and handling and analyzing such large volumes within a limited time is a major challenge. Clustering is widely used to analyze data visually and to support efficient decision making. It is applied in areas such as education, computer science, marketing, insurance, surveillance, fraud detection, and scientific discovery to mine useful information from data. This paper focuses on the unsupervised k-means clustering algorithm for churn prediction in the telecom sector. Selecting the distance measure and the type of data a clustering algorithm can handle is a decisive step in clustering: the measure defines how similar two elements are and how that similarity shapes the clusters. Another major difficulty is determining the goodness, or validity, of the resulting clusters. This paper therefore discusses and addresses these issues with k-means clustering. Experiments were carried out on China telecom data to identify groups of similar customers who are likely to leave the service. The results were analyzed to identify the best features, distance measures, and validity indices for obtaining good-quality clusters.
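As a rough illustration of the workflow the abstract describes (cluster with k-means, then judge cluster validity under different distance measures), here is a minimal sketch in plain Python. It is not the paper's implementation: the toy `points`, the function names, and the choice of the silhouette coefficient as the validity index are all assumptions made for illustration, and the abstract does not say which features or indices the authors used.

```python
import math
import random


def euclidean(a, b):
    """L2 distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """L1 (city block) distance between two equal-length points."""
    return sum(abs(x - y) for x, y in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    """Standard Lloyd's k-means with Euclidean assignment and mean centroids."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers drawn from the data
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centers[c])) for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centers


def silhouette(points, labels, dist):
    """Mean silhouette coefficient under a chosen distance (needs >= 2 clusters)."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points)
               if j != i and labels[j] == labels[i]]
        if not own:  # singleton cluster: score 0 by convention
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance to own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical two-feature customer records (values invented for illustration).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centers = kmeans(points, k=2)
for name, dist in (("euclidean", euclidean), ("manhattan", manhattan)):
    print(name, round(silhouette(points, labels, dist), 3))
```

A silhouette value close to 1 indicates compact, well-separated clusters; comparing the value across distance measures and across values of `k` is one common way to compare candidate clusterings.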

2005, Vol 17 (06), pp. 324-331
Author(s): KUANG-CHIUNG CHANG, CHENG WEN, MING-FENG YEH, REN-GUEY LEE

Similarity or distance measures play an important role in the performance of algorithms for ECG clustering problems. This paper compares four similarity measures for clustering QRS complexes: the city block (L1-norm), Euclidean (L2-norm), normalized correlation coefficient, and simplified grey relational grade. The measures are evaluated on classification accuracy, threshold value selection, noise robustness, execution time, and the capability of automated template selection. The clustering algorithm used is the so-called two-step unsupervised method. For each similarity measure, the best of 10 independent runs of the clustering algorithm, each starting from a randomly selected initial template beat, is used for comparison. To investigate the capability of automated template selection for ECG classification algorithms, the cluster centers generated by the clustering algorithm under each measure are used as templates. This yields four sets of templates, one per measure, which are then used in the k-nearest neighbor classification method to evaluate their performance. Tested on MIT/BIH arrhythmia data, the simplified grey relational grade outperforms the other measures in classification accuracy, threshold value selection, noise robustness, and the capability of automated template selection.
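To make the comparison of measures concrete, the sketch below implements three of the four measures named in the abstract and uses them for nearest-template matching. Several things here are assumptions: the normalized correlation coefficient is taken in its mean-removed (Pearson) form, since the abstract does not give the paper's exact definition; the simplified grey relational grade is omitted because its formula is not stated; and the templates, the test beat, and the helper `nearest_template` are invented for illustration.

```python
import math


def city_block(x, y):
    """L1-norm distance: sum of absolute sample differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def euclidean(x, y):
    """L2-norm distance: square root of summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def normalized_correlation(x, y):
    """Mean-removed correlation in [-1, 1]; assumed form, see lead-in."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(
        sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)
    )
    return num / den if den else 0.0  # flat signals get 0 by convention


def nearest_template(beat, templates, measure, higher_is_closer=False):
    """Return the name of the best-matching template under `measure`.

    Distances pick the minimum; similarities (e.g. correlation) pick the maximum.
    """
    def score(name):
        return measure(beat, templates[name])

    return max(templates, key=score) if higher_is_closer else min(templates, key=score)


# Hypothetical 7-sample QRS templates and one beat (values invented).
templates = {
    "narrow": [0, 0, 1, 5, 1, 0, 0],
    "wide": [0, 1, 3, 4, 3, 1, 0],
}
beat = [0, 0, 2, 10, 2, 0, 0]  # same shape as "narrow", twice the amplitude

print(nearest_template(beat, templates, city_block))
print(nearest_template(beat, templates, euclidean))
print(nearest_template(beat, templates, normalized_correlation, higher_is_closer=True))
```

The toy beat shows one practical difference between the measures: the correlation coefficient is insensitive to amplitude scaling, while the L1 and L2 distances grow with it, which affects how a matching threshold has to be chosen.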

