A review of clustering algorithms: Comparison of DBSCAN and K-mean with oversampling and t-SNE
Abstract: Two of the most widely used and easily implemented algorithms for clustering and classification-based analysis of data in the unsupervised learning domain are Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and K-means cluster analysis. These two techniques handle most cases effectively when the data has a lot of randomness and no clear set to use as a parameter, as there would be for linear or logistic regression algorithms. However, few papers compare the two in a controlled environment to determine which performs better and under what conditions. In this paper, a renal adenocarcinoma dataset is analyzed, both DBSCAN and K-means are then applied to it, and the results are examined. The efficacy of the two techniques is compared, and the merits and demerits observed for each are enumerated. Further, the interaction of t-SNE with the generated clusters is explored.