Repartitioned Optimized K-Mean Centroid Based Partitioned Clustering using Map Reduce in Analyzing High Dimensional Big Data
With the advent of the Internet of Things (IoT), large numbers of IoT devices are deployed across cities to acquire data. These devices generate enormous volumes of data, and analyzing it conventionally requires scaling up existing servers with new hardware and developing an application on a precisely defined framework. This work instead recommends an adapted scale-out approach in which huge multi-dimensional datasets are processed on existing commodity hardware. In this approach, the Hadoop Distributed File System (HDFS) stores the multi-dimensional data, which is then processed and analyzed using the MapReduce (MR) framework. We implemented an optimized, repartitioned K-Means centroid-based partitioning clustering algorithm on the MR framework and applied it to a Smart City dataset containing 10 million objects, each with six attributes. The results show that the proposed approach scales well and computes intra-cluster density and inter-cluster density effectively.
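To make the map and reduce roles in K-Means concrete, the following is a minimal in-memory sketch of the standard (Lloyd) K-Means iteration written in a MapReduce style. It is an illustration under stated assumptions, not the optimized repartitioned variant proposed in this work: the function names `map_phase`, `reduce_phase`, and `kmeans`, the tolerance `tol`, and the iteration cap `max_iter` are all hypothetical, and the sketch runs in a single Python process rather than on Hadoop or HDFS.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_phase(points, centroids):
    """Mapper (hypothetical): emit (nearest_centroid_index, (point, 1)) per point."""
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield nearest, (p, 1)


def reduce_phase(pairs, centroids):
    """Reducer (hypothetical): average the points that share a centroid index."""
    sums = {}
    counts = defaultdict(int)
    for key, (p, n) in pairs:
        if key in sums:
            sums[key] = [a + b for a, b in zip(sums[key], p)]
        else:
            sums[key] = list(p)
        counts[key] += n
    # A centroid that received no points keeps its previous position.
    updated = list(centroids)
    for key, total in sums.items():
        updated[key] = tuple(v / counts[key] for v in total)
    return updated


def kmeans(points, centroids, max_iter=20, tol=1e-9):
    """Repeat map and reduce until no centroid moves more than tol."""
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        shift = max(dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift <= tol:
            break
    return centroids
```

In a real MR job, the mapper output would be shuffled by key across nodes and each iteration would typically run as a separate job reading the current centroids. The sketch keeps only the data flow: assignment in the map step, averaging in the reduce step.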