Big Data Clustering and Hadoop Distributed File System Architecture
Due to advancement in the technological world, there is a great surge in data. The main sources of generating such a large amount of data are social websites, internet sites etc. The large data files are combined together to create a big data architecture. Managing the data file in such a large volume is not easy. Therefore, modern techniques are developed to manage bulk data. To arrange and utilize such big data, Hadoop Distributed File System (HDFS) architecture from Hadoop was presented in the early stage of 2015. This architecture is used when traditional methods are insufficient to manage the data. In this paper, a novel clustering algorithm is implemented to manage a large amount of data. The concepts and frames of Big Data are studied. A novel algorithm is developed using the K means and cosine-based similarity clustering in this paper. The developed clustering algorithm is evaluated using the precision and recall parameters. The prominent results are obtained which successfully manages the big data issue.