EMR: Scalable Clustering of Big HR Data using Evolutionary MapReduce

Author(s):  
Mahdi Bohlouli ◽  
Zhonghua He
Author(s):  
Renchi Yang ◽  
Jieming Shi ◽  
Yin Yang ◽  
Keke Huang ◽  
Shiqi Zhang ◽  
...  

Author(s):  
Mohamed Aymen Ben HajKacem ◽  
Chiheb-Eddine Ben N'Cir ◽  
Nadia Essoussi

Big Data clustering has become an important challenge in data analysis, since many applications require scalable clustering methods to organize such data into groups of similar objects. Given the computational cost of most existing clustering methods, we propose in this paper a new clustering method, referred to as STiMR k-means, that provides a good tradeoff between scalability and clustering quality. The proposed method combines three acceleration techniques: sampling, triangle inequality and MapReduce. Sampling reduces the number of data points used when building cluster prototypes, triangle inequality reduces the number of comparisons when looking for the nearest clusters, and MapReduce provides a parallel framework for running the proposed method. Experiments performed on simulated and real datasets have shown the effectiveness of the proposed method, compared with existing ones, in terms of running time, scalability and internal validity measures.
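
To make the three ingredients concrete, here is a minimal single-machine sketch in Python. It is not the authors' implementation. It assumes Euclidean distance, uses a common Elkan-style pruning rule (if d(c_best, c_j) >= 2 * d(x, c_best), then c_j cannot be closer to x than c_best), and imitates the map and reduce phases with plain functions over in-memory chunks. All function names and parameters (`nearest_center`, `map_chunk`, `reduce_partials`, `sketch_kmeans`, `sample_size`, `n_chunks`) are hypothetical and chosen only for illustration.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.dist(a, b)


def nearest_center(x, centers, center_dists):
    """Return (index, distance, skipped) for the nearest center to x.

    Pruning rule (triangle inequality): if
    d(c_best, c_j) >= 2 * d(x, c_best), then d(x, c_j) >= d(x, c_best),
    so the distance to c_j never needs to be computed.
    """
    best, best_d = 0, dist(x, centers[0])
    skipped = 0
    for j in range(1, len(centers)):
        if center_dists[best][j] >= 2.0 * best_d:
            skipped += 1
            continue
        d = dist(x, centers[j])
        if d < best_d:
            best, best_d = j, d
    return best, best_d, skipped


def map_chunk(chunk, centers, center_dists):
    """Map phase (assumed form): per-cluster partial sums and counts."""
    partial = {}
    for x in chunk:
        j, _, _ = nearest_center(x, centers, center_dists)
        s, n = partial.get(j, ([0.0] * len(x), 0))
        partial[j] = ([a + b for a, b in zip(s, x)], n + 1)
    return partial


def reduce_partials(partials, centers):
    """Reduce phase (assumed form): merge partial sums into new centers."""
    totals = {}
    for partial in partials:
        for j, (s, n) in partial.items():
            ts, tn = totals.get(j, ([0.0] * len(s), 0))
            totals[j] = ([a + b for a, b in zip(ts, s)], tn + n)
    new_centers = []
    for j, c in enumerate(centers):
        if j in totals:
            s, n = totals[j]
            new_centers.append(tuple(v / n for v in s))
        else:
            new_centers.append(c)  # keep a center that received no points
    return new_centers


def sketch_kmeans(points, k, sample_size, n_chunks=4, iters=10, seed=0):
    """Build k prototypes from a random sample of the data."""
    rng = random.Random(seed)
    # Sampling: prototypes are built from a subset, not the full dataset.
    sample = rng.sample(points, min(sample_size, len(points)))
    centers = [tuple(p) for p in rng.sample(sample, k)]
    chunks = [sample[i::n_chunks] for i in range(n_chunks)]
    for _ in range(iters):
        center_dists = [[dist(a, b) for b in centers] for a in centers]
        partials = [map_chunk(c, centers, center_dists) for c in chunks]
        centers = reduce_partials(partials, centers)
    return centers
```

In a real MapReduce deployment each `map_chunk` call would run on a separate worker over its own data split, and only the small per-cluster sums and counts would travel to the reducer. The sketch keeps everything in one process so that the pruning logic is easy to check against a brute-force search.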

