The Research on Large Scale Data Set Clustering Algorithm Based on Tag Set

Part Priority Clustering Algorithm for Large-Scale Data Set

2013 5th International Conference on Intelligent Human-Machine Systems and Cybernetics ◽

10.1109/ihmsc.2013.100 ◽

2013 ◽

Author(s):

Zhihao Yin ◽

Bencheng Yu ◽

Zhifeng Wang ◽

Wang Ran

Keyword(s):

Large Scale ◽

Clustering Algorithm ◽

Data Set ◽

Large Scale Data ◽

Scale Data

Parallel Implementation of Improved K-Means Based on a Cloud Platform

Information Technology And Control ◽

10.5755/j01.itc.48.4.23881 ◽

2019 ◽

Vol 48 (4) ◽

pp. 673-681

Author(s):

Shufen Zhang ◽

Zhiyu Liu ◽

Xuebin Chen ◽

Changyin Luo

Keyword(s):

Large Scale ◽

Clustering Algorithm ◽

Programming Model ◽

Parallel Implementation ◽

Clustering Algorithms ◽

Data Set ◽

Large Scale Data ◽

Sample Density ◽

Scale Data ◽

Selection Of

To address the difficulty the traditional K-Means clustering algorithm has with large-scale data sets, a Hadoop K-Means (HKM) clustering algorithm is proposed. First, the algorithm uses sample density to eliminate the effect of noise points in the data set. Second, it optimizes the selection of the initial center points using the max-min distance idea. Finally, it uses the MapReduce programming model to parallelize the computation. Experimental results show that the proposed algorithm not only produces accurate and stable clustering results, but also overcomes the scalability problems that traditional clustering algorithms encounter with large-scale data.
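
The first two steps of this abstract (density-based noise removal, then max-min distance seeding) can be illustrated with a small serial sketch. This is not the authors' HKM implementation: the function names, the `radius` and `min_neighbors` parameters, the neighbor-count definition of density, and the choice of the first point as the initial seed are all assumptions made for illustration, and the MapReduce parallelization is omitted.

```python
import math


def density_filter(points, radius, min_neighbors):
    """Keep points with at least `min_neighbors` other points within `radius`.

    Assumption: "sample density" is approximated by a plain neighbor count.
    """
    kept = []
    for i, p in enumerate(points):
        neighbors = sum(
            1
            for j, q in enumerate(points)
            if i != j and math.dist(p, q) <= radius
        )
        if neighbors >= min_neighbors:
            kept.append(p)
    return kept


def max_min_centers(points, k):
    """Choose k initial centers with the max-min distance rule."""
    centers = [points[0]]  # assumption: seed with the first point
    while len(centers) < k:
        # For every candidate, measure the distance to its nearest chosen
        # center, then take the candidate for which that distance is largest.
        best = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(best)
    return centers


points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
    (20.0, -20.0),  # isolated point, treated as noise
]
clean = density_filter(points, radius=0.5, min_neighbors=1)
centers = max_min_centers(clean, k=2)
```

Filtering before seeding matters here because the max-min rule favors far-away points, so an isolated noise point would otherwise be picked as a center.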

Clustering Large-Scale Data Based On Modified Affinity Propagation Algorithm

Journal of Artificial Intelligence and Soft Computing Research ◽

10.1515/jaiscr-2016-0003 ◽

2016 ◽

Vol 6 (1) ◽

pp. 23-33 ◽

Cited By ~ 23

Author(s):

Ahmed M. Serdah ◽

Wesam M. Ashour

Keyword(s):

Data Clustering ◽

Large Scale ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Affinity Propagation ◽

Clustering Method ◽

Data Set ◽

Local Cluster ◽

Large Scale Data ◽

Scale Data

Traditional clustering algorithms are no longer suitable for data mining applications that use large-scale data. Many large-scale data clustering algorithms have been proposed in recent years, but most of them do not achieve high-quality clustering. Although Affinity Propagation (AP) is effective and accurate in ordinary data clustering, it is not effective for large-scale data. This paper proposes two methods for large-scale data clustering that rely on a modified version of the AP algorithm. The proposed methods aim for both low time complexity and good clustering accuracy. First, the data set is divided into several subsets using one of two methods: random fragmentation or K-means. Second, each subset is clustered into K clusters using the K-Affinity Propagation (KAP) algorithm to select local cluster exemplars. Third, the inverse weighted clustering algorithm is run on all local cluster exemplars to select well-suited global exemplars for the whole data set. Finally, every data point is assigned to a cluster according to its similarity to the global exemplars. Results show that the proposed method significantly reduces clustering time and produces more accurate clustering results than the AP, KAP, and HAP algorithms.
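
The four-stage structure described in this abstract (fragment, pick local exemplars, pick global exemplars, assign) can be sketched as follows. This is a structural illustration only: `pick_exemplars` is a greedy farthest-point stand-in used in place of both KAP and the inverse weighted clustering step, similarity is taken to be negative Euclidean distance, and every function name and parameter is hypothetical.

```python
import math
import random


def random_fragments(points, n_subsets, seed=0):
    """Stage 1: split the data into subsets by random fragmentation."""
    shuffled = points[:]
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_subsets] for i in range(n_subsets)]


def pick_exemplars(points, k):
    """Stand-in exemplar selector (greedy farthest-point), NOT KAP."""
    exemplars = [points[0]]
    while len(exemplars) < min(k, len(points)):
        exemplars.append(
            max(points, key=lambda p: min(math.dist(p, e) for e in exemplars))
        )
    return exemplars


def cluster_large(points, k, n_subsets, seed=0):
    # Stage 2: local exemplars from each non-empty subset.
    local = []
    for subset in random_fragments(points, n_subsets, seed):
        if subset:
            local.extend(pick_exemplars(subset, k))
    # Stage 3: global exemplars chosen among the local ones
    # (stand-in for the inverse weighted clustering step).
    global_exemplars = pick_exemplars(local, k)
    # Stage 4: assign every point to its most similar (nearest) exemplar.
    labels = [
        min(range(len(global_exemplars)),
            key=lambda i: math.dist(p, global_exemplars[i]))
        for p in points
    ]
    return global_exemplars, labels


blob_a = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2)]
blob_b = [(10.0, 10.0), (10.2, 10.1), (9.9, 10.3), (10.1, 9.8)]
exemplars, labels = cluster_large(blob_a + blob_b, k=2, n_subsets=2)
```

The expensive exemplar search only ever runs on one subset or on the small set of local exemplars, which is where the reduction in clustering time comes from.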

Affinity propagation clustering algorithm based on large-scale data-set

International Journal of Computers and Applications ◽

10.1080/1206212x.2018.1425184 ◽

2018 ◽

Vol 40 (3) ◽

pp. 1-6 ◽

Cited By ~ 4

Author(s):

Limin Wang ◽

Kaiyue Zheng ◽

Xing Tao ◽

Xuming Han

Keyword(s):

Large Scale ◽

Clustering Algorithm ◽

Affinity Propagation ◽

Data Set ◽

Large Scale Data ◽

Affinity Propagation Clustering ◽

Scale Data

Landmark FN-DBSCAN: An Efficient Density-Based Clustering Algorithm with Fuzzy Neighborhood

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2013.p0060 ◽

2013 ◽

Vol 17 (1) ◽

pp. 60-73

Author(s):

Hao Liu ◽

Satoshi Oyama ◽

Masahito Kurihara ◽

Haruhiko Sato

Keyword(s):

Time Complexity ◽

Large Scale ◽

Clustering Algorithm ◽

Data Sets ◽

Clustering Methods ◽

Data Set ◽

Large Scale Data ◽

Density Based Clustering ◽

Scale Data ◽

Large Scale Data Sets

Clustering is an important tool for data analysis, and many clustering techniques have been proposed over the past years. Among them are density-based clustering methods, which have several benefits: the number of clusters does not need to be specified before clustering, the detected clusters can have arbitrary shapes, and outliers can be detected and removed. Recently, density-based algorithms were extended with fuzzy set theory, which has made these algorithms more robust. However, density-based clustering algorithms usually have a time complexity of O(n^2), where n is the number of data points in the data set, which makes them unsuitable for large-scale data sets. In this paper, a novel clustering algorithm called landmark fuzzy neighborhood DBSCAN (landmark FN-DBSCAN) is proposed. A landmark represents a subset of the input data set, which makes the algorithm efficient on large-scale data sets. We give a theoretical analysis of time complexity and space complexity, which shows that both are linear in the size of the data set. The experiments show that landmark FN-DBSCAN is much faster than FN-DBSCAN and provides very good clustering quality.
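
The abstract does not say how landmarks are chosen, so the following is only a generic single-pass sketch of the idea that one landmark stands in for a group of nearby points. The leader-style selection rule, the `radius` parameter, and the function name are assumptions, not the landmark FN-DBSCAN procedure itself.

```python
import math


def select_landmarks(points, radius):
    """Single pass: each point joins its nearest landmark within `radius`,
    or becomes a new landmark if no existing landmark is that close."""
    landmarks = []  # representative points
    members = []    # members[i] holds the points represented by landmarks[i]
    for p in points:
        nearest, best = None, radius
        for i, lm in enumerate(landmarks):
            d = math.dist(p, lm)
            if d <= best:
                nearest, best = i, d
        if nearest is None:
            landmarks.append(p)
            members.append([p])
        else:
            members[nearest].append(p)
    return landmarks, members


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (3.0, 3.0), (3.1, 3.0), (9.0, 9.0)]
landmarks, members = select_landmarks(points, radius=0.5)
sizes = [len(group) for group in members]
```

A density-based step could then run over the landmarks, weighted by `sizes`, instead of over every point. With m landmarks this pass costs on the order of n times m distance computations, which stays close to linear in n when m is much smaller than n.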

ProGen: Provenance database generator for large-scale data set

Journal of Computer Applications ◽

10.3724/sp.j.1087.2008.02737 ◽

2009 ◽

Vol 28 (11) ◽

pp. 2737-2740

Author(s):

Xiao ZHANG ◽

Shan WANG ◽

Na LIAN

Keyword(s):

Large Scale ◽

Data Set ◽

Large Scale Data ◽

Scale Data

A stratified sampling based clustering algorithm for large-scale data

Knowledge-Based Systems ◽

10.1016/j.knosys.2018.09.007 ◽

2019 ◽

Vol 163 ◽

pp. 416-428 ◽

Cited By ~ 11

Author(s):

Xingwang Zhao ◽

Jiye Liang ◽

Chuangyin Dang

Keyword(s):

Large Scale ◽

Clustering Algorithm ◽

Stratified Sampling ◽

Large Scale Data ◽

Scale Data

Matrix-based Kernel Principal Component analysis for large-scale data set

2009 International Joint Conference on Neural Networks ◽

10.1109/ijcnn.2009.5178692 ◽

2009 ◽

Cited By ~ 3

Author(s):

Weiya Shi ◽

Yue-Fei Guo ◽

Xiangyang Xue

Keyword(s):

Principal Component Analysis ◽

Large Scale ◽

Principal Component ◽

Component Analysis ◽

Kernel Principal Component Analysis ◽

Data Set ◽

Large Scale Data ◽

Scale Data

Nonlinear Component Analysis for Large-Scale Data Set Using Fixed-Point Algorithm

Advances in Neural Networks – ISNN 2009 - Lecture Notes in Computer Science ◽

10.1007/978-3-642-01513-7_16 ◽

2009 ◽

pp. 144-151

Author(s):

Weiya Shi ◽

Yue-Fei Guo

Keyword(s):

Fixed Point ◽

Large Scale ◽

Component Analysis ◽

Data Set ◽

Fixed Point Algorithm ◽

Nonlinear Component ◽

Large Scale Data ◽

Scale Data

An Improved Kernel Principal Component Analysis for Large-Scale Data Set

Advances in Neural Networks - ISNN 2010 - Lecture Notes in Computer Science ◽

10.1007/978-3-642-13318-3_2 ◽

2010 ◽

pp. 9-16 ◽

Cited By ~ 1

Author(s):

Weiya Shi ◽

Dexian Zhang

Keyword(s):

Principal Component Analysis ◽

Large Scale ◽

Principal Component ◽

Component Analysis ◽

Kernel Principal Component Analysis ◽

Data Set ◽

Large Scale Data ◽

Scale Data
