Parallel Implementation of Nonparametric Clustering Algorithm HCA-MS on GPU Using CUDA

An extension of a nonparametric clustering algorithm to derive radiometrically homogeneous objects pointed by seeds

International Journal of Remote Sensing ◽

10.1080/01431160110059927 ◽

2002 ◽

Vol 23 (6) ◽

pp. 1197-1205 ◽

Cited By ~ 2

Author(s):

R. Salvador ◽

J. San-Miguel-Ayanz

Keyword(s):

Clustering Algorithm ◽

Nonparametric Clustering

Download Full-text

A heterogeneous parallel implementation of the Markov clustering algorithm for large-scale biological networks on distributed CPU–GPU clusters

The Journal of Supercomputing ◽

10.1007/s11227-021-04204-6 ◽

2022 ◽

Author(s):

You Fu ◽

Wei Zhou

Keyword(s):

Biological Networks ◽

Large Scale ◽

Clustering Algorithm ◽

Parallel Implementation ◽

Gpu Clusters ◽

Markov Clustering

Download Full-text

Parallel Implementation of Ant-Based Clustering Algorithm Based on Hadoop

Lecture Notes in Computer Science - Advances in Swarm Intelligence ◽

10.1007/978-3-642-30976-2_23 ◽

2012 ◽

pp. 190-197 ◽

Cited By ~ 5

Author(s):

Yan Yang ◽

Xianhua Ni ◽

Hongjun Wang ◽

Yiteng Zhao

Keyword(s):

Clustering Algorithm ◽

Parallel Implementation

Download Full-text

Parallel implementation of fuzzy minimals clustering algorithm

Expert Systems with Applications ◽

10.1016/j.eswa.2015.11.011 ◽

2016 ◽

Vol 48 ◽

pp. 35-41 ◽

Cited By ~ 16

Author(s):

Isabel Timón ◽

Jesús Soto ◽

Horacio Pérez-Sánchez ◽

José M. Cecilia

Keyword(s):

Clustering Algorithm ◽

Parallel Implementation

Download Full-text

Parallel Cleaning Algorithm for Similar Duplicate Chinese Data Based on BERT

Scientific Programming ◽

10.1155/2021/5916748 ◽

2021 ◽

Vol 2021 ◽

pp. 1-11

Author(s):

Biqiu Li ◽

Jiabin Wang ◽

Xueli Liu

Keyword(s):

Data Mining ◽

Large Scale ◽

Clustering Algorithm ◽

Parallel Implementation ◽

Data Cleaning ◽

Position Vector ◽

Data Sets ◽

Implementation Scheme ◽

Mining Work ◽

Context Features

Data is an important source of knowledge discovery, but the existence of similar duplicate data not only increases the redundancy of the database but also affects the subsequent data mining work. Cleaning similar duplicate data is helpful to improve work efficiency. Based on the complexity of the Chinese language and the bottleneck of the single machine system to large-scale data computing performance, this paper proposes a Chinese data cleaning method that combines the BERT model and a k-means clustering algorithm and gives a parallel implementation scheme of the algorithm. In the process of text to vector, the position vector is introduced to obtain the context features of words, and the vector is dynamically adjusted according to the semantics so that the polysemous words can obtain different vector representations in different contexts. At the same time, the parallel implementation of the process is designed based on Hadoop. After that, k-means clustering algorithm is used to cluster similar duplicate data to achieve the purpose of cleaning. Experimental results on a variety of data sets show that the parallel cleaning algorithm proposed in this paper not only has good speedup and scalability but also improves the precision and recall of similar duplicate data cleaning, which will be of great significance for subsequent data mining.

Download Full-text

GPUDePiCt: A Parallel Implementation of a Clustering Algorithm for Computing Degenerate Primers on Graphics Processing Units

IEEE/ACM Transactions on Computational Biology and Bioinformatics ◽

10.1109/tcbb.2014.2355231 ◽

2015 ◽

Vol 12 (2) ◽

pp. 445-454 ◽

Cited By ~ 1

Author(s):

Trevor Cickovski ◽

Tiffany Flor ◽

Galen Irving-Sachs ◽

Philip Novikov ◽

James Parda ◽

...

Keyword(s):

Graphics Processing Units ◽

Clustering Algorithm ◽

Parallel Implementation ◽

Degenerate Primers ◽

Graphics Processing

Download Full-text

A Nonparametric Clustering Algorithm with a Quantile-Based Likelihood Estimator

Neural Computation ◽

10.1162/neco_a_00628 ◽

2014 ◽

Vol 26 (9) ◽

pp. 2074-2101 ◽

Cited By ~ 2

Author(s):

Hideitsu Hino ◽

Noboru Murata

Keyword(s):

Clustering Algorithm ◽

Clustering Algorithms ◽

Conditional Entropy ◽

Tuning Parameter ◽

Clustering Methods ◽

Sampling Weights ◽

Data Set ◽

Information Theoretic ◽

Nonparametric Clustering ◽

Conditional Information

Clustering is a representative of unsupervised learning and one of the important approaches in exploratory data analysis. By its very nature, clustering without strong assumption on data distribution is desirable. Information-theoretic clustering is a class of clustering methods that optimize information-theoretic quantities such as entropy and mutual information. These quantities can be estimated in a nonparametric manner, and information-theoretic clustering algorithms are capable of capturing various intrinsic data structures. It is also possible to estimate information-theoretic quantities using a data set with sampling weight for each datum. Assuming the data set is sampled from a certain cluster and assigning different sampling weights depending on the clusters, the cluster-conditional information-theoretic quantities are estimated. In this letter, a simple iterative clustering algorithm is proposed based on a nonparametric estimator of the log likelihood for weighted data sets. The clustering algorithm is also derived from the principle of conditional entropy minimization with maximum entropy regularization. The proposed algorithm does not contain a tuning parameter. The algorithm is experimentally shown to be comparable to or outperform conventional nonparametric clustering methods.

Download Full-text

Efficient parallel implementation of a density peaks clustering algorithm on graphics processing unit

Frontiers of Information Technology & Electronic Engineering ◽

10.1631/fitee.1601786 ◽

2017 ◽

Vol 18 (7) ◽

pp. 915-927 ◽

Cited By ~ 3

Author(s):

Ke-shi Ge ◽

Hua-you Su ◽

Dong-sheng Li ◽

Xi-cheng Lu

Keyword(s):

Clustering Algorithm ◽

Graphics Processing Unit ◽

Parallel Implementation ◽

Processing Unit ◽

Density Peaks ◽

Density Peaks Clustering ◽

Graphics Processing

Download Full-text

Parallel Implementation of Improved K-Means Based on a Cloud Platform

Information Technology And Control ◽

10.5755/j01.itc.48.4.23881 ◽

2019 ◽

Vol 48 (4) ◽

pp. 673-681

Author(s):

Shufen Zhang ◽

Zhiyu Liu ◽

Xuebin Chen ◽

Changyin Luo

Keyword(s):

Large Scale ◽

Clustering Algorithm ◽

Programming Model ◽

Parallel Implementation ◽

Clustering Algorithms ◽

Data Set ◽

Large Scale Data ◽

Sample Density ◽

Scale Data ◽

Selection Of

In order to solve the problem of traditional K-Means clustering algorithm in dealing with large-scale data set, a Hadoop K-Means (referred to HKM) clustering algorithm is proposed. Firstly, according to the sample density, the algorithm eliminates the effects of noise points in the data set. Secondly, it optimizes the selection of the initial center point using the thought of the max-min distance. Finally, it uses a MapReduce programming model to realize the parallelization. Experimental results show that the proposed algorithm not only has high accuracy and stability in clustering results, but can also solve the problems of scalability encountered by traditional clustering algorithms in dealing with large scale data.

Download Full-text

Class: A nonparametric clustering algorithm

Pattern Recognition ◽

10.1016/0031-3203(76)90011-x ◽

1976 ◽

Vol 8 (3) ◽

pp. 107-114 ◽

Cited By ~ 19

Author(s):

Frederick R. Fromm ◽

Richard A. Northouse

Keyword(s):

Clustering Algorithm ◽

Class A ◽

Nonparametric Clustering

Download Full-text