HASTA

2014 ◽ Vol 10 (2) ◽ pp. 39-54
Author(s): Shuliang Wang ◽ Yasen Chen

In this paper, a novel clustering algorithm, HASTA (HierArchical-grid cluStering based on daTA field), is proposed. It models the dataset as a data field by assigning all data objects to quantized grids. Cluster centers in HASTA are located where the local potential reaches its maximum. Cluster edges are identified by analyzing the first-order partial derivative of the potential value, so the full extent of arbitrarily shaped clusters can be detected. The experimental case study demonstrates that HASTA performs effectively on different datasets and can find clusters of arbitrary shape in noisy conditions. In addition, HASTA does not require users to preset the exact number of clusters in the dataset. Furthermore, HASTA is insensitive to the order of data input. The time complexity of HASTA is O(n). These advantages could benefit the mining of big data.
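The grid-and-potential idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the Gaussian-style potential, the parameters `cell_size` and `sigma`, and the helper names are assumptions chosen for illustration, and the brute-force loop is quadratic in the number of occupied cells rather than the O(n) reported in the abstract.

```python
import math
from collections import Counter

def grid_potential(points, cell_size, sigma):
    """Quantize 2-D points into grid cells and compute a potential per occupied cell.

    Assumption: each cell contributes mass * exp(-(d / sigma)^2), where mass is the
    number of points in the cell and d is the distance between cells in cell units.
    """
    mass = Counter(
        (math.floor(x / cell_size), math.floor(y / cell_size)) for x, y in points
    )
    potential = {}
    for (cx, cy) in mass:
        total = 0.0
        for (ox, oy), m in mass.items():
            d2 = (cx - ox) ** 2 + (cy - oy) ** 2  # squared distance in cell units
            total += m * math.exp(-d2 / sigma ** 2)
        potential[(cx, cy)] = total
    return potential

def local_maxima(potential):
    """Return occupied cells whose potential exceeds that of every occupied 8-neighbour."""
    centers = []
    for (cx, cy), p in potential.items():
        neighbours = [
            potential.get((cx + dx, cy + dy))
            for dx in (-1, 0, 1)
            for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0)
        ]
        if all(q is None or p > q for q in neighbours):
            centers.append((cx, cy))
    return sorted(centers)
```

Calling `local_maxima(grid_potential(points, 1.0, 1.0))` on two well-separated groups of points returns one candidate center cell per group, without the number of clusters being given in advance.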

Author(s): Lei Chen ◽ Qinghua Guo ◽ Zhaohua Liu ◽ Long Chen ◽ HuiQin Ning ◽ ...

The gravitational clustering algorithm (Gravc) is a novel dynamic clustering algorithm that can accurately cluster complex datasets of arbitrary shape and distribution. However, high time complexity is a key challenge for the gravitational clustering algorithm. To solve this problem, this paper proposes an improved gravitational clustering algorithm based on local density, called FastGravc. The main contributions of this paper are as follows. First, a local density-based data compression strategy is designed to reduce the number of data objects and the number of neighbors of each object participating in the gravitational clustering algorithm. Second, the traditional gravity model is optimized to adapt to the quality differences between objects caused by the data compression strategy. Third, the improved gravitational clustering algorithm FastGravc is proposed by integrating the above optimization strategies. Finally, extensive experimental results on synthetic and real-world datasets verify the effectiveness and efficiency of the FastGravc algorithm.


2011 ◽ Vol 7 (4) ◽ pp. 43-63
Author(s): Shuliang Wang ◽ Wenyan Gan ◽ Deyi Li ◽ Deren Li

In this paper, a data field is proposed to group data objects for hierarchical clustering by simulating their mutual interactions and opposite movements. Inspired by fields in physical space, a data field that simulates a nuclear field is presented to describe the interaction between objects in data space. In the data field, the self-organized process of equipotential lines over many data objects reveals their hierarchical clustering characteristics. During the clustering process, a random sample is first generated to optimize the impact factor. The masses of the data objects are then estimated to select core data objects with nonzero masses. Taking the core data objects as the initial clusters, the clusters are iteratively merged hierarchy by hierarchy with good performance. The results of a case study show that the data field is capable of hierarchical clustering on objects of varying size, shape, or granularity without user-specified parameters, while also considering the object features inside the clusters and removing outliers from noisy data. The comparisons illustrate that data field clustering performs better than K-means, BIRCH, CURE, and CHAMELEON.


2021 ◽ Vol 25 (6) ◽ pp. 1453-1471
Author(s): Chunhua Tang ◽ Han Wang ◽ Zhiwen Wang ◽ Xiangkun Zeng ◽ Huaran Yan ◽ ...

Most density-based clustering algorithms suffer from difficult parameter setting, high time complexity, poor noise recognition, and weak clustering of datasets with uneven density. To solve these problems, this paper proposes the FOP-OPTICS algorithm (Finding of the Ordering Peaks Based on OPTICS), which is a substantial improvement of OPTICS (Ordering Points To Identify the Clustering Structure). The proposed algorithm finds the demarcation point (DP) from the Augmented Cluster-Ordering generated by OPTICS and uses the reachability-distance of the DP as the neighborhood radius eps of its corresponding cluster. It overcomes the weakness of most algorithms in clustering datasets with uneven densities. By computing the distance to the k-nearest neighbor of each point, it reduces the time complexity of OPTICS; by calculating density-mutation points within the clusters, it can efficiently recognize noise. The experimental results show that FOP-OPTICS has the lowest time complexity and outperforms other algorithms in parameter setting and noise recognition.
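The k-nearest-neighbor distance mentioned above can be sketched in a few lines of Python. This is a brute-force illustration of the quantity only, not the FOP-OPTICS procedure; the function name `k_distance` is an assumption, and the loop costs O(n^2), so it says nothing about the running time reported in the paper.

```python
import math

def k_distance(points, k):
    """Return, for each point, the distance to its k-th nearest neighbour (brute force)."""
    result = []
    for i, p in enumerate(points):
        # Distances to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result
```

Points in dense regions get a small k-distance, while isolated points get a large one, which is why the quantity is a common starting point for choosing a neighborhood radius and for flagging noise.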


2016 ◽ Vol 16 (6) ◽ pp. 27-42
Author(s): Minghan Yang ◽ Xuedong Gao ◽ Ling Li

Although the Clustering Algorithm Based on Sparse Feature Vector (CABOSFV) and its related algorithms are efficient for clustering high-dimensional sparse data, they have several imperfections. Imperfections such as subjective parameter designation and sensitivity of the clustering process to input order eventually degrade the time complexity and quality of the algorithm. This paper proposes a parameter adjustment method for Bidirectional CABOSFV for optimization purposes. By optimizing the Parameter Vector (PV) and Parameter Selection Vector (PSV) with clustering validity as the objective function, an improved Bidirectional CABOSFV algorithm using simulated annealing is proposed, which removes the need to determine initial parameters. Experiments on UCI data sets show that the proposed algorithm, which can perform multi-adjustment clustering, achieves higher accuracy than single-adjustment clustering, along with a decreased time complexity through iterations.
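The role of simulated annealing in such a parameter search can be sketched generically in Python. This is not the Bidirectional CABOSFV procedure: the abstract does not specify how PV and PSV are encoded, so `objective` stands in for a clustering-validity score to be maximized, `neighbour` is a hypothetical function that perturbs a parameter setting, and the cooling schedule is an assumption.

```python
import math
import random

def simulated_annealing(objective, initial, neighbour,
                        t_start=1.0, t_end=1e-3, cooling=0.95, seed=0):
    """Maximize objective(params) with a basic geometric-cooling annealing loop."""
    rng = random.Random(seed)
    current, current_score = initial, objective(initial)
    best, best_score = current, current_score
    t = t_start
    while t > t_end:
        candidate = neighbour(current, rng)
        score = objective(candidate)
        # Always accept improvements; accept worse candidates with
        # probability exp((score - current_score) / t).
        if score >= current_score or rng.random() < math.exp((score - current_score) / t):
            current, current_score = candidate, score
            if score > best_score:
                best, best_score = candidate, score
        t *= cooling  # lower the temperature so worse moves become rarer
    return best, best_score
```

Because the loop occasionally accepts worse settings while the temperature is high, the search is less dependent on the starting parameters than a purely greedy adjustment would be.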


Author(s): Chengcui Zhang ◽ Liping Zhou ◽ Wen Wan ◽ Jeffrey Birch ◽ Wei-Bang Chen

Most existing object-based image retrieval systems are based on single object matching, with its main limitation being that one individual image region (object) can hardly represent the user’s retrieval target, especially when more than one object of interest is involved in the retrieval. Integrated Region Matching (IRM) has been used to improve the retrieval accuracy by evaluating the overall similarity between images and incorporating the properties of all the regions in the images. However, IRM does not take the user’s preferred regions into account and has undesirable time complexity. In this article, we present a Feedback-based Image Clustering and Retrieval Framework (FIRM) using a novel image clustering algorithm and integrating it with Integrated Region Matching (IRM) and Relevance Feedback (RF). The performance of the system is evaluated on a large image database, demonstrating the effectiveness of our framework in catching users’ retrieval interests in object-based image retrieval.


2020 ◽ Vol 2020 ◽ pp. 1-13
Author(s): Ziqi Jia ◽ Ling Song

The k-prototypes algorithm is a hybrid clustering algorithm that can process both categorical and numerical data. In this study, the method of initial cluster center selection was improved and a new hybrid dissimilarity coefficient was proposed. Based on this coefficient, a weighted k-prototypes clustering algorithm (WKPCA) was proposed. The WKPCA algorithm not only improves the selection of initial cluster centers, but also puts forward a new method to calculate the dissimilarity between data objects and cluster centers. A real UCI dataset was used to test the WKPCA algorithm. Experimental results show that the WKPCA algorithm is more efficient and robust than other k-prototypes algorithms.
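For context, the classic k-prototypes dissimilarity combines squared Euclidean distance on the numerical attributes with a weighted count of mismatches on the categorical attributes. The Python sketch below shows that baseline measure only; it is not the hybrid dissimilarity coefficient proposed in the paper, which the abstract does not define, and the function name and the weight `gamma` are illustrative.

```python
def mixed_dissimilarity(obj_num, obj_cat, center_num, center_cat, gamma=1.0):
    """Classic k-prototypes dissimilarity between an object and a cluster center.

    obj_num / center_num: numerical attribute values.
    obj_cat / center_cat: categorical attribute values.
    gamma: weight balancing the categorical part against the numerical part.
    """
    numeric = sum((a - b) ** 2 for a, b in zip(obj_num, center_num))
    mismatches = sum(1 for a, b in zip(obj_cat, center_cat) if a != b)
    return numeric + gamma * mismatches
```

For example, `mixed_dissimilarity([1.0, 2.0], ["red", "small"], [0.0, 0.0], ["red", "large"], gamma=0.5)` adds a numerical part of 5.0 to one weighted mismatch of 0.5.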


2013 ◽ Vol 312 ◽ pp. 714-718
Author(s): Zi Qi Zhao ◽ Xiao Jun Ye ◽ Chun Ping Li

Multidimensional clustering analysis includes a class of cell-based clustering methods that are fast and time-efficient, of which CLIQUE is the main representative. Building on the time-efficient CLIQUE clustering algorithm, multidimensional k-anonymity can be achieved with the KLIQUE algorithm. Because KLIQUE is based on CLIQUE, it retains the time-complexity characteristics of CLIQUE and can exploit CLIQUE's advantage in processing large volumes of multidimensional data.
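The cell-based step that CLIQUE relies on can be sketched in Python as follows. This covers only the dense-unit search for 2-D data, not KLIQUE or its anonymization step, which the abstract does not describe; the function name, the interval count `xi`, and the density threshold `tau` follow common descriptions of CLIQUE and are assumptions here.

```python
from collections import Counter

def dense_units(points, xi, tau, lo=0.0, hi=1.0):
    """Find dense 1-D intervals per dimension and dense 2-D cells for 2-D points.

    Each dimension is split into xi equal intervals over [lo, hi]; a unit is
    dense when it holds more than a fraction tau of all points.
    """
    n = len(points)
    width = (hi - lo) / xi

    def interval(v):
        # Clamp so that v == hi falls into the last interval.
        return min(int((v - lo) / width), xi - 1)

    dense_1d = []
    for dim in range(2):
        counts = Counter(interval(p[dim]) for p in points)
        dense_1d.append({i for i, c in counts.items() if c / n > tau})

    # Apriori-style pruning: a 2-D cell can only be dense if both of its
    # 1-D projections are dense, so other cells are never counted.
    counts_2d = Counter(
        (interval(x), interval(y))
        for x, y in points
        if interval(x) in dense_1d[0] and interval(y) in dense_1d[1]
    )
    dense_2d = {cell for cell, c in counts_2d.items() if c / n > tau}
    return dense_1d, dense_2d
```

Each pass touches every point once, which is the source of the time efficiency that cell-based methods are known for.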


2010 ◽ Vol 22 (1) ◽ pp. 273-288
Author(s): Florian Landis ◽ Thomas Ott ◽ Ruedi Stoop

We propose a Hebbian learning-based data clustering algorithm using spiking neurons. The algorithm is capable of distinguishing between clusters and noisy background data and finds an arbitrary number of clusters of arbitrary shape. These properties render the approach particularly useful for visual scene segmentation into arbitrarily shaped homogeneous regions. We present several application examples, and in order to highlight the advantages and the weaknesses of our method, we systematically compare the results with those from standard methods such as k-means and Ward's linkage clustering. The analysis demonstrates that the clustering ability of the proposed algorithm is more powerful than that of the two competing methods, and that its time complexity is also more modest than that of its generally used strongest competitor.


Author(s): Mehak Nigar Shumaila

Clustering, also known as cluster analysis, is a learning problem that is solved without human supervision. The technique is widely and effectively used in data analysis to observe and identify interesting, useful, or desired patterns in data. It works by dividing the data into groups of similar objects based on the characteristics it identifies; each such group is called a cluster. A cluster consists of objects that are similar to the other objects in the same cluster and dissimilar to objects in other clusters. Clustering is significant in many aspects of data analysis because it reveals the intrinsic grouping of objects in a batch of unlabeled raw data based on their attributes. No single textbook criterion for a good clustering exists, because the process differs and is customized for each user's needs. There is no outright best clustering algorithm, as the choice depends heavily on the user's scenario and needs. This paper compares and studies two clustering algorithms: k-means and mean shift. The algorithms are compared according to the following factors: time complexity, training, prediction performance, and accuracy.
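The practical difference between the two algorithms can be seen in a minimal 1-D Python sketch. It is an illustration under simplifying assumptions (scalar data, a flat mean-shift kernel, fixed iteration counts, hypothetical function names), not the experimental setup of the paper: k-means needs initial centers and therefore the number of clusters, while mean shift needs a bandwidth and finds the number of modes itself.

```python
def kmeans_1d(values, centers, iterations=20):
    """Lloyd's algorithm on scalars: assign to the nearest center, then recompute means."""
    centers = list(centers)
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center when a group is empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return sorted(centers)

def mean_shift_1d(values, bandwidth, iterations=50):
    """Flat-kernel mean shift on scalars: move each start point to the local mean."""
    modes = []
    for v in values:
        m = v
        for _ in range(iterations):
            window = [u for u in values if abs(u - m) <= bandwidth]
            m = sum(window) / len(window)
        # Merge start points that converged to (nearly) the same mode.
        if not any(abs(m - existing) < bandwidth for existing in modes):
            modes.append(m)
    return sorted(modes)
```

On `[1.0, 1.2, 0.8, 5.0, 5.2, 4.8]`, `kmeans_1d(values, [0.0, 6.0])` and `mean_shift_1d(values, 1.0)` both settle near 1.0 and 5.0, but only the second call arrives at two clusters without being told so. The mean-shift loop recomputes a window over all values for every start point, which hints at why its time complexity is typically higher.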

