HASTA

In this paper, a novel clustering algorithm, HASTA (HierArchical-grid cluStering based on daTA field), is proposed. It models the dataset as a data field by assigning all data objects to quantized grid cells. Cluster centers in HASTA are located where the local potential reaches its maximum. Cluster edges are identified by analyzing the first-order partial derivatives of the potential, so the full extent of arbitrarily shaped clusters can be detected. The experimental case study demonstrates that HASTA performs effectively on different datasets and can find clusters of arbitrary shape in noisy conditions. In addition, HASTA does not require users to preset the exact number of clusters in the dataset, and it is insensitive to the order of data input. The time complexity of HASTA is O(n). These advantages may benefit the mining of big data.
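A minimal sketch of the grid-plus-data-field idea, assuming a Gaussian-shaped potential φ(c) = Σ m_k·exp(−(d/σ)²) over grid cells, where a cell's mass m_k is its point count and d is the distance between cell indices. HASTA's exact potential function, grid hierarchy, and derivative-based edge test are not given in the abstract; here a discrete gradient ascent (each cell climbs to its highest-potential neighbor) stands in for them, and the names `quantize`, `potential`, and `assign_to_centers` are hypothetical.

```python
import math
from collections import Counter

def quantize(points, cell_size):
    """Map each 2-D point to an integer grid cell; a cell's count is its 'mass'."""
    return Counter((math.floor(x / cell_size), math.floor(y / cell_size)) for x, y in points)

def potential(cells, sigma):
    """Potential of every occupied cell under an assumed Gaussian-shaped data field:
    phi(c) = sum of mass * exp(-(d / sigma)^2), d = distance between cell indices.
    Only cells within about 3*sigma are summed, since farther ones contribute ~0."""
    reach = math.ceil(3 * sigma)
    pot = {}
    for (i, j) in cells:
        total = 0.0
        for di in range(-reach, reach + 1):
            for dj in range(-reach, reach + 1):
                mass = cells.get((i + di, j + dj), 0)
                if mass:
                    total += mass * math.exp(-(di * di + dj * dj) / sigma ** 2)
        pot[(i, j)] = total
    return pot

def assign_to_centers(pot):
    """Label each occupied cell with the local potential maximum it reaches by
    repeatedly stepping to its highest-potential occupied neighbour (discrete gradient ascent)."""
    labels = {}
    for start in pot:
        cell = start
        while True:
            i, j = cell
            neighbours = [(i + di, j + dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)
                          if (di or dj) and (i + di, j + dj) in pot]
            best = max(neighbours, key=pot.get, default=cell)
            if pot[best] <= pot[cell]:
                break  # no neighbour is higher: this cell is a local maximum (a centre)
            cell = best
        labels[start] = cell
    return labels

points = [(0.1, 0.1), (0.5, 0.5), (0.9, 0.2), (0.3, 0.7), (0.6, 0.9),  # cell (0, 0)
          (1.2, 0.4), (1.8, 0.6),                                   # cell (1, 0)
          (10.1, 10.2), (10.5, 10.5), (10.9, 10.1), (10.4, 10.8),   # cell (10, 10)
          (11.3, 10.3)]                                             # cell (11, 10)
labels = assign_to_centers(potential(quantize(points, 1.0), sigma=1.0))
print(sorted(set(labels.values())))  # [(0, 0), (10, 10)]
```

Quantization is a single pass over the points; the potential step then costs time proportional to the number of occupied cells rather than the number of points, which is in the spirit of the O(n) claim but does not by itself prove it.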

HierArchical-Grid CluStering Based on DaTA Field in Time-Series and the Influence of the First-Order Partial Derivative Potential Value for the ARIMA-Model

Advanced Data Mining and Applications, Lecture Notes in Computer Science, 2018, pp. 31-41. DOI: 10.1007/978-3-030-05090-0_3

Author(s): Krid Jinklub, Jing Geng

Keyword(s): Time Series; Partial Derivative; ARIMA Model; Order Partial Derivative; First Order; Potential Value; Data Field

An Improved Gravitational Clustering Based on Local Density

International Journal of Mobile Computing and Multimedia Communications, 2021, Vol. 12 (1), pp. 1-22. DOI: 10.4018/ijmcmc.2021010101

Author(s): Lei Chen, Qinghua Guo, Zhaohua Liu, Long Chen, HuiQin Ning, ...

Keyword(s): Data Compression; Time Complexity; Clustering Algorithm; Local Density; Cluster Complex; Gravitational Clustering; Effectiveness And Efficiency; Real World Datasets; Data Objects; Complex Dataset

The gravitational clustering algorithm (Gravc) is a novel dynamic clustering algorithm that can accurately cluster complex datasets with arbitrary shapes and distributions. However, its high time complexity is a key challenge. To address this problem, this paper proposes FastGravc, an improved gravitational clustering algorithm based on local density. The main contributions are as follows. First, a local density-based data compression strategy is designed to reduce both the number of data objects and the number of neighbors of each object that participate in gravitational clustering. Second, the traditional gravity model is optimized to account for the differences in mass among objects caused by the data compression strategy. Third, FastGravc is obtained by integrating these optimization strategies. Finally, extensive experiments on synthetic and real-world datasets verify the effectiveness and efficiency of FastGravc.
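The two ideas in this abstract, shrinking the data by local density and letting the resulting objects carry unequal masses in the gravity model, can be sketched as follows. The greedy absorption rule, the radius, and the Newtonian-style force g·m_a·m_b/d² are assumptions for illustration, not FastGravc's published formulas; `compress` and `gravity` are hypothetical names.

```python
import math

def local_density(points, radius):
    """Number of other points within `radius` of each point."""
    return [sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius)
            for i, p in enumerate(points)]

def compress(points, radius):
    """Assumed greedy compression: visit points from densest to sparsest; each point not yet
    absorbed becomes a representative and absorbs all unabsorbed points within `radius`.
    The representative's mass is the number of points it absorbed (itself included)."""
    density = local_density(points, radius)
    order = sorted(range(len(points)), key=lambda i: -density[i])
    absorbed = [False] * len(points)
    reps = []
    for i in order:
        if absorbed[i]:
            continue
        members = [j for j in range(len(points))
                   if not absorbed[j] and math.dist(points[i], points[j]) <= radius]
        for j in members:
            absorbed[j] = True
        reps.append((points[i], len(members)))
    return reps

def gravity(rep_a, rep_b, g=1.0, eps=1e-12):
    """Attraction between two weighted representatives: g * m_a * m_b / d^2
    (eps guards against division by zero for coincident points)."""
    (pa, ma), (pb, mb) = rep_a, rep_b
    d = math.dist(pa, pb)
    return g * ma * mb / (d * d + eps)

points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
reps = compress(points, radius=0.5)
print(reps)  # [((0.0, 0.0), 3), ((5.0, 5.0), 2)]
print(gravity(reps[0], reps[1]))
```

Five points become two weighted representatives, so any later pairwise gravity computation touches far fewer objects, while the masses preserve how much data each representative stands for.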

Data Field for Hierarchical Clustering

International Journal of Data Warehousing and Mining, 2011, Vol. 7 (4), pp. 43-63. DOI: 10.4018/jdwm.2011100103. Cited by ~55

Author(s): Shuliang Wang, Wenyan Gan, Deyi Li, Deren Li

Keyword(s): Hierarchical Clustering; Physical Space; Core Data; Data Object; Group Data; Data Field; Data Objects; The Masses; Space Data; The Impact

In this paper, a data field is proposed to group data objects for hierarchical clustering by simulating their mutual interactions and opposite movements. Inspired by fields in physical space, a data field that simulates a nuclear field is presented to describe the interaction between objects in data space. In the data field, the self-organized process of equipotential lines over many data objects reveals their hierarchical clustering characteristics. During clustering, a random sample is first generated to optimize the impact factor. The masses of the data objects are then estimated, and the core data objects (those with nonzero masses) are selected. Taking the core data objects as the initial clusters, the clusters are iteratively merged, hierarchy by hierarchy, with good performance. A case study shows that the data field can hierarchically cluster objects of varying size, shape, or granularity without user-specified parameters, while taking into account the object features inside the clusters and removing outliers from noisy data. Comparisons show that data field clustering performs better than K-means, BIRCH, CURE, and CHAMELEON.
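The impact factor σ controls how far each object's influence reaches. A choice common in the data field literature, assumed here rather than taken from this abstract, is the potential φ(x_i) = Σ_j m_j·exp(−(‖x_i − x_j‖/σ)²) with σ picked to minimize the potential entropy H = −Σ_i (φ_i/Z)·ln(φ_i/Z), where Z = Σ_i φ_i. The sketch uses unit masses and a small candidate list; the sampling, mass estimation, and merging steps are omitted, and the function names are hypothetical.

```python
import math

def potentials(points, sigma, masses=None):
    """phi_i = sum_j m_j * exp(-(||x_i - x_j|| / sigma)^2); unit masses unless given."""
    if masses is None:
        masses = [1.0] * len(points)
    return [sum(m * math.exp(-(math.dist(p, q) / sigma) ** 2) for q, m in zip(points, masses))
            for p in points]

def potential_entropy(points, sigma):
    """Shannon entropy of the normalised potentials; lower means a more structured field."""
    phi = potentials(points, sigma)
    z = sum(phi)
    return -sum((v / z) * math.log(v / z) for v in phi)

def best_sigma(points, candidates):
    """Pick the candidate impact factor with the minimum potential entropy."""
    return min(candidates, key=lambda s: potential_entropy(points, s))

points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (10.0, 10.0)]
for s in (1e-3, 0.5, 1000.0):
    print(s, round(potential_entropy(points, s), 4))
print(best_sigma(points, [1e-3, 0.5, 1000.0]))  # 0.5
```

For a very small σ every object only feels itself, and for a very large σ every object feels all the others equally; in both limits the potentials become uniform and H approaches its maximum ln n, which is why an intermediate σ that reflects the cluster structure wins.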

An improved OPTICS clustering algorithm for discovering clusters with uneven densities

Intelligent Data Analysis, 2021, Vol. 25 (6), pp. 1453-1471. DOI: 10.3233/ida-205497

Author(s): Chunhua Tang, Han Wang, Zhiwen Wang, Xiangkun Zeng, Huaran Yan, ...

Keyword(s): Time Complexity; Clustering Algorithm; Nearest Neighbor; Clustering Algorithms; Substantial Improvement; Experimental Results; High Time; Parameter Setting; K Nearest Neighbor; Density Based Clustering

Most density-based clustering algorithms suffer from difficult parameter setting, high time complexity, poor noise recognition, and weak clustering of datasets with uneven density. To solve these problems, this paper proposes the FOP-OPTICS algorithm (Finding of the Ordering Peaks Based on OPTICS), a substantial improvement of OPTICS (Ordering Points To Identify the Clustering Structure). The proposed algorithm finds the demarcation point (DP) in the augmented cluster-ordering generated by OPTICS and uses the reachability-distance of the DP as the neighborhood radius eps of its corresponding cluster. This overcomes the weakness of most algorithms in clustering datasets with uneven densities. Computing the k-nearest-neighbor distance of each point reduces the time complexity of OPTICS, and calculating density-mutation points within the clusters allows noise to be recognized efficiently. Experimental results show that FOP-OPTICS has the lowest time complexity among the compared algorithms and outperforms them in parameter setting and noise recognition.
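FOP-OPTICS starts from the cluster-ordering that OPTICS produces. The sketch below computes that ordering in its simplest form, assuming an unbounded eps so that a point's core-distance is just its k-nearest-neighbor distance (k counts neighbors excluding the point itself; conventions differ), with reachability-distance(p | o) = max(core-distance(o), d(o, p)). The final fixed-threshold split is a hypothetical stand-in for the paper's demarcation-point search, which the abstract does not specify, and this naive version is O(n²).

```python
import math

def core_distance(points, i, k):
    """Distance from points[i] to its k-th nearest neighbour (the point itself excluded)."""
    return sorted(math.dist(points[i], q) for j, q in enumerate(points) if j != i)[k - 1]

def optics_order(points, k):
    """OPTICS cluster-ordering with eps = infinity; returns (order, reachability per position)."""
    n = len(points)
    core = [core_distance(points, i, k) for i in range(n)]
    reach = [math.inf] * n
    done = [False] * n
    order = []
    for _ in range(n):
        # next point: the unprocessed one with the smallest reachability (ties -> lowest index)
        cur = min((i for i in range(n) if not done[i]), key=lambda i: reach[i])
        done[cur] = True
        order.append(cur)
        for j in range(n):
            if not done[j]:
                r = max(core[cur], math.dist(points[cur], points[j]))
                reach[j] = min(reach[j], r)
    return order, [reach[i] for i in order]

def split_at_jumps(order, reach, threshold):
    """Hypothetical stand-in for demarcation points: start a new cluster wherever
    the reachability-distance jumps above `threshold`."""
    clusters, current = [], []
    for idx, r in zip(order, reach):
        if r > threshold and current:
            clusters.append(current)
            current = []
        current.append(idx)
    if current:
        clusters.append(current)
    return clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
order, reach = optics_order(points, k=2)
print(order)                              # [0, 1, 2, 3, 4, 5]
print(split_at_jumps(order, reach, 5.0))  # [[0, 1, 2], [3, 4, 5]]
```

The reachability plot stays at 1.0 inside each group and jumps above 13 when the ordering crosses to the second group; finding such jumps without a hand-picked threshold is the problem the abstract's demarcation points address.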

Improved Bidirectional CABOSFV Based on Multi-Adjustment Clustering and Simulated Annealing

Cybernetics and Information Technologies, 2016, Vol. 16 (6), pp. 27-42. DOI: 10.1515/cait-2016-0075. Cited by ~1

Author(s): Minghan Yang, Xuedong Gao, Ling Li

Keyword(s): Simulated Annealing; Data Clustering; Time Complexity; Clustering Algorithm; Feature Vector; Parameter Determination; Data Sets; Parameter Vector; Clustering Validity

Although the Clustering Algorithm Based on Sparse Feature Vector (CABOSFV) and its related algorithms are efficient for clustering high-dimensional sparse data, they have several shortcomings. Subjective parameter designation and sensitivity to the order of the clustering process can increase the time complexity and degrade the quality of the algorithm. This paper proposes a parameter adjustment method for optimizing Bidirectional CABOSFV. By optimizing the Parameter Vector (PV) and the Parameter Selection Vector (PSV) against a clustering-validity objective function, an improved Bidirectional CABOSFV algorithm based on simulated annealing is proposed, which removes the need to determine initial parameters. Experiments on UCI data sets show that the proposed algorithm, which performs multi-adjustment clustering, achieves higher accuracy than single-adjustment clustering, along with a time complexity that decreases over the iterations.
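The abstract tunes CABOSFV's parameters by simulated annealing against a clustering-validity objective. A generic sketch of that search loop follows; it anneals a single integer parameter, and `toy_validity` is a hypothetical placeholder objective, since the paper's PV/PSV encoding and validity index are not given here. Worse candidates are accepted with probability exp(−Δ/T), which lets the search escape local optima while the temperature T is still high.

```python
import math
import random

def anneal(objective, start, lo, hi, t0=10.0, cooling=0.995, steps=2000, seed=0):
    """Minimise `objective` over the integers lo..hi by simulated annealing.
    Neighbour = current value +/- 1 (clamped to the range); a worse move is accepted
    with probability exp(-delta / T); T decays geometrically; the best value seen is kept."""
    rng = random.Random(seed)
    current, f_cur = start, objective(start)
    best, f_best = current, f_cur
    t = t0
    for _ in range(steps):
        cand = min(hi, max(lo, current + rng.choice((-1, 1))))
        f_cand = objective(cand)
        delta = f_cand - f_cur
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, f_cur = cand, f_cand
            if f_cur < f_best:
                best, f_best = current, f_cur
        t *= cooling
    return best, f_best

def toy_validity(param):
    """Hypothetical stand-in for a clustering-validity score (lower is better), best at 7."""
    return (param - 7) ** 2

print(anneal(toy_validity, start=0, lo=0, hi=20))
```

In a real setting, `objective` would run the clustering with the candidate parameters and return the negated validity index; the annealing loop itself does not change.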

An Image Clustering and Feedback-Based Retrieval Framework

Methods and Innovations for Multimedia Database Content Management, 2012, pp. 62-80. DOI: 10.4018/978-1-4666-1791-9.ch005

Author(s): Chengcui Zhang, Liping Zhou, Wen Wan, Jeffrey Birch, Wei-Bang Chen

Keyword(s): Image Retrieval; Relevance Feedback; Time Complexity; Clustering Algorithm; Image Database; Image Clustering; Image Region; Retrieval Systems; Object Based; Region Matching

Most existing object-based image retrieval systems rely on single-object matching; their main limitation is that one individual image region (object) can hardly represent the user's retrieval target, especially when more than one object of interest is involved in the retrieval. Integrated Region Matching (IRM) has been used to improve retrieval accuracy by evaluating the overall similarity between images and incorporating the properties of all regions in the images. However, IRM does not take the user's preferred regions into account and has undesirable time complexity. In this article, we present a Feedback-based Image Clustering and Retrieval Framework (FIRM) that uses a novel image clustering algorithm and integrates it with IRM and Relevance Feedback (RF). The performance of the system is evaluated on a large image database, demonstrating the effectiveness of our framework in capturing users' retrieval interests in object-based image retrieval.

Weighted k-Prototypes Clustering Algorithm Based on the Hybrid Dissimilarity Coefficient

Mathematical Problems in Engineering, 2020, Vol. 2020, pp. 1-13. DOI: 10.1155/2020/5143797

Author(s): Ziqi Jia, Ling Song

Keyword(s): Categorical Data; Clustering Algorithm; Numerical Data; Experimental Results; Cluster Center; Real Dataset; Dissimilarity Coefficient; Initial Cluster; Data Objects; Selection Of

The k-prototypes algorithm is a hybrid clustering algorithm that can process both categorical and numerical data. In this study, the method for selecting initial cluster centers was improved and a new hybrid dissimilarity coefficient was proposed. Based on this coefficient, a weighted k-prototypes clustering algorithm (WKPCA) was proposed. WKPCA not only improves the selection of initial cluster centers but also provides a new way to calculate the dissimilarity between data objects and cluster centers. Real data from the UCI repository were used to test WKPCA, and the experimental results show that it is more efficient and robust than other k-prototypes algorithms.
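For reference, the classic k-prototypes dissimilarity (Huang's formulation, not WKPCA's new hybrid coefficient, which the abstract does not define) adds the squared Euclidean distance over numeric attributes to γ times the number of mismatched categorical attributes; a prototype is updated with the mean of each numeric attribute and the mode of each categorical one. The value of γ and the function names below are illustrative assumptions.

```python
from collections import Counter
from statistics import mean

def dissimilarity(obj, proto, gamma):
    """Classic k-prototypes distance: squared Euclidean on numeric attributes
    plus gamma times the number of categorical mismatches."""
    (x_num, x_cat), (p_num, p_cat) = obj, proto
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, p_num))
    categorical = sum(a != b for a, b in zip(x_cat, p_cat))
    return numeric + gamma * categorical

def assign(objects, prototypes, gamma):
    """Index of the nearest prototype for every object."""
    return [min(range(len(prototypes)), key=lambda k: dissimilarity(obj, prototypes[k], gamma))
            for obj in objects]

def update_prototype(members):
    """Mean of each numeric attribute, mode of each categorical attribute."""
    p_num = tuple(mean(col) for col in zip(*(m[0] for m in members)))
    p_cat = tuple(Counter(col).most_common(1)[0][0] for col in zip(*(m[1] for m in members)))
    return p_num, p_cat

objects = [((1.0, 2.0), ("red", "small")), ((1.2, 1.8), ("red", "small")),
           ((8.0, 9.0), ("blue", "large")), ((8.4, 9.2), ("blue", "small"))]
prototypes = [((0.0, 0.0), ("red", "small")), ((10.0, 10.0), ("blue", "large"))]
labels = assign(objects, prototypes, gamma=0.5)
print(labels)  # [0, 0, 1, 1]
print(update_prototype([o for o, l in zip(objects, labels) if l == 1]))
```

Alternating `assign` and `update_prototype` until labels stop changing gives the basic k-prototypes loop; WKPCA's contributions (initial-center selection and attribute weighting) would replace the fixed starting prototypes and the unweighted distance.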

K-Anonymity Algorithm Based on CLIQUE for Green Manufacturing

Applied Mechanics and Materials, 2013, Vol. 312, pp. 714-718. DOI: 10.4028/www.scientific.net/amm.312.714

Author(s): Zi Qi Zhao, Xiao Jun Ye, Chun Ping Li

Keyword(s): Data Processing; Processing Speed; Clustering Analysis; Time Complexity; Clustering Algorithm; Green Manufacturing; Multidimensional Data; Clustering Method; Analysis Algorithm; Clique Algorithm

Cell-based multidimensional clustering analysis algorithms, represented mainly by CLIQUE, offer fast processing and high time efficiency. Using the time-efficient CLIQUE clustering algorithm, a multidimensional k-anonymity algorithm, KLIQUE, can be built. Because it is based on CLIQUE, KLIQUE efficiently retains CLIQUE's time-complexity characteristics and can exploit CLIQUE's advantage in processing large amounts of multidimensional data.
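KLIQUE builds on CLIQUE's grid step. As a rough sketch of that step only (the k-anonymity part is not shown and is not described in the abstract), each dimension is cut into ξ equal intervals, and a unit is dense when it holds more than a fraction τ of the points; 2-D candidates are formed only from dense 1-D units, because a unit cannot be dense unless its projections are. The parameters `xi` and `tau` follow the usual CLIQUE notation, but the function itself is a hypothetical illustration.

```python
from collections import Counter
from itertools import combinations

def dense_units(points, xi, tau, lows, highs):
    """Return dense 1-D and 2-D grid units.
    Each dimension d is split into `xi` equal intervals over [lows[d], highs[d]];
    a unit is dense if it contains more than a fraction `tau` of all points."""
    n, dims = len(points), len(lows)

    def interval(v, d):
        width = (highs[d] - lows[d]) / xi
        return min(int((v - lows[d]) / width), xi - 1)  # put the upper boundary in the last interval

    cells = [tuple(interval(v, d) for d, v in enumerate(p)) for p in points]
    dense1 = {}
    for d in range(dims):
        counts = Counter(c[d] for c in cells)
        dense1[d] = {u for u, cnt in counts.items() if cnt / n > tau}
    dense2 = {}
    for d1, d2 in combinations(range(dims), 2):
        # Apriori-style pruning: only points whose projections are both dense can form a dense 2-D unit
        counts = Counter((c[d1], c[d2]) for c in cells
                         if c[d1] in dense1[d1] and c[d2] in dense1[d2])
        dense2[(d1, d2)] = {u for u, cnt in counts.items() if cnt / n > tau}
    return dense1, dense2

points = [(0.5, 0.5), (1.0, 1.0), (1.5, 0.5), (0.5, 1.5),
          (9.0, 9.0), (8.5, 9.5), (9.5, 8.5), (5.0, 1.0)]
d1, d2 = dense_units(points, xi=5, tau=0.2, lows=(0, 0), highs=(10, 10))
print({d: sorted(u) for d, u in d1.items()})  # {0: [0, 4], 1: [0, 4]}
print(sorted(d2[(0, 1)]))                     # [(0, 0), (4, 4)]
```

Counting points per cell is a single pass over the data for each dimension or dimension pair, which is where CLIQUE's speed on large multidimensional datasets comes from.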

Hebbian Self-Organizing Integrate-and-Fire Networks for Data Clustering

Neural Computation, 2010, Vol. 22 (1), pp. 273-288. DOI: 10.1162/neco.2009.12-08-926. Cited by ~16

Author(s): Florian Landis, Thomas Ott, Ruedi Stoop

Keyword(s): Data Clustering; Time Complexity; Arbitrary Shape; Clustering Algorithm; Hebbian Learning; Spiking Neurons; Integrate And Fire; Background Data; Homogeneous Regions; Noisy Background

We propose a Hebbian learning-based data clustering algorithm using spiking neurons. The algorithm can distinguish clusters from noisy background data and finds an arbitrary number of clusters of arbitrary shape. These properties make the approach particularly useful for segmenting visual scenes into arbitrarily shaped homogeneous regions. We present several application examples and, to highlight the advantages and weaknesses of our method, systematically compare the results with those of standard methods such as k-means and Ward's linkage clustering. The analysis demonstrates that the proposed algorithm not only clusters more powerfully than these two competing methods but also has a more modest time complexity than its strongest commonly used competitor.

A Comparison of K-Means and Mean Shift Algorithms

Preprint, 2021. DOI: 10.20944/preprints202108.0140.v1

Author(s): Mehak Nigar Shumaila

Keyword(s): Cluster Analysis; Data Analysis; Time Complexity; Clustering Algorithm; Clustering Algorithms; Mean Shift; Prediction Performance; Learning Problem; Cluster A; Formation Of Groups

Clustering, also known as cluster analysis, is a learning problem that takes place without any human supervision. It is widely and efficiently used in data analysis to observe and identify interesting, useful, or desired patterns in data. Clustering divides the data into groups of similar objects based on the characteristics it identifies; each resulting group is called a cluster. Objects in a cluster are similar to the other objects in the same cluster and differ from objects in other clusters. Clustering is significant in many areas of data analysis because it reveals the intrinsic grouping of objects in a batch of unlabeled raw data, based on their attributes. There is no single textbook criterion for a good clustering, because the process is highly customizable and differs for every user and every need. Likewise, there is no universally best clustering algorithm, as the choice depends heavily on the user's scenario and needs. This paper compares two clustering algorithms, k-means and mean shift, according to the following factors: time complexity, training, prediction performance, and clustering accuracy.
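The practical difference between the two algorithms compared here shows up even in a minimal 1-D sketch: k-means needs the number of clusters fixed up front (through its initial centers), while flat-kernel mean shift needs a bandwidth and discovers the number of modes itself. Both functions, the 1-D setting, and the mode-merging rule (modes closer than half the bandwidth count as one) are illustrative assumptions, not the paper's implementations.

```python
from statistics import mean

def kmeans_1d(xs, centers, iters=100):
    """Lloyd's algorithm: assign each value to its nearest centre, then move centres to group means."""
    centers = list(centers)
    for _ in range(iters):
        groups = [[] for _ in centers]
        for x in xs:
            k = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            groups[k].append(x)
        new = [mean(g) if g else c for g, c in zip(groups, centers)]  # keep empty clusters in place
        if new == centers:
            break
        centers = new
    return centers

def mean_shift_1d(xs, bandwidth, iters=100, tol=1e-9):
    """Flat-kernel mean shift: each value repeatedly moves to the mean of the data within
    `bandwidth`; values converging to (nearly) the same mode form one cluster."""
    modes = []
    for x in xs:
        m = x
        for _ in range(iters):
            new_m = mean(v for v in xs if abs(v - m) <= bandwidth)
            if abs(new_m - m) < tol:
                break
            m = new_m
        modes.append(m)
    distinct = []
    for m in sorted(modes):  # merge modes closer than half the bandwidth
        if not distinct or m - distinct[-1] > bandwidth / 2:
            distinct.append(m)
    return distinct

xs = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
print(kmeans_1d(xs, centers=[0.0, 6.0]))  # [1.0, 5.0]  (k fixed at 2 by the initial centres)
print(mean_shift_1d(xs, bandwidth=1.0))   # [1.0, 5.0]  (number of modes found from the data)
```

Per iteration, this k-means costs O(n·k), while this naive mean shift costs O(n²) because every point scans the whole dataset for its window; that gap is one reason time complexity is among the comparison factors.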
