An Improved Gravitational Clustering Based on Local Density

Gravitational clustering algorithm (Gravc) is a novel and excellent dynamic clustering algorithm that can accurately cluster complex dataset with arbitrary shape and distribution. However, high time complexity is a key challenge to the gravitational clustering algorithm. To solve this problem, an improved gravitational clustering algorithm based on the local density is proposed in this paper, called FastGravc. The main contributions of this paper are as follows. First of all, a local density-based data compression strategy is designed to reduce the number of data objects and the number of neighbors of each object participating in the gravitational clustering algorithm. Secondly, the traditional gravity model is optimized to adapt to the quality differences of different objects caused by data compression strategy. And then, the improved gravitational clustering algorithm FastGravc is proposed by integrating the above optimization strategies. Finally, extensive experimental results on synthetic and real-world datasets verify the effectiveness and efficiency of FastGravc algorithm.

Download Full-text

Adaptive Initialization Method for K-Means Algorithm

Frontiers in Artificial Intelligence ◽

10.3389/frai.2021.740817 ◽

2021 ◽

Vol 4 ◽

Author(s):

Jie Yang ◽

Yu-Kai Wang ◽

Xin Yao ◽

Chin-Teng Lin

Keyword(s):

Time Complexity ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Real Life ◽

Superior Performance ◽

Local Optima ◽

Initial Cluster ◽

Higher Dimensional ◽

Real World Datasets ◽

Random Method

The K-means algorithm is a widely used clustering algorithm that offers simplicity and efficiency. However, the traditional K-means algorithm uses a random method to determine the initial cluster centers, which make clustering results prone to local optima and then result in worse clustering performance. In this research, we propose an adaptive initialization method for the K-means algorithm (AIMK) which can adapt to the various characteristics in different datasets and obtain better clustering performance with stable results. For larger or higher-dimensional datasets, we even leverage random sampling in AIMK (name as AIMK-RS) to reduce the time complexity. 22 real-world datasets were applied for performance comparisons. The experimental results show AIMK and AIMK-RS outperform the current initialization methods and several well-known clustering algorithms. Specifically, AIMK-RS can significantly reduce the time complexity to O (n). Moreover, we exploit AIMK to initialize K-medoids and spectral clustering, and better performance is also explored. The above results demonstrate superior performance and good scalability by AIMK or AIMK-RS. In the future, we would like to apply AIMK to more partition-based clustering algorithms to solve real-life practical problems.

Download Full-text

A Fuzzy Clustering-Based X-Ray Computed Tomography Simulation under Incomplete Measurements

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.748.651 ◽

2013 ◽

Vol 748 ◽

pp. 651-654

Author(s):

Shi Hong Yue ◽

Xiu Juan Bao ◽

Jin Xin Zhang

Keyword(s):

Computed Tomography ◽

Image Reconstruction ◽

Real Time ◽

Fuzzy Clustering ◽

Time Complexity ◽

Clustering Algorithm ◽

X Ray ◽

Fuzzy Clustering Algorithm ◽

X Ray Computed ◽

Effectiveness And Efficiency

The existing x-ray computed tomography algorithm simulation assume the complete measurements of the investigated objectives to be available, but this is not true in most applications. To overcome the problem, we creatively propose a method of image reconstruction based on fuzzy clustering algorithm under limited measurements. Different from the existing algorithms, we map all measurements into a set of vectors and cluster all vectors for the image reconstruction. The proposed algorithm aims to be easily realized, lower time complexity, and applicable in a real-time manner in case of limited measurements of the investigated objectives. Experiments demonstrate the effectiveness and efficiency of the proposed algorithm.

Download Full-text

Clustering Mixed Data Based on Density Peaks and Stacked Denoising Autoencoders

Symmetry ◽

10.3390/sym11020163 ◽

2019 ◽

Vol 11 (2) ◽

pp. 163

Author(s):

Baobin Duan ◽

Lixin Han ◽

Zhinan Gou ◽

Yi Yang ◽

Shuangshuang Chen

Keyword(s):

Clustering Algorithm ◽

Local Density ◽

Clustering Algorithms ◽

Feature Space ◽

Original Data ◽

Mixed Data ◽

Feature Representations ◽

Density Peaks ◽

Categorical Attributes ◽

Data Objects

With the universal existence of mixed data with numerical and categorical attributes in real world, a variety of clustering algorithms have been developed to discover the potential information hidden in mixed data. Most existing clustering algorithms often compute the distances or similarities between data objects based on original data, which may cause the instability of clustering results because of noise. In this paper, a clustering framework is proposed to explore the grouping structure of the mixed data. First, the transformed categorical attributes by one-hot encoding technique and normalized numerical attributes are input to a stacked denoising autoencoders to learn the internal feature representations. Secondly, based on these feature representations, all the distances between data objects in feature space can be calculated and the local density and relative distance of each data object can be also computed. Thirdly, the density peaks clustering algorithm is improved and employed to allocate all the data objects into different clusters. Finally, experiments conducted on some UCI datasets have demonstrated that our proposed algorithm for clustering mixed data outperforms three baseline algorithms in terms of the clustering accuracy and the rand index.

Download Full-text

HASTA

International Journal of Data Warehousing and Mining ◽

10.4018/ijdwm.2014040103 ◽

2014 ◽

Vol 10 (2) ◽

pp. 39-54 ◽

Cited By ~ 10

Author(s):

Shuliang Wang ◽

Yasen Chen

Keyword(s):

Time Complexity ◽

Clustering Algorithm ◽

Data Input ◽

Full Size ◽

First Order ◽

Potential Value ◽

Maximum Value ◽

Data Field ◽

Data Objects ◽

Exact Amount

In this paper, a novel clustering algorithm, HASTA (HierArchical-grid cluStering based on daTA field), is proposed to model the dataset as a data field by assigning all the data objects into qusantized grids. Clustering centers of HASTA are defined to locate where the maximum value of local potential is. Edges of cluster in HASTA are identified by analyzing the first-order partial derivative of potential value, thus the full size of arbitrary shaped clusters can be detected. The experimented case demonstrates that HASTA performs effectively upon different datasets and can find out clusters of arbitrary shapes in noisy circumstance. Besides those, HASTA does not force users to preset the exact amount of clusters inside dataset. Furthermore, HASTA is insensitive to the order of data input. The time complexity of HASTA achieves O(n). Those advantages will potentially benefit the mining of big data.

Download Full-text

An Affinity Propagation Clustering Algorithm for Mixed Numeric and Categorical Datasets

Mathematical Problems in Engineering ◽

10.1155/2014/486075 ◽

2014 ◽

Vol 2014 ◽

pp. 1-8 ◽

Cited By ~ 7

Author(s):

Kang Zhang ◽

Xingsheng Gu

Keyword(s):

Real World ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Affinity Propagation ◽

Mixed Data ◽

Clustering Methods ◽

Affinity Propagation Clustering ◽

Real World Datasets ◽

Data Objects ◽

Clustering Problems

Clustering has been widely used in different fields of science, technology, social science, and so forth. In real world, numeric as well as categorical features are usually used to describe the data objects. Accordingly, many clustering methods can process datasets that are either numeric or categorical. Recently, algorithms that can handle the mixed data clustering problems have been developed. Affinity propagation (AP) algorithm is an exemplar-based clustering method which has demonstrated good performance on a wide variety of datasets. However, it has limitations on processing mixed datasets. In this paper, we propose a novel similarity measure for mixed type datasets and an adaptive AP clustering algorithm is proposed to cluster the mixed datasets. Several real world datasets are studied to evaluate the performance of the proposed algorithm. Comparisons with other clustering algorithms demonstrate that the proposed method works well not only on mixed datasets but also on pure numeric and categorical datasets.

Download Full-text

Adaptive density peak clustering based on dimensional-free and reverse k-nearest neighbors

Information Technology And Control ◽

10.5755/j01.itc.49.3.23405 ◽

2020 ◽

Vol 49 (3) ◽

pp. 395-411

Author(s):

Qiannan Wu ◽

Qianqian Zhang ◽

Ruizhi Sun ◽

Li Li ◽

Huiyu Mu ◽

...

Keyword(s):

High Dimension ◽

Clustering Algorithm ◽

Local Density ◽

Nearest Neighbors ◽

Allocation Strategy ◽

K Nearest Neighbor ◽

K Nearest Neighbors ◽

Density Peak ◽

Real World Datasets ◽

Density Peak Clustering

Cluster analysis plays a crucial component in consumer behavior segment. The density peak clustering algorithm (DPC) is a novel density-based clustering method. However, it performs poorly in high-dimension datasets and the local density for boundary points. In addition, its fault tolerance is affected by one-step allocation strategy. To overcome these disadvantages, an adaptive density peak clustering algorithm based on dimensional-free and reverse k-nearest neighbors (ERK-DPC) is proposed in this paper. First, we compute Euler cosine distance to obtain the similarity of sample points in high-dimension datasets. Then, the adaptive local density formula is used to measure the local density of each point. Finally, the reverse k-nearest neighbor idea is added on two-step allocation strategy, which assigns the remaining points accurately and effectively. The proposed clustering algorithm is experiments on several benchmark datasets and real-world datasets. By comparing the benchmarks, the results demonstrate that the ERK-DPC algorithm superior to some state-of- the-art methods.

Download Full-text

A novel bidirectional clustering algorithm based on local density

Scientific Reports ◽

10.1038/s41598-021-93244-2 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Baicheng Lyu ◽

Wenhua Wu ◽

Zhiqiang Hu

Keyword(s):

Clustering Algorithm ◽

Local Density ◽

Clustering Algorithms ◽

Cluster Number ◽

Denoising Method ◽

Number Of Clusters ◽

Data Points ◽

Cutoff Distance ◽

Large Clusters ◽

Small Clusters

AbstractWith the widely application of cluster analysis, the number of clusters is gradually increasing, as is the difficulty in selecting the judgment indicators of cluster numbers. Also, small clusters are crucial to discovering the extreme characteristics of data samples, but current clustering algorithms focus mainly on analyzing large clusters. In this paper, a bidirectional clustering algorithm based on local density (BCALoD) is proposed. BCALoD establishes the connection between data points based on local density, can automatically determine the number of clusters, is more sensitive to small clusters, and can reduce the adjusted parameters to a minimum. On the basis of the robustness of cluster number to noise, a denoising method suitable for BCALoD is proposed. Different cutoff distance and cutoff density are assigned to each data cluster, which results in improved clustering performance. Clustering ability of BCALoD is verified by randomly generated datasets and city light satellite images.

Download Full-text

Factor-Bounded Nonnegative Matrix Factorization

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3451395 ◽

2021 ◽

Vol 15 (6) ◽

pp. 1-18

Author(s):

Kai Liu ◽

Xiangyu Li ◽

Zhihui Zhu ◽

Lodewijk Brand ◽

Hua Wang

Keyword(s):

Matrix Factorization ◽

Clustering Algorithm ◽

Nonnegative Matrix Factorization ◽

Nonnegative Matrix ◽

Optimization Methods ◽

Auxiliary Function ◽

Image Clustering ◽

Real World Datasets ◽

The Relationship ◽

Matrix Factors

Nonnegative Matrix Factorization (NMF) is broadly used to determine class membership in a variety of clustering applications. From movie recommendations and image clustering to visual feature extractions, NMF has applications to solve a large number of knowledge discovery and data mining problems. Traditional optimization methods, such as the Multiplicative Updating Algorithm (MUA), solves the NMF problem by utilizing an auxiliary function to ensure that the objective monotonically decreases. Although the objective in MUA converges, there exists no proof to show that the learned matrix factors converge as well. Without this rigorous analysis, the clustering performance and stability of the NMF algorithms cannot be guaranteed. To address this knowledge gap, in this article, we study the factor-bounded NMF problem and provide a solution algorithm with proven convergence by rigorous mathematical analysis, which ensures that both the objective and matrix factors converge. In addition, we show the relationship between MUA and our solution followed by an analysis of the convergence of MUA. Experiments on both toy data and real-world datasets validate the correctness of our proposed method and its utility as an effective clustering algorithm.

Download Full-text

An improved OPTICS clustering algorithm for discovering clusters with uneven densities

Intelligent Data Analysis ◽

10.3233/ida-205497 ◽

2021 ◽

Vol 25 (6) ◽

pp. 1453-1471

Author(s):

Chunhua Tang ◽

Han Wang ◽

Zhiwen Wang ◽

Xiangkun Zeng ◽

Huaran Yan ◽

...

Keyword(s):

Time Complexity ◽

Clustering Algorithm ◽

Nearest Neighbor ◽

Clustering Algorithms ◽

Substantial Improvement ◽

Experimental Results ◽

High Time ◽

Parameter Setting ◽

K Nearest Neighbor ◽

Density Based Clustering

Most density-based clustering algorithms have the problems of difficult parameter setting, high time complexity, poor noise recognition, and weak clustering for datasets with uneven density. To solve these problems, this paper proposes FOP-OPTICS algorithm (Finding of the Ordering Peaks Based on OPTICS), which is a substantial improvement of OPTICS (Ordering Points To Identify the Clustering Structure). The proposed algorithm finds the demarcation point (DP) from the Augmented Cluster-Ordering generated by OPTICS and uses the reachability-distance of DP as the radius of neighborhood eps of its corresponding cluster. It overcomes the weakness of most algorithms in clustering datasets with uneven densities. By computing the distance of the k-nearest neighbor of each point, it reduces the time complexity of OPTICS; by calculating density-mutation points within the clusters, it can efficiently recognize noise. The experimental results show that FOP-OPTICS has the lowest time complexity, and outperforms other algorithms in parameter setting and noise recognition.

Download Full-text

An Improved Version of K-medoid Algorithm using CRO

Modern Applied Science ◽

10.5539/mas.v12n2p116 ◽

2018 ◽

Vol 12 (2) ◽

pp. 116 ◽

Cited By ~ 2

Author(s):

Amjad Hudaib ◽

Mohammad Khanafseh ◽

Ola Surakhi

Keyword(s):

Breast Cancer ◽

Lung Cancer ◽

Hybrid Algorithm ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Data Repository ◽

The Mean ◽

Real World Datasets ◽

Actual Point ◽

Learning Data

Clustering is the process of grouping a set of patterns into different disjoint clusters where each cluster contains the alike patterns. Many algorithms had been proposed before for clustering. K-medoid is a variant of k-mean that use an actual point in the cluster to represent it instead of the mean in the k-mean algorithm to get the outliers and reduce noise in the cluster. In order to enhance performance of k-medoid algorithm and get more accurate clusters, a hybrid algorithm is proposed which use CRO algorithm along with k-medoid. In this method, CRO is used to expand searching for the optimal medoid and enhance clustering by getting more precise results. The performance of the new algorithm is evaluated by comparing its results with five clustering algorithms, k-mean, k-medoid, DB/rand/1/bin, CRO based clustering algorithm and hybrid CRO-k-mean by using four real world datasets: Lung cancer, Iris, Breast cancer Wisconsin and Haberman’s survival from UCI machine learning data repository. The results were conducted and compared base on different metrics and show that proposed algorithm enhanced clustering technique by giving more accurate results.

Download Full-text