An Effective Clustering Algorithm Using Adaptive Neighborhood and Border Peeling Method

Traditional clustering methods often cannot avoid the problem of selecting neighborhood parameters and the number of clusters, and the optimal choice of these parameters varies with the shape of the data, which requires prior knowledge. To address this parameter-selection problem, we propose an effective clustering algorithm based on adaptive neighborhoods that obtains satisfactory clustering results without requiring the neighborhood parameters or the number of clusters to be set. The core idea of the algorithm is to first iterate adaptively to a logarithmic stable state and obtain neighborhood information from the distribution characteristics of the dataset, then mark and peel the boundary points using this neighborhood information, and finally form clusters around the core points. We have conducted extensive comparative experiments on datasets of different sizes and distributions and achieved satisfactory results.
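
The adaptive neighborhood step can be illustrated with a small sketch. The abstract does not give the exact iteration, so the code below assumes a natural-neighbor-style search: k grows from 1, each point's k-th nearest neighbor gains a reverse neighbor, and the search stops once the number of points with no reverse neighbor stops changing. The border rule in `border_points` (fewer than k reverse neighbors) is a hypothetical stand-in for the paper's peeling criterion, and all function names are illustrative.

```python
import math

def neighbor_order(points):
    """For each point, the indices of all other points sorted by Euclidean distance."""
    return [sorted((j for j in range(len(points)) if j != i),
                   key=lambda j: math.dist(p, points[j]))
            for i, p in enumerate(points)]

def adaptive_neighborhood(points):
    """Grow k until the number of points with no reverse neighbor stops changing.

    Returns (k, reverse), where reverse[i] counts how many points have i
    among their k nearest neighbors. (Assumed stopping rule, not the paper's.)
    """
    n = len(points)
    order = neighbor_order(points)
    reverse = [0] * n
    previous = None
    for k in range(1, n):
        for i in range(n):
            reverse[order[i][k - 1]] += 1  # i's k-th nearest neighbor gains a reverse neighbor
        orphans = reverse.count(0)
        if orphans == 0 or orphans == previous:
            return k, reverse
        previous = orphans
    return n - 1, reverse

def border_points(points):
    """Hypothetical peeling rule: a point is on the border if fewer than k points point back to it."""
    k, reverse = adaptive_neighborhood(points)
    return [i for i, r in enumerate(reverse) if r < k]
```

On four corners of a unit square plus one distant point, this search stops at k = 3 and only the distant point is flagged as a border point.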

Multi-Radius Density Clustering Algorithm Based on Outlier Factor

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.472.427 ◽

2014 ◽

Vol 472 ◽

pp. 427-431

Author(s):

Zong Lin Ye ◽

Hui Cao ◽

Li Xin Jia ◽

Yan Bin Zhang ◽

Gang Quan Si

Keyword(s):

Clustering Algorithm ◽

Spatial Clustering ◽

Similar Process ◽

The Core ◽

Dbscan Algorithm ◽

Proposed Model ◽

Density Clustering ◽

Relationship Of ◽

Core Points ◽

The Relationship

This paper proposes a novel multi-radius density clustering algorithm based on an outlier factor. The algorithm first calculates the density-similar-neighbor-based outlier factor (DSNOF) of each point in the dataset from the relationship between the density of the point and the densities of its neighbors, and treats every point whose DSNOF is smaller than 1 as a core point. Second, the core points are clustered by a process similar to density-based spatial clustering of applications with noise (DBSCAN) to obtain sub-clusters. Third, the algorithm merges the sub-clusters into clusters. Finally, the points whose DSNOF is larger than 1 are assigned to these clusters. Experiments on real datasets from the UCI Machine Learning Repository show that the proposed algorithm is more effective than DBSCAN and k-means and is not strongly affected by the choice of parameter.
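
A minimal sketch of this pipeline follows. The abstract does not define DSNOF, so `outlier_factor` below uses an assumed LOF-style ratio (mean density of the k nearest neighbors divided by the point's own density, with density taken as the inverse mean neighbor distance). The radius `eps` and the function names are hypothetical, and the sub-cluster merging step is omitted.

```python
import math
from collections import deque

def knn(points, i, k):
    """Indices of the k nearest neighbors of point i (excluding i itself)."""
    others = sorted((j for j in range(len(points)) if j != i),
                    key=lambda j: math.dist(points[i], points[j]))
    return others[:k]

def outlier_factor(points, k):
    """Stand-in for DSNOF (assumed, not the paper's formula):
    mean density of the k nearest neighbors divided by the point's own density."""
    neigh = [knn(points, i, k) for i in range(len(points))]
    # density = inverse of the mean distance to the k nearest neighbors
    dens = [k / (sum(math.dist(points[i], points[j]) for j in nb) + 1e-12)
            for i, nb in enumerate(neigh)]
    return [sum(dens[j] for j in nb) / k / dens[i] for i, nb in enumerate(neigh)]

def cluster(points, k, eps):
    factor = outlier_factor(points, k)
    core = [i for i, f in enumerate(factor) if f < 1.0]  # core points: factor below 1
    labels = [-1] * len(points)
    next_label = 0
    for start in core:
        if labels[start] != -1:
            continue
        labels[start] = next_label
        queue = deque([start])
        while queue:  # DBSCAN-like expansion restricted to core points within eps
            i = queue.popleft()
            for j in core:
                if labels[j] == -1 and math.dist(points[i], points[j]) <= eps:
                    labels[j] = next_label
                    queue.append(j)
        next_label += 1
    # (the paper's sub-cluster merging step would go here; omitted in this sketch)
    for i in range(len(points)):  # attach each remaining point to its nearest core point
        if labels[i] == -1 and core:
            nearest = min(core, key=lambda j: math.dist(points[i], points[j]))
            labels[i] = labels[nearest]
    return labels
```

For two separated squares with a center point each, only the centers score below 1, so they seed two clusters and the corners are attached to the nearest center.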

A Novel Complex Networks Clustering Algorithm Based on the Core Influence of Nodes

The Scientific World JOURNAL ◽

10.1155/2014/801854 ◽

2014 ◽

Vol 2014 ◽

pp. 1-7 ◽

Cited By ~ 3

Author(s):

Chao Tong ◽

Jianwei Niu ◽

Bin Dai ◽

Zhongyu Xie

Keyword(s):

Complex Networks ◽

Clustering Algorithm ◽

Cluster Formation ◽

Clustering Algorithms ◽

Cluster Structure ◽

Network Clustering ◽

Clustering Methods ◽

Positive Role ◽

The Core ◽

Final Cluster

In complex networks, cluster structure, identified by the heterogeneity of nodes, has become a common and important topological property. Network clustering methods are therefore significant for the study of complex networks. Currently, many typical clustering algorithms suffer from weaknesses such as inaccuracy and slow convergence. In this paper, we propose a clustering algorithm based on calculating the core influence of nodes. The clustering process simulates the process of cluster formation in sociology. The algorithm detects the nodes with core influence through their betweenness centrality and builds each cluster's core structure with discriminant functions. Next, it obtains the final cluster structure by clustering the remaining nodes of the network with an optimization method. Experiments on different datasets show that the clustering accuracy of this algorithm is superior to that of the classical Fast-Newman algorithm. It also clusters faster and helps reveal the real cluster structure of complex networks precisely.
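
The betweenness-centrality step can be sketched with Brandes' algorithm for unweighted, undirected graphs. Picking the top-m nodes as core candidates (`core_nodes`, `m`) is an assumption for illustration; the paper's discriminant functions and the optimization that assigns the remaining nodes are not reproduced.

```python
from collections import deque

def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph {node: set(neighbors)}."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack, pred = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # BFS from s, recording shortest-path predecessors
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:  # accumulate dependencies in order of decreasing distance
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: c / 2 for v, c in bc.items()}  # each pair was counted from both ends

def core_nodes(adj, m):
    """Hypothetical core selection: the m nodes with the highest betweenness."""
    bc = betweenness(adj)
    return sorted(bc, key=bc.get, reverse=True)[:m]
```

For two triangles joined by a single bridge edge, the two bridge endpoints each get betweenness 6 and are returned as the core candidates.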

Ant Clustering Algorithms

Principal Concepts in Applied Evolutionary Computation ◽

10.4018/978-1-4666-1749-0.ch001 ◽

2012 ◽

pp. 1-15

Author(s):

Yu-Chiun Chiou ◽

Shih-Ta Chou

Keyword(s):

Clustering Algorithm ◽

Clustering Algorithms ◽

Small Scale ◽

Solution Stability ◽

Clustering Methods ◽

Clustering Problem ◽

The Core ◽

Genetic Clustering ◽

Fully Connected ◽

Pheromone Trail

This paper proposes three ant clustering algorithms (ACAs): ACA-1, ACA-2 and ACA-3. The core logic of the proposed ACAs is to modify the ant colony metaheuristic by reformulating the clustering problem as a network problem. For a clustering problem of N objects and K clusters, a fully connected network of N nodes is formed, where each link cost represents the dissimilarity of the two nodes it connects. K ants then collect their own nodes according to the link costs and the pheromone trail laid by previous ants. The three ACAs have been validated on a small-scale problem solved by total enumeration. The solution effectiveness at different problem scales consistently shows that ACA-2 performs best among the three ACAs. A further comparison of ACA-2 with other commonly used clustering methods, including the agglomerative hierarchical clustering algorithm (AHCA), the K-means algorithm (KMA) and the genetic clustering algorithm (GCA), shows that ACA-2 significantly outperforms them in solution effectiveness in most cases and also performs considerably better in solution stability as the problem scale or the number of clusters grows.
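
The network reformulation and the total-enumeration baseline used for validation can be sketched for tiny problems. The abstract states only that link costs represent dissimilarity; taking Euclidean distance as the link cost and the sum of within-cluster link costs as the objective are assumptions, and the ant search itself (pheromone trail, node collection) is not shown.

```python
import itertools
import math

def link_costs(points):
    """Fully connected network: cost[i][j] = Euclidean dissimilarity of objects i and j."""
    return [[math.dist(p, q) for q in points] for p in points]

def within_cost(labels, cost):
    """Sum of link costs between objects in the same cluster (each pair counted once)."""
    n = len(labels)
    return sum(cost[i][j] for i in range(n) for j in range(i + 1, n)
               if labels[i] == labels[j])

def enumerate_best(points, k):
    """Total enumeration over all labelings; feasible only for very small N (k**N cases)."""
    cost = link_costs(points)
    best = None
    for labels in itertools.product(range(k), repeat=len(points)):
        # require k non-empty clusters; fixing object 0 in cluster 0 skips some relabeled copies
        if len(set(labels)) != k or labels[0] != 0:
            continue
        c = within_cost(labels, cost)
        if best is None or c < best[0]:
            best = (c, labels)
    return best
```

For two tight pairs of points, the best 2-cluster partition keeps each pair together, with a total within-cluster link cost of 2.0.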

An On-Line Agglomerative Clustering Method for Nonstationary Data

Neural Computation ◽

10.1162/089976699300016755 ◽

1999 ◽

Vol 11 (2) ◽

pp. 521-540 ◽

Cited By ~ 41

Author(s):

Isaac David Guedalia ◽

Mickey London ◽

Michael Werman

Keyword(s):

Clustering Algorithm ◽

Small Mass ◽

Good Representation ◽

Clustering Methods ◽

Agglomerative Clustering ◽

Number Of Clusters ◽

Local Distortion ◽

On Line ◽

Nonstationary Data ◽

Computationally Intensive

An on-line agglomerative clustering algorithm for nonstationary data is described. Three issues are addressed. The first regards the temporal aspects of the data. The clustering of stationary data by the proposed algorithm is comparable to that of the other popular algorithms tested (batch and on-line). The second issue addressed is the number of clusters required to represent the data. The algorithm provides an efficient framework to determine the natural number of clusters given the scale of the problem. Finally, the proposed algorithm implicitly minimizes the local distortion, a measure that takes into account clusters with relatively small mass. In contrast, most existing on-line clustering methods assume stationarity of the data. When used to cluster nonstationary data, these methods fail to generate a good representation. Moreover, most current algorithms are computationally intensive when determining the correct number of clusters. These algorithms tend to neglect clusters of small mass because they minimize the global distortion (energy).
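
A generic on-line agglomerative scheme can be sketched as follows. Every arriving point starts as a singleton cluster, and whenever the number of clusters exceeds a budget, the pair whose merge least increases the squared-error distortion (a Ward-style cost) is merged. This is an illustrative stand-in, not the authors' local-distortion criterion; the class name, `max_clusters`, and the mass-decay factor `decay` for nonstationary streams are assumptions.

```python
import math

class OnlineAgglomerative:
    """Keeps at most `max_clusters` weighted centroids; merges the cheapest pair on overflow."""

    def __init__(self, max_clusters, decay=1.0):
        self.max_clusters = max_clusters
        self.decay = decay      # < 1.0 lets old mass fade (one simple way to follow drifting data)
        self.centroids = []     # list of (position tuple, mass)

    @staticmethod
    def merge_cost(a, b):
        (pa, ma), (pb, mb) = a, b
        # increase in squared-error distortion when two weighted centroids are combined
        return ma * mb / (ma + mb) * math.dist(pa, pb) ** 2

    def add(self, point):
        self.centroids = [(p, m * self.decay) for p, m in self.centroids]
        self.centroids.append((tuple(point), 1.0))  # new point = singleton cluster
        if len(self.centroids) > self.max_clusters:
            n = len(self.centroids)
            i, j = min(((i, j) for i in range(n) for j in range(i + 1, n)),
                       key=lambda ij: self.merge_cost(self.centroids[ij[0]],
                                                      self.centroids[ij[1]]))
            (pi, mi), (pj, mj) = self.centroids[i], self.centroids[j]
            merged = (tuple((mi * x + mj * y) / (mi + mj) for x, y in zip(pi, pj)), mi + mj)
            self.centroids = [c for t, c in enumerate(self.centroids) if t not in (i, j)]
            self.centroids.append(merged)
```

The merge cost m_a m_b / (m_a + m_b) · d² is the exact increase in squared error from combining two weighted centroids, so the cheapest merge is the one that distorts the representation least.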

A Three-Way Clustering Method Based on Ensemble Strategy and Three-Way Decision

Information ◽

10.3390/info10020059 ◽

2019 ◽

Vol 10 (2) ◽

pp. 59 ◽

Cited By ~ 3

Author(s):

Pingxin Wang ◽

Qiang Liu ◽

Gang Xu ◽

Kangkang Wang

Keyword(s):

Clustering Algorithm ◽

Core Region ◽

Data Sets ◽

Clustering Methods ◽

The Core ◽

Ensemble Strategy ◽

Hard Clustering ◽

The Difference ◽

Human Problem ◽

Specific Cluster

Three-way decision is a class of effective methods and heuristics commonly used in human problem solving and information processing. As an application of three-way decision in clustering, three-way clustering uses a core region and a fringe region to represent a cluster. The identified elements are assigned to the core region and the uncertain elements to the fringe region in order to reduce decision risk. In this paper, we propose a three-way clustering algorithm based on the ideas of cluster ensemble and three-way decision. In the proposed method, we use hard clustering methods to produce different clustering results and use label matching to align all clustering results to a given order. The intersection of the clusters with the same label is regarded as the core region of a specific cluster, and the difference between the union and the intersection of those clusters is regarded as its fringe region. Therefore, a three-way clustering is naturally formed. The results on UCI datasets show that this strategy is effective in improving the structure of clustering results.
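
The core/fringe construction is simple to express in code. The sketch below assumes the base clusterings are given as label lists over the same objects with labels 0..k-1, and aligns each one to the first by brute-force search over label permutations (adequate only for small k; the paper's label-matching method may differ). Function names are illustrative.

```python
from itertools import permutations

def align(reference, labels, k):
    """Relabel `labels` (values 0..k-1) with the permutation that best agrees with `reference`."""
    best = max(permutations(range(k)),
               key=lambda perm: sum(r == perm[l] for r, l in zip(reference, labels)))
    return [best[l] for l in labels]

def three_way(clusterings, k):
    """Return {label: (core, fringe)} from several hard clusterings of the same objects."""
    ref = clusterings[0]
    aligned = [ref] + [align(ref, c, k) for c in clusterings[1:]]
    result = {}
    for lab in range(k):
        members = [{i for i, l in enumerate(c) if l == lab} for c in aligned]
        core = set.intersection(*members)       # objects every run puts in this cluster
        fringe = set.union(*members) - core     # objects only some runs put here
        result[lab] = (core, fringe)
    return result
```

With three base clusterings of six objects that disagree only on objects 2 and 3, both clusters end up with those two objects in their fringe region.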

Atmospheric Circulation Regimes: Can Cluster Analysis Provide the Number?

Journal of Climate ◽

10.1175/jcli4107.1 ◽

2007 ◽

Vol 20 (10) ◽

pp. 2229-2250 ◽

Cited By ~ 86

Author(s):

Bo Christiansen

Keyword(s):

Sample Size ◽

Mixture Model ◽

Clustering Algorithm ◽

Statistical Significance ◽

Skewed Distribution ◽

Clustering Methods ◽

Number Of Clusters ◽

Multiple Regimes ◽

Atmospheric Data ◽

Multiple Clusters

The existence of multiple regimes in the extratropical tropospheric circulation is a hypothesis of theoretical importance with potential practical consequences. It is also a controversial hypothesis, and an abundance of conflicting results regarding both the existence and the number of regimes can be found in the literature. Studies of atmospheric regime behavior are often based on clustering methods such as k-means and mixture models. In the basic implementation of these methods the number of clusters has to be specified a priori and “How many clusters?” is a highly nontrivial question. For the mixture model a procedure to assess the number of clusters by cross validation has recently been introduced. For the k-means model a Monte Carlo test is introduced that compares the clustering of the original data with the clustering of Gaussian distributed surrogate data. The robustness of these methods and their ability to produce the right number of clusters is critically assessed. The study is based on both idealized data and atmospheric data. It is shown that applying the clustering methods to the Northern Hemisphere winter tropospheric geopotential heights gives conflicting and fragile results. In particular the number of clusters depends both on the clustering algorithm and on the period considered. Furthermore, the clustering methods find multiple clusters when applied to data similar to the atmospheric data but drawn from a unimodal, skewed distribution. It is also shown that both clustering methods report multiple clusters for idealized data drawn from distributions that are skewed or platykurtic but otherwise smooth and without bumps or shoulders. In these cases the number of clusters found depends on the sample size. In particular, for the mixture model the number of clusters increases without bounds with increasing sample size.
It is concluded that in the atmospheric dataset studied the clustering methods provide only weak evidence for multiple regimes although the data is non-Gaussian with high statistical significance. It is also concluded that statistical models with basically unknown properties should be approached with utmost care or avoided completely.
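
A one-dimensional sketch of a Monte Carlo test for k-means: the explained-variance fraction of the k-means partition of the data is compared with that of Gaussian surrogates having the same mean and standard deviation. Working in one dimension, the explained-variance statistic, the quantile initialization, and the number of surrogates are simplifying assumptions; the study itself uses multivariate geopotential-height data.

```python
import random
import statistics

def kmeans_1d(xs, k, iters=100):
    """Lloyd's k-means on 1-D data, started from evenly spaced order statistics."""
    s = sorted(xs)
    centers = [s[int((i + 0.5) * len(s) / k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[min(range(k), key=lambda c: abs(x - centers[c]))].append(x)
        new = [statistics.fmean(g) if g else centers[c] for c, g in enumerate(groups)]
        if new == centers:  # assignments no longer change
            break
        centers = new
    return groups

def explained_variance(xs, k):
    """Fraction of total variance captured by the k-means partition (1 - within/total)."""
    mean = statistics.fmean(xs)
    total = sum((x - mean) ** 2 for x in xs)
    within = 0.0
    for g in kmeans_1d(xs, k):
        if g:
            m = statistics.fmean(g)
            within += sum((x - m) ** 2 for x in g)
    return 1 - within / total

def monte_carlo_p(xs, k=2, n_surrogates=99, seed=0):
    """Share of Gaussian surrogates whose k-means fit is at least as good as the data's."""
    rng = random.Random(seed)
    mu, sd = statistics.fmean(xs), statistics.stdev(xs)
    observed = explained_variance(xs, k)
    hits = 0
    for _ in range(n_surrogates):
        surrogate = [rng.gauss(mu, sd) for _ in range(len(xs))]
        if explained_variance(surrogate, k) >= observed:
            hits += 1
    return (hits + 1) / (n_surrogates + 1)
```

A small p-value says only that the partition is stronger than a single Gaussian would produce. As the study stresses, skewed or platykurtic unimodal data can also pass such a test, so rejecting Gaussianity is not by itself evidence of multiple regimes.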

Resampling for Fuzzy Clustering

International Journal of Uncertainty Fuzziness and Knowledge-Based Systems ◽

10.1142/s0218488507004893 ◽

2007 ◽

Vol 15 (05) ◽

pp. 595-614 ◽

Cited By ~ 8

Author(s):

Christian Borgelt

Keyword(s):

Fuzzy Clustering ◽

Data Set ◽

Number Of Clusters ◽

Resampling Methods ◽

Probabilistic Clustering ◽

The Core ◽

Core Idea ◽

The Right ◽

The Given ◽

Comparison Measures

Resampling methods are among the best approaches to determining the number of clusters in prototype-based clustering. The core idea is that with the right choice of the number of clusters, basically the same cluster structures should be obtained from subsamples of the given dataset, while a wrong choice should produce considerably varying cluster structures. In this paper I give an overview of how such resampling approaches can be transferred to fuzzy and probabilistic clustering. I study several cluster comparison measures, which can be parameterized with t-norms, and report experiments that provide some guidance on which of them may be the best choice.
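
The resampling idea can be illustrated with hard clustering. In the sketch below, two random subsamples are clustered separately and their partitions are compared on the shared objects with the Rand index; a number of clusters that yields consistently high agreement is preferred. Crisp 1-D k-means and the Rand index stand in for the fuzzy and probabilistic methods and t-norm-based comparison measures studied in the paper; the subsample fraction and round count are arbitrary choices.

```python
import random
import statistics
from itertools import combinations

def kmeans_labels(xs, k, iters=100):
    """Crisp 1-D k-means (stand-in for fuzzy clustering); returns a label per value."""
    s = sorted(xs)
    centers = [s[int((i + 0.5) * len(s) / k)] for i in range(k)]
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(x - centers[c])) for x in xs]
        new = [statistics.fmean([x for x, l in zip(xs, labels) if l == c])
               if c in labels else centers[c] for c in range(k)]
        if new == centers:
            break
        centers = new
    return labels

def rand_index(a, b):
    """Fraction of object pairs on which two labelings agree (same vs different cluster)."""
    pairs = list(combinations(range(len(a)), 2))
    return sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs) / len(pairs)

def stability(xs, k, rounds=10, frac=0.8, seed=0):
    """Mean agreement between clusterings of random subsample pairs on their shared objects."""
    rng = random.Random(seed)
    n, m = len(xs), int(frac * len(xs))
    scores = []
    for _ in range(rounds):
        i1, i2 = sorted(rng.sample(range(n), m)), sorted(rng.sample(range(n), m))
        l1 = dict(zip(i1, kmeans_labels([xs[i] for i in i1], k)))
        l2 = dict(zip(i2, kmeans_labels([xs[i] for i in i2], k)))
        shared = sorted(set(i1) & set(i2))
        scores.append(rand_index([l1[i] for i in shared], [l2[i] for i in shared]))
    return statistics.fmean(scores)
```

For two well-separated groups, k = 2 reproduces the same partition on every subsample (score 1.0), whereas k = 3 has to split one group somewhere, which tends to lower the agreement.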

Sustainable Development is a Dead-End: The Logic of Modernity and Ecological Crisis

Environmental Values ◽

10.3197/096327120x15916910310518 ◽

2020 ◽

Author(s):

Simon Lumsden

Keyword(s):

Sustainable Development ◽

Ecological Crisis ◽

Self Determination ◽

Ecological Sustainability ◽

Planetary Boundaries ◽

Western Modernity ◽

Conceptual Frame ◽

The Core ◽

Dead End ◽

Core Idea

This paper examines the theory of sustainable development presented by Jeffrey Sachs in The Age of Sustainable Development. While Sustainable Development ostensibly seeks to harmonise the conflict between ecological sustainability and human development, the paper argues this is impossible because of the conceptual frame it employs. Rather than allowing for a re-conceptualisation of the human–nature relation, Sustainable Development is simply the latest and possibly last attempt to advance the core idea of western modernity — the notion of self-determination. Drawing upon Hegel’s account of historical development it is argued that Sustainable Development and the notion of planetary boundaries cannot break out of a dualism of nature and self-determining agents.

Method for determining optimal number of clusters in K-means clustering algorithm

Journal of Computer Applications ◽

10.3724/sp.j.1087.2010.01995 ◽

2010 ◽

Vol 30 (8) ◽

pp. 1995-1998 ◽

Cited By ~ 18

Author(s):

Shi-bing Zhou ◽

Zhen-yuan Xu ◽

Xu-qing Tang

Keyword(s):

Clustering Algorithm ◽

Optimal Number ◽

Number Of Clusters ◽

Optimal Number Of Clusters

A novel bidirectional clustering algorithm based on local density

Scientific Reports ◽

10.1038/s41598-021-93244-2 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Baicheng Lyu ◽

Wenhua Wu ◽

Zhiqiang Hu

Keyword(s):

Clustering Algorithm ◽

Local Density ◽

Clustering Algorithms ◽

Cluster Number ◽

Denoising Method ◽

Number Of Clusters ◽

Data Points ◽

Cutoff Distance ◽

Large Clusters ◽

Small Clusters

With the wide application of cluster analysis, the number of clusters is gradually increasing, as is the difficulty of selecting indicators for judging the number of clusters. Also, small clusters are crucial for discovering the extreme characteristics of data samples, but current clustering algorithms focus mainly on analyzing large clusters. In this paper, a bidirectional clustering algorithm based on local density (BCALoD) is proposed. BCALoD establishes connections between data points based on local density, can automatically determine the number of clusters, is more sensitive to small clusters, and keeps the number of parameters to be adjusted to a minimum. Based on the robustness of the cluster number to noise, a denoising method suited to BCALoD is proposed. A different cutoff distance and cutoff density are assigned to each data cluster, which improves clustering performance. The clustering ability of BCALoD is verified on randomly generated datasets and city-light satellite images.
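
Local density with a cutoff distance, and linking points by density, can be sketched as below. The linking rule (each point attaches to its nearest denser neighbor within the cutoff, and points with none start a new cluster) is a density-peaks-style assumption rather than BCALoD's bidirectional procedure; the per-cluster cutoffs and the denoising step are not modeled, and the names are illustrative.

```python
import math

def local_density(points, dc):
    """rho[i] = number of other points closer than the cutoff distance dc."""
    return [sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) < dc)
            for i, p in enumerate(points)]

def link_clusters(points, dc):
    """Assumed rule: each point links to its nearest denser neighbor within dc; roots start clusters."""
    rho = local_density(points, dc)
    parent = []
    for i, p in enumerate(points):
        # (rho, -index) gives a strict total order, so links can never form a cycle
        denser = [j for j, q in enumerate(points)
                  if (rho[j], -j) > (rho[i], -i) and math.dist(p, q) < dc]
        parent.append(min(denser, key=lambda j: math.dist(p, points[j])) if denser else i)
    labels, roots = [], {}
    for i in range(len(points)):
        r = i
        while parent[r] != r:  # follow links up to the root of this tree
            r = parent[r]
        labels.append(roots.setdefault(r, len(roots)))
    return labels
```

Because every root starts a cluster, the number of clusters follows from the data and `dc` rather than being fixed in advance; on three nearby points plus a separate pair with `dc = 1.5`, it returns two clusters.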
