A Novel Complex Networks Clustering Algorithm Based on the Core Influence of Nodes

Clustering is an important unsupervised knowledge acquisition method that divides unlabeled data into different groups \cite{atilgan2021efficient,d2021automatic}. Different clustering algorithms make different assumptions about cluster formation; thus, most clustering algorithms handle at least one particular type of data distribution well but may not handle other types well. For example, K-means identifies convex clusters well \cite{bai2017fast}, and DBSCAN is able to find clusters with similar densities \cite{DBSCAN}. Therefore, most clustering methods may not work well on data distribution patterns that differ from the assumptions being made, or on a mixture of different distribution patterns. DBSCAN, for example, is sensitive to loosely connected points between dense natural clusters, as illustrated in Figure~\ref{figconnect}. The density of the connecting points shown in Figure~\ref{figconnect} differs from that of the natural clusters on both ends; however, DBSCAN with fixed global parameter values may wrongly assign these connecting points and treat all the data points in Figure~\ref{figconnect} as one big cluster.


Ant Clustering Algorithms

Principal Concepts in Applied Evolutionary Computation ◽

10.4018/978-1-4666-1749-0.ch001 ◽

2012 ◽

pp. 1-15

Author(s):

Yu-Chiun Chiou ◽

Shih-Ta Chou

Keyword(s):

Clustering Algorithm ◽

Clustering Algorithms ◽

Small Scale ◽

Solution Stability ◽

Clustering Methods ◽

Clustering Problem ◽

The Core ◽

Genetic Clustering ◽

Fully Connected ◽

Pheromone Trail

This paper proposes three ant clustering algorithms (ACAs): ACA-1, ACA-2 and ACA-3. The core logic of the proposed ACAs is to modify the ant colony metaheuristic by reformulating the clustering problem as a network problem. For a clustering problem of N objects and K clusters, a fully connected network of N nodes is formed, in which each link cost represents the dissimilarity of the two nodes it connects. K ants then collect their own nodes according to the link costs while following the pheromone trail laid by previous ants. The three proposed ACAs have been validated on a small-scale problem solved by a total enumeration method. The solution effectiveness at different problem scales consistently shows that ACA-2 performs best among the three ACAs. A further comparison of ACA-2 with other commonly used clustering methods, including the agglomerative hierarchy clustering algorithm (AHCA), the K-means algorithm (KMA) and the genetic clustering algorithm (GCA), shows that ACA-2 significantly outperforms them in solution effectiveness in most cases and also performs considerably better in solution stability as the problem scale or the number of clusters grows.
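The network formulation and the total enumeration baseline mentioned above can be illustrated with a minimal Python sketch. It is not the authors' ACA. The objective used here (sum of link costs between nodes in the same cluster), the Euclidean dissimilarity, and the helper names `link_costs`, `partition_cost` and `enumerate_best` are all assumptions made for illustration.

```python
from itertools import product
from math import dist

def link_costs(points):
    """Fully connected network: cost[i][j] is the dissimilarity of nodes i and j
    (assumed here to be Euclidean distance)."""
    n = len(points)
    return [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]

def partition_cost(assign, cost):
    """Assumed objective: total link cost between nodes sharing a cluster."""
    n = len(assign)
    return sum(cost[i][j]
               for i in range(n) for j in range(i + 1, n)
               if assign[i] == assign[j])

def enumerate_best(points, k):
    """Total enumeration over all assignments of N nodes to K non-empty clusters.
    Only feasible for small N, since there are K**N candidate assignments."""
    cost = link_costs(points)
    best = None
    for assign in product(range(k), repeat=len(points)):
        if len(set(assign)) != k:
            continue  # skip assignments that leave a cluster empty
        c = partition_cost(assign, cost)
        if best is None or c < best[0]:
            best = (c, assign)
    return best

points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
best_cost, best_assign = enumerate_best(points, k=3)
print(best_cost, best_assign)
```

Because the search space grows as K**N, enumeration of this kind only serves as a ground truth for small instances, which is consistent with its use as a validation baseline in the abstract.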


VDPC: Variational Density Peak Clustering Algorithm

10.36227/techrxiv.17597669 ◽

2021 ◽

Author(s):

Yizhang Wang ◽

Di Wang ◽

You Zhou ◽

Chai Quek ◽

Xiaofeng Zhang

Keyword(s):

Clustering Algorithm ◽

Cluster Formation ◽

Clustering Algorithms ◽

Data Distribution ◽

Distribution Patterns ◽

Clustering Methods ◽

Density Peak ◽

Global Parameter ◽

Density Peak Clustering ◽

Parameter Values

Clustering is an important unsupervised knowledge acquisition method that divides unlabeled data into different groups \cite{atilgan2021efficient,d2021automatic}. Different clustering algorithms make different assumptions about cluster formation; thus, most clustering algorithms handle at least one particular type of data distribution well but may not handle other types well. For example, K-means identifies convex clusters well \cite{bai2017fast}, and DBSCAN is able to find clusters with similar densities \cite{DBSCAN}. Therefore, most clustering methods may not work well on data distribution patterns that differ from the assumptions being made, or on a mixture of different distribution patterns. DBSCAN, for example, is sensitive to loosely connected points between dense natural clusters, as illustrated in Figure~\ref{figconnect}. The density of the connecting points shown in Figure~\ref{figconnect} differs from that of the natural clusters on both ends; however, DBSCAN with fixed global parameter values may wrongly assign these connecting points and treat all the data points in Figure~\ref{figconnect} as one big cluster.
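The sensitivity of DBSCAN to loosely connected points can be reproduced with a small, self-contained Python sketch. This is a plain textbook DBSCAN, not the VDPC method. The toy data (two dense grids joined by a sparse chain) and the parameter values are assumptions chosen only to make the effect visible, and `min_pts` is taken to count the point itself.

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN. Returns one label per point; -1 marks noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # All points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:
                queue.extend(nb)  # j is a core point, keep expanding
    return labels

# Two dense 5x5 grids (spacing 0.2) joined by a sparse chain (spacing 0.4).
blob_a = [(0.2 * i, 0.2 * j) for i in range(5) for j in range(5)]
blob_b = [(4.0 + 0.2 * i, 0.2 * j) for i in range(5) for j in range(5)]
bridge = [(1.2 + 0.4 * k, 0.4) for k in range(7)]
points = blob_a + bridge + blob_b

loose = dbscan(points, eps=0.45, min_pts=3)  # chain points count as core points
tight = dbscan(points, eps=0.25, min_pts=3)  # chain points are left as noise
print(len(set(loose)), sorted(set(tight)))
```

With the larger global `eps` the chain links both grids into a single cluster, while the smaller value separates the grids but discards every chain point as noise. In this toy setting no single global setting both separates the grids and assigns the chain points to a cluster.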


Comparison of dimensionality reduction and clustering methods for SARS-CoV-2 genome

Bulletin of Electrical Engineering and Informatics ◽

10.11591/eei.v10i4.2803 ◽

2021 ◽

Vol 10 (4) ◽

pp. 2170-2180

Author(s):

Untari N. Wisesty ◽

Tati Rajab Mengko

Keyword(s):

Dimensionality Reduction ◽

Dimensional Reduction ◽

Clustering Algorithm ◽

Sequence Data ◽

Clustering Algorithms ◽

Gaussian Mixture Models ◽

Reduction Process ◽

Principal Component ◽

Gaussian Mixture ◽

Clustering Methods

This paper analyzes SARS-CoV-2 genome variation by comparing the results of genome clustering using several clustering algorithms and the distribution of sequences in each cluster. The clustering algorithms used are K-means, Gaussian mixture models, agglomerative hierarchical clustering, mean-shift clustering, and DBSCAN. However, clustering algorithms are weak at grouping very high-dimensional data such as genome data, so a dimensionality reduction process is needed. In this research, dimensionality reduction was carried out using principal component analysis (PCA) and an autoencoder, with three models that produce 2, 10, and 50 features. The main contributions are the dimensionality reduction and clustering scheme for SARS-CoV-2 sequence data and the performance analysis of each experiment for each scheme and for the hyperparameters of each method. Based on the experimental results, PCA combined with DBSCAN achieves the highest silhouette score, 0.8770, with three clusters when using two features. However, dimensionality reduction using the autoencoder needs more iterations to converge. In testing with Indonesian sequence data, more than half of the sequences fall into one cluster and the rest are distributed across the other two clusters.
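For readers unfamiliar with the silhouette score reported above, the following Python sketch computes the standard mean silhouette coefficient for a labeled point set. It uses Euclidean distance and toy 2-D points; the data and the function name `silhouette_score` are illustrative assumptions and do not reproduce the paper's experiments.

```python
from math import dist

def silhouette_score(points, labels):
    """Mean silhouette coefficient s = (b - a) / max(a, b) over all points,
    where a is the mean distance to the point's own cluster and b is the
    smallest mean distance to any other cluster."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # singleton clusters contribute s = 0 by convention
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != label)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(points, [0, 0, 1, 1])  # compact, well-separated groups
bad = silhouette_score(points, [0, 1, 0, 1])   # groups that straddle the gap
print(round(good, 3), round(bad, 3))
```

Scores lie in [-1, 1]; values near 1 indicate compact, well-separated clusters, which is how a score such as 0.8770 is usually read.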


(SET) Smart Energy Management and Throughput Maximization

Security Management in Mobile Cloud Computing - Advances in Information Security, Privacy, and Ethics ◽

10.4018/978-1-5225-0602-7.ch001 ◽

2017 ◽

pp. 1-28 ◽

Cited By ~ 2

Author(s):

Hassan El Alami ◽

Abdellah Najid

Keyword(s):

Energy Management ◽

Routing Protocols ◽

Clustering Algorithm ◽

Cluster Formation ◽

Clustering Algorithms ◽

Cluster Head ◽

Base Station ◽

Throughput Maximization ◽

Cluster Heads ◽

Smart Energy Management

Energy efficiency and throughput are critical factors in the design of routing protocols for wireless sensor networks (WSNs). Many routing protocols based on clustering algorithms have been proposed. Current clustering algorithms often use cluster head selection and cluster formation to reduce energy consumption and maximize throughput in WSNs. In this chapter, the authors present a new routing protocol based on smart energy management and throughput maximization for clustered WSNs. The main objective of this protocol is to address the problem that the sensors closest to the base station consume relatively more energy handling sensed-information traffic, and also to decrease the workload on cluster heads (CHs). The approach divides the network field into a free area, which contains the sensors closest to the base station that communicate with it directly, and a clustered area, which contains the sensors that transmit data to the base station through a cluster head. Because some sensors communicate directly with the base station, the load on cluster heads decreases. The cluster heads therefore consume less energy, which increases network lifetime.


Complex networks clustering algorithm based on the core influence of the nodes

2012 IEEE 31st International Performance Computing and Communications Conference (IPCCC) ◽

10.1109/pccc.2012.6407690 ◽

2012 ◽

Author(s):

Chao Tong ◽

Jianwei Niu ◽

Bin Dai ◽

Jing Peng ◽

Jinyang Fan

Keyword(s):

Complex Networks ◽

Clustering Algorithm ◽

The Core


Hierarchical kt jet clustering for parallel architectures

Acta Universitatis Sapientiae Informatica ◽

10.1515/ausi-2017-0012 ◽

2017 ◽

Vol 9 (2) ◽

pp. 195-213

Author(s):

Richárd Forster ◽

Ágnes Fülöp

Keyword(s):

Hierarchical Clustering ◽

Particle Physics ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

High Energy ◽

Theoretical Physics ◽

High Energy Particle ◽

Clustering Methods ◽

Hierarchical Clustering Methods ◽

Using Data

The reconstruction and analysis of measured data play an important role in high energy particle physics research, leading to new results in both experimental and theoretical physics. This work requires algorithmic improvements and substantial computing capacity. Clustering algorithms make it possible to resolve the jet structure more accurately. A more granular parallelization of the kt clustering algorithm was explored by combining it with the hierarchical clustering methods used in network evaluations. The kt method makes it possible to follow the development of particles produced in high-energy nucleus-nucleus collisions. Hierarchical clustering algorithms work on graphs, so the particle information used by the standard kt algorithm was first transformed into an appropriate graph representing the network of particles. Testing was done using data samples from the ALICE offline library, which contains the modules required to simulate the ALICE detector, a dedicated Pb-Pb detector. The proposed algorithm was compared with the FastJet toolkit's standard longitudinally invariant kt implementation. Parallelizing the standard, non-optimized version of this algorithm on the available CPU architecture proved to be 1.6 times faster than the standard implementation, while the solution proposed in this paper achieved 12 times faster computing performance and is scalable enough to run efficiently on GPUs.


Tensor Decomposition for Multilayer Networks Clustering

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33013371 ◽

2019 ◽

Vol 33 ◽

pp. 3371-3378 ◽

Cited By ~ 2

Author(s):

Zitai Chen ◽

Chuan Chen ◽

Zibin Zheng ◽

Yi Zhu

Keyword(s):

Clustering Algorithms ◽

Cluster Structure ◽

Real Life ◽

Nonlinear Least Squares ◽

Tensor Decomposition ◽

Underlying Structure ◽

Network Clustering ◽

Multilayer Networks ◽

Novel Approach ◽

Real World Datasets

Clustering on multilayer networks has been shown to be a promising approach to enhance accuracy. Various multilayer network clustering algorithms assume that all networks derive from one latent clustering structure, and jointly learn the compatible and complementary information from different networks to uncover one shared underlying structure. However, such an assumption conflicts with many emerging real-life applications because of the existence of noisy or irrelevant networks. To address this issue, we propose Centroid-based Multilayer Network Clustering (CMNC), a novel approach that can divide irrelevant relationships into different network groups and uncover the cluster structure in each group simultaneously. The multilayer network is represented within a unified tensor framework that simultaneously captures multiple types of relationships between a set of entities. By imposing the rank-(Lr,Lr,1) block term decomposition with nonnegativity, we are able to interpret the multiple clustering results well on the basis of graph cut theory. Numerically, we transform this tensor decomposition problem into an unconstrained optimization problem and can thus solve it efficiently under the nonlinear least squares (NLS) framework. Extensive experimental results on synthetic and real-world datasets show the effectiveness of our method and its robustness against noise and irrelevant data.


Semi-Supervised Clustering Algorithm Based on Small Size of Labeled Data

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.121-126.4675 ◽

2011 ◽

Vol 121-126 ◽

pp. 4675-4679

Author(s):

Ming Wei Leng ◽

Xiao Yun Chen ◽

Jian Jun Cheng ◽

Long Jie Li

Keyword(s):

Data Mining ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Nearest Neighbors ◽

Experimental Results ◽

K Nearest Neighbors ◽

Supervised Clustering ◽

The Core ◽

Knn Classification ◽

Core Problem

In many data mining domains, labeled data is very expensive to generate. How to make the best use of labeled data to guide the clustering of unlabeled data is the core problem of semi-supervised clustering. Most semi-supervised clustering algorithms require a certain amount of labeled data and need several parameter values to be set, and different values may yield different results. In view of this, a new algorithm, called semi-supervised clustering algorithm based on small size of labeled data, is presented. It uses a small amount of labeled data to expand the labeled dataset by labeling the k-nearest neighbors of the labeled points, and it requires only one parameter. We demonstrate the algorithm on three UCI datasets and compare it with SSDBSCAN[4] and KNN; the experimental results confirm that its accuracy is close to that of the KNN classification algorithm.
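The idea of expanding a small labeled set through k-nearest neighbors can be sketched in Python as follows. This is only one possible reading of the abstract, not the published algorithm. The single expansion pass, the rule that the closest seed wins when two seeds claim the same point, and the name `expand_labels` are assumptions.

```python
from math import dist

def expand_labels(points, seeds, k):
    """seeds maps point index -> label for the few labeled points.
    Each seed passes its label to its k nearest unlabeled points; when
    several seeds claim the same point, the closest seed wins (assumed rule).
    Returns a new dict covering the seeds and the newly labeled points."""
    claims = {}  # index -> (distance to claiming seed, label)
    for s, label in seeds.items():
        candidates = sorted(
            (dist(points[s], points[j]), j)
            for j in range(len(points)) if j not in seeds
        )
        for d, j in candidates[:k]:
            if j not in claims or d < claims[j][0]:
                claims[j] = (d, label)

    expanded = dict(seeds)
    for j, (_, label) in claims.items():
        expanded[j] = label
    return expanded

points = [(0, 0), (0.1, 0), (0.2, 0), (5, 0), (5.1, 0), (5.2, 0)]
seeds = {0: "a", 3: "b"}  # one labeled point per group
expanded = expand_labels(points, seeds, k=2)
print(expanded)
```

The only tunable value is `k`, which mirrors the abstract's emphasis on a single parameter. Repeating the pass with the expanded set as new seeds would grow the labeled set further, but whether the original method iterates is not stated in the abstract.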


Hard c-Means Using Quadratic Penalty-Vector Regularization for Uncertain Data

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2012.p0831 ◽

2012 ◽

Vol 16 (7) ◽

pp. 831-840 ◽

Cited By ~ 1

Author(s):

Yasunori Endo ◽


Arisa Taniguchi ◽

Yukihiro Hamasuna ◽


...

Keyword(s):

Missing Values ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Uncertain Data ◽

Unsupervised Classification ◽

Real Space ◽

Clustering Methods ◽

Cluster Number ◽

Numerical Examples ◽

Classification Technique

Clustering is an unsupervised classification technique for data analysis. In general, each datum in real space is transformed into a point in a pattern space so that clustering methods can be applied. Often, however, data cannot be represented by a point because of uncertainty, e.g., measurement error margins and missing values. In this paper, we introduce quadratic penalty-vector regularization to handle such uncertain data using Hard c-Means (HCM), which is one of the most typical clustering algorithms. We first propose a new clustering algorithm called hard c-means using quadratic penalty-vector regularization for uncertain data (HCMP). Second, we propose sequential extraction hard c-means using quadratic penalty-vector regularization (SHCMP) to handle datasets whose cluster number is unknown. Furthermore, we verify the effectiveness of our proposed algorithms through numerical examples.
