scholarly journals A Novel Complex Networks Clustering Algorithm Based on the Core Influence of Nodes

2014 ◽  
Vol 2014 ◽  
pp. 1-7 ◽  
Author(s):  
Chao Tong ◽  
Jianwei Niu ◽  
Bin Dai ◽  
Zhongyu Xie

In complex networks, cluster structure, identified by the heterogeneity of nodes, has become a common and important topological property. Network clustering methods are thus significant for the study of complex networks. Currently, many typical clustering algorithms have some weakness like inaccuracy and slow convergence. In this paper, we propose a clustering algorithm by calculating the core influence of nodes. The clustering process is a simulation of the process of cluster formation in sociology. The algorithm detects the nodes with core influence through their betweenness centrality, and builds the cluster’s core structure by discriminant functions. Next, the algorithm gets the final cluster structure after clustering the rest of the nodes in the network by optimizing method. Experiments on different datasets show that the clustering accuracy of this algorithm is superior to the classical clustering algorithm (Fast-Newman algorithm). It clusters faster and plays a positive role in revealing the real cluster structure of complex networks precisely.

2021 ◽  
Author(s):  
Yizhang Wang ◽  
Di Wang ◽  
You Zhou ◽  
Chai Quek ◽  
Xiaofeng Zhang

<div>Clustering is an important unsupervised knowledge acquisition method, which divides the unlabeled data into different groups \cite{atilgan2021efficient,d2021automatic}. Different clustering algorithms make different assumptions on the cluster formation, thus, most clustering algorithms are able to well handle at least one particular type of data distribution but may not well handle the other types of distributions. For example, K-means identifies convex clusters well \cite{bai2017fast}, and DBSCAN is able to find clusters with similar densities \cite{DBSCAN}. </div><div>Therefore, most clustering methods may not work well on data distribution patterns that are different from the assumptions being made and on a mixture of different distribution patterns. Taking DBSCAN as an example, it is sensitive to the loosely connected points between dense natural clusters as illustrated in Figure~\ref{figconnect}. The density of the connected points shown in Figure~\ref{figconnect} is different from the natural clusters on both ends, however, DBSCAN with fixed global parameter values may wrongly assign these connected points and consider all the data points in Figure~\ref{figconnect} as one big cluster.</div>


Author(s):  
Yu-Chiun Chiou ◽  
Shih-Ta Chou

This paper proposes three ant clustering algorithms (ACAs): ACA-1, ACA-2 and ACA-3. The core logic of the proposed ACAs is to modify the ant colony metaheuristic by reformulating the clustering problem into a network problem. For a clustering problem of N objects and K clusters, a fully connected network of N nodes is formed with link costs, representing the dissimilarity of any two nodes it connects. K ants are then to collect their own nodes according to the link costs and following the pheromone trail laid by previous ants. The proposed three ACAs have been validated on a small-scale problem solved by a total enumeration method. The solution effectiveness at different problem scales consistently shows that ACA-2 outperforms among these three ACAs. A further comparison of ACA-2 with other commonly used clustering methods, including agglomerative hierarchy clustering algorithm (AHCA), K-means algorithm (KMA) and genetic clustering algorithm (GCA), shows that ACA-2 significantly outperforms them in solution effectiveness for the most of cases and also performs considerably better in solution stability as the problem scales or the number of clusters gets larger.


2021 ◽  
Author(s):  
Yizhang Wang ◽  
Di Wang ◽  
You Zhou ◽  
Chai Quek ◽  
Xiaofeng Zhang

<div>Clustering is an important unsupervised knowledge acquisition method, which divides the unlabeled data into different groups \cite{atilgan2021efficient,d2021automatic}. Different clustering algorithms make different assumptions on the cluster formation, thus, most clustering algorithms are able to well handle at least one particular type of data distribution but may not well handle the other types of distributions. For example, K-means identifies convex clusters well \cite{bai2017fast}, and DBSCAN is able to find clusters with similar densities \cite{DBSCAN}. </div><div>Therefore, most clustering methods may not work well on data distribution patterns that are different from the assumptions being made and on a mixture of different distribution patterns. Taking DBSCAN as an example, it is sensitive to the loosely connected points between dense natural clusters as illustrated in Figure~\ref{figconnect}. The density of the connected points shown in Figure~\ref{figconnect} is different from the natural clusters on both ends, however, DBSCAN with fixed global parameter values may wrongly assign these connected points and consider all the data points in Figure~\ref{figconnect} as one big cluster.</div>


2021 ◽  
Vol 10 (4) ◽  
pp. 2170-2180
Author(s):  
Untari N. Wisesty ◽  
Tati Rajab Mengko

This paper aims to conduct an analysis of the SARS-CoV-2 genome variation was carried out by comparing the results of genome clustering using several clustering algorithms and distribution of sequence in each cluster. The clustering algorithms used are K-means, Gaussian mixture models, agglomerative hierarchical clustering, mean-shift clustering, and DBSCAN. However, the clustering algorithm has a weakness in grouping data that has very high dimensions such as genome data, so that a dimensional reduction process is needed. In this research, dimensionality reduction was carried out using principal component analysis (PCA) and autoencoder method with three models that produce 2, 10, and 50 features. The main contributions achieved were the dimensional reduction and clustering scheme of SARS-CoV-2 sequence data and the performance analysis of each experiment on each scheme and hyper parameters for each method. Based on the results of experiments conducted, PCA and DBSCAN algorithm achieve the highest silhouette score of 0.8770 with three clusters when using two features. However, dimensionality reduction using autoencoder need more iterations to converge. On the testing process with Indonesian sequence data, more than half of them enter one cluster and the rest are distributed in the other two clusters.


Author(s):  
Hassan El Alami ◽  
Abdellah Najid

Energy efficiency and throughput are critical factors in the design routing protocols of WSNs. Many routing protocols based on clustering algorithm have been proposed. Current clustering algorithms often use cluster head selection and cluster formation to reduce energy consumption and maximize throughput in WSNs. In this chapter, the authors present a new routing protocol based on smart energy management and throughput maximization for clustered WSNs. The main objective of this protocol is to solve the constraint of closest sensors to the base station which consume relatively more energy in sensed information traffics, and also decrease workload on CHs. This approach divides network field into free area which contains the closest sensors to the base station that communicate directly with, and clustered area which contains the sensors that transmit data to the base station through cluster head. So due to the sensors that communicate directly to the base station, the load on cluster heads is decreased. Thus, the cluster heads consume less energy causing the increase of network lifetime.


2017 ◽  
Vol 9 (2) ◽  
pp. 195-213
Author(s):  
Richárd Forster ◽  
Ágnes Fülöp

AbstractThe reconstruction and analyze of measured data play important role in the research of high energy particle physics. This leads to new results in both experimental and theoretical physics. This requires algorithm improvements and high computer capacity. Clustering algorithm makes it possible to get to know the jet structure more accurately. More granular parallelization of the kt cluster algorithms was explored by combining it with the hierarchical clustering methods used in network evaluations. The kt method allows to know the development of particles due to the collision of high-energy nucleus-nucleus. The hierarchical clustering algorithms works on graphs, so the particle information used by the standard kt algorithm was first transformed into an appropriate graph, representing the network of particles. Testing was done using data samples from the Alice offine library, which contains the required modules to simulate the ALICE detector that is a dedicated Pb-Pb detector. The proposed algorithm was compared to the FastJet toolkit's standard longitudinal invariant kt implementation. Parallelizing the standard non-optimized version of this algorithm utilizing the available CPU architecture proved to be 1:6 times faster, than the standard implementation, while the proposed solution in this paper was able to achieve a 12 times faster computing performance, also being scalable enough to efficiently run on GPUs.


Author(s):  
Zitai Chen ◽  
Chuan Chen ◽  
Zibin Zheng ◽  
Yi Zhu

Clustering on multilayer networks has been shown to be a promising approach to enhance the accuracy. Various multilayer networks clustering algorithms assume all networks derive from a latent clustering structure, and jointly learn the compatible and complementary information from different networks to excavate one shared underlying structure. However, such an assumption is in conflict with many emerging real-life applications due to the existence of noisy/irrelevant networks. To address this issue, we propose Centroid-based Multilayer Network Clustering (CMNC), a novel approach which can divide irrelevant relationships into different network groups and uncover the cluster structure in each group simultaneously. The multilayer networks is represented within a unified tensor framework for simultaneously capturing multiple types of relationships between a set of entities. By imposing the rank-(Lr,Lr,1) block term decomposition with nonnegativity, we are able to have well interpretations on the multiple clustering results based on graph cut theory. Numerically, we transform this tensor decomposition problem to an unconstrained optimization, thus can solve it efficiently under the nonlinear least squares (NLS) framework. Extensive experimental results on synthetic and real-world datasets show the effectiveness and robustness of our method against noise and irrelevant data.


2011 ◽  
Vol 121-126 ◽  
pp. 4675-4679
Author(s):  
Ming Wei Leng ◽  
Xiao Yun Chen ◽  
Jian Jun Cheng ◽  
Long Jie Li

In many data mining domains, labeled data is very expensive to generate, how to make the best use of labeled data to guide the process of unlabeled clustering is the core problem of semi-supervised clustering. Most of semi-supervised clustering algorithms require a certain amount of labeled data and need set the values of some parameters, different values maybe have different results. In view of this, a new algorithm, called semi-supervised clustering algorithm based on small size of labeled data, is presented, which can use the small size of labeled data to expand labeled dataset by labeling their k-nearest neighbors and only one parameter. We demonstrate our clustering algorithm with three UCI datasets, compared with SSDBSCAN[4] and KNN, the experimental results confirm that accuracy of our clustering algorithm is close to that of KNN classification algorithm.


Author(s):  
Yasunori Endo ◽  
◽  
Arisa Taniguchi ◽  
Yukihiro Hamasuna ◽  
◽  
...  

Clustering is an unsupervised classification technique for data analysis. In general, each datum in real space is transformed into a point in a pattern space to apply clustering methods. Data cannot often be represented by a point, however, because of its uncertainty, e.g., measurement error margin and missing values in data. In this paper, we will introduce quadratic penalty-vector regularization to handle such uncertain data using Hard c-Means (HCM), which is one of the most typical clustering algorithms. We first propose a new clustering algorithm called hard c-means using quadratic penalty-vector regularization for uncertain data (HCMP). Second, we propose sequential extraction hard c-means using quadratic penalty-vector regularization (SHCMP) to handle datasets whose cluster number is unknown. Furthermore, we verify the effectiveness of our proposed algorithms through numerical examples.


Sign in / Sign up

Export Citation Format

Share Document