Community Detection Method Based on Node Density, Degree Centrality, and K-Means Clustering in Complex Network

Entropy ◽  
2019 ◽  
Vol 21 (12) ◽  
pp. 1145 ◽  
Author(s):  
Cai ◽  
Zeng ◽  
Wang ◽  
Li ◽  
Hu

Community detection in networks plays a key role in understanding their structures, and the application of clustering algorithms to community detection in complex networks has attracted intensive attention in recent years. In this paper, node density is first proposed, based on a definition of the uncertainty of node community belonging. Next, DD (the combination of node density and node degree centrality) is proposed for initial node selection in community detection. Finally, based on DD and the k-means clustering algorithm, we propose a community detection approach, the density-degree centrality-Jaccard-k-means method (DDJKM). DDJKM avoids the random selection of initial cluster centers used in conventional k-means clustering, so isolated nodes are not selected as initial cluster centers. Additionally, DDJKM reduces the number of iterations in the clustering process, and calculating node similarity prevents the initial cluster centers from lying too close to one another. The proposed method is compared with state-of-the-art algorithms on synthetic and real-world networks. The experimental results show that the proposed method describes community structure accurately and that DDJKM is a practical approach for detecting communities in large network datasets.
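The initial node selection described above can be illustrated with a minimal Python sketch. The abstract does not give the formula for node density or DD, so this sketch ranks nodes by degree alone and uses Jaccard similarity of closed neighbourhoods to keep centers apart; the function names, the `max_similarity` threshold, and the toy graph are all assumptions made for illustration, not the authors' implementation.

```python
def jaccard(adj, u, v):
    """Jaccard similarity of the closed neighbourhoods of nodes u and v."""
    a = adj[u] | {u}
    b = adj[v] | {v}
    return len(a & b) / len(a | b)


def pick_initial_centers(adj, k, max_similarity=0.5):
    """Pick up to k high-degree, mutually dissimilar nodes as initial centers.

    Assumption: degree stands in for the paper's DD score, which the
    abstract does not define.
    """
    # Highest degree first; ties are broken by node label for determinism.
    ranked = sorted(adj, key=lambda n: (-len(adj[n]), n))
    centers = []
    for node in ranked:
        if not adj[node]:
            continue  # isolated nodes are never chosen as centers
        # Reject candidates that are too similar to an already chosen center.
        if all(jaccard(adj, node, c) <= max_similarity for c in centers):
            centers.append(node)
        if len(centers) == k:
            break
    return centers


# Toy graph: two triangles joined by the edge 2-3, plus isolated node 6.
graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
    6: set(),
}
print(pick_initial_centers(graph, k=2))
```

On this toy graph the two bridge nodes are selected, one per triangle, and the isolated node is skipped regardless of how many centers are requested.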

2020 ◽  
Vol 2020 ◽  
pp. 1-6
Author(s):  
Shuxia Ren ◽  
Shubo Zhang ◽  
Tao Wu

The similarity graphs used by most spectral clustering algorithms carry a large amount of incorrect community information. In this paper, we propose a probability matrix and an improved spectral clustering algorithm based on it for community detection. First, a Markov chain is used to calculate the transition probabilities between nodes, and the probability matrix is constructed from these transition probabilities. Then, the similarity graph is constructed from the mean probability matrix. Finally, community detection is achieved by optimizing the NCut objective function. The proposed algorithm is compared with SC, WT, FG, FluidC, and SCRW on artificial and real networks. Experimental results show that the proposed algorithm detects communities more accurately and has better clustering performance.
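A minimal Python sketch of the first two steps follows. The one-step transition matrix is the standard random-walk matrix obtained by dividing each adjacency row by the node degree. The abstract does not define the "mean probability matrix", so averaging the first few powers of the transition matrix is only one possible reading; the function names and the `steps` parameter are assumptions, and the NCut optimization step is not shown.

```python
def transition_matrix(adjacency):
    """Row-normalise an adjacency matrix into random-walk transition probabilities."""
    matrix = []
    for row in adjacency:
        degree = sum(row)
        matrix.append([a / degree if degree else 0.0 for a in row])
    return matrix


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def mean_probability_matrix(adjacency, steps=3):
    """Average the 1-step to `steps`-step transition matrices.

    Assumption: this averaging is an illustrative guess at what the
    abstract calls the mean probability matrix.
    """
    p = transition_matrix(adjacency)
    power = p
    total = [row[:] for row in p]
    for _ in range(steps - 1):
        power = matmul(power, p)
        total = [[t + q for t, q in zip(t_row, q_row)]
                 for t_row, q_row in zip(total, power)]
    return [[t / steps for t in row] for row in total]


# Path graph on three nodes: 0 - 1 - 2.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
for row in mean_probability_matrix(path, steps=2):
    print(row)
```

Each row of the result still sums to 1, so the averaged matrix can be read as a smoothed transition probability between nodes.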


2021 ◽  
Vol 4 ◽  
Author(s):  
Jie Yang ◽  
Yu-Kai Wang ◽  
Xin Yao ◽  
Chin-Teng Lin

The K-means algorithm is a widely used clustering algorithm that offers simplicity and efficiency. However, the traditional K-means algorithm determines the initial cluster centers randomly, which makes clustering results prone to local optima and degrades clustering performance. In this research, we propose an adaptive initialization method for the K-means algorithm (AIMK) that adapts to the varied characteristics of different datasets and obtains better clustering performance with stable results. For larger or higher-dimensional datasets, we also apply random sampling in AIMK (named AIMK-RS) to reduce the time complexity. Twenty-two real-world datasets were used for performance comparisons. The experimental results show that AIMK and AIMK-RS outperform current initialization methods and several well-known clustering algorithms. Specifically, AIMK-RS reduces the time complexity to O(n). Moreover, we use AIMK to initialize K-medoids and spectral clustering, and better performance is also observed. These results demonstrate the superior performance and good scalability of AIMK and AIMK-RS. In the future, we would like to apply AIMK to more partition-based clustering algorithms to solve real-life practical problems.
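The sensitivity to initial centers mentioned above can be seen in a small Python sketch of plain K-means (Lloyd's iteration) on one-dimensional data. This is not AIMK, whose procedure the abstract does not describe; the data, the two sets of starting centers, and the function names are chosen only for illustration.

```python
def kmeans_1d(values, centers, iterations=100):
    """Plain Lloyd iteration on 1-D data, starting from the given centers."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            # Assign each value to its nearest center.
            nearest = min(range(len(centers)), key=lambda c: abs(v - centers[c]))
            clusters[nearest].append(v)
        # Recompute each center as its cluster mean; keep empty clusters in place.
        updated = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
        if updated == centers:
            break  # converged
        centers = updated
    return centers


def inertia(values, centers):
    """Sum of squared distances from each value to its nearest center."""
    return sum(min((v - c) ** 2 for c in centers) for v in values)


data = [0, 1, 2, 10, 11, 12, 20, 21, 22]
spread_start = kmeans_1d(data, [0, 10, 20])   # one seed per visible group
crowded_start = kmeans_1d(data, [0, 1, 2])    # all seeds in the first group
print(spread_start, inertia(data, spread_start))
print(crowded_start, inertia(data, crowded_start))
```

Starting with all three seeds inside the first group, the iteration converges to a solution that merges the two right-hand groups, with a much larger within-cluster error than the well-spread start.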


2010 ◽  
Vol 29-32 ◽  
pp. 802-808
Author(s):  
Min Min

After analyzing the common problems of fuzzy clustering algorithms, we put forward a combined fuzzy clustering algorithm that automatically generates a reasonable number of clusters and initial cluster centers. The algorithm has been tested on real evaluation data of teaching designs. The results show that combined fuzzy clustering based on the F-statistic is more effective.


2013 ◽  
Vol 462-463 ◽  
pp. 458-461
Author(s):  
Jian Jun Cheng ◽  
Peng Fei Wang ◽  
Qi Bin Zhang ◽  
Zheng Quan Zhang ◽  
Ming Wei Leng ◽  
...  

This paper proposes an algorithm called DDSCDA, which is based on the concepts of node degree difference and node similarity. The algorithm iteratively extracts the node with the larger degree from the network and certifies it as a kernel node, then takes the kernel node as the founder or initiator of a community that attracts its neighbors to join; doing so yields a partition corresponding to a coarse-grained community structure of the network. Finally, taking the coarse-grained communities as a starting point, we use the LPA strategy to propagate labels further through the network and obtain the final community structure. We compared the performance with classical community detection algorithms such as LPA, LPAm, and FastQ. The experimental results show that our proposal is feasible, extracts higher-quality communities from the network, and significantly outperforms the previous algorithms.


2014 ◽  
Vol 998-999 ◽  
pp. 873-877
Author(s):  
Zhen Bo Wang ◽  
Bao Zhi Qiu

To reduce the impact of irrelevant attributes on clustering results and to increase the contribution of relevant attributes, this paper proposes a fuzzy C-means clustering algorithm based on the coefficient of variation (CV-FCM). In the algorithm, the coefficient of variation is used to assign a different weight to each attribute in the data set, and the magnitude of the weight expresses the importance of that attribute to the clusters. In addition, because fuzzy C-means clustering is sensitive to the initial cluster center values, a maximum-distance method for selecting initial cluster centers is introduced on the basis of the coefficient-of-variation weighting. Experiments on real data sets show that the algorithm selects cluster centers effectively and produces clustering results superior to those of general fuzzy C-means clustering algorithms.
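The two ingredients named above, coefficient-of-variation weights and maximum-distance initialization, can be sketched in Python as follows. The abstract does not state the exact weighting formula, so normalising the per-attribute coefficients of variation to sum to 1 is an assumption, as is seeding with the farthest pair of points and then adding the point farthest from the centers already chosen. The function names are hypothetical, and the fuzzy C-means iteration itself is not shown.

```python
import math
import statistics


def cv_weights(rows):
    """Weight each attribute by its coefficient of variation (std / |mean|).

    Assumption: weights are normalised to sum to 1.
    """
    cvs = []
    for column in zip(*rows):
        mean = statistics.fmean(column)
        cvs.append(statistics.pstdev(column) / abs(mean) if mean else 0.0)
    total = sum(cvs)
    if not total:
        return [1.0 / len(cvs)] * len(cvs)  # fall back to equal weights
    return [cv / total for cv in cvs]


def weighted_distance(a, b, weights):
    """Euclidean distance with per-attribute weights."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def max_distance_centers(rows, k, weights):
    """Return indices of k initial centers (needs at least 2 rows, k <= len(rows))."""
    n = len(rows)
    # Seed with the two points that are farthest apart.
    first, second = max(
        ((a, b) for a in range(n) for b in range(a + 1, n)),
        key=lambda p: weighted_distance(rows[p[0]], rows[p[1]], weights),
    )
    chosen = [first, second]
    while len(chosen) < k:
        # Add the point whose nearest chosen center is farthest away.
        candidate = max(
            (c for c in range(n) if c not in chosen),
            key=lambda c: min(weighted_distance(rows[c], rows[s], weights)
                              for s in chosen),
        )
        chosen.append(candidate)
    return chosen[:k]


samples = [(1.0, 10.0), (2.0, 10.0), (3.0, 10.0)]
weights = cv_weights(samples)
print(weights, max_distance_centers(samples, 2, weights))
```

In the sample data the second attribute is constant, so it receives zero weight and the centers are chosen purely from the spread in the first attribute.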


Kybernetes ◽  
2016 ◽  
Vol 45 (8) ◽  
pp. 1273-1291 ◽  
Author(s):  
Runhai Jiao ◽  
Shaolong Liu ◽  
Wu Wen ◽  
Biying Lin

Purpose: The large volume of big data makes traditional clustering algorithms, which are usually designed to process an entire data set, impractical. This paper focuses on incremental clustering, which divides data into a series of data chunks so that only a small amount of data needs to be clustered at a time. Few studies on incremental clustering address the problems of optimizing cluster center initialization for each data chunk and selecting multiple passing points for each cluster.
Design/methodology/approach: By optimizing the initial cluster centers, the quality of the clustering results is improved for each data chunk, which in turn enhances the quality of the final clustering results. Moreover, by selecting multiple passing points, more accurate information is passed down to improve the final clustering results. The proposed method addresses these two problems and is applied in an algorithm based on the streaming kernel fuzzy c-means (stKFCM) algorithm.
Findings: Experimental results show that the proposed algorithm achieves higher accuracy and better performance than the stKFCM algorithm.
Originality/value: This paper addresses the problem of improving the performance of incremental clustering by optimizing cluster center initialization and selecting multiple passing points. The paper analyzes the performance of the proposed scheme and demonstrates its effectiveness.


Author(s):  
Peihua Gu

Clustering-based algorithms have been used for machine cell grouping for years and are considered a feasible approach because they are flexible and easy to implement on computers. However, this research identifies two major deficiencies of clustering algorithms: inconsistency and possible mis-clustering. Inconsistency is usually caused by the arbitrary determination of initial cluster centres; mis-clustering, from the manufacturing point of view, arises because pattern similarity is the sole criterion in the cluster-seeking process. A new heuristic clustering algorithm has been developed and applied to the formation of machine cells. The algorithm not only overcomes the common drawbacks of clustering methods but also minimizes the number of bottleneck machines required. The algorithm consists of three parts: 1) determination of initial clustering centres, 2) the cluster-seeking process, and 3) minimization of the number of bottleneck machines. The initial clustering centres are determined by finding a set of components that require the most different machine operations; the cluster-seeking process is then carried out based on manufacturing similarity. The final step minimizes the number of bottleneck machines required for forming independent cells. Another advantage of the algorithm is that it can generate alternative cell configurations simply by changing input parameters. This provides an opportunity to resolve conflicts between the formed cells and physical constraints such as plant space, machine location, and available cranes. Examples are provided to illustrate the approach.


2013 ◽  
Vol 273 ◽  
pp. 250-254
Author(s):  
Yang Pan ◽  
An Hua Chen ◽  
Ling Li Jiang

To address the difficulty of selecting initial cluster centers in the k-means clustering algorithm, this paper proposes using node degree from complex network theory to improve k-means for fault pattern recognition and to increase clustering accuracy. The fault data are represented as a network, with an adjacency matrix expressing the similarity between nodes. Following the complex-network concept of degree, the degree of every node is calculated, and the nodes with the largest degree are selected as the initial k-means cluster centers. The method is applied to a rolling bearing clustering diagnosis example and achieves good fault diagnosis results. This study provides a new method for selecting the initial cluster centers of k-means clustering.
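A minimal Python sketch of this degree-based selection is given below. The abstract does not say how similarity is turned into network edges, so connecting two samples when their Euclidean distance is at most `radius` is an assumption; the function names and the sample points are likewise illustrative only.

```python
import math


def adjacency_from_points(points, radius):
    """Connect two samples when their Euclidean distance is at most `radius`.

    Assumption: a distance threshold stands in for the paper's similarity rule.
    """
    n = len(points)
    return [
        [1 if i != j and math.dist(points[i], points[j]) <= radius else 0
         for j in range(n)]
        for i in range(n)
    ]


def top_degree_nodes(adjacency, k):
    """Return the indices of the k nodes with the largest degree."""
    degrees = [sum(row) for row in adjacency]
    # Highest degree first; ties are broken by index for determinism.
    order = sorted(range(len(degrees)), key=lambda i: (-degrees[i], i))
    return order[:k]


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
network = adjacency_from_points(points, radius=1.0)
print(top_degree_nodes(network, k=1))
```

The returned indices would then be passed to k-means as the starting centers in place of a random draw.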


Author(s):  
Mohana Priya K ◽  
Pooja Ragavi S ◽  
Krishna Priya G

Clustering is the process of grouping objects into subsets that have meaning in the context of a particular problem. It does not rely on predefined classes and is referred to as an unsupervised learning method because no information is provided about the "right answer" for any of the objects. Many clustering algorithms have been proposed and are used in different applications. Sentence clustering is one of the most effective clustering techniques. A hierarchical clustering algorithm is applied at multiple levels to improve accuracy. A POS tagger and the Porter stemmer are used for tagging. The WordNet dictionary is used to determine similarity through the Jiang-Conrath and cosine similarity measures. Grouping is performed with respect to the highest similarity value, using a mean threshold. This paper incorporates many parameters for finding the similarity between words. To disambiguate words, sense identification is performed for adjectives and the results are compared. The SemCor and machine learning datasets are employed. Compared with previous results for WSD, our work shows a substantial improvement, reaching 91.2%.


2015 ◽  
pp. 125-138 ◽  
Author(s):  
I. V. Goncharenko

In this article we propose a new method of non-hierarchical cluster analysis using a k-nearest-neighbor graph and discuss it with respect to vegetation classification. The method of k-nearest neighbor (k-NN) classification was originally developed in 1951 (Fix, Hodges, 1951). Later, the term “k-NN graph” and a few algorithms for k-NN clustering appeared (Cover, Hart, 1967; Brito et al., 1997). In biology, k-NN is used in the analysis of protein structures and genome sequences. Most k-NN clustering algorithms first build an “excessive” graph, a so-called hypergraph, and then truncate it into subgraphs by partitioning and coarsening the hypergraph. We developed a different strategy, “upward” clustering, which forms (assembles consecutively) one cluster after another. To date, graph-based cluster analysis has not been considered for the classification of vegetation datasets.
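For readers unfamiliar with the k-NN graph mentioned above, a minimal Python sketch of its construction follows. It covers only the graph-building step using Euclidean distance; the article's "upward" cluster assembly is not described in the abstract and is not attempted here, and the function name and sample points are illustrative assumptions.

```python
import math


def knn_graph(points, k):
    """Map each point index to the indices of its k nearest neighbours."""
    graph = {}
    for i, p in enumerate(points):
        # Sort all other points by distance; ties are broken by index.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (math.dist(p, points[j]), j),
        )
        graph[i] = others[:k]
    return graph


# Two well-separated pairs on a line.
plots = [(0.0,), (1.0,), (5.0,), (6.0,)]
print(knn_graph(plots, k=1))
```

With k = 1 each point links only to its partner in the same pair, so the graph already separates the two groups.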

