CciMST: A Clustering Algorithm Based on Minimum Spanning Tree and Cluster Centers

2018 ◽  
Vol 2018 ◽  
pp. 1-14 ◽  
Author(s):  
Xiaobo Lv ◽  
Yan Ma ◽  
Xiaofu He ◽  
Hui Huang ◽  
Jie Yang

The minimum spanning tree- (MST-) based clustering method can identify clusters of arbitrary shape by removing inconsistent edges. The definition of the inconsistent edges is a major issue that has to be addressed in all MST-based clustering algorithms. In this paper, we propose a novel MST-based clustering algorithm through the cluster center initialization algorithm, called cciMST. First, in order to capture the intrinsic structure of the data sets, we propose the cluster center initialization algorithm based on geodesic distance and dual densities of the points. Second, we propose and demonstrate that the inconsistent edge is located on the shortest path between the cluster centers, so we can find the inconsistent edge with the length of the edges as well as the densities of their endpoints on the shortest path. Correspondingly, we obtain two groups of clustering results. Third, we propose a novel intercluster separation by computing the distance between the points at the intersection of clusters. Furthermore, we propose a new internal clustering validation measure to select the best clustering result. The experimental results on the synthetic data sets, real data sets, and image data sets demonstrate the good performance of the proposed MST-based method.
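For orientation, the baseline idea this abstract starts from (build an MST, then remove inconsistent edges) can be sketched as follows. This is a minimal illustration that assumes the simplest possible definition of an inconsistent edge, namely the longest MST edges. It is not the cciMST procedure, which locates the inconsistent edge on the shortest path between cluster centers. The function names `mst_edges` and `mst_clusters` are hypothetical.

```python
import math

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns the n-1 tree edges as (weight, u, v) tuples.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known cost to attach each vertex
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the cheapest vertex that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def mst_clusters(points, k):
    """Assumed baseline: cut the k-1 longest MST edges, label the components."""
    kept = sorted(mst_edges(points))[: len(points) - k]
    root = list(range(len(points)))   # union-find forest

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]   # path halving
            i = root[i]
        return i

    for _, u, v in kept:
        root[find(u)] = find(v)
    return [find(i) for i in range(len(points))]

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = mst_clusters(points, k=2)
```

With these six points the single long edge joining the two groups is cut, so the first three points share one label and the last three share another. The longest-edge rule is exactly the heuristic that the abstract argues needs a better definition.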

2018 ◽  
Vol 27 (2) ◽  
pp. 163-182 ◽  
Author(s):  
Ilanthenral Kandasamy

Neutrosophy (neutrosophic logic) is used to represent uncertain, indeterminate, and inconsistent information available in the real world. This article proposes a method to provide more sensitivity and precision to indeterminacy, by classifying the indeterminate concept/value into two based on membership: one as indeterminacy leaning towards truth membership and the other as indeterminacy leaning towards false membership. This paper introduces a modified form of a neutrosophic set, called Double-Valued Neutrosophic Set (DVNS), which has these two distinct indeterminate values. Its related properties and axioms are defined and illustrated in this paper. Clustering plays an important role in several fields of research, such as data mining, pattern recognition, and machine learning. DVNS handles indeterminate and inconsistent information more accurately than the Single-Valued Neutrosophic Set, whereas fuzzy sets and intuitionistic fuzzy sets cannot handle such information. A generalised distance measure between DVNSs and the related distance matrix is defined, based on which a clustering algorithm is constructed. This article proposes a Double-Valued Neutrosophic Minimum Spanning Tree (DVN-MST) clustering algorithm, to cluster the data represented by double-valued neutrosophic information. Illustrative examples are given to demonstrate the applications and effectiveness of this clustering algorithm. A comparative study of the DVN-MST clustering algorithm with other clustering algorithms like Single-Valued Neutrosophic Minimum Spanning Tree, Intuitionistic Fuzzy Minimum Spanning Tree, and Fuzzy Minimum Spanning Tree is carried out.


2011 ◽  
Vol 20 (01) ◽  
pp. 139-177 ◽  
Author(s):  
YAN ZHOU ◽  
OLEKSANDR GRYGORASH ◽  
THOMAS F. HAIN

We propose two Euclidean minimum spanning tree based clustering algorithms — one a k-constrained, and the other an unconstrained algorithm. Our k-constrained clustering algorithm produces a k-partition of a set of points for any given k. The algorithm constructs a minimum spanning tree of a set of representative points and removes edges that satisfy a predefined criterion. The process is repeated until k clusters are produced. Our unconstrained clustering algorithm partitions a point set into a group of clusters by maximally reducing the overall standard deviation of the edges in the Euclidean minimum spanning tree constructed from a given point set, without prescribing the number of clusters. We present our experimental results comparing our proposed algorithms with k-means, X-means, CURE, Chameleon, and the Expectation-Maximization (EM) algorithm on both artificial data and benchmark data from the UCI repository. We also apply our algorithms to image color clustering and compare them with the standard minimum spanning tree clustering algorithm as well as CURE, Chameleon, and X-means.
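The unconstrained variant is described as removing edges so as to maximally reduce the standard deviation of the MST edge lengths. A single greedy step of that idea might look like the sketch below. The abstract gives neither the exact deviation measure nor the stopping rule, so the population standard deviation and the helper name `edge_to_cut` are assumptions made purely for illustration.

```python
import statistics

def edge_to_cut(weights):
    """Index of the edge whose removal leaves the smallest standard deviation.

    Assumes at least two edge weights, so one remains after the removal.
    """
    def sd_without(i):
        rest = weights[:i] + weights[i + 1:]
        return statistics.pstdev(rest)
    return min(range(len(weights)), key=sd_without)

# Hypothetical MST edge lengths: four short edges and one long bridge.
mst_weights = [1.0, 1.2, 0.9, 7.5, 1.1]
cut = edge_to_cut(mst_weights)
remaining = mst_weights[:cut] + mst_weights[cut + 1:]
reduction = statistics.pstdev(mst_weights) - statistics.pstdev(remaining)
```

Here the long bridge at index 3 is selected, since dropping it leaves four nearly equal edge lengths. A full algorithm would repeat the step and need a criterion for when to stop, which this sketch deliberately leaves out.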


2018 ◽  
Vol 30 (6) ◽  
pp. 1624-1646 ◽  
Author(s):  
Qidong Liu ◽  
Ruisheng Zhang ◽  
Zhili Zhao ◽  
Zhenghai Wang ◽  
Mengyao Jiao ◽  
...  

Minimax similarity stresses the connectedness of points via mediating elements rather than favoring high mutual similarity. The grouping principle yields superior clustering results when mining arbitrarily-shaped clusters in data. However, it is not robust against noise and outliers in the data. There are two main problems with the grouping principle: first, a single object that is far away from all other objects defines a separate cluster, and second, two connected clusters would be regarded as two parts of one cluster. In order to solve such problems, we propose a robust minimum spanning tree (MST)-based clustering algorithm in this letter. First, we separate the connected objects by applying a density-based coarsening phase, resulting in a low-rank matrix in which each element denotes a supernode formed by combining a set of nodes. Then a greedy method is presented to partition those supernodes by working on the low-rank matrix. Instead of removing the longest edges from the MST, our algorithm groups the data set based on the minimax similarity. Finally, the assignment of all data points can be achieved through their corresponding supernodes. Experimental results on many synthetic and real-world data sets show that our algorithm consistently outperforms the compared clustering algorithms.
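The minimax idea can be made concrete with a small sketch: the minimax distance between two points is the smallest achievable "largest hop" over all paths connecting them. The code below computes it with a Floyd-Warshall style update using (min, max) in place of (min, +). It illustrates the distance only, under the assumption of Euclidean base distances, and does not reproduce the coarsening or greedy partitioning steps of the proposed algorithm. The name `minimax_distances` is hypothetical.

```python
import math

def minimax_distances(points):
    """All-pairs minimax (bottleneck) path distances on the complete graph."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Largest hop on the route i -> k -> j.
                via = max(d[i][k], d[k][j])
                if via < d[i][j]:
                    d[i][j] = via
    return d

# Four evenly spaced points plus one distant outlier.
chain = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0)]
d = minimax_distances(chain)
```

In this example `d[0][3]` is 1.0 even though the two points are 3.0 apart, because unit-length hops connect them through mediating points. The outlier at `(10, 0)` has minimax distance 7.0 to every other point, which is the behaviour behind the first problem named in the abstract: an isolated object ends up as its own cluster.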


Author(s):  
Yuancheng Li ◽  
Yaqi Cui ◽  
Xiaolong Zhang

Background: Advanced Metering Infrastructure (AMI) for the smart grid is growing rapidly, which results in exponential growth of the data collected and transmitted in the device. Clustering this data can give the electricity company a better understanding of the personalized and differentiated needs of the user. Objective: The existing clustering algorithms for processing such data generally have some problems, such as insufficient data utilization, high computational complexity, and low accuracy of behavior recognition. Methods: In order to improve the clustering accuracy, this paper proposes a new clustering method based on the electrical behavior of the user. Starting with the analysis of user load characteristics, the user electricity data samples were constructed. The daily load characteristic curve was extracted through an improved extreme learning machine clustering algorithm and effective index criteria. Moreover, clustering analysis was carried out for different users from industrial, commercial, and residential areas. The improved extreme learning machine algorithm, also called Unsupervised Extreme Learning Machine (US-ELM), is an extension and improvement of the original Extreme Learning Machine (ELM), which realizes the unsupervised clustering task on the basis of the original ELM. Results: Experiments on four different data sets, programmed in MATLAB, compare the method with other commonly used clustering algorithms. The experimental results show that the US-ELM algorithm has higher accuracy in processing power data. Conclusion: The unsupervised ELM algorithm can greatly reduce the time consumption and improve the effectiveness of clustering.


Author(s):  
R. R. Gharieb ◽  
G. Gendy ◽  
H. Selim

In this paper, the standard hard C-means (HCM) clustering approach to image segmentation is modified by incorporating weighted membership Kullback–Leibler (KL) divergence and local data information into the HCM objective function. The membership KL divergence, used for fuzzification, measures the proximity between each cluster membership function of a pixel and the locally-smoothed value of the membership in the pixel vicinity. The fuzzification weight is a function of the pixel to cluster-center distances. The pixel to cluster-center distance used is composed of the original pixel data distance plus a fraction of the distance generated from the locally-smoothed pixel data. It is shown that the obtained membership function of a pixel is proportional to the locally-smoothed membership function of this pixel multiplied by an exponentially distributed function of the negative of the pixel distance relative to the minimum distance provided by the nearest cluster-center to the pixel. Because it incorporates the locally-smoothed membership and data information in addition to the relative distance, which is more tolerant of additive noise than the absolute distance, the proposed algorithm has a threefold noise-handling process. The presented algorithm, named local data and membership KL divergence based fuzzy C-means (LDMKLFCM), is tested on synthetic and real-world noisy images and its results are compared with those of several FCM-based clustering algorithms.


2019 ◽  
Author(s):  
Marcelo Benedito ◽  
Lehilton Pedrosa ◽  
Hugo Rosado

In the Cable-Trench Problem (CTP), the objective is to find a rooted spanning tree of a weighted graph that minimizes the length of the tree, scaled by a non-negative factor, plus the sum of all shortest-path lengths from the root, scaled by another non-negative factor. This is an intermediate optimization problem between the Single-Destination Shortest Path Problem and the Minimum Spanning Tree Problem. In this extended abstract, we consider the Generalized CTP (GCTP), in which some vertices need not be connected to the root, but may serve as cost-saving merging points; this variant also generalizes the Steiner Tree Problem. We present an 8.599-approximation algorithm for GCTP. Before this paper, no constant approximation for the standard CTP was known.
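To make the CTP objective concrete, the sketch below evaluates it for one given rooted spanning tree. It only scores a candidate tree and is not the approximation algorithm from the paper. The parameter names `tree_factor` and `path_factor`, and the parent-map representation of the tree, are assumptions for illustration. Path lengths are measured inside the tree.

```python
def cable_trench_cost(parent, length, tree_factor, path_factor):
    """Score a rooted spanning tree under the cable-trench objective.

    parent: maps each non-root vertex to its parent in the tree.
    length: maps each non-root vertex to the length of its edge to the parent.
    """
    def root_path_length(v):
        total = 0.0
        while v in parent:          # the root has no parent entry
            total += length[v]
            v = parent[v]
        return total

    tree_length = sum(length[v] for v in parent)
    path_sum = sum(root_path_length(v) for v in parent)
    return tree_factor * tree_length + path_factor * path_sum

# Hypothetical tree rooted at "r": r-a (2), a-b (3), r-c (4).
parent = {"a": "r", "b": "a", "c": "r"}
length = {"a": 2.0, "b": 3.0, "c": 4.0}
cost = cable_trench_cost(parent, length, tree_factor=1.0, path_factor=0.5)
```

Setting `path_factor` to zero leaves only the tree length, which is the Minimum Spanning Tree objective, while setting `tree_factor` to zero leaves only the root-path sum, which is the shortest-path objective. This is the sense in which the abstract calls CTP an intermediate problem.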


2011 ◽  
pp. 24-32 ◽  
Author(s):  
Nicoleta Rogovschi ◽  
Mustapha Lebbah ◽  
Younès Bennani

Most traditional clustering algorithms are limited to handling data sets that contain either continuous or categorical variables. However, data sets with mixed types of variables are common in the data mining field. In this paper we introduce a weighted self-organizing map for the clustering, analysis, and visualization of mixed data (continuous/binary). The weights and prototypes are learned simultaneously, ensuring an optimized data clustering. The higher the weight of a variable, the more the clustering algorithm takes into account the information carried by that variable. The learning of these topological maps is combined with a weighting process for the different variables, computing weights that influence the quality of the clustering. We illustrate the power of this method with data sets taken from a public data set repository: a handwritten digit data set, the Zoo data set, and three other mixed data sets. The results show a good quality of topological ordering and homogeneous clustering.


2021 ◽  
Vol 8 (10) ◽  
pp. 43-50
Author(s):  
Truong et al. ◽  

Clustering is a fundamental technique in data mining and machine learning. Recently, many researchers have become interested in the problem of clustering categorical data, and several new approaches have been proposed. One of the successful and pioneering clustering algorithms is the Minimum-Minimum Roughness algorithm (MMR), which is a top-down hierarchical clustering algorithm and can handle the uncertainty in clustering categorical data. However, MMR tends to choose the category with less value leaf node with more objects, leading to undesirable clustering results. To overcome such shortcomings, this paper proposes an improved version of the MMR algorithm for clustering categorical data, called IMMR (Improved Minimum-Minimum Roughness). Experimental results on actual data sets taken from UCI show that the IMMR algorithm outperforms MMR in clustering categorical data.


2021 ◽  
pp. 1-18
Author(s):  
Angeliki Koutsimpela ◽  
Konstantinos D. Koutroumbas

Several well-known clustering algorithms have their own online counterparts, in order to deal effectively with the big data issue, as well as with the case where the data become available in a streaming fashion. However, very few of them follow the stochastic gradient descent philosophy, despite the fact that the latter enjoys certain practical advantages (such as the possibility of (a) running faster than their batch processing counterparts and (b) escaping from local minima of the associated cost function), while, in addition, strong theoretical convergence results have been established for it. In this paper a novel stochastic gradient descent possibilistic clustering algorithm, called O-PCM2, is introduced. The algorithm is presented in detail and it is rigorously proved that the gradient of the associated cost function tends to zero in the L2 sense, based on general convergence results established for the family of the stochastic gradient descent algorithms. Furthermore, an additional discussion is provided on the nature of the points where the algorithm may converge. Finally, the performance of the proposed algorithm is tested against other related algorithms, on the basis of both synthetic and real data sets.

