A Novel Membrane Clustering Algorithm Based on Tissue-like P System

Author(s):  
Yan Huaning ◽  
Xiang Laisheng ◽  
Liu Xiyu ◽  
Xue Jie

<span lang="EN-US">Clustering is a process of partitioning data points into different clusters due to their similarity, as a powerful technique of data mining, clustering is widely used in many fields. Membrane computing is a computing model abstracting from the biological area, </span><span lang="EN-US">these computing systems are proved to be so powerful that they are equivalent with Turing machines. In this paper, a modified inversion particle swarm optimization was proposed, this method and the mutational mechanism of genetics algorithm were used to combine with the tissue-like P system, through these evolutionary algorithms and the P system, the idea of a novel membrane clustering algorithm could come true. Experiments were tested on six data sets, by comparing the clustering quality with the GA-K-means, PSO-K-means and K-means proved the superiority of our method.</span>

2020 ◽  
Vol 1 (3) ◽  
Author(s):  
Hailong Chen ◽  
Miaomiao Ge ◽  
Yutong Xue

Abstract The density peak clustering (DPC) algorithm finds cluster centers by calculating the local density and distance of data points, based on the distances between points and a manually set cutoff distance (dc). The attributes of data points are usually compared simply by Euclidean distance. However, when the density distribution of a data set is uneven, containing both high-density and low-density points, and dc is set arbitrarily by hand, the clustering results of DPC suffer seriously. For this reason, a clustering algorithm combining the teaching-learning-based optimization algorithm and a density gap is proposed (NSTLBO-DGDPC). First, to account for the attributes and neighborhoods of data points, a density-difference distance is introduced to replace the Euclidean distance of the original algorithm. Second, because manual selection of cluster centers may produce incorrect clustering results, the standard deviation of the high-density distance is used to determine the cluster centers. Finally, the teaching-learning-based optimization algorithm (TLBO) searches for the optimal value so that the algorithm does not fall into a local optimum: when the population density reaches a certain threshold, a niche selection strategy is introduced to discard overly similar individuals, and a nonlinear decreasing strategy then updates the students in the teaching and learning stages to obtain the optimal dc. The accuracy and convergence of the improved TLBO algorithm (NSTLBO) are verified on ten benchmark functions, and simulation experiments show that NSTLBO performs well. The proposed clustering algorithm is validated on eight synthetic data sets and eight real data sets; the simulation results show better clustering quality and effect.
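
For context, the two quantities at the heart of standard DPC (local density rho and distance-to-higher-density delta) can be sketched as below; NSTLBO-DGDPC replaces the Euclidean distance used here with a density-difference distance and picks centers via the standard deviation of the high-density distance rather than by hand. This is the classic Rodriguez-Laio formulation, not the modified one.

```python
import numpy as np

def dpc_rho_delta(X, dc):
    """Local density (Gaussian kernel) and distance-to-higher-density
    for the baseline DPC algorithm; centers are points where both
    rho and delta are large."""
    n = len(X)
    dist = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    rho = np.exp(-(dist / dc) ** 2).sum(axis=1) - 1.0   # exclude self-term
    delta = np.zeros(n)
    order = np.argsort(-rho)             # indices by decreasing density
    delta[order[0]] = dist[order[0]].max()
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]            # all points denser than i
        delta[i] = dist[i, higher].min()
    return rho, delta
```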


2011 ◽  
Vol 268-270 ◽  
pp. 811-816
Author(s):  
Yong Zhou ◽  
Yan Xing

Affinity Propagation (AP) is a relatively new clustering algorithm based on the similarity matrix between pairs of data points; messages are exchanged between data points until a clustering result emerges. It is efficient and fast, and it can handle clustering on large data sets. But traditional Affinity Propagation has many limitations. This paper introduces Affinity Propagation, analyzes its advantages and limitations in depth, and focuses on improvements to the algorithm: improving the similarity matrix, adjusting the preference and the damping factor, and combining it with other algorithms. Finally, the development of Affinity Propagation is discussed.
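
The message-passing core that these improvements adjust (similarity matrix, preference on the diagonal, damping factor) looks roughly as follows; this is a minimal sketch of standard AP with the usual vectorized responsibility and availability updates, not any of the improved variants.

```python
import numpy as np

def affinity_propagation(S, damping=0.9, iters=200):
    """Plain AP message passing on a similarity matrix S.
    S[i, i] holds the preference; damping smooths the updates."""
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities
    A = np.zeros((n, n))  # availabilities
    for _ in range(iters):
        # Responsibility: how well-suited k is to serve as exemplar for i
        AS = A + S
        idx = np.argmax(AS, axis=1)
        first = AS[np.arange(n), idx]
        AS[np.arange(n), idx] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[np.arange(n), idx] = S[np.arange(n), idx] - second
        R = damping * R + (1 - damping) * R_new
        # Availability: accumulated evidence that k should be an exemplar
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, np.diag(R))
        A_new = Rp.sum(axis=0, keepdims=True) - Rp
        dA = np.diag(A_new).copy()
        A_new = np.minimum(A_new, 0)
        A_new[np.arange(n), np.arange(n)] = dA
        A = damping * A + (1 - damping) * A_new
    return np.where(np.diag(A + R) > 0)[0]   # exemplar indices
```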


Author(s):  
M. EMRE CELEBI ◽  
HASSAN A. KINGRAVI

K-means is undoubtedly the most widely used partitional clustering algorithm. Unfortunately, due to its gradient descent nature, this algorithm is highly sensitive to the initial placement of the cluster centers. Numerous initialization methods have been proposed to address this problem. Many of these methods, however, have superlinear complexity in the number of data points, making them impractical for large data sets. On the other hand, linear methods are often random and/or order-sensitive, which renders their results unrepeatable. Recently, Su and Dy proposed two highly successful hierarchical initialization methods named Var-Part and PCA-Part that are not only linear, but also deterministic (nonrandom) and order-invariant. In this paper, we propose a discriminant analysis based approach that addresses a common deficiency of these two methods. Experiments on a large and diverse collection of data sets from the UCI machine learning repository demonstrate that Var-Part and PCA-Part are highly competitive with one of the best random initialization methods to date, i.e. k-means++, and that the proposed approach significantly improves the performance of both hierarchical methods.
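
For reference, k-means++, the random baseline the comparison uses, seeds centers as sketched below: each successive center is drawn with probability proportional to the squared distance to the nearest already-chosen center.

```python
import numpy as np

def kmeans_pp_init(X, k, rng=None):
    """k-means++ seeding: spread the initial centers out by sampling
    points far from those already selected."""
    rng = rng or np.random.default_rng()
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = ((X[:, None] - np.array(centers)[None]) ** 2).sum(-1).min(axis=1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)
```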


2011 ◽  
Vol 34 (7) ◽  
pp. 876-890 ◽  
Author(s):  
Ruochen Liu ◽  
Xiaojuan Sun ◽  
Licheng Jiao ◽  
Yangyang Li

The cluster validity index plays an important role in most natural computation based clustering algorithms. So far, four typical cluster validity indexes have been proposed for clustering data with different structures: the Euclidean distance based Pakhira–Bandyopadhyay–Maulik (PBM) index, the kernel function induced Chou–Su measure, the point symmetry distance based index, and the manifold distance (MD) induced index. However, no detailed comparison has been made among these indexes. This paper compares the four cluster validity indexes using a simple clustering technique based on particle swarm optimization (PSO). Extensive experiments on a large number of artificially synthesized data sets, UC Irvine data sets, texture images, and synthetic-aperture radar images are performed to make a comprehensive comparison. Experimental results show that the PSO-based clustering algorithm using the MD induced index performs well on most of the data sets.
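
Of the four, the PBM index has a compact closed form, sketched below under the standard definition (larger is better); the MD-induced variant would replace the Euclidean norms here with manifold distances.

```python
import numpy as np

def pbm_index(X, labels, centers):
    """Pakhira-Bandyopadhyay-Maulik index: PBM = ((1/K) * (E1/EK) * DK)^2,
    where E1 is the scatter about the grand centroid, EK the within-cluster
    scatter, and DK the largest separation between cluster centers."""
    k = len(centers)
    e1 = np.linalg.norm(X - X.mean(axis=0), axis=1).sum()
    ek = sum(np.linalg.norm(X[labels == j] - centers[j], axis=1).sum()
             for j in range(k))
    dk = max(np.linalg.norm(centers[a] - centers[b])
             for a in range(k) for b in range(a + 1, k))
    return ((e1 / ek) * (dk / k)) ** 2
```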


Author(s):  
SANGHAMITRA BANDYOPADHYAY ◽  
UJJWAL MAULIK ◽  
MALAY KUMAR PAKHIRA

An efficient partitional clustering technique, called SAKM-clustering, that integrates the power of simulated annealing for obtaining a minimum-energy configuration with the searching capability of the K-means algorithm is proposed in this article. The clustering methodology searches for appropriate clusters in multidimensional feature space such that a similarity metric of the resulting clusters is optimized. Data points are redistributed among the clusters probabilistically, so that points farther away from a cluster center have higher probabilities of migrating to other clusters than those closer to it. The superiority of the SAKM-clustering algorithm over the widely used K-means algorithm is extensively demonstrated for artificial and real-life data sets.
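
A plausible reading of the probabilistic redistribution step, sketched under the assumption of a Boltzmann-style rule at annealing temperature T (the paper's exact probability model may differ):

```python
import numpy as np

def probabilistic_assign(X, centers, T, rng=None):
    """Hypothetical annealed reassignment: each point picks cluster j with
    probability proportional to exp(-d_ij / T), so distant points migrate
    more readily; as T -> 0 this approaches the hard nearest-center rule
    of plain K-means."""
    rng = rng or np.random.default_rng()
    d = np.linalg.norm(X[:, None] - centers[None], axis=2)
    logits = -d / T
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    return np.array([rng.choice(len(centers), p=row) for row in p])
```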


Author(s):  
UREERAT WATTANACHON ◽  
CHIDCHANOK LURSINSAP

Existing clustering algorithms, such as single-link clustering, k-means, CURE, and CSM, are designed to find clusters based on predefined parameters specified by users. These algorithms may be unsuccessful if the choice of parameters is inappropriate with respect to the data set being clustered, and most of them work well only for compact and hyper-spherical clusters. In this paper, a new hybrid clustering algorithm called Self-Partition and Self-Merging (SPSM) is proposed. The SPSM algorithm partitions the input data set into several subclusters in the first phase and then removes noisy data in the second phase. In the third phase, the normal subclusters are successively merged to form larger clusters based on inter-cluster and intra-cluster distance criteria. Experimental results show that the SPSM algorithm handles noisy data sets efficiently and clusters data sets of arbitrary shapes and varying densities. Several color-image examples demonstrate the versatility of the proposed method, with comparisons against results reported in the literature for the same images. The computational complexity of the SPSM algorithm is O(N²), where N is the number of data points.
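
The third-phase merge test might look like the hedged sketch below, which merges two subclusters when the gap between their centroids is small relative to their internal spread; the actual SPSM criteria may differ in detail.

```python
import numpy as np

def should_merge(ca, cb, alpha=1.0):
    """Hypothetical merge test in the spirit of SPSM's third phase: merge
    two subclusters (arrays of points) when the inter-cluster distance is
    no larger than alpha times their combined intra-cluster spread."""
    inter = np.linalg.norm(ca.mean(axis=0) - cb.mean(axis=0))
    intra_a = np.linalg.norm(ca - ca.mean(axis=0), axis=1).mean()
    intra_b = np.linalg.norm(cb - cb.mean(axis=0), axis=1).mean()
    return inter <= alpha * (intra_a + intra_b)
```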


2021 ◽  
Vol 24 (1) ◽  
pp. 42-47
Author(s):  
N. P. Koryshev ◽  
I. A. Hodashinsky

The article presents a description of an algorithm for generating fuzzy rules for a fuzzy classifier using data clustering, a metaheuristic, and a clustering quality index, as well as the results of performance testing on real data sets.
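
One common way such clustering-driven rule generation works, shown purely as an illustrative sketch (not the authors' exact algorithm): each cluster becomes one fuzzy rule, with Gaussian membership functions taken from the cluster's center and spread and the cluster's majority class as the consequent.

```python
import numpy as np
from sklearn.cluster import KMeans

def rules_from_clusters(X, y, k):
    """Illustrative fuzzy-rule generation: cluster the data, then turn each
    cluster into (centers, widths, consequent). Assumes integer class
    labels in y; k and the use of KMeans are assumptions."""
    km = KMeans(n_clusters=k, n_init=10).fit(X)
    rules = []
    for j in range(k):
        members = km.labels_ == j
        mu = km.cluster_centers_[j]               # membership centers
        sigma = X[members].std(axis=0) + 1e-6     # membership widths
        label = np.bincount(y[members]).argmax()  # consequent class
        rules.append((mu, sigma, label))
    return rules
```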


2018 ◽  
Vol 13 (5) ◽  
pp. 759-771 ◽  
Author(s):  
Guangchun Chen ◽  
Juan Hu ◽  
Hong Peng ◽  
Jun Wang ◽  
Xiangnian Huang

Spectral clustering algorithms have difficulty finding clusters when the data set has large differences in density, and their clustering effect depends on the selection of initial centers. To overcome these shortcomings, we propose a novel spectral clustering algorithm based on a membrane computing framework, called the MSC algorithm, whose idea is to use a membrane clustering algorithm to realize the clustering component of spectral clustering. A tissue-like P system is used as its computing framework, where each object in the cells denotes a set of cluster centers and a velocity-location model is used as the evolution rules. Under the control of an evolution-communication mechanism, the tissue-like P system can obtain a good clustering partition for each data set. The proposed spectral clustering algorithm is evaluated on three artificial data sets and ten UCI data sets, and it is further compared with classical spectral clustering algorithms. The comparison results demonstrate the advantage of the proposed spectral clustering algorithm.
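
For orientation, the spectral front end that MSC keeps can be sketched as the standard normalized-Laplacian embedding below; the membrane clustering component, with its PSO-style velocity-location evolution rules, then replaces the usual k-means step on the rows of the embedding. The sigma and k values are illustrative assumptions.

```python
import numpy as np

def spectral_embedding(X, sigma=1.0, k=3):
    """Standard normalized spectral embedding: Gaussian affinities,
    symmetric normalized Laplacian, k smallest eigenvectors,
    row-normalized. The membrane clustering is run on these rows."""
    d2 = ((X[:, None] - X[None]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))            # Gaussian affinity
    Dm = np.diag(1.0 / np.sqrt(W.sum(axis=1)))    # D^{-1/2}
    L_sym = np.eye(len(X)) - Dm @ W @ Dm          # normalized Laplacian
    vals, vecs = np.linalg.eigh(L_sym)
    U = vecs[:, :k]                               # k smallest eigenvectors
    return U / np.linalg.norm(U, axis=1, keepdims=True)
```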


Author(s):  
Md. Zakir Hossain ◽  
Md. Nasim Akhtar ◽  
R.B. Ahmad ◽  
Mostafijur Rahman

Data mining is the process of finding structure in large data sets. With this process, decision makers can make particular decisions for further development of real-world problems. Several data clustering techniques are used in data mining for finding specific patterns in data. The K-means method is one of the most familiar clustering techniques for large data sets. The K-means clustering method partitions the data set under the assumption that the number of clusters is fixed. The main problem of this method is that if the number of clusters is chosen to be small, there is a higher probability of adding dissimilar items to the same group; on the other hand, if the number of clusters is chosen to be high, there is a higher chance of adding similar items to different groups. In this paper, we address this issue by proposing a new K-Means clustering algorithm that performs data clustering dynamically. The proposed method initially calculates a threshold value as a centroid of K-Means, and based on this value the clusters are formed. At each iteration of K-Means, if the Euclidean distance between two points is less than or equal to the threshold value, these two data points are placed in the same group; otherwise, the proposed method creates a new cluster with the dissimilar data point. The results show that the proposed method outperforms the original K-Means method.
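
The dynamic assignment rule described above can be sketched as follows; the threshold computation itself is the paper's contribution and is not reproduced here, so `threshold` is taken as given.

```python
import numpy as np

def threshold_clustering(X, threshold):
    """Sketch of the dynamic rule: a point joins the nearest existing
    cluster if it lies within the threshold, otherwise it seeds a new
    cluster. Centroids are updated as members are added."""
    centers, members = [X[0].copy()], [[0]]
    for i, x in enumerate(X[1:], start=1):
        d = [np.linalg.norm(x - c) for c in centers]
        j = int(np.argmin(d))
        if d[j] <= threshold:
            members[j].append(i)
            centers[j] = X[members[j]].mean(axis=0)  # refresh centroid
        else:
            centers.append(x.copy())                 # new cluster
            members.append([i])
    return centers, members
```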


2011 ◽  
Vol 48-49 ◽  
pp. 753-756
Author(s):  
Xin Quan Chen

To address the shortcomings of the Affinity Propagation (AP) algorithm, we present two extended and improved AP algorithms. The AP algorithm based on Grid Cells (APGC) is an effective extension of AP to the level of grid cells, and the AP clustering algorithm based on Near-neighbour Sampling (APNS) aims to improve time and space complexity. Simulated comparison experiments with the three algorithms show that APGC and APNS clearly improve on the AP algorithm in time and space complexity. They not only achieve good clustering quality on massive data sets but also filter noise and isolated points well, making them two effective clustering algorithms with considerable application prospects. Finally, several research directions are presented.
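
A hedged sketch of the grid-cell idea behind APGC: snap points to cells, cluster only the occupied cells' representatives with off-the-shelf AP, and map the labels back to the original points, which is where the time and space savings come from. The paper's exact cell handling (e.g. density weighting) is not specified here.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

def grid_cell_ap(X, cell_size):
    """Illustrative grid-cell reduction: run AP on the mean point of each
    occupied grid cell instead of on all N points, then assign every
    original point the label of its cell's representative."""
    keys = np.floor(X / cell_size).astype(int)
    uniq, inv = np.unique(keys, axis=0, return_inverse=True)
    reps = np.array([X[inv == c].mean(axis=0) for c in range(len(uniq))])
    ap = AffinityPropagation(random_state=0).fit(reps)
    return ap.labels_[inv]       # cell label mapped back to every point
```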

