Noises Cutting and Natural Neighbors Spectral Clustering Based on Coupling P System

Clustering analysis, a key step for many data mining problems, can be applied to various fields. However, no matter what kind of clustering method, noise points have always been an important factor affecting the clustering effect. In addition, in spectral clustering, the construction of affinity matrix affects the formation of new samples, which in turn affects the final clustering results. Therefore, this study proposes a noise cutting and natural neighbors spectral clustering method based on coupling P system (NCNNSC-CP) to solve the above problems. The whole algorithm process is carried out in the coupled P system. We propose a natural neighbors searching method without parameters, which can quickly determine the natural neighbors and natural characteristic value of data points. Then, based on it, the critical density and reverse density are obtained, and noise identification and cutting are performed. The affinity matrix constructed using core natural neighbors greatly improve the similarity between data points. Experimental results on nine synthetic data sets and six UCI datasets demonstrate that the proposed algorithm is better than other comparison algorithms.

Download Full-text

Dynamic and Optimized Prototype Clustering for Relational Data based on Multiple Prototypes

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.l2795.098319 ◽

2019 ◽

Vol 8 (3) ◽

pp. 5630-5634

Keyword(s):

Artificial Intelligence ◽

Data Clustering ◽

Synthetic Data ◽

Medical Data ◽

Relational Data ◽

Data Sets ◽

Optimization Approach ◽

Complex Task ◽

Data Points ◽

Clustering Approach

In artificial intelligence related applications such as bio-medical, bio-informatics, data clustering is an important and complex task with different situations. Prototype based clustering is the reasonable and simplicity to describe and evaluate data which can be treated as non-vertical representation of relational data. Because of Barycentric space present in prototype clustering, maintain and update the structure of the cluster with different data points is still challenging task for different data points in bio-medical relational data. So that in this paper we propose and introduce A Novel Optimized Evidential C-Medoids (NOEC) which is relates to family o prototype based clustering approach for update and proximity of medical relational data. We use Ant Colony Optimization approach to enable the services of similarity with different features for relational update cluster medical data. Perform our approach on different bio-medical related synthetic data sets. Experimental results of proposed approach give better and efficient results with comparison of different parameters in terms of accuracy and time with processing of medical relational data sets.

Download Full-text

A Novel Membrane Clustering Algorithm Based on Tissue-like P System

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v2.i2.pp409-416 ◽

2016 ◽

Vol 2 (2) ◽

pp. 409

Author(s):

Yan Huaning ◽

Xiang Laisheng ◽

Liu Xiyu ◽

Xue Jie

Keyword(s):

Clustering Algorithm ◽

Membrane Computing ◽

P System ◽

Data Sets ◽

Turing Machines ◽

Computing Systems ◽

Swarm Optimization ◽

Computing Model ◽

Clustering Quality ◽

Data Points

<span lang="EN-US">Clustering is a process of partitioning data points into different clusters due to their similarity, as a powerful technique of data mining, clustering is widely used in many fields. Membrane computing is a computing model abstracting from the biological area, </span><span lang="EN-US">these computing systems are proved to be so powerful that they are equivalent with Turing machines. In this paper, a modified inversion particle swarm optimization was proposed, this method and the mutational mechanism of genetics algorithm were used to combine with the tissue-like P system, through these evolutionary algorithms and the P system, the idea of a novel membrane clustering algorithm could come true. Experiments were tested on six data sets, by comparing the clustering quality with the GA-K-means, PSO-K-means and K-means proved the superiority of our method.</span>

Download Full-text

Automatic Scale Parameters in Affinity Matrix Construction for Improved Spectral Clustering

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001416500233 ◽

2016 ◽

Vol 30 (10) ◽

pp. 1650023 ◽

Cited By ~ 1

Author(s):

S. Mohanavalli ◽

S. M. Jaisakthi ◽

Chandrabose Aravindan

Keyword(s):

Spectral Clustering ◽

Clustering Algorithm ◽

Scale Parameter ◽

Research Work ◽

Error Rates ◽

Scale Factor ◽

Data Sets ◽

Affinity Matrix ◽

Similarity Relationship ◽

Automatic Scale

Spectral clustering partitions data into similar groups in the eigenspace of the affinity matrix. The accuracy of the spectral clustering algorithm is affected by the affine equivariance realized in the translation of distance to similarity relationship. The similarity value computed as a Gaussian of the distance between data objects is sensitive to the scale factor [Formula: see text]. The value of [Formula: see text], a control parameter of drop in affinity value, is generally a fixed constant or determined by manual tuning. In this research work, [Formula: see text] is determined automatically from the distance values i.e. the similarity relationship that exists in the real data space. The affinity value of a data pair is determined as a location estimate of the spread of distance values of the data points with the other points. The scale factor [Formula: see text] corresponding to a data point [Formula: see text] is computed as the trimean of its distance vector and used in fixing the scale to compute the affinity matrix. Our proposed automatic scale parameter for spectral clustering resulted in a robust similarity matrix which is affine equivariant with the distance distribution and also eliminates the overhead of manual tuning to find the best [Formula: see text] value. The performance of spectral clustering using such affinity matrices was analyzed using UCI data sets and image databases. The obtained scores for NMI, ARI, Purity and F-score were observed to be equivalent to those of existing works and better for most of the data sets. The proposed scale factor was used in various state-of-the-art spectral clustering algorithms and it proves to perform well irrespective of the normalization operations applied in the algorithms. A comparison of clustering error rates obtained for various data sets across the algorithms shows that the proposed automatic scale factor is successful in clustering the data sets equivalent to that obtained using manually tuned best [Formula: see text] value. Thus the automatic scale factor proposed in this research work eliminates the need for exhaustive grid search for the best scale parameter that results in best clustering performance.

Download Full-text

Hypergraph Optimization for Multi-Structural Geometric Model Fitting

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33018730 ◽

2019 ◽

Vol 33 ◽

pp. 8730-8737

Author(s):

Shuyuan Lin ◽

Guobao Xiao ◽

Yan Yan ◽

David Suter ◽

Hanzi Wang

Keyword(s):

Spectral Clustering ◽

Input Data ◽

State Of The Art ◽

Geometric Model ◽

Model Fitting ◽

Synthetic Data ◽

Estimation Algorithm ◽

Sampling Efficiency ◽

Data Points ◽

Fitting In

Recently, some hypergraph-based methods have been proposed to deal with the problem of model fitting in computer vision, mainly due to the superior capability of hypergraph to represent the complex relationship between data points. However, a hypergraph becomes extremely complicated when the input data include a large number of data points (usually contaminated with noises and outliers), which will significantly increase the computational burden. In order to overcome the above problem, we propose a novel hypergraph optimization based model fitting (HOMF) method to construct a simple but effective hypergraph. Specifically, HOMF includes two main parts: an adaptive inlier estimation algorithm for vertex optimization and an iterative hyperedge optimization algorithm for hyperedge optimization. The proposed method is highly efficient, and it can obtain accurate model fitting results within a few iterations. Moreover, HOMF can then directly apply spectral clustering, to achieve good fitting performance. Extensive experimental results show that HOMF outperforms several state-of-the-art model fitting methods on both synthetic data and real images, especially in sampling efficiency and in handling data with severe outliers.

Download Full-text

Estimation of mutual information for real-valued data with error bars and controlled bias

10.1101/589929 ◽

2019 ◽

Cited By ~ 2

Author(s):

Caroline M. Holmes ◽

Ilya Nemenman

Keyword(s):

Mutual Information ◽

Free Parameter ◽

Probability Distributions ◽

Synthetic Data ◽

Quantum Systems ◽

Data Sets ◽

Hard Problem ◽

Self Consistent ◽

Data Points ◽

Error Bars

Estimation of mutual information between (multidimensional) real-valued variables is used in analysis of complex systems, biological systems, and recently also quantum systems. This estimation is a hard problem, and universally good estimators provably do not exist. Kraskov et al. (PRE, 2004) introduced a successful mutual information estimation approach based on the statistics of distances between neighboring data points, which empirically works for a wide class of underlying probability distributions. Here we improve this estimator by (i) expanding its range of applicability, and by providing (ii) a self-consistent way of verifying the absence of bias, (iii) a method for estimation of its variance, and (iv) a criterion for choosing the values of the free parameter of the estimator. We demonstrate the performance of our estimator on synthetic data sets, as well as on neurophysiological and systems biology data sets.

Download Full-text

AN ALGORITHMIC COMPUTATION OF CORRELATION DIMENSION FROM TIME SERIES

Modern Physics Letters B ◽

10.1142/s0217984907012517 ◽

2007 ◽

Vol 21 (02n03) ◽

pp. 129-138 ◽

Cited By ~ 5

Author(s):

K. P. HARIKRISHNAN ◽

G. AMBIKA ◽

R. MISRA

Keyword(s):

Time Series ◽

Hypothesis Testing ◽

Correlation Dimension ◽

Visual Inspection ◽

Chaotic Systems ◽

Synthetic Data ◽

Data Sets ◽

Data Points ◽

Scaling Region ◽

Low Dimensional

We present an algorithmic scheme to compute the correlation dimension D2 of a time series, without requiring the visual inspection of the scaling region in the correlation sum. It is based on the standard Grassberger–Proccacia [GP] algorithm for computing D2. The scheme is tested using synthetic data sets from several standard chaotic systems as well as by adding noise to low-dimensional chaotic data. We show that the scheme is efficient with a few thousand data points and is most suitable when a nonsubjective comparison of D2 values of two time series is required, such as, in hypothesis testing.

Download Full-text

A posteriori noise estimation in variable data sets

Astronomy and Astrophysics ◽

10.1051/0004-6361/201730618 ◽

2018 ◽

Vol 609 ◽

pp. A39 ◽

Cited By ~ 7

Author(s):

S. Czesla ◽

T. Molle ◽

J. H. M. M. Schmitt

Keyword(s):

Standard Deviation ◽

Synthetic Data ◽

Light Curves ◽

Weighted Sums ◽

Data Sets ◽

A Posteriori ◽

Sampled Data ◽

Data Set ◽

Specific Parameter ◽

Data Points

Most physical data sets contain a stochastic contribution produced by measurement noise or other random sources along with the signal. Usually, neither the signal nor the noise are accurately known prior to the measurement so that both have to be estimated a posteriori. We have studied a procedure to estimate the standard deviation of the stochastic contribution assuming normality and independence, requiring a sufficiently well-sampled data set to yield reliable results. This procedure is based on estimating the standard deviation in a sample of weighted sums of arbitrarily sampled data points and is identical to the so-called DER_SNR algorithm for specific parameter settings. To demonstrate the applicability of our procedure, we present applications to synthetic data, high-resolution spectra, and a large sample of space-based light curves and, finally, give guidelines to apply the procedure in situation not explicitly considered here to promote its adoption in data analysis.

Download Full-text

A new affinity matrix weighted k-nearest neighbors graph to improve spectral clustering accuracy

PeerJ Computer Science ◽

10.7717/peerj-cs.692 ◽

2021 ◽

Vol 7 ◽

pp. e692

Author(s):

Muhammad Jamal Ahmed ◽

Faisal Saeed ◽

Anand Paul ◽

Sadeeq Jan ◽

Hyuncheol Seo

Keyword(s):

Spectral Clustering ◽

Large Data ◽

Nearest Neighbors ◽

Data Sets ◽

Affinity Matrix ◽

Clustering Methods ◽

K Nearest Neighbors ◽

Learning Techniques ◽

Interesting Task ◽

Similarity Graph

Researchers have thought about clustering approaches that incorporate traditional clustering methods and deep learning techniques. These approaches normally boost the performance of clustering. Getting knowledge from large data-sets is quite an interesting task. In this case, we use some dimensionality reduction and clustering techniques. Spectral clustering is gaining popularity recently because of its performance. Lately, numerous techniques have been introduced to boost spectral clustering performance. One of the most significant part of these techniques is to construct a similarity graph. We introduced weighted k-nearest neighbors technique for the construction of similarity graph. Using this new metric for the construction of affinity matrix, we achieved good results as we tested it both on real and artificial data-sets.

Download Full-text

New K-means Clustering Method Using Minkowski’s Distance as its Metric

British Journal of Computer, Networking and Information Technology ◽

10.52589/bjcnit-xepsjbwx ◽

2021 ◽

Vol 4 (1) ◽

pp. 28-41

Author(s):

Eric U.O. ◽

Michael O.O. ◽

Oberhiri-Orumah G. ◽

Chike H. N.

Keyword(s):

Euclidean Distance ◽

Real Life ◽

Simulated Data ◽

Manhattan Distance ◽

Data Sets ◽

Normed Vector Space ◽

Clustering Methods ◽

Clustering Method ◽

Real Life Data ◽

Data Points

Cluster analysis is an unsupervised learning method that classifies data points, usually multidimensional into groups (called clusters) such that members of one cluster are more similar (in some sense) to each other than those in other clusters. In this paper, we propose a new k-means clustering method that uses Minkowski’s distance as its metric in a normed vector space which is the generalization of both the Euclidean distance and the Manhattan distance. The k-means clustering methods discussed in this paper are Forgy’s method, Lloyd’s method, MacQueen’s method, Hartigan and Wong’s method, Likas’ method and Faber’s method which uses the usual Euclidean distance. It was observed that the new k-means clustering method performed favourably in comparison with the existing methods in terms of minimization of the total intra-cluster variance using simulated data and real-life data sets.

Download Full-text

Development of a Spectral Clustering Method for the Analysis of Molecular Data Sets.

ChemInform ◽

10.1002/chin.200749224 ◽

2007 ◽

Vol 38 (49) ◽

Author(s):

Mark L. Brewer

Keyword(s):

Spectral Clustering ◽

Molecular Data ◽

Data Sets ◽

Clustering Method ◽

Spectral Clustering Method

Download Full-text