Constrained spectral clustering via multi-layer graph embeddings on a Grassmann manifold

2019 ◽  
Vol 29 (1) ◽  
pp. 125-137
Author(s):  
Aleksandar Trokicić ◽  
Branimir Todorović

Abstract: We present two algorithms in which constrained spectral clustering is implemented as unconstrained spectral clustering on a multi-layer graph where constraints are represented as graph layers. By using the Nyström approximation in one of the algorithms, we obtain time and memory complexities that are linear in the number of data points regardless of the number of constraints. Our algorithms achieve superior or comparable accuracy on real-world data sets compared with existing state-of-the-art solutions. However, the complexity of those state-of-the-art algorithms is quadratic in the number of vertices, while our technique, based on the Nyström approximation method, has linear time complexity. The proposed algorithms efficiently handle both soft and hard constraints, since their time complexity does not depend on the size of the constraint set.
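
The Nyström step that keeps the second algorithm linear in the number of points can be illustrated with a minimal sketch. This is not the authors' multi-layer constrained algorithm; it only shows the generic landmark-based approximation of an affinity spectrum, and the RBF kernel, the landmark count m, and the omission of degree normalization are illustrative assumptions.

```python
# Minimal sketch of the Nystrom landmark approximation (not the authors'
# multi-layer constrained algorithm): RBF kernel, landmark count m, and the
# omission of degree normalization are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import rbf_kernel

def nystrom_spectral_clustering(X, n_clusters, m=100, gamma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    landmarks = rng.choice(n, size=min(m, n), replace=False)
    C = rbf_kernel(X, X[landmarks], gamma=gamma)        # n x m point-to-landmark affinities
    W = C[landmarks]                                    # m x m landmark block
    vals, vecs = np.linalg.eigh(W + 1e-8 * np.eye(W.shape[0]))
    idx = np.argsort(vals)[::-1][:n_clusters]           # top eigenpairs of the small block
    U = C @ vecs[:, idx] / vals[idx]                    # Nystrom-extended eigenvectors
    U /= np.linalg.norm(U, axis=1, keepdims=True) + 1e-12
    return KMeans(n_clusters, n_init=10, random_state=seed).fit_predict(U)
```

Only the m x m landmark block is eigendecomposed, which is what removes the quadratic dependence on the number of data points.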

2015 ◽  
Vol 2015 ◽  
pp. 1-18 ◽  
Author(s):  
Dong Liang ◽  
Chen Qiao ◽  
Zongben Xu

Improving computational efficiency and extending representational capability are two of the most actively studied topics in global manifold learning. In this paper, a new method called extensive landmark Isomap (EL-Isomap) is presented, addressing both topics simultaneously. On one hand, originating from landmark Isomap (L-Isomap), which is known for its high computational efficiency, EL-Isomap also achieves high computational efficiency by utilizing a small set of landmarks to embed all data points. On the other hand, EL-Isomap significantly extends the representational capability of L-Isomap and other global manifold learning approaches by utilizing only an available subset of the landmark set, instead of all landmarks, to embed each point. In particular, compared with other manifold learning approaches, the new method more successfully unwraps data manifolds with intrinsically low-dimensional concave topologies and essential loops, as shown by simulation results on a series of synthetic and real-world data sets. Moreover, the accuracy, robustness, and computational complexity of EL-Isomap are analyzed in this paper, and the relation between EL-Isomap and L-Isomap is also discussed theoretically.
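
For context, a compact sketch of the landmark Isomap (L-Isomap) baseline that EL-Isomap extends: geodesic distances to a small landmark set, classical MDS on the landmark block, and distance-based triangulation of the remaining points. The neighborhood size k, the landmark count, and the random landmark choice are assumptions, and the EL-Isomap landmark-subset selection itself is not shown.

```python
# Compact landmark-Isomap (L-Isomap) baseline: geodesic distances to landmarks,
# classical MDS on the landmark block, distance-based triangulation of the rest.
# Landmark count, k, and the random landmark choice are assumptions.
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

def landmark_isomap(X, n_components=2, n_landmarks=50, k=10, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    landmarks = rng.choice(n, size=min(n_landmarks, n), replace=False)
    G = kneighbors_graph(X, k, mode='distance')
    D2 = shortest_path(G, directed=False, indices=landmarks) ** 2   # m x n squared geodesics
    Dm = D2[:, landmarks]                                           # landmark-to-landmark block
    m = len(landmarks)
    H = np.eye(m) - np.ones((m, m)) / m
    vals, vecs = np.linalg.eigh(-0.5 * H @ Dm @ H)                  # classical MDS on landmarks
    idx = np.argsort(vals)[::-1][:n_components]
    vals_top = np.clip(vals[idx], 1e-12, None)
    pinv = vecs[:, idx] / np.sqrt(vals_top)                         # m x d triangulation operator
    Y = -0.5 * (D2 - Dm.mean(axis=0)[:, None]).T @ pinv             # embedding of all n points
    return Y, landmarks
```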


Author(s):  
Jun Guo ◽  
Jiahui Ye

Clustering on multi-view data has attracted increasing attention in the past decades. Most previous studies assume that each instance appears in all views, or that there is at least one view containing all instances. However, real-world data often have instances missing from some views, leading to the research problem of partial multi-view clustering. To address this issue, this paper proposes a simple yet effective Anchor-based Partial Multi-view Clustering (APMC) method, which utilizes anchors to reconstruct instance-to-instance relationships for clustering. APMC is conceptually simple and easy to implement in practice; besides, it has clear intuitions and non-trivial empirical guarantees. Specifically, APMC first integrates intra- and inter-view similarities through anchors. Then, spectral clustering is performed on the fused similarities to obtain a unified clustering result. Compared with existing partial multi-view clustering methods, APMC has three notable advantages: 1) it can capture more non-linear relations among instances with the help of kernel-based similarities; 2) it has a much lower time complexity by virtue of a non-iterative scheme; 3) it can inherently handle data with negative entries as well as be extended to more than two views. Finally, we extensively evaluate the proposed method on five benchmark datasets. Experimental results demonstrate the superiority of APMC over state-of-the-art approaches.
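
A rough sketch of the anchor-fusion idea, with details the abstract does not fix filled in as assumptions: each view is given as an (observed indices, features) pair, the anchors are instances observed in every view, per-view instance-to-anchor similarities use an RBF kernel, and the fused instance-to-instance affinity is fed to off-the-shelf spectral clustering. This is an illustration, not the authors' exact APMC formulation.

```python
# Rough sketch of anchor-based fusion for partial multi-view data (an
# illustration, not the exact APMC formulation). Assumptions: each view is an
# (observed_indices, features) pair, anchors are instances observed in every
# view, similarities are RBF, and every instance appears in at least one view.
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import rbf_kernel

def anchor_partial_multiview(views, n_total, anchor_idx, n_clusters, gamma=1.0):
    Z = np.zeros((n_total, len(anchor_idx)))
    counts = np.zeros(n_total)
    for observed_idx, F in views:
        pos = {i: p for p, i in enumerate(observed_idx)}
        A = F[[pos[i] for i in anchor_idx]]               # anchor features in this view
        Z[observed_idx] += rbf_kernel(F, A, gamma=gamma)  # instance-to-anchor similarities
        counts[observed_idx] += 1
    Z /= np.maximum(counts, 1)[:, None]                   # average over the views that saw each instance
    S = Z @ Z.T                                           # fused instance-to-instance affinity
    return SpectralClustering(n_clusters, affinity='precomputed',
                              random_state=0).fit_predict(S)
```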


Processes ◽  
2021 ◽  
Vol 9 (3) ◽  
pp. 439
Author(s):  
Xiaoling Zhang ◽  
Xiyu Liu

Clustering analysis, a key step in many data mining problems, can be applied to various fields. However, regardless of the clustering method, noise points have always been an important factor affecting the quality of the clustering. In addition, in spectral clustering, the construction of the affinity matrix affects the formation of the new sample representation, which in turn affects the final clustering results. Therefore, this study proposes a noise cutting and natural neighbors spectral clustering method based on a coupling P system (NCNNSC-CP) to solve the above problems. The whole algorithm is carried out in the coupled P system. We propose a parameter-free natural neighbors search method, which can quickly determine the natural neighbors and the natural characteristic value of the data points. Based on it, the critical density and reverse density are then obtained, and noise identification and cutting are performed. The affinity matrix constructed using core natural neighbors greatly improves the similarity between data points. Experimental results on nine synthetic data sets and six UCI data sets demonstrate that the proposed algorithm outperforms the other comparison algorithms.
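
A minimal sketch of a parameter-free natural-neighbor search of the kind described above, under one common stopping rule (assumed here): the neighborhood size k grows until every point is some other point's k-nearest neighbor, or the number of such "orphan" points stops changing; the final k plays the role of the natural characteristic value.

```python
# Minimal parameter-free natural-neighbor search (one common stopping rule,
# assumed here): grow k until every point is some other point's k-nearest
# neighbour, or the number of such "orphan" points stops changing.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def natural_neighbor_search(X, max_k=50):
    n = X.shape[0]
    max_k = min(max_k, n - 1)
    _, idx = NearestNeighbors(n_neighbors=max_k + 1).fit(X).kneighbors(X)
    prev_orphans = -1
    for k in range(1, max_k + 1):
        neigh = idx[:, 1:k + 1]                        # k nearest neighbours (self excluded)
        reverse_count = np.zeros(n, dtype=int)
        np.add.at(reverse_count, neigh.ravel(), 1)     # how often each point is chosen
        orphans = int((reverse_count == 0).sum())
        if orphans == 0 or orphans == prev_orphans:
            sets = [set(row) for row in neigh]
            natural = [{j for j in sets[i] if i in sets[j]} for i in range(n)]
            return k, reverse_count, natural           # k ~ natural characteristic value
        prev_orphans = orphans
    return max_k, reverse_count, None
```

Points whose reverse-neighbor count stays low are exactly the candidates for noise cutting in the method described above.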


Author(s):  
Nan Li ◽  
Longin Jan Latecki

In this paper, we propose a novel affinity learning based framework for mixed data clustering, which includes: how to process data with mixed-type attributes, how to learn affinities between data points, and how to exploit the learned affinities for clustering. In the proposed framework, each original data attribute is represented with several abstract objects defined according to the specific data type and values. Each attribute value is transformed into initial affinities between the data point and the abstract objects of that attribute. We refine these affinities and infer the unknown affinities between data points by taking into account the interconnections among the attribute values of all data points. The inferred affinities between data points can be exploited for clustering. Alternatively, the refined affinities between data points and the abstract objects of the attributes can be transformed into new data features for clustering. Experimental results on many real-world data sets demonstrate that the proposed framework is effective for mixed data clustering.
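
One crude way to make the point-to-abstract-object idea concrete, purely as an illustration: categorical values map to one abstract object per category, numeric attributes are softly assigned to a few quantile bins, and a point-to-point affinity is read off the resulting bipartite matrix. The refinement and inference steps of the proposed framework are not reproduced here; the bin count and Gaussian soft assignment are assumptions.

```python
# Crude illustration of the point-to-abstract-object idea (not the paper's
# refinement/inference procedure): one abstract object per category, a few
# quantile bins per numeric attribute, and affinity read off the bipartite matrix.
import numpy as np
import pandas as pd

def mixed_affinity(df, n_bins=5):
    blocks = []
    for col in df.columns:
        x = df[col]
        if pd.api.types.is_numeric_dtype(x):
            edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
            centers = (edges[:-1] + edges[1:]) / 2
            scale = (edges[-1] - edges[0]) / n_bins + 1e-12
            B = np.exp(-((x.values[:, None] - centers[None, :]) / scale) ** 2)
        else:
            B = pd.get_dummies(x).values.astype(float)   # one abstract object per category
        B /= B.sum(axis=1, keepdims=True) + 1e-12        # per-attribute row normalisation
        blocks.append(B)
    A = np.hstack(blocks)                                # point-to-abstract-object affinities
    S = A @ A.T                                          # induced point-to-point affinity
    return S / S.max()
```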


Author(s):  
Mahesh Mohan ◽  
Claire Monteleoni

In this paper we present a framework for spectral clustering based on the following simple scheme: sample a subset of the input points, compute the clusters for the sampled subset using weighted kernel k-means (Dhillon et al. 2004), and use the resulting centers to compute a clustering for the remaining data points. For the case where the points are sampled uniformly at random without replacement, we show that the number of samples required depends mainly on the number of clusters and the diameter of the set of points in the kernel space. Experiments show that the proposed framework outperforms approaches based on the Nyström approximation in terms of both accuracy and computation time.
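
A compressed sketch of this sample-then-extend scheme, assuming an RBF kernel and a fixed sample size: plain (unweighted) kernel k-means is run on a random sample, and every remaining point is then assigned to the nearest cluster center in kernel space via the kernel trick. The weighting of Dhillon et al. (2004) and the paper's sample-size analysis are omitted.

```python
# Compressed sample-then-extend sketch: plain (unweighted) kernel k-means on a
# random sample, then every remaining point joins the nearest centre in kernel
# space. The weighting of Dhillon et al. (2004) and the sample-size analysis
# are omitted; the RBF kernel and the sample size m are assumptions.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def sample_kernel_kmeans(X, n_clusters, m=200, gamma=1.0, n_iter=30, seed=0):
    rng = np.random.default_rng(seed)
    sample = rng.choice(len(X), size=min(m, len(X)), replace=False)
    Ks = rbf_kernel(X[sample], gamma=gamma)              # m x m sample kernel
    labels = rng.integers(n_clusters, size=len(sample))
    for _ in range(n_iter):                              # kernel k-means on the sample
        dist = np.full((len(sample), n_clusters), np.inf)
        for c in range(n_clusters):
            mask = labels == c
            if mask.any():
                dist[:, c] = (-2 * Ks[:, mask].mean(axis=1)
                              + Ks[np.ix_(mask, mask)].mean())
        labels = dist.argmin(axis=1)
    Kxs = rbf_kernel(X, X[sample], gamma=gamma)          # n x m cross kernel
    full = np.full((len(X), n_clusters), np.inf)
    for c in range(n_clusters):                          # extend centres to all points
        mask = labels == c
        if mask.any():
            full[:, c] = (-2 * Kxs[:, mask].mean(axis=1)
                          + Ks[np.ix_(mask, mask)].mean())
    return full.argmin(axis=1)
```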


2012 ◽  
Vol 22 (05) ◽  
pp. 1250021 ◽  
Author(s):  
Andrej Gisbrecht ◽  
Bassam Mokbel ◽  
Frank-Michael Schleif ◽  
Xibin Zhu ◽  
Barbara Hammer

Prototype-based learning offers an intuitive interface for inspecting large quantities of electronic data in supervised or unsupervised settings. Recently, many techniques have been extended to data described by general dissimilarities rather than Euclidean vectors, so-called relational data settings. Unlike their Euclidean counterparts, these techniques have quadratic time complexity due to the underlying quadratic dissimilarity matrix, and are thus already infeasible for medium-sized data sets. The contribution of this article is twofold: on the one hand we propose a novel supervised prototype-based classification technique for dissimilarity data based on popular learning vector quantization (LVQ); on the other hand we transfer a linear time approximation technique, the Nyström approximation, to this algorithm and to an unsupervised counterpart, the relational generative topographic mapping (GTM). This way, linear time and space methods result. We evaluate the techniques on three examples from the biomedical domain.
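
For readers unfamiliar with LVQ, a bare-bones Euclidean LVQ1 sketch makes the prototype idea concrete; the paper's contribution is a relational variant that works directly on a dissimilarity matrix and is paired with the Nyström approximation, which this toy version does not cover. The learning rate, epoch count, and initialization are assumptions.

```python
# Bare-bones Euclidean LVQ1, just to make the prototype idea concrete; the
# paper's relational LVQ variant works directly on a dissimilarity matrix and
# is paired with the Nystrom approximation, which this toy version does not cover.
import numpy as np

def lvq1(X, y, prototypes_per_class=1, lr=0.05, epochs=30, seed=0):
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    # Initialise prototypes as randomly picked samples of each class.
    P = np.vstack([X[y == c][rng.choice((y == c).sum(), prototypes_per_class)]
                   for c in classes]).astype(float)
    P_labels = np.repeat(classes, prototypes_per_class)
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            w = np.linalg.norm(P - X[i], axis=1).argmin()   # closest prototype
            sign = 1.0 if P_labels[w] == y[i] else -1.0
            P[w] += sign * lr * (X[i] - P[w])               # attract if correct, repel otherwise
    return P, P_labels
```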


2013 ◽  
Vol 2013 ◽  
pp. 1-12 ◽  
Author(s):  
Singh Vijendra ◽  
Sahoo Laxman

Clustering high-dimensional data has been a major challenge due to the inherent sparsity of the points. Most existing clustering algorithms become substantially inefficient if the required similarity measure is computed between data points in the full-dimensional space. In this paper, we present a robust multi-objective subspace clustering (MOSCL) algorithm for the challenging problem of high-dimensional clustering. The first phase of MOSCL performs subspace relevance analysis by detecting dense and sparse regions and their locations in the data set. After detecting dense regions, it eliminates outliers. MOSCL then discovers subspaces in the dense regions of the data set and produces subspace clusters. In thorough experiments on synthetic and real-world data sets, we demonstrate that MOSCL is superior to the PROCLUS clustering algorithm for subspace clustering. Additionally, we investigate the effects of the first phase, which detects dense regions, on the results of subspace clustering. Our results indicate that removing outliers improves the accuracy of subspace clustering. The clustering results are validated by the clustering error (CE) distance on various data sets. MOSCL can discover clusters in all subspaces with high quality, and it is also more efficient than PROCLUS.
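
A toy illustration of the flavor of the first phase (dense versus sparse one-dimensional regions, then outlier flagging); the bin count, density threshold, and the "sparse in most dimensions" voting rule are assumptions and not the MOSCL criteria.

```python
# Toy illustration of the first-phase flavour (dense vs. sparse 1-D regions,
# then outlier flagging); bin count, density threshold and the voting rule are
# assumptions, not the MOSCL criteria.
import numpy as np

def flag_outliers(X, n_bins=10, density_factor=0.5, vote_frac=0.5):
    n, d = X.shape
    sparse_votes = np.zeros(n, dtype=int)
    for j in range(d):
        counts, edges = np.histogram(X[:, j], bins=n_bins)
        dense = counts >= density_factor * counts.mean()   # dense 1-D intervals
        bin_idx = np.digitize(X[:, j], edges[1:-1])         # bin of every point, 0..n_bins-1
        sparse_votes += ~dense[bin_idx]
    return sparse_votes >= vote_frac * d                    # True = likely outlier
```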


2018 ◽  
Vol 30 (6) ◽  
pp. 1624-1646 ◽  
Author(s):  
Qidong Liu ◽  
Ruisheng Zhang ◽  
Zhili Zhao ◽  
Zhenghai Wang ◽  
Mengyao Jiao ◽  
...  

Minimax similarity stresses the connectedness of points via mediating elements rather than favoring high mutual similarity. This grouping principle yields superior clustering results when mining arbitrarily shaped clusters in data. However, it is not robust against noise and outliers in the data. There are two main problems with the grouping principle: first, a single object that is far away from all other objects defines a separate cluster, and second, two connected clusters would be regarded as two parts of one cluster. In order to solve these problems, we propose a robust minimum spanning tree (MST)-based clustering algorithm in this letter. First, we separate the connected objects by applying a density-based coarsening phase, resulting in a low-rank matrix whose elements denote supernodes, each combining a set of nodes. Then a greedy method is presented to partition those supernodes by working on the low-rank matrix. Instead of removing the longest edges from the MST, our algorithm groups the data set based on minimax similarity. Finally, the assignment of all data points can be obtained through their corresponding supernodes. Experimental results on many synthetic and real-world data sets show that our algorithm consistently outperforms the compared clustering algorithms.
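
The minimax (path-based) distance that this grouping principle relies on can be computed from the MST alone: processing MST edges in increasing order of weight, the minimax distance between two points is the weight of the edge that first joins their components. A small sketch of that computation follows; the density-based coarsening and supernode partitioning steps of the proposed algorithm are not shown.

```python
# Minimax (path-based) distances from the MST: processing MST edges in
# increasing order, the minimax distance between two points is the weight of
# the edge that first joins their components. The density-based coarsening and
# supernode partitioning of the proposed algorithm are not shown.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def minimax_distances(X):
    n = len(X)
    mst = minimum_spanning_tree(squareform(pdist(X))).tocoo()
    parent = list(range(n))
    members = [[i] for i in range(n)]
    D = np.zeros((n, n))

    def find(i):                                   # union-find with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for e in np.argsort(mst.data):                 # merge components, smallest edge first
        a, b = find(mst.row[e]), find(mst.col[e])
        for i in members[a]:
            for j in members[b]:
                D[i, j] = D[j, i] = mst.data[e]
        members[a].extend(members[b])
        parent[b] = a
    return D
```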


2011 ◽  
Vol 8 (4) ◽  
pp. 1143-1157 ◽  
Author(s):  
Xinyue Liu ◽  
Xing Yong ◽  
Hongfei Lin

The similarity matrix is critical to the performance of spectral clustering. Mercer kernels have become popular largely due to their successes in applying kernel methods such as kernel PCA. A novel spectral clustering method is proposed based on local neighborhoods in kernel space (SC-LNK), which assumes that each data point can be linearly reconstructed from its neighbors. The SC-LNK algorithm projects the data into a feature space via a Mercer kernel, and then learns a sparse matrix through linear reconstruction as the similarity graph for spectral clustering. Experiments have been performed on synthetic and real-world data sets and show that spectral clustering based on linear reconstruction in kernel space outperforms conventional spectral clustering and the other two algorithms, especially on real-world data sets.
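
A condensed sketch of the SC-LNK idea, with details the abstract leaves open (neighborhood size, RBF kernel, symmetrization by absolute value) filled in as assumptions: each point is reconstructed from its k neighbors in kernel space, LLE-style, and the reconstruction weights form the sparse similarity graph for spectral clustering.

```python
# Condensed SC-LNK-style sketch; neighbourhood size k, the RBF kernel and the
# absolute-value symmetrisation are assumptions. Each point is reconstructed
# from its k neighbours in kernel space (LLE-style) and the weights become the
# sparse similarity graph for spectral clustering.
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.neighbors import NearestNeighbors

def sc_lnk(X, n_clusters, k=10, gamma=1.0, reg=1e-3):
    n = len(X)
    K = rbf_kernel(X, gamma=gamma)
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    W = np.zeros((n, n))
    for i in range(n):
        nb = idx[i, 1:]
        # Gram matrix of (phi(x_i) - phi(x_j)) inner products in kernel space.
        G = K[i, i] - K[i, nb][None, :] - K[i, nb][:, None] + K[np.ix_(nb, nb)]
        G += (reg * np.trace(G) / k + 1e-10) * np.eye(k)   # regularise for stability
        w = np.linalg.solve(G, np.ones(k))
        W[i, nb] = w / w.sum()                             # affine reconstruction weights
    S = np.abs(W + W.T) / 2                                # symmetric non-negative affinity
    return SpectralClustering(n_clusters, affinity='precomputed').fit_predict(S)
```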


2019 ◽  
Vol 2019 ◽  
pp. 1-17
Author(s):  
Shaodi Ge ◽  
Hongjun Li ◽  
Liuhong Luo

Coclustering approaches for grouping data points and features have recently received extensive attention. In this paper, we propose a constrained dual graph regularized orthogonal nonnegative matrix trifactorization (CDONMTF) algorithm to solve coclustering problems. The new method noticeably improves clustering performance by employing hard constraints to retain the prior label information of samples, establishing two nearest-neighbor graphs to encode the geometric structure of the data manifold and the feature manifold, and imposing biorthogonal constraints as well. In addition, we derive the iterative optimization scheme of CDONMTF and prove its convergence. Clustering experiments on 5 UCI machine-learning data sets and 7 image benchmark data sets show that the proposed algorithm outperforms several existing clustering algorithms.
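
For orientation only, a sketch of the plain biorthogonal trifactorization core X ≈ F S Gᵀ with Ding-style multiplicative updates; CDONMTF additionally injects hard label constraints and the two graph regularizers, which are not reproduced here. The iteration count, random initialization, and stopping without a convergence check are assumptions.

```python
# Plain bi-orthogonal trifactorization core X ~ F S G^T with Ding-style
# multiplicative updates, for orientation only; CDONMTF additionally uses hard
# label constraints and two graph regularisers, which are omitted here.
import numpy as np

def onmtf(X, n_row_clusters, n_col_clusters, n_iter=200, eps=1e-9, seed=0):
    rng = np.random.default_rng(seed)
    n, m = X.shape                                     # X must be non-negative
    F = rng.random((n, n_row_clusters))
    S = rng.random((n_row_clusters, n_col_clusters))
    G = rng.random((m, n_col_clusters))
    for _ in range(n_iter):
        G *= np.sqrt((X.T @ F @ S) / (G @ G.T @ X.T @ F @ S + eps))
        F *= np.sqrt((X @ G @ S.T) / (F @ F.T @ X @ G @ S.T + eps))
        S *= np.sqrt((F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps))
    return F, S, G    # sample clusters: F.argmax(1); feature clusters: G.argmax(1)
```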

