Spectral Clustering with Local Projection Distance Measurement

2015 ◽  
Vol 2015 ◽  
pp. 1-13 ◽  
Author(s):  
Chen Diao ◽  
Ai-Hua Zhang ◽  
Bin Wang

Constructing a reasonable affinity matrix is crucial for spectral clustering. In this paper, a novel spectral clustering method based on a local projection distance measure (LPDM) is proposed. The method defines the Local-Projection-Neighborhood (LPN), a region between a pair of data points, and projects the other data points in the LPN onto the straight line through the pair. The Euclidean distances between the projected points capture the local spatial structure of the data and are used to measure the similarity of objects. The affinity matrix is then obtained with a new similarity measure that squeezes or widens the projective distance according to the spatial structure of the data. Experimental results show that the LPDM algorithm performs well on synthetic datasets, real-world datasets, and images.
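The projection step can be illustrated with a minimal sketch. It assumes that the LPN of a pair contains the points whose orthogonal projection falls strictly between the two endpoints; the abstract gives neither the exact shape of the region nor the squeeze/widen similarity, so the function names and the membership test below are hypothetical.

```python
import math

def project_onto_line(p, a, b):
    """Return (t, foot), where foot = a + t * (b - a) is the orthogonal projection of p."""
    ab = [bj - aj for aj, bj in zip(a, b)]
    ap = [pj - aj for aj, pj in zip(a, p)]
    denom = sum(c * c for c in ab)
    if denom == 0:
        raise ValueError("a and b must be distinct points")
    t = sum(u * v for u, v in zip(ap, ab)) / denom
    foot = [aj + t * c for aj, c in zip(a, ab)]
    return t, foot

def projections_between(points, i, j):
    """Assumed LPN: keep points whose projection lies strictly between points[i] and points[j]."""
    a, b = points[i], points[j]
    feet = []
    for k, p in enumerate(points):
        if k in (i, j):
            continue
        t, foot = project_onto_line(p, a, b)
        if 0.0 < t < 1.0:
            feet.append((t, foot))
    # Order the projected points along the segment from points[i] to points[j].
    feet.sort(key=lambda item: item[0])
    return feet

def projected_gaps(points, i, j):
    """Euclidean distances between consecutive projected points, endpoints included."""
    inner = [foot for _, foot in projections_between(points, i, j)]
    chain = [points[i]] + inner + [points[j]]
    return [math.dist(u, v) for u, v in zip(chain, chain[1:])]
```

A similarity measure in the spirit of the abstract would then rescale these gaps; how LPDM does so is not specified here.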

Author(s):  
Feiping Nie ◽  
Jing Li ◽  
Xuelong Li

In multiview learning, it is essential to assign a reasonable weight to each view according to its importance. A good multiview clustering method should therefore cluster the multiview data while learning the view weights. In this paper, we address this problem by exploring a Laplacian rank constrained graph, which can be approximated as the centroid of the graphs built for each view, weighted with different confidences. We start from the natural idea that the weights can be learned by introducing a hyperparameter. After analyzing the weakness of this approach, we further propose a new multiview clustering method that is entirely self-weighted. Furthermore, once the target graph is obtained in our models, we can directly assign a cluster label to each data point without any postprocessing such as $K$-means in standard spectral clustering. Evaluations on two synthetic datasets demonstrate the effectiveness of our methods. Compared with several representative graph-based multiview clustering approaches on four real-world datasets, the proposed methods achieve better performance, and our new clustering method is more practical to use.
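The rank constraint rests on a standard fact: for a graph with nonnegative weights, the multiplicity of the zero eigenvalue of the Laplacian L = D - W equals the number of connected components. A graph on n nodes with rank(L) = n - c therefore has exactly c components, and cluster labels can be read off without K-means. The sketch below only checks this fact on a toy graph; it does not implement the paper's optimization or its view weighting, and all function names are illustrative.

```python
from collections import deque
from fractions import Fraction

def laplacian(weights):
    """Graph Laplacian L = D - W for a symmetric, nonnegative weight matrix."""
    n = len(weights)
    return [[(sum(weights[i]) if i == j else 0) - weights[i][j] for j in range(n)]
            for i in range(n)]

def matrix_rank(matrix):
    """Exact rank via Gaussian elimination over rationals (no rounding tolerance)."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            if factor != 0:
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank

def component_labels(weights):
    """Label each node with the index of its connected component (breadth-first search)."""
    n = len(weights)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if weights[u][v] > 0 and labels[v] == -1:
                    labels[v] = current
                    queue.append(v)
        current += 1
    return labels
```

For a graph with edges 0-1, 1-2, and 3-4, the Laplacian has rank 5 - 2 = 3 and the labels split the nodes into {0, 1, 2} and {3, 4}.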


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
João Lobo ◽  
Rui Henriques ◽  
Sara C. Madeira

Background: Three-way data started to gain popularity due to their increasing capacity to describe inherently multivariate and temporal events, such as biological responses, social interactions over time, urban dynamics, or complex geophysical phenomena. Triclustering, the subspace clustering of three-way data, enables the discovery of patterns corresponding to data subspaces (triclusters) with values correlated across the three dimensions (observations × features × contexts). With an increasing number of algorithms being proposed, effectively comparing them with state-of-the-art algorithms is paramount. These comparisons are usually performed using real data without a known ground truth, which limits the assessments. In this context, we propose a synthetic data generator, G-Tric, that allows the creation of synthetic datasets with configurable properties and the possibility to plant triclusters. The generator is prepared to create datasets resembling real three-way data from biomedical and social data domains, with the additional advantage of providing the ground truth (triclustering solution) as output. Results: G-Tric can replicate real-world datasets and create new ones that match researchers' needs across several properties, including data type (numeric or symbolic), dimensions, and background distribution. Users can tune the patterns and structure that characterize the planted triclusters (subspaces) and how they interact (overlapping). Data quality can also be controlled by defining the amount of missing values, noise, or errors. Furthermore, a benchmark of datasets resembling real data is made available, together with the corresponding triclustering solutions (planted triclusters) and generating parameters. Conclusions: Triclustering evaluation using G-Tric makes it possible to combine intrinsic and extrinsic metrics when comparing solutions, producing more reliable analyses. A set of predefined datasets, mimicking widely used three-way data and exploring crucial properties, was generated and made available, highlighting G-Tric's potential to advance the triclustering state of the art by easing the process of evaluating the quality of new triclustering approaches.


2018 ◽  
Vol 7 (12) ◽  
pp. 467 ◽  
Author(s):  
Mengyu Ma ◽  
Ye Wu ◽  
Wenze Luo ◽  
Luo Chen ◽  
Jun Li ◽  
...  

Buffer analysis, a fundamental function in a geographic information system (GIS), identifies the areas that lie within a given distance of geographic features. Real-time buffer analysis for large-scale spatial data remains a challenging problem because the computational cost of conventional data-oriented methods grows rapidly with increasing data volume. In this paper, we introduce HiBuffer, a visualization-oriented model for real-time buffer analysis. An efficient buffer generation method is proposed which introduces spatial indexes and a corresponding query strategy. Buffer results are organized into a tile-pyramid structure to enable stepless zooming. Moreover, a fully optimized hybrid parallel processing architecture is proposed for the real-time buffer analysis of large-scale spatial data. Experiments using real-world datasets show that our approach can reduce computation time by up to several orders of magnitude while preserving superior visualization effects. Additional experiments were conducted to analyze the influence of spatial data density, buffer radius, and request rate on HiBuffer performance, and the results demonstrate the adaptability and stability of HiBuffer. The parallel scalability of HiBuffer was also tested, showing that HiBuffer achieves high parallel speedup. Experimental results verify that HiBuffer is capable of handling 10-million-scale data.
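One way to read "visualization-oriented" is that the buffer is evaluated per pixel of an output tile rather than built as geometry. The sketch below follows that reading for point features only and swaps in a simple uniform grid as the spatial index; the abstract does not say which index or query strategy HiBuffer uses, so `GridIndex` and `rasterize_buffer` are hypothetical names, not the paper's API.

```python
import math
from collections import defaultdict

class GridIndex:
    """Uniform grid over 2D points; a stand-in for the paper's unspecified spatial index."""

    def __init__(self, points, cell_size):
        if cell_size <= 0:
            raise ValueError("cell_size must be positive")
        self.cell_size = cell_size
        self.cells = defaultdict(list)
        for x, y in points:
            self.cells[self._key(x, y)].append((x, y))

    def _key(self, x, y):
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def within(self, x, y, radius):
        """True if any indexed point lies within `radius` of (x, y)."""
        reach = math.ceil(radius / self.cell_size)
        cx, cy = self._key(x, y)
        # Only cells that can intersect the query disc are visited.
        for ix in range(cx - reach, cx + reach + 1):
            for iy in range(cy - reach, cy + reach + 1):
                for px, py in self.cells.get((ix, iy), ()):
                    if math.hypot(px - x, py - y) <= radius:
                        return True
        return False

def rasterize_buffer(index, radius, x0, y0, pixel, width, height):
    """Boolean tile: each pixel is tested at its center against the buffer."""
    return [[index.within(x0 + (c + 0.5) * pixel, y0 + (r + 0.5) * pixel, radius)
             for c in range(width)]
            for r in range(height)]
```

With this layout the work per tile depends on the number of pixels and on the local point density, not on the total dataset size, which is the general motivation for rendering-driven designs.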


Data clustering is an active research topic with applications in fields such as biology, management, statistics, and pattern recognition. Spectral Clustering (SC) has gained popularity in recent times due to its ability to handle complex data and its ease of implementation. A crucial step in spectral clustering is the construction of the affinity matrix, which is based on a pairwise similarity measure. The varied characteristics of datasets affect the performance of a spectral clustering technique. In this paper, we propose an affinity measure based on Topological Node Features (TNFs), namely the Clustering Coefficient (CC) and the Summation index (SI), to define the notion of density and local structure. These features are shown to improve the performance of SC in clustering the data. The experiments were conducted on synthetic datasets, UCI datasets, and the MNIST handwritten datasets. The results show that the proposed affinity metric outperforms several recent spectral clustering methods in terms of accuracy.
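Of the two node features named above, the clustering coefficient has a standard definition: for a node with k neighbors, it is the fraction of the k(k-1)/2 possible neighbor pairs that are themselves connected. A minimal sketch on an undirected graph stored as a dictionary of neighbor sets follows; the Summation index and the way the paper combines these features into an affinity are not described in the abstract and are not attempted here.

```python
from itertools import combinations

def local_clustering_coefficient(adj, node):
    """Fraction of pairs of neighbors of `node` that are linked to each other.

    `adj` maps each node to the set of its neighbors (undirected, no self-loops).
    Nodes with fewer than two neighbors get 0.0 by convention.
    """
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))
```

In a triangle 0-1-2 with an extra node 3 attached to node 0, nodes 1 and 2 score 1.0, node 0 scores 1/3, and node 3 scores 0.0.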


2020 ◽  
Vol 34 (04) ◽  
pp. 6837-6844
Author(s):  
Xiaojin Zhang ◽  
Honglei Zhuang ◽  
Shengyu Zhang ◽  
Yuan Zhou

We study a variant of the thresholding bandit problem (TBP) in the context of outlier detection, where the objective is to identify the outliers whose rewards are above a threshold. Distinct from the traditional TBP, the threshold is defined as a function of the rewards of all the arms, which is motivated by the criterion for identifying outliers. The learner needs to explore the rewards of the arms as well as the threshold. We refer to this problem as "double exploration for outlier detection". We construct an adaptively updated confidence interval for the threshold, based on the estimated value of the threshold in the previous rounds. Furthermore, by automatically trading off exploring the individual arms and exploring the outlier threshold, we provide an efficient algorithm in terms of the sample complexity. Experimental results on both synthetic datasets and real-world datasets demonstrate the efficiency of our algorithm.
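The key difference from the traditional TBP is that the threshold itself depends on every arm. The sketch below illustrates only that dependence, under the assumption that the threshold is the mean of the arm means plus k standard deviations; the abstract does not state the actual criterion, and the sampling strategy and confidence intervals of the algorithm are not reproduced. Function names and the default k are illustrative.

```python
import statistics

def outlier_threshold(means, k=2.0):
    """Assumed criterion: mean of the arm means plus k population standard deviations."""
    return statistics.fmean(means) + k * statistics.pstdev(means)

def flag_outliers(means, k=2.0):
    """Indices of arms whose (estimated) mean reward exceeds the threshold."""
    theta = outlier_threshold(means, k)
    return [i for i, m in enumerate(means) if m > theta]
```

Because the threshold is recomputed from all estimated means, an inaccurate estimate for one arm shifts the decision boundary for every other arm, which is why the threshold has to be explored alongside the individual arms.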


1988 ◽  
Vol 53 (11) ◽  
pp. 2627-2636 ◽  
Author(s):  
Giorgio Fassina ◽  
Michal Lebl ◽  
Irwin M. Chaiken

Analytical high performance liquid affinity chromatography with immobilized neurophysin was used as a molecular screen to evaluate the effects of peptide hormone structure modification on protein recognition. Immobilization of neurophysin on silica and highly cross-linked agarose occurred with retention of oxytocin and vasopressin binding properties. The effects of one-residue-at-a-time mutation, multi-site sequence simplification, and sequence randomization of critical contact residues were evaluated by extent of binding of the peptides on the affinity matrix. The analytical chromatography method also was used as a stereoselective detector to identify racemic contaminants in peptide hormone preparations.


2019 ◽  
Vol 170 ◽  
pp. 26-42 ◽  
Author(s):  
Xinmin Tao ◽  
Ruotong Wang ◽  
Rui Chang ◽  
Chenxi Li ◽  
Rui Liu ◽  
...  

2012 ◽  
Vol 155-156 ◽  
pp. 342-347 ◽  
Author(s):  
Xun Biao Zhong ◽  
Xiao Xia Huang

To address the low accuracy and high computational cost of density-based outlier detection, this paper proposes a variance of distance and density (VDD) measure. The proposed k-means clustering and score based VDD (KSVDD) approach detects outliers efficiently. Two real-world datasets are used to illustrate the feasibility of the approach. Empirical results show that KSVDD achieves good detection precision.

