Spectral Clustering with Local Projection Distance Measurement

Constructing a reasonable affinity matrix is crucial for spectral clustering. In this paper, a novel spectral clustering method based on a local projection distance measure (LPDM) is proposed. The method defines the Local-Projection-Neighborhood (LPN), a region between a pair of data points, and projects the other data points in the LPN onto the straight line joining that pair. The Euclidean distances between the projected points capture the local spatial structure of the data and are used to measure the similarity of objects. The affinity matrix is then obtained with a new similarity measurement that squeezes or widens the projective distance according to the spatial structure of the data. Experimental results show that the LPDM algorithm performs well on synthetic datasets, real-world datasets, and images.
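
The projection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact shape of the LPN, so the sketch assumes it is the slab of points whose projection falls between the two endpoints, and the function names `project_onto_pair` and `projected_gaps` are hypothetical. How the gaps are turned into a similarity value is not stated in the abstract and is not attempted here.

```python
import math


def project_onto_pair(xi, xj, points):
    """Project each point onto the line through xi and xj.

    Returns (t, projection) pairs, where t is the position along the
    line: t = 0 at xi and t = 1 at xj.
    """
    direction = [b - a for a, b in zip(xi, xj)]
    norm_sq = sum(c * c for c in direction)
    if norm_sq == 0:
        raise ValueError("xi and xj must be distinct points")
    result = []
    for p in points:
        t = sum((pc - a) * c for pc, a, c in zip(p, xi, direction)) / norm_sq
        projection = [a + t * c for a, c in zip(xi, direction)]
        result.append((t, projection))
    return result


def projected_gaps(xi, xj, points):
    """Distances between consecutive projections lying between xi and xj.

    Assumption: the neighborhood is the set of points with 0 <= t <= 1.
    """
    ts = sorted(t for t, _ in project_onto_pair(xi, xj, points) if 0.0 <= t <= 1.0)
    ts = [0.0] + ts + [1.0]
    length = math.dist(xi, xj)
    return [(b - a) * length for a, b in zip(ts, ts[1:])]
```

For example, with the pair `(0, 0)` and `(4, 0)`, the points `(1, 1)` and `(3, -2)` project inside the segment, while `(5, 0)` falls outside and is ignored.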

Self-weighted Multiview Clustering with Multiple Graphs

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/357 ◽

2017 ◽

Cited By ~ 44

Author(s):

Feiping Nie ◽

Jing Li ◽

Xuelong Li

Keyword(s):

Real World ◽

Spectral Clustering ◽

Experimental Results ◽

Clustering Method ◽

Elegant Method ◽

Multiview Learning ◽

Cluster Label ◽

Real World Datasets ◽

Synthetic Datasets ◽

Multiview Clustering

In multiview learning, it is essential to assign a reasonable weight to each view according to its importance. Thus, for the multiview clustering task, a wise and elegant method should cluster multiview data while learning the view weights. In this paper, we address this problem by exploring a Laplacian rank constrained graph, which can be approximated as the centroid of the graphs built for each view with different confidences. We start our work with the natural thought that the weights can be learned by introducing a hyperparameter. By analyzing the weakness of this approach, we further propose a new multiview clustering method which is totally self-weighted. Furthermore, once the target graph is obtained in our models, we can directly assign the cluster label to each data point and do not need any postprocessing such as $K$-means in standard spectral clustering. Evaluations on two synthetic datasets demonstrate the effectiveness of our methods. Compared with several representative graph-based multiview clustering approaches on four real-world datasets, experimental results demonstrate that the proposed methods achieve better performance and that our new clustering method is more practical to use.
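
The claim that labels can be read directly from the target graph rests on a standard fact: for a graph with nonnegative weights, the number of zero eigenvalues of its Laplacian equals the number of connected components. The sketch below only illustrates that last step, under the assumption that a graph with the desired number of components has already been obtained. It does not learn the graph or the view weights, and `component_labels` is a hypothetical helper name.

```python
from collections import deque


def component_labels(weights, tol=1e-12):
    """Label each node with the index of its connected component.

    `weights` is a square matrix (list of lists) of nonnegative edge
    weights; entries at or below `tol` are treated as "no edge".
    """
    n = len(weights)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Breadth-first search over the component containing `start`.
        labels[start] = current
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in range(n):
                connected = weights[u][v] > tol or weights[v][u] > tol
                if labels[v] == -1 and connected:
                    labels[v] = current
                    queue.append(v)
        current += 1
    return labels
```

If the graph has exactly as many components as clusters, these labels are the clustering, which is why no $K$-means step is needed afterwards.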

G-Tric: generating three-way synthetic datasets with triclustering solutions

BMC Bioinformatics ◽

10.1186/s12859-020-03925-4 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

João Lobo ◽

Rui Henriques ◽

Sara C. Madeira

Keyword(s):

State Of The Art ◽

Synthetic Data ◽

Ground Truth ◽

Real Data ◽

Three Dimensions ◽

Additional Advantage ◽

Urban Dynamics ◽

Data Generator ◽

Real World Datasets ◽

Synthetic Datasets

Background: Three-way data have gained popularity due to their increasing capacity to describe inherently multivariate and temporal events, such as biological responses, social interactions along time, urban dynamics, or complex geophysical phenomena. Triclustering, the subspace clustering of three-way data, enables the discovery of patterns corresponding to data subspaces (triclusters) with values correlated across the three dimensions (observations × features × contexts). With an increasing number of algorithms being proposed, effectively comparing them with state-of-the-art algorithms is paramount. These comparisons are usually performed using real data without a known ground truth, which limits the assessments. In this context, we propose a synthetic data generator, G-Tric, allowing the creation of synthetic datasets with configurable properties and the possibility to plant triclusters. The generator is prepared to create datasets resembling real three-way data from biomedical and social data domains, with the additional advantage of providing the ground truth (triclustering solution) as output. Results: G-Tric can replicate real-world datasets and create new ones that match researchers' needs across several properties, including data type (numeric or symbolic), dimensions, and background distribution. Users can tune the patterns and structure that characterize the planted triclusters (subspaces) and how they interact (overlapping). Data quality can also be controlled by defining the amount of missing values, noise, or errors. Furthermore, a benchmark of datasets resembling real data is made available, together with the corresponding triclustering solutions (planted triclusters) and generating parameters. Conclusions: Triclustering evaluation using G-Tric makes it possible to combine intrinsic and extrinsic metrics when comparing solutions, yielding more reliable analyses. A set of predefined datasets, mimicking widely used three-way data and exploring crucial properties, was generated and made available, highlighting G-Tric's potential to advance the triclustering state of the art by easing the process of evaluating the quality of new triclustering approaches.
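
The idea of planting a tricluster and returning the ground truth can be sketched in a few lines. This is an illustration only and not G-Tric's interface: the function name `plant_tricluster`, the uniform background, and the constant-valued pattern are all assumptions made for the example.

```python
import random


def plant_tricluster(shape, rows, cols, ctxs, value=5.0, seed=0):
    """Create an observations x features x contexts array with one planted tricluster.

    Background values are drawn uniformly from [0, 1] (an assumption);
    the planted subspace is filled with the constant `value`.
    Returns the data and the ground-truth index sets.
    """
    rng = random.Random(seed)
    n_obs, n_feat, n_ctx = shape
    data = [
        [[rng.uniform(0.0, 1.0) for _ in range(n_ctx)] for _ in range(n_feat)]
        for _ in range(n_obs)
    ]
    for i in rows:
        for j in cols:
            for k in ctxs:
                data[i][j][k] = value
    truth = {
        "observations": sorted(rows),
        "features": sorted(cols),
        "contexts": sorted(ctxs),
    }
    return data, truth
```

Because the planted indices are returned alongside the data, a triclustering result can be scored against them with an extrinsic metric, which is not possible with real data lacking a ground truth.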

HiBuffer: Buffer Analysis of 10-Million-Scale Spatial Data in Real Time

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi7120467 ◽

2018 ◽

Vol 7 (12) ◽

pp. 467 ◽

Cited By ~ 3

Author(s):

Mengyu Ma ◽

Ye Wu ◽

Wenze Luo ◽

Luo Chen ◽

Jun Li ◽

...

Keyword(s):

Real Time ◽

Spatial Data ◽

High Performance ◽

Large Scale ◽

Computation Time ◽

Buffer Analysis ◽

Data Volume ◽

Time Buffer ◽

Real World Datasets ◽

Spatial Indexes

Buffer analysis, a fundamental function in a geographic information system (GIS), identifies the areas that lie within a given distance of surrounding geographic features. Real-time buffer analysis for large-scale spatial data remains a challenging problem since the computational scales of conventional data-oriented methods expand rapidly with increasing data volume. In this paper, we introduce HiBuffer, a visualization-oriented model for real-time buffer analysis. An efficient buffer generation method is proposed which introduces spatial indexes and a corresponding query strategy. Buffer results are organized into a tile-pyramid structure to enable stepless zooming. Moreover, a fully optimized hybrid parallel processing architecture is proposed for the real-time buffer analysis of large-scale spatial data. Experiments using real-world datasets show that our approach can reduce computation time by up to several orders of magnitude while preserving superior visualization effects. Additional experiments were conducted to analyze the influence of spatial data density, buffer radius, and request rate on HiBuffer performance, and the results demonstrate the adaptability and stability of HiBuffer. The parallel scalability of HiBuffer was also tested, showing that HiBuffer achieves high parallel acceleration performance. Experimental results verify that HiBuffer is capable of handling 10-million-scale data.
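
The membership question a buffer query answers (is this location within the buffer radius of any feature?) can be sketched with a brute-force test against line segments. This is not HiBuffer's method: it scans every segment, whereas the abstract describes spatial indexes that would prune the candidates. The function names are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment from a to b (2D)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: a and b coincide.
        return math.hypot(px - ax, py - ay)
    # Parameter of the closest point on the infinite line, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def in_buffer(p, segments, radius):
    """True if p lies within `radius` of at least one segment."""
    return any(point_segment_distance(p, a, b) <= radius for a, b in segments)
```

Running this test once per location costs time proportional to the number of segments, which illustrates why the cost of a data-oriented approach grows with data volume.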

Enhanced Affinity for Spectral Clustering using Topological Node Features (TNFS)

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.a9450.109119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 974-987

Keyword(s):

Local Structure ◽

Data Clustering ◽

Spectral Clustering ◽

Clustering Coefficient ◽

Complex Data ◽

Clustering Methods ◽

Pairwise Similarity ◽

Synthetic Datasets ◽

Summation Index ◽

Affinity Measure

Data clustering is an active topic of research as it has applications in various fields such as biology, management, statistics, and pattern recognition. Spectral Clustering (SC) has gained popularity in recent times due to its ability to handle complex data and its ease of implementation. A crucial step in spectral clustering is the construction of the affinity matrix, which is based on a pairwise similarity measure. The varied characteristics of datasets affect the performance of a spectral clustering technique. In this paper, we propose an affinity measure based on Topological Node Features (TNFs), viz., Clustering Coefficient (CC) and Summation Index (SI), to define the notion of density and local structure. It has been shown that these features improve the performance of SC in clustering the data. The experiments were conducted on synthetic datasets, UCI datasets, and the MNIST handwritten datasets. The results show that the proposed affinity metric outperforms several recent spectral clustering methods in terms of accuracy.
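
Of the two node features named above, the clustering coefficient has a standard definition for unweighted, undirected graphs: the fraction of a node's neighbor pairs that are themselves connected. The sketch below computes only that quantity. How the paper builds the graph from the data, how it defines the Summation Index, and how the features enter the affinity measure are not stated in the abstract and are not shown; the adjacency-dictionary input is an assumption.

```python
def local_clustering_coefficient(adj, node):
    """Local clustering coefficient of `node` in an undirected graph.

    `adj` maps each node to the set of its neighbors. Nodes with fewer
    than two neighbors get a coefficient of 0.0 by convention.
    """
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    # Count each linked pair of neighbors once (u < v).
    links = sum(1 for u in neighbors for v in neighbors if u < v and v in adj[u])
    return 2.0 * links / (k * (k - 1))
```

A node inside a tightly knit group scores close to 1, while a node bridging otherwise unconnected neighbors scores close to 0, which is what makes the feature a plausible signal of local structure.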

Spectral clustering on protein-protein interaction networks via constructing affinity matrix using attributed graph embedding

Computers in Biology and Medicine ◽

10.1016/j.compbiomed.2021.104933 ◽

2021 ◽

pp. 104933

Author(s):

Kamal Berahmand ◽

Elahe Nasiri ◽

Rojiar Pir mohammadiani ◽

Yuefeng Li

Keyword(s):

Protein Interaction ◽

Spectral Clustering ◽

Graph Embedding ◽

Protein Interaction Networks ◽

Interaction Networks ◽

Affinity Matrix ◽

Protein Protein Interaction ◽

Attributed Graph ◽

Protein Protein Interaction Networks

Adaptive Double-Exploration Tradeoff for Outlier Detection

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.6164 ◽

2020 ◽

Vol 34 (04) ◽

pp. 6837-6844

Author(s):

Xiaojin Zhang ◽

Honglei Zhuang ◽

Shengyu Zhang ◽

Yuan Zhou

Keyword(s):

Confidence Interval ◽

Outlier Detection ◽

Real World ◽

Efficient Algorithm ◽

Experimental Results ◽

Sample Complexity ◽

Bandit Problem ◽

Real World Datasets ◽

Synthetic Datasets ◽

The Individual

We study a variant of the thresholding bandit problem (TBP) in the context of outlier detection, where the objective is to identify the outliers whose rewards are above a threshold. Distinct from the traditional TBP, the threshold is defined as a function of the rewards of all the arms, which is motivated by the criterion for identifying outliers. The learner needs to explore the rewards of the arms as well as the threshold. We refer to this problem as "double exploration for outlier detection". We construct an adaptively updated confidence interval for the threshold, based on the estimated value of the threshold in the previous rounds. Furthermore, by automatically trading off exploring the individual arms and exploring the outlier threshold, we provide an efficient algorithm in terms of the sample complexity. Experimental results on both synthetic datasets and real-world datasets demonstrate the efficiency of our algorithm.
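
What distinguishes this setting from a fixed-threshold problem is that the threshold itself depends on every arm. The sketch below makes that concrete under a loud assumption: the abstract only says the threshold is a function of the rewards of all arms, and the mean-plus-k-standard-deviations rule used here is one common outlier criterion chosen for illustration. The sketch also works on known mean rewards, whereas the bandit problem must estimate them from samples with confidence intervals, which is not shown.

```python
import statistics


def outlier_arms(mean_rewards, k=2.0):
    """Return (threshold, indices of arms whose mean reward exceeds it).

    Assumed rule: threshold = mean + k * population standard deviation
    of the arms' mean rewards.
    """
    mu = statistics.fmean(mean_rewards)
    sigma = statistics.pstdev(mean_rewards)
    threshold = mu + k * sigma
    outliers = [i for i, r in enumerate(mean_rewards) if r > threshold]
    return threshold, outliers
```

Because changing the estimate of any single arm shifts the threshold for all arms, a learner has to spend samples on the threshold as well as on the individual arms, which is the tradeoff the abstract calls double exploration.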

Screening the recognition properties of peptide hormone sequence mutants by analytical high performance liquid affinity chromatography on immobilized neurophysin

Collection of Czechoslovak Chemical Communications ◽

10.1135/cccc19882627 ◽

1988 ◽

Vol 53 (11) ◽

pp. 2627-2636 ◽

Cited By ~ 4

Author(s):

Giorgio Fassina ◽

Michal Lebl ◽

Irwin M. Chaiken

Keyword(s):

Affinity Chromatography ◽

High Performance ◽

Peptide Hormone ◽

Protein Recognition ◽

Affinity Matrix ◽

Structure Modification ◽

Binding Properties ◽

Chromatography Method ◽

Contact Residues ◽

Critical Contact

Analytical high performance liquid affinity chromatography with immobilized neurophysin was used as a molecular screen to evaluate the effects of peptide hormone structure modification on protein recognition. Immobilization of neurophysin on silica and highly cross-linked agarose occurred with retention of oxytocin and vasopressin binding properties. The effects of one-residue-at-a-time mutation, multi-site sequence simplification, and sequence randomization of critical contact residues were evaluated by extent of binding of the peptides on the affinity matrix. The analytical chromatography method also was used as a stereoselective detector to identify racemic contaminants in peptide hormone preparations.

Strain- and sex-specific differences in the glutathione S-transferase class pi in the mouse examined by gradient elution of the glutathione-affinity matrix and reverse-phase high performance liquid chromatography

Biochimica et Biophysica Acta (BBA) - General Subjects ◽

10.1016/0304-4165(94)00138-n ◽

1995 ◽

Vol 1243 (2) ◽

pp. 256-264 ◽

Cited By ~ 6

Author(s):

E Egaas

Keyword(s):

High Performance Liquid Chromatography ◽

Liquid Chromatography ◽

High Performance ◽

Gradient Elution ◽

Affinity Matrix ◽

Glutathione S Transferase ◽

Reverse Phase

Spectral clustering algorithm using density-sensitive distance measure with global and local consistencies

Knowledge-Based Systems ◽

10.1016/j.knosys.2019.01.026 ◽

2019 ◽

Vol 170 ◽

pp. 26-42 ◽

Cited By ~ 7

Author(s):

Xinmin Tao ◽

Ruotong Wang ◽

Rui Chang ◽

Chenxi Li ◽

Rui Liu ◽

...

Keyword(s):

Spectral Clustering ◽

Clustering Algorithm ◽

Distance Measure ◽

Global And Local ◽

Spectral Clustering Algorithm

An Efficient Distance and Density Based Outlier Detection Approach

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.155-156.342 ◽

2012 ◽

Vol 155-156 ◽

pp. 342-347 ◽

Cited By ~ 1

Author(s):

Xun Biao Zhong ◽

Xiao Xia Huang

Keyword(s):

Outlier Detection ◽

Real World ◽

High Performance ◽

Detection Problem ◽

Empirical Results ◽

Detection Approach ◽

Real World Datasets ◽

Good Detection

To address the low accuracy and high computational cost of density-based outlier detection, this paper proposes a variance of distance and density (VDD) measure. The proposed k-means clustering and score-based VDD (KSVDD) approach can detect outliers efficiently with high performance. Two real-world datasets are used to illustrate the feasibility of the approach, and the empirical results show that KSVDD achieves good detection precision.
