A framework for evaluating the performance of SMLM cluster analysis algorithms

2021
Author(s): Daniel J Nieves, Jeremy A. Pike, Florian Levet, Juliette Griffié, Daniel Sage, ...

Single molecule localisation microscopy (SMLM) generates data in the form of Cartesian coordinates of localised fluorophores. Cluster analysis is an attractive route for extracting biologically meaningful information from such data and has been widely applied. Despite the range of developed cluster analysis algorithms, there exists no consensus framework for the evaluation of their performance. Here, we use a systematic approach based on two metrics, the Adjusted Rand Index (ARI) and Intersection over Union (IoU), to score the success of clustering algorithms in diverse simulated clustering scenarios mimicking experimental data. We demonstrate the framework using three analysis algorithms, DBSCAN, ToMATo, and KDE, show how to deduce optimal analysis parameters, and show how these are affected by fluorophore multiple blinking. We propose that these standard conditions and metrics become the basis for future analysis algorithm development and evaluation.
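Both scoring metrics have standard closed forms and can be sketched in a few lines of pure Python. This is a minimal illustration of the metrics themselves, not the authors' evaluation code; the example labelings are hypothetical:

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """ARI between two labelings of the same points (1.0 = identical
    partitions up to label permutation, ~0.0 = chance agreement)."""
    n = len(labels_true)
    contingency = Counter(zip(labels_true, labels_pred))
    a = Counter(labels_true)   # sizes of true clusters
    b = Counter(labels_pred)   # sizes of predicted clusters
    sum_ij = sum(comb(v, 2) for v in contingency.values())
    sum_a = sum(comb(v, 2) for v in a.values())
    sum_b = sum(comb(v, 2) for v in b.values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)

def cluster_iou(points_a, points_b):
    """Intersection over Union of two clusters given as sets of point ids."""
    union = len(points_a | points_b)
    return len(points_a & points_b) / union if union else 0.0
```

ARI compares whole partitions at once, while IoU scores how well an individual detected cluster overlaps a ground-truth one, which is why the two are complementary in this framework.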

2015, pp. 125-138
Author(s): I. V. Goncharenko

In this article we propose a new method of non-hierarchical cluster analysis based on a k-nearest-neighbour graph and discuss it with respect to vegetation classification. The k-nearest-neighbour (k-NN) classification method was originally developed in 1951 (Fix, Hodges, 1951); the term "k-NN graph" and several k-NN clustering algorithms appeared later (Cover, Hart, 1967; Brito et al., 1997). In biology, k-NN is used in the analysis of protein structures and genome sequences. Most k-NN clustering algorithms first build an "excessive" graph, a so-called hypergraph, and then truncate it into subgraphs by partitioning and coarsening the hypergraph. We developed a different, "upward" clustering strategy, which forms one cluster after another by assembling them sequentially. To date, graph-based cluster analysis has not been applied to the classification of vegetation datasets.
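For illustration, one common graph-based variant, clustering as connected components of the mutual k-NN graph, can be sketched in pure Python. This is a generic sketch, not the article's "upward" assembly algorithm, and the example points are hypothetical:

```python
def knn_graph(points, k):
    """Brute-force k-NN graph: maps each point index to its k nearest neighbours."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    graph = {}
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: dist2(p, points[j]))
        graph[i] = set(others[:k])
    return graph

def mutual_knn_clusters(points, k):
    """Clusters = connected components of the mutual k-NN graph
    (an edge i-j is kept only if each point is among the other's k nearest)."""
    g = knn_graph(points, k)
    mutual = {i: {j for j in g[i] if i in g[j]} for i in g}
    seen, clusters = set(), []
    for i in mutual:
        if i in seen:
            continue
        stack, comp = [i], set()
        while stack:                       # depth-first component traversal
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(mutual[u] - comp)
        seen |= comp
        clusters.append(comp)
    return clusters
```

The mutual-neighbour restriction is what prevents a single stray point from bridging two otherwise well-separated groups.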


2021
Author(s): Sujay Ray, Nibedita Pal, Nils G Walter

Abstract Homologous recombination forms and resolves an entangled DNA Holliday Junction (HJ) crucial for achieving genetic reshuffling and genome repair. To maintain genomic integrity, specialized resolvase enzymes cleave the entangled DNA into two discrete DNA molecules. However, it is unclear how two similar stacking isomers are distinguished, and how a cognate sequence is found and recognized to achieve accurate recombination. We here use single-molecule fluorescence observation and cluster analysis to examine how prototypic bacterial resolvase RuvC singles out two of the four HJ strands and achieves sequence-specific cleavage. We find that RuvC first exploits, then constrains the dynamics of intrinsic HJ isomer exchange at a sampled branch position to direct cleavage toward the catalytically competent HJ conformation and sequence, thus controlling recombination output at minimal energetic cost. Our model of rapid DNA scanning followed by ‘snap-locking’ of a cognate sequence is strikingly consistent with the conformational proofreading of other DNA-modifying enzymes.


Entropy, 2021, Vol 23 (8), pp. 971
Author(s): Oded Shor, Felix Benninger, Andrei Khrennikov

This paper is devoted to the foundational problems of dendrogramic holographic theory (DH theory). We used the ontic–epistemic (implicate–explicate order) methodology. The epistemic counterpart is based on the representation of data by dendrograms constructed with hierarchic clustering algorithms. The ontic universe is described as a p-adic tree; it is zero-dimensional, totally disconnected, disordered, and bounded (in p-adic ultrametric spaces). Classical–quantum interrelations lose their sharpness; generally, simple dendrograms are “more quantum” than complex ones. We used the CHSH inequality as a measure of quantum-likeness. We demonstrate that it can be violated by classical experimental data represented by dendrograms. The seed of this violation is neither nonlocality nor a rejection of realism, but the nonergodicity of dendrogramic time series. Generally, the violation of ergodicity is one of the basic features of DH theory. The dendrogramic representation leads to the local realistic model that violates the CHSH inequality. We also considered DH theory for Minkowski geometry and monitored the dependence of CHSH violation and nonergodicity on geometry, as well as a Lorentz transformation of data.
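The quantum-likeness measure referred to is the standard CHSH combination S = E(a,b) − E(a,b′) + E(a′,b) + E(a′,b′), bounded by 2 for local realistic (ergodic) statistics and by 2√2 for quantum correlations. A minimal sketch of the expression itself follows; this illustrates the inequality, not the paper's dendrogramic construction, and the ±1 outcome series are hypothetical:

```python
def correlator(x, y):
    """Empirical correlator E(a, b): mean product of paired +/-1 outcomes."""
    return sum(u * v for u, v in zip(x, y)) / len(x)

def chsh(e_ab, e_ab2, e_a2b, e_a2b2):
    """CHSH combination S = E(a,b) - E(a,b') + E(a',b) + E(a',b').
    |S| <= 2 for local realistic statistics; quantum correlations reach
    2*sqrt(2) (~2.828); the algebraic maximum is 4."""
    return e_ab - e_ab2 + e_a2b + e_a2b2
```

In the paper's setting the interest is in classical data whose dendrogram representation yields |S| > 2, traced to nonergodicity rather than nonlocality.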


Author(s): Junjie Wu, Jian Chen, Hui Xiong

Cluster analysis (Jain & Dubes, 1988) provides insight into data by dividing objects into groups (clusters), such that objects in a cluster are more similar to each other than to objects in other clusters. Cluster analysis has long played an important role in a wide variety of fields, such as psychology, bioinformatics, pattern recognition, information retrieval, machine learning, and data mining. Many clustering algorithms, such as K-means and the Unweighted Pair Group Method with Arithmetic Mean (UPGMA), are well established. A recent research focus in cluster analysis is to understand the strengths and weaknesses of various clustering algorithms with respect to data factors. Indeed, researchers have identified data characteristics that may strongly affect cluster analysis, including high dimensionality and sparseness, large size, noise, types of attributes and data sets, and scales of attributes (Tan, Steinbach, & Kumar, 2005). However, further investigation is needed to reveal whether and how data distributions affect the performance of clustering algorithms. Along this line, we study clustering algorithms by answering three questions:
1. What are the systematic differences between the distributions of the clusters produced by different clustering algorithms?
2. How does the distribution of the "true" cluster sizes affect the performance of clustering algorithms?
3. How should an appropriate clustering algorithm be chosen in practice?
The answers to these questions can guide a better understanding and use of clustering methods. This is noteworthy since, 1) in theory, it is seldom recognized that there are strong relationships between clustering algorithms and cluster size distributions, and 2) in practice, choosing an appropriate clustering algorithm remains a challenging task, especially after the algorithm boom in the data mining area. This chapter is an initial attempt to fill this void.
To this end, we carefully select two widely used categories of clustering algorithms, K-means and Agglomerative Hierarchical Clustering (AHC), as representative algorithms for illustration. In this chapter, we first show that K-means tends to generate clusters with a relatively uniform distribution of cluster sizes. We then demonstrate that UPGMA, one of the robust AHC methods, acts in the opposite way to K-means; that is, UPGMA tends to generate clusters with high variation in cluster sizes. Indeed, the experimental results indicate that the variations of the resultant cluster sizes produced by K-means and UPGMA, measured by the Coefficient of Variation (CV), fall in specific intervals, namely [0.3, 1.0] and [1.0, 2.5] respectively. Finally, we put K-means and UPGMA together for a further comparison, and propose some rules for a better choice of clustering scheme from the data-distribution point of view.
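The CV statistic used for these intervals is simply the standard deviation of the cluster sizes divided by their mean; a minimal sketch (illustrative only, not the chapter's code, with hypothetical label vectors):

```python
from collections import Counter
from statistics import mean, pstdev

def cluster_size_cv(labels):
    """Coefficient of Variation of cluster sizes: population std / mean.
    Values near 0 indicate uniform sizes (K-means-like output); larger
    values indicate skewed sizes (UPGMA-like output)."""
    sizes = list(Counter(labels).values())
    return pstdev(sizes) / mean(sizes)
```

For example, three equal clusters give CV = 0, while a 9-to-1 split across two clusters gives CV = 0.8, already near the top of the K-means range reported above.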


Author(s): Rui Xu, Donald C. Wunsch II

Classifying objects based on their features and characteristics is one of the most important and primitive activities of human beings. The task becomes even more challenging when there is no ground truth available. Cluster analysis opens new opportunities for exploring the unknown nature of data, as it aims to separate a finite data set, with little or no prior information, into a finite and discrete set of "natural," hidden data structures. Here, the authors introduce and discuss clustering algorithms that are related to machine learning and computational intelligence, particularly those based on neural networks. Neural networks are well known for their good learning capabilities, adaptation, ease of implementation, parallelization, speed, and flexibility, and they have demonstrated many successful applications in cluster analysis. Applications of cluster analysis to real-world problems are also illustrated. Portions of the chapter are taken from Xu and Wunsch (2008).


Author(s): Abha Sharma, R. S. Thakur

Clustering a mixed data set is a complex problem. Widely used clustering algorithms such as k-means, fuzzy c-means, and hierarchical methods were developed to extract hidden groups from numeric data. In this paper, mixed data are converted into purely numeric form with a conversion method, and various clustering algorithms for numeric data are then applied to several well-known mixed datasets to exploit the inherent structure of the mixed data. Experimental results show that the converted mixed data yield better results with universally applicable clustering algorithms for numeric data.
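The abstract does not specify the conversion method. One common stand-in is to one-hot encode the categorical columns so that numeric-only algorithms such as k-means can run on the result; the sketch below uses that assumption, and the example rows are hypothetical:

```python
def mixed_to_numeric(rows, categorical_cols):
    """Convert mixed-type rows to purely numeric vectors by one-hot
    encoding the columns whose indices appear in categorical_cols.
    NOTE: one-hot encoding is an illustrative stand-in, not necessarily
    the paper's conversion method."""
    # collect the sorted set of category values per categorical column
    cats = {c: sorted({row[c] for row in rows}) for c in categorical_cols}
    out = []
    for row in rows:
        vec = []
        for i, v in enumerate(row):
            if i in categorical_cols:
                # one indicator value per category, in sorted order
                vec.extend(1.0 if v == cat else 0.0 for cat in cats[i])
            else:
                vec.append(float(v))
        out.append(vec)
    return out
```

A caveat worth noting: one-hot dimensions take only the values 0 and 1, so numeric columns are usually rescaled afterwards to keep distances comparable.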


2019, Vol 9 (1)
Author(s): Abdullah O. Khan, Carl W. White, Jeremy A. Pike, Jack Yule, Alexandre Slater, ...

Abstract The use of CRISPR-Cas9 genome editing to introduce endogenously expressed tags has the potential to address a number of the classical limitations of single molecule localisation microscopy. In this work we present the first systematic comparison of inserts introduced through CRISPR knock-in, with the aim of optimising this approach for single molecule imaging. We show that more highly monomeric and codon-optimised variants of mEos result in improved expression at the TubA1B locus, despite the use of identical guides, homology templates, and selection strategies. We apply this approach to target the G protein-coupled receptor (GPCR) CXCR4 and show a further insert-dependent effect on expression and protein function. Finally, we show that, compared to over-expressed CXCR4, endogenously labelled samples allow accurate single molecule quantification upon ligand treatment. This suggests that despite the complications evident in CRISPR-mediated labelling, the development of CRISPR-PALM has substantial quantitative benefits.


2015, Vol 70 (7-8), pp. 191-195
Author(s): Jose Isagani B. Janairo, Frumencio Co, Jose Santos Carandang, Divina M. Amalin

Abstract A reliable and statistically valid classification of biomineralization peptides is herein presented. Twenty-seven biomineralization peptides (BMPeps) were randomly selected as representative samples to establish the classification system using the k-means method. These biomineralization peptides were discovered either through isolation from various organisms or via phage display. Our findings show that there are two types of biomineralization peptides based on their length, molecular weight, heterogeneity, and aliphatic residues. Type-1 BMPeps are more commonly found and exhibit higher values for these significant clustering variables. Type-2 BMPeps, in contrast, have lower values for these parameters and are less common. Our clustering analysis makes a more efficient and systematic approach to BMPep selection possible, since previous methods of BMPep classification were unreliable.
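The k-means method used here is usually implemented as Lloyd's algorithm; a minimal sketch follows. The feature vectors are hypothetical, and in a real peptide study features such as length and molecular weight would be standardised first so that no single scale dominates the distance:

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Minimal Lloyd's k-means on lists of feature vectors; returns a
    cluster label per point. Illustrative sketch only."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)            # initial centers from the data
    for _ in range(iters):
        # assign each point to its nearest center (squared Euclidean distance)
        labels = [min(range(k),
                      key=lambda c: sum((x - m) ** 2
                                        for x, m in zip(p, centers[c])))
                  for p in points]
        # recompute each center as the mean of its assigned points
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(xs) / len(members) for xs in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep empty cluster's old center
        if new_centers == centers:              # converged
            break
        centers = new_centers
    return labels
```

With k = 2, as in the study, well-separated feature profiles fall cleanly into two groups regardless of the random initialisation.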

