A framework for evaluating the performance of SMLM cluster analysis algorithms

2021
Author(s): Daniel J Nieves, Jeremy A. Pike, Florian Levet, Juliette Griffié, Daniel Sage, ...

Single molecule localisation microscopy (SMLM) generates data in the form of Cartesian coordinates of localised fluorophores. Cluster analysis is an attractive route for extracting biologically meaningful information from such data and has been widely applied. Despite the range of developed cluster analysis algorithms, there exists no consensus framework for the evaluation of their performance. Here, we use a systematic approach based on two metrics, the Adjusted Rand Index (ARI) and Intersection over Union (IoU), to score the success of clustering algorithms in diverse simulated clustering scenarios mimicking experimental data. We demonstrate the framework using three analysis algorithms, DBSCAN, ToMATo, and KDE, show how to deduce optimal analysis parameters, and show how these are affected by fluorophore multiple blinking. We propose that these standard conditions and metrics become the basis for future analysis algorithm development and evaluation.
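Both scoring metrics have standard closed forms and can be sketched in a few lines of pure Python. This is a minimal illustration of the metrics themselves, not the authors' evaluation code; the example labelings are hypothetical:

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """ARI between two labelings of the same points (1.0 = identical
    partitions up to label permutation, ~0.0 = chance agreement)."""
    n = len(labels_true)
    contingency = Counter(zip(labels_true, labels_pred))
    a = Counter(labels_true)   # sizes of true clusters
    b = Counter(labels_pred)   # sizes of predicted clusters
    sum_ij = sum(comb(v, 2) for v in contingency.values())
    sum_a = sum(comb(v, 2) for v in a.values())
    sum_b = sum(comb(v, 2) for v in b.values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)

def cluster_iou(points_a, points_b):
    """Intersection over Union of two clusters given as sets of point ids."""
    union = len(points_a | points_b)
    return len(points_a & points_b) / union if union else 0.0
```

ARI compares whole partitions at once, while IoU scores how well an individual detected cluster overlaps a ground-truth one, which is why the two are complementary in this framework.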

2015, pp. 125-138
Author(s): I. V. Goncharenko

In this article we propose a new method of non-hierarchical cluster analysis based on a k-nearest-neighbour graph and discuss it with respect to vegetation classification. The k-nearest-neighbour (k-NN) classification method was originally developed in 1951 (Fix, Hodges, 1951); the term "k-NN graph" and several k-NN clustering algorithms appeared later (Cover, Hart, 1967; Brito et al., 1997). In biology, k-NN is used in the analysis of protein structures and genome sequences. Most k-NN clustering algorithms first build an "excessive" graph, a so-called hypergraph, and then truncate it into subgraphs by partitioning and coarsening the hypergraph. We developed a different, "upward" clustering strategy, which forms one cluster after another by assembling them sequentially. To date, graph-based cluster analysis has not been applied to the classification of vegetation datasets.
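For illustration, one common graph-based variant, clustering as connected components of the mutual k-NN graph, can be sketched in pure Python. This is a generic sketch, not the article's "upward" assembly algorithm, and the example points are hypothetical:

```python
def knn_graph(points, k):
    """Brute-force k-NN graph: maps each point index to its k nearest neighbours."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    graph = {}
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: dist2(p, points[j]))
        graph[i] = set(others[:k])
    return graph

def mutual_knn_clusters(points, k):
    """Clusters = connected components of the mutual k-NN graph
    (an edge i-j is kept only if each point is among the other's k nearest)."""
    g = knn_graph(points, k)
    mutual = {i: {j for j in g[i] if i in g[j]} for i in g}
    seen, clusters = set(), []
    for i in mutual:
        if i in seen:
            continue
        stack, comp = [i], set()
        while stack:                       # depth-first component traversal
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(mutual[u] - comp)
        seen |= comp
        clusters.append(comp)
    return clusters
```

The mutual-neighbour restriction is what prevents a single stray point from bridging two otherwise well-separated groups.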


2021
Author(s): Sujay Ray, Nibedita Pal, Nils G Walter

Abstract Homologous recombination forms and resolves an entangled DNA Holliday Junction (HJ) crucial for achieving genetic reshuffling and genome repair. To maintain genomic integrity, specialized resolvase enzymes cleave the entangled DNA into two discrete DNA molecules. However, it is unclear how two similar stacking isomers are distinguished, and how a cognate sequence is found and recognized to achieve accurate recombination. We here use single-molecule fluorescence observation and cluster analysis to examine how prototypic bacterial resolvase RuvC singles out two of the four HJ strands and achieves sequence-specific cleavage. We find that RuvC first exploits, then constrains the dynamics of intrinsic HJ isomer exchange at a sampled branch position to direct cleavage toward the catalytically competent HJ conformation and sequence, thus controlling recombination output at minimal energetic cost. Our model of rapid DNA scanning followed by ‘snap-locking’ of a cognate sequence is strikingly consistent with the conformational proofreading of other DNA-modifying enzymes.


Entropy, 2021, Vol 23 (8), pp. 971
Author(s): Oded Shor, Felix Benninger, Andrei Khrennikov

This paper is devoted to the foundational problems of dendrogramic holographic theory (DH theory). We used the ontic–epistemic (implicate–explicate order) methodology. The epistemic counterpart is based on the representation of data by dendrograms constructed with hierarchic clustering algorithms. The ontic universe is described as a p-adic tree; it is zero-dimensional, totally disconnected, disordered, and bounded (in p-adic ultrametric spaces). Classical–quantum interrelations lose their sharpness; generally, simple dendrograms are “more quantum” than complex ones. We used the CHSH inequality as a measure of quantum-likeness. We demonstrate that it can be violated by classical experimental data represented by dendrograms. The seed of this violation is neither nonlocality nor a rejection of realism, but the nonergodicity of dendrogramic time series. Generally, the violation of ergodicity is one of the basic features of DH theory. The dendrogramic representation leads to the local realistic model that violates the CHSH inequality. We also considered DH theory for Minkowski geometry and monitored the dependence of CHSH violation and nonergodicity on geometry, as well as a Lorentz transformation of data.
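The quantum-likeness measure referred to is the standard CHSH combination S = E(a,b) − E(a,b′) + E(a′,b) + E(a′,b′), bounded by 2 for local realistic (ergodic) statistics and by 2√2 for quantum correlations. A minimal sketch of the expression itself follows; this illustrates the inequality, not the paper's dendrogramic construction, and the ±1 outcome series are hypothetical:

```python
def correlator(x, y):
    """Empirical correlator E(a, b): mean product of paired +/-1 outcomes."""
    return sum(u * v for u, v in zip(x, y)) / len(x)

def chsh(e_ab, e_ab2, e_a2b, e_a2b2):
    """CHSH combination S = E(a,b) - E(a,b') + E(a',b) + E(a',b').
    |S| <= 2 for local realistic statistics; quantum correlations reach
    2*sqrt(2) (~2.828); the algebraic maximum is 4."""
    return e_ab - e_ab2 + e_a2b + e_a2b2
```

In the paper's setting the interest is in classical data whose dendrogram representation yields |S| > 2, traced to nonergodicity rather than nonlocality.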


Author(s): Junjie Wu, Jian Chen, Hui Xiong

Cluster analysis (Jain & Dubes, 1988) provides insight into data by dividing objects into groups (clusters), such that objects in a cluster are more similar to each other than to objects in other clusters. Cluster analysis has long played an important role in a wide variety of fields, such as psychology, bioinformatics, pattern recognition, information retrieval, machine learning, and data mining. Many clustering algorithms, such as K-means and the Unweighted Pair Group Method with Arithmetic Mean (UPGMA), are well established. A recent research focus in cluster analysis is to understand the strengths and weaknesses of various clustering algorithms with respect to data factors. Indeed, researchers have identified data characteristics that may strongly affect cluster analysis, including high dimensionality and sparseness, large size, noise, types of attributes and data sets, and scales of attributes (Tan, Steinbach, & Kumar, 2005). However, further investigation is needed to reveal whether and how data distributions affect the performance of clustering algorithms. Along this line, we study clustering algorithms by answering three questions:
1. What are the systematic differences between the distributions of the clusters produced by different clustering algorithms?
2. How does the distribution of the "true" cluster sizes affect the performance of clustering algorithms?
3. How should an appropriate clustering algorithm be chosen in practice?
The answers to these questions can guide a better understanding and use of clustering methods. This is noteworthy since, 1) in theory, it is seldom recognized that there are strong relationships between clustering algorithms and cluster size distributions, and 2) in practice, choosing an appropriate clustering algorithm remains a challenging task, especially after the algorithm boom in the data mining area. This chapter is an initial attempt to fill this void.
To this end, we carefully select two widely used categories of clustering algorithms, K-means and Agglomerative Hierarchical Clustering (AHC), as representative algorithms for illustration. In this chapter, we first show that K-means tends to generate clusters with a relatively uniform distribution of cluster sizes. We then demonstrate that UPGMA, one of the robust AHC methods, acts in the opposite way to K-means; that is, UPGMA tends to generate clusters with high variation in cluster sizes. Indeed, the experimental results indicate that the variations of the resultant cluster sizes produced by K-means and UPGMA, measured by the Coefficient of Variation (CV), fall in specific intervals, namely [0.3, 1.0] and [1.0, 2.5] respectively. Finally, we put K-means and UPGMA together for a further comparison, and propose some rules for a better choice of clustering scheme from the data-distribution point of view.
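The CV statistic used for these intervals is simply the standard deviation of the cluster sizes divided by their mean; a minimal sketch (illustrative only, not the chapter's code, with hypothetical label vectors):

```python
from collections import Counter
from statistics import mean, pstdev

def cluster_size_cv(labels):
    """Coefficient of Variation of cluster sizes: population std / mean.
    Values near 0 indicate uniform sizes (K-means-like output); larger
    values indicate skewed sizes (UPGMA-like output)."""
    sizes = list(Counter(labels).values())
    return pstdev(sizes) / mean(sizes)
```

For example, three equal clusters give CV = 0, while a 9-to-1 split across two clusters gives CV = 0.8, already near the top of the K-means range reported above.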


Author(s): Rui Xu, Donald C. Wunsch II

Classifying objects based on their features and characteristics is one of the most important and primitive activities of human beings. The task becomes even more challenging when there is no ground truth available. Cluster analysis opens new opportunities for exploring the unknown nature of data, as it aims to separate a finite data set, with little or no prior information, into a finite and discrete set of "natural," hidden data structures. Here, the authors introduce and discuss clustering algorithms that are related to machine learning and computational intelligence, particularly those based on neural networks. Neural networks are well known for their good learning capabilities, adaptation, ease of implementation, parallelization, speed, and flexibility, and they have demonstrated many successful applications in cluster analysis. Applications of cluster analysis to real-world problems are also illustrated. Portions of the chapter are taken from Xu and Wunsch (2008).


Author(s): Abha Sharma, R. S. Thakur

Clustering a mixed data set is a complex problem. Widely used clustering algorithms such as k-means, fuzzy c-means, and hierarchical methods were developed to extract hidden groups from numeric data. In this paper, mixed data are converted into purely numeric form with a conversion method, and various clustering algorithms for numeric data are then applied to several well-known mixed datasets to exploit the inherent structure of the mixed data. Experimental results show that the converted mixed data yield better results with universally applicable clustering algorithms for numeric data.
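The abstract does not specify the conversion method. One common stand-in is to one-hot encode the categorical columns so that numeric-only algorithms such as k-means can run on the result; the sketch below uses that assumption, and the example rows are hypothetical:

```python
def mixed_to_numeric(rows, categorical_cols):
    """Convert mixed-type rows to purely numeric vectors by one-hot
    encoding the columns whose indices appear in categorical_cols.
    NOTE: one-hot encoding is an illustrative stand-in, not necessarily
    the paper's conversion method."""
    # collect the sorted set of category values per categorical column
    cats = {c: sorted({row[c] for row in rows}) for c in categorical_cols}
    out = []
    for row in rows:
        vec = []
        for i, v in enumerate(row):
            if i in categorical_cols:
                # one indicator value per category, in sorted order
                vec.extend(1.0 if v == cat else 0.0 for cat in cats[i])
            else:
                vec.append(float(v))
        out.append(vec)
    return out
```

A caveat worth noting: one-hot dimensions take only the values 0 and 1, so numeric columns are usually rescaled afterwards to keep distances comparable.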


2019, Vol 9 (1)
Author(s): Abdullah O. Khan, Carl W. White, Jeremy A. Pike, Jack Yule, Alexandre Slater, ...

Abstract The use of CRISPR-Cas9 genome editing to introduce endogenously expressed tags has the potential to address a number of the classical limitations of single molecule localisation microscopy. In this work we present the first systematic comparison of inserts introduced through CRISPR knock-in, with the aim of optimising this approach for single molecule imaging. We show that more highly monomeric and codon-optimised variants of mEos result in improved expression at the TubA1B locus, despite the use of identical guides, homology templates, and selection strategies. We apply this approach to target the G protein-coupled receptor (GPCR) CXCR4 and show a further insert-dependent effect on expression and protein function. Finally, we show that, compared to over-expressed CXCR4, endogenously labelled samples allow accurate single molecule quantification upon ligand treatment. This suggests that despite the complications evident in CRISPR-mediated labelling, the development of CRISPR-PALM has substantial quantitative benefits.


2015, Vol 70 (7-8), pp. 191-195
Author(s): Jose Isagani B. Janairo, Frumencio Co, Jose Santos Carandang, Divina M. Amalin

Abstract A reliable and statistically valid classification of biomineralization peptides is herein presented. Twenty-seven biomineralization peptides (BMPeps) were randomly selected as representative samples to establish the classification system using the k-means method. These biomineralization peptides were discovered either through isolation from various organisms or via phage display. Our findings show that there are two types of biomineralization peptides based on their length, molecular weight, heterogeneity, and aliphatic residues. Type-1 BMPeps are more commonly found and exhibit higher values for these significant clustering variables. Type-2 BMPeps, in contrast, have lower values for these parameters and are less common. Our clustering analysis makes a more efficient and systematic approach to BMPep selection possible, since previous methods of BMPep classification were unreliable.
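The k-means method used here is usually implemented as Lloyd's algorithm; a minimal sketch follows. The feature vectors are hypothetical, and in a real peptide study features such as length and molecular weight would be standardised first so that no single scale dominates the distance:

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Minimal Lloyd's k-means on lists of feature vectors; returns a
    cluster label per point. Illustrative sketch only."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)            # initial centers from the data
    for _ in range(iters):
        # assign each point to its nearest center (squared Euclidean distance)
        labels = [min(range(k),
                      key=lambda c: sum((x - m) ** 2
                                        for x, m in zip(p, centers[c])))
                  for p in points]
        # recompute each center as the mean of its assigned points
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(xs) / len(members) for xs in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep empty cluster's old center
        if new_centers == centers:              # converged
            break
        centers = new_centers
    return labels
```

With k = 2, as in the study, well-separated feature profiles fall cleanly into two groups regardless of the random initialisation.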

