Partition Quantitative Assessment (PQA): A Quantitative Methodology to Assess the Embedded Noise in Clustered Omics and Systems Biology Data

2021 ◽  
Vol 11 (13) ◽  
pp. 5999
Author(s):  
Diego A. Camacho-Hernández ◽  
Victor E. Nieto-Caballero ◽  
José E. León-Burguete ◽  
Julio A. Freyre-González

Identifying groups that share common features among datasets through clustering analysis is a typical problem in many fields of science, particularly in post-omics and systems biology research. In this respect, quantifying how well a measure can cluster or organize intrinsic groups is important, since there is currently no statistical evaluation of how ordered a resulting clustered vector is, or how much noise is embedded in it. Much of the literature focuses on how well a clustering algorithm orders the data, offering several measures for external and internal statistical validation; but no score has been developed to statistically quantify the noise in a vector arranged by a clustering algorithm, i.e., how much of the clustering is due to randomness. Here, we present a quantitative methodology, based on autocorrelation, to assess this problem.
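The abstract's core idea, scoring how ordered a clustered label vector is by its autocorrelation and comparing against random permutations, can be sketched as follows. This is not the authors' PQA implementation; the score and the permutation test below are a minimal, hypothetical illustration of an autocorrelation-based assessment:

```python
import numpy as np

def autocorr_score(labels, lag=1):
    """Lag-k autocorrelation of a 1-D vector of cluster labels."""
    x = np.asarray(labels, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    if denom == 0:
        return 1.0  # constant vector: trivially, perfectly ordered
    return np.dot(x[:-lag], x[lag:]) / denom

def noise_estimate(labels, n_perm=1000, lag=1, seed=0):
    """Compare the observed autocorrelation against shuffled copies.

    Returns (observed score, permutation p-value): the p-value is the
    fraction of random orderings at least as autocorrelated as observed,
    i.e., an estimate of how much of the ordering could be due to chance.
    """
    rng = np.random.default_rng(seed)
    obs = autocorr_score(labels, lag)
    perms = np.array([autocorr_score(rng.permutation(labels), lag)
                      for _ in range(n_perm)])
    p = (np.sum(perms >= obs) + 1) / (n_perm + 1)
    return obs, p
```

A perfectly sorted vector such as `[0, 0, 0, 1, 1, 1, 2, 2, 2]` scores high and yields a small p-value; a shuffled vector scores near zero, signalling that its apparent grouping is indistinguishable from noise.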



2012 ◽  
Vol 6 (1) ◽  
pp. 67 ◽  
Author(s):  
Sarp A Coskun ◽  
Xinjian Qi ◽  
Ali Cakmak ◽  
En Cheng ◽  
A Cicek ◽  
...  

2015 ◽  
Vol 2015 ◽  
pp. 1-9 ◽  
Author(s):  
Cheng Lu ◽  
Shiji Song ◽  
Cheng Wu

The Affinity Propagation (AP) algorithm is an effective algorithm for clustering analysis, but it cannot be directly applied to incomplete data. In view of the prevalence of missing data and the uncertainty of missing attributes, we put forward a modified AP clustering algorithm based on K-nearest neighbor intervals (KNNI) for incomplete data. Based on an Improved Partial Data Strategy, the proposed algorithm estimates the KNNI representation of missing attributes using the attribute distribution information of the available data. The similarity function is then modified to handle interval data, so that the improved AP algorithm becomes applicable to incomplete data. Experiments on several UCI datasets show that the proposed algorithm achieves impressive clustering results.
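The two ingredients described, estimating a [min, max] interval for a missing attribute from the K nearest neighbours, and a similarity function defined over intervals, might be sketched as below. Both functions (`knn_interval`, `interval_similarity`) are hypothetical simplifications for illustration, not the paper's Improved Partial Data Strategy:

```python
import numpy as np

def knn_interval(X, i, j, k=3):
    """Estimate a [min, max] interval for missing attribute j of sample i
    from the K nearest neighbours that do observe attribute j.
    Missing values in X are encoded as NaN."""
    candidates = np.where(~np.isnan(X[:, j]))[0]
    candidates = candidates[candidates != i]

    def dist(a, b):
        # distance over the attributes both samples observe
        shared = ~np.isnan(a) & ~np.isnan(b)
        if not shared.any():
            return np.inf
        return np.sqrt(np.mean((a[shared] - b[shared]) ** 2))

    d = np.array([dist(X[i], X[c]) for c in candidates])
    nn = candidates[np.argsort(d)[:k]]
    vals = X[nn, j]
    return vals.min(), vals.max()

def interval_similarity(a_lo, a_hi, b_lo, b_hi):
    """Negative squared distance between two intervals (midpoint plus
    half-width), analogous to the negative squared Euclidean similarity
    conventionally used by AP; an observed value x is the degenerate
    interval [x, x]."""
    mid = ((a_lo + a_hi) / 2 - (b_lo + b_hi) / 2) ** 2
    wid = ((a_hi - a_lo) / 2 - (b_hi - b_lo) / 2) ** 2
    return -(mid + wid)
```

With every missing entry replaced by its KNNI interval and the similarity matrix built from `interval_similarity`, a standard AP implementation can then run on incomplete data unchanged.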


Author(s):  
Florencio Pazos ◽  
David Guijas ◽  
Manuel J. Gomez ◽  
Almudena Trigo ◽  
Victor de Lorenzo ◽  
...  

Author(s):  
Junjie Wu ◽  
Jian Chen ◽  
Hui Xiong

Cluster analysis (Jain & Dubes, 1988) provides insight into data by dividing objects into groups (clusters), such that objects in a cluster are more similar to each other than to objects in other clusters. Cluster analysis has long played an important role in a wide variety of fields, such as psychology, bioinformatics, pattern recognition, information retrieval, machine learning, and data mining. Many clustering algorithms, such as K-means and the Unweighted Pair Group Method with Arithmetic Mean (UPGMA), are well established. A recent research focus in clustering analysis is to understand the strengths and weaknesses of various clustering algorithms with respect to data factors. Indeed, researchers have identified some data characteristics that may strongly affect clustering analysis, including high dimensionality and sparseness, large size, noise, types of attributes and data sets, and scales of attributes (Tan, Steinbach, & Kumar, 2005). However, further investigation is needed to reveal whether and how data distributions can affect the performance of clustering algorithms. Along this line, we study clustering algorithms by answering three questions: 1. What are the systematic differences between the distributions of the clusters produced by different clustering algorithms? 2. How does the distribution of the "true" cluster sizes affect the performance of clustering algorithms? 3. How should one choose an appropriate clustering algorithm in practice? The answers to these questions can guide a better understanding and use of clustering methods. This is noteworthy since, 1) in theory, the strong relationship between clustering algorithms and cluster size distributions has seldom been recognized, and 2) in practice, choosing an appropriate clustering algorithm remains a challenging task, especially after the algorithm boom in the data mining area. This chapter takes a first step toward filling this void.
To this end, we carefully select two widely used categories of clustering algorithms, namely K-means and Agglomerative Hierarchical Clustering (AHC), as representative algorithms for illustration. In the chapter, we first show that K-means tends to generate clusters with a relatively uniform distribution of cluster sizes. We then demonstrate that UPGMA, one of the robust AHC methods, acts in the opposite way to K-means; that is, UPGMA tends to generate clusters with high variation in cluster sizes. Indeed, the experimental results indicate that the variations of the resultant cluster sizes produced by K-means and UPGMA, measured by the Coefficient of Variation (CV), fall within specific intervals, approximately [0.3, 1.0] and [1.0, 2.5] respectively. Finally, we compare K-means and UPGMA directly and propose some rules for better choosing a clustering scheme from the data-distribution point of view.
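The Coefficient of Variation used here is simply the standard deviation of the cluster sizes divided by their mean. A minimal sketch (the label vectors in the usage note are illustrative, not the chapter's data):

```python
import numpy as np

def cluster_size_cv(labels):
    """Coefficient of Variation (CV = std / mean) of the cluster sizes
    implied by a vector of cluster labels."""
    _, sizes = np.unique(labels, return_counts=True)
    return sizes.std(ddof=0) / sizes.mean()
```

A perfectly uniform partition such as `np.repeat([0, 1, 2], 30)` (sizes 30/30/30) gives CV = 0, in the K-means-like regime; a skewed partition such as `np.repeat([0, 1, 2], [80, 8, 2])` (sizes 80/8/2) gives CV of roughly 1.18, inside the [1.0, 2.5] interval the chapter reports for UPGMA.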


2020 ◽  
Vol 17 (166) ◽  
pp. 20200013 ◽  
Author(s):  
Zoe Schofield ◽  
Gabriel N. Meloni ◽  
Peter Tran ◽  
Christian Zerfass ◽  
Giovanni Sena ◽  
...  

The last five decades of molecular and systems biology research have provided unprecedented insights into the molecular and genetic basis of many cellular processes. Despite these insights, however, it is arguable that there is still only limited predictive understanding of cell behaviours. In particular, the basis of heterogeneity in single-cell behaviour and the initiation of many different metabolic, transcriptional or mechanical responses to environmental stimuli remain largely unexplained. To go beyond the status quo, the understanding of cell behaviours emerging from molecular genetics must be complemented with physical and physiological perspectives, focusing on the intracellular and extracellular conditions within and around cells. Here, we argue that such a combination of genetics, physics and physiology can be grounded on a bioelectrical conceptualization of cells. We motivate the reasoning behind such a proposal and describe examples where a bioelectrical view has been shown to, or can, provide predictive biological understanding. In addition, we discuss how this view opens up novel ways to control cell behaviours by electrical and electrochemical means, setting the stage for the emergence of bioelectrical engineering.

