high throughput data
Recently Published Documents

eLife ◽  
2021 ◽  
Vol 10 ◽  
Prathitha Kar ◽  
Sriram Tiruvadi-Krishnan ◽  
Jaana Männik ◽  
Jaan Männik ◽  
Ariel Amir

Collection of high-throughput data has become prevalent in biology. Large datasets allow the use of statistical constructs such as binning and linear regression to quantify relationships between variables and to hypothesize underlying biological mechanisms based on them. We discuss several such examples in relation to single-cell data and cellular growth. In particular, we show instances where seemingly ordinary use of these statistical methods leads to incorrect conclusions, such as inferring non-exponential growth when growth is exponential, and vice versa. We propose that data analysis and its interpretation should be done in the context of a generative model, if possible. In this way, the statistical methods can be validated either analytically or against synthetic data generated via the model, leading to a consistent procedure for inferring biological mechanisms from data. Applying the validated analysis methods to our experimental data, we find the growth of length in E. coli to be non-exponential. Our analysis shows that in the later stages of the cell cycle the growth rate is faster than exponential.
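The generative-model workflow the abstract advocates can be illustrated with a minimal sketch: simulate single-cell length trajectories from an assumed model (here, exponential growth with multiplicative measurement noise, a hypothetical simplified model chosen for illustration), then check that the inference procedure (log-linear regression) recovers the known growth rate from the synthetic data before trusting it on real data.

```python
import numpy as np

def simulate_exponential_growth(l0, rate, times, noise_sd, rng):
    """Generative model (illustrative): exponential length growth
    l(t) = l0 * exp(rate * t) with multiplicative measurement noise."""
    true_lengths = l0 * np.exp(rate * times)
    return true_lengths * np.exp(rng.normal(0.0, noise_sd, size=times.shape))

def infer_growth_rate(times, lengths):
    """Infer the growth rate by least squares on log-length; under
    exponential growth, log l(t) = log l0 + rate * t is linear in t."""
    slope, _intercept = np.polyfit(times, np.log(lengths), 1)
    return slope

# Validate the inference method on synthetic data with a known rate.
rng = np.random.default_rng(0)
times = np.linspace(0.0, 1.0, 200)
lengths = simulate_exponential_growth(2.0, 0.7, times, 0.02, rng)
estimate = infer_growth_rate(times, lengths)
```

Only once the estimator recovers the known rate on synthetic data would one apply it to experimental trajectories; a systematic bias here would flag the method, not the biology.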

Gene ◽  
2021 ◽  
pp. 146111
Erfan Sharifi ◽  
Niusha Khazaei ◽  
Nicholas W. Kieran ◽  
Sahel Jahangiri Esfahani ◽  
Abdulshakour Mohammadnia ◽  

2021 ◽  
Vol 22 (1) ◽  
Xinzhou Ge ◽  
Yiling Elaine Chen ◽  
Dongyuan Song ◽  
MeiLu McDermott ◽  
Kyla Woyshner ◽  

High-throughput biological data analysis commonly involves identifying features, such as genes, genomic regions, and proteins, whose values differ between two conditions, from numerous features measured simultaneously. The most widely used criterion to ensure the reliability of such analyses is the false discovery rate (FDR), which is primarily controlled based on p-values. However, obtaining valid p-values relies on either reasonable assumptions about the data distribution or large numbers of replicates under both conditions. Clipper is a general statistical framework for FDR control that relies on neither p-values nor specific data distributions. Clipper outperforms existing methods for a broad range of applications in high-throughput data analysis.
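The idea of controlling FDR without p-values can be sketched with a contrast-score threshold: if null features produce scores symmetric around zero, the count of features below -t estimates the number of false discoveries above t. This is the general knockoff/contrast-score principle, not Clipper's actual procedure; all names and the toy data are illustrative.

```python
import numpy as np

def contrast_score_threshold(scores, target_fdr):
    """Smallest threshold t with estimated FDR
    #{score <= -t} / #{score >= t} at or below target_fdr.
    Assumes null scores are symmetric around zero (a simplified
    contrast-score sketch, NOT Clipper's exact algorithm)."""
    for t in np.sort(np.abs(scores)):
        discoveries = np.sum(scores >= t)
        null_estimate = np.sum(scores <= -t)
        if discoveries > 0 and null_estimate / discoveries <= target_fdr:
            return t
    return np.inf

# Toy data: 900 non-differential features, 100 truly differential ones.
rng = np.random.default_rng(1)
null_scores = rng.normal(0.0, 1.0, 900)
signal_scores = rng.normal(5.0, 1.0, 100)
scores = np.concatenate([null_scores, signal_scores])
t = contrast_score_threshold(scores, target_fdr=0.05)
selected = int(np.sum(scores >= t))
```

No distributional model of the scores is fitted and no p-values are computed; the symmetry of the null alone supplies the false-discovery estimate.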

2021 ◽  
Zhiting Wei ◽  
Sheng Zhu ◽  
Xiaohan Chen ◽  
Chenyu Zhu ◽  
Bin Duan ◽  

Transcriptional phenotypic drug discovery has achieved great success, and various compound perturbation-based data resources, such as Connectivity Map (CMap) and the Library of Integrated Network-Based Cellular Signatures (LINCS), have been presented. Computational strategies that fully mine these resources for phenotypic drug discovery have been proposed; among them, a fundamental issue is defining a proper similarity between transcriptional profiles to elucidate drug mechanisms of action and identify new drug indications. Traditionally, this similarity has been defined in an unsupervised way, and due to the high dimensionality and high noise of such high-throughput data, it lacks robustness and delivers limited performance. In our study, we present Dr. Sim, a general learning-based framework that infers the similarity measurement automatically rather than relying on a manually designed one, and that can be used to characterize transcriptional phenotypic profiles for drug discovery with consistently good performance. We evaluated Dr. Sim on comprehensive, publicly available in vitro and in vivo datasets for drug annotation and repositioning using high-throughput transcriptional perturbation data, and found that Dr. Sim significantly outperforms existing methods. It represents a conceptual improvement: learning transcriptional similarity facilitates the broad utility of high-throughput transcriptional perturbation data for phenotypic drug discovery. The source code and usage of Dr. Sim are available at https://github.com/bm2-lab/DrSim/.
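The unsupervised baseline the abstract contrasts with can be sketched concretely: rank library compounds by cosine similarity of their transcriptional profiles to a query profile. The drug names and 5-gene profiles below are hypothetical toy values; real CMap/LINCS profiles span thousands of genes, which is where the noise and dimensionality problems the abstract describes arise.

```python
import numpy as np

def rank_by_profile_similarity(query, library):
    """Rank library compounds by cosine similarity of their transcriptional
    profiles to a query profile -- the classic unsupervised baseline that
    learning-based methods such as Dr. Sim aim to improve on."""
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    scores = {name: cosine(query, profile) for name, profile in library.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical 5-gene z-score profiles (illustration only).
library = {
    "drug_A": np.array([1.0, 2.0, -1.0, 0.5, 0.0]),
    "drug_B": np.array([-1.0, -2.0, 1.0, -0.5, 0.0]),
    "drug_C": np.array([0.1, 0.0, 0.2, -0.1, 1.0]),
}
query = np.array([0.9, 1.8, -1.1, 0.4, 0.1])
ranking = rank_by_profile_similarity(query, library)
```

A learned similarity replaces the fixed cosine with a metric fitted to labeled profile pairs, so that biologically irrelevant (noisy) genes contribute less to the ranking.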

Genes ◽  
2021 ◽  
Vol 12 (9) ◽  
pp. 1452
Audrey Defosset ◽  
Dorine Merlat ◽  
Laetitia Poidevin ◽  
Yannis Nevers ◽  
Arnaud Kress ◽  

Multiciliogenesis is a complex process that allows the generation of hundreds of motile cilia on the surface of specialized cells, creating fluid flow across epithelial surfaces. Dysfunction of human multiciliated cells is associated with diseases of the brain, airway and reproductive tracts. Despite recent efforts to characterize the transcriptional events responsible for the differentiation of multiciliated cells, many actors remain to be identified. In this work, we capitalize on the ever-growing quantity of high-throughput data to search for new candidate genes involved in multiciliation. After performing a large-scale screening using 10 transcriptomics datasets dedicated to multiciliation, we established a specific evolutionary signature involving Otomorpha fish to use as a criterion to select the most likely targets. Combining both approaches highlighted a list of 114 potential multiciliated candidates. We characterized these genes first by generating protein interaction networks, which showed various clusters of ciliated and multiciliated genes, and then by computing phylogenetic profiles. In the end, we selected 11 poorly characterized genes that appear to be particularly promising multiciliated candidates. By combining functional and comparative genomics methods, we developed a novel type of approach to study biological processes and identify new promising candidates linked to a process of interest.
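The phylogenetic-profile step can be sketched simply: represent each gene as a presence/absence vector of its orthologs across species, and score profile similarity (here with the Jaccard index, a common choice; the species set and profiles below are hypothetical examples, not the paper's actual data). Genes whose profiles match those of known ciliary genes, e.g. shared loss in non-ciliated lineages, become candidates for shared function.

```python
def profile_similarity(profile_a, profile_b):
    """Jaccard similarity between two phylogenetic profiles
    (presence/absence of a gene's ortholog across a species set).
    Genes with similar profiles are candidates for shared function."""
    present_a = {sp for sp, present in profile_a.items() if present}
    present_b = {sp for sp, present in profile_b.items() if present}
    if not present_a and not present_b:
        return 0.0
    return len(present_a & present_b) / len(present_a | present_b)

# Hypothetical profiles over five species (1 = ortholog present).
species = ["human", "mouse", "zebrafish", "fly", "yeast"]
known_ciliary = dict(zip(species, [1, 1, 1, 1, 0]))  # lost in non-ciliated yeast
candidate = dict(zip(species, [1, 1, 1, 1, 0]))      # matching loss pattern
unrelated = dict(zip(species, [1, 1, 1, 1, 1]))      # retained everywhere
```

A real screen would compute such profiles over hundreds of genomes and combine them with the transcriptomics and interaction-network evidence described above.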

2021 ◽  
Zuguang Gu ◽  
Daniel Huebschmann

Consensus partitioning is an unsupervised method widely used in high-throughput data analysis to reveal subgroups and to assign stability to the classification. However, standard consensus partitioning procedures are weak at identifying large numbers of stable subgroups, for two main reasons: (1) subgroups with small differences are difficult to separate when they are detected simultaneously with subgroups showing large differences, and (2) the stability of the classification generally decreases as the number of subgroups increases. In this work, we propose a new strategy that addresses both issues by applying consensus partitioning in a hierarchical procedure. We demonstrate that hierarchical consensus partitioning can efficiently reveal more subgroups, and we test its performance in revealing a large number of subgroups on a DNA methylation dataset. Hierarchical consensus partitioning is implemented in the R package cola with comprehensive functionality for analysis and visualization. It can also automate the analysis with as few as two lines of code, generating a detailed HTML report that contains the complete analysis.
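The consensus idea behind one hierarchical split can be sketched as follows: repeatedly cluster random subsamples, record how often each pair of samples lands in the same cluster, and use the contrast between within-group and between-group co-clustering frequency as a stability score. This is a deliberately simplified sketch of the concept (using a 2-way split along the first principal direction), not the cola package's actual algorithm; a hierarchical procedure would recurse into each subgroup while the stability stays high.

```python
import numpy as np

def consensus_split(data, n_runs=20, subsample=0.8, rng=None):
    """One consensus 2-way split: cluster random row subsamples repeatedly
    and measure how consistently sample pairs co-cluster. Simplified
    illustration of consensus partitioning, not cola's implementation."""
    rng = rng or np.random.default_rng(0)
    n = len(data)
    co = np.zeros((n, n))
    counts = np.zeros((n, n))
    for _ in range(n_runs):
        idx = rng.choice(n, size=int(subsample * n), replace=False)
        sub = data[idx]
        # Simple 2-way split: sign of projection on first principal direction.
        centered = sub - sub.mean(axis=0)
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        run_labels = (centered @ vt[0] > 0).astype(int)
        for i, a in enumerate(idx):
            for j, b in enumerate(idx):
                counts[a, b] += 1
                co[a, b] += run_labels[i] == run_labels[j]
    consensus = co / np.maximum(counts, 1)
    # Final labels from the full data; stability = within- minus
    # between-group mean consensus frequency.
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    labels = (centered @ vt[0] > 0).astype(int)
    same = labels[:, None] == labels[None, :]
    stability = consensus[same].mean() - consensus[~same].mean()
    return labels, stability

# Toy data: two well-separated groups of 30 samples in 5 dimensions.
rng = np.random.default_rng(2)
data = np.vstack([rng.normal(0.0, 0.3, size=(30, 5)),
                  rng.normal(3.0, 0.3, size=(30, 5))])
labels, stability = consensus_split(data, rng=rng)
```

A stability near 1 means pairs co-cluster almost deterministically across subsamples; the hierarchical strategy stops splitting a subgroup once this score drops.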
