Mixture models reveal multiple positional bias types in RNA-Seq data and lead to accurate transcript concentration estimates

2017 ◽  
Vol 13 (5) ◽  
pp. e1005515 ◽  
Author(s):  
Andreas Tuerk ◽  
Gregor Wiktorin ◽  
Serhat Güler


2020 ◽  
Vol 16 (12) ◽  
pp. e1008488
Author(s):  
Mirko Ronzio ◽  
Andrea Bernardini ◽  
Giulio Pavesi ◽  
Roberto Mantovani ◽  
Diletta Dolfini

NF-Y is a trimeric transcription factor (TF) that binds with high selectivity to the conserved CCAAT element. Individual ChIP-seq analyses, as well as ENCODE, have progressively identified locations shared by other TFs. Here, we have analyzed data introduced by ENCODE over the last five years in K562, HeLa-S3 and GM12878, including several chromatin features, as well as RNA-seq profiling of HeLa cells after NF-Y inactivation. We double the number of sequence-specific TFs and co-factors reported. We catalogue them in 4 classes based on co-association criteria, infer target gene categorizations, and identify positional bias of binding sites and gene expression changes. Larger and novel co-associations emerge, specifically concerning subunits of repressive complexes as well as RNA-binding proteins. On the one hand, these data better define NF-Y association with single members of major classes of TFs; on the other, they suggest that it might have a wider role in the control of mRNA production.


2016 ◽  
Author(s):  
Andrea Rau ◽  
Cathy Maugis-Rabusseau

Although a large number of clustering algorithms have been proposed to identify groups of co-expressed genes from microarray data, the question of whether and how such methods may be applied to RNA-seq data remains unaddressed. In this work, we investigate the use of data transformations in conjunction with Gaussian mixture models for RNA-seq co-expression analyses, as well as a penalized model selection criterion to select both an appropriate transformation and number of clusters present in the data. This approach has the advantage of accounting for per-cluster correlation structures among samples, which can be quite strong in RNA-seq data. In addition, it provides a rigorous statistical framework for parameter estimation, an objective assessment of data transformations and number of clusters, and the possibility of performing diagnostic checks on the quality and homogeneity of the identified clusters. We analyze four varied RNA-seq datasets to illustrate the use of transformations and model selection in conjunction with Gaussian mixture models. Finally, we propose an R package coseq (co-expression of RNA-seq data) to facilitate implementation and visualization of the recommended RNA-seq co-expression analyses.
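The general idea in this abstract (transform the counts, fit Gaussian mixtures, pick the number of clusters with a penalized criterion) can be illustrated with a minimal Python sketch. It is not the coseq implementation: it works on one-dimensional values rather than multi-sample profiles, and the `log2(count + 1)` transformation, the use of BIC as the penalized criterion, and all function names are assumptions made for illustration only.

```python
import math


def log_transform(counts):
    """Assumed transformation: log2(count + 1). The abstract does not name one."""
    return [math.log2(c + 1.0) for c in counts]


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mixture_density(x, weights, means, variances):
    """Density of the Gaussian mixture at x."""
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


def fit_gmm_1d(xs, k, n_iter=200, min_var=1e-3):
    """Fit a k-component 1D Gaussian mixture by EM.

    Returns (weights, means, variances, log_likelihood).
    """
    n = len(xs)
    ordered = sorted(xs)
    # Deterministic initialisation: means at evenly spaced quantiles.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(xs) / n
    spread = max(sum((x - centre) ** 2 for x in xs) / n, min_var)
    variances = [spread] * k
    weights = [1.0 / k] * k

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and (floored) variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj,
                min_var,
            )

    loglik = sum(
        math.log(mixture_density(x, weights, means, variances) or 1e-300)
        for x in xs
    )
    return weights, means, variances, loglik


def bic(loglik, k, n):
    """BIC for a 1D mixture: k means + k variances + (k - 1) free weights."""
    n_params = 3 * k - 1
    return -2.0 * loglik + n_params * math.log(n)


def select_k(xs, k_values):
    """Return (best_k, scores), where best_k has the lowest BIC."""
    scores = {k: bic(fit_gmm_1d(xs, k)[3], k, len(xs)) for k in k_values}
    return min(scores, key=scores.get), scores
```

A caller would pass transformed values, for example `select_k(log_transform(counts), [1, 2, 3])`. Comparing several transformations would additionally require putting their likelihoods on a common scale, which this sketch does not attempt.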


2007 ◽  
Author(s):  
Danielle L. Cisler ◽  
Gitta H. Lubke