scholarly journals spatzie: An R package for identifying significant transcription factor motif co-enrichment from enhancer-promoter interactions

2021 ◽  
Author(s):  
Jennifer Hammelman ◽  
Konstantin Krismer ◽  
David K. Gifford

Genomic interactions provide important context to our understanding of the state of the genome. One question is whether specific transcription factor interactions give rise to genome organization. We introduce spatzie, an R package and a website that implements statistical tests for significant transcription factor motif cooperativity between enhancer-promoter interactions. We conducted controlled experiments under realistic simulated data from ChIP-seq to confirm spatzie is capable of discovering co-enriched motif interactions even in noisy conditions. We then use spatzie to investigate cell type specific transcription factor cooperativity within recent human ChIA-PET enhancer-promoter interaction data. The method is available online at https://spatzie.mit.edu.

2019 ◽  
Author(s):  
Hongyang Li ◽  
Yuanfang Guan

AbstractDecoding the cell type-specific transcription factor (TF) binding landscape at single-nucleotide resolution is crucial for understanding the regulatory mechanisms underlying many fundamental biological processes and human diseases. However, limits on time and resources restrict the high-resolution experimental measurements of TF binding profiles of all possible TF-cell type combinations. Previous computational approaches either can not distinguish the cell-context-dependent TF binding profiles across diverse cell types, or only provide a relatively low-resolution prediction. Here we present a novel deep learning approach, Leopard, for predicting TF-binding sites at single-nucleotide resolution, achieving the median area under receiver operating characteristic curve (AUROC) of 0.994. Our method substantially outperformed state-of-the-art methods Anchor and FactorNet, improving the performance by 19% and 27% respectively despite evaluated at a lower resolution. Meanwhile, by leveraging a many-to-many neural network architecture, Leopard features hundred-fold to thousand-fold speedup compared to current many-to-one machine learning methods.


F1000Research ◽  
2019 ◽  
Vol 7 ◽  
pp. 1740 ◽  
Author(s):  
Tallulah S. Andrews ◽  
Martin Hemberg

Background: Single-cell RNA-seq is a powerful tool for measuring gene expression at the resolution of individual cells.  A challenge in the analysis of this data is the large amount of zero values, representing either missing data or no expression. Several imputation approaches have been proposed to address this issue, but they generally rely on structure inherent to the dataset under consideration they may not provide any additional information, hence, are limited by the information contained therein and the validity of their assumptions. Methods: We evaluated the risk of generating false positive or irreproducible differential expression when imputing data with six different methods. We applied each method to a variety of simulated datasets as well as to permuted real single-cell RNA-seq datasets and consider the number of false positive gene-gene correlations and differentially expressed genes. Using matched 10X and Smart-seq2 data we examined whether cell-type specific markers were reproducible across datasets derived from the same tissue before and after imputation. Results: The extent of false-positives introduced by imputation varied considerably by method. Data smoothing based methods, MAGIC, knn-smooth and dca, generated many false-positives in both real and simulated data. Model-based imputation methods typically generated fewer false-positives but this varied greatly depending on the diversity of cell-types in the sample. All imputation methods decreased the reproducibility of cell-type specific markers, although this could be mitigated by selecting markers with large effect size and significance. Conclusions: Imputation of single-cell RNA-seq data introduces circularity that can generate false-positive results. Thus, statistical tests applied to imputed data should be treated with care. Additional filtering by effect size can reduce but not fully eliminate these effects. Of the methods we considered, SAVER was the least likely to generate false or irreproducible results, thus should be favoured over alternatives if imputation is necessary.


Sign in / Sign up

Export Citation Format

Share Document