spatzie: An R package for identifying significant transcription factor motif co-enrichment from enhancer-promoter interactions

Genomic interactions provide important context to our understanding of the state of the genome. One question is whether specific transcription factor interactions give rise to genome organization. We introduce spatzie, an R package and a website that implement statistical tests for significant transcription factor motif cooperativity between enhancer-promoter interactions. We conducted controlled experiments on realistic data simulated from ChIP-seq to confirm that spatzie can discover co-enriched motif interactions even in noisy conditions. We then used spatzie to investigate cell-type-specific transcription factor cooperativity within recent human ChIA-PET enhancer-promoter interaction data. The method is available online at https://spatzie.mit.edu.


Cell-type-specific transcription factor interactions with cis-elements present in the mouse LDH/C proximal promoter region

Journal of Experimental Zoology ◽

10.1002/(sici)1097-010x(199809/10)282:1/2<179::aid-jez20>3.0.co;2-o ◽

1998 ◽

Vol 282 (1-2) ◽

pp. 179-187 ◽

Cited By ~ 2

Author(s):

Jun Yang ◽

Marsena Riley ◽

Kelwyn Thomas

Keyword(s):

Transcription Factor ◽

Promoter Region ◽

Proximal Promoter ◽

Cis Elements ◽

Cell Type ◽

Proximal Promoter Region ◽

Specific Transcription Factor ◽

Cell Type Specific ◽

Factor Interactions


Repression of a matrix metalloprotease gene by E1A correlates with its ability to bind to cell type-specific transcription factor AP-2.

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.93.7.3088 ◽

1996 ◽

Vol 93 (7) ◽

pp. 3088-3093 ◽

Cited By ~ 47

Author(s):

K. Somasundaram ◽

G. Jayaraman ◽

T. Williams ◽

E. Moran ◽

S. Frisch ◽

...

Keyword(s):

Transcription Factor ◽

Matrix Metalloprotease ◽

Cell Type ◽

Specific Transcription Factor ◽

Cell Type Specific


An adenosine nucleotide switch controlling the activity of a cell type-specific transcription factor in B. subtilis

Cell ◽

10.1016/0092-8674(94)90312-3 ◽

1994 ◽

Vol 77 (2) ◽

pp. 195-205 ◽

Cited By ~ 143

Author(s):

Scott Alper ◽

Leonard Duncan ◽

Richard Losick

Keyword(s):

Transcription Factor ◽

Cell Type ◽

Specific Transcription Factor ◽

A Cell ◽

Cell Type Specific ◽

Adenosine Nucleotide


Accurate prediction of cell type-specific transcription factor binding

Genome Biology ◽

10.1186/s13059-018-1614-y ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 28

Author(s):

Jens Keilwagen ◽

Stefan Posch ◽

Jan Grau

Keyword(s):

Transcription Factor ◽

Transcription Factor Binding ◽

Accurate Prediction ◽

Cell Type ◽

Specific Transcription Factor ◽

Factor Binding ◽

Cell Type Specific


Sequence and chromatin determinants of cell-type-specific transcription factor binding

Genome Research ◽

10.1101/gr.127712.111 ◽

2012 ◽

Vol 22 (9) ◽

pp. 1723-1734 ◽

Cited By ~ 153

Author(s):

A. Arvey ◽

P. Agius ◽

W. S. Noble ◽

C. Leslie

Keyword(s):

Transcription Factor ◽

Transcription Factor Binding ◽

Cell Type ◽

Specific Transcription Factor ◽

Factor Binding ◽

Cell Type Specific


Leopard: fast decoding cell type-specific transcription factor binding landscape at single-nucleotide resolution

10.1101/856823 ◽

2019 ◽

Author(s):

Hongyang Li ◽

Yuanfang Guan

Keyword(s):

Transcription Factor ◽

Network Architecture ◽

Characteristic Curve ◽

Cell Types ◽

Cell Type ◽

Single Nucleotide ◽

Specific Transcription Factor ◽

Cell Type Specific ◽

Nucleotide Resolution ◽

Single Nucleotide Resolution

Abstract: Decoding the cell type-specific transcription factor (TF) binding landscape at single-nucleotide resolution is crucial for understanding the regulatory mechanisms underlying many fundamental biological processes and human diseases. However, limits on time and resources restrict high-resolution experimental measurement of TF binding profiles for all possible TF-cell type combinations. Previous computational approaches either cannot distinguish cell-context-dependent TF binding profiles across diverse cell types or provide only relatively low-resolution predictions. Here we present a novel deep learning approach, Leopard, for predicting TF-binding sites at single-nucleotide resolution, achieving a median area under the receiver operating characteristic curve (AUROC) of 0.994. Our method substantially outperformed the state-of-the-art methods Anchor and FactorNet, improving performance by 19% and 27% respectively, despite their being evaluated at a lower resolution. Meanwhile, by leveraging a many-to-many neural network architecture, Leopard achieves a hundred-fold to thousand-fold speedup compared to current many-to-one machine learning methods.
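For readers unfamiliar with the AUROC metric quoted in this abstract, the following is a minimal Python sketch of how it can be computed from binary labels and predicted scores using the rank-sum (Mann-Whitney U) identity. The function name `auroc` and the toy labels and scores are illustrative assumptions; this is not Leopard's evaluation code or data.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: iterable of 0/1 ground-truth values (1 = bound position).
    scores: iterable of predicted scores, higher meaning more likely bound.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)

    # Assign 1-based ranks, giving tied scores their average rank.
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    # U statistic of the positives, normalised by the number of pos/neg pairs.
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Hypothetical toy data: four positions, two of them truly bound.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auroc = auroc(toy_labels, toy_scores)
```

An AUROC of 1.0 means every bound position is scored above every unbound one, while 0.5 corresponds to random ranking.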


Prediction of Cell Type Specific Transcription Factor Binding Site Occupancy

Proceedings of the 7th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics - BCB '16 ◽

10.1145/2975167.2985652 ◽

2016 ◽

Author(s):

Faizy Ahsan ◽

Doina Precup ◽

Mathieu Blanchette

Keyword(s):

Transcription Factor ◽

Binding Site ◽

Transcription Factor Binding Site ◽

Site Occupancy ◽

Transcription Factor Binding ◽

Cell Type ◽

Factor Binding Site ◽

Specific Transcription Factor ◽

Factor Binding ◽

Cell Type Specific


Fast decoding cell type–specific transcription factor binding landscape at single-nucleotide resolution

Genome Research ◽

10.1101/gr.269613.120 ◽

2021 ◽

Author(s):

Hongyang Li ◽

Yuanfang Guan

Keyword(s):

Transcription Factor ◽

Transcription Factor Binding ◽

Cell Type ◽

Single Nucleotide ◽

Specific Transcription Factor ◽

Factor Binding ◽

Fast Decoding ◽

Cell Type Specific ◽

Nucleotide Resolution ◽

Single Nucleotide Resolution


False signals induced by single-cell imputation

F1000Research ◽

10.12688/f1000research.16613.2 ◽

2019 ◽

Vol 7 ◽

pp. 1740 ◽

Cited By ~ 26

Author(s):

Tallulah S. Andrews ◽

Martin Hemberg

Keyword(s):

Single Cell ◽

Effect Size ◽

False Positive ◽

Statistical Tests ◽

Simulated Data ◽

False Positives ◽

RNA Seq ◽

Cell Type ◽

Imputation Methods ◽

Cell Type Specific

Background: Single-cell RNA-seq is a powerful tool for measuring gene expression at the resolution of individual cells. A challenge in the analysis of this data is the large number of zero values, representing either missing data or no expression. Several imputation approaches have been proposed to address this issue, but because they generally rely on structure inherent to the dataset under consideration, they may not provide any additional information and hence are limited by the information contained therein and the validity of their assumptions. Methods: We evaluated the risk of generating false positive or irreproducible differential expression when imputing data with six different methods. We applied each method to a variety of simulated datasets as well as to permuted real single-cell RNA-seq datasets and considered the number of false positive gene-gene correlations and differentially expressed genes. Using matched 10X and Smart-seq2 data, we examined whether cell-type-specific markers were reproducible across datasets derived from the same tissue before and after imputation. Results: The extent of false positives introduced by imputation varied considerably by method. Data-smoothing-based methods (MAGIC, knn-smooth and dca) generated many false positives in both real and simulated data. Model-based imputation methods typically generated fewer false positives, but this varied greatly depending on the diversity of cell types in the sample. All imputation methods decreased the reproducibility of cell-type-specific markers, although this could be mitigated by selecting markers with large effect size and significance. Conclusions: Imputation of single-cell RNA-seq data introduces circularity that can generate false positive results. Thus, statistical tests applied to imputed data should be treated with care. Additional filtering by effect size can reduce but not fully eliminate these effects. Of the methods we considered, SAVER was the least likely to generate false or irreproducible results and thus should be favoured over alternatives if imputation is necessary.
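The permutation idea in the Methods can be illustrated with a small Python sketch. It assumes one simple permutation scheme, shuffling each gene's values independently across cells so that any gene-gene correlation detected afterwards is a false positive by construction; the abstract does not specify the authors' exact scheme. The helper names, the Pearson statistic, and the threshold are all hypothetical choices, and no imputation method is actually run here.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def permute_genes(matrix, rng):
    """Shuffle each gene (row) across cells independently, breaking gene-gene structure."""
    permuted = []
    for row in matrix:
        shuffled = list(row)
        rng.shuffle(shuffled)
        permuted.append(shuffled)
    return permuted


def count_correlated_pairs(matrix, threshold):
    """Count gene pairs whose absolute Pearson correlation reaches the threshold."""
    return sum(
        1 for a, b in combinations(matrix, 2) if abs(pearson(a, b)) >= threshold
    )


# Hypothetical genes-by-cells count matrix; rows 0 and 1 are perfectly correlated.
expression = [
    [1, 2, 3, 4],
    [1, 2, 3, 4],
    [5, 5, 5, 5],
]
rng = random.Random(0)
null_expression = permute_genes(expression, rng)

observed_pairs = count_correlated_pairs(expression, threshold=0.9)
# In a real evaluation, an imputation method would be applied to null_expression
# first; pairs counted after that step would be imputation-induced false positives.
null_pairs = count_correlated_pairs(null_expression, threshold=0.9)
```

Comparing the pair count on permuted data before and after imputation gives a rough measure of how many spurious correlations a method introduces.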
