atSNP Search: a web resource for statistically evaluating influence of human genetic variation on transcription factor binding

Sunyoung Shin; Rebecca Hudson; Christopher Harrison; Mark Craven; Sündüz Keleş

doi:10.1093/bioinformatics/bty1010

Bioinformatics ◽

10.1093/bioinformatics/bty1010 ◽

2018 ◽

Vol 35 (15) ◽

pp. 2657-2659 ◽

Cited By ~ 5

Author(s):

Sunyoung Shin ◽

Rebecca Hudson ◽

Christopher Harrison ◽

Mark Craven ◽

Sündüz Keleş

Keyword(s):

Transcription Factor ◽

Association Studies ◽

Statistical Significance ◽

Transcription Factor Binding ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Statistical Significance Testing ◽

Factor Binding ◽

Web Resource ◽

Variant Alleles

Abstract. Summary: Understanding the regulatory roles of non-coding genetic variants has become a central goal for interpreting results of genome-wide association studies. The regulatory significance of the variants may be interrogated by assessing their influence on transcription factor binding. We have developed atSNP Search, a comprehensive web database for evaluating motif matches to the human genome with both reference and variant alleles and assessing the overall significance of the variant alterations on the motif matches. Convenient search features, comprehensive search outputs and a useful help menu are key components of atSNP Search. atSNP Search enables convenient interpretation of regulatory variants by statistical significance testing and composite logo plots, which are graphical representations of motif matches with the reference and variant alleles. Existing motif-based regulatory variant discovery tools only consider a limited pool of variants due to storage or other limitations. In contrast, atSNP Search users can test more than 37 billion variant-motif pairs with marginal significance in motif matches or match alteration. Computational evidence from atSNP Search, when combined with experimental validation, may help with the discovery of underlying disease mechanisms. Availability and implementation: atSNP Search is freely available at http://atsnp.biostat.wisc.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
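The idea of comparing motif matches under the reference and variant alleles can be illustrated with a minimal Python sketch. It is not atSNP's implementation: the 4-position weight matrix, the uniform background, the forward-strand-only scan and the function names are all assumptions made for illustration, and the significance testing that atSNP Search provides is not reproduced here.

```python
import math

# Hypothetical position weight matrix (PWM): base probabilities per motif position.
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def log_odds(window):
    """Log-odds score of one motif-length window against the background."""
    return sum(math.log(PWM[i][base] / BACKGROUND) for i, base in enumerate(window))


def best_match(seq, snp_pos):
    """Best score over all forward-strand windows that overlap the SNP position."""
    k = len(PWM)
    first = max(0, snp_pos - k + 1)
    last = min(snp_pos, len(seq) - k)
    return max(log_odds(seq[s:s + k]) for s in range(first, last + 1))


def allele_score_change(ref_seq, snp_pos, alt_base):
    """Return (reference score, variant score, variant minus reference)."""
    alt_seq = ref_seq[:snp_pos] + alt_base + ref_seq[snp_pos + 1:]
    ref_score = best_match(ref_seq, snp_pos)
    alt_score = best_match(alt_seq, snp_pos)
    return ref_score, alt_score, alt_score - ref_score


if __name__ == "__main__":
    # The reference contains the consensus AGCT; the variant replaces G with C.
    ref, alt, delta = allele_score_change("TTAGCTTT", 3, "C")
    print(f"ref={ref:.3f} alt={alt:.3f} change={delta:.3f}")
```

A negative change suggests the variant allele weakens the best motif match. Judging whether such a change is statistically significant requires a null model for the scores, which this sketch deliberately leaves out.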

Predicting the effects of SNPs on transcription factor binding affinity

Bioinformatics ◽

10.1093/bioinformatics/btz612 ◽

2019 ◽

Author(s):

Sierra S Nishizaki ◽

Natalie Ng ◽

Shengcheng Dong ◽

Robert S Porter ◽

Cody Morterud ◽

...

Keyword(s):

Transcription Factor ◽

Binding Affinity ◽

Association Studies ◽

Transcription Factor Binding ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Factor Binding ◽

Genome Wide

Abstract. Motivation: Genome-wide association studies have revealed that 88% of disease-associated single-nucleotide polymorphisms (SNPs) reside in noncoding regions. However, noncoding SNPs remain understudied, partly because they are challenging to prioritize for experimental validation. To address this deficiency, we developed the SNP effect matrix pipeline (SEMpl). Results: SEMpl estimates transcription factor-binding affinity by observing differences in chromatin immunoprecipitation followed by deep sequencing signal intensity for SNPs within functional transcription factor-binding sites (TFBSs) genome-wide. By cataloging the effects of every possible mutation within the TFBS motif, SEMpl can predict the consequences of SNPs to transcription factor binding. This knowledge can be used to identify potential disease-causing regulatory loci. Availability and implementation: SEMpl is available from https://github.com/Boyle-Lab/SEM_CPP. Supplementary information: Supplementary data are available at Bioinformatics online.

GERV: A Statistical Method for Generative Evaluation of Regulatory Variants for Transcription Factor Binding

10.1101/017392 ◽

2015 ◽

Cited By ~ 1

Author(s):

Haoyang Zeng ◽

Tatsunori Hashimoto ◽

Daniel D. Kang ◽

David K. Gifford

Keyword(s):

Transcription Factor ◽

Specific Binding ◽

Association Studies ◽

Transcription Factor Binding ◽

Computational Method ◽

Breast Cancer Cell Lines ◽

Genome Wide Association Studies ◽

Factor Binding ◽

Regulatory Variants ◽

Causal Variants

The majority of disease-associated variants identified in genome-wide association studies (GWAS) reside in noncoding regions of the genome with regulatory roles. Thus being able to interpret the functional consequence of a variant is essential for identifying causal variants in the analysis of GWAS studies. We present GERV (Generative Evaluation of Regulatory Variants), a novel computational method for predicting regulatory variants that affect transcription factor binding. GERV learns a k-mer based generative model of transcription factor binding from ChIP-seq and DNase-seq data, and scores variants by computing the change of predicted ChIP-seq reads between the reference and alternate allele. The k-mers learned by GERV capture more sequence determinants of transcription factor binding than a motif-based approach alone, including both a transcription factor's canonical motif as well as associated co-factor motifs. We show that GERV outperforms existing methods in predicting SNPs associated with allele-specific binding. GERV correctly predicts a validated causal variant among linked SNPs, and prioritizes the variants previously reported to modulate the binding of FOXA1 in breast cancer cell lines. Thus, GERV provides a powerful approach for functionally annotating and prioritizing causal variants for experimental follow-up analysis.

EpiSAFARI: Sensitive detection of valleys in epigenetic signals for enhancing annotations of functional elements

Bioinformatics ◽

10.1093/bioinformatics/btz702 ◽

2019 ◽

Author(s):

Arif Harmanci ◽

Akdes Serin Harmanci ◽

Jyothishmathi Swaminathan ◽

Vidya Gopalakrishnan

Keyword(s):

Transcription Factor ◽

Regulatory Elements ◽

Transcription Factor Binding ◽

Computational Method ◽

Sensitive Detection ◽

Supplementary Information ◽

Chip Sequencing ◽

Factor Binding ◽

Nucleotide Resolution ◽

Systematic Identification

Abstract. Motivation: Functional genomics experiments generate genome-wide signal profiles that are dense information sources for annotating regulatory elements. These profiles measure epigenetic activity at nucleotide resolution, and they exhibit distinctive patterns as they fluctuate along the genome. Most notable of these patterns are the valley patterns that are prevalently observed in assays such as ChIP sequencing and bisulfite sequencing. The genomic positions of valleys pinpoint locations of cis-regulatory elements such as enhancers and insulators. Systematic identification of the valleys provides novel information for delineating the annotation of regulatory elements. Nevertheless, the valleys are not reported by the majority of analysis pipelines. Results: We describe EpiSAFARI, a computational method for sensitive detection of valleys from diverse types of epigenetic profiles. EpiSAFARI employs a novel smoothing method for decreasing noise in signal profiles and accounts for technical factors such as sparse signals, mappability, and nucleotide content. In performance comparisons, EpiSAFARI performs favorably in terms of accuracy. The histone modification valleys detected by EpiSAFARI exhibit high conservation and transcription factor binding, and they are enriched in nascent transcription. In addition, the large clusters of histone valleys are found to be enriched at the promoters of developmentally associated genes. Differential histone valleys exhibit concordance with differential DNase signal at cell line-specific valleys. DNA methylation valleys exhibit elevated conservation and high transcription factor binding. Specifically, we observed enriched binding of transcription factors associated with chromatin structure around methyl-valleys. Availability: EpiSAFARI is publicly available at https://github.com/harmancilab/EpiSAFARI. Supplementary information: Supplementary data are available at Bioinformatics online.

CUBIC: IDENTIFICATION OF REGULATORY BINDING SITES THROUGH DATA CLUSTERING

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720003000162 ◽

2003 ◽

Vol 01 (01) ◽

pp. 21-40 ◽

Cited By ~ 21

Author(s):

VICTOR OLMAN ◽

DONG XU ◽

YING XU

Keyword(s):

Transcription Factor ◽

Binding Sites ◽

Statistical Significance ◽

Transcription Factor Binding Sites ◽

Identification Problem ◽

Transcription Factor Binding ◽

Factor Binding ◽

Cluster Identification ◽

Data Clusters ◽

Noisy Background

Transcription factor binding sites are short fragments in the upstream regions of genes, to which transcription factors bind to regulate the transcription of genes into mRNA. Computational identification of transcription factor binding sites remains an unsolved and challenging problem, although a great amount of effort has been put into studying it. We have recently developed a novel technique for identifying binding sites from a set of upstream regions of genes that could possibly be transcriptionally co-regulated and hence might share similar transcription factor binding sites. By utilizing two key features of such binding sites (i.e. their high sequence similarities and their relatively high frequencies compared to other sequence fragments), we have formulated this problem as a cluster identification problem, that is, the problem of identifying and extracting data clusters from a noisy background. While the classical data clustering problem (partitioning a data set into clusters sharing common or similar features) has been extensively studied, there is no general algorithm for solving the problem of identifying data clusters from a noisy background. In this paper, we present a novel algorithm for solving such a problem. We have proved that a cluster identification problem, under our definition, can be rigorously and efficiently solved through searching for substrings with special properties in a linear sequence. We have also developed a method for assessing the statistical significance of each identified cluster, which can be used to rule out accidental data clusters. We have implemented the cluster identification algorithm and the statistical significance analysis method as a software program, CUBIC. Extensive testing of CUBIC has been carried out. We present here a few applications of CUBIC to challenging cases of binding site identification.

vSampler: fast and annotation-based matched variant sampling tool

Bioinformatics ◽

10.1093/bioinformatics/btaa883 ◽

2020 ◽

Author(s):

Dandan Huang ◽

Zhao Wang ◽

Yao Zhou ◽

Qian Liang ◽

Pak Chung Sham ◽

...

Keyword(s):

Association Studies ◽

Statistical Significance ◽

Enrichment Analysis ◽

Supplementary Information ◽

Tissue Cell ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Regulatory Variant ◽

Local Program ◽

Sampling Algorithms

Abstract. Summary: Sampling of control variants having matched properties with input variants is widely used in enrichment analysis of genome-wide association studies/quantitative trait loci and negative data construction for pathogenic/regulatory variant prediction methods. Spurious enrichment results caused by confounding factors, such as minor allele frequency and linkage disequilibrium pattern, can be avoided by calibration of statistical significance based on matched controls. Here, we presented vSampler, which can generate sets of randomly drawn variants with comprehensive choices of matching properties, such as tissue/cell type-specific epigenomic features. Importantly, the development of a novel data structure and sampling algorithms for vSampler makes it significantly faster than existing tools. Availability and implementation: vSampler web server and local program are available at http://mulinlab.org/vsampler. Supplementary information: Supplementary data are available at Bioinformatics online.
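Matched control sampling can be illustrated with a minimal Python sketch that matches on a single property, minor allele frequency (MAF), by binning. It does not reproduce vSampler's data structure or sampling algorithms, and it ignores linkage disequilibrium and epigenomic features. The variant records, field names, bin count and function names are hypothetical.

```python
import random
from collections import defaultdict


def maf_bin(maf, n_bins=10):
    """Map a MAF in [0, 0.5] to one of n_bins equal-width bins."""
    return min(int(maf * 2 * n_bins), n_bins - 1)


def sample_matched_controls(query, pool, n_per_query=2, n_bins=10, seed=0):
    """For each query variant, draw random pool variants from the same MAF bin."""
    rng = random.Random(seed)  # seeded so the draw is reproducible
    query_ids = {v["id"] for v in query}

    # Index the pool by MAF bin, excluding the query variants themselves.
    by_bin = defaultdict(list)
    for v in pool:
        if v["id"] not in query_ids:
            by_bin[maf_bin(v["maf"], n_bins)].append(v)

    matched = {}
    for v in query:
        candidates = by_bin.get(maf_bin(v["maf"], n_bins), [])
        # Sample without replacement; return fewer if the bin is too small.
        matched[v["id"]] = rng.sample(candidates, min(n_per_query, len(candidates)))
    return matched


if __name__ == "__main__":
    pool = [
        {"id": "v1", "maf": 0.12},
        {"id": "v2", "maf": 0.13},
        {"id": "v3", "maf": 0.14},
        {"id": "v4", "maf": 0.31},
        {"id": "v5", "maf": 0.32},
    ]
    query = [{"id": "v1", "maf": 0.12}]
    for qid, controls in sample_matched_controls(query, pool).items():
        print(qid, [c["id"] for c in controls])
```

Enrichment statistics computed on the query set can then be compared against the same statistics on many such matched control sets, so that differences in allele frequency alone do not produce spurious enrichment.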

Regulatory SNPs: Altered Transcription Factor Binding Sites Implicated in Complex Traits and Diseases

International Journal of Molecular Sciences ◽

10.3390/ijms22126454 ◽

2021 ◽

Vol 22 (12) ◽

pp. 6454

Author(s):

Arina O. Degtyareva ◽

Elena V. Antontseva ◽

Tatiana I. Merkulova

Keyword(s):

Gene Expression ◽

Transcription Factor ◽

Genetic Variants ◽

Binding Sites ◽

Transcription Factor Binding Sites ◽

Transcription Factor Binding ◽

Genome Wide Association Studies ◽

Factor Binding ◽

Genome Wide ◽

Regulatory Snps

The vast majority of the genetic variants (mainly SNPs) associated with various human traits and diseases map to the noncoding part of the genome and are enriched in its regulatory compartment, suggesting that many causal variants may affect gene expression. The leading mechanism of action of these SNPs consists of alterations in transcription factor binding via creation or disruption of transcription factor binding sites (TFBSs) or some change in the affinity of these regulatory proteins for their cognate sites. In this review, we first focus on the history of the discovery of regulatory SNPs (rSNPs) and give a systematized description of the existing methodological approaches to their study. Then, we briefly review recent comprehensive examples of rSNPs that were followed from the discovery of changes in the TFBS sequence resulting from a nucleotide substitution, to identification of the effect on target gene expression and, eventually, to phenotype. We also describe state-of-the-art genome-wide approaches to the identification of regulatory variants, including both making molecular sense of genome-wide association studies (GWAS) and alternative approaches whose primary goal is to determine the functionality of genetic variants. Among these approaches, special attention is paid to expression quantitative trait loci (eQTL) analysis and the search for allele-specific events in RNA-seq (ASE events) as well as in ChIP-seq, DNase-seq, and ATAC-seq (ASB events) data.

An efficient method for statistical significance calculation of transcription factor binding sites

Bioinformation ◽

10.6026/97320630002169 ◽

2007 ◽

Vol 2 (5) ◽

pp. 169-174 ◽

Cited By ~ 3

Author(s):

Ziliang Qian ◽

Lingyi Lu ◽

Liu Qi ◽

Yixue Li

Keyword(s):

Transcription Factor ◽

Efficient Method ◽

Binding Sites ◽

Statistical Significance ◽

Transcription Factor Binding Sites ◽

Transcription Factor Binding ◽

Factor Binding

Author response for "Transcription factor binding at immunoglobulin enhancers is linked to somatic hypermutation targeting"

10.1002/eji.201948357/v2/response1 ◽

2019 ◽

Author(s):

Ravi K. Dinesh ◽

Benjamin Barnhill ◽

Anoj Ilanges ◽

Lizhen Wu ◽

Daniel A. Michelson ◽

...

Keyword(s):

Transcription Factor ◽

Somatic Hypermutation ◽

Transcription Factor Binding ◽

Author Response ◽

Factor Binding

Decision letter for "Transcription factor binding at immunoglobulin enhancers is linked to somatic hypermutation targeting"

10.1002/eji.201948357/v2/decision1 ◽

2019 ◽

Keyword(s):

Transcription Factor ◽

Somatic Hypermutation ◽

Transcription Factor Binding ◽

Factor Binding

Faculty Opinions recommendation of Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1017662.206496 ◽

2004 ◽

Author(s):

Carlos F Barbas

Keyword(s):

Transcription Factor ◽

Binding Sites ◽

Noncoding Rnas ◽

Transcription Factor Binding Sites ◽

Transcription Factor Binding ◽

Human Chromosomes ◽

Factor Binding
