scholarly journals atSNP Search: a web resource for statistically evaluating influence of human genetic variation on transcription factor binding

2018 ◽  
Vol 35 (15) ◽  
pp. 2657-2659 ◽  
Author(s):  
Sunyoung Shin ◽  
Rebecca Hudson ◽  
Christopher Harrison ◽  
Mark Craven ◽  
Sündüz Keleş

AbstractSummaryUnderstanding the regulatory roles of non-coding genetic variants has become a central goal for interpreting results of genome-wide association studies. The regulatory significance of the variants may be interrogated by assessing their influence on transcription factor binding. We have developed atSNP Search, a comprehensive web database for evaluating motif matches to the human genome with both reference and variant alleles and assessing the overall significance of the variant alterations on the motif matches. Convenient search features, comprehensive search outputs and a useful help menu are key components of atSNP Search. atSNP Search enables convenient interpretation of regulatory variants by statistical significance testing and composite logo plots, which are graphical representations of motif matches with the reference and variant alleles. Existing motif-based regulatory variant discovery tools only consider a limited pool of variants due to storage or other limitations. In contrast, atSNP Search users can test more than 37 billion variant-motif pairs with marginal significance in motif matches or match alteration. Computational evidence from atSNP Search, when combined with experimental validation, may help with the discovery of underlying disease mechanisms.Availability and implementationatSNP Search is freely available at http://atsnp.biostat.wisc.edu.Supplementary informationSupplementary data are available at Bioinformatics online.

2019 ◽  
Author(s):  
Sierra S Nishizaki ◽  
Natalie Ng ◽  
Shengcheng Dong ◽  
Robert S Porter ◽  
Cody Morterud ◽  
...  

Abstract Motivation Genome-wide association studies have revealed that 88% of disease-associated single-nucleotide polymorphisms (SNPs) reside in noncoding regions. However, noncoding SNPs remain understudied, partly because they are challenging to prioritize for experimental validation. To address this deficiency, we developed the SNP effect matrix pipeline (SEMpl). Results SEMpl estimates transcription factor-binding affinity by observing differences in chromatin immunoprecipitation followed by deep sequencing signal intensity for SNPs within functional transcription factor-binding sites (TFBSs) genome-wide. By cataloging the effects of every possible mutation within the TFBS motif, SEMpl can predict the consequences of SNPs to transcription factor binding. This knowledge can be used to identify potential disease-causing regulatory loci. Availability and implementation SEMpl is available from https://github.com/Boyle-Lab/SEM_CPP. Supplementary information Supplementary data are available at Bioinformatics online.


2015 ◽  
Author(s):  
Haoyang Zeng ◽  
Tatsunori Hashimoto ◽  
Daniel D. Kang ◽  
David K. Gifford

The majority of disease-associated variants identified in genome-wide association studies (GWAS) reside in noncoding regions of the genome with regulatory roles. Thus being able to interpret the functional consequence of a variant is essential for identifying causal variants in the analysis of GWAS studies. We present GERV (Generative Evaluation of Regulatory Variants), a novel computational method for predicting regulatory variants that affect transcription factor binding. GERV learns a k-mer based generative model of transcription factor binding from ChIP-seq and DNase-seq data, and scores variants by computing the change of predicted ChIP-seq reads between the reference and alternate allele. The k-mers learned by GERV capture more sequence determinants of transcription factor binding than a motif-based approach alone, including both a transcription factor's canonical motif as well as associated co-factor motifs. We show that GERV outperforms existing methods in predicting SNPs associated with allele-specific binding. GERV correctly predicts a validated causal variant among linked SNPs, and prioritizes the variants previously reported to modulate the binding of FOXA1 in breast cancer cell lines. Thus, GERV provides a powerful approach for functionally annotating and prioritizing causal variants for experimental follow-up analysis.


2019 ◽  
Author(s):  
Arif Harmanci ◽  
Akdes Serin Harmanci ◽  
Jyothishmathi Swaminathan ◽  
Vidya Gopalakrishnan

Abstract Motivation Functional genomics experiments generate genomewide signal profiles that are dense information sources for annotating the regulatory elements. These profiles measure epigenetic activity at the nucleotide resolution and they exhibit distinctive patterns as they fluctuate along the genome. Most notable of these patterns are the valley patterns that are prevalently observed in assays such as ChIP Sequencing and bisulfite sequencing. The genomic positions of valleys pinpoint locations of cis-regulatory elements such as enhancers and insulators. Systematic identification of the valleys provides novel information for delineating the annotation of regulatory elements. Nevertheless, the valleys are not reported by majority of the analysis pipelines. Results We describe EpiSAFARI, a computational method for sensitive detection of valleys from diverse types of epigenetic profiles. EpiSAFARI employs a novel smoothing method for decreasing noise in signal profiles and accounts for technical factors such as sparse signals, mappability, and nucleotide content. In performance comparisons, EpiSAFARI performs favorably in terms of accuracy. The histone modification valleys detected by EpiSAFARI exhibit high conservation, transcription factor binding, and they are enriched in nascent transcription. In addition, the large clusters of histone valleys are found to be enriched at the promoters of the developmentally associated genes. Differential histone valleys exhibit concordance with differential DNase signal at cell line specific valleys. DNA methylation valleys exhibit elevated conservation and high transcription factor binding. Specifically, we observed enriched binding of transcription factors associated with chromatin structure around methyl-valleys. Availability EpiSAFARI is publicly available at https://github.com/harmancilab/EpiSAFARI Supplementary information Supplementary data are available at Bioinformatics online.


2003 ◽  
Vol 01 (01) ◽  
pp. 21-40 ◽  
Author(s):  
VICTOR OLMAN ◽  
DONG XU ◽  
YING XU

Transcription factor binding sites are short fragments in the upstream regions of genes, to which transcription factors bind to regulate the transcription of genes into mRNA. Computational identification of transcription factor binding sites remains an unsolved challenging problem though a great amount of effort has been put into the study of this problem. We have recently developed a novel technique for identification of binding sites from a set of upstream regions of genes, that could possibly be transcriptionally co-regulated and hence might share similar transcription factor binding sites. By utilizing two key features of such binding sites (i.e. their high sequence similarities and their relatively high frequencies compared to other sequence fragments), we have formulated this problem as a cluster identification problem. That is to identify and extract data clusters from a noisy background. While the classical data clustering problem (partitioning a data set into clusters sharing common or similar features) has been extensively studied, there is no general algorithm for solving the problem of identifying data clusters from a noisy background. In this paper, we present a novel algorithm for solving such a problem. We have proved that a cluster identification problem, under our definition, can be rigorously and efficiently solved through searching for substrings with special properties in a linear sequence. We have also developed a method for assessing the statistical significance of each identified cluster, which can be used to rule out accidental data clusters. We have implemented the cluster identification algorithm and the statistical significance analysis method as a computer software CUBIC. Extensive testing on CUBIC has been carried out. We present here a few applications of CUBIC on challenging cases of binding site identification.


Author(s):  
Dandan Huang ◽  
Zhao Wang ◽  
Yao Zhou ◽  
Qian Liang ◽  
Pak Chung Sham ◽  
...  

Abstract Summary Sampling of control variants having matched properties with input variants is widely used in enrichment analysis of genome-wide association studies/quantitative trait loci and negative data construction for pathogenic/regulatory variant prediction methods. Spurious enrichment results because of confounding factors, such as minor allele frequency and linkage disequilibrium pattern, can be avoided by calibration of statistical significance based on matched controls. Here, we presented vSampler which can generate sets of randomly drawn variants with comprehensive choices of matching properties, such as tissue/cell type-specific epigenomic features. Importantly, the development of a novel data structure and sampling algorithms for vSampler makes it significantly fast than existing tools. Availability and implementation vSampler web server and local program are available at http://mulinlab.org/vsampler. Supplementary information Supplementary data are available at Bioinformatics online.


2021 ◽  
Vol 22 (12) ◽  
pp. 6454
Author(s):  
Arina O. Degtyareva ◽  
Elena V. Antontseva ◽  
Tatiana I. Merkulova

The vast majority of the genetic variants (mainly SNPs) associated with various human traits and diseases map to a noncoding part of the genome and are enriched in its regulatory compartment, suggesting that many causal variants may affect gene expression. The leading mechanism of action of these SNPs consists in the alterations in the transcription factor binding via creation or disruption of transcription factor binding sites (TFBSs) or some change in the affinity of these regulatory proteins to their cognate sites. In this review, we first focus on the history of the discovery of regulatory SNPs (rSNPs) and systematized description of the existing methodical approaches to their study. Then, we brief the recent comprehensive examples of rSNPs studied from the discovery of the changes in the TFBS sequence as a result of a nucleotide substitution to identification of its effect on the target gene expression and, eventually, to phenotype. We also describe state-of-the-art genome-wide approaches to identification of regulatory variants, including both making molecular sense of genome-wide association studies (GWAS) and the alternative approaches the primary goal of which is to determine the functionality of genetic variants. Among these approaches, special attention is paid to expression quantitative trait loci (eQTLs) analysis and the search for allele-specific events in RNA-seq (ASE events) as well as in ChIP-seq, DNase-seq, and ATAC-seq (ASB events) data.


Author(s):  
Ravi K. Dinesh ◽  
Benjamin Barnhill ◽  
Anoj Ilanges ◽  
Lizhen Wu ◽  
Daniel A. Michelson ◽  
...  

Sign in / Sign up

Export Citation Format

Share Document