scholarly journals Estimating the functional impact of INDELs in transcription factor binding sites: a genome-wide landscape

2016 ◽  
Author(s):  
Esben Eickhardt ◽  
Thomas Damm Als ◽  
Jakob Grove ◽  
Anders Dupont Boerglum ◽  
Francesco Lescai

Abstract

Background
Variants in transcription factor binding sites (TFBSs) may have important regulatory effects, as they can alter transcription factor (TF) binding affinities and thereby affect gene expression. With recent advances in sequencing technologies, the number of variants identified in TFBSs has increased, so understanding their role is of significant interest when interpreting next-generation sequencing data. Current methods have two major limitations: they are limited to predicting the functional impact of single nucleotide variants (SNVs), and they often rely on additional experimental data that are laborious and expensive to acquire. We propose a purely bioinformatic method that addresses both limitations while providing comparable results.

Results
Our method uses position weight matrices and a sliding window approach to account for the sequence context of variants, and scores the consequences of both SNVs and INDELs in TFBSs. We tested the accuracy of our method in two ways. First, we compared it to a recent method based on DNase I hypersensitive sites sequencing (DHS-seq) data designed to predict the effects of SNVs: our score correlated significantly with both their DHS-seq data and their prediction model. Second, we called INDELs on publicly available DHS-seq data from ENCODE and found that our score represents the experimental data well. We concluded that our method is reliable and used it to describe the landscape of variation in TFBSs in the human genome by scoring all variants in the 1000 Genomes Project Phase 3. Surprisingly, we found that most insertions have neutral effects on binding sites, while deletions, as expected, have the most severe TFBS-scores. We identified four categories of variants based on their TFBS-scores and tested them for enrichment of variants classified as pathogenic, benign, and protective in ClinVar: the variants with the most negative TFBS-scores showed the most significant enrichment for pathogenic variants.

Conclusions
Our method addresses key shortcomings of currently available bioinformatic tools in predicting the effects of INDELs in TFBSs, and provides an unprecedented window into the genome-wide landscape of INDELs, their predicted influence on TF binding, and their potential relevance for human diseases. We thus offer an additional tool to help prioritise non-coding variants in sequencing studies.
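The abstract describes scoring SNVs and INDELs with position weight matrices and a sliding window over the variant's sequence context. The Python below is a minimal sketch of that general idea under our own assumptions, not the authors' implementation: a log2-odds PWM against a uniform background, forward-strand scanning only, and a score defined as the best alternate-allele window minus the best reference-allele window among windows overlapping the variant. The function names, the toy 3-bp motif, and its probabilities are all hypothetical.

```python
import math

BASES = "ACGT"


def log_odds_pwm(prob_rows, background=0.25, pseudocount=1e-3):
    # Convert per-position base probabilities into log2-odds scores
    # against a uniform background (an assumption of this sketch).
    return [{b: math.log2((row[b] + pseudocount) / background) for b in BASES}
            for row in prob_rows]


def best_window_score(seq, pwm, region_start, region_end):
    # Highest PWM score over all full-length windows overlapping the variant
    # region [region_start, region_end). When the region is empty (a deletion
    # written without an anchor base), only windows spanning the junction count.
    width = len(pwm)
    best = -math.inf
    for start in range(len(seq) - width + 1):
        end = start + width
        if start < region_end and end > region_start:
            score = sum(pwm[i][seq[start + i]] for i in range(width))
            best = max(best, score)
    return best


def variant_score(left, ref, alt, right, pwm):
    # Hypothetical TFBS-style score: best alt window minus best ref window.
    # Negative suggests loss of binding, positive suggests gain.
    width = len(pwm)
    if min(len(left), len(right)) < width - 1:
        raise ValueError("flanks must be at least motif width - 1 long")
    p = len(left)
    ref_best = best_window_score(left + ref + right, pwm, p, p + len(ref))
    alt_best = best_window_score(left + alt + right, pwm, p, p + len(alt))
    return alt_best - ref_best


# Toy motif preferring "TGA" (made-up probabilities, for illustration only).
TOY_PWM = log_odds_pwm([
    {"A": 0.01, "C": 0.01, "G": 0.01, "T": 0.97},
    {"A": 0.01, "C": 0.01, "G": 0.97, "T": 0.01},
    {"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01},
])

snv = variant_score("CCCT", "G", "C", "ACCC", TOY_PWM)       # TGA -> TCA
deletion = variant_score("CCC", "TG", "T", "ACCC", TOY_PWM)  # TGA -> TA (anchored)
print(f"SNV: {snv:.2f}, deletion: {deletion:.2f}")
```

Because alleles of different lengths are scored in their own spliced sequences, the same window logic covers SNVs, insertions, and deletions. In this toy case the deletion removes more of the motif than the SNV and so receives a more negative score. A practical tool would also scan the reverse complement and use curated matrices.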

PLoS ONE ◽  
2009 ◽  
Vol 4 (10) ◽  
pp. e7526 ◽  
Author(s):  
Alfredo Mendoza-Vargas ◽  
Leticia Olvera ◽  
Maricela Olvera ◽  
Ricardo Grande ◽  
Leticia Vega-Alvarado ◽  
...  

PeerJ ◽  
2016 ◽  
Vol 4 ◽  
pp. e2056 ◽  
Author(s):  
Yevgeny Nikolaichik ◽  
Aliaksandr U. Damienikan

The majority of bacterial genome annotations are currently automated and based on a "gene by gene" approach. Regulatory signals and operon structures are rarely taken into account, which often results in incomplete and even incorrect gene function assignments. Here we present SigmoID, a cross-platform (OS X, Linux and Windows) open-source application that aims to simplify the identification of transcription regulatory sites (promoters, transcription factor binding sites and terminators) in bacterial genomes and to assist in correcting annotations in accordance with regulatory information. SigmoID combines a user-friendly graphical interface to well-known command-line tools with a genome browser for visualising regulatory elements in their genomic context. Integrated access to online databases with regulatory information (RegPrecise and RegulonDB) and to web-based search engines speeds up genome analysis and simplifies correction of genome annotation. We demonstrate some features of SigmoID by constructing a series of regulatory protein binding site profiles for two groups of bacteria: soft rot Enterobacteriaceae (Pectobacterium and Dickeya spp.) and Pseudomonas spp. Furthermore, we inferred over 900 transcription factor binding sites and alternative sigma factor promoters in the annotated genome of Pectobacterium atrosepticum. These regulatory signals control putative transcription units covering about 40% of the P. atrosepticum chromosome. Reviewing the annotation in cases where it did not fit the regulatory information allowed us to correct product and gene names for over 300 loci.


2013 ◽  
Vol 6 (1) ◽  
pp. 30 ◽  
Author(s):  
Daniel Savic ◽  
Jason Gertz ◽  
Preti Jain ◽  
Gregory M Cooper ◽  
Richard M Myers
