Positional weight matrices have sufficient prediction power for analysis of noncoding variants

F1000Research ◽  
2022 ◽  
Vol 11 ◽  
pp. 33
Author(s):  
Alexandr Boytsov ◽  
Sergey Abramov ◽  
Vsevolod J. Makeev ◽  
Ivan V. Kulakovskiy

The commonly accepted model for quantifying the specificity of transcription factor binding to DNA is the position weight matrix, also called the position-specific scoring matrix. Position weight matrices are used in thousands of projects and computational tools in regulatory genomics, including prediction of the regulatory potential of single-nucleotide variants. Yet Yan et al. recently presented a new experimental method for the analysis of regulatory variants and, based on its results, reported that "the position weight matrices of most transcription factors lack sufficient predictive power". Here, we re-analyze the rich experimental dataset obtained by Yan et al. and show that appropriately selected position weight matrices can in fact successfully quantify transcription factor binding to alternative alleles.
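Scoring the two alleles of a variant with a PWM can be illustrated with a minimal Python sketch. It assumes a log-odds matrix with additive, position-independent scores and takes the best score over all windows on either strand that overlap the variant. The matrix `TOY_PWM` and all helper names are hypothetical and not taken from the paper. The sketch also does not reproduce how the authors select an appropriate matrix.

```python
import math

# Hypothetical 4-position log-odds PWM (one dict per position); NOT a real TF motif.
TOY_PWM = [
    {"A": 1.2, "C": -1.0, "G": -1.0, "T": -0.5},
    {"A": -1.0, "C": 1.5, "G": -0.8, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.4, "T": -0.7},
    {"A": -0.4, "C": -1.0, "G": -1.0, "T": 1.1},
]

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def window_score(pwm, word):
    """Additive PWM score: each position contributes independently."""
    return sum(col[base] for col, base in zip(pwm, word))


def best_score_over_snv(pwm, seq, snv_pos):
    """Best score over all windows (both strands) that cover snv_pos."""
    width = len(pwm)
    best = -math.inf
    first = max(0, snv_pos - width + 1)
    last = min(snv_pos, len(seq) - width)
    for start in range(first, last + 1):
        word = seq[start:start + width]
        best = max(best, window_score(pwm, word), window_score(pwm, revcomp(word)))
    return best


def allele_score_difference(pwm, flank_left, ref, alt, flank_right):
    """Best alt-allele score minus best ref-allele score around one SNV."""
    pos = len(flank_left)
    ref_seq = flank_left + ref + flank_right
    alt_seq = flank_left + alt + flank_right
    return best_score_over_snv(pwm, alt_seq, pos) - best_score_over_snv(pwm, ref_seq, pos)
```

For example, `allele_score_difference(TOY_PWM, "GG", "A", "T", "CGTGG")` is negative, which in this toy setting suggests that the alternative allele weakens the best predicted site.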

Author(s):  
Sergey Abramov ◽  
Alexandr Boytsov ◽  
Dariia Bykova ◽  
Dmitry D. Penzar ◽  
Ivan Yevshin ◽  
...  

Sequence variants in gene regulatory regions alter gene expression and contribute to the phenotypes of individual cells and of the whole organism, including disease susceptibility and progression. Single-nucleotide variants in enhancers or promoters may affect gene transcription by altering transcription factor binding sites. Differential transcription factor binding at heterozygous genomic loci provides a natural source of information on such regulatory variants. We present a novel approach to calling allele-specific transcription factor binding events at single-nucleotide variants in ChIP-Seq data. The approach takes into account the joint contribution of aneuploidy and local copy number variation, which is estimated directly from the variant calls. We conducted a meta-analysis of more than 7 thousand ChIP-Seq experiments and assembled a database of allele-specific binding events listing more than half a million entries at nearly 270 thousand single-nucleotide polymorphisms for several hundred human transcription factors and cell types. These polymorphisms are enriched for associations with phenotypes of medical relevance and often overlap eQTLs, making them candidates for causality by linking variants with molecular mechanisms. Notably, there is a special class of switching sites, where different transcription factors preferentially bind alternative alleles, thus revealing allele-specific rewiring of molecular circuitry.
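Copy-number imbalance changes what counts as allelic imbalance. A simplified test can sketch this idea. It is not the authors' statistical model. It assumes that the background allelic dosage (BAD, the ratio of major to minor allele copy numbers) at the SNV is already known and that read counts are binomial. Under the null, the reference allele lies on the major or the minor copy with equal probability. The function names are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def mixture_pmf(k, n, bad):
    """Null read-count distribution given a known BAD (assumed >= 1).

    The allele is assumed to sit on the major copy (expected fraction
    bad / (bad + 1)) or the minor copy with equal probability.
    """
    p_major = bad / (bad + 1)
    return 0.5 * binom_pmf(k, n, p_major) + 0.5 * binom_pmf(k, n, 1 - p_major)


def ref_excess_pvalue(ref_reads, alt_reads, bad=1.0):
    """One-sided p-value for observing at least ref_reads reference reads."""
    if bad < 1:
        raise ValueError("BAD is defined here as major/minor copies, so >= 1")
    n = ref_reads + alt_reads
    return sum(mixture_pmf(k, n, bad) for k in range(ref_reads, n + 1))
```

With `bad=1.0` the null reduces to a fair binomial. With a larger BAD, the same 15:5 read split is no longer surprising, which is why ignoring aneuploidy would inflate allele-specific calls.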


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Sebastian Carrasco Pro ◽  
Katia Bulekova ◽  
Brian Gregor ◽  
Adam Labadorf ◽  
Juan Ignacio Fuxman Bass

An amendment to this paper has been published and can be accessed via a link at the top of the paper.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Sergey Abramov ◽  
Alexandr Boytsov ◽  
Daria Bykova ◽  
Dmitry D. Penzar ◽  
Ivan Yevshin ◽  
...  



Epigenomics ◽  
2021 ◽  
Vol 13 (8) ◽  
pp. 613-630
Author(s):  
Vidya Chidambaran ◽  
Xue Zhang ◽  
Valentina Pilipenko ◽  
Xiaoting Chen ◽  
Benjamin Wronowski ◽  
...  

Background: Overlap between the pathways enriched for single nucleotide polymorphisms and for DNA methylation underlying chronic postsurgical pain (CPSP) prompted a pilot study of CPSP-associated methylation quantitative trait loci (meQTL). Materials & methods: Children undergoing spine fusion were recruited prospectively. Genome- and epigenome-wide CPSP association analyses (logistic regression) and DNA methylation-single nucleotide polymorphism association/mediation analyses to identify meQTLs were followed by functional genomics analyses. Results: CPSP (n = 20/58) and non-CPSP groups differed in pain measures. Of 2753 meQTLs, DNA methylation at 127 cytosine–guanine dinucleotides mediated the association of 470 meQTLs with CPSP (p < 0.05). At the PARK16 locus, CPSP risk meQTLs were associated with decreased DNA methylation at RAB7L1 and increased DNA methylation at PM20D1. Corresponding RAB7L1/PM20D1 blood eQTLs (GTEx) and enrichment of the cytosine–guanine dinucleotide loci for histone marks, transcription factor binding sites and ATAC-seq peaks suggest altered transcription factor binding. Conclusion: CPSP-associated meQTLs indicate that epigenetic mechanisms mediate genetic risk. Clinical trial registration: NCT01839461, NCT01731873 (ClinicalTrials.gov).


2019 ◽  
Vol 17 ◽  
pp. 1415-1428 ◽  
Author(s):  
Walter Santana-Garcia ◽  
Maria Rocha-Acevedo ◽  
Lucia Ramirez-Navarro ◽  
Yvon Mbouamboua ◽  
Denis Thieffry ◽  
...  

2013 ◽  
Vol 11 (01) ◽  
pp. 1340004 ◽  
Author(s):  
Ivan Kulakovskiy ◽
Victor Levitsky ◽
Dmitry Oshchepkov ◽
Leonid Bryzgalov ◽
Ilya Vorontsov ◽
...  

Chromatin immunoprecipitation followed by deep sequencing (ChIP-Seq) has become the method of choice for locating DNA segments bound by different regulatory proteins. ChIP-Seq produces extremely valuable information for studying transcriptional regulation. The wet-lab workflow is often supported by downstream computational analysis. This analysis includes constructing sequence models of transcription factor binding sites (TFBSs) in DNA, which can be used to detect binding sites in ChIP-Seq data at single-base-pair resolution. The most popular TFBS model is the positional weight matrix (PWM), in which the positional weights of nucleotides in different columns are statistically independent. Such PWMs are constructed from a gapless multiple local alignment of sequences containing experimentally identified TFBSs. Modern high-throughput techniques, including ChIP-Seq, provide enough data for careful training of advanced models with more parameters than a PWM. Yet many proposed multiparametric models provide only incremental improvements in TFBS recognition quality compared to traditional PWMs trained on ChIP-Seq data. We present a novel computational tool, diChIPMunk, that constructs TFBS models as optimal dinucleotide PWMs (diPWMs), thus accounting for correlations between neighboring nucleotides in the input sequences. diChIPMunk inherits many advantages of its ancestor algorithm, ChIPMunk: it accounts for ChIP-Seq base coverage profiles ("peak shape") and uses an efficient subsampling-based core procedure that allows processing of large datasets. We demonstrate that diPWMs constructed by diChIPMunk outperform traditional PWMs constructed by ChIPMunk from the same ChIP-Seq data. Software website: http://autosome.ru/dichipmunk/
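The difference between a mononucleotide PWM and a dinucleotide PWM can be shown with a short Python sketch. Both models are built from the same gapless alignment of sites, using simple pseudocounts and a uniform background. These choices are assumptions made for illustration. The sketch is not diChIPMunk's optimization procedure and ignores peak shape and subsampling. All function names are hypothetical.

```python
import math
from itertools import product

NUCS = "ACGT"
DINUCS = ["".join(p) for p in product(NUCS, repeat=2)]  # 16 dinucleotides


def mono_pwm(aligned_sites, pseudocount=1.0):
    """Log-odds PWM from a gapless alignment (uniform 0.25 background)."""
    width, n = len(aligned_sites[0]), len(aligned_sites)
    pwm = []
    for i in range(width):
        counts = {b: pseudocount for b in NUCS}
        for site in aligned_sites:
            counts[site[i]] += 1
        total = n + 4 * pseudocount
        pwm.append({b: math.log((counts[b] / total) / 0.25) for b in NUCS})
    return pwm


def di_pwm(aligned_sites, pseudocount=1.0):
    """Dinucleotide PWM: one 16-entry column per pair of adjacent positions."""
    width, n = len(aligned_sites[0]), len(aligned_sites)
    dipwm = []
    for i in range(width - 1):
        counts = {d: pseudocount for d in DINUCS}
        for site in aligned_sites:
            counts[site[i:i + 2]] += 1
        total = n + 16 * pseudocount
        dipwm.append({d: math.log((counts[d] / total) * 16) for d in DINUCS})
    return dipwm


def score_mono(pwm, word):
    """Sum of independent per-position weights."""
    return sum(col[b] for col, b in zip(pwm, word))


def score_di(dipwm, word):
    """Sum of weights of overlapping adjacent dinucleotides."""
    return sum(col[word[i:i + 2]] for i, col in enumerate(dipwm))
```

With the toy sites `["AC", "AC", "GT", "GT"]`, the mononucleotide PWM scores `AC` and the never-observed `AT` identically, because it sees only per-column frequencies. The diPWM captures the dependency and scores `AC` higher.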


2015 ◽  
Vol 32 (4) ◽  
pp. 490-496 ◽  
Author(s):  
Haoyang Zeng ◽  
Tatsunori Hashimoto ◽  
Daniel D. Kang ◽  
David K. Gifford

2021 ◽  
Vol 25 (1) ◽  
pp. 7-17
Author(s):  
A. V. Tsukanov ◽  
V. G. Levitsky ◽  
T. I. Merkulova

The most popular model for searching ChIP-seq data for transcription factor binding sites (TFBS) is the positional weight matrix (PWM). However, this model does not take into account dependencies between nucleotide occurrences at different site positions. Two recently proposed models, BaMM and InMoDe, can account for such dependencies. However, applications of these models have usually been limited to comparing their recognition accuracy with that of PWMs. The co-prediction and relative positioning of hits of different models within peaks have not yet been analyzed. To close this gap, we propose a pipeline called MultiDeNA. This pipeline includes stages of model training, assessment of recognition accuracy, scanning of ChIP-seq peaks, and classification of peaks based on the scan results. We applied the pipeline to 22 ChIP-seq datasets of the TF FOXA2 and considered the PWM, dinucleotide PWM (diPWM), BaMM and InMoDe models. The combination of these four models significantly increased the fraction of recognized peaks compared to that for the PWM model alone: the increase was 26.3 %. The BaMM model provided the main contribution to the recognition of sites. Most predicted peaks contained TFBS of different models at coinciding positions. Even so, the median fractions of peaks containing predictions of only a single model were 1.08, 0.49, 4.15 and 1.73 % for PWM, diPWM, BaMM and InMoDe, respectively. Thus, FOXA2 binding sites (BSs) were not fully described by any single model, which indicates their heterogeneity. We assume that the BaMM model is the most successful in describing the structure of FOXA2 BSs in the ChIP-seq datasets under study.
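Two quantities are reported above: the fraction of peaks recognized by the combination of models, and the fraction recognized by one model only. Python sets can sketch how both are computed. The sketch assumes each model's scan has already been reduced to the set of peak identifiers containing at least one hit above threshold. Model names and peak IDs in the test are hypothetical, and MultiDeNA's training, thresholding and classification stages are not shown.

```python
def recognition_summary(hits_by_model, n_peaks):
    """Fraction of peaks recognized by any model, and by each model alone.

    hits_by_model maps a model name to the set of peak IDs it recognizes
    (hypothetical input format); n_peaks is the total number of peaks.
    """
    all_hits = set().union(*hits_by_model.values())
    summary = {"union": len(all_hits) / n_peaks}
    for model, hits in hits_by_model.items():
        # Peaks recognized by every other model combined.
        others = set().union(*(h for m, h in hits_by_model.items() if m != model))
        summary[model + "_only"] = len(hits - others) / n_peaks
    return summary
```

Comparing `summary["union"]` with the fraction for a single model (for example, the PWM alone) gives the kind of gain the abstract reports.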

