FROM BINDING MOTIFS IN CHIP-SEQ DATA TO IMPROVED MODELS OF TRANSCRIPTION FACTOR BINDING SITES

Chromatin immunoprecipitation followed by deep sequencing (ChIP-Seq) became a method of choice to locate DNA segments bound by different regulatory proteins. ChIP-Seq produces extremely valuable information to study transcriptional regulation. The wet-lab workflow is often supported by downstream computational analysis including construction of models of nucleotide sequences of transcription factor binding sites in DNA, which can be used to detect binding sites in ChIP-Seq data at a single base pair resolution. The most popular TFBS model is represented by positional weight matrix (PWM) with statistically independent positional weights of nucleotides in different columns; such PWMs are constructed from a gapless multiple local alignment of sequences containing experimentally identified TFBSs. Modern high-throughput techniques, including ChIP-Seq, provide enough data for careful training of advanced models containing more parameters than PWM. Yet, many suggested multiparametric models often provide only incremental improvement of TFBS recognition quality comparing to traditional PWMs trained on ChIP-Seq data. We present a novel computational tool, diChIPMunk, that constructs TFBS models as optimal dinucleotide PWMs, thus accounting for correlations between nucleotides neighboring in input sequences. diChIPMunk utilizes many advantages of ChIPMunk, its ancestor algorithm, accounting for ChIP-Seq base coverage profiles ("peak shape") and using the effective subsampling-based core procedure which allows processing of large datasets. We demonstrate that diPWMs constructed by diChIPMunk outperform traditional PWMs constructed by ChIPMunk from the same ChIP-Seq data. Software website: http://autosome.ru/dichipmunk/

Download Full-text

Application of alternative de novo motif recognition models for analysis of structural heterogeneity of transcription factor binding sites: a case study of FOXA2 binding sites

Vavilov Journal of Genetics and Breeding ◽

10.18699/vj21.002 ◽

2021 ◽

Vol 25 (1) ◽

pp. 7-17

Author(s):

A. V. Tsukanov ◽

V. G. Levitsky ◽

T. I. Merkulova

Keyword(s):

Transcription Factor ◽

Binding Sites ◽

De Novo ◽

Transcription Factor Binding Sites ◽

Structural Heterogeneity ◽

Transcription Factor Binding ◽

Factor Binding ◽

Model Training ◽

Motif Recognition ◽

Positional Weight

The most popular model for the search of ChIP-seq data for transcription factor binding sites (TFBS) is the positional weight matrix (PWM). However, this model does not take into account dependencies between nucleotide occurrences in different site positions. Currently, two recently proposed models, BaMM and InMoDe, can do as much. However, application of these models was usually limited only to comparing their recognition accuracies with that of PWMs, while none of the analyses of the co-prediction and relative positioning of hits of different models in peaks has yet been performed. To close this gap, we propose the pipeline called MultiDeNA. This pipeline includes stages of model training, assessing their recognition accuracy, scanning ChIP-seq peaks and their classif ication based on scan results. We applied our pipeline to 22 ChIP-seq datasets of TF FOXA2 and considered PWM, dinucleotide PWM (diPWM), BaMM and InMoDe models. The combination of these four models allowed a signif icant increase in the fraction of recognized peaks compared to that for the sole PWM model: the increase was 26.3 %. The BaMM model provided the main contribution to the recognition of sites. Although the major fraction of predicted peaks contained TFBS of different models with coincided positions, the medians of the fraction of peaks containing the predictions of sole models were 1.08, 0.49, 4.15 and 1.73 % for PWM, diPWM, BaMM and InMoDe, respectively. Thus, FOXA2 BSs were not fully described by only a sole model, which indicates theirs heterogeneity. We assume that the BaMM model is the most successful in describing the structure of the FOXA2 BS in ChIP-seq datasets under study.

Download Full-text

Mining Discriminative Distance Context of Transcription Factor Binding Sites on ChIP Enriched Regions

Bioinformatics Research and Applications - Lecture Notes in Computer Science ◽

10.1007/978-3-540-72031-7_31 ◽

2007 ◽

pp. 338-349

Author(s):

Hyunmin Kim ◽

Katherina J. Kechris ◽

Lawrence Hunter

Keyword(s):

Transcription Factor ◽

Binding Sites ◽

Transcription Factor Binding Sites ◽

Transcription Factor Binding ◽

Factor Binding ◽

On Chip

Download Full-text

Discovering a less-is-more effect to select transcription factor binding sites informative for motif inference

10.1101/2020.11.29.402941 ◽

2020 ◽

Author(s):

Jinrui Xu ◽

Jiahao Gao ◽

Mark Gerstein

Keyword(s):

Transcription Factor ◽

Statistical Methods ◽

Binding Sites ◽

Transcription Factor Binding Sites ◽

Transcription Factor Binding ◽

Binding Motifs ◽

Less Is More ◽

Factor Binding ◽

Strong Binding ◽

Binding Regions

ABSTRACTMany statistical methods have been developed to infer the binding motifs of a transcription factor (TF) from a subset of its numerous binding regions in the genome. We refer to such regions, e.g. detected by ChIP-seq, as binding sites. The sites with strong binding signals are selected for motif inference. However, binding signals do not necessarily indicate the existence of target motifs. Moreover, even strong binding signals can be spurious due to experimental artifacts. Here, we observe that such uninformative sites without target motifs tend to be “crowded” -- i.e. have many other TF binding sites present nearby. In addition, we find that even if a crowded site contains recognizable target motifs, it can still be uninformative for motif inference due to the presence of interfering motifs from other TFs. We propose using less crowded and shorter binding sites in motif interference and develop specific recommendations for carrying this out. We find our recommendations substantially improve the resulting motifs in various contexts by 30%-70%, implying a “less-is-more” effect.

Download Full-text