Pronounced sequence specificity of the TET enzyme catalytic domain guides its cellular function

2021 ◽  
Author(s):  
Mirunalini Ravichandran ◽  
Dominik Rafalski ◽  
Oscar Ortega-Recalde ◽  
Claudia I Davies ◽  
Cassandra R Glanfield ◽  
...  

TET (ten-eleven translocation) enzymes catalyze the oxidation of 5-methylcytosine bases in DNA, thus driving active and passive DNA demethylation. Here, we report that the catalytic cores of mammalian TET enzymes favor CpGs embedded within bHLH and bZIP transcription factor binding sites, with 250-fold preference in vitro. Crystal structures and molecular dynamics calculations show that sequence preference is caused by intra-substrate interactions and CpG flanking sequence indirectly affecting enzyme conformation. TET sequence preferences are physiologically relevant as they explain the rates of DNA demethylation in TET-rescue experiments in culture and in vivo within the zygote and germline. Most and least favorable TET motifs represent DNA sites that are bound by methylation-sensitive immediate-early transcription factors and OCT4, respectively, illuminating TET function in transcriptional responses and pluripotency support.

One-Sentence Summary: The catalytic domains of the enzymes that facilitate passive and drive active DNA demethylation have intrinsic sequence preferences that target DNA demethylation to bHLH and bZIP transcription factor binding sites.

2021 ◽  
Vol 11 (11) ◽  
pp. 5123
Author(s):  
Maiada M. Mahmoud ◽  
Nahla A. Belal ◽  
Aliaa Youssif

Transcription factors (TFs) are proteins that control the transcription of a gene from DNA to messenger RNA (mRNA). Each TF binds to a specific DNA sequence called a binding site. Transcription factor binding sites have not yet been completely identified, a challenge that can be approached computationally as a classification problem in machine learning. In this paper, the prediction of SP1 transcription factor binding sites on human chromosome 1 is presented using different classification techniques, and a model using voting is proposed. The highest Area Under the Curve (AUC) achieved is 0.97 using K-Nearest Neighbors (KNN), and 0.95 using the proposed voting technique. However, the proposed voting technique is more efficient with noisy data. This study highlights the applicability and significance of the voting technique for the prediction of binding sites, and the strong performance of KNN on this type of data.
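To make the idea of combining classifiers by voting concrete, the following is a minimal sketch in plain Python, not the authors' model. Everything in it is an assumption made for illustration: the toy 10-base sequences in `TRAIN`, the choice of Hamming distance for KNN, the two extra voters (`gc_predict`, `motif_predict`), the 0.7 GC threshold, and the `GGGCGG` GC-box motif used as a stand-in for an SP1-like site. The abstract does not say which classifiers take part in the vote or how the sequences are encoded.

```python
from collections import Counter

# Toy training set (hypothetical): 1 = binding site, 0 = background.
TRAIN = [
    ("GGGGCGGGGC", 1),
    ("TGGGCGGGGC", 1),
    ("GGGGCGGAGC", 1),
    ("ATTATAATTA", 0),
    ("TTGACATATA", 0),
    ("ACATTTAGAT", 0),
]

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def knn_predict(train, query, k=3):
    """Label the query by majority among its k nearest training sequences."""
    nearest = sorted(train, key=lambda item: hamming(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def gc_predict(query, threshold=0.7):
    """Predict a site when the G+C fraction reaches an assumed threshold."""
    gc_fraction = sum(base in "GC" for base in query) / len(query)
    return 1 if gc_fraction >= threshold else 0

def motif_predict(query, motif="GGGCGG"):
    """Predict a site when the assumed GC-box motif occurs exactly."""
    return 1 if motif in query else 0

def vote_predict(train, query):
    """Hard majority vote over the three base classifiers."""
    votes = [
        knn_predict(train, query),
        gc_predict(query),
        motif_predict(query),
    ]
    return Counter(votes).most_common(1)[0][0]

if __name__ == "__main__":
    for seq in ("GGGGCGGGGT", "ATTTTAATTA", "GGGACGGGGC"):
        print(seq, vote_predict(TRAIN, seq))
```

In this sketch the third query carries a single-base change that breaks the exact motif, so `motif_predict` votes against it while the other two voters still agree. This shows how a majority vote can absorb one voter's error, under the assumptions stated above.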

