Database and Statistical Analyses of Transcription Factor Binding Sites in the Non-Coding Control Region of JC Virus

Viruses, 2021, Vol 13 (11), pp. 2314
Author(s): Kazuo Nakamichi, Toshio Shimokawa

Archetype JC virus (JCV) establishes a lifelong latent or persistent infection in many healthy individuals. In immunocompromised patients, prototype JCV carrying variable mutations in the non-coding control region (NCCR) causes progressive multifocal leukoencephalopathy (PML), a severe demyelinating disease. This study was conducted to create a database of NCCR sequences annotated with transcription factor binding sites (TFBSs) and to statistically analyze the mutational pattern of the JCV NCCR. JCV NCCRs were extracted from >1000 sequences registered in GenBank, and TFBSs within each NCCR were identified by computer simulation; their prevalence, multiplicity, and location were then examined by statistical analyses. In the NCCRs of prototype JCV, a limited set of TFBS types, which are mainly present in regions D through F of archetype JCV, was significantly reduced. By contrast, modeling of count data revealed that several TFBSs located in regions C and E tended to overlap in the prototype NCCRs. According to the BioGPS database, genes encoding the transcription factors that bind to these TFBSs are expressed not only in the brain but also at peripheral sites. The database and NCCR patterns obtained in this study could serve as a platform for analyzing JCV mutations and pathogenicity.
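
The three quantities examined above (prevalence, multiplicity, and location of binding sites) can be illustrated with a minimal sketch. It is not the authors' pipeline: exact string matching stands in for whatever TFBS prediction tool the study used, the sequences are made-up toy strings rather than real NCCRs, and the motif `GGGCGG` and the helper names `find_sites` and `summarize` are assumptions chosen only for illustration.

```python
import re


def find_sites(sequence, motif):
    """Return 0-based start positions of every motif match, overlaps included."""
    # A lookahead matches at each position without consuming characters,
    # so overlapping occurrences are all reported.
    pattern = re.compile("(?=" + re.escape(motif) + ")")
    return [match.start() for match in pattern.finditer(sequence)]


def summarize(sequences, motif):
    """Compute prevalence, per-sequence multiplicity, and site locations."""
    locations = {name: find_sites(seq, motif) for name, seq in sequences.items()}
    multiplicity = {name: len(pos) for name, pos in locations.items()}
    # Prevalence: fraction of sequences that carry at least one site.
    prevalence = sum(1 for n in multiplicity.values() if n > 0) / len(sequences)
    return prevalence, multiplicity, locations


# Toy, made-up sequences (assumption); these are NOT real JCV NCCRs.
toy_nccrs = {
    "seq_a": "ATGGGCGGTTACGGGCGGA",
    "seq_b": "ATTTACGTTACGTA",
    "seq_c": "GGGCGGGCGG",
}

prevalence, multiplicity, locations = summarize(toy_nccrs, "GGGCGG")
print(prevalence, multiplicity, locations)
```

In this toy input, two of the three sequences contain the motif, and `seq_c` carries two sites that share bases. A real analysis would typically score sites with position weight matrices rather than exact matches.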

2021, Vol 11 (11), pp. 5123
Author(s): Maiada M. Mahmoud, Nahla A. Belal, Aliaa Youssif

Transcription factors (TFs) are proteins that control the transcription of a gene from DNA to messenger RNA (mRNA). TFs bind to a specific DNA sequence called a binding site. Transcription factor binding sites have not yet been completely identified, a challenge that can be approached computationally by framing it as a classification problem in machine learning. In this paper, the prediction of SP1 transcription factor binding sites on human chromosome 1 is presented using different classification techniques, and a model using voting is proposed. The highest Area Under the Curve (AUC) achieved is 0.97 using K-Nearest Neighbors (KNN), compared with 0.95 using the proposed voting technique. However, the proposed voting technique is more efficient with noisy data. This study highlights the applicability and significance of the voting technique for the prediction of binding sites, as well as the strong performance of KNN on this type of data.
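
The idea of combining a KNN classifier with other classifiers through voting can be sketched as follows. This is a minimal illustration under stated assumptions, not the model from the paper: the abstract does not name the classifiers that take part in the vote, so the GC-content rule and the motif rule below are hypothetical stand-ins, the training sequences are made-up toy strings, and Hamming distance is an assumed choice of metric.

```python
from collections import Counter


def hamming(a, b):
    """Count mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, query, k=3):
    """Label a query by majority among its k nearest training sequences."""
    nearest = sorted(train, key=lambda item: hamming(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def gc_rule(query, threshold=0.7):
    """Hypothetical voter: predict a site when GC content is high."""
    gc_fraction = sum(base in "GC" for base in query) / len(query)
    return 1 if gc_fraction >= threshold else 0


def motif_rule(query, motif="GGCGG"):
    """Hypothetical voter: predict a site when an illustrative motif occurs."""
    return 1 if motif in query else 0


def vote(train, query):
    """Hard majority vote over the three voters (1 = binding site)."""
    votes = [knn_predict(train, query), gc_rule(query), motif_rule(query)]
    return 1 if sum(votes) >= 2 else 0


# Toy, made-up training data (assumption): (sequence, label) pairs.
train = [
    ("GGGGCGGGGC", 1),
    ("TGGGCGGGGC", 1),
    ("GGGGCGGAGC", 1),
    ("ATTATACGTA", 0),
    ("TTGACATTAA", 0),
    ("CATGATTACA", 0),
]

predictions = [vote(train, query) for query in ("GGGGCGGGGT", "ATTACATGTA")]
print(predictions)
```

Because the final label needs agreement from at least two voters, a single voter misled by a noisy input can be outvoted, which is one common reason to combine classifiers this way.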

