enDNA-Prot: Identification of DNA-Binding Proteins by Applying Ensemble Learning

2014 ◽  
Vol 2014 ◽  
pp. 1-10 ◽  
Author(s):  
Ruifeng Xu ◽  
Jiyun Zhou ◽  
Bin Liu ◽  
Lin Yao ◽  
Yulan He ◽  
...  

DNA-binding proteins are crucial for various cellular processes, such as recognition of specific nucleotides, regulation of transcription, and regulation of gene expression. Developing an effective model for identifying DNA-binding proteins is an urgent research problem. Many methods have been proposed so far, but most of them rely on a single classifier and cannot make full use of the large number of negative samples to improve prediction performance. This study proposed a predictor called enDNA-Prot for DNA-binding protein identification that employs ensemble learning. Experimental results showed that enDNA-Prot was comparable with DNA-Prot and outperformed DNAbinder and iDNA-Prot, with improvements of 3.97–9.52% in ACC and 0.08–0.19 in MCC. Furthermore, when the benchmark dataset was expanded with negative samples, enDNA-Prot outperformed the three existing methods by 2.83–16.63% in ACC and 0.02–0.16 in MCC. These results indicate that enDNA-Prot is an effective method for DNA-binding protein identification and that expanding the training dataset with negative samples can improve its performance. For the convenience of experimental scientists, we developed a user-friendly web server for enDNA-Prot, which is freely accessible to the public.
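The abstract reports performance as ACC (accuracy) and MCC (Matthews correlation coefficient). A minimal sketch of the standard definitions, computed from binary confusion-matrix counts; the function name `acc_mcc` and the example counts are hypothetical and not taken from the enDNA-Prot paper.

```python
import math


def acc_mcc(tp: int, tn: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (ACC, MCC) from binary confusion-matrix counts.

    ACC = (TP + TN) / (TP + TN + FP + FN)
    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    """
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    acc = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report MCC as 0 when any row or column total is 0.
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    return acc, mcc
```

For hypothetical counts `acc_mcc(tp=90, tn=80, fp=20, fn=10)`, ACC is 0.85. MCC ranges from -1 to 1 and, unlike ACC, stays informative when positives and negatives are imbalanced, which matters once a benchmark is expanded with extra negative samples.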

2015 ◽  
Vol 11 (4) ◽  
pp. 1110-1118 ◽  
Author(s):  
Sony Malhotra ◽  
Ramanathan Sowdhamini

The distribution of GO molecular functions across different SCOP DNA-binding folds was studied. The majority of the folds were observed to perform more than one molecular function. This supports the notion that most DNA-binding proteins may have evolved divergently.


RSC Advances ◽  
2018 ◽  
Vol 8 (50) ◽  
pp. 28367-28375 ◽  
Author(s):  
Kuan-Lin Chen ◽  
Jen-Hao Cheng ◽  
Chih-Yang Lin ◽  
Yen-Hua Huang ◽  
Cheng-Yang Huang

Single-stranded DNA-binding proteins (SSBs) are essential to cells as they participate in DNA metabolic processes, such as DNA replication, repair, and recombination.


Author(s):  
Yanping Zhang ◽  
Ya Gao ◽  
Jianwei Ni ◽  
Pengcheng Chen ◽  
Xiaosheng Wang

Aim and Objective: Given the rapidly increasing amount of molecular biology data available, computational methods of low complexity are needed to infer protein structure, function, and evolution. Method: In this work, we propose a novel method, FermatS, which extracts feature information from protein sequences based on global position information from the curve and local position representation from normalized moments of inertia. Furthermore, we use the features generated by FermatS to analyze the similarity/dissimilarity of nine ND5 proteins and to build a prediction model for DNA-binding proteins based on logistic regression with 5-fold cross-validation. Results: In the similarity/dissimilarity analysis of nine ND5 proteins, the results are consistent with evolutionary theory. Moreover, the method can effectively predict DNA-binding proteins in realistic situations. Conclusion: The findings demonstrate that the proposed method is effective for comparing, recognizing and predicting protein sequences. The main code and datasets can be downloaded from https://github.com/GaoYa1122/FermatS.
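The abstract evaluates a logistic-regression predictor with 5-fold cross-validation. The sketch below shows that evaluation protocol in plain Python on generic numeric feature vectors. It does not reproduce the FermatS feature extraction, and the function names and gradient-descent settings (`lr`, `epochs`) are illustrative assumptions.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> list[list[int]]:
    """Shuffle 0..n-1 and split them into k disjoint, near-equal test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def train_logreg(X, y, lr=0.1, epochs=500):
    """Fit weights by batch gradient descent on the log-loss; w[-1] is the bias."""
    d = len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            # zip(w, xi) stops after d terms, so the bias is added separately.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]) - yi
            for j in range(d):
                grad[j] += err * xi[j]
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def predict(w, x) -> int:
    """Label 1 when the predicted probability is at least 0.5."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + w[-1] >= 0 else 0


def cross_validate(X, y, k=5, seed=0) -> float:
    """Mean held-out accuracy of logistic regression over k folds."""
    accs = []
    for test in k_fold_indices(len(X), k, seed):
        held_out = set(test)
        train = [i for i in range(len(X)) if i not in held_out]
        w = train_logreg([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(w, X[i]) == y[i] for i in test)
        accs.append(correct / len(test))
    return sum(accs) / len(accs)
```

Each sample is tested exactly once and never appears in its own training fold. A stratified split, which keeps the class ratio in every fold, is often preferred for imbalanced protein datasets; plain shuffled folds keep the sketch short.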


2020 ◽  
Vol 17 (4) ◽  
pp. 258-270
Author(s):  
Gaofeng Pan ◽  
Jiandong Wang ◽  
Liang Zhao ◽  
William Hoskins ◽  
Jijun Tang

Background: DNA-binding proteins are essential to many biomolecular functions. Traditional experimental methods are expensive and time-consuming, so computational methods that can predict whether a protein binds DNA are very helpful to researchers. Machine learning has been widely used in many research areas, and many researchers have proposed machine learning methods for DNA-binding protein prediction; this paper highlights their advantages and disadvantages. Objective: Many computational methods can predict DNA-binding proteins, each using different features and classifier algorithms. This paper reviews these methods to identify common procedures that can help researchers develop more accurate methods. Methods: First, the information stored in protein and gene sequences is presented; this information is the basis for finding the patterns that lead to binding. Then, feature extraction methods and classifier algorithms are discussed. Finally, some commonly used benchmark datasets are analysed and the methods are evaluated on them. Conclusion: In this review, we analyzed several popular computational methods for predicting DNA-binding proteins and highlighted many features necessary to build an accurate DNA-binding protein classifier, which can help researchers build more useful computational tools. Several current machine learning methods already perform well at predicting DNA-binding proteins, and their performance can be improved further by using different kinds of features and classifiers.
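As one concrete example of the sequence-derived features such reviews discuss, amino acid composition (AAC) is a widely used baseline feature for DNA-binding protein prediction. A minimal sketch; the review itself may define AAC differently or emphasize other features, and the name `aac` is illustrative.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, in a fixed order


def aac(sequence: str) -> list[float]:
    """Amino acid composition: fraction of each standard residue in the sequence.

    Non-standard symbols (e.g. X, B, Z) are ignored when counting.
    """
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for ch in sequence.upper():
        if ch in counts:
            counts[ch] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]
```

The result is a fixed-length 20-dimensional vector regardless of sequence length, which is what lets proteins of different lengths be fed to the same classifier. It discards residue order, which is why reviews typically pair it with order-aware or evolutionary features.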


Author(s):  
Tural Aksel ◽  
Zanlin Yu ◽  
Yifan Cheng ◽  
Shawn M. Douglas

Correct reconstruction of macromolecular structure by cryo-electron microscopy relies on accurate determination of the orientation of single-particle images. For small (<100 kDa) DNA-binding proteins, obtaining particle images with sufficiently asymmetric features to correctly guide alignment is challenging. DNA nanotechnology was conceived as a potential tool for building host nanostructures to prescribe the locations and orientations of docked proteins. We used DNA origami to construct molecular goniometers—instruments to precisely orient objects—to dock a DNA-binding protein on a double-helix stage that has user-programmable tilt and rotation angles. Each protein orientation maps to a distinct barcode pattern specifying particle classification and angle assignment. We used goniometers to obtain a 6.5 Å structure of BurrH, an 82-kDa DNA-binding protein whose helical pseudosymmetry prevents accurate image orientation using classical cryo-EM. Our approach should be adaptable for other DNA-binding proteins, and a wide variety of other small proteins, by fusing DNA binding domains to them.


2014 ◽  
Vol 602-605 ◽  
pp. 1614-1617
Author(s):  
Ming Hai Yao ◽  
Na Wang

Identifying the structure of DNA-binding proteins is of great significance for studying the mechanisms of gene expression regulation. In this paper, a new recognition method is proposed to identify the super-secondary structure and structural domains of DNA-binding proteins. Nucleotide transition probabilities are calculated from the known binding-site sequences of DNA-binding proteins. Mouse data downloaded from TRANSFAC are used to build the recognition models for binding-protein super-secondary structure. The probability score is calculated from the transition probabilities of the binding site and of the background. Unlike conventional methods, this method uses neither the amino acid sequence of the protein nor homologous proteins. To verify the validity of the algorithm, experiments were performed on 10 DNA-binding proteins from Drosophila and yeast. The experimental results show that the method achieves very good recognition results.
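The abstract scores sequences by comparing nucleotide transition probabilities estimated from known binding sites against a background model. A minimal sketch, assuming a first-order Markov chain over nucleotides with add-one pseudocounts and a log-likelihood-ratio score; the paper's exact model order, smoothing, and scoring are not stated here, and the function names and example sequences below are hypothetical toy data, not TRANSFAC entries.

```python
import math

BASES = "ACGT"


def transition_probs(seqs, pseudocount=1.0):
    """Estimate first-order transition probabilities P(b | a) from DNA sequences.

    A pseudocount is added to every transition so unseen pairs get nonzero probability.
    """
    counts = {a: {b: pseudocount for b in BASES} for a in BASES}
    for s in seqs:
        s = s.upper()
        for a, b in zip(s, s[1:]):
            if a in counts and b in counts[a]:
                counts[a][b] += 1
    return {a: {b: counts[a][b] / sum(counts[a].values()) for b in BASES} for a in BASES}


def log_odds_score(seq, site_model, background_model):
    """Sum of log(P_site(b|a) / P_background(b|a)) over adjacent base pairs."""
    s = seq.upper()
    score = 0.0
    for a, b in zip(s, s[1:]):
        if a in site_model and b in site_model[a]:  # skip pairs with non-ACGT symbols
            score += math.log(site_model[a][b] / background_model[a][b])
    return score
```

For example, with `site = transition_probs(["TATAAA", "TATAAT", "TATATA"])` and `bg = transition_probs(["GCGCGC", "CGCGCG", "GGCCGG"])`, an AT-rich query scores above zero and a GC-rich one below zero. A positive score means the sequence is better explained by the binding-site model than by the background.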


2014 ◽  
Author(s):  
Toshitsugu Fujita ◽  
Hodaka Fujii

Background: Comprehensive understanding of mechanisms of genome functions requires identification of molecules interacting with genomic regions of interest in vivo. We have developed the insertional chromatin immunoprecipitation (iChIP) technology to isolate specific genomic regions retaining molecular interactions and identify their associated molecules. iChIP consists of locus-tagging and affinity purification. The recognition sequences of an exogenous DNA-binding protein such as LexA are inserted into a genomic region of interest in the cell to be analyzed. The exogenous DNA-binding protein fused with a tag(s) is expressed in the cell and the target genomic region is purified with an antibody against the tag(s). In this study, we developed the iChIP system using recombinant DNA-binding proteins to make iChIP more straightforward. Results: In this system, recombinant 3xFNLDD-D (r3xFNLDD-D) consisting of the 3xFLAG-tag, a nuclear localization signal, the DNA-binding domain plus the dimerization domain of the LexA protein, and the Dock-tag is used for isolation of specific genomic regions. 3xFNLDD-D was expressed using a silkworm-baculovirus expression system and purified by affinity purification. iChIP using r3xFNLDD-D could efficiently isolate the single-copy chicken Pax5 (cPax5) locus, in which LexA binding elements were inserted, with negligible contamination of other genomic regions. In addition, we could detect RNA associated with the cPax5 locus using this form of the iChIP system combined with RT-PCR. Conclusions: The iChIP system using r3xFNLDD-D can isolate specific genomic regions retaining molecular interactions without expression of the exogenous DNA-binding protein in the cell to be analyzed. iChIP using r3xFNLDD-D would be more straightforward and useful for analysis of specific genomic regions to elucidate their functions.


2021 ◽  
Vol 12 ◽  
Author(s):  
Yuran Jia ◽  
Shan Huang ◽  
Tianjiao Zhang

DNA-binding proteins (DBPs) are proteins with dedicated DNA-binding domains and are associated with many important molecular biological mechanisms. The rapid development of computational methods has made it possible to predict DBPs on a large scale; however, existing methods do not fully integrate DBP-related features, resulting in coarse prediction results. In this article, we develop a DNA-binding protein identification method called KK-DBP. To improve prediction accuracy, we propose a feature extraction method that fuses multiple PSSM features. The experimental results show a prediction accuracy of 81.22% on the independent test dataset PDB186, the highest of all existing methods.
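The abstract says KK-DBP fuses multiple PSSM (position-specific scoring matrix) features but does not list them. As a hedged illustration only, the sketch below derives two feature groups of a kind commonly built from an L × 20 PSSM (column-averaged scores and a 400-dimensional bigram-style feature over adjacent residues) and fuses them by concatenation. These particular features, the logistic scaling, and all function names are assumptions, not KK-DBP's published design.

```python
import math


def _scale(v: float) -> float:
    """Squash a raw PSSM score into (0, 1) with a numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def avg_pssm(pssm: list[list[float]]) -> list[float]:
    """20 values: mean of each scaled PSSM column over the L residues."""
    L = len(pssm)
    return [sum(_scale(row[j]) for row in pssm) / L for j in range(20)]


def pssm_bigram(pssm: list[list[float]]) -> list[float]:
    """400 values: mean over adjacent residue pairs of the outer product of scaled rows."""
    L = len(pssm)
    if L < 2:
        raise ValueError("need at least two residues")
    feats = [0.0] * 400
    for k in range(L - 1):
        r1 = [_scale(v) for v in pssm[k]]
        r2 = [_scale(v) for v in pssm[k + 1]]
        for i in range(20):
            for j in range(20):
                feats[20 * i + j] += r1[i] * r2[j]
    return [f / (L - 1) for f in feats]


def fused_pssm_features(pssm: list[list[float]]) -> list[float]:
    """Fuse the two groups by plain concatenation: 20 + 400 = 420 values."""
    return avg_pssm(pssm) + pssm_bigram(pssm)
```

Both groups have fixed length, so proteins of any length map to the same 420-dimensional vector. Concatenation is the simplest fusion strategy; weighted or kernel-level fusion are alternatives the sketch does not attempt.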


2005 ◽  
Vol 11 (20) ◽  
pp. 7354-7361 ◽  
Author(s):  
Mahmut Yasen ◽  
Kazunori Kajino ◽  
Sayaka Kano ◽  
Hiroshi Tobita ◽  
Junji Yamamoto ◽  
...  
