enDNA-Prot: Identification of DNA-Binding Proteins by Applying Ensemble Learning

DNA-binding proteins are crucial for various cellular processes, such as recognition of specific nucleotides, regulation of transcription, and regulation of gene expression. Developing an effective model for identifying DNA-binding proteins is an urgent research problem. Many methods have been proposed to date, but most of them rely on a single classifier and cannot make full use of the large number of negative samples to improve predictive performance. This study proposes a predictor called enDNA-Prot for DNA-binding protein identification that employs ensemble learning. Experimental results showed that enDNA-Prot was comparable with DNA-Prot and outperformed DNAbinder and iDNA-Prot, with improvements in the range of 3.97–9.52% in accuracy (ACC) and 0.08–0.19 in Matthews correlation coefficient (MCC). Furthermore, when the benchmark dataset was expanded with negative samples, enDNA-Prot outperformed the three existing methods by 2.83–16.63% in ACC and 0.02–0.16 in MCC. These results indicate that enDNA-Prot is an effective method for DNA-binding protein identification and that expanding the training dataset with negative samples can improve its performance. For the convenience of experimental scientists, we developed a user-friendly web server for enDNA-Prot that is freely accessible to the public.
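The two metrics quoted in this abstract, ACC and MCC, can be computed from confusion-matrix counts as in the minimal sketch below. The `majority_vote` helper is only an illustrative assumption about how several base classifiers might be combined; the abstract does not state which combination rule enDNA-Prot uses, and all function names here are hypothetical.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, where 1 = DNA-binding."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """ACC: fraction of all samples that were classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def majority_vote(votes):
    """Combine per-classifier prediction lists by strict majority.

    Assumption: simple voting stands in for whatever ensemble rule
    the actual predictor applies.
    """
    return [1 if 2 * sum(column) > len(column) else 0 for column in zip(*votes)]


# Toy data: three hypothetical base classifiers scored on six proteins.
y_true = [1, 1, 1, 0, 0, 0]
votes = [
    [1, 1, 0, 0, 0, 1],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
]
ensemble = majority_vote(votes)
print("ACC:", accuracy(*confusion_counts(y_true, ensemble)))
print("MCC:", mcc(*confusion_counts(y_true, ensemble)))
```

MCC is often reported alongside ACC because it stays informative when negatives greatly outnumber positives, which is the setting the abstract describes when the benchmark is expanded with negative samples.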

Collation and analyses of DNA-binding protein domain families from sequence and structural databanks

Molecular BioSystems ◽

10.1039/c4mb00629a ◽

2015 ◽

Vol 11 (4) ◽

pp. 1110-1118 ◽

Cited By ~ 2

Author(s):

Sony Malhotra ◽

Ramanathan Sowdhamini

Keyword(s):

Dna Binding ◽

Binding Proteins ◽

Binding Protein ◽

Dna Binding Protein ◽

Dna Binding Proteins ◽

Protein Domain ◽

Divergent Evolution ◽

Molecular Function ◽

Molecular Functions

The distribution of GO molecular functions across different SCOP DNA-binding folds was studied. The majority of the folds were observed to perform more than one molecular function. This supports the notion that the majority of DNA-binding proteins might follow divergent evolution.

Limited proteolysis studies on the Escherichia coli single-stranded DNA binding protein. Evidence for a functionally homologous domain in both the Escherichia coli and T4 DNA binding proteins.

Journal of Biological Chemistry ◽

10.1016/s0021-9258(18)32867-9 ◽

1983 ◽

Vol 258 (5) ◽

pp. 3346-3355 ◽

Cited By ~ 18

Author(s):

K R Williams ◽

E K Spicer ◽

M B LoPresti ◽

R A Guggenheimer ◽

J W Chase

Keyword(s):

Escherichia Coli ◽

Dna Binding ◽

Binding Proteins ◽

Binding Protein ◽

Limited Proteolysis ◽

Dna Binding Protein ◽

Dna Binding Proteins ◽

Single Stranded Dna

Characterization of single-stranded DNA-binding protein SsbB from Staphylococcus aureus: SsbB cannot stimulate PriA helicase

RSC Advances ◽

10.1039/c8ra04392b ◽

2018 ◽

Vol 8 (50) ◽

pp. 28367-28375 ◽

Cited By ~ 5

Author(s):

Kuan-Lin Chen ◽

Jen-Hao Cheng ◽

Chih-Yang Lin ◽

Yen-Hua Huang ◽

Cheng-Yang Huang

Keyword(s):

Dna Replication ◽

Dna Binding ◽

Binding Proteins ◽

Binding Protein ◽

Dna Binding Protein ◽

Dna Binding Proteins ◽

Metabolic Processes ◽

Single Stranded Dna

Single-stranded DNA-binding proteins (SSBs) are essential to cells as they participate in DNA metabolic processes, such as DNA replication, repair, and recombination.

FermatS: a novel numerical representation for protein sequence comparison and DNA-binding protein identification

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207323999201117111738 ◽

2020 ◽

Vol 23 ◽

Author(s):

Yanping Zhang ◽

Ya Gao ◽

Jianwei Ni ◽

Pengcheng Chen ◽

Xiaosheng Wang

Keyword(s):

Dna Binding ◽

Protein Identification ◽

Binding Proteins ◽

Protein Sequences ◽

Low Complexity ◽

Dna Binding Protein ◽

Dna Binding Proteins ◽

Numerical Representation ◽

Position Information ◽

Protein Sequence Comparison

Aim and Objective: Given the rapidly increasing amount of molecular biology data available, computational methods of low complexity are necessary to infer protein structure, function, and evolution. Method: In this work, we propose a novel method, FermatS, which extracts feature information from protein sequences based on global position information and local position representation, obtained from the curve and from normalized moments of inertia, respectively. Furthermore, we use the features generated by FermatS to analyze the similarity/dissimilarity of nine ND5 proteins and to build a prediction model for DNA-binding proteins based on logistic regression with 5-fold cross-validation. Results: In the similarity/dissimilarity analysis of nine ND5 proteins, the results are consistent with evolutionary theory. Moreover, this method can effectively predict DNA-binding proteins in realistic situations. Conclusion: The findings demonstrate that the proposed method is effective for comparing, recognizing, and predicting protein sequences. The main code and datasets can be downloaded from https://github.com/GaoYa1122/FermatS.
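The 5-fold cross-validation mentioned in this abstract can be sketched as follows. This is a generic illustration, not the FermatS code: the function names are hypothetical, the classifier is passed in as `fit` and `predict` callables so that logistic regression (or any other model) can be plugged in, and the majority-class model in the usage lines is only a stand-in.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    if n_samples < k:
        raise ValueError("need at least k samples to build k non-empty folds")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Every k-th shuffled index goes to the same fold, so fold sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(features, labels, fit, predict, k=5, seed=0):
    """Return the mean held-out accuracy over k folds."""
    accuracies = []
    for train, test in k_fold_indices(len(labels), k, seed):
        model = fit([features[i] for i in train], [labels[i] for i in train])
        correct = sum(1 for i in test if predict(model, features[i]) == labels[i])
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)


# Stand-in classifier (assumption): always predict the majority training label.
def fit_majority(train_features, train_labels):
    return max(set(train_labels), key=train_labels.count)


def predict_majority(model, feature_vector):
    return model


toy_features = [[float(i)] for i in range(10)]
toy_labels = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
print(cross_validate(toy_features, toy_labels, fit_majority, predict_majority))
```

Each sample is held out exactly once, so the averaged score estimates performance on unseen sequences rather than on the training data.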

Computational Methods for Predicting DNA Binding Proteins

Current Proteomics ◽

10.2174/1570164616666190722141129 ◽

2020 ◽

Vol 17 (4) ◽

pp. 258-270

Author(s):

Gaofeng Pan ◽

Jiandong Wang ◽

Liang Zhao ◽

William Hoskins ◽

Jijun Tang

Keyword(s):

Machine Learning ◽

Dna Binding ◽

Computational Methods ◽

Binding Proteins ◽

Binding Protein ◽

Dna Binding Protein ◽

Dna Binding Proteins ◽

Learning Methods ◽

Machine Learning Methods ◽

Classifier Algorithms

Background: DNA-binding proteins are very important to many biomolecular functions. Traditional experimental methods are expensive and time-consuming, so computational methods that can predict whether a protein is a DNA-binding protein are very helpful to researchers. Machine learning has been widely used in many research areas. Many researchers have proposed machine learning methods for DNA-binding protein prediction, and this paper highlights their advantages and disadvantages. Objective: There are many computational methods that can predict DNA-binding proteins. Each method uses different features and different classifier algorithms. In this paper, a review of these methods is provided to identify common procedures that can help researchers develop more accurate methods. Methods: First, the information stored in the protein sequence and gene sequence is presented. That information is the basis for finding the patterns that lead to binding. Then, feature extraction methods and classifier algorithms are discussed. Finally, some commonly used benchmark datasets are analysed and evaluated with these methods. Conclusion: In this review, we analyzed some popular computational methods for predicting DNA-binding proteins. From those methods, we highlighted many features necessary to build an accurate DNA-binding protein classifier. This can also help researchers build more useful computational tools. Currently, there are some machine learning methods with good performance in predicting DNA-binding proteins. The performance can be improved by using different kinds of features and classifiers.

Molecular goniometers for single-particle cryo-EM of DNA-binding proteins

10.1101/2020.02.27.968883 ◽

2020 ◽

Cited By ~ 1

Author(s):

Tural Aksel ◽

Zanlin Yu ◽

Yifan Cheng ◽

Shawn M. Douglas

Keyword(s):

Dna Binding ◽

Single Particle ◽

Binding Proteins ◽

Binding Protein ◽

Accurate Determination ◽

Dna Nanotechnology ◽

Dna Binding Protein ◽

Dna Binding Proteins ◽

Small Proteins ◽

Particle Images

Correct reconstruction of macromolecular structure by cryo-electron microscopy relies on accurate determination of the orientation of single-particle images. For small (<100 kDa) DNA-binding proteins, obtaining particle images with sufficiently asymmetric features to correctly guide alignment is challenging. DNA nanotechnology was conceived as a potential tool for building host nanostructures to prescribe the locations and orientations of docked proteins. We used DNA origami to construct molecular goniometers (instruments to precisely orient objects) to dock a DNA-binding protein on a double-helix stage that has user-programmable tilt and rotation angles. Each protein orientation maps to a distinct barcode pattern specifying particle classification and angle assignment. We used goniometers to obtain a 6.5 Å structure of BurrH, an 82-kDa DNA-binding protein whose helical pseudosymmetry prevents accurate image orientation using classical cryo-EM. Our approach should be adaptable for other DNA-binding proteins, and a wide variety of other small proteins, by fusing DNA binding domains to them.

The Recognition Method for the Supersecondary Structure of DNA-Binding Protein

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.602-605.1614 ◽

2014 ◽

Vol 602-605 ◽

pp. 1614-1617

Author(s):

Ming Hai Yao ◽

Na Wang

Keyword(s):

Secondary Structure ◽

Dna Binding ◽

Binding Proteins ◽

Gene Expression Regulation ◽

Binding Protein ◽

Transition Probability ◽

Dna Binding Protein ◽

Dna Binding Proteins ◽

Recognition Method ◽

Homologous Proteins

Identifying the structure of DNA-binding proteins is of great significance for studying the mechanisms of gene expression regulation. This paper proposes a new recognition method to identify the super-secondary structure and structural domain of DNA-binding proteins. The nucleotide transition probability is calculated from the binding locus sequences of known DNA-binding proteins. Mouse data downloaded from TRANSFAC are used to build the super-secondary structure recognition models for binding proteins. The probability score is calculated from the transition probabilities of the binding site and the background. Unlike conventional methods, this method uses neither the amino acid sequence of the protein nor homologous proteins. To verify the validity of the algorithm, 10 DNA-binding proteins from Drosophila and yeast are used in the experiment. The experimental results show that the method achieves very good recognition results.
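One plausible reading of the scoring step described in this abstract is a first-order Markov model: estimate nucleotide transition probabilities from binding-site sequences and from background sequences, then score a candidate by the log-odds of the two models. The sketch below follows that reading. The log-odds form, the pseudocount, the function names, and the toy sequences are all assumptions made for illustration and are not taken from the paper.

```python
import math

ALPHABET = "ACGT"


def transition_probabilities(sequences, pseudocount=1.0):
    """Estimate P(next = b | current = a) from a list of DNA sequences.

    The pseudocount keeps every probability non-zero; with no input
    sequences the result is the uniform model (0.25 everywhere).
    """
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for seq in sequences:
        seq = seq.upper()
        for a, b in zip(seq, seq[1:]):
            if a in counts and b in counts[a]:  # skip ambiguous symbols such as N
                counts[a][b] += 1
    probs = {}
    for a in ALPHABET:
        row_total = sum(counts[a].values())
        probs[a] = {b: counts[a][b] / row_total for b in ALPHABET}
    return probs


def log_odds_score(seq, site_probs, background_probs):
    """Sum of log(P_site / P_background) over consecutive nucleotide pairs."""
    seq = seq.upper()
    score = 0.0
    for a, b in zip(seq, seq[1:]):
        if a in site_probs and b in site_probs[a]:
            score += math.log(site_probs[a][b] / background_probs[a][b])
    return score


# Toy data (hypothetical): two binding-site sequences and a uniform background.
site_model = transition_probabilities(["ACACAC", "ACAC"])
background_model = transition_probabilities([])
print(log_odds_score("ACAC", site_model, background_model))
print(log_odds_score("AAAA", site_model, background_model))
```

A positive score means the candidate's transitions look more like the binding-site model than the background, and a negative score means the opposite.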

Efficient isolation of specific genomic regions retaining molecular interactions by the iChIP system using recombinant exogenous DNA-binding proteins

10.1101/006080 ◽

2014 ◽

Cited By ~ 1

Author(s):

Toshitsugu Fujita ◽

Hodaka Fujii

Keyword(s):

Dna Binding ◽

Molecular Interactions ◽

Binding Proteins ◽

Binding Protein ◽

Affinity Purification ◽

Dna Binding Protein ◽

Dna Binding Proteins ◽

Genomic Region ◽

Exogenous Dna ◽

Genomic Regions

Background: Comprehensive understanding of mechanisms of genome functions requires identification of molecules interacting with genomic regions of interest in vivo. We have developed the insertional chromatin immunoprecipitation (iChIP) technology to isolate specific genomic regions retaining molecular interactions and identify their associated molecules. iChIP consists of locus-tagging and affinity purification. The recognition sequences of an exogenous DNA-binding protein such as LexA are inserted into a genomic region of interest in the cell to be analyzed. The exogenous DNA-binding protein fused with a tag(s) is expressed in the cell and the target genomic region is purified with an antibody against the tag(s). In this study, we developed the iChIP system using recombinant DNA-binding proteins to make iChIP more straightforward. Results: In this system, recombinant 3xFNLDD-D (r3xFNLDD-D) consisting of the 3xFLAG-tag, a nuclear localization signal, the DNA-binding domain plus the dimerization domain of the LexA protein, and the Dock-tag is used for isolation of specific genomic regions. 3xFNLDD-D was expressed using a silkworm-baculovirus expression system and purified by affinity purification. iChIP using r3xFNLDD-D could efficiently isolate the single-copy chicken Pax5 (cPax5) locus, in which LexA binding elements were inserted, with negligible contamination of other genomic regions. In addition, we could detect RNA associated with the cPax5 locus using this form of the iChIP system combined with RT-PCR. Conclusions: The iChIP system using r3xFNLDD-D can isolate specific genomic regions retaining molecular interactions without expression of the exogenous DNA-binding protein in the cell to be analyzed. iChIP using r3xFNLDD-D would be more straightforward and useful for analysis of specific genomic regions to elucidate their functions.

KK-DBP: A Multi-Feature Fusion Method for DNA-Binding Protein Identification Based on Random Forest

Frontiers in Genetics ◽

10.3389/fgene.2021.811158 ◽

2021 ◽

Vol 12 ◽

Author(s):

Yuran Jia ◽

Shan Huang ◽

Tianjiao Zhang

Keyword(s):

Dna Binding ◽

Prediction Accuracy ◽

Large Scale ◽

Protein Identification ◽

Feature Fusion ◽

Binding Protein ◽

Rapid Development ◽

Dna Binding Protein ◽

Feature Extraction Method ◽

Independent Test Dataset

A DNA-binding protein (DBP) is a protein with a special DNA-binding domain that is associated with many important molecular biological mechanisms. Rapid development of computational methods has made it possible to predict DBPs on a large scale; however, existing methods do not fully integrate DBP-related features, resulting in imprecise predictions. In this article, we develop a DNA-binding protein identification method called KK-DBP. To improve prediction accuracy, we propose a feature extraction method that fuses multiple PSSM features. The experimental results show a prediction accuracy of 81.22% on the independent test dataset PDB186, which is the highest of all existing methods.
