PredPSD: A Gradient Tree Boosting Approach for Single-Stranded and Double-Stranded DNA Binding Protein Prediction

Interactions between proteins and DNA play essential roles in many biological processes. DNA-binding proteins can be classified into two categories. Double-stranded DNA-binding proteins (DSBs) bind to double-stranded DNA and are involved in a range of cellular functions such as gene expression and regulation. Single-stranded DNA-binding proteins (SSBs) bind to single-stranded DNA and are necessary for DNA replication, recombination, and repair. Therefore, effective classification of DNA-binding proteins is helpful for the functional annotation of proteins. In this work, we propose PredPSD, a sequence-based computational method that accurately predicts SSBs and DSBs. It introduces three novel feature extraction algorithms. In particular, we use the auto-cross covariance (ACC) transformation to convert feature matrices into fixed-length vectors. Then, the optimal feature subset selected by the minimal-redundancy-maximal-relevance (mRMR) feature selection algorithm is fed into a gradient tree boosting (GTB) classifier. In 10-fold cross-validation on a benchmark dataset, PredPSD achieves an AUC of 0.956 and an accuracy of 0.912, which are better than those of existing methods. Moreover, our method significantly improves prediction accuracy in independent testing. The experimental results show that PredPSD can effectively recognize binding specificity and differentiate between DSBs and SSBs.
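PredPSD uses the ACC transformation so that proteins of different lengths end up with feature vectors of the same size. The abstract does not give the exact formula, so the following is only a minimal Python sketch of one common auto-cross covariance formulation, not PredPSD's implementation. It assumes an L × D matrix (one row per residue, one column per property or profile score) and a hypothetical maximum lag `max_lag`. For each lag g and ordered column pair (j1, j2) it averages the product of mean-centred values g residues apart, giving D × D × max_lag numbers regardless of L.

```python
from typing import List

Matrix = List[List[float]]  # L rows (residues) x D columns (properties)


def column_means(m: Matrix) -> List[float]:
    """Mean of each column of an L x D matrix."""
    return [sum(row[j] for row in m) / len(m) for j in range(len(m[0]))]


def acc_transform(m: Matrix, max_lag: int) -> List[float]:
    """Map an L x D matrix to a fixed-length vector of D * D * max_lag values.

    For each lag g in 1..max_lag and each ordered column pair (j1, j2), average
    the product of mean-centred values that are g rows apart. Pairs with
    j1 == j2 give the auto-covariance (AC) terms; the rest give the
    cross-covariance (CC) terms. The output length depends only on D and
    max_lag, not on the sequence length L.
    """
    n_rows, n_cols = len(m), len(m[0])
    if not 1 <= max_lag < n_rows:
        raise ValueError("max_lag must satisfy 1 <= max_lag < number of rows")
    means = column_means(m)
    ac: List[float] = []
    cc: List[float] = []
    for g in range(1, max_lag + 1):
        for j1 in range(n_cols):
            for j2 in range(n_cols):
                total = sum(
                    (m[i][j1] - means[j1]) * (m[i + g][j2] - means[j2])
                    for i in range(n_rows - g)
                )
                (ac if j1 == j2 else cc).append(total / (n_rows - g))
    return ac + cc


# Two hypothetical "proteins" of different lengths, each described by D = 2 properties.
short_protein = [[0.1, 0.5], [0.3, 0.2], [0.9, 0.4]]
long_protein = [[0.2, 0.1], [0.4, 0.8], [0.6, 0.3], [0.7, 0.9], [0.5, 0.5]]
print(len(acc_transform(short_protein, 2)), len(acc_transform(long_protein, 2)))  # 8 8
```

Both calls yield 8 values (D = 2, max_lag = 2), which is the point of the transformation: the vector length no longer depends on the protein length.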

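The mRMR step can be illustrated with a small greedy selector. This is a hedged Python sketch, not PredPSD's code: it assumes the features have already been discretized (mutual information is estimated from value counts), it uses the common "difference" criterion (relevance minus mean redundancy), and the function names and toy data are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X; Y) in nats from two aligned sequences of discrete values."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items()
    )


def mrmr_select(features: List[Sequence[Hashable]], labels: Sequence[Hashable], k: int) -> List[int]:
    """Greedy mRMR (difference form): return the indices of k selected features.

    features[i] holds the (already discretized) values of feature i for every sample.
    Each step picks the feature maximizing its relevance I(f; labels) minus its
    mean redundancy I(f; s) over the features s selected so far.
    """
    relevance = [mutual_information(f, labels) for f in features]
    selected: List[int] = []
    remaining = list(range(len(features)))

    def score(i: int) -> float:
        if not selected:
            return relevance[i]
        redundancy = sum(mutual_information(features[i], features[s]) for s in selected)
        return relevance[i] - redundancy / len(selected)

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


labels = [0, 0, 0, 0, 1, 1, 1, 1]
f0 = [0, 0, 0, 1, 1, 1, 1, 1]  # informative about the label
f1 = list(f0)                  # exact duplicate of f0: relevant but fully redundant
f2 = [0, 0, 1, 1, 0, 1, 1, 1]  # weakly informative, but carries different information
print(mrmr_select([f0, f1, f2], labels, k=2))  # [0, 2]
```

Feature 1 is as relevant as feature 0 but is skipped because it duplicates it; the weaker but complementary feature 2 is selected instead. In PredPSD the selected subset is then passed to the GTB classifier.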
Single-Stranded DNA Binding Proteins Unwind the Newly Synthesized Double-Stranded DNA of Model Miniforks

Biochemistry, Vol 50 (6), pp. 932-944, 2011. DOI: 10.1021/bi101583e. Cited By ~ 12

Author(s): Emmanuelle Delagoutte, Amélie Heneman-Masurel, Giuseppe Baldacci

Keyword(s): DNA Binding, Binding Proteins, DNA Binding Proteins, Single Stranded DNA, Double Stranded DNA

Identification of DNA-Binding Proteins by Multiple Kernel Support Vector Machine and Sequence Information

Current Proteomics, Vol 17 (4), pp. 302-310, 2020. DOI: 10.2174/1570164616666190417100509

Author(s): Yijie Ding, Feng Chen, Xiaoyi Guo, Jijun Tang, Hongjie Wu

Keyword(s): Support Vector Machine, DNA Binding, Binding Proteins, DNA Binding Proteins, Computational Method, Support Vector, Sequence Information, Data Sets, Multiple Kernel, Kernel Support Vector Machine

Background: DNA-binding proteins are involved in multiple biomolecular functions. However, traditional experimental methods for identifying DNA-binding proteins are still time-consuming and extremely expensive. Objective: In the past several years, various computational methods have been developed to detect DNA-binding proteins. However, most of them do not integrate multiple sources of information. Methods: In this study, we propose a novel two-step computational method that predicts DNA-binding proteins using a Multiple Kernel Support Vector Machine (MK-SVM) and sequence information. First, we extract several features and construct multiple kernels. Then, the kernels are linearly combined by Multiple Kernel Learning (MKL). Finally, an SVM model built on the combined kernel is used to predict DNA-binding proteins. Results: The proposed method is tested on two benchmark data sets. Compared with existing methods, our approach achieves comparable performance and outperforms them on some data sets. Conclusion: MK-SVM is more suitable than a standard single-kernel SVM as the classifier for DNA-binding protein identification.
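The kernel-combination step of MK-SVM can be sketched as follows. The abstract does not say which kernels or which MKL solver the authors use, so this Python sketch assumes one RBF kernel per feature view and, as a simple stand-in for a full MKL optimizer, sets non-negative weights proportional to each kernel's alignment with the labels. All names and the toy data are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def rbf_kernel(xs: List[List[float]], gamma: float) -> Matrix:
    """Gram matrix with K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    return [[math.exp(-gamma * math.dist(a, b) ** 2) for b in xs] for a in xs]


def frobenius(a: Matrix, b: Matrix) -> float:
    """Frobenius inner product <A, B> = sum_ij A_ij * B_ij."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def alignment(k: Matrix, y: Sequence[int]) -> float:
    """Kernel-target alignment between K and the ideal kernel y y^T (labels in {-1, +1})."""
    target = [[yi * yj for yj in y] for yi in y]
    return frobenius(k, target) / math.sqrt(frobenius(k, k) * frobenius(target, target))


def alignment_weights(kernels: List[Matrix], y: Sequence[int]) -> List[float]:
    """Heuristic MKL stand-in: non-negative weights summing to 1, proportional to alignment."""
    scores = [max(alignment(k, y), 0.0) for k in kernels]
    total = sum(scores)
    if total == 0.0:
        return [1.0 / len(kernels)] * len(kernels)
    return [s / total for s in scores]


def combine_kernels(kernels: List[Matrix], weights: Sequence[float]) -> Matrix:
    """Linear combination K = sum_m w_m * K_m."""
    n = len(kernels[0])
    return [[sum(w * k[i][j] for w, k in zip(weights, kernels)) for j in range(n)] for i in range(n)]


# Two hypothetical feature views for four proteins (+1 = binding, -1 = non-binding).
y = [1, 1, -1, -1]
view_a = [[0.0], [0.1], [1.0], [1.1]]  # separates the classes
view_b = [[0.0], [1.0], [0.0], [1.0]]  # unrelated to the classes
kernels = [rbf_kernel(view_a, gamma=1.0), rbf_kernel(view_b, gamma=1.0)]
weights = alignment_weights(kernels, y)
combined = combine_kernels(kernels, weights)
print([round(w, 3) for w in weights])  # [1.0, 0.0]
```

The combined Gram matrix could then be given to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. Keeping the weights non-negative guarantees that a combination of valid (positive semidefinite) kernels is itself a valid kernel.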

PredDBP-Stack: Prediction of DNA-Binding Proteins from HMM Profiles using a Stacked Ensemble Method

BioMed Research International, Vol 2020, pp. 1-8, 2020. DOI: 10.1155/2020/7297631

Author(s): Jun Wang, Huiwen Zheng, Yang Yang, Wanyue Xiao, Taigang Liu

Keyword(s): DNA Binding, Binding Proteins, Transition Probability, DNA Binding Proteins, Computational Method, Superior Performance, Sequence Information, Experimental Approaches, Wet Lab, Two Stages

DNA-binding proteins (DBPs) play vital roles in all aspects of genetic activities. However, identifying DBPs with wet-lab experimental approaches is often time-consuming and laborious. In this study, we develop a novel computational method, called PredDBP-Stack, to predict DBPs solely from protein sequences. First, amino acid composition (AAC) and transition probability composition (TPC) extracted from the hidden Markov model (HMM) profile are adopted to represent a protein. Next, we establish a stacked ensemble model with two stages of learning to identify DBPs. In the first stage, four base classifiers are trained with the HMM-based composition features. In the second stage, the prediction probabilities of these base classifiers are used as inputs to a meta-classifier that makes the final prediction. On the PDB1075 benchmark dataset, jackknife cross-validation of PredDBP-Stack yields a balanced sensitivity and specificity of 92.47% and 92.36%, respectively, which outperforms most existing classifiers. Furthermore, our method also achieves superior performance and robustness on the PDB186 independent dataset. This demonstrates that PredDBP-Stack is an effective classifier for accurately identifying DBPs from protein sequence information alone.
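The two HMM-based descriptors can be computed directly from an L × 20 profile matrix. The abstract names AAC and TPC without giving formulas, so this Python sketch assumes a common formulation: AAC is the column-wise mean of the profile (20 values), and TPC normalizes the products of consecutive rows by the column sum (400 values), giving a 420-dimensional vector. The function names are hypothetical.

```python
from typing import List

Profile = List[List[float]]  # L rows (positions) x 20 columns (per-residue probabilities)


def aac(profile: Profile) -> List[float]:
    """Composition vector: the mean of each profile column over all L positions."""
    length = len(profile)
    return [sum(row[j] for row in profile) / length for j in range(len(profile[0]))]


def tpc(profile: Profile) -> List[float]:
    """Transition probability composition, flattened row-major over (j, k).

    TPC[j][k] = sum_i p[i][j] * p[i+1][k] / sum_i p[i][j], for i = 1..L-1.
    """
    n_cols = len(profile[0])
    pairs = list(zip(profile, profile[1:]))  # (position i, position i+1)
    out: List[float] = []
    for j in range(n_cols):
        denom = sum(cur[j] for cur, _ in pairs)
        for k in range(n_cols):
            num = sum(cur[j] * nxt[k] for cur, nxt in pairs)
            out.append(num / denom if denom > 0 else 0.0)
    return out


def hmm_composition_features(profile: Profile) -> List[float]:
    """Concatenate AAC (20 values) and TPC (400 values) for a 20-column profile."""
    return aac(profile) + tpc(profile)


toy = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # 2-column toy profile for readability
print(aac(toy), tpc(toy))  # [0.5, 0.5] [0.0, 1.0, 0.5, 0.5]
```

When every profile row sums to 1, each block of TPC values for a fixed j also sums to 1, so it can be read as an estimated transition distribution.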

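The two-stage stacking scheme can be sketched in plain Python. The abstract does not name the four base classifiers, so this sketch uses two hypothetical toy base learners (k-nearest neighbours and a nearest-centroid score) and a logistic-regression meta-classifier. It also assumes the common practice of building the meta-features from out-of-fold predictions, so that the meta-classifier never sees probabilities a base learner produced on its own training data.

```python
import math
from typing import Callable, List

Vector = List[float]
Model = Callable[[Vector], float]                 # returns an estimate of P(class = 1)
Learner = Callable[[List[Vector], List[int]], Model]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def knn_learner(k: int) -> Learner:
    """Base learner: fraction of positive labels among the k nearest neighbours."""
    def fit(xs: List[Vector], ys: List[int]) -> Model:
        def predict(x: Vector) -> float:
            nearest = sorted(zip((math.dist(x, xi) for xi in xs), ys))[:k]
            return sum(label for _, label in nearest) / k
        return predict
    return fit


def centroid_learner(xs: List[Vector], ys: List[int]) -> Model:
    """Base learner: relative distance to the two class centroids (needs both classes)."""
    def centroid(label: int) -> Vector:
        members = [x for x, y in zip(xs, ys) if y == label]
        return [sum(col) / len(members) for col in zip(*members)]
    c0, c1 = centroid(0), centroid(1)

    def predict(x: Vector) -> float:
        d0, d1 = math.dist(x, c0), math.dist(x, c1)
        return 0.5 if d0 + d1 == 0 else d0 / (d0 + d1)
    return predict


def logistic_learner(xs: List[Vector], ys: List[int], lr: float = 0.5, epochs: int = 300) -> Model:
    """Meta-learner: logistic regression fitted by plain stochastic gradient descent."""
    w, b = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return lambda x: sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def stack(base: List[Learner], meta: Learner, xs: List[Vector], ys: List[int], n_folds: int = 4) -> Model:
    """Two-stage stacking with out-of-fold base-learner probabilities as meta-features."""
    meta_x = [[0.0] * len(base) for _ in xs]
    for f in range(n_folds):
        test_idx = list(range(f, len(xs), n_folds))  # interleaved folds
        train_idx = [i for i in range(len(xs)) if i % n_folds != f]
        for m, learner in enumerate(base):
            model = learner([xs[i] for i in train_idx], [ys[i] for i in train_idx])
            for i in test_idx:
                meta_x[i][m] = model(xs[i])
    meta_model = meta(meta_x, ys)                          # stage 2
    final_base = [learner(xs, ys) for learner in base]     # refit stage 1 on all data
    return lambda x: meta_model([model(x) for model in final_base])


xs = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2],
      [3.0, 3.0], [3.1, 2.9], [2.8, 3.2], [3.2, 3.1]]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
model = stack([knn_learner(3), centroid_learner], logistic_learner, xs, ys)
print(model([0.1, 0.1]) < 0.5, model([3.0, 3.0]) > 0.5)  # True True
```

The same pattern extends to more base learners: each one simply adds a column to the meta-feature matrix.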
On the mechanism of pairing of single- and double-stranded DNA molecules by the recA and single-stranded DNA-binding proteins of Escherichia coli.

Journal of Biological Chemistry, Vol 261 (3), pp. 1025-1030, 1986. DOI: 10.1016/s0021-9258(17)36047-7

Author(s): D A Julin, P W Riddles, I R Lehman

Keyword(s): Escherichia coli, DNA Binding, Binding Proteins, DNA Binding Proteins, DNA Molecules, Single Stranded DNA, Double Stranded DNA

Identification of a growth hormone gene promoter repressor element and its cognate double- and single-stranded DNA-binding proteins.

Journal of Biological Chemistry, Vol 265 (12), pp. 7022-7028, 1990. DOI: 10.1016/s0021-9258(19)39253-1

Author(s): W T Pan, Q R Liu, C Bancroft

Keyword(s): Growth Hormone, DNA Binding, Binding Proteins, Gene Promoter, DNA Binding Proteins, Growth Hormone Gene, Single Stranded DNA, Repressor Element

Presynapsis and Synapsis of DNA Promoted by the STPα and Single-stranded DNA-binding Proteins from Saccharomyces cerevisiae

Journal of Biological Chemistry, Vol 264 (22), pp. 13336-13342, 1989. DOI: 10.1016/s0021-9258(18)51633-1

Author(s): R K Hamatake, C C Dykstra, A Sugino

Keyword(s): Saccharomyces cerevisiae, DNA Binding, Binding Proteins, DNA Binding Proteins, Single Stranded DNA

Mitochondrial Single-Stranded DNA-Binding Proteins: in Search for New Functions

Biological Chemistry, Vol 382 (2), 2001. DOI: 10.1515/bc.2001.025. Cited By ~ 7

Author(s): Ľubomír Tomáška, Jozef Nosek, Blanka Kucejová

Keyword(s): DNA Binding, Binding Proteins, DNA Binding Proteins, Single Stranded DNA

Identification of single-stranded-DNA-binding proteins that interact with muscle gene elements

Molecular and Cellular Biology, Vol 11 (4), pp. 1944-1953, 1991. DOI: 10.1128/mcb.11.4.1944-1953.1991

Author(s): I M Santoro, T M Yi, K Walsh

Keyword(s): DNA Binding, Binding Proteins, Regulatory Element, DNA Binding Proteins, Binding Activity, Sequence Motif, Mobility Shift, Specific Sequence, Single Stranded DNA, Muscle Gene

A sequence-specific DNA-binding protein from skeletal-muscle extracts that binds to probes of three muscle gene DNA elements is identified. This protein, referred to as muscle factor 3, forms the predominant nucleoprotein complex with the MCAT gene sequence motif in an electrophoretic mobility shift assay. This protein also binds to the skeletal actin muscle regulatory element, which contains the conserved CArG motif, and to a creatine kinase enhancer probe, which contains the E-box motif, a MyoD-binding site. Muscle factor 3 has a potent sequence-specific, single-stranded-DNA-binding activity. The specificity of this interaction was demonstrated by sequence-specific competition and by mutations that diminished or eliminated detectable complex formation. MyoD, a myogenic determination factor that is distinct from muscle factor 3, also bound to single-stranded-DNA probes in a sequence-specific manner, but other transcription factors did not. Multiple copies of the MCAT motif activated the expression of a heterologous promoter, and a mutation that eliminated expression was correlated with diminished factor binding. Muscle factor 3 and MyoD may be members of a class of DNA-binding proteins that modulate gene expression by their abilities to recognize DNA with unusual secondary structure in addition to specific sequence.
