PredPSD: A Gradient Tree Boosting Approach for Single-Stranded and Double-Stranded DNA Binding Protein Prediction

Molecules ◽  
2019 ◽  
Vol 25 (1) ◽  
pp. 98 ◽  
Author(s):  
Changgeng Tan ◽  
Tong Wang ◽  
Wenyi Yang ◽  
Lei Deng

Interactions between proteins and DNA play essential roles in many biological processes. DNA-binding proteins can be classified into two categories. Double-stranded DNA-binding proteins (DSBs) bind to double-stranded DNA and are involved in a series of cell functions such as gene expression and regulation. Single-stranded DNA-binding proteins (SSBs) bind to single-stranded DNA and are necessary for DNA replication, recombination, and repair. Therefore, the effective classification of DNA-binding proteins is helpful for the functional annotation of proteins. In this work, we propose PredPSD, a computational method based on sequence information that accurately predicts SSBs and DSBs. It introduces three novel feature extraction algorithms. In particular, we use the autocross-covariance (ACC) transformation to transform feature matrices into fixed-length vectors. Then, we feed the optimal feature subset obtained by the minimal-redundancy-maximal-relevance criterion (mRMR) feature selection algorithm into a gradient tree boosting (GTB) classifier. In 10-fold cross-validation on a benchmark dataset, PredPSD achieves promising performance, with an AUC score of 0.956 and an accuracy of 0.912, which are better than those of existing methods. Moreover, our method significantly improves the prediction accuracy in independent testing. The experimental results show that PredPSD can effectively recognize binding specificity and differentiate DSBs from SSBs.
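As a rough illustration of how an ACC transformation can turn a variable-length feature matrix into a fixed-length vector, the sketch below follows one common formulation: auto-covariance of each column with itself and cross-covariance between each pair of columns, both computed over a range of residue lags. The function name `acc_transform`, the `max_lag` parameter, and the normalization by the number of residue pairs are assumptions made for illustration. The abstract does not state which matrices, lags, or normalization PredPSD uses.

```python
from typing import List


def acc_transform(matrix: List[List[float]], max_lag: int) -> List[float]:
    """Auto-cross covariance of an L x N matrix (L residues, N feature columns).

    Returns N * N * max_lag values, independent of L.
    Hypothetical helper for illustration; not the authors' implementation.
    """
    length = len(matrix)
    n_cols = len(matrix[0])
    if not 1 <= max_lag < length:
        raise ValueError("max_lag must satisfy 1 <= max_lag < number of rows")

    # Center every column on its mean over the whole sequence.
    means = [sum(row[c] for row in matrix) / length for c in range(n_cols)]
    centered = [[row[c] - means[c] for c in range(n_cols)] for row in matrix]

    features: List[float] = []
    for lag in range(1, max_lag + 1):
        span = length - lag  # number of residue pairs separated by `lag`
        for a in range(n_cols):
            for b in range(n_cols):
                # a == b gives the auto-covariance term,
                # a != b gives the cross-covariance term.
                total = sum(
                    centered[j][a] * centered[j + lag][b] for j in range(span)
                )
                features.append(total / span)
    return features
```

For a matrix with N columns, the output always has N × N × `max_lag` entries, so proteins of different lengths map to vectors of the same size, which is what a downstream feature selector and classifier require.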

Biochemistry ◽  
2011 ◽  
Vol 50 (6) ◽  
pp. 932-944 ◽  
Author(s):  
Emmanuelle Delagoutte ◽  
Amélie Heneman-Masurel ◽  
Giuseppe Baldacci

2020 ◽  
Vol 17 (4) ◽  
pp. 302-310
Author(s):  
Yijie Ding ◽  
Feng Chen ◽  
Xiaoyi Guo ◽  
Jijun Tang ◽  
Hongjie Wu

Background: DNA-binding proteins play an important role in multiple biomolecular functions. However, traditional experimental methods for identifying DNA-binding proteins are still time-consuming and extremely expensive. Objective: In the past several years, various computational methods have been developed to detect DNA-binding proteins. However, most of them do not integrate multiple sources of information. Methods: In this study, we propose a novel two-step computational method that predicts DNA-binding proteins using a Multiple Kernel Support Vector Machine (MK-SVM) and sequence information. First, we extract several features and construct multiple kernels. Then, the kernels are linearly combined by Multiple Kernel Learning (MKL). Finally, an SVM model is built on the combined kernel to predict DNA-binding proteins. Results: The proposed method is tested on two benchmark data sets. Our approach is comparable to other existing methods and outperforms them on some data sets. Conclusion: We conclude that MK-SVM is more suitable than a common SVM as the classifier for DNA-binding protein identification.
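The linear combination of kernels described in this abstract can be sketched as follows. The abstract does not say how the MKL weights are learned, so this example swaps in kernel-target alignment, a simple heuristic, to weight each kernel. The two kernels, the function names, and the `gamma` value are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def linear_kernel(X: Sequence[Sequence[float]]) -> Matrix:
    """Gram matrix of plain dot products."""
    return [[sum(a * b for a, b in zip(u, v)) for v in X] for u in X]


def rbf_kernel(X: Sequence[Sequence[float]], gamma: float) -> Matrix:
    """Gram matrix of exp(-gamma * squared Euclidean distance)."""
    return [
        [math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v))) for v in X]
        for u in X
    ]


def frobenius(A: Matrix, B: Matrix) -> float:
    """Frobenius inner product of two equally sized matrices."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def alignment(K: Matrix, y: Sequence[int]) -> float:
    """Kernel-target alignment between K and the ideal kernel y y^T (y in {-1, +1})."""
    Y = [[float(yi * yj) for yj in y] for yi in y]
    return frobenius(K, Y) / math.sqrt(frobenius(K, K) * frobenius(Y, Y))


def combine_kernels(kernels: List[Matrix], y: Sequence[int]) -> Tuple[List[float], Matrix]:
    """Weight each kernel by its (non-negative) alignment and sum them.

    Assumption: alignment-based weights stand in for the learned MKL weights.
    """
    scores = [max(alignment(K, y), 0.0) for K in kernels]
    total = sum(scores)
    if total > 0:
        weights = [s / total for s in scores]
    else:
        weights = [1.0 / len(kernels)] * len(kernels)  # fall back to uniform
    n = len(y)
    combined = [
        [sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]
    return weights, combined
```

Because a non-negative weighted sum of valid kernels is itself a valid kernel, the combined Gram matrix can be handed to any SVM implementation that accepts a precomputed kernel.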


2020 ◽  
Vol 2020 ◽  
pp. 1-8
Author(s):  
Jun Wang ◽  
Huiwen Zheng ◽  
Yang Yang ◽  
Wanyue Xiao ◽  
Taigang Liu

DNA-binding proteins (DBPs) play vital roles in all aspects of genetic activities. However, the identification of DBPs by using wet-lab experimental approaches is often time-consuming and laborious. In this study, we develop a novel computational method, called PredDBP-Stack, to predict DBPs solely based on protein sequences. First, amino acid composition (AAC) and transition probability composition (TPC) extracted from the hidden Markov model (HMM) profile are adopted to represent a protein. Next, we establish a stacked ensemble model to identify DBPs, which involves two stages of learning. In the first stage, four base classifiers are trained with the features of HMM-based compositions. In the second stage, the prediction probabilities of these base classifiers are used as inputs to the meta-classifier to perform the final prediction of DBPs. Based on the PDB1075 benchmark dataset, we conduct a jackknife cross-validation with the proposed PredDBP-Stack predictor and obtain a balanced sensitivity and specificity of 92.47% and 92.36%, respectively. This outcome outperforms most of the existing classifiers. Furthermore, our method also achieves superior performance and model robustness on the PDB186 independent dataset. This demonstrates that PredDBP-Stack is an effective classifier for accurately identifying DBPs based on protein sequence information alone.
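The two-stage learning scheme described above can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the abstract does not name its four base classifiers or its meta-classifier, so the sketch uses two toy base learners (a nearest-centroid scorer and a k-nearest-neighbour vote) and a small logistic-regression meta-classifier, all hypothetical. It also assumes every training fold contains both classes.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Model = Callable[[Vector], float]  # returns an estimate of P(label == 1)
Trainer = Callable[[List[Vector], List[int]], Model]


def _dist(u: Vector, v: Vector) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def _sigmoid(z: float) -> float:
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def train_centroid(X: List[Vector], y: List[int]) -> Model:
    """Toy base learner: compares distances to the two class centroids."""
    def centroid(label: int) -> List[float]:
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]
    pos, neg = centroid(1), centroid(0)
    return lambda x: _sigmoid(_dist(x, neg) - _dist(x, pos))


def train_knn(X: List[Vector], y: List[int], k: int = 3) -> Model:
    """Toy base learner: fraction of positives among the k nearest neighbours."""
    def predict(x: Vector) -> float:
        nearest = sorted(range(len(X)), key=lambda i: _dist(x, X[i]))[:k]
        return sum(y[i] for i in nearest) / len(nearest)
    return predict


def train_logistic(X: List[Vector], y: List[int], lr: float = 0.5, epochs: int = 500) -> Model:
    """Meta-classifier: logistic regression fitted by stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            err = _sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - t
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return lambda x: _sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


def fit_stack(X: List[Vector], y: List[int], trainers: List[Trainer], n_folds: int = 2) -> Model:
    # Stage 1: collect out-of-fold probabilities so the meta-classifier
    # never sees predictions made on a base learner's own training data.
    meta_X = [[0.0] * len(trainers) for _ in X]
    for fold in range(n_folds):
        train_idx = [i for i in range(len(X)) if i % n_folds != fold]
        test_idx = [i for i in range(len(X)) if i % n_folds == fold]
        for m, trainer in enumerate(trainers):
            model = trainer([X[i] for i in train_idx], [y[i] for i in train_idx])
            for i in test_idx:
                meta_X[i][m] = model(X[i])
    # Stage 2: fit the meta-classifier on the stacked probabilities, then
    # refit each base learner on all data for use at prediction time.
    meta = train_logistic(meta_X, y)
    base = [trainer(X, y) for trainer in trainers]
    return lambda x: meta([model(x) for model in base])
```

Using out-of-fold probabilities in the first stage is a common design choice for stacking, since it reduces the risk that the meta-classifier simply learns to trust base learners that have memorized their training data.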


2001 ◽  
Vol 382 (2) ◽  
Author(s):  
Ľubomír Tomáška ◽  
Jozef Nosek ◽  
Blanka Kucejová

1991 ◽  
Vol 11 (4) ◽  
pp. 1944-1953
Author(s):  
I M Santoro ◽  
T M Yi ◽  
K Walsh

A sequence-specific DNA-binding protein from skeletal-muscle extracts that binds to probes of three muscle gene DNA elements is identified. This protein, referred to as muscle factor 3, forms the predominant nucleoprotein complex with the MCAT gene sequence motif in an electrophoretic mobility shift assay. This protein also binds to the skeletal actin muscle regulatory element, which contains the conserved CArG motif, and to a creatine kinase enhancer probe, which contains the E-box motif, a MyoD-binding site. Muscle factor 3 has a potent sequence-specific, single-stranded-DNA-binding activity. The specificity of this interaction was demonstrated by sequence-specific competition and by mutations that diminished or eliminated detectable complex formation. MyoD, a myogenic determination factor that is distinct from muscle factor 3, also bound to single-stranded-DNA probes in a sequence-specific manner, but other transcription factors did not. Multiple copies of the MCAT motif activated the expression of a heterologous promoter, and a mutation that eliminated expression was correlated with diminished factor binding. Muscle factor 3 and MyoD may be members of a class of DNA-binding proteins that modulate gene expression by their abilities to recognize DNA with unusual secondary structure in addition to specific sequence.

