gDNA-Prot: Predict DNA-binding proteins by employing support vector machine and a novel numerical characterization of protein sequence

2016, Vol 406, pp. 8-16
Author(s): Yan-ping Zhang, Wuyunqiqige, Wei Zheng, Shuyi Liu, Chunguang Zhao

2018, Vol 21 (2), pp. 100-110
Author(s): Chun Li, Jialing Zhao, Changzhong Wang, Yuhua Yao

Aim and Objective: The rapid growth in available protein sequence data creates an urgent need for new computational algorithms to analyze and compare these sequences. This study develops an efficient computational approach for encoding protein sequences and extracting the information hidden in them.

Methods: Based on two physicochemical properties of amino acids, a protein primary sequence was converted into a three-letter sequence, from which a graph without loops or multiple edges and its geometric line adjacency matrix were obtained. These were used to construct a generalized PseAAC (pseudo amino acid composition) model that characterizes a protein sequence numerically.

Results: Using the proposed numerical descriptor, similarity comparisons were made among the β-globin proteins of 17 species and among 72 coronavirus spike proteins. The resulting clusters agreed well with the established taxonomic groups. In addition, an SVM (support vector machine) model based on the generalized PseAAC was developed to identify DNA-binding proteins. Experimental results showed that this method outperformed DNAbinder, DNA-Prot, iDNA-Prot and enDNA-Prot by 3.29-10.44% in ACC, 0.056-0.206 in MCC, and 1.45-15.76% in F1M. When the benchmark dataset was expanded with additional negative samples, the method outperformed the same four methods by 2.49-19.12% in ACC, 0.05-0.32 in MCC, and 3.82-33.85% in F1M.

Conclusion: These results suggest that the generalized PseAAC model is highly efficient for comparing and analyzing protein sequences and highly competitive in identifying DNA-binding proteins.
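
The abstract does not say which two physicochemical properties define the three letters, nor how the graph and its line adjacency matrix are built. As a rough illustration of the first step only, the following Python sketch maps residues to a three-class alphabet and computes a simple composition vector. The grouping used here (the polar/neutral/hydrophobic classes often used in composition-transition-distribution descriptors) is an assumption substituted for the paper's unspecified mapping, the names `GROUPS`, `reduce_sequence` and `composition_features` are hypothetical, and the output is a plain 12-dimensional composition (3 letter frequencies plus 9 adjacent-pair frequencies), not the generalized PseAAC model itself.

```python
from itertools import product
from typing import List

# Hypothetical three-class partition of the 20 standard amino acids
# (assumed for illustration; not the paper's two-property mapping).
GROUPS = {
    "P": set("RKEDQN"),   # polar
    "N": set("GASTPHY"),  # neutral
    "H": set("CLVIMFW"),  # hydrophobic
}
LETTERS = sorted(GROUPS)  # fixed letter order: ['H', 'N', 'P']
AA_TO_LETTER = {aa: letter for letter, members in GROUPS.items() for aa in members}


def reduce_sequence(seq: str) -> str:
    """Map each residue to its class letter; non-standard residues are skipped."""
    return "".join(AA_TO_LETTER[aa] for aa in seq.upper() if aa in AA_TO_LETTER)


def composition_features(reduced: str) -> List[float]:
    """Return 3 letter frequencies followed by 9 adjacent-pair frequencies."""
    n = len(reduced)
    if n == 0:
        raise ValueError("empty reduced sequence")
    mono = [reduced.count(c) / n for c in LETTERS]
    pairs = ["".join(p) for p in product(LETTERS, repeat=2)]
    counts = {p: 0 for p in pairs}
    for i in range(n - 1):
        counts[reduced[i:i + 2]] += 1
    total = max(n - 1, 1)  # number of adjacent pairs (avoid division by zero)
    return mono + [counts[p] / total for p in pairs]


reduced = reduce_sequence("MKVLA")
features = composition_features(reduced)
```

A real descriptor in this family would add sequence-order terms on top of such composition values; the sketch stops at composition to stay within what the abstract states.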


2003, Vol 14 (05), pp. 575-585
Author(s): Cong-Zhong Cai, Wan-Lu Wang, Yu-Zong Chen

The support vector machine (SVM) is used to classify sonar signals and DNA-binding proteins. Our study on sonar signal classification shows that SVM produces better results than other classification methods, which is consistent with the findings of other studies: its testing accuracy is 95.19%, compared with 90.4% for a multilayered neural network and 82.7% for a nearest-neighbor classifier. For DNA-binding protein classification, SVM gives a testing accuracy of 82.32%, slightly better than the accuracy reported in an earlier study of SVM classification of protein–protein interactions. Hence, our study indicates the usefulness of SVM in identifying DNA-binding proteins. Further improvements to the SVM algorithm and its parameters are suggested.
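
The abstract does not describe the SVM solver or kernel used. To show what a linear SVM optimizes, here is a minimal pure-Python sketch of the Pegasos stochastic subgradient method applied to the primal hinge-loss objective. This solver is a substitute chosen for brevity, not the method of the study; `train_linear_svm`, `predict` and the toy data are hypothetical, and the bias is folded in as a constant feature (so it is regularized as well), which is a simplification.

```python
import random
from typing import List


def train_linear_svm(X, y, lam: float = 0.1, epochs: int = 200, seed: int = 0) -> List[float]:
    """Pegasos: minimize lam/2 * ||w||^2 + mean hinge loss by stochastic subgradient steps.

    X: list of feature vectors; y: labels in {-1, +1}.
    A constant 1.0 feature is appended so the last weight acts as a bias.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Gradient step for the L2 regularizer (shrinks the weights).
            w = [(1.0 - eta * lam) * wi for wi in w]
            # Hinge-loss subgradient is nonzero only for margin violations.
            if margin < 1.0:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w: List[float], x) -> int:
    """Sign of the linear decision function (bias feature appended)."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical, linearly separable toy data.
X = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
```

Nonlinear SVMs, such as those typically used on protein features, replace the inner product with a kernel and are usually trained in the dual (for example with SMO), which this sketch does not cover.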


2020, Vol 17 (4), pp. 302-310
Author(s): Yijie Ding, Feng Chen, Xiaoyi Guo, Jijun Tang, Hongjie Wu

Background: DNA-binding proteins play an important role in multiple biomolecular functions. However, traditional experimental methods for identifying DNA-binding proteins remain time-consuming and extremely expensive.

Objective: Various computational methods for detecting DNA-binding proteins have been developed over the past several years, but most of them do not integrate multiple sources of information.

Methods: In this study, we propose a novel computational method that predicts DNA-binding proteins from sequence information in two stages using a Multiple Kernel Support Vector Machine (MK-SVM). First, several features are extracted, multiple kernels are constructed from them, and these kernels are linearly combined by Multiple Kernel Learning (MKL). Then a final SVM model built on the combined kernel predicts DNA-binding proteins.

Results: The proposed method was tested on two benchmark datasets. Compared with existing methods, our approach performs comparably overall and better on some datasets.

Conclusion: MK-SVM is more suitable than a standard single-kernel SVM as the classifier for identifying DNA-binding proteins.
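
The abstract does not say how the MKL weights are learned. The sketch below illustrates only the general form of a linearly combined kernel, K = Σ_m β_m K_m with β_m ≥ 0 and Σ_m β_m = 1, and uses kernel-target alignment as a simple, hypothetical weighting heuristic in place of a full MKL optimizer. The two feature "views", the RBF `gamma`, and all function names are illustrative assumptions. Because a nonnegative combination of positive semidefinite kernels is itself positive semidefinite, the combined matrix is a valid kernel that could be passed to an SVM accepting a precomputed kernel.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def linear_kernel(X: Sequence[Sequence[float]]) -> Matrix:
    """Gram matrix of inner products."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]


def rbf_kernel(X: Sequence[Sequence[float]], gamma: float = 0.5) -> Matrix:
    """Gaussian (RBF) kernel exp(-gamma * ||xi - xj||^2)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj))) for xj in X] for xi in X]


def alignment(K: Matrix, y: Sequence[int]) -> float:
    """Kernel-target alignment <K, yy^T>_F / (||K||_F * ||yy^T||_F)."""
    n = len(y)
    num = sum(K[i][j] * y[i] * y[j] for i in range(n) for j in range(n))
    norm_k = math.sqrt(sum(K[i][j] ** 2 for i in range(n) for j in range(n)))
    # For labels in {-1, +1}, ||yy^T||_F equals n.
    return num / (norm_k * n) if norm_k else 0.0


def combine_kernels(kernels: List[Matrix], y: Sequence[int]) -> Tuple[List[float], Matrix]:
    """Weight kernels by (clipped) alignment, normalize to sum 1, and combine linearly."""
    scores = [max(alignment(K, y), 0.0) for K in kernels]
    total = sum(scores)
    m = len(kernels)
    # Fall back to uniform weights if no kernel is positively aligned.
    weights = [s / total for s in scores] if total > 0 else [1.0 / m] * m
    n = len(y)
    combined = [[sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
                for i in range(n)]
    return weights, combined


# Hypothetical toy data: two feature views of four proteins.
view_a = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
view_b = [[0.3], [0.2], [0.4], [0.5]]
labels = [1, 1, -1, -1]
weights, K = combine_kernels([rbf_kernel(view_a), linear_kernel(view_b)], labels)
```

Proper MKL methods instead learn the weights jointly with (or alternately with) the SVM objective; the alignment heuristic is just a cheap stand-in that makes the combination step concrete.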


2015, Vol 9 (Suppl 1), pp. S10
Author(s): Ruifeng Xu, Jiyun Zhou, Hongpeng Wang, Yulan He, Xiaolong Wang, ...

2013, Vol 2013, pp. 1-8
Author(s): Xin Ma, Jiansheng Wu, Xiaoyun Xue

DNA-binding proteins are fundamentally important for understanding cellular processes, so their identification has important practical applications in fields such as drug design. We propose a novel method for predicting DNA-binding proteins using only sequence information. The prediction model is built with the support vector machine-sequential minimal optimization (SVM-SMO) algorithm in conjunction with a hybrid feature that incorporates an evolutionary information feature, a physicochemical property feature, and two novel attributes. These two attributes use the DNA-binding and nonbinding residues in a query protein to obtain a DNA-binding propensity and a nonbinding propensity. The SVM-SMO model achieves a Matthews correlation coefficient (MCC) of 0.67 and an overall accuracy of 89.6%, with 88.4% sensitivity and 90.8% specificity. Performance comparisons across features indicate that the two novel attributes contribute to the performance improvement. In addition, the SVM-SMO model outperforms state-of-the-art methods on an independent test dataset.
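
The metrics quoted in these abstracts (accuracy, sensitivity, specificity, MCC and F1) follow standard confusion-matrix definitions. The helper below is a small sketch of those definitions; `binary_metrics` is a hypothetical name, the counts in the usage lines are made-up numbers rather than results from any of the studies, and "F1M" in the first abstract is assumed here to mean the ordinary F1 measure.

```python
import math
from typing import Dict


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # true positive rate (recall)
    specificity = tn / (tn + fp)                 # true negative rate
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)             # harmonic mean of precision and recall
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews correlation coefficient
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
        "mcc": mcc,
    }


# Made-up counts, for illustration only.
m = binary_metrics(tp=8, tn=9, fp=1, fn=2)
print(round(m["mcc"], 3))  # 0.704
```

MCC uses all four cells of the confusion matrix, which is why it can differ noticeably from accuracy when the positive and negative classes are imbalanced.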

