gDNA-Prot: Predict DNA-binding proteins by employing support vector machine and a novel numerical characterization of protein sequence

Aim and Objective: The rapid increase in the amount of available protein sequence data creates an urgent need for novel computational algorithms to analyze and compare these sequences. This study was undertaken to develop an efficient computational approach for encoding protein sequences quickly and extracting the information hidden in them. Methods: Based on two physicochemical properties of amino acids, a protein primary sequence was converted into a three-letter sequence, from which a graph without loops or multiple edges and its geometric line adjacency matrix were obtained. A generalized PseAAC (pseudo amino acid composition) model was then constructed to characterize a protein sequence numerically. Results: Using the proposed mathematical descriptor of a protein sequence, similarity comparisons were made among the β-globin proteins of 17 species and among 72 spike proteins of coronaviruses. The resulting clusters agreed well with the established taxonomic groups. In addition, a generalized PseAAC-based SVM (support vector machine) model was developed to identify DNA-binding proteins. Experimental results showed that our method performed better than DNAbinder, DNA-Prot, iDNA-Prot and enDNA-Prot by 3.29-10.44% in terms of ACC, 0.056-0.206 in terms of MCC, and 1.45-15.76% in terms of F1M. When the benchmark dataset was expanded with negative samples, the presented approach outperformed the four previous methods by 2.49-19.12% in terms of ACC, 0.05-0.32 in terms of MCC, and 3.82-33.85% in terms of F1M. Conclusion: These results suggest that the generalized PseAAC model is very efficient for the comparison and analysis of protein sequences, and very competitive in identifying DNA-binding proteins.
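The first step described in the Methods, reducing a protein sequence to a three-letter sequence and turning it into numbers, can be illustrated with a minimal sketch. The abstract does not say which two physicochemical properties or which residue classes were used, so the grouping in `GROUPS` is purely hypothetical, and the 2-mer frequency vector below is a generic stand-in, not the graph-based descriptor of the paper.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical three-class grouping of the 20 standard amino acids.
# This is an assumption for illustration; the abstract does not give the classes.
GROUPS = {
    "A": "AVLIMFWC",  # assumed class: hydrophobic
    "B": "GSTPYNQH",  # assumed class: polar, uncharged
    "C": "DEKR",      # assumed class: charged
}
RESIDUE_TO_CLASS = {aa: label for label, members in GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Map each residue to its class label (raises KeyError on non-standard residues)."""
    return "".join(RESIDUE_TO_CLASS[aa] for aa in seq.upper())


def kmer_frequencies(reduced, k=2):
    """Return the frequencies of all 3**k k-mers over the reduced alphabet, in sorted order."""
    kmers = ["".join(p) for p in product(sorted(GROUPS), repeat=k)]
    windows = max(len(reduced) - k + 1, 1)
    counts = Counter(reduced[i:i + k] for i in range(len(reduced) - k + 1))
    return [counts[kmer] / windows for kmer in kmers]


def euclidean(u, v):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

With fixed-length vectors of this kind, two sequences of different lengths can be compared by a simple distance such as `euclidean(kmer_frequencies(reduce_sequence(s1)), kmer_frequencies(reduce_sequence(s2)))`, which is the general idea behind alignment-free similarity comparison.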

RF‐SVM : Identification of DNA ‐binding proteins based on comprehensive feature representation methods and support vector machine

Proteins Structure Function and Bioinformatics ◽

10.1002/prot.26229 ◽

2021 ◽

Author(s):

Yanping Zhang ◽

Jianwei Ni ◽

Ya Gao

Keyword(s):

Support Vector Machine ◽

Dna Binding ◽

Binding Proteins ◽

Dna Binding Proteins ◽

Feature Representation ◽

Support Vector

SUPPORT VECTOR MACHINE CLASSIFICATION OF PHYSICAL AND BIOLOGICAL DATASETS

International Journal of Modern Physics C ◽

10.1142/s0129183103004759 ◽

2003 ◽

Vol 14 (05) ◽

pp. 575-585 ◽

Cited By ~ 39

Author(s):

CONG-ZHONG CAI ◽

WAN-LU WANG ◽

YU-ZONG CHEN

Keyword(s):

Support Vector Machine ◽

Dna Binding ◽

Protein Interactions ◽

Binding Proteins ◽

Nearest Neighbor ◽

Dna Binding Proteins ◽

Support Vector ◽

Testing Accuracy ◽

Better Than

The support vector machine (SVM) is used in the classification of sonar signals and DNA-binding proteins. Our study on the classification of sonar signals shows that SVM produces a better result than other classification methods, which is consistent with the findings of other studies. The testing accuracy of the SVM is 95.19%, compared with 90.4% for a multilayered neural network and 82.7% for a nearest neighbor classifier. For the classification of DNA-binding proteins, SVM gives a testing accuracy of 82.32%, which is slightly better than that obtained in an earlier study of SVM classification of protein–protein interactions. Hence, our study indicates the usefulness of SVM in the identification of DNA-binding proteins. Further improvements in the SVM algorithm and parameters are suggested.
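To make the notion of training an SVM and measuring testing accuracy concrete, here is a minimal sketch of a linear soft-margin SVM trained by Pegasos-style subgradient descent on the hinge loss. It is not the implementation used in the study: the function names, the toy data, the absence of a bias term, and the values of `lam` and `epochs` are all assumptions made for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.01, epochs=20):
    """Pegasos-style subgradient descent on the regularised hinge loss (no bias term)."""
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)          # decaying step size
            margin = yi * dot(w, xi)       # margin under the current weights
            # Regularisation shrink: (1 - eta * lam) equals (1 - 1/t).
            w = [(1.0 - 1.0 / t) * wj for wj in w]
            if margin < 1.0:               # sample violates the margin
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def predict(w, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if dot(w, x) >= 0 else -1


def accuracy(w, X, y):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(predict(w, xi) == yi for xi, yi in zip(X, y)) / len(y)


# Toy, linearly separable data (hypothetical; labels are +1 / -1).
X_train = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
y_train = [1, 1, 1, -1, -1, -1]
X_test = [[4, 1], [-1, -4]]
y_test = [1, -1]

w = train_linear_svm(X_train, y_train)
test_accuracy = accuracy(w, X_test, y_test)
```

Testing accuracy in the sense used above is the value of `accuracy` on samples that were held out of training, here `X_test` and `y_test`.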

grDNA-Prot: The Prediction of DNA-Binding Proteins Based on Physicochemical Properties of Amino Acids and Support Vector Machine

Hans Journal of Computational Biology ◽

10.12677/hjcb.2021.111001 ◽

2021 ◽

Vol 11 (01) ◽

pp. 1-11

Author(s):

Yanping Zhang

Keyword(s):

Amino Acids ◽

Support Vector Machine ◽

Dna Binding ◽

Physicochemical Properties ◽

Binding Proteins ◽

Dna Binding Proteins ◽

Support Vector

Identification of DNA-Binding Proteins by Multiple Kernel Support Vector Machine and Sequence Information

Current Proteomics ◽

10.2174/1570164616666190417100509 ◽

2020 ◽

Vol 17 (4) ◽

pp. 302-310

Author(s):

Yijie Ding ◽

Feng Chen ◽

Xiaoyi Guo ◽

Jijun Tang ◽

Hongjie Wu

Keyword(s):

Support Vector Machine ◽

Dna Binding ◽

Binding Proteins ◽

Dna Binding Proteins ◽

Computational Method ◽

Support Vector ◽

Sequence Information ◽

Data Sets ◽

Multiple Kernel ◽

Kernel Support Vector Machine

Background: DNA-binding proteins play an important role in multiple biomolecular functions. However, traditional experimental methods for identifying DNA-binding proteins are still time-consuming and extremely expensive. Objective: In the past several years, various computational methods have been developed to detect DNA-binding proteins. However, most of them do not integrate multiple sources of information. Methods: In this study, we propose a novel two-step computational method to predict DNA-binding proteins using a Multiple Kernel Support Vector Machine (MK-SVM) and sequence information. First, we extract several features and construct multiple kernels. Then, the kernels are linearly combined by Multiple Kernel Learning (MKL). Finally, an SVM model built on the combined kernel is used to predict DNA-binding proteins. Results: The proposed method is tested on two benchmark data sets. Compared with existing methods, our approach is comparable, and on some data sets better. Conclusion: We conclude that MK-SVM is more suitable than a common SVM as the classifier for DNA-binding protein identification.
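The kernel-combination step can be sketched as follows. In MKL the combination weights are learned from data; this sketch assumes they are already given, and the two feature views, the RBF kernel, the `gamma` value, and all function names are hypothetical choices for illustration, not details taken from the paper.

```python
import math


def rbf_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def gram_matrix(X, kernel):
    """Pairwise kernel values for all samples in one feature view."""
    return [[kernel(a, b) for b in X] for a in X]


def combine_kernels(grams, weights):
    """Weighted sum of Gram matrices; non-negative weights keep the result a valid kernel."""
    if len(grams) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    if any(wt < 0 for wt in weights):
        raise ValueError("kernel weights must be non-negative")
    n = len(grams[0])
    return [
        [sum(wt * K[i][j] for wt, K in zip(weights, grams)) for j in range(n)]
        for i in range(n)
    ]


# Two hypothetical feature views of the same three proteins.
view_composition = [[0.1, 0.3], [0.2, 0.1], [0.4, 0.4]]
view_evolutionary = [[1.0, 0.0, 0.5], [0.9, 0.1, 0.4], [0.2, 0.8, 0.3]]

grams = [gram_matrix(view_composition, rbf_kernel),
         gram_matrix(view_evolutionary, rbf_kernel)]
K_combined = combine_kernels(grams, weights=[0.75, 0.25])  # assumed, not learned
```

A combined Gram matrix of this form can be passed to any SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.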

Extracting Sequence Features to Predict DNA-Binding Proteins Using Support Vector Machine

2013 International Conference on Computational and Information Sciences ◽

10.1109/iccis.2013.48 ◽

2013 ◽

Cited By ~ 2

Author(s):

Xin Ma ◽

Lefu Hu

Keyword(s):

Support Vector Machine ◽

Dna Binding ◽

Binding Proteins ◽

Dna Binding Proteins ◽

Support Vector ◽

Sequence Features

Identifying DNA-binding proteins by combining support vector machine and PSSM distance transformation

BMC Systems Biology ◽

10.1186/1752-0509-9-s1-s10 ◽

2015 ◽

Vol 9 (Suppl 1) ◽

pp. S10 ◽

Cited By ~ 42

Author(s):

Ruifeng Xu ◽

Jiyun Zhou ◽

Hongpeng Wang ◽

Yulan He ◽

Xiaolong Wang ◽

...

Keyword(s):

Support Vector Machine ◽

Dna Binding ◽

Binding Proteins ◽

Dna Binding Proteins ◽

Support Vector ◽

Distance Transformation

Identification of DNA-Binding Proteins Using Support Vector Machine with Sequence Information

Computational and Mathematical Methods in Medicine ◽

10.1155/2013/524502 ◽

2013 ◽

Vol 2013 ◽

pp. 1-8 ◽

Cited By ~ 6

Author(s):

Xin Ma ◽

Jiansheng Wu ◽

Xiaoyun Xue

Keyword(s):

Support Vector Machine ◽

Dna Binding ◽

Binding Proteins ◽

Query Protein ◽

Dna Binding Proteins ◽

Evolutionary Information ◽

Support Vector ◽

Sequence Information ◽

Novel Approach ◽

Matthews Correlation Coefficient

DNA-binding proteins are fundamentally important in understanding cellular processes. Thus, the identification of DNA-binding proteins has important practical applications in various fields, such as drug design. We propose a novel approach for predicting DNA-binding proteins using only sequence information. The prediction model developed in this study is constructed with the support vector machine-sequential minimal optimization (SVM-SMO) algorithm in conjunction with a hybrid feature. The hybrid feature incorporates an evolutionary information feature, a physicochemical property feature, and two novel attributes. These two attributes use the DNA-binding residues and nonbinding residues in a query protein to obtain a DNA-binding propensity and a nonbinding propensity. The results demonstrate that our SVM-SMO model achieves a Matthews correlation coefficient (MCC) of 0.67 and 89.6% overall accuracy, with 88.4% sensitivity and 90.8% specificity. Performance comparisons on various features indicate that the two novel attributes contribute to the performance improvement. In addition, our SVM-SMO model outperforms state-of-the-art methods on an independent test dataset.
