FermatS: a novel numerical representation for protein sequence comparison and DNA-binding protein identification

Aim and Objective: Given the rapidly increasing amount of molecular biology data available, computational methods of low complexity are necessary to infer protein structure, function, and evolution. Method: In this work, we propose a novel method, FermatS, which extracts feature information from protein sequences based on global position information from the curve and local position representation from normalized moments of inertia. Furthermore, we use the features generated by FermatS to analyze the similarity/dissimilarity of nine ND5 proteins and to establish a prediction model for DNA-binding proteins based on logistic regression with 5-fold cross-validation. Results: In the similarity/dissimilarity analysis of the nine ND5 proteins, the results are consistent with evolutionary theory. Moreover, the method can effectively predict DNA-binding proteins in realistic situations. Conclusion: The findings demonstrate that the proposed method is effective for comparing, recognizing, and predicting protein sequences. The main code and datasets can be downloaded from https://github.com/GaoYa1122/FermatS.
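The 5-fold cross-validation mentioned in the abstract can be illustrated with a minimal sketch of how samples might be partitioned. This is not the authors' code: the function name `k_fold_indices`, the seed, and the sample count are assumptions chosen for illustration, and the logistic regression model itself is omitted.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Hypothetical helper for illustration; not taken from the FermatS repository.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx

# Example: 12 samples split into 5 folds (sample count is arbitrary).
splits = list(k_fold_indices(12, k=5))
```

Each sample appears in exactly one test fold, so every protein would be scored once by a model that never saw it during training.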

DBP-PSSM: Combination of evolutionary profiles with the XGBoost algorithm to improve the identification of DNA-binding proteins

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207323999201124203531 ◽

2020 ◽

Vol 23 ◽

Author(s):

Yanping Zhang ◽

Pengcheng Chen ◽

Ya Gao ◽

Jianwei Ni ◽

Xiaosheng Wang

Keyword(s):

Logistic Regression ◽

Protein Structure ◽

Dna Binding ◽

Molecular Biology ◽

Binding Proteins ◽

Protein Sequences ◽

Low Complexity ◽

Dna Binding Proteins ◽

Position Information ◽

Position Representation

Protein Sequence Comparison and DNA-binding Protein Identification with Generalized PseAAC and Graphical Representation

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207321666180130100838 ◽

2018 ◽

Vol 21 (2) ◽

pp. 100-110 ◽

Cited By ~ 3

Author(s):

Chun Li ◽

Jialing Zhao ◽

Changzhong Wang ◽

Yuhua Yao

Keyword(s):

Dna Binding ◽

Protein Sequence ◽

Protein Identification ◽

Binding Proteins ◽

Graphical Representation ◽

Sequence Data ◽

Protein Sequences ◽

Dna Binding Proteins ◽

Support Vector ◽

Letter Sequence

Aim and Objective: The rapid increase in the amount of protein sequence data available leads to an urgent need for novel computational algorithms to analyze and compare these sequences. This study was undertaken to develop an efficient computational approach for encoding protein sequences in a timely manner and extracting the hidden information. Methods: Based on two physicochemical properties of amino acids, a protein primary sequence was converted into a three-letter sequence, and then a graph without loops and multiple edges and its geometric line adjacency matrix were obtained. A generalized PseAAC (pseudo amino acid composition) model was thus constructed to characterize a protein sequence numerically. Results: Using the proposed mathematical descriptor of a protein sequence, similarity comparisons were made among β-globin proteins of 17 species and among 72 spike proteins of coronaviruses. The resulting clusters agreed well with the established taxonomic groups. In addition, a generalized PseAAC-based SVM (support vector machine) model was developed to identify DNA-binding proteins. Experimental results showed that our method performed better than DNAbinder, DNA-Prot, iDNA-Prot and enDNA-Prot by 3.29-10.44% in terms of ACC, 0.056-0.206 in terms of MCC, and 1.45-15.76% in terms of F1M. When the benchmark dataset was expanded with negative samples, the presented approach outperformed the four previous methods, with improvements in the range of 2.49-19.12% in terms of ACC, 0.05-0.32 in terms of MCC, and 3.82-33.85% in terms of F1M. Conclusion: These results suggest that the generalized PseAAC model is very efficient for the comparison and analysis of protein sequences, and very competitive in identifying DNA-binding proteins.
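For readers unfamiliar with the reported metrics, the sketch below computes accuracy (ACC), the Matthews correlation coefficient (MCC), and the F1 score from a binary confusion matrix using their standard definitions. It assumes that F1M in the abstract denotes the F1-measure. The function name and the counts are hypothetical and are not results from the paper.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Return (ACC, MCC, F1) for a binary confusion matrix."""
    total = tp + fp + tn + fn
    acc = (tp + tn) / total
    # MCC is defined as 0 here when any marginal count is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else 0.0
    return acc, mcc, f1

# Made-up counts for illustration only.
acc, mcc, f1 = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

ACC and F1 lie in [0, 1] and are often quoted as percentages, whereas MCC lies in [-1, 1], which is why the abstract reports MCC improvements as raw differences.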

enDNA-Prot: Identification of DNA-Binding Proteins by Applying Ensemble Learning

BioMed Research International ◽

10.1155/2014/294279 ◽

2014 ◽

Vol 2014 ◽

pp. 1-10 ◽

Cited By ~ 15

Author(s):

Ruifeng Xu ◽

Jiyun Zhou ◽

Bin Liu ◽

Lin Yao ◽

Yulan He ◽

...

Keyword(s):

Dna Binding ◽

Ensemble Learning ◽

Protein Identification ◽

Binding Proteins ◽

Binding Protein ◽

Regulation Of Gene Expression ◽

Dna Binding Protein ◽

Dna Binding Proteins ◽

Training Dataset ◽

Regulation Of Transcription

DNA-binding proteins are crucial for various cellular processes, such as recognition of specific nucleotides, regulation of transcription, and regulation of gene expression. Developing an effective model for identifying DNA-binding proteins is an urgent research problem. Up to now, many methods have been proposed, but most of them focus on only one classifier and cannot make full use of the large number of negative samples to improve predictive performance. This study proposed a predictor called enDNA-Prot for DNA-binding protein identification by employing the ensemble learning technique. Experimental results showed that enDNA-Prot was comparable with DNA-Prot and outperformed DNAbinder and iDNA-Prot, with performance improvements in the range of 3.97–9.52% in ACC and 0.08–0.19 in MCC. Furthermore, when the benchmark dataset was expanded with negative samples, enDNA-Prot outperformed the three existing methods by 2.83–16.63% in terms of ACC and 0.02–0.16 in terms of MCC. This indicates that enDNA-Prot is an effective method for DNA-binding protein identification and that expanding the training dataset with negative samples can improve its performance. For the convenience of experimental scientists, we developed a user-friendly web server for enDNA-Prot, which is freely accessible to the public.

Collation and analyses of DNA-binding protein domain families from sequence and structural databanks

Molecular BioSystems ◽

10.1039/c4mb00629a ◽

2015 ◽

Vol 11 (4) ◽

pp. 1110-1118 ◽

Cited By ~ 2

Author(s):

Sony Malhotra ◽

Ramanathan Sowdhamini

Keyword(s):

Dna Binding ◽

Binding Proteins ◽

Binding Protein ◽

Dna Binding Protein ◽

Dna Binding Proteins ◽

Protein Domain ◽

Divergent Evolution ◽

Molecular Function ◽

Molecular Functions

The distribution of GO molecular functions across different SCOP DNA-binding folds was studied. The majority of the folds were observed to perform more than one molecular function. This supports the notion that the majority of DNA-binding proteins might follow divergent evolution.

Limited proteolysis studies on the Escherichia coli single-stranded DNA binding protein. Evidence for a functionally homologous domain in both the Escherichia coli and T4 DNA binding proteins.

Journal of Biological Chemistry ◽

10.1016/s0021-9258(18)32867-9 ◽

1983 ◽

Vol 258 (5) ◽

pp. 3346-3355 ◽

Cited By ~ 18

Author(s):

K R Williams ◽

E K Spicer ◽

M B LoPresti ◽

R A Guggenheimer ◽

J W Chase

Keyword(s):

Escherichia Coli ◽

Dna Binding ◽

Binding Proteins ◽

Binding Protein ◽

Limited Proteolysis ◽

Dna Binding Protein ◽

Dna Binding Proteins ◽

Single Stranded Dna

Characterization of single-stranded DNA-binding protein SsbB from Staphylococcus aureus: SsbB cannot stimulate PriA helicase

RSC Advances ◽

10.1039/c8ra04392b ◽

2018 ◽

Vol 8 (50) ◽

pp. 28367-28375 ◽

Cited By ~ 5

Author(s):

Kuan-Lin Chen ◽

Jen-Hao Cheng ◽

Chih-Yang Lin ◽

Yen-Hua Huang ◽

Cheng-Yang Huang

Keyword(s):

Dna Replication ◽

Dna Binding ◽

Binding Proteins ◽

Binding Protein ◽

Dna Binding Protein ◽

Dna Binding Proteins ◽

Metabolic Processes ◽

Single Stranded Dna

Single-stranded DNA-binding proteins (SSBs) are essential to cells as they participate in DNA metabolic processes, such as DNA replication, repair, and recombination.

Analysis and prediction of single-stranded and double-stranded DNA binding proteins based on protein sequences

BMC Bioinformatics ◽

10.1186/s12859-017-1715-8 ◽

2017 ◽

Vol 18 (1) ◽

Cited By ~ 6

Author(s):

Wei Wang ◽

Lin Sun ◽

Shiguang Zhang ◽

Hongjun Zhang ◽

Jinling Shi ◽

...

Keyword(s):

Dna Binding ◽

Binding Proteins ◽

Protein Sequences ◽

Dna Binding Proteins ◽

Double Stranded Dna

Computational Methods for Predicting DNA Binding Proteins

Current Proteomics ◽

10.2174/1570164616666190722141129 ◽

2020 ◽

Vol 17 (4) ◽

pp. 258-270

Author(s):

Gaofeng Pan ◽

Jiandong Wang ◽

Liang Zhao ◽

William Hoskins ◽

Jijun Tang

Keyword(s):

Machine Learning ◽

Dna Binding ◽

Computational Methods ◽

Binding Proteins ◽

Binding Protein ◽

Dna Binding Protein ◽

Dna Binding Proteins ◽

Learning Methods ◽

Machine Learning Methods ◽

Classifier Algorithms

Background: DNA-binding proteins are very important to many biomolecular functions. Traditional experimental methods are expensive and time-consuming, so computational methods that can predict whether a protein is a DNA-binding protein are very helpful to researchers. Machine learning has been widely used in many research areas. Many researchers have proposed machine learning methods for DNA-binding protein prediction, and this paper highlights their advantages and disadvantages. Objective: There are many computational methods that can predict DNA-binding proteins. Each method uses different features and different classifier algorithms. In this paper, a review of these methods is provided to identify common procedures that can help researchers develop more accurate methods. Methods: First, the information stored in the protein sequence and gene sequence is presented. That information is the basis for finding the patterns that lead to binding. Then, feature extraction methods and classifier algorithms are discussed. Finally, some commonly used benchmark datasets are analysed and evaluated with these methods. Conclusion: In this review, we analyzed some popular computational methods for predicting DNA-binding proteins. From those methods, we highlighted many features necessary to build an accurate DNA-binding protein classifier. This can also help researchers to build more useful computational tools. Currently, there are some machine learning methods with good performance in predicting DNA-binding proteins. The performance can be improved by using different kinds of features and classifiers.

Molecular goniometers for single-particle cryo-EM of DNA-binding proteins

10.1101/2020.02.27.968883 ◽

2020 ◽

Cited By ~ 1

Author(s):

Tural Aksel ◽

Zanlin Yu ◽

Yifan Cheng ◽

Shawn M. Douglas

Keyword(s):

Dna Binding ◽

Single Particle ◽

Binding Proteins ◽

Binding Protein ◽

Accurate Determination ◽

Dna Nanotechnology ◽

Dna Binding Protein ◽

Dna Binding Proteins ◽

Small Proteins ◽

Particle Images

Correct reconstruction of macromolecular structure by cryo-electron microscopy relies on accurate determination of the orientation of single-particle images. For small (<100 kDa) DNA-binding proteins, obtaining particle images with sufficiently asymmetric features to correctly guide alignment is challenging. DNA nanotechnology was conceived as a potential tool for building host nanostructures to prescribe the locations and orientations of docked proteins. We used DNA origami to construct molecular goniometers (instruments to precisely orient objects) to dock a DNA-binding protein on a double-helix stage that has user-programmable tilt and rotation angles. Each protein orientation maps to a distinct barcode pattern specifying particle classification and angle assignment. We used goniometers to obtain a 6.5 Å structure of BurrH, an 82-kDa DNA-binding protein whose helical pseudosymmetry prevents accurate image orientation using classical cryo-EM. Our approach should be adaptable for other DNA-binding proteins, and a wide variety of other small proteins, by fusing DNA binding domains to them.

The Recognition Method for the Supersecondary Structure of DNA-Binding Protein

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.602-605.1614 ◽

2014 ◽

Vol 602-605 ◽

pp. 1614-1617

Author(s):

Ming Hai Yao ◽

Na Wang

Keyword(s):

Secondary Structure ◽

Dna Binding ◽

Binding Proteins ◽

Gene Expression Regulation ◽

Binding Protein ◽

Transition Probability ◽

Dna Binding Protein ◽

Dna Binding Proteins ◽

Recognition Method ◽

Homologous Proteins

Identifying the structure of DNA-binding proteins is of great significance for the study of gene expression regulation mechanisms. This paper proposes a new recognition method to identify the super-secondary structure and structural domain of DNA-binding proteins. The nucleotide transition probability is calculated from the known binding locus sequences of DNA-binding proteins. Mouse data downloaded from TRANSFAC are used to establish the super-secondary structure recognition models for binding proteins. The probability score is calculated from the transition probabilities of the binding site and the background. This method differs from conventional methods in that it uses neither the amino acid sequence of the protein nor homologous proteins. To verify the validity of the algorithm, 10 DNA-binding proteins from Drosophila and yeast are used in the experiment. The experimental results show that the method achieves very good recognition results.
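One common way to score a sequence from binding-site and background transition probabilities is a first-order Markov log-odds score. The sketch below illustrates that general idea only. The abstract does not give the paper's exact formula, so the log-odds form, the pseudocount, the function names, and the toy sequences (which are not TRANSFAC data) are all assumptions.

```python
import math

ALPHABET = "ACGT"

def transition_probs(sequences, pseudocount=1.0):
    """Estimate first-order nucleotide transition probabilities."""
    # The pseudocount keeps unseen transitions from getting probability zero.
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    probs = {}
    for a in ALPHABET:
        row_total = sum(counts[a].values())
        probs[a] = {b: counts[a][b] / row_total for b in ALPHABET}
    return probs

def log_odds_score(seq, site_probs, background_probs):
    """Sum of log ratios of site vs. background transition probabilities."""
    return sum(
        math.log(site_probs[p][n] / background_probs[p][n])
        for p, n in zip(seq, seq[1:])
    )

# Made-up sequences for illustration only.
site = transition_probs(["TATAAA", "TATATA", "TATAAG"])
background = transition_probs(["GCGCGC", "ACGTAC", "GGCCTT"])
score_site_like = log_odds_score("TATAA", site, background)
score_background_like = log_odds_score("GCGCG", site, background)
```

Under this scheme a positive score means the sequence's transitions are more typical of the binding-site model than of the background, and a negative score means the opposite.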
