A 2-D graphical representation of protein sequences based on nucleotide triplet codons

2005 ◽  
Vol 413 (4-6) ◽  
pp. 458-462 ◽  
Author(s):  
Fenglan Bai ◽  
Tianming Wang

2004 ◽  
Vol 397 (1-3) ◽  
pp. 247-252 ◽  
Author(s):  
Milan Randić ◽  
Jure Zupan ◽  
Alexandru T. Balaban

2014 ◽  
Vol 10 ◽  
pp. EBO.S14713 ◽  
Author(s):  
Yuhua Yao ◽  
Shoujiang Yan ◽  
Huimin Xu ◽  
Jianning Han ◽  
Xuying Nan ◽  
...  

2019 ◽  
Vol 2019 ◽  
pp. 1-10 ◽  
Author(s):  
Mervat M. Abo-Elkhier ◽  
Marwa A. Abd Elwahaab ◽  
Moheb I. Abo El Maaty

Comparing protein sequences by similarity is a fundamental task in today’s biomedical research. With advances in sequencing technologies, the number of protein sequences in public databases is increasing exponentially. The best-known sequence comparison methods are alignment based. They generally give excellent results when the sequences under study are closely related, but they are time consuming. Herein, a new alignment-free method is introduced. Our technique depends on a new graphical representation and descriptor. The graphical representation is a simple way to visualize protein sequences. The descriptor compresses the primary sequence into a single vector composed of only two values. Our approach gives good results with both short and long sequences in little computation time. It is applied to nine beta globin, nine ND5 (NADH dehydrogenase subunit 5), and 24 spike protein sequences. Correlation and significance analyses are also introduced to compare our similarity/dissimilarity results with those of other approaches and with sequence homology.
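
As a rough illustration of how two-value descriptors like those mentioned in the abstract above could be compared, the sketch below builds a pairwise distance matrix. The abstract does not say which distance is used, so Euclidean distance is an assumption here, and the sequence names and descriptor values are hypothetical placeholders rather than data from the paper.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(descriptors):
    """Return a symmetric nested dict of pairwise distances.

    `descriptors` maps a sequence name to its descriptor vector.
    Smaller distances are read as greater similarity.
    """
    names = list(descriptors)
    dist = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        d = euclidean(descriptors[a], descriptors[b])
        dist[a][b] = dist[b][a] = d
    return dist


# Hypothetical two-value descriptors (assumed, not taken from the paper).
descriptors = {
    "seq_A": (0.0, 0.0),
    "seq_B": (3.0, 4.0),
    "seq_C": (6.0, 8.0),
}
dist = distance_matrix(descriptors)
```

Because each sequence is reduced to two numbers, the comparison cost does not depend on sequence length, which is consistent with the short computation time the abstract reports.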


2018 ◽  
Vol 21 (2) ◽  
pp. 100-110 ◽  
Author(s):  
Chun Li ◽  
Jialing Zhao ◽  
Changzhong Wang ◽  
Yuhua Yao

Aim and Objective: The rapid increase in the amount of available protein sequence data creates an urgent need for novel computational algorithms to analyze and compare these sequences. This study was undertaken to develop an efficient computational approach for encoding protein sequences in a timely manner and extracting their hidden information. Methods: Based on two physicochemical properties of amino acids, a protein primary sequence was converted into a three-letter sequence, and then a graph without loops or multiple edges and its geometric line adjacency matrix were obtained. A generalized PseAAC (pseudo amino acid composition) model was thus constructed to characterize a protein sequence numerically. Results: Using the proposed mathematical descriptor of a protein sequence, similarity comparisons were made among β-globin proteins of 17 species and among 72 spike proteins of coronaviruses. The resulting clusters agreed well with the established taxonomic groups. In addition, a generalized PseAAC-based SVM (support vector machine) model was developed to identify DNA-binding proteins. Experimental results showed that our method performed better than DNAbinder, DNA-Prot, iDNA-Prot and enDNA-Prot by 3.29-10.44% in terms of ACC, 0.056-0.206 in terms of MCC, and 1.45-15.76% in terms of F1M. When the benchmark dataset was expanded with negative samples, the presented approach outperformed the four previous methods, with improvements in the range of 2.49-19.12% in terms of ACC, 0.05-0.32 in terms of MCC, and 3.82-33.85% in terms of F1M. Conclusion: These results suggested that the generalized PseAAC model was very efficient for comparison and analysis of protein sequences, and very competitive in identifying DNA-binding proteins.
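
For readers unfamiliar with the measures quoted in the abstract above, the following minimal sketch computes ACC, MCC, and F1 from the counts of a binary confusion matrix using their standard definitions. It assumes that F1M in the abstract denotes the usual F1 measure; the function name and the example counts are hypothetical and are not results from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, Matthews correlation coefficient, and F1.

    tp, fp, tn, fn are the true-positive, false-positive,
    true-negative, and false-negative counts of a binary classifier.
    Undefined ratios (zero denominators) are reported as 0.0.
    """
    total = tp + fp + tn + fn
    acc = (tp + tn) / total if total else 0.0

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"ACC": acc, "MCC": mcc, "F1": f1}


# Hypothetical counts chosen only to illustrate the calculation.
metrics = binary_metrics(tp=40, fp=10, tn=40, fn=10)
```

ACC and F1 lie between 0 and 1 and are often quoted as percentages, whereas MCC ranges from -1 to 1, which is why the abstract reports its MCC gains as plain decimals.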


2010 ◽  
Vol 31 (11) ◽  
pp. 2136-2142 ◽  
Author(s):  
Ping-An He ◽  
Yan-Ping Zhang ◽  
Yu-Hua Yao ◽  
Yi-Fa Tang ◽  
Xu-Ying Nan

2014 ◽  
Vol 2014 ◽  
pp. 1-15 ◽  
Author(s):  
Lei Wang ◽  
Hui Peng ◽  
Jinhua Zheng

To facilitate intuitive analysis of protein sequences, this paper first introduces a novel graphical representation of protein sequences called ADLD (Alignment Diagonal Line Diagram), and then proposes a new ADLD-based method for analyzing the similarity/dissimilarity of protein sequences. Compared with existing methods, our ADLD-based method proves effective in the similarity/dissimilarity analysis of protein sequences and has the merits of being intuitive, visual, and simple. Examinations of the similarities/dissimilarities of both 16 different ND5 proteins and 29 different spike proteins illustrate the utility of our ADLD-based approach.

