Protein sequence comparison under a new complex representation of amino acids based on their physio-chemical properties

2018 ◽  
Vol 7 (1.8) ◽  
pp. 181
Author(s):  
Jayanta Pal ◽  
Soumen Ghosh ◽  
Bansibadan Maji ◽  
Dilip Kumar Bhattacharya

The paper first considers a new complex representation of amino acids in which the real and imaginary parts are taken from the hydrophilic properties and residue volumes of the amino acids, respectively. It then applies the complex Fourier transform to the resulting sequence of complex numbers to obtain the spectrum in the frequency domain. Using the method of ‘Inter coefficient distances’ on this spectrum, it constructs phylogenetic trees of different protein sequences. Finally, on the basis of these phylogenetic trees, pairwise comparison is made for the protein sequences. The paper also carries out pairwise comparison of the same protein sequences by the same method but based on a known complex representation of amino acids, in which the real and imaginary parts refer to the hydrophobicity properties and residue volumes of the amino acids, respectively. The results of the two methods are then compared with those obtained earlier for the same sequences by other methods. It is found that both methods are workable and, further, that the new complex representation performs better than the earlier one. This shows that the hydrophilic property (polarity) is a better choice than the hydrophobic property of amino acids, especially in protein sequence comparison.
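The encoding and transform steps described in this abstract can be sketched as follows. This is a minimal illustration only: the property values in `TOY_TABLE` are made-up placeholders (not the paper's hydrophilicity or residue-volume tables), the function names are hypothetical, and the ‘Inter coefficient distances’ step is not reproduced because the abstract does not define it.

```python
import cmath

# Made-up placeholder values, NOT the paper's property tables.
# Real part: a hydrophilicity-like score; imaginary part: a volume-like score.
TOY_TABLE = {
    "A": complex(1.0, 2.0),
    "G": complex(0.0, 1.0),
    "L": complex(-1.0, 3.0),
}


def encode(sequence, table):
    """Map each residue letter to its complex number."""
    return [table[residue] for residue in sequence]


def dft(values):
    """Naive O(n^2) discrete Fourier transform of a list of complex numbers."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * cmath.pi * k * t / n) for t, v in enumerate(values))
        for k in range(n)
    ]


def power_spectrum(sequence, table):
    """Squared magnitude of each Fourier coefficient of the encoded sequence."""
    return [abs(c) ** 2 for c in dft(encode(sequence, table))]


if __name__ == "__main__":
    spectrum = power_spectrum("AGLA", TOY_TABLE)
    print([round(p, 3) for p in spectrum])
```

Swapping in a different table (for example, one whose real parts come from a hydrophobicity scale) changes only the `table` argument, which mirrors how the abstract compares two representations under the same pipeline.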

2018 ◽  
Vol 7 (2) ◽  
pp. 678
Author(s):  
Soumen Ghosh ◽  
Jayanta Pal ◽  
Bansibadan Maji ◽  
Dilip Kumar Bhattacharya

Methods of protein sequence comparison based on different classified groups of amino acids make a significant contribution to the literature on protein sequence comparison, but the methods vary with the choice of classified groups. The purpose of the paper is therefore to develop a unified approach to protein sequence comparison based on classification of amino acids into groups of different cardinality. The paper considers 4-group, 5-group and 6-group classifications of amino acids, and in each case it applies the unified method to compare two sets of protein sequences, viz., 9 proteins of the ND5 category and 50 coronavirus spike proteins. The results agree with those obtained earlier by other methods based on classified groups of amino acids. It is found that the present unified formula is relatively simpler than, and fundamentally different from, the earlier ones. Further, it can be applied conveniently to the comparison of protein sequences based on all different types of classified groups of amino acids.
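The idea of comparing sequences after reducing the 20-letter alphabet to a few groups can be sketched as below. The 4-group split shown is one common textbook grouping by side-chain character, used here as an assumption; the abstract does not list the paper's groups, and its unified formula is not reproduced. The group-frequency descriptor and the function names are illustrative stand-ins.

```python
import math

# Assumed 4-group classification by side-chain character (groupings vary by source).
FOUR_GROUPS = {
    "acidic": "DE",
    "basic": "KRH",
    "polar": "STNQCY",
    "nonpolar": "AVLIPFWMG",
}


def group_frequencies(sequence, groups):
    """Fraction of residues falling in each group, in the dict's key order."""
    counts = [sum(sequence.count(aa) for aa in members) for members in groups.values()]
    return [c / len(sequence) for c in counts]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    a = group_frequencies("DKASLLVE", FOUR_GROUPS)
    b = group_frequencies("KKRASTVV", FOUR_GROUPS)
    print(a, b, euclidean(a, b))
```

Because `group_frequencies` takes the grouping as an argument, a 5-group or 6-group dictionary can be passed without changing the code, and sequences of unequal length still yield vectors of equal size.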


Author(s):  
Subhram Das ◽  
Soumen Ghosh ◽  
Jayanta Pal ◽  
Dilip K. Bhattacharya

This chapter describes the use of fuzzy set theory and intuitionistic fuzzy set theory in DNA sequence comparison. It also shows an indirect application of fuzzy set theory to comparing protein sequences. Protein sequences are built from 20 amino acids, and the chapter shows how these amino acids can be classified into six different groups. These groups are obtained purely from theoretical considerations and are entirely different from the known groups of amino acids based on biological considerations. It is already known how such classified groups of amino acids help in protein sequence comparison, and that the results of comparison differ as the groups differ in number and composition. It is therefore expected that new comparison results will follow from these new, theoretically obtained groups of amino acids. Thus fuzzy set theory is also useful in protein sequence comparison.


2017 ◽  
Vol 7 (1) ◽  
Author(s):  
Lulu Yu ◽  
Yusen Zhang ◽  
Ivan Gutman ◽  
Yongtang Shi ◽  
Matthias Dehmer

We develop a novel position-feature-based model for protein sequences by employing physicochemical properties of the 20 amino acids and the measure of graph energy. The method puts the emphasis on sequence order information and describes local dynamic distributions of sequences, from which one can get a characteristic B-vector. Afterwards, we apply the relative entropy to the B-vectors representing the sequences to measure their similarity/dissimilarity. The numerical results obtained in this study show that the proposed method leads to meaningful results compared with competitors such as Clustal W.
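The relative-entropy comparison mentioned in this abstract can be illustrated with a short sketch. It assumes the descriptor vectors have strictly positive entries and are normalized to sum to 1 before comparison; the construction of the B-vector itself is not shown, and the symmetrized form is an assumption rather than something the abstract states.

```python
import math


def normalize(vector):
    """Scale a vector of positive numbers so that its entries sum to 1."""
    total = sum(vector)
    return [x / total for x in vector]


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in nats.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def symmetric_relative_entropy(u, v):
    """Sum of both directed divergences, so the result does not depend on order."""
    p, q = normalize(u), normalize(v)
    return relative_entropy(p, q) + relative_entropy(q, p)


if __name__ == "__main__":
    # Hypothetical descriptor vectors, for illustration only.
    print(symmetric_relative_entropy([2.0, 1.0, 1.0], [1.0, 1.0, 2.0]))
```

A value of 0 means the two normalized vectors are identical; larger values indicate greater dissimilarity.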


2021 ◽  
Author(s):  
Jayanta Pal ◽  
Soumen Ghosh ◽  
Bansibadan Maji ◽  
Dilip Kumar Bhattacharya

Similarity/dissimilarity study of protein and genome sequences remains a challenging task, and the selection of techniques and descriptors plays an important role in computational biology. Genome sequence comparison is generally preferred to protein sequence comparison because a protein sequence is built from 20 amino acids, compared with only 4 nucleotides in a genome sequence. It is therefore important to consider a suitable representation that is both time and space efficient and equally applicable to protein sequences of equal and unequal lengths. In the binary form of representation, the Fourier transform of a protein sequence reduces to the transformation of 20 simple binary sequences in the Fourier domain, where for each such sequence Parseval’s identity gives a very simple computable form of the power spectrum. This gives rise to readily acceptable forms of moments of different degrees. Such moments, when properly normalized, show a monotonically decreasing trend as the degree of the moment increases, so it is better to use moments of smaller degree only. In this paper, descriptors are taken as 20-component vectors, where each component corresponds to a general second-order moment of one of the 20 simple binary sequences. Distance matrices are then obtained using Euclidean distance as the distance measure between each pair of sequences. Phylogenetic trees are obtained from the distance matrices using the UPGMA algorithm. The datasets used for the similarity/dissimilarity study are 9 ND4, 16 ND5, 9 ND6, 24 TF proteins and 12 Baculovirus proteins. It is found that the phylogenetic trees produced by the present method are on par with those produced by earlier methods of other authors and with their known biological references. Further, the method takes less computational time and is equally applicable to sequences of equal and unequal lengths.
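The binary representation and the role of Parseval's identity can be sketched as below. For a 0/1 indicator sequence of length N containing n ones, the identity gives a total spectral power of N·n under the unnormalized DFT convention used here, so the total can be obtained by counting rather than transforming. The function names are hypothetical, and the paper's moment definition, distance matrix and UPGMA steps are not reproduced.

```python
import cmath

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def indicator_sequences(sequence):
    """One 0/1 sequence per amino acid: 1 where that residue occurs, else 0."""
    return {aa: [1 if s == aa else 0 for s in sequence] for aa in AMINO_ACIDS}


def indicator_power_spectrum(bits):
    """Squared magnitudes of the naive (unnormalized) DFT of a 0/1 sequence."""
    n = len(bits)
    return [
        abs(sum(b * cmath.exp(-2j * cmath.pi * k * t / n) for t, b in enumerate(bits))) ** 2
        for k in range(n)
    ]


def total_power_by_parseval(bits):
    """Total spectral power without any transform: N times the number of ones."""
    return len(bits) * sum(bits)


if __name__ == "__main__":
    bits = indicator_sequences("ACCAGA")["A"]
    print(sum(indicator_power_spectrum(bits)), total_power_by_parseval(bits))
```

The two printed numbers should agree up to floating-point rounding, which is the shortcut the abstract alludes to when it calls the power spectrum "very simple" to compute.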


2005 ◽  
Vol 15 (3) ◽  
pp. 254-260 ◽  
Author(s):  
William R Pearson ◽  
Michael L Sierk
