Establishing the minimality of phylogenetic trees from protein sequences

Author(s):  
M. D. Hendy

2007 ◽  
Vol 57 (10) ◽  
pp. 2289-2295 ◽  
Author(s):  
Madalin Enache ◽  
Takashi Itoh ◽  
Tadamasa Fukushima ◽  
Ron Usami ◽  
Lucia Dumitru ◽  
...  

In order to clarify the current phylogeny of the haloarchaea, particularly the closely related genera that have been difficult to sort out using 16S rRNA gene sequences, the DNA-dependent RNA polymerase subunit B′ gene (rpoB′) was used as a complementary molecular marker. Partial sequences of the gene were determined from 16 strains of the family Halobacteriaceae. Comparisons of phylogenetic trees inferred from the gene and protein sequences as well as from corresponding 16S rRNA gene sequences suggested that species of the genera Natrialba, Natronococcus, Halobiforma, Natronobacterium, Natronorubrum, Natrinema/Haloterrigena and Natronolimnobius formed a monophyletic group in all trees. In the RpoB′ protein tree, the alkaliphilic species Natrialba chahannaoensis, Natrialba hulunbeirensis and Natrialba magadii formed a tight group, while the neutrophilic species Natrialba asiatica formed a separate group with species of the genera Natronorubrum and Natronolimnobius. Species of the genus Natronorubrum were split into two groups in both the rpoB′ gene and protein trees. The most important advantage of the use of the rpoB′ gene over the 16S rRNA gene is that sequences of the former are highly conserved amongst species of the family Halobacteriaceae. All sequences determined so far can be aligned unambiguously without any gaps. On the other hand, gaps are necessary at 49 positions in the inner part of the alignment of 16S rRNA gene sequences. The rpoB′ gene and protein sequences can be used as an excellent alternative molecular marker in phylogenetic analysis of the Halobacteriaceae.



1980 ◽  
Vol 187 (1) ◽  
pp. 65-74 ◽  
Author(s):  
D Penny ◽  
M D Hendy ◽  
L R Foulds

We have recently reported a method to identify the shortest possible phylogenetic tree for a set of protein sequences [Foulds, Hendy & Penny (1979) J. Mol. Evol. 13, 127–150; Foulds, Penny & Hendy (1979) J. Mol. Evol. 13, 151–166]. The present paper discusses issues that arise during the construction of minimal phylogenetic trees from protein-sequence data. The conversion of the data from amino acid sequences into nucleotide sequences is shown to be advantageous. A new variation of a method for constructing a minimal tree is presented. Our previous methods have involved first constructing a tree and then either proving that it is minimal or transforming it into a minimal tree. The approach presented in the present paper progressively builds up a tree, taxon by taxon. We illustrate this approach by using it to construct a minimal tree for ten mammalian haemoglobin alpha-chain sequences. Finally we define a measure of the complexity of the data and illustrate a method to derive a directed phylogenetic tree from the minimal tree.
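To make the idea of tree "length" concrete, the following is a minimal sketch of how the number of changes on a fixed tree can be counted for nucleotide data. It uses Fitch's small-parsimony algorithm as a stand-in; it is not the taxon-by-taxon construction or the minimality proof described in the abstract. The function names, the nested-tuple tree encoding, and the four toy sequences are all assumptions made for illustration.

```python
def fitch_site(tree, states):
    """Return (candidate state set, change count) for one alignment column.

    `tree` is either a leaf name (str) or a pair (left, right) of subtrees.
    `states` maps each leaf name to its character at this column.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch_site(left, states)
    right_set, right_changes = fitch_site(right, states)
    common = left_set & right_set
    if common:
        # The children can agree on a state, so no extra change is needed here.
        return common, left_changes + right_changes
    # The children disagree: one change is charged at this node.
    return left_set | right_set, left_changes + right_changes + 1


def tree_length(tree, seqs):
    """Sum the Fitch change counts over all columns of equal-length sequences."""
    n_sites = len(next(iter(seqs.values())))
    total = 0
    for i in range(n_sites):
        column = {name: seq[i] for name, seq in seqs.items()}
        total += fitch_site(tree, column)[1]
    return total


# Toy data (hypothetical, not taken from the paper).
seqs = {"a": "AAG", "b": "ATG", "c": "GAC", "d": "GAC"}

# The three possible groupings of four taxa, written as rooted pairs.
candidates = [
    (("a", "b"), ("c", "d")),
    (("a", "c"), ("b", "d")),
    (("a", "d"), ("b", "c")),
]
best = min(candidates, key=lambda t: tree_length(t, seqs))
```

Scoring every topology only works for a handful of taxa, because the number of trees grows very quickly with the number of sequences. That growth is why methods that build or bound the minimal tree more cleverly are of interest.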



Author(s):  
Mochammad Rajasa Mukti Negara ◽  
Ita Krissanti ◽  
Gita Widya Pradini

BACKGROUND Nucleocapsid (N) protein is one of four structural proteins of SARS-CoV-2, which is known to be more conserved than spike protein and is highly immunogenic. This study aimed to analyze the variation of the SARS-CoV-2 N protein sequences in ASEAN countries, including Indonesia. METHODS Complete sequences of SARS-CoV-2 N protein from each ASEAN country were obtained from Global Initiative on Sharing All Influenza Data (GISAID), while the reference sequence was obtained from GenBank. All sequences collected from December 2019 to March 2021 were grouped into clades according to GISAID, and two representative isolates were chosen from each clade for the analysis. The sequences were aligned by MUSCLE, and phylogenetic trees were built using MEGA-X software based on the nucleotide and translated AA sequences. RESULTS A total of 98 isolates of complete N protein genes from ASEAN countries were analyzed. The nucleotides of all isolates were 97.5% conserved. Of 31 nucleotide changes, 22 led to amino acid (AA) substitutions; thus, the AA sequences were 94.5% conserved. The phylogenetic trees of the nucleotide and AA sequences show similar branches. Nucleotide variations in clade O (C28311T); clade GR (28881–28883 GGG>AAC); and clade GRY (28881–28883 GGG>AAC and C28977T) lead to specific branches corresponding to the clade within both trees. CONCLUSIONS The N protein sequences of SARS-CoV-2 across ASEAN countries are highly conserved. Most isolates were closely related to the reference sequence originating from China, except the isolates representing clades O, GR, and GRY, which formed specific branches in the phylogenetic tree.



Genes ◽  
2019 ◽  
Vol 10 (7) ◽  
pp. 490 ◽  
Author(s):  
Sharma ◽  
Gupta

The class Hematozoa encompasses several clinically important genera, including Plasmodium, whose members cause the major life-threatening disease malaria. Hence, a good understanding of the interrelationships of organisms from this class and reliable means for distinguishing them are of much importance. This study reports comprehensive phylogenetic and comparative analyses of protein sequences from the genomes of 28 hematozoa species to understand their interrelationships. In addition to phylogenetic trees based on two large datasets of protein sequences, detailed comparative analyses were carried out on the genomes of hematozoa species to identify novel molecular synapomorphies consisting of conserved signature indels (CSIs) in protein sequences. These studies have identified 79 CSIs that are exclusively present in specific groups of Hematozoa/Plasmodium species, also supported by phylogenetic analysis, providing reliable means for the identification of these species groups and understanding their interrelationships. Of these CSIs, six CSIs are specifically shared by all hematozoa species, two CSIs serve to distinguish members of the order Piroplasmida, five CSIs are uniquely found in all Piroplasmida species except B. microti and two CSIs are specific for the genus Theileria. We also describe 23 CSIs that are exclusively present in all genome-sequenced Plasmodium species and two, nine, ten and eight CSIs which are specific for members of the Plasmodium subgenera Haemamoeba, Laverania, Vinckeia and Plasmodium (excluding P. ovale and P. malariae), respectively. Additionally, our work has identified several CSIs that support species relationships which are not evident from phylogenetic analysis. 
Of these CSIs, one CSI supports the ancestral nature of the avian-Plasmodium species in comparison to the mammalian-infecting groups of Plasmodium species, four CSIs strongly support a specific relationship of species between the subgenera Plasmodium and Vinckeia and three CSIs each that reliably group P. malariae with members of the subgenus Plasmodium and P. ovale within the subgenus Vinckeia, respectively. These results provide a reliable framework for understanding the evolutionary relationships among the Plasmodium/Piroplasmida species. Further, in view of the exclusivity of the described molecular markers for the indicated groups of hematozoa species, particularly large numbers of unique characteristics that are specific for all Plasmodium species, they provide important molecular tools for biochemical/genetic studies and for developing novel diagnostics and therapeutics for these organisms.



2012 ◽  
Vol 466-467 ◽  
pp. 27-30
Author(s):  
Kun Luo ◽  
Dong Hui Luo

Inositol 1,3,4-trisphosphate 5/6 kinase (ITPK1) is a pivotal enzyme in producing IP6, a molecule that plays an essential role in many biochemical processes in mammalian cells. In this paper, two phylogenetic trees are constructed based on the mRNA sequences and the protein sequences, respectively. The results indicate that the protein sequences are more conserved than the mRNA sequences in primates. Although the ITPK1 domain is abundantly distributed in both plants and animals, the protein sequence varies greatly between plants and animals. The protein-based tree reflects an evolutionary order that is consistent with that of organismal evolution. A Z-test of selection indicates that the evolution of the ITPK1 protein is driven by selection pressure.



2015 ◽  
Vol 65 (Pt_3) ◽  
pp. 1050-1069 ◽  
Author(s):  
Radhey S. Gupta ◽  
Sohail Naushad ◽  
Sheridan Baker

The Halobacteria constitute one of the largest groups within the Archaea. The hierarchical relationship among members of this large class, which comprises a single order and a single family, has proven difficult to determine based upon 16S rRNA gene trees and morphological and physiological characteristics. This work reports detailed phylogenetic and comparative genomic studies on >100 halobacterial (haloarchaeal) genomes containing representatives from 30 genera to investigate their evolutionary relationships. In phylogenetic trees reconstructed on the basis of 32 conserved proteins, using both neighbour-joining and maximum-likelihood methods, two major clades (clades A and B) encompassing nearly two-thirds of the sequenced haloarchaeal species were strongly supported. Clades grouping the same species/genera were also supported by the 16S rRNA gene trees and trees for several individual highly conserved proteins (RpoC, EF-Tu, UvrD, GyrA, EF-2/EF-G). In parallel, our comparative analyses of protein sequences from haloarchaeal genomes have identified numerous discrete molecular markers in the form of conserved signature indels (CSIs) in protein sequences and conserved signature proteins (CSPs) that are found uniquely in specific groups of haloarchaea. Thirteen CSIs in proteins involved in diverse functions and 68 CSPs that are uniquely present in all or most genome-sequenced haloarchaea provide novel molecular means for distinguishing members of the class Halobacteria from all other prokaryotes. The members of clade A are distinguished from all other haloarchaea by the unique shared presence of two CSIs in the ribose operon protein and small GTP-binding protein and eight CSPs that are found specifically in members of this clade. Likewise, four CSIs in different proteins and five other CSPs are present uniquely in members of clade B and distinguish them from all other haloarchaea. 
Based upon their specific clustering in phylogenetic trees for different gene/protein sequences and the unique shared presence of large numbers of molecular signatures, members of clades A and B are indicated to be distinct from all other haloarchaea because of their uniquely shared evolutionary histories. Based upon these results, it is proposed that clades A and B be recognized as two new orders, Natrialbales ord. nov. and Haloferacales ord. nov., within the class Halobacteria, containing the novel families Natrialbaceae fam. nov. and Haloferacaceae fam. nov. Other members of the class Halobacteria that are not members of these two orders will remain part of the emended order Halobacteriales in an emended family Halobacteriaceae.



2018 ◽  
Vol 7 (1.8) ◽  
pp. 181
Author(s):  
Jayanta Pal ◽  
Soumen Ghosh ◽  
Bansibadan Maji ◽  
Dilip Kumar Bhattacharya

The paper first considers a new complex representation of amino acids, in which the real and imaginary parts are taken from the hydrophilic properties and the residue volumes of the amino acids, respectively. It then applies the complex Fourier transform to the resulting sequence of complex numbers to obtain its spectrum in the frequency domain. Using the method of 'inter coefficient distances' on this spectrum, it constructs phylogenetic trees of different protein sequences, and on the basis of these trees a pairwise comparison of the protein sequences is made. The paper also carries out a pairwise comparison of the same protein sequences by the same method, but based on a known complex representation of amino acids in which the real and imaginary parts refer to the hydrophobicity properties and the residue volumes of the amino acids, respectively. The results of the two methods are then compared with those obtained earlier for the same sequences by other methods. Both methods are found to be workable, and the new complex representation performs better than the earlier one. This suggests that the hydrophilic property (polarity) is a better choice than the hydrophobic property of amino acids, especially in protein sequence comparison.
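The pipeline in this abstract (complex encoding, Fourier transform, distance between spectra) can be sketched roughly as follows. Several parts are assumptions rather than the authors' method: the property values in `TOY_TABLE` are made-up placeholders and not real hydrophilicity or volume scales, shorter sequences are zero-padded to a common length, and a Euclidean distance between magnitude spectra stands in for the 'inter coefficient distance', whose exact definition the abstract does not give.

```python
import cmath
import math

# Hypothetical (real part, imaginary part) values, for illustration only.
# A real analysis would need published hydrophilicity and residue-volume
# scales covering all 20 amino acids.
TOY_TABLE = {
    "A": (1.0, 1.0),
    "C": (2.0, 0.5),
    "D": (3.0, 2.0),
    "E": (0.5, 3.0),
}


def encode(seq, table):
    """Map each residue to a complex number built from its two properties."""
    return [complex(*table[res]) for res in seq]


def dft(values, n):
    """Plain discrete Fourier transform of `values`, zero-padded to length n."""
    padded = list(values) + [0j] * (n - len(values))
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(padded))
        for k in range(n)
    ]


def spectral_distance(seq_a, seq_b, table):
    """Euclidean distance between magnitude spectra (an assumed stand-in)."""
    n = max(len(seq_a), len(seq_b))
    spec_a = dft(encode(seq_a, table), n)
    spec_b = dft(encode(seq_b, table), n)
    return math.sqrt(sum((abs(a) - abs(b)) ** 2 for a, b in zip(spec_a, spec_b)))
```

A matrix of such pairwise distances could then be passed to a standard distance-based tree method. Note that the magnitude spectrum is unchanged by a circular shift of the sequence, so two sequences that are rotations of each other get distance zero under this particular stand-in.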



2004 ◽  
Vol 78 (14) ◽  
pp. 7748-7762 ◽  
Author(s):  
Frederic Bibollet-Ruche ◽  
Elizabeth Bailes ◽  
Feng Gao ◽  
Xavier Pourrut ◽  
Katrina L. Barlow ◽  
...  

ABSTRACT Nearly complete sequences of simian immunodeficiency viruses (SIVs) infecting 18 different nonhuman primate species in sub-Saharan Africa have now been reported; yet, our understanding of the origins, evolutionary history, and geographic distribution of these viruses still remains fragmentary. Here, we report the molecular characterization of a lentivirus (SIVdeb) naturally infecting De Brazza's monkeys (Cercopithecus neglectus). Complete SIVdeb genomes (9,158 and 9,227 bp in length) were amplified from uncultured blood mononuclear cell DNA of two wild-caught De Brazza's monkeys from Cameroon. In addition, partial pol sequences (650 bp) were amplified from four offspring of De Brazza's monkeys originally caught in the wild in Uganda. Full-length (9,068 bp) and partial pol (650 bp) SIVsyk sequences were also amplified from Sykes's monkeys (Cercopithecus albogularis) from Kenya. Analysis of these sequences identified a new SIV clade (SIVdeb), which differed from previously characterized SIVs at 40 to 50% of sites in Pol protein sequences. The viruses most closely related to SIVdeb were SIVsyk and members of the SIVgsn/SIVmus/SIVmon group of viruses infecting greater spot-nosed monkeys (Cercopithecus nictitans), mustached monkeys (Cercopithecus cephus), and mona monkeys (Cercopithecus mona), respectively. In phylogenetic trees of concatenated protein sequences, SIVdeb, SIVsyk, and SIVgsn/SIVmus/SIVmon clustered together, and this relationship was highly significant in all major coding regions. Members of this virus group also shared the same number of cysteine residues in their extracellular envelope glycoprotein and a high-affinity AIP1 binding site (YPD/SL) in their p6 Gag protein, as well as a unique transactivation response element in their viral long terminal repeat; however, SIVdeb and SIVsyk, unlike SIVgsn, SIVmon, and SIVmus, did not encode a vpu gene. 
These data indicate that De Brazza's monkeys are naturally infected with SIVdeb, that this infection is prevalent in different areas of the species' habitat, and that geographically diverse SIVdeb strains cluster in a single virus group. The consistent clustering of SIVdeb with SIVsyk and the SIVmon/SIVmus/SIVgsn group also suggests that these viruses have evolved from a common ancestor that likely infected a Cercopithecus host in the distant past. The vpu gene appears to have been acquired by a subset of these Cercopithecus viruses after the divergence of SIVdeb and SIVsyk.



2020 ◽  
Author(s):  
Francesco Ballesio ◽  
Ali Haider Bangash ◽  
Didier Barradas-Bautista ◽  
Justin Barton ◽  
Andrea Guarracino ◽  
...  

The pandemic spread of SARS-CoV-2 and its ability to reinfect a recovered subject, among its other damaging characteristics, took everybody by surprise. A global collaborative scientific effort was urgently needed to bring together experts from different areas of medicine and data science. Such a platform was provided by the COVID19 Virtual BioHackathon, organized from the 5th to the 11th of April, 2020, to address pressing issues ranging from text mining to genomics. Under the "Machine learning" track, we determined the optimal k-mer length for feature extraction, constructed continuous distributed representations for protein sequences to create phylogenetic trees in an alignment-free manner, and clustered predicted MHC class I and II binding affinity to aid in vaccine design. All the related work is available in a GitHub repository under an MIT license for future research.
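As a rough illustration of alignment-free comparison with k-mers, the sketch below counts overlapping k-mers in each sequence and compares the count vectors with a cosine distance. This is a simpler stand-in for the continuous distributed representations mentioned in the abstract, not the hackathon's actual code. The function names and the example sequences are hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in `seq`."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_distance(counts_a, counts_b):
    """Return 1 minus the cosine similarity of two k-mer count vectors."""
    dot = sum(counts_a[key] * counts_b[key] for key in counts_a.keys() & counts_b.keys())
    norm_a = sqrt(sum(v * v for v in counts_a.values()))
    norm_b = sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("each sequence must contain at least one k-mer")
    return 1.0 - dot / (norm_a * norm_b)


def kmer_distance(seq_a, seq_b, k):
    """Alignment-free distance between two sequences for a chosen k."""
    return cosine_distance(kmer_counts(seq_a, k), kmer_counts(seq_b, k))
```

The choice of k is a trade-off: small values make almost all sequences look alike, while large values make most k-mers unique to one sequence. That trade-off is presumably why the optimal length had to be determined empirically.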


