A short guide to phylogeny reconstruction

This review is a short introduction to phylogenetic analysis, which provides insight into the origin and evolution of species. Phylogenetic trees can be constructed from many kinds of features and characters (e.g. morphological and anatomical characters, RAPD patterns, FISH patterns, DNA/RNA sequences and amino acid sequences). DNA sequences are preferable for phylogenetic analyses of closely related species, whereas amino acid sequences are used for analyses of more distant relationships. Sequences can be analysed with many computer programs. The methods most often used for phylogenetic analyses are neighbor-joining (NJ), maximum parsimony (MP), maximum likelihood (ML) and Bayesian inference.
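As a concrete illustration of one of the methods named above, here is a minimal Python sketch of the neighbor-joining algorithm applied to a precomputed distance matrix. It is not taken from any particular program; the function name, the toy five-taxon matrix (illustrative values, not real data) and the Newick-style labels for internal nodes are assumptions made for the example.

```python
def neighbor_joining(names, dist):
    """Minimal neighbor-joining sketch (illustrative, not a production tool).

    names: list of taxon labels.
    dist:  square, symmetric matrix (list of lists) of pairwise distances.
    Returns (joins, final_edge): each join is (a, b, len_a, len_b);
    final_edge is (x, y, length) connecting the last two nodes.
    """
    # Store distances as a dict of dicts so new internal nodes can be added.
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # Row sums r_i over the currently active nodes.
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Choose the pair minimizing Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from each joined node to the new internal node.
        len_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        u = f"({a},{b})"  # Newick-style label for the new internal node
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (a, b):
                d[u][k] = d[k][u] = 0.5 * (d[a][k] + d[b][k] - d[a][b])
        nodes = [k for k in nodes if k not in (a, b)] + [u]
        joins.append((a, b, len_a, len_b))
    x, y = nodes
    return joins, (x, y, d[x][y])


# Toy five-taxon distance matrix (illustrative values, not real data).
names = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
joins, final = neighbor_joining(names, dist)
print(joins[0])  # ('a', 'b', 2.0, 3.0)
```

Ties in Q are broken here by taking the first pair encountered; real programs may break ties differently, which can change the topology when several pairs are equally good.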


Phylogenetic Analysis: Basic Concepts and Its Use as a Tool for Virology and Molecular Epidemiology

Acta Scientiae Veterinariae, 2018, Vol 44 (1), pp. 20. DOI: 10.22456/1679-9216.81158

Author(s): Eloiza Teles Caldart, Helena Mata, Cláudio Wageck Canal, Ana Paula Ravazzolo

Keyword(s): Phylogenetic Analysis; Amino Acid; Molecular Epidemiology; Phylogenetic Analyses; Phylogenetic Reconstruction; Evolutionary Process; Amino Acid Sequences; Evolutionary Models; Reconstruction Methods; Basic Concepts

Background: Phylogenetic analyses are an essential part of the exploratory assessment of nucleic acid and amino acid sequences. In virology in particular, they can delineate the evolution and epidemiology of etiologic agents of disease and/or the evolutionary path of their hosts. The objective of this review is to help researchers who want to use phylogenetic analyses as a tool in virology and molecular epidemiology studies by presenting the most commonly used methodologies, describing the importance of the different techniques and their particular vocabulary, and giving some examples of their use in virology.

Review: This article starts by presenting basic concepts of molecular epidemiology and molecular evolution, emphasizing their relevance in the context of viral infectious diseases. It includes a section on the relevant vocabulary, giving readers the minimum knowledge needed to follow the rest of the review. Within its main subject, the text explains what a molecular phylogenetic analysis is, starting from a multiple alignment of nucleotide or amino acid sequences; different multiple-alignment programs may apply different algorithms. To build a phylogeny from amino acid or nucleotide sequences, it is necessary to produce a data matrix based on a model of nucleotide or amino acid replacement, also called an evolutionary model. A number of evolutionary models are available, varying in complexity according to the number of parameters (transitions, transversions, GC content, nucleotide position in the codon, among others). Some of the papers presented herein describe techniques for choosing an evolutionary model. Once the model is chosen, the next step is to select the phylogenetic reconstruction method that best fits the available data and the selected model. Here we present the most common reconstruction methods currently in use, describing their principles, advantages and disadvantages.
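To make the idea of a model with separate transition and transversion parameters concrete, the sketch below computes the Kimura two-parameter (K2P) distance, d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the observed proportions of transitions and transversions. K2P is only one example chosen for illustration, not necessarily the model this review recommends; skipping gaps and ambiguous bases is an assumption of this sketch.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Assumption: columns containing gaps or non-ACGT symbols are skipped.
    Returns math.inf when the log arguments are non-positive (saturation).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    sites = transitions = transversions = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # gap or ambiguous base
        sites += 1
        if x == y:
            continue
        # A<->G and C<->T are transitions; all other changes are transversions.
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    arg1 = 1 - 2 * p - q
    arg2 = 1 - 2 * q
    if arg1 <= 0 or arg2 <= 0:
        return math.inf  # too divergent for the model to give a finite distance
    return -0.5 * math.log(arg1) - 0.25 * math.log(arg2)


print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))  # one transition in ten sites
```

The infinite result for saturated pairs is one way of seeing why distance estimates become unreliable for highly divergent sequences.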
Among these, distance methods are simpler and faster; however, they do not provide reliable estimates when the sequences are highly divergent. The accuracy of model-based analyses (neighbor-joining on model-corrected distances, maximum likelihood and Bayesian inference) strongly depends on how well the actual data fit the chosen evolutionary model. Finally, we also explore topology confidence tests, especially the most widely used one, the bootstrap. To assist the reader, this review presents figures explaining specific situations discussed in the text and numerous examples of previously published scientific articles in virology that demonstrate the importance of the techniques discussed herein, as well as their judicious use.

Conclusion: The DNA sequence is not only a record of phylogeny and divergence times; it also retains signatures of how the evolutionary process has shaped its history and of the time elapsed in the evolutionary history of the population. Analyses of genomic sequences by molecular phylogeny have shown a broad spectrum of applications. For the different kinds of data available and the different purposes of phylogenies, reconstruction methods and evolutionary models should be chosen wisely. This review provides a theoretical basis for choosing the evolutionary models and phylogenetic reconstruction methods best suited to each situation. In addition, it presents examples of diverse applications of molecular phylogeny in virology.
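The bootstrap mentioned above resamples alignment columns with replacement, rebuilds a tree from each pseudo-replicate alignment, and reports how often a clade reappears. The sketch below shows only the resampling and counting; the tree-building step is passed in as `build_clades`, a hypothetical callable that should return the clades (as frozensets of taxon names) of the tree inferred from a replicate with the same method used on the original data.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    The same column indices are used for every taxon, keeping columns intact.
    """
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def bootstrap_support(alignment, build_clades, clade, n_reps=100, seed=0):
    """Fraction of bootstrap replicates whose tree contains `clade`.

    build_clades: hypothetical callable (replicate alignment -> set of
    frozensets), standing in for any tree-reconstruction method.
    """
    rng = random.Random(seed)
    target = frozenset(clade)
    hits = 0
    for _ in range(n_reps):
        replicate = bootstrap_replicate(alignment, rng)
        if target in build_clades(replicate):
            hits += 1
    return hits / n_reps
```

Support values are usually reported as percentages on the branches of the tree built from the original alignment.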


Phylogenetic analysis of human rhinovirus capsid protein VP1 and 2A protease coding sequences confirms shared genus-like relationships with human enteroviruses

Journal of General Virology, 2005, Vol 86 (3), pp. 697-706. DOI: 10.1099/vir.0.80445-0


Author(s): Pia Laine, Carita Savolainen, Soile Blomqvist, Tapani Hovi

Keyword(s): Phylogenetic Analysis; Amino Acid; Capsid Protein; Phylogenetic Trees; Amino Acid Sequences; Human Rhinovirus; Human Enterovirus; Coding Region; Coding Sequences; Capsid Protein VP1

Phylogenetic analysis of the capsid protein VP1 coding sequences of all 101 human rhinovirus (HRV) prototype strains revealed two major genetic clusters, similar to those previously reported for the VP4/VP2 coding sequences, representing the two established species, Human rhinovirus A (HRV-A) and Human rhinovirus B (HRV-B). Pairwise nucleotide identities ranged from 61 to 98 % within and from 46 to 55 % between the two HRV species. Interserotypic sequence identities in both HRV species were more variable than those within any Human enterovirus (HEV) species in the same family. This means that unequivocal serotype identification by VP1 sequence analysis, as used for HEV strains, may not always be possible for HRV isolates. On the other hand, a comprehensive comparison of VP1 and partial 2A sequences of HRV and HEV revealed a genus-like situation. Pairwise nucleotide identities between these genera ranged from 41 to 54 % in the VP1 coding region, similar to those between heterologous members of the two HRV species. Alignment of the deduced amino acid sequences revealed more fully conserved amino acid residues between HRV-B and polioviruses than between the two HRV species. In phylogenetic trees that included all HRVs and representatives of all HEV species, the two HRV species did not cluster together but behaved like members of the same genus as the HEVs. In conclusion, from a phylogenetic point of view, there are no good reasons to keep these two human picornavirus genera taxonomically separate.
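Pairwise identity ranges like those quoted above are summaries of a matrix of identities computed over all pairs of aligned sequences. The sketch below shows one common way to compute such a matrix; excluding columns where either sequence has a gap is an assumption, since programs differ in how they treat gaps, and the function names are illustrative.

```python
from itertools import combinations


def percent_identity(seq1, seq2):
    """Percent identity of two aligned sequences.

    Assumption: columns where either sequence has a gap ('-') are excluded.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    compared = identical = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        identical += x == y  # True counts as 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


def identity_matrix(alignment):
    """All pairwise identities for a dict of name -> aligned sequence."""
    return {(a, b): percent_identity(alignment[a], alignment[b])
            for a, b in combinations(sorted(alignment), 2)}
```

Within-species and between-species ranges are then the minimum and maximum of this matrix restricted to the relevant pairs.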


Quantitative Analysis of Protein Evolution: The Phylogeny of Osteopontin

Frontiers in Genetics, 2021, Vol 12. DOI: 10.3389/fgene.2021.700789

Author(s): Xia Wang, Georg F. Weber

Keyword(s): Amino Acids; Phylogenetic Analysis; Amino Acid; Protein Evolution; Phylogenetic Trees; Building Blocks; Amino Acid Sequences; Systems Research; Physico Chemical; Box Counting Dimension

The phylogenetic analysis of proteins conventionally relies on the evaluation of amino acid sequences or coding sequences. Individual amino acids have measurable features that allow strings of letters (amino acids or bases) to be translated into strings of numbers (physico-chemical properties). Once the letters are converted to measurable properties, the numerical strings can be evaluated quantitatively with various tools of complex systems research. We build on our prior phylogenetic analysis of the cytokine Osteopontin to validate this quantitative approach to the study of protein evolution. Phylogenetic trees constructed from the number strings differentiate among all sequences. In pairwise comparisons, autocorrelation, average mutual information and box counting dimension each yield a single number for the overall relatedness between sequences. We also find that bivariate wavelet analysis distinguishes hypermutable regions from conserved regions of the protein. Investigating protein evolution through the physico-chemical characteristics of its amino acid building blocks broadens the spectrum of applicable research tools, accounts for mutation as well as selection, gives access to multiple perspectives depending on the property evaluated, discriminates more accurately among sequences, and makes the analysis more quantitative than using strings of letters as starting points.
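The letter-to-number translation and one of the quantitative tools mentioned above can be sketched as follows. The abstract does not say which property scale or autocorrelation definition the authors used; the Kyte-Doolittle hydropathy scale and the normalized (mean-centred) autocorrelation below are assumptions chosen for illustration.

```python
# Kyte-Doolittle hydropathy scale: one of many possible physico-chemical properties.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def to_numbers(protein, scale=KYTE_DOOLITTLE):
    """Translate a string of amino acid letters into a list of numbers.

    Raises KeyError for letters not in the scale (e.g. 'X' or gaps).
    """
    return [scale[aa] for aa in protein.upper()]


def autocorrelation(values, lag):
    """Mean-centred autocorrelation of a numeric series at the given lag."""
    n = len(values)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(values) - 1")
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(x * x for x in dev)
    if denom == 0:
        raise ValueError("constant series has undefined autocorrelation")
    num = sum(dev[i] * dev[i + lag] for i in range(n - lag))
    return num / denom


values = to_numbers("IRIRIR")  # alternating hydrophobic/hydrophilic toy peptide
print(autocorrelation(values, 1), autocorrelation(values, 2))
```

An alternating hydrophobic/hydrophilic pattern gives a negative value at lag 1 and a positive one at lag 2; comparing such profiles between sequences is one way numeric strings can be related quantitatively.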


Purification and characterization of a uterine retinol-binding protein in the bitch

Biochemical Journal, 1995, Vol 311 (2), pp. 407-415. DOI: 10.1042/bj3110407


Author(s): W C Buhi, I M Alvarez, V M Shille, M J Thatcher, J P Harney, ...

Keyword(s): Amino Acid; DNA Sequences; Binding Protein; Amino Acid Content; Amino Acid Sequences; Retinol Binding Protein; Total Amino Acid; Major Protein; Serum Retinol; Two Dimensional Gel Electrophoresis

A major canine endometrial secreted protein (cP6, Mr 23,000) was purified by ion-exchange and gel-filtration chromatography and characterized by two-dimensional gel electrophoresis. Antiserum to human retinol-binding protein (hRBP) identified cP6 on immunoblot analysis and immunoprecipitated cP6 from culture medium. This major protein was also shown to bind [3H]retinol. N-terminal and internal amino acid sequences were determined and compared with previously identified protein, RNA and DNA sequences. N-terminal analysis revealed that cP6 had high identity and similarity to serum retinol-binding proteins (RBPs), while internal sequence analysis showed strong similarity to rat androgen-dependent epididymal protein and to beta-lactoglobulins. Amino acid analysis, however, showed significant differences between these proteins and cP6 in both total amino acid content and the content of certain selected amino acids. Immunohistochemical analysis showed staining for RBP only in the uterine luminal epithelium. These studies suggest that the bitch endometrium secretes a family of proteins (cP6), some of which bind [3H]retinol, that are immunologically related to the RBP family and have N-terminal and internal sequences highly similar to RBP, beta-lactoglobulins and other members of the lipocalin family. This family of proteins may be important in early development for supplying retinol or its derivatives to the developing embryo.


Structural and antigenic polymorphism of the 35- to 48-kilodalton merozoite surface antigen (MSA-2) of the malaria parasite Plasmodium falciparum

Molecular and Cellular Biology, 1991, Vol 11 (2), pp. 963-971. DOI: 10.1128/mcb.11.2.963-971.1991

Author(s): B Fenton, J T Clark, C M Khan, J V Robinson, D Walliker, ...

Keyword(s): Plasmodium Falciparum; Amino Acid; Surface Antigen; DNA Sequences; Amino Acid Sequences; Merozoite Surface Antigen; Parasite Plasmodium; Genes Encoding; Parasite Plasmodium Falciparum; Group B

Merozoite surface antigen MSA-2 of the human malaria parasite Plasmodium falciparum is being considered for the development of a malaria vaccine. The antigen is polymorphic, and specific monoclonal antibodies differentiate five serological variants of MSA-2 among 25 parasite isolates. The variants fall into two major serogroups, A and B. Genes encoding two different serogroup A variants have been sequenced, and their DNA and deduced amino acid sequences were compared with sequences encoded by other alleles. The comparison shows that the serological classification reflects differences in the DNA sequences and deduced primary structures of MSA-2 variants and serogroups: overall DNA and amino acid sequence similarities exceed 95% among variants of the same serogroup, whereas similarities between the group A variants and a group B variant are only 70% and 64% for DNA and amino acid sequences, respectively. We propose that the MSA-2 protein is encoded by two highly divergent groups of alleles, with limited additional polymorphism within each group.
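DNA-level and amino-acid-level similarities need not coincide, because synonymous codon changes alter the DNA without altering the protein. The sketch below translates coding sequences with the standard genetic code (NCBI translation table 1) and compares ungapped identities at both levels. The toy sequences are hypothetical, and the simple ungapped identity used here is an assumption, not the measure used in the study.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate a coding DNA/RNA sequence in frame 1.

    Stop codons become '*'; codons with other characters become 'X';
    a trailing partial codon is ignored.
    """
    dna = dna.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def identity(a, b):
    """Percent identity of two ungapped sequences of equal length."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


dna1, dna2 = "ATGGCC", "ATGGCT"  # hypothetical: differ by a synonymous change
print(identity(dna1, dna2), identity(translate(dna1), translate(dna2)))
```

Here one synonymous substitution lowers DNA identity while leaving amino acid identity at 100%; nonsynonymous changes concentrated in few codons can produce the opposite pattern.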


Techniques for the verification of minimal phylogenetic trees illustrated with ten mammalian haemoglobin sequences

Biochemical Journal, 1980, Vol 187 (1), pp. 65-74. DOI: 10.1042/bj1870065


Author(s): D Penny, M D Hendy, L R Foulds

Keyword(s): Amino Acid; Phylogenetic Tree; Protein Sequence; Phylogenetic Trees; Sequence Data; Protein Sequences; Nucleotide Sequences; Amino Acid Sequences; Minimal Tree; Protein Sequence Data

We have recently reported a method to identify the shortest possible phylogenetic tree for a set of protein sequences [Foulds, Hendy & Penny (1979) J. Mol. Evol. 13, 127–150; Foulds, Penny & Hendy (1979) J. Mol. Evol. 13, 151–166]. The present paper discusses issues that arise when constructing minimal phylogenetic trees from protein-sequence data. Converting the data from amino acid sequences into nucleotide sequences is shown to be advantageous. A new variation of a method for constructing a minimal tree is presented. Our previous methods first constructed a tree and then either proved that it was minimal or transformed it into a minimal tree; the approach presented here instead builds up a tree progressively, taxon by taxon. We illustrate this approach by using it to construct a minimal tree for ten mammalian haemoglobin alpha-chain sequences. Finally, we define a measure of the complexity of the data and illustrate a method for deriving a directed phylogenetic tree from the minimal tree.
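The "length" of a tree in the parsimony sense is the minimum number of character changes it requires. The sketch below scores trees with Fitch's algorithm and picks the shortest by exhaustive search over the three unrooted topologies for four taxa. This is a swapped-in textbook technique, not the authors' method; the nested-tuple tree encoding and the toy sequences are assumptions. Exhaustive search quickly becomes impractical: n taxa have (2n-5)!! unrooted binary topologies, already 2,027,025 for ten sequences.

```python
def fitch_site(tree, states):
    """Fitch pass for one site. tree is a leaf name or a (left, right) tuple.

    Returns (candidate state set, number of changes in this subtree).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # Disjoint sets force one extra change at this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, seqs):
    """Total Fitch length of `tree` for aligned sequences (dict name -> seq)."""
    length = len(next(iter(seqs.values())))
    total = 0
    for i in range(length):
        states = {name: s[i] for name, s in seqs.items()}
        total += fitch_site(tree, states)[1]
    return total


def minimal_tree(candidates, seqs):
    """Return the candidate tree with the smallest parsimony score."""
    return min(candidates, key=lambda t: parsimony_score(t, seqs))


# The three unrooted topologies for four taxa, each written with an arbitrary root.
CANDIDATES = [(("a", "b"), ("c", "d")),
              (("a", "c"), ("b", "d")),
              (("a", "d"), ("b", "c"))]
SEQS = {"a": "AAC", "b": "AAT", "c": "GGC", "d": "GGT"}  # toy data
print(minimal_tree(CANDIDATES, SEQS))
```

Placing the root on a different edge does not change the Fitch score, so each rooted tuple stands for one unrooted topology.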


Inferring Species Trees from Gene Trees: A Phylogenetic Analysis of the Elapidae (Serpentes) Based on the Amino Acid Sequences of Venom Proteins

Molecular Phylogenetics and Evolution, 1997, Vol 8 (3), pp. 349-362. DOI: 10.1006/mpev.1997.0434


Author(s): Joseph B Slowinski, Alec Knight, Alejandro P Rooney

Keyword(s): Phylogenetic Analysis; Amino Acid; Amino Acid Sequences; Gene Trees; Species Trees; Venom Proteins


Taxonomic status of Bhanja and Kismayo viruses (family Bunyaviridae)

Epidemiology and Infectious Diseases (Russian Journal), 2012, Vol 17 (4), pp. 4-8. DOI: 10.17816/eid40625

Author(s): A. S Klimentov, A. P Gmyl, A. M Butenko, L. V Gmyl, O. V Isaeva, ...

Keyword(s): Phylogenetic Analysis; Amino Acid; Nucleotide Sequence; Taxonomic Status; Amino Acid Sequences; The Family; L Segment

The nucleotide sequences of the M (1398 nucleotides) and L (6186 nucleotides) genome segments of Bhanja virus and of the L segment (1297 nucleotides) of Kismayo virus have been partially determined. Phylogenetic analysis of the deduced amino acid sequences showed that these viruses are novel members of the genus Phlebovirus in the family Bunyaviridae.


Molecular characterization and phylogenetic analysis of NBS-LRR genes in wild relatives of eggplant (Solanum melongena L.)

Indian Journal of Agricultural Research, 2018. DOI: 10.18805/ijare.a-4793

Author(s): Sona. S Dev, P. Poornima, Akhil Venu

Keyword(s): Phylogenetic Analysis; Amino Acid; Sequence Similarity; Interleukin 1; Preliminary Investigation; Solanum Melongena; Wild Relatives; Amino Acid Sequences; R Genes; Multiple Sequence

Eggplant or brinjal (Solanum melongena L.) is highly susceptible to various soil-borne diseases. The extensive use of chemical fungicides to combat these diseases can be minimized by identifying resistance gene analogs (RGAs) in wild relatives of cultivated plants. In the present study, degenerate PCR primers targeting the conserved regions of nucleotide binding site-leucine rich repeat (NBS-LRR) genes were used to amplify RGAs from wild relatives of eggplant (black nightshade (Solanum nigrum), Indian nightshade (Solanum violaceum) and Solanum incanum) that had shown resistance to the bacterial wilt pathogen Ralstonia solanacearum in a preliminary investigation. The deduced amino acid sequences of the amplicons, compared with each other and with the amino acid sequences of known RGAs deposited in GenBank, showed significant sequence similarity. The phylogenetic analysis indicated that they belong to the Toll/interleukin-1 receptor (TIR)-NBS-LRR type of R genes. Multiple sequence alignment with other known R genes showed significant similarity in the P-loop, kinase 2 and GLPL domains of NBS-LRR class genes. No R genes have previously been reported from these wild eggplants; diversity analysis of these novel RGAs could therefore lead to the identification of other novel R genes within the germplasm of brinjal as well as of other Solanum species.
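The P-loop mentioned above is the Walker A nucleotide-binding motif, whose classic consensus is G-x(4)-G-K-(S/T). As a rough first screen of deduced amino acid sequences, that consensus can be searched with a regular expression. This is a sketch only: it is not the multiple-alignment approach used in the study, it does not cover the kinase 2 or GLPL motifs, and the example peptide is hypothetical.

```python
import re

# Walker A / P-loop consensus: G, any four residues, G, K, then S or T.
P_LOOP = re.compile(r"G.{4}GK[ST]")


def find_p_loops(protein):
    """Return (0-based start, matched segment) for each non-overlapping P-loop hit."""
    return [(m.start(), m.group()) for m in P_LOOP.finditer(protein.upper())]


print(find_p_loops("MAGMGGIGKTTLAK"))  # hypothetical peptide fragment
```

A regular-expression hit only shows that the consensus is present; confirming homology still requires alignment with known R genes, as in the study.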


Identification of mariner-like elements in Sitodiplosis mosellana (Diptera: Cecidomyiidae)

The Canadian Entomologist, 2006, Vol 138 (2), pp. 138-146. DOI: 10.4039/n05-007


Author(s): O. Mittapalli, R.H. Shukle, I.L. Wise

Keyword(s): Phylogenetic Analysis; Amino Acid; Blot Analysis; Copy Number; Southern Blot Analysis; PCR Primers; Amino Acid Sequences; Degenerate PCR; Sitodiplosis Mosellana; Wheat Midge

Mariner-like element sequences were recovered from the genome of the orange wheat midge, Sitodiplosis mosellana (Géhin), with degenerate PCR primers designed to conserved regions of mariner transposases. The deduced amino acid sequences of the mariner-like transposases from S. mosellana showed 67% to 78% identity with the peptide sequences of other mariner transposases. Phylogenetic analysis placed the mariner-like elements from S. mosellana in the mauritiana subfamily of mariner transposons. Southern blot analysis suggests that mariner-like elements are present at moderate copy number in the S. mosellana genome.
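Degenerate primers such as those used in this study and the previous one encode a pool of sequences using IUPAC ambiguity codes. The sketch below expands a degenerate primer, reports its degeneracy (the number of distinct sequences in the pool), and checks whether an unambiguous target is matched exactly. The primers in the example are hypothetical; the published primer sequences are not given here.

```python
from itertools import product
from math import prod

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    return prod(len(IUPAC[b]) for b in primer.upper())


def expand(primer):
    """All concrete sequences encoded by a degenerate primer."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in primer.upper()))]


def matches(primer, target):
    """True if an unambiguous target of equal length is in the primer pool."""
    return len(primer) == len(target) and all(
        t in IUPAC[p] for p, t in zip(primer.upper(), target.upper()))


print(expand("ARY"), degeneracy("NNR"))  # hypothetical primers
```

High degeneracy broadens the range of templates a primer can amplify but dilutes each individual sequence in the pool, which is why degenerate primers are usually designed on the most conserved regions.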
