Ghost-tree: creating hybrid-gene phylogenetic trees for diversity analyses

Ghost-tree is a bioinformatics tool that integrates sequence data from two genetic markers into a single phylogenetic tree that can be used for diversity analyses. Our approach uses one genetic marker whose sequences can be aligned across organisms spanning divergent taxonomic groups (e.g., fungal families) as a “foundation” phylogeny. A second, more rapidly evolving genetic marker is then used to build “extension” phylogenies for more closely related organisms (e.g., fungal species or strains), which are then grafted onto the foundation tree by mapping taxonomic names. We apply ghost-tree to graft fungal extension phylogenies derived from ITS sequences onto a foundation phylogeny derived from fungal 18S sequences. The result is a phylogenetic tree, compatible with the commonly used UNITE fungal database, that supports phylogenetic diversity analysis (e.g., UniFrac) of fungal communities profiled using ITS markers. Availability: ghost-tree is pip-installable. All source code, documentation, and test code are available under the BSD license at https://github.com/JTFouquier/ghost-tree.
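
The grafting step described above can be illustrated with a toy sketch. This is not ghost-tree's actual implementation or API: the `graft` function, the genus names, and the branch lengths are all hypothetical, and trees are handled as plain Newick strings rather than parsed tree objects. Each foundation tip whose name matches a key in `extensions` is replaced by the corresponding extension subtree, and the original branch length of the tip is kept as the stem of the grafted clade.

```python
import re

def graft(foundation_newick: str, extensions: dict[str, str]) -> str:
    """Replace matching foundation tips with extension subtrees.

    `extensions` maps a tip name to a Newick fragment without the trailing ';'.
    Illustrative only; this is not how ghost-tree itself is implemented.
    """
    def swap(match: re.Match) -> str:
        name = match.group(1)
        # Tips with no extension tree are left untouched.
        return extensions.get(name, name)

    # A tip label directly follows '(' or ',' and ends before ':', ',' or ')'.
    # Branch lengths follow ':' and internal labels follow ')', so neither matches.
    return re.sub(r"(?<=[(,])([^():,;]+)(?=[:,)])", swap, foundation_newick)

# Hypothetical foundation tree (slowly evolving marker, tips named by genus).
foundation = "((Aspergillus:0.3,Penicillium:0.3):0.2,Saccharomyces:0.5);"

# Hypothetical extension trees (rapidly evolving marker, one per genus).
extensions = {
    "Aspergillus": "(A_niger:0.01,A_flavus:0.02)",
    "Saccharomyces": "(S_cerevisiae:0.01,S_paradoxus:0.01)",
}

hybrid = graft(foundation, extensions)
print(hybrid)
```

Matching on taxonomic names is what lets the two markers be combined without aligning them to each other; a tip such as `Penicillium`, which has no extension tree in this toy input, simply stays a single leaf.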

10.7287/peerj.preprints.1106v1 ◽

2015 ◽

Author(s):

Jennifer Fouquier ◽

Jai R Rideout ◽

Evan Bolyen ◽

John H Chase ◽

Arron Shiffer ◽

...

Keyword(s):

Phylogenetic Tree ◽

Genetic Marker ◽

Phylogenetic Trees ◽

Phylogenetic Diversity ◽

Sequence Data ◽

Fungal Species ◽

Bioinformatics Tool ◽

Hybrid Gene ◽

Fungal Database ◽

Taxonomic Groups

Techniques for the verification of minimal phylogenetic trees illustrated with ten mammalian haemoglobin sequences

Biochemical Journal ◽

10.1042/bj1870065 ◽

1980 ◽

Vol 187 (1) ◽

pp. 65-74 ◽

Cited By ~ 12

Author(s):

D Penny ◽

M D Hendy ◽

L R Foulds

Keyword(s):

Amino Acid ◽

Phylogenetic Tree ◽

Protein Sequence ◽

Phylogenetic Trees ◽

Sequence Data ◽

Protein Sequences ◽

Nucleotide Sequences ◽

Amino Acid Sequences ◽

Minimal Tree ◽

Protein Sequence Data

We have recently reported a method to identify the shortest possible phylogenetic tree for a set of protein sequences [Foulds, Hendy & Penny (1979) J. Mol. Evol. 13, 127–150; Foulds, Penny & Hendy (1979) J. Mol. Evol. 13, 151–166]. The present paper discusses issues that arise during the construction of minimal phylogenetic trees from protein-sequence data. Converting the data from amino acid sequences into nucleotide sequences is shown to be advantageous. A new variation of a method for constructing a minimal tree is presented. Our previous methods involved first constructing a tree and then either proving that it is minimal or transforming it into a minimal tree. The approach presented here instead builds up a tree progressively, taxon by taxon. We illustrate this approach by using it to construct a minimal tree for ten mammalian haemoglobin alpha-chain sequences. Finally, we define a measure of the complexity of the data and illustrate a method to derive a directed phylogenetic tree from the minimal tree.
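
To make "tree length" concrete, here is a minimal sketch that counts the minimum number of substitutions a fixed tree requires, using Fitch's small-parsimony algorithm. This is a swapped-in, standard technique for scoring a given tree, not the authors' method for finding or verifying a minimal tree. The taxon names, the four-site alignment, and the two candidate topologies are hypothetical.

```python
def fitch_site(tree, states):
    """Return (possible_states, changes) for one site on a rooted binary tree.

    `tree` is either a leaf name (str) or a pair (left, right).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra change needed.
        return common, left_cost + right_cost
    # The children disagree: one substitution is required on this node's branches.
    return left_set | right_set, left_cost + right_cost + 1

def tree_length(tree, alignment):
    """Sum the per-site Fitch scores over all alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(n_sites)
    )

# Hypothetical toy alignment (not the haemoglobin data from the paper).
alignment = {"human": "ACGT", "chimp": "ACGT", "mouse": "ATGT", "rat": "ATGA"}

tree_a = (("human", "chimp"), ("mouse", "rat"))
tree_b = (("human", "mouse"), ("chimp", "rat"))

length_a = tree_length(tree_a, alignment)
length_b = tree_length(tree_b, alignment)
print(length_a, length_b)
```

On this toy input `tree_a` needs fewer changes than `tree_b`, so it is the shorter of the two candidates. Showing that no other topology is shorter still is the harder problem the paper addresses.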

Distribution of Therapeutic Efficacy of Ranunculales Plants Used by Ethnic Minorities on the Phylogenetic Tree of Chinese Species

Evidence-based Complementary and Alternative Medicine ◽

10.1155/2022/9027727 ◽

2022 ◽

Vol 2022 ◽

pp. 1-10

Author(s):

Da-Cheng Hao ◽

Yulu Zhang ◽

Chun-Nian He ◽

Pei-Gen Xiao

Keyword(s):

Phylogenetic Tree ◽

Ethnic Minorities ◽

Therapeutic Efficacy ◽

Phylogenetic Trees ◽

Therapeutic Effects ◽

Distribution Law ◽

R Language ◽

Net Relatedness Index ◽

Angiosperm Species ◽

Taxonomic Groups

The medicinal properties of plants can be predicted evolutionarily by phylogeny-based methods, which, however, have not been used to explore regularities in the therapeutic effects of Chinese plants used by ethnic minorities. This study explores how the therapeutic efficacy of Ranunculales plants is distributed on the phylogenetic tree of Chinese species. We collected therapeutic efficacy data for 551 ethnomedicinal species belonging to five species-rich families of Ranunculales; these data were divided into 15 categories according to the tissues and organs affected. The phylogenetic tree of angiosperm species was used to analyze the phylogenetic signal of ethnomedicinal plants by calculating the net relatedness index (NRI) and nearest taxon index (NTI) in R. The NRI results revealed a clustered structure for eight medicinal categories (poisoning/intoxication, circulatory, gastrointestinal, eyesight, oral, pediatric, skin, and urinary disorders) and overdispersion for the remaining seven (neurological, general, hepatobiliary, musculoskeletal, otolaryngologic, reproductive, and respiratory disorders), while the NTI metric identified a clustered structure for all categories. Statistically, NRI and NTI values were significant in 5 and 11 categories, respectively. Mahonia eurybracteata was found to have therapeutic effects in all categories. iTOL was used to visualize the distribution of treatment efficacy on species phylogenetic trees. By mapping the distribution of therapeutic effects of Ranunculales medicinal plants, this study highlights the importance of phylogenetic methods in finding potential medicinal resources; NRI, NTI, and similar indices can be calculated to help find taxonomic groups with medicinal efficacy based on the phylogenetic tree of the flora in different geographic regions.
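
As a rough illustration of the two indices, the sketch below computes NRI and NTI as the negated standardized effect sizes of mean pairwise distance (MPD) and mean nearest taxon distance (MNTD). The study itself worked in R; this Python version is only a sketch under stated assumptions: the six-taxon distance table is made up, and the null distribution is built by enumerating every same-size subset of the pool (with a population standard deviation) rather than by the randomization scheme the authors may have used.

```python
from itertools import combinations
from statistics import mean, pstdev

def mpd(taxa, dist):
    """Mean pairwise phylogenetic distance among the taxa."""
    return mean(dist[frozenset(pair)] for pair in combinations(taxa, 2))

def mntd(taxa, dist):
    """Mean distance from each taxon to its nearest neighbour in the set."""
    return mean(
        min(dist[frozenset((t, u))] for u in taxa if u != t) for t in taxa
    )

def standardized_index(metric, sample, pool, dist):
    """Return -(observed - null mean) / null sd.

    Assumption: the null is every subset of `pool` with the sample's size.
    """
    null = [metric(subset, dist) for subset in combinations(pool, len(sample))]
    return -(metric(sample, dist) - mean(null)) / pstdev(null)

# Hypothetical pool: two clades of three taxa each.
clade = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
pool = sorted(clade)

# Made-up distances: 2.0 within a clade, 10.0 between clades.
dist = {
    frozenset((x, y)): 2.0 if clade[x] == clade[y] else 10.0
    for x, y in combinations(pool, 2)
}

sample = ["a", "b", "c"]  # e.g. species sharing one therapeutic category
nri = standardized_index(mpd, sample, pool, dist)
nti = standardized_index(mntd, sample, pool, dist)
print(nri, nti)
```

Both values are positive here because the sampled taxa all sit in one clade, which is the "clustered" pattern described in the abstract; negative values would indicate overdispersion.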

Analysis of The Nucleotide Sequence Diversity of the Lassa Virus and Augmenting its Phylogenetic Tree

STEM Fellowship Journal ◽

10.17975/sfj-2018-005 ◽

2018 ◽

Vol 4 (1) ◽

pp. 21-26

Author(s):

Sean Oddoye

Keyword(s):

Phylogenetic Tree ◽

Phylogenetic Trees ◽

Nucleotide Diversity ◽

Sequence Data ◽

Sequence Diversity ◽

Future Research ◽

P Value ◽

Hemorrhagic Disease ◽

Lassa Virus ◽

Glycoprotein Precursor

Lassa virus (LASV) is the etiological agent of Lassa fever, an acute hemorrhagic disease with a mortality rate of 15%. Many aspects of the Lassa virus are not understood, such as the cause of deafness in one-third of surviving patients or why symptoms are benign for 80% of those infected. Ambiguities like these suggest that there may be genomic heterogeneity among infecting viruses and demonstrate a need to quantify and analyze polymorphisms within LASV. Patterns that emerge from phylogenetic trees can be used to assess the structure of a population while also providing insight into its genetic makeup. The purpose of this investigation was to develop a more streamlined means of calculating nucleotide diversity within a subpopulation of Lassa virus strains and to augment a phylogenetic tree of the Lassa virus glycoprotein precursor (GPC) segment. A total of 25 partial and complete sequences of LASV strains were obtained from GenBank. During phase one of this investigation, the sequence data were entered into the MEGA analytical software and the sequence diversity was derived at the nucleotide level. Data from the individual strand sequences were used to augment a phylogenetic tree using TreeView X software. In phase two, an algorithm was created using RStudio, with the BSgenome and Biostrings extensions. The sequence diversity derived from the statistical analyses in MEGA was compared with that produced by the algorithm. A p-value of 0.08 was found, which lies outside the accepted non-medical p-value range of 0.00 to 0.05. It is suggested that future research focus on creating a revised version of the algorithm that calculates nucleotide diversity within a percent error of 5%.
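
For reference, nucleotide diversity (π) is commonly defined as the average number of nucleotide differences per site between all pairs of sequences. The sketch below computes that quantity in Python. It is not the R algorithm from the study: the function name is hypothetical, the three sequences are toy data rather than LASV strains, and the input is assumed to be already aligned, of equal length, and free of gaps.

```python
from itertools import combinations

def nucleotide_diversity(seqs):
    """Average per-site pairwise difference across all sequence pairs."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    pairs = list(combinations(seqs, 2))
    # Count mismatching positions for every pair of sequences.
    total_differences = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_differences / (len(pairs) * length)

# Toy aligned sequences (not real Lassa virus data).
toy_alignment = ["ACGT", "ACGA", "ATGA"]
pi = nucleotide_diversity(toy_alignment)
print(pi)
```

Comparing the output of an independent implementation like this against a reference tool on the same alignment is one way to estimate the percent error the abstract refers to.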

For common community phylogenetic analyses, go ahead and use synthesis phylogenies

10.1101/370353 ◽

2018 ◽

Cited By ~ 3

Author(s):

Daijiang Li ◽

Lauren Trotta ◽

Hannah E. Marx ◽

Julie M. Allen ◽

Miao Sun ◽

...

Keyword(s):

Phylogenetic Trees ◽

Phylogenetic Diversity ◽

Gene Sequence ◽

Sequence Data ◽

Phylogenetic Signal ◽

Phylogenetic Analyses ◽

Tree Of Life ◽

Pairwise Distance ◽

Highly Correlated ◽

Gene Sequence Data

Should we build our own phylogenetic trees based on gene sequence data, or can we simply use available synthesis phylogenies? This is a fundamental question that any study involving a phylogenetic framework must face at the beginning of the project. Building a phylogeny from gene sequence data (purpose-built phylogeny) requires more effort, expertise, and cost than subsetting an already available phylogeny (synthesis-based phylogeny). However, we still lack a comparison of how these two approaches to building phylogenetic trees influence common community phylogenetic analyses such as comparing community phylogenetic diversity and estimating trait phylogenetic signal. Here, we generated three purpose-built phylogenies and their corresponding synthesis-based trees (two from Phylomatic and one from the Open Tree of Life [OTL]). We simulated 1,000 communities and 12,000 continuous traits along each purpose-built phylogeny. We then compared the effects of different trees on estimates of phylogenetic diversity (alpha and beta) and phylogenetic signal (Pagel’s λ and Blomberg’s K). Synthesis-based phylogenies generally yielded higher estimates of phylogenetic diversity when compared to purpose-built phylogenies. However, resulting measures of phylogenetic diversity from both types of phylogenies were highly correlated (Spearman’s ρ > 0.8 in most cases). Mean pairwise distance (both alpha and beta) is the index that is most robust to the differences in tree construction that we tested. Measures of phylogenetic diversity based on the OTL showed the highest correlation with measures based on the purpose-built phylogenies. Estimates of trait phylogenetic signal from synthesis-based phylogenies, especially from the OTL, were also highly correlated with estimates of Blomberg’s K or close to Pagel’s λ from purpose-built phylogenies when traits were simulated under Brownian Motion. For commonly employed community phylogenetic analyses, our results justify taking advantage of recently developed and continuously improving synthesis trees, especially the Open Tree of Life.
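
The comparison above rests on Spearman's rank correlation between diversity estimates obtained from two different trees. The following is a minimal sketch of that statistic, assuming no tied values so that the shortcut formula ρ = 1 − 6Σd² / (n(n² − 1)) applies. The two lists of diversity values are invented for illustration and are not results from the study.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(x, y):
    """Spearman's rho via the no-ties shortcut formula."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Made-up phylogenetic diversity estimates for five communities.
purpose_built = [10.2, 14.8, 9.1, 20.5, 12.0]
synthesis_based = [11.0, 16.9, 12.4, 23.0, 12.1]

rho = spearman_rho(purpose_built, synthesis_based)
print(rho)
```

Because the statistic depends only on ranks, a synthesis tree can yield systematically higher diversity values and still correlate strongly with the purpose-built tree, which is the situation the abstract reports.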

Phylogenetic tree shapes resolve disease transmission patterns

10.1101/003194 ◽

2014 ◽

Cited By ~ 1

Author(s):

Caroline Colijn ◽

Jennifer Gardy

Keyword(s):

Phylogenetic Tree ◽

Real World ◽

Disease Transmission ◽

Phylogenetic Trees ◽

Sequence Data ◽

Communicable Disease ◽

Disease Outbreaks ◽

Transmission Dynamics ◽

Topological Features ◽

Computationally Intensive

Whole genome sequencing is becoming popular as a tool for understanding outbreaks of communicable diseases, with phylogenetic trees being used to identify individual transmission events or to characterize outbreak-level overall transmission dynamics. Existing methods to infer transmission dynamics from sequence data rely on well-characterised infectious periods, epidemiological and clinical meta-data which may not always be available, and typically require computationally intensive analysis focussing on the branch lengths in phylogenetic trees. We sought to determine whether the topological structures of phylogenetic trees contain signatures of the overall transmission patterns underlying an outbreak. Here we use simulated outbreaks to train and then test computational classifiers. We test the method on data from two real-world outbreaks. We find that different transmission patterns result in quantitatively different phylogenetic tree shapes. We describe five topological features that summarize a phylogeny’s structure and find that computational classifiers based on these are capable of predicting an outbreak’s transmission dynamics. The method is robust to variations in the transmission parameters and network types, and recapitulates known epidemiology of previously characterized real-world outbreaks. We conclude that there are simple structural properties of phylogenetic trees which, when combined, can distinguish communicable disease outbreaks with a super-spreader, homogeneous transmission, and chains of transmission. This is possible using genome data alone, and can be done during an outbreak. We discuss the implications for management of outbreaks.
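
To show what a topological summary of a tree can look like, the sketch below computes two widely used shape statistics, the Colless imbalance and the number of cherries, on rooted binary trees written as nested pairs. These are chosen as familiar examples and are not claimed to be among the five features used in the study; the function name and both toy trees are hypothetical.

```python
def shape_summary(tree):
    """Return (n_leaves, colless_imbalance, n_cherries) for a rooted binary tree.

    `tree` is either a leaf name (str) or a pair (left, right).
    """
    if isinstance(tree, str):
        return 1, 0, 0
    left_n, left_colless, left_cherries = shape_summary(tree[0])
    right_n, right_colless, right_cherries = shape_summary(tree[1])
    # A cherry is an internal node whose two children are both leaves.
    is_cherry = 1 if left_n == 1 and right_n == 1 else 0
    return (
        left_n + right_n,
        # Colless adds |left leaves - right leaves| at every internal node.
        left_colless + right_colless + abs(left_n - right_n),
        left_cherries + right_cherries + is_cherry,
    )

# A ladder-like (caterpillar) tree and a fully balanced tree on four tips.
caterpillar = ((("a", "b"), "c"), "d")
balanced = (("a", "b"), ("c", "d"))

caterpillar_shape = shape_summary(caterpillar)
balanced_shape = shape_summary(balanced)
print(caterpillar_shape, balanced_shape)
```

Neither statistic uses branch lengths, so both can be computed from topology alone; a classifier would take a vector of such summaries per tree as its input features.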

Compressing Streams of Phylogenetic Trees

10.1101/440644 ◽

2018 ◽

Author(s):

Axel Trefzer ◽

Alexandros Stamatakis

Keyword(s):

Phylogenetic Tree ◽

Phylogenetic Trees ◽

Sequence Data ◽

Branch Length ◽

Distinct Species ◽

MCMC Methods ◽

Molecular Sequence Data ◽

Molecular Sequence ◽

Posterior Probability Distribution ◽

Tree Compression

Bayesian Markov-Chain Monte Carlo (MCMC) methods for phylogenetic tree inference, that is, inference of the evolutionary history of distinct species using their molecular sequence data, typically generate large sets of phylogenetic trees. The trees generated by the MCMC procedure are samples of the posterior probability distribution that MCMC methods approximate. Thus, they generate a stream of correlated binary trees that need to be stored. Here, we adapt state-of-the-art algorithms for binary tree compression to phylogenetic tree data streams and extend them to also store the required meta-data. On a phylogenetic tree stream containing 1,000 trees with 500 leaves including branch length values, we achieve a compression rate of 5.4 compared to the uncompressed tree files and of 1.8 compared to bzip2-compressed tree files. For compressing the same trees, but without branch length values, our compression method is approximately an order of magnitude better than bzip2. A prototype implementation is available at https://github.com/axeltref/tree-compression.git.
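
The bzip2 baseline mentioned above can be reproduced in spirit with Python's standard `bz2` module. The sketch below builds a synthetic stream of Newick trees and reports the ratio of original size to compressed size, which is one plausible reading of the "compression rate" in the abstract (an assumption, since the abstract does not define it). The trees are artificial four-tip strings, so the resulting number says nothing about the paper's data or about the authors' own compressor.

```python
import bz2

def compression_ratio(raw: bytes, compressed: bytes) -> float:
    """Original size divided by compressed size (higher is better)."""
    return len(raw) / len(compressed)

# Synthetic stream: 1,000 near-identical trees differing in one branch length.
trees = [
    f"((a:{0.1 + i * 1e-4:.4f},b:0.2000):0.0500,(c:0.3000,d:0.1500):0.0700);"
    for i in range(1000)
]
raw = "\n".join(trees).encode("ascii")

# General-purpose baseline: bzip2 knows nothing about tree structure.
packed = bz2.compress(raw)
ratio = compression_ratio(raw, packed)
print(f"{len(raw)} bytes -> {len(packed)} bytes, ratio {ratio:.1f}")
```

A tree-aware compressor aims to beat this baseline by exploiting the fact that successive MCMC samples share most of their topology, something a byte-level compressor can only capture indirectly.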

SeqDistK: a Novel Tool for Alignment-free Phylogenetic Analysis

10.1101/2021.08.16.456500 ◽

2021 ◽

Author(s):

Xuemei Liu ◽

Wen Li ◽

Guanda Huang ◽

Tianlai Huang ◽

Qingang Xiong ◽

...

Keyword(s):

Phylogenetic Analysis ◽

16S rRNA ◽

Phylogenetic Tree ◽

Phylogenetic Trees ◽

Large Scale ◽

Sequence Data ◽

Ground Truth ◽

Group Method ◽

Metagenomic Sequence ◽

Alignment Free

Algorithms for constructing phylogenetic trees are fundamental to studying the evolution of viruses, bacteria, and other microbes. Established multiple alignment-based algorithms are inefficient for large-scale metagenomic sequence data because they require high inter-sequence correlation and have high computational complexity. In this paper, we present SeqDistK, a novel tool for alignment-free phylogenetic analysis. SeqDistK computes the dissimilarity matrix for phylogenetic analysis, incorporating seven k-mer based dissimilarity measures, namely d2, d2S, d2star, Euclidean, Manhattan, CVTree, and Chebyshev. Based on these dissimilarities, SeqDistK constructs a phylogenetic tree using the Unweighted Pair Group Method with Arithmetic Mean algorithm. Using a gold-standard dataset of 16S rRNA and its associated phylogenetic tree, we compared SeqDistK to Muscle, a multiple sequence aligner. We found that SeqDistK was not only 38 times faster than Muscle in computational efficiency but also more accurate. SeqDistK achieved the smallest symmetric difference between the inferred and ground truth trees, ranging from 13 to 18, while that of Muscle was 62. When the measures d2, d2star, d2S, and Euclidean were used with k-mer size k=5, SeqDistK consistently inferred a phylogenetic tree almost identical to the ground truth tree. We also performed clustering of 16S rRNA sequences using SeqDistK and found the clustering was highly consistent with known biological taxonomy. Among all the measures, d2S (k=5, M=2) showed the best accuracy as it correctly clustered and classified all sample sequences. In summary, SeqDistK is a novel, fast and accurate alignment-free tool for large-scale phylogenetic analysis. SeqDistK software is freely available at https://github.com/htczero/SeqDistK.
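
Three of the seven measures named above (Euclidean, Manhattan, and Chebyshev) are ordinary vector distances applied to k-mer profiles, and can be sketched in a few lines. This is not SeqDistK's code: the function names are hypothetical, the sequences are toy strings, and the distances are taken over raw overlapping k-mer counts, whereas the tool may normalize counts to frequencies or treat reverse complements differently.

```python
from collections import Counter
from math import sqrt

def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in the sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def kmer_distances(seq_a: str, seq_b: str, k: int) -> dict:
    """Alignment-free distances between two sequences' k-mer count vectors."""
    counts_a, counts_b = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    # A Counter returns 0 for k-mers it has not seen, so the union is safe.
    diffs = [counts_a[w] - counts_b[w] for w in set(counts_a) | set(counts_b)]
    return {
        "euclidean": sqrt(sum(d * d for d in diffs)),
        "manhattan": sum(abs(d) for d in diffs),
        "chebyshev": max(abs(d) for d in diffs),
    }

# Toy sequences and a small k, chosen so the counts are easy to check by hand.
distances = kmer_distances("ACGTAC", "ACGTTT", k=2)
print(distances)
```

Filling a matrix with such pairwise distances for all sequences gives the input a clustering method such as UPGMA needs to build a tree, with no multiple sequence alignment step.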

Phylogenetic relationships in Dermocybe and related Cortinarius taxa based on nuclear ribosomal DNA internal transcribed spacers

Canadian Journal of Botany ◽

10.1139/b97-058 ◽

1997 ◽

Vol 75 (4) ◽

pp. 519-532 ◽

Cited By ~ 46

Author(s):

Y. J. Liu ◽

S. O. Rogers ◽

J. F. Ammirati

Keyword(s):

Ribosomal DNA ◽

Phylogenetic Relationships ◽

Phylogenetic Trees ◽

Sequence Data ◽

Molecular Data ◽

Internal Transcribed Spacers ◽

Nuclear Ribosomal DNA ◽

Herbarium Specimens ◽

Data Reorganization ◽

Taxonomic Groups

The genus Cortinarius Fr. (Cortinariaceae, Agaricales) is divided into four or more subgenera. Dermocybe (Fr.) Sacc. has been recognized as either a subgenus of Cortinarius or a separate genus, distinguished in part by the presence of various anthraquinonic pigments. Nucleotide sequences of ribosomal DNA 5.8S and internal transcribed spacers were used to investigate the phylogenetic relationships among species of Dermocybe and selected taxa from subgenera of Cortinarius. Sequence data from 47 herbarium specimens representing 31 taxa (28 species plus 3 varieties) of Dermocybe and Cortinarius were analyzed using parsimony, maximum likelihood, and neighbor joining. In general, molecular data support the morphological groupings of the taxa, although they more closely correspond to biochemical (anthraquinone and other) analyses. Phylogenetic trees showed that, while the sections Dermocybe and Malicoriae are monophyletic, and the concolorous or almost concolorous red species (section Sanguineae, such as D. sanguinea and relatives) together formed a coherent clade, the subgenus Dermocybe sensu lato itself is polyphyletic. Cortinarius californicus clusters with taxa in Cortinarius, subgenus Telamonia, section Armillati. Dermocybe olivaceopicta is more closely related to other subgenera of Cortinarius than to Dermocybe. Within the genus Cortinarius, certain of the subgenera may actually represent coherent genera. Of the subgenera examined, Telamonia, Phlegmacium, and possibly Sericeocybe appear to represent well defined taxonomic groupings. However, current assignments of taxa within Leprocybe and Myxacium were inconsistent with the molecular data. Reorganization of some taxa and taxonomic groups is suggested. Key words: Dermocybe, Cortinarius, molecular phylogeny, rDNA, ITS1, ITS2.

Molecular evidence for hybridization in the aquatic plant Limosella on sub-Antarctic Marion Island

Antarctic Science ◽

10.1017/s0954102021000079 ◽

2021 ◽

pp. 1-9

Author(s):

John H. Chau ◽

Michelle Greve ◽

Bettine Jansen van Vuuren

Keyword(s):

Phylogenetic Trees ◽

Sequence Data ◽

Phylogenetic Analyses ◽

Morphological Variability ◽

Molecular Evidence ◽

Marion Island ◽

Specific Sequence ◽

Nuclear Locus ◽

Species Specific ◽

Taxonomic Groups

DNA sequence data have become a crucial tool in assessing the relationship between morphological variation and genetic and taxonomic groups, including in the Antarctic biota. Morphologically distinct populations of submersed aquatic vascular plants were observed on sub-Antarctic Marion Island, potentially representing the two species of such plants listed in the island's flora, Limosella australis R.Br. (Scrophulariaceae) and Ranunculus moseleyi Hook.f. (Ranunculaceae). To confirm their taxonomic identity, we sequenced a nuclear locus (internal transcribed spacer; ITS) and two plastid loci (trnL-trnF, rps16) from three specimens collected on Marion Island and compared the sequences with those in public sequence databases. For all three loci, sequences from the Marion Island specimens were nearly identical despite morphological dissimilarity, and phylogenetic analyses resolved them to a position in Limosella. In phylogenetic trees and comparisons of species-specific sequence polymorphisms, the Marion Island specimens were closest to a clade comprising Limosella aquatica L., L. curdieana F.Muell. and L. major Diels for ITS and closest to L. australis for the plastid loci. Cytonuclear discordance suggests a history of hybridization or introgression, which may have consequences for morphological variability and ecological adaptation.
