MonoPhy: a simple R package to find and visualize monophyly issues

2016 ◽  
Vol 2 ◽  
pp. e56 ◽  
Author(s):  
Orlando Schwery ◽  
Brian C. O’Meara

Background. The monophyly of taxa is an important attribute of a phylogenetic tree. A lack of it may hint at shortcomings of either the tree or the current taxonomy, or can indicate cases of incomplete lineage sorting or horizontal gene transfer. Whatever the reason, a lack of monophyly can misguide subsequent analyses. While monophyly is conceptually simple, it is tedious and time-consuming to assess manually on modern phylogenies of hundreds to thousands of species. Results. The R package MonoPhy allows assessment and exploration of monophyly of taxa in a phylogeny. It can assess the monophyly of genera using the phylogeny only, and with an additional input file, any other desired higher-order taxa or unranked groups can be checked as well. Conclusion. Summary tables, easily subsettable results and several visualization options allow quick and convenient exploration of monophyly issues, thus making MonoPhy a valuable tool for any researcher working with phylogenies.
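The genus-level check described above can be sketched in a few lines. This is a minimal Python illustration, not MonoPhy's R interface. It makes three assumptions: the rooted tree is written as nested tuples, tip labels have the form `Genus_species`, and a genus counts as monophyletic when the smallest clade containing all its tips contains nothing else. The function names `clades` and `assess_genera` are hypothetical.

```python
from collections import defaultdict


def clades(tree):
    """Return (tips, clades) for a nested-tuple tree.

    A tree is either a tip label (str) or a tuple of subtrees. Every node,
    tips included, contributes one clade: the frozenset of tips below it.
    """
    if isinstance(tree, str):
        tip = frozenset([tree])
        return tip, [tip]
    tips, found = frozenset(), []
    for child in tree:
        child_tips, child_clades = clades(child)
        tips |= child_tips
        found.extend(child_clades)
    found.append(tips)
    return tips, found


def assess_genera(tree, sep="_"):
    """Map genus -> (is_monophyletic, sorted list of intruding tips)."""
    tips, all_clades = clades(tree)
    genera = defaultdict(set)
    for tip in tips:
        genera[tip.split(sep, 1)[0]].add(tip)  # genus = label text before sep
    report = {}
    for genus, members in genera.items():
        # Clades containing all members are nested, so the smallest one is the
        # clade of the members' most recent common ancestor (MRCA).
        mrca = min((c for c in all_clades if members <= c), key=len)
        intruders = sorted(mrca - members)
        report[genus] = (not intruders, intruders)
    return report


tree = ((("Quercus_alba", "Quercus_rubra"), "Fagus_sylvatica"),
        ("Quercus_robur", "Castanea_sativa"))
for genus, (mono, intruders) in sorted(assess_genera(tree).items()):
    print(genus, "monophyletic" if mono else f"not monophyletic; intruders: {intruders}")
```

Here *Quercus* is reported as not monophyletic. Its MRCA clade is the whole tree, which also holds `Castanea_sativa` and `Fagus_sylvatica`. Single-tip genera are trivially monophyletic under this rule.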

2015 ◽  
Author(s):  
Orlando Schwery ◽  
Brian C O'Meara

Background. The monophyly of taxa is an important attribute of a phylogenetic tree, as a lack of it may hint at shortcomings of either the tree or the current taxonomy and can misguide subsequent analyses. While monophyly is conceptually simple, it is tedious and time-consuming to assess manually on modern phylogenies of hundreds to thousands of species. Results. The R package MonoPhy allows assessment and exploration of monophyly of taxa in a phylogeny. It can assess the monophyly of genera using the phylogeny only, and with an additional input file, any other desired higher taxa or unranked groups can be checked as well. Conclusion. Summary tables, easily subsettable results and several visualization options allow quick and convenient exploration of monophyly issues, thus making MonoPhy a valuable tool for any researcher working with phylogenies.
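For higher taxa or unranked groups, the abstract mentions an additional input file that assigns tips to groups. The sketch below does not use MonoPhy's actual input format. Instead it assumes a hypothetical two-column CSV (`tip,group`). It checks each group with a smallest-enclosing-clade rule and tallies a small summary table. It also assumes every tip listed in the table occurs in the tree. The names `all_clades` and `assess_groups` are illustrative only.

```python
import csv
import io
from collections import defaultdict


def all_clades(tree):
    """Tip sets of every node (tips included) in a nested-tuple tree."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    below = [clade for child in tree for clade in all_clades(child)]
    # The node's own clade is the union of everything beneath it.
    return below + [frozenset().union(*below)]


def assess_groups(tree, rows):
    """Check each group's monophyly; rows are dicts with 'tip' and 'group' keys.

    Assumes every tip named in rows occurs in the tree (otherwise min() raises
    ValueError because no clade contains the group).
    """
    clades = all_clades(tree)
    groups = defaultdict(set)
    for row in rows:
        groups[row["group"]].add(row["tip"])
    summary = {"monophyletic": 0, "non-monophyletic": 0}
    intruders = {}
    for group, members in groups.items():
        mrca = min((c for c in clades if members <= c), key=len)
        extra = sorted(mrca - members)
        summary["non-monophyletic" if extra else "monophyletic"] += 1
        if extra:
            intruders[group] = extra
    return summary, intruders


# Hypothetical input file: one row per tip, naming the group to be tested.
taxonomy_csv = io.StringIO("tip,group\nA,Fam1\nB,Fam1\nC,Fam2\nD,Fam2\nE,Fam2\n")
tree = ((("A", "B"), "C"), ("D", "E"))
summary, intruders = assess_groups(tree, csv.DictReader(taxonomy_csv))
print(summary)    # {'monophyletic': 1, 'non-monophyletic': 1}
print(intruders)  # {'Fam2': ['A', 'B']}
```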


BMC Genomics ◽  
2015 ◽  
Vol 16 (Suppl 10) ◽  
pp. S1 ◽  
Author(s):  
Ruth Davidson ◽  
Pranjal Vachaspati ◽  
Siavash Mirarab ◽  
Tandy Warnow

2015 ◽  
Author(s):  
Ruth Davidson ◽  
Pranjal Vachaspati ◽  
Siavash Mirarab ◽  
Tandy Warnow

Background: Species tree estimation is challenged by gene tree heterogeneity resulting from biological processes such as duplication and loss, hybridization, incomplete lineage sorting (ILS), and horizontal gene transfer (HGT). Mathematical theory on reconstructing species trees in the presence of HGT alone or ILS alone suggests that quartet-based species tree methods (known to be statistically consistent under ILS, or under bounded amounts of HGT) might be effective techniques for estimating species trees when both HGT and ILS are present. Results: We evaluated several publicly available coalescent-based methods and concatenation under maximum likelihood on simulated datasets with moderate ILS and varying levels of HGT. Our study shows that two quartet-based species tree estimation methods (ASTRAL-2 and weighted Quartets MaxCut) are both highly accurate, even on datasets with high rates of HGT. In contrast, although NJst and concatenation using maximum likelihood are highly accurate under low HGT, they are less robust to high HGT rates. Conclusion: Our study shows that quartet-based species tree estimation methods can be highly accurate in the presence of both HGT and ILS. The study suggests the possibility that some quartet-based methods might be statistically consistent under phylogenomic models of gene tree heterogeneity with both HGT and ILS. Keywords: phylogenomics; HGT; ILS; summary methods; concatenation
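Quartet-based summary methods such as ASTRAL-2 and weighted Quartets MaxCut start from the quartet topologies that gene trees induce. The sketch below is not either method's algorithm, since both search over whole species trees. It only counts, for one set of four taxa, which of the three unrooted quartet topologies each gene tree displays. Gene trees are written as nested tuples and read as unrooted. The helper names are hypothetical.

```python
from collections import Counter


def collect_clades(tree, out):
    """Append the tip set of every internal node of a nested-tuple tree to out."""
    if isinstance(tree, str):
        return frozenset([tree])
    tips = frozenset().union(*(collect_clades(child, out) for child in tree))
    out.append(tips)
    return tips


def quartet_topology(tree, a, b, c, d):
    """Unrooted topology the tree induces on {a, b, c, d}, e.g. 'a,b|c,d', or None."""
    clades = []
    collect_clades(tree, clades)
    for (p, q), (r, s) in (((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))):
        left, right = {p, q}, {r, s}
        # Each clade is one side of an edge split; pq|rs holds if some edge
        # puts p, q on one side and r, s on the other.
        if any((left <= cl and not right & cl) or (right <= cl and not left & cl)
               for cl in clades):
            return "|".join(sorted(",".join(sorted(pair)) for pair in (left, right)))
    return None  # quartet unresolved (polytomy)


gene_trees = [
    ((("A", "B"), "C"), ("D", "E")),
    ((("A", "B"), "D"), ("C", "E")),
    ((("A", "C"), "B"), ("D", "E")),
]
counts = Counter(quartet_topology(t, "A", "B", "C", "D") for t in gene_trees)
print(counts.most_common())  # [('A,B|C,D', 2), ('A,C|B,D', 1)]
```

Consider the multispecies coalescent with ILS alone. For any four taxa, the most frequent unrooted quartet gene tree matches the species tree, which makes quartet counts a natural summary. The study above asks how well quartet-based methods hold up once HGT is added.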


Author(s):  
Bennett J Kapili ◽  
Anne E Dekas

Abstract. Motivation: Linking microbial community members to their ecological functions is a central goal of environmental microbiology. When assigned taxonomy, amplicon sequences of metabolic marker genes can suggest such links, thereby offering an overview of the phylogenetic structure underpinning particular ecosystem functions. However, inferring microbial taxonomy from metabolic marker gene sequences remains a challenge, particularly for the frequently sequenced nitrogen fixation marker gene, nitrogenase reductase (nifH). Horizontal gene transfer in recent nifH evolutionary history can confound taxonomic inferences drawn from the pairwise identity methods used in existing software. Other methods for inferring taxonomy are not standardized and require manual inspection that is difficult to scale. Results: We present Phylogenetic Placement for Inferring Taxonomy (PPIT), an R package that infers microbial taxonomy from nifH amplicons using both phylogenetic and sequence identity approaches. After users place query sequences on a reference nifH gene tree provided by PPIT (n = 6317 full-length nifH sequences), PPIT searches the phylogenetic neighborhood of each query sequence and attempts to infer microbial taxonomy. An inference is drawn only if references in the phylogenetic neighborhood (1) are taxonomically consistent and (2) share sufficient pairwise identity with the query, thereby avoiding erroneous inferences due to known horizontal gene transfer events. We find that PPIT returns a higher proportion of correct taxonomic inferences than BLAST-based approaches, at the cost of fewer total inferences. We demonstrate PPIT on nifH amplicons from deep-sea sediment and find that Deltaproteobacteria are the most abundant potential diazotrophs. Using this dataset, we show that emending PPIT inferences based on visual inspection of query sequence placement can achieve taxonomic inferences for nearly all sequences in a query set. We additionally discuss how users can apply PPIT to the analysis of other marker genes. Availability: PPIT is freely available to non-commercial users at https://github.com/bkapili/ppit. Installation includes a vignette that demonstrates package use and reproduces the nifH amplicon analysis discussed here. The raw nifH amplicon sequence data have been deposited in the GenBank, EMBL, and DDBJ databases under BioProject number PRJEB37167. Supplementary information: Supplementary data are available at Bioinformatics online.
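As the abstract describes it, PPIT only draws an inference when the phylogenetic neighbors of a placed query agree taxonomically and are similar enough to the query. A minimal Python sketch of that two-condition rule follows. The rank list, the per-rank identity cut-offs, and the input layout are hypothetical placeholders, not PPIT's values or data structures. The placement step that produces the neighborhood is not shown.

```python
# Hypothetical rank order (coarse to fine) and per-rank identity cut-offs;
# these are placeholders, not PPIT's actual settings.
RANKS = ["phylum", "class", "order", "family", "genus"]
MIN_IDENTITY = {"phylum": 0.70, "class": 0.75, "order": 0.80,
                "family": 0.85, "genus": 0.90}


def infer_taxonomy(neighbors):
    """Return (rank, taxon) for the finest rank that passes both checks, else None.

    neighbors: list of (lineage, identity) pairs, where lineage maps rank -> taxon
    name for a reference sequence near the query's placement and identity is its
    pairwise identity to the query (0-1).
    """
    if not neighbors:
        return None
    lowest_identity = min(identity for _, identity in neighbors)
    for rank in reversed(RANKS):  # try the finest rank first
        names = {lineage.get(rank) for lineage, _ in neighbors}
        consistent = len(names) == 1 and None not in names  # (1) taxonomic agreement
        similar = lowest_identity >= MIN_IDENTITY[rank]      # (2) enough identity
        if consistent and similar:
            return rank, names.pop()
    return None


# Two hypothetical references that agree down to order but not family.
neighbors = [
    ({"phylum": "P1", "class": "C1", "order": "O1", "family": "F1", "genus": "G1"}, 0.86),
    ({"phylum": "P1", "class": "C1", "order": "O1", "family": "F2", "genus": "G2"}, 0.88),
]
print(infer_taxonomy(neighbors))  # ('order', 'O1')
```

Returning `None` rather than a coarse guess mirrors the trade-off the abstract reports: fewer total inferences, but a higher proportion of correct ones.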


Author(s):  
John A Rhodes ◽  
Hector Baños ◽  
Jonathan D Mitchell ◽  
Elizabeth S Allman

Abstract. Summary: MSCquartets is an R package for species tree hypothesis testing, inference of species trees, and inference of species networks under the Multispecies Coalescent model of incomplete lineage sorting and its network analog. Inputs to these analyses are collections of metric or topological locus trees, which are then summarized by the quartets displayed on them. Results of hypothesis tests at user-supplied significance levels are displayed in a simplex plot as color-coded points. The package implements the QDC and WQDC algorithms for topological and metric species tree inference, and the NANUQ algorithm for level-1 topological species network inference, all of which give statistically consistent estimators under the model. Availability: MSCquartets is available through the Comprehensive R Archive Network: https://CRAN.R-project.org/package=MSCquartets. Supplementary information: Supplementary materials, including example data and analyses, are incorporated into the package.
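MSCquartets summarizes locus trees by counting, for each set of four taxa, how often each of the three possible quartet topologies appears. Normalized, these counts are a point in a probability simplex. Under the multispecies coalescent on a species tree, the two topologies that disagree with the species-tree quartet are expected to be equally frequent. The sketch below tests that equality with a simple exact binomial test, given a user-specified species-tree topology. This is a swapped-in illustration, not the test MSCquartets implements, and the function names are hypothetical.

```python
from math import comb


def binom_two_sided_half(k, n):
    """Exact two-sided binomial p-value for k successes in n trials, p = 1/2."""
    if n == 0:
        return 1.0
    lower = min(k, n - k)
    tail = sum(comb(n, i) for i in range(lower + 1)) / 2 ** n
    return min(1.0, 2 * tail)  # symmetric null, so double the smaller tail


def quartet_test(counts, major):
    """counts: numbers of locus trees showing each of the three quartet topologies.
    major: index (0-2) of the topology hypothesised to match the species tree.

    Returns the simplex coordinates (topology frequencies) and the p-value for
    the null that the two remaining (minor) topologies are equally frequent.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("no locus trees resolve this quartet")
    freqs = tuple(c / total for c in counts)
    minor = [c for i, c in enumerate(counts) if i != major]
    return freqs, binom_two_sided_half(minor[0], minor[0] + minor[1])


# Hypothetical counts for one set of four taxa.
freqs, p = quartet_test((60, 22, 18), major=0)
freqs2, p2 = quartet_test((50, 45, 5), major=0)
print(freqs, round(p, 3))
print(freqs2, p2)
```

The first set of counts is compatible with equal minor frequencies. The second, with minor counts of 45 and 5, gives a very small p-value, which under this simplified test would argue against the hypothesized tree-like pattern for that quartet.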


2011 ◽  
Vol 7 (5) ◽  
pp. 211-213 ◽  
Author(s):  
Tokumasa Horiike ◽  
Daisuke Miyata ◽  
Yoshio Tateno ◽  
Ryoichi Minai

2020 ◽  
Vol 36 (9) ◽  
pp. 2740-2749
Author(s):  
Henry Xing ◽  
Steven W Kembel ◽  
Vladimir Makarenkov

Abstract. Motivation: Phylogenetic trees and the methods for their analysis have played a key role in many evolutionary, ecological and bioinformatics studies. Phylogenetic networks, in turn, have been widely used to analyze and represent complex reticulate evolutionary processes that cannot be adequately studied using traditional phylogenetic methods. These processes include, among others, hybridization, horizontal gene transfer, and genetic recombination. Sequence similarity and genome similarity networks have recently become an efficient tool for community analysis of large molecular datasets in comparative studies. These networks can be used to tackle a variety of complex evolutionary problems, such as the identification of horizontal gene transfer events, the recovery of mosaic genes and genomes, and the study of holobionts. Results: The shortest path in a phylogenetic tree is used to estimate evolutionary distances between species. We show how the shortest path concept can be extended to sequence similarity networks by defining five new distances (NetUniFrac, Spp, Spep, Spelp and Spinp) and the Transfer index, between species communities present in the network. These new distances can be seen as network analogs of the traditional UniFrac distance used to assess dissimilarity between species communities in a phylogenetic tree, whereas the Transfer index is intended for estimating the rate and direction of gene transfers, or species dispersal, between different phylogenetic or ecological species communities. Moreover, NetUniFrac and the Transfer index can be computed in linear time with respect to the number of edges in the network. We show how these new measures can be used to analyze microbiota and antibiotic resistance gene similarity networks. Availability and implementation: Our NetFrac program, implemented in R and C, along with its source code, is freely available on GitHub at https://github.com/XPHenry/Netfrac. Supplementary information: Supplementary data are available at Bioinformatics online.
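NetUniFrac is presented as a network analog of UniFrac, and the other new distances build on shortest paths. The sketch below computes the two tree-based quantities the abstract starts from. The first is the shortest-path (patristic) distance between two nodes. The second is unweighted UniFrac between two communities: the branch length leading to only one community, divided by the branch length leading to either. It does not implement NetFrac's network distances. The child-to-parent tree encoding and the function names are assumptions.

```python
def tips_below(parent):
    """Map every non-root node to the set of tips beneath it.

    parent maps each child node to its parent; nodes with no children are tips.
    """
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)
    memo = {}

    def tips(node):
        if node not in memo:
            kids = children.get(node, [])
            memo[node] = {node} if not kids else set().union(*(tips(k) for k in kids))
        return memo[node]

    return {node: tips(node) for node in parent}


def path_length(parent, length, x, y):
    """Shortest-path distance: summed branch lengths between nodes x and y."""
    up = {}
    dist, node = 0.0, x
    while True:
        up[node] = dist            # distance from x up to each ancestor
        if node not in parent:     # reached the root
            break
        dist += length[node]
        node = parent[node]
    dist, node = 0.0, y
    while node not in up:          # climb from y to the first shared ancestor
        dist += length[node]
        node = parent[node]
    return dist + up[node]


def unweighted_unifrac(parent, length, comm_a, comm_b):
    """Unique branch length / branch length observed in either community."""
    observed = unique = 0.0
    for node, tips in tips_below(parent).items():
        in_a, in_b = bool(tips & comm_a), bool(tips & comm_b)
        if in_a or in_b:
            observed += length[node]   # length[node] is the branch above node
            if in_a != in_b:
                unique += length[node]
    return unique / observed


# Hypothetical tree ((A:1,B:1)n1:0.5,D:1)root
parent = {"n1": "root", "D": "root", "A": "n1", "B": "n1"}
length = {"n1": 0.5, "D": 1.0, "A": 1.0, "B": 1.0}
print(path_length(parent, length, "A", "D"))                      # 2.5
print(unweighted_unifrac(parent, length, {"A", "B"}, {"B", "D"}))  # 2.0 / 3.5, about 0.571
```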

