MonoPhy: a simple R package to find and visualize monophyly issues

10.7287/peerj.preprints.1600v1 ◽

2015 ◽

Author(s):

Orlando Schwery ◽

Brian C O'Meara

Keyword(s):

Phylogenetic Tree ◽

R Package ◽

Input File ◽

Higher Taxa ◽

Additional Input

Background. The monophyly of taxa is an important attribute of a phylogenetic tree, as a lack of it may hint at shortcomings of either the tree or the current taxonomy and can misguide subsequent analyses. While monophyly is conceptually simple, it is manually tedious and time consuming to assess on modern phylogenies of hundreds to thousands of species. Results. The R package MonoPhy allows assessment and exploration of monophyly of taxa in a phylogeny. It can assess the monophyly of genera using the phylogeny only, and with an additional input file, any other desired higher taxa or unranked groups can be checked as well. Conclusion. Summary tables, easily subsettable results and several visualization options allow quick and convenient exploration of monophyly issues, thus making MonoPhy a valuable tool for any researcher working with phylogenies.
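The check described in this abstract can be illustrated with a minimal Python sketch. It is not the MonoPhy API. It assumes the tree is written as nested tuples with string tips, and that the genus is the part of a tip label before the first underscore; both conventions and all function names are hypothetical. A group is treated as monophyletic when the smallest clade containing all its members contains nothing else.

```python
def tips(node):
    """Return the set of tip labels below a node (a tip is a plain string)."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= tips(child)
    return result


def mrca_tips(node, group):
    """Return the tips of the smallest clade containing every member of group."""
    children = () if isinstance(node, str) else node
    for child in children:
        if group <= tips(child):
            # The whole group sits inside this child, so descend further.
            return mrca_tips(child, group)
    return tips(node)


def is_monophyletic(tree, group):
    """True if the smallest clade holding the group holds only the group."""
    group = set(group)
    return mrca_tips(tree, group) == group


def genus_report(tree):
    """Group tips by genus (assumed label prefix) and test each genus."""
    genera = {}
    for tip in tips(tree):
        genera.setdefault(tip.split("_")[0], set()).add(tip)
    return {genus: is_monophyletic(tree, members)
            for genus, members in genera.items()}


# Toy tree: the two "Bus" tips are separated by the root.
tree = ((("Aus_a", "Aus_b"), "Bus_a"), ("Bus_b", ("Cus_a", "Cus_b")))
report = genus_report(tree)
```

Here `report` flags the hypothetical genus "Bus" as non-monophyletic, because the smallest clade containing both of its tips is the whole tree.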

Phylogenomic species tree estimation in the presence of incomplete lineage sorting and horizontal gene transfer

BMC Genomics ◽

10.1186/1471-2164-16-s10-s1 ◽

2015 ◽

Vol 16 (Suppl 10) ◽

pp. S1 ◽

Cited By ~ 33

Author(s):

Ruth Davidson ◽

Pranjal Vachaspati ◽

Siavash Mirarab ◽

Tandy Warnow

Keyword(s):

Gene Transfer ◽

Horizontal Gene Transfer ◽

Incomplete Lineage Sorting ◽

Species Tree ◽

Lineage Sorting ◽

Tree Estimation

MonoPhy: A simple R package to find and visualize monophyly issues

10.7287/peerj.preprints.1600 ◽

2015 ◽

Author(s):

Orlando Schwery ◽

Brian C O'Meara

Keyword(s):

Phylogenetic Tree ◽

R Package ◽

Input File ◽

Higher Taxa ◽

Additional Input

Background. The monophyly of taxa is an important attribute of a phylogenetic tree, as a lack of it may hint at shortcomings of either the tree or the current taxonomy and can misguide subsequent analyses. While monophyly is conceptually simple, it is manually tedious and time consuming to assess on modern phylogenies of hundreds to thousands of species. Results. The R package MonoPhy allows assessment and exploration of monophyly of taxa in a phylogeny. It can assess the monophyly of genera using the phylogeny only, and with an additional input file, any other desired higher taxa or unranked groups can be checked as well. Conclusion. Summary tables, easily subsettable results and several visualization options allow quick and convenient exploration of monophyly issues, thus making MonoPhy a valuable tool for any researcher working with phylogenies.

Phylogenomic species tree estimation in the presence of incomplete lineage sorting and horizontal gene transfer

10.1101/023168 ◽

2015 ◽

Cited By ~ 1

Author(s):

Ruth Davidson ◽

Pranjal Vachaspati ◽

Siavash Mirarab ◽

Tandy Warnow

Keyword(s):

Maximum Likelihood ◽

Gene Transfer ◽

Horizontal Gene Transfer ◽

Incomplete Lineage Sorting ◽

Gene Tree ◽

Species Tree ◽

Estimation Methods ◽

Species Trees ◽

Lineage Sorting ◽

Tree Estimation

Background: Species tree estimation is challenged by gene tree heterogeneity resulting from biological processes such as duplication and loss, hybridization, incomplete lineage sorting (ILS), and horizontal gene transfer (HGT). Mathematical theory about reconstructing species trees in the presence of HGT alone or ILS alone suggests that quartet-based species tree methods (known to be statistically consistent under ILS, or under bounded amounts of HGT) might be effective techniques for estimating species trees when both HGT and ILS are present. Results: We evaluated several publicly available coalescent-based methods and concatenation under maximum likelihood on simulated datasets with moderate ILS and varying levels of HGT. Our study shows that two quartet-based species tree estimation methods (ASTRAL-2 and weighted Quartets MaxCut) are both highly accurate, even on datasets with high rates of HGT. In contrast, although NJst and concatenation using maximum likelihood are highly accurate under low HGT, they are less robust to high HGT rates. Conclusion: Our study shows that quartet-based species-tree estimation methods can be highly accurate under the presence of both HGT and ILS. The study suggests the possibility that some quartet-based methods might be statistically consistent under phylogenomic models of gene tree heterogeneity with both HGT and ILS. Keywords: phylogenomics; HGT; ILS; summary methods; concatenation
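The quartet idea behind these methods can be sketched in a few lines of Python. This is not ASTRAL-2 or weighted Quartets MaxCut; it only shows how the topology induced on four taxa can be read from each gene tree and tallied. The nested-tuple tree encoding and all function names are assumptions made for illustration.

```python
from collections import Counter


def tips(node):
    """Return the set of tip labels below a node (a tip is a plain string)."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= tips(child)
    return result


def clades(node):
    """Yield the tip set of every subtree of a nested-tuple tree."""
    yield tips(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)


def quartet_topology(tree, quartet):
    """Return the split {pair, other pair} induced on four taxa, or None."""
    quartet = frozenset(quartet)
    for clade in clades(tree):
        pair = clade & quartet
        if len(pair) == 2:
            # A clade holding exactly two of the four taxa fixes the split.
            return frozenset([frozenset(pair), quartet - pair])
    return None  # unresolved for these four taxa


def dominant_quartet(gene_trees, quartet):
    """Tally induced quartet topologies and return the most frequent one."""
    counts = Counter(quartet_topology(t, quartet) for t in gene_trees)
    counts.pop(None, None)  # ignore unresolved gene trees
    return counts.most_common(1)[0][0] if counts else None


gene_trees = [
    ((("A", "B"), "C"), "D"),    # AB|CD
    (("A", "B"), ("C", "D")),    # AB|CD
    ((("A", "C"), "B"), "D"),    # AC|BD, a discordant gene tree
]
best = dominant_quartet(gene_trees, ["A", "B", "C", "D"])
```

In this toy input two of the three gene trees support AB|CD, so that split is returned despite the discordant third tree.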

Comparing Two Bayesian Methods for Gene Tree/Species Tree Reconstruction: Simulations with Incomplete Lineage Sorting and Horizontal Gene Transfer

Systematic Biology ◽

10.1093/sysbio/syr003 ◽

2011 ◽

Vol 60 (3) ◽

pp. 261-275 ◽

Cited By ~ 71

Author(s):

Yujin Chung ◽

Cécile Ané

Keyword(s):

Gene Transfer ◽

Horizontal Gene Transfer ◽

Bayesian Methods ◽

Tree Species ◽

Incomplete Lineage Sorting ◽

Gene Tree ◽

Species Tree ◽

Lineage Sorting ◽

Tree Reconstruction

PPIT: an R package for inferring microbial taxonomy from nifH sequences

Bioinformatics ◽

10.1093/bioinformatics/btab100 ◽

2021 ◽

Author(s):

Bennett J Kapili ◽

Anne E Dekas

Keyword(s):

Gene Transfer ◽

Horizontal Gene Transfer ◽

Query Sequence ◽

Marker Gene ◽

R Package ◽

Supplementary Information ◽

Marker Genes ◽

Pairwise Identity ◽

Metabolic Marker ◽

Microbial Taxonomy

Motivation: Linking microbial community members to their ecological functions is a central goal of environmental microbiology. When assigned taxonomy, amplicon sequences of metabolic marker genes can suggest such links, thereby offering an overview of the phylogenetic structure underpinning particular ecosystem functions. However, inferring microbial taxonomy from metabolic marker gene sequences remains a challenge, particularly for the frequently sequenced nitrogen fixation marker gene, nitrogenase reductase (nifH). Horizontal gene transfer in recent nifH evolutionary history can confound taxonomic inferences drawn from the pairwise identity methods used in existing software. Other methods for inferring taxonomy are not standardized and require manual inspection that is difficult to scale. Results: We present Phylogenetic Placement for Inferring Taxonomy (PPIT), an R package that infers microbial taxonomy from nifH amplicons using both phylogenetic and sequence identity approaches. After users place query sequences on a reference nifH gene tree provided by PPIT (n = 6317 full-length nifH sequences), PPIT searches the phylogenetic neighborhood of each query sequence and attempts to infer microbial taxonomy. An inference is drawn only if references in the phylogenetic neighborhood are: (1) taxonomically consistent and (2) share sufficient pairwise identity with the query, thereby avoiding erroneous inferences due to known horizontal gene transfer events. We find that PPIT returns a higher proportion of correct taxonomic inferences than BLAST-based approaches at the cost of fewer total inferences. We demonstrate PPIT on deep-sea sediment and find that Deltaproteobacteria are the most abundant potential diazotrophs. Using this dataset we show that emending PPIT inferences based on visual inspection of query sequence placement can achieve taxonomic inferences for nearly all sequences in a query set. We additionally discuss how users can apply PPIT to the analysis of other marker genes. Availability: PPIT is freely available to non-commercial users at https://github.com/bkapili/ppit. Installation includes a vignette that demonstrates package use and reproduces the nifH amplicon analysis discussed here. The raw nifH amplicon sequence data have been deposited in the GenBank, EMBL, and DDBJ databases under BioProject number PRJEB37167. Supplementary information: Supplementary data are available at Bioinformatics online.
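The two-condition rule in this abstract (taxonomically consistent neighbors with sufficient pairwise identity) can be sketched as follows. This is not PPIT's code or API. The function name, the input layout, the 0.9 threshold, and the choice to require every neighbor to pass the threshold are all assumptions made for illustration.

```python
def infer_taxon(neighbors, min_identity=0.9):
    """Return a taxon only when both conditions hold, otherwise None.

    neighbors: list of (taxon, pairwise_identity_to_query) tuples for the
    reference sequences in the query's phylogenetic neighborhood.
    min_identity: hypothetical cutoff, not a value taken from the paper.
    """
    if not neighbors:
        return None
    taxa = {taxon for taxon, _ in neighbors}
    if len(taxa) != 1:
        # Condition 1 fails: the neighborhood is taxonomically mixed.
        return None
    if any(identity < min_identity for _, identity in neighbors):
        # Condition 2 fails (assumed here to apply to every neighbor).
        return None
    return taxa.pop()


consistent = infer_taxon([("Deltaproteobacteria", 0.95),
                          ("Deltaproteobacteria", 0.93)])
mixed = infer_taxon([("Deltaproteobacteria", 0.95),
                     ("Firmicutes", 0.94)])
divergent = infer_taxon([("Deltaproteobacteria", 0.95),
                         ("Deltaproteobacteria", 0.70)])
```

Declining to answer in the mixed and divergent cases mirrors the trade-off the abstract reports: fewer total inferences in exchange for a higher proportion of correct ones.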

A matter of phylogenetic scale: Distinguishing incomplete lineage sorting from lateral gene transfer as the cause of gene tree discord in recent versus deep diversification histories

American Journal of Botany ◽

10.1002/ajb2.1064 ◽

2018 ◽

Vol 105 (3) ◽

pp. 376-384 ◽

Cited By ~ 15

Author(s):

L. Lacey Knowles ◽

Huateng Huang ◽

Jeet Sukumaran ◽

Stephen A. Smith

Keyword(s):

Gene Transfer ◽

Lateral Gene Transfer ◽

Incomplete Lineage Sorting ◽

Gene Tree ◽

Lineage Sorting

MSCquartets 1.0: Quartet methods for species trees and networks under the multispecies coalescent model in R

Bioinformatics ◽

10.1093/bioinformatics/btaa868 ◽

2020 ◽

Author(s):

John A Rhodes ◽

Hector Baños ◽

Jonathan D Mitchell ◽

Elizabeth S Allman

Keyword(s):

Network Inference ◽

Incomplete Lineage Sorting ◽

R Package ◽

Species Tree ◽

Supplementary Information ◽

Species Trees ◽

Lineage Sorting ◽

Coalescent Model ◽

Multispecies Coalescent ◽

Tree Inference

Summary: MSCquartets is an R package for species tree hypothesis testing, inference of species trees, and inference of species networks under the Multispecies Coalescent model of incomplete lineage sorting and its network analog. Inputs for these analyses are collections of metric or topological locus trees, which are then summarized by the quartets displayed on them. Results of hypothesis tests at user-supplied levels are displayed in a simplex plot by color-coded points. The package implements the QDC and WQDC algorithms for topological and metric species tree inference, and the NANUQ algorithm for level-1 topological species network inference, all of which give statistically consistent estimators under the model. Availability: MSCquartets is available through the Comprehensive R Archive Network: https://CRAN.R-project.org/package=MSCquartets. Supplementary information: Supplementary materials, including example data and analyses, are incorporated into the package.

HGT-Gen: a tool for generating a phylogenetic tree with horizontal gene transfer

Bioinformation ◽

10.6026/97320630007211 ◽

2011 ◽

Vol 7 (5) ◽

pp. 211-213 ◽

Cited By ~ 3

Author(s):

Tokumasa Horiike ◽

Daisuke Miyata ◽

Yoshio Tateno ◽

Ryoichi Minai

Keyword(s):

Gene Transfer ◽

Horizontal Gene Transfer ◽

Phylogenetic Tree

Transfer index, NetUniFrac and some useful shortest path-based distances for community analysis in sequence similarity networks

Bioinformatics ◽

10.1093/bioinformatics/btaa043 ◽

2020 ◽

Vol 36 (9) ◽

pp. 2740-2749

Author(s):

Henry Xing ◽

Steven W Kembel ◽

Vladimir Makarenkov

Keyword(s):

Gene Transfer ◽

Horizontal Gene Transfer ◽

Phylogenetic Tree ◽

Shortest Path ◽

Sequence Similarity ◽

Community Analysis ◽

Supplementary Information ◽

Similarity Networks ◽

Transfer Index ◽

Sequence Similarity Networks

Motivation: Phylogenetic trees and the methods for their analysis have played a key role in many evolutionary, ecological and bioinformatics studies. Alternatively, phylogenetic networks have been widely used to analyze and represent complex reticulate evolutionary processes which cannot be adequately studied using traditional phylogenetic methods. These processes include, among others, hybridization, horizontal gene transfer, and genetic recombination. Nowadays, sequence similarity and genome similarity networks have become an efficient tool for community analysis of large molecular datasets in comparative studies. These networks can be used for tackling a variety of complex evolutionary problems such as the identification of horizontal gene transfer events, the recovery of mosaic genes and genomes, and the study of holobionts. Results: The shortest path in a phylogenetic tree is used to estimate evolutionary distances between species. We show how the shortest path concept can be extended to sequence similarity networks by defining five new distances, NetUniFrac, Spp, Spep, Spelp and Spinp, and the Transfer index, between species communities present in the network. These new distances can be seen as network analogs of the traditional UniFrac distance used to assess dissimilarity between species communities in a phylogenetic tree, whereas the Transfer index is intended for estimating the rate and direction of gene transfers, or species dispersal, between different phylogenetic, or ecological, species communities. Moreover, NetUniFrac and the Transfer index can be computed in linear time with respect to the number of edges in the network. We show how these new measures can be used to analyze microbiota and antibiotic resistance gene similarity networks. Availability and implementation: Our NetFrac program, implemented in R and C, along with its source code, is freely available on GitHub at https://github.com/XPHenry/Netfrac. Supplementary information: Supplementary data are available at Bioinformatics online.
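The shortest-path building block that this abstract extends to sequence similarity networks can be sketched with Dijkstra's algorithm. This does not implement NetUniFrac, the Transfer index, or any of the other distances, whose definitions are not given here. The adjacency-dictionary layout, the node names, and the edge weights are hypothetical.

```python
import heapq


def shortest_path_lengths(graph, source):
    """Dijkstra's algorithm on a weighted graph with non-negative weights.

    graph: dict mapping each node to a dict of {neighbor: edge_weight}.
    Returns a dict of shortest distances to every node reachable from source.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


# Toy similarity network; "s4" is isolated and therefore unreachable.
network = {
    "s1": {"s2": 1.0, "s3": 4.0},
    "s2": {"s1": 1.0, "s3": 2.0},
    "s3": {"s1": 4.0, "s2": 2.0},
    "s4": {},
}
dist = shortest_path_lengths(network, "s1")
```

Unreachable nodes are simply absent from the result, which matters in similarity networks because, unlike a tree, they need not be connected.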
