MonoPhy: A simple R package to find and visualize monophyly issues

10.7287/peerj.preprints.1600 ◽

2015 ◽

Author(s):

Orlando Schwery ◽

Brian C O'Meara

Keyword(s):

Phylogenetic Tree ◽

R Package ◽

Input File ◽

Higher Taxa ◽

Additional Input

Background. The monophyly of taxa is an important attribute of a phylogenetic tree, as a lack of it may hint at shortcomings of either the tree or the current taxonomy and can misguide subsequent analyses. While monophyly is conceptually simple, it is manually tedious and time consuming to assess on modern phylogenies of hundreds to thousands of species. Results. The R package MonoPhy allows assessment and exploration of monophyly of taxa in a phylogeny. It can assess the monophyly of genera using the phylogeny only, and with an additional input file, any other desired higher taxa or unranked groups can be checked as well. Conclusion. Summary tables, easily subsettable results and several visualization options allow quick and convenient exploration of monophyly issues, thus making MonoPhy a valuable tool for any researcher working with phylogenies.
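The monophyly check described in this abstract can be illustrated with a minimal Python sketch. It is not MonoPhy's code or API: the toy tree, the nested-tuple representation, the function names and the "Genus_species" naming convention for tips are all assumptions made for illustration. A group is treated as monophyletic when some clade of the rooted tree contains exactly its members, and single-tip genera count as trivially monophyletic here.

```python
# Hypothetical toy tree as nested tuples; tip labels follow "Genus_species".
tree = ((("Aus_a", "Aus_b"), "Bus_a"), ("Bus_b", "Cus_a"))

def tips_of(node):
    """Return the set of tip labels at or below a node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(tips_of(child) for child in node))

def clades(node):
    """Yield the tip set of every clade in the tree, single tips included."""
    yield tips_of(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)

def is_monophyletic(tree, group):
    """A group is monophyletic if some clade contains exactly its members."""
    return any(clade == set(group) for clade in clades(tree))

def genus_monophyly(tree):
    """Group tips by genus (text before the first underscore) and test each."""
    genera = {}
    for tip in tips_of(tree):
        genera.setdefault(tip.split("_")[0], set()).add(tip)
    return {genus: is_monophyletic(tree, members)
            for genus, members in genera.items()}

result = genus_monophyly(tree)
print(result)
```

In this toy tree the genus "Bus" is not monophyletic, because its two tips are split between the two main clades. Checking higher taxa or unranked groups would only require replacing the genus-from-name grouping with a lookup table, which is the role the abstract assigns to the additional input file.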


MonoPhy: a simple R package to find and visualize monophyly issues

PeerJ Computer Science ◽

10.7717/peerj-cs.56 ◽

2016 ◽

Vol 2 ◽

pp. e56 ◽

Cited By ~ 11

Author(s):

Orlando Schwery ◽

Brian C. O’Meara

Keyword(s):

Gene Transfer ◽

Horizontal Gene Transfer ◽

Phylogenetic Tree ◽

Incomplete Lineage Sorting ◽

R Package ◽

Higher Order ◽

Input File ◽

Lineage Sorting ◽

Additional Input

Background. The monophyly of taxa is an important attribute of a phylogenetic tree. A lack of it may hint at shortcomings of either the tree or the current taxonomy, or can indicate cases of incomplete lineage sorting or horizontal gene transfer. Whichever is the reason, a lack of monophyly can misguide subsequent analyses. While monophyly is conceptually simple, it is manually tedious and time consuming to assess on modern phylogenies of hundreds to thousands of species. Results. The R package MonoPhy allows assessment and exploration of monophyly of taxa in a phylogeny. It can assess the monophyly of genera using the phylogeny only, and with an additional input file any other desired higher order taxa or unranked groups can be checked as well. Conclusion. Summary tables, easily subsettable results and several visualization options allow quick and convenient exploration of monophyly issues, thus making MonoPhy a valuable tool for any researcher working with phylogenies.


Treeio: An R Package for Phylogenetic Tree Input and Output with Richly Annotated and Associated Data

Molecular Biology and Evolution ◽

10.1093/molbev/msz240 ◽

2019 ◽

Vol 37 (2) ◽

pp. 599-603 ◽

Cited By ~ 25

Author(s):

Li-Gen Wang ◽

Tommy Tsan-Yuk Lam ◽

Shuangbin Xu ◽

Zehan Dai ◽

Lang Zhou ◽

...

Keyword(s):

Phylogenetic Tree ◽

Phylogenetic Trees ◽

R Package ◽

External Data ◽

Input And Output ◽

Evolutionary Context ◽

Tree Data ◽

Downstream Analysis ◽

Different Sources ◽

Associated Data

Abstract Phylogenetic trees and data are often stored in incompatible and inconsistent formats. The outputs of software tools that contain trees with analysis findings are often not compatible with each other, making it hard to integrate the results of different analyses in a comparative study. The treeio package is designed to connect phylogenetic tree input and output. It supports extracting phylogenetic trees as well as the outputs of commonly used analytical software. It can link external data to phylogenies and merge tree data obtained from different sources, enabling analyses of phylogeny-associated data from different disciplines in an evolutionary context. Treeio also supports export of a phylogenetic tree with heterogeneous-associated data to a single tree file, including BEAST compatible NEXUS and jtree formats; these facilitate data sharing as well as file format conversion for downstream analysis. The treeio package is designed to work with the tidytree and ggtree packages. Tree data can be processed using the tidy interface with tidytree and visualized by ggtree. The treeio package is released within the Bioconductor and rOpenSci projects. It is available at https://www.bioconductor.org/packages/treeio/.


Inference of Adaptive Shifts for Multivariate Correlated Traits

10.1101/146191 ◽

2017 ◽

Cited By ~ 2

Author(s):

Paul Bastide ◽

Cécile Ané ◽

Stéphane Robin ◽

Mahendra Mariadassou

Keyword(s):

Phylogenetic Tree ◽

Missing Values ◽

Expectation Maximization Algorithm ◽

Selection Criterion ◽

Principal Component ◽

Likelihood Estimation ◽

R Package ◽

Stabilizing Selection ◽

New World Monkeys ◽

Wide Range

Abstract To study the evolution of several quantitative traits, the classical phylogenetic comparative framework consists of a multivariate random process running along the branches of a phylogenetic tree. The Ornstein-Uhlenbeck (OU) process is sometimes preferred to the simple Brownian Motion (BM) as it models stabilizing selection toward an optimum. The optimum for each trait is likely to be changing over the long periods of time spanned by large modern phylogenies. Our goal is to automatically detect the position of these shifts on a phylogenetic tree, while accounting for correlations between traits, which might exist because of structural or evolutionary constraints. We show that, in the presence of shifts, phylogenetic Principal Component Analysis (pPCA) fails to decorrelate traits efficiently, so that any method aiming at finding shifts needs to deal with correlation simultaneously. We introduce here a simplification of the full multivariate OU model, named scalar OU (scOU), which allows for noncausal correlations and is still computationally tractable. We extend the equivalence between the OU and a BM on a re-scaled tree to our multivariate framework. We describe an Expectation Maximization algorithm that allows for a maximum likelihood estimation of the shift positions, associated with a new model selection criterion, accounting for the identifiability issues for the shift localization on the tree. The method, freely available as an R package (PhylogeneticEM), is fast and can deal with missing values. We demonstrate its efficiency and accuracy compared to another state-of-the-art method (ℓ1ou) on a wide range of simulated scenarios, and use this new framework to re-analyze recently gathered datasets on New World Monkeys and Anolis lizards.
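For readers unfamiliar with the two processes contrasted in this abstract, the standard univariate definitions are sketched below. The symbols (α for selection strength, θ for the optimum, σ for the diffusion rate, W_t for a Wiener process) are conventional notation rather than notation taken from the abstract, and the last line is only an assumed reading of "scalar OU", namely that a single selection strength is shared by all traits.

```latex
% Brownian motion (no selection):
dX_t = \sigma \, dW_t

% Ornstein-Uhlenbeck: pull of strength \alpha > 0 toward the optimum \theta
dX_t = -\alpha \, (X_t - \theta) \, dt + \sigma \, dW_t

% Stationary variance of the OU process:
\operatorname{Var}(X_\infty) = \frac{\sigma^2}{2\alpha}

% Assumed reading of scOU for a trait vector X_t (selection matrix A = \alpha I):
dX_t = -\alpha \, (X_t - \theta) \, dt + R \, dW_t
```

Under this notation, a shift corresponds to a change in the value of θ on some branch of the tree, and setting α = 0 recovers Brownian motion.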


Phylogeny-Guided Microbiome OTU-Specific Association Test (POST)

10.21203/rs.3.rs-1017592/v1 ◽

2021 ◽

Author(s):

Caizhi Huang ◽

Benjamin John Callahan ◽

Michael C Wu ◽

Shannon T. Holloway ◽

Hayden Brochu ◽

...

Keyword(s):

Phylogenetic Tree ◽

Real Data ◽

R Package ◽

Association Test ◽

Public Access ◽

Phylogenetic Distance ◽

Phylogenetic Information ◽

Phylogenic Tree ◽

Kernel Machine ◽

Specific Association

Abstract Background: The relationship between host conditions and microbiome profiles, typically characterized by operational taxonomic units (OTUs), contains important information about the microbial role in human health. Traditional association testing frameworks are challenged by the high-dimensionality and sparsity of typical microbiome profiles. Incorporating phylogenetic information is often used to address these challenges with the assumption that evolutionarily similar taxa tend to behave similarly. However, this assumption may not always be valid due to the complex effect of microbes, and phylogenetic information should be incorporated in a data-supervised fashion. Results: In this work, we propose a local collapsing test called Phylogeny-guided microbiome OTU-Specific association Test (POST). In POST, whether or not to borrow information and how much information to borrow from the neighboring OTUs in the phylogenetic tree are supervised by phylogenetic distance and the outcome-OTU association. POST is constructed under the kernel machine framework to accommodate complex OTU effects and extends kernel machine microbiome tests from community-level to OTU-level. Using simulation studies, we showed that when the phylogenetic tree is informative, POST has better performance than existing OTU-level association tests. When the phylogenetic tree is not informative, POST achieves performance similar to that of existing methods. Finally, we show that POST can identify more outcome-associated OTUs that are of biological relevance in real data applications on bacterial vaginosis and on preterm birth. Conclusions: Using POST, we show that the power of detecting associated microbiome features can be enhanced by adaptively leveraging the phylogenetic information when testing for a target OTU. We developed a user-friendly R package, POSTm, which is now available at CRAN (https://CRAN.R-project.org/package=POSTm) for public access.


Transmission trees on a known pathogen phylogeny: enumeration and sampling

10.1101/160812 ◽

2017 ◽

Author(s):

Matthew Hall ◽

Caroline Colijn

Keyword(s):

Phylogenetic Tree ◽

Disease Transmission ◽

R Package ◽

Infectious Disease Transmission ◽

New Host ◽

Mathematical Properties ◽

Multiple Sampling ◽

Individual Host ◽

Branch Lengths ◽

Incomplete Sampling

Abstract One approach to the reconstruction of infectious disease transmission trees from pathogen genomic data has been to use a phylogenetic tree, reconstructed from pathogen sequences, and annotate its internal nodes to provide a reconstruction of which host each lineage was in at each point in time. If only one pathogen lineage can be transmitted to a new host (i.e. the transmission bottleneck is complete), this corresponds to partitioning the nodes of the phylogeny into connected regions, each of which represents evolution in an individual host. These partitions define the possible transmission trees that are consistent with a given phylogenetic tree. However, the mathematical properties of the transmission trees given a phylogeny remain largely unexplored. Here, we describe a procedure to calculate the number of possible transmission trees for a given phylogeny, and we show how to uniformly sample from these transmission trees. The procedure is outlined for situations where one sample is available from each host and trees do not have branch lengths, and we also provide extensions for incomplete sampling, multiple sampling, and the application to time trees in a situation where limits on the period during which each host could have been infected are known. The sampling algorithm is available as an R package (STraTUS).
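The partition idea in this abstract can be made concrete with a brute-force Python sketch for the simplest setting it mentions (one sample per host, no branch lengths). This is not the authors' counting procedure and not the STraTUS API: it simply tries every assignment of internal nodes to hosts and keeps those in which each host's region is connected. The toy tree ((A,B),C), the node names and the function names are hypothetical, and the approach is only feasible for very small trees.

```python
from itertools import product

# Hypothetical toy phylogeny ((A,B),C) as a child -> parent map; "r" is the root.
parent = {"A": "u", "B": "u", "u": "r", "C": "r"}
tips = ["A", "B", "C"]      # one sampled tip per host, so tips double as hosts
internal = ["u", "r"]

def region_is_connected(nodes, parent):
    """True if `nodes` form a connected subgraph of the tree."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        # Tree neighbours of `node`: its parent (if any) and its children.
        neighbours = [parent[node]] if node in parent else []
        neighbours += [c for c, p in parent.items() if p == node]
        for nb in neighbours:
            if nb in nodes and nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return seen == nodes

def count_transmission_trees(parent, tips, internal):
    """Count host assignments whose regions are all connected (brute force)."""
    count = 0
    for hosts in product(tips, repeat=len(internal)):
        regions = {tip: [tip] for tip in tips}   # each region holds its own tip
        for node, host in zip(internal, hosts):
            regions[host].append(node)
        if all(region_is_connected(r, parent) for r in regions.values()):
            count += 1
    return count

n_trees = count_transmission_trees(parent, tips, internal)
print(n_trees)  # 5 for this three-tip tree
```

Of the nine possible assignments for this tree, five satisfy the connectivity constraint. One of them places both internal nodes in host C, which corresponds to C infecting both A and B.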


Phylogenetic tree-based microbiome association test

Bioinformatics ◽

10.1093/bioinformatics/btz686 ◽

2019 ◽

Author(s):

Kang Jin Kim ◽

Jaehyun Park ◽

Sang-Chul Park ◽

Sungho Won

Keyword(s):

Phylogenetic Tree ◽

Phylogenetic Trees ◽

Statistical Power ◽

False Negative ◽

Amplicon Sequencing ◽

R Package ◽

Chronic Fatigue ◽

Association Test ◽

Supplementary Information ◽

Association Analyses

Abstract Motivation: Ecological patterns of the human microbiota exhibit high inter-subject variation, with few operational taxonomic units (OTUs) shared across individuals. To overcome these issues, non-parametric approaches, such as the Mann–Whitney U-test and Wilcoxon rank-sum test, have often been used to identify OTUs associated with host diseases. However, these approaches only use the ranks of observed relative abundances, leading to information loss, and are associated with high false-negative rates. In this study, we propose a phylogenetic tree-based microbiome association test (TMAT) to analyze the associations between microbiome OTU abundances and disease phenotypes. Phylogenetic trees illustrate patterns of similarity among different OTUs, and TMAT provides an efficient method for utilizing such information for association analyses. The proposed TMAT provides test statistics for each node, which are combined to identify mutations associated with host diseases. Results: Power estimates of TMAT were compared with existing methods using extensive simulations based on real absolute abundances. Simulation studies showed that TMAT preserves the nominal type-1 error rate, and estimates of its statistical power generally outperformed existing methods in the considered scenarios. Furthermore, TMAT can be used to detect phylogenetic mutations associated with host diseases, providing more in-depth insight into bacterial pathology. Availability and implementation: The 16S rRNA amplicon sequencing metagenomics datasets for colorectal carcinoma and myalgic encephalomyelitis/chronic fatigue syndrome are available from the European Nucleotide Archive (ENA) database under project accession numbers PRJEB6070 and PRJEB13092, respectively. TMAT was implemented as an R package. Detailed information is available at http://healthstat.snu.ac.kr/software/tmat. Supplementary information: Supplementary data are available at Bioinformatics online.


Marine halogenated compound analysis: from an R package to the isolation of new griseophenone derivatives

Planta Medica ◽

10.1055/s-0036-1596648 ◽

2016 ◽

Vol 81 (S 01) ◽

pp. S1-S381

Author(s):

C Roullier ◽

Y Guitton ◽

S Prado ◽

O Grovel ◽

YF Pouchus

Keyword(s):

R Package ◽

Halogenated Compound


Sequencing Analysis and Phylogenetic Tree of HPV Isolated from Breast Cancer Patients at Thi-Qar Province/Iraq

10.32792/utq/utjsci/vol7/2/10 ◽

2020 ◽

pp. 37-40

Keyword(s):

Nucleotide Sequence ◽

Phylogenetic Tree ◽

Evolutionary Relationship ◽

Nucleotide Sequencing ◽

Sequencing Analysis ◽

Breast Cancer Patients ◽

The North ◽

Formalin Fixed Paraffin ◽

Formalin Fixed Paraffin Embedded ◽

History Of

The examination of genetic diversity has proven fundamental to understanding the epidemiological and evolutionary history of human papillomavirus (HPV), to the development of accurate diagnostic tests and to efficient vaccine design. HPV nucleotide diversity has been investigated widely among high-risk HPV types. This study was carried out to determine the nucleotide sequence of HPV and establish a virus database in Thi-Qar province, to compare the sequences of our isolates with previously described isolates from around the world, and to draw their phylogenetic tree. A total of 6 formalin-fixed paraffin-embedded (FFPE) breast samples from female patients were included in the study: 4 FFPE malignant tumors and 2 FFPE benign tumors. PCR was used to detect the presence of HPV in breast tissue, and real-time PCR was used to determine HPV genotypes; the complete nucleotide sequence of the HPV L1 capsid gene was then determined and its phylogenetic tree drawn. Nucleotide sequencing detected a number of substitution mutations (SNPs) in the L1 gene that had not been described before and that were each identified once in this study population. It also revealed that the HPV16 strains have an evolutionary relationship with the South African race, while HPV33 and HPV6 show an evolutionary association with the North American and East Asian races, respectively.


The Orchard Plot: Cultivating a Forest Plot for Use in Ecology, Evolution and Beyond

10.32942/osf.io/epqa7 ◽

2019 ◽

Author(s):

Shinichi Nakagawa ◽

Malgorzata Lagisz ◽

Rose E O'Dea ◽

Joanna Rutkowska ◽

Yefeng Yang ◽

...

Keyword(s):

Meta Analysis ◽

R Package ◽

Effect Sizes ◽

Forest Plot ◽

Point Estimates ◽

Aggregate Effect ◽

The Individual ◽

Meta Analyses ◽

Heterogeneous Effect ◽

Intuitive Interpretation

‘Classic’ forest plots show the effect sizes from individual studies and the aggregate effect from a meta-analysis. However, in ecology and evolution meta-analyses routinely contain over 100 effect sizes, making the classic forest plot of limited use. We surveyed 102 meta-analyses in ecology and evolution, finding that only 11% use the classic forest plot. Instead, most used a ‘forest-like plot’, showing point estimates (with 95% confidence intervals; CIs) from a series of subgroups or categories in a meta-regression. We propose a modification of the forest-like plot, which we name the ‘orchard plot’. Orchard plots, in addition to showing overall mean effects and CIs from meta-analyses/regressions, also include 95% prediction intervals (PIs), and the individual effect sizes scaled by their precision. The PI allows the user and reader to see the range in which an effect size from a future study may be expected to fall. The PI, therefore, provides an intuitive interpretation of any heterogeneity in the data. Supplementing the PI, the inclusion of underlying effect sizes also allows the user to see any influential or outlying effect sizes. We showcase the orchard plot with example datasets from ecology and evolution, using the R package, orchard, including several functions for visualizing meta-analytic data using forest-plot derivatives. We consider the orchard plot as a variant on the classic forest plot, cultivated to the needs of meta-analysts in ecology and evolution. Hopefully, the orchard plot will prove fruitful for visualizing large collections of heterogeneous effect sizes regardless of the field of study.
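The difference between the confidence interval and the prediction interval mentioned in this abstract can be illustrated with a short Python sketch. It does not use the orchard package; the function names and all input numbers are made up for illustration. It assumes a random-effects model with between-study variance tau², uses a normal quantile for simplicity (a t quantile is often used in practice), and takes precision to be the inverse of the standard error.

```python
from math import sqrt
from statistics import NormalDist

def intervals(mean_effect, se_mean, tau2, level=0.95):
    """Return (CI, PI) for an overall mean effect under a random-effects model.

    The CI reflects uncertainty in the mean only; the PI adds the
    between-study variance tau2, so it is at least as wide as the CI.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)   # normal approximation
    ci_half = z * se_mean
    pi_half = z * sqrt(tau2 + se_mean ** 2)
    ci = (mean_effect - ci_half, mean_effect + ci_half)
    pi = (mean_effect - pi_half, mean_effect + pi_half)
    return ci, pi

def precision(se):
    """Point-size weight for an individual effect size (assumed to be 1 / SE)."""
    return 1.0 / se

# Made-up numbers purely for illustration.
ci, pi = intervals(mean_effect=0.30, se_mean=0.05, tau2=0.04)
weights = [precision(se) for se in (0.10, 0.25, 0.50)]
print(ci, pi, weights)
```

With these inputs the PI is several times wider than the CI, which is how heterogeneity becomes visible in the plot: a narrow CI around the mean can coexist with a wide range of plausible effects in a future study.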
