Treeio: An R Package for Phylogenetic Tree Input and Output with Richly Annotated and Associated Data

2019 ◽  
Vol 37 (2) ◽  
pp. 599-603 ◽  
Author(s):  
Li-Gen Wang ◽  
Tommy Tsan-Yuk Lam ◽  
Shuangbin Xu ◽  
Zehan Dai ◽  
Lang Zhou ◽  
...  

Abstract Phylogenetic trees and data are often stored in incompatible and inconsistent formats. The outputs of software tools that contain trees with analysis findings are often not compatible with each other, making it hard to integrate the results of different analyses in a comparative study. The treeio package is designed to connect phylogenetic tree input and output. It supports extracting phylogenetic trees as well as the outputs of commonly used analytical software. It can link external data to phylogenies and merge tree data obtained from different sources, enabling analyses of phylogeny-associated data from different disciplines in an evolutionary context. Treeio also supports export of a phylogenetic tree with heterogeneous-associated data to a single tree file, including BEAST compatible NEXUS and jtree formats; these facilitate data sharing as well as file format conversion for downstream analysis. The treeio package is designed to work with the tidytree and ggtree packages. Tree data can be processed using the tidy interface with tidytree and visualized by ggtree. The treeio package is released within the Bioconductor and rOpenSci projects. It is available at https://www.bioconductor.org/packages/treeio/.
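As an illustration of what linking external data to a phylogeny involves, the sketch below reads tip labels from a Newick string and pairs each tip with a record from an external table. It is written in Python rather than R and does not use or reproduce the treeio API; the function names `tip_labels` and `link_data`, the example tree, and the `host` field are all hypothetical.

```python
import re


def tip_labels(newick):
    """Return the tip labels of a Newick string in left-to-right order.

    A tip label is a name that directly follows '(' or ','. Names that
    follow ')' belong to internal nodes and are skipped, as are branch
    lengths, which follow ':'.
    """
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


def link_data(newick, external):
    """Pair each tip with its external record (None when no record exists)."""
    return {tip: external.get(tip) for tip in tip_labels(newick)}


# Hypothetical tree with named internal nodes n1 and n2.
tree = "((A:0.1,B:0.2)n1:0.3,(C:0.4,D:0.5)n2:0.6);"
# Hypothetical external table keyed by tip label; B and D have no record.
traits = {"A": {"host": "human"}, "C": {"host": "avian"}}

linked = link_data(tree, traits)
for tip, record in linked.items():
    print(tip, record)
```

Tips without a matching record map to `None`, so missing data stays visible instead of being dropped silently.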

2021 ◽  
Author(s):  
Guangchuang Yu ◽  
Shuangbin Xu ◽  
Zehan Dai ◽  
Pingfan Guo ◽  
Xiaocong Fu ◽  
...  

Abstract We present the ggtreeExtra package for visualizing heterogeneous data with a phylogenetic tree (https://www.bioconductor.org/packages/ggtreeExtra). It supports more data types and visualization methods than other tools and has many features that are not available elsewhere. The ggtreeExtra package is a universal tool for tree data visualization. It extends the applications of phylogenetic trees in different disciplines by making more domain-specific data available to visualize and interpret in an evolutionary context.


2021 ◽  
Author(s):  
Shuangbin Xu ◽  
Zehan Dai ◽  
Pingfan Guo ◽  
Xiaocong Fu ◽  
Shanshan Liu ◽  
...  

Abstract We present the ggtreeExtra package for visualizing heterogeneous data with a phylogenetic tree (https://www.bioconductor.org/packages/ggtreeExtra). It supports more data types and visualization methods than other tools and has many features that are not available elsewhere. The ggtreeExtra package is a universal tool for tree data visualization. It extends the applications of phylogenetic trees in different disciplines by making more domain-specific data available to visualize and interpret in an evolutionary context.


2019 ◽  
Author(s):  
Kang Jin Kim ◽  
Jaehyun Park ◽  
Sang-Chul Park ◽  
Sungho Won

Abstract Motivation Ecological patterns of the human microbiota exhibit high inter-subject variation, with few operational taxonomic units (OTUs) shared across individuals. To overcome these issues, non-parametric approaches, such as the Mann–Whitney U-test and Wilcoxon rank-sum test, have often been used to identify OTUs associated with host diseases. However, these approaches only use the ranks of observed relative abundances, leading to information loss, and are associated with high false-negative rates. In this study, we propose a phylogenetic tree-based microbiome association test (TMAT) to analyze the associations between microbiome OTU abundances and disease phenotypes. Phylogenetic trees illustrate patterns of similarity among different OTUs, and TMAT provides an efficient method for utilizing such information for association analyses. The proposed TMAT provides test statistics for each node, which are combined to identify mutations associated with host diseases. Results Power estimates of TMAT were compared with existing methods using extensive simulations based on real absolute abundances. Simulation studies showed that TMAT preserves the nominal type-1 error rate, and estimates of its statistical power generally outperformed existing methods in the considered scenarios. Furthermore, TMAT can be used to detect phylogenetic mutations associated with host diseases, providing more in-depth insight into bacterial pathology. Availability and implementation The 16S rRNA amplicon sequencing metagenomics datasets for colorectal carcinoma and myalgic encephalomyelitis/chronic fatigue syndrome are available from the European Nucleotide Archive (ENA) database under project accession number PRJEB6070 and PRJEB13092, respectively. TMAT was implemented in the R package. Detailed information is available at http://healthstat.snu.ac.kr/software/tmat. Supplementary information Supplementary data are available at Bioinformatics online.
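The information loss attributed to rank-based tests above can be seen in a small example. The sketch below computes the Mann–Whitney U statistic from pairwise comparisons for made-up relative abundances. All values are hypothetical and unrelated to the TMAT datasets, and the code is not part of TMAT.

```python
def mann_whitney_u(x, y):
    """U statistic for sample x against sample y.

    Counts the (xi, yj) pairs with xi > yj; ties contribute 0.5.
    Only the ordering of the values matters, not their magnitudes.
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


# Hypothetical relative abundances of one OTU.
controls = [0.01, 0.02, 0.03]
cases_small = [0.04, 0.05, 0.06]   # slightly above the controls
cases_large = [0.40, 0.50, 0.60]   # ten times higher than cases_small

u_small = mann_whitney_u(cases_small, controls)
u_large = mann_whitney_u(cases_large, controls)
print(u_small, u_large)
```

Both case groups give the same U, although the case abundances are ten times higher in the second group, because only the ordering of the values enters the statistic.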


2020 ◽  
Author(s):  
Guangchuang Yu

Abstract Ggtree supports mapping and visualizing associated external data on phylogeny with two general methods. The output of ggtree is a ggtree graphic object that can be rendered as a static image. Most importantly, the input tree and associated data that are used in visualization can be extracted from the graphic object, making it an ideal data structure for publishing a tree (image, tree and data in one single object) and thus enhancing data reuse and analytic reproducibility.


2016 ◽  
Vol 8 (1) ◽  
pp. 28-36 ◽  
Author(s):  
Guangchuang Yu ◽  
David K. Smith ◽  
Huachen Zhu ◽  
Yi Guan ◽  
Tommy Tsan‐Yuk Lam

Author(s):  
Darawan Rinchai ◽  
Jessica Roelands ◽  
Mohammed Toufiq ◽  
Wouter Hendrickx ◽  
Matthew C Altman ◽  
...  

Abstract Motivation We previously described the construction and characterization of generic and reusable blood transcriptional module repertoires. More recently we released a third iteration (“BloodGen3” module repertoire) that comprises 382 functionally annotated gene sets (modules) and encompasses 14,168 transcripts. Custom bioinformatic tools are needed to support downstream analysis, visualization and interpretation relying on such fixed module repertoires. Results We have developed and describe here an R package, BloodGen3Module. The functions of our package permit group comparison analyses to be performed at the module level and display the results as annotated fingerprint grid plots. A parallel workflow for computing module repertoire changes for individual samples rather than groups of samples is also available; these results are displayed as fingerprint heatmaps. An illustrative case is used to demonstrate the steps involved in generating blood transcriptome repertoire fingerprints of septic patients. Taken together, this resource could facilitate the analysis and interpretation of changes in blood transcript abundance observed across a wide range of pathological and physiological states. Availability The BloodGen3Module package and documentation are freely available from GitHub: https://github.com/Drinchai/BloodGen3Module Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Javier Fernández-López ◽  
M. Teresa Telleria ◽  
Margarita Dueñas ◽  
Mara Laguna-Castro ◽  
Klaus Schliep ◽  
...  

Abstract The use of different sources of evidence has been recommended in order to conduct species delimitation analyses to solve taxonomic issues. In this study, we use a maximum likelihood framework to combine morphological and molecular traits to study the case of Xylodon australis (Hymenochaetales, Basidiomycota) using the locate.yeti function from the phytools R package. Xylodon australis has been considered a single species distributed across Australia, New Zealand and Patagonia. Multi-locus phylogenetic analyses were conducted to unmask the actual diversity under X. australis as well as its kinship relations with respect to its relatives. To assess the taxonomic position of each clade, the locate.yeti function was used to place the X. australis type material, for which no molecular data was available, in a molecular phylogeny using continuous morphological traits. Two different species were distinguished under the X. australis name, one from Australia–New Zealand and the other from Patagonia. In addition, a close relationship with Xylodon lenis, a species from Southeast Asia, was confirmed for the Patagonian clade. We discuss the implications of our results for the biogeographical history of this genus and we evaluate the potential of this method to be used with historical collections for which molecular data is not available.


2021 ◽  
Vol 82 (1-2) ◽  
Author(s):  
Lena Collienne ◽  
Alex Gavryushkin

Abstract Many popular algorithms for searching the space of leaf-labelled (phylogenetic) trees are based on tree rearrangement operations. Under any such operation, the problem is reduced to searching a graph where vertices are trees and (undirected) edges are given by pairs of trees connected by one rearrangement operation (sometimes called a move). Most popular are the classical nearest neighbour interchange, subtree prune and regraft, and tree bisection and reconnection moves. The problem of computing distances, however, is $\mathbf{NP}$-hard in each of these graphs, making tree inference and comparison algorithms challenging to design in practice. Although ranked phylogenetic trees are one of the central objects of interest in applications such as cancer research, immunology, and epidemiology, the computational complexity of the shortest path problem for these trees remained unsolved for decades. In this paper, we settle this problem for the ranked nearest neighbour interchange operation by establishing that the complexity depends on the weight difference between the two types of tree rearrangements (rank moves and edge moves), and varies from quadratic, which is the lowest possible complexity for this problem, to $\mathbf{NP}$-hard, which is the highest. In particular, our result provides the first example of a phylogenetic tree rearrangement operation for which shortest paths, and hence the distance, can be computed efficiently. Specifically, our algorithm scales to trees with tens of thousands of leaves (and likely hundreds of thousands if implemented efficiently).
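For readers unfamiliar with the moves named above, the following sketch lists the two trees reachable by one classical nearest neighbour interchange across a single internal edge of a rooted binary tree. It covers only the classical, unranked move, not the ranked variant or the shortest-path algorithm of the paper. The nested-tuple tree encoding and the name `nni_neighbours` are assumptions made for illustration.

```python
def nni_neighbours(cherry, sibling):
    """Trees obtained by one NNI move across the edge above `cherry`.

    The edge joins the internal node with children (a, b) to its parent,
    whose other child is `sibling`. An NNI move exchanges `sibling` with
    either b or a, which gives exactly two neighbouring trees.
    """
    a, b = cherry
    return [((a, sibling), b), ((b, sibling), a)]


# Hypothetical three-leaf tree: A and B are siblings, C attaches above them.
start = (("A", "B"), "C")
neighbours = nni_neighbours(*start)
for tree in neighbours:
    print(tree)
```

In the graph described in the abstract, `start` would be one vertex and each returned tree would be joined to it by an edge.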


2021 ◽  
Vol 6 (1) ◽  
Author(s):  
Peter W. Eide ◽  
Seyed H. Moosavi ◽  
Ina A. Eilertsen ◽  
Tuva H. Brunsell ◽  
Jonas Langerud ◽  
...  

Abstract Gene expression-based subtypes of colorectal cancer have clinical relevance, but the representativeness of primary tumors and the consensus molecular subtypes (CMS) for metastatic cancers is not well known. We investigated the metastatic heterogeneity of CMS. The best approach to subtype translation was delineated by comparisons of transcriptomic profiles from 317 primary tumors and 295 liver metastases, including multi-metastatic samples from 45 patients and 14 primary-metastasis sets. Associations were validated in an external data set (n = 618). Projection of metastases onto principal components of primary tumors showed that metastases were depleted of CMS1-immune/CMS3-metabolic signals, enriched for CMS4-mesenchymal/stromal signals, and heavily influenced by the microenvironment. The tailored CMS classifier (available in an updated version of the R package CMScaller) therefore implemented an approach to regress out the liver tissue background. The majority of classified metastases were either CMS2 or CMS4. Nonetheless, subtype switching and inter-metastatic CMS heterogeneity were frequent and increased with sampling intensity. Poor-prognostic value of CMS1/3 metastases was consistent in the context of intra-patient tumor heterogeneity.
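One simple way to regress out a background signal, sketched below under strong assumptions, is to fit a straight line of a gene's expression against a per-sample background score and keep the residuals. This is ordinary least squares with one covariate and made-up numbers. It is not the procedure implemented in CMScaller, and `regress_out`, `liver_score` and `expression` are hypothetical names.

```python
def regress_out(values, background):
    """Return the residuals of `values` after a least-squares line fit on `background`."""
    n = len(values)
    mean_x = sum(background) / n
    mean_y = sum(values) / n
    # Slope and intercept of the least-squares line y = intercept + slope * x.
    sxx = sum((x - mean_x) ** 2 for x in background)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(background, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # What remains after removing the fitted background contribution.
    return [y - (intercept + slope * x) for x, y in zip(background, values)]


# Hypothetical per-sample liver background score and one gene's expression.
liver_score = [0.0, 1.0, 2.0, 3.0]
expression = [2.0, 3.0, 6.0, 11.0]

residuals = regress_out(expression, liver_score)
print(residuals)
```

For these numbers the fitted line is 1 + 3 × `liver_score`, so the residuals are 1, -1, -1, 1: the part of the signal that the background score does not explain.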


2019 ◽  
Vol 36 (7) ◽  
pp. 2017-2024
Author(s):  
Weiwei Zhang ◽  
Ziyi Li ◽  
Nana Wei ◽  
Hua-Jun Wu ◽  
Xiaoqi Zheng

Abstract Motivation Inference of differentially methylated (DM) CpG sites between two groups of tumor samples with different geno- or pheno-types is a critical step to uncover the epigenetic mechanism of tumorigenesis, and identify biomarkers for cancer subtyping. However, as a major confounding factor, uneven distributions of tumor purity between two groups of tumor samples will lead to biased discovery of DM sites if not properly accounted for. Results We here propose InfiniumDM, a generalized least squares model to adjust for the effect of tumor purity in differential methylation analysis. Our method is applicable to a variety of experimental designs, including those with or without normal controls and with different sources of normal tissue contamination. We compared our method with conventional methods including minfi, limma and limma corrected by tumor purity using simulated datasets. Our method shows significantly better performance at different levels of differential methylation thresholds, sample sizes, mean purity deviations and so on. We also applied the proposed method to breast cancer samples from the TCGA database to further evaluate its performance. Overall, both simulation and real data analyses demonstrate favorable performance over existing methods serving a similar purpose. Availability and implementation InfiniumDM is a part of the R package InfiniumPurify, which is freely available from GitHub (https://github.com/Xiaoqizheng/InfiniumPurify). Supplementary information Supplementary data are available at Bioinformatics online.

