Efficient Bayesian inference of phylogenetic trees from large scale, low-depth genome-wide single-cell data

Author(s):  
Fatemeh Dorri ◽  
Sohrab Salehi ◽  
Kevin Chern ◽  
Tyler Funnell ◽  
Marc Williams ◽  
...  

A new generation of scalable single-cell whole-genome sequencing (scWGS) methods allows unprecedented high-resolution measurement of the evolutionary dynamics of cancer cell populations. Phylogenetic reconstruction is central to identifying sub-populations and distinguishing mutational processes. The ability to sequence tens of thousands of single genomes at high resolution per experiment is challenging the assumptions and scalability of existing phylogenetic tree building methods, and calls for tailored phylogenetic models and scalable inference algorithms. We propose a phylogenetic model and an associated Bayesian inference procedure that exploit the specifics of scWGS data. A first highlight of our approach is a novel phylogenetic encoding of copy-number data, which provides an attractive statistical-computational trade-off by simplifying the site dependencies induced by rearrangements while still forming a sound foundation for phylogenetic inference. A second highlight is an innovative phylogenetic tree exploration move that bounds the cost of MCMC iterations by O(|C| + |L|), where |C| is the number of cells and |L| is the number of loci. In contrast, existing off-the-shelf likelihood-based methods incur an iteration cost of O(|C| |L|). Moreover, the novel move considers an exponential number of neighbouring trees, whereas off-the-shelf moves consider a polynomial-size set of neighbours. The third highlight is a novel mutation calling method that incorporates the copy-number data and the underlying phylogenetic tree to overcome the missing data issue. This framework allows us to realistically consider routine Bayesian phylogenetic inference at the scale of scWGS data.

2014 ◽  
Author(s):  
Tyler Garvin ◽  
Robert Aboukhalil ◽  
Jude Kendall ◽  
Timour Baslan ◽  
Gurinder S. Atwal ◽  
...  

We present an open-source visual-analytics web platform, Ginkgo (http://qb.cshl.edu/ginkgo), for the interactive analysis and quality assessment of single-cell copy-number alterations. Ginkgo automatically constructs copy-number profiles of individual cells from mapped reads and builds phylogenetic trees of related cells. We validate Ginkgo by reproducing the results of five major studies, and we examine the data characteristics of three commonly used single-cell amplification techniques, concluding that DOP-PCR is the most consistent for CNV analysis.
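As a rough illustration of what deriving a copy-number profile from mapped reads involves, the sketch below counts reads in fixed-width bins, normalizes by the median bin count, and scales to an assumed ploidy. This is not Ginkgo's pipeline, which the abstract does not detail; the function name, the bin width, and the ploidy of 2 are all assumptions made for the example.

```python
from collections import Counter
from statistics import median

def copy_number_profile(read_positions, chrom_length, bin_size, ploidy=2):
    """Hypothetical helper: estimate an integer copy number per fixed-width bin."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    # Count how many read start positions fall into each bin.
    counts = Counter(pos // bin_size for pos in read_positions)
    per_bin = [counts.get(i, 0) for i in range(n_bins)]
    # Assume the median bin reflects the baseline ploidy.
    baseline = median(per_bin)
    if baseline == 0:
        raise ValueError("too few reads to normalize")
    return [round(ploidy * c / baseline) for c in per_bin]

# Toy data: six 10-bp bins, with twice the usual coverage in the fifth bin.
reads = [3, 7, 12, 18, 21, 29, 30, 31, 40, 41, 42, 43, 50, 59]
profile = copy_number_profile(reads, chrom_length=60, bin_size=10)
print(profile)
```

Real low-depth data would also need GC and mappability correction and a segmentation step, none of which are shown here.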


2021 ◽  
Author(s):  
Pedro F Ferreira ◽  
Jack Kuipers ◽  
Niko Beerenwinkel

Cancer arises and evolves by the accumulation of somatic mutations that provide a selective advantage. The interplay of mutations and their functional consequences shape the evolutionary dynamics of tumors and contribute to different clinical outcomes. In the absence of scalable methods to jointly assay genomic and transcriptomic profiles of the same individual cell, the two data modalities are usually measured separately and need to be integrated computationally. Here, we introduce SCATrEx, a statistical model to map single-cell gene expression data onto the evolutionary history of copy number alterations of the tumor. SCATrEx jointly assigns cancer cells assayed with scRNA-seq to copy number profiles arranged in a copy number aberration tree and augments the tree with clone-specific clusters. Our simulations show that SCATrEx improves over both state-of-the-art unsupervised clustering methods and cell-to-clone assignment methods. In an application to real data, we observe that SCATrEx finds inter-clone and intra-clone gene expression heterogeneity not detectable using other integration methods. SCATrEx will allow for a better understanding of tumor evolution by jointly analysing the genomic and transcriptomic changes that drive it.


2016 ◽  
Author(s):  
Klaus Schliep ◽  
Alastair Potts ◽  
David A Morrison ◽  
Guido W Grimm

The fields of phylogenetic tree and network inference have dramatically advanced in the last decade, but independently with few attempts to bridge them. Here we provide a framework, implemented in the phangorn library in R, to transfer information between trees and networks. This includes: 1) identifying and labelling equivalent tree branches and network edges, 2) transferring branch support to network edges, and 3) mapping bipartition support from a sample of trees (e.g. from bootstrapping or Bayesian inference) onto network edges. The ability to readily combine tree and network information should lead to more comprehensive evolutionary comparisons and conclusions.
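The third step, mapping bipartition support from a sample of trees, can be sketched in a few lines. The Python below is an illustration of the idea only, not the phangorn API (phangorn is an R library): trees are assumed to be nested tuples of leaf names, and the function names are hypothetical.

```python
from collections import Counter

def leaves(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def bipartitions(tree):
    """Return the non-trivial splits of a tree as unordered pairs of leaf sets."""
    taxa = leaves(tree)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return
        side = leaves(node)
        # Skip the root and splits that separate a single leaf.
        if 1 < len(side) < len(taxa) - 1:
            splits.add(frozenset([side, taxa - side]))
        for child in node:
            visit(child)

    visit(tree)
    return splits

def split_support(reference, sample):
    """Fraction of sampled trees that contain each split of the reference tree."""
    counts = Counter(s for t in sample for s in bipartitions(t))
    return {s: counts[s] / len(sample) for s in bipartitions(reference)}

reference = (("A", "B"), ("C", "D"), "E")
sample = [
    (("A", "B"), ("C", "D"), "E"),
    (("A", "B"), ("C", "E"), "D"),
    (("A", "C"), ("B", "D"), "E"),
    ((("A", "B"), "E"), ("C", "D")),
]
support = split_support(reference, sample)
```

Because each split is stored as an unordered pair of leaf sets, the same support values could be attached to any network edge that induces the same split, which is the kind of transfer the abstract describes.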


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Stefan Kurtenbach ◽  
Anthony M. Cruz ◽  
Daniel A. Rodriguez ◽  
Michael A. Durante ◽  
J. William Harbour

Background: Recent advances in single cell sequencing technologies allow for greater resolution in assessing tumor clonality using chromosome copy number variations (CNVs). While single cell DNA sequencing technologies are ideal to identify tumor sub-clones, they remain expensive and, in contrast to single cell RNA-seq (scRNA-seq) methods, are more limited in the data they generate. However, CNV data can be inferred from scRNA-seq and bulk RNA-seq, for which several tools have been developed, including inferCNV, CaSpER, and HoneyBADGER. Inferences regarding tumor clonality from CNV data (and other sources) are frequently visualized using phylogenetic plots, which previously required time-consuming and error-prone manual analysis. Results: Here, we present Uphyloplot2, a python script that generates phylogenetic plots directly from inferred RNA-seq data, or any Newick formatted dendrogram file. The tool is publicly available at https://github.com/harbourlab/UPhyloplot2/. Conclusions: Uphyloplot2 is an easy-to-use tool to generate phylogenetic plots to depict tumor clonality from scRNA-seq data and other sources.
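For readers unfamiliar with the Newick format mentioned above, the following minimal sketch reads a Newick string into nested lists of leaf names. It is not taken from UPhyloplot2; the function name is hypothetical, and the parser assumes well-formed input, ignoring branch lengths, internal node labels, quoting, and comments.

```python
def parse_newick(text):
    """Parse a Newick string into nested lists of leaf names (topology only)."""
    text = text.strip().rstrip(";")
    pos = 0

    def skip_label():
        # Advance past a name and/or ":length" up to the next structural character.
        nonlocal pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # consume "("
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            pos += 1  # consume ")"
            skip_label()  # drop any internal label or branch length
            return children
        start = pos
        skip_label()
        return text[start:pos].split(":")[0]  # keep the name, drop the length

    return parse_node()

tree = parse_newick("((A:0.1,B:0.2):0.3,C:0.4);")
print(tree)
```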


2020 ◽  
Author(s):  
Stefan Kurtenbach ◽  
Daniel A. Rodriguez ◽  
Michael A. Durante ◽  
J. William Harbour

Recent advances in single cell sequencing technologies allow for greater resolution in assessing tumor clonality using chromosome copy number variations (CNVs), which can be inferred from single cell RNA-seq (scRNA-seq) data using applications such as inferCNV. Inferences regarding tumor clonality are frequently visualized using phylogenetic plots, which previously required time-consuming and tedious manual analysis. Here, we present UPhyloplot2, a python script that generates phylogenetic plots directly from inferCNV output files. The tool is publicly available at https://github.com/harbourlab/UPhyloplot2/.


2020 ◽  
Vol 13 (10) ◽  
pp. 2118-2125
Author(s):  
Levon Aslanyan ◽  
Hranush Avagyan ◽  
Zaven Karalyan

Aim: A genome-scale phylogenetic analysis was used to infer the evolutionary dynamics of Asfarviridae – African swine fever virus (ASFV) – and better define its genetic diversity. Materials and Methods: All complete ASFV genomes available from NCBI as of March 2020 were used. The phylogenetic analysis used maximum likelihood and neighbor-joining methods. Evolutionary models were identified using the MEGA-X software package. Phylogenetic trees were built for type B DNA polymerases of ASFV (n=34), with HcDNAV (n=2) as an outgroup. Results: An expedient categorization of the Asfarviridae family uses five clades. Genotype 1 (except for the LIV 5/40 virus isolate) and genotype 7 are assigned to the alpha clade; genotype 2 to the beta clade; genotypes 8, 9, and 10 to the gamma clade; genotype 5 to the delta clade; and genotypes 3, 4, 20, and 22, together with the LIV 5/40 isolate, to the epsilon clade. Branch lengths on the phylogenetic tree are proportional to genetic distance along the branch. Branches in the phylogenetic tree of Asfarviridae are much shorter than branches for Baculoviridae. The shorter branches in the ASFV population suggest that Asfarviridae evolved relatively recently and remain more closely related. Conclusion: We suggest applying more robust standards using whole genomes to ensure the correct classification of ASFV and maintain phylogeny as a useful tool.
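Distance-based methods such as neighbor-joining start from a matrix of pairwise genetic distances. The sketch below computes the simplest such measure, the p-distance (proportion of differing aligned sites), on toy sequences. The isolate names and sequences are invented for illustration, and the abstract does not state which distance or substitution model the authors used.

```python
from itertools import combinations

def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites among positions where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not compared:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in compared) / len(compared)

def distance_matrix(alignment):
    """Pairwise p-distances keyed by alphabetically ordered name pairs."""
    return {
        (x, y): p_distance(alignment[x], alignment[y])
        for x, y in combinations(sorted(alignment), 2)
    }

# Hypothetical toy alignment; not real ASFV data.
alignment = {"iso1": "ACGTACGT", "iso2": "ACGTACGA", "iso3": "ACGAAC-A"}
matrix = distance_matrix(alignment)
```

Short pairwise distances of this kind translate into the short branches that the abstract interprets as evidence of recent divergence.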


Entropy ◽  
2019 ◽  
Vol 21 (3) ◽  
pp. 313
Author(s):  
Jun Feng ◽  
Zeyun Liu ◽  
Hongwei Feng ◽  
Richard Sutcliffe ◽  
Jianni Liu ◽  
...  

To address the instability of phylogenetic trees in morphological datasets caused by missing values, we present a phylogenetic inference method based on a concept decision tree (CDT) in conjunction with attribute reduction. First, a reliable initial phylogenetic seed tree is created using a few species with relatively complete morphological information by using biologists’ prior knowledge or by applying existing tools such as MrBayes. Second, using a top-down data processing approach, we construct concept-sample templates by performing attribute reduction at each node in the initial phylogenetic seed tree. In this way, each node is turned into a decision point with multiple concept-sample templates, providing decision-making functions for grafting. Third, we apply a novel matching algorithm to evaluate the degree of similarity between the species’ attributes and their concept-sample templates and to determine the location of the species in the initial phylogenetic seed tree. In this manner, the phylogenetic tree is established step by step. We apply our algorithm to several datasets and compare it with the maximum parsimony, maximum likelihood, and Bayesian inference methods using the two evaluation criteria of accuracy and stability. The experimental results indicate that as the proportion of missing data increases, the accuracy of the CDT method remains at 86.5%, outperforming all other methods and producing a reliable phylogenetic tree.


2019 ◽  
Author(s):  
Gryte Satas ◽  
Simone Zaccaria ◽  
Geoffrey Mon ◽  
Benjamin J. Raphael

Motivation: Single-cell DNA sequencing enables the measurement of somatic mutations in individual tumor cells, and provides data to reconstruct the evolutionary history of the tumor. Nearly all existing methods to construct phylogenetic trees from single-cell sequencing data use single-nucleotide variants (SNVs) as markers. However, most solid tumors contain copy-number aberrations (CNAs) which can overlap loci containing SNVs. Particularly problematic are CNAs that delete an SNV, thus returning the SNV locus to the unmutated state. Such mutation losses are allowed in some models of SNV evolution, but these models are generally too permissive, allowing mutation losses without evidence of a CNA overlapping the locus. Results: We introduce a novel loss-supported evolutionary model, a generalization of the infinite sites and Dollo models, that constrains mutation losses to loci with evidence of a decrease in copy number. We design a new algorithm, Single-Cell Algorithm for Reconstructing the Loss-supported Evolution of Tumors (Scarlet), that infers phylogenies from single-cell tumor sequencing data using the loss-supported model and a probabilistic model of sequencing errors and allele dropout. On simulated data, we show that Scarlet outperforms current single-cell phylogeny methods, recovering more accurate trees and correcting errors in SNV data. On single-cell sequencing data from a metastatic colorectal cancer patient, Scarlet constructs a phylogeny that is more consistent with the observed copy-number data and also reveals a simpler monoclonal seeding of the metastasis, contrasting with published reports of polyclonal seeding in this patient. Scarlet substantially improves single-cell phylogeny inference in tumors with CNAs, yielding new insights into the analysis of tumor evolution. Availability: Software is available at github.com/raphael-group/[email protected]
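The loss-supported constraint can be read as a simple check on each tree edge: a mutation may disappear between parent and child only where the copy number at that locus decreases. The sketch below encodes that reading under simplifying assumptions (one integer copy number per locus, dictionaries keyed by hypothetical locus names). It is not Scarlet's implementation, which also models sequencing errors and allele dropout.

```python
def unsupported_losses(parent_mut, child_mut, parent_cn, child_cn):
    """Return loci where a mutation is lost on an edge without a copy-number decrease."""
    violations = []
    for locus, present in parent_mut.items():
        # A loss means the parent carries the mutation and the child does not.
        lost = present and not child_mut.get(locus, False)
        # The loss is supported only if copy number drops on this edge.
        supported = child_cn[locus] < parent_cn[locus]
        if lost and not supported:
            violations.append(locus)
    return violations

# Hypothetical edge: snv1 is lost alongside a 3 -> 2 copy-number drop (allowed),
# snv2 is lost with no copy-number change (not allowed), snv3 is gained.
parent_mut = {"snv1": True, "snv2": True, "snv3": False}
child_mut = {"snv1": False, "snv2": False, "snv3": True}
parent_cn = {"snv1": 3, "snv2": 2, "snv3": 2}
child_cn = {"snv1": 2, "snv2": 2, "snv3": 2}
violations = unsupported_losses(parent_mut, child_mut, parent_cn, child_cn)
```

An edge with an empty violation list is compatible with this simplified version of the model, while a more permissive model would accept the loss of `snv2` as well.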

