A Common Methodological Phylogenomics Framework for intra-patient heteroplasmies to infer SARS-CoV-2 sublineages and tumor clones

2020 ◽  
Author(s):  
Filippo Utro ◽  
Chaya Levovitz ◽  
Kahn Rhrissorrakrai ◽  
Laxmi Parida

Abstract
We present a common methodological framework to infer phylogenomics from genomic data, be it reads of SARS-CoV-2 from multiple COVID-19 patients or bulk DNA-seq of a cancer patient's tumor. The commonality lies in phylogenetic retrodiction based on the genomic reads in both scenarios. There is evidence of heteroplasmy, i.e., multiple lineages of SARS-CoV-2 in the same COVID-19 patient. To date, however, there is no evidence of sublineages recombining within the same patient. The heterogeneity in a patient's tumor is analogous to intra-patient heteroplasmy, and the absence of recombination in tumor cells is a widely accepted assumption. The different frequencies of genomic variants in a tumor presuppose the existence of multiple tumor clones and provide a handle to infer them computationally. We postulate that the different variant frequencies in viral reads likewise offer the means to infer the multiple co-infecting sublineages. We describe the Concerti computational framework for inferring phylogenies in each of the two scenarios. To demonstrate the accuracy of the method, we reproduce some known results in both scenarios. We also make some additional discoveries. We uncovered new potential parallel mutations in the evolution of the SARS-CoV-2 virus. In the context of cancer, we used a clinically plausible phylogenetic tree in a patient to uncover new clones harboring mutations that confer resistance to therapy.
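
The abstract's premise is that without recombination, a population carrying later variants is nested inside the population carrying earlier ones. Variant frequencies therefore constrain which trees are possible. Below is a minimal Python sketch of one generic constraint used by many clone-tree methods, the so-called sum rule: the prevalences of a node's children must fit within the node's own prevalence. This is not the Concerti algorithm. The function names, cluster labels and VAF values are illustrative assumptions, as is the VAF-to-prevalence conversion, which assumes no copy-number change.

```python
def ccf_from_vaf(vaf, purity=1.0, ploidy_copies=2):
    """Convert a variant allele frequency (VAF) to a cellular prevalence.

    Assumes the variant sits on one of `ploidy_copies` copies in every cell
    carrying it, and that normal and tumor copy number are equal.
    Tumor DNA: ploidy_copies=2 with purity <= 1.
    Haploid viral reads: ploidy_copies=1, purity=1.
    """
    return min(1.0, vaf * ploidy_copies / purity)


def sum_rule_violations(prevalence, parent, tol=1e-9):
    """Return nodes whose children's prevalences sum to more than their own.

    `parent` maps each child cluster to its proposed parent cluster.
    """
    totals = {}
    for child, par in parent.items():
        totals[par] = totals.get(par, 0.0) + prevalence[child]
    return sorted(n for n, s in totals.items() if s > prevalence[n] + tol)


# Hypothetical mutation-cluster VAFs from one bulk tumor sample.
vafs = {"C1": 0.5, "C2": 0.3, "C3": 0.15, "C4": 0.25}
prev = {c: ccf_from_vaf(v) for c, v in vafs.items()}

tree_a = {"C2": "C1", "C3": "C1", "C4": "C2"}  # C4 nested inside C2
tree_b = {"C2": "C1", "C3": "C1", "C4": "C1"}  # C2, C3, C4 all siblings
print(sum_rule_violations(prev, tree_a))  # []
print(sum_rule_violations(prev, tree_b))  # ['C1']
```

Tree A passes. Tree B places three clones with a combined prevalence of 1.4 under a trunk at 1.0, so it is rejected. For viral reads the same check applies with `ploidy_copies=1`, assuming that read fractions track sublineage fractions.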

BMC Genomics ◽  
2021 ◽  
Vol 22 (S5) ◽  
Author(s):  
Filippo Utro ◽  
Chaya Levovitz ◽  
Kahn Rhrissorrakrai ◽  
Laxmi Parida

Abstract
Background: All diseases that involve genetic material, including cancer and infection, undergo genetic evolution and give rise to heterogeneity. These illnesses are biologically very different, but phylogenetic retrodiction based on genomic reads is possible for both, so they share tree-based principles and assumptions. The different frequencies of tumor genomic variants presuppose the existence of multiple tumor clones and provide a handle to infer them computationally. We postulate that, in the same way, the different variant frequencies in viral reads offer the means to infer multiple co-infecting sublineages.
Results: We present a common methodological framework to infer phylogenomics from genomic data, be it reads of SARS-CoV-2 from multiple COVID-19 patients or bulk DNA-seq of a cancer patient's tumor. We describe the Concerti computational framework for inferring phylogenies in each of the two scenarios. To demonstrate the accuracy of the method, we reproduce some known results in both scenarios. We also make some additional discoveries.
Conclusions: Concerti successfully extracts and integrates information from multi-point samples. This enables the discovery of clinically plausible phylogenetic trees that capture the heterogeneity known to exist both spatially and temporally. These models can have direct therapeutic implications. They highlight the "birth" of clones that may harbor resistance mechanisms to treatment, the "death" of subclones with drug targets, and the acquisition of functionally pertinent mutations in clones that may have seemed clinically irrelevant. Specifically, in this paper we uncover new potential parallel mutations in the evolution of the SARS-CoV-2 virus. In the context of cancer, we identify new clones harboring mutations that confer resistance to therapy.
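
The Conclusions describe reading clone "birth" and "death" off multi-point samples. Below is a minimal Python sketch of that bookkeeping, assuming per-clone prevalences have already been inferred for each time point. The detection threshold `eps`, the clone names and the values are hypothetical. This is not Concerti's implementation.

```python
def clone_events(prevalence, eps=0.01):
    """Flag births and deaths of clones across ordered samples.

    `prevalence` maps clone -> [prevalence at t0, t1, ...].
    A "birth" is a rise above `eps` after being absent.
    A "death" is a drop to or below `eps` after being present.
    Returns clone -> list of (event, time index).
    """
    events = {}
    for clone, series in prevalence.items():
        present = [p > eps for p in series]
        found = []
        for t in range(1, len(present)):
            if present[t] and not present[t - 1]:
                found.append(("birth", t))
            elif present[t - 1] and not present[t]:
                found.append(("death", t))
        events[clone] = found
    return events


# Hypothetical prevalences at a pre-treatment sample (t0) and two follow-ups.
samples = {
    "sensitive_subclone": [0.6, 0.0, 0.0],
    "resistant_subclone": [0.0, 0.2, 0.7],
    "trunk": [1.0, 1.0, 1.0],
}
for clone, ev in clone_events(samples).items():
    print(clone, ev)
```

A clone that appears absent at one time point may simply be below detection. In practice, `eps` would need to reflect sequencing depth and the uncertainty of the inferred prevalences.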


2016 ◽  
Vol 33 (8) ◽  
pp. 2102-2116 ◽  
Author(s):  
Denise Kühnert ◽  
Tanja Stadler ◽  
Timothy G. Vaughan ◽  
Alexei J. Drummond

2016 ◽  
Author(s):  
J.M. Alves ◽  
T. Prieto ◽  
D. Posada

ABSTRACT
It is generally agreed that tumors are composed of multiple cell clones defined by different somatic mutations. Characterizing the evolutionary mechanisms driving this intratumor genetic heterogeneity (ITH) is crucial to improve both cancer diagnosis and therapeutic strategies. For that purpose, recent ITH studies have focused on qualitative comparisons of mutational profiles derived from bulk sequencing of multiple tumor samples extracted from the same patient. Here, we show some examples where the naive use of bulk data in multiregional studies may lead to erroneous inferences of the evolutionary trajectories that underlie tumor progression, including biased timing of somatic mutations, spurious parallel mutation events, and/or incorrect chronological ordering of metastatic events. In addition, we analyze three real datasets to highlight how the use of bulk mutational profiles instead of inferred clones can lead to different conclusions about mutational recurrence and population structure.
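
The spurious parallel mutations mentioned above can be reproduced with a toy example. The Python sketch below does three things:

1. It mixes hypothetical clones into bulk regions, assuming heterozygous mutations, diploid cells and 100% tumor purity.
2. It calls each mutation present or absent per region.
3. It applies the standard four-gamete test, under which two binary characters fit a single perfect phylogeny only if the combinations 00, 01, 10 and 11 do not all occur.

Clone names, mixtures and the 0.05 calling threshold are illustrative assumptions, not values from the study.

```python
from itertools import combinations

# Hypothetical clone genotypes (1 = mutation carried) with no recombination:
# a trunk clone carries m1, and two branches add m2 or m3 respectively.
CLONES = {
    "trunk": {"m1": 1, "m2": 0, "m3": 0},
    "cloneB": {"m1": 1, "m2": 1, "m3": 0},
    "cloneC": {"m1": 1, "m2": 0, "m3": 1},
}


def bulk_vaf(mixture, clones=CLONES):
    """Expected VAF per mutation in a bulk sample of mixed clones.

    Assumes heterozygous, diploid, 100% purity, so
    VAF = 0.5 * fraction of cells carrying the mutation.
    """
    muts = next(iter(clones.values())).keys()
    return {m: 0.5 * sum(w * clones[c][m] for c, w in mixture.items())
            for m in muts}


def binarize(vafs, threshold=0.05):
    """Naive presence/absence call per mutation."""
    return {m: int(v >= threshold) for m, v in vafs.items()}


def four_gamete_conflicts(profiles):
    """Find mutation pairs showing all four gametes (00, 01, 10, 11).

    Such pairs cannot fit one perfect phylogeny without a parallel
    mutation or recombination.
    """
    muts = sorted(next(iter(profiles.values())))
    conflicts = []
    for a, b in combinations(muts, 2):
        gametes = {(p[a], p[b]) for p in profiles.values()}
        if len(gametes) == 4:
            conflicts.append((a, b))
    return conflicts


regions = {
    "normal": {},                           # no tumor cells: all-zero profile
    "R1": {"cloneB": 0.5, "cloneC": 0.5},   # region mixing both branches
    "R2": {"cloneB": 1.0},
    "R3": {"cloneC": 1.0},
}
profiles = {r: binarize(bulk_vaf(mix)) for r, mix in regions.items()}
print(four_gamete_conflicts(profiles))  # [('m2', 'm3')]
```

Region R1 is a 50/50 mixture of two sibling clones, so it appears to carry both m2 and m3. The pair (m2, m3) therefore fails the test, and treating R1 as one lineage would force a parallel mutation. Running the same test on the clone genotypes themselves, `four_gamete_conflicts(CLONES)`, returns an empty list. This mirrors the abstract's point that the analysis should work from inferred clones rather than bulk profiles.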


2016 ◽  
Author(s):  
Dan Vanderkam ◽  
B. Arman Aksoy ◽  
Isaac Hodes ◽  
Jaclyn Perrone ◽  
Jeffrey Hammerbacher

pileup.js is a new browser-based genome viewer. It is designed to facilitate the investigation of evidence for genomic variants within larger web applications. It takes advantage of recent developments in the JavaScript ecosystem to provide a modular, reliable and easily embedded library.


2015 ◽  
Author(s):  
Andrea Sottoriva ◽  
Trevor Graham

Despite extraordinary efforts to profile cancer genomes on a large scale, interpreting the vast amount of genomic data in the light of cancer evolution and in a clinically relevant manner remains challenging. Here we demonstrate that cancer next-generation sequencing data are dominated by the signature of growth governed by a power-law distribution of mutant allele frequencies. The power-law signature is common to multiple tumor types. It is a consequence of the effectively neutral evolutionary dynamics that underpin the evolution of a large proportion of cancers, which give rise to the abundance of mutations responsible for intra-tumor heterogeneity. Importantly, the law allows the in vivo mutation rate and the timing of mutations to be measured in each individual cancer with remarkable precision. This result provides a new way to interpret cancer genomic data by considering the physics of tumor growth in a way that is both patient-specific and clinically relevant.
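
The abstract does not state the law itself. The Python sketch below assumes the form commonly given for neutral tumor growth:

M(f) = (μ/β)(1/f - 1/f_max)

Here M(f) is the cumulative number of subclonal mutations with allele frequency at least f, μ is the mutation rate per division, and β is the fraction of effective divisions. Regressing M(f) on 1/f - 1/f_max through the origin then yields the slope μ/β, the effective mutation rate. The frequency window [0.12, 0.25], the grid size and the synthetic data generator are illustrative choices.

```python
def cumulative_counts(vafs, grid):
    """M(f): number of mutations with VAF >= f, for each f in `grid`."""
    return [sum(1 for v in vafs if v >= f) for f in grid]


def fit_neutral_slope(vafs, f_min=0.12, f_max=0.25, steps=50):
    """Least-squares slope, through the origin, of M(f) vs 1/f - 1/f_max.

    Only subclonal mutations inside [f_min, f_max] are used.
    Under the assumed neutral law the slope estimates mu/beta.
    """
    sub = [v for v in vafs if f_min <= v <= f_max]
    grid = [f_min + (f_max - f_min) * i / (steps - 1) for i in range(steps)]
    x = [1.0 / f - 1.0 / f_max for f in grid]
    y = cumulative_counts(sub, grid)
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def synthetic_neutral_vafs(k=100.0, f_min=0.12, f_max=0.25):
    """Deterministic VAFs that follow M(f) = k (1/f - 1/f_max) exactly.

    The i-th highest frequency solves M(f_i) = i.
    """
    n = int(k * (1.0 / f_min - 1.0 / f_max))
    return [1.0 / (i / k + 1.0 / f_max) for i in range(1, n + 1)]


slope = fit_neutral_slope(synthetic_neutral_vafs())
print(f"estimated mu/beta: {slope:.1f}")
```

The data are generated exactly from the law with μ/β = 100, and the fitted slope comes out slightly below 100. The shortfall arises because mutations are counted as whole numbers. With real data, one would also need to confirm that the linear fit is good before interpreting the slope as a mutation rate.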


GigaScience ◽  
2020 ◽  
Vol 9 (8) ◽  
Author(s):  
Arash Bayat ◽  
Piotr Szul ◽  
Aidan R O’Brien ◽  
Robert Dunne ◽  
Brendan Hosking ◽  
...  

Abstract
Background: Many traits and diseases are thought to be driven by more than one gene (polygenic). Polygenic risk scores (PRS) hence expand on genome-wide association studies by taking multiple genes into account when risk models are built. However, PRS considers only the additive effects of individual genes. It does not capture epistatic interactions or the combination of individual and interacting drivers. Evidence of epistatic interactions has been found in small datasets. Large datasets, however, have not yet been processed, owing to the high computational complexity of the search for epistatic interactions.
Findings: We have developed VariantSpark, a distributed machine learning framework able to perform association analysis for complex phenotypes that are polygenic and potentially involve a large number of epistatic interactions. Efficient multi-layer parallelization allows VariantSpark to scale to the whole genome of population-scale datasets with 100,000,000 genomic variants and 100,000 samples.
Conclusions: Compared with traditional monogenic genome-wide association studies, VariantSpark better identifies genomic variants associated with complex phenotypes. VariantSpark is 3.6 times faster than ReForeSt and is the only method able to scale to ultra-high-dimensional genomic data in a manageable time.
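
The limitation of additive PRS can be made concrete. The Python sketch below scores a hypothetical two-SNP cohort whose phenotype depends only on an interaction: disease occurs when exactly one of the two variants is carried. Each SNP's marginal effect, which is what a one-variant-at-a-time additive model picks up, is exactly zero. The additive score therefore cannot separate cases from controls, even though the genotype pair determines the phenotype. Carrier coding (0/1) instead of allele dosage and the XOR phenotype are simplifying assumptions. The sketch does not attempt to reproduce VariantSpark's machine-learning approach.

```python
from itertools import product


def marginal_effect(genotypes, phenotype, snp):
    """Mean phenotype of carriers (1) minus non-carriers (0) of one SNP."""
    carriers = [y for g, y in zip(genotypes, phenotype) if g[snp] == 1]
    others = [y for g, y in zip(genotypes, phenotype) if g[snp] == 0]
    return sum(carriers) / len(carriers) - sum(others) / len(others)


def additive_score(genotype, betas):
    """Additive polygenic score: sum of per-SNP weight times genotype."""
    return sum(b * x for b, x in zip(betas, genotype))


# Hypothetical cohort: every two-SNP genotype combination appears 25 times,
# and the phenotype is purely epistatic (XOR of the two carrier states).
genotypes = [g for g in product([0, 1], repeat=2) for _ in range(25)]
phenotype = [g[0] ^ g[1] for g in genotypes]

betas = [marginal_effect(genotypes, phenotype, s) for s in range(2)]
scores = {g: additive_score(g, betas) for g in sorted(set(genotypes))}
print("marginal effects:", betas)  # [0.0, 0.0]
print("additive scores:", scores)

# The genotype pair alone still predicts the phenotype perfectly.
lookup = {g: y for g, y in zip(genotypes, phenotype)}
print("phenotype by genotype pair:", lookup)
```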


2018 ◽  
Author(s):  
Alexander M. Wailan ◽  
Francesc Coll ◽  
Eva Heinz ◽  
Gerry Tonkin-Hill ◽  
Jukka Corander ◽  
...  

ABSTRACT
The ability to distinguish between pathogens is a fundamental requirement for understanding the epidemiology of infectious diseases. Phylogenetic analysis of genomic data can provide a powerful platform to identify lineages within bacterial populations, and thus inform outbreak investigation and transmission dynamics. However, resolving differences between pathogens associated with low variant (LV) populations, which carry low median pairwise single nucleotide variant (SNV) distances, remains a major challenge. Here we present rPinecone, an R package designed to define sub-lineages within closely related LV populations. rPinecone uses a root-to-tip directional approach to define sub-lineages within a phylogenetic tree according to SNV distance from the ancestral node. The utility of this program was demonstrated using genomic data of two LV populations: a hospital outbreak of methicillin-resistant Staphylococcus aureus and endemic Salmonella Typhi from rural Cambodia. rPinecone identified the transmission branches of the hospital outbreak and geographically confined lineages in Cambodia. Sub-lineages identified by rPinecone in both analyses were phylogenetically robust. It is anticipated that rPinecone can be used to discriminate between lineages of bacteria from LV populations where other methods fail, enabling a deeper understanding of infectious disease epidemiology for public health purposes.

DATA SUMMARY
Source code for rPinecone is available on GitHub under the open source licence GNU GPL 3 (url: https://github.com/alexwailan/rpinecone). Newick format files for both phylogenetic trees have been deposited in Figshare (url: https://doi.org/10.6084/m9.figshare.7022558). Geographical analysis of the S. Typhi dataset using Microreact is available at https://microreact.org/project/r1IqkrN1X. Accession numbers, metadata and sample lineage results of both datasets used in this paper are listed in the supplementary tables. I/We confirm all supporting data, code and protocols have been provided within the article or through supplementary data files.

IMPACT STATEMENT
Whole genome sequence data from bacterial pathogens are increasingly used in the epidemiological investigation of infectious disease, in both outbreak and endemic situations. However, distinguishing bacterial species that are very similar and likely to come from a small geographical and temporal range presents a major technical challenge for epidemiologists. rPinecone was designed to address this challenge and utilises phylogenetic data to define lineages within bacterial populations that have limited variation. This approach is therefore of great interest to epidemiologists, as it adds a further level of clarity beyond that offered by existing approaches. Those approaches were not designed to consider bacterial isolates whose variation exists only transiently yet is epidemiologically informative. rPinecone has the flexibility to be applied to multiple pathogens. It has direct application in investigations of clinical outbreaks and endemic disease to understand transmission dynamics or geographical hotspots of disease.
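
The root-to-tip idea can be illustrated with a simplified Python sketch. It walks the tree from the root. At each internal node, it groups all descendant tips into one sub-lineage when every tip lies within a chosen SNV threshold of that node; otherwise it descends into the children. This is one simplified reading of "SNV distance from the ancestral node", written for illustration. It is not rPinecone's R implementation, and the `Node` class, branch SNV counts and threshold are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Tree node; `snvs` is the number of SNVs on the branch leading to it."""
    name: str
    snvs: int = 0
    children: List["Node"] = field(default_factory=list)


def max_tip_distance(node: Node) -> int:
    """Largest SNV distance from `node` down to any descendant tip."""
    if not node.children:
        return 0
    return max(c.snvs + max_tip_distance(c) for c in node.children)


def tip_names(node: Node) -> List[str]:
    """Names of all tips below (or at) `node`."""
    if not node.children:
        return [node.name]
    return [t for c in node.children for t in tip_names(c)]


def define_sublineages(root: Node, threshold: int) -> List[List[str]]:
    """Root-to-tip scan that groups tips into sub-lineages.

    A node whose tips all lie within `threshold` SNVs of it becomes one
    sub-lineage; otherwise the scan continues into its children.
    """
    groups: List[List[str]] = []

    def visit(node: Node) -> None:
        if not node.children:
            groups.append([node.name])  # isolated tip becomes a singleton
        elif max_tip_distance(node) <= threshold:
            groups.append(sorted(tip_names(node)))
        else:
            for child in node.children:
                visit(child)

    visit(root)
    return groups


tree = Node("root", children=[
    Node("A", 10, [Node("t1", 1), Node("t2", 2)]),
    Node("B", 1, [Node("t3", 8), Node("t4", 0)]),
])
print(define_sublineages(tree, threshold=3))  # [['t1', 't2'], ['t3'], ['t4']]
```

With `threshold=10`, t3 and t4 merge into one sub-lineage. The threshold controls how finely the tree is split.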

