A computational framework for detecting signatures of accelerated somatic evolution in cancer genomes

Abstract

By accumulating somatic mutations, cancer genomes evolve and diverge from the genome of the host. It remains unclear to what extent this somatic evolutionary divergence is comparable across different regions of the cancer genome or concentrated in specific genomic elements. We present a novel computational framework, SASE-mapper, to identify genomic regions that show signatures of accelerated somatic evolution (SASE) in a subset of samples in a cohort, marked by an excess of somatic mutations relative to the number expected from local, context-aware background mutation rates in the cancer genomes. Analyzing tumor whole-genome sequencing data for 365 samples from 5 cohorts, we detect recurrent SASEs at a genome-wide scale. The SASEs were enriched for genomic elements associated with active chromatin, and regulatory regions of several known cancer genes had SASEs in multiple cohorts. Regions with SASEs carried specific mutagenic signatures and often co-localized in 3D nuclear space, suggesting a common basis. A subset of SASEs was frequently associated with regulatory changes in key cancer pathways and with poor clinical outcome. Although the SASE-associated mutations were not necessarily recurrent at base-pair resolution, the SASEs recurrently targeted the same functional regions, with similar consequences. Regulatory redundancy and plasticity likely promote the prevalence of SASE-like patterns in cancer genomes.
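The central comparison here, mutations observed in a region versus the number expected from a local background rate, can be illustrated with a one-sided Poisson test. This is a minimal sketch that assumes a single per-base background rate for the region and a Poisson count model; SASE-mapper's context-aware background model and its detection of SASE across a subset of samples are not reproduced, and `poisson_sf` and `region_excess_pvalue` are hypothetical names.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """Upper tail P(X >= k) for X ~ Poisson(lam), summed term by term in log space."""
    if k <= 0:
        return 1.0
    if lam <= 0.0:
        return 0.0  # nothing expected, so k >= 1 is impossible under this null model
    total = 0.0
    i = k
    while True:
        # log P(X = i) = -lam + i*log(lam) - log(i!)
        term = math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
        total += term
        # Beyond the mode (i > lam) the terms shrink monotonically; stop once negligible.
        if i > lam and term <= 1e-18 * total:
            break
        i += 1
    return min(total, 1.0)


def region_excess_pvalue(observed: int, region_length_bp: int,
                         background_rate_per_bp: float) -> float:
    """Hypothetical helper: p-value that one region in one sample carries more
    mutations than an assumed local background rate predicts."""
    expected = region_length_bp * background_rate_per_bp
    return poisson_sf(observed, expected)


# A 1 kb region with an assumed background rate of 1e-6 mutations/bp expects
# 0.001 mutations; observing 3 yields a very small upper-tail probability.
p = region_excess_pvalue(observed=3, region_length_bp=1_000, background_rate_per_bp=1e-6)
print(f"{p:.3e}")
```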

Reconstruction of clone- and haplotype-specific cancer genome karyotypes from bulk tumor samples

10.1101/560839 ◽

2019 ◽

Cited By ~ 5

Author(s):

Sergey Aganezov ◽

Benjamin J. Raphael

Keyword(s):

Sequence Data ◽

Evolutionary Model ◽

Response To Treatment ◽

Nucleotide Position ◽

Structural Variants ◽

Sequencing Data ◽

Somatic Evolution ◽

A Genome ◽

Cancer Genomes ◽

Specific Cancer

Abstract

Many cancer genomes are extensively rearranged, with highly aberrant chromosomal karyotypes. These genome rearrangements, or structural variants, can be detected in tumor DNA sequencing data by abnormal mapping of sequence reads to the reference genome. However, nearly all cancer sequencing to date has been of bulk tumor samples, which consist of a heterogeneous mixture of normal cells and subpopulations of cancer cells, or clones, that harbor distinct somatic structural variants. We introduce a novel algorithm, Reconstructing Cancer Karyotypes (RCK), to reconstruct haplotype-specific karyotypes of one or more rearranged cancer genomes, or clones, that best explain the read alignments from a bulk tumor sample. RCK leverages specific evolutionary constraints on the somatic mutation process in cancer to reduce ambiguity in the deconvolution of admixed DNA sequence data into multiple haplotype-specific cancer karyotypes. In particular, RCK relies on generalizations of the infinite sites assumption: a genome rearrangement is highly unlikely to occur at the same nucleotide position more than once during somatic evolution. RCK's comprehensive model incorporates information from both short- and long-read sequencing technologies and is applicable to bulk tumor samples containing a mixture of an arbitrary number of derived genomes. We compared RCK to the state-of-the-art method ReMixT on a dataset of 17 primary and metastatic prostate cancer samples. We demonstrate that ReMixT's limited support for heterogeneity and its lack of evolutionary constraints lead to reconstruction of implausible karyotypes. In contrast, RCK infers cancer karyotypes that better explain read alignments from bulk tumor samples and are consistent with a reasonable evolutionary model. RCK's reconstructions of clone- and haplotype-specific karyotypes will aid further studies of the role of intra-tumor heterogeneity in cancer development and response to treatment.
RCK is available at https://github.com/raphael-group/RCK.
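The infinite sites assumption described above can be stated as a simple check on a set of candidate novel adjacencies: no breakpoint position should be used by more than one of them. The sketch below is illustrative only; it assumes breakpoints are plain (chromosome, coordinate) pairs and ignores strand and haplotype labels, whereas RCK applies generalizations of this assumption as constraints inside its reconstruction. The function and type names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Position = Tuple[str, int]             # (chromosome, nucleotide coordinate)
Adjacency = Tuple[Position, Position]  # a novel adjacency joins two breakpoints


def infinite_sites_violations(adjacencies: Iterable[Adjacency]) -> List[Position]:
    """Return every breakpoint position used by more than one novel adjacency."""
    usage = Counter()
    for left, right in adjacencies:
        usage[left] += 1
        usage[right] += 1
    return sorted(pos for pos, n in usage.items() if n > 1)


candidates = [
    (("chr1", 100), ("chr2", 500)),
    (("chr1", 100), ("chr3", 42)),   # reuses chr1:100, so it violates the assumption
    (("chr4", 7), ("chr5", 8)),
]
print(infinite_sites_violations(candidates))  # [('chr1', 100)]
```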

Network analysis reveals differential metabolic functionality in antibiotic-resistant Pseudomonas aeruginosa

10.1101/303289 ◽

2018 ◽

Author(s):

Laura J. Dunphy ◽

Phillip Yen ◽

Jason A. Papin

Keyword(s):

Antibiotic Resistance ◽

Carbon Sources ◽

Growth Dynamics ◽

Metabolic Phenotype ◽

Metabolic Adaptation ◽

Whole Genome Sequencing Data ◽

Sequencing Data ◽

Drug Induced ◽

Antibiotic Resistant ◽

A Genome

Abstract

Metabolic adaptations accompanying the development of antibiotic resistance in bacteria remain poorly understood. To interrogate this relationship, we profiled the growth of lab-evolved antibiotic-resistant lineages of the opportunistic pathogen Pseudomonas aeruginosa across 190 unique carbon sources. We semi-automatically calculated growth dynamics (maximum growth density, growth rate, and time to mid-exponential phase) from over 2,800 growth curves. These data revealed that the evolution of antibiotic resistance resulted in systems-level changes to growth dynamics and metabolic phenotype. Drug-resistant lineages predominantly displayed decreased growth relative to the ancestral lineage; however, resistant lineages occasionally displayed enhanced growth on certain carbon sources, indicating that adaptation to drug can provide a growth advantage in certain environments. A genome-scale metabolic network reconstruction (GENRE) of P. aeruginosa strain UCBPP-PA14 was paired with whole-genome sequencing data from one of the drug-evolved lineages to predict genes contributing to the observed changes in metabolism. Finally, we experimentally validated in silico predictions to identify genes, mutated in resistant P. aeruginosa, that contribute to loss of catabolic function. Our results build on previous mechanistic knowledge of drug-induced metabolic adaptation and provide a framework for identifying metabolic limitations in antibiotic-resistant pathogens. Robust drug-driven changes in bacterial metabolism could be exploited to select against antibiotic-resistant populations in chronic infections.
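The three growth parameters named above can be computed from a single optical-density (OD) time series. This is a minimal sketch, not the authors' semi-automated pipeline: it takes growth rate as the steepest slope of ln(OD) between consecutive readings and assumes that time to mid-exponential phase is the first time OD reaches half of its maximum. Both definitions are assumptions, and `growth_parameters` is a hypothetical name.

```python
import math
from typing import Dict, Sequence


def growth_parameters(times: Sequence[float], od: Sequence[float]) -> Dict[str, float]:
    """Summarize one growth curve (all OD readings must be > 0)."""
    if len(times) != len(od) or len(times) < 2:
        raise ValueError("need matching time and OD series with at least two points")
    max_density = max(od)
    # Growth rate: steepest slope of ln(OD) between consecutive readings.
    growth_rate = max(
        (math.log(od[i + 1]) - math.log(od[i])) / (times[i + 1] - times[i])
        for i in range(len(od) - 1)
    )
    # Assumed definition of time to mid-exponential phase: the first time OD reaches
    # half of its maximum, linearly interpolated between the bracketing readings.
    # If the first reading is already at or above half-max, the first time is used.
    half = max_density / 2.0
    t_mid = times[0]
    for i in range(len(od) - 1):
        if od[i] < half <= od[i + 1]:
            frac = (half - od[i]) / (od[i + 1] - od[i])
            t_mid = times[i] + frac * (times[i + 1] - times[i])
            break
    return {"max_density": max_density, "growth_rate": growth_rate,
            "time_to_mid_exp": t_mid}


# Toy curve that doubles each hour and then plateaus.
curve = growth_parameters([0, 1, 2, 3, 4], [0.05, 0.1, 0.2, 0.4, 0.4])
print(curve)
```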

Local Ancestry Prediction with PyLAE

10.1101/2020.11.13.380105 ◽

2020 ◽

Author(s):

Alexander Smetanin ◽

Nikita Moshkov ◽

Tatiana V. Tatarinova

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Computational Efficiency ◽

Source Code ◽

High Density ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Local Ancestry ◽

A Genome

Abstract

Summary: We developed PyLAE, a new tool for determining local ancestry along a genome using whole-genome sequencing data or high-density genotyping experiments. PyLAE can process an arbitrarily large number of ancestral populations (with or without an informative prior). Because PyLAE does not involve estimating many parameters, it can process thousands of genomes within a day. Computational efficiency, straightforward presentation of results, and ease of installation make PyLAE a useful tool for studying admixed populations.

Availability and implementation: The source code and installation manual are available at https://github.com/smetam/pylae.

Identifying structural variants using linked-read sequencing data

10.1101/190454 ◽

2017 ◽

Cited By ~ 5

Author(s):

Rebecca Elyanow ◽

Hsin-Ta Wu ◽

Benjamin J. Raphael

Keyword(s):

Cancer Cell Line ◽

Whole Genome Sequencing Data ◽

Sequence Information ◽

The Novel ◽

Cancer Genes ◽

Structural Variants ◽

Sequencing Data ◽

Individual Genome ◽

Large Deletions ◽

Cancer Genomes

Abstract

Structural variation, including large deletions, duplications, inversions, translocations, and other rearrangements, is common in human and cancer genomes. A number of methods have been developed to identify structural variants from Illumina short-read sequencing data. However, reliable identification of structural variants remains challenging because many variants have breakpoints in repetitive regions of the genome and thus are difficult to identify with short reads. The recently developed linked-read sequencing technology from 10X Genomics combines a novel barcoding strategy with Illumina sequencing. This technology labels all reads that originate from a small number (~5-10) of DNA molecules, each ~50 kbp long, with the same molecular barcode. These barcoded reads contain long-range sequence information that is advantageous for identifying structural variants. We present Novel Adjacency Identification with Barcoded Reads (NAIBR), an algorithm to identify structural variants in linked-read sequencing data. NAIBR predicts novel adjacencies in an individual genome resulting from structural variants using a probabilistic model that combines multiple signals in barcoded reads. We show that NAIBR outperforms several existing methods for structural variant identification, including two recent methods that also analyze linked reads, on simulated sequencing data and on 10X whole-genome sequencing data from the NA12878 human genome and the HCC1954 breast cancer cell line. Several of the novel somatic structural variants identified in HCC1954 overlap known cancer genes.
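The long-range signal that linked reads provide can be illustrated by counting barcodes shared between two distant genomic windows: many shared barcodes suggest the windows were joined on the same long molecules, which points to a candidate novel adjacency. This sketch shows only that raw overlap signal, assuming reads are given as (chromosome, position, barcode) tuples; NAIBR's probabilistic model, which combines multiple signals, is not reproduced, and the function names are hypothetical.

```python
from typing import List, Set, Tuple

Read = Tuple[str, int, str]    # (chromosome, mapped position, barcode)
Window = Tuple[str, int, int]  # (chromosome, start, end), half-open interval


def barcodes_in_window(reads: List[Read], window: Window) -> Set[str]:
    """Collect the barcodes of all reads mapped inside the window."""
    chrom, start, end = window
    return {bc for c, pos, bc in reads if c == chrom and start <= pos < end}


def barcode_overlap(reads: List[Read], a: Window, b: Window) -> Tuple[int, float]:
    """Return the number of barcodes shared by two windows and their Jaccard index."""
    bars_a = barcodes_in_window(reads, a)
    bars_b = barcodes_in_window(reads, b)
    shared = bars_a & bars_b
    union = bars_a | bars_b
    jaccard = len(shared) / len(union) if union else 0.0
    return len(shared), jaccard


reads = [
    ("chr1", 100, "AAA"), ("chr1", 200, "BBB"),
    ("chr8", 5000, "AAA"), ("chr8", 5100, "CCC"),
]
print(barcode_overlap(reads, ("chr1", 0, 1000), ("chr8", 4000, 6000)))  # (1, 0.3333333333333333)
```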

Identification of Genes under Purifying Selection in Human Cancers

10.1101/129205 ◽

2017 ◽

Author(s):

Robert A. Mathis ◽

Ethan S. Sokol ◽

Piyush B. Gupta

Keyword(s):

Negative Selection ◽

Somatic Mutations ◽

Purifying Selection ◽

Sequencing Data ◽

Systematic Assessment ◽

Coding Regions ◽

Strong Negative Selection ◽

Cancer Genomes ◽

Lung Adenocarcinomas ◽

Widespread Interest

Abstract

There is widespread interest in finding therapeutic vulnerabilities by analyzing somatic mutations in cancers. Most analyses have focused on identifying driver oncogenes mutated in patient tumors, but this approach cannot discover genes that are essential for tumor growth yet are not activated through mutation. We show that such genes can be systematically discovered by mining cancer sequencing data for evidence of purifying selection. We find that purifying selection reduces substitution rates in coding regions of cancer genomes, depleting up to 90% of mutations for some genes. Moreover, mutations resulting in non-conservative amino acid substitutions are under strong negative selection in tumors, whereas conservative substitutions are more tolerated. Genes under purifying selection include members of the EGFR and FGFR pathways in lung adenocarcinomas and DNA repair pathways in melanomas. A systematic assessment of purifying selection in tumors would identify hundreds of tumor-specific enablers and thus novel targets for therapy.
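A depletion of 90% corresponds to an observed-to-expected mutation ratio of 0.1. The sketch below computes that depletion and a one-sided Poisson probability of seeing so few mutations. It assumes the expected count comes from some neutral background mutation model, which is not specified here, and it is not the authors' method; the function names and the example counts are hypothetical.

```python
import math


def depletion_fraction(observed: int, expected: float) -> float:
    """Share of expected mutations that are missing: 1 - observed / expected."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return 1.0 - observed / expected


def poisson_cdf(k: int, lam: float) -> float:
    """Lower tail P(X <= k) for X ~ Poisson(lam), summed in log space to avoid underflow."""
    if k < 0:
        return 0.0
    if lam <= 0:
        return 1.0
    total = sum(math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
                for i in range(k + 1))
    return min(total, 1.0)


# Hypothetical gene: 50 mutations expected under neutrality, 5 observed.
print(depletion_fraction(5, 50.0))  # 0.9, i.e. 90% of expected mutations are missing
print(poisson_cdf(5, 50.0))         # chance of seeing this few without selection
```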

The shaping of immunological responses through natural selection after the Roma Diaspora

Scientific Reports ◽

10.1038/s41598-020-73182-1 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Begoña Dobon ◽

Rob ter Horst ◽

Hafid Laayouni ◽

Mayukh Mondal ◽

Erica Bianco ◽

...

Keyword(s):

Fungal Infections ◽

Balkan Peninsula ◽

Host Population ◽

Human Migration ◽

Whole Genome Sequencing Data ◽

Sequencing Data ◽

Immunological Responses ◽

A Genome ◽

Local Host ◽

Northwest India

Abstract

The Roma people are the largest transnational ethnic minority in Europe and can be considered the last human migration of South Asian origin into the continent. They left Northwest India approximately 1,000 years ago, reaching the Balkan Peninsula around the twelfth century and Romania in the fourteenth century. Here, we analyze whole-genome sequencing data from 40 Roma and 40 non-Roma individuals from Romania. We performed a genome-wide scan of selection comparing the Roma, their local host population, and a Northwestern Indian population to identify the selective pressures the Roma faced, mainly after they settled in Europe. We identify several pathways under recent selection that are implicated in immune responses, among them cellular metabolism pathways known to be rewired after immune stimulation. We validated the interaction between PIK3-mTOR-HIF-1α and cytokine responses influenced by bacterial and fungal infections. Our results point to a significant role of these pathways in host defense against the most prevalent pathogens in Europe during the last millennium.

FindZX: an automated pipeline for detecting and visualising sex chromosomes using whole-genome sequencing data

10.1101/2021.10.18.464774 ◽

2021 ◽

Author(s):

Hanna Sigeman ◽

Bella Sinclair ◽

Bengt Hansson

Keyword(s):

Sex Chromosomes ◽

Sex Chromosome ◽

Whole Genome Sequencing Data ◽

Sequencing Data ◽

A Genome ◽

Genomic Studies ◽

Taxonomic Groups ◽

User Friendly ◽

Genomic Patterns

Sex chromosomes have evolved numerous times, as revealed by recent genomic studies. However, large gaps remain in our knowledge of sex chromosome diversity across the tree of life. Filling these gaps through the study of novel species is crucial for understanding why and how sex chromosomes evolve. Characterizing sex chromosomes in already well-studied organisms is also important to avoid misinterpreting population genomic patterns caused by undetected sex chromosome variation. Here we present findZX, an automated Snakemake-based computational pipeline for detecting and visualizing sex chromosomes through differences in genome coverage and heterozygosity between males and females. FindZX is user-friendly, scales to different computational platforms, and works with any number of male and female samples. An option to lift over genome coordinates to a reference genome of another species lets users inspect sex-linked regions over larger contiguous chromosome regions, while also providing important between-species synteny information. To demonstrate its effectiveness, we applied findZX to publicly available genomic data from species belonging to widely different taxonomic groups (mammals, birds, reptiles, fish, and insects), with sex chromosome systems of different ages, sizes, and levels of differentiation. We also demonstrate that the lift-over method is robust over large phylogenetic distances (>80 million years of evolution).
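The two signals findZX relies on, coverage and heterozygosity differences between males and females, can be sketched per genomic window. This minimal example assumes per-window mean coverage and heterozygosity have already been computed for each sex, and that most windows are autosomal so the median coverage ratio can serve as the baseline. It does not use findZX or Snakemake, and `sex_linkage_scores` is a hypothetical name.

```python
import math
from statistics import median
from typing import List, Sequence, Tuple


def sex_linkage_scores(male_cov: Sequence[float], female_cov: Sequence[float],
                       male_het: Sequence[float], female_het: Sequence[float]
                       ) -> Tuple[List[float], List[float]]:
    """Per-window log2(male/female) coverage, centered on the genome-wide median,
    and male-minus-female heterozygosity. Coverage values must be > 0."""
    ratios = [math.log2(m / f) for m, f in zip(male_cov, female_cov)]
    center = median(ratios)  # assumes most windows are autosomal, so this is ~0
    cov_score = [r - center for r in ratios]
    het_diff = [m - f for m, f in zip(male_het, female_het)]
    return cov_score, het_diff


# Toy windows: window 2 has half the coverage and no heterozygosity in females,
# the pattern expected for a Z-linked window in a ZW (female-heterogametic) species.
cov, het = sex_linkage_scores([30, 30, 30, 30], [30, 30, 15, 30],
                              [0.01, 0.01, 0.01, 0.01], [0.01, 0.01, 0.0, 0.01])
print(cov)  # [0.0, 0.0, 1.0, 0.0]
```

In an XY system the sign flips: X-linked windows would score near -1, because males carry one X and females two.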

Local ancestry prediction with PyLAE

PeerJ ◽

10.7717/peerj.12502 ◽

2021 ◽

Vol 9 ◽

pp. e12502

Author(s):

Nikita Moshkov ◽

Aleksandr Smetanin ◽

Tatiana V. Tatarinova

Keyword(s):

Genome Sequencing ◽

Gold Standard ◽

Source Code ◽

Genomic Data ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Local Ancestry ◽

1000 Genomes ◽

A Genome

Summary: We developed PyLAE, a new tool for determining local ancestry along a genome using whole-genome sequencing data or high-density genotyping experiments. PyLAE can process an arbitrarily large number of ancestral populations (with or without an informative prior). Since PyLAE does not involve estimating many parameters, it can process thousands of genomes within a day. PyLAE can run on phased or unphased genomic data. We have shown how PyLAE can be applied to the identification of differentially enriched pathways between populations. The local ancestry approach results in higher enrichment scores compared to whole-genome approaches. We benchmarked PyLAE using the 1000 Genomes dataset, comparing the aggregated predictions with the global admixture results and the current gold standard program RFMix. Computational efficiency, minimal requirements for data pre-processing, straightforward presentation of results, and ease of installation make PyLAE a valuable tool to study admixed populations.

Availability and implementation: The source code and installation manual are available at https://github.com/smetam/pylae.
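PyLAE's own algorithm is not described in this abstract. To illustrate the general idea of local ancestry inference, the sketch below swaps in a simple, different technique: each non-overlapping window of SNPs is assigned to the reference population whose allele frequencies give the highest Hardy-Weinberg likelihood for the observed genotypes. It assumes unphased alt-allele dosages (0, 1, 2) and known population allele frequencies; `window_ancestry` and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def window_ancestry(genotypes: Sequence[int],
                    pop_freqs: Dict[str, Sequence[float]],
                    window: int) -> List[str]:
    """Assign each non-overlapping window of SNPs to the population whose
    alt-allele frequencies maximize the Hardy-Weinberg genotype log-likelihood."""
    calls: List[str] = []
    for start in range(0, len(genotypes), window):
        best_pop, best_ll = "", -math.inf
        for pop, freqs in pop_freqs.items():
            ll = 0.0
            for g, p in zip(genotypes[start:start + window], freqs[start:start + window]):
                p = min(max(p, 1e-6), 1.0 - 1e-6)  # clamp to avoid log(0)
                # P(g | p) = C(2, g) * p^g * (1 - p)^(2 - g)
                ll += math.log(math.comb(2, g)) + g * math.log(p) + (2 - g) * math.log(1.0 - p)
            if ll > best_ll:
                best_pop, best_ll = pop, ll
        calls.append(best_pop)
    return calls


# Toy data: three SNPs homozygous for the allele common in "A", then three
# homozygous for the allele common in "B".
freqs = {"A": [0.9] * 6, "B": [0.1] * 6}
print(window_ancestry([2, 2, 2, 0, 0, 0], freqs, window=3))  # ['A', 'B']
```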

Genome-wide profiling of heritable and de novo STR variations

10.1101/077727 ◽

2016 ◽

Cited By ~ 7

Author(s):

Thomas Willems ◽

Dina Zielinski ◽

Assaf Gordon ◽

Melissa Gymrek ◽

Yaniv Erlich

Keyword(s):

Tandem Repeats ◽

High Throughput Sequencing ◽

De Novo ◽

Genetic Diseases ◽

Whole Genome Sequencing Data ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

Genome Wide ◽

A Genome ◽

Short Tandem

Abstract

Short tandem repeats (STRs) are highly variable elements that play a pivotal role in multiple genetic diseases, population genetics applications, and forensic casework. However, STRs have proven problematic to genotype from high-throughput sequencing data. Here, we describe HipSTR, a novel haplotype-based method for robustly genotyping, haplotyping, and phasing STRs from whole genome sequencing data and report a genome-wide analysis and validation of de novo STR mutations.

Investigating the Evolutionary Importance of Denisovan Introgressions in Papua New Guineans and Australians

10.1101/022632 ◽

2015 ◽

Author(s):

Ya Hu ◽

Qiliang Ding ◽

Yi Wang ◽

Shuhua Xu ◽

Yungang He ◽

...

Keyword(s):

False Positive Rate ◽

Whole Genome Sequencing Data ◽

Phase Method ◽

Sequencing Data ◽

Male Gonad ◽

Two Phase ◽

Genome Wide ◽

A Genome ◽

Positive Rate ◽

Olfactory Pit

Previous research reported that Papua New Guineans (PNG) and Australians carry introgressions from Denisovans. Here we present a genome-wide analysis of Denisovan introgressions in PNG and Australians. We first developed a two-phase method to detect Denisovan introgressions from whole-genome sequencing data. Based on simulations, this method has relatively high detection power (79.74%) and a low false positive rate (2.44%). Using this method, we identified 1.34 Gb of Denisovan introgressions in sixteen PNG and four Australian genomes, within which we identified 38,877 Denisovan introgressive alleles (DIAs). We found that 78 Denisovan introgressions were under positive selection. Genes located in these 78 introgressions are related to evolutionarily important functions, such as spermatogenesis, fertilization, cold acclimation, circadian rhythm, development of the brain, neural tube, face, and olfactory pit, and immunity. We also found that 121 DIAs are missense. Genes harboring these 121 missense DIAs are also related to evolutionarily important functions, such as female pregnancy; development of the face, lung, heart, skin, nervous system, and male gonad; visual and smell perception; response to heat, pain, hypoxia, and UV; lipid transport; metabolism; blood coagulation; wound healing; and aging. Taken together, this study suggests that Denisovan introgressions in PNG and Australians are evolutionarily important and may have helped PNG and Australians adapt locally. We also propose a method that can efficiently identify archaic hominin introgressions in modern non-African genomes.

Download Full-text