Pan-cancer patterns of allelic imbalance from chromosomal aberrations in 33 tumor types

2019 ◽  
Author(s):  
Smruthy Sivakumar ◽  
F Anthony San Lucas ◽  
Yasminka A Jakubek ◽  
Jerry Fowler ◽  
Paul Scheet

Abstract Somatic copy number alterations (SCNAs), including deletions and duplications, serve as hallmarks of tumorigenesis. SCNAs may span entire chromosomes and typically result in deviations from an expected one-to-one ratio of alleles at heterozygous loci, leading to allelic imbalance (AI). The Cancer Genome Atlas (TCGA) reports SCNAs identified using a circular binary segmentation (CBS) algorithm, providing segment mean copy number estimates from Affymetrix single-nucleotide polymorphism DNA microarray total (log R ratio) intensities, but not allele-specific (“B allele”) intensities that inform of AI. Here we seek to provide a TCGA-wide description of AI in tumor genomes, including AI induced by SCNAs and copy-neutral loss-of-heterozygosity (cnLOH), using a powerful haplotype-based method applied to allele-specific intensities. We present AI summaries for all 33 tumor sites and propose an automated adjustment procedure to improve calibration of existing SCNA calls in TCGA for tumors with high levels of aneuploidy where baseline intensities were difficult to establish without annotation of AI. Overall, 94% of tumor samples exhibited AI. Recurrent events included deletions of 17p, 9q, 3p, amplifications of 8q, 1q, 7p as well as mixed event types on 8p and 13q. The AI-based approach identified frequent cnLOH on 17p across multiple tumor sites, with additional site-specific cnLOH patterns. Our findings support the exploration of additional methods for robust automated inference procedures and to aid empirical discoveries across TCGA.

Genetics ◽  
2021 ◽  
Vol 217 (1) ◽  
Author(s):  
Smruthy Sivakumar ◽  
F Anthony San Lucas ◽  
Yasminka A Jakubek ◽  
Zuhal Ozcan ◽  
Jerry Fowler ◽  
...  

Abstract Somatic copy number alterations (SCNAs) serve as hallmarks of tumorigenesis and often result in deviations from one-to-one allelic ratios at heterozygous loci, leading to allelic imbalance (AI). The Cancer Genome Atlas (TCGA) reports SCNAs identified using a circular binary segmentation algorithm, providing segment mean copy number estimates from single-nucleotide polymorphism DNA microarray total intensities (log R ratio), but not allele-specific intensities (“B allele” frequencies) that inform of AI. Our approach provides more sensitive identification of SCNAs by modeling the “B allele” frequencies jointly, thereby bolstering the catalog of chromosomal alterations in this widely utilized resource. Here we present AI summaries for all 33 tumor sites in TCGA, including those induced by SCNAs and copy-neutral loss-of-heterozygosity (cnLOH). We identified AI in 94% of the tumors, higher than in previous reports. Recurrent events included deletions of 17p, 9q, 3p, amplifications of 8q, 1q, 7p, as well as mixed event types on 8p and 13q. We also observed both site-specific and pan-cancer (spanning 17p) cnLOH, patterns which have not been comprehensively characterized. The identification of such cnLOH events elucidates tumor suppressors and multi-hit pathways to carcinogenesis. We also contrast the landscapes inferred from AI- and total intensity-derived SCNAs and propose an automated procedure to improve and adjust SCNAs in TCGA for cases where high levels of aneuploidy obscured baseline intensity identification. Our findings support the exploration of additional methods for robust automated inference procedures and to aid empirical discoveries across TCGA.
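To illustrate why allele-specific intensities carry information that total intensities miss, here is a minimal sketch of the standard tumor/normal mixture expectations for B allele frequency (BAF) and log R ratio (LRR). It is not the haplotype-based method described in the abstract. The function names, the purity value of 0.6, and the diploid baseline are assumptions made for illustration only.

```python
import math

def expected_total_copies(major, minor, purity):
    """Average copies per cell in a tumor/normal mixture (normal cells are 1+1)."""
    return purity * (major + minor) + (1.0 - purity) * 2

def expected_baf(major, minor, purity):
    """Expected BAF at a heterozygous SNP whose B allele lies on the minor haplotype."""
    b_copies = purity * minor + (1.0 - purity) * 1
    return b_copies / expected_total_copies(major, minor, purity)

def expected_lrr(major, minor, purity, baseline_ploidy=2.0):
    """Expected log R ratio relative to an assumed diploid baseline.

    Assumes the mixture has a nonzero average copy number (true whenever purity < 1).
    """
    return math.log2(expected_total_copies(major, minor, purity) / baseline_ploidy)

# Hypothetical allele-specific states: (major copies, minor copies) in tumor cells.
states = {
    "normal": (1, 1),
    "hemizygous deletion": (1, 0),
    "cnLOH": (2, 0),
    "single-copy gain": (2, 1),
}

for name, (major, minor) in states.items():
    baf = expected_baf(major, minor, purity=0.6)
    lrr = expected_lrr(major, minor, purity=0.6)
    print(f"{name:20s} BAF={baf:.3f} LRR={lrr:+.3f}")
```

Under these assumptions cnLOH leaves the expected LRR at zero while shifting BAF away from 0.5, which is why segmentation of total intensities alone cannot detect it.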


2019 ◽  
Author(s):  
Sehyun Oh ◽  
Ludwig Geistlinger ◽  
Marcel Ramos ◽  
Martin Morgan ◽  
Levi Waldron ◽  
...  

Abstract Background: Allele-specific copy number alteration (CNA) analysis is essential to study the functional impact of single nucleotide variants (SNV) and the process of tumorigenesis. Most commonly used tools in the field rely on high quality genome-wide data with matched normal profiles, limiting their applicability in clinical settings. Methods: We propose a workflow, based on the open-source PureCN R/Bioconductor package in conjunction with widely used variant-calling and copy number segmentation algorithms, for allele-specific CNA analysis from whole exome sequencing (WES) without matched normals. We use The Cancer Genome Atlas (TCGA) ovarian carcinoma (OV) and lung adenocarcinoma (LUAD) datasets to benchmark its performance against gold standard SNP6 microarray and WES datasets with matched normal samples. Our workflow further classifies SNVs by somatic status and then uses this information to infer somatic mutational signatures and tumor mutational burden (TMB). Results: Application of our workflow to tumor-only WES data produces tumor purity and ploidy estimates that are highly concordant with estimates from SNP6 microarray data and matched-normal WES data. The presence of cancer type-specific somatic mutational signatures was inferred with high accuracy. We also demonstrate high concordance of TMB between our tumor-only workflow and matched normal pipelines. Conclusion: The proposed workflow provides, to our knowledge, the only open-source option for comprehensive allele-specific CNA analysis and SNV classification of tumor-only WES with demonstrated high accuracy.


2018 ◽  
Author(s):  
Hyoyoung Choo-Wosoba ◽  
Paul S Albert ◽  
Bin Zhu

Abstract Background: Somatic copy number alteration (SCNA) is a common feature of the cancer genome and is associated with cancer etiology and prognosis. The allele-specific SCNA analysis of a tumor sample aims to identify the allele-specific copy numbers of both alleles, adjusting for the ploidy and the tumor purity. Next generation sequencing platforms produce abundant read counts at base-pair resolution across the exome or whole genome, which are susceptible to hypersegmentation, a phenomenon where numerous very short regions are falsely identified as SCNAs. Results: We propose hsegHMM, a hidden Markov model approach that accounts for hypersegmentation for allele-specific SCNA analysis. hsegHMM provides statistical inference of copy number profiles by using an efficient E-M algorithm procedure. Through simulation and application studies, we found that hsegHMM handles hypersegmentation effectively with a t-distribution as a part of the emission probability distribution structure and a carefully defined state space. We also compared hsegHMM with FACETS, which is a current method for allele-specific SCNA analysis. For the application, we use a renal cell carcinoma sample from The Cancer Genome Atlas (TCGA) study. Conclusions: We demonstrate the robustness of hsegHMM to hypersegmentation. Furthermore, hsegHMM provides the quantification of uncertainty in identifying allele-specific SCNAs over the entire chromosomes. hsegHMM performs better than FACETS when read depth (coverage) is uneven across the genome.
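The role of a heavy-tailed emission distribution can be sketched by comparing log-densities of a Gaussian and a Student t emission at an outlying observation. This is an illustrative sketch only, not hsegHMM's actual emission model or parameterization; the function names and the choice of 4 degrees of freedom are assumptions.

```python
import math

def normal_logpdf(x, mean, sd):
    """Log-density of a Gaussian emission."""
    z = (x - mean) / sd
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * z * z

def student_t_logpdf(x, loc, scale, df):
    """Log-density of a location-scale Student t emission with df degrees of freedom."""
    z = (x - loc) / scale
    return (
        math.lgamma((df + 1.0) / 2.0)
        - math.lgamma(df / 2.0)
        - 0.5 * math.log(df * math.pi)
        - math.log(scale)
        - (df + 1.0) / 2.0 * math.log1p(z * z / df)
    )

# A single observation 6 scale units away from the current state's mean.
outlier = 6.0
gauss_penalty = normal_logpdf(outlier, 0.0, 1.0)
t_penalty = student_t_logpdf(outlier, 0.0, 1.0, df=4.0)
print(f"Gaussian log-density:  {gauss_penalty:.2f}")
print(f"Student t log-density: {t_penalty:.2f}")
```

Because the t emission penalizes an isolated outlier far less than the Gaussian does, a decoder has less incentive to open a short spurious segment to explain it, which is the intuition behind using a t-distribution against hypersegmentation.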


2019 ◽  
Vol 3 (Supplement_1) ◽  
pp. S105-S106
Author(s):  
Aparna Bhutkar ◽  
Anastasia Gurinovich ◽  
Thomas T Perls ◽  
Paola Sebastiani ◽  
Stefano Monti

Abstract Mosaicism, the presence of two or more genotypically or karyotypically distinct populations of cells in a single individual, plays an important role in human disease. Mosaicism can result in mutations and/or chromosomal alterations such as loss, gain, or copy-number neutral loss of heterozygosity. Clonal mosaicism and its relationship to aging and cancer have been previously studied, and earlier work suggests that clonal mosaicism tends to increase with age. The aim of our research is to use genotype data of centenarians to explore the relationship between extreme longevity and mosaic chromosomal alterations (mCAs). To this end, we analyzed genome-wide genotypes from blood-derived DNA of 338 individuals from the New England Centenarian Study. The participants in this dataset ranged from 45 to 112 years of age. For the detection of mCA events, we used MoChA (https://github.com/freeseek/mocha), a bcftools extension that predicts mCAs based on B-allele frequency (BAF) and log2 R ratio (LRR), and uses long-range phase information to increase sensitivity. Chromosomal alteration events, including whole chromosome events, were detected in 180 out of the 338 individuals. A total of 165 duplications, 97 deletions, and 9 copy-number neutral loss of heterozygosity events were detected. Additionally, there were 42 events whose copy number state could not be determined with high confidence. Of the 313 events, 236 were detected in individuals aged 100 and older. Our analysis of chromosomal alteration frequency by age indicates that, within centenarians, the proportion of individuals with mCAs significantly decreases with increased age (p < 0.05, correlation -0.73).
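The event counts reported in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated above; the variable names are illustrative.

```python
# Event counts by inferred copy number state, as reported in the abstract.
events = {
    "duplication": 165,
    "deletion": 97,
    "cnLOH": 9,
    "undetermined": 42,
}

total_events = sum(events.values())
carriers, cohort = 180, 338        # individuals with at least one event / all individuals
events_in_centenarians = 236       # events found in individuals aged 100 and older

carrier_fraction = carriers / cohort
centenarian_event_fraction = events_in_centenarians / total_events

print(f"total events: {total_events}")
print(f"individuals with an event: {carrier_fraction:.1%}")
print(f"events in individuals aged 100+: {centenarian_event_fraction:.1%}")
```

The four categories sum to the 313 events that the abstract uses as the denominator.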


2018 ◽  
Author(s):  
Simone Zaccaria ◽  
Benjamin J. Raphael

Copy-number aberrations (CNAs) and whole-genome duplications (WGDs) are frequent somatic mutations in cancer. Accurate quantification of these mutations from DNA sequencing of bulk tumor samples is complicated by varying tumor purity, admixture of multiple tumor clones with distinct mutations, and high aneuploidy. Standard methods for CNA inference analyze tumor samples individually, but recently DNA sequencing of multiple samples from a cancer patient - e.g. from multiple regions of a primary tumor, matched primary/metastases, or multiple time points - has become common. We introduce a new algorithm, Holistic Allele-specific Tumor Copy-number Heterogeneity (HATCHet), that infers allele and clone-specific CNAs and WGDs jointly across multiple tumor samples from the same patient, and that leverages the relationships between clones in these samples. HATCHet provides a fresh perspective on CNA inference and includes several algorithmic innovations that overcome the limitations of existing methods, resulting in a more robust approach even for single-sample analysis. We also develop MASCoTE (Multiple Allele-specific Simulation of Copy-number Tumor Evolution), a framework for generating realistic simulated multi-sample DNA sequencing data with appropriate corrections for the differences in genome lengths between the normal and tumor clone(s) present in mixed samples. HATCHet outperforms current state-of-the-art methods on 256 simulated tumor samples from 64 patients, half with WGD. HATCHet's analysis of 49 primary tumor and metastasis samples from 10 prostate cancer patients reveals subclonal CNAs in only 29 of these samples, compared to the published reports of extensive subclonal CNAs in all samples. HATCHet's inferred CNAs are also more consistent with the reports of polyclonal origin and limited heterogeneity of metastasis in a subset of patients. HATCHet's analysis of 35 primary tumor and metastasis samples from 4 pancreas cancer patients reveals subclonal CNAs in 20 samples, WGDs in 3 patients, and tumor subclones that are shared across primary and metastases samples from the same patient - none of which were described in published analysis of this data. HATCHet substantially improves the analysis of CNAs and WGDs, leading to more reliable studies of tumor evolution in primary tumors and metastases.


2019 ◽  
Author(s):  
Simone Zaccaria ◽  
Benjamin J. Raphael

Abstract Single-cell barcoding technologies have recently been used to perform whole-genome sequencing of thousands of individual cells in parallel. These technologies provide the opportunity to characterize genomic heterogeneity at single-cell resolution, but their extremely low sequencing coverage (<0.05X per cell) has thus far restricted their use to identification of the total copy number of large multi-megabase segments in individual cells. However, total copy numbers do not distinguish between the two homologous chromosomes in humans, and thus provide a limited view of tumor heterogeneity and evolution missing important events such as copy-neutral loss-of-heterozygosity (LOH). We introduce CHISEL, the first method to infer allele- and haplotype-specific copy numbers in single cells and subpopulations of cells by aggregating sparse signal across thousands of individual cells. We applied CHISEL to 10 single-cell sequencing datasets from 2 breast cancer patients, each dataset containing ≈2000 cells. We identified extensive allele-specific copy-number aberrations (CNAs) in these samples including copy-neutral LOH, whole-genome duplications (WGDs), and mirrored-subclonal CNAs in subpopulations of cells. These allele-specific CNAs alter the copy number of genomic regions containing well-known breast cancer genes including TP53, BRCA2, and PTEN but are invisible to total copy number analysis. We utilized CHISEL’s allele- and haplotype-specific copy numbers to derive a more refined reconstruction of tumor evolution: timing allele-specific CNAs before and after WGDs, identifying low-frequency subclones distinguished by unique CNAs, and uncovering evidence of convergent evolution. This reconstruction is supported by orthogonal analysis of somatic single-nucleotide variants (SNVs) obtained by pooling barcoded reads across clones defined by CHISEL.
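The idea of aggregating sparse signal across cells can be sketched as follows. This is a toy illustration, not CHISEL's algorithm: it assumes reads in a genomic region have already been assigned to haplotype A or B, and the per-cell counts are invented for the example.

```python
def pooled_b_fraction(cell_counts):
    """Pool phased read counts over cells and return the haplotype-B read fraction.

    cell_counts: iterable of (reads_on_haplotype_A, reads_on_haplotype_B), one pair per cell.
    Returns None when no cell has any read in the region.
    """
    total_a = sum(a for a, _ in cell_counts)
    total_b = sum(b for _, b in cell_counts)
    depth = total_a + total_b
    if depth == 0:
        return None
    return total_b / depth

# Hypothetical counts for one region in eight cells of the same subpopulation.
# At very low coverage most cells contribute zero or one read, so no single
# cell supports a confident allelic ratio.
cells = [(1, 0), (0, 0), (2, 0), (0, 1), (1, 0), (0, 0), (3, 0), (1, 0)]

print("per-cell depths:", [a + b for a, b in cells])
print("pooled haplotype-B fraction:", pooled_b_fraction(cells))
```

Pooling turns several uninformative per-cell counts into one ratio with usable depth; a pooled fraction far from 0.5 would suggest an allele-specific event in that subpopulation.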


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Xinping Fan ◽  
Guanghao Luo ◽  
Yu S. Huang

Abstract Background: Copy number alterations (CNAs), due to their large impact on the genome, have been an important contributing factor to oncogenesis and metastasis. Detecting genomic alterations from the shallow-sequencing data of a low-purity tumor sample remains a challenging task. Results: We introduce Accucopy, a method to infer total copy numbers (TCNs) and allele-specific copy numbers (ASCNs) from challenging low-purity and low-coverage tumor samples. Accucopy adopts many robust statistical techniques such as kernel smoothing of coverage differentiation information to discern signals from noise and combines ideas from time-series analysis and the signal-processing field to derive a range of estimates for the period in a histogram of coverage differentiation information. Statistical learning models such as the tiered Gaussian mixture model, the expectation–maximization algorithm, and sparse Bayesian learning were customized and built into the model. Accucopy is implemented in C++/Rust, packaged in a docker image, and supports non-human samples; more at http://www.yfish.org/software/. Conclusions: We describe Accucopy, a method that can predict both TCNs and ASCNs from low-coverage low-purity tumor sequencing data. Through comparative analyses in both simulated and real-sequencing samples, we demonstrate that Accucopy is more accurate than Sclust, ABSOLUTE, and Sequenza.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Asia Mendelevich ◽  
Svetlana Vinogradova ◽  
Saumya Gupta ◽  
Andrey A. Mironov ◽  
Shamil R. Sunyaev ◽  
...  

Abstract A sensitive approach to quantitative analysis of transcriptional regulation in diploid organisms is analysis of allelic imbalance (AI) in RNA sequencing (RNA-seq) data. A near-universal practice in such studies is to prepare and sequence only one library per RNA sample. We present theoretical and experimental evidence that data from a single RNA-seq library is insufficient for reliable quantification of the contribution of technical noise to the observed AI signal; consequently, reliance on one-replicate experimental design can lead to unaccounted-for variation in error rates in allele-specific analysis. We develop a computational approach, Qllelic, that accurately accounts for technical noise by making use of replicate RNA-seq libraries. Testing on new and existing datasets shows that application of Qllelic greatly decreases false positive rate in allele-specific analysis while conserving appropriate signal, and thus greatly improves reproducibility of AI estimates. We explore sources of technical overdispersion in observed AI signal and conclude by discussing design of RNA-seq studies addressing two biologically important questions: quantification of transcriptome-wide AI in one sample, and differential analysis of allele-specific expression between samples.
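The effect of technical overdispersion on AI calls can be sketched with a simple variance-inflated binomial test. This is a generic quasi-binomial illustration, not Qllelic's model; the function name, the overdispersion factor, and the read counts are hypothetical.

```python
import math

def ai_z_score(ref_reads, alt_reads, overdispersion=1.0):
    """Z-score for departure of the reference-allele fraction from 0.5.

    overdispersion multiplies the binomial variance; 1.0 means pure binomial
    sampling noise, and larger values stand in for extra technical noise.
    """
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("no allelic reads")
    ref_fraction = ref_reads / depth
    standard_error = math.sqrt(overdispersion * 0.25 / depth)
    return (ref_fraction - 0.5) / standard_error

# The same counts look less extreme once extra technical variance is allowed for.
z_binomial = ai_z_score(60, 40)
z_inflated = ai_z_score(60, 40, overdispersion=4.0)
print(f"binomial z: {z_binomial:.2f}, variance-inflated z: {z_inflated:.2f}")
```

A single library gives no way to estimate the inflation factor, which is the abstract's argument for sequencing replicate libraries.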


PLoS ONE ◽  
2010 ◽  
Vol 5 (6) ◽  
pp. e10909 ◽  
Author(s):  
Zongzhi Liu ◽  
Ao Li ◽  
Vincent Schulz ◽  
Min Chen ◽  
David Tuck

PLoS ONE ◽  
2011 ◽  
Vol 6 (8) ◽  
pp. e24052 ◽  
Author(s):  
Marguerite R. Irvin ◽  
Nathan E. Wineinger ◽  
Treva K. Rice ◽  
Nicholas M. Pajewski ◽  
Edmond K. Kabagambe ◽  
...  
