A systematic evaluation of copy number alterations detection methods on real SNP array and deep sequencing data

2019 ◽  
Vol 20 (S25) ◽  
Author(s):  
Fei Luo

Abstract Background Copy number alterations (CNAs) are tightly associated with cancers, so detecting them accurately is one of the most important tasks in cancer genomics. A series of CNA detection methods have been proposed, and new ones are still being developed. Because of the complexity of CNAs in cancers, no CNA detection method has been accepted as the gold-standard caller. Several evaluation studies have attempted to characterize the performance of typical CNA detection methods. Limited by the scale of their evaluation data, these comparisons have not reached a consensus, and researchers remain unsure how to choose a suitable CNA caller for their analysis. A more comprehensive evaluation of typical CNA detection methods is therefore needed. Results In this work, we use a large-scale real dataset from the CAGEKID consortium to evaluate a total of 12 typical CNA detection methods. These methods are the most widely used in cancer research and are routinely used as benchmarks for newly proposed CNA detection methods. The dataset comprises SNP array data on 94 samples and whole genome sequencing data on 10 samples. The evaluation covers the current scenarios of CNA detection: detecting CNAs on SNP array data, on sequencing data with matched tumor and normal samples, and on sequencing data with a single tumor sample. First, the three SNP-based methods are ranked. The results of the best SNP-based method are then used as the benchmark to compare six matched-sample methods and three single-tumor-sample methods in terms of preprocessing, recall rate, Jaccard index and segmentation characteristics. Conclusions Our survey reveals the strengths and weaknesses of the 12 typical methods. We explain, from a methodological standpoint, why the methods show specific characteristics. Finally, we present guiding principles for choosing a suitable CNA detection method under specific conditions. Some unsolved problems and expectations for upcoming CNA detection methods are also addressed.
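
As a rough illustration of the recall rate and Jaccard index mentioned above, the sketch below scores a caller's CNA segments against benchmark segments by base-pair overlap. The abstract does not say how the two metrics were computed, so the base-pair definition, the half-open `(chromosome, start, end)` segment format, and all function names are assumptions made for this example, not the authors' procedure.

```python
from typing import List, Tuple

# Hypothetical segment format: (chromosome, start, end), half-open interval.
Segment = Tuple[str, int, int]


def merge(segments: List[Segment]) -> List[Segment]:
    """Sort segments and merge overlapping or touching ones per chromosome."""
    merged: List[Segment] = []
    for chrom, start, end in sorted(segments):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def total_length(segments: List[Segment]) -> int:
    """Number of base pairs covered by the segments (after merging)."""
    return sum(end - start for _, start, end in merge(segments))


def overlap_length(a: List[Segment], b: List[Segment]) -> int:
    """Number of base pairs covered by both segment sets."""
    total = 0
    for chrom_a, start_a, end_a in merge(a):
        for chrom_b, start_b, end_b in merge(b):
            if chrom_a == chrom_b:
                total += max(0, min(end_a, end_b) - max(start_a, start_b))
    return total


def recall(calls: List[Segment], benchmark: List[Segment]) -> float:
    """Fraction of benchmark base pairs recovered by the calls."""
    bench_len = total_length(benchmark)
    return overlap_length(calls, benchmark) / bench_len if bench_len else 0.0


def jaccard(calls: List[Segment], benchmark: List[Segment]) -> float:
    """Intersection over union of the two segment sets, in base pairs."""
    inter = overlap_length(calls, benchmark)
    union = total_length(calls) + total_length(benchmark) - inter
    return inter / union if union else 0.0


# Made-up coordinates, for illustration only.
benchmark = [("chr1", 100, 200), ("chr2", 0, 50)]
calls = [("chr1", 150, 250)]
print(recall(calls, benchmark), jaccard(calls, benchmark))
```

Recall only penalizes missed benchmark regions, whereas the Jaccard index also penalizes calls that extend beyond the benchmark, which is why the two can diverge for a caller that over-segments or over-calls.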

2017 ◽  
Author(s):  
Zilu Zhou ◽  
Weixin Wang ◽  
Li-San Wang ◽  
Nancy Ruonan Zhang

Abstract Motivation: Copy number variations (CNVs) are gains and losses of DNA segments and have been associated with disease. Many large-scale genetic association studies are performing CNV analysis using whole exome sequencing (WES) and whole genome sequencing (WGS). In many of these studies, previous SNP-array data are available. An integrated cross-platform analysis is expected to improve resolution and accuracy, yet there is no tool for effectively combining data from sequencing and array platforms. The detection of CNVs using sequencing data alone can also be further improved by the utilization of allele-specific reads. Results: We propose a statistical framework, the integrated Copy Number Variation detection algorithm (iCNV), which can be applied to multiple study designs: WES only, WGS only, SNP array only, or any combination of SNP and sequencing data. iCNV applies platform-specific normalization, utilizes allele-specific reads from sequencing, and integrates matched NGS and SNP-array data with a Hidden Markov Model (HMM). We compare integrated two-platform CNV detection using iCNV to the naive intersection or union of platforms and show that iCNV increases sensitivity and robustness. We also assess the accuracy of iCNV on WGS data only, and show that the utilization of allele-specific reads improves CNV detection accuracy compared to existing methods. Availability: https://github.com/zhouzilu/ Contact: [email protected], [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
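
To make the idea of integrating two platforms in one HMM more concrete, here is a minimal Viterbi sketch. It is not iCNV's model: the three states, the Gaussian emissions on log2 ratios, the state means, the standard deviations, the transition probability, and the assumption that the two platforms are conditionally independent given the state are all illustrative choices made for this example. A position with no measurement on one platform (`None`) simply contributes nothing from that platform.

```python
import math

STATES = ("deletion", "normal", "duplication")
# Assumed mean log2 ratio for each state (illustrative values only).
MEANS = {"deletion": -1.0, "normal": 0.0, "duplication": 0.58}


def gaussian_loglik(x, mean, sd):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def emission(state, seq_value, array_value, seq_sd=0.3, array_sd=0.2):
    """Sum the log-likelihoods of whichever platforms have data at this position."""
    total = 0.0
    if seq_value is not None:
        total += gaussian_loglik(seq_value, MEANS[state], seq_sd)
    if array_value is not None:
        total += gaussian_loglik(array_value, MEANS[state], array_sd)
    return total


def viterbi(seq_values, array_values, stay_prob=0.95):
    """Return the most likely state path given both platforms' log2 ratios."""
    stay = math.log(stay_prob)
    switch = math.log((1 - stay_prob) / (len(STATES) - 1))
    # Uniform prior over states at the first position.
    score = {
        s: math.log(1 / len(STATES)) + emission(s, seq_values[0], array_values[0])
        for s in STATES
    }
    back = []
    for seq_v, arr_v in zip(seq_values[1:], array_values[1:]):
        new_score, pointers = {}, {}
        for s in STATES:
            best_prev = max(
                STATES, key=lambda p: score[p] + (stay if p == s else switch)
            )
            transition = stay if best_prev == s else switch
            new_score[s] = score[best_prev] + transition + emission(s, seq_v, arr_v)
            pointers[s] = best_prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Made-up log2 ratios; None marks a position the platform does not cover.
seq_values = [0.0, 0.05, -0.9, None, -1.1, 0.0]
array_values = [0.02, None, -0.8, -1.0, None, 0.05]
print(viterbi(seq_values, array_values))
```

Because the evidence is summed per position before decoding, a position covered by only one platform can still support a call, which a naive intersection of per-platform calls would discard.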


2019 ◽  
Author(s):  
Barbara Tabak ◽  
Gordon Saksena ◽  
Coyin Oh ◽  
Galen F. Gao ◽  
Barbara Hill Meyers ◽  
...  

Abstract Motivation: Somatic copy-number alterations (SCNAs) play an important role in cancer development. Systematic noise in sequencing and array data presents a significant challenge to the inference of SCNAs for cancer genome analyses. As part of The Cancer Genome Atlas (TCGA), the Broad Institute Genome Characterization Center developed the Tangent copy-number inference pipeline to generate copy-number profiles using single-nucleotide polymorphism (SNP) array and whole-exome sequencing (WES) data from over 10,000 pairs of tumors and matched normal samples. Here, we describe the Tangent pipeline, which begins with DNA sequencing data in the form of .bam files or raw SNP array probe-level intensity data, and ends with segmented copy-number calls to facilitate the identification of novel genes potentially targeted by SCNAs. We also describe a modification of Tangent, Pseudo-Tangent, which enables denoising through comparisons between tumor profiles when few normal samples are available. Results: Tangent Normalization offers substantial signal-to-noise ratio (SNR) improvements compared to conventional normalization methods in both SNP array and WES analyses. The improvement in SNRs is achieved primarily through noise reduction with minimal effect on signal. Pseudo-Tangent also reduces noise when few normal samples are available. Tangent and Pseudo-Tangent are broadly applicable and enable more accurate inference of SCNAs from DNA sequencing and array data. Availability and Implementation: Tangent is available at https://github.com/coyin/tangent and as a Docker image (https://hub.docker.com/r/coyin/tangent). Tangent is also the normalization method for the Copy Number pipeline in Genome Analysis Toolkit 4 (GATK4). Contact: [email protected], [email protected], [email protected]
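
The abstract does not spell out how Tangent removes noise. One way to picture denoising against a panel of normals is to project the tumor profile onto the space spanned by the normal profiles and keep only the residual. The sketch below does that with plain Gram-Schmidt orthogonalization; the function names, the toy profiles, and the choice of algorithm are assumptions for illustration and should not be read as the Tangent implementation.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt: build an orthonormal basis for the span of the vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coeff = dot(w, b)
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # skip vectors already explained by the basis
            basis.append([wi / norm for wi in w])
    return basis


def subtract_normal_projection(tumor, normals):
    """Remove from the tumor profile the part explained by the normal profiles."""
    residual = list(tumor)
    for b in orthonormal_basis(normals):
        coeff = dot(residual, b)
        residual = [r - coeff * bi for r, bi in zip(residual, b)]
    return residual


# Toy log2 copy-ratio profiles over six probes (made-up values).
normals = [
    [1, -1, 1, -1, 1, -1],   # an alternating systematic noise pattern
    [1, 1, 1, 1, 1, 1],      # a constant offset
]
signal = [0, 0, 1, 1, -1, -1]  # the copy-number signal we hope to recover
tumor = [s + 0.3 * n0 + 0.1 * n1 for s, n0, n1 in zip(signal, *normals)]
print([round(x, 6) for x in subtract_normal_projection(tumor, normals)])
```

In this toy case the signal happens to be orthogonal to both normal profiles, so the residual recovers it; a real signal that overlaps the normal space would be partly attenuated, which is one reason the "minimal effect on signal" result reported in the abstract matters.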


2022 ◽  
Author(s):  
Etienne Sollier ◽  
Jack Kuipers ◽  
Niko Beerenwinkel ◽  
Koichi Takahashi ◽  
Katharina Jahn

Reconstructing the history of somatic DNA alterations that occurred in a tumour can help understand its evolution and predict its resistance to treatment. Single-cell DNA sequencing (scDNAseq) can be used to investigate clonal heterogeneity and to inform phylogeny reconstruction. However, existing phylogenetic methods for scDNAseq data are designed either for point mutations or for large copy number variations, but not for both types of events simultaneously. Here, we develop COMPASS, a computational method for inferring the joint phylogeny of mutations and copy number alterations from targeted scDNAseq data. We evaluate COMPASS on simulated data and show that it outperforms existing methods. We apply COMPASS to a large cohort of 123 patients with acute myeloid leukemia (AML) and detect copy number alterations, including subclonal ones, which are in agreement with current knowledge of AML development. We further use bulk SNP array data to orthogonally validate our findings.


2017 ◽  
Author(s):  
Hui Yang ◽  
Gary Chen ◽  
Leandro Lima ◽  
Han Fang ◽  
Laura Jimenez ◽  
...  

ABSTRACT BACKGROUND: Whole-genome sequencing (WGS) data may be used to identify copy number variations (CNVs). Existing CNV detection methods mostly rely on read depth or alignment characteristics (paired-end distance and split reads) to infer gains and losses; they neglect allelic intensity ratios and cannot quantify copy numbers. Additionally, most CNV callers do not scale to large numbers of WGS samples. METHODS: To facilitate large-scale and rapid CNV detection from WGS data, we developed a Dynamic Programming Imputation (DPI) based algorithm called HadoopCNV, which infers copy number changes from both allelic frequency and read depth information. Our implementation is built on the Hadoop framework, enabling multiple compute nodes to work in parallel. RESULTS: Compared to two widely used tools, CNVnator and LUMPY, HadoopCNV has similar or better performance on both simulated data sets and real data from the NA12878 individual. Additionally, analysis of a 10-member pedigree showed that HadoopCNV has a Mendelian precision similar to or better than that of other tools. Furthermore, HadoopCNV can accurately infer loss of heterozygosity (LOH), while other tools cannot. HadoopCNV requires only 1.6 hours for a human genome with 30X coverage on a 32-node cluster, with a linear relationship between speed improvement and the number of nodes. We further developed a method to combine HadoopCNV and LUMPY results, and demonstrated that the combination performs better than either tool alone. CONCLUSIONS: The combination of high-resolution, allele-specific read depth from WGS data and the Hadoop framework can result in efficient and accurate detection of CNVs.
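
The value of combining read depth with allelic frequency can be seen in a toy classifier: read depth alone cannot distinguish a normal two-copy region from copy-neutral LOH, while the absence of heterozygous sites can. The sketch below is not the DPI algorithm; the thresholds, the window-level inputs, and the function name are hypothetical and chosen only to illustrate the reasoning.

```python
def classify_window(depth_ratio, baf_values, depth_tol=0.25,
                    het_band=(0.3, 0.7), min_het_fraction=0.05):
    """Classify one genomic window from read depth and B-allele frequencies.

    depth_ratio: window read depth divided by the genome-wide median depth
        (about 1.0 is taken to mean two copies).
    baf_values: B-allele frequencies at the SNP sites inside the window.
    All thresholds are illustrative assumptions, not values from HadoopCNV.
    """
    if depth_ratio < 1 - depth_tol:
        return "loss"
    if depth_ratio > 1 + depth_tol:
        return "gain"
    if baf_values:
        # Count sites whose allele frequency looks heterozygous.
        het_sites = sum(het_band[0] <= b <= het_band[1] for b in baf_values)
        if het_sites / len(baf_values) < min_het_fraction:
            # Normal depth but no heterozygous sites: candidate LOH.
            return "copy-neutral LOH"
    return "normal"


# Made-up windows for illustration.
print(classify_window(1.02, [0.01, 0.99, 0.0, 1.0, 0.02]))
print(classify_window(0.5, [0.0, 1.0]))
print(classify_window(1.0, [0.5, 0.48, 0.0, 1.0]))
```

A window with no heterozygous sites can also occur by chance in a short or naturally homozygous region, so a practical caller would need longer runs or a statistical model before reporting LOH.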


2010 ◽  
Vol 39 (suppl_1) ◽  
pp. D968-D974 ◽  
Author(s):  
Qingyi Cao ◽  
Meng Zhou ◽  
Xujun Wang ◽  
Cliff A. Meyer ◽  
Yong Zhang ◽  
...  

2020 ◽  
Vol 22 (Supplement_2) ◽  
pp. ii75-ii75
Author(s):  
Thais Sabedot ◽  
Michael Wells ◽  
Indrani Datta ◽  
Tathiane Malta ◽  
Ana Valeria Castro ◽  
...  

Abstract Adult diffuse gliomas are central nervous system (CNS) tumors that arise from the malignant transformation of glial cells. Nearly all gliomas will recur despite standard treatment; however, current histopathological grading fails to predict which of them will relapse and/or progress. The Glioma Longitudinal AnalySiS (GLASS) consortium is a large-scale collaboration that aims to investigate the molecular profiles of matched primary and recurrent glioma samples from multiple institutions in order to better understand the dynamic evolution of these tumors. At this time, the cohort comprises 946 samples across 11 institutions, of which 864 have DNA methylation data available. The current molecular classification based on 7 subtypes published by TCGA in 2016 was applied to the dataset. Among the IDH wildtype tumors, 33% (16/49) of the patients showed a change of subtype upon recurrence; most of these (9/16) were Classic-like at the primary stage but changed to either Mesenchymal-like or PA-like at recurrence. Among the IDH mutant tumors, 15% (22/142) showed a change of subtype at recurrence, of which 16 out of 22 progressed from G-CIMP-high to G-CIMP-low. Although some tumors progressed to a different subtype upon recurrence, an unsupervised analysis showed that the samples tend to cluster by patient instead of by subtype. Copy number alterations estimated from DNA methylation data show that the overall copy number profile of the recurrent samples remains similar to that of their primary counterparts. From this initial analysis using epigenomic data, we were able to characterize some aspects of glioma evolution and how DNA methylation is associated with the progression of these tumors to different subtypes. These findings corroborate the importance of epigenetics in gliomas and can potentially lead to the identification of new biomarkers that reflect tumor burden and predict its development.


Medicina ◽  
2021 ◽  
Vol 57 (5) ◽  
pp. 502
Author(s):  
Georgiana Gug ◽  
Caius Solovan

Background and Objectives: The evolution of mycosis fungoides (MF) and large plaque parapsoriasis (LPP) provides intriguing data and is the subject of numerous debates. The diagnosis of MF and LPP is associated with confusion and imprecise definitions. Copy number alterations (CNAs) may play an essential role in the genesis of cancer through dysregulation of gene expression. Objectives: Because of the heterogeneity of MF and LPP and the scarcity of cases, an exceedingly small number of studies have identified molecular changes in these pathologies. We aim to identify and compare DNA copy number alterations and gene expression changes between MF and LPP to highlight the similarities and differences between these pathologies. Materials and Methods: The patients were prospectively selected from the University Clinic of Dermatology and Venereology Timișoara, Romania. DNA was extracted from fresh frozen skin biopsies and analyzed using single nucleotide polymorphism (SNP) arrays. The use of SNP arrays for copy number profiling is a promising approach for genome-wide analysis. Results: After reviewing each group, we observed that the histograms generated for chromosomes 1–22 were remarkably similar and shared many CNAs, although significant differences were also seen. Conclusions: This study takes a step forward in identifying the differences and similarities between MF and LPP, toward a more specific and therefore more accurate approach to each case. The similarity between these two pathologies in terms of CNAs is striking, emphasizing once again the difficulty of approaching and differentiating them.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Xinping Fan ◽  
Guanghao Luo ◽  
Yu S. Huang

Abstract Background Copy number alterations (CNAs), due to their large impact on the genome, are an important contributing factor to oncogenesis and metastasis. Detecting genomic alterations from the shallow-sequencing data of a low-purity tumor sample remains a challenging task. Results We introduce Accucopy, a method to infer total copy numbers (TCNs) and allele-specific copy numbers (ASCNs) from challenging low-purity and low-coverage tumor samples. Accucopy adopts robust statistical techniques, such as kernel smoothing of coverage differentiation information, to discern signal from noise, and combines ideas from time-series analysis and signal processing to derive a range of estimates for the period in a histogram of coverage differentiation information. Statistical learning models such as the tiered Gaussian mixture model, the expectation–maximization algorithm, and sparse Bayesian learning were customized and built into the model. Accucopy is implemented in C++/Rust, packaged in a Docker image, and supports non-human samples; more information is available at http://www.yfish.org/software/. Conclusions We describe Accucopy, a method that can predict both TCNs and ASCNs from low-coverage, low-purity tumor sequencing data. Through comparative analyses of both simulated and real sequencing samples, we demonstrate that Accucopy is more accurate than Sclust, ABSOLUTE, and Sequenza.


2021 ◽  
Author(s):  
Y. Pirosanto ◽  
N. Laseca ◽  
M. Valera ◽  
A. Molina ◽  
M. Moreno‐Millán ◽  
...  
