CoNCoS: Copy number estimation in cancer with controlled support

2015 ◽  
Vol 13 (05) ◽  
pp. 1550027 ◽  
Author(s):  
Ali T. Abdallah ◽  
Matthias Fischer ◽  
Peter Nürnberg ◽  
Michael Nothnagel ◽  
Peter Frommolt

Somatic copy number (CN) alterations are major drivers of tumorigenesis and growth. Although next-generation sequencing (NGS) technologies enable a deep genomic analysis of cancers, the analysis of the data remains subject to biases and multiple sources of error, including varying local read coverage. Existing algorithms for NGS-based detection of CN aberrations do not incorporate information on the local coverage quality. We have developed a new algorithm, copy number estimation with controlled support (CoNCoS), that increases the accuracy of CN estimation in paired tumor/normal exome sequencing data sets by assessing and optimizing the support for a site-specific CN estimate. We show by simulations and in a benchmarking study against single nucleotide polymorphism (SNP) microarray data that our approach outperforms the commonly used methods CNAnorm and VarScan2. Our algorithm is suitable to increase the accuracy of somatic CN analysis by a support-optimized estimation approach.
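The general idea of tying a site-specific CN estimate to its local coverage can be illustrated with a minimal sketch. This is not the CoNCoS algorithm, whose details are not given in the abstract. It is a plain tumor/normal log2 depth ratio with a hypothetical minimum-depth filter on the normal sample. The function names, the `min_depth` threshold, and the diploid baseline are all assumptions made for illustration.

```python
import math

def log2_ratio_with_support(tumor_depth, normal_depth,
                            tumor_total, normal_total, min_depth=20):
    """Per-site log2(tumor/normal) depth ratios.

    Sites whose normal depth is below min_depth (an assumed threshold)
    are reported as None, i.e. treated as insufficiently supported.
    Sites with zero tumor reads are also skipped, because the log2
    ratio is undefined there.
    """
    # Library-size normalization so that both samples are comparable.
    scale = normal_total / tumor_total
    ratios = []
    for t, n in zip(tumor_depth, normal_depth):
        if n < min_depth or t == 0:
            ratios.append(None)
        else:
            ratios.append(math.log2((t * scale) / n))
    return ratios

def copy_number(log2_ratio, ploidy=2):
    """Convert a log2 ratio to an absolute CN, assuming a diploid baseline
    and a pure tumor sample (both are simplifying assumptions)."""
    return ploidy * 2 ** log2_ratio

tumor = [100, 50, 5, 200]
normal = [100, 100, 5, 100]
ratios = log2_ratio_with_support(tumor, normal, 1000, 1000)
```

In this toy input, the third site is dropped because the normal sample covers it with only five reads, while the remaining sites yield estimates corresponding to two, one, and four copies.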

Author(s):  
Liam F Spurr ◽  
Mehdi Touat ◽  
Alison M Taylor ◽  
Adrian M Dubuc ◽  
Juliann Shih ◽  
...  

Summary: The expansion of targeted panel sequencing efforts has created opportunities for large-scale genomic analysis, but tools for copy-number quantification on panel data are lacking. We introduce ASCETS, a method for the efficient quantitation of arm and chromosome-level copy-number changes from targeted sequencing data. Availability and implementation: ASCETS is implemented in R and is freely available to non-commercial users on GitHub: https://github.com/beroukhim-lab/ascets, along with detailed documentation. Supplementary information: Supplementary data are available at Bioinformatics online.


2015 ◽  
Vol 17 (2) ◽  
pp. 185-192 ◽  
Author(s):  
Jae-Yong Nam ◽  
Nayoung K. D. Kim ◽  
Sang Cheol Kim ◽  
Je-Gun Joung ◽  
Ruibin Xi ◽  
...  

2021 ◽  
Vol 12 ◽  
Author(s):  
Guojun Liu ◽  
Junying Zhang

The next-generation sequencing technology offers a wealth of data resources for the detection of copy number variations (CNVs) at a high resolution. However, it is still challenging to correctly detect CNVs of different lengths. It is necessary to develop new CNV detection tools to meet this demand. In this work, we propose a new CNV detection method, called CBCNV, for the detection of CNVs of different lengths from whole genome sequencing data. CBCNV uses a clustering algorithm to divide the read depth segment profile, and assigns an abnormal score to each read depth segment. Based on the abnormal score profile, Tukey’s fences method is adopted in CBCNV to forecast CNVs. The performance of the proposed method is evaluated on simulated data sets, and is compared with those of several existing methods. The experimental results prove that the performance of CBCNV is better than those of several existing methods. The proposed method is further tested and verified on real data sets, and the experimental results are found to be consistent with the simulation results. Therefore, the proposed method can be expected to become a routine tool in the analysis of CNVs from tumor-normal matched samples.
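Tukey's fences, the outlier rule named in the abstract above, can be sketched as follows. The score values are invented, the two-sided test and the conventional multiplier k = 1.5 are assumptions, and the clustering step that produces the abnormal scores in CBCNV is not reproduced here.

```python
import statistics

def tukey_fences(scores, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_outliers(scores, k=1.5):
    """Indices of segments whose score falls outside the fences."""
    lower, upper = tukey_fences(scores, k)
    return [i for i, s in enumerate(scores) if s < lower or s > upper]

# Hypothetical per-segment scores; the last segment is clearly abnormal.
scores = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 5.0]
outliers = flag_outliers(scores)
```

Because the fences are derived from quartiles rather than from the mean and standard deviation, a few extreme segments do not inflate the threshold that is used to detect them.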


Genes ◽  
2020 ◽  
Vol 11 (12) ◽  
pp. 1456
Author(s):  
Maaike van der Lee ◽  
Marjolein Kriek ◽  
Henk-Jan Guchelaar ◽  
Jesse J. Swen

The continuous development of new genotyping technologies requires awareness of their potential advantages and limitations concerning utility for pharmacogenomics (PGx). In this review, we provide an overview of technologies that can be applied in PGx research and clinical practice. Most commonly used are single nucleotide variant (SNV) panels which contain a pre-selected panel of genetic variants. SNV panels offer a short turnaround time and straightforward interpretation, making them suitable for clinical practice. However, they are limited in their ability to assess rare and structural variants. Next-generation sequencing (NGS) and long-read sequencing are promising technologies for the field of PGx research. Both NGS and long-read sequencing often provide more data and more options with regard to deciphering structural and rare variants compared to SNV panels—in particular, in regard to the number of variants that can be identified, as well as the option for haplotype phasing. Nonetheless, while useful for research, not all sequencing data can be applied to clinical practice yet. Ultimately, selecting the right technology is not a matter of fact but a matter of choosing the right technique for the right problem.


mSystems ◽  
2017 ◽  
Vol 2 (2) ◽  
Author(s):  
Amnon Amir ◽  
Daniel McDonald ◽  
Jose A. Navas-Molina ◽  
Evguenia Kopylova ◽  
James T. Morton ◽  
...  

ABSTRACT High-throughput sequencing of 16S ribosomal RNA gene amplicons has facilitated understanding of complex microbial communities, but the inherent noise in PCR and DNA sequencing limits differentiation of closely related bacteria. Although many scientific questions can be addressed with broad taxonomic profiles, clinical, food safety, and some ecological applications require higher specificity. Here we introduce a novel sub-operational-taxonomic-unit (sOTU) approach, Deblur, that uses error profiles to obtain putative error-free sequences from Illumina MiSeq and HiSeq sequencing platforms. Deblur substantially reduces computational demands relative to similar sOTU methods and does so with similar or better sensitivity and specificity. Using simulations, mock mixtures, and real data sets, we detected closely related bacterial sequences with single nucleotide differences while removing false positives and maintaining stability in detection, suggesting that Deblur is limited only by read length and diversity within the amplicon sequences. Because Deblur operates on a per-sample level, it scales to modern data sets and meta-analyses. To highlight Deblur’s ability to integrate data sets, we include an interactive exploration of its application to multiple distinct sequencing rounds of the American Gut Project. Deblur is open source under the Berkeley Software Distribution (BSD) license, easily installable, and downloadable from https://github.com/biocore/deblur.
IMPORTANCE Deblur provides a rapid and sensitive means to assess ecological patterns driven by differentiation of closely related taxa. This algorithm provides a solution to the problem of identifying real ecological differences between taxa whose amplicons differ by a single base pair, is applicable in an automated fashion to large-scale sequencing data sets, and can integrate sequencing runs collected over time.


2021 ◽  
pp. 1840-1852
Author(s):  
Jaclyn Schienda ◽  
Alanna J. Church ◽  
Laura B. Corson ◽  
Brennan Decker ◽  
Catherine M. Clinton ◽  
...  

PURPOSE Molecular tumor profiling is becoming a routine part of clinical cancer care, typically involving tumor-only panel testing without matched germline. We hypothesized that integrated germline sequencing could improve clinical interpretation and enhance the identification of germline variants with significant hereditary risks. MATERIALS AND METHODS Tumors from pediatric patients with high-risk, extracranial solid malignancies were sequenced with a targeted panel of cancer-associated genes. Later, germline DNA was analyzed for a subset of these genes. We performed a post hoc analysis to identify how an integrated analysis of tumor and germline data would improve clinical interpretation. RESULTS One hundred sixty participants with both tumor-only and germline sequencing reports were eligible for this analysis. Germline sequencing identified 38 pathogenic or likely pathogenic variants among 35 (22%) patients. Twenty-five (66%) of these were included in the tumor sequencing report. The remaining germline pathogenic or likely pathogenic variants were single-nucleotide variants filtered out of tumor-only analysis because of population frequency or copy-number variation masked by additional copy-number changes in the tumor. In tumor-only sequencing, 308 of 434 (71%) single-nucleotide variants reported were present in the germline, including 31% with suggested clinical utility. Finally, we provide further evidence that the variant allele fraction from tumor-only sequencing is insufficient to differentiate somatic from germline events. CONCLUSION A paired approach to analyzing tumor and germline sequencing data would be expected to improve the efficiency and accuracy of distinguishing somatic mutations and germline variants, thereby facilitating the process of variant curation and therapeutic interpretation for somatic reports, as well as the identification of variants associated with germline cancer predisposition.


2019 ◽  
Vol 35 (22) ◽  
pp. 4806-4808 ◽  
Author(s):  
Hein Chun ◽  
Sangwoo Kim

Summary: Mislabeling in the process of next generation sequencing is a frequent problem that can cause an entire genomic analysis to fail, and a regular cohort-level checkup is needed to ensure that it has not occurred. We developed a new, automated tool (BAMixChecker) that accurately detects sample mismatches from a given BAM file cohort with minimal user intervention. BAMixChecker uses a flexible, data-specific set of single-nucleotide polymorphisms and detects orphan (unpaired) and swapped (mispaired) samples based on genotype-concordance score and entropy-based file name analysis. BAMixChecker shows ∼100% accuracy in real WES, RNA-Seq and targeted sequencing data cohorts, even for small panels (<50 genes). BAMixChecker provides an HTML-style report that graphically outlines the sample matching status in tables and heatmaps, with which users can quickly inspect any mismatch events. Availability and implementation: BAMixChecker is available at https://github.com/heinc1010/BAMixChecker. Supplementary information: Supplementary data are available at Bioinformatics online.
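A genotype-concordance score of the kind mentioned above can be sketched as the fraction of shared SNP sites at which two samples carry the same genotype. This is an illustrative definition only. The exact score used by BAMixChecker is not stated in the abstract, and the function name, the genotype encoding, and the SNP identifiers below are hypothetical.

```python
def genotype_concordance(gt_a, gt_b):
    """Fraction of jointly called SNP sites with identical genotypes.

    gt_a and gt_b map a SNP identifier to a genotype string such as
    "0/1", or to None when the site could not be called.
    Returns None if the two samples share no called site.
    """
    shared = [snp for snp in gt_a
              if snp in gt_b
              and gt_a[snp] is not None
              and gt_b[snp] is not None]
    if not shared:
        return None
    matches = sum(gt_a[snp] == gt_b[snp] for snp in shared)
    return matches / len(shared)

# Hypothetical genotypes for two BAM files from a cohort.
sample_1 = {"rs1": "0/1", "rs2": "1/1", "rs3": "0/0", "rs4": None}
sample_2 = {"rs1": "0/1", "rs2": "1/1", "rs3": "0/1", "rs4": "0/0"}
score = genotype_concordance(sample_1, sample_2)
```

Files from the same individual would be expected to score close to 1, so a pair labeled as matched that scores low is a candidate swap, and a file with no high-scoring partner is a candidate orphan.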


2017 ◽  
Vol 2017 ◽  
pp. 1-11 ◽  
Author(s):  
Jinhwa Kong ◽  
Jaemoon Shin ◽  
Jungim Won ◽  
Keonbae Lee ◽  
Unjoo Lee ◽  
...  

Copy number variations (CNVs) are structural variants associated with human diseases. Recent studies have identified disease-related genes by extracting rare de novo and transmitted CNVs from exome sequencing data. The need for more efficient and accurate methods has increased, but CNV detection remains challenging because of coverage biases and the sparse, small-sized, and noncontinuous nature of exome sequencing. In this study, we developed a new CNV detection method, ExCNVSS, based on read coverage depth evaluation and scale-space filtering to resolve these problems. We also developed ExCNVSS_noRatio, a version of ExCNVSS for cases in which only test data are available and no matched control is required. To evaluate the performance of our method, we tested it with 11 different simulated data sets and data from 10 real HapMap samples. The results demonstrated that ExCNVSS outperformed three other state-of-the-art methods and that our method corrected for coverage biases and detected CNVs of all sizes even without matched control data.


Haematologica ◽  
2021 ◽  
Author(s):  
Thomas Creasey ◽  
Emilio Barretta ◽  
Sarra L. Ryan ◽  
Ellie Butler ◽  
Amy A Kirkwood ◽  
...  

Although ALL is predominantly a childhood disease, its incidence has a second peak in adults aged 60 years and over. These older adults fare extremely poorly with existing treatment strategies and very few studies have undertaken a comprehensive genetic and genomic characterisation to improve prognosis in this age group. We performed cytogenetic, single nucleotide polymorphism (SNP) array and next generation sequencing (NGS) analyses on samples from 210 patients aged ≥60 years from the UKALL14 and UKALL60+ clinical trials. BCR-ABL1 positive disease was present in 26% (55/210) of patients, followed by low hypodiploidy/near triploidy in 13% (28/210). Cytogenetically cryptic rearrangements in CRLF2, ZNF384 and MEF2D were detected in 5%, 1% and 1% of patients respectively. Copy number abnormalities were common and deletions in ALL driver genes were seen in 77% of cases. IKZF1 deletion was present in 51% (40/78) of samples tested and the IKZF1plus profile was identified in over a third (28/77) of BCP-ALL cases. The genetic good risk abnormalities high hyperdiploidy (n=2), ETV6-RUNX1 (no cases) and ERG deletion (no cases) were exceptionally rare in this cohort. RAS pathway mutations were seen in 17% (4/23) of screened samples. KDM6A abnormalities, including biallelic deletions, were discovered in 5% (4/78) of SNP array and 9% (2/23) of NGS samples, and represent novel lesions that are potentially therapeutically actionable using EZH2 inhibitors. Outcome remained poor, with five-year event-free survival (EFS) and overall survival (OS) rates of 17% and 24% respectively across the cohort, indicating a need for novel therapeutic strategies.


2021 ◽  
Author(s):  
Migle Gabrielaite ◽  
Mathias Husted Torp ◽  
Sergio Andreu-Sánchez ◽  
Filipe Garrett Vieira ◽  
Christina Bligaard Pedersen ◽  
...  

Background: Copy-number variations (CNVs) have important clinical implications for several diseases and cancers. The clinically relevant CNVs are hard to detect because CNVs are common structural variations that define large parts of the normal human genome. CNV calling from short-read sequencing data has the potential to leverage available cohort studies and allow full genomic profiling in the clinic without the need for additional data modalities. Questions regarding performance of CNV calling tools for clinical use and suitable sequencing protocols remain poorly addressed, mainly because of the lack of good reference data sets. Methods: We reviewed 50 popular CNV calling tools and included 11 tools for benchmarking in a unique reference cohort encompassing 39 whole genome sequencing (WGS) samples paired with analysis by the current clinical standard, SNP-array-based CNV calling. Additionally, for nine of these samples we performed whole exome sequencing (WES) in order to address the effect of sequencing protocol on CNV calling. Furthermore, we included Gold Standard reference sample NA12878, and tested 12 samples with CNVs confirmed by multiplex ligation-dependent probe amplification (MLPA). Results: Tool performance varied greatly in the number of called CNVs and bias for CNV lengths. Some tools had near-perfect recall of CNVs from arrays for some samples, but poor precision. Filtering output by CNV ranks from tools did not salvage precision. Several tools had better performance patterns for NA12878, and we hypothesize that this is the result of overfitting during the tool development. Conclusions: We suggest combining tools with the best recall: GATK gCNV, Lumpy, DELLY, and cn.MOPS. These tools also capture different CNVs. Further improvements in precision require additional development of tools, reference data sets, and annotation of CNVs, potentially assisted by the use of background panels for filtering of frequently called variants.
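Precision and recall of CNV calls against an array-based reference can be computed along the following lines. The matching criterion is an assumption: the abstract does not say how calls were matched to array CNVs, so a 50% reciprocal-overlap rule, a common convention, is used here. All coordinates are invented.

```python
def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions between intervals a and b.

    Each interval is a (chromosome, start, end) tuple with end > start.
    """
    chrom_a, start_a, end_a = a
    chrom_b, start_b, end_b = b
    if chrom_a != chrom_b:
        return 0.0
    overlap = min(end_a, end_b) - max(start_a, start_b)
    if overlap <= 0:
        return 0.0
    return min(overlap / (end_a - start_a), overlap / (end_b - start_b))

def precision_recall(calls, truth, min_ro=0.5):
    """Precision: share of calls matching any reference CNV.
    Recall: share of reference CNVs matched by any call."""
    matched_calls = sum(
        any(reciprocal_overlap(c, t) >= min_ro for t in truth) for c in calls)
    matched_truth = sum(
        any(reciprocal_overlap(c, t) >= min_ro for c in calls) for t in truth)
    precision = matched_calls / len(calls) if calls else 0.0
    recall = matched_truth / len(truth) if truth else 0.0
    return precision, recall

# Hypothetical array-based reference CNVs and tool calls.
truth = [("chr1", 100, 200), ("chr2", 500, 900)]
calls = [("chr1", 110, 210), ("chr1", 1000, 1100), ("chr2", 880, 1200)]
precision, recall = precision_recall(calls, truth)
```

In this toy example only the first call matches a reference CNV, so one of three calls is correct and one of two reference CNVs is recovered.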

