CNVkit: Copy number detection and visualization for targeted sequencing using off-target reads

2014
Author(s): Eric Talevich, A. Hunter Shain, Thomas Botton, Boris C. Bastian

Germline copy number variants (CNVs) and somatic copy number alterations (SCNAs) are of significant importance in syndromic conditions and cancer. Massively parallel sequencing is increasingly used to infer copy number from variations in sequencing read depth. However, this approach has limitations for targeted resequencing, which leaves gaps in coverage between the regions chosen for enrichment and introduces biases related to the efficiency of target capture and library preparation. We present a method for copy number detection, implemented in the software package CNVkit, that uses both the targeted reads and the nonspecifically captured off-target reads to infer copy number evenly across the genome. This combination achieves both exon-level resolution in targeted regions and sufficient resolution in the larger intronic and intergenic regions to identify copy number changes. In particular, we inferred copy number genome-wide at a resolution equivalent to 100 kilobases from a platform targeting as few as 293 genes. After normalizing read counts to a pooled reference, we evaluated and corrected for three sources of bias that explain most of the extraneous variability in sequencing read depth: GC content, target footprint size and spacing, and repetitive sequences. We compared the copy number changes called by CNVkit with those identified by array comparative genomic hybridization. We packaged the components of CNVkit so that it is straightforward to use and provides visualizations, detailed reporting of significant features, and export options for compatibility with other software. Availability: http://github.com/etal/cnvkit
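
The pooled-reference normalization and GC correction described above can be illustrated with a minimal Python sketch. It assumes per-bin read depths are already available as plain lists, builds a pooled reference as the per-bin median of median-scaled reference samples, computes log2 ratios against it, and removes GC bias by subtracting the median log2 ratio within coarse GC strata. The function names, the stratum count and the `floor` pseudocount are hypothetical; this is not CNVkit's API, and CNVkit's own corrections (including those for target size, spacing and repeats) may work differently.

```python
import math
from statistics import median


def _scale(depths):
    """Divide each bin's depth by the sample's median depth."""
    m = median(depths)
    return [d / m for d in depths]


def pooled_reference(ref_samples):
    """Per-bin median across reference samples, each scaled by its own median."""
    scaled = [_scale(depths) for depths in ref_samples]
    return [median(bin_values) for bin_values in zip(*scaled)]


def log2_ratios(sample, reference, floor=1e-3):
    """log2(sample / reference) per bin; `floor` (hypothetical) avoids log2(0)."""
    return [
        math.log2(max(s / max(r, floor), floor))
        for s, r in zip(_scale(sample), reference)
    ]


def gc_correct(ratios, gc, n_strata=5):
    """Subtract the median log2 ratio of bins sharing the same GC stratum.

    A simple stratified-median correction chosen for brevity (an assumption).
    """
    strata = [min(int(g * n_strata), n_strata - 1) for g in gc]
    groups = {}
    for s, r in zip(strata, ratios):
        groups.setdefault(s, []).append(r)
    centers = {s: median(v) for s, v in groups.items()}
    return [r - centers[s] for r, s in zip(ratios, strata)]
```

With two flat reference samples and a four-bin sample whose last bin has half the median depth, `log2_ratios([100, 100, 100, 50], pooled_reference([[100] * 4, [200] * 4]))` returns `[0.0, 0.0, 0.0, -1.0]`.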


2018
Author(s): Vijay Kumar Pounraja, Gopal Jayakar, Matthew Jensen, Neil Kelkar, Santhosh Girirajan

Copy-number variants (CNVs) are a major cause of several genetic disorders, making their detection an essential component of genetic analysis pipelines. Current methods for detecting CNVs from exome sequencing data are limited by high false-positive rates and low concordance due to the inherent biases of individual algorithms. To overcome these issues, calls generated by two or more algorithms are often intersected using Venn-diagram approaches to identify "high-confidence" CNVs. However, this approach is inadequate, as it misses potentially true calls that lack consensus among multiple callers. Here, we present CN-Learn, a machine-learning framework (https://github.com/girirajanlab/CN_Learn) that integrates calls from multiple CNV detection algorithms and learns to accurately identify true CNVs using caller-specific and genomic features from a small subset of validated CNVs. Using CNVs predicted by four exome-based CNV callers (CANOES, CODEX, XHMM and CLAMMS) from 503 samples, we demonstrate that CN-Learn identifies true CNVs with higher precision (~90%) and recall (~85%) while maintaining robust performance even when trained with minimal data (~30 samples). CN-Learn recovers twice as many CNVs as individual callers or Venn diagram-based approaches, with features such as exome capture probe count, caller concordance and GC content providing the most discriminatory power. In fact, about 58% of all true CNVs recovered by CN-Learn were either singletons or calls that lacked support from at least one caller. Our study underscores the limitations of current approaches for CNV identification and provides an effective method that yields high-quality CNVs for application in clinical diagnostics.
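
Caller concordance, one of the features named above, can be sketched in Python as follows. The example assumes each caller's output is a list of (chromosome, start, end) intervals and treats two calls as matching when they share at least 50% reciprocal overlap; both the threshold and all helper names are assumptions, not CN-Learn's code. `venn_consensus` reproduces the intersection approach the abstract criticizes, keeping only calls supported by every caller; CN-Learn instead feeds features like this into a trained classifier, which is not shown here.

```python
from typing import Dict, List, Tuple

# (chromosome, start, end) with half-open coordinates
Interval = Tuple[str, int, int]


def reciprocal_overlap(a: Interval, b: Interval, min_frac: float = 0.5) -> bool:
    """True if a and b overlap by at least min_frac of each interval's length."""
    if a[0] != b[0]:
        return False
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return False
    return overlap >= min_frac * (a[2] - a[1]) and overlap >= min_frac * (b[2] - b[1])


def caller_concordance(calls: Dict[str, List[Interval]]) -> List[Tuple[str, Interval, int]]:
    """For each call, count the callers (including its own) reporting an overlapping call."""
    result = []
    for caller, intervals in calls.items():
        for call in intervals:
            support = sum(
                any(reciprocal_overlap(call, other) for other in other_intervals)
                for other_intervals in calls.values()
            )
            result.append((caller, call, support))
    return result


def venn_consensus(calls: Dict[str, List[Interval]]) -> List[Tuple[str, Interval, int]]:
    """Venn-style intersection: keep only calls supported by every caller."""
    n_callers = len(calls)
    return [c for c in caller_concordance(calls) if c[2] == n_callers]


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)
```

A singleton call (support of 1) is dropped by `venn_consensus` even if it is real, which is the failure mode the abstract describes.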



2020
Author(s): Liu Ye, Wu Yangming, Zheng Zexin, Zhou Tianliangwen

Background: Copy number variants (CNVs) are widespread among human genes, causing Mendelian or sporadic traits or associating with complex diseases. Several tools have been developed to assess CNVs from next-generation sequencing (NGS) data using a read-depth (RD) strategy. However, maintaining high sensitivity and specificity remains challenging. Here, we present T-CNV, a novel, powerful, user-friendly and open-access tool for CNV detection and visualization in targeted NGS panels.

Results: T-CNV consists of a primary CNV detection step and a CNV candidate confirmation step. After computing the log2 ratio of normalized read depth between the tumor and the normal/control sample, T-CNV confirms each possible CNV candidate using a bins method, a Gaussian Mixture Model (GMM) clustering approach and a window-sliding method. We benchmarked its performance on an MLPA-validated dataset. Compared with three other advanced tools, T-CNV showed excellent performance on the MLPA-validated dataset, with 95.42% sensitivity, 99.93% specificity and 93.63% positive predictive value, while achieving satisfactory performance in a simulation study (sensitivity 65.95%, positive predictive value 88.71% at 100X coverage).

Conclusions: T-CNV is a novel and robust tool for CNV detection and visualization in targeted NGS panels that determines possible CNV candidates and confirms them with three different methods. It is publicly available at https://github.com/Top-Gene/T-CNV.
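
The first detection step can be sketched in Python as a normalized log2 depth ratio followed by a sliding-window check, one of the three confirmation methods listed. The window size and the gain/loss thresholds are hypothetical placeholders, bins are assumed to have nonzero depth in both samples, and the bins method and GMM clustering are not shown; this is an illustration, not T-CNV's implementation.

```python
import math
from statistics import mean


def normalized_log2_ratio(tumor, normal):
    """Per-bin log2 of (tumor depth / tumor total) over (normal depth / normal total)."""
    t_total, n_total = sum(tumor), sum(normal)
    return [math.log2((t / t_total) / (n / n_total)) for t, n in zip(tumor, normal)]


def sliding_window_calls(ratios, window=3, gain=0.4, loss=-0.6):
    """Flag windows whose mean log2 ratio crosses a (hypothetical) gain or loss
    threshold, merging overlapping windows of the same kind into one call."""
    calls = []
    for start in range(len(ratios) - window + 1):
        m = mean(ratios[start:start + window])
        kind = "gain" if m >= gain else "loss" if m <= loss else None
        if kind is None:
            continue
        if calls and calls[-1][2] == kind and start <= calls[-1][1]:
            # Extend the previous call instead of starting a new one
            calls[-1] = (calls[-1][0], start + window, kind)
        else:
            calls.append((start, start + window, kind))
    return calls
```

For tumor depths `[100, 100, 100, 50, 50, 50, 100, 100]` against a flat normal of 100 per bin, `sliding_window_calls` reports a single loss spanning bins 3 to 6 (half-open).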



2018
Author(s): Eric Talevich, A. Hunter Shain

RNA sequencing is most commonly used to measure gene expression, but it is also possible to extract genotypic information from RNA-sequencing data. Point mutations and translocations can be detected when they occur in expressed genes; however, there are few software solutions to infer copy number information from RNA-sequencing data. This is because a gene's expression is dictated by a number of variables, including, but not limited to, copy number variation. Here, we report new functionality within the software package CNVkit that enables copy number inference from RNA-sequencing data. First, CNVkit removes technical variation in gene expression associated with GC content and transcript length. Next, CNVkit assigns each transcript a weight, dictated by several variables, with the net effect of preferentially inferring copy number from highly and stably expressed genes. We benchmarked our approach on 105 melanomas from The Cancer Genome Atlas project and observed a high degree of concordance (R = 0.739) between our estimates and those from array comparative genomic hybridization (aCGH) on the same samples. After initial configuration, the software requires few inputs, can process a batch of up to 100 samples in less than ten minutes, and can be used in conjunction with pre-existing features of CNVkit, including its visualization tools. Overall, we present a rapid, user-friendly software solution to infer copy number information from gene expression data.
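
The weighting step can be illustrated with a minimal Python sketch that favors highly and stably expressed transcripts when averaging per-transcript log2 ratios. The weight formula (mean log2 expression across reference samples divided by one plus its variance) is a hypothetical choice for illustration only; the abstract does not specify CNVkit's weighting variables, and the GC-content and transcript-length corrections are omitted. Transcripts are assumed to have nonzero reference expression so the weights do not sum to zero.

```python
import math
from statistics import mean, pvariance


def transcript_weight(ref_expression):
    """Hypothetical weight: grows with mean log2 expression across reference
    samples and shrinks as that expression becomes more variable."""
    logs = [math.log2(1.0 + x) for x in ref_expression]
    return mean(logs) / (1.0 + pvariance(logs))


def weighted_log2(log2_ratios, ref_expressions):
    """Weighted mean of per-transcript log2 ratios within one gene set or segment."""
    weights = [transcript_weight(r) for r in ref_expressions]
    return sum(w * x for w, x in zip(weights, log2_ratios)) / sum(weights)
```

A stably, highly expressed transcript dominates the estimate: with ratios `[1.0, -1.0]` and reference expression `[[1000] * 3, [1] * 3]`, the weighted log2 ratio lands near +0.8 rather than at the unweighted 0.0.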





2021
Author(s): Zhendong Zhang, Yongzhuang Liu, Gaoyang Li, Yadong Wang


2001
Vol 22 (3), pp. 159-163
Author(s): Kowan J. Jee, Young Tak Kim, Kyu Rae Kim, Yan Aalto, Sakari Knuutila

DNA copy number changes were studied by comparative genomic hybridization in 10 tumor specimens of squamous cell carcinoma of the cervix obtained from Korean patients. DNA was extracted from paraffin-embedded sections after non-malignant cells were removed by microdissection. Copy number changes were found in 8 of 10 tumors. The most frequent changes were gains of chromosome 19 (n=6) and losses on chromosomes 4 (n=4), 5 (n=3), and 3p (n=3). A novel finding was amplification of 9p21-pter in 2 cases. Gains of 1, 3q, 5p, 6p, 8q, 16p, 17, and 20q and losses of 2q, 6q, 8p, 9q, 10p, 11, 13, 16q, and 18q were each observed in at least one case.



2014
Author(s): Sean D Smith, Joseph K Kawash, Andrey Grigoriev

Amplifications or deletions of genome segments, known as copy number variants (CNVs), have been associated with many diseases. Read depth (RD) analysis of next-generation sequencing (NGS) data is an essential method for detecting CNVs. However, genome read coverage is frequently distorted by various biases of NGS platforms, which reduce the predictive capability of existing approaches. Additionally, the use of read depth tools has been somewhat hindered by imprecise breakpoint identification. We developed GROM-RD, an algorithm that analyzes multiple biases in read coverage to detect CNVs in NGS data. We found non-uniform variance across distinct GC regions after applying existing GC bias correction methods and developed a novel approach to normalize this variance. Although complex and repetitive genome segments complicate CNV detection, GROM-RD adjusts for repeat bias and uses a two-pipeline masking approach to detect CNVs in complex and repetitive segments while improving sensitivity in less complicated regions. To overcome a typical weakness of RD methods, GROM-RD searches for CNVs using size-varying overlapping windows to improve breakpoint resolution. We compared our method with two widely used read-depth programs, CNVnator and RDXplorer, and observed improved CNV detection and breakpoint accuracy for GROM-RD. GROM-RD is available at http://grigoriev.rutgers.edu/software/
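
One simple way to equalize variance across GC strata is to convert depths to z-scores within each stratum, shown below as a Python sketch. This is an assumed stand-in for illustration: the abstract does not describe GROM-RD's actual variance normalization, the stratum count is hypothetical, and depths are assumed to be already corrected for the GC-dependent mean.

```python
from statistics import mean, pstdev


def gc_stratified_zscores(depths, gc, n_strata=10):
    """Convert bin depths to z-scores within GC strata so that every stratum
    ends up with zero mean and unit variance (hypothetical stratification)."""
    strata = [min(int(g * n_strata), n_strata - 1) for g in gc]
    groups = {}
    for s, d in zip(strata, depths):
        groups.setdefault(s, []).append(d)
    # Mean and population standard deviation of depth within each stratum
    stats = {s: (mean(v), pstdev(v)) for s, v in groups.items()}
    z = []
    for d, s in zip(depths, strata):
        m, sd = stats[s]
        # A stratum with no spread carries no deviation information
        z.append((d - m) / sd if sd > 0 else 0.0)
    return z
```

Bins from a low-GC stratum with depths 10/20/30 and a high-GC stratum with depths 100/200/300 map to the same z-scores, even though their raw spreads differ tenfold.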


