A machine-learning approach for accurate detection of copy-number variants from exome sequencing

2018 ◽  
Author(s):  
Vijay Kumar Pounraja ◽  
Gopal Jayakar ◽  
Matthew Jensen ◽  
Neil Kelkar ◽  
Santhosh Girirajan

ABSTRACT Copy-number variants (CNVs) are a major cause of several genetic disorders, making their detection an essential component of genetic analysis pipelines. Current methods for detecting CNVs from exome sequencing data are limited by high false positive rates and low concordance due to the inherent biases of individual algorithms. To overcome these issues, calls generated by two or more algorithms are often intersected using Venn-diagram approaches to identify “high-confidence” CNVs. However, this approach is inadequate, as it misses potentially true calls that do not have consensus from multiple callers. Here, we present CN-Learn, a machine-learning framework (https://github.com/girirajanlab/CN_Learn) that integrates calls from multiple CNV detection algorithms and learns to accurately identify true CNVs using caller-specific and genomic features from a small subset of validated CNVs. Using CNVs predicted by four exome-based CNV callers (CANOES, CODEX, XHMM and CLAMMS) from 503 samples, we demonstrate that CN-Learn identifies true CNVs at higher precision (~90%) and recall (~85%) rates while maintaining robust performance even when trained with minimal data (~30 samples). CN-Learn recovers twice as many CNVs compared to individual callers or Venn diagram-based approaches, with features such as exome capture probe count, caller concordance and GC content providing the most discriminatory power. In fact, about 58% of all true CNVs recovered by CN-Learn were either singletons or calls that lacked support from at least one caller. Our study underscores the limitations of current approaches for CNV identification and provides an effective method that yields high-quality CNVs for application in clinical diagnostics.
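The limitation of Venn-diagram intersection described above can be illustrated with a minimal sketch. It counts how many callers support each call and shows that a "two or more callers" filter discards singletons outright. The 50% reciprocal-overlap threshold, the `Call` record and the coordinates are hypothetical choices made for illustration; they are not taken from CN-Learn.

```python
from typing import NamedTuple


class Call(NamedTuple):
    caller: str
    chrom: str
    start: int
    end: int
    kind: str  # "DEL" or "DUP"


def reciprocal_overlap(a: Call, b: Call) -> float:
    """Shared length divided by the longer call's length (0.0 if disjoint)."""
    if a.chrom != b.chrom or a.kind != b.kind:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return shared / max(a.end - a.start, b.end - b.start)


def concordance(call: Call, all_calls: list, min_overlap: float = 0.5) -> int:
    """Number of distinct callers supporting a call, including its own caller.

    min_overlap is an assumed threshold, not a published CN-Learn setting.
    """
    supporters = {call.caller}
    for other in all_calls:
        if other.caller != call.caller and reciprocal_overlap(call, other) >= min_overlap:
            supporters.add(other.caller)
    return len(supporters)


# Made-up example calls: two callers agree on chr1, one call on chr2 is a singleton.
calls = [
    Call("CANOES", "chr1", 1000, 5000, "DEL"),
    Call("XHMM", "chr1", 1200, 5100, "DEL"),
    Call("CLAMMS", "chr2", 20000, 26000, "DUP"),
]

support = {c: concordance(c, calls) for c in calls}
venn_kept = [c for c, n in support.items() if n >= 2]    # survives intersection
singletons = [c for c, n in support.items() if n == 1]   # dropped by intersection
```

A classifier-based approach would instead keep the concordance count as one feature among several (the abstract names probe count and GC content) and let a trained model decide whether a singleton is likely to be real.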

2020 ◽  
Author(s):  
Furkan Özden ◽  
Can Alkan ◽  
A. Ercüment Çiçek

Abstract Accurate and efficient detection of copy number variants (CNVs) is of critical importance due to their significant association with complex genetic diseases. Although algorithms working on whole genome sequencing (WGS) data provide stable results with mostly valid statistical assumptions, copy number detection on whole exome sequencing (WES) data has mostly been a losing game with extremely high false discovery rates. This is unfortunate, as WES data is cost-efficient, compact and relatively ubiquitous. The bottleneck is primarily due to the non-contiguous nature of the targeted capture: biases in targeted genomic hybridization, GC content, targeting probes, and sample batching during sequencing. Here, we present a novel deep learning model, DECoNT, which uses matched WES and WGS data and learns to correct the copy number variations reported by any off-the-shelf WES-based germline CNV caller. We train DECoNT on the 1000 Genomes Project data, and we show that we can efficiently triple the duplication call precision and double the deletion call precision of the state-of-the-art algorithms. We also show that the model consistently improves performance in a (i) sequencing technology, (ii) exome capture kit and (iii) CNV caller independent manner. Using DECoNT as a universal exome CNV call polisher has the potential to improve the reliability of germline CNV detection on WES data sets and expand its application. The code and the models are available at https://github.com/ciceklab/DECoNT.
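The precision figures quoted above are per-class quantities: of all calls a WES caller labels as duplications (or deletions), the fraction that the matched WGS data confirms. A minimal sketch of that calculation follows. The label strings and the toy data are hypothetical and are not drawn from the DECoNT evaluation.

```python
from typing import Optional


def precision(called: list, truth: list, label: str) -> Optional[float]:
    """Fraction of calls with the given label that match the truth label.

    Returns None when the caller made no calls of that class.
    """
    predicted = [i for i, c in enumerate(called) if c == label]
    if not predicted:
        return None
    correct = sum(1 for i in predicted if truth[i] == label)
    return correct / len(predicted)


# Toy data: labels from a WES-based caller and from matched WGS, one per region.
wes_calls = ["DUP", "DUP", "DUP", "DEL", "DEL", "NO-CALL"]
wgs_truth = ["DUP", "NO-CALL", "NO-CALL", "DEL", "NO-CALL", "NO-CALL"]

dup_precision = precision(wes_calls, wgs_truth, "DUP")
del_precision = precision(wes_calls, wgs_truth, "DEL")
```

Under this definition, a polishing step that relabels the false duplication and deletion calls as no-calls raises precision without requiring any new calls to be discovered.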


2014 ◽  
Vol 42 (12) ◽  
pp. e97-e97 ◽  
Author(s):  
Daniel Backenroth ◽  
Jason Homsy ◽  
Laura R. Murillo ◽  
Joe Glessner ◽  
Edwin Lin ◽  
...  

2021 ◽  
Vol 84 ◽  
pp. 129-134
Author(s):  
Michael Zech ◽  
Sylvia Boesch ◽  
Matej Škorvánek ◽  
Ján Necpál ◽  
Jana Švantnerová ◽  
...  

2013 ◽  
Vol 14 (10) ◽  
pp. R120 ◽  
Author(s):  
Alberto Magi ◽  
Lorenzo Tattini ◽  
Ingrid Cifola ◽  
Romina D’Aurizio ◽  
Matteo Benelli ◽  
...  

2014 ◽  
Author(s):  
Eric Talevich ◽  
A. Hunter Shain ◽  
Thomas Botton ◽  
Boris C. Bastian

Germline copy number variants (CNVs) and somatic copy number alterations (SCNAs) are of significant importance in syndromic conditions and cancer. Massively parallel sequencing is increasingly used to infer copy number information from variations in the read depth in sequencing data. However, this approach has limitations in the case of targeted re-sequencing, which leaves gaps in coverage between the regions chosen for enrichment and introduces biases related to the efficiency of target capture and library preparation. We present a method for copy number detection, implemented in the software package CNVkit, that uses both the targeted reads and the nonspecifically captured off-target reads to infer copy number evenly across the genome. This combination achieves both exon-level resolution in targeted regions and sufficient resolution in the larger intronic and intergenic regions to identify copy number changes. In particular, we successfully inferred copy number at a resolution equivalent to 100 kilobases genome-wide from a platform targeting as few as 293 genes. After normalizing read counts to a pooled reference, we evaluated and corrected for three sources of bias that explain most of the extraneous variability in the sequencing read depth: GC content, target footprint size and spacing, and repetitive sequences. We compared the performance of CNVkit to copy number changes identified by array comparative genomic hybridization. We packaged the components of CNVkit so that it is straightforward to use and provides visualizations, detailed reporting of significant features, and export options for compatibility with other software. Availability: http://github.com/etal/cnvkit
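The step of normalizing read counts to a pooled reference can be sketched as a per-bin log2 ratio followed by median centering. This is an illustrative simplification under stated assumptions, not CNVkit's implementation: the `log2_ratios` function, the pseudocount of 0.5 and the depth values are all hypothetical, and the GC, footprint and repeat corrections mentioned in the abstract are omitted.

```python
import math
from statistics import median


def log2_ratios(sample_depth, reference_depth, pseudocount=0.5):
    """Per-bin log2(sample / reference), centered on the sample's median ratio.

    The pseudocount (an assumed value) avoids division by zero in empty bins.
    """
    raw = [
        math.log2((s + pseudocount) / (r + pseudocount))
        for s, r in zip(sample_depth, reference_depth)
    ]
    center = median(raw)
    return [x - center for x in raw]


# Made-up depths for five bins; the third bin has roughly half the expected depth.
sample = [100, 102, 50, 98, 101]
reference = [100, 100, 100, 100, 100]

ratios = log2_ratios(sample, reference)
```

In a diploid region, a log2 ratio near -1 in one bin is consistent with a single-copy loss, while values near 0 indicate no change relative to the reference.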


2020 ◽  
Author(s):  
Taifu Wang ◽  
Jinghua Sun ◽  
Xiuqing Zhang ◽  
Wen-Jing Wang ◽  
Qing Zhou

Abstract Motivation: Copy-number variants (CNVs) are one of the major causes of genetic disorders. However, current methods for CNV calling have high false-positive rates and low concordance, and few of them can accurately genotype CNVs. Results: Here we propose CNV-PG (CNV Predicting and Genotyping), a machine-learning framework for accurately predicting and genotyping CNVs from paired-end sequencing data. CNV-PG can efficiently remove false positive CNVs from existing CNV discovery algorithms, and integrate CNVs from multiple CNV callers into a unified call set with high genotyping accuracy. Availability: CNV-PG is available at https://github.com/wonderful1/CNV-PG

