A machine-learning approach for accurate detection of copy-number variants from exome sequencing

ABSTRACT: Copy-number variants (CNVs) are a major cause of several genetic disorders, making their detection an essential component of genetic analysis pipelines. Current methods for detecting CNVs from exome sequencing data are limited by high false positive rates and low concordance due to the inherent biases of individual algorithms. To overcome these issues, calls generated by two or more algorithms are often intersected using Venn-diagram approaches to identify “high-confidence” CNVs. However, this approach is inadequate, as it misses potentially true calls that do not have consensus from multiple callers. Here, we present CN-Learn, a machine-learning framework (https://github.com/girirajanlab/CN_Learn) that integrates calls from multiple CNV detection algorithms and learns to accurately identify true CNVs using caller-specific and genomic features from a small subset of validated CNVs. Using CNVs predicted by four exome-based CNV callers (CANOES, CODEX, XHMM and CLAMMS) from 503 samples, we demonstrate that CN-Learn identifies true CNVs at higher precision (~90%) and recall (~85%) rates while maintaining robust performance even when trained with minimal data (~30 samples). CN-Learn recovers twice as many CNVs compared to individual callers or Venn diagram-based approaches, with features such as exome capture probe count, caller concordance and GC content providing the most discriminatory power. In fact, about 58% of all true CNVs recovered by CN-Learn were either singletons or calls that lacked support from at least one caller. Our study underscores the limitations of current approaches for CNV identification and provides an effective method that yields high-quality CNVs for application in clinical diagnostics.
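The contrast the abstract draws between Venn-diagram consensus and singleton calls can be made concrete with a small sketch. The code below is illustrative only and is not taken from CN-Learn: the `Call` tuple layout, the reciprocal-overlap matching rule, the 0.5 threshold, and the toy coordinates are all assumptions. It counts how many callers support each call, then separates calls with support from two or more callers from singletons that a strict intersection would discard.

```python
from typing import Dict, List, Tuple

# Assumed layout: (chrom, start, end, cnv_type), half-open coordinates.
Call = Tuple[str, int, int, str]


def reciprocal_overlap(a: Call, b: Call) -> float:
    """Smaller of the two overlap fractions; 0.0 if chromosome or type differ."""
    if a[0] != b[0] or a[3] != b[3]:
        return 0.0
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return 0.0
    return min(shared / (a[2] - a[1]), shared / (b[2] - b[1]))


def concordance(call: Call, calls_by_caller: Dict[str, List[Call]],
                min_ro: float = 0.5) -> int:
    """Number of callers with at least one call matching `call` (assumed threshold)."""
    return sum(
        any(reciprocal_overlap(call, other) >= min_ro for other in calls)
        for calls in calls_by_caller.values()
    )


# Hypothetical toy calls, keyed by the four callers named in the abstract.
calls_by_caller = {
    "CANOES": [("chr1", 1000, 5000, "DEL")],
    "CODEX": [("chr1", 1200, 5200, "DEL")],
    "XHMM": [("chr2", 100, 900, "DUP")],
    "CLAMMS": [],
}

all_calls = [c for calls in calls_by_caller.values() for c in calls]
support = {c: concordance(c, calls_by_caller) for c in all_calls}

# A Venn-style filter keeps only calls seen by two or more callers.
venn_consensus = [c for c, n in support.items() if n >= 2]
# Singletons are dropped by that filter even when they are real.
singletons = [c for c, n in support.items() if n == 1]
```

In a learning-based setup, a count like `support[c]` would be one input feature among several rather than a hard filter, which is how a singleton can still be retained when its other features look like those of validated CNVs.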

Polishing Copy Number Variant Calls on Exome Sequencing Data via Deep Learning

10.1101/2020.05.09.086082 ◽

2020 ◽

Author(s):

Furkan Özden ◽

Can Alkan ◽

A. Ercüment Çiçek

Keyword(s):

Deep Learning ◽

Exome Sequencing ◽

Copy Number ◽

GC Content ◽

Copy Number Variant ◽

Copy Number Variations ◽

Exome Capture ◽

Sequencing Data ◽

Independent Manner ◽

Efficient Detection

Abstract: Accurate and efficient detection of copy number variants (CNVs) is of critical importance due to their significant association with complex genetic diseases. Although algorithms working on whole genome sequencing (WGS) data provide stable results with mostly valid statistical assumptions, copy number detection on whole exome sequencing (WES) data has mostly been a losing game with extremely high false discovery rates. This is unfortunate, as WES data is cost efficient, compact and relatively ubiquitous. The bottleneck is primarily due to the non-contiguous nature of the targeted capture: biases in targeted genomic hybridization, GC content, targeting probes, and sample batching during sequencing. Here, we present a novel deep learning model, DECoNT, which uses matched WES and WGS data and learns to correct the copy number variations reported by any off-the-shelf WES-based germline CNV caller. We train DECoNT on the 1000 Genomes Project data, and we show that we can efficiently triple the duplication call precision and double the deletion call precision of the state-of-the-art algorithms. We also show that the model consistently improves performance in a (i) sequencing technology, (ii) exome capture kit and (iii) CNV caller independent manner. Using DECoNT as a universal exome CNV call polisher has the potential to improve the reliability of germline CNV detection on WES data sets and to broaden its application. The code and the models are available at https://github.com/ciceklab/DECoNT.

Clinically significant exome-based copy number variants detected by re-evaluation of exome sequencing data

Dokuz Eylül Üniversitesi Tıp Fakültesi Dergisi ◽

10.5505/deutfd.2021.29053 ◽

2021 ◽

Vol 35 (1) ◽

pp. 1-11

Author(s):

Fatma Kurt Çolak

Keyword(s):

Exome Sequencing ◽

Copy Number ◽

Copy Number Variants ◽

Sequencing Data ◽

Exome Sequencing Data ◽

Clinically Significant

SA55 GENOME-WIDE ASSOCIATION OF COPY NUMBER VARIANTS FROM WHOLE-EXOME SEQUENCING DATA REVEALS AN ASSOCIATION BETWEEN EXTREMES IN WORKING MEMORY PERFORMANCE AND RARE CNVS

European Neuropsychopharmacology ◽

10.1016/j.euroneuro.2018.08.277 ◽

2019 ◽

Vol 29 ◽

pp. S1218

Author(s):

Angela Heck ◽

Annette Milnik ◽

Vanja Vukojevic ◽

Jana Petrovska ◽

Virginie Freytag ◽

...

Keyword(s):

Working Memory ◽

Exome Sequencing ◽

Copy Number ◽

Memory Performance ◽

Copy Number Variants ◽

Sequencing Data ◽

Rare CNVs ◽

Exome Sequencing Data ◽

Whole Exome ◽

Whole Exome Sequencing Data

Faculty Opinions recommendation of A machine-learning approach for accurate detection of copy number variants from exome sequencing.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.735919354.793574021 ◽

2020 ◽

Author(s):

Guy Rouleau

Keyword(s):

Machine Learning ◽

Exome Sequencing ◽

Copy Number ◽

Copy Number Variants ◽

Learning Approach ◽

Accurate Detection ◽

Machine Learning Approach

CANOES: detecting rare copy number variants from whole exome sequencing data

Nucleic Acids Research ◽

10.1093/nar/gku345 ◽

2014 ◽

Vol 42 (12) ◽

pp. e97-e97 ◽

Cited By ~ 62

Author(s):

Daniel Backenroth ◽

Jason Homsy ◽

Laura R. Murillo ◽

Joe Glessner ◽

Edwin Lin ◽

...

Keyword(s):

Exome Sequencing ◽

Whole Exome Sequencing ◽

Copy Number ◽

Copy Number Variants ◽

Sequencing Data ◽

Exome Sequencing Data ◽

Whole Exome ◽

Whole Exome Sequencing Data

Clinically relevant copy-number variants in exome sequencing data of patients with dystonia

Parkinsonism & Related Disorders ◽

10.1016/j.parkreldis.2021.02.013 ◽

2021 ◽

Vol 84 ◽

pp. 129-134

Author(s):

Michael Zech ◽

Sylvia Boesch ◽

Matej Škorvánek ◽

Ján Necpál ◽

Jana Švantnerová ◽

...

Keyword(s):

Exome Sequencing ◽

Copy Number ◽

Copy Number Variants ◽

Sequencing Data ◽

Exome Sequencing Data

EXCAVATOR: detecting copy number variants from whole-exome sequencing data

Genome Biology ◽

10.1186/gb-2013-14-10-r120 ◽

2013 ◽

Vol 14 (10) ◽

pp. R120 ◽

Cited By ~ 151

Author(s):

Alberto Magi ◽

Lorenzo Tattini ◽

Ingrid Cifola ◽

Romina D’Aurizio ◽

Matteo Benelli ◽

...

Keyword(s):

Exome Sequencing ◽

Whole Exome Sequencing ◽

Copy Number ◽

Copy Number Variants ◽

Sequencing Data ◽

Exome Sequencing Data ◽

Whole Exome ◽

Whole Exome Sequencing Data

CNVkit: Copy number detection and visualization for targeted sequencing using off-target reads

10.1101/010876 ◽

2014 ◽

Cited By ~ 6

Author(s):

Eric Talevich ◽

A. Hunter Shain ◽

Thomas Botton ◽

Boris C. Bastian

Keyword(s):

Copy Number ◽

Repetitive Sequences ◽

Copy Number Variants ◽

GC Content ◽

Read Depth ◽

Comparative Genomic ◽

Sequencing Data ◽

Intergenic Regions ◽

Copy Number Detection ◽

Copy Number Changes

Germline copy number variants (CNVs) and somatic copy number alterations (SCNAs) are of significant importance in syndromic conditions and cancer. Massively parallel sequencing is increasingly used to infer copy number information from variations in the read depth in sequencing data. However, this approach has limitations in the case of targeted re-sequencing, which leaves gaps in coverage between the regions chosen for enrichment and introduces biases related to the efficiency of target capture and library preparation. We present a method for copy number detection, implemented in the software package CNVkit, that uses both the targeted reads and the nonspecifically captured off-target reads to infer copy number evenly across the genome. This combination achieves both exon-level resolution in targeted regions and sufficient resolution in the larger intronic and intergenic regions to identify copy number changes. In particular, we successfully inferred copy number at the equivalent of 100-kilobase resolution genome-wide from a platform targeting as few as 293 genes. After normalizing read counts to a pooled reference, we evaluated and corrected for three sources of bias that explain most of the extraneous variability in the sequencing read depth: GC content, target footprint size and spacing, and repetitive sequences. We compared the performance of CNVkit to copy number changes identified by array comparative genomic hybridization. We packaged the components of CNVkit so that it is straightforward to use and provides visualizations, detailed reporting of significant features, and export options for compatibility with other software. Availability: http://github.com/etal/cnvkit
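The two steps named in the abstract, normalizing read depth to a pooled reference and correcting for GC content, can be sketched in a few lines. This is a simplified illustration and is not CNVkit's implementation: the stratified median-centering used here is a stand-in technique, and the function names, the ten GC strata, and the toy depths are all assumptions. Depths are assumed to be strictly positive.

```python
import math
from collections import defaultdict
from statistics import median
from typing import Dict, List


def log2_ratios(sample_depth: List[float], reference_depth: List[float]) -> List[float]:
    """Per-bin log2(sample / pooled reference), each profile scaled by its own median."""
    s_med = median(sample_depth)
    r_med = median(reference_depth)
    return [
        math.log2((s / s_med) / (r / r_med))
        for s, r in zip(sample_depth, reference_depth)
    ]


def gc_correct(ratios: List[float], gc: List[float], n_strata: int = 10) -> List[float]:
    """Subtract each GC stratum's median log2 ratio from the bins in that stratum."""
    def stratum(g: float) -> int:
        return min(int(g * n_strata), n_strata - 1)

    by_stratum: Dict[int, List[float]] = defaultdict(list)
    for r, g in zip(ratios, gc):
        by_stratum[stratum(g)].append(r)
    offsets = {k: median(v) for k, v in by_stratum.items()}
    return [r - offsets[stratum(g)] for r, g in zip(ratios, gc)]


# Hypothetical toy data: bin 2 has half the expected depth (a one-copy loss),
# and the two high-GC bins are uniformly under-covered (a capture bias).
sample = [100.0, 100.0, 50.0, 100.0, 80.0, 80.0]
reference = [100.0] * 6
gc = [0.4, 0.4, 0.4, 0.4, 0.6, 0.6]

raw = log2_ratios(sample, reference)
corrected = gc_correct(raw, gc)
```

On this toy input the under-covered high-GC bins are pulled back to a log2 ratio of 0, while the half-depth bin stays at -1, the value expected for a single-copy loss in a diploid region.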

CNV-PG: a machine-learning framework for accurate copy number variation predicting and genotyping

10.1101/2020.04.13.039016 ◽

2020 ◽

Author(s):

Taifu Wang ◽

Jinghua Sun ◽

Xiuqing Zhang ◽

Wen-Jing Wang ◽

Qing Zhou

Keyword(s):

Machine Learning ◽

False Positive ◽

Copy Number ◽

Genetic Disorders ◽

Copy Number Variants ◽

Sequencing Data ◽

Learning Framework ◽

Number Variation ◽

Paired End Sequencing ◽

Discovery Algorithms

Abstract. Motivation: Copy-number variants (CNVs) are one of the major causes of genetic disorders. However, current methods for CNV calling have high false-positive rates and low concordance, and few of them can accurately genotype CNVs. Results: Here we propose CNV-PG (CNV Predicting and Genotyping), a machine-learning framework for accurately predicting and genotyping CNVs from paired-end sequencing data. CNV-PG can efficiently remove false positive CNVs from existing CNV discovery algorithms, and integrate CNVs from multiple CNV callers into a unified call set with high genotyping accuracy. Availability: CNV-PG is available at https://github.com/wonderful1/CNV-PG
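As a rough picture of what integrating calls from several callers into a unified call set involves, the sketch below merges overlapping or abutting calls of the same type and records which callers support each merged region. It is not CNV-PG's method: the merge rule, the `merge_calls` name, the caller labels, and the coordinates are hypothetical, and the machine-learning filtering and genotyping steps described in the abstract are not modeled.

```python
from typing import Dict, List, Tuple

# Assumed layout: (chrom, start, end, cnv_type), half-open coordinates.
Call = Tuple[str, int, int, str]


def merge_calls(calls_by_caller: Dict[str, List[Call]]) -> List[dict]:
    """Merge same-type calls that overlap or abut, tracking the supporting callers."""
    # Sorting by (chrom, type, start) places candidates for merging next to each other.
    tagged = sorted(
        (chrom, cnv_type, start, end, caller)
        for caller, calls in calls_by_caller.items()
        for chrom, start, end, cnv_type in calls
    )
    merged: List[dict] = []
    for chrom, cnv_type, start, end, caller in tagged:
        last = merged[-1] if merged else None
        if (last is not None and last["chrom"] == chrom
                and last["type"] == cnv_type and start <= last["end"]):
            # Extend the current region and note the additional supporter.
            last["end"] = max(last["end"], end)
            last["callers"].add(caller)
        else:
            merged.append({"chrom": chrom, "type": cnv_type, "start": start,
                           "end": end, "callers": {caller}})
    return merged


# Hypothetical callers and coordinates.
calls_by_caller = {
    "callerA": [("chr1", 100, 500, "DEL"), ("chr1", 2000, 2500, "DUP")],
    "callerB": [("chr1", 400, 800, "DEL")],
    "callerC": [("chr1", 450, 600, "DUP")],
}

unified = merge_calls(calls_by_caller)
```

Note that the duplication from `callerC` overlaps the merged deletion in position but is kept separate because its type differs; a real pipeline would need an explicit policy for such conflicts.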
