CNV Detection from Exome Sequencing Data in Routine Diagnostics of Rare Genetic Disorders: Opportunities and Limitations

To assess the potential of detecting copy number variations (CNVs) directly from exome sequencing (ES) data in diagnostic settings, we developed a CNV-detection pipeline based on the ExomeDepth software and applied it to ES data from 450 individuals. Initially, only CNVs affecting genes in the requested diagnostic gene panels were scored and compared with arrayCGH results. Pathogenic CNVs were detected in 18 individuals. Most detected CNVs were larger than 400 kb (11/18), but three individuals had small CNVs affecting only one or a few exons, which were therefore not detectable by arrayCGH. Conversely, two pathogenic CNVs were initially missed because they affected genes not included in the gene panel originally analysed, and a third was missed because it lay in a poorly covered region. The overall combined diagnostic rate (SNVs + CNVs) in our cohort was 36%, with wide differences between clinical domains. We conclude that (1) the ES-based CNV pipeline efficiently detects both large and small pathogenic CNVs, (2) CNV detection relies on uniform sequencing and good coverage, and (3) in patients who remain unsolved after gene panel analysis, CNV analysis should be extended to all captured genes, as diagnostically relevant CNVs may occur anywhere in the genome.
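The pipeline described above builds on ExomeDepth, which compares each exon's read count in a test sample with a reference set of other samples. The sketch below illustrates only the underlying read-depth idea, not ExomeDepth itself (which fits a beta-binomial model and calls CNVs with a hidden Markov model). It computes per-exon observed/expected ratios against reference samples and applies hypothetical ratio cut-offs for deletions and duplications. A final filter keeps only calls in the requested panel genes, mirroring the first-pass restriction that caused two pathogenic CNVs to be missed initially. All names and thresholds are illustrative assumptions.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Exon:
    gene: str
    chrom: str
    start: int
    end: int


def exon_ratios(test_counts, reference_counts):
    """Observed/expected read-count ratio for each exon of a test sample.

    test_counts: per-exon read counts of the test sample.
    reference_counts: list of reference samples, each a list of per-exon counts.
    """
    test_total = sum(test_counts)
    ref_totals = [sum(sample) for sample in reference_counts]
    ratios = []
    for i, observed in enumerate(test_counts):
        # Share of each reference library that falls in this exon, scaled to
        # the test sample's library size, gives the expected count.
        shares = [sample[i] / total for sample, total in zip(reference_counts, ref_totals)]
        expected = median(shares) * test_total
        ratios.append(observed / expected if expected > 0 else float("nan"))
    return ratios


def call_exons(exons, ratios, del_max=0.7, dup_min=1.3):
    """Label exons whose ratio suggests a deletion (~0.5) or duplication (~1.5).

    Cut-offs are illustrative; NaN ratios (uncovered exons) are never called.
    """
    calls = []
    for exon, ratio in zip(exons, ratios):
        if ratio <= del_max:
            calls.append((exon, "deletion", ratio))
        elif ratio >= dup_min:
            calls.append((exon, "duplication", ratio))
    return calls


def restrict_to_panel(calls, panel_genes):
    """Keep only calls in the requested diagnostic gene panel."""
    return [call for call in calls if call[0].gene in panel_genes]


if __name__ == "__main__":
    exons = [Exon("A", "1", 100, 200), Exon("A", "1", 300, 400),
             Exon("B", "1", 500, 600), Exon("C", "2", 100, 200)]
    reference = [[100, 100, 100, 100]] * 3
    calls = call_exons(exons, exon_ratios([100, 50, 100, 150], reference))
    print([(e.gene, kind) for e, kind, _ in calls])
    print([(e.gene, kind) for e, kind, _ in restrict_to_panel(calls, {"A", "B"})])
```

In this toy example the duplication in gene C is detected genome-wide but disappears once calls are restricted to a panel containing only A and B. That is the scenario behind the recommendation to extend CNV analysis to all captured genes.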

CNV Detection from Circulating Tumor DNA in Late Stage Non-Small Cell Lung Cancer Patients

Genes, 2019, Vol 10 (11), pp. 926. DOI: 10.3390/genes10110926. Cited By ~ 4

Author(s): Hao Peng, Lan Lu, Zisong Zhou, Jian Liu, Dadong Zhang, ...

Keyword(s): Lung Cancer, Cancer Patients, Limit Of Detection, Estimation Method, Circulating Tumor DNA, Copy Number Variations, Gene Panel, Sequencing Data, Tumor DNA, CNV Detection

While methods for detecting SNVs and indels in circulating tumor DNA (ctDNA) with hybridization capture-based next-generation sequencing (NGS) are available, copy number variation (CNV) detection is more challenging. Here, we present a method enabling CNV detection from a 150-gene panel using a very low amount of ctDNA. First, we developed a read depth-based CNV estimation method that does not require a paired blood sample, and used cfDNA sequencing data from healthy people to build a panel of normals (PoN) model. Then, in silico and in vitro simulations were performed to define the limit of detection (LOD) for EGFR, ERBB2, and MET. Compared with WES results for 48 samples, the concordance rates for EGFR, ERBB2, and MET CNVs were 78%, 89.6%, and 92.4%, respectively. In another cohort of 5980 lung cancer ctDNA samples profiled with the 150-gene panel, we detected amplification of the three genes at population frequencies comparable to those of other cohorts. One lung adenocarcinoma patient with MET amplification detected by our method achieved a partial response to crizotinib. These findings show that our ctDNA CNV detection pipeline can detect CNVs with high specificity and concordance, enabling non-invasive CNV calling for cancer patients when tissue is not available.
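A panel-of-normals (PoN) approach without a paired blood sample can be sketched as follows, assuming per-target read depths are already available and non-zero. Each sample's depths are divided by its median target depth. The median is used rather than the total so that one strongly amplified gene does not depress every other target. The PoN baseline is the per-target median across healthy cfDNA samples. A gene is reported as amplified when the median log2 ratio of its targets reaches a threshold. The 0.4 cut-off and all function names are illustrative assumptions; the abstract does not describe the published model or its LOD-derived thresholds for EGFR, ERBB2, and MET.

```python
import math
from statistics import median


def median_normalize(depths):
    """Divide per-target depths by the sample's median target depth."""
    m = median(depths)
    return [d / m for d in depths]


def build_pon(normal_samples):
    """Panel of normals: per-target median of normalized depth in healthy cfDNA."""
    normalized = [median_normalize(sample) for sample in normal_samples]
    return [median(values) for values in zip(*normalized)]


def gene_log2_ratios(sample_depths, pon, target_genes):
    """Median log2(sample / PoN) over the targets of each gene.

    Assumes every target has non-zero depth in the sample and in the PoN.
    """
    per_gene = {}
    for gene, value, baseline in zip(target_genes, median_normalize(sample_depths), pon):
        per_gene.setdefault(gene, []).append(math.log2(value / baseline))
    return {gene: median(values) for gene, values in per_gene.items()}


def amplified_genes(gene_ratios, threshold=0.4):
    """Genes whose log2 ratio reaches the (illustrative) amplification cut-off."""
    return sorted(gene for gene, ratio in gene_ratios.items() if ratio >= threshold)
```

With two targets each for EGFR, MET, and ERBB2, a sample whose MET targets have three times the median depth yields a MET log2 ratio of log2(3) ≈ 1.58 and ratios of 0 for the other genes. Only MET is then reported.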

Variable Phenotypes of Epilepsy, Intellectual Disability, and Schizophrenia Caused by 12p13.33–p13.32 Terminal Microdeletion in a Korean Family: A Case Report and Literature Review

Genes, 2021, Vol 12 (7), pp. 1001. DOI: 10.3390/genes12071001

Author(s): Jiyoon Han, Joonhong Park

Keyword(s): Intellectual Disability, Exome Sequencing, Environmental Influence, Copy Number Variations, Genetic Modifiers, Sequencing Data, Exome Sequencing Data, Korean Family, Coverage Analysis, Patient Will

Simultaneous analysis of nucleotide changes and copy number variations (CNVs) based on exome sequencing data has been demonstrated to be a potential new first-tier diagnostic strategy for rare neuropsychiatric disorders. In this report, using depth-of-coverage analysis of exome sequencing data, we describe variable phenotypes of epilepsy, intellectual disability (ID), and schizophrenia caused by a 12p13.33–p13.32 terminal microdeletion in a Korean family. We hypothesized that, of the six candidate genes located in this region, CACNA1C and KDM5A are the best candidates for explaining epilepsy, ID, and schizophrenia, and may be responsible for the clinical features reported in cases with monosomy of the 12p13.33 subtelomeric region. Against the background of this microdeletion syndrome, which has been described in clinical cases with mild, moderate, and severe neurodevelopmental manifestations and impairments, clinicians may attempt to determine whether a patient will develop a more severe or a milder end-phenotype, which in turn determines disease prognosis. In our case, the 12p13.33–p13.32 terminal microdeletion may explain the variable expressivity within the same family. However, further comprehensive studies of larger cohorts with careful phenotyping across the lifespan are required to clarify the possible contribution of genetic modifiers and environmental influences to the expressivity of the 12p13.33 microdeletion and its associated characteristics.
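Depth-of-coverage analysis yields a per-exon observed/expected ratio. A heterozygous deletion such as the 12p13.33–p13.32 microdeletion then appears as a run of consecutive exons with ratios near 0.5. A minimal run-merging sketch follows, assuming targets sorted by chromosome and position. The thresholds and the minimum run length are illustrative assumptions, not the authors' procedure. So is the rule that a run starting at a chromosome's first captured target marks a candidate p-terminal deletion.

```python
def deleted_segments(targets, ratios, del_max=0.7, min_targets=3):
    """Merge consecutive low-ratio targets on one chromosome into deletion segments.

    targets: (chrom, start, end) tuples sorted by chromosome and position.
    ratios: per-target observed/expected depth ratios (about 0.5 for a
    heterozygous deletion). Thresholds are illustrative assumptions.
    """
    first_target = {}
    for i, (chrom, _, _) in enumerate(targets):
        first_target.setdefault(chrom, i)

    segments = []

    def close(run):
        if len(run) >= min_targets:
            chrom = targets[run[0]][0]
            segments.append({
                "chrom": chrom,
                "start": targets[run[0]][1],
                "end": targets[run[-1]][2],
                "n_targets": len(run),
                # Run begins at the chromosome's first captured target:
                # candidate p-terminal deletion (q-terminal check omitted).
                "p_terminal": run[0] == first_target[chrom],
            })

    run = []
    for i, (chrom, _, _) in enumerate(targets):
        low = ratios[i] <= del_max
        same_chrom = bool(run) and targets[run[-1]][0] == chrom
        if low and (not run or same_chrom):
            run.append(i)
        else:
            close(run)
            run = [i] if low else []
    close(run)
    return segments
```

Real data need tolerance for isolated noisy exons inside a deletion. An HMM or segmentation step usually provides that; this strict run rule does not.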

Copy number variant detection with low-coverage whole-genome sequencing is a viable replacement for the traditional array-CGH

2020. DOI: 10.1101/2020.09.07.20183665

Author(s): Marcel Kucharik, Jaroslav Budis, Michaela Hyblova, Gabriel Minarik, Tomas Szemes

Keyword(s): In Silico, Copy Number, Normal Population, Genetic Disorders, Prenatal Testing, In Silico Analysis, Copy Number Variant, Detection Algorithm, Copy Number Variations, CNV Detection

Copy number variations (CNVs) are a type of structural variant in which specific regions of DNA are deleted or duplicated, altering their copy number. CNVs contribute substantially to normal population variability; however, abnormal CNVs cause numerous genetic disorders. Several methods are currently used for CNV detection, ranging from conventional cytogenetic analysis through microarray-based methods (aCGH) to next-generation sequencing (NGS). We present GenomeScreen, an NGS-based CNV detection method built on a previously described CNV detection algorithm used for non-invasive prenatal testing (NIPT). We determined the theoretical limits of its accuracy and confirmed them with an extensive in silico study and already genotyped samples. Theoretically, at least 6M uniquely mapped reads are required to detect CNVs of 100 kilobases (kb) or more with high confidence (Z-score > 7). In practice, the in silico analysis showed that at least 8M reads are required to obtain >99% accuracy for 100 kb deviations. We compared GenomeScreen with an aCGH method currently used in diagnostic laboratories, which has a mean resolution of 200 kb. Both GenomeScreen and aCGH detected 59 deviations, and GenomeScreen additionally detected 134 other, usually smaller, variations. Moreover, the overall cost per sample is about 2-3 times lower for GenomeScreen.
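The theoretical read requirement can be approximated with a back-of-the-envelope calculation. The assumptions are ours, not the authors' derivation: reads fall uniformly over a 3 Gb genome, counts in a region are Poisson-distributed, and a single-copy change shifts the expected count by half. With 6M reads, a 100 kb region expects about 200 reads. A heterozygous deletion shifts this by 100, and the Poisson standard deviation is √200 ≈ 14.1, giving Z ≈ 7.07.

```python
import math

GENOME_LENGTH = 3.0e9  # assumed uniformly covered genome length in bp (simplification)


def expected_reads(total_reads, region_length, genome_length=GENOME_LENGTH):
    """Reads expected in a region if reads fall uniformly over the genome."""
    return total_reads * region_length / genome_length


def single_copy_z(total_reads, region_length, genome_length=GENOME_LENGTH):
    """Z-score of a heterozygous deletion (or single-copy gain) under Poisson noise.

    A one-copy change on a diploid background shifts the expected count by
    lambda / 2, while the Poisson standard deviation is sqrt(lambda).
    """
    lam = expected_reads(total_reads, region_length, genome_length)
    return (lam / 2) / math.sqrt(lam)


def min_reads_for_z(z, region_length, genome_length=GENOME_LENGTH):
    """Smallest total read count with single_copy_z >= z, from sqrt(lambda)/2 >= z."""
    lam_needed = 4 * z ** 2
    return lam_needed * genome_length / region_length


if __name__ == "__main__":
    print(round(single_copy_z(6e6, 100_000), 2))  # 7.07
    print(round(min_reads_for_z(7, 100_000)))     # 5880000
```

Solving √λ/2 ≥ 7 gives λ ≥ 196 reads in the region, or about 5.9M total reads for 100 kb, close to the 6M quoted above. Real data add GC bias, mappability gaps, and between-sample variance on top of Poisson noise. That is one plausible reason the in silico requirement (8M reads) is higher.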

CNV-P: a machine-learning framework for predicting high confident copy number variations

PeerJ, 2021, Vol 9, pp. e12564. DOI: 10.7717/peerj.12564

Author(s): Taifu Wang, Jinghua Sun, Xiuqing Zhang, Wen-Jing Wang, Qing Zhou

Keyword(s): Machine Learning, False Positive, Copy Number, Genetic Disorders, Genetic Diseases, Basic Research, Read Depth, Copy Number Variations, Sequencing Data, Learning Framework

Background: Copy-number variants (CNVs) have been recognized as one of the major causes of genetic disorders. Reliable detection of CNVs from genome sequencing data is in strong demand for disease research. However, current CNV detection software has high false-positive rates and needs further improvement. Methods: Here, we propose CNV prediction (CNV-P), a novel post-processing approach: a machine-learning framework that efficiently removes false-positive fragments from the results of CNV detection tools. A series of CNV signals around the putative CNV fragments, such as read depth (RD), split reads (SR) and read pairs (RP), were defined as features to train a classifier. Results: Prediction results on several real biological datasets showed that our models could accurately classify CNVs with over 90% precision and 85% recall, greatly improving the performance of state-of-the-art algorithms. Furthermore, our results indicate that CNV-P is robust to different CNV sizes and sequencing platforms. Conclusions: Our framework for classifying high-confidence CNVs could improve both basic research and the clinical diagnosis of genetic diseases.
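The post-processing idea can be sketched with a plain logistic regression: train a classifier on evidence gathered around each candidate CNV, then discard calls it scores as false. The abstract neither names CNV-P's classifier nor defines its features precisely, so the following are all hypothetical: the three features (absolute log2 read-depth ratio, supporting split reads, supporting discordant read pairs), the training examples, and the 0.5 decision threshold. Precision is the share of retained calls that are true, and recall the share of true CNVs that are retained.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=5000):
    """Fit logistic-regression weights by batch gradient descent on cross-entropy."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model, X, threshold=0.5):
    """1 = keep the candidate CNV, 0 = discard it as a likely false positive."""
    w, b = model
    return [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold)
            for xi in X]


def precision_recall(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical features: [|log2 RD ratio|, split reads, discordant pairs]
    X = [[1.0, 5, 8], [0.9, 4, 6], [1.1, 6, 7], [0.8, 3, 5],
         [0.3, 0, 1], [0.4, 1, 0], [0.2, 0, 0], [0.35, 1, 1]]
    y = [1, 1, 1, 1, 0, 0, 0, 0]
    model = train_logistic(X, y)
    print(precision_recall(y, predict(model, X)))
```

Any classifier that outputs a score per candidate fits this pattern. Logistic regression is chosen here only because it is short to write without external libraries.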

Hadoop-CNV-RF: a clinically validated and scalable copy number variation detection tool for next-generation sequencing data

2020. DOI: 10.21203/rs.2.22176/v1

Author(s): Getiria Onsongo, Ham Ching Lam, Matthew Bower, Bharat Thyagarajan

Keyword(s): Copy Number, Copy Number Variations, Next Generation Sequencing Data, Sequencing Data, Large Gene, Data Framework, Number Variation, Targeted Capture, Objective Detection, Gene Panels

Objective: Detection of small copy number variations (CNVs) in clinically relevant genes is routinely used to aid diagnosis. We recently developed a tool, CNV-RF, capable of detecting small clinically relevant CNVs. CNV-RF was designed for small gene panels and did not scale well to large ones: on large gene panels it routinely failed due to memory limitations, and when it succeeded, a single analysis took about 2 days, making it impractical for routine analysis of large gene panels. A reliable clinical CNV detection tool that scales well to large gene panels is needed. Results: We have developed Hadoop-CNV-RF, a scalable implementation of CNV-RF. Hadoop-CNV-RF is a freely available tool capable of rapidly analyzing large gene panels. It takes advantage of Hadoop, a big data framework developed to analyze large amounts of data. Preliminary results show that it reduces analysis time from about 2 days to less than 4 hours and scales seamlessly to large gene panels. Hadoop-CNV-RF has been clinically validated for targeted capture data and is currently used in a CLIA molecular diagnostics laboratory. The tool and its usage instructions are publicly available at https://github.com/getiria-onsongo/hadoop-cnvrf-public
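Hadoop's scalability comes from the MapReduce pattern. Input splits are processed independently in a map phase, intermediate key/value pairs are grouped by key (shuffle), and each group is reduced separately, so the work can be spread across many nodes. The pure-Python sketch below illustrates that pattern for aggregating per-target read counts into per-gene totals. It does not use Hadoop's API, and the thread pool merely stands in for a cluster. The target-to-gene mapping and the input format are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from itertools import chain


def map_split(split, target_to_gene):
    """Map: emit (gene, read_count) for each known target in one input split."""
    return [(target_to_gene[target], count)
            for target, count in split if target in target_to_gene]


def shuffle(pairs):
    """Shuffle: group intermediate values by key (gene)."""
    grouped = defaultdict(list)
    for gene, count in pairs:
        grouped[gene].append(count)
    return grouped


def reduce_gene(gene, counts):
    """Reduce: combine all values for one key."""
    return gene, sum(counts)


def gene_read_counts(splits, target_to_gene, workers=4):
    """Run map in parallel over splits, then shuffle and reduce."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = pool.map(partial(map_split, target_to_gene=target_to_gene), splits)
        grouped = shuffle(chain.from_iterable(mapped))
    return dict(reduce_gene(gene, counts) for gene, counts in grouped.items())
```

Because each split and each gene is handled independently, adding genes to the panel adds work that can be parallelized rather than growing a single in-memory job. This is the property that lets large gene panels scale.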

Detecting copy-number variations in whole-exome sequencing data using the eXome Hidden Markov Model: an ‘exome-first’ approach

Journal of Human Genetics, 2015, Vol 60 (4), pp. 175-182. DOI: 10.1038/jhg.2014.124. Cited By ~ 36

Author(s): Satoko Miyatake, Eriko Koshimizu, Atsushi Fujita, Ryoko Fukai, Eri Imagawa, ...

Keyword(s): Markov Model, Hidden Markov Model, Exome Sequencing, Copy Number, Hidden Markov, Copy Number Variations, Sequencing Data, Exome Sequencing Data, Whole Exome, Whole Exome Sequencing Data

DeNovoCNN: A deep learning approach to de novo variant calling in next generation sequencing data

2021. DOI: 10.1101/2021.09.20.461072

Author(s): Gelana Khazeeva, Karolis Sablauskas, Bart van der Sanden, Wouter Steyaert, Michael Kwint, ...

Keyword(s): Exome Sequencing, De Novo, Genetic Disorders, Variant Calling, Next Generation Sequencing Data, Sequencing Data, Accurate Identification, Whole Exome, De Novo Variant, Generation Sequencing

De novo mutations (DNMs) are an important cause of genetic disorders. Accurate identification of DNMs from sequencing data is therefore fundamental to rare disease research and diagnostics. Unfortunately, identifying reliable DNMs remains a major challenge due to sequencing errors, uneven coverage, and mapping artifacts. Here, we developed DeNovoCNN, a deep convolutional neural network (CNN) DNM caller that encodes the alignment of sequence reads for a trio as 160×164 resolution images. DeNovoCNN was trained on DNMs from whole exome sequencing (WES) of 2003 trios, achieving on average 99.2% recall and 93.8% precision. We find that DeNovoCNN has higher recall (sensitivity) and precision than existing de novo calling approaches (GATK, DeNovoGear, Samtools) on the Genome in a Bottle reference dataset. Sanger validation of DNMs called in both exome and genome datasets confirms that DeNovoCNN outperforms existing methods. Most importantly, we show that DeNovoCNN is robust to different exome sequencing and analysis approaches, so it can be applied to other datasets. DeNovoCNN is freely available and can be run on existing alignment (BAM/CRAM) and variant calling (VCF) files from WES and WGS without the need for variant recalling.
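For context, the simplest DNM candidate rule works on genotypes alone. It flags a site where the child carries an alternate allele found in neither parent, provided all three samples are adequately covered. The sketch below implements that heuristic. It is not DeNovoCNN, the `Call` type is hypothetical, and the 20x depth cut-off is an arbitrary assumption. Because the rule trusts the genotype calls, a sequencing error in the child or a missed heterozygous call in a parent turns directly into a false positive. Those are the artifacts that read-level, image-based callers aim to handle.

```python
from dataclasses import dataclass


@dataclass
class Call:
    genotype: tuple  # allele indices at one site, e.g. (0, 1) for heterozygous
    depth: int       # read depth at the site


def is_candidate_dnm(child, mother, father, min_depth=20):
    """Flag sites where the child carries an ALT allele absent from both parents.

    Only novel alleles are flagged; other Mendelian inconsistencies (e.g. a
    hom-ALT child of a het parent and a hom-REF parent) are ignored here.
    """
    if min(child.depth, mother.depth, father.depth) < min_depth:
        return False  # too little coverage in the trio to trust the genotypes
    parental_alleles = set(mother.genotype) | set(father.genotype)
    child_alt = {allele for allele in child.genotype if allele != 0}
    return bool(child_alt - parental_alleles)
```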

Detection of Chromosomal Aberrations in Acute Myeloid Leukemia By Copy Number Alteration Analysis of Exome Sequencing Data

Blood, 2015, Vol 126 (23), pp. 3859-3859. DOI: 10.1182/blood.v126.23.3859.3859

Author(s): Sebastian Vosberg, Luise Hartmann, Stephanie Schneider, Klaus H. Metzeler, Bianka Ksienzyk, ...

Keyword(s): Exome Sequencing, Chromosomal Aberrations, Copy Number, Size Estimation, Sequencing Data, Trisomy 8, Clone Size, Routine Diagnostics, Chromosome 5q, Blast Count

Exome sequencing is widely used and well established for detecting tumor-specific sequence variants such as point mutations and small insertions/deletions. Beyond single-nucleotide resolution, sequencing data can also be used to identify changes in sequence coverage between samples, enabling the detection of copy number alterations (CNAs). Somatic CNAs represent gains or losses of genomic material in tumor cells, such as aneuploidies (e.g. monosomies and trisomies), duplications, or deletions. To test the feasibility of somatic CNA detection from exome data, we analyzed 13 acute myeloid leukemia (AML) patients with known cytogenetic alterations detected at diagnosis (n=8) and/or at relapse (n=11). Corresponding remission exomes from all patients were available as germline controls, resulting in 19 comparisons of paired leukemia and remission exome data sets. Exome sequencing was performed on a HiSeq 2500 instrument (Illumina) with a mean target coverage of >100x. Exons with divergent coverage were detected using a linear regression model on mean exon coverage, and CNAs were called by an exact segmentation algorithm (Rigaill et al. 2012, Bioinformatics). For all samples, cytogenetic information was available from either routine chromosomal analysis or fluorescence in situ hybridization (FISH). Blast counts were known for all but one AML sample (n=19). Copy number-neutral cytogenetic alterations such as balanced translocations were excluded from the comparative analysis. By CNA analysis of exomes, we detected chromosomal aberrations consistent with routine cytogenetics in 18 of 19 (95%) AML samples. In particular, we confirmed 2 of 2 monosomies (both -7) and 9 of 10 trisomies (+4, n=1; +8, n=8; +21, n=1), e.g. trisomy 8 in Figure 1A. Partial amplifications or deletions of chromosomes were confirmed in 10 of 10 AML samples (dup(1q), n=3; dup(8q), n=1; del(5q), n=3; del(17p), n=1; del(20q), n=2), e.g. del(5q) in Figure 1B.
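The segmentation step can be illustrated with exact penalized least-squares segmentation by dynamic programming (optimal partitioning). This is a simple O(n²) algorithm chosen for illustration, not the method of the cited Rigaill et al. (2012) paper. It does return the exact optimum of its own cost: the sum of squared deviations from each segment's mean, plus a fixed penalty per segment. The input is assumed to be per-exon log2 coverage ratios (leukemia vs. remission) in genomic order. The penalty value is an assumption to be tuned on real data.

```python
def segment(values, penalty):
    """Exact penalized least-squares segmentation (optimal partitioning).

    Minimises the sum over segments of squared deviations from the segment
    mean plus `penalty` per segment. Returns (start, end_exclusive, mean) tuples.
    """
    n = len(values)
    # Prefix sums of values and squares give each segment's cost in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(values):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(i, j):  # squared error of values[i:j] around its mean
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    best = [0.0] + [float("inf")] * n   # best[j]: optimal cost of values[:j]
    last_cut = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            candidate = best[i] + cost(i, j) + penalty
            if candidate < best[j]:
                best[j], last_cut[j] = candidate, i

    # Backtrack from the end to recover segment boundaries.
    segments, j = [], n
    while j > 0:
        i = last_cut[j]
        segments.append((i, j, (s1[j] - s1[i]) / (j - i)))
        j = i
    return segments[::-1]
```

On noise-free input of ten exons at log2 ratio 0, ten at 0.58 (roughly a trisomy, log2 1.5), and ten at 0 again, a penalty of 0.1 recovers exactly those three segments. Larger penalties favour fewer, longer segments.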
In the one case with inconsistent findings between exome and cytogenetic data, a small subclone harbored the alteration, which was described in only 4 of 21 metaphases (19%). To assess the specificity of our CNA approach, we analyzed the exomes of 44 cytogenetically normal (CN) AML samples. We did not detect any CNAs larger than 5 Mb in the vast majority of these samples (43/44, 98%); only one large CNA, indicating a trisomy 8, was detected. Clone size estimates from CNA analysis of exomes were highly correlated with those from cytogenetics and cytomorphology (p=0.0076, Fisher's exact test, Figure 1C). In CNA analysis of exomes, we defined the clone size based on the coverage ratio. Clone size estimation by cytogenetics and cytomorphology was performed by calculating the mean of the blast count and the abnormal metaphase/interphase count. Of note, clones estimated by CNA analysis of exomes tended to be slightly larger. This may result from purification by Ficoll gradient centrifugation prior to DNA extraction for sequencing, and/or from the fraction of cells analyzed by cytogenetics not accurately representing the true size of the malignant clone because of differences in mitotic index between normal and malignant cells. Overall, there was a high correlation between our CNA analysis of exome sequencing data and routine cytogenetics, including limitations in the detection of small subclones. Our results confirm that high-throughput sequencing is a versatile, valuable, and robust method for detecting chromosomal changes resulting in copy number alterations in AML, with high specificity and sensitivity (98% and 95%, respectively).

Figure 1. (A) Detection of trisomy 8 with an estimated clone size of 100%. (B) Detection of deletion on chromosome 5q with an estimated clone size of 90%. (C) Correlation of clone size estimation by routine diagnostics and exome sequencing (p=0.0076).
Figure 2.

Disclosures: No relevant conflicts of interest to declare.
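The abstract's own clone-size formula is not reproduced here. As an assumption, a common mixture model can stand in for it. For a single-copy change present in a fraction c of cells on a diploid background, the leukemia/remission coverage ratio is r = 1 + c/2 for a gain and r = 1 - c/2 for a loss, hence c = 2|r - 1|. Under that model, the trisomy 8 in Figure 1A (clone size 100%) corresponds to r = 1.5, and the del(5q) in Figure 1B (90%) to r = 0.55.

```python
def clone_fraction(coverage_ratio, copy_change=1, ploidy=2):
    """Fraction of cells carrying a copy-number change, from a coverage ratio.

    Assumed mixture model: ratio = 1 + c * copy_change / ploidy, where
    copy_change is +1 for a single-copy gain (e.g. trisomy 8) and -1 for a
    single-copy loss (e.g. del(5q)). The result is clipped to [0, 1].
    """
    c = (coverage_ratio - 1) * ploidy / copy_change
    return min(max(c, 0.0), 1.0)
```

Clipping matters in practice. Noise, or a change of more than one copy, can push the naive estimate above 100%, and a ratio on the wrong side of 1 for the assumed direction yields 0.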

DeAnnCNV: a tool for online detection and annotation of copy number variations from whole-exome sequencing data

Nucleic Acids Research, 2015, Vol 43 (W1), pp. W289-W294. DOI: 10.1093/nar/gkv556. Cited By ~ 13

Author(s): Yuanwei Zhang, Zhenhua Yu, Rongjun Ban, Huan Zhang, Furhan Iqbal, ...

Keyword(s): Exome Sequencing, Whole Exome Sequencing, Copy Number, Copy Number Variations, Online Detection, Sequencing Data, Exome Sequencing Data, Whole Exome, Whole Exome Sequencing Data

Benchmarking germline CNV calling tools from exome sequencing data

Scientific Reports, 2021, Vol 11 (1). DOI: 10.1038/s41598-021-93878-2

Author(s): Veronika Gordeeva, Elena Sharova, Konstantin Babalyan, Rinat Sultanov, Vadim M. Govorun, ...

Keyword(s): Exome Sequencing, False Positive, False Positive Rate, Reference Sample, Length Distribution, Internal Standard, Copy Number Variations, Attractive Alternative, Sequencing Data, Wide Range

Whole-exome sequencing is an attractive alternative to microarray analysis because of its low cost and its potential ability to detect copy number variations (CNVs) of various sizes (from 1–2 exons to several Mb). Previous comparisons of the most popular CNV calling tools showed a high proportion of false-positive calls. Moreover, due to the lack of a gold-standard CNV set, the results are limited and not comparable. Here, we aimed to perform a comprehensive analysis of currently available tools capable of germline CNV calling, using a single CNV standard and reference sample set. By compiling variants from previous studies with a Bayesian estimation approach, we constructed an internal standard for the NA12878 sample (pilot National Institute of Standards and Technology Reference Material) including 110,050 CNV or non-CNV exons. The standard was used to evaluate the performance of 16 germline CNV calling tools on the NA12878 sample, with 10 correlated exomes as a reference set, with respect to length distribution, concordance, and efficiency. Each algorithm had a certain range of detected lengths and showed low concordance with other tools. Most tools focus on detecting a limited number of CNVs one to seven exons long, with a false-positive rate below 50%. EXCAVATOR2, exomeCopy, and FishingCNV aimed to detect a wide range of variations but showed low precision. In a unified comparison, the tools were not equivalent. The analysis allows choosing the algorithms, or ensembles of algorithms, most suitable for a specific goal, e.g. population studies or medical genetics.
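An exon-level evaluation against a truth standard like the NA12878 set can be sketched with set operations, assuming calls and truth sets are given as exon identifiers and the two truth sets are disjoint. Called exons outside both the CNV and non-CNV truth sets are treated as not assessable. This mirrors a standard that labels individual exons as CNV or non-CNV. The abstract does not give the study's exact metric definitions. The false-discovery proportion and the Jaccard index used here for between-tool concordance are illustrative choices.

```python
def evaluate_exons(called, truth_cnv, truth_non_cnv):
    """Exon-level performance of one tool against a CNV/non-CNV truth standard.

    All arguments are sets of exon identifiers. Called exons that appear in
    neither truth set cannot be assessed and are ignored.
    """
    tp = len(called & truth_cnv)
    fp = len(called & truth_non_cnv)
    fn = len(truth_cnv - called)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "tp": tp, "fp": fp, "fn": fn,
        "precision": precision,
        "recall": recall,
        # Share of assessable calls that are false.
        "false_discovery": fp / (tp + fp) if tp + fp else 0.0,
    }


def jaccard(calls_a, calls_b):
    """Concordance between two tools as the Jaccard index of their called exons."""
    union = calls_a | calls_b
    return len(calls_a & calls_b) / len(union) if union else 1.0
```

Scoring every tool against the same truth set and reference samples is what makes results comparable across tools. Low pairwise Jaccard values are one simple way to quantify the low concordance reported above.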