Algorithmic improvements for discovery of germline copy number variants in next-generation sequencing data

Abstract. Copy number variants (CNVs) play a significant role in human heredity and disease; however, sensitive and specific characterization of CNVs from next-generation sequencing (NGS) data has remained challenging. Detection is especially problematic for hybridization-capture data, in which read counts are the sole source of copy number information. We describe two algorithmic adaptations that improve CNV detection accuracy in a Hidden Markov Model (HMM) context. First, we present a method for computing target- and copy number state-specific emission distributions. Second, we demonstrate that the Pointwise Maximum a posteriori (PMAP) HMM decoding procedure yields improved sensitivity for small CNV calls compared to the more common Viterbi HMM decoder. We develop a prototype implementation, called Cobalt, and compare it to other CNV detection tools using sets of simulated and previously detected CNVs with sizes spanning a single exon up to a full chromosome. In both the simulation and previously detected CNV studies, Cobalt shows similar sensitivity but significantly improved positive predictive value (PPV) compared to other callers. Overall sensitivity is 80%-90% for deletion CNVs spanning 1-4 targets and 90%-100% for larger deletion events, while sensitivity is somewhat lower for small duplication CNVs. At similar sensitivity, Cobalt typically makes 5X fewer total calls than other callers.
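
The difference between the two decoders named in the abstract can be sketched in a few lines. This is a minimal illustration, not Cobalt's implementation: the three states, the Gaussian emission model, the function names, and every numeric parameter below are assumptions chosen for the example. Viterbi returns the single most probable state path, whereas PMAP picks, at each target independently, the state with the highest posterior probability from the forward-backward algorithm.

```python
import math

STATES = ("deletion", "normal", "duplication")  # illustrative copy-number states

def gaussian_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def emission_matrix(depths, means, sds):
    """emis[t][k] = p(depth at target t | state k), with per-target, per-state parameters."""
    return [[gaussian_pdf(x, means[t][k], sds[t][k]) for k in range(len(STATES))]
            for t, x in enumerate(depths)]

def viterbi(init, trans, emis):
    """Single most probable state path; assumes strictly positive probabilities."""
    k_range = range(len(init))
    score = [math.log(init[k] * emis[0][k]) for k in k_range]
    backpointers = []
    for t in range(1, len(emis)):
        prev, score, ptr = score, [], []
        for k in k_range:
            best = max(k_range, key=lambda j: prev[j] + math.log(trans[j][k]))
            ptr.append(best)
            score.append(prev[best] + math.log(trans[best][k]) + math.log(emis[t][k]))
        backpointers.append(ptr)
    path = [max(k_range, key=lambda k: score[k])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return path[::-1]

def posteriors(init, trans, emis):
    """Scaled forward-backward: post[t][k] = p(state k at target t | all depths)."""
    n, k_range = len(emis), range(len(init))
    fwd, scale = [], []
    for t in range(n):
        if t == 0:
            row = [init[k] * emis[0][k] for k in k_range]
        else:
            row = [emis[t][k] * sum(fwd[-1][j] * trans[j][k] for j in k_range)
                   for k in k_range]
        total = sum(row)
        scale.append(total)
        fwd.append([v / total for v in row])
    bwd = [[1.0] * len(init) for _ in range(n)]
    for t in range(n - 2, -1, -1):
        bwd[t] = [sum(trans[k][j] * emis[t + 1][j] * bwd[t + 1][j] for j in k_range)
                  / scale[t + 1] for k in k_range]
    post = []
    for t in range(n):
        row = [fwd[t][k] * bwd[t][k] for k in k_range]
        total = sum(row)
        post.append([v / total for v in row])
    return post

def pmap_decode(post):
    """Pointwise MAP: the highest-posterior state at each target, chosen independently."""
    return [max(range(len(row)), key=row.__getitem__) for row in post]

# Made-up normalized depths for five targets; the third one is low.
depths = [1.02, 0.97, 0.62, 1.01, 0.99]
means = [[0.5, 1.0, 1.5]] * len(depths)    # hypothetical per-target state means
sds = [[0.15, 0.15, 0.15]] * len(depths)   # hypothetical per-target state SDs
init = [0.01, 0.98, 0.01]
trans = [[0.90, 0.09, 0.01], [0.01, 0.98, 0.01], [0.01, 0.09, 0.90]]

emis = emission_matrix(depths, means, sds)
post = posteriors(init, trans, emis)
print("Viterbi:", [STATES[k] for k in viterbi(init, trans, emis)])
print("PMAP:   ", [STATES[k] for k in pmap_decode(post)])
```

Because PMAP works from per-target posteriors, those same posteriors can double as per-call confidence values. Passing `means` and `sds` per target is only a nod to the target-specific emission distributions the abstract mentions; how Cobalt actually estimates them is not described here.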

Integrative DNA copy number detection and genotyping from sequencing and array-based platforms

10.1101/172700 ◽

2017 ◽

Cited By ~ 2

Author(s):

Zilu Zhou ◽

Weixin Wang ◽

Li-San Wang ◽

Nancy Ruonan Zhang

Keyword(s):

Copy Number ◽

Association Studies ◽

Snp Array ◽

Supplementary Information ◽

Detection Accuracy ◽

Sequencing Data ◽

Array Data ◽

Combining Data ◽

Allele Specific ◽

Cnv Detection

Abstract. Motivation: Copy number variations (CNVs) are gains and losses of DNA segments and have been associated with disease. Many large-scale genetic association studies are performing CNV analysis using whole exome sequencing (WES) and whole genome sequencing (WGS). In many of these studies, previous SNP-array data are available. An integrated cross-platform analysis is expected to improve resolution and accuracy, yet there is no tool for effectively combining data from sequencing and array platforms. The detection of CNVs using sequencing data alone can also be further improved by the utilization of allele-specific reads. Results: We propose a statistical framework, integrated Copy Number Variation detection algorithm (iCNV), which can be applied to multiple study designs: WES only, WGS only, SNP array only, or any combination of SNP and sequencing data. iCNV applies platform-specific normalization, utilizes allele-specific reads from sequencing, and integrates matched NGS and SNP-array data by a Hidden Markov Model (HMM). We compare integrated two-platform CNV detection using iCNV to naive intersection or union of platforms and show that iCNV increases sensitivity and robustness. We also assess the accuracy of iCNV on WGS data only, and show that the utilization of allele-specific reads improves CNV detection accuracy compared to existing methods. Availability: https://github.com/zhouzilu/[email protected], [email protected]. Supplementary information: Supplementary data are available at Bioinformatics online.

DeviCNV: detection and visualization of exon-level copy number variants in targeted next-generation sequencing data

BMC Bioinformatics ◽

10.1186/s12859-018-2409-6 ◽

2018 ◽

Vol 19 (1) ◽

Cited By ~ 2

Author(s):

Yeeok Kang ◽

Seong-Hyeuk Nam ◽

Kyung Sun Park ◽

Yoonjung Kim ◽

Jong-Won Kim ◽

...

Keyword(s):

Next Generation Sequencing ◽

Copy Number ◽

Copy Number Variants ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Targeted Next Generation Sequencing ◽

Exon Level ◽

Generation Sequencing

Validation of an ultra-fast CNV calling tool for Next Generation Sequencing data using MLPA-verified copy number alterations

10.1101/340505 ◽

2018 ◽

Author(s):

Bas Tolhuis ◽

Hans Karten

Keyword(s):

Next Generation Sequencing ◽

Copy Number ◽

False Positive Rate ◽

Copy Number Variations ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Single Node ◽

Generation Sequencing ◽

Cnv Detection

Abstract. DNA Copy Number Variations (CNVs) are an important source of genetic diversity and pathogenic variants. Next Generation Sequencing (NGS) methods have become increasingly popular for CNV detection, but their data analysis is a growing bottleneck. Genalice CNV is a novel tool for detection of CNVs. It addresses the turnaround time, scalability and cost issues associated with NGS computational analysis. Here, we validate Genalice CNV with MLPA-verified exon CNVs and genes with normal copy numbers. Genalice CNV detects 61 out of 62 exon CNVs and its false positive rate is less than 1%. It analyzes 96 samples from a targeted NGS assay in less than 45 minutes, including read alignment and CNV detection, using a single node. Furthermore, we describe data quality measures to minimize false discoveries. In conclusion, Genalice CNV is highly sensitive and specific, as well as extremely fast, which will be beneficial for clinical detection of CNVs.
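
As a quick check on the figures above, the reported detection count translates into a sensitivity value as follows. The helper name is an assumption for illustration. Only the 61-of-62 count comes from the abstract; no false-positive counts are given there, so none are computed.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of real CNVs that were detected: TP / (TP + FN)."""
    return true_positives / (true_positives + false_negatives)

detected, total = 61, 62  # exon CNVs detected / MLPA-verified exon CNVs
print(f"{sensitivity(detected, total - detected):.1%}")  # 98.4%
```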

Performance of copy number variants detection based on whole-genome sequencing by DNBSEQ platforms

BMC Bioinformatics ◽

10.1186/s12859-020-03859-x ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

Junhua Rao ◽

Lihua Peng ◽

Xinming Liang ◽

Hui Jiang ◽

Chunyu Geng ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Technology Use ◽

Copy Number ◽

Massively Parallel Sequencing ◽

Copy Number Variants ◽

Whole Genome ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

Cnv Detection

Abstract. Background: DNBSEQ™ platforms are new massively parallel sequencing (MPS) platforms that use DNA nanoball technology. Use of data generated from DNBSEQ™ platforms to detect single nucleotide variants (SNVs) and small insertions and deletions (indels) has proven to be quite effective, while the feasibility of copy number variant (CNV) detection is unclear. Results: Here, we first benchmarked different CNV detection tools based on Illumina whole-genome sequencing (WGS) data of NA12878 and then assessed these tools in CNV detection based on DNBSEQ™ sequencing data from the same sample. When the same tool was used, the CNVs detected based on DNBSEQ™ and Illumina data were similar in quantity, length and distribution, while great differences existed between results from different tools, even when based on data from a single platform. We further estimated the CNV detection power based on available CNV benchmarks of NA12878 and found similar precision and sensitivity between the DNBSEQ™ and Illumina platforms. We also found higher precision for CNVs shorter than 1 kbp with DNBSEQ™ platforms than with Illumina platforms when using Pindel, DELLY and LUMPY. We carefully compared the two available benchmarks and found that a large proportion of CNVs were specific to one or the other. Thus, we constructed a more complete CNV benchmark of NA12878 containing 3512 CNV regions. Conclusions: We assessed and benchmarked CNV detection based on WGS with DNBSEQ™ platforms and provide guidelines for future studies.

Association of Rare Recurrent Copy Number Variants With Congenital Heart Defects Based on Next-Generation Sequencing Data From Family Trios

Frontiers in Genetics ◽

10.3389/fgene.2019.00819 ◽

2019 ◽

Vol 10 ◽

Author(s):

Yichuan Liu ◽

Xiao Chang ◽

Joseph Glessner ◽

Huiqi Qu ◽

Lifeng Tian ◽

...

Keyword(s):

Next Generation Sequencing ◽

Congenital Heart Defects ◽

Copy Number ◽

Congenital Heart ◽

Heart Defects ◽

Copy Number Variants ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Generation Sequencing

Evaluation of CNV detection tools for NGS panel data in genetic diagnostics

European Journal of Human Genetics ◽

10.1038/s41431-020-0675-z ◽

2020 ◽

Vol 28 (12) ◽

pp. 1645-1655 ◽

Cited By ~ 3

Author(s):

José Marcos Moreno-Cabrera ◽

Jesús del Valle ◽

Elisabeth Castellanos ◽

Lidia Feliubadaló ◽

Marta Pineda ◽

...

Keyword(s):

Copy Number Variants ◽

Hereditary Diseases ◽

Next Generation Sequencing Data ◽

Genetic Diagnostics ◽

Sequencing Data ◽

Targeted Next Generation Sequencing ◽

Highly Sensitive ◽

Ngs Data ◽

Generation Sequencing ◽

Cnv Detection

Abstract. Although germline copy-number variants (CNVs) are the genetic cause of multiple hereditary diseases, detecting them from targeted next-generation sequencing (NGS) data remains a challenge. Existing tools perform well for large CNVs but struggle with single and multi-exon alterations. The aim of this work is to evaluate CNV calling tools working on gene panel NGS data and their suitability as a screening step before orthogonal confirmation in genetic diagnostics strategies. Five tools (DECoN, CoNVaDING, panelcn.MOPS, ExomeDepth, and CODEX2) were tested against four genetic diagnostics datasets (two in-house and two external) for a total of 495 samples with 231 single and multi-exon validated CNVs. The evaluation was performed using the default and sensitivity-optimized parameters. Results showed that most tools were highly sensitive and specific, but the performance was dataset dependent. When evaluating them in our diagnostics scenario, DECoN and panelcn.MOPS detected all CNVs with the exception of one mosaic CNV missed by DECoN. However, DECoN outperformed panelcn.MOPS in specificity, achieving values greater than 0.90 when using the optimized parameters. In our in-house datasets, DECoN and panelcn.MOPS showed the highest performance for CNV screening before orthogonal confirmation. Benchmarking and optimization code is freely available at https://github.com/TranslationalBioinformaticsIGTP/CNVbenchmarkeR.

Copy number variants in clinical next-generation sequencing data can define the relationship between simultaneous tumors in an individual patient

Experimental and Molecular Pathology ◽

10.1016/j.yexmp.2014.05.008 ◽

2014 ◽

Vol 97 (1) ◽

pp. 69-73 ◽

Cited By ~ 10

Author(s):

Jennifer K. Sehn ◽

Haley J. Abel ◽

Eric J. Duncavage

Keyword(s):

Next Generation Sequencing ◽

Copy Number ◽

Copy Number Variants ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

The Relationship ◽

Generation Sequencing

ifCNV: a novel isolation-forest-based package to detect copy number variations from NGS datasets

10.1101/2022.01.03.474771 ◽

2022 ◽

Author(s):

Simon Cabello ◽

Julie A Vendrell ◽

Charles Van Goethem ◽

Mehdi Brousse ◽

Catherine Gozé ◽

...

Keyword(s):

Artificial Intelligence ◽

Copy Number ◽

High Sensitivity ◽

Copy Number Variations ◽

Careful Consideration ◽

Machine Learning Algorithms ◽

Next Generation Sequencing Data ◽

Sequencing Data ◽

Scoring Method ◽

Cnv Detection

Copy number variations (CNVs) are an essential component of genetic variation distributed across large parts of the human genome. CNV detection from next-generation sequencing data and artificial intelligence algorithms has progressed in recent years. However, only a few tools have taken advantage of machine learning algorithms for CNV detection. The most developed approach is to use a reference dataset to compare with the samples of interest, and it is well known that selecting appropriate normal samples is a challenging task that dramatically influences the precision of results in all CNV-detecting tools. With careful consideration of these issues, we propose here ifCNV, a new software package based on isolation forests that creates its own reference, available in R and Python with customisable parameters. ifCNV combines artificial intelligence using two isolation forests and a comprehensive scoring method to faithfully detect CNVs among various samples. It was validated using datasets from diverse origins, and it exhibits high sensitivity, specificity and accuracy. ifCNV is publicly available open-source software that allows the detection of CNVs in many clinical situations.

Computational tools for copy number variation (CNV) detection using next-generation sequencing data: features and perspectives

BMC Bioinformatics ◽

10.1186/1471-2105-14-s11-s1 ◽

2013 ◽

Vol 14 (S11) ◽

Cited By ~ 287

Author(s):

Min Zhao ◽

Qingguo Wang ◽

Quan Wang ◽

Peilin Jia ◽

Zhongming Zhao

Keyword(s):

Next Generation Sequencing ◽

Copy Number Variation ◽

Copy Number ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Computational Tools ◽

Number Variation ◽

Generation Sequencing ◽

Cnv Detection

T-CNV: a robust tool for detecting and visualizing copy number variants in targeted sequencing data.

10.21203/rs.3.rs-27672/v1 ◽

2020 ◽

Author(s):

Liu Ye ◽

Wu Yangming ◽

Zheng Zexin ◽

Zhou Tianliangwen

Keyword(s):

Copy Number ◽

Copy Number Variants ◽

Control Sample ◽

Read Depth ◽

Gaussian Mixture ◽

Sequencing Data ◽

Positive Predict Value ◽

Sliding Method ◽

Targeted Ngs ◽

Cnv Detection

Abstract. Background: Copy number variants (CNVs) are widespread among human genes, causing Mendelian or sporadic traits, or associating with complex diseases. Several tools have been developed for CNV assessment based on next generation sequencing (NGS) data using a read-depth (RD) strategy. However, maintaining a high level of sensitivity and specificity is always challenging. Here, we present a novel, powerful, user-friendly and open-access tool, T-CNV, for CNV detection and visualization in targeted NGS panels. Results: T-CNV consists of primary CNV detection and CNV candidate confirmation steps. After computing log2 values of the normalized read depth ratio of tumor and normal/control samples, T-CNV confirms each possible CNV candidate using a bins method, a Gaussian Mixture Model (GMM) clustering approach and a window-sliding method. We benchmarked its capacity with an MLPA-validated dataset. Compared to three other advanced tools, T-CNV presents excellent performance with 95.42% sensitivity, 99.93% specificity and 93.63% positive predictive value in the MLPA-validated dataset, while achieving satisfactory performance in a simulation study (sensitivity 65.95%, positive predictive value 88.71% at coverage 100X). Conclusions: T-CNV is a novel and robust tool for CNV detection and visualization in targeted NGS panels, consisting of determination of possible CNV candidates and further confirmation by three different methods. It is publicly available at https://github.com/Top-Gene/T-CNV.
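
The first step described in the Results, the log2 of the normalized read depth ratio between a sample and a control, can be illustrated with a short sketch. This is not T-CNV's code: the function name, the library-size normalization, the pseudocount, and the depth values are all assumptions for the example.

```python
import math

def log2_depth_ratios(sample_depths, control_depths, pseudocount=0.5):
    """Per-target log2 ratio of library-size-normalized read depth (sample vs control).

    The pseudocount is a hypothetical guard against log2(0) on zero-depth targets.
    """
    sample_total = sum(sample_depths)
    control_total = sum(control_depths)
    return [
        math.log2(((s + pseudocount) / sample_total) / ((c + pseudocount) / control_total))
        for s, c in zip(sample_depths, control_depths)
    ]

# Made-up depths for ten targets; the last target has half the control's depth,
# which is what a heterozygous deletion would look like under this model.
control = [100] * 10
sample = [100] * 9 + [50]
ratios = log2_depth_ratios(sample, control)
print([round(r, 2) for r in ratios])
```

A heterozygous deletion ideally sits near log2(1/2) = -1 and a single-copy gain near log2(3/2), roughly 0.58. Normalizing by total depth shifts every value slightly when one target deviates, which is one reason read-depth callers follow this step with further confirmation.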
