ifCNV: a novel isolation-forest-based package to detect copy number variations from NGS datasets

Copy number variations (CNVs) are an essential component of genetic variation and are distributed across large parts of the human genome. Both CNV detection from next-generation sequencing (NGS) data and artificial intelligence algorithms have progressed in recent years, yet only a few tools use machine learning for CNV detection. The most common approach compares the samples of interest against a reference dataset, and selecting appropriate normal samples for that reference is a well-known challenge that strongly influences the precision of every CNV-detecting tool. To address these issues, we propose ifCNV, a new software package based on isolation forests that creates its own reference and is available in R and Python with customisable parameters. ifCNV combines two isolation forests with a comprehensive scoring method to detect CNVs across varied samples. It was validated on datasets from diverse origins, where it exhibited high sensitivity, specificity and accuracy. ifCNV is publicly available open-source software that allows the detection of CNVs in many clinical situations.
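
The abstract does not describe how ifCNV configures its two isolation forests, so the following is only a minimal sketch of the underlying idea: a point that random splits isolate quickly is likely an anomaly. It is a simplified one-dimensional version (no subsampling, no score normalisation), and the read-depth ratios, function names and parameters are all hypothetical, not taken from ifCNV.

```python
import random

def path_length(x, values, rng, depth=0, max_depth=10):
    """Number of random splits needed to isolate x within a 1-D sample."""
    if len(values) <= 1 or depth >= max_depth:
        return depth
    lo, hi = min(values), max(values)
    if lo == hi:
        return depth
    split = rng.uniform(lo, hi)
    # Keep only the side of the split that contains x.
    same_side = [v for v in values if (v < split) == (x < split)]
    return path_length(x, same_side, rng, depth + 1, max_depth)

def mean_path_lengths(values, n_trees=200, seed=0):
    """Average isolation depth of every point over n_trees random trees."""
    rng = random.Random(seed)
    return [
        sum(path_length(x, values, rng) for _ in range(n_trees)) / n_trees
        for x in values
    ]

# Hypothetical normalised read-depth ratios of one target across ten samples.
ratios = [0.98, 1.02, 1.00, 0.97, 1.03, 1.01, 0.99, 1.51, 1.00, 0.96]
scores = mean_path_lengths(ratios)

# The shortest average path marks the most anomalous sample.
suspect = min(range(len(ratios)), key=scores.__getitem__)
print(suspect, ratios[suspect])
```

Because the anomalous samples are identified from the batch itself, no separately curated set of normal samples is needed, which is the property the abstract highlights.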

Validation of an ultra-fast CNV calling tool for Next Generation Sequencing data using MLPA-verified copy number alterations

10.1101/340505 ◽

2018 ◽

Author(s):

Bas Tolhuis ◽

Hans Karten

Keyword(s):

Next Generation Sequencing ◽

Copy Number ◽

False Positive Rate ◽

Copy Number Variations ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Single Node ◽

Generation Sequencing ◽

Cnv Detection

Abstract. DNA Copy Number Variations (CNVs) are an important source of genetic diversity and pathogenic variants. Next Generation Sequencing (NGS) methods have become increasingly popular for CNV detection, but their data analysis is a growing bottleneck. Genalice CNV is a novel tool for the detection of CNVs. It addresses the turnaround time, scalability and cost issues associated with NGS computational analysis. Here, we validate Genalice CNV with MLPA-verified exon CNVs and genes with normal copy numbers. Genalice CNV detects 61 out of 62 exon CNVs and its false positive rate is less than 1%. It analyzes 96 samples from a targeted NGS assay in less than 45 minutes, including read alignment and CNV detection, using a single node. Furthermore, we describe data quality measures to minimize false discoveries. In conclusion, Genalice CNV is highly sensitive and specific, as well as extremely fast, which will be beneficial for clinical detection of CNVs.
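
As a quick check of the figures quoted above, the sketch below converts them into a sensitivity and an upper bound on per-sample runtime. Only the numbers 61, 62, 96 and 45 come from the abstract; the variable and function names are illustrative.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of real events that were detected."""
    return true_positives / (true_positives + false_negatives)

detected, verified = 61, 62          # MLPA-verified exon CNVs
sens = sensitivity(detected, verified - detected)

samples, minutes = 96, 45            # "less than 45 minutes" is an upper bound
seconds_per_sample = minutes * 60 / samples

print(f"sensitivity: {sens:.1%}")
print(f"at most {seconds_per_sample:.1f} s per sample")
```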

Detection of copy number variations by pair analysis using next-generation sequencing data in inherited kidney diseases

Clinical and Experimental Nephrology ◽

10.1007/s10157-018-1534-x ◽

2018 ◽

Vol 22 (4) ◽

pp. 881-888 ◽

Cited By ~ 14

Author(s):

China Nagano ◽

Kandai Nozu ◽

Naoya Morisada ◽

Masahiko Yazawa ◽

Daisuke Ichikawa ◽

...

Keyword(s):

Next Generation Sequencing ◽

Copy Number ◽

Kidney Diseases ◽

Copy Number Variations ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Pair Analysis ◽

Generation Sequencing

Hadoop-CNV-RF: a clinically validated and scalable copy number variation detection tool for next-generation sequencing data

10.21203/rs.2.22176/v1 ◽

2020 ◽

Author(s):

Getiria Onsongo ◽

Ham Ching Lam ◽

Matthew Bower ◽

Bharat Thyagarajan

Keyword(s):

Copy Number ◽

Copy Number Variations ◽

Next Generation Sequencing Data ◽

Sequencing Data ◽

Large Gene ◽

Data Framework ◽

Number Variation ◽

Targeted Capture ◽

Objective Detection ◽

Gene Panels

Abstract. Objective: Detection of small copy number variations (CNVs) in clinically relevant genes is routinely used to aid diagnosis. We recently developed a tool, CNV-RF, capable of detecting small clinically relevant CNVs. CNV-RF was designed for small gene panels and did not scale well to large gene panels. On large gene panels, CNV-RF routinely failed due to memory limitations. When successful, it took about 2 days to complete a single analysis, making it impractical for routinely analyzing large gene panels. We need a reliable tool capable of detecting CNVs in the clinic that scales well to large gene panels. Results: We have developed Hadoop-CNV-RF, a scalable implementation of CNV-RF. Hadoop-CNV-RF is a freely available tool capable of rapidly analyzing large gene panels. It takes advantage of Hadoop, a big data framework developed to analyze large amounts of data. Preliminary results show it reduces analysis time from about 2 days to less than 4 hours and can seamlessly scale to large gene panels. Hadoop-CNV-RF has been clinically validated for targeted capture data and is currently being used in a CLIA molecular diagnostics laboratory. The tool and its usage instructions are publicly available at: https://github.com/getiria-onsongo/hadoop-cnvrf-public

RKDOSCNV: A Local Kernel Density-Based Approach to the Detection of Copy Number Variations by Using Next-Generation Sequencing Data

Frontiers in Genetics ◽

10.3389/fgene.2020.569227 ◽

2020 ◽

Vol 11 ◽

Author(s):

Guojun Liu ◽

Junying Zhang ◽

Xiguo Yuan ◽

Chao Wei

Keyword(s):

Next Generation Sequencing ◽

Copy Number ◽

Kernel Density ◽

Copy Number Variations ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Generation Sequencing

cn.MOPS: mixture of Poissons for discovering copy number variations in next-generation sequencing data with a low false discovery rate

Nucleic Acids Research ◽

10.1093/nar/gks003 ◽

2012 ◽

Vol 40 (9) ◽

pp. e69-e69 ◽

Cited By ~ 239

Author(s):

Günter Klambauer ◽

Karin Schwarzbauer ◽

Andreas Mayr ◽

Djork-Arné Clevert ◽

Andreas Mitterecker ◽

...

Keyword(s):

Next Generation Sequencing ◽

False Discovery Rate ◽

Copy Number ◽

Copy Number Variations ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

False Discovery ◽

Generation Sequencing

Identifying Copy Number Variations based on Next Generation Sequencing Data by a Mixture of Poisson Model

Nature Precedings ◽

10.1038/npre.2010.4716.1 ◽

2010 ◽

Author(s):

Karin Schwarzbauer ◽

Günter Klambauer ◽

Andreas Mayr ◽

Sepp Hochreiter

Keyword(s):

Next Generation Sequencing ◽

Copy Number ◽

Poisson Model ◽

Copy Number Variations ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Generation Sequencing

A Cluster-Based Approach for the Discovery of Copy Number Variations From Next-Generation Sequencing Data

Frontiers in Genetics ◽

10.3389/fgene.2021.699510 ◽

2021 ◽

Vol 12 ◽

Author(s):

Guojun Liu ◽

Junying Zhang

Keyword(s):

Next Generation Sequencing ◽

Copy Number ◽

Read Depth ◽

Copy Number Variations ◽

Experimental Results ◽

Data Sets ◽

Next Generation ◽

Sequencing Data ◽

Generation Sequencing ◽

Cnv Detection

Next-generation sequencing technology offers a wealth of data for the detection of copy number variations (CNVs) at high resolution. However, correctly detecting CNVs of different lengths remains challenging, and new CNV detection tools are needed to meet this demand. In this work, we propose a new method, called CBCNV, for detecting CNVs of different lengths from whole genome sequencing data. CBCNV uses a clustering algorithm to divide the read depth segment profile and assigns an abnormal score to each read depth segment. Based on the abnormal score profile, CBCNV applies Tukey's fences to predict CNVs. The performance of the proposed method is evaluated on simulated data sets and compared with that of several existing methods. The experimental results show that CBCNV performs better than those methods. The proposed method is further tested on real data sets, and the results are consistent with the simulation results. Therefore, the proposed method can be expected to become a routine tool in the analysis of CNVs from tumor-normal matched samples.
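
The Tukey's fences step mentioned above can be illustrated with a short sketch. It assumes the conventional multiplier k = 1.5 and inclusive quartiles; the abstract does not state which settings CBCNV uses, and the abnormal scores below are invented for illustration.

```python
import statistics

def tukey_outliers(scores, k=1.5):
    """Indices of scores lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, s in enumerate(scores) if s < lower or s > upper]

# Hypothetical abnormal scores, one per read depth segment.
abnormal_scores = [0.10, 0.12, 0.09, 0.11, 0.95, 0.10, 0.13, 0.08, 0.11, 0.90]

# Segments outside the fences are candidate CNVs.
flagged = tukey_outliers(abnormal_scores)
print(flagged)
```

The fences are derived from the score distribution itself, so no fixed score threshold has to be chosen in advance.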

Detection of Significant Copy Number Variations From Multiple Samples in Next-Generation Sequencing Data

IEEE Transactions on NanoBioscience ◽

10.1109/tnb.2017.2783910 ◽

2018 ◽

Vol 17 (1) ◽

pp. 12-20 ◽

Cited By ~ 6

Author(s):

Xiguo Yuan ◽

Junying Zhang ◽

Liying Yang ◽

Jun Bai ◽

Peizhen Fan

Keyword(s):

Next Generation Sequencing ◽

Copy Number ◽

Copy Number Variations ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Multiple Samples ◽

Generation Sequencing

CNV-MEANN: A Neural Network and Mind Evolutionary Algorithm-Based Detection of Copy Number Variations From Next-Generation Sequencing Data

Frontiers in Genetics ◽

10.3389/fgene.2021.700874 ◽

2021 ◽

Vol 12 ◽

Author(s):

Tihao Huang ◽

Junqing Li ◽

Baoxian Jia ◽

Hongyan Sang

Keyword(s):

Neural Network ◽

Next Generation Sequencing ◽

Evolutionary Algorithm ◽

Copy Number ◽

Copy Number Variations ◽

Detection Methods ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Generation Sequencing

Copy number variation (CNV) is defined as repetitions or deletions of genomic segments of 1 Kb to 5 Mb, and is a major trigger for human disease. The high-throughput and low-cost characteristics of next-generation sequencing (NGS) technology make it possible to detect CNVs across the whole genome, and also greatly improve the clinical practicability of NGS testing. However, current methods for the detection of CNVs are easily affected by sequencing and mapping errors, and by the uneven distribution of reads. In this paper, we propose an improved approach, CNV-MEANN, for the detection of CNVs, which changes the structure of the neural network used in the MFCNV method. This method differs from MFCNV in three ways: (1) it utilizes a new feature, mapping quality, to replace two features in MFCNV, (2) it considers the influence of the loss categories of CNV on disease prediction, and refines the output structure, and (3) it uses a mind evolutionary algorithm to optimize the backpropagation neural network model, and calculates individual scores for each genome bin to predict CNVs. Using both simulated and real datasets, we tested the performance of CNV-MEANN and compared it with that of seven widely used CNV detection methods. Experimental results demonstrated that the CNV-MEANN approach outperformed other methods with respect to sensitivity, precision, and F1-score. The proposed method was able to detect many CNVs that other approaches could not, and it reduced the boundary bias. CNV-MEANN is expected to be an effective method for the analysis of changes in CNVs in the genome.

Algorithmic improvements for discovery of germline copy number variants in next-generation sequencing data

10.1101/441378 ◽

2018 ◽

Cited By ~ 1

Author(s):

Brendan O’Fallon ◽

Jacob Durtschi ◽

Tracey Lewis ◽

Devin Close

Keyword(s):

Positive Predictive Value ◽

Predictive Value ◽

Copy Number ◽

Copy Number Variants ◽

Sole Source ◽

Next Generation Sequencing Data ◽

Detection Accuracy ◽

Sequencing Data ◽

Copy Number State ◽

Cnv Detection

Abstract. Copy number variants (CNVs) play a significant role in human heredity and disease; however, sensitive and specific characterization of CNVs from NGS data has remained challenging. Detection is especially problematic for hybridization-capture data in which read counts are the sole source of copy number information. We describe two algorithmic adaptations that improve CNV detection accuracy in a Hidden Markov Model (HMM) context. First, we present a method for computing target- and copy number state-specific emission distributions. Second, we demonstrate that the Pointwise Maximum a posteriori (PMAP) HMM decoding procedure yields improved sensitivity for small CNV calls compared to the more common Viterbi HMM decoder. We develop a prototype implementation, called Cobalt, and compare it to other CNV detection tools using sets of simulated and previously detected CNVs with sizes spanning a single exon up to a full chromosome. In both the simulation and previously detected CNV studies, Cobalt shows similar sensitivity but significantly improved positive predictive value (PPV) compared to other callers. Overall sensitivity is 80%-90% for deletion CNVs spanning 1-4 targets and 90%-100% for larger deletion events, while sensitivity is somewhat lower for small duplication CNVs. Cobalt demonstrates significantly improved PPV compared to other callers with similar sensitivity, typically making 5X fewer total calls overall.
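
To make the two decoding procedures concrete, here is a minimal sketch of PMAP (posterior) decoding next to Viterbi decoding on a toy two-state HMM. It is not Cobalt's implementation: the states, transition matrix and per-target emission likelihoods are hypothetical, and plain probabilities are used instead of log-space arithmetic, which is only safe for short sequences.

```python
def posteriors(init, trans, emis):
    """Forward-backward: P(state at t | all observations) for each t.
    emis[t][s] is the likelihood of observation t under state s."""
    n, k = len(emis), len(init)
    fwd = [[init[s] * emis[0][s] for s in range(k)]]
    for t in range(1, n):
        prev = fwd[-1]
        fwd.append([emis[t][s] * sum(prev[r] * trans[r][s] for r in range(k))
                    for s in range(k)])
    bwd = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        bwd[t] = [sum(trans[s][r] * emis[t + 1][r] * bwd[t + 1][r]
                      for r in range(k)) for s in range(k)]
    post = []
    for f, b in zip(fwd, bwd):
        w = [x * y for x, y in zip(f, b)]
        total = sum(w)
        post.append([x / total for x in w])
    return post

def pmap_decode(init, trans, emis):
    """Pick the most probable state independently at every position."""
    return [max(range(len(p)), key=p.__getitem__)
            for p in posteriors(init, trans, emis)]

def viterbi_decode(init, trans, emis):
    """Pick the single most probable state path."""
    k = len(init)
    score = [init[s] * emis[0][s] for s in range(k)]
    back = []
    for e in emis[1:]:
        ptr, new = [], []
        for s in range(k):
            best = max(range(k), key=lambda r: score[r] * trans[r][s])
            ptr.append(best)
            new.append(score[best] * trans[best][s] * e[s])
        back.append(ptr)
        score = new
    path = [max(range(k), key=score.__getitem__)]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# Hypothetical model: state 0 = normal copy number, state 1 = deletion.
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
# Per-target likelihoods; targets 3 and 4 strongly favour the deletion state.
emis = [[0.99, 0.01], [0.99, 0.01], [0.01, 0.99], [0.01, 0.99], [0.99, 0.01]]

pmap_path = pmap_decode(init, trans, emis)
viterbi_path = viterbi_decode(init, trans, emis)
print(pmap_path, viterbi_path)
```

The toy input is deliberately unambiguous, so both decoders return the same path. They can differ on short, weakly supported events, because PMAP sums the probability of every path passing through a state at a given target, while Viterbi commits to one best path.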
