HBOS-CNV: A New Approach to Detect Copy Number Variations From Next-Generation Sequencing Data

Copy number variation (CNV) is a type of genomic mutation that plays an important role in tumorigenesis and tumor evolution. Accurate detection of CNVs from next-generation sequencing (NGS) data remains challenging because of artifacts such as unevenly mapped reads and unbalanced amplitudes of gains and losses. This study proposes a new approach, HBOS-CNV, to detect CNVs from NGS data. Its central idea is to use a new statistic, the histogram-based outlier score (HBOS), to evaluate the fluctuation of genome bins and determine which bins have a changed copy number. Unlike existing statistics for evaluating CNVs, HBOS is a non-linear transformation of the observed read depth (RD) value of each genome bin, which can potentially relieve the effects of the above artifacts. To calculate HBOS values, a dynamic-width histogram is used to describe the density of bins on the genome being analyzed, which can reduce the effect of noise partially contributed by mapping and sequencing errors. Evaluating genome bins with this statistic gives CNVs of less extreme significance a high probability of detection. We evaluated this method on a large number of simulated datasets and compared it with four existing methods (CNVnator, CNV-IFTV, CNV-LOF, and iCopyDav). The results demonstrated that the proposed method outperforms the others in terms of sensitivity, precision, and F1-measure. Furthermore, we applied the proposed method to a set of real sequencing samples from the 1000 Genomes Project and identified a number of biologically meaningful CNVs. Thus, the proposed method can be regarded as a routine approach in the field of genome mutation analysis for cancer samples.
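
To make the idea of a histogram-based outlier score concrete, the following is a minimal sketch of a generic single-feature HBOS computed with a dynamic-width (equal-frequency) histogram over per-bin RD values. It is not the HBOS-CNV implementation: the function name hbos_scores, the square-root rule for the number of groups, and the minimum-width floor used to handle tied values are all assumptions made for illustration.

```python
import numpy as np

def hbos_scores(rd, n_groups=None):
    """Return one outlier score per genome bin from its read-depth value.

    Hypothetical helper: a generic 1-D HBOS, not the authors' code.
    """
    rd = np.asarray(rd, dtype=float)
    n = rd.size
    if n == 0:
        raise ValueError("rd must contain at least one value")
    if n_groups is None:
        n_groups = int(np.sqrt(n))          # assumed rule of thumb
    n_groups = max(1, min(n_groups, n))

    # Assumed floor on the histogram bar width so that groups of tied
    # values do not produce an infinite density.
    span = rd.max() - rd.min()
    min_width = 1e-3 * span if span > 0 else 1.0

    # Dynamic-width histogram: sort the values and cut them into groups
    # holding (almost) the same number of points, so each bar has its
    # own width.
    order = np.argsort(rd, kind="stable")
    heights = np.empty(n)
    for group in np.array_split(order, n_groups):
        width = max(rd[group].max() - rd[group].min(), min_width)
        heights[group] = group.size / width  # density of this bar

    heights /= heights.max()                 # tallest bar gets height 1
    return np.log(1.0 / heights)             # sparse regions score high
```

Bins whose RD values fall in sparse regions of the histogram receive high scores. How those scores are turned into CNV calls (thresholding, segmentation, gain or loss assignment) is not described in the abstract and is left out of this sketch.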

Detection of copy number variations by pair analysis using next-generation sequencing data in inherited kidney diseases

Clinical and Experimental Nephrology ◽

10.1007/s10157-018-1534-x ◽

2018 ◽

Vol 22 (4) ◽

pp. 881-888 ◽

Cited By ~ 14

Author(s):

China Nagano ◽

Kandai Nozu ◽

Naoya Morisada ◽

Masahiko Yazawa ◽

Daisuke Ichikawa ◽

...

Keyword(s):

Next Generation Sequencing ◽

Copy Number ◽

Kidney Diseases ◽

Copy Number Variations ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Pair Analysis ◽

Generation Sequencing

RKDOSCNV: A Local Kernel Density-Based Approach to the Detection of Copy Number Variations by Using Next-Generation Sequencing Data

Frontiers in Genetics ◽

10.3389/fgene.2020.569227 ◽

2020 ◽

Vol 11 ◽

Author(s):

Guojun Liu ◽

Junying Zhang ◽

Xiguo Yuan ◽

Chao Wei

Keyword(s):

Next Generation Sequencing ◽

Copy Number ◽

Kernel Density ◽

Copy Number Variations ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Generation Sequencing

cn.MOPS: mixture of Poissons for discovering copy number variations in next-generation sequencing data with a low false discovery rate

Nucleic Acids Research ◽

10.1093/nar/gks003 ◽

2012 ◽

Vol 40 (9) ◽

pp. e69-e69 ◽

Cited By ~ 239

Author(s):

Günter Klambauer ◽

Karin Schwarzbauer ◽

Andreas Mayr ◽

Djork-Arné Clevert ◽

Andreas Mitterecker ◽

...

Keyword(s):

Next Generation Sequencing ◽

False Discovery Rate ◽

Copy Number ◽

Copy Number Variations ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

False Discovery ◽

Generation Sequencing

Identifying Copy Number Variations based on Next Generation Sequencing Data by a Mixture of Poisson Model

Nature Precedings ◽

10.1038/npre.2010.4716.1 ◽

2010 ◽

Author(s):

Karin Schwarzbauer ◽

Günter Klambauer ◽

Andreas Mayr ◽

Sepp Hochreiter

Keyword(s):

Next Generation Sequencing ◽

Copy Number ◽

Poisson Model ◽

Copy Number Variations ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Generation Sequencing

Validation of an ultra-fast CNV calling tool for Next Generation Sequencing data using MLPA-verified copy number alterations

10.1101/340505 ◽

2018 ◽

Author(s):

Bas Tolhuis ◽

Hans Karten

Keyword(s):

Next Generation Sequencing ◽

Copy Number ◽

False Positive Rate ◽

Copy Number Variations ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Single Node ◽

Generation Sequencing ◽

Cnv Detection

DNA copy number variations (CNVs) are an important source of genetic diversity and pathogenic variants. Next-generation sequencing (NGS) methods have become increasingly popular for CNV detection, but the analysis of their data is a growing bottleneck. Genalice CNV is a novel tool for the detection of CNVs. It addresses the turnaround time, scalability, and cost issues associated with NGS computational analysis. Here, we validate Genalice CNV with MLPA-verified exon CNVs and genes with normal copy numbers. Genalice CNV detects 61 out of 62 exon CNVs, and its false positive rate is less than 1%. It analyzes 96 samples from a targeted NGS assay in less than 45 minutes, including read alignment and CNV detection, using a single node. Furthermore, we describe data quality measures to minimize false discoveries. In conclusion, Genalice CNV is highly sensitive and specific, as well as extremely fast, which will be beneficial for clinical detection of CNVs.
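
As a quick check on the reported detection figure, the sketch below converts "61 out of 62" into a sensitivity value using the standard definition TP / (TP + FN). The function name sensitivity is a hypothetical helper introduced here; it is not part of Genalice CNV.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of real events that were detected: TP / (TP + FN)."""
    return true_positives / (true_positives + false_negatives)

# Counts taken from the abstract: 61 of 62 MLPA-verified exon CNVs detected.
detected, verified = 61, 62
print(f"{sensitivity(detected, verified - detected):.1%}")  # prints 98.4%
```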

Detection of Significant Copy Number Variations From Multiple Samples in Next-Generation Sequencing Data

IEEE Transactions on NanoBioscience ◽

10.1109/tnb.2017.2783910 ◽

2018 ◽

Vol 17 (1) ◽

pp. 12-20 ◽

Cited By ~ 6

Author(s):

Xiguo Yuan ◽

Junying Zhang ◽

Liying Yang ◽

Jun Bai ◽

Peizhen Fan

Keyword(s):

Next Generation Sequencing ◽

Copy Number ◽

Copy Number Variations ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Multiple Samples ◽

Generation Sequencing

A Density Peak-Based Method to Detect Copy Number Variations From Next-Generation Sequencing Data

Frontiers in Genetics ◽

10.3389/fgene.2020.632311 ◽

2021 ◽

Vol 11 ◽

Author(s):

Kun Xie ◽

Ye Tian ◽

Xiguo Yuan

Keyword(s):

Next Generation Sequencing ◽

Copy Number ◽

Clustering Algorithm ◽

Complex Diseases ◽

Next Generation Sequencing Data ◽

Two Dimensional ◽

Next Generation ◽

Density Peak ◽

Ngs Data ◽

Generation Sequencing

Copy number variation (CNV) is a common type of structural variation in the human genome and is biologically relevant to complex human diseases. Detecting CNVs is an important step in their systematic analysis in medical research on complex diseases. The recent development of next-generation sequencing (NGS) platforms provides unprecedented opportunities for detecting CNVs at base-level resolution. However, due to the intrinsic characteristics of NGS data, accurate detection of CNVs is still a challenging task. In this article, we propose a new density peak-based method, called dpCNV, for the detection of CNVs from NGS data. The dpCNV algorithm is based on the density peak clustering algorithm. It extracts two features, local density and minimum distance, from the sequencing read depth (RD) profile and generates a two-dimensional dataset. From the generated data, a two-dimensional null distribution is constructed to test the significance of each genome bin, and the significant genome bins are then declared as CNVs. We test the performance of dpCNV on a number of simulated datasets and compare it with several existing methods. The experimental results demonstrate that the proposed method outperforms the others in terms of sensitivity and F1-score. We further apply it to a set of real sequencing samples, and the results demonstrate the validity of dpCNV. Therefore, we expect that dpCNV can be used as a supplement to existing methods and may become a routine tool in the field of genome mutation analysis.
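
The two features named in the abstract, local density and minimum distance, come from density peak clustering. The following minimal sketch computes them for a one-dimensional RD profile under stated assumptions: the function name density_peak_features, the hard cutoff kernel, and the cutoff value are illustrative choices, and the two-dimensional null distribution used by dpCNV for significance testing is not reproduced.

```python
import numpy as np

def density_peak_features(rd, cutoff):
    """Return (rho, delta) for each genome bin of a 1-D RD profile.

    rho   : number of other bins whose RD lies within `cutoff`
    delta : distance to the nearest bin with strictly higher rho
    Hypothetical helper; not the dpCNV implementation.
    """
    rd = np.asarray(rd, dtype=float)
    # Pairwise absolute RD differences (O(n^2) memory, fine for a sketch).
    dist = np.abs(rd[:, None] - rd[None, :])
    rho = (dist < cutoff).sum(axis=1) - 1      # subtract 1 to exclude self
    delta = np.empty(rd.size)
    for i in range(rd.size):
        higher = rho > rho[i]
        # Bins with the highest density have no denser neighbour; by the
        # usual convention they get the largest distance instead.
        delta[i] = dist[i, higher].min() if higher.any() else dist[i].max()
    return rho, delta

# Toy profile: seven bins near RD 100 and one bin at RD 150.
rho, delta = density_peak_features([100, 101, 99, 100, 102, 98, 100, 150], cutoff=3)
```

In this toy profile the bin at RD 150 has no neighbours within the cutoff (rho of 0) and lies far from any denser bin (delta of 48). Low density combined with a large distance is the pattern an outlying bin is expected to show.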

CNV-MEANN: A Neural Network and Mind Evolutionary Algorithm-Based Detection of Copy Number Variations From Next-Generation Sequencing Data

Frontiers in Genetics ◽

10.3389/fgene.2021.700874 ◽

2021 ◽

Vol 12 ◽

Author(s):

Tihao Huang ◽

Junqing Li ◽

Baoxian Jia ◽

Hongyan Sang

Keyword(s):

Neural Network ◽

Next Generation Sequencing ◽

Evolutionary Algorithm ◽

Copy Number ◽

Copy Number Variations ◽

Detection Methods ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Generation Sequencing

Copy number variation (CNV) is defined as repetitions or deletions of genomic segments of 1 Kb to 5 Mb and is a major trigger for human disease. The high-throughput and low-cost characteristics of next-generation sequencing (NGS) technology make it possible to detect CNVs across the whole genome and greatly improve the clinical practicability of NGS testing. However, current methods for the detection of CNVs are easily affected by sequencing and mapping errors and by the uneven distribution of reads. In this paper, we propose an improved approach, CNV-MEANN, for the detection of CNVs, which changes the structure of the neural network used in the MFCNV method. This method differs from MFCNV in three ways: (1) it uses a new feature, mapping quality, to replace two features in MFCNV; (2) it considers the influence of the loss categories of CNV on disease prediction and refines the output structure; and (3) it uses a mind evolutionary algorithm to optimize the backpropagation neural network model and calculates individual scores for each genome bin to predict CNVs. Using both simulated and real datasets, we tested the performance of CNV-MEANN and compared it with that of seven widely used CNV detection methods. Experimental results demonstrated that CNV-MEANN outperformed the other methods with respect to sensitivity, precision, and F1-score. The proposed method was able to detect many CNVs that other approaches could not, and it reduced the boundary bias. CNV-MEANN is expected to be an effective method for the analysis of changes in CNVs in the genome.
