Detection of copy-number variations from NGS data using read depth information: a diagnostic performance evaluation

2020 ◽  
Vol 29 (1) ◽  
pp. 99-109 ◽  
Author(s):  
Olivier Quenez ◽  
Kevin Cassinari ◽  
Sophie Coutant ◽  
François Lecoquierre ◽  
...  
2017 ◽  
Author(s):  
Hui Yang ◽  
Gary Chen ◽  
Leandro Lima ◽  
Han Fang ◽  
Laura Jimenez ◽  
...  

Background: Whole-genome sequencing (WGS) data may be used to identify copy number variations (CNVs). Existing CNV detection methods mostly rely on read depth or alignment characteristics (paired-end distance and split reads) to infer gains and losses; they neglect allelic intensity ratios and cannot quantify copy numbers. Additionally, most CNV callers do not scale to a large number of WGS samples. Methods: To facilitate large-scale and rapid CNV detection from WGS data, we developed a Dynamic Programming Imputation (DPI) based algorithm called HadoopCNV, which infers copy number changes through both allelic frequency and read depth information. Our implementation is built on the Hadoop framework, enabling multiple compute nodes to work in parallel. Results: Compared to two widely used tools, CNVnator and LUMPY, HadoopCNV has similar or better performance on both simulated data sets and real data from the NA12878 individual. Additionally, analysis of a 10-member pedigree showed that HadoopCNV has a Mendelian precision similar to or better than that of other tools. Furthermore, HadoopCNV can accurately infer loss of heterozygosity (LOH), while the other tools cannot. HadoopCNV requires only 1.6 hours for a human genome with 30X coverage on a 32-node cluster, with a linear relationship between speed improvement and the number of nodes. We further developed a method to combine HadoopCNV and LUMPY results, and demonstrated that the combination performed better than any individual tool. Conclusions: The combination of high-resolution, allele-specific read depth from WGS data and the Hadoop framework can result in efficient and accurate detection of CNVs.


Author(s):  
Kun Xie ◽  
Kang Liu ◽  
Haque A K Alvi ◽  
Yuehui Chen ◽  
Shuzhen Wang ◽  
...  

Copy number variation (CNV) is a well-known type of genomic mutation associated with the development of human cancers. Detecting CNVs in the human genome is a crucial step in the pipeline that runs from mutation analysis to cancer diagnosis and treatment. Next-generation sequencing (NGS) data provide an unprecedented opportunity for CNV detection at base-level resolution, and many methods have been developed for CNV detection using NGS data. However, due to the intrinsic complexity of CNV structures and of NGS data itself, accurate detection of CNVs still faces many challenges. In this paper, we present an alternative method, called KNNCNV (K-Nearest Neighbor based CNV detection), for the detection of CNVs using NGS data. Compared to current methods, KNNCNV has several distinctive features: 1) it assigns an outlier score to each genome segment based solely on its first k nearest-neighbor distances, which is not only easy to extend to other data types but also improves the power to discover CNVs, especially local CNVs that are likely to be masked by their surrounding regions; 2) it employs the variational Bayesian Gaussian mixture model (VBGMM) to transform these scores into a series of binary labels without a user-defined threshold. To evaluate the performance of KNNCNV, we conduct both simulation and real sequencing data experiments and make comparisons with peer methods. The experimental results show that KNNCNV achieves better performance than the other methods in terms of F1-score.
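The nearest-neighbor scoring idea in this abstract can be illustrated with a minimal sketch. It assumes each genome segment is summarized by a single read-depth value and scores it by the mean absolute distance to its k nearest neighbors; the function name `knn_outlier_scores` is hypothetical, KNNCNV's actual features and score definition may differ, and the VBGMM labeling step is not shown.

```python
def knn_outlier_scores(values, k=3):
    """Score each value by its mean absolute distance to its k nearest neighbours.

    Illustrative only: a generic kNN-distance outlier score, not KNNCNV's
    exact formulation.
    """
    if not 1 <= k < len(values):
        raise ValueError("k must be between 1 and len(values) - 1")
    scores = []
    for i, v in enumerate(values):
        # Distances from segment i to every other segment, smallest first.
        dists = sorted(abs(v - w) for j, w in enumerate(values) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


# Hypothetical per-segment read depths; the fifth segment looks like a gain.
depths = [30.0, 31.0, 29.0, 30.5, 60.0, 29.5, 30.2]
scores = knn_outlier_scores(depths, k=3)
most_unusual = scores.index(max(scores))
```

A segment whose depth is far from every other segment receives a large score, which is what lets a later step separate candidate CNVs from normal segments without a fixed depth cutoff.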


2014 ◽  
Author(s):  
Sean D Smith ◽  
Joseph K Kawash ◽  
Andrey Grigoriev

Amplifications or deletions of genome segments, known as copy number variants (CNVs), have been associated with many diseases. Read depth analysis of next-generation sequencing (NGS) is an essential method of detecting CNVs. However, genome read coverage is frequently distorted by various biases of NGS platforms, which reduce predictive capabilities of existing approaches. Additionally, the use of read depth tools has been somewhat hindered by imprecise breakpoint identification. We developed GROM-RD, an algorithm that analyzes multiple biases in read coverage to detect CNVs in NGS data. We found non-uniform variance across distinct GC regions after using existing GC bias correction methods and developed a novel approach to normalize such variance. Although complex and repetitive genome segments complicate CNV detection, GROM-RD adjusts for repeat bias and uses a two-pipeline masking approach to detect CNVs in complex and repetitive segments while improving sensitivity in less complicated regions. To overcome a typical weakness of RD methods, GROM-RD employs a CNV search using size-varying overlapping windows to improve breakpoint resolution. We compared our method to two widely used programs based on read depth methods, CNVnator and RDXplorer, and observed improved CNV detection and breakpoint accuracy for GROM-RD. GROM-RD is available at http://grigoriev.rutgers.edu/software/
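For orientation, a minimal sketch of a conventional GC bias correction of the kind the abstract refers to as "existing" methods: each window's depth is rescaled by the ratio of the global median depth to the median depth of windows with similar GC content. This is not GROM-RD's variance normalization; the function name `gc_correct`, the bin width, and the input layout are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median


def gc_correct(depths, gc_fractions, bin_width=0.05):
    """Rescale window depths so every GC bin shares the global median depth.

    Illustrative median-ratio correction; bin_width is an arbitrary choice.
    """
    if len(depths) != len(gc_fractions):
        raise ValueError("depths and gc_fractions must have the same length")
    global_median = median(depths)

    # Group window depths by GC-content bin.
    bins = defaultdict(list)
    for depth, gc in zip(depths, gc_fractions):
        bins[int(gc / bin_width)].append(depth)
    bin_medians = {b: median(v) for b, v in bins.items()}

    corrected = []
    for depth, gc in zip(depths, gc_fractions):
        m = bin_medians[int(gc / bin_width)]
        corrected.append(depth * global_median / m if m > 0 else 0.0)
    return corrected


# Hypothetical windows: the GC-rich windows are sequenced about twice as deeply.
depths = [20, 22, 21, 40, 42, 41]
gc = [0.31, 0.32, 0.33, 0.51, 0.52, 0.53]
corrected = gc_correct(depths, gc)
```

After this step the bin medians agree, but the spread within each bin can still differ between GC ranges, which is the residual non-uniform variance the abstract describes.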


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e12564
Author(s):  
Taifu Wang ◽  
Jinghua Sun ◽  
Xiuqing Zhang ◽  
Wen-Jing Wang ◽  
Qing Zhou

Background: Copy-number variants (CNVs) have been recognized as one of the major causes of genetic disorders. Reliable detection of CNVs from genome sequencing data is in strong demand for disease research. However, current software for detecting CNVs has high false-positive rates and needs further improvement. Methods: Here, we propose a novel post-processing approach for CNV prediction (CNV-P), a machine-learning framework that can efficiently remove false-positive fragments from the results of CNV detection tools. A series of CNV signals, such as read depth (RD), split reads (SR) and read pairs (RP) around the putative CNV fragments, were defined as features to train a classifier. Results: The prediction results on several real biological datasets showed that our models could classify CNVs with over 90% precision and 85% recall, which greatly improves the performance of state-of-the-art algorithms. Furthermore, our results indicate that CNV-P is robust to different CNV sizes and sequencing platforms. Conclusions: Our framework for classifying high-confidence CNVs could improve both basic research and clinical diagnosis of genetic diseases.


2018 ◽  
Author(s):  
Whitney Whitford ◽  
Klaus Lehnert ◽  
Russell G. Snell ◽  
Jessie C. Jacobsen

Background: The popularisation and decreased cost of genome resequencing have resulted in its increased use in molecular diagnostics. While there are a number of established, high-quality bioinformatic tools for identifying small genetic variants, including single nucleotide variants and indels, there is currently no established standard for the detection of copy number variants (CNVs) from sequence data. The requirement for CNV detection from high-throughput sequencing has resulted in the development of a large number of software packages. These tools typically utilise the following characteristics of sequence data: read depth, split reads, read pairs, and assembly-based techniques. However, the additional source of information from read balance, defined as the relative proportion of reads of each allele at each position, has been underutilised in existing applications. Results: We present Read Balance Validator (RBV), a bioinformatic tool which uses read balance for prioritisation and validation of putative CNVs. The software simultaneously interrogates nominated regions for the presence of deletions or multiplications, and can differentiate larger CNVs from diploid regions. Additionally, the utility of RBV to test for inheritance of CNVs is demonstrated in this report. Conclusions: RBV is a CNV validation and prioritisation bioinformatic tool for both genome and exome sequencing, available as a Python package from https://github.com/whitneywhitford/RBV
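Read balance as defined above can be sketched in a few lines. The example assumes per-site reference and alternate read counts are already available, and it relies on the general expectation that heterozygous sites show a balance near 1/2 in a diploid region and near 1/3 or 2/3 when three copies are present. The function names are hypothetical and this is not RBV's statistical test.

```python
def read_balance(ref_reads, alt_reads):
    """Fraction of reads at a site that carry the alternate allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads at this position")
    return alt_reads / total


def closest_copy_state(balance):
    """Name the expected heterozygous balance closest to the observed one.

    Illustrative only: compares against 1/2 (two copies) and 1/3 (three
    copies), after folding so that 1/3 and 2/3 are treated alike.
    """
    folded = min(balance, 1.0 - balance)
    expected = {"diploid (1:1)": 0.5, "three copies (1:2)": 1 / 3}
    return min(expected, key=lambda name: abs(expected[name] - folded))


# Hypothetical site with 20 reference reads and 10 alternate reads.
state = closest_copy_state(read_balance(20, 10))
```

A single site is noisy, so in practice the balances of many heterozygous sites across a candidate region would be considered together; a heterozygous deletion, by contrast, is expected to show no heterozygous sites at all.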


2022 ◽  
Vol 12 (1) ◽  
Author(s):  
Shin-ya Nishio ◽  
Shin-ichi Usami

The STRC gene, located on chromosome 15q15.3, is one of the genetic causes of autosomal recessive mild-to-moderate sensorineural hearing loss. One of the unique characteristics of STRC-associated hearing loss is the high prevalence of long deletions or copy number variations observed on chromosome 15q15.3. Further, the deletion of chromosome 15q15.3 from STRC to CATSPER2 is also known to be a genetic cause of deafness infertility syndrome (DIS), which is associated with not only hearing loss but also male infertility, as CATSPER2 plays crucial roles in sperm motility. Thus, information regarding the deletion range for each patient is important for the provision of appropriate genetic counselling for hearing loss and male infertility. In the present study, we performed next-generation sequencing (NGS) analysis for 9956 Japanese hearing loss patients and analyzed copy number variations in the STRC gene based on NGS read depth data. In addition, we performed Multiplex Ligation-dependent Probe Amplification analysis to determine the deletion range across the PPIP5K1, CKMT1B, STRC and CATSPER2 genomic region and to estimate the prevalence of the STRC-CATSPER2 deletion, which is causative for DIS, among the STRC-associated hearing loss patients. As a result, we identified 276 cases with STRC-associated hearing loss. The prevalence of STRC-associated hearing loss in Japanese hearing loss patients was 2.77% (276/9956). In addition, 77.1% of cases with STRC homozygous deletions carried a two-copy loss of the entire CKMT1B-STRC-CATSPER2 gene region. This information will be useful for the provision of more appropriate genetic counselling regarding hearing loss and male infertility for patients with an STRC deletion.


2021 ◽  
Vol 12 ◽  
Author(s):  
Guojun Liu ◽  
Junying Zhang

Next-generation sequencing technology offers a wealth of data for the detection of copy number variations (CNVs) at high resolution. However, it is still challenging to correctly detect CNVs of different lengths, and new CNV detection tools are needed to meet this demand. In this work, we propose a new CNV detection method, called CBCNV, for the detection of CNVs of different lengths from whole genome sequencing data. CBCNV uses a clustering algorithm to divide the read depth segment profile, and assigns an abnormal score to each read depth segment. Based on the abnormal score profile, Tukey’s fences method is adopted in CBCNV to predict CNVs. The performance of the proposed method is evaluated on simulated data sets and compared with that of several existing methods. The experimental results indicate that CBCNV performs better than several existing methods. The proposed method is further tested and verified on real data sets, and the experimental results are found to be consistent with the simulation results. Therefore, the proposed method can be expected to become a routine tool in the analysis of CNVs from tumor-normal matched samples.
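Tukey's fences, the thresholding rule named in this abstract, flag values that lie more than 1.5 interquartile ranges outside the first or third quartile. The sketch below applies the standard rule to a list of per-segment scores. The function name `tukey_outliers`, the two-sided flagging, and the example scores are assumptions for illustration; how CBCNV computes its abnormal scores is not reproduced here.

```python
from statistics import quantiles


def tukey_outliers(scores, k=1.5):
    """Return indices of scores outside [Q1 - k*IQR, Q3 + k*IQR]."""
    if len(scores) < 4:
        raise ValueError("need at least four scores to estimate quartiles")
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _, q3 = quantiles(scores, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, s in enumerate(scores) if s < lower or s > upper]


# Hypothetical abnormal scores for eight read depth segments.
abnormal_scores = [1.0, 1.2, 0.9, 1.1, 1.0, 5.0, 1.05, 0.95]
outliers = tukey_outliers(abnormal_scores)
```

Because the fences are derived from the quartiles of the scores themselves, the cutoff adapts to each sample rather than relying on a fixed user-defined threshold.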


2021 ◽  
Vol 5 (7) ◽  
pp. 1991-2002
Author(s):  
Lieselot Buedts ◽  
Iwona Wlodarska ◽  
Julio Finalet-Ferreiro ◽  
Olivier Gheysens ◽  
Luc Dehaspe ◽  
...  

The low abundance of Hodgkin/Reed-Sternberg (HRS) cells in lymph node biopsies in classical Hodgkin lymphoma (cHL) complicates the analysis of somatic genetic alterations in HRS cells. As circulating cell-free DNA (cfDNA) contains circulating tumor DNA (ctDNA) from HRS cells, we prospectively collected cfDNA from 177 patients with newly diagnosed, mostly early-stage cHL in a monocentric study at Leuven, Belgium (n = 59) and the multicentric BREACH study by Lymphoma Study Association (n = 118). To catalog the patterns and frequencies of genomic copy number aberrations (CNAs), cfDNA was sequenced at low coverage (0.26×), and data were analyzed with ichorCNA to yield read depth-based copy number profiles and estimated clonal fractions in cfDNA. At diagnosis, the cfDNA concentration, estimated clonal fraction, and ctDNA concentration were significantly higher in cHL cases than controls. More than 90% of patients exhibited CNAs in cfDNA. The most frequent gains encompassed 2p16 (69%), 5p14 (50%), 12q13 (50%), 9p24 (50%), 5q (44%), 17q (43%), 2q (41%). Losses mostly affected 13q (57%), 6q25-q27 (55%), 4q35 (50%), 11q23 (44%), 8p21 (43%). In addition, we identified loss of 3p13-p26 and of 12q21-q24 and gain of 15q21-q26 as novel recurrent CNAs in cHL. At diagnosis, ctDNA concentration was associated with advanced disease, male sex, extensive nodal disease, elevated erythrocyte sedimentation rate, metabolic tumor volume, and HRS cell burden. CNAs and ctDNA rapidly diminished upon treatment initiation, and persistence of CNAs was associated with increased probability of relapse. This study endorses the development of ctDNA as gateway to the HRS genome and substrate for early disease response evaluation.

