scholarly journals CrowdVariant: a crowdsourcing approach to classify copy number variants

2016 ◽  
Author(s):  
Peyton Greenside ◽  
Justin M. Zook ◽  
Marc Salit ◽  
Ryan Poplin ◽  
Madeleine Cule ◽  
...  

AbstractCopy number variants (CNVs) are an important type of genetic variation and play a causal role in many diseases. However, they are also notoriously difficult to identify accurately from next-generation sequencing (NGS) data. For larger CNVs, genotyping arrays provide reasonable benchmark data, but NGS allows us to assay a far larger number of small (< 10kbp) CNVs that are poorly captured by array-based methods. The lack of high quality benchmark callsets of small-scale CNVs has limited our ability to assess and improve CNV calling algorithms for NGS data. To address this issue we developed a crowdsourcing framework, called CrowdVariant, that leverages Google’s high-throughput crowdsourcing platform to create a high confidence set of copy number variants for NA24385 (NIST HG002/RM 8391), an Ashkenazim reference sample developed in partnership with the Genome In A Bottle Consortium. In a pilot study we show that crowdsourced classifications, even from non-experts, can be used to accurately assign copy number status to putative CNV calls and thereby identify a high-quality subset of these calls. We then scale our framework genome-wide to identify 1,781 high confidence CNVs, which multiple lines of evidence suggest are a substantial improvement over existing CNV callsets, and are likely to prove useful in benchmarking and improving CNV calling algorithms. Our crowdsourcing methodology may be a useful guide for other genomics applications.

2014 ◽  
Author(s):  
Sean D Smith ◽  
Joseph K Kawash ◽  
Andrey Grigoriev

Amplifications or deletions of genome segments, known as copy number variants (CNVs), have been associated with many diseases. Read depth analysis of next-generation sequencing (NGS) is an essential method of detecting CNVs. However, genome read coverage is frequently distorted by various biases of NGS platforms, which reduce predictive capabilities of existing approaches. Additionally, the use of read depth tools has been somewhat hindered by imprecise breakpoint identification. We developed GROM-RD, an algorithm that analyzes multiple biases in read coverage to detect CNVs in NGS data. We found non-uniform variance across distinct GC regions after using existing GC bias correction methods and developed a novel approach to normalize such variance. Although complex and repetitive genome segments complicate CNV detection, GROM-RD adjusts for repeat bias and uses a two-pipeline masking approach to detect CNVs in complex and repetitive segments while improving sensitivity in less complicated regions. To overcome a typical weakness of RD methods, GROM-RD employs a CNV search using size-varying overlapping windows to improve breakpoint resolution. We compared our method to two widely used programs based on read depth methods, CNVnator and RDXplorer, and observed improved CNV detection and breakpoint accuracy for GROM-RD. GROM-RD is available at http://grigoriev.rutgers.edu/software/


2014 ◽  
Author(s):  
Sean D Smith ◽  
Joseph K Kawash ◽  
Andrey Grigoriev

Amplifications or deletions of genome segments, known as copy number variants (CNVs), have been associated with many diseases. Read depth analysis of next-generation sequencing (NGS) is an essential method of detecting CNVs. However, genome read coverage is frequently distorted by various biases of NGS platforms, which reduce predictive capabilities of existing approaches. Additionally, the use of read depth tools has been somewhat hindered by imprecise breakpoint identification. We developed GROM-RD, an algorithm that analyzes multiple biases in read coverage to detect CNVs in NGS data. We found non-uniform variance across distinct GC regions after using existing GC bias correction methods and developed a novel approach to normalize such variance. Although complex and repetitive genome segments complicate CNV detection, GROM-RD adjusts for repeat bias and uses a two-pipeline masking approach to detect CNVs in complex and repetitive segments while improving sensitivity in less complicated regions. To overcome a typical weakness of RD methods, GROM-RD employs a CNV search using size-varying overlapping windows to improve breakpoint resolution. We compared our method to two widely used programs based on read depth methods, CNVnator and RDXplorer, and observed improved CNV detection and breakpoint accuracy for GROM-RD. GROM-RD is available at http://grigoriev.rutgers.edu/software/


2019 ◽  
Author(s):  
Junhua Rao ◽  
Lihua Peng ◽  
Fang Chen ◽  
Hui Jiang ◽  
Chunyu Geng ◽  
...  

AbstractBackgroundNext-generation sequence (NGS) has rapidly developed in past years which makes whole-genome sequencing (WGS) becoming a more cost- and time-efficient choice in wide range of biological researches. We usually focus on some variant detection via WGS data, such as detection of single nucleotide polymorphism (SNP), insertion and deletion (Indel) and copy number variant (CNV), which playing an important role in many human diseases. However, the feasibility of CNV detection based on WGS by DNBSEQ™ platforms was unclear. We systematically analysed the genome-wide CNV detection power of DNBSEQ™ platforms and Illumina platforms on NA12878 with five commonly used tools, respectively.ResultsDNBSEQ™ platforms showed stable ability to detect slighter more CNVs on genome-wide (average 1.24-fold than Illumina platforms). Then, CNVs based on DNBSEQ™ platforms and Illumina platforms were evaluated with two public benchmarks of NA12878, respectively. DNBSEQ™ and Illumina platforms showed similar sensitivities and precisions on both two benchmarks. Further, the difference between tools for CNV detection was analyzed, and indicated the selection of tool for CNV detection could affected the CNV performance, such as count, distribution, sensitivity and precision.ConclusionThe major contribution of this paper is providing a comprehensive guide for CNV detection based on WGS by DNBSEQ™ platforms for the first time.


2016 ◽  
Vol 15 ◽  
pp. CIN.S36612 ◽  
Author(s):  
Lun-Ching Chang ◽  
Biswajit Das ◽  
Chih-Jian Lih ◽  
Han Si ◽  
Corinne E. Camalier ◽  
...  

With rapid advances in DNA sequencing technologies, whole exome sequencing (WES) has become a popular approach for detecting somatic mutations in oncology studies. The initial intent of WES was to characterize single nucleotide variants, but it was observed that the number of sequencing reads that mapped to a genomic region correlated with the DNA copy number variants (CNVs). We propose a method RefCNV that uses a reference set to estimate the distribution of the coverage for each exon. The construction of the reference set includes an evaluation of the sources of variability in the coverage distribution. We observed that the processing steps had an impact on the coverage distribution. For each exon, we compared the observed coverage with the expected normal coverage. Thresholds for determining CNVs were selected to control the false-positive error rate. RefCNV prediction correlated significantly ( r = 0.96–0.86) with CNV measured by digital polymerase chain reaction for MET (7q31), EGFR (7p12), or ERBB2 (17q12) in 13 tumor cell lines. The genome-wide CNV analysis showed a good overall correlation (Spearman's coefficient = 0.82) between RefCNV estimation and publicly available CNV data in Cancer Cell Line Encyclopedia. RefCNV also showed better performance than three other CNV estimation methods in genome-wide CNV analysis.


2011 ◽  
Vol 17 (4) ◽  
pp. 421-432 ◽  
Author(s):  
L Priebe ◽  
F A Degenhardt ◽  
S Herms ◽  
B Haenisch ◽  
M Mattheisen ◽  
...  

2012 ◽  
Vol 22 (4) ◽  
pp. 816-824 ◽  
Author(s):  
Jade Chapman ◽  
Elliott Rees ◽  
Denise Harold ◽  
Dobril Ivanov ◽  
Amy Gerrish ◽  
...  

Sign in / Sign up

Export Citation Format

Share Document