Whole-genome sequencing analysis of copy number variation (CNV) using low-coverage and paired-end strategies is efficient and outperforms array-based CNV analysis

2017 ◽  
Author(s):  
Bo Zhou ◽  
Steve S. Ho ◽  
Xianglong Zhang ◽  
Reenal Pattni ◽  
Rajini R. Haraksingh ◽  
...  

Abstract. Background: CNV analysis is an integral component of the study of human genomes in both research and clinical settings. Array-based CNV analysis is the current first-tier approach in clinical cytogenetics. Decreasing costs in high-throughput sequencing and cloud computing have opened doors for the development of sequencing-based CNV analysis pipelines with fast turnaround times. We carry out a systematic and quantitative comparative analysis for several low-coverage whole-genome sequencing (WGS) strategies to detect CNV in the human genome. Methods: We compared the CNV detection capabilities of WGS strategies (short-insert, 3kb-, and 5kb-insert mate-pair) each at 1x, 3x, and 5x coverages relative to each other and to 17 currently used high-density oligonucleotide arrays. For benchmarking, we used a set of Gold Standard (GS) CNVs generated for the 1000-Genomes-Project CEU subject NA12878. Results: Overall, low-coverage WGS strategies detect drastically more GS CNVs than arrays and are accompanied by smaller percentages of CNV calls without validation. Furthermore, we show that WGS (at ≥1x coverage) is able to detect all seven GS deletion-CNVs >100 kb in NA12878 whereas only one is detected by most arrays. Lastly, we show that the much larger 15 Mbp Cri-du-chat deletion can be readily detected with short-insert paired-end WGS at even just 1x coverage. Conclusions: CNV analysis using low-coverage WGS is efficient and outperforms the array-based analysis that is currently used for clinical cytogenetics.


2018 ◽  
Vol 55 (11) ◽  
pp. 735-743 ◽  
Author(s):  
Bo Zhou ◽  
Steve S Ho ◽  
Xianglong Zhang ◽  
Reenal Pattni ◽  
Rajini R Haraksingh ◽  
...  

Background: Copy number variation (CNV) analysis is an integral component of the study of human genomes in both research and clinical settings. Array-based CNV analysis is the current first-tier approach in clinical cytogenetics. Decreasing costs in high-throughput sequencing and cloud computing have opened doors for the development of sequencing-based CNV analysis pipelines with fast turnaround times. We carry out a systematic and quantitative comparative analysis for several low-coverage whole-genome sequencing (WGS) strategies to detect CNV in the human genome. Methods: We compared the CNV detection capabilities of WGS strategies (short insert, 3 kb insert mate pair and 5 kb insert mate pair) each at 1×, 3× and 5× coverages relative to each other and to 17 currently used high-density oligonucleotide arrays. For benchmarking, we used a set of gold standard (GS) CNVs generated for the 1000 Genomes Project CEU subject NA12878. Results: Overall, low-coverage WGS strategies detect drastically more GS CNVs than arrays and are accompanied by smaller percentages of CNV calls without validation. Furthermore, we show that WGS (at ≥1× coverage) is able to detect all seven GS deletion CNVs >100 kb in NA12878, whereas only one is detected by most arrays. Lastly, we show that the much larger 15 Mbp Cri du chat deletion can be readily detected with short-insert paired-end WGS at even just 1× coverage. Conclusions: CNV analysis using low-coverage WGS is efficient and outperforms the array-based analysis that is currently used for clinical cytogenetics.
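
As a rough illustration of why deletions larger than 100 kb can stand out even at 1× coverage, the sketch below estimates the read-count drop for a heterozygous deletion under a simple Poisson read-depth model. This is not the pipeline used in the study: the 100 bp read length, the function names and the Poisson assumption are all introduced here for illustration only.

```python
import math


def expected_reads(window_bp: int, coverage: float, read_len_bp: int) -> float:
    """Expected number of reads in a window at a given mean coverage."""
    return window_bp * coverage / read_len_bp


def deletion_z_score(window_bp: int, coverage: float, read_len_bp: int,
                     copies: int = 1, ploidy: int = 2) -> float:
    """Read-count drop of a deletion, in Poisson standard deviations.

    Assumes read counts in a normal (two-copy) window are Poisson
    distributed, so their standard deviation is sqrt(expected count).
    """
    normal = expected_reads(window_bp, coverage, read_len_bp)
    observed = normal * copies / ploidy  # one copy left: half the reads
    return (normal - observed) / math.sqrt(normal)


# Hypothetical example: 100 kb window, 1x coverage, 100 bp reads (assumed).
z = deletion_z_score(window_bp=100_000, coverage=1.0, read_len_bp=100)
print(f"expected reads: {expected_reads(100_000, 1.0, 100):.0f}, z = {z:.1f}")
```

Under these assumptions a 100 kb window holds about 1,000 reads, so losing one of two copies shifts the count by many standard deviations. Real data add GC bias and mappability effects that this sketch ignores.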







2018 ◽  
Vol 16 (9) ◽  
pp. 1644-1653 ◽  
Author(s):  
Sneha Datta ◽  
Joanna Jankowicz-Cieslak ◽  
Stephan Nielen ◽  
Ivan Ingelbrecht ◽  
Bradley J. Till






2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Sung Yong Park ◽  
Gina Faraci ◽  
Pamela M. Ward ◽  
Jane F. Emerson ◽  
Ha Youn Lee

Abstract: COVID-19 global cases have climbed to more than 33 million, with over a million total deaths, as of September 2020. Real-time massive SARS-CoV-2 whole genome sequencing is key to tracking chains of transmission and estimating the origin of disease outbreaks. Yet no methods have simultaneously achieved high precision, simple workflow, and low cost. We developed a high-precision, cost-efficient SARS-CoV-2 whole genome sequencing platform for COVID-19 genomic surveillance, CorvGenSurv (Coronavirus Genomic Surveillance). CorvGenSurv directly amplified viral RNA from COVID-19 patients’ nasopharyngeal/oropharyngeal (NP/OP) swab specimens and sequenced the SARS-CoV-2 whole genome in three segments by long-read, high-throughput sequencing. Sequencing of the whole genome in three segments significantly reduced sequencing data waste, thereby preventing dropouts in genome coverage. We validated the precision of our pipeline by both control genomic RNA sequencing and Sanger sequencing. We produced near full-length whole genome sequences from individuals who were COVID-19 test positive during April to June 2020 in Los Angeles County, California, USA. These sequences were highly diverse in the G clade with nine novel amino acid mutations including NSP12-M755I and ORF8-V117F. With its readily adaptable design, CorvGenSurv grants wide access to genomic surveillance, permitting immediate public health response to sudden threats.



2020 ◽  
Vol 98 (Supplement_4) ◽  
pp. 81-82
Author(s):  
Joaquim Casellas ◽  
Melani Martín de Hijas-Villalba ◽  
Marta Vázquez-Gómez ◽  
Samir Id Lahoucine

Abstract: Current European regulations for autochthonous livestock breeds put a special emphasis on pedigree completeness, which requires laboratory paternity testing by genetic markers in most cases. This entails significant economic expenditure for breed societies and precludes other investments in breeding programs, such as genomic evaluation. Within this context, we developed paternity testing through low-coverage whole-genome data in order to reuse these data for genomic evaluation at no cost. Simulations relied on diploid genomes composed of 30 chromosomes (100 cM each) with 3,000,000 SNP per chromosome. Each population evolved for 1,000 non-overlapping generations with effective size 100, mutation rate 10⁻⁴, and recombination by Kosambi’s function. Only those populations with 1,000,000 ± 10% polymorphic SNP per chromosome in generation 1,000 were retained for further analyses, and expanded to the required number of parents and offspring. Individuals were sequenced at 0.01, 0.05, 0.1, 0.5 and 1X depth, with 100, 500, 1,000 or 10,000 base-pair reads and by assuming a random sequencing error rate per SNP between 10⁻² and 10⁻⁵. Assuming known allele frequencies in the population and sequencing error rate, 0.05X depth sufficed to corroborate the true father (85.0%) and to discard other candidates (96.3%). Those percentages increased up to 99.6% and 99.9% with 0.1X depth, respectively (read length = 10,000 bp; smaller read lengths slightly improved the results because they increase the number of sequenced SNP). Results were highly sensitive to biases in allele frequencies and robust to inaccuracies regarding sequencing error rate. Low-coverage whole-genome sequencing data could be subsequently integrated into genomic BLUP equations by appropriately constructing the genomic relationship matrix. This approach increased the correlation between simulated and predicted breeding values by 1.21% (h² = 0.25; 100 parents and 900 offspring; 0.1X depth by 10,000 bp reads). Although small, this increase opens the door to genomic evaluation in local livestock breeds.
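
To give a feel for how much marker information such shallow sequencing leaves, the following sketch estimates how many SNP would be covered by at least one read at a given depth, and how many would be covered in both an offspring and a candidate father. It is a back-of-the-envelope model and not the authors' simulation: it assumes Poisson-distributed per-site coverage, independence between the two individuals, and roughly 30 × 1,000,000 polymorphic SNP (the retained-population target stated above); all function names are hypothetical.

```python
import math


def prob_site_covered(depth: float) -> float:
    """P(at least one read covers a site), assuming Poisson coverage."""
    return 1.0 - math.exp(-depth)


def expected_covered_snps(n_snps: int, depth: float) -> float:
    """Expected number of SNP seen at least once in one individual."""
    return n_snps * prob_site_covered(depth)


def expected_shared_snps(n_snps: int, depth_a: float, depth_b: float) -> float:
    """Expected SNP seen in both individuals, assuming independence."""
    return n_snps * prob_site_covered(depth_a) * prob_site_covered(depth_b)


# Assumed total: 30 chromosomes x ~1,000,000 polymorphic SNP each.
N_SNPS = 30 * 1_000_000

for depth in (0.01, 0.05, 0.1, 0.5, 1.0):
    single = expected_covered_snps(N_SNPS, depth)
    shared = expected_shared_snps(N_SNPS, depth, depth)
    print(f"{depth:>4}X: ~{single:,.0f} SNP covered, ~{shared:,.0f} shared")
```

Under these assumptions, 0.05X depth covers roughly 1.46 million SNP per individual and about 71,000 in both members of a pair, which suggests why very low depths can still carry usable parentage signal. The model does not capture the read-length effect mentioned in the abstract.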


