Genotyping by low-coverage whole-genome sequencing in intercross pedigrees from outbred founders: a cost efficient approach

2018 ◽  
Author(s):  
Yanjun Zan ◽  
Thibaut Payen ◽  
Mette Lillie ◽  
Christa F. Honaker ◽  
Paul B. Siegel ◽  
...  

Abstract Background Experimental intercrosses between outbred founder populations are powerful resources for mapping loci contributing to complex traits (Quantitative Trait Loci or QTL). Here, we present an approach and accompanying software for high-resolution genotype imputation in such populations using whole-genome high coverage sequence data on founder individuals (∼30×) and low coverage sequence data on intercross individuals (∼0.4×). The method is illustrated in a large F2 pedigree between lines of chickens that have been divergently selected for 40 generations for the same trait (body weight at 8 weeks of age). Results Described is how hundreds of individuals were whole-genome sequenced in a cost- and time-efficient manner using a Tn5-based library preparation protocol optimized for this application. In total, 7.6M markers segregated in this pedigree and 10.0 to 13.7% were informative for imputing the founder line genotypes within the F0-F2 families. The genotypes imputed from low coverage sequence data were consistent with the founder line genotypes estimated using SNP and microsatellite markers both at individual imputed sites (92%) and across the genome of individual chickens (93%). The resolution of the recombination breakpoints was high with 50% being resolved within <10 kb. Conclusions A method for genotype imputation from low-coverage whole-genome sequencing in outbred intercrosses is described and evaluated. By applying it to an outbred chicken F2 cross it is illustrated that it provides high quality, high-resolution genotypes in a time and cost efficient manner.

2021 ◽  
Author(s):  
Changheng Zhao ◽  
Jun Teng ◽  
Xinhao Zhang ◽  
Dan Wang ◽  
Xinyi Zhang ◽  
...  

Abstract Background Low coverage whole genome sequencing is a low-cost genotyping technology. Combined with genotype imputation approaches, it is likely to become a critical component of cost-efficient genomic selection programs in agricultural livestock. Here, we used the low-coverage sequence data of 617 Dezhou donkeys to investigate the performance of genotype imputation for low coverage whole genome sequence data and genomic selection based on the imputed genotype data. The specific aims were: (i) to measure the accuracy of genotype imputation under different sequencing depths, sample sizes, MAFs, and imputation pipelines; and (ii) to assess the accuracy of genomic selection under different marker densities derived from the imputed sequence data, different strategies for constructing the genomic relationship matrices, and single- vs multi-trait models. Results We found that a high imputation accuracy (> 0.95) can be achieved for sequence data with a sequencing depth as low as 1x and 400 sequenced individuals. For genomic selection, the best performance was obtained by using a marker density of 410K and a G matrix constructed using marker dosage information. Multi-trait GBLUP performed better than single-trait GBLUP. Conclusions Our study demonstrates that low coverage whole genome sequencing would be a cost-effective method for genomic selection in Dezhou donkeys.
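The abstract above mentions a G matrix built from marker dosage information but does not give the formula. As a minimal sketch, assuming VanRaden's first method (a common choice, not confirmed by the abstract), the genomic relationship matrix can be computed from fractional dosages as below. The function name `grm_from_dosages` and the toy data are hypothetical.

```python
from typing import List


def grm_from_dosages(dosages: List[List[float]]) -> List[List[float]]:
    """Genomic relationship matrix from an individuals x markers dosage table.

    Dosages are expected alternate-allele counts in [0, 2]; imputed values
    may be fractional. Assumes VanRaden's first method:
    G = Z Z' / (2 * sum_j p_j * (1 - p_j)), with Z_ij = dosage_ij - 2 * p_j.
    """
    n = len(dosages)
    m = len(dosages[0])
    # Allele frequency per marker, estimated from the sample itself.
    freqs = [sum(row[j] for row in dosages) / (2.0 * n) for j in range(m)]
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    if scale == 0.0:
        raise ValueError("all markers are monomorphic; G is undefined")
    # Center each marker column by twice its allele frequency.
    centered = [[row[j] - 2.0 * freqs[j] for j in range(m)] for row in dosages]
    return [
        [sum(a * b for a, b in zip(centered[i], centered[k])) / scale
         for k in range(n)]
        for i in range(n)
    ]


# Toy example: 3 individuals, 3 markers, with fractional (imputed) dosages.
toy = [
    [0.0, 1.9, 1.0],
    [1.1, 2.0, 0.0],
    [2.0, 0.1, 1.0],
]
G = grm_from_dosages(toy)
```

Using dosages rather than hard-called genotypes keeps the imputation uncertainty in the relationship estimates, which is one plausible reason a dosage-based G could perform better.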


2019 ◽  
Vol 51 (1) ◽  
Author(s):  
Yanjun Zan ◽  
Thibaut Payen ◽  
Mette Lillie ◽  
Christa F. Honaker ◽  
Paul B. Siegel ◽  
...  

Author(s):  
Marta Byrska-Bishop ◽  
Uday S. Evani ◽  
Xuefang Zhao ◽  
Anna O. Basile ◽  
Haley J. Abel ◽  
...  

Abstract The 1000 Genomes Project (1kGP), launched in 2008, is the largest fully open resource of whole genome sequencing (WGS) data consented for public distribution of raw sequence data without access or use restrictions. The final (phase 3) 2015 release of 1kGP included 2,504 unrelated samples from 26 populations, representing five continental regions of the world and was based on a combination of technologies including low coverage WGS (mean depth 7.4X), high coverage whole exome sequencing (mean depth 65.7X), and microarray genotyping. Here, we present a new, high coverage WGS resource encompassing the original 2,504 1kGP samples, as well as an additional 698 related samples that result in 602 complete trios in the 1kGP cohort. We sequenced this expanded 1kGP cohort of 3,202 samples to a targeted depth of 30X using Illumina NovaSeq 6000 instruments. We performed SNV/INDEL calling against the GRCh38 reference using GATK’s HaplotypeCaller, and generated a comprehensive set of SVs by integrating multiple analytic methods through a sophisticated machine learning model, upgrading the 1kGP dataset to current state-of-the-art standards. Using this strategy, we defined over 111 million SNVs, 14 million INDELs, and ∼170 thousand SVs across the entire cohort of 3,202 samples with estimated false discovery rate (FDR) of 0.3%, 1.0%, and 1.8%, respectively. By comparison to the low-coverage phase 3 callset, we observed substantial improvements in variant discovery and estimated FDR that were facilitated by high coverage re-sequencing and expansion of the cohort. Specifically, we called 7% more SNVs, 59% more INDELs, and 170% more SVs per genome than the phase 3 callset. Moreover, we leveraged the presence of families in the cohort to achieve superior haplotype phasing accuracy and we demonstrate improvements that the high coverage panel brings especially for INDEL imputation. We make all the data generated as part of this project publicly available and we envision this updated version of the 1kGP callset to become the new de facto public resource for the worldwide scientific community working on genomics and genetics.


Author(s):  
Nikki E. Freed ◽  
Markéta Vlková ◽  
Muhammad B. Faisal ◽  
Olin K. Silander

Abstract Rapid and cost-efficient whole-genome sequencing of SARS-CoV-2, the virus that causes COVID-19, is critical for understanding viral transmission dynamics. Here we show that using a new multiplexed set of primers in conjunction with the Oxford Nanopore Rapid Barcode library kit allows for faster, simpler, and less expensive SARS-CoV-2 genome sequencing. This primer set results in amplicons that exhibit lower levels of variation in coverage compared to other commonly used primer sets. Using five SARS-CoV-2 patient samples with Cq values between 20 and 31, we show that high-quality genomes can be generated with as few as 10,000 reads (approximately 5 Mbp of sequence data). We also show that mis-classification of barcodes, which may be more likely when using the Oxford Nanopore Rapid Barcode library prep, is unlikely to cause problems in variant calling. This method reduces the time from RNA to genome sequence by more than half compared to the more standard ligation-based Oxford Nanopore library preparation method at considerably lower costs.


2020 ◽  
Vol 5 (1) ◽  
Author(s):  
Nikki E Freed ◽  
Markéta Vlková ◽  
Muhammad B Faisal ◽  
Olin K Silander

Abstract Rapid and cost-efficient whole-genome sequencing of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that causes coronavirus disease 2019, is critical for understanding viral transmission dynamics. Here we show that using a new multiplexed set of primers in conjunction with the Oxford Nanopore Rapid Barcode library kit allows for faster, simpler, and less expensive SARS-CoV-2 genome sequencing. This primer set results in amplicons that exhibit lower levels of variation in coverage compared to other commonly used primer sets. Using five SARS-CoV-2 patient samples with Cq values between 20 and 31, we show that high-quality genomes can be generated with as few as 10 000 reads (∼5 Mbp of sequence data). We also show that mis-classification of barcodes, which may be more likely when using the Oxford Nanopore Rapid Barcode library prep, is unlikely to cause problems in variant calling. This method reduces the time from RNA to genome sequence by more than half compared to the more standard ligation-based Oxford Nanopore library preparation method at considerably lower costs.
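The figure of 10 000 reads (∼5 Mbp) can be turned into a rough mean depth with simple arithmetic. The sketch below assumes the 29,903 bp Wuhan-Hu-1 reference length and perfectly even coverage; neither is stated in the abstract, and real amplicon data will be less uniform, so the result is only an order-of-magnitude guide. The function names are illustrative.

```python
# Length of the SARS-CoV-2 Wuhan-Hu-1 reference genome (NC_045512.2).
# Using this reference is an assumption; the abstract does not name one.
SARS_COV_2_REFERENCE_LENGTH_BP = 29_903


def mean_read_length(total_bases: int, n_reads: int) -> float:
    """Average read length implied by a total base count and a read count."""
    return total_bases / n_reads


def mean_depth(total_bases: int, genome_length: int) -> float:
    """Mean depth if every sequenced base mapped evenly across the genome."""
    return total_bases / genome_length


n_reads = 10_000          # reads reported as sufficient in the abstract
total_bases = 5_000_000   # approximately 5 Mbp, as reported

read_len = mean_read_length(total_bases, n_reads)
depth = mean_depth(total_bases, SARS_COV_2_REFERENCE_LENGTH_BP)
print(f"mean read length: {read_len:.0f} bp, mean depth: {depth:.0f}x")
```

Under these assumptions the reads average 500 bp and the mean depth is on the order of 170×, which makes it plausible that so few reads can still yield a complete consensus genome.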


2019 ◽  
Author(s):  
Ruifei Yang ◽  
Xiaoli Guo ◽  
Di Zhu ◽  
Cheng Bian ◽  
Yiqiang Zhao ◽  
...  

Abstract High-density markers discovered in large samples are essential for mapping complex traits at gene-level resolution in agricultural livestock and crops. However, the unavailability of large reference panels and array designs for a target population of agricultural species limits the improvement of array-based genotype imputation. Recent studies showed that very low coverage sequencing (LCS) of a large number of individuals is a cost-effective approach to discover variations in much greater detail in association studies. Here, we performed cohort-wide whole-genome sequencing at an average depth of 0.73× and identified more than 11.3 M SNPs. We also evaluated the data set and performed genome-wide association analysis (GWAS) in 2885 Duroc boars. We compared two different pipelines and selected a proper method (BaseVar/STITCH) for LCS analyses and determined that sequencing of 1000 individuals at 0.2× depth is enough for identifying SNPs with high accuracy in this population. Of the seven association signals derived from the genome-wide association analysis of the LCS variants, which were associated with four economic traits, we found that two QTLs with narrow intervals were possibly responsible for the teat number and back fat thickness traits, and identified 7 missense variants in a single sequencing step. This strategy (BaseVar/STITCH) is generally applicable to any population and any species that has no suitable reference panel. These findings show that the LCS strategy is a proper approach for the construction of new genetic resources to facilitate genome-wide association studies, fine mapping of QTLs, and genomic selection, and imply that it can be widely used for agricultural animal breeding in the future.


2022 ◽  
Vol 12 ◽  
Author(s):  
Tianyu Deng ◽  
Pengfei Zhang ◽  
Dorian Garrick ◽  
Huijiang Gao ◽  
Lixian Wang ◽  
...  

Genotype imputation is the process of inferring unobserved genotypes in a sample of individuals. It is a key step prior to a genome-wide association study (GWAS) or genomic prediction. The imputation accuracy will directly influence the results from subsequent analyses. In this simulation-based study, we investigate the accuracy of genotype imputation in relation to some factors characterizing SNP chip or low-coverage whole-genome sequencing (LCWGS) data. The factors included the imputation reference population size, the proportion of target markers/SNP density, the genetic relationship (distance) between the target population and the reference population, and the imputation method. Simulations of genotypes were based on coalescence theory accounting for the demographic history of pigs. A population of simulated founders diverged to produce four separate but related populations of descendants. The genomic data of 20,000 individuals were simulated for a 10-Mb chromosome fragment. Our results showed that the proportion of target markers or SNP density was the most critical factor affecting imputation accuracy under all imputation situations. Compared with Minimac4, Beagle5.1 produced higher-accuracy imputed data in most cases, more notably when imputing from the LCWGS data. Compared with SNP chip data, LCWGS provided more accurate genotype imputation. Our findings provided a relatively comprehensive insight into the accuracy of genotype imputation in a realistic population of domestic animals.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Sung Yong Park ◽  
Gina Faraci ◽  
Pamela M. Ward ◽  
Jane F. Emerson ◽  
Ha Youn Lee

Abstract COVID-19 global cases have climbed to more than 33 million, with over a million total deaths, as of September 2020. Real-time massive SARS-CoV-2 whole genome sequencing is key to tracking chains of transmission and estimating the origin of disease outbreaks. Yet no methods have simultaneously achieved high precision, simple workflow, and low cost. We developed a high-precision, cost-efficient SARS-CoV-2 whole genome sequencing platform for COVID-19 genomic surveillance, CorvGenSurv (Coronavirus Genomic Surveillance). CorvGenSurv directly amplified viral RNA from COVID-19 patients’ Nasopharyngeal/Oropharyngeal (NP/OP) swab specimens and sequenced the SARS-CoV-2 whole genome in three segments by long-read, high-throughput sequencing. Sequencing of the whole genome in three segments significantly reduced sequencing data waste, thereby preventing dropouts in genome coverage. We validated the precision of our pipeline by both control genomic RNA sequencing and Sanger sequencing. We produced near full-length whole genome sequences from individuals who were COVID-19 test positive during April to June 2020 in Los Angeles County, California, USA. These sequences were highly diverse in the G clade with nine novel amino acid mutations including NSP12-M755I and ORF8-V117F. With its readily adaptable design, CorvGenSurv grants wide access to genomic surveillance, permitting immediate public health response to sudden threats.


2020 ◽  
Vol 98 (Supplement_4) ◽  
pp. 81-82
Author(s):  
Joaquim Casellas ◽  
Melani Martín de Hijas-Villalba ◽  
Marta Vázquez-Gómez ◽  
Samir Id Lahoucine

Abstract Current European regulations for autochthonous livestock breeds put a special emphasis on pedigree completeness, which requires laboratory paternity testing by genetic markers in most cases. This entails significant economic expenditure for breed societies and precludes other investments in breeding programs, such as genomic evaluation. Within this context, we developed paternity testing through low-coverage whole-genome data in order to reuse these data for genomic evaluation at no cost. Simulations relied on diploid genomes composed of 30 chromosomes (100 cM each) with 3,000,000 SNP per chromosome. Each population evolved during 1,000 non-overlapping generations with effective size 100, mutation rate 10⁻⁴, and recombination by Kosambi’s function. Only those populations with 1,000,000 ± 10% polymorphic SNP per chromosome in generation 1,000 were retained for further analyses, and expanded to the required number of parents and offspring. Individuals were sequenced at 0.01, 0.05, 0.1, 0.5 and 1X depth, with 100, 500, 1,000 or 10,000 base-pair reads and by assuming a random sequencing error rate per SNP between 10⁻² and 10⁻⁵. Assuming known allele frequencies in the population and sequencing error rate, 0.05X depth sufficed to corroborate the true father (85.0%) and to discard other candidates (96.3%). Those percentages increased up to 99.6% and 99.9% with 0.1X depth, respectively (read length = 10,000 bp; smaller read lengths slightly improved the results because they increase the number of sequenced SNP). Results were highly sensitive to biases in allele frequencies and robust to inaccuracies regarding sequencing error rate. Low-coverage whole-genome sequencing data could be subsequently integrated into genomic BLUP equations by appropriately constructing the genomic relationship matrix. This approach increased the correlation between simulated and predicted breeding values by 1.21% (h² = 0.25; 100 parents and 900 offspring; 0.1X depth by 10,000 bp reads). Although small, this increase opens the door to genomic evaluation in local livestock breeds.
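To give a feel for how much information such shallow sequencing carries, the sketch below estimates how many SNP would be covered by at least one read in both an offspring and a candidate father. It assumes per-site read counts follow a Poisson distribution with mean equal to the depth, that the two individuals are sequenced independently, and that about 30 × 1,000,000 polymorphic SNP are available. These are simplifications of the simulation design, which the abstract does not spell out, and the model ignores read length. All names are illustrative.

```python
import math

DEPTHS = (0.01, 0.05, 0.1, 0.5, 1.0)   # sequencing depths listed in the abstract
N_POLYMORPHIC_SNP = 30 * 1_000_000     # 30 chromosomes x ~1,000,000 SNP (approximate)


def fraction_covered(depth: float) -> float:
    """P(at least one read at a site) when read counts are Poisson(depth)."""
    return 1.0 - math.exp(-depth)


def expected_shared_snp(depth: float, n_snp: int = N_POLYMORPHIC_SNP) -> float:
    """Expected SNP covered in both of two independently sequenced individuals."""
    return n_snp * fraction_covered(depth) ** 2


for d in DEPTHS:
    print(f"{d:>5}X: {fraction_covered(d):6.2%} of sites covered, "
          f"~{expected_shared_snp(d):,.0f} SNP shared by a pair")
```

Under these assumptions, even 0.05X leaves roughly 70,000 SNP observed in both members of a pair, which is consistent with paternity remaining testable at very low depth.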

