High throughput crop genome genotyping by a combination of pool next generation sequencing and haplotype-based data processing

Author(s):  
Michael Schneider ◽  
Asis Shrestha ◽  
Agim Ballvora ◽  
Jens Leon

Abstract

Background: The identification of environment-specific alleles and the observation of evolutionary processes are goals of conservation genomics. Generational changes of allele frequencies in populations can be used to address questions regarding effective population size, gene flow, drift, and selection. Observing such effects is often a trade-off between cost and resolution, because a reasonably large sample of genotypes has to be genotyped at many loci. Pool genotyping approaches can achieve high resolution and precision in allele frequency estimation when high coverage sequencing is used. Still, high coverage pool sequencing of large genomes is expensive.

Results: Here we present a reliable method to estimate a barley population’s allele frequency from low coverage sequencing. Three hundred genotypes were sampled from a barley backcross population to estimate the entire population’s allele frequency. The accuracy and yield of the allele frequency estimation were compared for three next generation sequencing methods. To obtain accurate allele frequency estimates at a low coverage sequencing level, a haplotyping approach was applied. Low coverage allele frequencies of positionally linked single polymorphisms were aggregated into a single haplotype allele frequency, resulting in two to 271 times higher depth and increased precision. We compared different haplotyping tactics, showing that gene-based and chip marker-based haplotypes perform on par with or better than simple contig haplotype windows. Comparing multiple pool samples and referencing them against an individual sequencing approach showed that whole genome pool resequencing has the highest correlation to individual genotyping (up to 0.97), while transcriptomics and genotyping by sequencing showed higher error rates and lower correlations.

Conclusion: The proposed method makes it possible to identify the allele frequency of populations with high accuracy at low cost. This is particularly interesting for conservation genomics in species with large genomes, such as barley or wheat. Whole genome low coverage resequencing at 10x coverage can deliver a highly accurate estimate of the allele frequency when a locus-based haplotyping approach is applied. Using annotated haplotypes makes it possible to capitalize on both biological background and statistical robustness.
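The aggregation step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' exact procedure, so this assumes the simplest possible form: read counts of all SNPs assigned to one haplotype (for example a gene) are summed, and every SNP's counts are already polarized toward the same parent (here called the donor allele). The function name, the tuple layout, and the example counts are hypothetical.

```python
from collections import defaultdict


def haplotype_allele_frequencies(snp_counts):
    """Sum per-SNP read counts into per-haplotype allele frequencies.

    snp_counts: iterable of (haplotype_id, donor_reads, total_reads).
    Assumption: donor_reads at every SNP count the same parental allele,
    so counts within one haplotype can simply be added up.
    Returns {haplotype_id: (donor_allele_frequency, aggregated_depth)}.
    """
    donor = defaultdict(int)
    depth = defaultdict(int)
    for hap_id, donor_reads, total_reads in snp_counts:
        donor[hap_id] += donor_reads
        depth[hap_id] += total_reads
    # Haplotypes without any reads are skipped to avoid division by zero.
    return {h: (donor[h] / depth[h], depth[h]) for h in depth if depth[h] > 0}


# Hypothetical low coverage pool data: three SNPs in gene_A, one in gene_B.
snps = [
    ("gene_A", 1, 8),
    ("gene_A", 2, 12),
    ("gene_A", 0, 10),
    ("gene_B", 5, 10),
]
result = haplotype_allele_frequencies(snps)
```

In this toy input the single-SNP estimates for gene_A scatter between 0 and about 0.17 at depths of 8 to 12 reads, whereas the aggregated estimate rests on 30 reads. That is the depth gain the abstract refers to, shown here at a much smaller scale.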

2017 ◽  
Vol 16 ◽  
pp. 117693511771923
Author(s):  
Jenna VanLiere Canzoniero ◽  
Karen Cravero ◽  
Ben Ho Park

Barcoding techniques are used to reduce error from next-generation sequencing, with applications ranging from understanding tumor subclone populations to detecting circulating tumor DNA. Collisions occur when more than one sample molecule is tagged by the same unique identifier (UID) and can result in failure to detect very-low-frequency mutations and error in estimating mutation frequency. Here, we created computer models of the barcoding technique, with and without amplification bias introduced by the UID, and analyzed the effect of collisions for a range of mutant allele frequencies (1e−6 to 0.2), numbers of sample molecules (10 000 to 1e7), and numbers of UIDs (4^10 to 4^14). Inability to detect rare mutant alleles occurred in 0% to 100% of simulations, depending on collisions and number of mutant molecules. Collisions also introduced error in estimating mutant allele frequency, resulting in underestimation of the minor allele frequency. Incorporating an understanding of the effect of collisions into experimental design can allow for optimization of the number of sample molecules and number of UIDs to minimize the negative impact on rare mutant detection and mutant frequency estimation.
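How strongly collisions depend on the ratio of sample molecules to UIDs can be sketched with a small occupancy calculation. This is not the authors' model: it assumes every molecule draws its UID uniformly at random and ignores amplification bias. Both function names are hypothetical.

```python
import random


def expected_distinct_uids(n_molecules, n_uids):
    """Expected number of UIDs used at least once under uniform tagging."""
    return n_uids * (1.0 - (1.0 - 1.0 / n_uids) ** n_molecules)


def simulate_distinct_uids(n_molecules, n_uids, seed=0):
    """Draw one UID per molecule and count how many distinct UIDs appear."""
    rng = random.Random(seed)
    return len({rng.randrange(n_uids) for _ in range(n_molecules)})


# 10 000 molecules tagged with UIDs of 10 random bases (4**10 possible UIDs).
n_molecules, n_uids = 10_000, 4 ** 10
expected = expected_distinct_uids(n_molecules, n_uids)
observed = simulate_distinct_uids(n_molecules, n_uids, seed=1)
# Molecules that share a UID with an earlier molecule are merged into its family.
merged_expected = n_molecules - expected
merged_observed = n_molecules - observed
```

Under these assumptions only a few dozen of the 10 000 molecules are expected to be merged into another molecule's UID family. Raising the molecule count toward the number of UIDs makes that number grow quickly, which is the design trade-off the abstract describes.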


2008 ◽  
Vol 18 (10) ◽  
pp. 1638-1642 ◽  
Author(s):  
D. R. Smith ◽  
A. R. Quinlan ◽  
H. E. Peckham ◽  
K. Makowsky ◽  
W. Tao ◽  
...  
