genotyping error
Recently Published Documents


TOTAL DOCUMENTS: 90 (FIVE YEARS: 8)

H-INDEX: 25 (FIVE YEARS: 0)

2021 ◽  
pp. gr.275579.121
Author(s):  
Daniel P Cooke ◽  
David C Wedge ◽  
Gerton Lunter

Genotyping from sequencing is the basis of emerging strategies in the molecular breeding of polyploid plants. However, compared with the situation for diploids, where genotyping accuracies are confidently determined with comprehensive benchmarks, polyploids have been neglected; there are no benchmarks measuring genotyping error rates for small variants using real sequencing reads. We previously introduced a variant calling method, Octopus, that accurately calls germline variants in diploids and somatic mutations in tumors. Here, we evaluate Octopus and other popular tools on whole-genome tetraploid and hexaploid datasets created using in silico mixtures of diploid Genome In a Bottle (GIAB) samples. We find that genotyping errors are abundant at typical sequencing depths, but that Octopus makes 25% fewer errors than other methods on average. We supplement our benchmarks with concordance analysis in real autotriploid banana datasets.



2021 ◽  
Author(s):  
Yilei Huang ◽  
Harald Ringbauer

Human ancient DNA (aDNA) studies have surged in recent years, revolutionizing the study of the human past. Typically, aDNA is poorly preserved, making such data prone to contamination from other human DNA. It is therefore important to rule out substantial contamination before proceeding to downstream analysis. As most aDNA samples can only be sequenced to low coverage (<1x average depth), computational methods that can robustly estimate contamination in the low-coverage regime are needed. However, the ultra-low-coverage regime (0.1x and below) remains challenging for existing approaches. We present a new method to estimate contamination in aDNA for male individuals. It uses a Li and Stephens haplotype copying model for haploid X chromosomes, with mismatches modelled as genotyping error or contamination. We assessed an implementation of this new approach, hapCon, on simulated and down-sampled empirical aDNA data. Our results demonstrate that hapCon outperforms a commonly used tool for estimating male X contamination (ANGSD), with substantially lower variance and narrower confidence intervals, especially in the low-coverage regime. We found that hapCon provides useful contamination estimates for coverages as low as 0.1x for SNP capture data (1240k) and 0.02x for whole genome sequencing data (WGS), substantially extending the coverage limit of previous male X chromosome based contamination estimation methods.



Author(s):  
Russ Jasper ◽  
Tegan Krista McDonald ◽  
Pooja Singh ◽  
Mengmeng Lu ◽  
Clément Rougeux ◽  
...  

The use of NGS datasets has increased dramatically over the last decade; however, there have been few systematic analyses quantifying the accuracy of the commonly used variant caller programs. Here we used a familial design consisting of diploid tissue from a single Pinus contorta parent and the maternally derived haploid tissue from 106 full-sibling offspring, where mismatches could only arise due to mutation or bioinformatic error. Given the rarity of mutation, we used the rate of mismatches between parent and offspring genotype calls to infer the SNP genotyping error rates of FreeBayes, HaplotypeCaller, SAMtools, UnifiedGenotyper, and VarScan. With baseline filtering, HaplotypeCaller and UnifiedGenotyper yielded one to two orders of magnitude larger numbers of SNPs and error rates, whereas FreeBayes, SAMtools, and VarScan yielded lower numbers of SNPs and more modest error rates. To facilitate comparison between variant callers, we standardized each SNP set to the same number of SNPs using additional filtering; under this standardization, UnifiedGenotyper consistently produced the smallest proportion of genotype errors, followed by HaplotypeCaller, VarScan, SAMtools, and FreeBayes. Additionally, we found that error rates were minimized for SNPs called by more than one variant caller. Finally, we evaluated the performance of various commonly used filtering metrics on SNP calling. Our analysis provides a quantitative assessment of the accuracy of five widely used variant calling programs and offers valuable insights into both the choice of variant caller program and the choice of filtering metrics, especially for researchers using non-model study systems.
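The parent-offspring comparison described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `mismatch_rate`, the tuple/string genotype encoding, and the toy data are assumptions made for illustration. The idea is only that a haploid offspring allele should match one of the two alleles called in the diploid parent, so any other allele counts as a mismatch.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical encodings (not from the paper): a diploid call is a pair of
# alleles, a haploid call is a single allele, and None marks a missing call.
Diploid = Optional[Tuple[str, str]]
Haploid = Optional[str]


def mismatch_rate(parent: Sequence[Diploid], offspring: Sequence[Haploid]) -> float:
    """Fraction of jointly called sites where the haploid offspring allele
    is not one of the two alleles called in the diploid parent."""
    if len(parent) != len(offspring):
        raise ValueError("parent and offspring must cover the same sites")
    compared = 0
    mismatches = 0
    for p_call, o_call in zip(parent, offspring):
        if p_call is None or o_call is None:
            continue  # skip sites where either call is missing
        compared += 1
        if o_call not in p_call:
            mismatches += 1
    if compared == 0:
        raise ValueError("no jointly called sites to compare")
    return mismatches / compared


# Toy data: five sites, two of which have a missing call.
parent = [("A", "G"), ("C", "C"), ("T", "T"), None, ("G", "A")]
child = ["A", "C", "G", "T", None]
rate = mismatch_rate(parent, child)  # 1 mismatch over 3 compared sites
```

A rate computed this way only counts errors that produce an impossible parent-offspring combination; an erroneous call that still matches one parental allele goes undetected, so the figure is best read as a proxy rather than a complete error rate.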



2021 ◽  
Author(s):  
Luis Gomez-Raya ◽  
Emilio Izquierdo ◽  
Eduardo Mercado de la Peña ◽  
Fabian Garcia-Ruiz ◽  
Wendy Mercedes Rauw

Background: Two individuals with a first-degree relationship share about 50 percent of their alleles. A parent and its offspring cannot be homozygous for alternative alleles (genetic exclusion). Methods: Applying the concept of genetic exclusion to HD arrays typed in animals for experimental purposes or genomic selection allows estimation of the rate of rejection of first-degree relationships, defined as the rate at which two individuals typed for a large number of SNPs do not share at least one allele. An Expectation–Maximization algorithm is applied to estimate parentage. In addition, because of the large number of SNPs, genotyping errors can be estimated in true parent-offspring relationships. Nine candidate Duroc sires and 55 Iberian dams producing 214 Duroc × Iberian barrows were typed with the HD porcine Affymetrix array. Results: We were able to establish paternity and maternity of 75 and 86 piglets, respectively. A lower bound of the genotyping error of 0.003345 was estimated based on the rate of rejection of true parent-offspring relationships among autosomal SNPs. The true genotyping error is estimated to be between twice and three times the average rate of rejection observed in true relationships, i.e., between approximately 0.0067 and 0.0100. A total of 8,558 SNPs were rejected in six or more true parent-offspring relationships, facilitating identification of “problematic” SNPs with inconsistent inheritance. Conclusions: This study shows that animal experiments and routine genotyping in genomic selection make it possible to establish or verify first-degree relationships, as well as to estimate genotyping errors for each batch of animals or experiment.
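The genetic exclusion rule in this abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the study's Expectation–Maximization procedure: the function name `exclusion_rate`, the 0/1/2 allele-count coding with -1 for missing calls, and the toy genotypes are assumptions. A SNP is "rejected" for a pair when the two individuals are opposite homozygotes and therefore share no allele. The last lines simply reproduce the arithmetic behind the reported range from the stated lower bound.

```python
from typing import Sequence


def exclusion_rate(a: Sequence[int], b: Sequence[int]) -> float:
    """Rate of opposing homozygotes between two individuals.

    Genotypes are coded as the count of one allele (0, 1 or 2); a negative
    value marks a missing call (an assumed convention for this sketch).
    """
    if len(a) != len(b):
        raise ValueError("genotype vectors must have the same length")
    compared = 0
    excluded = 0
    for ga, gb in zip(a, b):
        if ga < 0 or gb < 0:
            continue  # skip SNPs with a missing call in either individual
        compared += 1
        if {ga, gb} == {0, 2}:  # opposite homozygotes share no allele
            excluded += 1
    if compared == 0:
        raise ValueError("no SNPs typed in both individuals")
    return excluded / compared


# Toy genotypes for a candidate sire and a piglet (hypothetical data).
sire = [0, 2, 1, 2, 0, -1, 2, 0]
pig = [1, 2, 1, 0, 0, 2, 1, 2]
rate = exclusion_rate(sire, pig)  # 2 exclusions over 7 compared SNPs

# Arithmetic behind the range quoted in the abstract.
lower_bound = 0.003345
low, high = 2 * lower_bound, 3 * lower_bound  # 0.00669 and 0.010035
```

For a true parent-offspring pair, any nonzero exclusion rate must come from genotyping error (or, rarely, mutation), which is why the observed rejection rate serves as a lower bound on the error rate.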



2021 ◽  
Author(s):  
Mohamed Thani Ibouroi ◽  
Ali Cheha ◽  
Aurelien Besnard

Noninvasive genetic sampling techniques are useful tools for providing the genetic data that are crucially needed for determining suitable conservation actions. Yet these methods may be highly unreliable in certain situations, for instance when working with faecal samples of frugivorous species in tropical areas. In this study, we tested the applicability of noninvasive genetic sampling on two Comoro Islands flying fox species, Pteropus livingstonii and P. seychellensis comorensis, in order to optimize the sampling and laboratory process. Both mitochondrial (mtDNA) and microsatellite markers were tested using two common faeces preservation protocols (ethanol and silica gel), and the polymerase chain reaction (PCR) success and genotyping error rates were assessed. The average proportion of positive mtDNA PCR results was 55% for P. livingstonii and 38% for P. s. comorensis, and higher amplification success was obtained for samples preserved in ethanol than for those preserved in silica gel. The average genotyping success rate was high (74% for P. livingstonii and 95% for P. s. comorensis) and the genotyping error rate was low for both species. Although our results confirm the effectiveness of noninvasive genetic sampling methods for studying flying fox species, the protocol we used can be optimized to provide higher efficiency. Some recommendations related to field sampling protocols and laboratory methods are proposed in order to optimize amplification rate and minimize genotyping errors.



2020 ◽  
Author(s):  
Lin Zhang ◽  
Lei Sun

In a case-control association study, deviation from Hardy-Weinberg equilibrium (HWE), or Hardy-Weinberg disequilibrium (HWD), in the control group is usually considered evidence of potential genotyping error, and the corresponding SNP is then removed from the study. On the other hand, assuming HWE holds in the study population, a truly associated SNP is expected to be out of HWE in the case group. Efforts have been made to combine association tests with tests of HWE in the cases to increase the power of detecting disease susceptibility loci (Song and Elston (2006), Wang and Shete (2010)). However, these existing methods are ad hoc and sensitive to model assumptions. Utilizing the recent robust allele-based (RA) regression model for conducting allelic association tests (Zhang and Sun (2020)), here we propose a joint RA test that naturally integrates association evidence from the traditional association test and a test that evaluates the difference in HWD between the case and control groups. The proposed test is robust to genotyping error, as well as to potential HWD in the population attributed to factors that are unrelated to phenotype-genotype association. We provide the asymptotic distribution of the proposed test statistic so that it is easy to implement, and we demonstrate the accuracy and efficiency of the test through extensive simulation studies and an application.
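For context, the conventional control-group HWE check that this abstract refers to is often carried out as a one-degree-of-freedom Pearson chi-square test on genotype counts. The Python sketch below shows that standard filter only; it is not the joint RA test proposed by the authors, and the function name `hwe_chi_square` and the example counts are assumptions made for illustration.

```python
import math
from typing import Tuple


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Pearson chi-square test of Hardy-Weinberg equilibrium for one
    biallelic SNP, given counts of the three genotypes.

    Returns (statistic, p_value), using 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    q = 1.0 - p
    if p == 0.0 or p == 1.0:
        raise ValueError("SNP is monomorphic; the test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Counts exactly in HWE proportions, and counts with a heterozygote deficit.
stat_ok, p_ok = hwe_chi_square(25, 50, 25)
stat_bad, p_bad = hwe_chi_square(40, 20, 40)
```

In a typical quality-control step, a SNP whose control-group p-value falls below a chosen threshold would be flagged as a possible genotyping artefact. The chi-square approximation is unreliable when expected counts are small, in which case an exact test is usually preferred.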



2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Nan Wang ◽  
Yibing Yuan ◽  
Hui Wang ◽  
Diansi Yu ◽  
Yubo Liu ◽  
...  

Genotyping-by-Sequencing (GBS) is a low-cost, high-throughput genotyping method that relies on restriction enzymes to reduce genome complexity. GBS is widely used for various genetic and breeding applications. In the present study, 2240 individuals from eight maize populations, including two association populations (AM), backcross first generation (BC1), BC1F2, F2, doubled haploid (DH), intermated B73 × Mo17 (IBM), and a recombinant inbred line (RIL) population, were genotyped using GBS. Raw data for a total of 955,120 SNPs were obtained for each individual, with an average genotyping error of 0.70%. The rate of missing genotypic data for these SNPs was related to the level of multiplex sequencing: ~25% missing data for 96-plex and ~55% for 384-plex. Imputation can greatly reduce the rate of missing genotypes, to 12.65% and 3.72% for AM populations and bi-parental populations, respectively, although it increases total genotyping error. For analysis of genetic diversity and linkage mapping, unimputed data with a low rate of genotyping error is beneficial, whereas for association mapping, imputed data would result in higher marker density and would improve map resolution. Because imputation does not influence the prediction accuracy, both unimputed and imputed data can be used for genomic prediction. In summary, GBS is a versatile and efficient SNP discovery approach for homozygous materials and can be effectively applied for various purposes in maize genetics and breeding.


