genotyping errors
Recently Published Documents


Total documents: 109 (five years: 16)
H-index: 33 (five years: 1)

2022
Author(s): Alejandro Thérèse Navarro, Peter M. Bourke, Eric van de Weg, Paul Arens, Richard Finkers, ...

Linkage mapping is an approach to ordering markers based on recombination events. Mapping algorithms cannot easily handle genotyping errors, which are common in high-throughput genotyping data. Several strategies have been developed to address this issue, most of them aimed at identifying and eliminating these errors. One such strategy is SMOOTH (van Os et al. 2005), an iterative algorithm for detecting genotyping errors. Unlike other approaches, SMOOTH can also be used to impute the most probable alternative genotypes, but its application is limited to diploid species and to markers heterozygous in only one of the parents. In this study we extended SMOOTH to any marker type and to autopolyploids by using identity-by-descent probabilities, and named the updated algorithm Smooth Descent (SD). We applied SD to real and simulated data, showing that in the presence of genotyping errors this method produces better genetic maps in terms of marker order and map length. SD is particularly useful for error rates between 5% and 20% and when error rates are not homogeneous among markers or individuals. Moreover, the simplicity of the algorithm allows thousands of markers to be processed efficiently, making it particularly useful for error detection in high-density datasets. We have implemented this algorithm in the R package SmoothDescent.
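The neighbour-comparison idea behind this kind of error detection can be sketched as follows. This is not the SMOOTH or SmoothDescent code: it assumes a diploid, backcross-style 0/1 coding for one individual on markers already in map order, a hypothetical inverse-distance weight 1/(1 + d) with d in cM, and a hypothetical flagging threshold. A call that disagrees with most of its close neighbours gets a high score and is flipped to the alternative state; since the abstract describes SMOOTH as iterative, a full pipeline would re-estimate the map and repeat.

```python
def smooth_scores(calls, positions_cm, window=3):
    """Score each call of one individual by weighted disagreement with neighbours.

    calls: 0/1 genotype calls on markers in map order (hypothetical coding).
    positions_cm: marker positions in centiMorgans, same order as calls.
    """
    n = len(calls)
    scores = []
    for j in range(n):
        neighbours = [k for k in range(max(0, j - window), min(n, j + window + 1)) if k != j]
        if not neighbours:
            scores.append(0.0)  # a lone marker has nothing to compare against
            continue
        # Closer neighbours carry more weight (hypothetical 1 / (1 + distance) rule).
        weights = [1.0 / (1.0 + abs(positions_cm[k] - positions_cm[j])) for k in neighbours]
        total = sum(weights)
        # Weighted share of neighbours whose call differs from marker j.
        differing = sum(w for k, w in zip(neighbours, weights) if calls[k] != calls[j])
        scores.append(differing / total)
    return scores


def flag_and_impute(calls, positions_cm, window=3, threshold=0.5):
    """Flag calls scoring above `threshold` and impute the other 0/1 state."""
    scores = smooth_scores(calls, positions_cm, window)
    flags = [s > threshold for s in scores]
    corrected = [1 - c if f else c for c, f in zip(calls, flags)]
    return corrected, flags


# An isolated "1" among "0"s looks like a genotyping error (double recombinant).
corrected, flags = flag_and_impute([0, 0, 1, 0, 0], [0.0, 1.0, 2.0, 3.0, 4.0])
```

Here only the call at 2 cM is flagged, and the corrected calls are all zeros. Weighting by map distance means that a disagreement with a marker a few cM away counts less than one with an immediate neighbour, where a true recombination is unlikely.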


2021, pp. gr.275579.121
Author(s): Daniel P Cooke, David C Wedge, Gerton Lunter

Genotyping from sequencing is the basis of emerging strategies in the molecular breeding of polyploid plants. However, compared with diploids, for which genotyping accuracy is confidently determined with comprehensive benchmarks, polyploids have been neglected: there are no benchmarks measuring genotyping error rates for small variants using real sequencing reads. We previously introduced a variant calling method, Octopus, that accurately calls germline variants in diploids and somatic mutations in tumors. Here, we evaluate Octopus and other popular tools on whole-genome tetraploid and hexaploid datasets created using in silico mixtures of diploid Genome In a Bottle (GIAB) samples. We find that genotyping errors are abundant at typical sequencing depths, but that Octopus makes 25% fewer errors than other methods on average. We supplement our benchmarks with concordance analysis in real autotriploid banana datasets.
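To make the benchmarking design concrete: assuming equal mixing proportions, pooling reads from k diploid samples gives a synthetic sample of ploidy 2k whose true alternate-allele dosage at a site is the sum of the diploid dosages. The sketch below builds such truth dosages and scores hypothetical calls against them. It only illustrates the idea at the level of single biallelic sites; it is not the paper's evaluation pipeline, and the function names are made up.

```python
def mixture_dosage(*diploid_dosages):
    """True alt-allele dosage of an equal in silico mixture of diploid samples.

    Each argument lists per-site alt-allele counts (0, 1 or 2) for one diploid
    sample; mixing k samples yields dosages between 0 and 2k.
    """
    return [sum(site) for site in zip(*diploid_dosages)]


def genotype_error_rate(truth, called):
    """Fraction of sites whose called dosage differs from the truth dosage."""
    if len(truth) != len(called):
        raise ValueError("truth and calls must cover the same sites")
    return sum(t != c for t, c in zip(truth, called)) / len(truth)


def relative_reduction(errors_new, errors_baseline):
    """Fractional drop in error count; 0.25 corresponds to '25% fewer errors'."""
    return 1 - errors_new / errors_baseline


tetraploid_truth = mixture_dosage([0, 1, 2, 1], [1, 1, 0, 0])  # two diploids
hexaploid_truth = mixture_dosage([0, 2], [1, 1], [2, 2])       # three diploids
```

Note that a dosage error (for example calling 1 copy of the alternate allele instead of 2 in a tetraploid) counts as a genotyping error even though the site is correctly called variant, which is one reason polyploid genotyping is harder than variant detection.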


2021
Author(s): Jan-Niklas Runge, Barbara König, Anna K. Lindholm, Andres Bendesky

Genealogical relationships are fundamental components of genetic studies. However, it is often challenging to infer correct and complete pedigrees even when genome-wide information is available. For example, inbreeding can obfuscate genetic differences between individuals, making it difficult to even distinguish first-degree relatives such as parent-offspring from full siblings. Similarly, genotyping errors can interfere with the detection of genetic similarity between parents and their offspring. Inbreeding is common in natural, domesticated, and experimental populations and genotyping of these populations often has more errors than in human datasets, so efficient methods for building pedigrees under these conditions are necessary. Here, we present a new method for parent-offspring inference in inbred pedigrees called SPORE (Specific Parent-Offspring Relationship Estimation). SPORE is vastly superior to existing pedigree-inference methods at detecting parent-offspring relationships, in particular when inbreeding is high or in the presence of genotyping errors, or both. SPORE therefore fills an important void in the arsenal of pedigree inference tools.


2021
Author(s): Luis Gomez-Raya, Emilio Izquierdo, Eduardo Mercado de la Peña, Fabian Garcia-Ruiz, Wendy Mercedes Rauw

Background: Two individuals with a first-degree relationship share about 50 percent of their alleles, and a parent and its offspring cannot be homozygous for alternative alleles at the same locus (genetic exclusion). Methods: Applying genetic exclusion to high-density (HD) SNP arrays typed in animals for experimental purposes or for genomic selection allows the rate of rejection of first-degree relationships to be estimated as the rate at which two individuals typed for a large number of SNPs do not share at least one allele. An Expectation–Maximization algorithm is applied to estimate parentage. In addition, because the number of SNPs is large, genotyping errors can be estimated from true parent-offspring relationships. Nine candidate Duroc sires and 55 Iberian dams producing 214 Duroc × Iberian barrows were typed with the HD porcine Affymetrix array. Results: We were able to establish paternity and maternity of 75 and 86 piglets, respectively. A lower bound of 0.003345 for the genotyping error was estimated from the rate of rejection of true parent-offspring relationships among autosomal SNPs. The true genotyping error is estimated to be between two and three times the average rate of rejection observed in true relationships, i.e., between approximately 0.0067 and 0.0100. A total of 8,558 SNPs were rejected in six or more true parent-offspring relationships, facilitating identification of “problematic” SNPs with inconsistent inheritance. Conclusions: This study shows that animal experiments and routine genotyping for genomic selection make it possible to establish or verify first-degree relationships and to estimate genotyping errors for each batch of animals or experiment.
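The rejection rate described above can be sketched for biallelic SNPs coded as alternate-allele counts (0, 1, 2). Under that assumption, "not sharing at least one allele" reduces to opposing homozygotes (0 versus 2). The two-to-three-times factor is taken directly from the reported result rather than derived here, and the Expectation–Maximization parentage step is not included. Function names are hypothetical.

```python
def opposing_homozygote_rate(geno_a, geno_b):
    """Share of jointly typed SNPs where two individuals share no allele.

    Genotypes are alt-allele counts (0, 1, 2); None marks a missing call.
    For a true parent-offspring pair such SNPs can only arise from errors
    (or other anomalies such as deletions).
    """
    typed = [(a, b) for a, b in zip(geno_a, geno_b) if a is not None and b is not None]
    if not typed:
        raise ValueError("no SNPs typed in both individuals")
    excluded = sum(1 for a, b in typed if {a, b} == {0, 2})
    return excluded / len(typed)


def error_range_from_rejection(rate):
    """Error range implied by the 2x to 3x factor reported for true pairs."""
    return 2 * rate, 3 * rate


low, high = error_range_from_rejection(0.003345)
```

With the reported rejection rate of 0.003345, this gives roughly 0.0067 and 0.0100, matching the range quoted in the Results.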


2021
Author(s): Steven Andrew Yates, Bruno Studer

BACKGROUND: Genotyping-by-sequencing (GBS) has revolutionised molecular genetic analysis. It enables simultaneous genotyping of thousands of DNA markers in the genome of any species. In contrast to whole-genome shotgun sequencing, GBS uses a restriction enzyme to reduce genome complexity and directs sequencing to begin at fixed digestion sites. However, tools currently used to analyse GBS data, such as SAMtools, often neglect the fundamental technical differences between GBS and shotgun sequencing. RESULTS: Here we present GBSmode, a dedicated pipeline that calls DNA sequence variants using whole-read information from GBS data. It removes false positives by incorporating biological features such as the ploidy level and the number of possible alleles in the population under investigation. A comparison of GBSmode with SAMtools in an F2 population of rice (Oryza sativa L.) showed that both identified a similar number of polymorphisms (13,449 and 14,445, respectively) with a high overlap (8,143). However, they differed in the rates of read misalignment (8.0% and 14.3% for GBSmode and SAMtools, respectively) and of genotyping errors (5.0% and 8.3% for GBSmode and SAMtools, respectively). Further tests in a bi-parental F1 population of cassava (Manihot esculenta Crantz) showed that GBSmode found 31,489 polymorphic loci, whereas SAMtools found more (43,860). However, this difference was mainly attributable to GBSmode rejecting 11,695 loci that were biologically impossible. CONCLUSIONS: This study shows that GBSmode is a versatile tool for the analysis of GBS data. Moreover, GBSmode was able to reduce genotyping errors arising from read misalignments by combining haplotype data with biological information. Whilst other tools may find more markers, GBSmode is designed for accuracy.
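GBSmode's exact filtering rules are not given in this abstract. As a hedged illustration of what rejecting "biologically impossible" loci can mean, the sketch below applies two simple ploidy-based limits to one locus: no individual may show more distinct read haplotypes than its ploidy, and the population may not carry more distinct haplotypes than its parents can supply. The function name, parameters and default limit are assumptions for illustration only.

```python
def is_biologically_possible(haplotypes_by_individual, ploidy=2, n_parents=2,
                             max_population_haplotypes=None):
    """Check one GBS locus against simple ploidy-based plausibility limits.

    haplotypes_by_individual: dict mapping individual ID to the set of
    distinct read haplotypes (e.g. sequence strings) observed at the locus.
    max_population_haplotypes: optional tighter limit; by default every
    haplotype must come from a parent, so at most n_parents * ploidy exist.
    """
    if max_population_haplotypes is None:
        max_population_haplotypes = n_parents * ploidy
    population = set()
    for haplotypes in haplotypes_by_individual.values():
        # An individual cannot carry more distinct haplotypes than its ploidy.
        if len(haplotypes) > ploidy:
            return False
        population.update(haplotypes)
    return len(population) <= max_population_haplotypes


locus_ok = {"a": {"ACGT", "ACGA"}, "b": {"ACGT"}}
locus_bad = {"a": {"ACGT", "ACGA", "ACTT"}}  # three haplotypes in a diploid
```

The default suits a bi-parental F1 from heterozygous parents; for an F2 derived from two inbred lines, only two haplotypes can segregate, so a caller would pass `max_population_haplotypes=2`.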


BMC Genomics, 2021, Vol 22 (1)
Author(s): Erminia Scarpulla, Alessio Boattini, Mario Cozzo, Patrizia Giangregorio, Paolo Ciucci, ...

Background: The low cost and rapidity of microsatellite analysis have led to the development of several markers for many species. Because non-invasive genetics recommends genotyping individuals at few loci, a subset of markers is generally selected. The choice of different marker panels by different research groups studying the same population can cause problems and bias in data analysis. A priority issue in conservation genetics is the comparability of data produced by different labs with different methods. Here, we compared data from previous and ongoing studies to identify a panel of microsatellite loci efficient for the long-term monitoring of Apennine brown bears (Ursus arctos marsicanus), aiming to reduce genotyping uncertainty and to allow reliable individual identification over time. Results: We examined all microsatellite markers used up to now and identified 19 candidate loci. We evaluated the efficacy of 13 of the most commonly used loci by analyzing 194 DNA samples belonging to 113 distinct bears selected from the Italian national biobank. We compared data from 4 different marker subsets on the basis of genotyping errors, allelic patterns, observed and expected heterozygosity, discriminatory power, number of mismatching pairs, and probability of identity. The optimal marker set was selected on the basis of each primer's low molecular weight, high discriminatory power, and low occurrence of genotyping errors. We calibrated allele calls and verified matches among genotypes obtained in previous studies using the complete set of 13 STRs (Short Tandem Repeats), analyzing six invasive DNA samples from distinct individuals. Differences in allele sizing between labs were consistent, showing substantial overlap in individual genotypes.
Conclusions: The proposed marker set comprises 11 Ursus-specific markers plus cxx20, the canid locus that was less prone to genotyping errors, in order to prevent both underestimation (by maximizing discriminatory power) and overestimation (by minimizing genotyping errors) of the number of Apennine brown bears. Because all selected loci share the same annealing temperature, they can be amplified in multiplex, saving time and costs. Our work optimizes the available resources by identifying a shared panel and a uniform methodology capable of improving comparisons between past and future studies.
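Probability of identity (PID), one of the criteria compared above, is commonly computed per locus from allele frequencies and then multiplied across loci, assuming the loci are unlinked. The sketch uses the standard unrelated-individual formula, PID = Σp_i^4 + Σ_{i<j}(2p_ip_j)^2, and the usual full-sibling version, PID_sib = 0.25 + 0.5Σp_i^2 + 0.5(Σp_i^2)^2 − 0.25Σp_i^4. The allele frequencies in the example are made up for illustration; they are not Apennine brown bear data.

```python
from itertools import combinations
from math import prod


def pid_unrelated(freqs):
    """Per-locus probability that two unrelated individuals share a genotype."""
    homozygous = sum(p ** 4 for p in freqs)
    heterozygous = sum((2 * p * q) ** 2 for p, q in combinations(freqs, 2))
    return homozygous + heterozygous


def pid_sibs(freqs):
    """Per-locus probability that two full siblings share a genotype."""
    s2 = sum(p ** 2 for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    return 0.25 + 0.5 * s2 + 0.5 * s2 ** 2 - 0.25 * s4


def multilocus(pid_fn, loci):
    """Combine per-locus values by multiplication (assumes independent loci)."""
    return prod(pid_fn(freqs) for freqs in loci)


# Hypothetical panel: two loci with two equally frequent alleles each.
panel = [[0.5, 0.5], [0.5, 0.5]]
```

For a single locus with two equally frequent alleles, PID is 0.375 for unrelated individuals and 0.59375 for full siblings. Because each added locus multiplies PID by a value below one, the trade-off in panel design is between this discriminatory gain and the extra genotyping errors and cost each locus brings.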


Author(s): Stella C. Yuan, Eric Malekos, Melissa T. R. Hawkins

The use of museum specimens held in natural history repositories for population and conservation genetic research is increasing in tandem with the use of massively parallel sequencing technologies. Short Tandem Repeats (STRs), or microsatellite loci, are commonly used genetic markers in wildlife and population genetic studies. However, they have traditionally suffered from a host of issues, including length homoplasy, high costs, low throughput, and difficulties in reproducibility across laboratories. Massively parallel sequencing technologies can address these problems, but DNA derived from museum specimens suffers from significant fragmentation and exogenous DNA contamination. Combatting these issues requires extra stringency in the lab and during data analysis, yet no high-throughput sequencing studies have evaluated microsatellite allelic dropout in DNA extracted from museum specimens. In this study, we evaluate genotyping errors in DNA extracted from mammalian museum skins for previously characterized microsatellites across PCR replicates using high-throughput sequencing. We found it useful to classify samples by DNA concentration, which determined the rate at which genotypes were accurately recovered. Longer microsatellites performed worse in all museum specimens. Allelic dropout rates across loci depended on sample quantity, with high-concentration museum specimens performing as well as the frozen tissue sample and recovering quality metrics nearly as high. Based on our results, we provide a set of best practices for quality assurance and for incorporating reliable genotypes from museum specimens.
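Allelic dropout across PCR replicates is often summarised as the fraction of replicate amplifications of a heterozygous genotype that show only one of its two alleles. The sketch below is a generic estimator of that kind, not necessarily the one used in the study; it assumes the reference (consensus) genotype of each sample is known, for example from high-quality tissue, and that each replicate call is the set of allele sizes observed. Names and data are hypothetical.

```python
def allelic_dropout_rate(consensus, replicates):
    """Estimate allelic dropout at one microsatellite locus.

    consensus: dict sample -> frozenset of 1 or 2 allele sizes (reference genotype).
    replicates: dict sample -> list of sets of allele sizes seen per PCR replicate.
    Only heterozygous reference genotypes can reveal dropout.
    """
    dropouts = 0
    het_amplifications = 0
    for sample, alleles in consensus.items():
        if len(alleles) != 2:
            continue  # homozygotes cannot show dropout
        for observed in replicates.get(sample, []):
            if not observed:
                continue  # failed amplification, counted separately from dropout
            het_amplifications += 1
            # A single true allele where two are expected counts as dropout;
            # a single non-reference (false) allele does not.
            if len(observed) == 1 and observed <= alleles:
                dropouts += 1
    return dropouts / het_amplifications if het_amplifications else float("nan")


reference = {"s1": frozenset({150, 154}), "s2": frozenset({160})}
pcr_calls = {"s1": [{150, 154}, {150}, set()], "s2": [{160}]}
```

In this toy example, sample s1 has two successful amplifications of its heterozygous genotype, one of which lost allele 154, giving a dropout rate of 0.5; s2 is homozygous and does not contribute. Computing the rate separately per concentration class would mirror the sample classification described above.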


2021
Author(s): Gil-Muñoz Francisco, Abrahamsson Sara, García-Gil M Rosario

Genotyping mistakes represent a challenge in parental assignment, where even small errors can lead to large numbers of unassigned siblings. Different parental assignment algorithms have been designed to approach this problem. The Exclusion method is the most widely applied because of its reliability and biological meaning. However, the resolving power of this method is lowest for data containing genotyping errors. We introduce a new distance-based approach, which we call Distance-Based Exclusion (DBE). The DBE method calculates the distance between the offspring haplotype and the haplotype of each potential father. The father with the lowest distance is then assigned as the candidate father according to a distance ratio (α). We tested the Exclusion and DBE methods on a real dataset of 1,230 offspring subdivided into families of 25 individuals. Each family had six potential fathers and one known mother. Compared with the Exclusion method, the DBE method resolves 4.7 percentage points more individuals (64.4% with Exclusion vs 69.1% with DBE) using the most restrictive α tested without errors. The DBE method can also be used together with the Exclusion method for error calculation and to resolve further unassigned individuals. Using a two-step approach, we were able to assign 98.1% of the offspring with a total predicted error of 4.7%. On the basis of these results, we propose using the DBE method in combination with the Exclusion method for parental assignment.
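The distance-based idea can be sketched as follows. The abstract does not specify the distance measure or exactly how α is applied, so this sketch assumes haplotypes coded as lists of 0/1 alleles, a normalised Hamming distance that skips missing calls, and a hypothetical rule: assign the closest candidate only if its distance is at most α times the runner-up's distance. It illustrates the approach and is not the authors' implementation.

```python
def hamming_distance(h1, h2):
    """Proportion of mismatching alleles between two haplotypes, skipping None."""
    pairs = [(a, b) for a, b in zip(h1, h2) if a is not None and b is not None]
    if not pairs:
        return float("inf")  # no comparable sites
    return sum(a != b for a, b in pairs) / len(pairs)


def assign_father(offspring_hap, candidate_haps, alpha=0.5):
    """Return the closest candidate father, or None if the call is ambiguous.

    candidate_haps: dict name -> haplotype. The ratio rule using `alpha` is a
    hypothetical stand-in for the paper's distance-ratio criterion.
    """
    if len(candidate_haps) < 2:
        raise ValueError("need at least two candidate fathers to form a ratio")
    ranked = sorted((hamming_distance(offspring_hap, hap), name)
                    for name, hap in candidate_haps.items())
    (d_best, best), (d_second, _) = ranked[0], ranked[1]
    # Assign only when the best candidate is clearly closer than the runner-up.
    if d_best < d_second and d_best <= alpha * d_second:
        return best
    return None


offspring = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]
fathers = {
    "f1": [0, 1, 1, 0, 1, 0, 0, 1, 1, 1],  # 1 mismatch (distance 0.1)
    "f2": [1, 0, 0, 1, 0, 0, 0, 1, 1, 0],  # 5 mismatches (distance 0.5)
    "f3": [1, 0, 0, 1, 0, 1, 1, 0, 0, 1],  # 10 mismatches (distance 1.0)
}
```

With α = 0.5, f1 is assigned because 0.1 ≤ 0.5 × 0.5; with a more restrictive α = 0.1, the offspring stays unassigned. Unlike strict exclusion, a single mismatch caused by a genotyping error does not rule out the true father here; it only increases his distance slightly.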


Genetics, 2020, Vol 217 (1)
Author(s): Richard J Wang, Predrag Radivojac, Matthew W Hahn

Errors in genotype calling can have perverse effects on genetic analyses, confounding association studies and obscuring rare variants. Analyses now routinely incorporate error rates to control for spurious findings. However, reliable estimates of the error rate can be difficult to obtain because error rates vary between studies. Most studies also report only a single estimate of the error rate, even though genotypes can be miscalled in more than one way. Here, we report a method for estimating the rates at which different types of genotyping errors occur at biallelic loci using pedigree information. Our method identifies potential genotyping errors by exploiting instances where the haplotypic phase has not been faithfully transmitted. The expected frequency of inconsistent phase depends on the combination of genotypes in a pedigree and the probability of miscalling each genotype. We develop a model that uses the differences in these frequencies to estimate rates for different types of genotype error. Simulations show that our method accurately estimates these error rates in a variety of scenarios. We apply this method to a dataset from the whole-genome sequencing of owl monkeys (Aotus nancymaae) in three-generation pedigrees. We find significant differences between estimates for different types of genotyping error, with the most common being homozygous reference sites miscalled as heterozygous and vice versa. The approach we describe is applicable to any set of genotypes where haplotypic phase can reliably be called and should prove useful in helping to control for false discoveries.
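The quantities being estimated, one rate per type of miscall at biallelic sites, can be made concrete with a confusion table of true versus called genotypes. This sketch assumes the true genotypes are known (as in a simulation), which is exactly what the pedigree-based method avoids needing; it shows what is estimated, not how the paper estimates it. The function name and VCF-style genotype labels are illustrative choices.

```python
from collections import Counter

GENOTYPES = ("0/0", "0/1", "1/1")  # hom-ref, het, hom-alt at a biallelic site


def error_rates_by_type(truth, called):
    """Return {(true, called): rate} for each of the six miscall types.

    Each rate is the fraction of sites with a given true genotype that were
    called as a specific different genotype, e.g. ("0/0", "0/1") is the rate
    at which homozygous-reference sites are called heterozygous.
    """
    if len(truth) != len(called):
        raise ValueError("truth and calls must cover the same sites")
    totals = Counter(truth)
    pairs = Counter(zip(truth, called))
    rates = {}
    for t in GENOTYPES:
        for c in GENOTYPES:
            if t != c and totals[t]:
                rates[(t, c)] = pairs[(t, c)] / totals[t]
    return rates


true_gt = ["0/0"] * 4 + ["0/1"] * 2 + ["1/1"] * 2
called_gt = ["0/0", "0/0", "0/0", "0/1", "0/1", "0/0", "1/1", "1/1"]
```

In this toy example, hom-ref sites are called heterozygous at rate 0.25 and heterozygous sites are called hom-ref at rate 0.5, while all other types are 0. A single pooled error rate would hide exactly this asymmetry.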

