genotype calling
Recently Published Documents


TOTAL DOCUMENTS: 91 (FIVE YEARS: 14)
H-INDEX: 22 (FIVE YEARS: 2)

2022 ◽  
Author(s):  
Miguel Vallebueno-Estrada ◽  
Sonja Steindl ◽  
Vasilina Akulova ◽  
Julia Riefler ◽  
Lucyna Slusarz ◽  
...  

Reduced representation library approaches remain a valuable tool for breeding and for population and ecological genomics, even with the impressive increases in sequencing capacity of recent years. Unfortunately, current approaches only allow multiplexing of up to 384 samples. To take advantage of increased sequencing capacity, we present Multi-GBS, a massively multiplexable extension of Genotyping-by-Sequencing that is also optimized for large conifer genomes. In Norway spruce, which has a highly repetitive 20 Gbp diploid genome and high population genetic variation, we call over a million variants in 32 genotypes from three populations (two natural forests in the Alps and the Bohemian Alps, and a managed population from southeastern Austria) using the existing TASSEL GBSv2 pipeline. Metric MDS analysis of replicated genotypes shows that technical bias in the resulting genotype calls is minimal and that populations cluster in biologically meaningful ways.
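The abstract does not state which distance measure fed into the metric MDS. As a minimal sketch under stated assumptions, the code below codes genotypes as ALT-allele dosages 0/1/2 (NaN for missing), uses half the mean absolute dosage difference over jointly called sites as a hypothetical distance, and applies classical (Torgerson) MDS. Technical replicates of the same genotype should then land on (nearly) the same coordinates.

```python
import numpy as np

def genotype_distance(G):
    """Pairwise sample distance (hypothetical choice): half the mean absolute
    dosage difference over sites called in both samples. G is (n_samples, n_sites)
    with dosages 0/1/2 and NaN for missing calls."""
    n = G.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            both = ~np.isnan(G[i]) & ~np.isnan(G[j])   # jointly called sites only
            # If no site is shared, .mean() of an empty array yields NaN (with a warning)
            d = np.abs(G[i, both] - G[j, both]).mean() / 2.0
            D[i, j] = D[j, i] = d
    return D

def classical_mds(D, k=2):
    """Classical (Torgerson) metric MDS: embed a distance matrix in k dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered squared distances
    vals, vecs = np.linalg.eigh(B)               # eigh returns ascending eigenvalues
    idx = np.argsort(vals)[::-1][:k]             # keep the k largest
    vals, vecs = vals[idx], vecs[:, idx]
    return vecs * np.sqrt(np.clip(vals, 0, None))  # negative eigenvalues clipped to 0
```

Plotting the first two columns of `classical_mds(genotype_distance(G))` and checking that replicate pairs overlap is one simple way to eyeball technical bias.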


2021 ◽  
Vol 12 ◽  
Author(s):  
Frédéric Jehl ◽  
Fabien Degalez ◽  
Maria Bernard ◽  
Frédéric Lecerf ◽  
Laetitia Lagoutte ◽  
...  

In addition to their common use for studying gene expression, RNA-seq data accumulated over the last 10 years are a yet-unexploited resource of SNPs in numerous individuals from different populations. SNP detection by RNA-seq is particularly interesting for livestock species, since whole-genome sequencing is expensive and exome sequencing tools are unavailable. SNPs detected in expressed regions can be used to characterize variants affecting protein function and to study cis-regulated genes by analyzing allele-specific expression (ASE) in the tissue of interest. However, gene expression can be highly variable, and filters for SNP detection with the popular GATK toolkit are not yet standardized, making SNP detection and genotype calling from RNA-seq a challenging endeavor. We compared SNP calling results obtained with GATK-suggested filters in two chicken populations for which both RNA-seq and DNA-seq data were available for the same samples of the same tissue. In expressed regions, RNA-seq showed a precision of 91% (SNPs detected by RNA-seq and shared with DNA-seq), and we characterized the remaining 9% of SNPs. We then studied the genotypes (GT) obtained from RNA-seq and the impact of two factors (GT call rate and read number per GT) on GT concordance with DNA-seq, and proposed thresholds for both that yield 95% concordance. Applying these thresholds to 767 multi-tissue RNA-seq samples from 382 birds of 11 chicken populations, we found 9.5 M SNPs in total, including ∼550,000 SNPs per tissue and population with a reliable GT (call rate ≥ 50%), of which ∼340,000 had a MAF ≥ 10%.
We showed that such RNA-seq data from one tissue can be used to (i) detect SNPs with a strong predicted impact on proteins, despite their scarcity in each population (16,307 SIFT-deleterious missense variants and 590 stop-gained variants), (ii) study cis-regulation of gene expression on a large scale, with ∼81% of protein-coding and 68% of long non-coding genes (TPM ≥ 1) amenable to ASE analysis and ∼29% of these found to be cis-regulated, and (iii) analyze population genetics using SNPs located in expressed regions. This work shows that RNA-seq data can be used with good confidence to detect SNPs and their associated GTs within various populations, and to use them in different analyses, as in GTEx studies.
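The filtering logic described above (mask genotypes supported by too few reads, then require a per-site call rate ≥ 50% and MAF ≥ 10%) can be sketched as follows. This is an illustration under assumptions, not the authors' pipeline: genotypes are coded as ALT-allele dosages with -1 for missing, and `min_depth` is a hypothetical reads-per-GT threshold because the abstract does not give its value.

```python
import numpy as np

def filter_sites(gt, depth, min_depth=10, min_call_rate=0.5, min_maf=0.10):
    """Keep sites with enough well-supported genotype calls and a common minor allele.

    gt:    (n_sites, n_samples) ALT-allele dosage 0/1/2, with -1 for a missing call
    depth: (n_sites, n_samples) read count supporting each genotype call
    min_depth is a HYPOTHETICAL reads-per-GT threshold (value not given in the abstract).
    """
    gt = np.asarray(gt)
    depth = np.asarray(depth)
    gt = np.where(depth >= min_depth, gt, -1)        # treat low-depth calls as missing
    called = gt >= 0
    call_rate = called.mean(axis=1)                   # fraction of samples with a kept call
    n_called = called.sum(axis=1)
    alt = np.where(called, gt, 0).sum(axis=1)         # ALT alleles among kept calls
    with np.errstate(invalid="ignore", divide="ignore"):
        af = alt / (2 * n_called)                     # NaN where no call survives
        maf = np.minimum(af, 1 - af)
        keep = (call_rate >= min_call_rate) & (maf >= min_maf)  # NaN compares as False
    return keep, call_rate, maf
```

Masking low-depth calls before computing the call rate means the two thresholds interact: raising `min_depth` lowers the call rate of every site.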


Author(s):  
Yanlin Liao ◽  
Roeland E. Voorrips ◽  
Peter M. Bourke ◽  
Giorgio Tumino ◽  
Paul Arens ◽  
...  

Key message: In polyploids, linkage mapping is carried out using genotypes with discrete dosage scores. Here, we use probabilistic genotypes and validate them for the construction of polyploid linkage maps.

Marker genotypes are generally called as discrete values: homozygous versus heterozygous in the case of diploids, or an integer allele dosage in the case of polyploids. Software for linkage map construction and/or QTL analysis usually relies on such discrete genotypes. However, it may not always be possible, or desirable, to assign definite values to genotype observations in the presence of uncertainty in the genotype calling. Here, we present an approach that uses probabilistic marker dosages for linkage map construction in polyploids. We compare our method to an approach based on discrete dosages, using simulated SNP array and sequence reads data with varying levels of data quality. We validate our approach using experimental data from a potato (Solanum tuberosum L.) SNP array applied to an F1 mapping population. In comparison to the approach based on discrete dosages, we mapped an additional 562 markers. All but three of these were mapped to the expected chromosome and marker position. For the remaining three markers, no physical position was known. The use of dosage probabilities is of particular relevance for map construction in polyploids using sequencing data, as these often result in a higher level of uncertainty regarding allele dosage.
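To make the contrast between discrete and probabilistic dosages concrete, the sketch below takes per-individual posterior probabilities for dosages 0..ploidy (e.g. 0..4 in tetraploid potato) and derives either a hard call, set to missing when the best dosage is below a hypothetical confidence threshold, or the expected dosage. This only illustrates the two kinds of input; it is not the authors' linkage-mapping method, and the function names and the 0.9 threshold are assumptions.

```python
import numpy as np

def expected_dosage(probs):
    """Posterior mean dosage per individual.
    probs: (n_individuals, ploidy + 1) probabilities of dosage 0..ploidy."""
    probs = np.asarray(probs, dtype=float)
    probs = probs / probs.sum(axis=1, keepdims=True)   # guard against rounding drift
    dosages = np.arange(probs.shape[1])                # 0, 1, ..., ploidy
    return probs @ dosages

def discrete_dosage(probs, min_prob=0.9):
    """Most probable dosage, or NaN (missing) when it is not confident enough.
    min_prob is a HYPOTHETICAL calling threshold."""
    probs = np.asarray(probs, dtype=float)
    best = probs.argmax(axis=1).astype(float)
    best[probs.max(axis=1) < min_prob] = np.nan        # uncertain call -> missing
    return best
```

For an individual split evenly between dosages 1 and 2, the discrete route discards the observation while the probabilistic route keeps it as 1.5, which is the kind of information a dosage-probability approach can still use.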


2021 ◽  
Author(s):  
Yanlin Liao ◽  
Roeland E. Voorrips ◽  
Peter M. Bourke ◽  
Giorgio Tumino ◽  
Paul Arens ◽  
...  

Marker genotypes are generally called as discrete values: homozygous versus heterozygous in the case of diploids, or an integer allele dosage in the case of polyploids. Software for linkage map construction and/or QTL analysis usually relies on such discrete genotypes. However, it may not always be possible, or desirable, to assign definite values to genotype observations in the presence of uncertainty in the genotype calling. Here, we present an approach that uses probabilistic marker dosages for linkage map construction in polyploids. We compare our method to an approach based on discrete dosages, using simulated SNP array and sequence reads data with varying levels of data quality. We validate our approach using experimental data from a potato (Solanum tuberosum L.) SNP array applied to an F1 mapping population. In comparison to the approach based on discrete dosages, we mapped an additional 562 markers. All but three of these were mapped to the expected chromosome and marker position. For the remaining three markers, no physical position was known. The use of dosage probabilities is of particular relevance for map construction in polyploids using sequencing data, as these often result in a higher level of uncertainty regarding allele dosage.


2021 ◽  
Vol 282 ◽  
pp. 02007
Author(s):  
Natalia Safina ◽  
Shamil Shakirov ◽  
Elza Gaynutdinova ◽  
Ziliya Fattakhova

The aim of this work was to study dairy productivity traits of Holstein heifers with different genotypes of the paraoxonase-1 (PON1) gene. The research was conducted on 148 animals at the Integrated Agricultural Production Centre "Stud farm named after Lenin" in the Atninsky district of the Republic of Tatarstan. Cattle were genotyped by PCR-RFLP at the laboratory of the Department of Agrobiological Research of the Tatar Scientific Research Institute of Agriculture, FRC Kazan Scientific Center, Russian Academy of Sciences. Allele and genotype calling of the PON1 gene showed that the study population is polymorphic and genetically diverse. Analysis of dairy productivity, milk composition, and lactation activity showed that first-calf heifers with the GG genotype of the PON1 gene outperformed animals with other genotypes in all tested parameters. Thus, the GG genotype of the PON1 gene has a positive effect on economically important traits of cattle, which could be used in future breeding.
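Summarizing PCR-RFLP genotype calls usually means computing genotype and allele frequencies and, often, checking Hardy-Weinberg equilibrium. The abstract does not report the genotype counts, so the sketch below works on any list of two-letter genotype strings; the function names and the chi-square HWE test are illustrative assumptions, not necessarily the statistics used in the study.

```python
from collections import Counter

def allele_genotype_freqs(genotypes):
    """Genotype and allele frequencies from two-letter calls such as 'GG', 'GA', 'AA'."""
    n = len(genotypes)
    geno_counts = Counter(genotypes)
    allele_counts = Counter()
    for g in genotypes:
        allele_counts.update(g)            # each genotype string contributes two alleles
    geno_freq = {g: c / n for g, c in geno_counts.items()}
    allele_freq = {a: c / (2 * n) for a, c in allele_counts.items()}
    return geno_freq, allele_freq

def hwe_chi2(genotypes, allele1, allele2):
    """Chi-square statistic (1 df) for departure from Hardy-Weinberg proportions."""
    n = len(genotypes)
    canon = Counter("".join(sorted(g)) for g in genotypes)   # 'GA' and 'AG' -> 'AG'
    obs = [canon[allele1 * 2],
           canon["".join(sorted(allele1 + allele2))],
           canon[allele2 * 2]]
    p = (2 * obs[0] + obs[1]) / (2 * n)                     # frequency of allele1
    q = 1 - p
    exp = [p * p * n, 2 * p * q * n, q * q * n]
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp) if e > 0)
```

With one degree of freedom, a statistic above about 3.84 corresponds to p < 0.05.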


Genetics ◽  
2020 ◽  
Vol 217 (1) ◽  
Author(s):  
Richard J Wang ◽  
Predrag Radivojac ◽  
Matthew W Hahn

Errors in genotype calling can have perverse effects on genetic analyses, confounding association studies and obscuring rare variants. Analyses now routinely incorporate error rates to control for spurious findings. However, reliable estimates of the error rate can be difficult to obtain because error rates vary between studies. Most studies also report only a single estimate of the error rate, even though genotypes can be miscalled in more than one way. Here, we report a method for estimating the rates at which different types of genotyping errors occur at biallelic loci using pedigree information. Our method identifies potential genotyping errors by exploiting instances where the haplotypic phase has not been faithfully transmitted. The expected frequency of inconsistent phase depends on the combination of genotypes in a pedigree and the probability of miscalling each genotype. We develop a model that uses the differences in these frequencies to estimate rates for different types of genotype error. Simulations show that our method accurately estimates these error rates in a variety of scenarios. We apply this method to a dataset from the whole-genome sequencing of owl monkeys (Aotus nancymaae) in three-generation pedigrees. We find significant differences between estimates for different types of genotyping error, with the most common being homozygous reference sites miscalled as heterozygous and vice versa. The approach we describe is applicable to any set of genotypes where haplotypic phase can reliably be called and should prove useful in helping to control for false discoveries.
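The pedigree- and phase-based estimator itself is beyond a short example. As a generic illustration (not the authors' model) of why separate error types matter, the sketch below writes genotype miscalling at a biallelic locus as a 3×3 matrix E, where E[i, j] is the probability that true genotype i (hom-ref, het, hom-alt) is called as j, and propagates true genotype frequencies through it. Ignoring direct homozygote-to-homozygote errors is a simplifying assumption.

```python
import numpy as np

def error_matrix(e_ref_to_het, e_het_to_ref, e_het_to_alt, e_alt_to_het):
    """Rows: true genotype (hom-ref, het, hom-alt); columns: called genotype.
    Simplifying assumption: hom-ref <-> hom-alt miscalls are ignored."""
    E = np.array([
        [1 - e_ref_to_het, e_ref_to_het, 0.0],
        [e_het_to_ref, 1 - e_het_to_ref - e_het_to_alt, e_het_to_alt],
        [0.0, e_alt_to_het, 1 - e_alt_to_het],
    ])
    if (E < 0).any() or not np.allclose(E.sum(axis=1), 1.0):
        raise ValueError("error rates must give non-negative rows summing to 1")
    return E

def observed_genotype_freqs(true_freqs, E):
    """Expected frequencies of called genotypes given true frequencies and error matrix."""
    return np.asarray(true_freqs, dtype=float) @ E
```

Because hom-ref sites vastly outnumber heterozygous sites in most genomes, even a small hom-ref-to-het rate can add a noticeable share of spurious heterozygous calls, whereas the same rate in the opposite direction mostly removes true ones.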


BMC Genomics ◽  
2020 ◽  
Vol 21 (S7) ◽  
Author(s):  
Kirill A. Danilov ◽  
Dimitri A. Nikogosov ◽  
Sergey V. Musienko ◽  
Ancha V. Baranova

Background: Head-to-head comparison of the precision of BeadChip and WGS/WES genotyping techniques is far from straightforward. Sanger sequencing, a tool for validating high-throughput genotyping calls, is neither scalable nor practical for large-scale DNA processing. Here we report a cross-validation analysis of genotyping calls obtained with the Illumina GSA BeadChip and WGS (Illumina HiSeq X Ten).

Results: When the two platforms were compared with each other, the average precision and accuracy of BeadChip and WGS genotyping exceeded 0.991 and 0.997, respectively. The average fraction of discordant variants for both platforms was 0.639%. A sliding-window approach was used to identify genomic regions of no more than 500 bp containing the largest number of discordant variants, for further validation by Sanger sequencing. Notably, 12 of the 26 variants located within the eight identified regions were consistently discordant between the related WGS and BeadChip calls. Sanger sequencing successfully resolved 16 of these genotypes, indicating that for this subset the precision of WGS and BeadChip genotyping was 0.81 and 0.5, respectively, with accuracies of 0.87 and 0.61.

Conclusions: We conclude that WGS genotype calling exhibits higher overall precision within the selected set of discordantly genotyped variants, although the number of validated variants remained insufficient.
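Two steps in this abstract can be sketched briefly: finding the window of at most 500 bp that contains the most discordant variants, and scoring a platform's calls against a reference (here, Sanger) genotype. The sketch assumes all positions lie on one chromosome, and its definitions of precision (correct non-reference calls over all non-reference calls) and accuracy (correct calls over all calls) are common conventions assumed here; the paper's exact definitions may differ. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple

def best_window(positions: List[int], width: int = 500) -> Tuple[int, int, int]:
    """Two-pointer scan: (start, end, count) of the window spanning at most
    `width` bp (end - start + 1 <= width) that contains the most positions."""
    pos = sorted(positions)
    best = (0, 0, 0)
    left = 0
    for right, p in enumerate(pos):
        while p - pos[left] >= width:     # shrink until the span fits in `width` bp
            left += 1
        count = right - left + 1
        if count > best[2]:
            best = (pos[left], p, count)
    return best

def precision_accuracy(calls: Sequence[str], truth: Sequence[str], ref: str = "0/0"):
    """Score genotype calls against reference genotypes.
    Assumed convention: a 'positive' is any non-reference call."""
    pairs = list(zip(calls, truth))
    tp = sum(c == t and c != ref for c, t in pairs)   # correct non-reference calls
    fp = sum(c != ref and c != t for c, t in pairs)   # wrong non-reference calls
    correct = sum(c == t for c, t in pairs)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    accuracy = correct / len(pairs) if pairs else float("nan")
    return precision, accuracy
```

The two-pointer scan runs in linear time after sorting, so it scales to the genome-wide set of discordant sites on each chromosome.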


2020 ◽  
Vol 52 (1) ◽  
Author(s):  
Kim Erik Grashei ◽  
Jørgen Ødegård ◽  
Theo H. E. Meuwissen

2020 ◽  
Author(s):  
Richard J. Wang ◽  
Predrag Radivojac ◽  
Matthew W. Hahn

Errors in genotype calling can have perverse effects on genetic analyses, confounding association studies and obscuring rare variants. Analyses now routinely incorporate error rates to control for spurious findings. However, reliable estimates of the error rate can be difficult to obtain because error rates vary between studies. Most studies also report only a single estimate of the error rate, even though genotypes can be miscalled in more than one way. Here, we report a method for estimating the rates at which different types of genotyping errors occur at biallelic loci using pedigree information. Our method identifies potential genotyping errors by exploiting instances where the haplotypic phase has not been faithfully transmitted. The expected frequency of inconsistent phase depends on the combination of genotypes in a pedigree and the probability of miscalling each genotype. We develop a model that uses the differences in these frequencies to estimate rates for different types of genotype error. Simulations show that our method accurately estimates these error rates in a variety of scenarios. We apply this method to a dataset from the whole-genome sequencing of owl monkeys (Aotus nancymaae) in three-generation pedigrees. We find significant differences between estimates for different types of genotyping error, with the most common being homozygous reference sites miscalled as heterozygous and vice versa. The approach we describe is applicable to any set of genotypes where haplotypic phase can reliably be called, and should prove useful in helping to control for false discoveries.


2020 ◽  
Author(s):  
Lindsay V. Clark ◽  
Wittney Mays ◽  
Alexander E. Lipka ◽  
Erik J. Sacks

Given the economic and environmental importance of allopolyploids and other species with highly duplicated genomes, there is a need for accurate genotyping methodology that distinguishes paralogs in order to yield Mendelian markers. Methods such as comparing observed and expected heterozygosity are frequently used to identify collapsed paralogs, but they have limitations in genotyping-by-sequencing datasets, in which observed heterozygosity is difficult to estimate due to undersampling of alleles. These limitations are especially pronounced when the species is highly heterozygous or the expected inheritance is polysomic. We introduce a novel statistic, H_ind/H_E, that uses the probability of sampling reads of two different alleles at a sample × locus combination instead of observed heterozygosity. The expected value of H_ind/H_E is the same across all loci in a dataset, regardless of read depth or allele frequency. In contrast to methods based on observed heterozygosity, it can be estimated and used for filtering loci prior to genotype calling. We also introduce an algorithm that can choose among multiple alignment locations for a given sequence tag in order to optimize the value of H_ind/H_E for each locus, correcting alignment errors that frequently occur in highly duplicated genomes. Our methodology is implemented in polyRAD v1.2, available at https://github.com/lvclark/polyRAD.
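The statistic described above can be sketched from read counts alone. For one locus, H_ind is taken here as the probability that two reads drawn without replacement from the same sample carry different alleles, and H_E as 1 − Σ p_a², with allele frequencies p_a estimated as the mean of per-sample read proportions. That frequency estimator and the hypothetical `hind_he` function are assumptions for illustration; polyRAD's own implementation may differ in its details.

```python
import numpy as np

def hind_he(read_counts):
    """Per-sample H_ind/H_E at one locus (sketch).
    read_counts: (n_samples, n_alleles) read depth of each allele in each sample.
    Returns NaN for samples with fewer than two reads."""
    counts = np.asarray(read_counts, dtype=float)
    depth = counts.sum(axis=1)
    ok = depth > 0
    # ASSUMPTION: allele frequencies = mean of per-sample read proportions
    freqs = (counts[ok] / depth[ok, None]).mean(axis=0)
    he = 1.0 - np.sum(freqs ** 2)                 # expected heterozygosity
    with np.errstate(invalid="ignore", divide="ignore"):
        # Probability that two reads drawn without replacement share an allele
        same = (counts * (counts - 1)).sum(axis=1) / (depth * (depth - 1))
        hind = np.where(depth >= 2, 1.0 - same, np.nan)
        return hind / he                          # undefined (inf/NaN) at monomorphic loci
```

Averaging the per-sample values (e.g. with `np.nanmean`) gives one value per locus; loci whose average departs strongly from the dataset-wide expectation are candidates for collapsed paralogs.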

