Genotype Calling: Latest Research Papers

Multi-GBS: A massively multiplexed GBS-based protocol optimized for large, repetitive conifer genomes

Reduced representation library approaches remain a valuable tool for breeding and for population and ecological genomics, even with the impressive increases in sequencing capacity of recent years. Unfortunately, current approaches only allow for multiplexing up to 384 samples. To take advantage of increased sequencing capacity, we present Multi-GBS, a massively multiplexable extension to Genotyping-by-Sequencing that is also optimized for large conifer genomes. In Norway spruce, a highly repetitive 20 Gbp diploid genome with high population genetic variation, we call over a million variants in 32 genotypes from three populations (two natural forests, in the Alps and the Bohemian Alps, and a managed population from southeastern Austria) using the existing TASSEL GBSv2 pipeline. Metric MDS analysis of replicated genotypes shows that technical bias in the resulting genotype calls is minimal and that populations cluster in biologically meaningful ways.

RNA-Seq Data for Reliable SNP Detection and Genotype Calling: Interest for Coding Variant Characterization and Cis-Regulation Analysis by Allele-Specific Expression in Livestock Species

Frontiers in Genetics ◽

10.3389/fgene.2021.655707 ◽

2021 ◽

Vol 12 ◽

Author(s):

Frédéric Jehl ◽

Fabien Degalez ◽

Maria Bernard ◽

Frédéric Lecerf ◽

Laetitia Lagoutte ◽

...

Keyword(s):

Gene Expression ◽

Call Rate ◽

Rna Seq ◽

Snp Detection ◽

Specific Expression ◽

Genotype Calling ◽

Allele Specific Expression ◽

Livestock Species ◽

Allele Specific ◽

The Impact

In addition to their common use in studying gene expression, RNA-seq data accumulated over the last 10 years are a yet-unexploited resource of SNPs in numerous individuals from different populations. SNP detection by RNA-seq is particularly interesting for livestock species since whole genome sequencing is expensive and exome sequencing tools are unavailable. These SNPs detected in expressed regions can be used to characterize variants affecting protein functions, and to study cis-regulated genes by analyzing allele-specific expression (ASE) in the tissue of interest. However, gene expression can be highly variable, and filters for SNP detection using the popular GATK toolkit are not yet standardized, making SNP detection and genotype calling by RNA-seq a challenging endeavor. We compared SNP calling results using GATK suggested filters, on two chicken populations for which both RNA-seq and DNA-seq data were available for the same samples of the same tissue. We showed, in expressed regions, an RNA-seq precision of 91% (SNPs detected by RNA-seq and shared by DNA-seq) and we characterized the remaining 9% of SNPs. We then studied the genotype (GT) obtained by RNA-seq and the impact of two factors (GT call-rate and read number per GT) on the concordance of GT with DNA-seq; we proposed thresholds for them leading to a 95% concordance. Applying these thresholds to 767 multi-tissue RNA-seq samples from 382 birds of 11 chicken populations, we found 9.5 M SNPs in total, of which ∼550,000 SNPs per tissue and population had a reliable GT (call rate ≥ 50%) and, among them, ∼340,000 had a MAF ≥ 10%. We showed that such RNA-seq data from one tissue can be used to (i) detect SNPs with a strong predicted impact on proteins, despite their scarcity in each population (16,307 SIFT deleterious missenses and 590 stop-gained), (ii) study, on a large scale, cis-regulations of gene expression, with ∼81% of protein-coding and 68% of long non-coding genes (TPM ≥ 1) that can be analyzed for ASE, and with ∼29% of them that were cis-regulated, and (iii) analyze population genetics using such SNPs located in expressed regions. This work shows that RNA-seq data can be used with good confidence to detect SNPs and associated GT within various populations and to use them for different analyses, as in GTEx studies.
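
The two population-level filters quoted in this abstract (call rate ≥ 50% and MAF ≥ 10%) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the genotype coding (alternate-allele counts 0, 1, 2, with `None` for a missing call), the function names and the toy SNPs are all assumptions made for illustration, and the read-number threshold is left out because the abstract does not state its value.

```python
from typing import Optional, Sequence

# Assumed coding: diploid alternate-allele count (0, 1 or 2); None = missing call.
Genotype = Optional[int]


def call_rate(genotypes: Sequence[Genotype]) -> float:
    """Fraction of samples that have a non-missing genotype."""
    if not genotypes:
        return 0.0
    called = sum(g is not None for g in genotypes)
    return called / len(genotypes)


def minor_allele_frequency(genotypes: Sequence[Genotype]) -> float:
    """Minor allele frequency computed over called diploid genotypes only."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def passes_filters(
    genotypes: Sequence[Genotype],
    min_call_rate: float = 0.5,  # "call rate >= 50%" from the abstract
    min_maf: float = 0.10,       # "MAF >= 10%" from the abstract
) -> bool:
    """True when a SNP meets both thresholds."""
    return (
        call_rate(genotypes) >= min_call_rate
        and minor_allele_frequency(genotypes) >= min_maf
    )


# Hypothetical genotypes for two SNPs across eight birds.
snp_a = [0, 1, 1, None, 2, 0, None, 1]
snp_b = [0, None, None, None, None, 0, 1, None]

print(passes_filters(snp_a), passes_filters(snp_b))
```

Here `snp_a` is called in six of eight birds and is kept, while `snp_b` is called in only three of eight and is dropped by the call-rate filter before its allele frequency matters.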

Using probabilistic genotypes in linkage analysis of polyploids

Theoretical and Applied Genetics ◽

10.1007/s00122-021-03834-x ◽

2021 ◽

Author(s):

Yanlin Liao ◽

Roeland E. Voorrips ◽

Peter M. Bourke ◽

Giorgio Tumino ◽

Paul Arens ◽

...

Keyword(s):

Linkage Map ◽

Linkage Mapping ◽

Mapping Population ◽

Snp Array ◽

Linkage Maps ◽

Sequencing Data ◽

Genotype Calling ◽

Map Construction ◽

Allele Dosage ◽

Discrete Values

Key message: In polyploids, linkage mapping is carried out using genotyping with discrete dosage scores. Here, we use probabilistic genotypes and validate their use for the construction of polyploid linkage maps. Abstract: Marker genotypes are generally called as discrete values: homozygous versus heterozygous in the case of diploids, or an integer allele dosage in the case of polyploids. Software for linkage map construction and/or QTL analysis usually relies on such discrete genotypes. However, it may not always be possible, or desirable, to assign definite values to genotype observations in the presence of uncertainty in the genotype calling. Here, we present an approach that uses probabilistic marker dosages for linkage map construction in polyploids. We compare our method to an approach based on discrete dosages, using simulated SNP array and sequence reads data with varying levels of data quality. We validate our approach using experimental data from a potato (Solanum tuberosum L.) SNP array applied to an F1 mapping population. In comparison to the approach based on discrete dosages, we mapped an additional 562 markers. All but three of these were mapped to the expected chromosome and marker position. For the remaining three markers, no physical position was known. The use of dosage probabilities is of particular relevance for map construction in polyploids using sequencing data, as these often result in a higher level of uncertainty regarding allele dosage.
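
The contrast between discrete and probabilistic dosages drawn in this abstract can be made concrete with a small Python sketch. It is an illustration only, not the authors' method: it assumes a tetraploid marker whose genotype is given as a probability for each dosage 0 to 4, and the 0.9 cut-off used for a discrete call is a hypothetical value.

```python
from typing import Optional, Sequence


def expected_dosage(probs: Sequence[float]) -> float:
    """Probability-weighted mean dosage, where probs[k] = P(dosage == k)."""
    total = sum(probs)
    if total <= 0:
        raise ValueError("dosage probabilities must sum to a positive value")
    return sum(k * p for k, p in enumerate(probs)) / total


def discrete_dosage(probs: Sequence[float], min_prob: float = 0.9) -> Optional[int]:
    """Most likely dosage, or None (missing) if it is not confident enough.

    min_prob is a hypothetical threshold chosen for this example.
    """
    best = max(range(len(probs)), key=lambda k: probs[k])
    return best if probs[best] >= min_prob else None


# Hypothetical tetraploid genotype probabilities for dosages 0..4.
confident = [0.00, 0.02, 0.97, 0.01, 0.00]
ambiguous = [0.00, 0.45, 0.50, 0.05, 0.00]

for probs in (confident, ambiguous):
    print(discrete_dosage(probs), round(expected_dosage(probs), 2))
```

With a discrete call the ambiguous observation becomes missing data, whereas its probabilities still carry information (an expected dosage between 1 and 2). This is the kind of information a probabilistic approach can retain.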

Using Probabilistic Genotypes in Linkage Analysis of polyploids

10.21203/rs.3.rs-247800/v1 ◽

2021 ◽

Author(s):

Yanlin Liao ◽

Roeland E. Voorrips ◽

Peter M. Bourke ◽

Giorgio Tumino ◽

Paul Arens ◽

...

Keyword(s):

Experimental Data ◽

Linkage Map ◽

Mapping Population ◽

Snp Array ◽

Sequencing Data ◽

Genotype Calling ◽

Map Construction ◽

Allele Dosage ◽

Physical Position ◽

Discrete Values

Abstract Marker genotypes are generally called as discrete values: homozygous versus heterozygous in the case of diploids, or an integer allele dosage in the case of polyploids. Software for linkage map construction and/or QTL analysis usually relies on such discrete genotypes. However, it may not always be possible, or desirable, to assign definite values to genotype observations in the presence of uncertainty in the genotype calling. Here, we present an approach that uses probabilistic marker dosages for linkage map construction in polyploids. We compare our method to an approach based on discrete dosages, using simulated SNP array and sequence reads data with varying levels of data quality. We validate our approach using experimental data from a potato (Solanum tuberosum L.) SNP array applied to an F1 mapping population. In comparison to the approach based on discrete dosages, we mapped an additional 562 markers. All but three of these were mapped to the expected chromosome and marker position. For the remaining three markers, no physical position was known. The use of dosage probabilities is of particular relevance for map construction in polyploids using sequencing data, as these often result in a higher level of uncertainty regarding allele dosage.

Dairy productivity of Holstein cattle with different genotypes of the paraoxonase-1 (PON1) gene

E3S Web of Conferences ◽

10.1051/e3sconf/202128202007 ◽

2021 ◽

Vol 282 ◽

pp. 02007

Author(s):

Natalia Safina ◽

Shamil Shakirov ◽

Elza Gaynutdinova ◽

Ziliya Fattakhova

Keyword(s):

Paraoxonase 1 ◽

Scientific Center ◽

Genotype Calling ◽

Pon1 Gene ◽

Test Parameters ◽

Pcr Rflp ◽

Study Population ◽

The Republic ◽

Academy Of Sciences ◽

Positive Effect

The aim of the work was to study dairy productivity traits in Holstein heifers with different genotypes of the paraoxonase-1 (PON1) gene. The research was conducted on 148 animals of the Integrated Agricultural Production Centre “Stud farm named after Lenin” in the Atninsky district of the Republic of Tatarstan. Genotyping of cattle was carried out by the PCR-RFLP method at the laboratory of the Department of Agrobiological Research of the Tatar Scientific Research Institute of Agriculture, FRC Kazan Scientific Center, Russian Academy of Sciences. The results of allele and genotype calling of the PON1 gene showed that the study population is polymorphic and genetically diverse. Analysis of dairy productivity, qualitative composition of milk and lactational activity showed that first-calf heifers with the GG genotype of the PON1 gene were superior to animals with other genotypes in all the test parameters. Thus, the GG genotype of the PON1 gene has a positive effect on the economically important traits of cattle, which can be used in breeding in the future.

Distinct error rates for reference and nonreference genotypes estimated by pedigree analysis

Genetics ◽

10.1093/genetics/iyaa014 ◽

2020 ◽

Vol 217 (1) ◽

Author(s):

Richard J Wang ◽

Predrag Radivojac ◽

Matthew W Hahn

Keyword(s):

Error Rate ◽

Rare Variants ◽

Association Studies ◽

Pedigree Analysis ◽

Error Rates ◽

Pedigree Information ◽

Genotyping Errors ◽

Genotype Calling ◽

Different Types ◽

False Discoveries

Abstract Errors in genotype calling can have perverse effects on genetic analyses, confounding association studies, and obscuring rare variants. Analyses now routinely incorporate error rates to control for spurious findings. However, reliable estimates of the error rate can be difficult to obtain because of their variance between studies. Most studies also report only a single estimate of the error rate even though genotypes can be miscalled in more than one way. Here, we report a method for estimating the rates at which different types of genotyping errors occur at biallelic loci using pedigree information. Our method identifies potential genotyping errors by exploiting instances where the haplotypic phase has not been faithfully transmitted. The expected frequency of inconsistent phase depends on the combination of genotypes in a pedigree and the probability of miscalling each genotype. We develop a model that uses the differences in these frequencies to estimate rates for different types of genotype error. Simulations show that our method accurately estimates these error rates in a variety of scenarios. We apply this method to a dataset from the whole-genome sequencing of owl monkeys (Aotus nancymaae) in three-generation pedigrees. We find significant differences between estimates for different types of genotyping error, with the most common being homozygous reference sites miscalled as heterozygous and vice versa. The approach we describe is applicable to any set of genotypes where haplotypic phase can reliably be called and should prove useful in helping to control for false discoveries.

A comparison of BeadChip and WGS genotyping outputs using partial validation by Sanger sequencing

BMC Genomics ◽

10.1186/s12864-020-06919-x ◽

2020 ◽

Vol 21 (S7) ◽

Author(s):

Kirill A. Danilov ◽

Dimitri A. Nikogosov ◽

Sergey V. Musienko ◽

Ancha V. Baranova

Keyword(s):

Sanger Sequencing ◽

Large Scale ◽

Cross Validation ◽

Sliding Window ◽

Average Precision ◽

Illumina Hiseq ◽

Genotype Calling ◽

Average Fraction ◽

Window Approach ◽

Genomic Regions

Abstract Background: Head-to-head comparison of BeadChip and WGS/WES genotyping techniques for their precision is far from straightforward. Sanger sequencing, a tool for validating high-throughput genotyping calls, is neither scalable nor practical for large-scale DNA processing. Here we report a cross-validation analysis of genotyping calls obtained via Illumina GSA BeadChip and WGS (Illumina HiSeq X Ten) techniques. Results: When compared to each other, the average precision and accuracy of BeadChip and WGS genotyping techniques exceeded 0.991 and 0.997, respectively. The average fraction of discordant variants for both platforms was found to be 0.639%. A sliding window approach was utilized to explore genomic regions not exceeding 500 bp encompassing a maximal number of discordant variants for further validation by Sanger sequencing. Notably, 12 variants out of 26 located within eight identified regions were consistently discordant in related calls made by WGS and BeadChip. When Sanger sequenced, a total of 16 of these genotypes were successfully resolved, indicating that the precision of WGS and BeadChip genotyping for this genotype subset was 0.81 and 0.5, respectively, with accuracy values of 0.87 and 0.61. Conclusions: We conclude that WGS genotype calling exhibits higher overall precision within the selected set of discordantly genotyped variants, though the number of validated variants remained insufficient.
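
Two steps described in this abstract, computing the fraction of discordant variants between two platforms and locating a window of at most 500 bp that holds the most discordant variants, can be sketched in Python as follows. This is a simplified illustration rather than the authors' code: the genotype strings, variant identifiers and positions are hypothetical, all positions are assumed to lie on one chromosome, and the window is measured as a coordinate difference.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def normalise(genotype: str) -> Tuple[str, ...]:
    """Order-independent genotype, so that 'A/G' and 'G/A' compare equal."""
    return tuple(sorted(genotype.replace("|", "/").split("/")))


def discordance(
    calls_a: Dict[str, str], calls_b: Dict[str, str]
) -> Tuple[float, List[str]]:
    """Discordant fraction and discordant IDs among variants called by both."""
    shared = [v for v in calls_a if v in calls_b]
    discordant = [
        v for v in shared if normalise(calls_a[v]) != normalise(calls_b[v])
    ]
    fraction = len(discordant) / len(shared) if shared else 0.0
    return fraction, discordant


def densest_window(
    positions: Sequence[int], max_span: int = 500
) -> Tuple[Optional[int], Optional[int], int]:
    """(start, end, count) of the span <= max_span bp holding most positions."""
    pos = sorted(positions)
    best: Tuple[Optional[int], Optional[int], int] = (None, None, 0)
    left = 0
    for right, p in enumerate(pos):
        # Shrink the window from the left until it fits within max_span.
        while p - pos[left] > max_span:
            left += 1
        count = right - left + 1
        if count > best[2]:
            best = (pos[left], p, count)
    return best


# Hypothetical calls from the two platforms.
chip = {"rs1": "A/G", "rs2": "C/C", "rs3": "T/T", "rs4": "G/G"}
wgs = {"rs1": "G/A", "rs2": "C/T", "rs3": "T/T", "rs5": "A/A"}
fraction, bad = discordance(chip, wgs)
print(fraction, bad)

# Hypothetical positions of discordant variants on a single chromosome.
print(densest_window([100, 350, 590, 2000, 2100]))
```

Only variants called on both platforms enter the denominator, so `rs4` and `rs5` are ignored in the toy data. The two-pointer scan visits each sorted position once.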

Genotype calling of triploid offspring from diploid parents

Genetics Selection Evolution ◽

10.1186/s12711-020-00534-w ◽

2020 ◽

Vol 52 (1) ◽

Author(s):

Kim Erik Grashei ◽

Jørgen Ødegård ◽

Theo H. E. Meuwissen

Keyword(s):

Genotype Calling

Distinct error rates for reference and non-reference genotypes estimated by pedigree analysis

10.1101/2020.02.06.937649 ◽

2020 ◽

Author(s):

Richard J. Wang ◽

Predrag Radivojac ◽

Matthew W. Hahn

Keyword(s):

Error Rate ◽

Rare Variants ◽

Association Studies ◽

Pedigree Analysis ◽

Error Rates ◽

Pedigree Information ◽

Genotyping Errors ◽

Genotype Calling ◽

Different Types ◽

False Discoveries

Abstract Errors in genotype calling can have perverse effects on genetic analyses, confounding association studies and obscuring rare variants. Analyses now routinely incorporate error rates to control for spurious findings. However, reliable estimates of the error rate can be difficult to obtain because of their variance between studies. Most studies also report only a single estimate of the error rate even though genotypes can be miscalled in more than one way. Here, we report a method for estimating the rates at which different types of genotyping errors occur at biallelic loci using pedigree information. Our method identifies potential genotyping errors by exploiting instances where the haplotypic phase has not been faithfully transmitted. The expected frequency of inconsistent phase depends on the combination of genotypes in a pedigree and the probability of miscalling each genotype. We develop a model that uses the differences in these frequencies to estimate rates for different types of genotype error. Simulations show that our method accurately estimates these error rates in a variety of scenarios. We apply this method to a dataset from the whole-genome sequencing of owl monkeys (Aotus nancymaae) in three-generation pedigrees. We find significant differences between estimates for different types of genotyping error, with the most common being homozygous reference sites miscalled as heterozygous and vice versa. The approach we describe is applicable to any set of genotypes where haplotypic phase can reliably be called, and should prove useful in helping to control for false discoveries.

A population-level statistic for assessing Mendelian behavior of genotyping-by-sequencing data from highly duplicated genomes

10.1101/2020.01.11.902890 ◽

2020 ◽

Author(s):

Lindsay V. Clark ◽

Wittney Mays ◽

Alexander E. Lipka ◽

Erik J. Sacks

Keyword(s):

Genotyping By Sequencing ◽

Population Level ◽

Read Depth ◽

Multiple Alignment ◽

Sequencing Data ◽

Genotype Calling ◽

Expected Heterozygosity ◽

Level Statistic ◽

Alignment Errors ◽

Environmental Importance

Abstract Given the economic and environmental importance of allopolyploids and other species with highly duplicated genomes, there is a need for accurate genotyping methodology that distinguishes paralogs in order to yield Mendelian markers. Methods such as comparing observed and expected heterozygosity are frequently used for identifying collapsed paralogs, but have limitations in genotyping-by-sequencing datasets, in which observed heterozygosity is difficult to estimate due to undersampling of alleles. These limitations are especially pronounced when the species is highly heterozygous or the expected inheritance is polysomic. We introduce a novel statistic, H_ind/H_E, that uses the probability of sampling reads of two different alleles at a sample*locus, instead of observed heterozygosity. The expected value of H_ind/H_E is the same across all loci in a dataset, regardless of read depth or allele frequency. In contrast to methods based on observed heterozygosity, it can be estimated and used for filtering loci prior to genotype calling. We also introduce an algorithm that can choose among multiple alignment locations for a given sequence tag in order to optimize the value of H_ind/H_E for each locus, correcting alignment errors that frequently occur in highly duplicated genomes. Our methodology is implemented in polyRAD v1.2, available at https://github.com/lvclark/polyRAD.
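
One plausible reading of the statistic described in this abstract can be sketched in Python. This is not the polyRAD implementation, and several details are assumptions made for illustration: reads are taken to be sampled without replacement, the expected heterozygosity is computed as one minus the sum of squared allele frequencies, per-sample values are averaged with equal weight, and the read depths and frequencies are invented.

```python
from typing import Optional, Sequence


def h_ind(depths: Sequence[int]) -> Optional[float]:
    """Probability that two reads drawn without replacement differ in allele.

    depths holds the read count of each allele at one sample*locus.
    Returns None when fewer than two reads are available.
    """
    total = sum(depths)
    if total < 2:
        return None
    same = sum(d * (d - 1) for d in depths)
    return 1.0 - same / (total * (total - 1))


def expected_heterozygosity(freqs: Sequence[float]) -> float:
    """Probability that two alleles drawn from the population differ."""
    return 1.0 - sum(p * p for p in freqs)


def hind_he(
    sample_depths: Sequence[Sequence[int]], freqs: Sequence[float]
) -> Optional[float]:
    """Mean per-sample H_ind divided by H_E for one locus (assumed averaging)."""
    he = expected_heterozygosity(freqs)
    values = [h for h in (h_ind(d) for d in sample_depths) if h is not None]
    if he == 0 or not values:
        return None
    return (sum(values) / len(values)) / he


# Hypothetical read depths (allele 1, allele 2) for four samples at one locus.
depths = [[10, 0], [5, 5], [0, 8], [1, 0]]
# Hypothetical population allele frequencies at the same locus.
freqs = [0.5, 0.5]

print(hind_he(depths, freqs))
```

Because the calculation works on read counts rather than called genotypes, it can run before genotype calling. The sample with a single read is skipped because two reads cannot be drawn from it.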

genotype calling
Recently Published Documents

Multi-GBS: A massively multiplexed GBS-based protocol optimized for large, repetitive conifer genomes

RNA-Seq Data for Reliable SNP Detection and Genotype Calling: Interest for Coding Variant Characterization and Cis-Regulation Analysis by Allele-Specific Expression in Livestock Species

Using probabilistic genotypes in linkage analysis of polyploids

Using Probabilistic Genotypes in Linkage Analysis of polyploids

Dairy productivity of Holstein cattle with different genotypes of the paraoxonase-1 (PON1) gene

Distinct error rates for reference and nonreference genotypes estimated by pedigree analysis

A comparison of BeadChip and WGS genotyping outputs using partial validation by Sanger sequencing

Genotype calling of triploid offspring from diploid parents

Distinct error rates for reference and non-reference genotypes estimated by pedigree analysis

A population-level statistic for assessing Mendelian behavior of genotyping-by-sequencing data from highly duplicated genomes
