Robust, flexible, and scalable tests for Hardy–Weinberg equilibrium across diverse ancestries

Genetics ◽  
2021 ◽  
Author(s):  
Alan M Kwong ◽  
Thomas W Blackwell ◽  
Jonathon LeFaive ◽  
Mariza de Andrade ◽  
John Barnard ◽  
...  

Abstract Traditional Hardy–Weinberg equilibrium (HWE) tests (the χ2 test and the exact test) have long been used as a metric for evaluating genotype quality, as technical artifacts leading to incorrect genotype calls often can be identified as deviations from HWE. However, in data sets composed of individuals from diverse ancestries, HWE can be violated even without genotyping error, complicating the use of HWE testing to assess genotype data quality. In this manuscript, we present the Robust Unified Test for HWE (RUTH) to test for HWE while accounting for population structure and genotype uncertainty, and to evaluate the impact of population heterogeneity and genotype uncertainty on the standard HWE tests and alternative methods using simulated and real sequence data sets. Our results demonstrate that ignoring population structure or genotype uncertainty in HWE tests can inflate false-positive rates by many orders of magnitude. Our evaluations demonstrate different tradeoffs between false positives and statistical power across the methods, with RUTH consistently among the best across all evaluations. RUTH is implemented as a practical and scalable software tool to rapidly perform HWE tests across millions of markers and hundreds of thousands of individuals while supporting standard VCF/BCF formats. RUTH is publicly available at https://www.github.com/statgen/ruth.
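As a point of reference for the "traditional" χ2 test mentioned in this abstract, the following is a minimal sketch of a 1-degree-of-freedom HWE test at a single biallelic marker. It is not RUTH and does not account for population structure or genotype uncertainty; the function name `hwe_chi_square` and its arguments are hypothetical names chosen for illustration.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Return (chi2, p_value) for a 1-df chi-square HWE test.

    n_aa, n_ab, n_bb are observed genotype counts at one biallelic marker.
    Hypothetical helper for illustration; assumes hard genotype calls and a
    single unstructured population.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")

    # Allele frequency of A estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p

    # Expected genotype counts under Hardy-Weinberg proportions.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)

    # Skip cells with zero expectation (monomorphic markers).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Survival function of a chi-square variable with 1 df:
    # P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    print(hwe_chi_square(25, 50, 25))  # counts exactly at HWE proportions
    print(hwe_chi_square(50, 0, 50))   # complete absence of heterozygotes
```

A test of this form treats every departure from expected proportions as evidence against HWE, which is why samples that mix ancestries can fail it even when the genotype calls are correct.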

2020 ◽  
Author(s):  
Alan M. Kwong ◽  
Thomas W. Blackwell ◽  
Jonathon LeFaive ◽  
Mariza de Andrade ◽  
John Barnard ◽  
...  

Abstract Traditional Hardy-Weinberg equilibrium (HWE) tests (the χ2 test and the exact test) have long been used as a metric for evaluating genotype quality, as technical artifacts leading to incorrect genotype calls often can be identified as deviations from HWE. However, in datasets composed of individuals from diverse ancestries, HWE can be violated even without genotyping error, complicating the use of HWE testing to assess genotype data quality. In this manuscript, we present the Robust Unified Test for HWE (RUTH) to test for HWE while accounting for population structure and genotype uncertainty, and evaluate the impact of population heterogeneity and genotype uncertainty on the standard HWE tests and alternative methods using simulated and real sequence datasets. Our results demonstrate that ignoring population structure or genotype uncertainty in HWE tests can inflate false positive rates by many orders of magnitude. Our evaluations demonstrate different tradeoffs between false positives and statistical power across the methods, with RUTH consistently amongst the best across all evaluations. RUTH is implemented as a practical and scalable software tool to rapidly perform HWE tests across millions of markers and hundreds of thousands of individuals while supporting standard VCF/BCF formats. RUTH is publicly available at https://www.github.com/statgen/ruth.


2019 ◽  
Author(s):  
Emil Jørsboe ◽  
Anders Albrechtsen

Abstract Introduction: Association studies using genetic data from SNP-chip based imputation or low depth sequencing data provide a cost efficient design for large scale studies. However, these approaches provide genetic data with uncertainty of the observed genotypes. Here we explore association methods that can be applied to data where the genotype is not directly observed. We investigate how using different priors when estimating genotype probabilities affects the association results in different scenarios such as studies with population structure and varying depth sequencing data. We also suggest a method (ANGSD-asso) that is computationally feasible for analysing large scale low depth sequencing data sets, such as can be generated by non-invasive prenatal testing (NIPT) with low-pass sequencing. Methods: ANGSD-asso’s EM model works by modelling the unobserved genotype as a latent variable in a generalised linear model framework. The software is implemented in C/C++ and can be run multi-threaded, enabling the analysis of big data sets. ANGSD-asso is based on genotype probabilities, which can be estimated in various ways, such as using the sample allele frequency as a prior, using the individual allele frequencies as a prior, or using haplotype frequencies from haplotype imputation. Using simulations of sequencing data we explore how genotype probability based methods compare to using genetic dosages in large association studies with genotype uncertainty. Results & Discussion: Our simulations show that in a structured population using the individual allele frequency prior has better power than the sample allele frequency prior. If there is a correlation between genotype uncertainty and phenotype, then the individual allele frequency prior also helps control the false positive rate. In the absence of population structure the sample allele frequency prior and the individual allele frequency prior perform similarly. In scenarios with sequencing depth and phenotype correlation ANGSD-asso’s EM model has better statistical power and less bias compared to using dosages. Lastly, when adding additional covariates to the linear model, ANGSD-asso’s EM model has more statistical power and provides less biased effect sizes than other methods that accommodate genotype uncertainty, while also being much faster. This makes it possible to properly account for genotype uncertainty in large scale association studies.
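To make the role of the allele frequency prior concrete, the sketch below shows one common way genotype probabilities and a dosage can be derived from per-genotype likelihoods at a biallelic site. It is an illustrative assumption, not ANGSD-asso's code: the function names are hypothetical, the prior assumes Hardy-Weinberg proportions, and an individual allele frequency prior would simply pass a different `allele_freq` for each individual.

```python
def genotype_posterior(likelihoods, allele_freq):
    """Posterior P(G = g | data) for g = 0, 1, 2 copies of the alternate allele.

    likelihoods: three values P(data | G = g) for g = 0, 1, 2.
    allele_freq: prior frequency of the alternate allele (sample-wide or
    individual-specific). The prior assumes Hardy-Weinberg proportions,
    which is an assumption of this sketch.
    """
    if len(likelihoods) != 3:
        raise ValueError("expected three genotype likelihoods")
    f = allele_freq
    prior = ((1 - f) ** 2, 2 * f * (1 - f), f ** 2)
    weighted = [lik * pr for lik, pr in zip(likelihoods, prior)]
    total = sum(weighted)
    if total == 0:
        raise ValueError("likelihoods and prior have no overlapping support")
    return [w / total for w in weighted]


def dosage(posterior):
    """Expected number of alternate alleles given genotype posteriors."""
    return posterior[1] + 2 * posterior[2]


if __name__ == "__main__":
    # With uninformative data (flat likelihoods) the posterior equals the prior,
    # so the dosage is driven entirely by the chosen allele frequency.
    flat = (1.0, 1.0, 1.0)
    print(dosage(genotype_posterior(flat, 0.5)))
    print(dosage(genotype_posterior(flat, 0.1)))
```

When sequencing depth is low the likelihoods are close to flat, so the choice of prior largely determines the resulting genotype probabilities, which is consistent with the abstract's emphasis on priors in structured populations.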


2015 ◽  
Vol 63 (4) ◽  
pp. 275
Author(s):  
Andrea Bertram ◽  
P. Joana Dias ◽  
Sherralee Lukehurst ◽  
W. Jason Kennington ◽  
David Fairclough ◽  
...  

Bight redfish, Centroberyx gerrardi, is a demersal teleost endemic to continental shelf and upper slope waters of southern Australia. Throughout most of its range, C. gerrardi is targeted by a number of separately managed commercial and recreational fisheries across several jurisdictions. However, it is currently unknown whether stock assessments and management for this shared resource are being conducted at appropriate spatial scales, thereby requiring knowledge of population structure and connectivity. To investigate population structure and connectivity, we developed 16 new polymorphic microsatellite markers using 454 shotgun sequencing. Two to 15 alleles per locus were detected. There was no evidence of linkage disequilibrium between pairs of loci and all loci except one were in Hardy–Weinberg equilibrium. Cross-amplification trials in the congeneric C. australis and C. lineatus revealed that 11 and 16 loci are potentially useful, respectively. However, deviations from Hardy–Weinberg equilibrium and linkage disequilibrium between pairs of loci were detected at several of the 16 markers for C. lineatus, and therefore the number of markers useful for population genetic analyses with C. lineatus is likely considerably lower than 16.


2019 ◽  
Author(s):  
Xinzhu Wei ◽  
Rasmus Nielsen

Abstract Previous analyses of the UK Biobank (UKB) genotyping array data in the CCR5-Δ32 locus show evidence for deviations from Hardy-Weinberg Equilibrium (HWE) and an increased mortality rate of homozygous individuals, consistent with a recessive deleterious effect of the deletion mutation. We here examine if similar deviations from HWE can be observed in the newly released UKB Whole Exome Sequencing (WES) data and in the sequencing data of the Genome Aggregation Database (gnomAD). We also examine the reliability of the genotype calls in the UKB array data. The UKB genotyping array probe targeting CCR5-Δ32 (rs62625034) and the WES of Δ32 are strongly correlated (r2 = 0.97). This contrasts with tag SNPs of CCR5-Δ32 in the UKB, which have high missing data rates and imputation error rates. We also show that, while different data sets are subject to different biases, both the UKB-WES and the gnomAD data have a deficiency of homozygous CCR5-Δ32 individuals compared to the HWE expectation (combined P-value < 0.01), consistent with an increased mortality rate in homozygotes. Finally, we perform a survival analysis on data from parents of UKB volunteers that, while underpowered, is also consistent with the original report of a deleterious effect of CCR5-Δ32 in the homozygous state.


Stats ◽  
2020 ◽  
Vol 3 (1) ◽  
pp. 34-39
Author(s):  
Vladimir Ostrovski

We consider testing equivalence to Hardy–Weinberg Equilibrium in the case of multiple alleles. Two different test statistics are proposed for this test problem. The asymptotic distribution of the test statistics is derived. The corresponding tests can be carried out using asymptotic approximation. Alternatively, the variance of the test statistics can be estimated by the bootstrap method. The proposed tests are applied to three real data sets. The finite sample performance of the tests is studied by simulations, which are inspired by the real data sets.


2000 ◽  
Vol 76 (3) ◽  
pp. 305-317 ◽  
Author(s):  
HAJA N. KADARMIDEEN ◽  
LUC L. G. JANSS ◽  
JACK C. M. DEKKERS

A generalized interval mapping (GIM) method to map quantitative trait loci (QTL) for binary polygenic traits in a multi-family half-sib design is developed based on threshold theory and implemented using a Newton–Raphson algorithm. Statistical power and bias of QTL mapping for binary traits by GIM is compared with linear regression interval mapping (RIM) using simulation. Data on 20 paternal half-sib families were simulated with two genetic markers that bracketed an additive QTL. Data simulated and analysed were: (1) data on the underlying normally distributed liability (NDL) scale, (2) binary data created by truncating NDL data based on three thresholds yielding data sets with three different incidences, and (3) NDL data with polygenic and QTL effects reduced by a proportion equal to the ratio of the heritabilities on the binary versus NDL scale (reduced-NDL). Binary data were simulated with and without systematic environmental (herd) effects in an unbalanced design. GIM and RIM gave similar power to detect the QTL and similar estimates of QTL location, effects and variances. Presence of fixed effects caused differences in bias between RIM and GIM, where GIM showed smaller bias which was affected less by incidence. The original NDL data had higher power and lower bias in QTL parameter estimates than binary and reduced-NDL data. RIM for reduced-NDL and binary data gave similar power and estimates of QTL parameters, indicating that the impact of the binary nature of data on QTL analysis is equivalent to its impact on heritability.


2021 ◽  
Author(s):  
William S Pearman ◽  
Lara Urban ◽  
Alana Alexander

Reduced representation sequencing (RRS) is a widely used method to assay the diversity of genetic loci across the genome of an organism. The dominant class of RRS approaches assay loci associated with restriction sites within the genome (restriction site associated DNA sequencing, or RADseq). RADseq is frequently applied to non-model organisms since it enables population genetic studies without relying on well-characterized reference genomes. However, RADseq requires the use of many bioinformatic filters to ensure the quality of genotyping calls. These filters can have direct impacts on population genetic inference, and therefore require careful consideration. One widely used filtering approach is the removal of loci which do not conform to expectations of Hardy-Weinberg equilibrium (HWE). Despite being widely used, we show that this filtering approach is rarely described in sufficient detail to enable replication. Furthermore, through analyses of in silico and empirical datasets we show that some of the most widely used HWE filtering approaches dramatically impact inference of population structure. In particular, the removal of loci exhibiting departures from HWE after pooling across samples significantly reduces the degree of inferred population structure within a dataset (despite this approach being widely used). Based on these results, we provide recommendations for best practice regarding the implementation of HWE filtering for RADseq datasets.


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Prashant Singh ◽  
Sylvain Santoni ◽  
Audrey Weber ◽  
Patrice This ◽  
Jean-Pierre Péros

Abstract Impacts of plant genotype on microbial assemblage in the phyllosphere (above-ground parts of plants, which predominantly consists of the set of photosynthetic leaves) of Vitis vinifera cultivars have been studied previously, but the impact of grape species (under the grape family Vitaceae) was never investigated. Considering that the phyllosphere microbiome may have profound effects on host plant health and its performance traits, studying the impact of grape species on microbial taxa structuring in the phyllosphere could be of crucial importance. We performed 16S and ITS profiling (for bacteria and fungi respectively) to access genus level characterization of the microflora present in the leaf phyllosphere of five species within this plant family, sampled in two successive years from the repository situated in the Mediterranean. We also performed α and β-diversity analyses with robust statistical estimates to test the impacts of grape species and growing year, over a two-year period. Our results indicated the presence of complex microbial diversity and assemblages in the phyllosphere with a significant effect of both factors (grape species and growing year), the latter effect being more pronounced. We also compared separate normalization methods for high-throughput microbiome data-sets followed by differential taxa abundance analyses. The results suggested the predominance of a particular normalization method over others. This also indicated the need for more robust normalization methods to study the differential taxa abundance among groups in microbiome research.


2017 ◽  
Author(s):  
Wei Hao ◽  
John D. Storey

Abstract Testing for Hardy-Weinberg equilibrium (HWE) is an important component in almost all analyses of population genetic data. Genetic markers that violate HWE are often treated as special cases; for example, they may be flagged as possible genotyping errors or they may be investigated more closely for evolutionary signatures of interest. The presence of population structure is one reason why genetic markers may fail a test of HWE. This is problematic because almost all natural populations studied in the modern setting show some degree of structure. Therefore, it is important to be able to detect deviations from HWE for reasons other than structure. To this end, we extend statistical tests of HWE to allow for population structure, which we call a test of “structural HWE” (sHWE). Additionally, our new test allows one to automatically choose tuning parameters and identify accurate models of structure. We demonstrate our approach on several important studies, provide theoretical justification for the test, and present empirical evidence for its utility. We anticipate the proposed test will be useful in a broad range of analyses of genome-wide population genetic data.
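The claim that population structure alone can make a marker fail an HWE test can be illustrated with a small calculation (the classical Wahlund effect). The sketch below is not the sHWE method; it assumes each subpopulation is itself in HWE, and the function name `pooled_heterozygosity` and its parameters are hypothetical.

```python
def pooled_heterozygosity(freqs, weights=None):
    """Return (observed_het, expected_het) for a pooled sample.

    freqs: allele frequency of one allele in each subpopulation, each of
    which is assumed to be in HWE on its own (assumption of this sketch).
    weights: relative subpopulation sizes; equal sizes if omitted.
    """
    if not freqs:
        raise ValueError("need at least one subpopulation")
    if weights is None:
        weights = [1.0] * len(freqs)
    total = float(sum(weights))
    w = [x / total for x in weights]

    # Heterozygote fraction actually present in the pooled sample.
    observed = sum(wi * 2 * p * (1 - p) for wi, p in zip(w, freqs))

    # Heterozygote fraction an HWE test would expect from the pooled frequency.
    p_bar = sum(wi * p for wi, p in zip(w, freqs))
    expected = 2 * p_bar * (1 - p_bar)
    return observed, expected


if __name__ == "__main__":
    obs, exp = pooled_heterozygosity([0.1, 0.9])
    # The deficit equals twice the variance of allele frequency across groups.
    print(obs, exp, exp - obs)
```

The pooled sample shows fewer heterozygotes than the pooled allele frequency predicts whenever subpopulation frequencies differ, so a standard test can reject HWE with no genotyping error present.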

