Posterior predictive checks to quantify lack-of-fit in admixture models of latent population structure

2015 ◽  
Vol 112 (26) ◽  
pp. E3441-E3450 ◽  
Author(s):  
David Mimno ◽  
David M. Blei ◽  
Barbara E. Engelhardt

Admixture models are a ubiquitous approach to capture latent population structure in genetic samples. Despite the widespread application of admixture models, little thought has been devoted to the quality of the model fit or the accuracy of the estimates of parameters of interest for a particular study. Here we develop methods for validating admixture models based on posterior predictive checks (PPCs), a Bayesian method for assessing the quality of fit of a statistical model to a specific dataset. We develop PPCs for five population-level statistics of interest: within-population genetic variation, background linkage disequilibrium, number of ancestral populations, between-population genetic variation, and the downstream use of admixture parameters to correct for population structure in association studies. Using PPCs, we evaluate the quality of the admixture model fit to four qualitatively different population genetic datasets: the population reference sample (POPRES) European individuals, the HapMap phase 3 individuals, continental Indians, and African American individuals. We found that the same model fitted to different genomic studies resulted in highly study-specific results when evaluated using PPCs, illustrating the utility of PPCs for model-based analyses in large genomic studies.
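As a rough illustration of the posterior predictive check idea described above, the following toy sketch fits a deliberately simple model (one locus, one randomly mating population, a Beta(1, 1) prior on the allele frequency) and asks whether replicated data reproduce the observed number of heterozygotes. The model, the discrepancy statistic, the example genotypes, and all function names are assumptions chosen for illustration; this is not the admixture model or any of the five statistics developed in the paper.

```python
import random

def heterozygote_count(genotypes):
    """Discrepancy statistic: individuals carrying exactly one alternate allele."""
    return sum(1 for g in genotypes if g == 1)

def ppc_p_value(genotypes, n_rep=2000, seed=0):
    """Posterior predictive p-value for the heterozygote count.

    Assumed toy model: each genotype is Binomial(2, p) with p ~ Beta(1, 1).
    """
    rng = random.Random(seed)
    n = len(genotypes)
    alt = sum(genotypes)                 # alternate allele copies observed
    a_post = 1 + alt                     # conjugate Beta posterior parameters
    b_post = 1 + 2 * n - alt
    observed = heterozygote_count(genotypes)
    at_least = 0
    for _ in range(n_rep):
        p = rng.betavariate(a_post, b_post)          # draw p from the posterior
        # Simulate a replicate dataset of the same size under the model.
        replicate = [(rng.random() < p) + (rng.random() < p) for _ in range(n)]
        if heterozygote_count(replicate) >= observed:
            at_least += 1
    return at_least / n_rep

# Hypothetical pooled sample with very few heterozygotes, as might arise
# when two differentiated groups are analysed as a single population.
genotypes = [0] * 20 + [1] * 2 + [2] * 18
p_value = ppc_p_value(genotypes)
print(p_value)
```

A posterior predictive p-value close to 0 or 1 indicates that the fitted model rarely reproduces the observed statistic. In this toy sample the heterozygote count is far below what a single randomly mating population would generate, which is the kind of misfit that unmodeled structure can cause.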

Genome ◽  
1991 ◽  
Vol 34 (3) ◽  
pp. 396-406 ◽  
Author(s):  
Hedi Baatout ◽  
Daniel Combes ◽  
Mohamed Marrakchi

Several samples of wild populations of two subspecies of the genus Hedysarum (H. spinosissimum subspecies capitatum, an outcrosser, and H. spinosissimum subspecies euspinosissimum, a selfer) were examined with respect to variability of 25 quantitative characters and allozyme variation at 13 loci. The amount of phenotypic and genetic variation within and among populations was documented. For most of the 25 quantitative characters, the differences between population means and between the total variances of the populations were higher in the selfer than in the outbreeder. Significant among-population genetic variation was found for nearly all characters in the two subspecies, but the outbreeder had higher within-population variability than the selfer with heterogeneity among characters. However, allozyme variation at 13 loci in about the same number of populations showed higher levels of genetic variability in the outcrossing subspecies capitatum compared with the selfing subspecies euspinosissimum, based on measures of mean number of alleles per locus, mean proportion of polymorphic loci, and mean heterozygosity. Therefore, H. spinosissimum subsp. capitatum appeared to be highly polymorphic in contrast to the greater monomorphism within populations of H. spinosissimum subsp. euspinosissimum. The genetic affinities of different populations of a subspecies are uniformly high, with Nei's genetic identity ranging from 0.983 to 0.997 in the selfing subspecies euspinosissimum and from 0.922 to 1.000 in the outcrossing subspecies capitatum. Key words: Hedysarum, genetic variation, populations, electrophoresis.


Genetics ◽  
1997 ◽  
Vol 146 (2) ◽  
pp. 471-479 ◽  
Author(s):  
Michael Travisano

The effect of environment on adaptation and divergence was examined in two sets of populations of Escherichia coli selected for 1000 generations in either maltose- or glucose-limited media. Twelve replicate populations selected in maltose-limited medium improved in fitness in the selected environment, by an average of 22.5%. Statistically significant among-population genetic variation for fitness was observed during the course of the propagation, but this variation was small relative to the fitness improvement. Mean fitness in a novel nutrient environment, glucose-limited medium, improved to the same extent as in the selected environment, with no statistically significant among-population genetic variation. In contrast, 12 replicate populations previously selected for 1000 generations in glucose-limited medium showed no improvement, as a group, in fitness in maltose-limited medium and substantial genetic variation. This asymmetric pattern of correlated responses suggests that small changes in the environment can have profound effects on adaptation and divergence.


Heredity ◽  
2013 ◽  
Vol 111 (1) ◽  
pp. 77-85 ◽  
Author(s):  
T M Bradford ◽  
M Adams ◽  
M T Guzik ◽  
W F Humphreys ◽  
A D Austin ◽  
...  

PLoS Genetics ◽  
2021 ◽  
Vol 17 (7) ◽  
pp. e1009665
Author(s):  
Olivier François ◽  
Clément Gain

Wright’s inbreeding coefficient, FST, is a fundamental measure in population genetics. Assuming a predefined population subdivision, this statistic is classically used to evaluate population structure at a given genomic locus. With large numbers of loci, unsupervised approaches such as principal component analysis (PCA) have, however, become prominent in recent analyses of population structure. In this study, we describe the relationships between Wright’s inbreeding coefficients and PCA for a model of K discrete populations. Our theory provides an equivalent definition of FST based on the decomposition of the genotype matrix into between- and within-population matrices. The average value of Wright’s FST over all loci included in the genotype matrix can be obtained from the PCA of the between-population matrix. Assuming that a separation condition is fulfilled and for reasonably large data sets, this value of FST approximates the proportion of genetic variation explained by the first (K − 1) principal components accurately. The new definition of FST is useful for computing inbreeding coefficients from surrogate genotypes, for example, obtained after correction of experimental artifacts or after removing adaptive genetic variation associated with environmental variables. The relationships between inbreeding coefficients and the spectrum of the genotype matrix not only allow interpretations of PCA results in terms of population genetic concepts but extend those concepts to population genetic analyses accounting for temporal, geographical and environmental contexts.
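The between/within decomposition idea can be illustrated at the level of a single biallelic locus. The sketch below computes Wright's FST in two textbook ways, from heterozygosities and from the between-population variance of allele frequencies, and shows that they agree. It assumes equally sized populations with known allele frequencies and a polymorphic locus; the function names and example frequencies are hypothetical, and this is not the matrix-based derivation or the PCA result of the paper.

```python
def fst_heterozygosity(freqs):
    """FST = (H_T - H_S) / H_T for one biallelic locus, equal-sized populations."""
    k = len(freqs)
    p_bar = sum(freqs) / k
    h_total = 2 * p_bar * (1 - p_bar)                     # expected heterozygosity, pooled
    h_within = sum(2 * p * (1 - p) for p in freqs) / k    # mean within-population value
    return (h_total - h_within) / h_total

def fst_variance(freqs):
    """FST as between-population variance of allele frequency over p_bar * (1 - p_bar)."""
    k = len(freqs)
    p_bar = sum(freqs) / k
    between = sum((p - p_bar) ** 2 for p in freqs) / k
    return between / (p_bar * (1 - p_bar))

# Hypothetical allele frequencies in two populations.
freqs = [0.2, 0.8]
fst_h = fst_heterozygosity(freqs)
fst_v = fst_variance(freqs)
print(fst_h, fst_v)
```

The two values coincide because H_T − H_S equals twice the between-population variance of the allele frequency, so FST can be read as the share of total variation that lies between populations.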


Author(s):  
Asher D. Cutter

Chapter 3, “Quantifying genetic variation at the molecular level,” introduces quantitative methods for measuring variation directly in DNA sequences to help decipher fundamental properties of populations and what they can tell us about evolution. It provides an overview of the evolutionary factors that contribute to genetic variation, like mutational input, effective population size, genetic drift, migration rate, and models of migration. This chapter surveys the principal ways to measure and summarize polymorphisms within a single population and across multiple populations of a species, including heterozygosity, nucleotide polymorphism estimators of θ, the site frequency spectrum, and FST, and provides illustrative natural examples. Populations are where evolution starts, after mutations arise as the spark of population genetic variation, and Chapter 3 describes how to quantify the variation to connect observations to predictions about how much polymorphism there ought to be under different circumstances.
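Two of the nucleotide polymorphism estimators of θ mentioned above can be sketched in a few lines: nucleotide diversity (the mean number of pairwise differences) and Watterson's estimator (segregating sites divided by a harmonic sum). The sketch assumes aligned sequences of equal length with no gaps or missing data, and reports both values per sequence rather than per site; the toy alignment and function names are hypothetical and are not taken from the chapter.

```python
from itertools import combinations

def segregating_sites(seqs):
    """Count alignment columns that contain more than one distinct base."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)

def nucleotide_diversity(seqs):
    """Mean number of differences over all pairs of sequences (per sequence)."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / len(pairs)

def watterson_theta(seqs):
    """Watterson's estimator: S divided by a_1 = sum of 1/i for i = 1..n-1."""
    n = len(seqs)
    a1 = sum(1 / i for i in range(1, n))
    return segregating_sites(seqs) / a1

# Hypothetical alignment of four sequences.
seqs = ["ACGT", "ACGA", "ATGA", "ACGT"]
pi = nucleotide_diversity(seqs)
theta_w = watterson_theta(seqs)
print(segregating_sites(seqs), pi, theta_w)
```

Under the standard neutral model both quantities estimate the same θ, so a large gap between them is often used as a hint that demography or selection has shaped the sample.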


2019 ◽  
Vol 67 (3) ◽  
pp. 172 ◽  
Author(s):  
Siegfried L. Krauss ◽  
Janet M. Anthony

Tetratheca erubescens is a narrowly endemic species including ~6300 plants restricted to a 2-km² distribution on the south Koolyanobbing Range Banded Ironstone Formation (BIF) in Western Australia. A key objective of the present study was to characterise population genetic variation, and its spatial structuring across the entire distribution of T. erubescens, to enable a quantification of genetic variation that may be affected by proposed mining of the BIF. In total, 436 plants (~30 at each of 14 sites) from across the entire distribution were sampled, genotyped and scored for allelic variation at 11 polymorphic microsatellite loci. Fifty-nine alleles were detected (mean alleles per locus=5.36, range 2–10), and observed heterozygosity was low to moderate and typically lower than expected heterozygosity across all loci (mean observed heterozygosity (Ho)=0.41, mean expected heterozygosity (He)=0.48). Given the restricted distribution of T. erubescens, overall genetic structuring was surprisingly strong (overall FST=0.098). A range-wide spatial autocorrelation analysis indicated a significant positive genetic correlation at distances up to 450 m, largely corresponding to the scale of more-or-less continuous distribution within each of two geographic clusters. In support, a STRUCTURE analysis identified an optimal number of genetic clusters as K=2, with assignment of individuals to one of two genetic clusters corresponding with the main geographic clusters. The genetic impact of proposed mining on T. erubescens was assessed on the basis of identifying plants within the proposed mine footprint (all plants from 4 of 14 sites). Repeating analyses of genetic variation after removal of these samples, and comparing to the complete dataset adjusted for sample size, resulted in the loss of one (very rare: overall frequency=0.001) allele (i.e. 58 of 59 alleles (98.3%) were recovered). All other parameters of genetic variation (mean Na, Ne, I, Ho, He, F) were unaffected.
Consequently, although up to 22% of all plants fall within the mine footprint and, therefore, may be lost, <2% of alleles detected will be lost, and other genetic parameters remained unaffected. Although these results suggest that the proposed mining will result in a negligible impact on the assessed genetic variation and its spatial structuring in T. erubescens, further research on impacts to, and management of, quantitative genetic variation and key population genetic processes is required.
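The summary figures quoted in this abstract follow from simple arithmetic on the reported counts, as the short check below shows. All input numbers are taken from the abstract; the variable names are assumptions made for illustration.

```python
# Counts reported in the abstract.
n_alleles = 59          # alleles detected across all loci
n_loci = 11             # polymorphic microsatellite loci
n_plants = 436          # plants sampled
n_sites = 14            # sampling sites
alleles_retained = 58   # alleles still present after removing mine-footprint samples

mean_alleles_per_locus = n_alleles / n_loci
fraction_retained = alleles_retained / n_alleles
fraction_lost = 1 - fraction_retained
plants_per_site = n_plants / n_sites

print(f"mean alleles per locus: {mean_alleles_per_locus:.2f}")
print(f"alleles retained: {100 * fraction_retained:.1f}%")
print(f"alleles lost: {100 * fraction_lost:.1f}%")
print(f"plants per site: {plants_per_site:.1f}")
```

The retained fraction of 58/59 corresponds to the stated 98.3%, so the loss of a single allele is consistent with the claim that fewer than 2% of detected alleles would be lost.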

