scholarly journals An empirical examination of sample size effects on population demographic estimates in birds using single nucleotide polymorphism (SNP) data

Author(s):  
Jessica F. McLaughlin ◽  
Kevin Winker

AbstractSample size is a critical aspect of study design in population genomics research, yet few empirical studies have examined the impacts of small sample sizes. We used datasets from eight diverging bird lineages to make pairwise comparisons at different levels of taxonomic divergence (populations, subspecies, and species). Our data are from loci linked to ultraconserved elements (UCEs) and our analyses used one SNP per locus. All individuals were genotyped at all loci (McLaughlin et al. 2020). We estimated population demographic parameters (effective population size, migration rate, and time since divergence) in a coalescent framework using Diffusion Approximation for Demographic Inference (δaδi; Gutenkunst et al. 2009), an allele frequency spectrum (AFS) method. Using divergence-with-gene-flow models optimized with full datasets, we subsampled at sequentially smaller sample sizes from full datasets of 6 – 8 diploid individuals per population (with both alleles called) down to 1:1, and then we compared estimates and their changes in accuracy. Accuracy was strongly affected by sample size, with considerable differences among estimated parameters and among lineages. Effective population size parameters (ν) tended to be underestimated at low sample sizes (fewer than 3 diploid individuals per population, or 6:6 haplotypes in coalescent terms). Migration (m) was fairly consistently estimated until ≤ 2 individuals per population, and no consistent trend of over- or underestimation was found in either time since divergence (T) or Θ (4Nrefμ). Lineages that were taxonomically recognized above the population level (subspecies and species pairs; i.e., deeper divergences) tended to have lower variation in scaled root mean square error (SMRSE) of parameter estimation at smaller sample sizes than population-level divergences, and many parameters were estimated accurately down to 3 diploid individuals per population. Shallower divergence levels (i.e., populations) often required at least 5 individuals per population for reliable demographic inferences using this approach. Although divergence levels might be unknown at the outset of study design, our results provide a framework for planning appropriate sampling and for interpreting results if smaller sample sizes must be used.

PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e9939
Author(s):  
Jessica F. McLaughlin ◽  
Kevin Winker

Sample size is a critical aspect of study design in population genomics research, yet few empirical studies have examined the impacts of small sample sizes. We used datasets from eight diverging bird lineages to make pairwise comparisons at different levels of taxonomic divergence (populations, subspecies, and species). Our data are from loci linked to ultraconserved elements and our analyses used one single nucleotide polymorphism per locus. All individuals were genotyped at all loci, effectively doubling sample size for coalescent analyses. We estimated population demographic parameters (effective population size, migration rate, and time since divergence) in a coalescent framework using Diffusion Approximation for Demographic Inference, an allele frequency spectrum method. Using divergence-with-gene-flow models optimized with full datasets, we subsampled at sequentially smaller sample sizes from full datasets of 6–8 diploid individuals per population (with both alleles called) down to 1:1, and then we compared estimates and their changes in accuracy. Accuracy was strongly affected by sample size, with considerable differences among estimated parameters and among lineages. Effective population size parameters (ν) tended to be underestimated at low sample sizes (fewer than three diploid individuals per population, or 6:6 haplotypes in coalescent terms). Migration (m) was fairly consistently estimated until <2 individuals per population, and no consistent trend of over-or underestimation was found in either time since divergence (T) or theta (Θ = 4Nrefμ). Lineages that were taxonomically recognized above the population level (subspecies and species pairs; that is, deeper divergences) tended to have lower variation in scaled root mean square error of parameter estimation at smaller sample sizes than population-level divergences, and many parameters were estimated accurately down to three diploid individuals per population. Shallower divergence levels (i.e., populations) often required at least five individuals per population for reliable demographic inferences using this approach. Although divergence levels might be unknown at the outset of study design, our results provide a framework for planning appropriate sampling and for interpreting results if smaller sample sizes must be used.


Heredity ◽  
2019 ◽  
Vol 124 (2) ◽  
pp. 299-312 ◽  
Author(s):  
Tetsuya Akita

Abstract In this study, we developed a nearly unbiased estimator of contemporary effective mother size in a population, which is based on a known maternal half-sibling relationship found within the same cohort. Our method allows for variance of the average number of offspring per mother (i.e., parental variation, such as age-specific fecundity) and variance of the number of offspring among mothers with identical reproductive potential (i.e., nonparental variation, such as family-correlated survivorship). We also developed estimators of the variance and coefficient of variation of contemporary effective mother size and qualitatively evaluated the performance of the estimators by running an individual-based model. Our results provide guidance for (i) a sample size to ensure the required accuracy and precision when the order of effective mother size is available and (ii) a degree of uncertainty regarding the estimated effective mother size when information about the size is unavailable. To the best of our knowledge, this is the first report to demonstrate the derivation of a nearly unbiased estimator of effective population size; however, its current application is limited to effective mother size and situations, in which the sample size is not particularly small and maternal half-sibling relationships can be detected without error. The results of this study demonstrate the usefulness of a sibship assignment method for estimating effective population size; in addition, they have the potential to greatly widen the scope of genetic monitoring, especially in the situation of small sample size.


2018 ◽  
Vol 48 (14) ◽  
pp. 1149-1154 ◽  
Author(s):  
Lúcio M. Barbosa ◽  
Bruna C. Barros ◽  
Moreno de Souza Rodrigues ◽  
Luciano K. Silva ◽  
Mitermayer G. Reis ◽  
...  

2019 ◽  
Author(s):  
M. Elise Lauterbur

AbstractPopulation genetics employs two major models for conceptualizing genetic relationships among individuals – outcome-driven (coalescent) and process-driven (forward). These models are complementary, but the basic Kingman coalescent and its extensions make fundamental assumptions to allow analytical approximations: a constant effective population size much larger than the sample size. These make the probability of multiple coalescent events per generation negligible. Although these assumptions are often violated in species of conservation concern, conservation genetics often uses coalescent models of effective population sizes and trajectories in endangered species. Despite this, the effect of very small effective population sizes, and their interaction with bottlenecks and sample sizes, on such analyses of genetic diversity remains unexplored. Here, I use simulations to analyze the influence of small effective population size, population decline, and their relationship with sample size, on coalescent-based estimates of genetic diversity. Compared to forward process-based estimates, coalescent models significantly overestimate genetic diversity in oversampled populations with very small effective sizes. When sampled soon after a decline, coalescent models overestimate genetic diversity in small populations regardless of sample size. Such overestimates artificially inflate estimates of both bottleneck and population split times. For conservation applications with small effective population sizes, forward simulations that do not make population size assumptions are computationally tractable and should be considered instead of coalescent-based models. These findings underscore the importance of the theoretical basis of analytical techniques as applied to conservation questions.


2018 ◽  
Author(s):  
Stephan Geuter ◽  
Guanghao Qi ◽  
Robert C. Welsh ◽  
Tor D. Wager ◽  
Martin A. Lindquist

AbstractMulti-subject functional magnetic resonance imaging (fMRI) analysis is often concerned with determining whether there exists a significant population-wide ‘activation’ in a comparison between two or more conditions. Typically this is assessed by testing the average value of a contrast of parameter estimates (COPE) against zero in a general linear model (GLM) analysis. In this work we investigate several aspects of this type of analysis. First, we study the effects of sample size on the sensitivity and reliability of the group analysis, allowing us to evaluate the ability of small sampled studies to effectively capture population-level effects of interest. Second, we assess the difference in sensitivity and reliability when using volumetric or surface based data. Third, we investigate potential biases in estimating effect sizes as a function of sample size. To perform this analysis we utilize the task-based fMRI data from the 500-subject release from the Human Connectome Project (HCP). We treat the complete collection of subjects (N = 491) as our population of interest, and perform a single-subject analysis on each subject in the population. We investigate the ability to recover population level effects using a subset of the population and standard analytical techniques. Our study shows that sample sizes of 40 are generally able to detect regions with high effect sizes (Cohen’s d > 0.8), while sample sizes closer to 80 are required to reliably recover regions with medium effect sizes (0.5 < d < 0.8). We find little difference in results when using volumetric or surface based data with respect to standard mass-univariate group analysis. Finally, we conclude that special care is needed when estimating effect sizes, particularly for small sample sizes.


2020 ◽  
Vol 10 (4) ◽  
pp. 1929-1937 ◽  
Author(s):  
Florianne Marandel ◽  
Grégory Charrier ◽  
Jean‐Baptiste Lamy ◽  
Sabrina Le Cam ◽  
Pascal Lorance ◽  
...  

2017 ◽  
Author(s):  
Aaron P. Ragsdale ◽  
Ryan N. Gutenkunst

AbstractPopulation demographic history may be learned from contemporary genetic variation data. Methods based on aggregating the statistics of many single loci into an allele frequency spectrum (AFS) have proven powerful, but such methods ignore potentially informative patterns of linkage disequilibrium (LD) between neighboring loci. To leverage such patterns, we developed a composite-likelihood framework for inferring demographic history from aggregated statistics of pairs of loci. Using this framework, we show that two-locus statistics are indeed more sensitive to demographic history than single-locus statistics such as the AFS. In particular, two-locus statistics escape the notorious confounding of depth and duration of a bottleneck, and they provide a means to estimate effective population size based on the recombination rather than mutation rate. We applied our approach to a Zambian population of Drosophila melanogaster. Notably, using both single– and two-locus statistics, we found substantially lower estimates of effective population size than previous works. Together, our results demonstrate the broad potential for two-locus statistics to enable powerful population genetic inference.


Genetics ◽  
1997 ◽  
Vol 145 (3) ◽  
pp. 833-846 ◽  
Author(s):  
Jody Hey ◽  
John Wakeley

Population genetic models often use a population recombination parameter 4Nc, where N is the effective population size and c is the recombination rate per generation. In many ways 4Nc is comparable to 4Nu, the population mutation rate. Both combine genome level and population level processes, and together they describe the rate of production of genetic variation in a population. However, 4Nc is more difficult to estimate. For a population sample of DNA sequences, historical recombination can only be detected if polymorphisms exist, and even then most recombination events are not detectable. This paper describes an estimator of 4Nc, hereafter designated γ (gamma), that was developed using a coalescent model for a sample of four DNA sequences with recombination. The reliability of γ was assessed using multiple coalescent simulations. In general γ has low to moderate bias, and the reliability of γ is comparable, though less, than that for a widely used estimator of 4Nu. If there exists an independent estimate of the recombination rate (per generation, per base pair), γ can be used to estimate the effective population size or the neutral mutation rate.


1981 ◽  
Vol 38 (3) ◽  
pp. 209-216 ◽  
Author(s):  
William G. Hill

SUMMARYA method is proposed for estimating effective population size (N) from data on linkage disequilibrium among neutral genes at several polymorphic loci or restriction sites. The efficiency of the method increases with larger sample size and more tightly linked genes; but for very tightly linked genes estimates of N are more dependent on long-term than on recent population history. Two sets of data are analysed as examples.


Sign in / Sign up

Export Citation Format

Share Document