Fast and accurate approximation of the joint site frequency spectrum of multiple populations

2020 ◽  
Author(s):  
Ethan M. Jewett

Abstract The site frequency spectrum (SFS) is a statistic that summarizes the distribution of derived allele frequencies in a sample of DNA sequences. The SFS provides useful information about genetic variation within and among populations, and it can be used to make population genetic inferences. Methods for computing the SFS based on the diffusion approximation are computationally efficient when computing all terms of the SFS simultaneously, and they can handle complicated demographic scenarios. However, in practice it is sometimes only necessary to compute a subset of terms of the SFS, in which case coalescent-based methods can achieve greater computational efficiency. Here, we present simple and accurate approximate formulas for the expected joint SFS for multiple populations connected by migration. Compared with existing exact approaches, our approximate formulas greatly reduce the complexity of computing each entry of the SFS and have simple forms. The computational complexity of our method depends on the index of the entry to be computed, rather than on the sample size, and the accuracy of our approximation improves as the sample size increases.
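
As a rough illustration of the statistic defined in this abstract (not of the paper's approximation formulas), the sketch below tabulates an unfolded single-population SFS from a small 0/1 haplotype matrix. The function name `site_frequency_spectrum`, the list-of-lists input layout, and the toy data are assumptions made for this example only.

```python
from collections import Counter


def site_frequency_spectrum(haplotypes):
    """Return the unfolded SFS of a sample.

    haplotypes: list of n sequences, each a list of 0/1 values where
    1 marks the derived allele (hypothetical input layout).
    Entry i-1 of the result is the number of sites at which exactly
    i of the n sequences carry the derived allele, for 1 <= i <= n-1.
    """
    n = len(haplotypes)
    num_sites = len(haplotypes[0])
    # Count derived alleles at each site, then tally sites by that count.
    derived_counts = Counter(
        sum(h[s] for h in haplotypes) for s in range(num_sites)
    )
    # Sites with 0 or n derived alleles are not segregating and are skipped.
    return [derived_counts.get(i, 0) for i in range(1, n)]


# Toy sample: 4 sequences, 5 sites (made-up data).
sample = [
    [0, 1, 0, 1, 0],
    [0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
]
sfs = site_frequency_spectrum(sample)
```

For the toy sample, one site is a singleton, one a doubleton, and one carries three derived copies; the two monomorphic sites do not contribute.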

Author(s):  
Asher D. Cutter

Chapter 3, “Quantifying genetic variation at the molecular level,” introduces quantitative methods for measuring variation directly in DNA sequences to help decipher fundamental properties of populations and what they can tell us about evolution. It provides an overview of the evolutionary factors that contribute to genetic variation, like mutational input, effective population size, genetic drift, migration rate, and models of migration. This chapter surveys the principal ways to measure and summarize polymorphisms within a single population and across multiple populations of a species, including heterozygosity, nucleotide polymorphism estimators of θ, the site frequency spectrum, and F_ST, and provides illustrative natural examples. Populations are where evolution starts, after mutations arise as the spark of population genetic variation, and Chapter 3 describes how to quantify the variation to connect observations to predictions about how much polymorphism there ought to be under different circumstances.


2014 ◽  
Author(s):  
Yu-Ping Poh ◽  
Vera S Domingues ◽  
Hopi Hoekstra ◽  
Jeffrey Jensen

Identifying adaptively important loci in recently bottlenecked populations (be it natural selection acting on a population following the colonization of novel habitats in the wild, or artificial selection during the domestication of a breed) remains a major challenge. Here we report the results of a simulation study examining the performance of available population-genetic tools for identifying genomic regions under selection. To illustrate our findings, we examined the interplay between selection and demography in two species of Peromyscus mice, for which we have independent evidence of selection acting on phenotype as well as functional evidence identifying the underlying genotype. With this unusual information, we tested whether population-genetic-based approaches could have been utilized to identify the adaptive locus. Contrary to published claims, we conclude that the use of the background site frequency spectrum as a null model is largely ineffective in bottlenecked populations. Results are quantified both for site frequency spectrum and linkage disequilibrium-based predictions, and are found to hold true across a large parameter space that encompasses many species and populations currently under study. These results suggest that the genomic footprint left by selection on both new and standing variation in strongly bottlenecked populations will be difficult, if not impossible, to find using current approaches.


2016 ◽  
Author(s):  
Luca Ferretti ◽  
Alice Ledda ◽  
Thomas Wiehe ◽  
Guillaume Achaz ◽  
Sebastian E. Ramos-Onsins

Abstract We investigate the dependence of the site frequency spectrum (SFS) on the topological structure of genealogical trees. We show that basic population genetic statistics (for instance, estimators of θ or neutrality tests such as Tajima’s D) can be decomposed into components of waiting times between coalescent events and of tree topology. Our results clarify the relative impact of the two components on these statistics. We provide a rigorous interpretation of positive or negative values of an important class of neutrality tests in terms of the underlying tree shape. In particular, we show that values of Tajima’s D and Fay and Wu’s H depend in a direct way on a peculiar measure of tree balance which is mostly determined by the root balance of the tree. We present a new test for selection in the same class as Fay and Wu’s H and discuss its interpretation and power. Finally, we determine the trees corresponding to extreme expected values of these neutrality tests and present formulae for these extreme values as a function of sample size and number of segregating sites.


2019 ◽  
Vol 66 (3) ◽  
pp. 275-283 ◽  
Author(s):  
Andrés Martínez-Aquino ◽  
Víctor M Vidal-Martínez ◽  
F Sara Ceccarelli ◽  
Oscar Méndez ◽  
Lilia C Soler-Jiménez ◽  
...  

Abstract Despite the diversity and ecological importance of cestodes, there is a paucity of studies on their life stages (i.e., complete lists of intermediate, paratenic, and definitive hosts) and genetic variation. For example, in the Gulf of Mexico (GoM) 98 species of cestodes have been reported to date; however, data on their intraspecific genetic variation and population genetic studies are lacking. The trypanorhynch cestode, Oncomegas wageneri, is found (among other places) off the American Western Atlantic Coast, including the GoM, and has been reported as an adult from stingrays and from several teleost species in its larval form (as plerocerci). This study represents the first report of 2 previously unregistered definitive hosts for O. wageneri, namely the Atlantic sharpnose shark Rhizoprionodon terraenovae and the southern stingray Hypanus americanus. In this work, partial sequences of the 28S (region D1–D2) ribosomal DNA were analyzed to include O. wageneri within a eutetrarhynchoid phylogenetic framework. All O. wageneri individuals (which included plerocerci and adults) were recovered as monophyletic and Oncomegas celatus was identified as the sister species of O. wageneri. Furthermore, population genetic analyses of O. wageneri from the southern GoM were carried out using DNA sequences of the mitochondrial cytochrome c oxidase subunit 1 (COI) gene, which reflected high genetic variation and a lack of genetic structure among the 9 oceanographic sampling sites. Based on these results, O. wageneri is panmictic in the southern GoM. More extensive sampling along the species' entire distribution is necessary to make more accurate inferences of population genetics of O. wageneri.


2015 ◽  
Author(s):  
Jochen Blath ◽  
Mathias C Cronjager ◽  
Bjarki Eldon ◽  
Matthias Hammer

We give recursions for the expected site-frequency spectrum associated with so-called Xi-coalescents, that is, exchangeable coalescents which admit simultaneous multiple mergers of ancestral lineages. Xi-coalescents arise, for example, in association with population models of skewed offspring distributions with diploidy, recurrent advantageous mutations, or strong bottlenecks. In contrast, the simpler Lambda-coalescents admit multiple mergers of lineages, but at most one such merger each time. Xi-coalescents, as well as Lambda-coalescents, can predict an excess of singletons, compared to the Kingman coalescent. We compare estimates of coalescent parameters when Xi-coalescents are applied to data generated by Lambda-coalescents, and vice versa. In general, Xi-coalescents predict fewer singletons than corresponding Lambda-coalescents, but a higher count of mutations of size larger than singletons. We fit examples of Xi-coalescents to unfolded site-frequency spectra obtained for autosomal loci of the diploid Atlantic cod, and obtain different coalescent parameter estimates than obtained with corresponding Lambda-coalescents. Our results provide new inference tools, and suggest that for autosomal population genetic data from diploid or polyploid highly fecund populations that may have skewed offspring distributions, one should not apply Lambda-coalescents, but Xi-coalescents.


Genetics ◽  
1989 ◽  
Vol 123 (3) ◽  
pp. 585-595 ◽  
Author(s):  
F Tajima

Abstract The relationship between the two estimates of genetic variation at the DNA level, namely the number of segregating sites and the average number of nucleotide differences estimated from pairwise comparison, is investigated. It is found that the correlation between these two estimates is large when the sample size is small, and decreases slowly as the sample size increases. Using the relationship obtained, a statistical method for testing the neutral mutation hypothesis is developed. This method needs only the data of DNA polymorphism, namely the genetic variation within population at the DNA level. A simple method of computer simulation, that was used in order to obtain the distribution of a new statistic developed, is also presented. Applying this statistical method to the five regions of DNA sequences in Drosophila melanogaster, it is found that large insertion/deletion (greater than 100 bp) is deleterious. It is suggested that the natural selection against large insertion/deletion is so weak that a large amount of variation is maintained in a population.
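
The test statistic this abstract describes is now widely known as Tajima's D. The abstract itself does not state the formula, so the sketch below uses the standard textbook form of D, built from the number of segregating sites S and the average number of pairwise differences π; the function name `tajimas_d` and the example inputs are assumptions for illustration.

```python
import math


def tajimas_d(n, num_segregating, pi):
    """Tajima's D (standard textbook form) for a sample of n sequences.

    num_segregating: number of segregating sites S.
    pi: average number of pairwise nucleotide differences.
    """
    if n < 4:
        # For n = 2 or 3 the variance constants below are zero.
        raise ValueError("n must be at least 4")
    if num_segregating == 0:
        return float("nan")  # D is undefined without segregating sites

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    s = num_segregating
    # Numerator: difference between the two estimates of theta.
    # Denominator: estimated standard deviation of that difference.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Made-up example: 10 sequences, 10 segregating sites, all singletons,
# so each site contributes 2/n to pi and pi = 10 * 2 / 10 = 2.0.
d_all_singletons = tajimas_d(10, 10, 2.0)
```

An excess of rare variants, as in the all-singleton example, makes π smaller than S/a1 and therefore gives a negative D.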


2018 ◽  
Author(s):  
Andrew Melfi ◽  
Divakar Viswanath

Abstract The first terms of the Wright-Fisher (WF) site frequency spectrum that follow the coalescent approximation are determined precisely, with a view to understanding the accuracy of the coalescent approximation for large samples. The perturbing terms show that the probability of a single mutant in the sample (singleton probability) is elevated in WF but the rest of the frequency spectrum is lowered. A part of the perturbation can be attributed to a mismatch in rates of merger between WF and the coalescent. The rest of it can be attributed to the difference in the way WF and the coalescent partition children between parents. In particular, the number of children of a parent is approximately Poisson under WF and approximately geometric under the coalescent. Whereas the mismatch in rates raises the probability of singletons under WF, its offspring distribution being approximately Poisson lowers it. The two effects are of opposite sense everywhere except at the tail of the frequency spectrum. The WF frequency spectrum begins to depart from that of the coalescent only for sample sizes that are comparable to the population size. These conclusions are confirmed by a separate analysis that assumes the sample size n to be equal to the population size N. Partly thanks to the canceling effects, the total variation distance of WF minus coalescent is 0.12/log N for a population sized sample with n = N, which is only 1% for N = 2 × 10^4.
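
The closing figure in this abstract can be checked with a line of arithmetic. The sketch below assumes that "log" means the natural logarithm and that N is 2 × 10^4 (20,000); both readings are assumptions, chosen because they reproduce the roughly 1% value the abstract quotes. The function name is hypothetical.

```python
import math


def wf_coalescent_tv_distance(population_size):
    """Evaluate the abstract's expression 0.12 / log N.

    Assumes the natural logarithm; the abstract does not state the base.
    """
    return 0.12 / math.log(population_size)


# N = 2 x 10^4, the population size quoted in the abstract.
tv = wf_coalescent_tv_distance(2 * 10**4)
tv_percent = 100 * tv  # about 1.2, consistent with "only 1%"
```

With a base-10 logarithm the same expression would give about 2.8%, which is why the natural logarithm is the more plausible reading.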


Parasitology ◽  
2014 ◽  
Vol 142 (S1) ◽  
pp. S98-S107 ◽  
Author(s):  
HSIAO-HAN CHANG ◽  
DANIEL L. HARTL

SUMMARY Detecting signals of selection in the genome of malaria parasites is key to identifying targets for drug and vaccine development. Malaria parasites have a unique life cycle alternating between vector and host organism with a population bottleneck at each transition. These recurrent bottlenecks could influence the patterns of genetic diversity and the power of existing population genetic tools to identify sites under positive selection. We therefore simulated the site-frequency spectrum of a beneficial mutant allele through time under the malaria life cycle. We investigated the power of current population genetic methods to detect positive selection based on the site-frequency spectrum as well as temporal changes in allele frequency. We found that a within-host selective advantage is difficult to detect using these methods. Although a between-host transmission advantage could be detected, the power is decreased when compared with the classical Wright–Fisher (WF) population model. Using an adjusted null site-frequency spectrum that takes the malaria life cycle into account, the power of tests based on the site-frequency spectrum to detect positive selection is greatly improved. Our study demonstrates the importance of considering the life cycle in genetic analysis, especially in parasites with complex life cycles.

