The third moments of the site frequency spectrum

2017 ◽  
Author(s):  
A. Klassmann ◽  
L. Ferretti

Abstract The analysis of patterns of segregating (i.e. polymorphic) sites in aligned sequences is routine in population genetics. Quantities of interest include the total number of segregating sites and the number of sites with mutations of different frequencies, the so-called site frequency spectrum. For neutrally evolving sequences, some classical results are available, including the expected value and variance of the spectrum in the Kingman coalescent model without recombination as calculated by Fu (1995). In this work, we use similar techniques to compute the third moments of the site frequency spectrum without recombination. We also account for the linkage pattern of mutations, yielding the full haplotype spectrum of three polymorphic sites. Based on these results, we derive analytical results for the bias of Tajima’s D and other neutrality tests. As an application, we obtain the second moments of the spectrum of linked sites, which is related to the neutral spectrum of chromosomal inversions and other structural variants. These moments can be used for the normalisation of new neutrality tests relying on these spectra.
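As an illustration of the quantities named in this abstract, the following is a minimal Python sketch (not code from the paper) that tabulates the unfolded site frequency spectrum from a 0/1 haplotype matrix and compares it with the classical neutral expectation E[ξ_i] = θ/i. The function names, the input layout (one row per sequence, 1 marking the derived allele), and the toy data are assumptions made for this example.

```python
from collections import Counter


def site_frequency_spectrum(haplotypes):
    """Unfolded SFS of n aligned sequences.

    haplotypes: list of n rows, each a list of 0/1 values (1 = derived allele).
    Returns a list xi where xi[i - 1] is the number of sites at which the
    derived allele is carried by exactly i sequences, for i = 1 .. n - 1.
    Monomorphic columns (0 or n copies) are ignored.
    """
    n = len(haplotypes)
    counts = Counter(sum(column) for column in zip(*haplotypes))
    return [counts.get(i, 0) for i in range(1, n)]


def expected_neutral_sfs(n, theta):
    """Expected spectrum under the standard neutral model: E[xi_i] = theta / i."""
    return [theta / i for i in range(1, n)]


def watterson_theta(sfs):
    """Watterson's estimator: S / a_1 with a_1 = sum_{i=1}^{n-1} 1/i."""
    n = len(sfs) + 1
    a1 = sum(1.0 / i for i in range(1, n))
    return sum(sfs) / a1


# Toy alignment (hypothetical data): 4 sequences, 4 sites, one monomorphic.
toy = [
    [0, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]
toy_sfs = site_frequency_spectrum(toy)
toy_theta = watterson_theta(toy_sfs)
```

Here `toy_sfs` holds one singleton, one doubleton and one tripleton, and `expected_neutral_sfs(4, toy_theta)` gives the neutral expectation to compare against.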


2016 ◽  
Author(s):  
Luca Ferretti ◽  
Alice Ledda ◽  
Thomas Wiehe ◽  
Guillaume Achaz ◽  
Sebastian E. Ramos-Onsins

Abstract We investigate the dependence of the site frequency spectrum (SFS) on the topological structure of genealogical trees. We show that basic population genetic statistics – for instance estimators of θ or neutrality tests such as Tajima’s D – can be decomposed into components of waiting times between coalescent events and of tree topology. Our results clarify the relative impact of the two components on these statistics. We provide a rigorous interpretation of positive or negative values of an important class of neutrality tests in terms of the underlying tree shape. In particular, we show that values of Tajima’s D and Fay and Wu’s H depend in a direct way on a peculiar measure of tree balance which is mostly determined by the root balance of the tree. We present a new test for selection in the same class as Fay and Wu’s H and discuss its interpretation and power. Finally, we determine the trees corresponding to extreme expected values of these neutrality tests and present formulae for these extreme values as a function of sample size and number of segregating sites.
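The neutrality tests mentioned in this abstract are linear combinations of the SFS. The sketch below is an illustration only, not the authors' decomposition into waiting times and topology: it computes Tajima's D with the standard 1989 normalisation constants and the unnormalised Fay and Wu's H directly from an unfolded spectrum. The function names and the example spectra are assumptions made for this example.

```python
import math


def tajimas_d(sfs):
    """Tajima's D from an unfolded SFS (sfs[i - 1] = sites with i derived copies)."""
    n = len(sfs) + 1
    s = sum(sfs)  # number of segregating sites
    if s == 0:
        raise ValueError("Tajima's D is undefined without segregating sites")
    pairs = n * (n - 1) / 2.0
    # Mean pairwise difference: a site with i derived copies separates i*(n-i) pairs.
    pi = sum(i * (n - i) * x for i, x in enumerate(sfs, start=1)) / pairs
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


def fay_wu_h(sfs):
    """Unnormalised Fay and Wu's H = theta_pi - theta_H from an unfolded SFS."""
    n = len(sfs) + 1
    pairs = n * (n - 1) / 2.0
    # theta_pi weights a site by i*(n-i), theta_H by i*i; their difference is i*(n-2i).
    return sum(i * (n - 2 * i) * x for i, x in enumerate(sfs, start=1)) / pairs


# Hypothetical spectra for n = 4 sequences.
neutral_like = [6, 3, 2]     # proportional to 1/i, the neutral expectation
rare_excess = [10, 0, 0]     # singletons only
```

A spectrum proportional to 1/i gives D and H of zero, while an excess of singletons gives a negative D, matching the usual reading of negative values as a skew toward rare variants.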



Genetics ◽  
1995 ◽  
Vol 140 (2) ◽  
pp. 783-796 ◽  
Author(s):  
J M Braverman ◽  
R R Hudson ◽  
N L Kaplan ◽  
C H Langley ◽  
W Stephan

Abstract The level of DNA sequence variation is reduced in regions of the Drosophila melanogaster genome where the rate of crossing over per physical distance is also reduced. This observation has been interpreted as support for the simple model of genetic hitchhiking, in which directional selection on rare variants, e.g., newly arising advantageous mutants, sweeps linked neutral alleles to fixation, thus eliminating polymorphisms near the selected site. However, the frequency spectra of segregating sites of several loci from some populations exhibiting reduced levels of nucleotide diversity and reduced numbers of segregating sites did not appear different from what would be expected under a neutral equilibrium model. Specifically, a skew toward an excess of rare sites was not observed in these samples, as measured by Tajima's D. Because this skew was predicted by a simple hitchhiking model, yet it had never been expressed quantitatively and compared directly to DNA polymorphism data, this paper investigates the hitchhiking effect on the site frequency spectrum, as measured by Tajima's D and several other statistics, using a computer simulation model based on the coalescent process and recurrent hitchhiking events. The results presented here demonstrate that under the simple hitchhiking model (1) the expected value of Tajima's D is large and negative (indicating a skew toward rare variants), (2) that Tajima's test has reasonable power to detect a skew in the frequency spectrum for parameters comparable to those from actual data sets, and (3) that the Tajima's Ds observed in several data sets are very unlikely to have been the result of simple hitchhiking. Consequently, the simple hitchhiking model is not a sufficient explanation for the DNA polymorphism at those loci exhibiting a decreased number of segregating sites yet not exhibiting a skew in the frequency spectrum.



Genetics ◽  
2017 ◽  
Vol 207 (1) ◽  
pp. 229-240 ◽  
Author(s):  
Luca Ferretti ◽  
Alice Ledda ◽  
Thomas Wiehe ◽  
Guillaume Achaz ◽  
Sebastian E. Ramos-Onsins


2018 ◽  
Vol 120 ◽  
pp. 16-28 ◽  
Author(s):  
A. Klassmann ◽  
L. Ferretti


2017 ◽  
Author(s):  
Berit Lindum Waltoft ◽  
Asger Hobolth

Abstract The variability in population size is a key quantity for understanding the evolutionary history of a species. We present a new method, CubSFS, for estimating the changes in population size of a panmictic population from the site frequency spectrum. First, we provide a straightforward proof for the expression of the expected site frequency spectrum depending only on the population size. Our derivation is based on an eigenvalue decomposition of the instantaneous coalescent rate matrix. Second, we solve the inverse problem of determining the variability in population size from an observed SFS. Our solution is based on a cubic spline for the population size. The cubic spline is determined by minimizing the weighted average of two terms, namely (i) the goodness of fit to the SFS, and (ii) a penalty term based on the smoothness of the changes. The weight is determined by cross-validation. The new method is validated on simulated demographic histories and applied to data from nine different human populations.



2018 ◽  
Author(s):  
Christelle Fraïsse ◽  
Camille Roux ◽  
Pierre-Alexandre Gagnaire ◽  
Jonathan Romiguier ◽  
Nicolas Faivre ◽  
...  

Abstract Genome-scale diversity data are increasingly available in a variety of biological systems, and can be used to reconstruct the past evolutionary history of species divergence. However, extracting the full demographic information from these data is not trivial, and requires inferential methods that account for the diversity of coalescent histories throughout the genome. Here, we evaluate the potential and limitations of one such approach. We reexamine a well-known system of mussel sister species, using the joint site frequency spectrum (jSFS) of synonymous mutations computed either from exome capture or RNA-seq, in an Approximate Bayesian Computation (ABC) framework. We first assess the best sampling strategy (number of individuals, loci, and bins in the jSFS), and show that model selection is robust to variation in the number of individuals and loci. In contrast, different binning choices when summarizing the joint site frequency spectrum strongly affect the results: including classes of low and high frequency shared polymorphisms can more effectively reveal recent migration events. We then take advantage of the flexibility of ABC to compare more realistic models of speciation, including variation in migration rates through time (i.e. periodic connectivity) and across genes (i.e. genome-wide heterogeneity in migration rates). We show that these models were consistently selected as the most probable, suggesting that mussels have experienced a complex history of gene flow during divergence and that the species boundary is semi-permeable. Our work provides a comprehensive evaluation of ABC demographic inference in mussels based on the coding site frequency spectrum, and supplies guidelines for employing different sequencing techniques and sampling strategies. We emphasize, perhaps surprisingly, that inferences are less limited by the volume of data than by the way in which they are analyzed.



2019 ◽  
Vol 36 (12) ◽  
pp. 2906-2921 ◽  
Author(s):  
Austin H Patton ◽  
Mark J Margres ◽  
Amanda R Stahlke ◽  
Sarah Hendricks ◽  
Kevin Lewallen ◽  
...  

Abstract Reconstructing species’ demographic histories is a central focus of molecular ecology and evolution. Recently, an expanding suite of methods leveraging either the sequentially Markovian coalescent (SMC) or the site-frequency spectrum has been developed to reconstruct population size histories from genomic sequence data. However, few studies have investigated the robustness of these methods to genome assemblies of varying quality. In this study, we first present an improved genome assembly for the Tasmanian devil using the Chicago library method. Compared with the original reference genome, our new assembly reduces the number of scaffolds (from 35,975 to 10,010) and increases the scaffold N90 (from 0.101 to 2.164 Mb). Second, we assess the performance of four contemporary genomic methods for inferring population size history (PSMC, MSMC, SMC++, Stairway Plot), using the two devil genome assemblies as well as simulated, artificially fragmented genomes that approximate the hypothesized demographic history of Tasmanian devils. We demonstrate that each method is robust to assembly quality, producing similar estimates of Ne when simulated genomes were fragmented into up to 5,000 scaffolds. Overall, methods reliant on the SMC are most reliable between ∼300 generations before present (gbp) and 100 kgbp, whereas methods exclusively reliant on the site-frequency spectrum are most reliable between the present and 30 gbp. Our results suggest that when used in concert, genomic methods for reconstructing species’ effective population size histories 1) can be applied to nonmodel organisms without highly contiguous reference genomes, and 2) are capable of detecting independently documented effects of historical geological events.



Genetics ◽  
2016 ◽  
Vol 202 (4) ◽  
pp. 1549-1561 ◽  
Author(s):  
Jeffrey P. Spence ◽  
John A. Kamm ◽  
Yun S. Song


2018 ◽  
Vol 124 ◽  
pp. 81-92
Author(s):  
Andrew Melfi ◽  
Divakar Viswanath

