Decomposing the site frequency spectrum: the impact of tree topology on neutrality tests

2016 ◽  
Author(s):  
Luca Ferretti ◽  
Alice Ledda ◽  
Thomas Wiehe ◽  
Guillaume Achaz ◽  
Sebastian E. Ramos-Onsins

We investigate the dependence of the site frequency spectrum (SFS) on the topological structure of genealogical trees. We show that basic population genetic statistics – for instance estimators of θ or neutrality tests such as Tajima’s D – can be decomposed into components of waiting times between coalescent events and of tree topology. Our results clarify the relative impact of the two components on these statistics. We provide a rigorous interpretation of positive or negative values of an important class of neutrality tests in terms of the underlying tree shape. In particular, we show that values of Tajima’s D and Fay and Wu’s H depend in a direct way on a peculiar measure of tree balance which is mostly determined by the root balance of the tree. We present a new test for selection in the same class as Fay and Wu’s H and discuss its interpretation and power. Finally, we determine the trees corresponding to extreme expected values of these neutrality tests and present formulae for these extreme values as a function of sample size and number of segregating sites.
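
The estimators and tests named above are all linear functions of the SFS. As a minimal sketch (not the authors' decomposition into waiting-time and topology components), the Python below computes Watterson's θ_W, Tajima's θ_π, Fay and Wu's θ_H, Tajima's D and the unnormalized Fay and Wu's H from an unfolded SFS. The function name theta_estimators and the input layout (sfs[i - 1] = number of sites whose derived allele is carried by exactly i of the n sequences) are assumptions made for illustration.

```python
import math
from typing import Dict, Sequence


def theta_estimators(sfs: Sequence[float]) -> Dict[str, float]:
    """Classic theta estimators and neutrality statistics from an unfolded SFS.

    Hypothetical helper: sfs[i - 1] counts sites whose derived allele appears
    in exactly i of the n sampled sequences, for i = 1..n-1.
    """
    n = len(sfs) + 1                      # sample size implied by the SFS length
    pairs = n * (n - 1) / 2               # number of sequence pairs, C(n, 2)
    S = sum(sfs)                          # number of segregating sites
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))

    theta_w = S / a1                                                       # Watterson
    theta_pi = sum(i * (n - i) * x for i, x in enumerate(sfs, 1)) / pairs  # pairwise diversity
    theta_h = sum(i * i * x for i, x in enumerate(sfs, 1)) / pairs         # Fay and Wu

    # Normalizing constants for Tajima's D (Tajima 1989).
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    var_d = e1 * S + e2 * S * (S - 1)
    tajima_d = (theta_pi - theta_w) / math.sqrt(var_d) if var_d > 0 else float("nan")

    return {
        "theta_w": theta_w,
        "theta_pi": theta_pi,
        "theta_h": theta_h,
        "tajima_d": tajima_d,
        "fay_wu_h": theta_pi - theta_h,   # unnormalized H
    }
```

Under the standard neutral expectation ξ_i = θ/i, the three θ estimators coincide, so both D and H are zero. An excess of singletons pushes D below zero, while an excess of high-frequency derived alleles pushes H below zero.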

Genetics ◽  
2017 ◽  
Vol 207 (1) ◽  
pp. 229-240 ◽  
Author(s):  
Luca Ferretti ◽  
Alice Ledda ◽  
Thomas Wiehe ◽  
Guillaume Achaz ◽  
Sebastian E. Ramos-Onsins

Genetics ◽  
2009 ◽  
Vol 182 (1) ◽  
pp. 205-216 ◽  
Author(s):  
Thomas Städler ◽  
Bernhard Haubold ◽  
Carlos Merino ◽  
Wolfgang Stephan ◽  
Peter Pfaffelhuber

2014 ◽  
Author(s):  
Yu-Ping Poh ◽  
Vera S Domingues ◽  
Hopi Hoekstra ◽  
Jeffrey Jensen

Identifying adaptively important loci in recently bottlenecked populations, be it natural selection acting on a population following the colonization of novel habitats in the wild or artificial selection during the domestication of a breed, remains a major challenge. Here we report the results of a simulation study examining the performance of available population-genetic tools for identifying genomic regions under selection. To illustrate our findings, we examined the interplay between selection and demography in two species of Peromyscus mice, for which we have independent evidence of selection acting on phenotype as well as functional evidence identifying the underlying genotype. With this unusual information, we tested whether population-genetic-based approaches could have been utilized to identify the adaptive locus. Contrary to published claims, we conclude that the use of the background site frequency spectrum as a null model is largely ineffective in bottlenecked populations. Results are quantified both for site frequency spectrum and linkage disequilibrium-based predictions, and are found to hold true across a large parameter space that encompasses many species and populations currently under study. These results suggest that the genomic footprint left by selection on both new and standing variation in strongly bottlenecked populations will be difficult, if not impossible, to find using current approaches.


2015 ◽  
Author(s):  
Jochen Blath ◽  
Mathias C Cronjager ◽  
Bjarki Eldon ◽  
Matthias Hammer

We give recursions for the expected site-frequency spectrum associated with so-called Xi-coalescents, that is, exchangeable coalescents that admit simultaneous multiple mergers of ancestral lineages. Xi-coalescents arise, for example, in association with population models of skewed offspring distributions with diploidy, recurrent advantageous mutations, or strong bottlenecks. In contrast, the simpler Lambda-coalescents admit multiple mergers of lineages, but at most one such merger at a time. Xi-coalescents, as well as Lambda-coalescents, can predict an excess of singletons compared to the Kingman coalescent. We compare estimates of coalescent parameters when Xi-coalescents are applied to data generated by Lambda-coalescents, and vice versa. In general, Xi-coalescents predict fewer singletons than corresponding Lambda-coalescents, but a higher count of mutations of size larger than singletons. We fit examples of Xi-coalescents to unfolded site-frequency spectra obtained for autosomal loci of the diploid Atlantic cod, and obtain different coalescent parameter estimates than those obtained with corresponding Lambda-coalescents. Our results provide new inference tools, and suggest that for autosomal population genetic data from diploid or polyploid highly fecund populations that may have skewed offspring distributions, one should apply Xi-coalescents rather than Lambda-coalescents.


2020 ◽  
Author(s):  
Ethan M. Jewett

The site frequency spectrum (SFS) is a statistic that summarizes the distribution of derived allele frequencies in a sample of DNA sequences. The SFS provides useful information about genetic variation within and among populations, and it can be used to make population genetic inferences. Methods for computing the SFS based on the diffusion approximation are computationally efficient when computing all terms of the SFS simultaneously, and they can handle complicated demographic scenarios. However, in practice it is sometimes only necessary to compute a subset of terms of the SFS, in which case coalescent-based methods can achieve greater computational efficiency. Here, we present simple and accurate approximate formulas for the expected joint SFS for multiple populations connected by migration. Compared with existing exact approaches, our approximate formulas greatly reduce the complexity of computing each entry of the SFS and have simple forms. The computational complexity of our method depends on the index of the entry to be computed, rather than on the sample size, and the accuracy of our approximation improves as the sample size increases.
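
To make the definition concrete, here is a minimal Python sketch that tallies the observed unfolded SFS from a 0/1 haplotype matrix, plus an observed two-population joint SFS. It assumes the ancestral state of every site is known (0 = ancestral, 1 = derived). The function names are hypothetical, and the code computes the observed statistic from data, not the approximate expected joint SFS under migration derived in the paper.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[int]]  # rows = sequences, columns = sites


def unfolded_sfs(haplotypes: Matrix) -> List[int]:
    """Entry i - 1 counts sites whose derived allele is present in exactly
    i of the n sequences (i = 1..n-1). Monomorphic sites are skipped."""
    n = len(haplotypes)
    sfs = [0] * (n - 1)
    for site in zip(*haplotypes):          # iterate over columns (sites)
        count = sum(site)
        if 0 < count < n:                  # segregating sites only
            sfs[count - 1] += 1
    return sfs


def joint_sfs(pop1: Matrix, pop2: Matrix) -> List[List[int]]:
    """Cell [i][j] counts sites with i derived copies in pop1 and j in pop2.
    Both matrices must cover the same sites in the same order; the corner
    cells [0][0] and [n1][n2] (sites fixed across both samples) are kept."""
    n1, n2 = len(pop1), len(pop2)
    table = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for col1, col2 in zip(zip(*pop1), zip(*pop2)):
        table[sum(col1)][sum(col2)] += 1
    return table


haps = [
    [1, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0],
]
print(unfolded_sfs(haps))              # [2, 1, 1]
print(joint_sfs(haps[:2], haps[2:]))   # [[1, 0, 0], [2, 1, 0], [0, 1, 0]]
```

In the example, four sequences carry five sites: one is monomorphic and is dropped, two are singletons, one is a doubleton and one is a tripleton. Splitting the rows into two samples of two sequences each gives the joint table.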


2017 ◽  
Author(s):  
A. Klassmann ◽  
L. Ferretti

The analysis of patterns of segregating (i.e. polymorphic) sites in aligned sequences is routine in population genetics. Quantities of interest include the total number of segregating sites and the number of sites with mutations of different frequencies, the so-called site frequency spectrum. For neutrally evolving sequences, some classical results are available, including the expected value and variance of the spectrum in the Kingman coalescent model without recombination as calculated by Fu (1995). In this work, we use similar techniques to compute the third moments of the site frequency spectrum without recombination. We also account for the linkage pattern of mutations, yielding the full haplotype spectrum of three polymorphic sites. Based on these results, we derive analytical results for the bias of Tajima’s D and other neutrality tests. As an application, we obtain the second moments of the spectrum of linked sites, which is related to the neutral spectrum of chromosomal inversions and other structural variants. These moments can be used for the normalisation of new neutrality tests relying on these spectra.
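
For the Kingman coalescent without recombination, the expected spectrum is E[ξ_i] = θ/i (Fu 1995). The sketch below checks this first moment numerically by simulating genealogies. It is an illustrative Monte Carlo check under the standard neutral model, not the analytical moment calculation used in the paper; the function names, the seed and the replicate count are assumptions chosen for illustration.

```python
import random
from typing import List


def expected_sfs(n: int, theta: float) -> List[float]:
    """Neutral expectation under the Kingman coalescent: E[xi_i] = theta / i."""
    return [theta / i for i in range(1, n)]


def simulated_sfs_mean(n: int, theta: float, reps: int, seed: int = 1) -> List[float]:
    """Monte Carlo estimate of E[xi_i] from simulated Kingman genealogies.

    Time is in coalescent units (each pair of lineages merges at rate 1) and
    mutations fall at rate theta / 2 per unit branch length. Instead of
    sampling Poisson mutation counts, we add the expected number of mutations
    on each branch, which lowers the variance without changing the mean.
    """
    rng = random.Random(seed)
    totals = [0.0] * (n - 1)
    for _ in range(reps):
        lineages = [1] * n                         # leaves below each lineage
        while len(lineages) > 1:
            k = len(lineages)
            t = rng.expovariate(k * (k - 1) / 2)   # waiting time T_k
            for size in lineages:                  # each branch extends by t
                totals[size - 1] += theta / 2 * t
            i, j = rng.sample(range(k), 2)         # a random pair coalesces
            merged = lineages[i] + lineages[j]
            lineages = [s for m, s in enumerate(lineages) if m not in (i, j)]
            lineages.append(merged)
    return [x / reps for x in totals]


n, theta = 8, 2.0
for i, (e, s) in enumerate(zip(expected_sfs(n, theta),
                               simulated_sfs_mean(n, theta, reps=20000, seed=42)), 1):
    print(f"xi_{i}: expected {e:.3f}, simulated {s:.3f}")
```

Tracking only the number of leaves below each lineage is enough here, because a mutation's frequency in the sample equals the number of leaves below the branch on which it falls.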


Parasitology ◽  
2014 ◽  
Vol 142 (S1) ◽  
pp. S98-S107 ◽  
Author(s):  
HSIAO-HAN CHANG ◽  
DANIEL L. HARTL

Detecting signals of selection in the genome of malaria parasites is key to identifying targets for drug and vaccine development. Malaria parasites have a unique life cycle alternating between vector and host organisms, with a population bottleneck at each transition. These recurrent bottlenecks could influence the patterns of genetic diversity and the power of existing population genetic tools to identify sites under positive selection. We therefore simulated the site-frequency spectrum of a beneficial mutant allele through time under the malaria life cycle. We investigated the power of current population genetic methods to detect positive selection based on the site-frequency spectrum as well as temporal changes in allele frequency. We found that a within-host selective advantage is difficult to detect using these methods. Although a between-host transmission advantage could be detected, the power is decreased when compared with the classical Wright–Fisher (WF) population model. Using an adjusted null site-frequency spectrum that takes the malaria life cycle into account, the power of tests based on the site-frequency spectrum to detect positive selection is greatly improved. Our study demonstrates the importance of considering the life cycle in genetic analysis, especially in parasites with complex life cycles.


Author(s):  
Adrien Oliva ◽  
Raymond Tobler ◽  
Alan Cooper ◽  
Bastien Llamas ◽  
Yassine Souilmi

The current standard practice for assembling individual genomes involves mapping millions of short DNA sequences (also known as DNA ‘reads’) against a pre-constructed reference genome. Mapping vast amounts of short reads in a timely manner is a computationally challenging task that inevitably produces artefacts, including biases against alleles not found in the reference genome. This reference bias and other mapping artefacts are expected to be exacerbated in ancient DNA (aDNA) studies, which rely on the analysis of low quantities of damaged and very short DNA fragments (~30–80 bp). Nevertheless, the current gold-standard mapping strategies for aDNA studies have effectively remained unchanged for nearly a decade, during which time new software has emerged. In this study, we used simulated aDNA reads from three different human populations to benchmark the performance of 30 distinct mapping strategies implemented across four read mapping programs (BWA-aln, BWA-mem, NovoAlign and Bowtie2), and quantified the impact of reference bias in downstream population genetic analyses. We show that specific NovoAlign, BWA-aln and BWA-mem parameterizations achieve high mapping precision with low levels of reference bias, particularly after filtering out reads with low mapping qualities. However, unbiased NovoAlign results required the use of an IUPAC reference genome. While relevant only to aDNA projects where reference population data are available, the benefit of using an IUPAC reference demonstrates the value of incorporating population genetic information into the aDNA mapping process, echoing recent results based on graph genome representations.

