site frequency spectrum

Genetics ◽  
2022 ◽  
Author(s):  
Benjamin H Good

Abstract The statistical associations between mutations, collectively known as linkage disequilibrium (LD), encode important information about the evolutionary forces acting within a population. Yet in contrast to single-site analogues like the site frequency spectrum, our theoretical understanding of linkage disequilibrium remains limited. In particular, little is currently known about how mutations with different ages and fitness costs contribute to expected patterns of LD, even in simple settings where recombination and genetic drift are the major evolutionary forces. Here, I introduce a forward-time framework for predicting linkage disequilibrium between pairs of neutral and deleterious mutations as a function of their present-day frequencies. I show that the dynamics of linkage disequilibrium become much simpler in the limit that mutations are rare, where they admit a simple heuristic picture based on the trajectories of the underlying lineages. I use this approach to derive analytical expressions for a family of frequency-weighted LD statistics as a function of the recombination rate, the frequency scale, and the additive and epistatic fitness costs of the mutations. I find that the frequency scale can have a dramatic impact on the shapes of the resulting LD curves, reflecting the broad range of time scales over which these correlations arise. I also show that the differences between neutral and deleterious LD are not purely driven by differences in their mutation frequencies, and can instead display qualitative features that are reminiscent of epistasis. I conclude by discussing the implications of these results for recent LD measurements in bacteria. This forward-time approach may provide a useful framework for predicting linkage disequilibrium across a range of evolutionary scenarios.
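
For orientation, the textbook pairwise LD coefficient can be computed directly from a set of haplotypes. The sketch below is not the forward-time framework described in the abstract; it only illustrates the standard definitions D = f_AB - f_A f_B and r^2 = D^2 / (f_A (1 - f_A) f_B (1 - f_B)). The function name `pairwise_ld` and the toy haplotypes are hypothetical.

```python
def pairwise_ld(haplotypes, i, j):
    """Return (D, r2) for sites i and j.

    haplotypes: list of equal-length sequences of 0/1 alleles
    (1 = derived/mutant allele). Illustrative helper, not from the paper.
    """
    n = len(haplotypes)
    f_a = sum(h[i] for h in haplotypes) / n    # mutant frequency at site i
    f_b = sum(h[j] for h in haplotypes) / n    # mutant frequency at site j
    f_ab = sum(1 for h in haplotypes if h[i] == 1 and h[j] == 1) / n
    d = f_ab - f_a * f_b                       # classical LD coefficient
    denom = f_a * (1 - f_a) * f_b * (1 - f_b)
    # r2 is undefined when either site is monomorphic in the sample.
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2


# Toy data: two perfectly associated sites, then two unassociated sites.
linked = [(1, 1), (1, 1), (0, 0), (0, 0)]
unlinked = [(1, 0), (0, 1), (1, 1), (0, 0)]
print(pairwise_ld(linked, 0, 1))
print(pairwise_ld(unlinked, 0, 1))
```

Frequency-weighted statistics such as those in the abstract condition on the mutant frequencies, which this unweighted sketch does not attempt.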


2021 ◽  
Vol 83 (6-7) ◽  
Author(s):  
Asger Hobolth ◽  
Mogens Bladt ◽  
Lars Nørvang Andersen

2021 ◽  
Author(s):  
Robert Horvath ◽  
Mitra Menon ◽  
Michelle C Stitzer ◽  
Jeffrey Ross-Ibarra

Recognition of the important role of transposable elements (TEs) in eukaryotic genomes quickly led to a burgeoning literature modeling and estimating the effects of selection on TEs. Much of the empirical work on selection has focused on analyzing the site frequency spectrum (SFS) of TEs. But TEs differ from standard evolutionary models in a number of ways that can impact the power and interpretation of the SFS. For example, rather than mutating under a clock-like model, transposition often occurs in bursts which can inflate particular frequency categories compared to expectations under a standard neutral model. If a TE burst has been recent, the excess of low-frequency polymorphisms can mimic the effect of purifying selection. Here, we investigate how transposition bursts affect the frequency distribution of TEs and the correlation between age and allele frequency. Using information on the TE age distribution, we propose an age-adjusted site frequency spectrum to compare TEs and neutral polymorphisms to more effectively evaluate whether TEs are under selective constraints. We show that our approach can minimize instances of false inference of selective constraint while still correctly identifying even weak selection affecting TEs that experienced a transposition burst, and that it is robust to at least simple demographic changes. The results presented here will help researchers working on TEs to more reliably identify the effects of selection on TEs without having to rely on the assumption of a constant transposition rate.


2021 ◽  
Author(s):  
Helmut E Simon ◽  
Gavin A Huttley

The site frequency spectrum (SFS) is a commonly used statistic to summarize genetic variation in a sample of genomic sequences from a population. Such a genomic sample is associated with an imputed genealogical history with attributes such as branch lengths, coalescence times and the time to the most recent common ancestor (TMRCA) as well as topological and combinatorial properties. We present a Bayesian model for sampling from the joint posterior distribution of coalescence times conditional on the SFS associated with a sample of sequences in the absence of selection. In this model, the combinatorial properties of a genealogy, which is represented as a coalescent tree, are expressed as matrices. This facilitates the calculation of likelihoods and the effective sampling of the entire space of tree structures according to the Equal Rates Markov (or Yule-type) measure. Unlike previous methods, assumptions as to the type of stochastic process that generated the genealogical tree are not required. Novel approaches to defining both uninformative and informative prior distributions are employed. The uncertainty in inference due to the stochastic nature of mutation and the unknown tree structure is expressed by the shape of the posterior distributions. The method is implemented using the general purpose Markov Chain Monte Carlo software PyMC3. From the sampled posterior distribution of coalescence times, one can also infer related quantities such as the number of ancestors of a sample at a given time in the past (ancestral distribution) and the probability of specific relationships between branch lengths (for example, that the most recent branch is longer than all the others). The performance of the method is evaluated against simulated data and is also applied to historic mitochondrial data from the Nuu-Chah-Nulth people of North America. The method can be used to obtain estimates of the TMRCA of the sample. 
The relationship of these estimates to those given by "Thomson's estimator" is explored.

Keywords: coalescent theory; Bayesian inference; time to most recent common ancestor; site frequency spectrum
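
As a concrete illustration of the summary statistic itself (not of the Bayesian model in the abstract), the sketch below tabulates an unfolded SFS from a 0/1 haplotype matrix and then evaluates Thomson's estimator in the form it is usually quoted: the total number of derived mutations carried across all sequences, divided by n times the mutation rate. Written in terms of the SFS this is sum_k k * xi_k / (n * mu), assuming infinite sites and a known ancestral state. The function names, the toy data, and the value of `mu` are all assumptions made for the example.

```python
def unfolded_sfs(haplotypes):
    """Count sites whose derived allele appears in exactly k of n sequences.

    Returns [xi_1, ..., xi_{n-1}]; sites with k = 0 or k = n are skipped
    because they are not polymorphic in the sample.
    """
    n = len(haplotypes)
    sfs = [0] * (n - 1)
    for site in zip(*haplotypes):      # iterate over columns (sites)
        k = sum(site)                  # derived-allele count at this site
        if 0 < k < n:
            sfs[k - 1] += 1
    return sfs


def thomson_tmrca(sfs, mu):
    """Thomson-style TMRCA estimate from an unfolded SFS.

    mu is the per-sequence mutation rate per unit time (assumed known).
    """
    n = len(sfs) + 1
    total_derived = sum(k * count for k, count in enumerate(sfs, start=1))
    return total_derived / (n * mu)


# Toy sample: 4 sequences, 5 sites (the last site is fixed and ignored).
haps = [
    (1, 1, 0, 1, 1),
    (0, 1, 0, 1, 1),
    (0, 0, 1, 1, 1),
    (0, 0, 0, 0, 1),
]
sfs = unfolded_sfs(haps)
print(sfs)
print(thomson_tmrca(sfs, mu=0.5))
```

The estimate is expressed in whatever time unit `mu` is measured in.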


Genetics ◽  
2021 ◽  
Author(s):  
Aline Muyle ◽  
Jeffrey Ross-Ibarra ◽  
Danelle K Seymour ◽  
Brandon S Gaut

Abstract In plants, mammals and insects, some genes are methylated in the CG dinucleotide context, a phenomenon called gene body methylation (gbM). It has been controversial whether this phenomenon has any functional role. Here we took advantage of the availability of 876 leaf methylomes in Arabidopsis thaliana to characterize the population frequency of methylation at the gene level and to estimate the site-frequency spectrum of allelic states. Using a population genetics model specifically designed for epigenetic data, we found that genes with ancestral gbM are under significant selection to remain methylated. Conversely, ancestrally unmethylated genes were under selection to remain unmethylated. Repeating the analyses at the level of individual cytosines confirmed these results. Estimated selection coefficients were small, on the order of 4N_e s = 1.4, which is similar to the magnitude of selection acting on codon usage. We also estimated that A. thaliana is losing gbM three-fold more rapidly than gaining it, which could be due to a recent reduction in the efficacy of selection after a switch to selfing. Finally, we investigated the potential function of gbM through its link with gene expression. Across genes with polymorphic methylation states, the expression of gene body methylated alleles was consistently and significantly higher than unmethylated alleles. Although it is difficult to disentangle genetic from epigenetic effects, our work suggests that gbM has a small but measurable effect on fitness, perhaps due to its association with a phenotype like gene expression.


2021 ◽  
Vol 17 (2) ◽  
pp. e1008701
Author(s):  
Hwai-Ray Tung ◽  
Rick Durrett

Recent work of Sottoriva, Graham, and collaborators has led to the controversial claim that exponentially growing tumors have a site frequency spectrum that follows the 1/f law consistent with neutral evolution. This conclusion has been criticized based on data quality issues, statistical considerations, and simulation results. Here, we use rigorous mathematical arguments to investigate the site frequency spectrum in the two-type model of clonal evolution. If the fitnesses of the two types are λ_0 < λ_1, then the site frequency spectrum is c/f^α where α = λ_0/λ_1. This is due to the advantageous mutations that produce the founders of the type 1 population. Mutations within the growing type 0 and type 1 populations follow the 1/f law. Our results show that, in contrast to published criticisms, neutral evolution in an exponentially growing tumor can be distinguished from the two-type model using the site frequency spectrum.
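
The scaling stated in the abstract can be checked numerically: on log-log axes, c/f^α is a straight line with slope -α, so the 1/f law (α = 1) and the two-type form (α = λ_0/λ_1 < 1) differ in slope. The sketch below simply evaluates the stated form and recovers the exponent by least squares. It does not simulate a tumor or reproduce the paper's derivation, and the function names, frequency grid, and fitness values are hypothetical.

```python
import math


def power_law_sfs(f, lam0, lam1, c=1.0):
    """Evaluate c / f**alpha with alpha = lam0 / lam1, as stated in the abstract."""
    alpha = lam0 / lam1
    return c / f ** alpha


def loglog_slope(freqs, values):
    """Ordinary least-squares slope of log(values) against log(freqs)."""
    xs = [math.log(f) for f in freqs]
    ys = [math.log(v) for v in values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


freqs = [0.01, 0.02, 0.05, 0.1, 0.2]   # illustrative frequency grid

# Equal fitnesses reduce to the 1/f law; lam0 < lam1 gives a shallower slope.
neutral = [power_law_sfs(f, 1.0, 1.0) for f in freqs]
two_type = [power_law_sfs(f, 1.0, 2.0) for f in freqs]

slope_neutral = loglog_slope(freqs, neutral)
slope_two_type = loglog_slope(freqs, two_type)
print(slope_neutral, slope_two_type)
```

With real data the fitted slope would carry sampling noise, so a difference in slope would need to be judged against that uncertainty.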


Author(s):  
Götz Kersting ◽  
Arno Siri-Jégousse ◽  
Alejandro H. Wences

2020 ◽  
Author(s):  
Benjamin H. Good

The statistical associations between mutations, collectively known as linkage disequilibrium (LD), encode important information about the evolutionary forces acting within a population. Yet in contrast to single-site analogues like the site frequency spectrum, our theoretical understanding of linkage disequilibrium remains limited. In particular, little is currently known about how mutations with different ages and fitness costs contribute to expected patterns of LD, even in simple settings where recombination and genetic drift are the major evolutionary forces. Here, we introduce a forward-time framework for predicting linkage disequilibrium between pairs of neutral and deleterious mutations as a function of their present-day frequencies. We show that the dynamics of linkage disequilibrium become much simpler in the limit that mutations are rare, where they admit a simple heuristic picture based on the trajectories of the underlying lineages. We use this approach to derive analytical expressions for a family of frequency-weighted LD statistics as a function of the recombination rate, the frequency scale, and the additive and epistatic fitness costs of the mutations. We find that the frequency scale can have a dramatic impact on the shapes of the resulting LD curves, reflecting the broad range of time scales over which these correlations arise. We also show that the differences between neutral and deleterious LD are not purely driven by differences in their mutation frequencies, and can instead display qualitative features that are reminiscent of epistasis. We conclude by discussing the implications of these results for recent LD measurements in bacteria. This forward-time approach may provide a useful framework for predicting linkage disequilibrium across a range of evolutionary scenarios.


2020 ◽  
Author(s):  
Hwai-Ray Tung ◽  
Rick Durrett

Abstract We investigate the site frequency spectrum in the two-type model of clonal evolution. If the fitnesses of the two types are λ_0 < λ_1, then the site frequency spectrum is c/f^α where α = λ_0/λ_1. This is due to the advantageous mutations that produce the founders of the type 1 population. Mutations within the growing type 0 and type 1 populations follow the 1/f law. Our results show that neutral evolution can be distinguished from the two-type model using the site frequency spectrum.


Author(s):  
Daniel L. Hartl

Chapter 7 is an introduction to molecular population genetics that includes the principal concepts of nucleotide polymorphism and divergence, the site frequency spectrum, and tests of selection and their limitations. Highlighted are rates of nucleotide substitution in coding and noncoding DNA, nucleotide and amino acid divergence between species, corrections for multiple substitutions, and the molecular clock. Discussion of the folded and unfolded site frequency spectrum includes the strengths and limitations of Tajima’s D, Fay and Wu’s H, and other measures. The chapter also discusses an emerging consensus to resolve the celebrated selection–neutrality controversy. It also includes examination of demographic history through the use of ancient DNA with special emphasis on the surprising findings in regard to the ancestral makeup of contemporary human populations. Also discussed are the population dynamics of transposable elements in prokaryotes and eukaryotes.
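
The statistics named in this summary can all be written as functions of the SFS. The sketch below uses the standard published formulas for Tajima's D and the unnormalized Fay and Wu's H, together with a folding helper; it is an independent illustration rather than code from the chapter, and the function names and example spectra are assumptions.

```python
import math


def fold_sfs(sfs):
    """Fold an unfolded SFS [xi_1..xi_{n-1}] into minor-allele counts."""
    n = len(sfs) + 1
    folded = []
    for k in range(1, n // 2 + 1):
        if k == n - k:                       # middle class when n is even
            folded.append(sfs[k - 1])
        else:
            folded.append(sfs[k - 1] + sfs[n - k - 1])
    return folded


def pi_from_sfs(sfs):
    """Mean number of pairwise differences (theta_pi)."""
    n = len(sfs) + 1
    pairs = n * (n - 1) / 2
    return sum(c * k * (n - k) for k, c in enumerate(sfs, start=1)) / pairs


def tajimas_d(sfs):
    """Tajima's D computed from an unfolded (or per-class) SFS."""
    n = len(sfs) + 1
    s = sum(sfs)                             # number of segregating sites
    if s == 0:
        return float("nan")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi_from_sfs(sfs) - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


def fay_wu_h(sfs):
    """Unnormalized Fay and Wu's H = theta_pi - theta_H (needs unfolded SFS)."""
    n = len(sfs) + 1
    pairs = n * (n - 1) / 2
    theta_h = sum(c * k * k for k, c in enumerate(sfs, start=1)) / pairs
    return pi_from_sfs(sfs) - theta_h


# An SFS proportional to 1/k (neutral expectation) gives D = H = 0;
# an excess of singletons pushes D negative.
neutral_shape = [12, 6, 4, 3]                # n = 5, xi_k = 12 / k
singletons = [10, 0, 0, 0]
print(tajimas_d(neutral_shape), fay_wu_h(neutral_shape))
print(tajimas_d(singletons))
print(fold_sfs(neutral_shape))
```

Tajima's D depends only on quantities that survive folding, whereas Fay and Wu's H requires the ancestral state and therefore the unfolded spectrum.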

