Accuracy of demographic inferences from site frequency spectrum: the case of the Yoruba population

2016 ◽  
Author(s):  
Marguerite Lapierre ◽  
Amaury Lambert ◽  
Guillaume Achaz

Abstract Some methods for demographic inference based on the observed genetic diversity of current populations rely on the use of summary statistics such as the Site Frequency Spectrum (SFS). Demographic models can be either model-constrained with numerous parameters such as growth rates, timing of demographic events and migration rates, or model-flexible, with an unbounded collection of piecewise constant sizes. It is still debated whether demographic histories can be accurately inferred based on the SFS. Here we illustrate this theoretical issue on an example of demographic inference for an African population. The SFS of the Yoruba population (data from the 1000 Genomes Project) is fit to a simple model of population growth described with a single parameter (e.g., founding time). We infer a time to the most recent common ancestor of 1.7 million years for this population. However, we show that the Yoruba SFS is not informative enough to discriminate between several different models of growth. We also show that for such simple demographies, the fit of one-parameter models outperforms the model-flexible method recently developed by Liu and Fu. The use of this method on simulated data suggests that it is biased by the noise intrinsically present in the data.
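As a point of reference for what "fitting the SFS" compares against, here is a minimal Python sketch of the constant-size neutral baseline, where the expected unfolded SFS is E[ξ_i] = θ/i. It is not the growth model used in the paper; the function names and the 0/1 haplotype encoding are assumptions made for illustration only.

```python
from collections import Counter


def expected_neutral_sfs(n, theta=1.0):
    """Expected unfolded SFS for n sequences under a constant-size
    neutral model: E[xi_i] = theta / i for i = 1..n-1."""
    return [theta / i for i in range(1, n)]


def observed_sfs(haplotypes):
    """Unfolded SFS from equal-length 0/1 haplotypes (1 = derived allele).
    Entry i-1 counts the sites where exactly i sequences carry the derived allele."""
    n = len(haplotypes)
    # zip(*haplotypes) iterates over sites (columns) instead of sequences (rows).
    counts = Counter(sum(site) for site in zip(*haplotypes))
    return [counts.get(i, 0) for i in range(1, n)]


def normalize(sfs):
    """Rescale an SFS so its entries sum to 1, which removes the dependence on theta."""
    total = sum(sfs)
    return [x / total for x in sfs]
```

Comparing `normalize(observed_sfs(data))` with `normalize(expected_neutral_sfs(n))` shows how far a sample departs from the constant-size expectation; a growth model would replace the θ/i baseline with its own expected spectrum.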


2015 ◽  
Author(s):  
Jochen Blath ◽  
Bjarki Eldon ◽  
Adrian Casanova ◽  
Noemi Kurt ◽  
Maite Wilke-Berenguer

We analyse patterns of genetic variability of populations in the presence of a large seedbank with the help of a new coalescent structure called the seedbank coalescent. This ancestral process appears naturally as the scaling limit of the genealogy of large populations that sustain seedbanks, if the seedbank size and individual dormancy times are of the same order as the active population. Mutations appear as Poisson processes on the active lineages, and potentially at reduced rate also on the dormant lineages. The presence of 'dormant' lineages leads to qualitatively altered times to the most recent common ancestor and non-classical patterns of genetic diversity. To illustrate this we provide a Wright-Fisher model with seedbank component and mutation, motivated by recent models of microbial dormancy, whose genealogy can be described by the seedbank coalescent. Based on our coalescent model, we derive recursions for the expectation and variance of the time to most recent common ancestor, number of segregating sites, pairwise differences, and singletons. Estimates (obtained by simulations) of the distributions of commonly employed distance statistics, in the presence and absence of a seedbank, are compared. The effect of a seedbank on the expected site-frequency spectrum is also investigated using simulations. Our results indicate that the presence of a large seedbank considerably alters the distribution of some distance statistics, as well as the site-frequency spectrum. Thus, one should be able to detect from genetic data the presence of a large seedbank in natural populations.



2021 ◽  
Author(s):  
Helmut E Simon ◽  
Gavin A Huttley

The site frequency spectrum (SFS) is a commonly used statistic to summarize genetic variation in a sample of genomic sequences from a population. Such a genomic sample is associated with an imputed genealogical history with attributes such as branch lengths, coalescence times and the time to the most recent common ancestor (TMRCA) as well as topological and combinatorial properties. We present a Bayesian model for sampling from the joint posterior distribution of coalescence times conditional on the SFS associated with a sample of sequences in the absence of selection. In this model, the combinatorial properties of a genealogy, which is represented as a coalescent tree, are expressed as matrices. This facilitates the calculation of likelihoods and the effective sampling of the entire space of tree structures according to the Equal Rates Markov (or Yule-type) measure. Unlike previous methods, assumptions as to the type of stochastic process that generated the genealogical tree are not required. Novel approaches to defining both uninformative and informative prior distributions are employed. The uncertainty in inference due to the stochastic nature of mutation and the unknown tree structure is expressed by the shape of the posterior distributions. The method is implemented using the general purpose Markov Chain Monte Carlo software PyMC3. From the sampled posterior distribution of coalescence times, one can also infer related quantities such as the number of ancestors of a sample at a given time in the past (ancestral distribution) and the probability of specific relationships between branch lengths (for example, that the most recent branch is longer than all the others). The performance of the method is evaluated against simulated data and is also applied to historic mitochondrial data from the Nuu-Chah-Nulth people of North America. The method can be used to obtain estimates of the TMRCA of the sample. 
The relationship of these estimates to those given by "Thomson's estimator" is explored. Keywords: coalescent theory; Bayesian inference; time to most recent common ancestor; site frequency spectrum
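For orientation, Thomson's estimator is commonly written as the average number of derived mutations per sequence divided by the mutation rate, which in terms of the unfolded SFS is T = Σ i·ξ_i / (n·μ). The Python sketch below assumes that form; the function name, the argument `mu` (mutation rate for the whole sequence per unit time) and the list layout of the SFS are illustrative choices, not taken from the paper.

```python
def thomson_tmrca(sfs, mu):
    """Thomson-style TMRCA estimate from an unfolded SFS.

    sfs[i-1] is the number of sites whose derived allele is carried by
    exactly i of the n sequences (i = 1..n-1).
    mu is the assumed mutation rate for the whole sequence per unit time;
    the result is expressed in the same time unit.
    """
    n = len(sfs) + 1
    # Summing i * xi_i counts every derived mutation once per sequence that
    # carries it, i.e. the total number of mutations separating the sampled
    # sequences from their most recent common ancestor.
    total_derived = sum(i * count for i, count in enumerate(sfs, start=1))
    return total_derived / (n * mu)
```

The estimate needs an unfolded spectrum, so the ancestral state at each site has to be known or inferred, for example from an outgroup.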



2019 ◽  
Vol 17 (2) ◽  
pp. 114-125 ◽  
Author(s):  
Dmitry Neshumaev ◽  
Aleksey Lebedev ◽  
Marina Malysheva ◽  
Anatoly Boyko ◽  
Sergey Skudarnov ◽  
...  

Background: Information about the dynamics of the viral population and the migration events that affect the epidemic in different parts of Russia is insufficient. The huge size of the country and the limited transport accessibility of certain territories may determine unique traits of the HIV-1 evolutionary history in different regions. Objective: The aim of this study was to explore the genetic diversity of HIV-1 in the Krasnoyarsk region and reconstruct the spatial-temporal dynamics of the infection in the region. Methods: Demographic and virologic data from 281 HIV-infected individuals in the Krasnoyarsk region collected during 2011-2016 were analyzed. The time to the most recent common ancestor, evolutionary rates, population growth, and ancestral geographic movements were estimated using Bayesian coalescent-based methods. Results: The study revealed moderate diversity of the HIV-1 subtypes found in the region, which included A6 (92.3%), CRF063_02A (4.3%), B (1.1%), and unique recombinants (2.5%). Phylogenetic reconstruction revealed that the A6 subtype was introduced into the Krasnoyarsk region by one viral lineage, which arose around 1996.9 (1994.5-1999.5). The phylogeography analysis pointed to Krasnoyarsk city as the geographical center of the epidemic, which further spread to central neighboring districts of the region. At least two epidemic growth phases of subtype A6 were identified, which included exponential growth in the early 2000s followed by a decline in the mid/late 2010s. Conclusion: This study demonstrates a change in the genetic diversity of HIV-1 in the Krasnoyarsk region. At the beginning of the epidemic, subtype A6 prevailed; subtypes B and CRF063_02A appeared in the region later.



Genetics ◽  
2000 ◽  
Vol 156 (2) ◽  
pp. 879-891 ◽  
Author(s):  
Mikkel H Schierup ◽  
Jotun Hein

Abstract We investigate the shape of a phylogenetic tree reconstructed from sequences evolving under the coalescent with recombination. The motivation is that evolutionary inferences are often made from phylogenetic trees reconstructed from population data even though recombination may well occur (mtDNA or viral sequences) or does occur (nuclear sequences). We investigate the size and direction of biases when a single tree is reconstructed ignoring recombination. Standard software (PHYLIP) was used to construct the best phylogenetic tree from sequences simulated under the coalescent with recombination. With recombination present, the length of terminal branches and the total branch length are larger, and the time to the most recent common ancestor smaller, than for a tree reconstructed from sequences evolving with no recombination. The effects are pronounced even for small levels of recombination that may not be immediately detectable in a data set. The phylogenies when recombination is present superficially resemble phylogenies for sequences from an exponentially growing population. However, exponential growth has a different effect on statistics such as Tajima's D. Furthermore, ignoring recombination leads to a large overestimation of the substitution rate heterogeneity and the loss of the molecular clock. These results are discussed in relation to viral and mtDNA data sets.
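Tajima's D, mentioned above as a statistic that separates recombination artifacts from exponential growth, contrasts two estimators of θ: mean pairwise diversity π and Watterson's S/a1. The Python sketch below uses the standard normalizing constants and takes an unfolded SFS as input; the function name and input layout are assumptions made for illustration, and this is not the authors' code.

```python
from math import sqrt


def tajimas_d(sfs):
    """Tajima's D from an unfolded SFS (sfs[i-1] = sites with derived count i).

    Requires at least 4 sequences and at least one segregating site,
    because the variance term is zero otherwise.
    """
    n = len(sfs) + 1
    s = sum(sfs)  # number of segregating sites
    if n < 4 or s == 0:
        raise ValueError("need n >= 4 sequences and at least one segregating site")

    # Mean number of pairwise differences: a site with derived count i
    # differs in i * (n - i) of the n-choose-2 pairs.
    pairs = n * (n - 1) / 2
    pi = sum(i * (n - i) * x for i, x in enumerate(sfs, start=1)) / pairs

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))
```

An excess of rare variants lowers π relative to S/a1 and gives a negative D, while an excess of intermediate-frequency variants gives a positive D.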



Biology ◽  
2021 ◽  
Vol 10 (12) ◽  
pp. 1282
Author(s):  
Artem N. Bondaryuk ◽  
Tatiana E. Peretolchina ◽  
Elena V. Romanova ◽  
Anzhelika V. Yudinceva ◽  
Evgeny I. Andaev ◽  
...  

In this paper, we revealed the genetic structure and migration history of the Powassan virus (POWV) reconstructed based on 25 complete genomes available in NCBI and ViPR databases (accessed in June 2021). The usage of this data set allowed us to perform a more precise assessment of the evolutionary rate of this virus. In addition, we proposed a simple Bayesian technique for the evaluation and visualization of 'temporal signal dynamics' along the phylogenetic tree. We showed that the evolutionary rate value of POWV is 3.3 × 10⁻⁵ nucleotide substitutions per site per year (95% HPD, 2.0 × 10⁻⁵–4.7 × 10⁻⁵), which is lower than values reported in the previous studies. Divergence of the most recent common ancestor (MRCA) of POWV into two independent genetic lineages most likely occurred in the period between 2600 and 6030 years ago. We assume that the divergence of the virus lineages happened due to the melting of glaciers about 12,000 years ago, which led to the disappearance of the Bering Land Bridge between Eurasia and North America (the modern Alaskan territory) and spatial division of the viral range into two parts. Genomic data provide evidence of the virus migrations between two continents. The mean migration rate detected from the Far East of Russia to North America was one event per 1750 years. The migration in the opposite direction occurred approximately once per 475 years.



2019 ◽  
Author(s):  
Ethan Linck ◽  
C.J. Battey

Abstract Common models of speciation with gene flow consider constant migration or admixture on secondary contact, but earth's recent climatic history suggests many populations have experienced cycles of isolation and contact over the last million years. How does this process impact the rate of speciation, and how much can we learn about its dynamics by analyzing the genomes of modern populations? Here we develop a simple model of speciation through Bateson-Dobzhansky-Muller incompatibilities in the face of periodic gene flow and validate our model with forward time simulations. We then use empirical atmospheric CO2 concentration data from the Vostok Ice Cores to simulate cycles of isolation and secondary contact in a tropical montane landscape, and ask whether they can be distinguished from a standard isolation-with-migration model by summary statistics or joint site frequency spectrum-based demographic inference. We find speciation occurs much faster under periodic than constant gene flow with equivalent effective migration rates (Nm). These processes can be distinguished through combinations of summary statistics or demographic inference from the site frequency spectrum, but parameter estimates appear to have little resolution beyond the most recent cycle of isolation and migration. Our results suggest speciation with periodic gene flow is a common force in generating species diversity through Pleistocene climate cycles, and highlight the limits of current inference techniques for demographic models mimicking the complexity of earth's recent climatic history.



2019 ◽  
Vol 116 (13) ◽  
pp. 6244-6249
Author(s):  
Somayeh Mashayekhi ◽  
Peter Beerli

An approach to the coalescent, the fractional coalescent (f-coalescent), is introduced. The derivation is based on the discrete-time Cannings population model in which the variance of the number of offspring depends on the parameter α. This additional parameter α affects the variability of the patterns of the waiting times; values of α < 1 lead to an increase of short time intervals, but occasionally allow for very long time intervals. When α = 1, the f-coalescent and Kingman's n-coalescent are equivalent. The distribution of the time to the most recent common ancestor and the probability that n genes descend from m ancestral genes in a time interval of length T for the f-coalescent are derived. The f-coalescent has been implemented in the population genetic model inference software Migrate. Simulation studies suggest that it is possible to accurately estimate α values from data that were generated with known α values and that the f-coalescent can detect potential environmental heterogeneity within a population. Bayes factor comparisons of simulated data with α < 1 and real data (H1N1 influenza and malaria parasites) showed an improved model fit of the f-coalescent over the n-coalescent. The development of the f-coalescent and its inclusion into the inference program Migrate facilitates testing for deviations from the n-coalescent.
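To make the α = 1 reference case concrete, the following Python sketch simulates the time to the most recent common ancestor under Kingman's n-coalescent, where the waiting time with k lineages is exponential with rate k(k−1)/2 in coalescent time units. It does not implement the f-coalescent, whose waiting-time distribution is not given in the abstract; the function names are illustrative assumptions.

```python
import random


def simulate_kingman_tmrca(n, rng):
    """One draw of the TMRCA for n lineages under Kingman's n-coalescent.

    Time is in coalescent units; while k lineages remain, the waiting time
    to the next coalescence is exponential with rate k * (k - 1) / 2.
    """
    t = 0.0
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2
        t += rng.expovariate(rate)
    return t


def expected_kingman_tmrca(n):
    """Closed-form expectation of the TMRCA: the sum of 1 / (k choose 2)
    over k = 2..n, which telescopes to 2 * (1 - 1/n)."""
    return 2.0 * (1.0 - 1.0 / n)


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the run is reproducible
    draws = [simulate_kingman_tmrca(10, rng) for _ in range(20000)]
    print(sum(draws) / len(draws), expected_kingman_tmrca(10))
```

Swapping the exponential waiting times for a heavier-tailed distribution is the kind of change that α < 1 introduces, which is why short intervals become more frequent while very long ones remain possible.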



Genetics ◽  
1998 ◽  
Vol 150 (3) ◽  
pp. 1187-1198 ◽  
Author(s):  
Mikkel H Schierup ◽  
Xavier Vekemans ◽  
Freddy B Christiansen

Abstract Expectations for the time scale and structure of allelic genealogies in finite populations are formed under three models of sporophytic self-incompatibility. The models differ in the dominance interactions among the alleles that determine the self-incompatibility phenotype: in the SSIcod model, alleles act codominantly in both pollen and style; in the SSIdom model, alleles form a dominance hierarchy; and in the SSIdomcod model, alleles are codominant in the style and show a dominance hierarchy in the pollen. Coalescence times of alleles rarely differ more than threefold from those under gametophytic self-incompatibility, and transspecific polymorphism is therefore expected to be equally common. The previously reported directional turnover process of alleles in the SSIdomcod model results in coalescence times lower and substitution rates higher than those in the other models. The SSIdom model assumes strong asymmetries in allelic action, and the most recessive extant allele is likely to be the most recent common ancestor. Despite these asymmetries, the expected shape of the allele genealogies does not deviate markedly from the shape of a neutral gene genealogy. The application of the results to sequence surveys of alleles, including interspecific comparisons, is discussed.



Author(s):  
Wenjun Cheng ◽  
Tianjiao Ji ◽  
Shuaifeng Zhou ◽  
Yong Shi ◽  
Lili Jiang ◽  
...  

Abstract Echovirus 6 (E6) is associated with various clinical diseases and is frequently detected in environmental sewage. Despite its high prevalence in humans and the environment, little is known about its molecular phylogeography in mainland China. In this study, 114 of 21,539 (0.53%) clinical specimens from hand, foot, and mouth disease (HFMD) cases collected between 2007 and 2018 were positive for E6. The complete VP1 sequences of 87 representative E6 strains, including 24 strains from this study, were used to investigate the evolutionary genetic characteristics and geographical spread of E6 strains. Phylogenetic analysis based on VP1 nucleotide sequence divergence showed that, globally, E6 strains can be grouped into six genotypes, designated A to F. Chinese E6 strains collected between 1988 and 2018 were found to belong to genotypes C, E, and F, with genotype F being predominant from 2007 to 2018. There was no significant difference in the geographical distribution of each genotype. The evolutionary rate of E6 was estimated to be 3.631 × 10⁻³ substitutions site⁻¹ year⁻¹ (95% highest posterior density [HPD]: 3.2406 × 10⁻³ to 4.031 × 10⁻³ substitutions site⁻¹ year⁻¹) by Bayesian MCMC analysis. The most recent common ancestor of the E6 genotypes was traced back to 1863, whereas their common ancestor in China was traced back to around 1962. A small genetic shift was detected in the Chinese E6 population size in 2009 according to Bayesian skyline analysis, which indicated that there might have been an epidemic around that year.


