Inferring time-dependent migration and coalescence patterns from genetic sequence and predictor data in structured populations

2019 ◽  
Vol 5 (2) ◽  
Author(s):  
Nicola F Müller ◽  
Gytis Dudas ◽  
Tanja Stadler

Abstract Population dynamics can be inferred from genetic sequence data by using phylodynamic methods. These methods typically quantify the dynamics in unstructured populations or assume migration rates and effective population sizes to be constant through time in structured populations. When considering rates to vary through time in structured populations, the number of parameters to infer increases rapidly and the available data might not be sufficient to inform these. Additionally, it is often of interest to know what predicts these parameters rather than knowing the parameters themselves. Here, we introduce a method to infer the predictors for time-varying migration rates and effective population sizes by using a generalized linear model (GLM) approach under the marginal approximation of the structured coalescent. Using simulations, we show that our approach is able to reliably infer the model parameters and their predictors from phylogenetic trees. Furthermore, when simulating trees under the structured coalescent, we show that our new approach outperforms the discrete trait GLM model. We then apply our framework to a previously described Ebola virus dataset, where we infer the parameters and their predictors from genome sequences while accounting for phylogenetic uncertainty. We infer weekly cases to be the strongest predictor for effective population size and geographic distance the strongest predictor for migration. This approach is implemented as part of the BEAST2 package MASCOT, which allows us to jointly infer population dynamics, i.e. the parameters and predictors, within structured populations, the phylogenetic tree, and evolutionary parameters.
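The GLM idea in the abstract above can be illustrated with a minimal sketch. The abstract does not give the model equations, so the log-linear form below (the log of a rate as a sum of coefficient × indicator × standardized log-predictor) is an assumption borrowed from common phylogeographic GLM practice, not MASCOT's implementation. The names `standardize`, `glm_rates`, `coefficients`, `indicators`, and `scaler` are hypothetical.

```python
import math

def standardize(values):
    """Log-transform positive predictor values, then centre and scale them.

    Assumes the values are positive and not all identical.
    """
    logs = [math.log(v) for v in values]
    mean = sum(logs) / len(logs)
    sd = math.sqrt(sum((x - mean) ** 2 for x in logs) / len(logs))
    return [(x - mean) / sd for x in logs]

def glm_rates(predictors, coefficients, indicators, scaler=1.0):
    """Map predictors to rates under an assumed log-linear GLM.

    predictors   -- list of predictor vectors, each with one value per rate
    coefficients -- one effect size per predictor
    indicators   -- 0/1 flag per predictor (1 = predictor is included)
    scaler       -- overall rate multiplier (hypothetical parameter)
    """
    n_rates = len(predictors[0])
    rates = []
    for k in range(n_rates):
        log_rate = sum(b * d * p[k]
                       for p, b, d in zip(predictors, coefficients, indicators))
        rates.append(scaler * math.exp(log_rate))
    return rates

# Illustrative distances between three location pairs (made-up values).
distance = standardize([100.0, 200.0, 400.0])
rates = glm_rates([distance], coefficients=[-1.0], indicators=[1])
```

With a negative coefficient on distance, more distant location pairs receive lower migration rates. Setting an indicator to 0 removes that predictor, so every rate falls back to `scaler`.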

2018 ◽  
Author(s):  
Nicola F. Müller ◽  
Gytis Dudas ◽  
Tanja Stadler

Abstract Population dynamics can be inferred from genetic sequence data using phylodynamic methods. These methods typically quantify the dynamics in unstructured populations or assume the parameters describing the dynamics to be constant through time in structured populations. Inference methods that allow for structured populations and for parameters that vary through time involve many parameters that have to be inferred. Each of these parameters, however, might be only weakly informed by the data. Here we introduce an approach that uses so-called predictors, such as geographic distance between locations, within a generalized linear model to inform the population dynamic parameters, namely the time-varying migration rates and effective population sizes under the marginal approximation of the structured coalescent. By using simulations, we show that we are able to reliably infer the parameters from phylogenetic trees. We then apply this framework to a previously described Ebola virus dataset. We infer incidence to be the strongest predictor for effective population size and geographic distance the strongest predictor for migration. This allows us to show not only on simulated data, but also on real data, that we are able to identify reasonable predictors. Overall, we provide a novel method that makes it possible to identify predictors for migration rates and effective population sizes and to use these predictors to quantify migration rates and effective population sizes. Its implementation as part of the BEAST2 software package MASCOT allows joint inference of population dynamics within structured populations, the phylogenetic tree, and evolutionary parameters.


2021 ◽  
Author(s):  
Tyler Steven Brown ◽  
Aimee R. Taylor ◽  
Olufunmilayo Arogbokun ◽  
Caroline O. Buckee ◽  
Hsiao-Han Chang

Measuring gene flow between malaria parasite populations in different geographic locations can provide strategic information for malaria control interventions. Multiple important questions pertaining to the design of such studies remain unanswered, limiting efforts to operationalize genomic surveillance tools for routine public health use. This report evaluates numerically the ability to distinguish different levels of gene flow between malaria populations, using different amounts of real and simulated data, where data are simulated using parameters that approximate different epidemiological conditions. Specifically, using Plasmodium falciparum whole genome sequence data and sequence data simulated for a metapopulation with different migration rates and effective population sizes, we compare two estimators of gene flow, explore the number of genetic markers and number of individuals required to reliably rank highly connected locations, and describe how these thresholds change given different effective population sizes and migration rates. Our results have implications for the design and implementation of malaria genomic surveillance efforts.


2019 ◽  
Vol 36 (11) ◽  
pp. 2620-2628 ◽  
Author(s):  
Verity Hill ◽  
Guy Baele

Abstract Inferring past population dynamics over time from heterochronous molecular sequence data is often achieved using the Bayesian Skygrid model, a nonparametric coalescent model that estimates the effective population size over time. Available in BEAST, a cross-platform program for Bayesian analysis of molecular sequences using Markov chain Monte Carlo, this coalescent model is often estimated in conjunction with a molecular clock model to produce time-stamped phylogenetic trees. We here provide a practical guide to using BEAST and its accompanying applications for the purpose of drawing inference under these models. We focus on best practices, potential pitfalls, and recommendations that can be generalized to other software packages for Bayesian inference. This protocol shows how to use TempEst, BEAUti, and BEAST 1.10 (http://beast.community/; last accessed July 29, 2019), LogCombiner as well as Tracer in a complete workflow.


2001 ◽  
Vol 77 (2) ◽  
pp. 153-166 ◽  
Author(s):  
BRIAN CHARLESWORTH

Formulae for the effective population sizes of autosomal, X-linked, Y-linked and maternally transmitted loci in age-structured populations are developed. The approximations used here predict both asymptotic rates of increase in probabilities of identity, and equilibrium levels of neutral nucleotide site diversity under the infinite-sites model. The applications of the results to the interpretation of data on DNA sequence variation in Drosophila, plant, and human populations are discussed. It is concluded that sex differences in demographic parameters such as adult mortality rates generally have small effects on the relative effective population sizes of loci with different modes of inheritance, whereas differences between the sexes in variance in reproductive success can have major effects, either increasing or reducing the effective population size for X-linked loci relative to autosomal or Y-linked loci. These effects need to be accounted for when trying to understand data on patterns of sequence variation for genes with different transmission modes.


2016 ◽  
Vol 73 (9) ◽  
pp. 2178-2180 ◽  
Author(s):  
W. Stewart Grant ◽  
Einar Árnason ◽  
Bjarki Eldon

Abstract The analyses of often large amounts of field and laboratory data depend on computer programs to generate descriptive statistics and to test hypotheses. The algorithms in these programs are often complex and can be understood only with advanced training in mathematics and programming, topics that are beyond the capabilities of most fisheries biologists and empirical population geneticists. The backward-looking Kingman coalescent model, based on the classic forward-looking Wright–Fisher model of genetic change, is used in many genetics software programs to generate null distributions against which to test hypotheses. An article in this issue by Niwa et al. shows that the assumption of bifurcations at nodes in the Kingman coalescent model is inappropriate for highly fecund Japanese sardines, which have type III life histories. Species with this life history pattern are better modelled with multiple mergers at the nodes of a coalescent gene genealogy. However, only a few software programs allow analysis with multiple-merger coalescent models. Applying the Kingman model to such species is a model misspecification that produces demographic reconstructions that reach too far into the past and greatly overestimate genetically effective population sizes (the number of individuals actually contributing to the next generation). The results of Niwa et al. underline the need to understand the assumptions and model parameters in the software programs used to analyse DNA sequences.


1997 ◽  
Vol 69 (2) ◽  
pp. 111-116 ◽  
Author(s):  
ZIHENG YANG

The theory developed by Takahata and colleagues for estimating the effective population size of ancestral species using homologous sequences from closely related extant species was extended to take account of variation of evolutionary rates among loci. Nuclear sequence data related to the evolution of modern humans were reanalysed and computer simulations were performed to examine the effect of rate variation on estimation of ancestral population sizes. It is found that the among-locus rate variation does not have a significant effect on estimation of the current population size when sequences from multiple loci are sampled from the same species, but does have a significant effect on estimation of the ancestral population size using sequences from different species. The effects of ancestral population size, species divergence time and among-locus rate variation are found to be highly correlated, and to achieve reliable estimates of the ancestral population size, effects of the other two factors should be estimated independently.


Genetics ◽  
1995 ◽  
Vol 140 (2) ◽  
pp. 679-695 ◽  
Author(s):  
A Estoup ◽  
L Garnery ◽  
M Solignac ◽  
J M Cornuet

Abstract Samples from nine populations belonging to three African (intermissa, scutellata and capensis) and four European (mellifera, ligustica, carnica and cecropia) Apis mellifera subspecies were scored for seven microsatellite loci. A large amount of genetic variation (between seven and 30 alleles per locus) was detected. Average heterozygosity and average number of alleles were significantly higher in African than in European subspecies, in agreement with larger effective population sizes in Africa. Microsatellite analyses confirmed that A. mellifera evolved in three distinct and deeply differentiated lineages previously detected by morphological and mitochondrial DNA studies. Dendrogram analysis of workers from a given population indicated that super-sisters cluster together when a sufficient number of microsatellite loci is used, whereas half-sisters do not. An index of classification was derived to summarize the clustering of different taxonomic levels in large phylogenetic trees based on individual genotypes. Finally, individual population × loci data were used to test the adequacy of the two alternative mutation models, the infinite allele model (IAM) and the stepwise mutation model. The better fit overall of the IAM probably results from the majority of the microsatellites used including repeats of two or three different length motifs (compound microsatellites).
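The link the abstract draws between heterozygosity and effective population size can be illustrated with the standard equilibrium expectations under the two mutation models. The abstract does not state which formulas or tests the authors used, so this is only a sketch of the textbook results: H = θ/(1 + θ) under the IAM and H = 1 − 1/√(1 + 2θ) under the stepwise mutation model, where θ is the population-scaled mutation rate (proportional to effective population size times mutation rate). The function names are hypothetical.

```python
import math

def expected_h_iam(theta):
    """Equilibrium expected heterozygosity under the infinite allele model."""
    return theta / (1.0 + theta)

def expected_h_smm(theta):
    """Equilibrium expected heterozygosity under the stepwise mutation model."""
    return 1.0 - 1.0 / math.sqrt(1.0 + 2.0 * theta)

# Compare the two models over a few illustrative (made-up) theta values.
for theta in (0.5, 1.0, 4.0):
    print(theta, round(expected_h_iam(theta), 3), round(expected_h_smm(theta), 3))
```

Both expectations increase with θ, which is why higher heterozygosity is read as consistent with a larger effective population size. For the same θ, the stepwise model predicts lower heterozygosity than the IAM, and this difference is what makes it possible to compare the fit of the two models.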


2018 ◽  
Author(s):  
Erik M. Volz ◽  
Igor Siveroni

Abstract Population genetic modeling can enhance Bayesian phylogenetic inference by providing a realistic prior on the distribution of branch lengths and times of common ancestry. The parameters of a population genetic model may also have intrinsic importance, and simultaneous estimation of a phylogeny and model parameters has enabled phylodynamic inference of population growth rates, reproduction numbers, and effective population size through time. Phylodynamic inference based on pathogen genetic sequence data has emerged as a useful supplement to epidemic surveillance; however, commonly used mechanistic models that are typically fitted to non-genetic surveillance data are rarely fitted to pathogen genetic data due to a dearth of software tools, and the theory required to conduct such inference has been developed only recently. We present a framework for coalescent-based phylogenetic and phylodynamic inference which enables highly flexible modeling of demographic and epidemiological processes. This approach builds upon previous structured coalescent approaches and includes enhancements for computational speed, accuracy, and stability. A flexible markup language is described for translating parametric demographic or epidemiological models into a structured coalescent model enabling simultaneous estimation of demographic or epidemiological parameters and time-scaled phylogenies. We demonstrate the utility of these approaches by fitting compartmental epidemiological models to Ebola virus and Influenza A virus sequence data, demonstrating how important features of these epidemics, such as the reproduction number and epidemic curves, can be gleaned from genetic data. These approaches are provided as an open-source package PhyDyn for the BEAST phylogenetics platform.


2018 ◽  
Author(s):  
Chao Yang ◽  
Yujun Cui ◽  
Xavier Didelot ◽  
Ruifu Yang ◽  
Daniel Falush

Abstract Background: Bacteria typically have more structured populations than higher eukaryotes, but this difference is surprising given high recombination rates, enormous population sizes and effective geographical dispersal in many bacterial species. Results: We estimated the recombination scaled effective population size Ner in 21 bacterial species and find that it does not correlate with synonymous nucleotide diversity as would be expected under neutral models of evolution. Only two species have estimates substantially over 100, consistent with approximate panmixia, namely Helicobacter pylori and Vibrio parahaemolyticus. Both species are far from demographic equilibrium, with diversity predicted to increase more than 30 fold in V. parahaemolyticus if the current value of Ner were maintained, to values much higher than found in any species. We propose that panmixia is unstable in bacteria, and that persistent environmental species are likely to evolve barriers to genetic exchange, which act to prevent a continuous increase in diversity by enhancing genetic drift. Conclusions: Our results highlight the dynamic nature of bacterial population structures and imply that overall diversity levels found within a species are poor indicators of its size.


Genetics ◽  
2003 ◽  
Vol 163 (1) ◽  
pp. 395-404 ◽  
Author(s):  
Jeffrey D Wall

Abstract This article presents a new method for jointly estimating species divergence times and ancestral population sizes. The method improves on previous ones by explicitly incorporating intragenic recombination, by utilizing orthologous sequence data from closely related species, and by using a maximum-likelihood framework. The latter allows for efficient use of the available information and provides a way of assessing how much confidence we should place in the estimates. I apply the method to recently collected intergenic sequence data from humans and the great apes. The results suggest that the human-chimpanzee ancestral population size was four to seven times larger than the current human effective population size and that the current human effective population size is slightly >10,000. These estimates are similar to previous ones, and they appear relatively insensitive to assumptions about the recombination rates or mutation rates across loci.

