Inferring time-dependent migration and coalescence patterns from genetic sequence and predictor data in structured populations

Abstract Population dynamics can be inferred from genetic sequence data by using phylodynamic methods. These methods typically quantify the dynamics in unstructured populations or assume migration rates and effective population sizes to be constant through time in structured populations. When rates are allowed to vary through time in structured populations, the number of parameters to infer increases rapidly and the available data might not be sufficient to inform them. Additionally, it is often of interest to know what predicts these parameters rather than knowing the parameters themselves. Here, we introduce a method to infer the predictors for time-varying migration rates and effective population sizes by using a generalized linear model (GLM) approach under the marginal approximation of the structured coalescent. Using simulations, we show that our approach is able to reliably infer the model parameters and their predictors from phylogenetic trees. Furthermore, when simulating trees under the structured coalescent, we show that our new approach outperforms the discrete trait GLM model. We then apply our framework to a previously described Ebola virus dataset, where we infer the parameters and their predictors from genome sequences while accounting for phylogenetic uncertainty. We infer weekly cases to be the strongest predictor for effective population size and geographic distance the strongest predictor for migration. This approach is implemented as part of the BEAST2 package MASCOT, which allows us to jointly infer population dynamics, i.e. the parameters and predictors, within structured populations, the phylogenetic tree, and evolutionary parameters.

Inferring time-dependent migration and coalescence patterns from genetic sequence and predictor data in structured populations

10.1101/342329 ◽

2018 ◽

Cited By ~ 3

Author(s):

Nicola F. Müller ◽

Gytis Dudas ◽

Tanja Stadler

Keyword(s):

Population Dynamics ◽

Phylogenetic Trees ◽

Ebola Virus ◽

Sequence Data ◽

Geographic Distance ◽

Structured Populations ◽

Effective Population ◽

Genetic Sequence ◽

Migration Rates ◽

Population Sizes

Abstract Population dynamics can be inferred from genetic sequence data using phylodynamic methods. These methods typically quantify the dynamics in unstructured populations or assume the parameters describing the dynamics to be constant through time in structured populations. Inference methods that allow for structured populations and for parameters that vary through time involve many parameters, all of which have to be inferred. Each of these parameters, however, might be only weakly informed by the data. Here we introduce an approach that uses so-called predictors, such as geographic distance between locations, within a generalized linear model to inform the population dynamic parameters, namely the time-varying migration rates and effective population sizes under the marginal approximation of the structured coalescent. By using simulations, we show that we are able to reliably infer the parameters from phylogenetic trees. We then apply this framework to a previously described Ebola virus dataset. We infer incidence to be the strongest predictor for effective population size and geographic distance the strongest predictor for migration. This allows us to show, not only on simulated data but also on real data, that we are able to identify reasonable predictors. Overall, we provide a novel method that makes it possible to identify predictors for migration rates and effective population sizes and to use these predictors to quantify migration rates and effective population sizes. Its implementation as part of the BEAST2 software package MASCOT allows joint inference of population dynamics within structured populations, the phylogenetic tree, and evolutionary parameters.
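
The abstract above does not spell out the model equation. As a minimal sketch, assuming a log-linear GLM in which each predictor has a coefficient and a 0/1 inclusion indicator, rates could be computed from predictors as follows. The function name `glm_rates`, the predictor names, and all numeric values are hypothetical and chosen only for illustration; this is not the MASCOT implementation.

```python
import math

def glm_rates(predictors, coefficients, indicators, scale=1.0):
    """Return one rate per entry under an assumed log-linear GLM.

    rate_i = scale * exp(sum_k indicators[k] * coefficients[k] * predictors[k][i])
    """
    names = list(predictors)
    n_entries = len(predictors[names[0]])
    rates = []
    for i in range(n_entries):
        # Sum the contributions of all predictors that are switched on.
        log_rate = sum(
            indicators[name] * coefficients[name] * predictors[name][i]
            for name in names
        )
        rates.append(scale * math.exp(log_rate))
    return rates

# Hypothetical, standardized predictor values for three location pairs.
predictors = {
    "log_distance": [-1.0, 0.0, 1.0],
    "shared_border": [1.0, 0.0, -1.0],
}
coefficients = {"log_distance": -0.8, "shared_border": 0.3}
indicators = {"log_distance": 1, "shared_border": 0}  # 0 switches a predictor off

migration_rates = glm_rates(predictors, coefficients, indicators, scale=0.5)
print(migration_rates)
```

With a negative coefficient on distance, the sketch assigns lower migration rates to more distant location pairs, and a predictor whose indicator is 0 has no influence on the rates. In a Bayesian setting the coefficients, indicators, and scale would be sampled rather than fixed as they are here.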

Distinguishing gene flow between malaria parasite populations

10.1101/2021.01.08.425858 ◽

2021 ◽

Author(s):

Tyler Steven Brown ◽

Aimee R. Taylor ◽

Olufunmilayo Arogbokun ◽

Caroline O. Buckee ◽

Hsiao-Han Chang

Keyword(s):

Gene Flow ◽

Malaria Parasite ◽

Sequence Data ◽

Simulated Data ◽

Whole Genome Sequence ◽

Effective Population ◽

Migration Rates ◽

Population Sizes ◽

And Migration ◽

Parasite Populations

Measuring gene flow between malaria parasite populations in different geographic locations can provide strategic information for malaria control interventions. Multiple important questions pertaining to the design of such studies remain unanswered, limiting efforts to operationalize genomic surveillance tools for routine public health use. This report evaluates numerically the ability to distinguish different levels of gene flow between malaria populations, using different amounts of real and simulated data, where data are simulated using parameters that approximate different epidemiological conditions. Specifically, using Plasmodium falciparum whole genome sequence data and sequence data simulated for a metapopulation with different migration rates and effective population sizes, we compare two estimators of gene flow, explore the number of genetic markers and number of individuals required to reliably rank highly connected locations, and describe how these thresholds change given different effective population sizes and migration rates. Our results have implications for the design and implementation of malaria genomic surveillance efforts.

Bayesian Estimation of Past Population Dynamics in BEAST 1.10 Using the Skygrid Coalescent Model

Molecular Biology and Evolution ◽

10.1093/molbev/msz172 ◽

2019 ◽

Vol 36 (11) ◽

pp. 2620-2628 ◽

Cited By ~ 9

Author(s):

Verity Hill ◽

Guy Baele

Keyword(s):

Population Dynamics ◽

Phylogenetic Trees ◽

Sequence Data ◽

Effective Population ◽

Molecular Sequence Data ◽

Molecular Sequence ◽

Coalescent Model ◽

Molecular Clock Model ◽

Past Population ◽

Over Time

Abstract Inferring past population dynamics over time from heterochronous molecular sequence data is often achieved using the Bayesian Skygrid model, a nonparametric coalescent model that estimates the effective population size over time. Available in BEAST, a cross-platform program for Bayesian analysis of molecular sequences using Markov chain Monte Carlo, this coalescent model is often estimated in conjunction with a molecular clock model to produce time-stamped phylogenetic trees. We here provide a practical guide to using BEAST and its accompanying applications for the purpose of drawing inference under these models. We focus on best practices, potential pitfalls, and recommendations that can be generalized to other software packages for Bayesian inference. This protocol shows how to use TempEst, BEAUti, and BEAST 1.10 (http://beast.community/; last accessed July 29, 2019), LogCombiner as well as Tracer in a complete workflow.

The effect of life-history and mode of inheritance on neutral genetic variability

Genetics Research ◽

10.1017/s0016672301004979 ◽

2001 ◽

Vol 77 (2) ◽

pp. 153-166 ◽

Cited By ~ 114

Author(s):

Brian Charlesworth

Keyword(s):

Sequence Variation ◽

Adult Mortality ◽

Structured Populations ◽

Human Populations ◽

Effective Population ◽

Transmission Modes ◽

Population Sizes ◽

Age Structured ◽

Infinite Sites Model ◽

Site Diversity

Formulae for the effective population sizes of autosomal, X-linked, Y-linked and maternally transmitted loci in age-structured populations are developed. The approximations used here predict both asymptotic rates of increase in probabilities of identity, and equilibrium levels of neutral nucleotide site diversity under the infinite-sites model. The applications of the results to the interpretation of data on DNA sequence variation in Drosophila, plant, and human populations are discussed. It is concluded that sex differences in demographic parameters such as adult mortality rates generally have small effects on the relative effective population sizes of loci with different modes of inheritance, whereas differences between the sexes in variance in reproductive success can have major effects, either increasing or reducing the effective population size for X-linked loci relative to autosomal or Y-linked loci. These effects need to be accounted for when trying to understand data on patterns of sequence variation for genes with different transmission modes.

New DNA coalescent models and old population genetics software†

ICES Journal of Marine Science ◽

10.1093/icesjms/fsw076 ◽

2016 ◽

Vol 73 (9) ◽

pp. 2178-2180 ◽

Cited By ~ 4

Author(s):

W. Stewart Grant ◽

Einar Árnason ◽

Bjarki Eldon

Keyword(s):

DNA Sequences ◽

Life Histories ◽

Laboratory Data ◽

Model Parameters ◽

Effective Population ◽

Coalescent Model ◽

Null Distributions ◽

Life History Pattern ◽

Population Sizes ◽

Software Programs

Abstract The analyses of often large amounts of field and laboratory data depend on computer programs to generate descriptive statistics and to test hypotheses. The algorithms in these programs are often complex and can be understood only with advanced training in mathematics and programming, topics that are beyond the capabilities of most fisheries biologists and empirical population geneticists. The backward-looking Kingman coalescent model, based on the classic forward-looking Wright–Fisher model of genetic change, is used in many genetics software programs to generate null distributions against which to test hypotheses. An article in this issue by Niwa et al. shows that the assumption of bifurcations at nodes in the Kingman coalescent model is inappropriate for highly fecund Japanese sardines, which have type III life histories. Species with this life history pattern are better modelled with multiple mergers at the nodes of a coalescent gene genealogy. However, only a few software programs allow analysis with multiple-merger coalescent models. This parameter misspecification produces demographic reconstructions that reach too far into the past and greatly overestimate genetically effective population sizes (the number of individuals actually contributing to the next generation). The results of Niwa et al. underline the need to understand the assumptions and model parameters in the software programs used to analyse DNA sequences.

On the estimation of ancestral population sizes of modern humans

Genetics Research ◽

10.1017/s001667239700270x ◽

1997 ◽

Vol 69 (2) ◽

pp. 111-116 ◽

Cited By ~ 40

Author(s):

Ziheng Yang

Keyword(s):

Population Size ◽

Sequence Data ◽

Divergence Time ◽

Ancestral Population ◽

Rate Variation ◽

Effective Population ◽

Modern Humans ◽

Extant Species ◽

Population Sizes ◽

Highly Correlated

The theory developed by Takahata and colleagues for estimating the effective population size of ancestral species using homologous sequences from closely related extant species was extended to take account of variation of evolutionary rates among loci. Nuclear sequence data related to the evolution of modern humans were reanalysed and computer simulations were performed to examine the effect of rate variation on estimation of ancestral population sizes. It is found that the among-locus rate variation does not have a significant effect on estimation of the current population size when sequences from multiple loci are sampled from the same species, but does have a significant effect on estimation of the ancestral population size using sequences from different species. The effects of ancestral population size, species divergence time and among-locus rate variation are found to be highly correlated, and to achieve reliable estimates of the ancestral population size, effects of the other two factors should be estimated independently.

Microsatellite variation in honey bee (Apis mellifera L.) populations: hierarchical genetic structure and test of the infinite allele and stepwise mutation models.

Genetics ◽

10.1093/genetics/140.2.679 ◽

1995 ◽

Vol 140 (2) ◽

pp. 679-695 ◽

Cited By ~ 9

Author(s):

A Estoup ◽

L Garnery ◽

M Solignac ◽

J M Cornuet

Keyword(s):

Apis Mellifera ◽

Phylogenetic Trees ◽

Microsatellite Data ◽

Effective Population ◽

Average Heterozygosity ◽

Microsatellite Variation ◽

Infinite Allele Model ◽

Population Sizes ◽

Stepwise Mutation ◽

Allele Model

Abstract Samples from nine populations belonging to three African (intermissa, scutellata and capensis) and four European (mellifera, ligustica, carnica and cecropia) Apis mellifera subspecies were scored for seven microsatellite loci. A large amount of genetic variation (between seven and 30 alleles per locus) was detected. Average heterozygosity and average number of alleles were significantly higher in African than in European subspecies, in agreement with larger effective population sizes in Africa. Microsatellite analyses confirmed that A. mellifera evolved in three distinct and deeply differentiated lineages previously detected by morphological and mitochondrial DNA studies. Dendrogram analysis of workers from a given population indicated that super-sisters cluster together when using a sufficient number of microsatellite data whereas half-sisters do not. An index of classification was derived to summarize the clustering of different taxonomic levels in large phylogenetic trees based on individual genotypes. Finally, individual population × loci data were used to test the adequacy of the two alternative mutation models, the infinite allele model (IAM) and the stepwise mutation model. The better overall fit of the IAM probably results from the majority of the microsatellites used including repeats of two or three different length motifs (compound microsatellites).
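
The abstract does not say how the two mutation models were tested. As a rough illustration of how their predictions differ, the sketch below compares the standard equilibrium expectations for heterozygosity at a single locus, given the same scaled mutation rate theta = 4 * Ne * mu: theta / (1 + theta) under the infinite allele model and 1 - 1 / sqrt(1 + 2 * theta) under a strict single-step stepwise mutation model. The function names and theta values are assumptions made for this example and are not taken from the paper.

```python
import math

def expected_heterozygosity_iam(theta):
    """Equilibrium heterozygosity under the infinite allele model."""
    return theta / (1.0 + theta)

def expected_heterozygosity_smm(theta):
    """Equilibrium heterozygosity under a strict single-step stepwise model."""
    return 1.0 - 1.0 / math.sqrt(1.0 + 2.0 * theta)

# Illustrative values of theta = 4 * Ne * mu (not estimates from the study).
for theta in (0.5, 1.0, 5.0, 10.0):
    h_iam = expected_heterozygosity_iam(theta)
    h_smm = expected_heterozygosity_smm(theta)
    print(f"theta={theta:5.1f}  IAM={h_iam:.3f}  SMM={h_smm:.3f}")
```

For any positive theta the infinite allele model predicts the higher heterozygosity, because every mutation creates a new allele, whereas stepwise mutation can recreate allele sizes that already exist.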

Bayesian phylodynamic inference with complex models

10.1101/268052 ◽

2018 ◽

Cited By ~ 2

Author(s):

Erik M. Volz ◽

Igor Siveroni

Keyword(s):

Population Genetic ◽

Influenza A ◽

Genetic Model ◽

Sequence Data ◽

Genetic Data ◽

Simultaneous Estimation ◽

Model Parameters ◽

Epidemiological Models ◽

Structured Coalescent ◽

Speed Accuracy

Abstract Population genetic modeling can enhance Bayesian phylogenetic inference by providing a realistic prior on the distribution of branch lengths and times of common ancestry. The parameters of a population genetic model may also have intrinsic importance, and simultaneous estimation of a phylogeny and model parameters has enabled phylodynamic inference of population growth rates, reproduction numbers, and effective population size through time. Phylodynamic inference based on pathogen genetic sequence data has emerged as a useful supplement to epidemic surveillance; however, commonly used mechanistic models that are typically fitted to non-genetic surveillance data are rarely fitted to pathogen genetic data due to a dearth of software tools, and the theory required to conduct such inference has been developed only recently. We present a framework for coalescent-based phylogenetic and phylodynamic inference which enables highly flexible modeling of demographic and epidemiological processes. This approach builds upon previous structured coalescent approaches and includes enhancements for computational speed, accuracy, and stability. A flexible markup language is described for translating parametric demographic or epidemiological models into a structured coalescent model enabling simultaneous estimation of demographic or epidemiological parameters and time-scaled phylogenies. We demonstrate the utility of these approaches by fitting compartmental epidemiological models to Ebola virus and Influenza A virus sequence data, demonstrating how important features of these epidemics, such as the reproduction number and epidemic curves, can be gleaned from genetic data. These approaches are provided as an open-source package PhyDyn for the BEAST phylogenetics platform.

Why panmictic bacteria are rare

10.1101/385336 ◽

2018 ◽

Cited By ~ 6

Author(s):

Chao Yang ◽

Yujun Cui ◽

Xavier Didelot ◽

Ruifu Yang ◽

Daniel Falush

Keyword(s):

Nucleotide Diversity ◽

Bacterial Species ◽

Structured Populations ◽

Effective Population ◽

Recombination Rates ◽

Continuous Increase ◽

Population Structures ◽

Population Sizes ◽

Current Value ◽

Neutral Models

Abstract Background: Bacteria typically have more structured populations than higher eukaryotes, but this difference is surprising given high recombination rates, enormous population sizes and effective geographical dispersal in many bacterial species. Results: We estimated the recombination-scaled effective population size Ner in 21 bacterial species and found that it does not correlate with synonymous nucleotide diversity as would be expected under neutral models of evolution. Only two species have estimates substantially over 100, consistent with approximate panmixia, namely Helicobacter pylori and Vibrio parahaemolyticus. Both species are far from demographic equilibrium, with diversity predicted to increase more than 30-fold in V. parahaemolyticus if the current value of Ner were maintained, to values much higher than found in any species. We propose that panmixia is unstable in bacteria, and that persistent environmental species are likely to evolve barriers to genetic exchange, which act to prevent a continuous increase in diversity by enhancing genetic drift. Conclusions: Our results highlight the dynamic nature of bacterial population structures and imply that overall diversity levels found within a species are poor indicators of its size.

Estimating Ancestral Population Sizes and Divergence Times

Genetics ◽

10.1093/genetics/163.1.395 ◽

2003 ◽

Vol 163 (1) ◽

pp. 395-404 ◽

Cited By ~ 5

Author(s):

Jeffrey D Wall

Keyword(s):

Population Size ◽

Effective Population Size ◽

Sequence Data ◽

Ancestral Population ◽

Divergence Times ◽

Intragenic Recombination ◽

Intergenic Sequence ◽

Effective Population ◽

Recombination Rates ◽

Population Sizes

Abstract This article presents a new method for jointly estimating species divergence times and ancestral population sizes. The method improves on previous ones by explicitly incorporating intragenic recombination, by utilizing orthologous sequence data from closely related species, and by using a maximum-likelihood framework. The latter allows for efficient use of the available information and provides a way of assessing how much confidence we should place in the estimates. I apply the method to recently collected intergenic sequence data from humans and the great apes. The results suggest that the human-chimpanzee ancestral population size was four to seven times larger than the current human effective population size and that the current human effective population size is slightly >10,000. These estimates are similar to previous ones, and they appear relatively insensitive to assumptions about the recombination rates or mutation rates across loci.
