Inferring time-dependent migration and coalescence patterns from genetic sequence and predictor data in structured populations

2018
Author(s): Nicola F. Müller, Gytis Dudas, Tanja Stadler

Abstract: Population dynamics can be inferred from genetic sequence data using phylodynamic methods. These methods typically quantify the dynamics in unstructured populations or assume the parameters describing the dynamics to be constant through time in structured populations. Inference methods that allow for structured populations and time-varying parameters involve many parameters that have to be inferred. However, each of these parameters might be only weakly informed by the data. Here we introduce an approach that uses so-called predictors, such as geographic distance between locations, within a generalized linear model to inform the population dynamic parameters, namely the time-varying migration rates and effective population sizes under the marginal approximation of the structured coalescent. Using simulations, we show that we are able to reliably infer the parameters from phylogenetic trees. We then apply this framework to a previously described Ebola virus dataset. We infer incidence to be the strongest predictor for effective population size and geographic distance the strongest predictor for migration. This allows us to show, on both simulated and real data, that we are able to identify reasonable predictors. Overall, we provide a novel method that identifies predictors for migration rates and effective population sizes and uses these predictors to quantify them. Its implementation as part of the BEAST2 software package MASCOT allows joint inference of population dynamics within structured populations, the phylogenetic tree, and evolutionary parameters.

2019, Vol 5 (2)
Author(s): Nicola F Müller, Gytis Dudas, Tanja Stadler

Abstract: Population dynamics can be inferred from genetic sequence data by using phylodynamic methods. These methods typically quantify the dynamics in unstructured populations or assume migration rates and effective population sizes to be constant through time in structured populations. When rates are allowed to vary through time in structured populations, the number of parameters to infer increases rapidly and the available data might not be sufficient to inform them. Additionally, it is often of interest to know what predicts these parameters rather than to know the parameters themselves. Here, we introduce a method to infer the predictors for time-varying migration rates and effective population sizes by using a generalized linear model (GLM) approach under the marginal approximation of the structured coalescent. Using simulations, we show that our approach is able to reliably infer the model parameters and their predictors from phylogenetic trees. Furthermore, when simulating trees under the structured coalescent, we show that our new approach outperforms the discrete trait GLM. We then apply our framework to a previously described Ebola virus dataset, where we infer the parameters and their predictors from genome sequences while accounting for phylogenetic uncertainty. We infer weekly cases to be the strongest predictor for effective population size and geographic distance the strongest predictor for migration. This approach is implemented as part of the BEAST2 package MASCOT, which allows us to jointly infer population dynamics (i.e. the parameters and predictors) within structured populations, the phylogenetic tree, and evolutionary parameters.
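As a rough illustration of the GLM idea described in this abstract, the sketch below computes rates from predictors with a log-linear form, log(rate) = log(scaler) + sum over k of indicator_k * coefficient_k * x_k. The functional form, the names `glm_rates`, `standardize` and `scaler`, and all numbers are assumptions made for illustration. The abstract does not state the parameterization, and this is not the MASCOT implementation.

```python
import math


def standardize(values):
    """Center a predictor to mean 0 and scale it to unit standard deviation."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd for v in values]


def glm_rates(predictors, coefficients, indicators, scaler=1.0):
    """Log-linear GLM sketch (assumed form, see the text above).

    predictors   -- one row per rate, one column per predictor
    coefficients -- effect size of each predictor on the log rate
    indicators   -- 0/1 flags; a 0 switches the predictor off entirely
    scaler       -- overall rate multiplier (hypothetical name)
    """
    rates = []
    for row in predictors:
        log_rate = sum(d * b * x for d, b, x in zip(indicators, coefficients, row))
        rates.append(scaler * math.exp(log_rate))
    return rates


# Hypothetical example: three location pairs, one predictor (log distance in km).
log_distance = standardize([math.log(d) for d in (10.0, 100.0, 1000.0)])
rows = [[x] for x in log_distance]
# A negative coefficient makes migration rates fall as distance grows.
print(glm_rates(rows, coefficients=[-1.0], indicators=[1]))
```

Setting an indicator to 0 removes that predictor from the model, so inferring the indicators alongside the coefficients is one way such a model can express which predictors matter.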


2021
Author(s): Tyler Steven Brown, Aimee R. Taylor, Olufunmilayo Arogbokun, Caroline O. Buckee, Hsiao-Han Chang

Measuring gene flow between malaria parasite populations in different geographic locations can provide strategic information for malaria control interventions. Multiple important questions pertaining to the design of such studies remain unanswered, limiting efforts to operationalize genomic surveillance tools for routine public health use. This report numerically evaluates the ability to distinguish different levels of gene flow between malaria populations, using different amounts of real and simulated data, where data are simulated using parameters that approximate different epidemiological conditions. Specifically, using Plasmodium falciparum whole genome sequence data and sequence data simulated for a metapopulation with different migration rates and effective population sizes, we compare two estimators of gene flow, explore the number of genetic markers and number of individuals required to reliably rank highly connected locations, and describe how these thresholds change given different effective population sizes and migration rates. Our results have implications for the design and implementation of malaria genomic surveillance efforts.


2019, Vol 36 (11), pp. 2620-2628
Author(s): Verity Hill, Guy Baele

Abstract: Inferring past population dynamics over time from heterochronous molecular sequence data is often achieved using the Bayesian Skygrid model, a nonparametric coalescent model that estimates the effective population size over time. Available in BEAST, a cross-platform program for Bayesian analysis of molecular sequences using Markov chain Monte Carlo, this coalescent model is often estimated in conjunction with a molecular clock model to produce time-stamped phylogenetic trees. We here provide a practical guide to using BEAST and its accompanying applications for the purpose of drawing inference under these models. We focus on best practices, potential pitfalls, and recommendations that can be generalized to other software packages for Bayesian inference. This protocol shows how to use TempEst, BEAUti, BEAST 1.10 (http://beast.community/; last accessed July 29, 2019), LogCombiner, and Tracer in a complete workflow.


2018
Author(s): Evan McCartney-Melstad, Jannet K. Vu, H. Bradley Shaffer

Abstract: Roads fragment landscapes and can cause the loss of metapopulation dynamics in threatened species, but as relatively new landscape features, few studies have had the statistical power to genetically examine road effects. We used DNA sequence data from thousands of nuclear loci to characterize the population structure of New York-endangered Eastern tiger salamanders (Ambystoma tigrinum) on Long Island and quantify the impacts of roads on population fragmentation. We uncovered highly genetically structured populations over an extremely small spatial scale (approximately 40 km²) in an increasingly human-modified landscape. Geographic distance and the presence of roads between ponds are both strong predictors of genetic divergence, suggesting that both natural and anthropogenic factors are responsible for the observed patterns of genetic variation. Our study demonstrates the value of genomic approaches in molecular ecology, as these patterns did not emerge in an earlier study of the same system using microsatellite loci. Ponds supported small effective population sizes, and pond surface area showed a strong positive correlation with salamander population size. When combined with the high degree of structuring in this heavily modified landscape, our study indicates that these endangered amphibians require management at the individual pond, or pond cluster, level. Particular efforts should be made to preserve large vernal pools, which harbor the greatest genetic diversity, and their surrounding upland habitat. Contiguous upland landscapes between ponds that facilitate natural metapopulation connectivity and demographic rescue from future local extirpations should also be protected.


Author(s): Hesam Montazeri, Susan Little, Mozhgan Mozaffarilegha, Niko Beerenwinkel, Victor DeGruttola

Abstract: Genetic sequence data of pathogens are increasingly used to investigate transmission dynamics in both endemic diseases and disease outbreaks. Such research can aid in the development of appropriate interventions and in the design of studies to evaluate them. Several computational methods have been proposed to infer transmission chains from sequence data; however, existing methods generally do not reconstruct transmission trees reliably, because genetic sequence data or phylogenetic trees inferred from such data contain insufficient information for accurate estimation of transmission chains. Here, we show by simulation studies that incorporating infection times, even when they are uncertain, can greatly improve the accuracy of reconstruction of transmission trees. To achieve this improvement, we propose a Bayesian inference method using Markov chain Monte Carlo that directly draws samples from the space of transmission trees under the assumption of complete sampling of the outbreak. The likelihood of each transmission tree is computed by a phylogenetic model by treating its internal nodes as transmission events. By a simulation study, we demonstrate that the accuracy of the reconstructed transmission trees depends mainly on the amount of information available on times of infection; we show the superiority of the proposed method to two alternative approaches when infection times are known up to specified degrees of certainty. In addition, we illustrate the use of a multiple imputation framework to study features of epidemic dynamics, such as the relationship between characteristics of nodes and the average number of outbound or inbound edges, signifying possible transmission events from and to nodes. We apply the proposed method to a transmission cluster in San Diego and to a dataset from the 2014 Sierra Leone Ebola virus outbreak and investigate the impact of biological, behavioral, and demographic factors.


2000, Vol 57 (8), pp. 1701-1717
Author(s): Carol A Stepien, Alison K Dillon, Amy K Patterson

Population genetic, phylogeographic, and systematic relationships are elucidated among the three species comprising the thornyhead rockfish genus Sebastolobus (Teleostei: Scorpaenidae). Genetic variation among sampling sites representing their extensive ranges along the deep continental slopes of the northern Pacific Ocean is compared using sequence data from the left domain of the mtDNA control region. Comparisons are made among the shortspine thornyhead (S. alascanus) (from seven locations), the longspine thornyhead (S. altivelis) (from five sites), which are sympatric in the northeast, and the broadbanded thornyhead (S. macrochir) (a single site) from the northwest. Phylogenetic trees rooted to Sebastes show that S. macrochir is the sister taxon of S. alascanus and S. altivelis. Intraspecific genetic variability is appreciable, with most individuals having unique haplotypes. Gene flow is substantial among some locations, whereas others have diverged significantly. Genetic divergences among sampling sites for S. alascanus indicate an isolation by geographic distance pattern. Genetic divergences for S. altivelis are unrelated to the hypothesis of isolation by geographic distance and appear to be more consistent with the hypothesis of larval retention in currents and gyres. Differences in geographic genetic patterns between the species are attributed to life history differences in their relative mobilities as juveniles and adults.


2001, Vol 77 (2), pp. 153-166
Author(s): Brian Charlesworth

Formulae for the effective population sizes of autosomal, X-linked, Y-linked and maternally transmitted loci in age-structured populations are developed. The approximations used here predict both asymptotic rates of increase in probabilities of identity, and equilibrium levels of neutral nucleotide site diversity under the infinite-sites model. The applications of the results to the interpretation of data on DNA sequence variation in Drosophila, plant, and human populations are discussed. It is concluded that sex differences in demographic parameters such as adult mortality rates generally have small effects on the relative effective population sizes of loci with different modes of inheritance, whereas differences between the sexes in variance in reproductive success can have major effects, either increasing or reducing the effective population size for X-linked loci relative to autosomal or Y-linked loci. These effects need to be accounted for when trying to understand data on patterns of sequence variation for genes with different transmission modes.


1997, Vol 69 (2), pp. 111-116
Author(s): Ziheng Yang

The theory developed by Takahata and colleagues for estimating the effective population size of ancestral species using homologous sequences from closely related extant species was extended to take account of variation of evolutionary rates among loci. Nuclear sequence data related to the evolution of modern humans were reanalysed and computer simulations were performed to examine the effect of rate variation on estimation of ancestral population sizes. It is found that the among-locus rate variation does not have a significant effect on estimation of the current population size when sequences from multiple loci are sampled from the same species, but does have a significant effect on estimation of the ancestral population size using sequences from different species. The effects of ancestral population size, species divergence time and among-locus rate variation are found to be highly correlated, and to achieve reliable estimates of the ancestral population size, effects of the other two factors should be estimated independently.


2018
Author(s): Hussein Al-Asadi, Desislava Petkova, Matthew Stephens, John Novembre

Abstract: In many species a fundamental feature of genetic diversity is that genetic similarity decays with geographic distance; however, this relationship is often complex, and may vary across space and time. Methods to uncover and visualize such relationships have widespread use for analyses in molecular ecology, conservation genetics, evolutionary genetics, and human genetics. While several frameworks exist, a promising approach is to infer maps of how migration rates vary across geographic space. Such maps could, in principle, be estimated across time to reveal the full complexity of population histories. Here, we take a step in this direction: we present a method to infer separate maps of population sizes and migration rates for different time periods from a matrix of genetic similarity between every pair of individuals. Specifically, genetic similarity is measured by counting the number of long segments of haplotype sharing (also known as identity-by-descent tracts). By varying the length of these segments we obtain parameter estimates for qualitatively different time periods. Using simulations, we show that the method can reveal time-varying migration rates and population sizes, including changes that are not detectable when ignoring haplotypic structure. We apply the method to a dataset of contemporary European individuals (POPRES), and provide an integrated analysis of recent population structure and growth over the last ~3,000 years in Europe. Software implementing the methods is available at https://github.com/halasadi/MAPS.
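The similarity measure described in this abstract is a per-pair count of long shared haplotype segments, with different length ranges standing in for different time periods. It can be sketched as follows. The input layout (tuples of two individual IDs and a segment length in centimorgans), the function name `count_shared_segments`, and the length thresholds are hypothetical choices for illustration; this is not the MAPS code.

```python
from collections import defaultdict


def count_shared_segments(segments, min_len, max_len=float("inf")):
    """Count shared segments per pair of individuals within a length range.

    segments -- iterable of (individual_a, individual_b, length) tuples
    min_len  -- inclusive lower bound on segment length
    max_len  -- exclusive upper bound on segment length
    Returns a dict mapping an order-independent pair to its segment count.
    """
    counts = defaultdict(int)
    for a, b, length in segments:
        if min_len <= length < max_len:
            # Sort the IDs so (a, b) and (b, a) refer to the same pair.
            counts[tuple(sorted((a, b)))] += 1
    return dict(counts)


# Hypothetical segments: (individual, individual, length in cM).
segments = [
    ("ind1", "ind2", 3.0),
    ("ind2", "ind1", 7.5),
    ("ind1", "ind3", 2.5),
    ("ind1", "ind2", 1.0),  # below both thresholds, ignored
]
# Two length ranges give two similarity matrices, one per time period.
shorter = count_shared_segments(segments, min_len=2.0, max_len=6.0)
longer = count_shared_segments(segments, min_len=6.0)
print(shorter, longer)
```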


Genetics, 1995, Vol 140 (2), pp. 679-695
Author(s): A Estoup, L Garnery, M Solignac, J M Cornuet

Abstract: Samples from nine populations belonging to three African (intermissa, scutellata and capensis) and four European (mellifera, ligustica, carnica and cecropia) Apis mellifera subspecies were scored for seven microsatellite loci. A large amount of genetic variation (between seven and 30 alleles per locus) was detected. Average heterozygosity and average number of alleles were significantly higher in African than in European subspecies, in agreement with larger effective population sizes in Africa. Microsatellite analyses confirmed that A. mellifera evolved in three distinct and deeply differentiated lineages previously detected by morphological and mitochondrial DNA studies. Dendrogram analysis of workers from a given population indicated that super-sisters cluster together when a sufficient amount of microsatellite data is used, whereas half-sisters do not. An index of classification was derived to summarize the clustering of different taxonomic levels in large phylogenetic trees based on individual genotypes. Finally, individual population x loci data were used to test the adequacy of the two alternative mutation models, the infinite allele model (IAM) and the stepwise mutation model. The better overall fit of the IAM probably results from the majority of the microsatellites used including repeats of two or three different length motifs (compound microsatellites).
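The two diversity summaries compared in this abstract, number of alleles and heterozygosity, can be computed per locus as in the sketch below. It uses the standard expected heterozygosity, H = 1 - sum of p_i^2 over allele frequencies p_i. Whether the study used this estimator or a sample-size-corrected variant is not stated in the abstract, so treat the choice as an assumption. The function names and the allele sizes are made up for illustration.

```python
from collections import Counter


def allele_count(alleles):
    """Number of distinct alleles observed at one locus."""
    return len(set(alleles))


def expected_heterozygosity(alleles):
    """Expected heterozygosity H = 1 - sum(p_i ** 2) at one locus.

    alleles -- list of allele labels, one entry per sampled gene copy
    """
    n = len(alleles)
    counts = Counter(alleles)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def average_over_loci(statistic, loci):
    """Average a per-locus statistic over a list of loci."""
    return sum(statistic(alleles) for alleles in loci) / len(loci)


# Hypothetical microsatellite allele sizes (in base pairs) at two loci.
loci = [
    [102, 104, 104, 106, 108, 108],
    [210, 210, 210, 214, 214, 218],
]
print(average_over_loci(allele_count, loci))
print(average_over_loci(expected_heterozygosity, loci))
```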

