Accounting for Gene Flow from Unsampled ‘Ghost’ Populations while Estimating Evolutionary History under the Isolation with Migration Model

2019 ◽  
Author(s):  
Arun Sethuraman ◽  
Melissa Lynch

Abstract Unsampled or extinct ‘ghost’ populations leave signatures on the genomes of individuals from extant, sampled populations, especially if they have exchanged genes with them over evolutionary time. This gene flow from ‘ghost’ populations can introduce biases when estimating evolutionary history from genomic data, often leading to data misinterpretation and ambiguous results. Here we assess these biases while accounting, or not accounting, for gene flow from ‘ghost’ populations under the Isolation with Migration (IM) model. We perform extensive simulations under five scenarios, ranging from no gene flow (Scenario A) to extensive gene flow to and from an unsampled ‘ghost’ population (Scenarios B, C, D, and E). Estimates of evolutionary history across all scenarios A-E (effective population sizes, divergence times, and migration rates) indicate consistent (a) under-estimation of divergence times between sampled populations, (b) over-estimation of effective population sizes of sampled populations, and (c) under-estimation of migration rates between sampled populations, with increased gene flow from the unsampled ‘ghost’ population. Without accounting for an unsampled ‘ghost’, summary statistics like FST are under-estimated, and π is over-estimated, with increased gene flow from the ‘ghost’. To show this persistent issue in empirical data, we use a 355-locus dataset from African Hunter-Gatherer populations and discuss similar biases in estimating evolutionary history while not accounting for unsampled ‘ghosts’. Considering the large effects of gene flow from these ‘ghosts’, we propose a multi-pronged approach to account for the presence of unsampled ‘ghost’ populations in population genomics studies to reduce erroneous inferences.
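
One intuition for why FST between sampled populations can drop as gene flow from a ‘ghost’ increases is sketched below. This is a toy single-locus calculation, not the simulation design used in the paper: the allele frequencies, the one-pulse admixture step, and the function names (`fst_two_pops`, `admix`) are all illustrative assumptions. FST is computed as (HT − HS)/HT from expected heterozygosities at a biallelic locus.

```python
def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1-p) at a biallelic locus."""
    return 2.0 * p * (1.0 - p)


def fst_two_pops(p1: float, p2: float) -> float:
    """FST = (HT - HS) / HT for two equally weighted populations."""
    p_bar = (p1 + p2) / 2.0
    h_t = expected_heterozygosity(p_bar)  # total heterozygosity
    h_s = (expected_heterozygosity(p1) + expected_heterozygosity(p2)) / 2.0
    if h_t == 0.0:
        return 0.0  # locus fixed in both populations, no differentiation
    return (h_t - h_s) / h_t


def admix(p: float, p_ghost: float, m: float) -> float:
    """Allele frequency after a fraction m of the population is replaced
    by migrants from a ghost population (hypothetical one-pulse model)."""
    return (1.0 - m) * p + m * p_ghost


if __name__ == "__main__":
    p1, p2, p_ghost = 0.2, 0.8, 0.5  # assumed values, for illustration only
    for m in (0.0, 0.1, 0.2, 0.4):
        fst = fst_two_pops(admix(p1, p_ghost, m), admix(p2, p_ghost, m))
        print(f"ghost migration m={m:.1f}  FST={fst:.4f}")
```

In this sketch both sampled populations receive migrants from the same ghost, so their allele frequencies converge and FST falls as m grows, even though the sampled populations never exchange migrants directly.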

2021 ◽  
Author(s):  
Tyler Steven Brown ◽  
Aimee R. Taylor ◽  
Olufunmilayo Arogbokun ◽  
Caroline O. Buckee ◽  
Hsiao-Han Chang

Measuring gene flow between malaria parasite populations in different geographic locations can provide strategic information for malaria control interventions. Multiple important questions pertaining to the design of such studies remain unanswered, limiting efforts to operationalize genomic surveillance tools for routine public health use. This report evaluates numerically the ability to distinguish different levels of gene flow between malaria populations, using different amounts of real and simulated data, where data are simulated using parameters that approximate different epidemiological conditions. Specifically, using Plasmodium falciparum whole genome sequence data and sequence data simulated for a metapopulation with different migration rates and effective population sizes, we compare two estimators of gene flow, explore the number of genetic markers and number of individuals required to reliably rank highly connected locations, and describe how these thresholds change given different effective population sizes and migration rates. Our results have implications for the design and implementation of malaria genomic surveillance efforts.


PLoS ONE ◽  
2021 ◽  
Vol 16 (10) ◽  
pp. e0259124
Author(s):  
Damian C. Lettoof ◽  
Vicki A. Thomson ◽  
Jari Cornelis ◽  
Philip W. Bateman ◽  
Fabien Aubret ◽  
...  

Urbanisation alters landscapes, introduces wildlife to novel stressors, and fragments habitats into remnant ‘islands’. Within these islands, isolated wildlife populations can experience genetic drift and subsequently suffer from inbreeding depression and reduced adaptive potential. The Western tiger snake (Notechis scutatus occidentalis) is a predator of wetlands in the Swan Coastal Plain, a unique bioregion that has suffered substantial degradation through the development of the city of Perth, Western Australia. Within the urban matrix, tiger snakes now only persist in a handful of wetlands where they are known to bioaccumulate a suite of contaminants, and have recently been suggested as a relevant bioindicator of ecosystem health. Here, we used genome-wide single nucleotide polymorphism (SNP) data to explore the contemporary population genomics of seven tiger snake populations across the urban matrix. Specifically, we used population genomic structure and diversity, effective population sizes (Ne), and heterozygosity-fitness correlations to assess fitness of each population with respect to urbanisation. We found that population genomic structure was strongest across the northern and southern sides of a major river system, with the northern cluster of populations exhibiting lower heterozygosities than the southern cluster, likely due to a lack of historical gene flow. We also observed an increasing signal of inbreeding and genetic drift with increasing geographic isolation due to urbanisation. Effective population sizes (Ne) at most sites were small (< 100), with Ne appearing to reflect the area of available habitat rather than the degree of adjacent urbanisation. This suggests that ecosystem management and restoration may be the best method to buffer the further loss of genetic diversity in urban wetlands. If tiger snake populations continue to decline in urban areas, our results provide a baseline measure of genomic diversity, as well as highlighting which ‘islands’ of habitat are most in need of management and protection.


2020 ◽  
Vol 20 (1) ◽  
Author(s):  
Megan Phifer-Rixey ◽  
Bettina Harr ◽  
Jody Hey

Abstract Background: The three main subspecies of house mice, Mus musculus castaneus, Mus musculus domesticus, and Mus musculus musculus, are estimated to have diverged ~350–500 KYA. Resolution of the details of their evolutionary history is complicated by their relatively recent divergence, ongoing gene flow among the subspecies, and complex demographic histories. Previous studies have been limited to some extent by the number of loci surveyed and/or by the scope of the method used. Here, we apply a method (IMa3) that provides an estimate of a population phylogeny while allowing for complex histories of gene exchange. Results: The results strongly support a topology with M. m. domesticus as sister to M. m. castaneus and M. m. musculus. In addition, we find evidence of gene flow between all pairs of subspecies, but that gene flow is most restricted from M. m. musculus into M. m. domesticus. Estimates of other key parameters are dependent on assumptions regarding generation time and mutation rate in house mice. Nevertheless, our results support previous findings that the effective population size, Ne, of M. m. castaneus is larger than that of the other two subspecies, that the three subspecies began diverging ~130–420 KYA, and that the time between divergence events was short. Conclusions: Joint demographic and phylogenetic analyses of genomic data provide a clearer picture of the history of divergence in house mice.


2020 ◽  
Vol 12 (5) ◽  
pp. 715-719
Author(s):  
Junfeng Liu ◽  
Qiao Liu ◽  
Qingzhu Yang

Abstract Gene flow between species may cause variations in the branch lengths and topology of gene trees that go beyond the variations expected from ancestral processes. These additional variations make it difficult to estimate parameters during speciation with gene flow, as their pattern differs with the relationship between isolation and migration. As far as we know, most methods rely on an assumption about the relationship between isolation and migration imposed by a given model, such as the isolation-with-migration model, when estimating parameters during speciation with gene flow. In this article, we develop a multispecies coalescent approach, called mstree, which does not rely on any assumption about the relationship between isolation and migration when estimating parameters. mstree is available at https://github.com/liujunfengtop/MStree/ and uses some mathematical inequalities among several factors, which include the species divergence time, the ancestral population size, and the number of gene trees, to estimate parameters during speciation with gene flow. Using simulations, we show that the estimated values of ancestral population sizes and species divergence times are close to the true values when analyzing simulated data sets generated under the isolation-with-initial-migration model, the secondary contact model, and the isolation-with-migration model. Therefore, our method is able to estimate ancestral population sizes and speciation times in the presence of different modes of gene flow and may be helpful to test different theories of speciation.


Genetics ◽  
2001 ◽  
Vol 157 (2) ◽  
pp. 743-750
Author(s):  
Charles Taylor ◽  
Yeya T Touré ◽  
John Carnahan ◽  
Douglas E Norris ◽  
Guimogo Dolo ◽  
...  

Abstract The population structure of the Anopheles gambiae complex is unusual, with several sibling species often occupying a single area and, in one of these species, An. gambiae sensu stricto, as many as three “chromosomal forms” occurring together. The chromosomal forms are thought to be intermediate between populations and species, distinguishable by patterns of chromosome gene arrangements. The extent of reproductive isolation among these forms has been debated. To better characterize this structure we measured effective population size, Ne, and migration rates, m, or their product, by both direct and indirect means. Gene flow among villages within each chromosomal form was found to be large (Nem > 40), was intermediate between chromosomal forms (Nem ≈ 3–30), and was low between species (Nem ≈ 0.17–1.3). A recently developed means for distinguishing among certain of the forms using PCR indicated rates of gene flow consistent with those observed using the other genetic markers.
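
To give a sense of scale for the Nem values quoted above, the sketch below uses Wright's island-model approximation, FST ≈ 1/(4Nem + 1), which is a common indirect route from FST to Nem. The abstract does not say which estimator the authors used, so this relation and the function names (`nem_from_fst`, `fst_from_nem`) are assumptions made for illustration only.

```python
def nem_from_fst(fst: float) -> float:
    """Indirect estimate of Nem under Wright's island model:
    FST ~ 1 / (4*Nem + 1)  =>  Nem ~ (1/FST - 1) / 4."""
    if not 0.0 < fst < 1.0:
        raise ValueError("FST must lie strictly between 0 and 1")
    return (1.0 / fst - 1.0) / 4.0


def fst_from_nem(nem: float) -> float:
    """Inverse relation: expected FST for a given Nem (island model)."""
    if nem < 0.0:
        raise ValueError("Nem must be non-negative")
    return 1.0 / (4.0 * nem + 1.0)


if __name__ == "__main__":
    # Nem values taken from the ranges quoted in the abstract.
    for nem in (40.0, 30.0, 3.0, 1.3, 0.17):
        print(f"Nem={nem:>5}  implied FST={fst_from_nem(nem):.4f}")
```

Under this approximation, Nem above 40 corresponds to FST below roughly 0.006, while Nem near 0.17 corresponds to FST near 0.6, which is why the three categories in the abstract represent very different degrees of differentiation.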


Genetics ◽  
2003 ◽  
Vol 163 (1) ◽  
pp. 429-446 ◽  
Author(s):  
Jinliang Wang ◽  
Michael C Whitlock

Abstract In the past, moment and likelihood methods have been developed to estimate the effective population size (Ne) on the basis of the observed changes of marker allele frequencies over time, and these have been applied to a large variety of species and populations. Such methods invariably make the critical assumption of a single isolated population receiving no immigrants over the study interval. For most populations in the real world, however, migration is not negligible and can substantially bias estimates of Ne if it is not accounted for. Here we extend previous moment and maximum-likelihood methods to allow the joint estimation of Ne and migration rate (m) using genetic samples over space and time. It is shown that, compared to genetic drift acting alone, migration results in changes in allele frequency that are greater in the short term and smaller in the long term, leading to under- and overestimation of Ne, respectively, if it is ignored. Extensive simulations are run to evaluate the newly developed moment and likelihood methods, which yield generally satisfactory estimates of both Ne and m for populations with widely different effective sizes and migration rates and patterns, given a reasonably large sample size and number of markers.
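
The earlier single-population moment methods that this abstract refers to can be sketched as follows. This is the classical temporal estimator, not the authors' joint Ne and m method: it assumes an isolated population, biallelic loci, the Nei and Tajima standardized variance Fc, and a sampling correction of 1/(2·S0) + 1/(2·St). The function names and the example frequencies are illustrative assumptions.

```python
from typing import Sequence


def fc_biallelic(x: float, y: float) -> float:
    """Standardized variance in allele frequency change (Fc)
    for one biallelic locus sampled at two time points."""
    return (x - y) ** 2 / ((x + y) / 2.0 - x * y)


def temporal_ne(freqs_t0: Sequence[float], freqs_t1: Sequence[float],
                generations: float, s0: int, s1: int) -> float:
    """Moment estimate of Ne from allele frequency change over time.

    freqs_t0, freqs_t1: per-locus allele frequencies at the two samples.
    s0, s1: number of diploid individuals sampled at each time point.
    Assumes a closed population; immigration would bias the result.
    """
    fcs = []
    for x, y in zip(freqs_t0, freqs_t1):
        denom = (x + y) / 2.0 - x * y
        if denom > 0.0:  # skip loci fixed for the same allele in both samples
            fcs.append(fc_biallelic(x, y))
    if not fcs:
        raise ValueError("no polymorphic loci to analyse")
    f_mean = sum(fcs) / len(fcs)
    # Remove the part of Fc expected from finite sampling alone.
    drift = f_mean - 1.0 / (2.0 * s0) - 1.0 / (2.0 * s1)
    if drift <= 0.0:
        return float("inf")  # change no larger than sampling noise
    return generations / (2.0 * drift)


if __name__ == "__main__":
    t0 = [0.50, 0.30, 0.70]  # hypothetical frequencies, first sample
    t1 = [0.60, 0.25, 0.78]  # hypothetical frequencies, later sample
    print(temporal_ne(t0, t1, generations=10, s0=100, s1=100))
```

Because the estimator attributes all frequency change beyond sampling noise to drift, any change caused by immigrants is misread as drift, which is the bias the abstract describes.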


2014 ◽  
Author(s):  
Jonathan Puritz ◽  
Christopher M. Hollenbeck ◽  
John R. Gold

Restriction-site associated DNA sequencing (RADseq) has become a powerful and useful approach for population genomics. Currently, no software exists that utilizes both paired-end reads from RADseq data to efficiently produce population-informative variant calls, especially for organisms with large effective population sizes and high levels of genetic polymorphism but for which no genomic resources exist. dDocent is an analysis pipeline with a user-friendly, command-line interface designed to process individually barcoded RADseq data (with double cut sites) into informative SNPs/Indels for population-level analyses. The pipeline, written in BASH, uses data reduction techniques and other stand-alone software packages to perform quality trimming and adapter removal, de novo assembly of RAD loci, read mapping, SNP and Indel calling, and baseline data filtering. Double-digest RAD data from population pairings of three different marine fishes were used to compare dDocent with Stacks, the first generally available, widely used pipeline for analysis of RADseq data. dDocent consistently identified more SNPs shared across greater numbers of individuals and with higher levels of coverage. This is most likely due to the fact that dDocent quality trims instead of filtering and incorporates both forward and reverse reads in assembly, mapping, and SNP calling, thus enabling use of reads with Indel polymorphisms. The pipeline and a comprehensive user guide can be found at (http://dDocent.wordpress.com).


2019 ◽  
Vol 110 (5) ◽  
pp. 587-600
Author(s):  
A Millie Burrell ◽  
Jeffrey H R Goddard ◽  
Paul J Greer ◽  
Ryan J Williams ◽  
Alan E Pepper

Abstract Globally, a small number of plants have adapted to terrestrial outcroppings of serpentine geology, which are characterized by soils with low levels of essential mineral nutrients (N, P, K, Ca, Mo) and toxic levels of heavy metals (Ni, Cr, Co). Paradoxically, many of these plants are restricted to this harsh environment. Caulanthus amplexicaulis var. barbarae (Brassicaceae) is a rare annual plant that is strictly endemic to a small set of isolated serpentine outcrops in the coastal mountains of central California. The goals of the work presented here were to 1) determine the patterns of genetic connectivity among all known populations of C. amplexicaulis var. barbarae, and 2) estimate contemporary effective population sizes (Ne), to inform ongoing genomic analyses of the evolutionary history of this taxon, and to provide a foundation upon which to model its future evolutionary potential and long-term viability in a changing environment. Eleven populations of this taxon were sampled, and population-genetic parameters were estimated using 11 nuclear microsatellite markers. Contemporary effective population sizes were estimated using multiple methods and found to be strikingly small (typically Ne < 10). Further, our data showed that a substantial component of genetic connectivity of this taxon is not at equilibrium, and instead showed sporadic gene flow. Several lines of evidence indicate that gene flow between isolated populations is maintained through long-distance seed dispersal (e.g., >1 km), possibly via zoochory.


Genetics ◽  
2007 ◽  
Vol 177 (4) ◽  
pp. 2195-2207 ◽  
Author(s):  
Daniel Garrigan ◽  
Sarah B. Kingan ◽  
Maya M. Pilkington ◽  
Jason A. Wilder ◽  
Murray P. Cox ◽  
...  
