Joint Estimation of Pedigrees and Effective Population Size Using Markov Chain Monte Carlo

Genetics ◽  
2019 ◽  
Vol 212 (3) ◽  
pp. 855-868 ◽  
Author(s):  
Amy Ko ◽  
Rasmus Nielsen

Pedigrees provide the genealogical relationships among individuals at a fine resolution and serve an important function in many areas of genetic studies. One such use of pedigree information is in the estimation of the short-term effective population size (Ne), which is of great relevance in fields such as conservation genetics. Despite the usefulness of pedigrees, however, they are often an unknown parameter and must be inferred from genetic data. In this study, we present a Bayesian method to jointly estimate pedigrees and Ne from genetic markers using Markov Chain Monte Carlo. Our method supports analysis of a large number of markers and individuals within a single generation with the use of a composite likelihood, which significantly increases computational efficiency. We show, on simulated data, that our method is able to jointly estimate relationships up to first cousins and Ne with high accuracy. We also apply the method on a real dataset of house sparrows to reconstruct their previously unreported pedigree.

2018 ◽  
Author(s):  
Amy Ko ◽  
Rasmus Nielsen

Pedigrees provide a fine resolution of the genealogical relationships among individuals and serve an important function in many areas of genetic studies. One such use of pedigree information is in the estimation of short-term effective population size (Ne), which is of great relevance in fields such as conservation genetics. Despite the usefulness of pedigrees, however, they are often an unknown parameter and must be inferred from genetic data. In this study, we present a Bayesian method to jointly estimate pedigrees and Ne from genetic markers using Markov Chain Monte Carlo. Our method supports analysis of a large number of markers and individuals with the use of composite likelihood, which significantly increases computational efficiency. We show on simulated data that our method is able to jointly estimate relationships up to first cousins and Ne with high accuracy. We also apply the method on a real dataset of house sparrows to reconstruct their previously unreported pedigree.


2021 ◽  
Author(s):  
Montgomery Slatkin

A composite likelihood method is introduced for jointly estimating the intensity of selection and the rate of mutation, both scaled by the effective population size, when there is balancing selection at a single multi-allelic locus in an isolated population at demographic equilibrium. The performance of the method is tested using simulated data. Average estimated mutation rates and selection intensities are close to the true values, but there is considerable variation about the averages. Allowing for both population growth and population subdivision does not result in qualitative differences, but the estimated mutation rates and selection intensities do not in general reflect the current effective population size. The method is applied to three class I (HLA-A, HLA-B and HLA-C) and two class II loci (HLA-DRB1 and HLA-DQA1) in the 1000 Genomes populations. Allowing for asymmetric balancing selection has only a slight effect on the results from the symmetric model. Mutations that restore symmetry of the selection model are preferentially retained because of the tendency of natural selection to maximize average fitness. However, slight differences in selective effects result in much longer persistence time of some alleles. Trans-species polymorphism (TSP), which is characteristic of MHC in vertebrates, is more likely when there are small differences in allelic fitness than when complete symmetry is assumed. Therefore, variation in allelic fitness expands the range of parameter values consistent with observations of TSP.


Genetics ◽  
2003 ◽  
Vol 163 (1) ◽  
pp. 429-446 ◽  
Author(s):  
Jinliang Wang ◽  
Michael C Whitlock

Abstract In the past, moment and likelihood methods have been developed to estimate the effective population size (Ne) on the basis of the observed changes of marker allele frequencies over time, and these have been applied to a large variety of species and populations. Such methods invariably make the critical assumption of a single isolated population receiving no immigrants over the study interval. For most populations in the real world, however, migration is not negligible and can substantially bias estimates of Ne if it is not accounted for. Here we extend previous moment and maximum-likelihood methods to allow the joint estimation of Ne and migration rate (m) using genetic samples over space and time. It is shown that, compared to genetic drift acting alone, migration results in changes in allele frequency that are greater in the short term and smaller in the long term, leading to under- and overestimation of Ne, respectively, if it is ignored. Extensive simulations are run to evaluate the newly developed moment and likelihood methods, which yield generally satisfactory estimates of both Ne and m for populations with widely different effective sizes and migration rates and patterns, given a reasonably large sample size and number of markers.
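For orientation, the single-population moment approach that this abstract extends can be sketched as follows. This is a minimal illustration of a standard temporal estimator (a standardized variance in allele frequency change, F, corrected for sampling noise), not the joint Ne and m estimator developed in the paper. The function names, argument names, and example frequencies are hypothetical.

```python
from typing import Sequence


def standardized_variance(p0: Sequence[float], pt: Sequence[float]) -> float:
    """Average standardized squared change in allele frequency between two samples."""
    terms = []
    for x, y in zip(p0, pt):
        denom = (x + y) / 2.0 - x * y
        if denom > 0:  # skip alleles that are absent or fixed in both samples
            terms.append((x - y) ** 2 / denom)
    if not terms:
        raise ValueError("no informative alleles")
    return sum(terms) / len(terms)


def temporal_ne(p0: Sequence[float], pt: Sequence[float],
                generations: float, s0: int, st: int) -> float:
    """Moment estimate of Ne for one isolated population (assumes no migration).

    s0 and st are the numbers of diploid individuals sampled at the two time points.
    """
    f = standardized_variance(p0, pt)
    # Subtract the part of F expected from finite sampling alone.
    drift = f - 1.0 / (2 * s0) - 1.0 / (2 * st)
    if drift <= 0:
        return float("inf")  # change is no larger than sampling noise
    return generations / (2.0 * drift)


if __name__ == "__main__":
    # Two alleles at one locus, sampled 5 generations apart, 50 individuals each time.
    print(temporal_ne([0.5, 0.5], [0.6, 0.4], generations=5, s0=50, st=50))
```

Because the estimate is inversely proportional to the drift signal, any allele frequency change contributed by immigrants is read as extra drift. This is the source of the bias that the abstract describes when migration is ignored.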


1992 ◽  
Vol 60 (3) ◽  
pp. 209-220 ◽  
Author(s):  
Joseph Felsenstein

Summary We would like to use maximum likelihood to estimate parameters such as the effective population size Ne, or, if we do not know mutation rates, the product 4Neμ of mutation rate per site and effective population size. To compute the likelihood for a sample of unrecombined nucleotide sequences taken from a random-mating population it is necessary to sum over all genealogies that could have led to the sequences, computing for each one the probability that it would have yielded the sequences, and weighting each one by its prior probability. The genealogies vary in tree topology and in branch lengths. Although the likelihood and the prior are straightforward to compute, the summation over all genealogies seems at first sight hopelessly difficult. This paper reports that it is possible to carry out a Monte Carlo integration to evaluate the likelihoods approximately. The method uses bootstrap sampling of sites to create data sets for each of which a maximum likelihood tree is estimated. The resulting trees are assumed to be sampled from a distribution whose height is proportional to the likelihood surface for the full data. That it will be so is dependent on a theorem which is not proven, but seems likely to be true if the sequences are not short. One can use the resulting estimated likelihood curve to make a maximum likelihood estimate of the parameter of interest, Ne or 4Neμ. The method requires at least 100 times the computational effort required for estimation of a phylogeny by maximum likelihood, but is practical on today's workstations. The method does not at present have any way of dealing with recombination.


2008 ◽  
Vol 90 (1) ◽  
pp. 119-128 ◽  
Author(s):  
STEPHEN I. WRIGHT ◽  
NARDIN NANO ◽  
JOHN PAUL FOXE ◽  
VAQAAR-UN NISA DAR

Summary Cytoplasmic genomes typically lack recombination, implying that genetic hitch-hiking could be a predominant force structuring nucleotide polymorphism in the chloroplast and mitochondria. We test this hypothesis by analysing nucleotide polymorphism data at 28 loci across the chloroplast and mitochondria of the outcrossing plant Arabidopsis lyrata, and compare patterns with multiple nuclear loci, and the highly selfing Arabidopsis thaliana. The maximum likelihood estimate of the ratio of effective population size at cytoplasmic relative to nuclear genes in A. lyrata does not depart from the neutral expectation of 0·5. Similarly, the ratio of effective size in A. thaliana is close to unity, the neutral expectation for a highly selfing species. The results are thus consistent with neutral organelle polymorphism in these species or with comparable effects of hitch-hiking in both cytoplasmic and nuclear genes, in contrast to the results of recent studies on gynodioecious taxa. The four-gamete test and composite likelihood estimation provide evidence for very low levels of recombination in the organelles of A. lyrata, although permutation tests do not suggest that adjacent polymorphic sites are more closely linked than more distant sites across the two genomes, suggesting that mutation hotspots or very low rates of gene conversion could explain the data.
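The two neutral expectations quoted in this abstract (0·5 for the outcrosser, unity for the selfer) can be recovered from a standard argument, sketched here under assumptions the abstract does not state: a hermaphroditic population of N individuals, uniparentally inherited and effectively haploid organelles, and a nuclear effective size reduced by the inbreeding coefficient F to N/(1+F). The symbols N and F are introduced for illustration only.

```latex
% Effective numbers of gene copies under neutrality:
%   nuclear (diploid, biparental):      2N / (1 + F)
%   cytoplasmic (haploid, uniparental): N
\[
\frac{N_{e,\mathrm{cyt}}}{N_{e,\mathrm{nuc}}}
  = \frac{N}{2N/(1+F)}
  = \frac{1+F}{2},
\qquad
F = 0 \;\Rightarrow\; \tfrac{1}{2},
\qquad
F = 1 \;\Rightarrow\; 1 .
\]
```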


2012 ◽  
Vol 9 (73) ◽  
pp. 1797-1808 ◽  
Author(s):  
Eric de Silva ◽  
Neil M. Ferguson ◽  
Christophe Fraser

Using sequence data to infer population dynamics is playing an increasing role in the analysis of outbreaks. The most common methods in use, based on coalescent inference, have been widely used but not extensively tested against simulated epidemics. Here, we use simulated data to test the ability of both parametric and non-parametric methods for inference of effective population size (coded in the popular BEAST package) to reconstruct epidemic dynamics. We consider a range of simulations centred on scenarios considered plausible for pandemic influenza, but our conclusions are generic for any exponentially growing epidemic. We highlight systematic biases in non-parametric effective population size estimation. The most prominent such bias leads to the false inference of slowing of epidemic spread in the recent past even when the real epidemic is growing exponentially. We suggest some sampling strategies that could reduce (but not eliminate) some of the biases. Parametric methods can correct for these biases if the infected population size is large. We also explore how some poor sampling strategies (e.g. that over-represent epidemiologically linked clusters of cases) could dramatically exacerbate bias in an uncontrolled manner. Finally, we present a simple diagnostic indicator, based on coalescent density and which can easily be applied to reconstructed phylogenies, that identifies time-periods for which effective population size estimates are less likely to be biased. We illustrate this with an application to the 2009 H1N1 pandemic.


Sensors ◽  
2021 ◽  
Vol 21 (14) ◽  
pp. 4815
Author(s):  
Pavel Kulmon

This paper deals with bistatic track association and deghosting in the classical frequency modulation (FM)-based multi-static primary surveillance radar (MSPSR). The main contribution of this paper is a novel algorithm for bistatic track association and deghosting. The proposed algorithm is based on a hierarchical model which uses the Indian buffet process (IBP) as the prior probability distribution for the association matrix. The inference of the association matrix is then performed using the classical reversible jump Markov chain Monte Carlo (RJMCMC) algorithm with the usage of a custom set of the moves proposed by the sampler. A detailed description of the moves together with the underlying theory and the whole model is provided. Using the simulated data, the algorithm is compared with the two alternative ones and the results show the significantly better performance of the proposed algorithm in such a simulated setup. The simulated data are also used for the analysis of the properties of Markov chains produced by the sampler, such as the convergence or the posterior distribution. At the end of the paper, further research on the proposed method is outlined.


Genetics ◽  
2001 ◽  
Vol 158 (2) ◽  
pp. 885-896 ◽  
Author(s):  
Rasmus Nielsen ◽  
John Wakeley

Abstract A Markov chain Monte Carlo method for estimating the relative effects of migration and isolation on genetic diversity in a pair of populations from DNA sequence data is developed and tested using simulations. The two populations are assumed to be descended from a panmictic ancestral population at some time in the past and may (or may not) after that be connected by migration. The use of a Markov chain Monte Carlo method allows the joint estimation of multiple demographic parameters in either a Bayesian or a likelihood framework. The parameters estimated include the migration rate for each population, the time since the two populations diverged from a common ancestral population, and the relative size of each of the two current populations and of the common ancestral population. The results show that even a single nonrecombining genetic locus can provide substantial power to test the hypothesis of no ongoing migration and/or to test models of symmetric migration between the two populations. The use of the method is illustrated in an application to mitochondrial DNA sequence data from a fish species: the threespine stickleback (Gasterosteus aculeatus).

