Evaluation of methods for the inference of ancestral recombination graphs

2021 ◽  
Author(s):  
Debora Y C Brandt ◽  
Xinzhu Wei ◽  
Yun Deng ◽  
Andrew H. Vaughn ◽  
Rasmus Nielsen

The ancestral recombination graph (ARG) is a structure that describes the joint genealogies of sampled DNA sequences along the genome. Recent computational methods have made impressive progress towards scalably estimating whole-genome genealogies. In addition to inferring the ARG, some of these methods can also provide ARGs sampled from a defined posterior distribution. Obtaining good samples of ARGs is crucial for quantifying statistical uncertainty and for estimating population genetic parameters such as effective population size, mutation rate, and allele age. Here, we use simulations to benchmark three popular ARG inference programs: ARGweaver, Relate, and tsdate. We use neutral coalescent simulations to 1) compare the true coalescence times to the inferred times at each locus; 2) compare the distribution of coalescence times across all loci to the expected exponential distribution; 3) evaluate whether the sampled coalescence times have the properties expected of a valid posterior distribution. We find that inferred coalescence times at each locus are more accurate in ARGweaver and Relate than in tsdate. However, all three methods tend to overestimate small coalescence times and underestimate large ones. Lastly, the posterior distribution of ARGweaver is closer to the expected posterior distribution than Relate's, but this higher accuracy comes at a substantial trade-off in scalability. The best choice of method will depend on the number and length of input sequences and on the goal of downstream analyses, and we provide guidelines for the best practices.
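The second comparison in the abstract above (inferred coalescence times against the expected exponential distribution) can be illustrated with a minimal sketch. It is not the authors' benchmark pipeline. It assumes pairwise coalescence times in a constant-size diploid population, which are exponential with mean 2Ne generations, and it uses a hypothetical "shrunk" estimator that overestimates small times and underestimates large ones. All function names are illustrative.

```python
import math
import random


def simulate_pairwise_times(n_loci, ne, seed=1):
    """Draw pairwise coalescence times (in generations) for a constant-size
    diploid population: exponential with mean 2 * ne."""
    rng = random.Random(seed)
    return [rng.expovariate(1.0 / (2 * ne)) for _ in range(n_loci)]


def ks_statistic_exponential(times, mean):
    """Kolmogorov-Smirnov distance between the empirical CDF of `times`
    and the CDF of an exponential distribution with the given mean."""
    xs = sorted(times)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = 1.0 - math.exp(-x / mean)
        # Compare the theoretical CDF with the empirical step just after and just before x.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


ne = 10_000
true_times = simulate_pairwise_times(5_000, ne)

# Hypothetical biased estimates: every time is pulled halfway towards the mean,
# so small times are overestimated and large times are underestimated.
shrunk_times = [0.5 * t + 0.5 * (2 * ne) for t in true_times]

d_true = ks_statistic_exponential(true_times, 2 * ne)
d_shrunk = ks_statistic_exponential(shrunk_times, 2 * ne)
print(f"KS distance, true times:   {d_true:.3f}")
print(f"KS distance, shrunk times: {d_shrunk:.3f}")
```

The true times sit close to the exponential expectation, whereas the compressed estimates show a large distance, which is the kind of departure the benchmark is designed to detect.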

Genetics ◽  
2000 ◽  
Vol 156 (3) ◽  
pp. 1427-1436 ◽  
Author(s):  
Lada Markovtsova ◽  
Paul Marjoram ◽  
Simon Tavaré

Abstract We describe a Markov chain Monte Carlo approach for assessing the role of site-to-site rate variation in the analysis of within-population samples of DNA sequences using the coalescent. Our framework is a Bayesian one. We discuss methods for assessing the goodness-of-fit of these models, as well as problems concerning the separate estimation of effective population size and mutation rate. Using a mitochondrial data set for illustration, we show that ancestral inference concerning coalescence times can be dramatically affected if rate variation is ignored.


Genetics ◽  
1994 ◽  
Vol 136 (2) ◽  
pp. 685-692 ◽  
Author(s):  
Y X Fu

Abstract A new estimator of the essential parameter θ = 4Neμ from DNA polymorphism data is developed under the neutral Wright-Fisher model without recombination and population subdivision, where Ne is the effective population size and μ is the mutation rate per locus per generation. The new estimator has a variance only slightly larger than the minimum variance of all possible unbiased estimators of the parameter and is substantially smaller than that of any existing estimator. The high efficiency of the new estimator is achieved by making full use of phylogenetic information in a sample of DNA sequences from a population. An example of estimating θ by the new method is presented using the mitochondrial sequences from an American Indian population.


Genetics ◽  
1994 ◽  
Vol 138 (1) ◽  
pp. 227-234 ◽  
Author(s):  
D L Hartl ◽  
E N Moriyama ◽  
S A Sawyer

Abstract The patterns of nonrandom usage of synonymous codons (codon bias) in enteric bacteria were analyzed. Poisson random field (PRF) theory was used to derive the expected distribution of frequencies of nucleotides differing from the ancestral state at aligned sites in a set of DNA sequences. This distribution was applied to synonymous nucleotide polymorphisms and amino acid polymorphisms in the gnd and putP genes of Escherichia coli. For the gnd gene, the average intensity of selection against disfavored synonymous codons was estimated as approximately 7.3 × 10^-9; this value is significantly smaller than the estimated selection intensity against selectively disfavored amino acids in observed polymorphisms (2.0 × 10^-8), but it is approximately of the same order of magnitude. The selection coefficients for optimal synonymous codons estimated from PRF theory were consistent with independent estimates based on codon usage for threonine and glycine. Across 118 genes in E. coli and Salmonella typhimurium, the distribution of estimated selection coefficients, expressed as multiples of the effective population size, has a mean and standard deviation of 0.5 ± 0.4. No significant differences were found in the degree of codon bias between conserved positions and replacement positions, suggesting that translational misincorporation is not an important selective constraint among synonymous polymorphic codons in enteric bacteria. However, across the first 100 codons of the genes, conserved amino acids with identical codons have significantly greater codon bias than that of either synonymous or nonidentical codons, suggesting that there are unique selective constraints, perhaps including mRNA secondary structures, in this part of the coding region.


Author(s):  
Asher D. Cutter

Chapter 3, “Quantifying genetic variation at the molecular level,” introduces quantitative methods for measuring variation directly in DNA sequences to help decipher fundamental properties of populations and what they can tell us about evolution. It provides an overview of the evolutionary factors that contribute to genetic variation, like mutational input, effective population size, genetic drift, migration rate, and models of migration. This chapter surveys the principal ways to measure and summarize polymorphisms within a single population and across multiple populations of a species, including heterozygosity, nucleotide polymorphism estimators of θ, the site frequency spectrum, and FST, and provides illustrative natural examples. Populations are where evolution starts, after mutations arise as the spark of population genetic variation, and Chapter 3 describes how to quantify the variation to connect observations to predictions about how much polymorphism there ought to be under different circumstances.
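Two of the summary statistics named above can be computed directly from a site frequency spectrum. The sketch below is not taken from the chapter. It assumes an unfolded spectrum in which entry i counts the sites whose derived allele appears in i of n sampled sequences, and it uses the standard definitions of Watterson's estimator and of nucleotide diversity (the mean number of pairwise differences). Function names are illustrative.

```python
def watterson_theta(sfs):
    """Watterson's estimator: number of segregating sites divided by
    a_n = sum_{i=1}^{n-1} 1/i, where n is the sample size."""
    n = len(sfs) + 1
    a_n = sum(1.0 / i for i in range(1, n))
    return sum(sfs) / a_n


def nucleotide_diversity(sfs):
    """Mean number of pairwise differences: a site with derived-allele
    count i differs in i * (n - i) of the n * (n - 1) / 2 sequence pairs."""
    n = len(sfs) + 1
    pairs = n * (n - 1) / 2
    return sum(count * i * (n - i) for i, count in enumerate(sfs, start=1)) / pairs


# Illustrative spectrum for n = 5 sequences, chosen to follow the neutral
# expectation E[xi_i] = theta / i with theta = 12 (values are made up).
sfs = [12, 6, 4, 3]

theta_w = watterson_theta(sfs)
pi = nucleotide_diversity(sfs)
print(theta_w, pi)  # both are approximately 12 for this spectrum
```

When the spectrum follows the neutral expectation, the two estimators agree; an excess of rare or of intermediate-frequency variants pulls them apart, which is why they are usually reported together.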


2010 ◽  
Vol 61 (8) ◽  
pp. 918 ◽  
Author(s):  
Meaghan L. Rourke ◽  
Helen C. McPartlan ◽  
Brett A. Ingram ◽  
Andrea C. Taylor

Stocking wild fish populations with hatchery-bred fish has numerous genetic implications for fish species worldwide. In the present study, 16 microsatellite loci were used to determine the genetic effects of nearly three decades of Murray cod (Maccullochella peelii peelii) stocking in five river catchments in southern Australia. Genetic parameters taken from scale samples collected from 1949 to 1954 before the commencement of stocking were compared with samples collected 16 to 28 years after stocking commenced, and with samples from a local hatchery that supplements these catchments. Given that the five catchments are highly connected and adult Murray cod undertake moderate migrations, we predicted that there would be minimal population structuring of historical samples, whereas contemporary samples may have diverged slightly and lost genetic diversity as a result of stocking. A Bayesian Structure analysis indicated genetic homogeneity among the catchments both pre- and post-stocking, indicating that stocking has not measurably impacted genetic structure, although allele frequencies in one catchment changed slightly over this period. Current genetic diversity was moderately high (HE = 0.693) and had not changed over the period of stocking. Broodfish had a similar level of genetic diversity to the wild populations, and effective population size had not changed substantially between the two time periods. Our results may bode well for stocking programs of species that are undertaken without knowledge of natural genetic structure, when river connectivity is high, fish are moderately migratory and broodfish are sourced locally.


2016 ◽  
Vol 73 (9) ◽  
pp. 2178-2180 ◽  
Author(s):  
W. Stewart Grant ◽  
Einar Árnason ◽  
Bjarki Eldon

Abstract The analyses of often large amounts of field and laboratory data depend on computer programs to generate descriptive statistics and to test hypotheses. The algorithms in these programs are often complex and can be understood only with advanced training in mathematics and programming, topics that are beyond the capabilities of most fisheries biologists and empirical population geneticists. The backward-looking Kingman coalescent model, based on the classic forward-looking Wright–Fisher model of genetic change, is used in many genetics software programs to generate null distributions against which to test hypotheses. An article in this issue by Niwa et al. shows that the assumption of bifurcations at nodes in the Kingman coalescent model is inappropriate for highly fecund Japanese sardines, which have type III life histories. Species with this life history pattern are better modelled with multiple mergers at the nodes of a coalescent gene genealogy. However, only a few software programs allow analysis with multiple-merger coalescent models. This parameter misspecification produces demographic reconstructions that reach too far into the past and greatly overestimates genetically effective population sizes (the number of individuals actually contributing to the next generation). The results of Niwa et al. underline the need to understand the assumptions and model parameters in the software programs used to analyse DNA sequences.


2010 ◽  
Vol 107 (5) ◽  
pp. 2147-2152 ◽  
Author(s):  
Chad D. Huff ◽  
Jinchuan Xing ◽  
Alan R. Rogers ◽  
David Witherspoon ◽  
Lynn B. Jorde

The genealogies of different genetic loci vary in depth. The deeper the genealogy, the greater the chance that it will include a rare event, such as the insertion of a mobile element. Therefore, the genealogy of a region that contains a mobile element is on average older than that of the rest of the genome. In a simple demographic model, the expected time to most recent common ancestor (TMRCA) is doubled if a rare insertion is present. We test this expectation by examining single nucleotide polymorphisms around polymorphic Alu insertions from two completely sequenced human genomes. The estimated TMRCA for regions containing a polymorphic insertion is two times larger than the genomic average (P << 10^-30), as predicted. Because genealogies that contain polymorphic mobile elements are old, they are shaped largely by the forces of ancient population history and are insensitive to recent demographic events, such as bottlenecks and expansions. Remarkably, the information in just two human DNA sequences provides substantial information about ancient human population size. By comparing the likelihood of various demographic models, we estimate that the effective population size of human ancestors living before 1.2 million years ago was 18,500, and we can reject all models where the ancient effective population size was larger than 26,000. This result implies an unusually small population for a species spread across the entire Old World, particularly in light of the effective population sizes of chimpanzees (21,000) and gorillas (25,000), which each inhabit only one part of a single continent.
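The doubling of the expected TMRCA can be checked numerically under one reading of the "simple demographic model". The sketch below is an illustration, not the authors' analysis. It assumes a pair of sequences in a constant-size population, so that the TMRCA is exponentially distributed, and an insertion rate low enough that the chance of an insertion on a genealogy is proportional to its depth. Conditioning on an insertion then weights each genealogy by its own depth, giving E[T | insertion] = E[T^2] / E[T] = 2 E[T].

```python
import random


def tmrca_ratio(n_draws=200_000, seed=7):
    """Ratio of the mean TMRCA at loci carrying a rare insertion to the
    genome-wide mean TMRCA, with time measured in units of the expected
    pairwise coalescence time (so the unconditional mean is 1)."""
    rng = random.Random(seed)
    depths = [rng.expovariate(1.0) for _ in range(n_draws)]

    genome_mean = sum(depths) / n_draws
    # Each locus is weighted by its depth, because the probability of a
    # rare insertion is assumed proportional to the depth of the genealogy.
    insertion_mean = sum(t * t for t in depths) / sum(depths)
    return insertion_mean / genome_mean


ratio = tmrca_ratio()
print(f"mean TMRCA with insertion / genome-wide mean: {ratio:.2f}")
```

The ratio comes out close to 2. The factor depends on the assumed distribution of depths, so a different demographic history would change it.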


Genetics ◽  
1973 ◽  
Vol 75 (4) ◽  
pp. 709-726
Author(s):  
J J Rutledge ◽  
E J Eisen ◽  
J E Legates

ABSTRACT Heritability and genetic correlations realized from both single-trait and antagonistic index selection were compared with paternal half-sib estimates. Primary attention was focused on the genetic correlation between six-week body weight and six-week tail length. Parameters realized from single-trait selection were in excellent agreement with paternal half-sib estimates. However, the realized genetic correlation between six-week body weight and six-week tail length obtained from index selection was significantly greater than the other estimates. Differential inbreeding levels and realized selection intensities were considered and rejected as being causative factors for these results. Linkage disequilibrium probably was not a factor either, as the base population had been randomly mated and randomly selected with a large effective population size for many generations. It was concluded that with antagonistic index selection, the pleiotropic effects of genes may be more powerful in retarding response in aggregate genotype than current theory would suggest. Replication of all selected and control lines allowed the use of between-line estimators of sampling variances of realized genetic parameters in the above comparisons. Generally, standard errors of realized genetic parameters were much smaller than corresponding paternal half-sib standard errors. Thus, selection was an efficient method of estimation.


2019 ◽  
Author(s):  
Dominic Nelson ◽  
Jerome Kelleher ◽  
Aaron P. Ragsdale ◽  
Gil McVean ◽  
Simon Gravel

Abstract Coalescent simulations are widely used to examine the effects of evolution and demographic history on the genetic makeup of populations. Thanks to recent progress in algorithms and data structures, simulators such as the widely-used msprime [1] now provide genome-wide simulations for millions of individuals. However, this software relies on classic coalescent theory and the corresponding assumptions that sample sizes are small relative to effective population size and that the region being simulated is short. Here we show that coalescent simulations of long regions of the genome exhibit large biases in identity-by-descent (IBD), long-range linkage disequilibrium (LD), and ancestry patterns, particularly when sample size is large. We present a Wright-Fisher extension to msprime, and show that it produces more realistic distributions of IBD, LD, and ancestry proportions, while also addressing more subtle biases of the coalescent. Further, these extensions are more computationally efficient than state-of-the-art coalescent simulations when simulating long regions, including whole-genome data. For shorter regions, efficiency and accuracy can be maintained via a flexible hybrid model which simulates the recent past under the Wright-Fisher model and uses coalescent simulations in the distant past.


2021 ◽  
Author(s):  
Simon Boitard ◽  
Armando Arredondo ◽  
Camille Noûs ◽  
Lounes Chikhi ◽  
Olivier Mazet

The relative contribution of selection and neutrality in shaping species genetic diversity is one of the most central and controversial questions in evolutionary theory. Genomic data provide growing evidence that linked selection, i.e. the modification of genetic diversity at neutral sites through linkage with selected sites, might be pervasive over the genome. Several studies proposed that linked selection could be modelled as first approximation by a local reduction (e.g. purifying selection, selective sweeps) or increase (e.g. balancing selection) of effective population size (Ne). At the genome-wide scale, this leads to a large variance of Ne from one region to another, reflecting the heterogeneity of selective constraints and recombination rates between regions. We investigate here the consequences of this variation of Ne on the genome-wide distribution of coalescence times. The underlying motivation concerns the impact of linked selection on demographic inference, because the distribution of coalescence times is at the heart of several important demographic inference approaches. Using the concept of Inverse Instantaneous Coalescence Rate, we demonstrate that in a panmictic population, linked selection always results in a spurious apparent decrease of Ne along time. Balancing selection has a particularly large effect, even when it concerns a very small part of the genome. We quantify the expected magnitude of the spurious decrease of Ne in humans and Drosophila melanogaster, based on Ne distributions inferred from real data in these species. We also find that the effect of linked selection can be significantly reduced by that of population structure.
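The spurious decrease can be reproduced with a small numerical sketch. It is an illustration under simplifying assumptions, not the authors' code. The genome is split into a few classes, each with its own constant local Ne in a panmictic population. Time and sizes share the same units, so the pairwise coalescence rate in class i is 1 / sizes[i]. The IICR is taken as P(T2 > t) / f(t) for the genome-wide mixture of pairwise coalescence times. The class sizes and weights are hypothetical.

```python
import math


def iicr_mixture(t, sizes, weights):
    """IICR(t) = P(T2 > t) / f_T2(t) when a fraction weights[i] of the
    genome has pairwise coalescence rate 1 / sizes[i]."""
    survival = sum(w * math.exp(-t / n) for n, w in zip(sizes, weights))
    density = sum(w / n * math.exp(-t / n) for n, w in zip(sizes, weights))
    return survival / density


sizes = [5_000.0, 10_000.0, 50_000.0]  # hypothetical local Ne classes
weights = [0.3, 0.6, 0.1]              # hypothetical genome fractions
times = [0, 5_000, 20_000, 100_000, 500_000]

curve = [iicr_mixture(t, sizes, weights) for t in times]
for t, value in zip(times, curve):
    print(f"t = {t:>7}: IICR = {value:10.1f}")
```

At t = 0 the IICR equals the weighted harmonic mean of the local sizes, and it rises towards the largest class size as t grows, because regions with small Ne coalesce first and drop out of the mixture. Read forward in time, this is an apparent decline in Ne even though every class has a constant size.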

