Unbiased Estimation of Linkage Disequilibrium from Unphased Data

2019
Vol 37 (3)
pp. 923-932
Author(s):
Aaron P Ragsdale
Simon Gravel

Abstract Linkage disequilibrium (LD) is used to infer evolutionary history, to identify genomic regions under selection, and to dissect the relationship between genotype and phenotype. In each case, we require accurate estimates of LD statistics from sequencing data. Unphased data present a challenge because multilocus haplotypes cannot be inferred exactly. Widely used estimators for the common statistics r² and D² exhibit large and variable upward biases that complicate interpretation and comparison across cohorts. Here, we show how to find unbiased estimators for a wide range of two-locus statistics, including D², for both single and multiple randomly mating populations. These unbiased statistics are particularly well suited to estimate effective population sizes from unlinked loci in small populations. We develop a simple inference pipeline and use it to refine estimates of recent effective population sizes of the threatened Channel Island Fox populations.
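The upward bias mentioned above can be illustrated with a minimal sketch. It is not the authors' estimator: it uses the standard composite (genotype covariance) estimate of D, which assumes random mating so that the covariance of allele dosages at two loci equals 2D. The function names `composite_d` and `naive_d2` are hypothetical.

```python
import numpy as np


def composite_d(g_a, g_b):
    """Estimate D from unphased genotype dosages (0, 1 or 2 copies of an allele).

    Assumption: random mating, so that Cov(g_a, g_b) = 2 * D.
    """
    g_a = np.asarray(g_a, dtype=float)
    g_b = np.asarray(g_b, dtype=float)
    # The sample covariance with an n - 1 denominator is unbiased for Cov(g_a, g_b).
    cov = np.cov(g_a, g_b, ddof=1)[0, 1]
    return cov / 2.0


def naive_d2(g_a, g_b):
    """Square the estimate of D.

    E[D_hat**2] = D**2 + Var(D_hat), so this overestimates D**2 on average,
    and the excess grows as the sample gets smaller.
    """
    return composite_d(g_a, g_b) ** 2
```

Because the excess is the sampling variance of the estimate of D, it depends on sample size, which is one reason naive values are hard to compare across cohorts of different sizes.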

2019
Author(s):
Aaron P. Ragsdale
Simon Gravel

Abstract Linkage disequilibrium is used to infer evolutionary history and to identify regions under selection or associated with a given trait. In each case, we require accurate estimates of linkage disequilibrium from sequencing data. Unphased data present a challenge because the co-occurrence of alleles at different loci is ambiguous. Commonly used estimators for the common statistics r² and D² exhibit large and variable upward biases that complicate interpretation and comparison across cohorts. Here, we show how to find unbiased estimators for a wide range of two-locus statistics, including D², for both single and multiple randomly mating populations. These provide accurate estimates over three orders of magnitude in LD. We also use these estimators to construct an estimator for r² that is less biased than commonly used estimators, but nevertheless argue for using rather than r² for population size estimates.


2021
Author(s):
Irene Novo
Armando Caballero
Enrique Santiago

The effective population size (Ne) is a key parameter to quantify the magnitude of genetic drift and inbreeding, with important implications in human evolution. The increasing availability of high-density genetic markers allows the estimation of historical changes in Ne across time using measures of genome diversity or linkage disequilibrium between markers. Selection is expected to reduce diversity and Ne, and this reduction is modulated by the heterogeneity of the genome in terms of recombination rate. Here we investigate by computer simulations the consequences of selection (both positive and negative) and of recombination rate heterogeneity in the estimation of historical Ne. We also investigate the relationship between diversity parameters and Ne across the different regions of the genome using human marker data. We show that the estimates of historical Ne obtained from linkage disequilibrium between markers (NeLD) are virtually unaffected by selection. In contrast, estimates obtained by coalescence mutation-recombination-based methods can be strongly affected by it, which could have important consequences for the estimation of human demography. The simulation results are supported by the analysis of human data. The estimates of NeLD obtained for particular genomic regions do not correlate with recombination rate, nucleotide diversity, polymorphism, background selection statistic, minor allele frequency of SNPs, loss of function and missense variants, or gene density. This suggests that NeLD measures are merely indicative of demographic changes in population size across generations.


2020
Vol 37 (12)
pp. 3642-3653
Author(s):
Enrique Santiago
Irene Novo
Antonio F Pardiñas
María Saura
Jinliang Wang
...  

Abstract Inferring changes in effective population size (Ne) in the recent past is of special interest for conservation of endangered species and for human history research. Current methods for estimating the very recent historical Ne are unable to detect complex demographic trajectories involving multiple episodes of bottlenecks, drops, and expansions. We develop a theoretical and computational framework to infer the demographic history of a population within the past 100 generations from the observed spectrum of linkage disequilibrium (LD) of pairs of loci over a wide range of recombination rates in a sample of contemporary individuals. The cumulative contributions of all of the previous generations to the observed LD are included in our model, and a genetic algorithm is used to search for the sequence of historical Ne values that best explains the observed LD spectrum. The method can be applied from large samples to samples of fewer than ten individuals using a variety of genotyping and DNA sequencing data: haploid, diploid with phased or unphased genotypes and pseudohaploid data from low-coverage sequencing. The method was tested by computer simulation for sensitivity to genotyping errors, temporal heterogeneity of samples, population admixture, and structural division into subpopulations, showing high tolerance to deviations from the assumptions of the model. Computer simulations also show that the proposed method outperforms other leading approaches when the inference concerns recent timeframes. Analysis of data from a variety of human and animal populations gave results in agreement with previous estimations by other methods or with records of historical events.


2020
Vol 12 (12)
pp. 2441-2449
Author(s):
Jennifer James
Adam Eyre-Walker

Abstract What determines the level of genetic diversity of a species remains one of the enduring problems of population genetics. Because neutral diversity depends upon the product of the effective population size and mutation rate, there is an expectation that diversity should be correlated to measures of census population size. This correlation is often observed for nuclear but not for mitochondrial DNA. Here, we revisit the question of whether mitochondrial DNA sequence diversity is correlated to census population size by compiling the largest data set to date, using 639 mammalian species. In a multiple regression, we find that nucleotide diversity is significantly correlated to both range size and mass-specific metabolic rate, but not a variety of other factors. We also find that a measure of the effective population size, the ratio of nonsynonymous to synonymous diversity, is also significantly negatively correlated to both range size and mass-specific metabolic rate. These results together suggest that species with larger ranges have larger effective population sizes. The slope of the relationship between diversity and range is such that doubling the range increases diversity by 12–20%, providing one of the first quantifications of the relationship between diversity and the census population size.
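As a worked reading of the last figure, the sketch below converts the "12–20% per doubling" statement into a slope. It assumes, hypothetically, that diversity follows a power law in range size (diversity = k × range^b), which is what a constant percentage gain per doubling implies; the abstract does not state the model, and the function name `power_law_exponent` is illustrative only.

```python
import math


def power_law_exponent(fold_increase_per_doubling):
    """Exponent b in diversity = k * range**b.

    Doubling the range multiplies diversity by 2**b, so b is the base-2 log
    of the fold increase observed per doubling.
    """
    return math.log2(fold_increase_per_doubling)


low = power_law_exponent(1.12)   # a 12% increase per doubling
high = power_law_exponent(1.20)  # a 20% increase per doubling
print(f"{low:.3f} {high:.3f}")   # prints "0.163 0.263"
```

Under that assumption, diversity grows far more slowly than range: an exponent of about 0.16 to 0.26 means a tenfold larger range corresponds to only about 1.5 to 1.8 times the diversity.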


1985
Vol 17 (1)
pp. 97-106
Author(s):  
John H. Relethford

Summary A method is presented for examining the relationship between effective population size and accumulated random inbreeding in human populations. For a set of populations, the inverse of inbreeding is regressed on effective population size using a linear regression model. This procedure allows testing of several hypotheses regarding the common and unique influences on population structure. Deviations from the expected curve suggest demographic or historical change. This method is applied to surname data from nine Irish isolates. The results show that the method is very useful in assessing differential influences on population structure.
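The regression described above can be sketched as follows. This is an illustration, not the paper's procedure: the function name `fit_inverse_inbreeding` is hypothetical, and the motivation is the textbook drift approximation, under which inbreeding after t generations is roughly t/(2N) when small, so that 1/F is roughly linear in N.

```python
import numpy as np


def fit_inverse_inbreeding(ne, f):
    """Fit a least-squares line for 1/F against effective population size.

    Returns the slope, the intercept and the per-population residuals.
    Large residuals mark populations that depart from the common trend.
    """
    ne = np.asarray(ne, dtype=float)
    inv_f = 1.0 / np.asarray(f, dtype=float)
    # A degree-1 polyfit returns the coefficients as (slope, intercept).
    slope, intercept = np.polyfit(ne, inv_f, 1)
    residuals = inv_f - (intercept + slope * ne)
    return slope, intercept, residuals
```

Calling it with one effective size and one inbreeding coefficient per population gives the common trend through the slope and intercept, and the unique influences through the residuals.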


Genetics
1988
Vol 120 (4)
pp. 1043-1051
Author(s):
Z Smit-McBride
A Moya
F J Ayala

Abstract We have studied linkage disequilibrium in Drosophila melanogaster in two samples from a wild population and in four large laboratory populations derived from the wild samples. We have assayed four polymorphic enzyme loci, fairly closely linked in the third chromosome: Sod, Est-6, Pgm, and Odh. The assay method used allows us to identify the allele associations separately in each of the two homologous chromosomes from each male sampled. We have detected significant linkage disequilibrium between two loci in 16.7% of the cases in the wild samples and in 27.8% of the cases in the experimental populations, considerably more than would be expected by chance alone. We have also found three-locus disequilibria in more instances than would be expected by chance. Some disequilibria present in the wild samples disappear in the experimental populations derived from them, but new ones appear over the generations. The effective population sizes required to generate the observed disequilibria by randomness range from 40 to more than 60,000 individuals in the natural population, depending on which locus pair is considered, and from 100 to more than 60,000 in the experimental populations. These population sizes are unrealistic; the fact that different locus pairs yield disparate estimates within the same population argues against the likelihood that the disequilibria may have arisen as a consequence of population bottlenecks. Migration, or population mixing, cannot be excluded as the process generating the disequilibria in the wild samples, but can in the experimental populations. We conclude that linkage disequilibrium in these populations is most likely due to natural selection acting on the allozymes, or on loci very tightly linked to them.


2016
Author(s):
Daniel L. McCartney
Rosie M. Walker
Stewart W. Morris
Andrew M. McIntosh
David J. Porteous
...  

Abstract Genome-wide analysis of DNA methylation has now become a relatively inexpensive technique thanks to array-based methylation profiling technologies. The recently developed Illumina Infinium MethylationEPIC BeadChip interrogates methylation at over 850,000 sites across the human genome, covering 99% of RefSeq genes. This array supersedes the widely used Infinium HumanMethylation450 BeadChip, which has permitted insights into the relationship between DNA methylation and a wide range of conditions and traits. Previous research has identified issues with certain probes on both the HumanMethylation450 BeadChip and its predecessor, the Infinium HumanMethylation27 BeadChip, which were predicted to affect array performance. These issues concerned probe-binding specificity and the presence of polymorphisms at target sites. Using in silico methods, we have identified probes on the Infinium MethylationEPIC BeadChip that are predicted to (i) measure methylation at polymorphic sites and (ii) hybridise to multiple genomic regions. We intend these resources to be used for quality control procedures when analysing data derived from this platform.


Animals
2019
Vol 9 (3)
pp. 83
Author(s):
Lei Xu
Bo Zhu
Zezhao Wang
Ling Xu
Ying Liu
...  

Understanding linkage disequilibrium (LD) across the genome, haplotype structure, and persistence of phase between breeds can enable us to appropriately design and implement genome-wide association studies (GWAS) and genomic selection (GS) in beef cattle. We estimated the extent of genome-wide LD, haplotype block structure, and the persistence of phase in 10 Chinese cattle populations using the high-density BovineHD BeadChip. The overall LD measured by r² between adjacent SNPs was 0.60, 0.67, 0.58, 0.73, and 0.71 for South Chinese cattle (SCHC), North Chinese cattle (NCC), Southwest Chinese cattle (SWC), Simmental (SIM), and Wagyu (WAG), respectively. The highest correlation (0.53) for persistence of phase across groups was observed for SCHC vs. SWC at distances of 0–50 kb, while the lowest correlation was 0.13 for SIM vs. SCHC at the same distances. In addition, the estimated current effective population sizes were 27, 14, 31, 34, and 43 for SCHC, NCC, SWC, SIM, and WAG, respectively. Our results showed that 58K, 87K, 95K, 52K, and 52K markers were required for implementation of GWAS and GS in SCHC, NCC, SWC, SIM, and WAG, respectively. Our findings also suggest that implementing genomic selection across multiple populations with high persistence of phase is feasible for Chinese cattle.
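The abstract does not say which formula was used to turn LD into effective population size. As a hedged illustration only, the sketch below uses Sved's classical approximation, E[r²] ≈ 1/(1 + 4·Ne·c), where c is the recombination distance in Morgans. The function name `ne_from_r2` is hypothetical and no sample-size correction is applied.

```python
def ne_from_r2(mean_r2, c_morgans):
    """Solve Sved's approximation E[r^2] = 1 / (1 + 4 * Ne * c) for Ne.

    mean_r2   -- mean r^2 among marker pairs at a given genetic distance
    c_morgans -- that genetic distance in Morgans (must be positive)
    """
    if not 0.0 < mean_r2 <= 1.0:
        raise ValueError("mean_r2 must lie in (0, 1]")
    if c_morgans <= 0.0:
        raise ValueError("c_morgans must be positive")
    return (1.0 / mean_r2 - 1.0) / (4.0 * c_morgans)
```

For example, a mean r² of 0.2 between markers 0.01 Morgans apart gives Ne = (5 − 1) / 0.04 = 100 under this approximation. LD at short distances reflects older generations and LD at long distances more recent ones, so the distance bin chosen determines which period the estimate describes.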


Genes
2020
Vol 11 (5)
pp. 577
Author(s):
Huiwen Zhan
Saixian Zhang
Kaili Zhang
Xia Peng
Shengsong Xie
...  

Investigating the patterns of homozygosity, linkage disequilibrium, effective population size and inbreeding coefficients in livestock contributes to our understanding of genetic diversity and evolutionary history. Here we used the Illumina PorcineSNP50 BeadChip to identify runs of homozygosity (ROH) and estimate linkage disequilibrium (LD) across the whole genome, and then to predict the effective population size. In addition, we calculated the inbreeding coefficients based on ROH in 305 Piétrain pigs and compared them with two other types of inbreeding coefficients obtained by different calculation methods. A total of 23,434 ROHs were detected, and the average length of ROH per individual was about 507.27 Mb. The ROHs showed no regular pattern in their distribution across the genome. Comparisons among the different categories suggested that the formation of long ROH was probably related to recent inbreeding events. Although the density of genes located in ROH core regions is lower than that in other genomic regions, most of them are related to Piétrain commercial traits such as meat quality. Overall, the results provide insight into the way in which ROH is produced, and the identified ROH core regions can be used to map the genes associated with commercial traits in domestic animals.

