A genealogical estimate of genetic relationships

2021
Author(s): Caoqi Fan, Nicholas Mancuso, Charleston W.K. Chiang

The application of genetic relationships among individuals, characterized by a genetic relationship matrix (GRM), has far-reaching effects in human genetics. However, the current standard method for calculating the GRM generally does not take advantage of linkage information and does not reflect the underlying genealogical history of the study sample. Here, we propose a coalescent-informed framework to infer the expected relatedness between pairs of individuals, summarized as an expected GRM (eGRM), given an ancestral recombination graph (ARG) of the sample. Through extensive simulations, we show that the eGRM is an unbiased estimate of latent pairwise genome-wide relatedness and is robust when computed using genealogies inferred from incomplete genetic data. As a result, the eGRM better captures the structure of a population than the canonical GRM, even when using the same genetic information. More importantly, our framework allows a principled approach to estimating the eGRM at different time depths of the ARG, thereby revealing the time-varying nature of population structure in a sample. When applied to genotyping data from a population sample from Northern and Eastern Finland, we find that clustering analysis using the eGRM reveals population structure driven by subpopulations that would not be apparent using the canonical GRM, and that, temporally, the population model is consistent with recent divergence and expansion. Taken together, our proposed eGRM provides a robust tree-centric estimate of relatedness with wide application to genetic studies.
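The contrast between the canonical GRM and a tree-based expectation can be sketched in Python with NumPy. The first function is the common standardized-genotype GRM. The second is a hypothetical branch-weighted analogue, assuming each tree branch is treated as a potential variant carried by its descendants and weighted by its length times the genomic span of its tree, with an optional time window standing in for "different time depths of the ARG". The function names, the `branches` tuple layout and this exact weighting are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np


def canonical_grm(genotypes):
    """Standardized-genotype GRM, A = Z Z^T / M.

    `genotypes` is an (individuals x SNPs) array of 0/1/2 allele counts.
    SNP j is centred and scaled with its sample frequency p_j:
    z_ij = (x_ij - 2 p_j) / sqrt(2 p_j (1 - p_j)).
    """
    X = np.asarray(genotypes, dtype=float)
    p = X.mean(axis=0) / 2.0
    keep = (p > 0) & (p < 1)              # drop monomorphic SNPs (zero variance)
    X, p = X[:, keep], p[keep]
    Z = (X - 2 * p) / np.sqrt(2 * p * (1 - p))
    return Z @ Z.T / Z.shape[1]


def branch_egrm(n, branches, t_min=0.0, t_max=np.inf):
    """Hypothetical branch-weighted relatedness for n haploid samples.

    Each branch is (descendants, t_lower, t_upper, span): the sample ids
    below it, the times of its lower and upper nodes, and the genomic span
    of its tree. The branch is treated as a potential biallelic variant with
    frequency p = |descendants| / n, weighted by clipped length * span
    (proportional to its expected mutation count at a constant rate).
    Only the part of the branch inside [t_min, t_max) is counted.
    """
    E = np.zeros((n, n))
    total = 0.0
    for desc, t_lo, t_hi, span in branches:
        length = min(t_hi, t_max) - max(t_lo, t_min)
        p = len(desc) / n
        if length <= 0 or p <= 0 or p >= 1:
            continue                      # outside the window or uninformative
        g = np.zeros(n)
        g[list(desc)] = 1.0               # 1 = sample carries this branch
        z = (g - p) / np.sqrt(p * (1 - p))
        w = length * span
        E += w * np.outer(z, z)
        total += w
    return E / total if total > 0 else E


# Toy tree ((0,1),(2,3)) covering one unit of sequence: leaves at t=0,
# internal nodes at t=1, root at t=2 (the root has no parent branch).
tree = [({0}, 0, 1, 1.0), ({1}, 0, 1, 1.0), ({2}, 0, 1, 1.0), ({3}, 0, 1, 1.0),
        ({0, 1}, 1, 2, 1.0), ({2, 3}, 1, 2, 1.0)]
full = branch_egrm(4, tree)               # all time depths
old = branch_egrm(4, tree, t_min=1.0)     # only branch segments older than t = 1
```

In the toy tree, the sibling entry `full[0, 1]` exceeds `full[0, 2]`, and restricting the window to older branches leaves only the split between the two pairs, so `old` becomes a clean +1/-1 block pattern.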

2016
Author(s): Lana S. Martin, Eleazar Eskin

A genome-wide association study (GWAS) seeks to identify genetic variants that contribute to the development and progression of a specific disease. Over the past 10 years, new approaches using mixed models have emerged to mitigate the deleterious effects of population structure and relatedness in association studies. However, developing GWAS techniques to effectively test for association while correcting for population structure is a computational and statistical challenge. Using laboratory mouse strains as an example, our review characterizes the problem of population structure in association studies and describes how it can cause false positive associations. We then motivate mixed models in the context of unmodeled factors.
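To make the false-positive mechanism concrete, the following Python sketch simulates two subpopulations that differ in both allele frequency and mean trait value while the SNP itself has no effect. A naive single-SNP regression reports a strong association, which disappears once subpopulation membership is included as a covariate. For simplicity the structure here is known and modelled as a fixed effect; the mixed models discussed in the review instead absorb structure and relatedness through a random effect with a kinship covariance. Sample sizes, frequencies and effect sizes are arbitrary illustrative choices.

```python
import numpy as np


def ols_t(X, y, col):
    """t statistic of coefficient `col` in an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[col, col])
    return beta[col] / se


rng = np.random.default_rng(0)
n = 500                                     # individuals per subpopulation
pop = np.repeat([0, 1], n)                  # subpopulation label
freq = np.where(pop == 0, 0.1, 0.9)         # allele frequency differs by subpopulation
snp = rng.binomial(2, freq).astype(float)   # genotype with NO effect on the trait
trait = 1.0 * pop + rng.normal(size=2 * n)  # trait mean differs by subpopulation only

ones = np.ones(2 * n)
naive_t = ols_t(np.column_stack([ones, snp]), trait, col=1)
adjusted_t = ols_t(np.column_stack([ones, snp, pop]), trait, col=1)
```

The naive statistic is large purely because genotype and trait both track subpopulation; conditioning on `pop` leaves the SNP coefficient behaving like a null draw.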


2021
Author(s): Sudaraka Mallawaarachchi, Gerry Tonkin-Hill, Nicholas J. Croucher, Paul Turner, Doug Speed, ...

Advances in whole-genome genotyping and sequencing have allowed genome-wide analyses of association, prediction and heritability in many organisms. However, the application of such analyses to bacteria is still in its infancy, being limited by difficulties including the plasticity of bacterial genomes and their strong population structure. Here we propose a suite of genome-wide analyses for bacteria that combines methods from human genetics and previous bacterial studies, including linear mixed models, elastic net and LD-score regression. We introduce innovations such as frequency-based allele coding, testing for both insertion/deletion and nucleotide effects, and partitioning heritability by genome region. Using a previously published large cohort study, we analyse three phenotypes of a major human pathogen, Streptococcus pneumoniae, including the first analyses of minimum inhibitory concentrations (MIC) for each of two antibiotics, penicillin and ceftriaxone. We show that these are very highly heritable, leading to high prediction accuracy, which is explained by many genetic associations identified under good control of population structure effects. In the case of ceftriaxone MIC, these results are surprising because none of the isolates was resistant according to the inhibition zone diameter threshold. We estimate that just over half of the heritability of penicillin MIC is explained by a known drug-resistance region, which also contributes around a quarter of the heritability of ceftriaxone MIC. For the within-host survival phenotype, carriage duration, no reliable associations were found, but we observed moderate heritability and prediction accuracy, indicating a polygenic trait. While generating important new results for S. pneumoniae, we have critically assessed existing methods and introduced innovations that will be useful for future large-scale population genomics studies to help decipher the genetic architecture of bacterial traits.

Author summary
Genome-wide association, prediction and heritability analyses in bacteria are beginning to help unravel the genetic underpinnings of traits such as antimicrobial resistance, virulence, within-host survival and transmissibility. Progress to date is limited by challenges including the effects of strong population structure and variable recombination, and the many gaps in sequence alignments, including the absence of entire genes in many isolates. More work is required to critically assess and develop methods for bacterial genomics. We address this task here, using a range of existing methods from bacterial and human genetics, such as linear mixed models, elastic net and LD-score regression. We adapt these methods to introduce new analyses, including separate assessment of gap and nucleotide effects, a new allele coding for association analyses and a method to partition heritability into genome regions. We analyse within-host survival and two antimicrobial response traits of Streptococcus pneumoniae, identifying many novel associations while demonstrating good control of population structure and accurate prediction. We present both new results for an important pathogen and methodological advances that will be useful in guiding future studies in bacterial population genomics.


2014, Vol 31 (11), pp. 2929-2940
Author(s): Takehiro Sato, Shigeki Nakagome, Chiaki Watanabe, Kyoko Yamaguchi, Akira Kawaguchi, ...

animal, 2017, Vol 11 (10), pp. 1680-1688
Author(s): A. Kominakis, A.L. Hager-Theodorides, A. Saridaki, G. Antonakos, G. Tsiamis

1996, Vol 121 (5), pp. 783-788
Author(s): Jan Tivang, Paul W. Skroch, James Nienhuis, Neal De Vos

The magnitude of genetic differences among, and heterogeneity within, globe artichoke cultivars are unknown. Variation among individual heads (capitula) from three artichoke cultivars and two breeding populations was evaluated using RAPD markers. One vegetatively propagated cultivar ('Green Globe'), two seed-propagated cultivars ('Imperial Star' and 'Big Heart') and two breeding populations were examined. Each of the 27 RAPD primers revealed two to thirteen polymorphic bands, for a total of 178 scored bands. Variation was found within and among all cultivars and breeding populations, indicating that all five groups represent heterogeneous populations with respect to RAPD markers. The genetic relationships among individual genotypes were estimated using the ratio of discordant bands to total bands scored. Multidimensional scaling of the relationship matrix showed five independent clusters corresponding to the three cultivars and two breeding populations. The integrity of the five clusters was confirmed using pooled chi-squares for fragment homogeneity. Average gene diversity (Hs) was calculated for each population sample, and a one-way analysis of variance showed significant differences among populations. 'Big Heart' had an Hs value equivalent to those of the two breeding populations, while clonally propagated 'Green Globe' and seed-propagated 'Imperial Star' had the lowest Hs values. The RAPD heterogeneity observed within clonally propagated 'Green Globe' is consistent with the phenotypic variability observed for this cultivar. Overall, the results demonstrate the utility of the RAPD technique for evaluating genetic relationships and contrasting levels of genetic diversity among populations of artichoke genotypes.
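The relationship measure described here, the share of scored bands on which two individuals disagree, followed by multidimensional scaling, can be sketched in Python with NumPy. The abstract does not state which scaling variant was used, so classical (Torgerson) MDS is an assumption, and the band matrix is a toy example rather than study data.

```python
import numpy as np


def discordance_matrix(bands):
    """Pairwise ratio of discordant bands to total bands scored.

    `bands` is an (individuals x bands) 0/1 presence/absence matrix.
    """
    B = np.asarray(bands, dtype=int)
    # For each pair, count bands present in one individual but absent in the other.
    diff = (B[:, None, :] != B[None, :, :]).sum(axis=2)
    return diff / B.shape[1]


def classical_mds(D, k=2):
    """Classical (Torgerson) multidimensional scaling of a distance matrix."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centring matrix
    Bc = -0.5 * J @ (D ** 2) @ J              # double-centred squared distances
    vals, vecs = np.linalg.eigh(Bc)           # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]        # keep the k largest
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0, None))


# Toy band matrix: individuals 0 and 1 share most bands, as do 2 and 3.
bands = np.array([[1, 1, 0, 0], [1, 1, 0, 1], [0, 0, 1, 1], [0, 0, 1, 0]])
D = discordance_matrix(bands)
coords = classical_mds(D, k=2)
```

Negative eigenvalues, which appear when the band distance is not exactly Euclidean, are clipped to zero, so the plotted axes summarize rather than reproduce the distances.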


2019
Author(s): Jairui Li, Tomas Gonzalez, Julie D. White, Karlijne Indencleef, Hanne Hoskens, ...

Accurate inference of genomic ancestry is critically important in human genetics, epidemiology, and related fields. Geneticists today have access to multiple heterogeneous population-based datasets from studies collected under different protocols. Therefore, joint analyses of these datasets require robust and consistent inference of ancestry, and a common strategy is to construct an ancestry space from a reference dataset. However, such a strategy is sensitive to batch artefacts introduced by different protocols. In this work, we propose a novel robust genome-wide ancestry inference method, referred to as SUGIBS, based on an unnormalized genomic (UG) relationship matrix whose spectral (S) decomposition is generalized by an Identity-by-State (IBS) similarity degree matrix. SUGIBS robustly constructs an ancestry space from a single reference dataset and provides a robust projection of new samples from different studies. In experiments and simulations, we show that SUGIBS is robust against individual outliers and batch artefacts introduced by different genotyping protocols. The performance of SUGIBS is equivalent to that of the widely used principal component analysis (PCA) on normalized genotype data in revealing the underlying structure of an admixed population and in adjusting for false positive findings in a case-control admixed GWAS. We applied SUGIBS to the 1000 Genomes Project, as a reference, in combination with a large heterogeneous dataset containing auxiliary 3D facial images, to predict population-stratified average or ancestry faces. In addition, we projected eight ancient DNA profiles into the 1000 Genomes ancestry space and reconstructed their ancestry faces. Based on the visually strong and recognizable human facial phenotype, comprehensive facial illustrations of the populations embedded in the 1000 Genomes Project are provided. Furthermore, ancestry facial imaging has important applications in personalized and precision medicine along with forensic and archeological DNA phenotyping.

Author Summary
Estimates of individual-level genomic ancestry are routinely used in human genetics, epidemiology, and related fields. The analysis of population structure and genomic ancestry can yield significant insights into modern and ancient population dynamics, allowing us to address questions regarding the timing of admixture events and the numbers and identities of the parental source populations. Unrecognized or cryptic population structure is also an important confounder to correct for in genome-wide association studies (GWAS). However, to date, it remains challenging to work with heterogeneous datasets from multiple studies collected by different laboratories with diverse genotyping and imputation protocols. This work presents a new approach and an accompanying open-source software toolbox that facilitates robust integrative analysis of population structure and genomic ancestry for heterogeneous datasets. Given that visually evident and easily recognizable patterns of human facial characteristics covary with genomic ancestry, we can generate predicted ancestry faces at both the population and individual levels, as we illustrate for the 26 populations of the 1000 Genomes Project and for eight eminent ancient-DNA profiles, respectively.
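The abstract benchmarks SUGIBS against PCA on normalized genotype data and stresses projecting new samples into a fixed reference ancestry space. The Python sketch below shows only that PCA baseline; it is not SUGIBS, whose IBS-based generalization of the decomposition is not reproduced here. New samples are standardized with the reference allele frequencies and multiplied by the reference loadings, so the reference axes never move. Function names and the toy genotypes are hypothetical.

```python
import numpy as np


def fit_reference_pca(ref_genotypes, k=2):
    """PCA of reference genotypes standardized with reference allele frequencies.

    Returns what is needed to project other samples: reference frequencies,
    the mask of polymorphic SNPs, per-SNP scales, and the (SNPs x k) loadings.
    """
    X = np.asarray(ref_genotypes, dtype=float)
    p = X.mean(axis=0) / 2.0
    keep = (p > 0) & (p < 1)                  # monomorphic SNPs carry no signal
    sd = np.sqrt(2 * p * (1 - p))
    Z = (X[:, keep] - 2 * p[keep]) / sd[keep]
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return p, keep, sd, Vt[:k].T              # right singular vectors = SNP loadings


def project(genotypes, p, keep, sd, loadings):
    """Place samples in the reference space using reference statistics only."""
    X = np.asarray(genotypes, dtype=float)
    Z = (X[:, keep] - 2 * p[keep]) / sd[keep]
    return Z @ loadings


# Toy reference: two groups of two individuals, four SNPs.
ref = np.array([[0, 0, 2, 2], [0, 1, 2, 2], [2, 2, 0, 0], [2, 1, 0, 0]])
p, keep, sd, loadings = fit_reference_pca(ref, k=2)
ref_scores = project(ref, p, keep, sd, loadings)
new_scores = project([[0, 0, 2, 1]], p, keep, sd, loadings)
```

Here the new sample lands on the same side of the first axis as the first reference group. Because the standardization uses reference statistics only, any systematic genotyping shift in a new batch passes straight into the projected scores, which is the sensitivity the abstract attributes to this strategy.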


HortScience, 1997, Vol 32 (3), pp. 454B-454
Author(s): C.L. Boehm, H.C. Harrison, G. Jung, J. Nienhuis

The magnitude of genetic differences among, and the heterogeneity within, cultivated and wild American ginseng populations are unknown. Variation among individual plants from 16 geographically separated cultivated populations and 21 geographically separated wild populations was evaluated using RAPD markers. Cultivated populations from the midwestern U.S., the southern U.S., and Canada were examined. Wild populations from the midwestern U.S., the southern U.S., and the eastern U.S. were examined. Fifteen RAPD primers produced polymorphic bands, yielding 100 scored bands. Variation was found within and among populations, indicating that the selected populations are heterogeneous with respect to RAPD markers. The genetic relationships among individual genotypes were estimated using the ratio of discordant bands to total bands scored. Multidimensional scaling of the relationship matrix showed independent clusters corresponding to the geographical and cultural origins of the populations. The integrity of the clusters was confirmed using pooled chi-squares for fragment homogeneity. Average gene diversity (Hs) was calculated for each population sample, and a one-way analysis of variance showed significant differences among populations. Overall, the results demonstrate the usefulness of the RAPD procedure for evaluating genetic relationships and comparing levels of genetic diversity among populations of American ginseng genotypes.
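The average gene diversity (Hs) per population and the one-way ANOVA across populations can be sketched in Python with NumPy. RAPD bands are dominant markers, so true allele frequencies are not directly observed; as a loudly stated simplification, each band is treated here as a two-state locus with frequency equal to its presence rate. The abstract does not give its exact estimator, and running the ANOVA on per-band diversity values grouped by population is likewise an assumption.

```python
import numpy as np


def gene_diversity(bands):
    """Mean gene diversity (Hs) over RAPD bands for one population sample.

    `bands` is an (individuals x bands) 0/1 presence/absence matrix. As a
    simplification, each band is treated as a two-state locus with band
    frequency f, so h = 1 - f**2 - (1 - f)**2 = 2 f (1 - f).
    Returns the mean over bands and the per-band values.
    """
    f = np.asarray(bands, dtype=float).mean(axis=0)
    h = 1.0 - f ** 2 - (1.0 - f) ** 2
    return h.mean(), h


def one_way_f(groups):
    """One-way ANOVA F statistic for a list of 1-D samples."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    values = np.concatenate(groups)
    grand = values.mean()
    k, n = len(groups), values.size
    ss_between = sum(g.size * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

In use, the per-band `h` arrays returned for each population would be passed together to `one_way_f`; the F statistic is then compared with an F distribution on (k - 1, n - k) degrees of freedom.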


2020, Vol 287 (1923), pp. 20192999
Author(s): Maëva Gabrielli, Benoit Nabholz, Thibault Leroy, Borja Milá, Christophe Thébaud

The presence of congeneric taxa on the same island suggests the possibility of in situ divergence, but can also result from multiple colonizations of previously diverged lineages. Here, using genome-wide data from a large population sample, we test the hypothesis that intra-island divergence explains the occurrence of four geographical forms meeting at hybrid zones in the Reunion grey white-eye (Zosterops borbonicus), a species complex endemic to the small volcanic island of Reunion. Using population genomic and phylogenetic analyses, we reconstructed the population history of the different forms. We confirmed the monophyly of the complex and found that one of the lowland forms is paraphyletic and basal relative to the others, a pattern highly consistent with in situ divergence. Our results suggest initial colonization of the island through the lowlands, followed by expansion into the highlands, which led to the evolution of a distinct geographical form, genetically and ecologically different from the lowland ones. Lowland forms seem to have experienced periods of geographical isolation, but they diverged from one another by sexual selection rather than niche change. Overall, low dispersal capabilities in this island bird, combined with both geographical and ecological opportunities, seem to explain how divergence occurred at such a small spatial scale.


2020, Vol 98 (Supplement_4), pp. 9-10
Author(s): Enrico Mancin

Several methods are available for genome-wide association analysis (GWAA), including the classical GWAA (cGWAA), based on fixed single-SNP regression; efficient mixed-model association expedited (EMMAX), which fits single-SNP regressions together with a relationship matrix to account for population structure; and single-step GWAA (ssGWAA), in which all data, including those from non-genotyped animals, are used. The objectives of this study were to 1) investigate the ability of ssGWAA to account for population structure and correctly identify quantitative trait nucleotides (QTN), and 2) compare ssGWAA with cGWAA and EMMAX. Three simulated datasets were used, mimicking fish, beef cattle, and dairy cattle populations. The fish population comprised 2,040 fish, of which 1,040 were genotyped and had phenotypes for a trait with a heritability of 0.25. The beef cattle population had 6,010 animals in the pedigree, but only 1,500 with phenotypes (h2 = 0.35) and genotypes. Lastly, the dairy cattle population had 40,800 pedigreed animals, of which 20,000 females had phenotypes (h2 = 0.32) and 2,400 males were genotyped. All phenotypes, pedigree, and genotypes were used in ssGWAA, whereas only genotypes and phenotypes were used in cGWAA and EMMAX for the fish and beef cattle analyses. For the dairy cattle analysis with the latter two methods, deregressed proofs had to be used instead of phenotypes, because the genotyped males had no phenotypes of their own. The ability to correctly identify QTN and the number of statistically significant SNPs (P < 0.05/number of SNPs, i.e., a Bonferroni correction) were compared across methods. In all populations, cGWAA identified some of the strongest QTN but produced a large number of false positives. EMMAX and ssGWAA showed no false associations and correctly identified the top QTN, with more signals observed in ssGWAA. ssGWAA accounts for population structure and is an appropriate association method, especially for livestock populations, where sparse genotyping is a reality and phenotypes may not be recorded for genotyped animals.
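The EMMAX side of this comparison rests on testing each SNP by generalized least squares under a covariance built from a relationship matrix. The Python sketch below is a simplified EMMAX-style scan, assuming the variance ratio `h2` is already known (EMMAX estimates it once under the null model, which is omitted here) and using a normal approximation for p-values. It is neither EMMAX itself nor ssGWAA, which additionally combines pedigree and genomic relationships so that non-genotyped animals contribute. The significance threshold follows the abstract's P < 0.05/number of SNPs; the function name and toy data are hypothetical.

```python
import math

import numpy as np


def gls_snp_scan(y, snps, K, h2):
    """Simplified EMMAX-style scan with a fixed variance ratio.

    Model: y = mu + x_j * b_j + g + e with Var(y) proportional to
    V = h2 * K + (1 - h2) * I. h2 is taken as known here (EMMAX estimates
    it once under the null model; that step is omitted). Each SNP is tested
    by generalized least squares after whitening with the Cholesky factor
    of V. Returns Wald statistics and two-sided normal-approximation p-values.
    """
    y = np.asarray(y, dtype=float)
    snps = np.asarray(snps, dtype=float)
    n, m = snps.shape
    V = h2 * np.asarray(K, dtype=float) + (1.0 - h2) * np.eye(n)
    L = np.linalg.cholesky(V)                 # V = L L^T
    y_w = np.linalg.solve(L, y)               # whitened phenotype
    one_w = np.linalg.solve(L, np.ones(n))    # whitened intercept column
    S_w = np.linalg.solve(L, snps)            # whitened genotypes, all SNPs at once
    stats = np.empty(m)
    for j in range(m):
        X = np.column_stack([one_w, S_w[:, j]])
        beta, *_ = np.linalg.lstsq(X, y_w, rcond=None)
        resid = y_w - X @ beta
        sigma2 = resid @ resid / (n - 2)
        se = math.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
        stats[j] = beta[1] / se
    pvals = np.array([math.erfc(abs(t) / math.sqrt(2.0)) for t in stats])
    return stats, pvals


# Toy data: 200 individuals, 50 SNPs, one causal SNP (index 7).
rng = np.random.default_rng(1)
n, m = 200, 50
snps = rng.binomial(2, 0.3, size=(n, m)).astype(float)
p = snps.mean(axis=0) / 2.0
Z = (snps - 2 * p) / np.sqrt(2 * p * (1 - p))
K = Z @ Z.T / m                               # genomic relationship matrix
y = 0.8 * snps[:, 7] + rng.normal(size=n)

stats, pvals = gls_snp_scan(y, snps, K, h2=0.2)
threshold = 0.05 / m                          # Bonferroni: P < 0.05 / number of SNPs
hits = np.flatnonzero(pvals < threshold)
```

Whitening once with the Cholesky factor turns every per-SNP GLS fit into an ordinary least-squares fit, which is the computational shortcut that makes a fixed-variance-ratio scan cheap.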

