Accurate Estimation of Marker-Associated Genetic Variance and Heritability in Complex Trait Analyses

ABSTRACT The emergence of high-throughput, genome-scale approaches for identifying and genotyping DNA variants has been a catalyst for the development of increasingly sophisticated whole-genome association and genomic prediction approaches, which together have revolutionized the study of complex traits in human, animal, and plant populations. These approaches have uncovered a broad spectrum of genetic complexity across traits and organisms, from a small number of detectable loci to an unknown number of undetectable loci. The heritable variation observed in a population is often partly caused by the segregation of one or more large-effect (statistically detectable) loci. Our study focused on the accurate estimation of the proportion of the genetic variance explained by such loci (p), a parameter estimated to quantify and predict the importance of causative loci or markers in linkage disequilibrium with causative loci. Here, we show that marker-associated genetic variances are systematically overestimated by standard statistical methods. The upward bias is purely mathematical in nature, unrelated to selection bias, and caused by the inequality between the genetic variance among progeny and sums of partitioned marker-associated genetic variances. We discovered a straightforward mathematical correction factor ($k_M$) that depends only on degrees of freedom and the number of entries, is constant for a given experiment design, expands to higher-order genetic models in a predictable pattern, and yields bias-corrected estimates of marker-associated genetic variance and heritability.


Average semivariance yields accurate estimates of the fraction of marker-associated genetic variance and heritability in complex trait analyses

PLoS Genetics ◽

10.1371/journal.pgen.1009762 ◽

2021 ◽

Vol 17 (8) ◽

pp. e1009762

Author(s):

Mitchell J. Feldmann ◽

Hans-Peter Piepho ◽

William C. Bridges ◽

Steven J. Knapp

Keyword(s):

Genetic Variation ◽

Statistical Methods ◽

Complex Traits ◽

Degrees Of Freedom ◽

Genetic Variance ◽

Complex Trait ◽

Statistical Detection ◽

Best Linear Unbiased ◽

Image Position ◽

Prediction Problems

The development of genome-informed methods for identifying quantitative trait loci (QTL) and studying the genetic basis of quantitative variation in natural and experimental populations has been driven by advances in high-throughput genotyping. For many complex traits, the underlying genetic variation is caused by the segregation of one or more ‘large-effect’ loci, in addition to an unknown number of loci with effects below the threshold of statistical detection. The large-effect loci segregating in populations are often necessary but not sufficient for predicting quantitative phenotypes. They are, nevertheless, important enough to warrant deeper study and direct modelling in genomic prediction problems. We explored the accuracy of statistical methods for estimating the fraction of marker-associated genetic variance (p) and heritability ($H_M^2$) for large-effect loci underlying complex phenotypes. We found that commonly used statistical methods overestimate p and $H_M^2$. The source of the upward bias was traced to inequalities between the expected values of variance components in the numerators and denominators of these parameters. Algebraic solutions for bias-correcting estimates of p and $H_M^2$ were found that only depend on the degrees of freedom and are constant for a given study design. We discovered that average semivariance methods, which have heretofore not been used in complex trait analyses, yielded unbiased estimates of p and $H_M^2$, in addition to best linear unbiased predictors of the additive and dominance effects of the underlying loci. The cryptic bias problem described here is unrelated to selection bias, although both cause the overestimation of p and $H_M^2$. The solutions we described are predicted to more accurately describe the contributions of large-effect loci to the genetic variation underlying complex traits of medical, biological, and agricultural importance.


Prospects and pitfalls in whole genome association studies

Philosophical Transactions of the Royal Society B Biological Sciences ◽

10.1098/rstb.2005.1689 ◽

2005 ◽

Vol 360 (1460) ◽

pp. 1589-1595 ◽

Cited By ~ 28

Author(s):

Robert W Lawrence ◽

David M Evans ◽

Lon R Cardon

Keyword(s):

Genetic Variation ◽

Genetic Markers ◽

Complex Traits ◽

Large Scale ◽

Association Studies ◽

Whole Genome ◽

Common Genetic Variation ◽

Close Attention ◽

Genome Association ◽

Whole Genome Association

Recent large-scale studies of common genetic variation throughout the human genome are making it feasible to conduct whole genome studies of genotype–phenotype associations. Such studies have the potential to uncover novel contributors to common complex traits and thus lead to insights into the aetiology of multifactorial phenotypes. Despite this promise, it is important to recognize that the availability of genetic markers and the ability to assay them at realistic cost does not guarantee success of this approach. There are a number of practical issues that require close attention, some forms of allelic architecture are not readily amenable to the association approach with even the most rigorous design, and doubtless new hurdles will emerge as the studies begin. Here we discuss the promise and current challenges of the whole genome approach, and raise some issues to consider in interpreting the results of the first whole genome studies.


Thinking About the Evolution of Complex Traits in the Era of Genome-Wide Association Studies

Annual Review of Genomics and Human Genetics ◽

10.1146/annurev-genom-083115-022316 ◽

2019 ◽

Vol 20 (1) ◽

pp. 461-493 ◽

Cited By ~ 32

Author(s):

Guy Sella ◽

Nicholas H. Barton

Keyword(s):

Complex Traits ◽

Genetic Basis ◽

Association Studies ◽

Practical Importance ◽

Complex Trait ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Heritable Variation ◽

Genome Wide ◽

Polygenic Adaptation

Many traits of interest are highly heritable and genetically complex, meaning that much of the variation they exhibit arises from differences at numerous loci in the genome. Complex traits and their evolution have been studied for more than a century, but only in the last decade have genome-wide association studies (GWASs) in humans begun to reveal their genetic basis. Here, we bring these threads of research together to ask how findings from GWASs can further our understanding of the processes that give rise to heritable variation in complex traits and of the genetic basis of complex trait evolution in response to changing selection pressures (i.e., of polygenic adaptation). Conversely, we ask how evolutionary thinking helps us to interpret findings from GWASs and informs related efforts of practical importance.


Haplotype-Based Single-Step GWAS for Yearling Temperament in American Angus Cattle

Genes ◽

10.3390/genes13010017 ◽

2021 ◽

Vol 13 (1) ◽

pp. 17

Author(s):

Andre C. Araujo ◽

Paulo L. S. Carneiro ◽

Amanda B. Alvarenga ◽

Hinayah R. Oliveira ◽

Stephen P. Miller ◽

...

Keyword(s):

Association Studies ◽

Complex Trait ◽

Single Step ◽

Genome Wide Association Studies ◽

Behavioral Traits ◽

Angus Cattle ◽

Genome Wide ◽

Genome Association ◽

Whole Genome Association ◽

Genomic Regions

Behavior is a complex trait and, therefore, understanding its genetic architecture is paramount for the development of effective breeding strategies. The objective of this study was to perform traditional and weighted single-step genome-wide association studies (ssGWAS and WssGWAS, respectively) for yearling temperament (YT) in North American Angus cattle using haplotypes. Approximately 266 K YT records and 70 K animals genotyped using a 50 K single nucleotide polymorphism (SNP) panel were used. Linkage disequilibrium thresholds (LD) of 0.15, 0.50, and 0.80 were used to create the haploblocks, and the inclusion of non-LD-clustered SNPs (NCSNP) with the haplotypes in the genomic models was also evaluated. WssGWAS did not perform better than ssGWAS. Cattle YT was found to be a highly polygenic trait, with genes and QTL broadly distributed across the whole genome. Association studies using LD-based haplotypes should include NCSNPs and different LD thresholds to increase the likelihood of finding the relevant genomic regions affecting the trait of interest. The main candidate genes identified, i.e., ATXN10, ADAM10, VAX2, ATP6V1B1, CRISPLD1, CAPRIN1, FA2H, SPEF2, PLXNA1, and CACNA2D3, are involved in important biological processes and metabolic pathways related to behavioral traits, social interactions, and aggressiveness in cattle. Future studies should further investigate the role of these genes.
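As a rough illustration of how an LD threshold can split an ordered SNP panel into haploblocks and leftover non-LD-clustered SNPs, the sketch below groups adjacent SNPs whose pairwise r² meets a threshold. It is not the authors' pipeline. Measuring LD as the squared Pearson correlation of 0/1/2 allele dosages, comparing only neighbouring SNPs, and the names `ld_r2` and `build_haploblocks` are all assumptions made for this example, and the toy genotypes are invented.

```python
def ld_r2(snp_a, snp_b):
    """Squared Pearson correlation between two dosage vectors (assumed LD measure)."""
    n = len(snp_a)
    mean_a = sum(snp_a) / n
    mean_b = sum(snp_b) / n
    sxy = sum((a - mean_a) * (b - mean_b) for a, b in zip(snp_a, snp_b))
    sxx = sum((a - mean_a) ** 2 for a in snp_a)
    syy = sum((b - mean_b) ** 2 for b in snp_b)
    if sxx == 0 or syy == 0:
        # A monomorphic SNP carries no LD information; treat r^2 as zero.
        return 0.0
    return (sxy * sxy) / (sxx * syy)


def build_haploblocks(snps, threshold):
    """Greedily group consecutive SNPs (by index) whose adjacent r^2 >= threshold.

    `snps` is a list of per-SNP dosage lists in map order (hypothetical layout).
    Returns a list of index lists; single-element lists are unclustered SNPs.
    """
    if not snps:
        return []
    blocks = [[0]]
    for j in range(1, len(snps)):
        if ld_r2(snps[j - 1], snps[j]) >= threshold:
            blocks[-1].append(j)   # extend the current block
        else:
            blocks.append([j])     # start a new block
    return blocks


# Invented toy data: three SNPs scored on six animals.
toy_snps = [
    [0, 1, 2, 1, 0, 2],
    [0, 1, 2, 1, 0, 2],
    [1, 1, 0, 2, 0, 2],
]

for threshold in (0.15, 0.50, 0.80):  # thresholds named in the abstract
    blocks = build_haploblocks(toy_snps, threshold)
    haploblocks = [b for b in blocks if len(b) > 1]
    ncsnps = [b[0] for b in blocks if len(b) == 1]
    print(threshold, haploblocks, ncsnps)
```

In this sketch a lower threshold merges more neighbouring SNPs into blocks and leaves fewer unclustered SNPs, which is why the choice of threshold changes which markers enter the model as haplotypes.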


Faculty Opinions recommendation of PLINK: a tool set for whole-genome association and population-based linkage analyses.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1162373.622875 ◽

2009 ◽

Cited By ~ 1

Author(s):

Alejandro Schaffer

Keyword(s):

Population Based ◽

Whole Genome ◽

Linkage Analyses ◽

Genome Association ◽

Whole Genome Association ◽

Tool Set


Linkage disequilibrium and genetic variances under mutation-selection balance.

Genetics ◽

10.1093/genetics/121.4.857 ◽

1989 ◽

Vol 121 (4) ◽

pp. 857-860 ◽

Cited By ~ 1

Author(s):

A Hastings

Keyword(s):

Linkage Disequilibrium ◽

Mutation Rate ◽

Genetic Variance ◽

Harmonic Mean ◽

Recombination Rates ◽

Genetic Variances ◽

Locus Selection

I determine the contribution of linkage disequilibrium to genetic variances using results for two loci and for induced or marginal systems. The analysis allows epistasis and dominance, but assumes that mutation is weak relative to selection. The linkage disequilibrium component of genetic variance is shown to be unimportant for unlinked loci if the gametic mutation rate divided by the harmonic mean of the pairwise recombination rates is much less than one. For tightly linked loci, linkage disequilibrium is unimportant if the gametic mutation rate divided by the (induced) per locus selection is much less than one.
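The two criteria stated in this abstract are simple ratios, so a minimal sketch of the arithmetic may help. The function names and the numerical inputs are hypothetical, chosen only for illustration. The abstract gives no cutoff for "much less than one", so the code returns the ratios and leaves that judgement to the reader.

```python
from statistics import harmonic_mean


def unlinked_ld_ratio(gametic_mutation_rate, pairwise_recombination_rates):
    """Gametic mutation rate divided by the harmonic mean of the
    pairwise recombination rates (criterion for unlinked loci)."""
    return gametic_mutation_rate / harmonic_mean(pairwise_recombination_rates)


def linked_ld_ratio(gametic_mutation_rate, per_locus_selection):
    """Gametic mutation rate divided by the (induced) per-locus
    selection (criterion for tightly linked loci)."""
    return gametic_mutation_rate / per_locus_selection


# Hypothetical inputs, not taken from the paper.
ratio_unlinked = unlinked_ld_ratio(0.01, [0.5, 0.5, 0.5])
ratio_linked = linked_ld_ratio(0.01, 0.05)
print(ratio_unlinked, ratio_linked)
```

According to the abstract, the linkage disequilibrium component of the genetic variance is unimportant when the relevant ratio is much less than one.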


Sept8/SEPTIN8 involvement in cellular structure and kidney damage is identified by genetic mapping and a novel human tubule hypoxic model

Scientific Reports ◽

10.1038/s41598-021-81550-8 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Gregory R. Keele ◽

Jeremy W. Prokop ◽

Hong He ◽

Katie Holl ◽

John Littrell ◽

...

Keyword(s):

Complex Traits ◽

Genetic Model ◽

Association Studies ◽

Model Systems ◽

Linear Mixed Effect Model ◽

Genome Wide Association Studies ◽

Tubulointerstitial Injury ◽

Heritable Variation ◽

Mixed Effect

Chronic kidney disease (CKD), which can ultimately progress to kidney failure, is influenced by genetics and the environment. Genes identified in human genome wide association studies (GWAS) explain only a small proportion of the heritable variation and lack functional validation, indicating the need for additional model systems. Outbred heterogeneous stock (HS) rats have been used for genetic fine-mapping of complex traits, but have not previously been used for CKD traits. We performed GWAS for urinary protein excretion (UPE) and CKD related serum biochemistries in 245 male HS rats. Quantitative trait loci (QTL) were identified using a linear mixed effect model that tested for association with imputed genotypes. Candidate genes were identified using bioinformatics tools and targeted RNAseq followed by testing in a novel in vitro model of human tubule, hypoxia-induced damage. We identified two QTL for UPE and five for serum biochemistries. Protein modeling identified a missense variant within Septin 8 (Sept8) as a candidate for UPE. Sept8/SEPTIN8 expression increased in HS rats with elevated UPE and tubulointerstitial injury and in the in vitro hypoxia model. SEPTIN8 is detected within proximal tubule cells in human kidney samples and localizes with acetyl-alpha tubulin in the culture system. After hypoxia, SEPTIN8 staining becomes diffuse and appears to relocalize with actin. These data suggest a role of SEPTIN8 in cellular organization and structure in response to environmental stress. This study demonstrates that integration of a rat genetic model with an environmentally induced tubule damage system identifies Sept8/SEPTIN8 and informs novel aspects of the complex gene by environmental interactions contributing to CKD risk.


Algorithms for large-scale whole genome association analysis

Proceedings of the 20th European MPI Users' Group Meeting on - EuroMPI '13 ◽

10.1145/2488551.2488577 ◽

2013 ◽

Cited By ~ 1

Author(s):

Elmar Peise ◽

Diego Fabregat-Traver ◽

Yurii Aulchenko ◽

Paolo Bientinesi

Keyword(s):

Association Analysis ◽

Large Scale ◽

Whole Genome ◽

Genome Association ◽

Whole Genome Association


Leveraging Multiple Layers of Data To Predict Drosophila Complex Traits

G3 Genes|Genome|Genetics ◽

10.1534/g3.120.401847 ◽

2020 ◽

Vol 10 (12) ◽

pp. 4599-4613

Author(s):

Fabio Morgante ◽

Wen Huang ◽

Peter Sørensen ◽

Christian Maltecca ◽

Trudy F. C. Mackay

Keyword(s):

Precision Agriculture ◽

Complex Traits ◽

Prediction Accuracy ◽

Cellular Response ◽

Complex Trait ◽

Sources Of Information ◽

Starvation Resistance ◽

Expression Levels ◽

Chill Coma ◽

Additional Layer

The ability to accurately predict complex trait phenotypes from genetic and genomic data is critical for the implementation of personalized medicine and precision agriculture; however, prediction accuracy for most complex traits is currently low. Here, we used data on whole genome sequences, deep RNA sequencing, and high quality phenotypes for three quantitative traits in the ∼200 inbred lines of the Drosophila melanogaster Genetic Reference Panel (DGRP) to compare the prediction accuracies of gene expression and genotypes for three complex traits. We found that expression levels (r = 0.28 and 0.38, for females and males, respectively) provided higher prediction accuracy than genotypes (r = 0.07 and 0.15, for females and males, respectively) for starvation resistance, similar prediction accuracy for chill coma recovery (null for both models and sexes), and lower prediction accuracy for startle response (r = 0.15 and 0.14 for female and male genotypes, respectively; and r = 0.12 and 0.11, for female and male transcripts, respectively). Models including both genotype and expression levels did not outperform the best single component model. However, accuracy increased considerably for all three traits when we included gene ontology (GO) category as an additional layer of information for both genomic variants and transcripts. We found strongly predictive GO terms for each of the three traits, some of which had a clear plausible biological interpretation. For example, for starvation resistance in females, GO:0033500 (r = 0.39 for transcripts) and GO:0032870 (r = 0.40 for transcripts), have been implicated in carbohydrate homeostasis and cellular response to hormone stimulus (including the insulin receptor signaling pathway), respectively. In summary, this study shows that integrating different sources of information improved prediction accuracy and helped elucidate the genetic architecture of three Drosophila complex phenotypes.
