Strategies to Increase Prediction Accuracy in Genomic Selection of Complex Traits in Alfalfa (Medicago sativa L.)

Agronomic traits such as biomass yield and abiotic stress tolerance are genetically complex and challenging to improve through conventional breeding strategies. Genomic selection (GS) is an alternative approach in which genome-wide markers are used to determine the genomic estimated breeding value (GEBV) of individuals in a population. In alfalfa, previous results indicated that only low to moderate prediction accuracies (<70%) were obtained for complex traits such as yield and abiotic stress resistance. Prediction accuracy needs to increase before GS can be employed in breeding programs. In this paper we reviewed different statistical models and their applications in polyploid crops, including alfalfa. Specifically, we used empirical data on alfalfa yield under salt stress to investigate approaches that use DNA marker importance values derived from machine learning models, and marker-trait association scores from genome-wide association studies (GWAS) based on different GWASpoly models, in weighted GBLUP analyses. This approach increased prediction accuracies from 50% to more than 80% for alfalfa yield under salt stress. This is the first report in alfalfa to use variable importance and GWAS-assisted approaches to increase the prediction accuracy of GS, thus helping to select superior alfalfa lines based on their GEBVs.

Strategies to Increase Prediction Accuracy in Genomic Selection of Complex Traits in Alfalfa (Medicago sativa L.)

Cells, 2021, Vol 10 (12), pp. 3372. doi: 10.3390/cells10123372

Author(s): Cesar A. Medina, Harpreet Kaur, Ian Ray, Long-Xi Yu

Keyword(s): Salt Stress, Abiotic Stress, Medicago Sativa, Genomic Selection, Complex Traits, Prediction Accuracy, Breeding Value, Phenotypic Traits, Genome Wide, Medicago Sativa L

Agronomic traits such as biomass yield and abiotic stress tolerance are genetically complex and challenging to improve through conventional breeding approaches. Genomic selection (GS) is an alternative approach in which genome-wide markers are used to determine the genomic estimated breeding value (GEBV) of individuals in a population. In alfalfa (Medicago sativa L.), previous results indicated that only low to moderate prediction accuracies (<70%) were obtained for complex traits, such as yield and abiotic stress resistance. Prediction accuracy needs to increase before GS can be employed in breeding programs. In this paper we reviewed different statistical models and their applications in polyploid crops, such as alfalfa and potato. Specifically, we used empirical data on alfalfa yield under salt stress to investigate approaches that use DNA marker importance values derived from machine learning models, and marker-trait association scores from genome-wide association studies (GWAS) based on different GWASpoly models, in weighted GBLUP analyses. This approach increased prediction accuracies from 50% to more than 80% for alfalfa yield under salt stress. Finally, we extended the weighted GBLUP approach to potato, analyzed 13 phenotypic traits, and obtained similar results. This is the first report on alfalfa to use variable importance and GWAS-assisted approaches to increase the prediction accuracy of GS, thus helping to select superior alfalfa lines based on their GEBVs.
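
The weighted GBLUP analyses described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline. It assumes a VanRaden-style relationship matrix built from allele dosages (0 to `ploidy`, so 4 for autotetraploid alfalfa), per-marker weights rescaled to a mean of 1, and a fixed variance ratio `lam` rather than REML estimation. The function names `weighted_grm` and `gblup_predict`, the simulated data, and the use of absolute marker-phenotype correlations as stand-in importance scores are all hypothetical.

```python
import numpy as np

def weighted_grm(dosages, weights=None, ploidy=2):
    """Hypothetical helper: G = Z D Z' / c from an (individuals x markers) dosage matrix.

    D holds per-marker weights rescaled to mean 1; weights=None gives the unweighted GRM.
    """
    M = np.asarray(dosages, dtype=float)
    n_markers = M.shape[1]
    w = np.ones(n_markers) if weights is None else np.asarray(weights, dtype=float)
    w = w * n_markers / w.sum()                # rescale weights to mean 1
    p = M.mean(axis=0) / ploidy                # allele frequency per marker
    Z = M - ploidy * p                         # center each marker column
    c = np.sum(w * ploidy * p * (1.0 - p))     # weighted expected dosage variance
    return (Z * w) @ Z.T / c                   # Z * w scales column j by w[j]

def gblup_predict(G, y, train, test, lam=1.0):
    """Hypothetical helper: GEBVs for `test` individuals from `train` phenotypes.

    lam is an assumed variance ratio sigma_e^2 / sigma_g^2 (fixed here, not REML).
    """
    y_train = np.asarray(y, dtype=float)[train]
    mu = y_train.mean()                        # simple intercept estimate
    G_tt = G[np.ix_(train, train)]
    G_vt = G[np.ix_(test, train)]
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train)), y_train - mu)
    return G_vt @ alpha

# Simulated autotetraploid data (all settings are arbitrary assumptions).
rng = np.random.default_rng(1)
n, m = 150, 400
M = rng.binomial(4, rng.uniform(0.1, 0.9, m), size=(n, m))
beta = np.zeros(m)
beta[:10] = 1.0                                # 10 simulated causal markers
y = M @ beta + rng.normal(0.0, 2.0, n)
train, test = np.arange(100), np.arange(100, 150)

# Stand-in importance scores: |correlation| with y, from the training set only.
Mc = M[train] - M[train].mean(axis=0)
yc = y[train] - y[train].mean()
score = np.abs(Mc.T @ yc) / (np.linalg.norm(Mc, axis=0) * np.linalg.norm(yc) + 1e-12)

for label, w in (("unweighted", None), ("weighted", score)):
    G = weighted_grm(M, w, ploidy=4)
    gebv = gblup_predict(G, y, train, test, lam=1.0)
    print(label, "accuracy:", round(np.corrcoef(gebv, y[test])[0, 1], 3))
```

The weights enter only through D, so any non-negative per-marker score (a machine-learning importance value, or -log10(p) from a GWAS model) could be plugged in. Computing the scores on the training set alone keeps test phenotypes out of G; whether weighting helps depends on how well the scores track the true marker effects.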

Genome-wide association and genomic selection in animal breeding

This article is one of a selection of papers from the conference “Exploiting Genome-wide Association in Oilseed Brassicas: a model for genetic improvement of major OECD crops for sustainable farming”.

Genome, 2010, Vol 53 (11), pp. 876-883. doi: 10.1139/g10-076. Cited by ~135

Author(s): Ben Hayes, Mike Goddard

Keyword(s): Genomic Selection, Complex Traits, Association Studies, Genome Wide Association, Relationship Matrix, Genome Wide Association Studies, Simple Method, Breeding Values, Genome Wide, A Genome

Results from genome-wide association studies in livestock and humans have led to the conclusion that the effects of individual quantitative trait loci (QTL) on complex traits, such as yield, are likely to be small; therefore, a large number of QTL are needed to explain genetic variation in these traits. Given this genetic architecture, gains from marker-assisted selection (MAS) programs that use only a small number of DNA markers to trace a limited number of QTL are likely to be small. This has led to the development of an alternative technology, called genomic selection, that uses the available dense single nucleotide polymorphism (SNP) information. Genomic selection uses a genome-wide panel of dense markers so that all QTL are likely to be in linkage disequilibrium with at least one SNP. Genomic breeding values are predicted as the sum of the effects of these SNPs across the entire genome. In dairy cattle breeding, the accuracy of genomic estimated breeding values (GEBV) that can be achieved, and the fact that GEBV are available early in life, have led to rapid adoption of the technology. Here, we discuss the design of experiments needed to achieve accurate prediction of GEBV in future generations, in terms of the number of markers required and the size of the reference population in which marker effects are estimated. We also present a simple method for implementing genomic selection using a genomic relationship matrix. Future challenges discussed include using whole-genome sequence data to improve the accuracy of genomic selection and managing inbreeding through genomic relationships.
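
The trade-off between reference population size and the genetic architecture of the trait is often summarized with a deterministic approximation from the animal-breeding literature, commonly attributed to Daetwyler and colleagues (it is not stated in this abstract): r = sqrt(N·h² / (N·h² + M_e)), where N is the reference population size, h² the heritability, and M_e the effective number of independent chromosome segments. The sketch below evaluates that formula and inverts it to estimate the N needed for a target accuracy. The function names are hypothetical, and M_e is treated as a user-supplied assumption rather than derived here.

```python
import math

def expected_accuracy(n_ref, h2, m_e):
    """Approximate GEBV accuracy r = sqrt(N h2 / (N h2 + Me))."""
    x = n_ref * h2
    return math.sqrt(x / (x + m_e))

def reference_size_for(target_r, h2, m_e):
    """Invert the approximation: N = r^2 Me / (h2 (1 - r^2))."""
    if not 0.0 < target_r < 1.0:
        raise ValueError("target accuracy must lie strictly between 0 and 1")
    return target_r ** 2 * m_e / (h2 * (1.0 - target_r ** 2))

# Hypothetical scenario: h2 = 0.3 and Me = 1000 independent segments.
for n_ref in (500, 2000, 10000):
    print(n_ref, round(expected_accuracy(n_ref, 0.3, 1000), 3))
print("N for r = 0.8:", round(reference_size_for(0.8, 0.3, 1000)))
```

Under these assumed values, reaching r = 0.8 would need roughly 5,900 phenotyped and genotyped reference individuals, which illustrates why accuracy for lowly heritable traits depends so strongly on reference population size.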

On PRS for complex polygenic trait prediction

2018. doi: 10.1101/447797. Cited by ~2

Author(s): Bingxin Zhao, Fei Zou

Keyword(s): Complex Traits, Prediction Accuracy, Association Studies, Prediction Method, Causal SNPs, Polygenic Risk Score, Genome Wide Association Studies, Genome Wide, Polygenic Traits, Low Performance

Polygenic risk score (PRS) is the state-of-the-art method for predicting complex traits from the summary-level data of discovery genome-wide association studies (GWAS). As its name suggests, the PRS is designed for polygenic traits: it aggregates small genetic effects from a large number of causal SNPs, and the genetics community therefore views it as a powerful method for predicting complex polygenic traits. However, one concern is that the prediction accuracy of PRS in practice remains low, with little clinical utility, even for highly heritable traits. Another practical concern is whether genome-wide SNPs should be used to construct the PRS. To address these two concerns, we investigate PRS both empirically and theoretically. We show how the performance of PRS is influenced by the triplet (n, p, m), where n, p, and m are the sample size, the number of SNPs studied, and the number of true causal SNPs, respectively. For a given heritability, we find that i) when PRS is constructed with all p SNPs (referred to as GWAS-PRS), its prediction accuracy is controlled by the p/n ratio, while ii) when PRS is built with a set of top-ranked SNPs that pass a pre-specified threshold (referred to as threshold-PRS), its accuracy varies depending on how sparse the true genetic signals are. Only when m is orders of magnitude smaller than n, that is, when genetic signals are sparse, can threshold-PRS perform well and outperform GWAS-PRS. Our results demystify the low performance of PRS in predicting highly polygenic traits, which will greatly increase researchers’ awareness of the power and limitations of PRS and clear up some confusion about the clinical application of PRS.
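
The two PRS variants compared in this abstract differ only in which SNPs enter the weighted sum. The following is a minimal sketch, assuming per-SNP effect sizes and p-values from a discovery GWAS and allele dosages (0/1/2) for the target individuals. It omits the LD clumping usually applied in practice, and the function name `polygenic_score` and the toy numbers are hypothetical. A threshold of 1.0 keeps every SNP (GWAS-PRS); a smaller threshold gives a threshold-PRS.

```python
def polygenic_score(dosages, betas, pvalues, threshold=1.0):
    """Sum of GWAS effect sizes times allele dosages over SNPs with p <= threshold."""
    keep = [j for j, p in enumerate(pvalues) if p <= threshold]
    return [sum(betas[j] * row[j] for j in keep) for row in dosages]

# Toy summary statistics for three SNPs and dosages for three individuals.
betas = [0.5, -0.2, 0.1]
pvalues = [1e-8, 0.3, 0.04]
dosages = [[0, 1, 2],
           [2, 2, 0],
           [1, 0, 1]]

gwas_prs = polygenic_score(dosages, betas, pvalues)                    # all SNPs
thresh_prs = polygenic_score(dosages, betas, pvalues, threshold=0.05)  # p <= 0.05 only
print(gwas_prs, thresh_prs)
```

In the abstract's notation, the GWAS-PRS sums over all p SNPs, so the estimation noise in each beta accumulates in proportion to p/n; the threshold version discards most of that noise but only helps when the few retained SNPs carry most of the true signal.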

Investigating the Effect of Imputed Structural Variants from Whole-Genome Sequence on Genome-Wide Association and Genomic Prediction in Dairy Cattle

Animals, 2021, Vol 11 (2), pp. 541. doi: 10.3390/ani11020541

Author(s): Long Chen, Jennie E. Pryce, Ben J. Hayes, Hans D. Daetwyler

Keyword(s): Dairy Cattle, Genomic Prediction, Complex Traits, Prediction Accuracy, Association Studies, Genome Wide Association, Whole Genome Sequence, Genome Wide Association Studies, Whole Genome, Genome Wide

Structural variations (SVs) are large DNA segments involving deletions, duplications, copy number variations, inversions and translocations in a re-sequenced genome compared to a reference genome. They have been found to be associated with several complex traits in dairy cattle and could potentially help to improve the genomic prediction accuracy of dairy traits. SVs were imputed into individuals genotyped with single-nucleotide polymorphism (SNP) panels, avoiding the expense of sequencing them. In this study, we generated 24,908 high-quality SVs in a total of 478 whole-genome sequenced Holstein and Jersey cattle. We imputed 4489 SVs with R² > 0.5 into 35,568 Holstein and Jersey dairy cattle genotyped for 578,999 SNPs, using two pipelines, FImpute and Eagle2.3-Minimac3. Genome-wide association studies for production, fertility and overall type with these 4489 SVs revealed four significant SVs, two of which were highly linked to significant SNPs. We also estimated the variance components for SNP and SV models for these traits using genomic best linear unbiased prediction (GBLUP). Furthermore, we assessed the effect on genomic prediction accuracy of adding SVs to GBLUP models. The estimated percentage of genetic variance captured by SVs for production traits was up to 4.57% for milk yield in bulls and 3.53% for protein yield in cows. Finally, no consistent increase in genomic prediction accuracy was observed when SVs were included in GBLUP.

Integrating genome-wide association mapping of additive and dominance genetic effects to improve genomic prediction accuracy in Eucalyptus

2019. doi: 10.1101/841049

Author(s): Biyue Tan, Pär K. Ingvarsson

Keyword(s): Genomic Selection, Complex Traits, Association Studies, Wood Property, Genomic Variation, Added Value, Genome Wide Association, Genome Wide Association Studies, Prediction Ability, Genome Wide

Genome-wide association studies (GWAS) are a powerful and widely used approach to decipher the genetic control of complex traits. A major challenge for dissecting quantitative traits in forest trees is statistical power. In this study, we use a population of 1123 samples from two successive generations that have been phenotyped for growth and wood property traits and genotyped using the EuChip60K chip, yielding 37,832 informative SNPs. We use multi-locus GWAS models to assess both additive and dominance effects and to identify markers associated with growth and wood property traits in the eucalypt hybrids. Additive and dominance association models identified 78 and 82 significant SNPs across all traits, respectively, which captured between 39 and 86% of the genomic-based heritability. We also used SNPs identified from the GWAS, and SNPs passing less stringent significance thresholds, to evaluate predictive abilities in a genomic selection framework. Genomic selection models based on the top 1% of SNPs captured a substantially greater proportion of the genetic variance of traits than models trained on all SNPs. The prediction ability of estimated breeding values was significantly improved for all traits using either the top 1% of SNPs or SNPs identified using a relaxed p-value threshold (p < 10⁻³). This study highlights the added value of also considering dominance effects when identifying genomic regions controlling growth traits in trees. Moreover, integrating GWAS results into genomic selection provides enhanced power, relative to discrete associations, for identifying genomic variation potentially useful in tree breeding.
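
The GWAS-then-GS strategy described above can be illustrated with a small simulation. This sketch is a simplified stand-in for the authors' multi-locus GWAS models, under stated assumptions: markers are ranked by absolute correlation with the phenotype in the training set (for single-marker linear regression with a common sample size this gives the same ranking as the p-value), the top 1% are kept, and the retained SNP effects are fitted jointly by ridge regression (an rrBLUP/SNP-BLUP-style model) with an arbitrary shrinkage `lam`. All function names and simulation settings are hypothetical, and dominance effects are not modeled.

```python
import numpy as np

def marker_scores(X, y):
    """|Pearson r| of each marker column with y; zero-variance markers score 0."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    return np.abs(Xc.T @ yc) / np.where(denom > 0, denom, np.inf)

def top_fraction(scores, frac=0.01):
    """Indices of the highest-scoring `frac` of markers (at least one)."""
    k = max(1, int(round(frac * scores.size)))
    return np.argsort(scores)[::-1][:k]

def ridge_predict(X_train, y_train, X_test, lam=10.0):
    """Fit all supplied SNP effects jointly with a ridge penalty, then predict."""
    mu_x, mu_y = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - mu_x
    A = Xc.T @ Xc + lam * np.eye(Xc.shape[1])
    beta = np.linalg.solve(A, Xc.T @ (y_train - mu_y))
    return mu_y + (X_test - mu_x) @ beta

rng = np.random.default_rng(3)
n, m = 300, 1000
X = rng.binomial(2, rng.uniform(0.1, 0.9, m), size=(n, m)).astype(float)
effects = np.zeros(m)
effects[rng.choice(m, 10, replace=False)] = rng.normal(0.0, 1.0, 10)
y = X @ effects + rng.normal(0.0, 1.5, n)
tr, te = np.arange(200), np.arange(200, 300)

# GWAS step on the training set only, so test phenotypes never influence selection.
top = top_fraction(marker_scores(X[tr], y[tr]), frac=0.01)
for label, cols in (("all SNPs", np.arange(m)), ("top 1%", top)):
    pred = ridge_predict(X[tr][:, cols], y[tr], X[te][:, cols])
    print(label, "predictive ability:", round(np.corrcoef(pred, y[te])[0, 1], 3))
```

Running the marker ranking inside the training set matters: selecting SNPs with the full data set and then evaluating on a subset of it would inflate the apparent gain from the top-1% model.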

Faculty Opinions recommendation of Estimation of complex effect-size distributions using summary-level statistics from genome-wide association studies across 32 complex traits.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature, 2018. doi: 10.3410/f.733803377.793550136

Author(s): Mohan Liu

Keyword(s): Effect Size, Complex Traits, Association Studies, Genome Wide Association, Genome Wide Association Studies, Size Distributions, Complex Effect, Genome Wide, Level Statistics

Genetics of complex traits: prediction of phenotype, identification of causal polymorphisms and genetic architecture

Proceedings of The Royal Society B Biological Sciences, 2016, Vol 283 (1835), pp. 20160569. doi: 10.1098/rspb.2016.0569. Cited by ~52

Author(s): M. E. Goddard, K. E. Kemper, I. M. MacLeod, A. J. Chamberlain, B. J. Hayes

Keyword(s): Complex Traits, Genetic Architecture, Quantitative Traits, Association Studies, Genome Wide Association Studies, Nucleotide Polymorphisms, Crop Breeding, Single Nucleotide, Genome Wide, Phenotype Identification

Complex or quantitative traits are important in medicine, agriculture and evolution, yet, until recently, few of the polymorphisms that cause variation in these traits were known. Genome-wide association studies (GWAS), based on the ability to assay thousands of single nucleotide polymorphisms (SNPs), have revolutionized our understanding of the genetics of complex traits. We advocate the analysis of GWAS data by a statistical method that fits all SNP effects simultaneously, assuming that these effects are drawn from a prior distribution. We illustrate how this method can be used to predict future phenotypes, to map and identify the causal mutations, and to study the genetic architecture of complex traits. The genetic architecture of complex traits is even more complex than previously thought: in almost every trait studied there are thousands of polymorphisms that explain genetic variation. Methods of predicting future phenotypes, collectively known as genomic selection or genomic prediction, have been widely adopted in livestock and crop breeding, leading to increased rates of genetic improvement.
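
Fitting all SNP effects simultaneously under a prior can be illustrated with the simplest choice of prior, a normal distribution β ~ N(0, I·σ²_β); the abstract does not say which prior the authors use, and other priors (for example, mixtures) behave differently. Under the normal prior, the posterior mean of β is a ridge-type estimate, and the resulting genomic values can be computed equivalently from an individual-by-individual relationship matrix, which links SNP-based and relationship-based genomic prediction. The minimal numerical check below assumes centered marker codes (random normals stand in for real genotypes), centered phenotypes, and a known variance ratio λ = σ²_e/σ²_β; the function names are hypothetical.

```python
import numpy as np

def snp_blup(X, y, lam):
    """Posterior mean of SNP effects under beta ~ N(0, I sigma_b^2); lam = sigma_e^2 / sigma_b^2."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def gblup_values(X, y, lam):
    """Same genomic values via the n x n matrix K = X X' (an unscaled relationship matrix)."""
    K = X @ X.T
    return K @ np.linalg.solve(K + lam * np.eye(X.shape[0]), y)

rng = np.random.default_rng(7)
X = rng.normal(size=(40, 300))       # stand-in for centered marker codes
X -= X.mean(axis=0)
y = rng.normal(size=40)
y -= y.mean()

g_snp = X @ snp_blup(X, y, lam=5.0)  # sum of estimated SNP effects per individual
g_grm = gblup_values(X, y, lam=5.0)
print(np.allclose(g_snp, g_grm))     # True: both parameterizations agree
```

The agreement follows from the identity (X'X + λI)⁻¹X' = X'(XX' + λI)⁻¹. In practice the n × n form is cheaper when markers far outnumber individuals, while the p × p form returns explicit SNP effects that can be used for mapping.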

Exploring the predictive power of polygenic scores derived from genome-wide association studies: a study of 10 complex traits

Bioinformatics, 2017, pp. btw745. doi: 10.1093/bioinformatics/btw745. Cited by ~8

Author(s): Hon-Cheong So, Pak C. Sham

Keyword(s): Complex Traits, Predictive Power, Association Studies, Genome Wide Association, Genome Wide Association Studies, Genome Wide, Polygenic Scores

Comprehensive evaluation of mapping complex traits in wheat using genome-wide association studies

Molecular Breeding, 2021, Vol 42 (1). doi: 10.1007/s11032-021-01272-7

Author(s): Dinesh K. Saini, Yuvraj Chopra, Jagmohan Singh, Karansher S. Sandhu, Anand Kumar, ...

Keyword(s): Complex Traits, Comprehensive Evaluation, Association Studies, Genome Wide Association, Genome Wide Association Studies, Genome Wide, Mapping Complex Traits

Abstract 1443: Systems-Based Analysis of HDL Metabolism in a Mouse Intercross between Strains CAST and C57BL6/J (CxB)

Circulation, 2008, Vol 118 (suppl_18). doi: 10.1161/circ.118.suppl_18.s_326-c

Author(s): Margarete Mehrabian, Charles Farber, Peter Langfelder, Anatole Ghazalpour, Zhiqiang Zhou, ...

Keyword(s): Complex Traits, Association Studies, Plasma Lipid, Heart Association, HDL Cholesterol, Chow Diet, Genome Wide Association Studies, Genome Wide, Genomic Study, HDL Metabolism

A recent meta-analysis of three large genome-wide association studies for HDL cholesterol levels revealed several highly significant associations, but altogether these explained less than 10% of the population variance of HDL. Since HDL levels are highly heritable, with heritability estimated at 50–70% in many studies, there are clearly many additional genes, and probably complex genetic and environmental interactions, involved in HDL metabolism. Thus, if “personalized medicine” is to become a reality, these complex factors must be addressed. Combined genetic-genomic approaches have rejuvenated the analysis of complex traits using mouse models, and here we report an integrative genomic study of HDL in a large mouse cross. We previously reported the identification of loci associated with HDL cholesterol concentrations using a CXB F2 intercross. We have now generated a much larger CXB cross, consisting of 438 mice, and have integrated genome-wide gene expression analysis of liver and adipose tissue with quantitative trait locus (QTL) mapping and causality modeling. These studies were carried out on mice fed a low-fat chow diet and then switched to a high-fat ‘Western’ diet. QTL analysis of the clinical traits using R/QTL (http://cran.r-project.org/) revealed a complex inheritance pattern, with significant LOD scores at 9 loci on chromosomes 1, 2, 4, 5, 8, 9, 10, 16 and 18. Of these loci, 6 (chromosomes 1, 4, 5, 10, 16 and 18) were found to be involved in genetic-dietary regulation of HDL cholesterol. Expression QTLs (eQTLs) were determined using Agilent microarrays for 23,624 transcripts. Genes expressed within a 1-LOD support interval or correlated with HDL (p < 2.7E-11) in both adipose tissue and liver were identified. Using Network Edge Orienting (NEO) methods, causal relationships among the identified genes, related QTL peak markers and HDL levels were assessed, and the genes were then ranked by their NEO scores.
In liver, the highest-ranked genes were associated with mitochondrial, ER and Golgi trafficking. In adipose tissue, on the other hand, pathways associated with cell signaling, transcription regulation and protein ubiquitination were predicted to be causal for HDL levels. In conclusion, our results reveal a large number of novel pathways and candidate genes for plasma lipid metabolism. This research has received full or partial funding support from the American Heart Association, AHA Western States Affiliate (California, Nevada & Utah).
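
The LOD scores reported in this abstract compare the likelihood of a model with a QTL at a locus against a model without one. As a minimal illustration, assuming single-marker regression with normally distributed residuals (interval-mapping methods such as those in R/qtl are more elaborate), LOD = (n/2)·log10(RSS₀/RSS₁), where RSS₀ and RSS₁ are the residual sums of squares without and with the marker. The function name and the toy data are hypothetical.

```python
import math

def single_marker_lod(genotypes, phenotypes):
    """LOD score for an additive single-marker regression versus a mean-only model."""
    n = len(phenotypes)
    g_bar = sum(genotypes) / n
    y_bar = sum(phenotypes) / n
    sxx = sum((g - g_bar) ** 2 for g in genotypes)
    sxy = sum((g - g_bar) * (y - y_bar) for g, y in zip(genotypes, phenotypes))
    rss0 = sum((y - y_bar) ** 2 for y in phenotypes)   # null model: intercept only
    rss1 = rss0 - sxy ** 2 / sxx                        # after fitting the marker
    return (n / 2) * math.log10(rss0 / rss1)

# Toy F2-style data: genotypes coded as 0/1/2 copies of one parental allele.
geno = [0, 0, 1, 1, 2, 2]
pheno = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
print(round(single_marker_lod(geno, pheno), 2))
```

A LOD of 3 corresponds to a likelihood ratio of 1000:1 in favor of a QTL; genome-wide significance thresholds for scans like the one described are commonly set by permutation.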
