Perspectives on Applications of Hierarchical Gene-To-Phenotype (G2P) Maps to Capture Non-stationary Effects of Alleles in Genomic Prediction

Genomic prediction of complex traits across environments, breeding cycles, and populations remains a challenge for plant breeding. A potential explanation for this is that underlying non-additive genetic (GxG) and genotype-by-environment (GxE) interactions generate allele substitution effects that are non-stationary across different contexts. Such non-stationary effects of alleles are either ignored or assumed to be implicitly captured by most gene-to-phenotype (G2P) maps used in genomic prediction. The implicit capture of non-stationary effects of alleles requires the G2P map to be re-estimated across different contexts. We discuss the development and application of hierarchical G2P maps that explicitly capture non-stationary effects of alleles and have successfully increased short-term prediction accuracy in plant breeding. These hierarchical G2P maps achieve increases in prediction accuracy by allowing intermediate processes such as other traits and environmental factors and their interactions to contribute to complex trait variation. However, long-term prediction remains a challenge. The plant breeding community should undertake complementary simulation and empirical experiments to interrogate various hierarchical G2P maps that connect GxG and GxE interactions simultaneously. The existing genetic correlation framework can be used to assess the magnitude of non-stationary effects of alleles and the predictive ability of these hierarchical G2P maps in long-term, multi-context genomic predictions of complex traits in plant breeding.
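As an illustration of how a genetic correlation can quantify non-stationary allele effects, the minimal sketch below simulates additive allele effects in two contexts and correlates the resulting genetic values. The function names, the `rho` parameter (similarity of allele effects between contexts), and the population sizes are all assumptions made for illustration; this is not the authors' procedure. A correlation near 1 corresponds to stationary effects, and lower values to increasingly non-stationary effects.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate_genetic_correlation(rho, n_lines=400, n_markers=200, seed=1):
    """Correlation of genetic values between two contexts (hypothetical setup).

    rho controls how similar the allele effects are in context B
    relative to context A (1.0 = identical, 0.0 = independent).
    """
    rng = random.Random(seed)
    # Allele effects in context A, and correlated effects in context B.
    eff_a = [rng.gauss(0, 1) for _ in range(n_markers)]
    eff_b = [rho * a + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for a in eff_a]
    # Genotypes coded as allele counts 0/1/2, drawn uniformly for simplicity.
    geno = [[rng.choice((0, 1, 2)) for _ in range(n_markers)] for _ in range(n_lines)]
    # Additive genetic value of each line in each context.
    g_a = [sum(x * e for x, e in zip(row, eff_a)) for row in geno]
    g_b = [sum(x * e for x, e in zip(row, eff_b)) for row in geno]
    return pearson(g_a, g_b)
```

Calling `simulate_genetic_correlation(1.0)` and `simulate_genetic_correlation(0.0)` contrasts the stationary and fully non-stationary extremes of this toy model.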


Leveraging Multiple Layers of Data To Predict Drosophila Complex Traits

G3 Genes|Genome|Genetics ◽

10.1534/g3.120.401847 ◽

2020 ◽

Vol 10 (12) ◽

pp. 4599-4613

Author(s):

Fabio Morgante ◽

Wen Huang ◽

Peter Sørensen ◽

Christian Maltecca ◽

Trudy F. C. Mackay

Keyword(s):

Precision Agriculture ◽

Complex Traits ◽

Prediction Accuracy ◽

Cellular Response ◽

Complex Trait ◽

Sources Of Information ◽

Starvation Resistance ◽

Expression Levels ◽

Chill Coma ◽

Additional Layer

The ability to accurately predict complex trait phenotypes from genetic and genomic data is critical for the implementation of personalized medicine and precision agriculture; however, prediction accuracy for most complex traits is currently low. Here, we used data on whole genome sequences, deep RNA sequencing, and high quality phenotypes for three quantitative traits in the ∼200 inbred lines of the Drosophila melanogaster Genetic Reference Panel (DGRP) to compare the prediction accuracies of gene expression and genotypes for these traits. We found that expression levels (r = 0.28 and 0.38 for females and males, respectively) provided higher prediction accuracy than genotypes (r = 0.07 and 0.15 for females and males, respectively) for starvation resistance, similar prediction accuracy for chill coma recovery (null for both models and sexes), and lower prediction accuracy for startle response (r = 0.15 and 0.14 for female and male genotypes, respectively; r = 0.12 and 0.11 for female and male transcripts, respectively). Models including both genotype and expression levels did not outperform the best single component model. However, accuracy increased considerably for all three traits when we included gene ontology (GO) category as an additional layer of information for both genomic variants and transcripts. We found strongly predictive GO terms for each of the three traits, some of which had a clear, plausible biological interpretation. For example, for starvation resistance in females, GO:0033500 (r = 0.39 for transcripts) and GO:0032870 (r = 0.40 for transcripts) have been implicated in carbohydrate homeostasis and cellular response to hormone stimulus (including the insulin receptor signaling pathway), respectively. In summary, this study shows that integrating different sources of information improved prediction accuracy and helped elucidate the genetic architecture of three Drosophila complex phenotypes.


LDpred-funct: incorporating functional priors improves polygenic prediction accuracy in UK Biobank and 23andMe data sets

10.1101/375337 ◽

2018 ◽

Cited By ~ 21

Author(s):

Carla Márquez-Luna ◽

Steven Gazal ◽

Po-Ru Loh ◽

Samuel S. Kim ◽

Nicholas Furlotte ◽

...

Keyword(s):

Complex Traits ◽

Prediction Accuracy ◽

Causal Effect ◽

Complex Trait ◽

Training Data ◽

Data Sets ◽

UK Biobank ◽

Validation Data ◽

Functional Regions ◽

The UK

Genetic variants in functional regions of the genome are enriched for complex trait heritability. Here, we introduce a new method for polygenic prediction, LDpred-funct, that leverages trait-specific functional priors to increase prediction accuracy. We fit priors using the recently developed baseline-LD model, which includes coding, conserved, regulatory and LD-related annotations. We analytically estimate posterior mean causal effect sizes and then use cross-validation to regularize these estimates, improving prediction accuracy for sparse architectures. LDpred-funct attained higher prediction accuracy than other polygenic prediction methods in simulations using real genotypes. We applied LDpred-funct to predict 21 highly heritable traits in the UK Biobank. To minimize confounding, we used association statistics from British-ancestry samples as training data (avg N=373K) and samples of other European ancestries as validation data (avg N=22K). LDpred-funct attained a +4.6% relative improvement in average prediction accuracy (avg prediction R²=0.144; highest R²=0.413 for height) compared to SBayesR (the best method that does not incorporate functional information). For height, meta-analyzing training data from UK Biobank and 23andMe cohorts (total N=1107K; higher heritability in UK Biobank cohort) increased prediction R² to 0.431. Our results show that incorporating functional priors improves polygenic prediction accuracy, consistent with the functional architecture of complex traits.


The Estimation and Prediction of Geocenter Motion Based on GNSS Weekly Solutions

10.5194/egusphere-egu2020-2504 ◽

2020 ◽

Author(s):

Chunmei Zhao ◽

Lingna Qiao ◽

Tianming Ma

Keyword(s):

Reference Frame ◽

Prediction Accuracy ◽

Center Of Mass ◽

Science Research ◽

Terrestrial Reference Frame ◽

Small Contribution ◽

The Earth ◽

Term Prediction ◽

Geocenter Motion

The development of satellite space geodesy has made it possible to establish a global terrestrial reference frame based on the Earth’s center of mass. A precise and stable terrestrial reference frame is the foundation of Earth science research, and determining and analyzing the position of the Earth's center of mass and its changes is an important part of building a high-precision terrestrial reference frame. Based on GNSS weekly solutions provided by IGS, geocenter motion (GM) time series between 2007 and 2017 are obtained by means of the net translation method. The amplitude of the annual term of geocenter motion is 2.27 mm, 1.84 mm and 2.13 mm in the X, Y and Z directions, respectively, and the amplitude of the semiannual term is 0.1 mm, 0.20 mm and 0.15 mm, respectively. In addition, some other inter-annual changes with relatively small contribution rates are found. Finally, to obtain reliable GM predictions, two methods are used: ARMA and SSA+ARMA. In short-term prediction, the accuracy of the two methods is the same, and both reach millimeter-level prediction accuracy, but SSA+ARMA is more stable. SSA+ARMA performs much better at medium and long time scales, providing 1 mm medium-term and 1.5 mm long-term prediction accuracy.
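The abstract does not say how the annual and semiannual amplitudes were estimated. One common approach, shown here purely as an assumed illustration, is an ordinary least-squares fit of an offset plus annual and semiannual sinusoids. The sketch uses a synthetic, noise-free weekly series built from the X-direction amplitudes quoted above; the function names, phases, and offset are hypothetical, and the sketch does not implement ARMA or SSA.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_seasonal_amplitudes(t_years, y):
    """Fit offset + annual + semiannual terms; return the two amplitudes."""
    rows = [[1.0,
             math.cos(2 * math.pi * t), math.sin(2 * math.pi * t),
             math.cos(4 * math.pi * t), math.sin(4 * math.pi * t)]
            for t in t_years]
    k = 5
    # Normal equations (A^T A) c = A^T y for the five coefficients.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    c = solve_linear(ata, aty)
    # Amplitude of each periodic term from its cosine and sine coefficients.
    return math.hypot(c[1], c[2]), math.hypot(c[3], c[4])

# Synthetic weekly series covering roughly 2007-2017 (values in mm).
t = [2007 + 7 * i / 365.25 for i in range(522)]
y = [0.5
     + 2.27 * math.cos(2 * math.pi * ti - 0.3)
     + 0.10 * math.cos(4 * math.pi * ti + 1.0)
     for ti in t]
annual, semiannual = fit_seasonal_amplitudes(t, y)
```

On real GNSS series the fit would also have to contend with noise, gaps, and trends, which this sketch deliberately leaves out.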


Performance Similarity Based Method for Enhanced Prediction of Manufacturing Process Performance

Manufacturing Engineering and Materials Handling Engineering ◽

10.1115/imece2004-62246 ◽

2004 ◽

Cited By ~ 8

Author(s):

Jianbo Liu ◽

Dragan Djurdjanovic ◽

Jun Ni ◽

Jay Lee

Keyword(s):

Prediction Accuracy ◽

Linear Prediction ◽

Moving Average ◽

Degradation Process ◽

Remaining Useful Life ◽

Term Prediction ◽

ARMA Modeling ◽

Prediction Techniques ◽

Long Term Prediction

Realizing the full potential of predictive and proactive maintenance depends heavily on the accuracy of long-term predictions of the remaining useful life of manufacturing equipment. Parametric linear prediction techniques, such as Autoregressive Moving Average (ARMA) modeling, are routinely used to trend and predict the future behavior of time series, but they are frequently not appropriate for long-term prediction because of the highly complicated and non-stationary nature of manufacturing processes. In this paper, we propose a novel method that is capable of achieving high long-term prediction accuracy by comparing signatures from two degradation processes using measures of similarity that form a Match Matrix. Through this concept, we can effectively include large amounts of historical information in the prediction of the current degradation process. Similarities with historical records are used to generate possible future distributions of features, which are then used to predict probabilities of failure over time by evaluating overlaps between predicted feature distributions and feature distributions related to unacceptable equipment behavior. Experimental results show that the proposed method yields a significant improvement in long-term prediction accuracy compared with ARMA modeling-based prediction.


Comparison of Genomic Selection Models for Exploring Predictive Ability of Complex Traits in Breeding Programs

10.1101/2021.04.15.440015 ◽

2021 ◽

Author(s):

Lance F. Merrick ◽

Arron H. Carter

Keyword(s):

Complex Traits ◽

Seedling Emergence ◽

Predictive Ability ◽

Complex Trait ◽

Parametric Models ◽

Support Vector ◽

Breeding Programs ◽

Breeding Lines ◽

Mapping Panel ◽

Increase In Accuracy

Traits with a complex unknown genetic architecture are common in breeding programs. However, they pose a challenge for selection due to a combination of complex environmental and pleiotropic effects that impede the ability to create mapping populations to characterize the trait’s genetic basis. One such trait, seedling emergence of wheat (Triticum aestivum L.) from deep planting, presents a unique opportunity to explore the best method to use and implement GS models to predict a complex trait. Seventeen GS models were compared using two training populations: one consisting of 473 genotypes from a diverse association mapping panel (DP) phenotyped from 2015-2019, and the other consisting of 643 breeding lines phenotyped in 2015 and 2020 in Lind, WA, with 40,368 markers. There were only a few significant differences between GS models, with support vector machines reaching the highest accuracy of 0.56 in a single breeding line trial using cross-validations. However, the consistent moderate accuracy of cBLUP and other parametric models indicates no need to implement computationally demanding non-parametric models for complex traits. There was an increase in accuracy using cross-validations from 0.40 to 0.41 and independent validations from 0.10 to 0.17 using diversity panel lines to predict breeding lines. The environmental effects of complex traits can be overcome by combining years of the same populations. Overall, our study showed that breeders can accurately predict and implement GS for a complex trait by using parametric models within their own breeding programs, with increased accuracy as they combine training populations over the years.


Generalizable approaches for genomic prediction of metabolites in plants

10.1101/2021.11.24.469870 ◽

2021 ◽

Author(s):

Lauren J Brzozowski ◽

Malachy T Campbell ◽

Haixiao Hu ◽

Melanie Caffe ◽

Lucia Gutierrez ◽

...

Keyword(s):

Plant Breeding ◽

Genomic Prediction ◽

Prediction Accuracy ◽

Group Model ◽

Plant Metabolites ◽

Plant Populations ◽

Breeding Programs ◽

Breeding Lines ◽

Consistent Model ◽

Selection For

Plant metabolites are important for plant breeders to improve nutrition and agronomic performance, yet integrating selection for metabolomic traits is limited by phenotyping expense and limited genetic characterization, especially of uncommon metabolites. As such, developing biologically-based and generalizable genomic selection methods for metabolites that are transferable across plant populations would benefit plant breeding programs. We tested genomic prediction accuracy for more than 600 metabolites measured by GC-MS and LC-MS in oat (Avena sativa L.) seed. Using a discovery germplasm panel, we conducted metabolite GWAS (mGWAS) and selected loci to use in multi-kernel models that encompassed metabolome-wide mGWAS results, or mGWAS from specific metabolite structures or biosynthetic pathways. Metabolite kernels developed from LC-MS metabolites in the discovery panel improved prediction accuracy of LC-MS metabolite traits in the validation panel, which consisted of more advanced breeding lines. No approach, however, improved prediction accuracy for GC-MS metabolites. We tested whether similar metabolites had consistent model ranks and found that, while different metrics of similarity gave different results, using annotation-free methods to group metabolites led to consistent within-group model rankings. Overall, testing biological rationales for developing kernels for genomic prediction across populations contributes to developing frameworks for plant breeding for metabolite traits.


Harnessing Genetic Diversity in the USDA Pea Germplasm Collection Through Genomic Prediction

Frontiers in Genetics ◽

10.3389/fgene.2021.707754 ◽

2021 ◽

Vol 12 ◽

Author(s):

Md. Abdullah Al Bari ◽

Ping Zheng ◽

Indalecio Viera ◽

Hannah Worral ◽

Stephen Szwiec ◽

...

Keyword(s):

Genetic Diversity ◽

Seed Yield ◽

Genomic Prediction ◽

Complex Traits ◽

Prediction Models ◽

Germplasm Collection ◽

Predictive Ability ◽

SNP Markers ◽

Breeding Values ◽

Germplasm Collections

Phenotypic evaluation and efficient utilization of germplasm collections can be time-intensive, laborious, and expensive. However, with the plummeting costs of next-generation sequencing and the addition of genomic selection to the plant breeder’s toolbox, we now can more efficiently tap the genetic diversity within large germplasm collections. In this study, we evaluated the potential of genomic prediction for enhancing selection of accessions from the USDA Pea Germplasm Collection, using a set of 482 pea (Pisum sativum L.) accessions genotyped with 30,600 single nucleotide polymorphic (SNP) markers and phenotyped for seed yield and yield-related components. Genomic prediction models and several factors affecting predictive ability were evaluated in a series of cross-validation schemes across complex traits. Different genomic prediction models gave similar results, with predictive ability across traits ranging from 0.23 to 0.60, and no model worked best across all traits. Increasing the training population size improved the predictive ability for most traits, including seed yield. Predictive abilities increased and reached a plateau with increasing number of markers, presumably due to extensive linkage disequilibrium in the pea genome. Accounting for population structure effects did not significantly boost predictive ability, but we observed a slight improvement in seed yield. By applying the best genomic prediction model (e.g., RR-BLUP), we then examined the distribution of genotyped but nonphenotyped accessions and the reliability of genomic estimated breeding values (GEBV). The distribution of GEBV suggested that none of the nonphenotyped accessions were expected to perform outside the range of the phenotyped accessions. Desirable breeding values with higher reliability can be used to identify and screen favorable germplasm accessions. Expanding the training set and incorporating additional orthogonal information (e.g., transcriptomics, metabolomics, physiological traits, etc.) into the genomic prediction framework can enhance prediction accuracy.


Multi-trait Genomic Prediction Model Increased the Predictive Ability for Agronomic and Malting Quality Traits in Barley (Hordeum vulgare L.)

G3 Genes|Genome|Genetics ◽

10.1534/g3.119.400968 ◽

2020 ◽

Vol 10 (3) ◽

pp. 1113-1124 ◽

Cited By ~ 8

Author(s):

Madhav Bhatta ◽

Lucia Gutierrez ◽

Lorena Cammarota ◽

Fernanda Cardozo ◽

Silvia Germán ◽

...

Keyword(s):

Prediction Model ◽

Genomic Prediction ◽

Complex Traits ◽

Prediction Models ◽

Agronomic Traits ◽

Predictive Ability ◽

Malting Quality ◽

Quality Traits ◽

Multiple Traits ◽

Correlated Traits

Plant breeders regularly evaluate multiple traits across multiple environments, which opens an avenue for using multiple traits in genomic prediction models. We assessed the potential of a multi-trait (MT) genomic prediction model by evaluating several strategies of incorporating multiple traits (eight agronomic and malting quality traits) into the prediction models with two cross-validation schemes in barley: CV1, predicting new lines with genotypic information only, and CV2, predicting partially phenotyped lines using both genotypic and phenotypic information from correlated traits. The predictive ability was similar for single-trait (ST-CV1) and multi-trait (MT-CV1) models when predicting new lines. However, the predictive ability for agronomic traits was considerably increased when partially phenotyped lines (MT-CV2) were used. The predictive ability for grain yield using the MT-CV2 model with other agronomic traits was 57% and 61% higher than that of the ST-CV1 and MT-CV1 models, respectively. Therefore, complex traits such as grain yield are better predicted when correlated traits are used. Similarly, a considerable increase in the predictive ability of malting quality traits was observed when correlated traits were used. The predictive ability for grain protein content using the MT-CV2 model with both agronomic and malting traits was 76% higher than that of the ST-CV1 and MT-CV1 models. Additionally, higher predictive ability for new environments was obtained for all traits using the MT-CV2 model compared to the MT-CV1 model. This study showed the potential of improving the genomic prediction of complex traits by incorporating information from multiple traits (cost-friendly and easy-to-measure traits) collected throughout breeding programs, which could assist in speeding up breeding cycles.
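The difference between the CV1 and CV2 schemes described above comes down to which phenotypes are hidden from the model for the test lines. The sketch below builds the two visibility masks for a single train/test split. The function name `make_cv_masks`, the trait labels, and the 20% test fraction are illustrative assumptions, not details taken from the study.

```python
import random

def make_cv_masks(n_lines, traits, target, test_fraction=0.2, seed=0):
    """Return (test_lines, cv1_mask, cv2_mask) for one random split.

    Each mask maps trait -> list of booleans, where True means the
    phenotype of that line for that trait is visible during training.
    """
    rng = random.Random(seed)
    lines = list(range(n_lines))
    rng.shuffle(lines)
    n_test = max(1, int(round(n_lines * test_fraction)))
    test = set(lines[:n_test])
    # CV1: test lines are entirely new, so every trait is hidden for them.
    cv1 = {tr: [i not in test for i in range(n_lines)] for tr in traits}
    # CV2: test lines are partially phenotyped, so only the target trait
    # is hidden; correlated traits remain visible.
    cv2 = {tr: [(i not in test) or (tr != target) for i in range(n_lines)]
           for tr in traits}
    return test, cv1, cv2
```

For example, `make_cv_masks(10, ["grain_yield", "plant_height"], "grain_yield")` hides both traits for the test lines under CV1 but only `grain_yield` under CV2, which is why a multi-trait model can borrow information from correlated traits in the CV2 setting.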


MLP ensembles improve long term prediction accuracy over single networks

International Journal of Forecasting ◽

10.1016/j.ijforecast.2009.05.029 ◽

2011 ◽

Vol 27 (3) ◽

pp. 661-671 ◽

Cited By ~ 33

Author(s):

Paulo J.L. Adeodato ◽

Adrian L. Arnaud ◽

Germano C. Vasconcelos ◽

Rodrigo C.L.V. Cunha ◽

Domingos S.M.P. Monteiro

Keyword(s):

Prediction Accuracy ◽

Term Prediction ◽

Long Term Prediction


Haplotype associated RNA expression (HARE) improves prediction of complex traits in maize

PLoS Genetics ◽

10.1371/journal.pgen.1009568 ◽

2021 ◽

Vol 17 (10) ◽

pp. e1009568

Author(s):

Anju Giri ◽

Merritt Khaipho-Burch ◽

Edward S. Buckler ◽

Guillaume P. Ramstein

Keyword(s):

Genomic Prediction ◽

Complex Traits ◽

Prediction Accuracy ◽

Prediction Models ◽

Cost Effective ◽

RNA Expression ◽

Haplotype Structure ◽

Association Panel ◽

Nested Association Mapping ◽

Mapping Panel

Genomic prediction typically relies on associations between single-site polymorphisms and traits of interest. This representation of genomic variability has been successful for predicting many complex traits. However, it usually cannot capture the combination of alleles in haplotypes, and it has generated little insight about the biological function of polymorphisms. Here we present a novel and cost-effective method for imputing cis haplotype associated RNA expression (HARE), study its transferability across tissues, and evaluate genomic prediction models within and across populations. HARE focuses on tightly linked cis-acting causal variants in the immediate vicinity of the gene, while excluding trans effects from diffusion and metabolism. Therefore, HARE estimates were more transferable across different tissues and populations compared to measured transcript expression. We also showed that HARE estimates captured one-third of the variation in gene expression. HARE estimates were used in genomic prediction models evaluated within and across two diverse maize panels, a diverse association panel (Goodman Association panel) and a large half-sib panel (Nested Association Mapping panel), for predicting 26 complex traits. HARE resulted in up to 15% higher prediction accuracy than control approaches that preserved haplotype structure, suggesting that HARE carried functional information in addition to information about haplotype structure. The largest increase was observed when the model was trained in the Nested Association Mapping panel and tested in the Goodman Association panel. Additionally, HARE yielded higher within-population prediction accuracy than measured expression values. The accuracy achieved by measured expression was variable across tissues, whereas accuracy by HARE was more stable across tissues. Therefore, imputing RNA expression of genes by haplotype is stable, cost-effective, and transferable across populations.
