Multi-omics prediction of oat agronomic and seed nutritional traits across environments and in distantly related populations

2021 ◽  
Vol 134 (12) ◽  
pp. 4043-4054
Author(s):  
Haixiao Hu ◽  
Malachy T. Campbell ◽  
Trevor H. Yeats ◽  
Xuying Zheng ◽  
Daniel E. Runcie ◽  
...  

Key message: Integration of multi-omics data improved prediction accuracies of oat agronomic and seed nutritional traits in multi-environment trials and distantly related populations, in addition to single-environment prediction.

Abstract: Multi-omics prediction has been shown to be superior to genomic prediction with genome-wide DNA-based genetic markers (G) for predicting phenotypes. However, most existing studies were based on historical datasets from one environment; therefore, they were unable to evaluate the efficiency of multi-omics prediction in multi-environment trials and distantly related populations. To fill those gaps, we designed a systematic experiment to collect omics data and evaluate 17 traits in two oat breeding populations planted in single and multiple environments. In the single-environment trial, transcriptomic BLUP (T), metabolomic BLUP (M), G + T, G + M, and G + T + M models showed greater prediction accuracy than GBLUP for 5, 10, 11, 17, and 17 traits, respectively, and metabolites generally performed better than transcripts when combined with SNPs. In the multi-environment trial, multi-trait models with omics data outperformed both counterpart multi-trait GBLUP models and single-environment omics models, and the highest prediction accuracy was achieved when modeling genetic covariance as an unstructured covariance model. We also demonstrated that omics data from one population can be used to prioritize loci and thereby improve genomic prediction in a distantly related population, using a two-kernel linear model that accommodates both likely causal loci with large effects and loci that explain little or no phenotypic variance. We propose that the two-kernel linear model is superior to most genomic prediction models, which assume that each variant is equally likely to affect the trait, and that it can be used to improve prediction accuracy for any trait for which prior knowledge of genetic architecture is available.
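
The idea of combining two kernels in one linear model can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a model y = mu + u1 + u2 + e in which each random effect has a covariance proportional to a linear kernel, and it treats the variance components (`var1`, `var2`, `var_e`) as known, whereas in practice they would be estimated (for example by REML). All function names and the simulated data are hypothetical.

```python
import random
from statistics import mean

def linear_kernel(X):
    """Relationship matrix K = Z Z' / p built from column-centred features Z."""
    n, p = len(X), len(X[0])
    col_means = [mean(row[j] for row in X) for j in range(p)]
    Z = [[row[j] - col_means[j] for j in range(p)] for row in X]
    return [[sum(a * b for a, b in zip(Z[i], Z[k])) / p for k in range(n)]
            for i in range(n)]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def two_kernel_blup(K1, K2, y, train, test, var1, var2, var_e):
    """Predict test individuals, assuming the three variance components are known."""
    mu = mean(y[i] for i in train)

    def cov(i, k):
        # Covariance between individuals i and k from both kernels combined.
        return var1 * K1[i][k] + var2 * K2[i][k]

    V = [[cov(i, k) + (var_e if i == k else 0.0) for k in train] for i in train]
    alpha = solve(V, [y[i] - mu for i in train])
    return [mu + sum(cov(t, i) * a for i, a in zip(train, alpha)) for t in test]

# Simulated toy data (assumption): 12 lines, 30 SNPs, 8 metabolite features.
random.seed(1)
n = 12
snps = [[random.choice([0, 1, 2]) for _ in range(30)] for _ in range(n)]
metab = [[random.gauss(0, 1) for _ in range(8)] for _ in range(n)]
y = [0.3 * sum(s[:5]) + m[0] + random.gauss(0, 0.5) for s, m in zip(snps, metab)]
K_G, K_M = linear_kernel(snps), linear_kernel(metab)
preds = two_kernel_blup(K_G, K_M, y, train=list(range(9)), test=[9, 10, 11],
                        var1=1.0, var2=1.0, var_e=0.5)
print(preds)
```

The same function covers both uses described in the abstract, under the stated assumptions: the two kernels can be a genomic and an omics kernel (as in G + M), or one kernel built from prioritized loci and one from the remaining markers.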


2019 ◽  
Author(s):  
Daniel Runcie ◽  
Hao Cheng

Abstract Incorporating measurements on correlated traits into genomic prediction models can increase prediction accuracy and selection gain. However, multi-trait genomic prediction models are complex and prone to overfitting, which may result in a loss of prediction accuracy relative to single-trait genomic prediction. Cross-validation is considered the gold standard method for selecting and tuning models for genomic prediction in both plant and animal breeding. When used appropriately, cross-validation gives an accurate estimate of the prediction accuracy of a genomic prediction model, and can effectively choose among disparate models based on their expected performance in real data. However, we show that a naive cross-validation strategy applied to the multi-trait prediction problem can be severely biased and lead to sub-optimal choices between single and multi-trait models when secondary traits are used to aid in the prediction of focal traits and these secondary traits are measured on the individuals to be tested. We use simulations to demonstrate the extent of the problem and propose three partial solutions: 1) a parametric solution from selection index theory, 2) a semi-parametric method for correcting the cross-validation estimates of prediction accuracy, and 3) a fully non-parametric method which we call CV2*: validating model predictions against focal trait measurements from genetically related individuals. The current excitement over high-throughput phenotyping suggests that more comprehensive phenotype measurements will be useful for accelerating breeding programs. Using an appropriate cross-validation strategy should more reliably determine if and when combining information across multiple traits is useful.


2020 ◽  
Vol 71 (20) ◽  
pp. 6670-6683
Author(s):  
Xiongwei Zhao ◽  
Gang Nie ◽  
Yanyu Yao ◽  
Zhongjie Ji ◽  
Jianhua Gao ◽  
...  

Abstract Genomic prediction of nitrogen-use efficiency (NUE) has not previously been studied in perennial grass species exposed to low-N stress. Here, we conducted a genomic prediction of physiological traits and NUE in 184 global accessions of perennial ryegrass (Lolium perenne) in response to a normal (7.5 mM) and low (0.75 mM) supply of N. After 21 d of treatment under greenhouse conditions, significant variations in plant height increment (ΔHT), leaf fresh weight (LFW), leaf dry weight (LDW), chlorophyll index (Chl), chlorophyll fluorescence, leaf N and carbon (C) contents, C/N ratio, and NUE were observed among accessions, but to a greater extent under low-N stress. Six genomic prediction models were applied to the data, namely the Bayesian method Bayes C, Bayesian LASSO, Bayesian Ridge Regression, Ridge Regression-Best Linear Unbiased Prediction, Reproducing Kernel Hilbert Spaces, and randomForest. These models produced similar prediction accuracy of traits within the normal or low-N treatments, but the accuracy differed between the two treatments. ΔHT, LFW, LDW, and C were predicted slightly better under normal N with a mean Pearson r-value of 0.26, compared with r=0.22 under low N, while the prediction accuracies for Chl, N, C/N, and NUE were significantly improved under low-N stress with a mean r=0.45, compared with r=0.26 under normal N. The population panel contained three population structures, which generally had no effect on prediction accuracy. The moderate prediction accuracies obtained for N, C, and NUE under low-N stress are promising, and suggest a feasible means by which germplasm might be initially assessed for further detailed studies in breeding programs.
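
Prediction accuracy reported as a Pearson r, as above, is commonly computed between predicted and observed values for individuals held out of model fitting. The sketch below shows one way to do this with k-fold cross-validation. The abstract does not state the validation scheme, so the k-fold split is an assumption, and the one-predictor least-squares fit is only a hypothetical stand-in for the six models named in the abstract; all names and data are illustrative.

```python
import random
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]

def fit_simple_regression(x, y):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

def cv_accuracy(x, y, k=5, seed=0):
    """Mean over folds of Pearson r between held-out predictions and observations."""
    rs = []
    for fold in k_fold_indices(len(y), k, seed):
        held = set(fold)
        train = [i for i in range(len(y)) if i not in held]
        b0, b1 = fit_simple_regression([x[i] for i in train], [y[i] for i in train])
        preds = [b0 + b1 * x[i] for i in fold]
        rs.append(pearson_r(preds, [y[i] for i in fold]))
    return mean(rs)

# Simulated data (assumption): one continuous predictor score and a noisy phenotype.
rng = random.Random(42)
score = [rng.gauss(0, 1) for _ in range(40)]
phenotype = [0.5 * s + rng.gauss(0, 1) for s in score]
print(cv_accuracy(score, phenotype, k=5))
```

Swapping `fit_simple_regression` for any other model leaves the accuracy calculation unchanged, which is what allows different models to be compared on the same footing.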


Genes ◽  
2019 ◽  
Vol 11 (1) ◽  
pp. 16 ◽  
Author(s):  
Christine Nyaga ◽  
Manje Gowda ◽  
Yoseph Beyene ◽  
Wilson T. Muriithi ◽  
Dan Makumbi ◽  
...  

Maize lethal necrosis (MLN), caused by co-infection of maize chlorotic mottle virus and sugarcane mosaic virus, can lead to up to 100% yield loss. Identification and validation of genomic regions can facilitate marker-assisted breeding for resistance to MLN. Our objectives were to identify marker-trait associations using a genome-wide association study (GWAS) and to assess the potential of genomic prediction for MLN resistance in a large panel of diverse maize lines. A set of 1400 diverse tropical maize inbred lines was evaluated for response to MLN under artificial inoculation by measuring disease severity or incidence and area under disease progress curve (AUDPC). All lines were genotyped with genotyping-by-sequencing (GBS) SNPs. The phenotypic variation was significant for all traits and the heritability estimates were moderate to high. GWAS revealed 32 significantly associated SNPs for MLN resistance (at p < 1.0 × 10⁻⁶). For disease severity, these significantly associated SNPs individually explained 3–5% of the total phenotypic variance, whereas for AUDPC they explained 3–12% of the total phenotypic variance. Most of the significant SNPs were consistent with previous studies, helping to validate the large quantitative trait locus (QTL) regions and fine-map them to a few marker-specific regions. A set of putative candidate genes associated with the significant markers was identified, and their functions were found to be directly or indirectly involved in plant defense responses. Genomic prediction revealed reasonable prediction accuracies. The prediction accuracies significantly increased with increasing marker densities and training population size. These results support the view that MLN resistance is a complex trait controlled by a few major and many minor effect genes.


Genetics ◽  
2020 ◽  
Vol 216 (1) ◽  
pp. 27-41
Author(s):  
Simon Rio ◽  
Laurence Moreau ◽  
Alain Charcosset ◽  
Tristan Mary-Huard

Populations structured into genetic groups may display group-specific linkage disequilibrium, mutations, and/or interactions between quantitative trait loci and the genetic background. These factors lead to heterogeneous marker effects affecting the efficiency of genomic prediction, especially for admixed individuals. Such individuals have a genome that is a mosaic of chromosome blocks from different origins, and may be of interest for combining favorable group-specific characteristics. We developed two genomic prediction models adapted to the prediction of admixed individuals in the presence of heterogeneous marker effects: multigroup admixed genomic best linear unbiased prediction random individual (MAGBLUP-RI), modeling the ancestry of alleles; and multigroup admixed genomic best linear unbiased prediction random allele effect (MAGBLUP-RAE), modeling group-specific distributions of allele effects. MAGBLUP-RI can estimate the segregation variance generated by admixture while MAGBLUP-RAE can disentangle the variability that is due to main allele effects from the variability that is due to group-specific deviation allele effects. Both models were evaluated for their genomic prediction accuracy using a maize panel including lines from the Dent and Flint groups, along with admixed individuals. Based on simulated traits, both models improved genomic prediction accuracy compared to standard GBLUP models. For real traits, a clear gain was observed at low marker densities whereas it became limited at high marker densities. The value of including admixed individuals in multigroup training sets was confirmed using simulated traits, but was variable using real traits. Both MAGBLUP models and admixed individuals are of interest whenever group-specific SNP allele effects exist.


2021 ◽  
Vol 53 (1) ◽  
Author(s):  
Cheng Bian ◽  
Dzianis Prakapenka ◽  
Cheng Tan ◽  
Ruifei Yang ◽  
Di Zhu ◽  
...  

Abstract Background Genomic selection using single nucleotide polymorphism (SNP) markers has been widely used for genetic improvement of livestock, but most current methods of genomic selection are based on SNP models. In this study, we investigated the prediction accuracies of haplotype models based on fixed chromosome distances and gene boundaries compared to those of SNP models for genomic prediction of phenotypic values. We also examined the reasons for the successes and failures of haplotype genomic prediction. Methods We analyzed a swine population of 3195 Duroc boars with records on eight traits: body judging score (BJS), teat number (TN), age (AGW), loin muscle area (LMA), loin muscle depth (LMD) and back fat thickness (BF) at 100 kg live weight, and average daily gain (ADG) and feed conversion rate (FCR) from 30 to 100 kg live weight. Ten-fold validation was used to evaluate the prediction accuracy of each SNP model and each multi-allelic haplotype model based on 488,124 autosomal SNPs from low-coverage sequencing. Haplotype blocks were defined using fixed chromosome distances or gene boundaries. Results Compared to the best SNP model, the accuracy of predicting phenotypic values using a haplotype model was greater by 7.4% for BJS, 7.1% for AGW, 6.6% for ADG, 4.9% for FCR, 2.7% for LMA, 1.9% for LMD, 1.4% for BF, and 0.3% for TN. The use of gene-based haplotype blocks resulted in the best prediction accuracy for LMA, LMD, and TN. Compared to estimates of SNP additive heritability, estimates of haplotype epistasis heritability were strongly correlated with the increase in prediction accuracy by haplotype models. The increase in prediction accuracy was largest for BJS, AGW, ADG, and FCR, which also had the largest estimates of haplotype epistasis heritability: 24.4% for BJS, 14.3% for AGW, 14.5% for ADG, and 17.7% for FCR. SNP and haplotype heritability profiles across the genome identified several genes with large genetic contributions to phenotypes: NUDT3 for LMA, LMD and BF, VRTN for TN, COL5A2 for BJS, BSND for ADG, and CARTPT for FCR. Conclusions Haplotype prediction models improved the accuracy for genomic prediction of phenotypes in Duroc pigs. For some traits, the best prediction accuracy was obtained with haplotypes defined using gene regions, which provides evidence that functional genomic information can improve the accuracy of haplotype genomic prediction for certain traits.
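
Defining haplotype blocks by fixed chromosome distance and then treating each distinct haplotype within a block as one allele of a multi-allelic marker can be sketched as follows. This is an illustration only, not the study's pipeline: it assumes phased haplotypes coded 0/1 per SNP, and the block size, positions, and function names are hypothetical.

```python
from collections import defaultdict

def fixed_distance_blocks(positions, block_bp):
    """Group SNP indices into consecutive blocks of block_bp base pairs."""
    blocks = defaultdict(list)
    for idx, pos in enumerate(positions):
        blocks[pos // block_bp].append(idx)
    # Return blocks in chromosome order; windows without SNPs are skipped.
    return [blocks[b] for b in sorted(blocks)]

def haplotype_counts(hap_pair, block):
    """Copies (1 or 2) of each haplotype allele one individual carries in a block."""
    counts = defaultdict(int)
    for hap in hap_pair:
        allele = "".join(str(hap[i]) for i in block)
        counts[allele] += 1
    return dict(counts)

# Toy example (assumption): five SNPs on one chromosome, 1 kb blocks.
positions = [100, 900, 1500, 1800, 3200]
blocks = fixed_distance_blocks(positions, block_bp=1000)

# One individual's two phased haplotypes across the five SNPs.
individual = ([0, 1, 1, 0, 1], [0, 1, 0, 0, 0])
for block in blocks:
    print(block, haplotype_counts(individual, block))
```

Blocks defined by gene boundaries would replace `fixed_distance_blocks` with a lookup of SNPs falling inside each gene's coordinates, while the per-block haplotype counting stays the same.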


2017 ◽  
Author(s):  
Siraj Ismail Kayondo ◽  
Dunia Pino Del Carpio ◽  
Roberto Lozano ◽  
Alfred Ozimati ◽  
Marnin Wolfe ◽  
...  

Abstract Cassava (Manihot esculenta Crantz), a key carbohydrate dietary source for millions of people in Africa, faces severe yield losses due to two viral diseases: cassava brown streak disease (CBSD) and cassava mosaic disease (CMD). The completion of the cassava genome sequence and the whole-genome marker profiling of clones from African breeding programs (www.nextgencassava.org) provide cassava breeders with the opportunity to deploy additional breeding strategies and develop superior varieties with both farmer- and industry-preferred traits. Here we report the identification of genomic segments associated with resistance to CBSD foliar symptoms and root necrosis, as measured in two breeding panels at different growth stages and locations. Using genome-wide association mapping and genomic prediction models, we describe the genetic architecture of CBSD severity and identify strongly associated loci on chromosomes 4 and 11. Moreover, the significantly associated region on chromosome 4 colocalises with a Manihot glaziovii introgression segment, and the significant SNP markers on chromosome 11 are situated within a cluster of nucleotide-binding site leucine-rich repeat (NBS-LRR) genes previously described in cassava. Overall, predictive accuracy values found in this study varied between CBSD severity traits and across GS models, with Random Forest and RKHS showing the highest predictive accuracies for foliar and root CBSD severity scores.


2021 ◽  
Author(s):  
Michaela Jung ◽  
Beat Keller ◽  
Morgane Roth ◽  
Maria Jose Aranzana ◽  
Annemarie Auwerkerken ◽  
...  

Implementation of genomic tools is desirable to increase the efficiency of apple breeding. The apple reference population (apple REFPOP) proved useful for rediscovering loci, estimating genomic prediction accuracy, and studying genotype by environment interactions (GxE). Here we show contrasting genetic architecture and genomic prediction accuracies for 30 quantitative traits across up to six European locations using the apple REFPOP. A total of 59 stable and 277 location-specific associations were found using GWAS, 69.2% of which are novel when compared with 41 reviewed publications. Average genomic prediction accuracies of 0.18-0.88 were estimated using single-environment univariate, single-environment multivariate, multi-environment univariate, and multi-environment multivariate models. The GxE accounted for up to 24% of the phenotypic variability. This most comprehensive genomic study in apple in terms of trait-environment combinations provided knowledge of trait biology and prediction models that can be readily applied for marker-assisted or genomic selection, thus facilitating increased breeding efficiency.

