Genomic prediction for commercial traits using univariate and multivariate approaches in Nile tilapia (Oreochromis niloticus)

Abstract Background Over the past three decades, the Nile tilapia industry has grown into a significant aquaculture industry spread over 120 tropical and sub-tropical countries around the world, accounting for 7.4% of global aquaculture production in 2015. Across species, genomic selection has been shown to increase predictive ability and genetic gain, also extending into aquaculture. Hence, the aim of this paper is to compare the predictive abilities of pedigree- and genomic-based models in univariate and multivariate approaches, with the aim to utilize genomic selection in a Nile tilapia breeding program. A total of 1444 fish were genotyped (48,960 SNP loci) and phenotyped for body weight at harvest (BW), fillet weight (FW) and fillet yield (FY). The pedigree-based analysis utilized a deep pedigree, including 14 generations. Estimated breeding values (EBVs and GEBVs) were obtained with traditional pedigree-based (PBLUP) and genomic (GBLUP) models, using both univariate and multivariate approaches. Prediction accuracy and bias were evaluated using 5 replicates of 10-fold cross-validation with three different cross-validation approaches. Further, the impact of these models and approaches on the genetic evaluation was assessed based on the ranking of the selection candidates. Results GBLUP univariate models were found to increase the prediction accuracy and reduce bias of prediction compared to other PBLUP and multivariate approaches. Relative to pedigree-based models, prediction accuracy increased by ∼20% for FY, >75% for FW and >43% for BW. GBLUP models caused major re-ranking of the selection candidates, with no significant difference in the ranking due to univariate or multivariate GBLUP approaches. The heritabilities using multivariate GBLUP models for BW, FW and FY were 0.19 ± 0.04, 0.17 ± 0.04 and 0.23 ± 0.04, respectively. BW showed very high genetic correlation with FW (0.96 ± 0.01) and a slightly negative genetic correlation with FY (−0.11 ± 0.15). Conclusion Predictive ability of genomic prediction models is substantially higher than for classical pedigree-based models. Genomic selection is therefore beneficial to the Nile tilapia breeding program, and it is recommended in routine genetic evaluations of commercial traits in the Nile tilapia breeding nucleus.
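
The validation design mentioned in the abstract above (5 replicates of 10-fold cross-validation) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the names `repeated_kfold` and `pearson` are hypothetical, folds are assigned purely at random (the three cross-validation approaches of the paper are not reproduced), and the model-fitting step (PBLUP or GBLUP) is left to the caller.

```python
import math
import random


def repeated_kfold(n_individuals, n_folds=10, n_replicates=5, seed=1):
    """Yield (replicate, fold, train_idx, valid_idx) for random repeated k-fold CV."""
    rng = random.Random(seed)
    for rep in range(n_replicates):
        order = list(range(n_individuals))
        rng.shuffle(order)  # a fresh random partition for every replicate
        for fold in range(n_folds):
            valid = sorted(order[fold::n_folds])  # every n_folds-th shuffled individual
            valid_set = set(valid)
            train = [i for i in range(n_individuals) if i not in valid_set]
            yield rep, fold, train, valid


def pearson(x, y):
    """Pearson correlation between predictions and observations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# 1444 is the number of genotyped fish reported in the abstract.
splits = list(repeated_kfold(1444))
print(len(splits))  # 5 replicates x 10 folds = 50 train/validation splits
```

In such a scheme, a model would be fitted on each `train` set, and `pearson` would be applied to the predicted and observed values of the matching `valid` set, then averaged over the 50 splits.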

How Population Structure Impacts Genomic Selection Accuracy in Cross-Validation: Implications for Practical Breeding

Frontiers in Plant Science ◽

10.3389/fpls.2020.592977 ◽

2020 ◽

Vol 11 ◽

Author(s):

Christian R. Werner ◽

R. Chris Gaynor ◽

Gregor Gorjanc ◽

John M. Hickey ◽

Tobias Kox ◽

...

Keyword(s):

Family Structure ◽

Genomic Selection ◽

Genomic Prediction ◽

Prediction Accuracy ◽

Cross Validation ◽

Careful Analysis ◽

Critical Approach ◽

Crop Species ◽

Breeding Programs ◽

Mendelian Sampling

Over the last two decades, the application of genomic selection has been extensively studied in various crop species, and it has become a common practice to report prediction accuracies using cross validation. However, genomic prediction accuracies obtained from random cross validation can be strongly inflated due to population or family structure, a characteristic shared by many breeding populations. An understanding of the effect of population and family structure on prediction accuracy is essential for the successful application of genomic selection in plant breeding programs. The objective of this study was to make this effect and its implications for practical breeding programs comprehensible for breeders and scientists with a limited background in quantitative genetics and genomic selection theory. We, therefore, compared genomic prediction accuracies obtained from different random cross validation approaches and within-family prediction in three different prediction scenarios. We used a highly structured population of 940 Brassica napus hybrids coming from 46 testcross families and two subpopulations. Our demonstrations show how genomic prediction accuracies obtained from among-family predictions in random cross validation and within-family predictions capture different measures of prediction accuracy. While among-family prediction accuracy measures prediction accuracy of both the parent average component and the Mendelian sampling term, within-family prediction only measures how accurately the Mendelian sampling term can be predicted. With this paper we aim to foster a critical approach to different measures of genomic prediction accuracy and a careful analysis of values observed in genomic selection experiments and reported in literature.
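
The distinction drawn in the abstract above between among-family and within-family accuracy can be illustrated with a toy calculation. The sketch below is an assumption-laden example rather than the study's analysis: the function name `among_and_within_accuracy` and the six records are invented for illustration. The predictions recover the family means well but carry no useful signal inside each family, so the pooled correlation is high while the within-family correlation is not.

```python
import math
from collections import defaultdict


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def among_and_within_accuracy(records):
    """records: iterable of (family, observed, predicted).

    Returns (pooled correlation over all records,
             mean of the per-family correlations).
    """
    records = list(records)
    pooled = pearson([o for _, o, _ in records], [p for _, _, p in records])
    by_family = defaultdict(list)
    for family, observed, predicted in records:
        by_family[family].append((observed, predicted))
    within = [
        pearson([o for o, _ in rows], [p for _, p in rows])
        for rows in by_family.values()
    ]
    return pooled, sum(within) / len(within)


# Hypothetical toy data: two families with very different means.
records = [
    ("A", 10, 11.2), ("A", 11, 10.8), ("A", 12, 11.0),
    ("B", 20, 21.1), ("B", 21, 20.9), ("B", 22, 21.0),
]
pooled, within = among_and_within_accuracy(records)
print(round(pooled, 2), round(within, 2))
```

Here the pooled value is driven almost entirely by the difference between family means (the parent average component), which is why it says little about how well individuals can be ranked within a family.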

Pitfalls and Remedies for Cross Validation with Multi-trait Genomic Prediction Methods

10.1101/595397 ◽

2019 ◽

Author(s):

Daniel Runcie ◽

Hao Cheng

Keyword(s):

Genomic Prediction ◽

Prediction Accuracy ◽

Cross Validation ◽

Prediction Models ◽

Selection Index ◽

Parametric Method ◽

Multiple Traits ◽

Gold Standard Method ◽

Secondary Traits ◽

Validation Strategy

Abstract Incorporating measurements on correlated traits into genomic prediction models can increase prediction accuracy and selection gain. However, multi-trait genomic prediction models are complex and prone to overfitting which may result in a loss of prediction accuracy relative to single-trait genomic prediction. Cross-validation is considered the gold standard method for selecting and tuning models for genomic prediction in both plant and animal breeding. When used appropriately, cross-validation gives an accurate estimate of the prediction accuracy of a genomic prediction model, and can effectively choose among disparate models based on their expected performance in real data. However, we show that a naive cross-validation strategy applied to the multi-trait prediction problem can be severely biased and lead to sub-optimal choices between single and multi-trait models when secondary traits are used to aid in the prediction of focal traits and these secondary traits are measured on the individuals to be tested. We use simulations to demonstrate the extent of the problem and propose three partial solutions: 1) a parametric solution from selection index theory, 2) a semi-parametric method for correcting the cross-validation estimates of prediction accuracy, and 3) a fully non-parametric method which we call CV2*: validating model predictions against focal trait measurements from genetically related individuals. The current excitement over high-throughput phenotyping suggests that more comprehensive phenotype measurements will be useful for accelerating breeding programs. Using an appropriate cross-validation strategy should more reliably determine if and when combining information across multiple traits is useful.

Genomic Selection for Yield and Seed Protein Content in Soybean: A Study of Breeding Program Data and Assessment of Prediction Accuracy

Crop Science ◽

10.2135/cropsci2016.06.0496 ◽

2017 ◽

Vol 57 (3) ◽

pp. 1325-1337 ◽

Cited By ~ 22

Author(s):

Alexandra Duhnen ◽

Amandine Gras ◽

Simon Teyssèdre ◽

Michel Romestant ◽

Bruno Claustres ◽

...

Keyword(s):

Protein Content ◽

Genomic Selection ◽

Prediction Accuracy ◽

Seed Protein ◽

Breeding Program ◽

Seed Protein Content ◽

Selection For ◽

Program Data

Maximizing efficiency of genomic selection in CIMMYT’s tropical maize breeding program

Theoretical and Applied Genetics ◽

10.1007/s00122-020-03696-9 ◽

2020 ◽

Author(s):

Sikiru Adeniyi Atanda ◽

Michael Olsen ◽

Juan Burgueño ◽

Jose Crossa ◽

Daniel Dzidzienyo ◽

...

Keyword(s):

Genomic Selection ◽

Prediction Accuracy ◽

Large Scale ◽

Primary Objective ◽

Breeding Program ◽

Breeding Cycle ◽

Training Set ◽

Maize Breeding ◽

Phenotypic Data ◽

Breeding Programs

Abstract Key message Historical data from breeding programs can be efficiently used to improve genomic selection accuracy, especially when the training set is optimized to subset individuals most informative of the target testing set. Abstract The current strategy for large-scale implementation of genomic selection (GS) at the International Maize and Wheat Improvement Center (CIMMYT) global maize breeding program has been to train models using information from full-sibs in a “test-half-predict-half approach.” Although effective, this approach has limitations, as it requires large full-sib populations and limits the ability to shorten variety testing and breeding cycle times. The primary objective of this study was to identify optimal experimental and training set designs to maximize prediction accuracy of GS in CIMMYT’s maize breeding programs. Training set (TS) design strategies were evaluated to determine the most efficient use of phenotypic data collected on relatives for genomic prediction (GP) using datasets containing 849 (DS1) and 1389 (DS2) DH-lines evaluated as testcrosses in 2017 and 2018, respectively. Our results show there is merit in the use of multiple bi-parental populations as TS when selected using algorithms to maximize relatedness between the training and prediction sets. In a breeding program where relevant past breeding information is not readily available, the phenotyping expenditure can be spread across connected bi-parental populations by phenotyping only a small number of lines from each population. This significantly improves prediction accuracy compared to within-population prediction, especially when the TS for within full-sib prediction is small. Finally, we demonstrate that prediction accuracy in either sparse testing or “test-half-predict-half” can further be improved by optimizing which lines are planted for phenotyping and which lines are to be only genotyped for advancement based on GP.
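
The abstract above does not name the algorithms used to maximize relatedness between training and prediction sets. As a rough sketch of the general idea only, one simple heuristic is to rank candidate training lines by their mean relationship to the target lines and keep the top ones. The function name `select_training_set`, the nested-dictionary relationship lookup, and the toy coefficients below are all hypothetical.

```python
def select_training_set(relationship, candidates, targets, size):
    """Pick the `size` candidates with the highest mean relationship to the targets.

    relationship[c][t] is assumed to hold a relationship coefficient
    (for example from a pedigree or genomic relationship matrix).
    """
    def mean_relationship(candidate):
        return sum(relationship[candidate][t] for t in targets) / len(targets)

    ranked = sorted(candidates, key=mean_relationship, reverse=True)
    return ranked[:size]


# Hypothetical toy relationships between 3 candidate lines and 2 target lines.
relationship = {
    "c1": {"t1": 0.50, "t2": 0.25},
    "c2": {"t1": 0.00, "t2": 0.10},
    "c3": {"t1": 0.25, "t2": 0.25},
}
chosen = select_training_set(relationship, ["c1", "c2", "c3"], ["t1", "t2"], size=2)
print(chosen)
```

A greedy ranking like this ignores redundancy among the selected lines; published training-set optimization criteria typically account for it, so this should be read as a baseline rather than as the method evaluated in the study.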

Genomic Prediction and Genetic Correlation of Agronomic, Blackleg Disease, and Seed Quality Traits in Canola (Brassica napus L.)

Plants ◽

10.3390/plants9060719 ◽

2020 ◽

Vol 9 (6) ◽

pp. 719

Author(s):

Mulusew Fikere ◽

Denise M. Barbulescu ◽

M. Michelle Malmberg ◽

Pankaj Maharjan ◽

Phillip A. Salisbury ◽

...

Keyword(s):

Genomic Selection ◽

Genomic Prediction ◽

Prediction Accuracy ◽

Seed Quality ◽

Agronomic Traits ◽

Genetic Correlations ◽

Quality Traits ◽

Blackleg Disease ◽

Genetic Progress ◽

Seed Quality Traits

Genomic selection accelerates genetic progress in crop breeding through the prediction of future phenotypes of selection candidates based on only their genomic information. Here we report genetic correlations and genomic prediction accuracies in 22 agronomic, disease, and seed quality traits measured across multiple years (2015–2017) in replicated trials under rain-fed and irrigated conditions in Victoria, Australia. Two hundred and two spring canola lines were genotyped for 62,082 Single Nucleotide Polymorphisms (SNPs) using transcriptomic genotyping-by-sequencing (GBSt). Traits were evaluated in single trait and bivariate genomic best linear unbiased prediction (GBLUP) models and cross-validation. GBLUP models were also expanded to include genotype-by-environment (G × E) interactions. Genomic heritability varied from 0.31 to 0.66. Genetic correlations were highly positive within traits across locations and years. Oil content was positively correlated with most agronomic traits. Strong, not previously documented, negative correlations were observed between average internal infection (a measure of blackleg disease) and arachidic and stearic acids. The genetic correlations between fatty acid traits followed the expected patterns based on oil biosynthesis pathways. Genomic prediction accuracy ranged from 0.29 for emergence count to 0.69 for seed yield. The incorporation of G × E translates into improved prediction accuracy by up to 6%. The genomic prediction accuracies achieved indicate that genomic selection is ready for application in canola breeding.
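
GBLUP models such as those in the abstract above rest on a genomic relationship matrix built from SNP genotypes. The abstract does not say how that matrix was computed, so the sketch below assumes one widely used construction (VanRaden's first method, G = ZZ' / (2 Σ p(1 − p)), with genotypes coded 0/1/2 and centred by twice the allele frequency). The function name `vanraden_g` and the toy genotypes are illustrative only.

```python
def vanraden_g(genotypes):
    """Genomic relationship matrix from a list of genotype rows coded 0/1/2.

    Assumes VanRaden's first method with allele frequencies estimated
    from the same sample; this choice is an assumption of the example.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Frequency of the counted allele at each marker.
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all markers are monomorphic")
    # Centre each genotype by twice the allele frequency.
    z = [[row[j] - 2 * freqs[j] for j in range(m)] for row in genotypes]
    return [
        [sum(a * b for a, b in zip(z[i], z[k])) / scale for k in range(n)]
        for i in range(n)
    ]


# Hypothetical toy data: 4 individuals x 3 SNPs.
genotypes = [
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 1],
    [1, 2, 1],
]
G = vanraden_g(genotypes)
for row in G:
    print([round(v, 3) for v in row])
```

Because the centring uses frequencies from the same individuals, every row of the resulting matrix sums to zero, which is a handy sanity check on an implementation.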

Haplotype genomic prediction of phenotypic values based on chromosome distance and gene boundaries using low-coverage sequencing in Duroc pigs

Genetics Selection Evolution ◽

10.1186/s12711-021-00661-y ◽

2021 ◽

Vol 53 (1) ◽

Author(s):

Cheng Bian ◽

Dzianis Prakapenka ◽

Cheng Tan ◽

Ruifei Yang ◽

Di Zhu ◽

...

Keyword(s):

Genomic Selection ◽

Genomic Prediction ◽

Prediction Accuracy ◽

Prediction Models ◽

Average Daily Gain ◽

Live Weight ◽

Feed Conversion ◽

Muscle Area ◽

Haplotype Blocks ◽

Low Coverage

Abstract Background Genomic selection using single nucleotide polymorphism (SNP) markers has been widely used for genetic improvement of livestock, but most current methods of genomic selection are based on SNP models. In this study, we investigated the prediction accuracies of haplotype models based on fixed chromosome distances and gene boundaries compared to those of SNP models for genomic prediction of phenotypic values. We also examined the reasons for the successes and failures of haplotype genomic prediction. Methods We analyzed a swine population of 3195 Duroc boars with records on eight traits: body judging score (BJS), teat number (TN), age (AGW), loin muscle area (LMA), loin muscle depth (LMD) and back fat thickness (BF) at 100 kg live weight, and average daily gain (ADG) and feed conversion rate (FCR) from 30 to 100 kg live weight. Ten-fold validation was used to evaluate the prediction accuracy of each SNP model and each multi-allelic haplotype model based on 488,124 autosomal SNPs from low-coverage sequencing. Haplotype blocks were defined using fixed chromosome distances or gene boundaries. Results Compared to the best SNP model, the accuracy of predicting phenotypic values using a haplotype model was greater by 7.4% for BJS, 7.1% for AGW, 6.6% for ADG, 4.9% for FCR, 2.7% for LMA, 1.9% for LMD, 1.4% for BF, and 0.3% for TN. The use of gene-based haplotype blocks resulted in the best prediction accuracy for LMA, LMD, and TN. Compared to estimates of SNP additive heritability, estimates of haplotype epistasis heritability were strongly correlated with the increase in prediction accuracy by haplotype models. The increase in prediction accuracy was largest for BJS, AGW, ADG, and FCR, which also had the largest estimates of haplotype epistasis heritability, 24.4% for BJS, 14.3% for AGW, 14.5% for ADG, and 17.7% for FCR. SNP and haplotype heritability profiles across the genome identified several genes with large genetic contributions to phenotypes: NUDT3 for LMA, LMD and BF, VRTN for TN, COL5A2 for BJS, BSND for ADG, and CARTPT for FCR. Conclusions Haplotype prediction models improved the accuracy for genomic prediction of phenotypes in Duroc pigs. For some traits, the best prediction accuracy was obtained with haplotypes defined using gene regions, which provides evidence that functional genomic information can improve the accuracy of haplotype genomic prediction for certain traits.
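
Defining haplotype blocks by fixed chromosome distance, as mentioned in the abstract above, can be pictured as binning SNPs into consecutive windows along each chromosome. The sketch below is a guess at the simplest such scheme, not the procedure used in the study: the function name `blocks_by_distance`, the window anchoring at position 0, and the toy SNP list are all assumptions.

```python
from collections import defaultdict


def blocks_by_distance(snps, block_size_bp):
    """Group SNPs into fixed-width windows per chromosome.

    snps: iterable of (snp_id, chromosome, position_bp).
    Returns {(chromosome, window_index): [snp_id, ...]} with SNPs in
    positional order inside each block.
    """
    blocks = defaultdict(list)
    for snp_id, chrom, pos in sorted(snps, key=lambda s: (s[1], s[2])):
        blocks[(chrom, pos // block_size_bp)].append(snp_id)
    return dict(blocks)


# Hypothetical toy map: four SNPs on two chromosomes, 1 kb windows.
snps = [
    ("s1", "1", 100),
    ("s3", "1", 1500),
    ("s2", "1", 900),
    ("s4", "2", 200),
]
blocks = blocks_by_distance(snps, block_size_bp=1000)
print(blocks)
```

Gene-boundary blocks would follow the same pattern with the window index replaced by a lookup of the gene interval containing each SNP.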

Genomic prediction in an outcrossing and autotetraploid fruit crop: lessons from blueberry breeding

10.1101/2021.03.05.434007 ◽

2021 ◽

Author(s):

Luís Felipe V. Ferrão ◽

Rodrigo R. Amadeu ◽

Juliana Benevenuto ◽

Ivone de Bem Oliveira ◽

Patricio R. Munoz

Keyword(s):

Genomic Selection ◽

Genomic Prediction ◽

Recurrent Selection ◽

Vaccinium Corymbosum ◽

Breeding Program ◽

Polyploid Species ◽

Specialty Crop ◽

Validation Set ◽

Production Areas ◽

The University

Abstract Blueberry (Vaccinium corymbosum and hybrids) is a specialty crop, with expanding production and consumption worldwide. The blueberry breeding program at the University of Florida (UF) has greatly contributed to the expansion of production areas by developing low-chilling cultivars better adapted to subtropical and Mediterranean climates of the globe. The breeding program has historically focused on phenotypic recurrent selection. As an autopolyploid, outcrossing, perennial, long juvenile phase crop, blueberry’s breeding cycles are costly and time-consuming, which results in low genetic gains per unit of time. Motivated by the application of molecular markers for a more accurate selection in early stages of breeding, we performed pioneering genomic prediction studies and optimization for implementation in the blueberry breeding program. We have also addressed some complexities of sequence-based genotyping and model parametrization for an autopolyploid crop, providing empirical contributions that can be extended to other polyploid species. We herein revisited some of our previous genomic prediction studies and described the current achievements in the crop. In this paper, our contribution for genomic prediction in an autotetraploid crop is three-fold: i) summarize previous results on the relevance of model parametrizations, such as diploid or polyploid methods, and inclusion of dominance effects; ii) assess the importance of sequence depth of coverage and genotype dosage calling steps; iii) demonstrate the real impact of genomic selection on leveraging breeding decisions by using an independent validation set. Altogether, we propose a strategy for the use of genomic selection in blueberry, with potential to be applied to other polyploid species of a similar background.

CV-α: designing validations sets to increase the precision and enable multiple comparison tests in genomic prediction

10.1101/2020.11.11.376343 ◽

2020 ◽

Author(s):

Rafael Massahiro Yassue ◽

José Felipe Gonzaga Sabadin ◽

Giovanni Galli ◽

Filipe Couto Alves ◽

Roberto Fritsche-Neto

Keyword(s):

Genomic Prediction ◽

Cross Validation ◽

Prediction Models ◽

Mean Squared Error ◽

Predictive Ability ◽

Proof Of Concept ◽

Squared Error ◽

High Effect ◽

The Mean ◽

Fold Cross Validation

Abstract Usually, the comparison among genomic prediction models is based on validation schemes such as Repeated Random Subsampling (RRS) or K-fold cross-validation. Nevertheless, the design of training and validation sets strongly affects how models are compared and how subjective that comparison is. The procedures cited above overlap across replicates, which might cause overestimation and a lack of independence among residuals due to resampling, and hence less accurate results. Furthermore, post hoc tests, such as ANOVA, are not recommended because the assumption of independent residuals is not fulfilled. Thus, we propose a new way to sample observations to build training and validation sets based on a cross-validation alpha-based design (CV-α). The CV-α was meant to create several scenarios of validation (replicates x folds), regardless of the number of treatments. Using CV-α, the number of genotypes in the same fold across replicates was much lower than with K-fold, indicating higher residual independence. Therefore, based on the CV-α results, as proof of concept, via ANOVA, we could compare the proposed methodology to RRS and K-fold, applying four genomic prediction models with a simulated and a real dataset. Concerning the predictive ability and bias, all validation methods showed similar performance. However, regarding the mean squared error and coefficient of variation, the CV-α method presented the best performance under the evaluated scenarios. Moreover, as it adds neither cost nor complexity, it is more reliable and allows the use of non-subjective methods to compare models and factors. Therefore, CV-α can be considered a more precise validation methodology for model selection.
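
The overlap problem described in the abstract above (the same genotypes landing in the same fold in several replicates) can be quantified with a simple count. The sketch below does not implement the CV-α design itself, which the abstract does not specify; it only shows, with hypothetical helper names `random_kfold` and `cooccurrence_counts`, how one might measure fold co-occurrence for ordinary random K-fold replicates.

```python
import random
from collections import Counter
from itertools import combinations


def random_kfold(n, k, rng):
    """Randomly partition individuals 0..n-1 into k folds."""
    order = list(range(n))
    rng.shuffle(order)
    return [order[f::k] for f in range(k)]


def cooccurrence_counts(replicates):
    """Count, for every pair of individuals, in how many replicates they share a fold.

    replicates: list of fold partitions, each a list of lists of individual ids.
    """
    counts = Counter()
    for folds in replicates:
        for fold in folds:
            for pair in combinations(sorted(fold), 2):
                counts[pair] += 1
    return counts


# Hypothetical example: 20 genotypes, 5 folds, 4 random replicates.
rng = random.Random(42)
replicates = [random_kfold(20, 5, rng) for _ in range(4)]
counts = cooccurrence_counts(replicates)
repeated_pairs = sum(1 for c in counts.values() if c > 1)
print(repeated_pairs, "pairs share a fold in more than one replicate")
```

A design that spreads pairs more evenly across folds would drive `repeated_pairs` toward zero, which is the property the abstract attributes to CV-α.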

Optimizing Low-Cost Genotyping and Imputation Strategies for Genomic Selection in Atlantic Salmon

G3 Genes|Genome|Genetics ◽

10.1534/g3.119.400800 ◽

2019 ◽

Vol 10 (2) ◽

pp. 581-590 ◽

Cited By ~ 4

Author(s):

Smaragda Tsairidou ◽

Alastair Hamilton ◽

Diego Robledo ◽

James E. Bron ◽

Ross D. Houston

Keyword(s):

Atlantic Salmon ◽

Genomic Selection ◽

Environmental Sustainability ◽

Genomic Prediction ◽

Prediction Accuracy ◽

Imputation Accuracy ◽

Cost Effective ◽

High Density ◽

Genotype Imputation ◽

Breeding Programs

Genomic selection enables cumulative genetic gains in key production traits such as disease resistance, playing an important role in the economic and environmental sustainability of aquaculture production. However, it requires genome-wide genetic marker data on large populations, which can be prohibitively expensive. Genotype imputation is a cost-effective method for obtaining high-density genotypes, but its value in aquaculture breeding programs which are characterized by large full-sibling families has yet to be fully assessed. The aim of this study was to optimize the use of low-density genotypes and evaluate genotype imputation strategies for cost-effective genomic prediction. Phenotypes and genotypes (78,362 SNPs) were obtained for 610 individuals from a Scottish Atlantic salmon breeding program population (Landcatch, UK) challenged with sea lice, Lepeophtheirus salmonis. The genomic prediction accuracy of genomic selection was calculated using GBLUP approaches and compared across SNP panels of varying densities and composition, with and without imputation. Imputation was tested when parents were genotyped for the optimal SNP panel, and offspring were genotyped for a range of lower density imputation panels. Reducing SNP density had little impact on prediction accuracy until 5,000 SNPs, below which the accuracy dropped. Imputation accuracy increased with increasing imputation panel density. Genomic prediction accuracy when offspring were genotyped for just 200 SNPs, and parents for 5,000 SNPs, was 0.53. This accuracy was similar to the full high density and optimal density dataset, and markedly higher than using 200 SNPs without imputation. These results suggest that imputation from very low to medium density can be a cost-effective tool for genomic selection in Atlantic salmon breeding programs.

Improving Prediction Accuracy Using Multi-allelic Haplotype Prediction and Training Population Optimization in Wheat

G3 Genes|Genome|Genetics ◽

10.1534/g3.120.401165 ◽

2020 ◽

Vol 10 (7) ◽

pp. 2265-2273 ◽

Cited By ~ 1

Author(s):

Ahmad H. Sallam ◽

Emily Conley ◽

Dzianis Prakapenka ◽

Yang Da ◽

James A. Anderson

Keyword(s):

Population Structure ◽

Protein Content ◽

Prediction Accuracy ◽

Cross Validation ◽

Predictive Ability ◽

Training Population ◽

Percentage Points ◽

And Training ◽

Fold Cross Validation ◽

Single Snps

The use of haplotypes may improve the accuracy of genomic prediction over single SNPs because haplotypes can better capture linkage disequilibrium and genomic similarity in different lines and may capture local high-order allelic interactions. Additionally, prediction accuracy could be improved by portraying population structure in the calibration set. A set of 383 advanced lines and cultivars that represent the diversity of the University of Minnesota wheat breeding program was phenotyped for yield, test weight, and protein content and genotyped using the Illumina 90K SNP Assay. Population structure was confirmed using single SNPs. Haplotype blocks of 5, 10, 15, and 20 adjacent markers were constructed for all chromosomes. A multi-allelic haplotype prediction algorithm was implemented and compared with single SNPs using both k-fold cross validation and stratified sampling optimization. After confirming population structure, the stratified sampling improved the predictive ability compared with k-fold cross validation for yield and protein content, but reduced the predictive ability for test weight. In all cases, haplotype predictions outperformed single SNPs. Haplotypes of 15 adjacent markers showed the best improvement in accuracy for all traits; however, this was more pronounced in yield and protein content. The combined use of haplotypes of 15 adjacent markers and training population optimization significantly improved the predictive ability for yield and protein content by 14.3% (four percentage points) and 16.8% (seven percentage points), respectively, compared with using single SNPs and k-fold cross validation. These results emphasize the effectiveness of using haplotypes in genomic selection to increase genetic gain in self-fertilized crops.
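
Building multi-allelic haplotypes from blocks of adjacent markers, as described in the abstract above, amounts to cutting each line's marker string into consecutive blocks and treating every distinct pattern within a block as one allele. The sketch below illustrates that recoding only; it is not the prediction algorithm from the paper. The function name `haplotype_alleles` and the toy data are hypothetical, and each inbred line is assumed to be represented by a single haplotype of equal length.

```python
def haplotype_alleles(haplotypes, block_len):
    """Recode marker strings into multi-allelic haplotype codes.

    haplotypes: list of equal-length sequences (one per line), e.g. "00110".
    block_len: number of adjacent markers per block (5, 10, 15 or 20 in the
               abstract; the last block may be shorter).
    Returns one list of integer allele codes per line, one code per block.
    Codes are assigned in order of first appearance within each block.
    """
    n_markers = len(haplotypes[0])
    coded = [[] for _ in haplotypes]
    for start in range(0, n_markers, block_len):
        seen = {}  # distinct haplotype pattern -> allele code for this block
        for i, hap in enumerate(haplotypes):
            segment = tuple(hap[start:start + block_len])
            coded[i].append(seen.setdefault(segment, len(seen)))
    return coded


# Hypothetical toy data: three lines, five biallelic markers, blocks of two.
lines = ["00110", "00111", "10110"]
coded = haplotype_alleles(lines, block_len=2)
print(coded)
```

A block of k biallelic markers can carry up to 2^k distinct alleles, so longer blocks capture more local structure at the cost of more parameters to estimate.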
