Calibration and validation of predicted genomic breeding values in an advanced cycle maize population

Author(s):  
Hans-Jürgen Auinger ◽  
Christina Lehermeier ◽  
Daniel Gianola ◽  
Manfred Mayer ◽  
Albrecht E. Melchinger ◽  
...  

Key message: Model training on data from all selection cycles yielded the highest prediction accuracy by attenuating specific effects of individual cycles. Expected reliability was a robust predictor of accuracies obtained with different calibration sets.

Abstract: The transition from phenotypic to genome-based selection requires a profound understanding of factors that determine genomic prediction accuracy. We analysed experimental data from a commercial maize breeding programme to investigate if genomic measures can assist in identifying optimal calibration sets for model training. The data set consisted of six contiguous selection cycles comprising testcrosses of 5968 doubled haploid lines genotyped with a minimum of 12,000 SNP markers. We evaluated genomic prediction accuracies in two independent prediction sets in combination with calibration sets differing in sample size and genomic measures (effective sample size, average maximum kinship, expected reliability, number of common polymorphic SNPs and linkage phase similarity). Our results indicate that across selection cycles prediction accuracies were as high as 0.57 for grain dry matter yield and 0.76 for grain dry matter content. Including data from all selection cycles in model training yielded the best results because interactions between calibration and prediction sets as well as the effects of different testers and specific years were attenuated. Among genomic measures, the expected reliability of genomic breeding values was the best predictor of empirical accuracies obtained with different calibration sets. For grain yield, a large difference between expected and empirical reliability was observed in one prediction set. We propose to use this difference as guidance for determining the weight phenotypic data of a given selection cycle should receive in model retraining and for selection when both genomic breeding values and phenotypes are available.
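The abstract does not spell out how expected reliability is computed. A minimal sketch, assuming a GBLUP model in which the expected reliability of prediction-set line i is g_i' (G_CC + λI)⁻¹ g_i / G_ii with λ = (1 − h²)/h², could look as follows. The marker coding (0/1/2), the function names and the value of h² are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def genomic_relationship(M):
    """Genomic relationship matrix from an (n x m) marker matrix coded 0/1/2 (assumed coding)."""
    p = M.mean(axis=0) / 2.0                 # allele frequencies estimated from the sample
    Z = M - 2.0 * p                          # centre each marker column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def expected_reliability(G, calib_idx, pred_idx, h2):
    """Expected reliability per prediction-set line under GBLUP (assumed formula).

    r2_i = g_i' (G_CC + lam * I)^-1 g_i / G_ii, with lam = (1 - h2) / h2.
    """
    lam = (1.0 - h2) / h2
    G_cc = G[np.ix_(calib_idx, calib_idx)]   # calibration x calibration block
    G_pc = G[np.ix_(pred_idx, calib_idx)]    # prediction x calibration block
    V = G_cc + lam * np.eye(len(calib_idx))
    num = np.einsum("ij,ij->i", G_pc, np.linalg.solve(V, G_pc.T).T)
    return num / np.diag(G)[pred_idx]

# Simulated markers, only to exercise the functions (not real data).
rng = np.random.default_rng(0)
M = rng.integers(0, 3, size=(60, 500)).astype(float)
G = genomic_relationship(M)
pred = np.arange(50, 60)
r2_small = expected_reliability(G, np.arange(0, 20), pred, h2=0.5)
r2_large = expected_reliability(G, np.arange(0, 50), pred, h2=0.5)
print(r2_small.mean(), r2_large.mean())
```

Under this formula, enlarging a calibration set to a superset cannot lower the expected reliability of any prediction-set line, so candidate calibration sets can be ranked before any phenotypes from the prediction set are available.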

Animals ◽  
2019 ◽  
Vol 9 (9) ◽  
pp. 672 ◽  
Author(s):  
Beatriz Castro Dias Cuyabano ◽  
Hanna Wackel ◽  
Donghyun Shin ◽  
Cedric Gondro

Genomic models that incorporate dense marker information have been widely used for predicting genomic breeding values since they were first introduced, and it is known that the relationship between individuals in the reference population and selection candidates affects the prediction accuracy. When genomic evaluation is performed over generations of the same population, prediction accuracy is expected to decay if the reference population is not updated. Therefore, the reference population must be updated in each generation, but little is known about the optimal way to do it. This study presents an empirical assessment of the prediction accuracy of genomic breeding values of production traits, across five generations in two Korean pig breeds. We verified the decay in prediction accuracy over time when the reference population was not updated. Additionally, we compared the prediction accuracy obtained using only the previous generation as the reference population with that obtained using all previous generations. Overall, the results suggested that, although there is a clear need to continuously update the reference population, it may not be necessary to keep all ancestral genotypes. Finally, comprehending how the accuracy of genomic prediction evolves over generations within a population adds relevant information to improve the performance of genomic selection.


2021 ◽  
Vol 12 ◽  
Author(s):  
Stefan Wilson ◽  
Chaozhi Zheng ◽  
Chris Maliepaard ◽  
Han A. Mulder ◽  
Richard G. F. Visser ◽  
...  

Use of genomic prediction (GP) in tetraploids is becoming more common. Therefore, we think it is the right time for a comparison of GP models for tetraploid potato. GP models were compared that contrasted shrinkage with variable selection, parametric vs. non-parametric models and different ways of accounting for non-additive genetic effects. As a complement to GP, association studies were carried out in an attempt to understand the differences in prediction accuracy. We compared our GP models on a data set consisting of 147 cultivars, representing worldwide diversity, with over 39 k GBS markers and measurements on four tuber traits collected in six trials at three locations during 2 years. GP accuracies ranged from 0.32 for tuber count to 0.77 for dry matter content. For all traits, differences between GP models that utilised shrinkage penalties and those that performed variable selection were negligible. This was surprising for dry matter, as only a few additive markers explained over 50% of phenotypic variation. Accuracy for tuber count increased from 0.35 to 0.41 when dominance was included in the model. This result is supported by a genome-wide association study (GWAS), which found that additive and dominance effects accounted for 37% of phenotypic variation, while significant additive effects alone accounted for 14%. For tuber weight, the Reproducing Kernel Hilbert Space (RKHS) model gave a larger improvement in prediction accuracy than explicitly modelling epistatic effects. This is an indication that capturing the between-locus epistatic effects of tuber weight can be done more effectively using the semi-parametric RKHS model. Our results show good opportunities for GP in 4x potato.


Genetics ◽  
2021 ◽  
Author(s):  
Marco Lopez-Cruz ◽  
Gustavo de los Campos

Abstract Genomic prediction uses DNA sequences and phenotypes to predict genetic values. In homogeneous populations, theory indicates that the accuracy of genomic prediction increases with sample size. However, differences in allele frequencies and in linkage disequilibrium patterns can lead to heterogeneity in SNP effects. In this context, calibrating genomic predictions using a large, potentially heterogeneous, training data set may not lead to optimal prediction accuracy. Some studies tried to address this sample size/homogeneity trade-off using training set optimization algorithms; however, this approach assumes that a single training data set is optimum for all individuals in the prediction set. Here, we propose an approach that identifies, for each individual in the prediction set, a subset from the training data (i.e., a set of support points) from which predictions are derived. The methodology that we propose is a Sparse Selection Index (SSI) that integrates Selection Index methodology with sparsity-inducing techniques commonly used for high-dimensional regression. The sparsity of the resulting index is controlled by a regularization parameter (λ); the G-BLUP (the prediction method most commonly used in plant and animal breeding) appears as a special case which happens when λ = 0. In this study, we present the methodology and demonstrate (using two wheat data sets with phenotypes collected in ten different environments) that the SSI can achieve significant (anywhere between 5-10%) gains in prediction accuracy relative to the G-BLUP.
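The abstract gives no formula for the Sparse Selection Index. One plausible way to write it, assumed here, is an L1-penalised selection index solved separately for each prediction individual: minimise ½ b'(G_trn + kI)b − g_i'b + λ‖b‖₁, where k stands in for the residual-to-genetic variance ratio. The coordinate-descent solver below is a generic sketch under that assumption and is not the authors' implementation. All names are hypothetical.

```python
import numpy as np

def sparse_selection_index(G_trn, g_i, lam, ridge, n_iter=500, tol=1e-10):
    """Coordinate descent for
        min_b 0.5 * b'(G_trn + ridge * I) b - g_i' b + lam * ||b||_1
    (assumed objective). Returns one weight per training individual.
    """
    A = G_trn + ridge * np.eye(G_trn.shape[0])
    b = np.zeros(len(g_i))
    for _ in range(n_iter):
        b_old = b.copy()
        for j in range(len(b)):
            # Partial residual for coordinate j, excluding its own contribution.
            r = g_i[j] - A[j] @ b + A[j, j] * b[j]
            # Soft-thresholding sets small coefficients exactly to zero.
            b[j] = np.sign(r) * max(abs(r) - lam, 0.0) / A[j, j]
        if np.max(np.abs(b - b_old)) < tol:
            break
    return b

# Simulated markers, only to exercise the solver (not the wheat data).
rng = np.random.default_rng(1)
M = rng.integers(0, 3, size=(41, 300)).astype(float)
Z = M - M.mean(axis=0)
G = Z @ Z.T / Z.shape[1]                 # simple scaled relationship matrix for the demo
G_trn, g_i = G[:40, :40], G[40, :40]     # 40 training lines, 1 prediction line
ridge = 1.0                              # assumed variance ratio

b_gblup = sparse_selection_index(G_trn, g_i, lam=0.0, ridge=ridge)
b_null = sparse_selection_index(G_trn, g_i, lam=np.max(np.abs(g_i)), ridge=ridge)
```

With `lam=0.0` the weights reduce to the ridge solution (G_trn + kI)⁻¹ g_i, which is consistent with the abstract's statement that G-BLUP appears as the special case λ = 0. A large enough penalty drives every weight to zero, and intermediate values select a subset of support points for that individual. The predicted genetic value would then be `b @ y_trn` for centred training phenotypes `y_trn`.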


Animals ◽  
2021 ◽  
Vol 11 (7) ◽  
pp. 2050
Author(s):  
Beatriz Castro Dias Cuyabano ◽  
Gabriel Rovere ◽  
Dajeong Lim ◽  
Tae Hun Kim ◽  
Hak Kyo Lee ◽  
...  

It is widely known that the environment influences phenotypic expression and that its effects must be accounted for in genetic evaluation programs. The most used method to account for environmental effects is to add herd and contemporary group to the model. Although generally informative, the herd effect treats different farms as independent units. However, if two farms are located physically close to each other, they potentially share correlated environmental factors. We introduce a method to model herd effects that uses the physical distances between farms based on the Global Positioning System (GPS) coordinates as a proxy for the correlation matrix of these effects that aims to account for similarities and differences between farms due to environmental factors. A population of Hanwoo Korean cattle was used to evaluate the impact of modelling herd effects as correlated, in comparison to assuming the farms as completely independent units, on the variance components and genomic prediction. The main result was an increase in the reliabilities of the predicted genomic breeding values compared to reliabilities obtained with traditional models (across four traits evaluated, reliabilities of prediction presented increases that ranged from 0.05 ± 0.01 to 0.33 ± 0.03), suggesting that these models may overestimate heritabilities. Although little to no significant gain was obtained in phenotypic prediction, the increased reliability of the predicted genomic breeding values is of practical relevance for genetic evaluation programs.
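The abstract states that GPS-based distances between farms serve as a proxy for the correlation matrix of herd effects, but not how distances are mapped to correlations. A minimal sketch, assuming great-circle (haversine) distances and an exponential decay kernel with a hypothetical range parameter `range_km`, might be:

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat, lon):
    """Pairwise great-circle distances in km between farms given in decimal degrees."""
    lat = np.radians(np.asarray(lat, dtype=float))
    lon = np.radians(np.asarray(lon, dtype=float))
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2.0) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

def herd_correlation(lat, lon, range_km):
    """Correlation between herd effects as exp(-distance / range_km) (assumed kernel)."""
    return np.exp(-haversine_km(lat, lon) / range_km)

# Three hypothetical farm locations: two close together, one far away.
lat = [36.0, 36.1, 35.0]
lon = [127.0, 127.0, 129.0]
H = herd_correlation(lat, lon, range_km=50.0)
print(np.round(H, 3))
```

In a mixed model, a matrix like `H` would replace the identity matrix usually assumed for the herd effect covariance, so that nearby farms share information. The kernel and its range would have to be chosen or estimated for the data at hand.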


2020 ◽  
Vol 98 (Supplement_4) ◽  
pp. 245-246
Author(s):  
Cláudio U Magnabosco ◽  
Fernando Lopes ◽  
Valentina Magnabosco ◽  
Raysildo Lobo ◽  
Leticia Pereira ◽  
...  

Abstract The aim of the study was to evaluate prediction methods, validation approaches and pseudo-phenotypes for the prediction of the genomic breeding values of feed efficiency related traits in Nellore cattle. It used the phenotypic and genotypic information of 4,329 and 3,594 animals, respectively, which were tested for residual feed intake (RFI), dry matter intake (DMI), feed efficiency (FE), feed conversion ratio (FCR), residual body weight gain (RG), and residual intake and body weight gain (RIG). Six prediction methods were used: ssGBLUP, BayesA, BayesB, BayesCπ, BLASSO, and BayesR. Three validation approaches were used: 1) random: the data were randomly divided into ten subsets and validation was done on one subset at a time; 2) age: the division into training (2010 to 2016) and validation (2017) populations was based on year of birth; 3) estimated breeding value (EBV) accuracy: the training population comprised animals with accuracy above 0.45 and the validation population those below 0.45. We checked the accuracy and bias of the genomic estimated breeding values (GEBV). The results showed that GEBV accuracy was highest when predictions were obtained with ssGBLUP (0.05 to 0.31) (Figure 1). The low heritability obtained, mainly for FE (0.07 ± 0.03) and FCR (0.09 ± 0.03), limited the accuracy of the GEBVs, which ranged from low to moderate. The regression coefficient estimates were close to 1, and similar between the prediction methods, validation approaches, and pseudo-phenotypes. The cross-validation presented the most accurate predictions ranging from 0.07 to 0.037. The prediction accuracy was higher for phenotype adjusted for fixed effects than for EBV and deregressed EBV (30.0 and 34.3%, respectively). Genomic prediction can provide a reliable estimate of genomic breeding values for RFI, DMI, RG and RIG, suggesting that those traits may achieve higher genetic gain than FE and FCR.
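The validation schemes and the two summary statistics described above can be sketched as follows. This assumes that accuracy is measured as the correlation between pseudo-phenotype and GEBV, and bias as the slope of the regression of pseudo-phenotype on GEBV. The abstract does not give exact definitions, and all function names are hypothetical.

```python
import numpy as np

def accuracy_and_slope(pseudo_phenotype, gebv):
    """Correlation and regression slope of pseudo-phenotype on GEBV (slope near 1 = little bias)."""
    y = np.asarray(pseudo_phenotype, dtype=float)
    g = np.asarray(gebv, dtype=float)
    r = np.corrcoef(y, g)[0, 1]
    slope = np.cov(y, g)[0, 1] / np.var(g, ddof=1)
    return r, slope

def random_folds(n, k=10, seed=0):
    """Random validation: split n animals into k disjoint index subsets."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)

def split_by_birth_year(birth_year, validation_year):
    """Age validation: train on animals born before validation_year, validate on that year."""
    year = np.asarray(birth_year)
    return year < validation_year, year == validation_year

# Toy values, only to show the calls.
g = np.array([0.1, 0.4, -0.2, 0.3, 0.0])
r, slope = accuracy_and_slope(2.0 * g, g)
folds = random_folds(23, k=10)
train, valid = split_by_birth_year([2010, 2016, 2017, 2017], 2017)
```

The third scheme in the abstract (splitting on EBV accuracy at 0.45) would be a boolean mask of the same kind as the one returned by `split_by_birth_year`.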


1968 ◽  
Vol 70 (1) ◽  
pp. 19-27 ◽  
Author(s):  
C. A. Foster ◽  
C. E. Wright

Summary Three sampling experiments were conducted to examine the effect of sample size and sampling intensity on the precision of dry-matter content and botanical composition estimates of perennial rye-grass-white clover herbage. One of these experiments examined the between-sample variability of these attributes and of dry-matter yield in relation to other sources of experimental error in a small-plot sward trial. The sample sizes examined were 800 g, 400 g, 200 g, 100 g, 50 g and 25 g green weight. In general the accuracy of dry-matter content and botanical composition estimates decreased with decreasing sample size. The between-sample variabilities of 25 g and 50 g samples were high in relation to their between-plot variabilities. Single 100 g samples provided reasonably good estimates of these attributes and of dry-matter yield, but single 200 g samples provided a more satisfactory margin for error. Samples larger than 200 g appeared to be unnecessary. When weight-for-weight comparisons of single and duplicate samples were made there appeared to be little advantage in duplicate sampling. A theoretical examination of measurement inaccuracies inherent in the techniques used in small-plot sward trials suggested that variation in plot length measurements in particular may make an undesirable contribution to the variability of such trials. A procedure for the conduct of small-plot trials is recommended. It is concluded that, where plot size and replication are limited, further improvement in the precision of such trials will not be readily attainable.


PLoS ONE ◽  
2021 ◽  
Vol 16 (12) ◽  
pp. e0261274
Author(s):  
Harrison J. Lamb ◽  
Ben J. Hayes ◽  
Imtiaz A. S. Randhawa ◽  
Loan T. Nguyen ◽  
Elizabeth M. Ross

Most traits in livestock, crops and humans are polygenic, that is, a large number of loci contribute to genetic variation. Effects at these loci lie along a continuum ranging from common low-effect to rare high-effect variants that cumulatively contribute to the overall phenotype. Statistical methods to calculate the effect of these loci have been developed and can be used to predict phenotypes in new individuals. In agriculture, these methods are used to select superior individuals using genomic breeding values; in humans these methods are used to quantitatively measure an individual’s disease risk, termed polygenic risk scores. Both fields typically use SNP array genotypes for the analysis. Recently, genotyping-by-sequencing has become popular, due to lower cost and greater genome coverage (including structural variants). Oxford Nanopore Technologies’ (ONT) portable sequencers have the potential to combine the benefits of genotyping-by-sequencing with portability and decreased turn-around time. This introduces the potential for in-house clinical genetic disease risk screening in humans or calculating genomic breeding values on-farm in agriculture. Here we demonstrate the potential of the latter by calculating genomic breeding values for four traits in cattle using low-coverage ONT sequence data and comparing these breeding values to breeding values calculated from SNP arrays. At sequencing coverages between 2X and 4X the correlation between ONT breeding values and SNP array-based breeding values was > 0.92 when imputation was used and > 0.88 when no imputation was used. With an average sequencing coverage of 0.5X the correlation between the two methods was between 0.85 and 0.92 using imputation, depending on the trait. This suggests that ONT sequencing has potential for in-clinic or on-farm genomic prediction; however, further work is needed to validate these findings in a larger population.


2019 ◽  
Vol 64 (No. 4) ◽  
pp. 160-165 ◽  
Author(s):  
Bryan Irvine Lopez ◽  
Vanessa Viterbo ◽  
Choul Won Song ◽  
Kang Seok Seo

Abstract: Genetic parameters and accuracy of genomic prediction for production traits in a Duroc population were estimated. Data were on 24 828 purebred Duroc pigs born in 2000–2016. After quality control procedures, 30 263 single nucleotide polymorphism markers and 560 animals remained that were used to predict the genomic breeding values of individuals. Accuracies of predicted breeding values for average daily gain (ADG), backfat thickness (BF), loin muscle area (LMA), lean percentage (LP) and age at 90 kg (D90) between pedigree-based and single-step methods were compared. Analyses were carried out with a multivariate animal model to estimate genetic parameters for production traits while univariate analyses were performed to predict the genomic breeding values of individuals. Heritability estimates from pedigree analysis were moderate to high. Heritability estimates and standard errors for ADG, BF, LMA, LP and D90 were 0.35 ± 0.01, 0.35 ± 0.11, 0.24 ± 0.04, 0.42 ± 0.11 and 0.37 ± 0.03, respectively. Genetic correlations of ADG with BF and LP were low and negative. Genetic correlations of LMA with ADG, BF, LP and D90 were –0.37, –0.27, 0.48 and 0.31, respectively. High correlations were observed between ADG and D90 (–0.98), and also between BF and LP (–0.93). Accuracies of genomic breeding values for ADG, BF, LMA, LP and D90 were 0.30, 0.33, 0.38, 0.40 and 0.28, respectively. Corresponding accuracies using the pedigree-based method were 0.29, 0.32, 0.38, 0.39 and 0.27, respectively. The results showed that the single-step method did not show a significant advantage over the pedigree-based method.


2018 ◽  
Vol 61 (2) ◽  
pp. 207-213 ◽  
Author(s):  
Pourya Davoudi ◽  
Rostam Abdollahi-Arpanahi ◽  
Ardeshir Nejati-Javaremi

Abstract. The accuracy of genomic prediction of quantitative traits based on single nucleotide polymorphism (SNP) markers depends among other factors on the allele frequency distribution of quantitative trait loci (QTL). Therefore, the aim of this study was to investigate different QTL allele frequency distributions and their effect on the accuracy of genomic estimated breeding values (GEBVs) using genomic best linear unbiased prediction (GBLUP) in simulated data. A population of 1000 individuals composed of 500 males and 500 females as well as a genome of 1000 cM consisting of 10 chromosomes and with a mutation rate of 2.5 × 10⁻⁵ per locus was simulated. QTL frequencies were derived from five distributions of allele frequency including constant, uniform, U-shaped, L-shaped and minor allele frequency (MAF) less than 0.01 (lowMAF). QTL effects were generated from a standard normal distribution. The number of QTL was assumed to be 500, and the simulation was done in 10 replications. The genomic prediction accuracy in the first validation generation was 0.59 and 0.57 for the constant and uniform allele frequency distributions, respectively. Results showed that the highest accuracy of GEBVs was obtained with constant and uniform distributions followed by L-shaped, U-shaped and lowMAF QTL allele frequency distributions. The regression of true breeding values on predicted breeding values in the first validation generation was 0.94, 0.92, 0.88, 0.85 and 0.75 for constant, uniform, L-shaped, U-shaped and lowMAF distributions, respectively. Despite different values of regression coefficients, in all scenarios GEBVs are biased downward. Overall, results showed that when QTL had a lower MAF relative to SNP markers, a low linkage disequilibrium (LD) was observed, which had a negative effect on the accuracy of GEBVs. Hence, the effect of the QTL allele frequency distribution on prediction accuracy can be alleviated through using a genomic relationship weighted by MAF or an LD-adjusted relationship matrix.
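The abstract does not say which MAF weighting is meant. One widely used option, assumed here, is to scale each marker by the inverse of its expected variance 2p(1 − p) (often called VanRaden's second method), which gives rare alleles more weight than the unweighted matrix does. A minimal sketch with hypothetical names and 0/1/2 marker coding:

```python
import numpy as np

def genomic_relationship(M, weight_by_maf=False):
    """Genomic relationship matrix from an (n x m) marker matrix coded 0/1/2.

    weight_by_maf=False: G = Z Z' / sum_j 2 p_j (1 - p_j)
    weight_by_maf=True:  G = (1/m) * sum_j z_j z_j' / (2 p_j (1 - p_j))   (assumed weighting)
    """
    p = M.mean(axis=0) / 2.0
    keep = (p > 0.0) & (p < 1.0)             # drop monomorphic markers to avoid division by zero
    Z = M[:, keep] - 2.0 * p[keep]           # centre each retained marker
    het = 2.0 * p[keep] * (1.0 - p[keep])    # expected marker variance
    if weight_by_maf:
        return (Z / het) @ Z.T / Z.shape[1]
    return Z @ Z.T / het.sum()

# Simulated markers, only to exercise the function.
rng = np.random.default_rng(2)
M = rng.integers(0, 3, size=(30, 200)).astype(float)
G_plain = genomic_relationship(M)
G_weighted = genomic_relationship(M, weight_by_maf=True)
```

Either matrix can be passed to the same GBLUP equations, so the two weightings can be compared on identical training and validation sets.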

