The effects of training population design on genomic prediction accuracy in wheat

Abstract: Genomic selection offers several routes for increasing genetic gain or efficiency of plant breeding programs. In various species of livestock there is empirical evidence of increased rates of genetic gain from the use of genomic selection to target different aspects of the breeder’s equation. Accurate predictions of genomic breeding value are central to this and the design of training sets is in turn central to achieving sufficient levels of accuracy. In summary, small numbers of close relatives and very large numbers of distant relatives are expected to enable accurate predictions. To quantify the effect of some of the properties of training sets on the accuracy of genomic selection in crops we performed an extensive field-based winter wheat trial. This trial involved the construction of 44 F2:4 bi- and triparental populations, from which 2992 lines were grown at four field locations and yield was measured. For each line, genotype data were generated for 25,000 segregating single nucleotide polymorphism markers. The overall heritability of yield was estimated at 0.65, and estimates within individual families ranged between 0.10 and 0.85. Within-cross genomic prediction accuracies of yield BLUEs were 0.125 – 0.127 using two different cross-validation approaches, and generally increased with training set size. Using related crosses in training and validation sets generally resulted in higher prediction accuracies than using unrelated crosses. The results of this study emphasize the importance of the training set design in relation to the genetic material to which the resulting prediction model is to be applied.
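For readers unfamiliar with the breeder's equation mentioned in the abstract, one common textbook form is given here for orientation. The symbols are the conventional ones and are not taken from the paper itself.

```latex
\[
  \Delta G \;=\; \frac{i \, r \, \sigma_A}{L}
\]
% \Delta G : expected genetic gain per unit time
% i        : selection intensity
% r        : accuracy of selection (correlation between predicted and true breeding value)
% \sigma_A : additive genetic standard deviation
% L        : generation interval
```

Genomic selection can act on several of these terms at once: on r through better predictions, and on L by allowing selection before phenotypes are available.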

Divergent Genomic Selection for Herbage Accumulation and Days-To-Heading in Perennial Ryegrass

Agronomy ◽

10.3390/agronomy10030340 ◽

2020 ◽

Vol 10 (3) ◽

pp. 340

Author(s):

Marty Faville ◽

Mingshu Cao ◽

Jana Schmidt ◽

Douglas Ryan ◽

Siva Ganesh ◽

...

Keyword(s):

Genomic Selection ◽

Perennial Ryegrass ◽

Genetic Gain ◽

Genomic Prediction ◽

Prediction Models ◽

Selection Response ◽

Training Set ◽

Days To Heading ◽

Selection For ◽

Target Environment

Increasing the rate of genetic gain for dry matter (DM) yield in perennial ryegrass (Lolium perenne L.), which is a key source of nutrition for ruminants in temperate environments, is an important goal for breeders. Genomic selection (GS) is a strategy used to improve genetic gain by using molecular marker information to predict breeding values in selection candidates. An empirical assessment of GS for herbage accumulation (HA; proxy for DM yield) and days-to-heading (DTH) was completed by using existing genomic prediction models to conduct one cycle of divergent GS in four selection populations (Pop I G1 and G3; Pop III G1 and G3), for each trait. G1 populations were the offspring of the training set and G3 populations were two generations further on from that. The HA of the High GEBV selection group (SG) progenies, averaged across all four populations, was 28% higher (p < 0.05) than Low GEBV SGs when assessed in the target environment, while it did not differ significantly in a second environment. Divergence was greater in Pop I (43%–65%) than Pop III (10%–16%) and the selection response was higher in G1 than in G3. Divergent GS for DTH also produced significant (p < 0.05) differences between High and Low GEBV SGs in G1 populations (+6.3 to 9.1 days; 31%–61%) and smaller, non-significant (p > 0.05) responses in G3. This study shows that genomic prediction models, trained from a small, composite reference set, can be used to improve traits with contrasting genetic architectures in perennial ryegrass. The results highlight the importance of target environment selection for training models, as well as the influence of relatedness between the training set and selection populations.

The look ahead trace back optimizer for genomic selection under transparent and opaque simulators

Scientific Reports ◽

10.1038/s41598-021-83567-5 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Fatemeh Amini ◽

Felipe Restrepo Franco ◽

Guiping Hu ◽

Lizhi Wang

Keyword(s):

Genomic Selection ◽

Genetic Gain ◽

Genomic Prediction ◽

In Planta ◽

Data Set ◽

Look Ahead ◽

Trace Back ◽

New Variant ◽

The Look ◽

Partially Observable

Abstract: Recent advances in genomic selection (GS) have demonstrated the importance of not only the accuracy of genomic prediction but also the intelligence of selection strategies. The look ahead selection algorithm, for example, has been found to significantly outperform the widely used truncation selection approach in terms of genetic gain, thanks to its strategy of selecting breeding parents that may not necessarily be elite themselves but have the best chance of producing elite progeny in the future. This paper presents the look ahead trace back algorithm as a new variant of the look ahead approach, which introduces several improvements to further accelerate genetic gain, especially under imperfect genomic prediction. Perhaps an even more significant contribution of this paper is the design of opaque simulators for evaluating the performance of GS algorithms. These simulators are partially observable, explicitly capture both additive and non-additive genetic effects, and simulate uncertain recombination events more realistically. In contrast, most existing GS simulation settings are transparent, either explicitly or implicitly allowing the GS algorithm to exploit certain critical information that may not be available in actual breeding programs. Comprehensive computational experiments were carried out using a maize data set to compare a variety of GS algorithms under four simulators with different levels of opacity. These results reveal how differently the same GS algorithm would interact with different simulators, suggesting the need for continued research in the design of more realistic simulators. As long as GS algorithms continue to be trained in silico rather than in planta, the best way to avoid disappointing discrepancy between their simulated and actual performances may be to make the simulator as akin to the complex and opaque nature as possible.

Genomic Selection for Any Dairy Breeding Program via Optimized Investment in Phenotyping and Genotyping

Frontiers in Genetics ◽

10.3389/fgene.2021.637017 ◽

2021 ◽

Vol 12 ◽

Author(s):

Jana Obšteter ◽

Janez Jenko ◽

Gregor Gorjanc

Keyword(s):

Genomic Selection ◽

Genetic Gain ◽

Breeding Program ◽

Progeny Testing ◽

Initial Training ◽

Training Population ◽

Breeding Programs ◽

Use Of Resources ◽

Dairy Cattle Breeding ◽

Close Relatives

This paper evaluates the potential of maximizing genetic gain in dairy cattle breeding by optimizing investment into phenotyping and genotyping. Conventional breeding focuses on phenotyping selection candidates or their close relatives to maximize selection accuracy for breeders and quality assurance for producers. Genomic selection decoupled phenotyping and selection and through this increased genetic gain per year compared to the conventional selection. Although genomic selection is established in well-resourced breeding programs, small populations and developing countries still struggle with the implementation. The main issues include the lack of training animals and lack of financial resources. To address this, we simulated a case-study of a small dairy population with a number of scenarios with equal available resources yet varied use of resources for phenotyping and genotyping. The conventional progeny testing scenario collected 11 phenotypic records per lactation. In genomic selection scenarios, we reduced phenotyping to between 10 and 1 phenotypic records per lactation and invested the saved resources into genotyping. We tested these scenarios at different relative prices of phenotyping to genotyping and with or without an initial training population for genomic selection. Reallocating a part of phenotyping resources for repeated milk records to genotyping increased genetic gain compared to the conventional selection scenario regardless of the amount and relative cost of phenotyping, and the availability of an initial training population. Genetic gain increased by increasing genotyping, despite reduced phenotyping. High-genotyping scenarios even saved resources. Genomic selection scenarios expectedly increased accuracy for young non-phenotyped candidate males and females, but also proven females. This study shows that breeding programs should optimize investment into phenotyping and genotyping to maximize return on investment. 
Our results suggest that any dairy breeding program using conventional progeny testing with repeated milk records can implement genomic selection without increasing the level of investment.

Genomic Prediction of Tropical Maize Resistance to Fall Armyworm and Weevils: Genomic Selection Should Focus on Effective Training Set Determination

10.20944/preprints202007.0336.v1 ◽

2020 ◽

Author(s):

Arfang Badji ◽

Lewis Machida ◽

Daniel Bomet Kwemoi ◽

Frank Kumi ◽

Dennis Okii ◽

...

Keyword(s):

Genomic Selection ◽

Genomic Prediction ◽

Insect Pests ◽

Fall Armyworm ◽

Sub Saharan Africa ◽

Tropical Maize ◽

Training Set ◽

Maize Weevil ◽

Maize Resistance ◽

Sub Saharan

Genomic selection (GS) can accelerate variety release by shortening the variety development phase when factors that influence prediction accuracies (PA) of genomic prediction (GP) models, such as training set (TS) size and relationship with the breeding set (BS), are optimized beforehand. In this study, PAs for the resistance to fall armyworm (FAW) and maize weevil (MW) in a diverse tropical maize panel composed of 341 doubled haploid and inbred lines were estimated. Both phenotypic best linear unbiased predictors (BLUPs) and estimators (BLUEs) were predicted using 17 parametric, semi-parametric, and nonparametric algorithms with a 10-fold cross-validation strategy repeated 5 times. For both MW and FAW resistance datasets with an RBTS of 37%, PAs achieved with BLUPs were at least twice as high as those realized with BLUEs. The PAs achieved with BLUPs for MW resistance traits, namely grain weight loss (GWL), adult progeny emergence (AP), and number of affected kernels (AK), varied from 0.66 to 0.82. The PAs were also high for FAW resistance RBTS datasets, varying from 0.694 to 0.714 (for RBTS of 37%) to 0.843 to 0.844 (for RBTS of 85%). The PAs for FAW resistance with PBTS were generally high, varying from 0.83 to 0.86, except for one dataset that had PAs ranging from 0.11 to 0.75. GP models showed generally similar predictive abilities for each trait while the TS designation was determinant. There was a highly positive correlation (R=0.92***) between TS size and PAs for the RBTS approach while, for the PBTS, these parameters were negatively correlated (R=-0.44***), indicating the importance of the degree of kinship between the TS and the BS, with the smallest TS (31%) achieving the highest PAs (0.86). This study paves the way towards the use of GS for maize resistance to insect pests in sub-Saharan Africa.
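The repeated cross-validation scheme described in the abstract (10 folds, 5 repetitions, 341 lines) can be sketched as follows. This is a minimal illustration of how such splits might be generated, not the authors' code. The function name `repeated_kfold`, the seed, and the fold-assignment rule are assumptions made for the example.

```python
import random

def repeated_kfold(n_lines, k=10, repeats=5, seed=0):
    """Yield (repeat, fold, train_idx, valid_idx) for repeated k-fold CV.

    Hypothetical helper: names and the striding rule are illustrative only.
    """
    rng = random.Random(seed)
    for r in range(repeats):
        order = list(range(n_lines))
        rng.shuffle(order)  # a fresh random partition for each repetition
        for f in range(k):
            valid = order[f::k]  # every k-th shuffled index forms one fold
            valid_set = set(valid)
            train = [i for i in order if i not in valid_set]
            yield r, f, train, valid

# 341 lines as in the abstract: 10 folds x 5 repetitions = 50 train/validation splits
splits = list(repeated_kfold(341))
```

In each split a prediction model would be fitted on `train` and its predictions correlated with the observed values in `valid`. Averaging those correlations over all 50 splits gives one prediction-accuracy estimate per model and trait.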

The effectiveness of genomic selection for milk production traits of Holstein dairy cattle

Asian-Australasian Journal of Animal Sciences ◽

10.5713/ajas.19.0546 ◽

2020 ◽

Vol 33 (3) ◽

pp. 382-389 ◽

Cited By ~ 1

Author(s):

Yun-Mi Lee ◽

Chang-Gwon Dang ◽

Mohammad Z. Alam ◽

You-Sam Kim ◽

Kwang-Hyeon Cho ◽

...

Keyword(s):

Genomic Selection ◽

Milk Production ◽

Genetic Gain ◽

Milk Fat ◽

Breeding Value ◽

Production Traits ◽

Milk Production Traits ◽

Data Set ◽

Selection For ◽

Estimated Breeding Value

Objective: This study was conducted to test the efficiency of genomic selection for milk production traits in a Korean Holstein cattle population. Methods: A total of 506,481 milk production records from 293,855 animals (2,090 heads with single nucleotide polymorphism information) were used to estimate breeding value by single step best linear unbiased prediction. Results: The heritability estimates for milk, fat, and protein yields in the first parity were 0.28, 0.26, and 0.23, respectively. As the parity increased, the heritability decreased for all milk production traits. The estimated generation intervals of sire for the production of bulls (LSB) and that for the production of cows (LSC) were 7.9 and 8.1 years, respectively, and the estimated generation intervals of dams for the production of bulls (LDB) and cows (LDC) were 4.9 and 4.2 years, respectively. In the overall data set, the reliability of genomic estimated breeding value (GEBV) increased by 9% on average over that of estimated breeding value (EBV), and increased by 7% in cows with test records, about 4% in bulls with progeny records, and 13% in heifers without test records. The difference in the reliability between GEBV and EBV was especially significant for the data from young bulls, i.e. 17% on average for milk (39% vs 22%), fat (39% vs 22%), and protein (37% vs 22%) yields, respectively. When selected for the milk yield using GEBV, the genetic gain increased about 7.1% over the gain with the EBV in the cows with test records, and by 2.9% in bulls with progeny records, while the genetic gain increased by about 24.2% in heifers without test records and by 35% in young bulls without progeny records. Conclusion: More genetic gains can be expected through the use of GEBV than EBV, and genomic selection was more effective in the selection of young bulls and heifers without test records.
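A rough way to relate the reported reliabilities to the reported gains is sketched below. It assumes that accuracy is the square root of reliability and that selection intensity, genetic variance and generation interval are held equal between the two scenarios. The function `relative_gain` is a hypothetical helper and is not the calculation used in the study.

```python
import math

def relative_gain(rel_new, rel_old):
    """Ratio of expected selection responses when only accuracy changes.

    Assumes accuracy = sqrt(reliability) and that every other term of the
    response to selection is identical in both scenarios.
    """
    return math.sqrt(rel_new) / math.sqrt(rel_old)

# Young-bull milk-yield reliabilities quoted in the abstract: 39% (GEBV) vs 22% (EBV)
ratio = relative_gain(0.39, 0.22)
print(f"expected gain ratio: {ratio:.2f}")
```

Under these assumptions the ratio is about 1.33, which is of the same order as the 35% increase the abstract reports for young bulls without progeny records.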

Genomic predictive ability for foliar nutritive traits in perennial ryegrass

10.1101/727958 ◽

2019 ◽

Author(s):

Sai Krishna Arojju ◽

Mingshu Cao ◽

M. Z. Zulfi Jahufer ◽

Brent A Barrett ◽

Marty J Faville

Keyword(s):

Genomic Selection ◽

Genomic Prediction ◽

Nutritive Value ◽

Prediction Models ◽

Genotypic Variation ◽

Genetic Correlations ◽

Predictive Ability ◽

Water Soluble ◽

Training Set ◽

Sib Families

Abstract: Forage nutritive value impacts animal nutrition, which underpins livestock productivity, reproduction and health. Genetic improvement for nutritive traits has been limited, as they are typically expensive and time-consuming to measure through conventional methods. Genomic selection is appropriate for such complex and expensive traits, enabling cost-effective prediction of breeding values using genome-wide markers. The aims of the present study were to assess the potential of genomic selection for a range of nutritive traits in a multi-population training set, and to quantify contributions of genotypic, environmental and genotype-by-environment (G × E) variance components to trait variation and heritability for nutritive traits. The training set consisted of a total of 517 half-sibling (half-sib) families, from five advanced breeding populations, evaluated in two distinct New Zealand grazing environments. Autumn-harvested samples were analyzed for 18 nutritive traits and maternal parents of the half-sib families were genotyped using genotyping-by-sequencing. Significant (P<0.05) genotypic variation was detected for all nutritive traits and genomic heritability (h2g) was moderate to high (0.20 to 0.74). G × E interactions were significant and particularly large for water soluble carbohydrate (WSC), crude fat, phosphorus (P) and crude protein. GBLUP, KGD-GBLUP and BayesC genomic prediction models displayed similar predictive ability, estimated by 10-fold cross validation, for all nutritive traits with values ranging from r = 0.16 to 0.45 using phenotypes from across two environments. High predictive ability was observed for the mineral traits sulphur (0.44), sodium (0.45) and magnesium (0.45) and the lowest values were observed for P (0.16), digestibility (0.22) and high molecular weight WSC (0.23). Predictive ability estimates for most nutritive traits were retained when marker number was reduced from 1 million to as few as 50,000.
The moderate to high predictive abilities observed suggest that implementation of genomic selection is feasible for most of the nutritive traits examined. For traits with lower predictive ability, multi-trait genomic prediction approaches that exploit the strong genetic correlations observed amongst some nutritive traits may be useful. This appears to be particularly important for WSC, considered one of the primary constituents of nutritive value for forages.

Optimizing genomic prediction of host resistance to koi herpesvirus disease in carp

10.1101/609784 ◽

2019 ◽

Author(s):

Christos Palaiokostas ◽

Tomas Vesely ◽

Martin Kocour ◽

Martin Prchal ◽

Dagmar Pokorova ◽

...

Keyword(s):

Genomic Selection ◽

Genomic Prediction ◽

Prediction Accuracy ◽

Genetic Relationships ◽

Animal Health ◽

Economic Losses ◽

The European Union ◽

Koi Herpesvirus ◽

Wide Range ◽

Close Relatives

Abstract: Genomic selection (GS) is increasingly applied in breeding programmes of major aquaculture species, enabling improved prediction accuracy and genetic gain compared to pedigree-based approaches. Koi herpesvirus disease (KHVD) is notifiable by the World Organisation for Animal Health and the European Union, causing major economic losses to carp production. Genomic selection has potential to breed carp with improved resistance to KHVD, thereby contributing to disease control. In the current study, Restriction-site Associated DNA sequencing (RAD-seq) was applied on a population of 1,425 common carp juveniles which had been challenged with koi herpesvirus, followed by sampling of survivors and mortalities. GS was tested on a wide range of scenarios by varying both SNP densities and the genetic relationships between training and validation sets. The accuracy of correctly identifying KHVD resistant animals using genomic selection was between 8 and 18% higher than pedigree best linear unbiased predictor (pBLUP) depending on the tested scenario. Furthermore, minor decreases in prediction accuracy were observed with decreased SNP density. However, the genetic relationship between the training and validation sets was a key factor in the efficacy of genomic prediction of KHVD resistance in carp, with substantially lower prediction accuracy when the training and validation sets did not contain close relatives.

Genomic Prediction of Resistance to Tar Spot Complex of Maize in Multiple Populations Using Genotyping-by-Sequencing SNPs

Frontiers in Plant Science ◽

10.3389/fpls.2021.672525 ◽

2021 ◽

Vol 12 ◽

Author(s):

Shiliang Cao ◽

Junqiao Song ◽

Yibing Yuan ◽

Ao Zhang ◽

Jiaojiao Ren ◽

...

Keyword(s):

Genetic Diversity ◽

Genomic Prediction ◽

Prediction Accuracy ◽

Genetic Relationships ◽

Genotyping By Sequencing ◽

Genotypic Diversity ◽

Complex Trait ◽

Training Set ◽

Tar Spot ◽

Training Sets

Tar spot complex (TSC) is one of the most important foliar diseases in tropical maize. TSC resistance could be further improved by implementing marker-assisted selection (MAS) and genomic selection (GS) individually, or by implementing them stepwise. Implementation of GS requires a profound understanding of factors affecting genomic prediction accuracy. In the present study, an association-mapping panel and three doubled haploid populations, genotyped with genotyping-by-sequencing, were used to estimate the effectiveness of GS for improving TSC resistance. When the training and prediction sets were independent, moderate-to-high prediction accuracies were achieved across populations by using the training sets with broader genetic diversity, or in pairwise populations having closer genetic relationships. A collection of inbred lines with broader genetic diversity could be used as a permanent training set for TSC improvement, which can be updated by adding more phenotyped lines having closer genetic relationships with the prediction set. The prediction accuracies estimated with a few significantly associated SNPs were moderate-to-high, and continuously increased as more significantly associated SNPs were included. This confirmed that TSC resistance could be further improved by implementing GS for selecting multiple stable genomic regions simultaneously, or by implementing MAS and GS stepwise. Marker density, marker quality, and heterozygosity rate of samples had minor effects on the estimation of the genomic prediction accuracy. The training set size, the genetic relationship between training and prediction sets, phenotypic and genotypic diversity of the training sets, and incorporating known trait-marker associations played more important roles in improving prediction accuracy. The results of the present study provide insight into less complex trait improvement via GS in maize.

Genomic selection for any dairy breeding program via optimized investment in phenotyping and genotyping

10.1101/2020.08.16.252841 ◽

2020 ◽

Author(s):

Jana Obšteter ◽

Janez Jenko ◽

Gregor Gorjanc

Keyword(s):

Genomic Selection ◽

Genetic Gain ◽

Breeding Program ◽

Progeny Testing ◽

Initial Training ◽

Training Population ◽

Breeding Programs ◽

Female Candidates ◽

Use Of Resources ◽

Close Relatives

Abstract: This paper evaluates the potential of maximizing genetic gain in dairy cattle breeding by optimizing investment into phenotyping and genotyping. Conventional breeding focuses on phenotyping selection candidates or their close relatives to maximize selection accuracy for breeders and quality assurance for producers. Genomic selection decoupled phenotyping and selection and through this increased genetic gain per year compared to the conventional selection. Although genomic selection is established in well-resourced breeding programs, small populations and developing countries still struggle with the implementation. The main issues include the lack of training animals and lack of financial resources. To address this, we simulated a case-study of a small dairy population with a number of scenarios with equal resources yet varied use of resources for phenotyping and genotyping. The conventional progeny testing scenario had 11 phenotype records per lactation. In genomic scenarios, we reduced phenotyping to between 10 and 1 phenotype records per lactation and invested the saved resources into genotyping. We tested these scenarios at different relative prices of phenotyping to genotyping and with or without an initial training population for genomic selection. Reallocating a part of phenotyping resources for repeated milk records to genotyping increased genetic gain compared to the conventional scenario regardless of the amount and relative cost of phenotyping, and the availability of an initial training population. Genetic gain increased by increasing genotyping, despite reduced phenotyping. High-genotyping scenarios even saved resources. Genomic scenarios expectedly increased accuracy for young non-phenotyped male and female candidates, but also cows. This study shows that breeding programs should optimize investment into phenotyping and genotyping to maximise return on investment.
Our results suggest that any dairy breeding program using conventional progeny testing with repeated milk records can implement genomic selection without increasing the level of investment.

Identification of superior parental lines for biparental crossing via genomic prediction

PLoS ONE ◽

10.1371/journal.pone.0243159 ◽

2020 ◽

Vol 15 (12) ◽

pp. e0243159

Author(s):

Ping-Yuan Chung ◽

Chen-Tuo Liao

Keyword(s):

Genetic Gain ◽

Genomic Prediction ◽

Genomic Diversity ◽

Breeding Value ◽

Selection Strategy ◽

Parental Selection ◽

Breeding Values ◽

Truncation Selection ◽

Selection Approach ◽

Parental Lines

A parental selection approach based on genomic prediction has been developed to help plant breeders identify a set of superior parental lines from a candidate population before conducting field trials. A classical parental selection approach based on genomic prediction usually involves truncation selection, i.e., selecting the top fraction of accessions on the basis of their genomic estimated breeding values (GEBVs). However, truncation selection inevitably results in the loss of genomic diversity during the breeding process. To preserve genomic diversity, the selection of closely related accessions should be avoided during parental selection. We thus propose a new index to quantify the genomic diversity for a set of candidate accessions, and analyze two real rice (Oryza sativa L.) genome datasets to compare several selection strategies. Our results showed that the pure truncation selection strategy produced the best starting breeding value but the least genomic diversity in the base population, leading to less genetic gain. On the other hand, strategies that considered only genomic diversity resulted in greater genomic diversity but less favorable starting breeding values, leading to more genetic gain but unsatisfactorily performing recombinant inbred lines (RILs) in progeny populations. Among all strategies investigated in this study, compromised strategies, which considered both GEBVs and genomic diversity, produced the best or second-best performing RILs, mainly because these strategies balance the starting breeding value with the maintenance of genomic diversity.
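The contrast between truncation selection and a compromise strategy can be illustrated with a toy sketch. The greedy rule below, which penalises a candidate's GEBV by its mean kinship with the parents already chosen, is an assumption made for illustration only. It is not the diversity index proposed in the paper, and the function names, the `weight` parameter and the data are all hypothetical.

```python
def truncation_select(gebv, n_select):
    """Pick the n_select candidates with the highest GEBVs."""
    order = sorted(range(len(gebv)), key=lambda i: gebv[i], reverse=True)
    return order[:n_select]

def compromise_select(gebv, kinship, n_select, weight=1.0):
    """Greedy selection trading GEBV against relatedness (illustrative rule).

    Each step picks the candidate maximising
    GEBV - weight * (mean kinship with the parents selected so far).
    """
    selected = []
    candidates = set(range(len(gebv)))
    while len(selected) < n_select and candidates:
        def score(i):
            if not selected:
                return gebv[i]
            mean_kin = sum(kinship[i][j] for j in selected) / len(selected)
            return gebv[i] - weight * mean_kin
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy data: lines 0 and 1 are close relatives, as are lines 2 and 3.
gebv = [2.0, 1.9, 1.2, 0.9]
kinship = [
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.9],
    [0.0, 0.0, 0.9, 1.0],
]

top = truncation_select(gebv, 2)                 # the two highest-GEBV lines
balanced = compromise_select(gebv, kinship, 2)   # best line plus an unrelated one
```

With these toy values, truncation selection picks the two related top lines (0 and 1), whereas the compromise rule keeps line 0 and swaps line 1 for the unrelated line 2. Setting `weight=0` makes the compromise rule reduce to truncation selection.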
