Training Set Optimization for Sparse Phenotyping in Genomic Selection: A Conceptual Overview

2021 ◽  
Vol 12 ◽  
Author(s):  
Julio Isidro y Sánchez ◽  
Deniz Akdemir

Genomic selection (GS) is becoming an essential tool in breeding programs because it increases genetic gain per unit time. The design of the training set (TRS) is one of the key steps in implementing GS in plant and animal breeding programs, mainly because (i) TRS optimization is critical for the efficiency and effectiveness of GS; (ii) breeders test genotypes in multi-year and multi-location trials to select the best-performing ones, and in this framework TRS optimization can decrease the number of genotypes to be tested and therefore reduce phenotyping cost and time; and (iii) an optimally selected TRS can give better prediction accuracies than an arbitrary TRS. Here, we review the lessons learned from TRS optimization studies in plants and their impact on crop breeding, and discuss the features that matter for the success of TRS optimization under different scenarios. We cover the major challenges associated with the optimization of GS, including population size, the relationship between the training and test set (TS), updating of the TRS, and the use of different packages and algorithms for TRS implementation in GS. Finally, we describe general guidelines for improving the rate of genetic improvement by maximizing the use of TRS optimization in the GS framework.

2021 ◽  
Vol 12 ◽  
Author(s):  
Jana Obšteter ◽  
Janez Jenko ◽  
Gregor Gorjanc

This paper evaluates the potential of maximizing genetic gain in dairy cattle breeding by optimizing investment into phenotyping and genotyping. Conventional breeding focuses on phenotyping selection candidates or their close relatives to maximize selection accuracy for breeders and quality assurance for producers. Genomic selection decoupled phenotyping and selection and through this increased genetic gain per year compared to conventional selection. Although genomic selection is established in well-resourced breeding programs, small populations and developing countries still struggle with the implementation. The main issues include the lack of training animals and lack of financial resources. To address this, we simulated a case study of a small dairy population with a number of scenarios with equal available resources yet varied use of resources for phenotyping and genotyping. The conventional progeny testing scenario collected 11 phenotypic records per lactation. In genomic selection scenarios, we reduced phenotyping to between 10 and 1 phenotypic records per lactation and invested the saved resources into genotyping. We tested these scenarios at different relative prices of phenotyping to genotyping and with or without an initial training population for genomic selection. Reallocating a part of phenotyping resources for repeated milk records to genotyping increased genetic gain compared to the conventional selection scenario regardless of the amount and relative cost of phenotyping, and the availability of an initial training population. Genetic gain increased by increasing genotyping, despite reduced phenotyping. High-genotyping scenarios even saved resources. Genomic selection scenarios expectedly increased accuracy for young non-phenotyped candidate males and females, but also proven females. This study shows that breeding programs should optimize investment into phenotyping and genotyping to maximize return on investment. Our results suggest that any dairy breeding program using conventional progeny testing with repeated milk records can implement genomic selection without increasing the level of investment.
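
The reallocation described in this abstract is, at its core, budget arithmetic. The following is a minimal sketch of how many genotypes the saved phenotyping budget could fund. Only the 11-record baseline comes from the abstract; the herd size, the unit costs, and the function name `genotypes_affordable` are hypothetical and chosen purely for illustration.

```python
def genotypes_affordable(n_cows, records_kept, cost_per_record,
                         cost_per_genotype, baseline_records=11):
    """Number of genotypes fundable by recording fewer milk records.

    baseline_records=11 matches the conventional scenario in the abstract.
    All other inputs are illustrative assumptions, not study parameters.
    """
    if not 0 <= records_kept <= baseline_records:
        raise ValueError("records_kept must lie between 0 and baseline_records")
    if cost_per_genotype <= 0:
        raise ValueError("cost_per_genotype must be positive")
    # Budget freed by dropping records, summed over all recorded cows.
    saved_budget = (baseline_records - records_kept) * n_cows * cost_per_record
    # Only whole genotypes can be bought, so round down.
    return int(saved_budget // cost_per_genotype)


if __name__ == "__main__":
    # Hypothetical herd of 1000 cows; one genotype costs as much as 10 records.
    for kept in (10, 5, 1):
        print(kept, genotypes_affordable(1000, kept, 1, 10))
```

The sketch only shows how the relative price of phenotyping to genotyping sets the exchange rate between records and genotypes. It says nothing about the resulting genetic gain, which the study estimated by simulation.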


2017 ◽  
Vol 17 (3) ◽  
pp. 4-13 ◽  
Author(s):  
Ralph Renger ◽  
Jirina Foltysova ◽  
Jessica Renger ◽  
Wayne Booze

This paper focuses on the application of systems thinking, systems theory, and systems evaluation theory (SET) in evaluating modern-day systems. SET consists of three steps, purposively sequenced, with each being a prerequisite for the success of the next. The first, foundational step is to define the system. Systems thinking provides a theoretical rationale for defining the system boundaries, components, and relationships. However, there is no literature describing how to define these system elements. Using an example from the evaluation of several United States cardiac care systems, the paper shares a number of methods used to define the system boundaries, components, and relationships. The paper describes how each of these elements informs step two of SET: evaluating system efficiency. The discussion shares lessons learned and notes the relationship between methods used in system and program evaluation.


Author(s):  
Sikiru Adeniyi Atanda ◽  
Michael Olsen ◽  
Juan Burgueño ◽  
Jose Crossa ◽  
Daniel Dzidzienyo ◽  
...  

Key message: Historical data from breeding programs can be efficiently used to improve genomic selection accuracy, especially when the training set is optimized to subset individuals most informative of the target testing set. Abstract: The current strategy for large-scale implementation of genomic selection (GS) at the International Maize and Wheat Improvement Center (CIMMYT) global maize breeding program has been to train models using information from full-sibs in a “test-half-predict-half approach.” Although effective, this approach has limitations, as it requires large full-sib populations and limits the ability to shorten variety testing and breeding cycle times. The primary objective of this study was to identify optimal experimental and training set designs to maximize prediction accuracy of GS in CIMMYT’s maize breeding programs. Training set (TS) design strategies were evaluated to determine the most efficient use of phenotypic data collected on relatives for genomic prediction (GP) using datasets containing 849 (DS1) and 1389 (DS2) DH-lines evaluated as testcrosses in 2017 and 2018, respectively. Our results show there is merit in the use of multiple bi-parental populations as TS when selected using algorithms to maximize relatedness between the training and prediction sets. In a breeding program where relevant past breeding information is not readily available, the phenotyping expenditure can be spread across connected bi-parental populations by phenotyping only a small number of lines from each population. This significantly improves prediction accuracy compared to within-population prediction, especially when the TS for within full-sib prediction is small. Finally, we demonstrate that prediction accuracy in either sparse testing or “test-half-predict-half” can further be improved by optimizing which lines are planted for phenotyping and which lines are to be only genotyped for advancement based on GP.
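
The idea of choosing a training set by its relatedness to the prediction set can be sketched in a few lines. This is a deliberately simple illustration, not the algorithm used in the study: it assumes markers coded -1/0/1, uses the mean product of marker codes as a crude relatedness measure, and ranks candidates by their average relatedness to the target lines. The names `relatedness` and `select_training_set` are hypothetical.

```python
from typing import Dict, List, Sequence


def relatedness(a: Sequence[int], b: Sequence[int]) -> float:
    """Crude relatedness: mean product of marker codes (assumed -1/0/1)."""
    if len(a) != len(b) or not a:
        raise ValueError("marker vectors must be non-empty and of equal length")
    return sum(x * y for x, y in zip(a, b)) / len(a)


def select_training_set(candidates: Dict[str, Sequence[int]],
                        targets: List[Sequence[int]],
                        k: int) -> List[str]:
    """Return the k candidate names most related, on average, to the targets."""
    if not targets:
        raise ValueError("at least one target line is required")

    def mean_rel(name: str) -> float:
        g = candidates[name]
        return sum(relatedness(g, t) for t in targets) / len(targets)

    # Highest mean relatedness first; ties are broken alphabetically by name.
    ranked = sorted(candidates, key=lambda n: (-mean_rel(n), n))
    return ranked[:k]


if __name__ == "__main__":
    # Toy genotypes, invented for illustration only.
    candidates = {
        "A": [1, 1, -1, 0],
        "B": [-1, -1, 1, 0],
        "C": [1, 0, -1, 0],
        "D": [0, 0, 0, 0],
    }
    targets = [[1, 1, -1, 1]]
    print(select_training_set(candidates, targets, k=2))
```

Ranking by mean relatedness ignores redundancy among the selected lines. Dedicated training-set optimization criteria also account for how the chosen lines relate to each other, so this sketch should be read as the simplest possible baseline.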


Author(s):  
Xabi Cazenave ◽  
Bernard Petit ◽  
Marc Lateur ◽  
Hilde Nybom ◽  
Jiri Sedlak ◽  
...  

Abstract: Genomic selection is an attractive strategy for apple breeding that could reduce the length of breeding cycles. A possible limitation to the practical implementation of this approach lies in the creation of a training set large and diverse enough to ensure accurate predictions. In this study, we investigated the potential of combining two available populations, i.e. genetic resources and elite material, in order to obtain a large training set with a high genetic diversity. We compared the predictive ability of genomic predictions within-population, across-population or when combining both populations, and tested a model accounting for population-specific marker effects in this last case. The obtained predictive abilities were moderate to high according to the studied trait and small increases in predictive ability could be obtained for some traits when the two populations were combined into a unique training set. We also investigated the potential of such a training set to predict hybrids resulting from crosses between the two populations, with a focus on the method to design the training set and the best proportion of each population to optimize predictions. The measured predictive abilities were very similar for all the proportions, except for the extreme cases where only one of the two populations was used in the training set, in which case predictive abilities could be lower than when using both populations. Using an optimization algorithm to choose the genotypes in the training set also led to higher predictive abilities than when the genotypes were chosen at random. Our results provide guidelines to initiate breeding programs that use genomic selection when the implementation of the training set is a limitation.


2020 ◽  
Vol 10 (7) ◽  
pp. 2465-2476
Author(s):  
Marcus O. Olatoye ◽  
Lindsay V. Clark ◽  
Nicholas R. Labonte ◽  
Hongxu Dong ◽  
Maria S. Dwiyanti ◽  
...  

Miscanthus is a perennial grass with potential for lignocellulosic ethanol production. To ensure its utility for this purpose, breeding efforts should focus on increasing genetic diversity of the nothospecies Miscanthus × giganteus (M×g) beyond the single clone used in many programs. Germplasm from the corresponding parental species M. sinensis (Msi) and M. sacchariflorus (Msa) could theoretically be used as training sets for genomic prediction of M×g clones with optimal genomic estimated breeding values for biofuel traits. To this end, we first showed that subpopulation structure makes a substantial contribution to the genomic selection (GS) prediction accuracies within a 538-member diversity panel of predominantly Msi individuals and a 598-member diversity panel of Msa individuals. We then assessed the ability of these two diversity panels to train GS models that predict breeding values in an interspecific diploid 216-member M×g F2 panel. Low and negative prediction accuracies were observed when various subsets of the two diversity panels were used to train these GS models. To overcome the drawback of having only one interspecific M×g F2 panel available, we also evaluated prediction accuracies for traits simulated in 50 simulated interspecific M×g F2 panels derived from different sets of Msi and diploid Msa parents. The results revealed that genetic architectures with common causal mutations across Msi and Msa yielded the highest prediction accuracies. Ultimately, these results suggest that the ideal training set should contain the same causal mutations segregating within interspecific M×g populations, and thus efforts should be undertaken to ensure that individuals in the training and validation sets are as closely related as possible.


2021 ◽  
Author(s):  
Xabi Cazenave ◽  
Bernard Petit ◽  
Francois Laurens ◽  
Charles-Eric Durel ◽  
Helene Muranty

Genomic selection is an attractive strategy for apple breeding that could reduce the length of breeding cycles. A possible limitation to the practical implementation of this approach lies in the creation of a training set large and diverse enough to ensure accurate predictions. In this study, we investigated the potential of combining two available populations, i.e. genetic resources and elite material, in order to obtain a large training set with a high genetic diversity. We compared the predictive ability of genomic predictions within-population, across-population or when combining both populations, and tested a model accounting for population-specific marker effects in this last case. The obtained predictive abilities were moderate to high according to the studied trait and were always highest when the two populations were combined into a unique training set. We also investigated the potential of such a training set to predict hybrids resulting from crosses between the two populations, with a focus on the method to design the training set and the best proportion of each population to optimize predictions. The measured predictive abilities were very similar for all the proportions, except for the extreme cases where only one of the two populations was used in the training set, in which case predictive abilities could be lower than when using both populations. Using an optimization algorithm to choose the genotypes in the training set also led to higher predictive abilities than when the genotypes were chosen at random. Our results provide guidelines to initiate breeding programs that use genomic selection when the implementation of the training set is a limitation.


2020 ◽  
Author(s):  
Jana Obšteter ◽  
Janez Jenko ◽  
Gregor Gorjanc

Abstract: This paper evaluates the potential of maximizing genetic gain in dairy cattle breeding by optimizing investment into phenotyping and genotyping. Conventional breeding focuses on phenotyping selection candidates or their close relatives to maximize selection accuracy for breeders and quality assurance for producers. Genomic selection decoupled phenotyping and selection and through this increased genetic gain per year compared to the conventional selection. Although genomic selection is established in well-resourced breeding programs, small populations and developing countries still struggle with the implementation. The main issues include the lack of training animals and lack of financial resources. To address this, we simulated a case-study of a small dairy population with a number of scenarios with equal resources yet varied use of resources for phenotyping and genotyping. The conventional progeny testing scenario had 11 phenotype records per lactation. In genomic scenarios, we reduced phenotyping to between 10 and 1 phenotype records per lactation and invested the saved resources into genotyping. We tested these scenarios at different relative prices of phenotyping to genotyping and with or without an initial training population for genomic selection. Reallocating a part of phenotyping resources for repeated milk records to genotyping increased genetic gain compared to the conventional scenario regardless of the amount and relative cost of phenotyping, and the availability of an initial training population. Genetic gain increased by increasing genotyping, despite reduced phenotyping. High-genotyping scenarios even saved resources. Genomic scenarios expectedly increased accuracy for young non-phenotyped male and female candidates, but also cows. This study shows that breeding programs should optimize investment into phenotyping and genotyping to maximise return on investment. Our results suggest that any dairy breeding program using conventional progeny testing with repeated milk records can implement genomic selection without increasing the level of investment.


2020 ◽  
Author(s):  
Owen Powell ◽  
R. Chris Gaynor ◽  
Gregor Gorjanc ◽  
Christian R. Werner ◽  
John M. Hickey

Abstract: Hybrid crop breeding programs using a two-part strategy produced the most genetic gain, but a maximum avoidance of inbreeding crossing scheme was required to increase long-term genetic gain. The two-part strategy uses outbred parents to complete multiple generations per year to reduce the generation interval of hybrid crop breeding programs. The maximum avoidance of inbreeding crossing scheme manages genetic variance by maintaining uniform contributions and inbreeding coefficients across all crosses. This study performed stochastic simulations to quantify the potential of a two-part strategy in combination with two crossing schemes to increase the rate of genetic gain in hybrid crop breeding programs. The two crossing schemes were: (i) a circular crossing scheme, and (ii) a maximum avoidance of inbreeding crossing scheme. The results from this study show that the implementation of genomic selection increased the rate of genetic gain, and that the two-part hybrid crop breeding program generated the highest genetic gain. This study also shows that the maximum avoidance of inbreeding crossing scheme increased long-term genetic gain in two-part hybrid crop breeding programs completing multiple selection cycles per year, as a result of maintaining higher levels of genetic variance over time. The flexibility of the two-part strategy offers further opportunities to integrate new technologies to further increase genetic gain in hybrid crop breeding programs, such as the use of outbred training populations. However, the practical implementation of the two-part strategy will require the development of bespoke transition strategies to fundamentally change the data, logistics, and infrastructure that underpin hybrid crop breeding programs. Key message: Hybrid crop breeding programs using a two-part strategy produced the most genetic gain by using outbred parents to complete multiple generations per year. However, a maximum avoidance of inbreeding crossing scheme was required to manage genetic variance and increase long-term genetic gain.
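
A circular crossing scheme of the kind named in this abstract can be illustrated briefly. This sketch assumes that "circular" means each parent is crossed with its neighbour in a fixed ring order, so that every parent contributes to exactly two crosses. The abstract does not spell out the exact design used in the simulations, and the function name `circular_crosses` is hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def circular_crosses(parents: Sequence[str]) -> List[Tuple[str, str]]:
    """Pair each parent with the next one in the ring (the last with the first)."""
    n = len(parents)
    if n < 3:
        # With fewer than three parents the ring would repeat the same cross.
        raise ValueError("a circular scheme needs at least three parents")
    if len(set(parents)) != n:
        raise ValueError("parent identifiers must be unique")
    return [(parents[i], parents[(i + 1) % n]) for i in range(n)]


if __name__ == "__main__":
    crosses = circular_crosses(["P1", "P2", "P3", "P4"])
    print(crosses)
    # Each parent appears in exactly two crosses, i.e. contributions are uniform.
    print(Counter(p for cross in crosses for p in cross))
```

The ring keeps parental contributions equal by construction. A maximum avoidance of inbreeding scheme additionally chooses which families to pair in each generation so that mates share ancestors as far back as possible, which this sketch does not attempt.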


Crop Science ◽  
2014 ◽  
Vol 54 (4) ◽  
pp. 1476-1488 ◽  
Author(s):  
John M. Hickey ◽  
Susanne Dreisigacker ◽  
Jose Crossa ◽  
Sarah Hearne ◽  
Raman Babu ◽  
...  

2017 ◽  
Author(s):  
Marnin D. Wolfe ◽  
Dunia Pino Del Carpio ◽  
Olumide Alabi ◽  
Chiedozie Egesi ◽  
Lydia C. Ezenwaka ◽  
...  

Abstract: Cassava (Manihot esculenta Crantz) is a clonally propagated staple food crop in the tropics. Genomic selection (GS) reduces selection cycle times by the prediction of breeding value for selection of unevaluated lines based on genome-wide marker data. GS has been implemented at three breeding programs in sub-Saharan Africa. Initial studies provided promising estimates of predictive abilities in single populations using standard prediction models and scenarios. In the present study we expand on previous analyses by assessing the accuracy of seven prediction models for seven traits in three prediction scenarios: (1) cross-validation within each population, (2) cross-population prediction and (3) cross-generation prediction. We also evaluated the impact of increasing training population size by phenotyping progenies selected either at random or using a genetic algorithm. Cross-validation results were mostly consistent across breeding programs, with non-additive models like RKHS predicting an average of 10% more accurately. Accuracy was generally associated with heritability. Cross-population prediction accuracy was generally low (mean 0.18 across traits and models) but prediction of cassava mosaic disease severity increased up to 57% in one Nigerian population, when combining data from another related population. Accuracy across-generation was poorer than within (cross-validation) as expected, but indicated that accuracy should be sufficient for rapid-cycling GS on several traits. Selection of prediction model made some difference across generations, but increasing training population (TP) size was more important. In some cases, using a genetic algorithm, selecting one third of progeny could achieve accuracy equivalent to phenotyping all progeny. Based on the datasets analyzed in this study, it was apparent that the size of a training population (TP) has a significant impact on prediction accuracy for most traits. We are still in the early stages of GS in this crop, but results are promising, at least for some traits. The TPs need to continue to grow and quality phenotyping is more critical than ever. General guidelines for successful GS are emerging. Phenotyping can be done on fewer individuals, cleverly selected, making for trials that are more focused on the quality of the data collected.
Abbreviations: GS, genomic selection; GBS, genotyping-by-sequencing; IITA, International Institute of Tropical Agriculture; NRCRI, National Root Crops Research Institute; NaCRRI, National Crops Resources Research Institute; GEBVs, genomic estimated breeding values; TP, training population; RTWT, fresh root weight; RTNO, root number; SHTWT, fresh shoot weight; HI, harvest index; DM, dry matter content; CMD, cassava mosaic disease; MCMDS, mean CMD severity; VIGOR, early vigor.

