Impact of selective genotyping in the training population on accuracy and bias of genomic selection

This paper evaluates the potential of maximizing genetic gain in dairy cattle breeding by optimizing investment into phenotyping and genotyping. Conventional breeding focuses on phenotyping selection candidates or their close relatives to maximize selection accuracy for breeders and quality assurance for producers. Genomic selection decoupled phenotyping from selection and thereby increased genetic gain per year compared with conventional selection. Although genomic selection is established in well-resourced breeding programs, small populations and developing countries still struggle to implement it. The main obstacles are a lack of training animals and a lack of financial resources. To address this, we simulated a case study of a small dairy population under a number of scenarios with equal available resources but different allocations of those resources to phenotyping and genotyping. The conventional progeny testing scenario collected 11 phenotypic records per lactation. In the genomic selection scenarios, we reduced phenotyping to between 10 and 1 phenotypic records per lactation and invested the saved resources into genotyping. We tested these scenarios at different relative prices of phenotyping to genotyping and with or without an initial training population for genomic selection. Reallocating part of the phenotyping resources for repeated milk records to genotyping increased genetic gain compared with the conventional selection scenario, regardless of the amount and relative cost of phenotyping and of the availability of an initial training population. Genetic gain increased with more genotyping, despite reduced phenotyping. High-genotyping scenarios even saved resources. As expected, genomic selection scenarios increased accuracy for young non-phenotyped male and female candidates, but they also increased it for proven females. This study shows that breeding programs should optimize investment into phenotyping and genotyping to maximize return on investment.
Our results suggest that any dairy breeding program using conventional progeny testing with repeated milk records can implement genomic selection without increasing the level of investment.
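
The reallocation idea can be illustrated with simple budget arithmetic. The sketch below is not the authors' cost model, which the abstract does not give: it assumes, hypothetically, that the budget is counted in units of one milk record, that each lactation in the conventional scenario costs 11 records, and that one genotype costs `genotype_price` records. Every record dropped from a lactation is converted into genotyping at equal total cost.

```python
from dataclasses import dataclass

BASELINE_RECORDS = 11  # milk records per lactation in the conventional progeny-testing scenario


@dataclass
class Scenario:
    records_per_lactation: int  # milk records kept per lactation
    genotypes: int  # animals that can be genotyped with the saved resources


def reallocate(n_lactations: int, records_kept: int, genotype_price: float) -> Scenario:
    """Convert saved phenotyping resources into genotypes at equal total cost.

    Hypothetical cost model: one genotype costs `genotype_price` milk records.
    """
    if not 1 <= records_kept <= BASELINE_RECORDS:
        raise ValueError(f"records_kept must be between 1 and {BASELINE_RECORDS}")
    if genotype_price <= 0:
        raise ValueError("genotype_price must be positive")
    saved_records = n_lactations * (BASELINE_RECORDS - records_kept)
    # Only whole genotypes can be bought; any remainder is left unspent.
    return Scenario(records_kept, int(saved_records // genotype_price))


if __name__ == "__main__":
    # Hypothetical inputs: 1,000 lactations, a genotype priced at 4 milk records.
    for kept in range(10, 0, -1):
        s = reallocate(n_lactations=1000, records_kept=kept, genotype_price=4)
        print(f"{s.records_per_lactation:2d} records/lactation -> {s.genotypes:5d} genotypes")
```

Changing `genotype_price` mimics testing different relative prices of phenotyping to genotyping. The arithmetic only shows how many genotypes a given cut in recording buys; the genetic consequences were evaluated in the study by simulation.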

Optimization of Genomic Selection to Improve Disease Resistance in Two Marine Fishes, the European Sea Bass (Dicentrarchus labrax) and the Gilthead Sea Bream (Sparus aurata)

Frontiers in Genetics, 2021, Vol 12
DOI: 10.3389/fgene.2021.665920

Author(s): Ronan Griot, François Allal, Florence Phocas, Sophie Brard-Fudulea, Romain Morvezen, ...

Keyword(s): Population Size, Genomic Selection, Sparus aurata, Sea Bass, Gilthead Sea Bream, Sea Bream, Full Density, Marker Density, Training Population, European Sea Bass

Disease outbreaks are a major threat to the aquaculture industry and can be controlled by selective breeding. With the development of high-throughput genotyping technologies, genomic selection may become accessible even in minor species. Training population size and marker density are among the main drivers of prediction accuracy, and both have a large impact on the cost of genomic selection. In this study, we assessed the impact of training population size and marker density on the prediction accuracy of disease resistance traits in European sea bass (Dicentrarchus labrax) and gilthead sea bream (Sparus aurata). We performed challenges with nervous necrosis virus (NNV) in two sea bass cohorts, with Vibrio harveyi in one sea bass cohort, and with Photobacterium damselae subsp. piscicida in one sea bream cohort. Challenged individuals were genotyped on 57K–60K SNP chips. Markers were sampled to design virtual SNP chips of 1K, 3K, 6K, and 10K markers. Similarly, challenged individuals were randomly sampled to vary training population size from 50 to 800 individuals. The accuracy of genomic (GBLUP model) and pedigree-based (PBLUP model) estimated breeding values (EBV) was computed for each training population size using Monte-Carlo cross-validation. Genomic breeding values were also computed with the virtual chips to study the effect of marker density. Because one major QTL was detected for resistance to viral nervous necrosis (VNN), the potential of marker-assisted selection was investigated by adding the QTL effect to both the genomic and the pedigree prediction models. As training population size increased, accuracy increased, reaching 0.51–0.65 with the full-density chips. Accuracy had not reached a plateau, so it could still increase with more individuals in the training population. With the 6K chip, accuracy reached at least 90% of that obtained with the full-density chip.
Adding the QTL effect raised the accuracy of the PBLUP model above that of the GBLUP model without the QTL effect. This work sets a framework for the practical implementation of genomic selection to improve resistance to major diseases in European sea bass and gilthead sea bream.
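
The Monte-Carlo cross-validation design, with training population size and marker density varied, can be sketched on toy data. Everything below is an assumption for illustration: genotypes are simulated unlinked SNPs (no LD, unlike real chips), virtual chips are drawn by random marker sampling, the GBLUP mean is taken as the plain training average, and accuracy is the correlation with simulated true breeding values, which only a simulation provides. It is not the authors' pipeline.

```python
import numpy as np

rng = np.random.default_rng(2021)


def simulate(n_ind=600, n_snp=800, n_qtl=80, h2=0.5):
    """Toy data: unlinked SNPs coded 0/1/2 and an additive trait with heritability h2."""
    p = rng.uniform(0.05, 0.95, n_snp)
    M = rng.binomial(2, p, size=(n_ind, n_snp)).astype(float)
    qtl = rng.choice(n_snp, n_qtl, replace=False)
    tbv = M[:, qtl] @ rng.normal(0.0, 1.0, n_qtl)
    tbv = (tbv - tbv.mean()) / tbv.std() * np.sqrt(h2)  # scale genetic variance to h2
    y = tbv + rng.normal(0.0, np.sqrt(1.0 - h2), n_ind)
    return M, y, tbv


def vanraden_g(M):
    """Genomic relationship matrix, VanRaden (2008) method 1."""
    p = M.mean(axis=0) / 2.0
    Z = M - 2.0 * p
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))


def gblup_predict(G, y, train, test, h2):
    """GEBV of test animals: G_vt (G_tt + lambda*I)^-1 (y_t - mean), lambda = (1-h2)/h2."""
    lam = (1.0 - h2) / h2
    G_tt = G[np.ix_(train, train)]
    G_vt = G[np.ix_(test, train)]
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train)), y[train] - y[train].mean())
    return G_vt @ alpha


def mc_cv_accuracy(G, y, tbv, n_train, n_test=200, n_rep=20, h2=0.5):
    """Monte-Carlo CV: repeated random train/test splits, mean cor(GEBV, true BV)."""
    accs = []
    for _ in range(n_rep):
        idx = rng.permutation(len(y))
        test, train = idx[:n_test], idx[n_test:n_test + n_train]
        gebv = gblup_predict(G, y, train, test, h2)
        accs.append(np.corrcoef(gebv, tbv[test])[0, 1])
    return float(np.mean(accs))


if __name__ == "__main__":
    M, y, tbv = simulate()
    G_full = vanraden_g(M)
    for n_train in (50, 100, 200, 400):  # vary training population size
        print("train", n_train, round(mc_cv_accuracy(G_full, y, tbv, n_train), 2))
    for n_markers in (100, 200, 400, 800):  # hypothetical "virtual chips" by random sampling
        cols = rng.choice(M.shape[1], n_markers, replace=False)
        print("chip", n_markers, round(mc_cv_accuracy(vanraden_g(M[:, cols]), y, tbv, 400), 2))
```

With real data the true breeding values are unknown, so accuracy has to be estimated from phenotypes of the validation animals instead; the loop structure of repeated random splits stays the same.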

Training Population Optimization for Genomic Selection in Miscanthus

G3 Genes|Genome|Genetics, 2020, Vol 10 (7), pp. 2465-2476
DOI: 10.1534/g3.120.401402

Author(s): Marcus O. Olatoye, Lindsay V. Clark, Nicholas R. Labonte, Hongxu Dong, Maria S. Dwiyanti, ...

Keyword(s): Genomic Selection, Perennial Grass, Substantial Contribution, Lignocellulosic Ethanol, Training Population, Training Set, Breeding Values, Single Clone, Estimated Breeding Values

Miscanthus is a perennial grass with potential for lignocellulosic ethanol production. To realize this potential, breeding efforts should increase the genetic diversity of the nothospecies Miscanthus × giganteus (M×g) beyond the single clone used in many programs. Germplasm from the parental species M. sinensis (Msi) and M. sacchariflorus (Msa) could, in principle, serve as training sets for genomic prediction of M×g clones with optimal genomic estimated breeding values for biofuel traits. To this end, we first showed that subpopulation structure makes a substantial contribution to genomic selection (GS) prediction accuracies within a 538-member diversity panel of predominantly Msi individuals and a 598-member diversity panel of Msa individuals. We then assessed the ability of these two diversity panels to train GS models that predict breeding values in an interspecific diploid 216-member M×g F2 panel. Low or negative prediction accuracies were observed when various subsets of the two diversity panels were used to train these GS models. Because only one interspecific M×g F2 panel was available, we also evaluated prediction accuracies for traits simulated in 50 interspecific M×g F2 panels derived from different sets of Msi and diploid Msa parents. Genetic architectures with causal mutations common to Msi and Msa yielded the highest prediction accuracies. These results suggest that the ideal training set should contain the same causal mutations that segregate within interspecific M×g populations, and that efforts should therefore be made to ensure that individuals in the training and validation sets are as closely related as possible.
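
The conclusion that causal mutations must segregate in the training set can be illustrated with a toy contrast. This is a sketch under strong assumptions, not the authors' simulation: markers are unlinked, there is no population structure, marker effects are fitted by ridge regression (an rrBLUP-style choice of ours), and the "population-specific" case is modelled by fixing the causal variants in the training population so their effects cannot be estimated there.

```python
import numpy as np

rng = np.random.default_rng(7)
N_SNP, N_QTL, NOISE_SD = 500, 50, 4.0  # hypothetical toy settings


def genotypes(n, freqs):
    """Unlinked SNPs coded as 0/1/2 copies of the alternative allele."""
    return rng.binomial(2, freqs, size=(n, len(freqs))).astype(float)


def rrblup(X, y, lam):
    """Ridge-regression marker effects with markers centred in the training set."""
    Xc = X - X.mean(axis=0)
    A = Xc.T @ Xc + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, Xc.T @ (y - y.mean()))


def cross_population_accuracy(shared_qtl, n_train=500, n_test=200, lam=160.0):
    """Train in one toy population, predict another; return cor(prediction, true BV)."""
    p_train = rng.uniform(0.05, 0.95, N_SNP)
    p_test = rng.uniform(0.05, 0.95, N_SNP)
    qtl = rng.choice(N_SNP, N_QTL, replace=False)
    if not shared_qtl:
        p_train[qtl] = 0.0  # causal variants do not segregate in the training panel
    effects = np.zeros(N_SNP)
    effects[qtl] = rng.normal(0.0, 1.0, N_QTL)
    X_tr, X_te = genotypes(n_train, p_train), genotypes(n_test, p_test)
    y_tr = X_tr @ effects + rng.normal(0.0, NOISE_SD, n_train)
    beta = rrblup(X_tr, y_tr, lam)
    return float(np.corrcoef(X_te @ beta, X_te @ effects)[0, 1])


if __name__ == "__main__":
    for shared in (True, False):
        accs = [cross_population_accuracy(shared) for _ in range(10)]
        label = "shared causal variants" if shared else "population-specific"
        print(label, round(float(np.mean(accs)), 2))
```

In the population-specific case the training phenotypes carry no information about the causal loci, so predictions in the other population are essentially noise; real panels add LD and structure, which this toy ignores.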

Comparison of BLUP and Bayesian methods for different sizes of training population in genomic selection

Turkish Journal of Veterinary and Animal Sciences, 2020, Vol 44 (5), pp. 994-1002
DOI: 10.3906/vet-2001-52

Author(s): Samet Hasan ABACI, Hasan ÖNDER

Keyword(s): Genomic Selection, Milk Yield, Predictive Ability, Breeding Value, Training Population, Breeding Values, Value Prediction, Population Sizes, Estimated Breeding Values

This study aims to compare the accuracy of pedigree-based and genomic breeding value prediction for different training population sizes. Bayes (A, B, C, Cπ) and GBLUP methods were used for genomic selection, and the BLUP method for pedigree-based selection. Genomic and pedigree-based breeding values were estimated for partial milk yield (158 days) of 400 Holstein cows from a private enterprise in the USA. To this end, the cows were split into training (322–360 animals) and test (78–40 animals) populations for indirect breeding value estimation. The animals were genotyped with a 54K SNP chip, and the AA, AB, and BB marker genotypes were coded as –10, 0, and 10, respectively. Bayes and GBLUP methods were performed using GenSel 4.55 software, with a total of 50,000 iterations, of which the first 5,000 were discarded as burn-in. Pedigree-based breeding values were estimated by REML with MTDFREML software using an animal model. The correlation between partial milk yield and estimated breeding values was used to assess the predictive ability of each method. The Bayes B method gave the highest accuracy for indirect breeding value estimation.
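
The genotype coding and the predictive-ability statistic are simple to reproduce. The sketch below uses made-up phenotype and EBV values and does not call GenSel or MTDFREML; the helper names are hypothetical. Note that the –10/0/10 coding is a linear rescaling of the number of B alleles, 10 × (count − 1).

```python
from math import sqrt

# Coding given in the study: AA -> -10, AB -> 0, BB -> 10.
GENOTYPE_CODE = {"AA": -10, "AB": 0, "BB": 10}


def encode(genotypes):
    """Map genotype strings to the -10/0/10 coding; 'BA' is treated as 'AB'."""
    return [GENOTYPE_CODE["".join(sorted(g))] for g in genotypes]


def pearson(x, y):
    """Pearson correlation, used here as the predictive-ability statistic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    print(encode(["AA", "BA", "BB"]))  # [-10, 0, 10]
    # Hypothetical test-set values: partial milk yield (kg) and EBVs from one method.
    yield_kg = [4210.0, 3980.0, 4550.0, 4120.0, 3890.0]
    ebv = [35.2, -12.4, 80.1, 10.3, -20.7]
    print(round(pearson(yield_kg, ebv), 3))
```

Computing this correlation on the test animals once per method gives the comparison reported in the abstract.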

Genomic prediction of fruit texture and training population optimization towards the application of genomic selection in apple

Horticulture Research, 2020, Vol 7 (1)
DOI: 10.1038/s41438-020-00370-5

Author(s): Morgane Roth, Hélène Muranty, Mario Di Guardo, Walter Guerra, Andrea Patocchi, ...

Keyword(s): Genomic Selection, Genomic Prediction, Training Population, Fruit Texture

Genomic Selection at Preliminary Yield Trial Stage: Training Population Design to Predict Untested Lines

Agronomy, 2020, Vol 10 (1), pp. 60
DOI: 10.3390/agronomy10010060

Author(s): Virginia L. Verges, David A. Van Sanford

Keyword(s): Genomic Selection, Heading Date, Wheat Breeding, Minimal Amount, Efficient Design, Training Population, Breeding Lines, Yield Trial, Small Family Size, Yield Trials

Genomic selection (GS) is being applied routinely in wheat breeding programs. For the evaluation of preliminary lines, this tool is becoming important because preliminary lines are generally evaluated in few environments without replication, owing to the small amount of seed available to the breeder. A total of 816 breeding lines from advanced or preliminary yield trials were included in the study. We designed three types of training population (TP) to predict lines in preliminary yield trials (PYT): (i) advanced lines of the breeding program; (ii) 50% of the preliminary line set, drawn from many families; and (iii) full sibs only, consisting of 50% of the lines of each family. Splitting the preliminary set in half and phenotyping only the half that served as the TP gave the most consistent results across traits. For a subset of the population of lines, we observed accuracies of 0.49–0.65 for yield, 0.59–0.61 for test weight, 0.70–0.72 for heading date, and 0.49–0.50 for height. Accuracies decreased with the other training population designs and were inconsistent across preliminary line sets and traits. From a breeder's perspective, a prediction accuracy of 0.65 meant that, at a 0.2 selection intensity (selecting the top 20% of lines), 75% of the best-yielding lines based on phenotypic information were correctly selected by the GS model. Our results indicate that, despite the small family size, a design that includes lines from the same family in both the TP and the validation population (VP), together with half sibs and more distant lines, and that phenotypes only the lines included in the TP, could be a useful and efficient way to establish a GS scheme to predict lines entering first-year yield trials.
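
The breeder-oriented statistic in this abstract, the share of the phenotypically best lines that the GS model also selects, can be computed as the overlap of two top-20% sets. This is a sketch with made-up data and hypothetical helper names; how the authors defined the "best yielding lines" (for instance, how ties were handled) is an assumption here.

```python
def top_set(values, k):
    """Indices of the k largest values."""
    return set(sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k])


def selection_coincidence(observed, predicted, proportion=0.2):
    """Share of the top `proportion` of lines by phenotype also selected by GEBV."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    k = max(1, round(proportion * len(observed)))
    return len(top_set(observed, k) & top_set(predicted, k)) / k


if __name__ == "__main__":
    # Hypothetical yields (t/ha) and GEBVs for ten preliminary lines.
    yields = [5.1, 6.3, 4.8, 5.9, 6.8, 5.5, 4.9, 6.1, 5.0, 5.7]
    gebv = [0.1, 0.5, -0.3, 0.6, 0.7, 0.0, -0.2, 0.2, -0.1, 0.3]
    print(selection_coincidence(yields, gebv))  # 0.5
```

With real trial data, `observed` would hold the phenotypes of the VP lines and `predicted` their GEBVs from the model trained on the TP.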

Rapid Cycling Genomic Selection in a Multiparental Tropical Maize Population

G3 Genes|Genome|Genetics, 2017, Vol 7 (7), pp. 2315-2326
DOI: 10.1534/g3.117.043141

Author(s): Xuecai Zhang, Paulino Pérez-Rodríguez, Juan Burgueño, Michael Olsen, Edward Buckler, ...

Keyword(s): Genetic Diversity, Grain Yield, Genomic Selection, Breeding Period, Rapid Cycling, Tropical Maize, Training Population, Genetic Gains, Maize Population, Short Period

Genomic selection (GS) increases genetic gain by reducing the length of the selection cycle, as has been exemplified in maize using rapid cycling recombination of biparental populations. However, no results of GS applied to multi-parental maize populations have been reported so far. This study is the first to show realized genetic gains of rapid cycling genomic selection (RCGS) over four recombination cycles in a multi-parental tropical maize population. Eighteen elite tropical maize lines were intercrossed twice and self-pollinated once to form the cycle 0 (C0) training population. A total of 1000 ear-to-row C0 families were genotyped with 955,690 genotyping-by-sequencing SNP markers, and their testcrosses were phenotyped at four optimal locations in Mexico to form the training population. Individuals from families with the best plant type, maturity, and grain yield were selected and intermated to form RCGS cycle 1 (C1). Predictions were made for the genotyped individuals forming C1, and the best predicted grain yielders were selected as parents of C2; this was repeated for the following cycles (C2, C3, and C4), achieving two cycles per year. Individuals from populations C0, C1, C2, C3, and C4, together with four benchmark checks, were evaluated in multi-environment trials at two locations in Mexico. The realized genetic gain for grain yield from C1 to C4 was 0.225 ton ha−1 per cycle, which is equivalent to 0.100 ton ha−1 yr−1 over the 4.5-yr breeding period from the initial cross to the last cycle. Compared with the 18 original parents used to form C0, genetic diversity narrowed only slightly during the last GS cycles (C3 and C4). These results indicate that, in multi-parental tropical maize breeding populations, RCGS can be an effective strategy for simultaneously conserving genetic diversity and achieving high genetic gains in a short period of time.

Genomic selection for any dairy breeding program via optimized investment in phenotyping and genotyping

2020
DOI: 10.1101/2020.08.16.252841

Author(s): Jana Obšteter, Janez Jenko, Gregor Gorjanc

Keyword(s): Genomic Selection, Genetic Gain, Breeding Program, Progeny Testing, Initial Training, Training Population, Breeding Programs, Female Candidates, Use Of Resources, Close Relatives

This paper evaluates the potential of maximizing genetic gain in dairy cattle breeding by optimizing investment into phenotyping and genotyping. Conventional breeding focuses on phenotyping selection candidates or their close relatives to maximize selection accuracy for breeders and quality assurance for producers. Genomic selection decoupled phenotyping from selection and thereby increased genetic gain per year compared with conventional selection. Although genomic selection is established in well-resourced breeding programs, small populations and developing countries still struggle to implement it. The main obstacles are a lack of training animals and a lack of financial resources. To address this, we simulated a case study of a small dairy population under a number of scenarios with equal resources but different allocations of those resources to phenotyping and genotyping. The conventional progeny testing scenario had 11 phenotype records per lactation. In the genomic scenarios, we reduced phenotyping to between 10 and 1 phenotype records per lactation and invested the saved resources into genotyping. We tested these scenarios at different relative prices of phenotyping to genotyping and with or without an initial training population for genomic selection. Reallocating part of the phenotyping resources for repeated milk records to genotyping increased genetic gain compared with the conventional scenario, regardless of the amount and relative cost of phenotyping and of the availability of an initial training population. Genetic gain increased with more genotyping, despite reduced phenotyping. High-genotyping scenarios even saved resources. As expected, genomic scenarios increased accuracy for young non-phenotyped male and female candidates, but they also increased it for cows. This study shows that breeding programs should optimize investment into phenotyping and genotyping to maximize return on investment.
Our results suggest that any dairy breeding program using conventional progeny testing with repeated milk records can implement genomic selection without increasing the level of investment.

Adaptive genetic algorithm for reliable training population in plant breeding genomic selection

2016 International Conference on Advanced Computer Science and Information Systems (ICACSIS), 2016
DOI: 10.1109/icacsis.2016.7872803

Author(s): Sumarsih C. Purbarani, Ito Wasito, Ilham Kusuma

Keyword(s): Genetic Algorithm, Plant Breeding, Genomic Selection, Adaptive Genetic Algorithm, Training Population

Genomic prediction across years in a maize doubled haploid breeding program to accelerate early-stage testcross testing

Theoretical and Applied Genetics, 2020, Vol 133 (10), pp. 2869-2879
DOI: 10.1007/s00122-020-03638-5

Author(s): Nan Wang, Hui Wang, Ao Zhang, Yubo Liu, Diansi Yu, ...

Keyword(s): Genomic Selection, Doubled Haploid, Early Stage, Low Cost, Phenotypic Selection, Breeding Value, Main Task, Training Population, DH Lines, Stage Yield

Key message: Genomic selection with a multiple-year training population dataset could accelerate early-stage testcross testing by skipping the first-stage yield testing, which substantially saves the time and cost of early-stage testcross testing.

With the development of doubled haploid (DH) technology, the main task for a maize breeder is to estimate the breeding values of thousands of DH lines annually. In early-stage testcross testing, genomic selection (GS) offers the opportunity to replace expensive multiple-environment phenotyping and phenotypic selection with lower-cost genotyping and selection based on genomic estimated breeding values (GEBV). In the present study, a total of 1528 maize DH lines, phenotyped in multiple-environment trials in three consecutive years and genotyped with rAmpSeq, a genotyping platform with a low per-sample cost, were used to explore how GS can be implemented to accelerate early-stage testcross testing. The average prediction accuracy estimated from the cross-validation schemes was above 0.60 in all scenarios. The average prediction accuracies estimated from the independent validation schemes ranged from 0.23 to 0.32 across scenarios when a one-year dataset was used as the training population (TRN) to predict another year's data as the testing population (TST). They increased to 0.31–0.42 when two-year datasets were used as the TRN, and to 0.50–0.56 when the TRN consisted of two years of breeding data plus 50% of the third year's data moved from the TST to the TRN. These results show that GS with a multiple-year TRN offers the opportunity to accelerate early-stage testcross testing by skipping the first-stage yield testing, which substantially saves the time and cost of early-stage testcross testing.
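
The independent-validation schemes differ only in which lines form the TRN and the TST. The helper below builds the three kinds of split; its name, the year labels 1 to 3, and the line identifiers are hypothetical, fitting the GS model itself is not shown, and moving a random half of the third year is our assumption about how the 50% was chosen.

```python
import random


def year_split(lines, train_years, test_year, move_fraction=0.0, seed=0):
    """Build TRN/TST line sets for an independent-validation scheme.

    lines: list of (line_id, year) pairs.
    move_fraction: share of the test year's lines moved into TRN (0.5 in the
    scheme that adds half of the third year's data); the rest stay in TST.
    """
    train = [lid for lid, year in lines if year in train_years]
    test = [lid for lid, year in lines if year == test_year]
    if move_fraction:
        random.Random(seed).shuffle(test)  # random choice of lines to move
        n_move = round(move_fraction * len(test))
        train, test = train + test[:n_move], test[n_move:]
    return train, test


if __name__ == "__main__":
    # Hypothetical data: 30 DH lines, 10 per year, years labelled 1, 2, 3.
    lines = [(f"DH{i:02d}", 1 + i % 3) for i in range(30)]
    for train_years, move in (({1}, 0.0), ({1, 2}, 0.0), ({1, 2}, 0.5)):
        trn, tst = year_split(lines, train_years, test_year=3, move_fraction=move)
        print(sorted(train_years), move, len(trn), len(tst))
```

Each resulting pair of line sets would then be passed to the same GS model, so differences in accuracy reflect only the composition of the TRN.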
