53. LOCAL ANCESTRY ALLOWS FOR IMPROVED GENOMIC PREDICTION IN UNDERREPRESENTED AND ADMIXED POPULATIONS

Abstract: Recent advances in genomic selection (GS) have demonstrated the importance of not only the accuracy of genomic prediction but also the intelligence of selection strategies. The look ahead selection algorithm, for example, has been found to significantly outperform the widely used truncation selection approach in terms of genetic gain, thanks to its strategy of selecting breeding parents that are not necessarily elite themselves but have the best chance of producing elite progeny in the future. This paper presents the look ahead trace back algorithm, a new variant of the look ahead approach that introduces several improvements to further accelerate genetic gain, especially under imperfect genomic prediction. Perhaps an even more significant contribution of this paper is the design of opaque simulators for evaluating the performance of GS algorithms. These simulators are partially observable, explicitly capture both additive and non-additive genetic effects, and simulate uncertain recombination events more realistically. In contrast, most existing GS simulation settings are transparent, either explicitly or implicitly allowing the GS algorithm to exploit critical information that may not be available in actual breeding programs. Comprehensive computational experiments were carried out using a maize data set to compare a variety of GS algorithms under four simulators with different levels of opacity. The results reveal how differently the same GS algorithm can interact with different simulators, suggesting the need for continued research into the design of more realistic simulators. As long as GS algorithms continue to be trained in silico rather than in planta, the best way to avoid a disappointing discrepancy between their simulated and actual performance may be to make the simulator as akin as possible to nature, with all its complexity and opacity.
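The contrast between truncation selection and progeny-oriented selection can be illustrated with a toy example. This is not the look ahead or look ahead trace back algorithm; it is a minimal sketch assuming four unlinked loci, a hypothetical additive effect of 1.0 per favorable allele, and a simplified criterion (the best progeny a cross could produce if each parent can always transmit a favorable allele it carries). All names and data (`POPULATION`, `best_progeny_value`, and so on) are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: count (0, 1 or 2) of the favorable allele at four
# unlinked loci, with an assumed additive effect of 1.0 per favorable allele.
EFFECTS = [1.0, 1.0, 1.0, 1.0]
POPULATION = {
    "A": [2, 2, 1, 0],
    "B": [2, 2, 0, 1],
    "C": [0, 0, 2, 2],
    "D": [1, 1, 1, 1],
}


def genetic_value(genotype):
    """Additive genetic value: favorable-allele count times effect, summed over loci."""
    return sum(g * e for g, e in zip(genotype, EFFECTS))


def truncation_select(pop, k):
    """Truncation selection: keep the k individuals with the highest value (ties by name)."""
    ranked = sorted(pop, key=lambda name: (-genetic_value(pop[name]), name))
    return ranked[:k]


def best_progeny_value(g1, g2):
    """Value of the best progeny a cross g1 x g2 could produce, assuming each parent
    can transmit a favorable allele at every locus where it carries one
    (free recombination, no linkage: a deliberate simplification)."""
    return sum(((x > 0) + (y > 0)) * e for x, y, e in zip(g1, g2, EFFECTS))


def progeny_oriented_pair(pop):
    """Choose the parent pair whose best possible progeny is most valuable."""
    return max(combinations(sorted(pop), 2),
               key=lambda pair: best_progeny_value(pop[pair[0]], pop[pair[1]]))


if __name__ == "__main__":
    elite = truncation_select(POPULATION, 2)
    print(elite, best_progeny_value(*(POPULATION[n] for n in elite)))  # ['A', 'B'] 6.0
    pair = progeny_oriented_pair(POPULATION)
    print(pair, best_progeny_value(*(POPULATION[n] for n in pair)))  # ('A', 'D') 7.0
```

Truncation selection keeps the two top-ranked individuals, A and B, which carry the same favorable alleles at the first two loci, so their best progeny is capped at 6.0. The progeny-oriented criterion pairs A with D, one of the lowest-ranked individuals, because D contributes favorable alleles that A lacks. A realistic look-ahead strategy would also have to model linkage, recombination uncertainty and several future generations.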

Genomic prediction using training population design in interspecific soybean populations

Molecular Breeding, 2021, Vol 41 (2). DOI: 10.1007/s11032-021-01203-6

Author(s): Eduardo Beche, Jason D. Gillman, Qijian Song, Randall Nelson, Tim Beissinger, ...

Keyword(s): Genomic Prediction; Training Population; Population Design

On the use of whole-genome sequence data for across-breed genomic prediction and fine-scale mapping of QTL

Genetics Selection Evolution, 2021, Vol 53 (1). DOI: 10.1186/s12711-021-00607-4

Author(s): Theo Meuwissen, Irene van den Berg, Mike Goddard

Keyword(s): Variable Selection; Genome Sequence; Genomic Prediction; Milk Fat; Genotype Imputation; Whole Genome Sequence; Genomic Relationship Matrix; Polygenic Effect; Relationship Matrix; Whole Genome

Abstract

Background: Whole-genome sequence (WGS) data are increasingly available on large numbers of individuals in animal and plant breeding and in human genetics through second-generation resequencing technologies, 1000 genomes projects, and large-scale genotype imputation from lower marker densities. Here, we present a computationally fast implementation of a variable selection genomic prediction method that can handle WGS data on more than 35,000 individuals, test its accuracy for across-breed predictions, and assess its quantitative trait locus (QTL) mapping precision.

Methods: The Markov chain Monte Carlo (MCMC) variable selection model (Bayes GC) simultaneously fits a genomic best linear unbiased prediction (GBLUP) term, i.e. a polygenic effect whose correlations are described by a genomic relationship matrix (G), and a Bayes C term, i.e. a set of single nucleotide polymorphisms (SNPs) with large effects selected by the model. Computational speed is improved by Metropolis–Hastings sampling that directs computations to the SNPs that are, a priori, most likely to be included in the model. Speed is also improved by running many relatively short MCMC chains. Memory requirements are reduced by storing the genotype matrix in binary form. The model was tested on a WGS dataset containing Holstein, Jersey and Australian Red cattle. The data contained 4,809,520 genotypes on 35,549 individuals together with their milk, fat and protein yields, and fat and protein percentage traits.

Results: The prediction accuracies of the Jersey individuals improved by 1.5% when using across-breed GBLUP compared to within-breed predictions. Using WGS instead of 600 k SNP-chip data yielded on average a 3% accuracy improvement for Australian Red cows. QTL were fine-mapped by locating the SNP with the highest posterior probability of being included in the model. Various QTL known from the literature were rediscovered, and a new SNP affecting milk production was discovered on chromosome 20 at 34.501126 Mb. Owing to the high mapping precision, it was clear that many of the discovered QTL were the same across the five dairy traits.

Conclusions: Across-breed Bayes GC genomic prediction improved prediction accuracies compared to GBLUP. The combination of across-breed WGS data and Bayesian genomic prediction proved remarkably effective for the fine-mapping of QTL.
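Fine-mapping by "the SNP with the highest posterior probability of being included in the model" can be sketched as follows. This is a minimal illustration, assuming the sampler stores one 0/1 inclusion indicator per SNP per retained sample, that burn-in has already been discarded, and that samples from many short chains can simply be pooled. The function names, chains and SNP labels are hypothetical, not the authors' software.

```python
# Hypothetical MCMC output: each chain is a list of post-burn-in samples, and each
# sample holds one 0/1 indicator per SNP (1 = SNP included in the Bayes C term).
def posterior_inclusion_probabilities(chains):
    """Pool samples from several (short) chains and average the indicators per SNP."""
    samples = [sample for chain in chains for sample in chain]
    if not samples:
        raise ValueError("no MCMC samples supplied")
    n_snps = len(samples[0])
    return [sum(s[j] for s in samples) / len(samples) for j in range(n_snps)]


def top_snp(pips, snp_ids):
    """Return the SNP with the highest posterior inclusion probability (PIP)."""
    best = max(range(len(pips)), key=lambda j: pips[j])
    return snp_ids[best], pips[best]


if __name__ == "__main__":
    chain_1 = [[0, 1, 0], [0, 1, 1], [1, 1, 0]]
    chain_2 = [[0, 1, 0], [0, 0, 1]]
    pips = posterior_inclusion_probabilities([chain_1, chain_2])
    print(pips)                                        # [0.2, 0.8, 0.4]
    print(top_snp(pips, ["snp_a", "snp_b", "snp_c"]))  # ('snp_b', 0.8)
```

With dense WGS data, neighboring SNPs in high linkage disequilibrium tend to share inclusion probability, so in practice one may also want to sum PIPs over a small region before reporting a QTL.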

Optimal breeding-value prediction using a Sparse Selection Index

Genetics, 2021. DOI: 10.1093/genetics/iyab030

Author(s): Marco Lopez-Cruz, Gustavo de los Campos

Keyword(s): Sample Size; DNA Sequences; Genomic Prediction; Prediction Accuracy; Regularization Parameter; Selection Index; Prediction Method; Training Data; Breeding Value; Data Set

Abstract: Genomic prediction uses DNA sequences and phenotypes to predict genetic values. In homogeneous populations, theory indicates that the accuracy of genomic prediction increases with sample size. However, differences in allele frequencies and in linkage disequilibrium patterns can lead to heterogeneity in SNP effects. In this context, calibrating genomic predictions using a large, potentially heterogeneous training data set may not lead to optimal prediction accuracy. Some studies have tried to address this sample size/homogeneity trade-off using training set optimization algorithms; however, this approach assumes that a single training data set is optimal for all individuals in the prediction set. Here, we propose an approach that identifies, for each individual in the prediction set, a subset of the training data (i.e., a set of support points) from which its prediction is derived. The proposed methodology is a Sparse Selection Index (SSI) that integrates Selection Index methodology with sparsity-inducing techniques commonly used for high-dimensional regression. The sparsity of the resulting index is controlled by a regularization parameter (λ); G-BLUP (the prediction method most commonly used in plant and animal breeding) arises as the special case λ = 0. In this study, we present the methodology and demonstrate, using two wheat data sets with phenotypes collected in ten different environments, that the SSI can achieve significant gains in prediction accuracy (between 5% and 10%) relative to G-BLUP.
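The idea of a per-individual sparse index can be sketched with an L1-penalized selection index solved by coordinate descent. This is a minimal sketch, assuming the index weights for prediction individual i minimize ½ bᵀ(G_TT + δI)b − g_iTᵀb + λ Σ|b_j|, where G_TT is the training genomic relationship matrix, g_iT holds the relationships of i to the training set and δ is the residual-to-genetic variance ratio. With λ = 0 the weights reduce to (G_TT + δI)⁻¹g_iT, i.e. the G-BLUP weights; larger λ sets some weights exactly to zero, so only a subset of training individuals (the support points) contributes. Function names and numbers are hypothetical, and the published SSI may differ in penalty scaling and solver details.

```python
def soft_threshold(z, lam):
    """Lasso shrinkage operator: sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def ssi_weights(g_tt, g_it, delta, lam, tol=1e-12, max_iter=10_000):
    """Coordinate descent for
        min_b 0.5 * b' (G_TT + delta*I) b - g_iT' b + lam * sum_j |b_j|."""
    n = len(g_it)
    p = [[g_tt[j][k] + (delta if j == k else 0.0) for k in range(n)] for j in range(n)]
    b = [0.0] * n
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(n):
            # Partial residual: remove every coordinate's contribution except j's own
            r = g_it[j] - sum(p[j][k] * b[k] for k in range(n) if k != j)
            new_bj = soft_threshold(r, lam) / p[j][j]
            max_change = max(max_change, abs(new_bj - b[j]))
            b[j] = new_bj
        if max_change < tol:
            break
    return b


def ssi_predict(weights, y_centered):
    """Predicted genetic value of individual i: weighted sum of centered training phenotypes."""
    return sum(w * y for w, y in zip(weights, y_centered))


if __name__ == "__main__":
    g_tt = [[1.0, 0.2], [0.2, 1.0]]  # hypothetical training relationships
    g_it = [0.5, 0.1]                # hypothetical relationships of i to training set
    print(ssi_weights(g_tt, g_it, delta=1.0, lam=0.0))   # equals the G-BLUP weights
    print(ssi_weights(g_tt, g_it, delta=1.0, lam=0.15))  # second weight is exactly 0.0
```

In the second call, training individual 2 drops out of the index, so the prediction for individual i is derived from training individual 1 alone. Tuning λ per data set (for example by cross-validation) trades this sparsity against the sample-size benefit of G-BLUP.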

MegaLMM: Mega-scale linear mixed models for genomic predictions with thousands of traits

Genome Biology, 2021, Vol 22 (1). DOI: 10.1186/s13059-021-02416-w

Author(s): Daniel E. Runcie, Jiayi Qu, Hao Cheng, Lorin Crawford

Keyword(s): Genomic Prediction; Large Scale; Mixed Model; Human Genetics; Linear Mixed Effect Model; Mixed Effect; Statistical Framework; Effect Model; Plant Data; Genetic Value

Abstract: Large-scale phenotype data can enhance the power of genomic prediction in plant and animal breeding, as well as in human genetics. However, the statistical foundation of multi-trait genomic prediction is the multivariate linear mixed effect model, a tool notorious for its fragility when applied to more than a handful of traits. We present MegaLMM, a statistical framework and associated software package for mixed model analyses of a virtually unlimited number of traits. Using three examples with real plant data, we show that MegaLMM can leverage thousands of traits at once to significantly improve genetic value prediction accuracy.
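One reason the classical multivariate linear mixed model becomes fragile with many traits is the number of covariance parameters it must estimate. The sketch below simply counts them, comparing an unstructured trait covariance matrix with a low-rank factor structure ΛΛᵀ + Ψ (loadings Λ plus a diagonal Ψ). It assumes a factor-model decomposition of the kind MegaLMM builds on; the factor count of 10 is a hypothetical choice, and the code does not reproduce MegaLMM itself.

```python
def unstructured_cov_params(n_traits):
    """Free parameters in one unrestricted n_traits x n_traits covariance matrix."""
    return n_traits * (n_traits + 1) // 2


def factor_cov_params(n_traits, n_factors):
    """Free parameters in a factor-model covariance Lambda Lambda' + Psi, with
    Lambda (n_traits x n_factors) loadings and a diagonal Psi. Rotational
    non-identifiability of Lambda is ignored, so this is an upper bound."""
    return n_traits * n_factors + n_traits


if __name__ == "__main__":
    for t in (10, 100, 1000):
        print(t, unstructured_cov_params(t), factor_cov_params(t, 10))
    # 10 55 110
    # 100 5050 1100
    # 1000 500500 11000
```

A multivariate mixed model needs at least two such matrices (genetic and residual), so with 1000 traits an unstructured model has roughly a million covariance parameters, whereas a factor structure grows only linearly in the number of traits. For a handful of traits the unstructured form is actually cheaper, which is why it works well in that regime.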

Genomic Prediction Using Alternative Strategies of Weighted Single-Step Genomic BLUP for Yearling Weight and Carcass Traits in Hanwoo Beef Cattle

Genes, 2021, Vol 12 (2), pp. 266. DOI: 10.3390/genes12020266

Author(s): Hossein Mehrban, Masoumeh Naserkheil, Deuk Hwan Lee, Chungil Cho, Taejeong Choi, ...

Keyword(s): Quantitative Trait Loci; Beef Cattle; Genomic Prediction; Quantitative Trait; Carcass Traits; Best Linear Unbiased Prediction; Single Step; Linear Unbiased Prediction; Single Nucleotide; Best Linear Unbiased

The weighted single-step genomic best linear unbiased prediction (WssGBLUP) method has been proposed to exploit information from genotyped and non-genotyped relatives while allowing weights for single-nucleotide polymorphisms (SNPs) to be used in constructing the genomic relationship matrix. The purpose of this study was to investigate the accuracy of genetic prediction in Hanwoo beef cattle using the following single-trait best linear unbiased prediction methods: pedigree-based (PBLUP), un-weighted single-step genomic (ssGBLUP), and weighted single-step genomic (WssGBLUP) methods. We also assessed the impact of alternative single-SNP and window weighting methods according to their effects on the traits of interest. The data comprised 15,796 phenotypic records for yearling weight (YW) and 5622 records for carcass traits (backfat thickness: BFT, carcass weight: CW, eye muscle area: EMA, and marbling score: MS). The genotype data covered 43,950 SNPs for 6616 animals for YW and 5134 animals for carcass traits. ssGBLUP showed a significant improvement in genomic prediction accuracy for carcass traits (71%) and yearling weight (99%) compared to the pedigree-based method. The window weighting procedures performed better than single-SNP weighting for CW (11%), EMA (11%), MS (3%), and YW (6%), whereas no gain in accuracy was observed for BFT. In addition, the accuracy gain of window WssGBLUP over the un-weighted method was small for BFT and MS, but reached 22%, 15%, and 20% for CW, EMA, and YW, respectively, which indicates the presence of relevant quantitative trait loci for these traits. These findings indicate that WssGBLUP is an appropriate method for traits influenced by quantitative trait loci with large effects.
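Single-SNP and window weighting can be sketched as follows. This is a minimal illustration, assuming one commonly used weight per SNP, 2p(1 − p)â² (allele frequency p, estimated SNP effect â), a window weight equal to the summed single-SNP weights over a window of adjacent SNPs, and a weighted relationship matrix G = ZDZᵀ / Σ2p(1 − p) with centered genotypes Z and D = diag(weights). The window size, the rescaling step and all names are hypothetical; the study's exact settings and the iterative re-estimation of SNP effects are not reproduced here.

```python
def snp_weights(effects, freqs):
    """Single-SNP weights: 2 p (1 - p) a^2 for allele frequency p and estimated effect a."""
    return [2 * p * (1 - p) * a * a for a, p in zip(effects, freqs)]


def window_weights(weights, window):
    """Each SNP receives the summed weight of the `window` SNPs centered on it,
    truncated at the ends of the chromosome."""
    half = window // 2
    n = len(weights)
    return [sum(weights[max(0, j - half):min(n, j + half + 1)]) for j in range(n)]


def rescale(weights):
    """Rescale weights so that they sum to the number of SNPs."""
    total = sum(weights)
    return [w * len(weights) / total for w in weights]


def weighted_grm(genotypes, freqs, weights):
    """Weighted genomic relationship matrix G = Z D Z' / sum_j 2 p_j (1 - p_j),
    where Z holds 0/1/2 genotypes centered by 2 p_j and D = diag(weights)."""
    z = [[m - 2 * p for m, p in zip(row, freqs)] for row in genotypes]
    scale = sum(2 * p * (1 - p) for p in freqs)
    n = len(z)
    return [[sum(z[a][j] * weights[j] * z[b][j] for j in range(len(freqs))) / scale
             for b in range(n)] for a in range(n)]


if __name__ == "__main__":
    single = snp_weights([0.0, 2.0, 0.0, 0.0, 1.0], [0.5] * 5)  # hypothetical effects
    print(window_weights(single, 3))  # [2.0, 2.0, 2.0, 0.5, 0.5]
    print(weighted_grm([[0, 1], [2, 1]], [0.5, 0.5], [1.0, 1.0]))  # [[1.0, -1.0], [-1.0, 1.0]]
```

The window step spreads a large effect over neighboring SNPs, which is useful when a QTL is tagged by several markers in linkage disequilibrium rather than by a single one. With all weights equal to 1, the weighted matrix reduces to the un-weighted genomic relationship matrix.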

Genomic prediction with non-additive effects in beef cattle: stability of variance component and genetic effect estimates against population size

BMC Genomics, 2021, Vol 22 (1). DOI: 10.1186/s12864-021-07792-y

Author(s): Akio Onogi, Toshio Watanabe, Atsushi Ogino, Kazuhito Kurogi, Kenji Togashi

Keyword(s): Population Size; Genomic Prediction; Variance Components; Variance Component; Predictive Accuracy; Genetic Effect; Genetic Effects; Additive Effects; Additive Variance; Additive Genetic Effects

Abstract

Background: Genomic prediction is now an essential technology for genetic improvement in animal and plant breeding. Whereas emphasis has been placed on predicting breeding values, the prediction of non-additive genetic effects has also been of interest. In this study, we assessed the potential of genomic prediction using non-additive effects for phenotypic prediction in Japanese Black, a beef cattle breed. In addition, we examined the stability of variance component and genetic effect estimates against population size by subsampling with different sample sizes.

Results: Records of six carcass traits, namely carcass weight, rib eye area, rib thickness, subcutaneous fat thickness, yield rate and beef marbling score, for 9850 animals were used for the analyses. The non-additive genetic effects considered were dominance, additive-by-additive, additive-by-dominance and dominance-by-dominance effects. The covariance structures of these genetic effects were defined using genome-wide SNPs. Using single-trait animal models with different combinations of genetic effects, we found that the additive-by-additive variance accounted for 12.6–19.5% of the phenotypic variance, whereas little dominance variance was observed. In cross-validation, adding the additive-by-additive effects had little influence on predictive accuracy and bias. Subsampling analyses showed that estimation of the additive-by-additive effects was highly variable when phenotypes were not available. On the other hand, the estimates of the additive-by-additive variance components were less affected by reduction of the population size.

Conclusions: The six carcass traits of Japanese Black cattle showed moderate or relatively high levels of additive-by-additive variance components, although incorporating the additive-by-additive effects did not improve predictive accuracy. Subsampling analysis suggested that estimation of the additive-by-additive effects was highly reliant on the phenotypic values of the animals to be estimated, as supported by the low off-diagonal values of the relationship matrix. On the other hand, estimates of the additive-by-additive variance components were relatively stable against reduction of the population size compared with the estimates of the corresponding genetic effects.
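A common way to build epistatic covariance structures from genome-wide SNPs is to take element-wise (Hadamard) products of the additive matrix G and the dominance matrix D: G∘G for additive-by-additive, G∘D for additive-by-dominance and D∘D for dominance-by-dominance. The sketch below assumes that construction and a rescaling by the mean diagonal; the abstract does not state exactly how the study defined these matrices, and the example matrices are hypothetical. The construction also helps illustrate why the additive-by-additive relationship matrix has low off-diagonal values: squaring relationships below 1 shrinks them.

```python
def hadamard(a, b):
    """Element-wise (Hadamard) product of two equally sized square matrices."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def normalize_by_mean_diagonal(m):
    """Divide every entry by the mean diagonal value (one common scaling choice)."""
    mean_diag = sum(m[i][i] for i in range(len(m))) / len(m)
    return [[x / mean_diag for x in row] for row in m]


def epistatic_matrices(g, d):
    """Additive-by-additive, additive-by-dominance and dominance-by-dominance
    relationship matrices from additive (g) and dominance (d) genomic matrices."""
    return {
        "AxA": normalize_by_mean_diagonal(hadamard(g, g)),
        "AxD": normalize_by_mean_diagonal(hadamard(g, d)),
        "DxD": normalize_by_mean_diagonal(hadamard(d, d)),
    }


if __name__ == "__main__":
    g = [[1.0, 0.1], [0.1, 1.0]]    # hypothetical additive relationships
    d = [[1.0, 0.05], [0.05, 1.0]]  # hypothetical dominance relationships
    for name, matrix in epistatic_matrices(g, d).items():
        print(name, matrix)  # AxA off-diagonal is about 0.01, ten times below G's 0.1
```

Because a relative sharing 0.1 in G shares only about 0.01 in G∘G, an animal's additive-by-additive effect borrows little information from relatives, so its estimate depends mostly on the animal's own phenotype, while the variance component, estimated from the whole population, can remain comparatively stable.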

Modeling first order additive × additive epistasis improves accuracy of genomic prediction for sclerotinia stem rot resistance in canola

The Plant Genome, 2021. DOI: 10.1002/tpg2.20088

Author(s): Mark C Derbyshire, Yuphin Khentry, Anita Severn‐Ellis, Virginia Mwape, Nur Shuhadah Mohd Saad, ...

Keyword(s): Genomic Prediction; Stem Rot; Sclerotinia Stem Rot; First Order

GPS Coordinates for Modelling Correlated Herd Effects in Genomic Prediction Models Applied to Hanwoo Beef Cattle

Animals, 2021, Vol 11 (7), pp. 2050. DOI: 10.3390/ani11072050

Author(s): Beatriz Castro Dias Cuyabano, Gabriel Rovere, Dajeong Lim, Tae Hun Kim, Hak Kyo Lee, ...

Keyword(s): Environmental Factors; Genomic Prediction; Prediction Models; Phenotypic Expression; Genetic Evaluation; Genomic Breeding; Breeding Values; Korean Cattle; Evaluation Programs; The Impact

It is widely known that the environment influences phenotypic expression and that its effects must be accounted for in genetic evaluation programs. The most commonly used approach is to include herd and contemporary group effects in the model. Although generally informative, the herd effect treats different farms as independent units. However, if two farms are located physically close to each other, they potentially share correlated environmental factors. We introduce a method for modelling herd effects that uses the physical distances between farms, computed from their Global Positioning System (GPS) coordinates, as a proxy for the correlation matrix of these effects, with the aim of accounting for similarities and differences between farms due to environmental factors. A population of Hanwoo Korean cattle was used to evaluate the impact of modelling herd effects as correlated, in comparison to treating the farms as completely independent units, on the variance components and genomic prediction. The main result was an increase in the reliabilities of the predicted genomic breeding values compared to reliabilities obtained with traditional models (across the four traits evaluated, reliabilities of prediction increased by between 0.05 ± 0.01 and 0.33 ± 0.03), suggesting that these models may overestimate heritabilities. Although little or no significant gain was obtained in phenotypic prediction, the increased reliability of the predicted genomic breeding values is of practical relevance for genetic evaluation programs.
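Turning GPS coordinates into a correlation matrix for herd effects can be sketched in two steps: compute great-circle (haversine) distances between farms, then map each distance to a correlation that decays with distance. The exponential decay exp(−d / range_km) used here is an assumption, as is the hypothetical `range_km` parameter; the abstract does not say which distance-to-correlation function the authors used. Farm coordinates in the example are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two GPS points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def herd_correlation(coords, range_km):
    """Correlation between herd effects decaying exponentially with distance:
    r_ij = exp(-d_ij / range_km), where range_km is a hypothetical tuning parameter."""
    return [[math.exp(-haversine_km(*ci, *cj) / range_km) for cj in coords]
            for ci in coords]


if __name__ == "__main__":
    farms = [(36.0, 127.0), (36.0, 127.1), (37.5, 129.0)]  # hypothetical farm locations
    for row in herd_correlation(farms, range_km=50.0):
        print([round(r, 3) for r in row])
```

The resulting matrix R would replace the identity matrix in the herd-effect covariance, i.e. herd effects ~ N(0, Rσ²_herd) instead of N(0, Iσ²_herd), so that nearby farms are expected to have similar environmental effects while distant farms remain nearly independent.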
