Improving Genomic Prediction of Crossbred and Purebred Dairy Cattle

This study assessed the accuracy and bias of genomic prediction (GP) in purebred Holstein (H) and Jersey (J) as well as crossbred (H and J) validation cows using different reference sets and prediction strategies. The reference sets were made up of different combinations of 36,695 H and J purebreds and crossbreds. Additionally, the effect of using different sets of marker genotypes on GP was studied (conventional panel: 50k, custom panel enriched with, or close to, causal mutations: XT_50k, and conventional high-density with a limited custom set: pruned HDnGBS). We also compared the use of genomic best linear unbiased prediction (GBLUP) and Bayesian (emBayesR) models, and the traits tested were milk, fat, and protein yields. On average, by including crossbred cows in the reference population, the prediction accuracies increased by 0.01–0.08 and were less biased (regression coefficient closer to 1 by 0.02–0.16), and the benefit was greater for crossbreds than for purebreds. The accuracy of prediction increased by 0.02 using XT_50k compared to 50k genotypes without affecting the bias. Although using pruned HDnGBS instead of 50k also increased the prediction accuracy by about 0.02, it increased the bias for purebred predictions in emBayesR models. Generally, emBayesR outperformed GBLUP for prediction accuracy when using 50k or pruned HDnGBS genotypes, but the benefits diminished with XT_50k genotypes. Crossbred predictions derived from a joint pure H and J reference were similar in accuracy to crossbred predictions derived from the two separate purebred reference sets and combined proportional to breed composition. However, the latter approach was less biased by 0.13. Most interestingly, using an equalized breed reference instead of an H-dominated reference, on average, reduced the bias of prediction by 0.16–0.19 and increased the accuracy by 0.04 for crossbred and J cows, with little change in H accuracy.
In conclusion, we observed improved genomic predictions for both crossbreds and purebreds by equalizing breed contributions in a mixed breed reference that included crossbred cows. Furthermore, we demonstrate that, compared to the conventional 50k or high-density panels, our customized set of 50k sequence markers improved or matched the prediction accuracy and reduced bias with both GBLUP and Bayesian models.
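
The evaluation statistics used in this abstract can be illustrated with a minimal sketch. It assumes that accuracy is the Pearson correlation between predicted and observed values and that bias is read from the slope of the regression of observed on predicted values (a slope of 1 meaning no inflation or deflation); the abstract does not state the exact definitions. The function names and the per-cow Holstein fraction used to weight the two purebred predictions are illustrative assumptions, not the authors' code.

```python
import numpy as np


def accuracy_and_slope(observed, predicted):
    """Return (Pearson correlation, slope of observed regressed on predicted)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    accuracy = np.corrcoef(observed, predicted)[0, 1]
    # Slope = cov(observed, predicted) / var(predicted); 1.0 means unbiased scale.
    slope = np.cov(observed, predicted)[0, 1] / np.var(predicted, ddof=1)
    return accuracy, slope


def combine_by_breed_composition(pred_holstein, pred_jersey, holstein_fraction):
    """Weight two purebred-derived predictions by each cow's breed composition.

    holstein_fraction is a hypothetical per-cow value in [0, 1]; the Jersey
    weight is taken as its complement.
    """
    pred_holstein = np.asarray(pred_holstein, dtype=float)
    pred_jersey = np.asarray(pred_jersey, dtype=float)
    w = np.asarray(holstein_fraction, dtype=float)
    return w * pred_holstein + (1.0 - w) * pred_jersey
```

A slope below 1 under this convention would indicate that the predictions are over-dispersed relative to the observations, and a slope above 1 that they are under-dispersed.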

A Study of Genomic Prediction across Generations of Two Korean Pig Populations

Animals ◽

10.3390/ani9090672 ◽

2019 ◽

Vol 9 (9) ◽

pp. 672 ◽

Cited By ~ 1

Author(s):

Beatriz Castro Dias Cuyabano ◽

Hanna Wackel ◽

Donghyun Shin ◽

Cedric Gondro

Keyword(s):

Genomic Prediction ◽

Prediction Accuracy ◽

Reference Population ◽

Relevant Information ◽

Production Traits ◽

Genomic Evaluation ◽

Genomic Breeding ◽

Breeding Values ◽

The Relationship ◽

Dense Marker

Genomic models that incorporate dense marker information have been widely used for predicting genomic breeding values since they were first introduced, and it is known that the relationship between individuals in the reference population and selection candidates affects the prediction accuracy. When genomic evaluation is performed over generations of the same population, prediction accuracy is expected to decay if the reference population is not updated. Therefore, the reference population must be updated in each generation, but little is known about the optimal way to do it. This study presents an empirical assessment of the prediction accuracy of genomic breeding values of production traits across five generations in two Korean pig breeds. We verified the decay in prediction accuracy over time when the reference population was not updated. Additionally, we compared the prediction accuracy obtained using only the previous generation as the reference population with that obtained using all previous generations. Overall, the results suggested that, although there is a clear need to continuously update the reference population, it may not be necessary to keep all ancestral genotypes. Finally, understanding how the accuracy of genomic prediction evolves over generations within a population adds relevant information to improve the performance of genomic selection.
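
The comparison of reference-population strategies described in this abstract can be sketched as a forward validation. The sketch below is an illustration under stated assumptions, not the study's pipeline: it swaps in a plain ridge regression on marker genotypes as the predictor, uses the correlation between predictions and phenotypes as accuracy, and all names (`ridge_marker_effects`, `forward_validation`, the `ridge` penalty value) are hypothetical.

```python
import numpy as np


def ridge_marker_effects(genotypes, phenotypes, ridge):
    """Fit ridge regression of phenotypes on column-centred marker genotypes."""
    col_means = genotypes.mean(axis=0)
    X = genotypes - col_means
    y_mean = phenotypes.mean()
    lhs = X.T @ X + ridge * np.eye(X.shape[1])
    effects = np.linalg.solve(lhs, X.T @ (phenotypes - y_mean))
    return col_means, y_mean, effects


def forward_validation(genotypes, phenotypes, generation, target,
                       reference_generations, ridge=10.0):
    """Train on the chosen reference generations, predict the target generation.

    Returns the correlation between predicted and observed phenotypes
    in the target generation.
    """
    ref = np.isin(generation, reference_generations)
    val = generation == target
    col_means, y_mean, effects = ridge_marker_effects(
        genotypes[ref], phenotypes[ref], ridge)
    predicted = y_mean + (genotypes[val] - col_means) @ effects
    return np.corrcoef(predicted, phenotypes[val])[0, 1]


# Toy data only: random genotypes, no pedigree or real generation structure.
rng = np.random.default_rng(0)
toy_genotypes = rng.integers(0, 3, size=(500, 50)).astype(float)
toy_phenotypes = toy_genotypes @ rng.normal(size=50) + rng.normal(size=500)
toy_generation = np.repeat(np.arange(1, 6), 100)

acc_previous_only = forward_validation(
    toy_genotypes, toy_phenotypes, toy_generation, 5, [4])
acc_all_previous = forward_validation(
    toy_genotypes, toy_phenotypes, toy_generation, 5, [1, 2, 3, 4])
print(acc_previous_only, acc_all_previous)
```

Because the toy genotypes are independent draws, this example cannot reproduce the decay in accuracy across generations reported in the abstract; it only shows how the two reference sets would be compared.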

Accounting for Group-Specific Allele Effects and Admixture in Genomic Predictions: Theory and Experimental Evaluation in Maize

Genetics ◽

10.1534/genetics.120.303278 ◽

2020 ◽

Vol 216 (1) ◽

pp. 27-41

Author(s):

Simon Rio ◽

Laurence Moreau ◽

Alain Charcosset ◽

Tristan Mary-Huard

Keyword(s):

Genomic Prediction ◽

Prediction Accuracy ◽

Prediction Models ◽

Best Linear Unbiased Prediction ◽

Linear Unbiased Prediction ◽

Modeling Group ◽

A Genome ◽

Specific Allele ◽

Best Linear Unbiased ◽

Unbiased Prediction

Populations structured into genetic groups may display group-specific linkage disequilibrium, mutations, and/or interactions between quantitative trait loci and the genetic background. These factors lead to heterogeneous marker effects affecting the efficiency of genomic prediction, especially for admixed individuals. Such individuals have a genome that is a mosaic of chromosome blocks from different origins, and may be of interest to combine favorable group-specific characteristics. We developed two genomic prediction models adapted to the prediction of admixed individuals in the presence of heterogeneous marker effects: multigroup admixed genomic best linear unbiased prediction random individual (MAGBLUP-RI), modeling the ancestry of alleles; and multigroup admixed genomic best linear unbiased prediction random allele effect (MAGBLUP-RAE), modeling group-specific distributions of allele effects. MAGBLUP-RI can estimate the segregation variance generated by admixture, while MAGBLUP-RAE can disentangle the variability that is due to main allele effects from the variability that is due to group-specific deviation allele effects. Both models were evaluated for their genomic prediction accuracy using a maize panel including lines from the Dent and Flint groups, along with admixed individuals. Based on simulated traits, both models proved effective at improving genomic prediction accuracy compared to standard GBLUP models. For real traits, a clear gain was observed at low marker densities, whereas it became limited at high marker densities. The value of including admixed individuals in multigroup training sets was confirmed using simulated traits, but was variable using real traits. Both MAGBLUP models and admixed individuals are of interest whenever group-specific SNP allele effects exist.

Genetic architecture and genomic prediction accuracy of apple quantitative traits across environments

10.1101/2021.11.29.470309 ◽

2021 ◽

Author(s):

Michaela Jung ◽

Beat Keller ◽

Morgane Roth ◽

Maria Jose Aranzana ◽

Annemarie Auwerkerken ◽

...

Keyword(s):

Genomic Prediction ◽

Prediction Accuracy ◽

Genetic Architecture ◽

Quantitative Traits ◽

Prediction Models ◽

Phenotypic Variability ◽

Reference Population ◽

Genomic Study ◽

Genomic Tools ◽

Breeding Efficiency

Implementation of genomic tools is desirable to increase the efficiency of apple breeding. The apple reference population (apple REFPOP) proved useful for rediscovering loci, estimating genomic prediction accuracy, and studying genotype by environment interactions (GxE). Here we show contrasting genetic architecture and genomic prediction accuracies for 30 quantitative traits across up to six European locations using the apple REFPOP. A total of 59 stable and 277 location-specific associations were found using GWAS, 69.2% of which are novel when compared with 41 reviewed publications. Average genomic prediction accuracies of 0.18-0.88 were estimated using single-environment univariate, single-environment multivariate, multi-environment univariate, and multi-environment multivariate models. The GxE accounted for up to 24% of the phenotypic variability. This most comprehensive genomic study in apple in terms of trait-environment combinations provided knowledge of trait biology and prediction models that can be readily applied for marker-assisted or genomic selection, thus facilitating increased breeding efficiency.

Optimizing Low-Cost Genotyping and Imputation Strategies for Genomic Selection in Atlantic Salmon

G3 Genes|Genome|Genetics ◽

10.1534/g3.119.400800 ◽

2019 ◽

Vol 10 (2) ◽

pp. 581-590 ◽

Cited By ~ 4

Author(s):

Smaragda Tsairidou ◽

Alastair Hamilton ◽

Diego Robledo ◽

James E. Bron ◽

Ross D. Houston

Keyword(s):

Atlantic Salmon ◽

Genomic Selection ◽

Environmental Sustainability ◽

Genomic Prediction ◽

Prediction Accuracy ◽

Imputation Accuracy ◽

Cost Effective ◽

High Density ◽

Genotype Imputation ◽

Breeding Programs

Genomic selection enables cumulative genetic gains in key production traits such as disease resistance, playing an important role in the economic and environmental sustainability of aquaculture production. However, it requires genome-wide genetic marker data on large populations, which can be prohibitively expensive. Genotype imputation is a cost-effective method for obtaining high-density genotypes, but its value in aquaculture breeding programs, which are characterized by large full-sibling families, has yet to be fully assessed. The aim of this study was to optimize the use of low-density genotypes and evaluate genotype imputation strategies for cost-effective genomic prediction. Phenotypes and genotypes (78,362 SNPs) were obtained for 610 individuals from a Scottish Atlantic salmon breeding program population (Landcatch, UK) challenged with sea lice, Lepeophtheirus salmonis. Genomic prediction accuracy was calculated using GBLUP approaches and compared across SNP panels of varying densities and composition, with and without imputation. Imputation was tested when parents were genotyped for the optimal SNP panel, and offspring were genotyped for a range of lower density imputation panels. Reducing SNP density had little impact on prediction accuracy until 5,000 SNPs, below which the accuracy dropped. Imputation accuracy increased with increasing imputation panel density. Genomic prediction accuracy when offspring were genotyped for just 200 SNPs, and parents for 5,000 SNPs, was 0.53. This accuracy was similar to the full high density and optimal density dataset, and markedly higher than using 200 SNPs without imputation. These results suggest that imputation from very low to medium density can be a cost-effective tool for genomic selection in Atlantic salmon breeding programs.

The superiority of multi-trait models with genotype-by-environment interactions in a limited number of environments for genomic prediction in pigs

Journal of Animal Science and Biotechnology ◽

10.1186/s40104-020-00493-8 ◽

2020 ◽

Vol 11 (1) ◽

Author(s):

Hailiang Song ◽

Qin Zhang ◽

Xiangdong Ding

Keyword(s):

Genomic Prediction ◽

Production Systems ◽

Genetic Correlations ◽

Reproductive Traits ◽

Reference Population ◽

Linear Unbiased Prediction ◽

Genotype By Environment ◽

Best Linear Unbiased ◽

Effect Estimation ◽

Two Populations

Background: Different production systems and climates could lead to genotype-by-environment (G × E) interactions between populations, and the inclusion of G × E interactions is becoming essential in breeding decisions. The objective of this study was to investigate the performance of multi-trait models in genomic prediction in a limited number of environments with G × E interactions. Results: In total, 2,688 and 1,384 individuals with growth and reproduction phenotypes, respectively, from two Yorkshire pig populations with similar genetic backgrounds were genotyped with the PorcineSNP80 panel. Single- and multi-trait models with genomic best linear unbiased prediction (GBLUP) and BayesCπ were implemented to investigate their genomic prediction abilities with 20 replicates of five-fold cross-validation. Our results regarding between-environment genetic correlations of growth and reproductive traits (ranging from 0.618 to 0.723) indicated the existence of G × E interactions between these two Yorkshire pig populations. For single-trait models, genomic prediction with GBLUP was only 1.1% more accurate on average in the combined population than in single populations, and no significant improvements were obtained by BayesCπ for most traits. In addition, single-trait models with either GBLUP or BayesCπ produced greater bias for the combined population than for single populations. However, multi-trait models with GBLUP and BayesCπ better accommodated G × E interactions, yielding 2.2%–3.8% and 1.0%–2.5% higher prediction accuracies for growth and reproductive traits, respectively, compared to those for single-trait models of single populations and the combined population. The multi-trait models also yielded lower bias and larger gains in the case of a small reference population. The smaller improvement in prediction accuracy and larger bias obtained by the single-trait models in the combined population was mainly due to the low consistency of linkage disequilibrium between the two populations, which also caused the BayesCπ method to always produce the largest standard error in marker effect estimation for the combined population. Conclusions: Our findings confirmed that directly combining populations to enlarge the reference population is not efficient in improving the accuracy of genomic prediction in the presence of G × E interactions, while multi-trait models perform better in a limited number of environments with G × E interactions.
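
The single-trait GBLUP evaluation with replicated five-fold cross-validation mentioned in this abstract can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes a genomic relationship matrix built with VanRaden's first method, a heritability supplied by the user rather than estimated, and the training-set mean as the only fixed effect. It does not cover the multi-trait or BayesCπ models. All function names are hypothetical.

```python
import numpy as np


def vanraden_g(genotypes):
    """Genomic relationship matrix from 0/1/2 genotypes (VanRaden method 1)."""
    p = genotypes.mean(axis=0) / 2.0          # allele frequencies
    Z = genotypes - 2.0 * p                   # centre by expected genotype
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))


def gblup_predict(G, y, train, test, h2):
    """Predict test individuals from training phenotypes with GBLUP."""
    lam = (1.0 - h2) / h2                     # residual / genetic variance ratio
    mu = y[train].mean()                      # simple mean as the fixed effect
    G_tt = G[np.ix_(train, train)]
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train)), y[train] - mu)
    return mu + G[np.ix_(test, train)] @ alpha


def replicated_kfold_accuracy(G, y, h2, k=5, replicates=20, seed=0):
    """Mean correlation between predictions and phenotypes over all folds."""
    rng = np.random.default_rng(seed)
    accuracies = []
    for _ in range(replicates):
        folds = np.array_split(rng.permutation(len(y)), k)
        for i, test in enumerate(folds):
            train = np.concatenate([f for j, f in enumerate(folds) if j != i])
            predicted = gblup_predict(G, y, train, test, h2)
            accuracies.append(np.corrcoef(predicted, y[test])[0, 1])
    return float(np.mean(accuracies))
```

With `k=5` and `replicates=20` the function averages over 100 validation folds, matching the cross-validation layout named in the abstract.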

A classic approach for determining genomic prediction accuracy under terminal drought stress and well-watered conditions in wheat landraces and cultivars

PLoS ONE ◽

10.1371/journal.pone.0247824 ◽

2021 ◽

Vol 16 (3) ◽

pp. e0247824

Author(s):

Morteza Shabannejad ◽

Mohammad-Reza Bihamta ◽

Eslam Majidi-Hervan ◽

Hadi Alipour ◽

Asa Ebrahimi

Keyword(s):

Drought Stress ◽

Genomic Selection ◽

Bread Wheat ◽

Genomic Prediction ◽

Prediction Accuracy ◽

Terminal Drought ◽

Genome Wide ◽

Best Linear Unbiased ◽

Terminal Drought Stress ◽

Trait Associations

The present study aimed to improve the accuracy of genomic prediction of 16 agronomic traits in a diverse bread wheat (Triticum aestivum L.) germplasm under terminal drought stress and well-watered conditions in semi-arid environments. An association panel including 87 bread wheat cultivars and 199 landraces from Iran bread wheat germplasm was planted under two irrigation systems in semi-arid climate zones. The whole association panel was genotyped with 9047 single nucleotide polymorphism markers using the genotyping-by-sequencing method. A total of 23 marker-trait associations were selected for traits under each condition, whereas 17 marker-trait associations were common between terminal drought stress and well-watered conditions. The identified marker-trait associations were mostly single nucleotide polymorphisms with minor allele effects. This study examined the effect of population structure, genomic selection method (ridge regression-best linear unbiased prediction, genomic best-linear unbiased predictions, and Bayesian ridge regression), training set size, and type of marker set on genomic prediction accuracy. The prediction accuracies were low (-0.32) to moderate (0.52). A marker set including 93 significant markers identified through genome-wide association studies with P values ≤ 0.001 increased the genomic prediction accuracy for all traits under both conditions. This study concluded that obtaining the highest genomic prediction accuracy depends on the extent of linkage disequilibrium, the genetic architecture of the trait, the genetic diversity of the population, and the genomic selection method. The results encouraged the integration of genome-wide association study and genomic selection to enhance genomic prediction accuracy in applied breeding programs.

Functionally prioritised whole-genome sequence variants improve the accuracy of genomic prediction for heat tolerance

10.21203/rs.3.rs-598177/v1 ◽

2021 ◽

Author(s):

Evans K. Cheruiyot ◽

Mekonnen Haile-Mariam ◽

Benjamin G. Cocks ◽

Iona M. MacLeod ◽

Raphael Mrode ◽

...

Keyword(s):

Heat Tolerance ◽

Genomic Prediction ◽

Prediction Accuracy ◽

Well Being ◽

Sequence Variants ◽

Crossbred Cows ◽

Reference Set ◽

Qtl Discovery ◽

Snp Panel ◽

Jersey Cows

Background: Heat tolerance is a trait of economic importance in the context of warm climates and the effects of global warming on livestock, production, reproduction, health, and well-being. It is desirable to improve the prediction accuracy for heat tolerance to help accelerate the genetic gain for this trait. This study investigated the improvement in prediction accuracy for heat tolerance when selected sets of sequence variants from a large genome-wide association study (GWAS) were incorporated into a standard 50k SNP panel used by the industry. Methods: Over 40,000 dairy cattle (Holsteins, Jersey, and crossbreds) with genotype and phenotype data were analysed. The phenotypes used to measure an individual’s heat tolerance were defined as the rate of milk production decline (slope traits for the yield of milk, fat, and protein) with a rising temperature-humidity index. We used Holstein and Jersey cows to select sequence variants linked to heat tolerance based on GWAS. We then investigated the accuracy of prediction when sets of these pre-selected sequence variants were added to the 50k industry SNP array used routinely for genomic evaluations in Australia. We used a bull reference set to develop the genomic prediction equations and then validated them in an independent set of Holsteins, Jersey, and crossbred cows. The genomic prediction analyses were performed using BayesR and BayesRC methods. Results: The accuracy of genomic prediction for heat tolerance improved by up to 7%, 5%, and 10% in Holsteins, Jersey, and crossbred cows, respectively, when sets of selected sequence markers from Holsteins (i.e., single-breed QTL discovery set) were added to the 50k industry SNP panel. Using pre-selected sequence variants identified based on a combined set of Holstein and Jersey cows in a multi-breed QTL discovery, a set of 6,132 to 6,422 SNPs generally improved accuracy, especially in the Jersey validation set. Combining Holstein and Jersey bulls (multi-breed) in the reference set improved prediction accuracy compared to using only Holstein bulls in the reference set. Conclusion: Informative sequence markers can be prioritised to improve the genetic prediction of heat tolerance in different breeds, and these variants, in addition to providing biological insight, have direct application in the development of customized SNP arrays or can be utilised via imputation into current SNP sets.
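
The idea of a slope trait, that is, the rate at which a cow's yield declines as the temperature-humidity index (THI) rises, can be illustrated with a deliberately simple sketch. It assumes an ordinary least-squares line fitted per cow to paired THI and yield records; the study's actual phenotype definition is likely derived from a more elaborate model, and the function name is hypothetical.

```python
import numpy as np


def heat_tolerance_slope(thi, milk_yield):
    """Least-squares slope of yield on THI for one cow (yield units per THI unit)."""
    thi = np.asarray(thi, dtype=float)
    milk_yield = np.asarray(milk_yield, dtype=float)
    # np.polyfit returns coefficients from highest degree down: [slope, intercept].
    slope, _intercept = np.polyfit(thi, milk_yield, deg=1)
    return slope
```

Under this convention a slope closer to zero would mean a smaller decline in yield per unit of THI, which corresponds to a more heat-tolerant cow.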

Genomic Prediction of Two Complex Orthopedic Traits Across Multiple Pure and Mixed Breed Dogs

Frontiers in Genetics ◽

10.3389/fgene.2021.666740 ◽

2021 ◽

Vol 12 ◽

Author(s):

Liping Jiang ◽

Zhuo Li ◽

Jessica J. Hayward ◽

Kei Hayashi ◽

Ursula Krotscheck ◽

...

Keyword(s):

Genomic Prediction ◽

Prediction Accuracy ◽

Cruciate Ligament ◽

Pearson Correlation ◽

Snp Array ◽

Reference Population ◽

Genotype Data ◽

Dna Array ◽

Cranial Cruciate Ligament ◽

Validation Population

Canine hip dysplasia (CHD) and rupture of the cranial cruciate ligament (RCCL) are two complex inherited orthopedic traits of dogs. These two traits may occur concurrently in the same dog. Genomic prediction of these two diseases would benefit veterinary medicine, the dog’s owner, and dog breeders because of their high prevalence, and because both traits result in painful debilitating osteoarthritis in affected joints. In this study, 842 unique dogs from 6 breeds with hip and stifle phenotypes were genotyped on a customized Illumina high density 183 k single nucleotide polymorphism (SNP) array and also analyzed using an imputed dataset of 20,487,155 SNPs. To implement genomic prediction, two different statistical methods were employed: Genomic Best Linear Unbiased Prediction (GBLUP) and a Bayesian method called BayesC. The cross-validation results showed that the two methods gave similar prediction accuracy (r = 0.3–0.4) for CHD (measured as Norberg angle) and RCCL in the multi-breed population. For CHD, the average correlation of the AUC was 0.71 (BayesC) and 0.70 (GBLUP), which is a medium level of prediction accuracy and consistent with Pearson correlation results. For RCCL, the correlation of the AUC was slightly higher. The prediction accuracy of GBLUP from the imputed genotype data was similar to the accuracy from DNA array data. We demonstrated that the genomic prediction of CHD and RCCL with DNA array genotype data is feasible in a multiple breed population if there is a genetic connection, such as breed, between the reference population and the validation population. Although these traits have a heritability of about one-third, higher accuracy is needed for implementation in a natural population, and predicting a complex phenotype will require a much larger number of dogs within a breed and across breeds.
It is possible that with higher accuracy, genomic prediction of these orthopedic traits could be implemented in a clinical setting for early diagnosis and treatment, and the selection of dogs for breeding. These results need continuous improvement in model prediction through ongoing genotyping and data sharing. When genomic prediction indicates that a dog is susceptible to one of these orthopedic traits, it should be accompanied by clinical and radiographic screening at an acceptable age with appropriate follow-up.

Haplotype Analysis of Genomic Prediction Using Structural and Functional Genomic Information for Seven Human Phenotypes

Frontiers in Genetics ◽

10.3389/fgene.2020.588907 ◽

2020 ◽

Vol 11 ◽

Author(s):

Zuoxiang Liang ◽

Cheng Tan ◽

Dzianis Prakapenka ◽

Li Ma ◽

Yang Da

Keyword(s):

Body Mass Index ◽

Total Cholesterol ◽

Genomic Prediction ◽

Prediction Accuracy ◽

Haplotype Analysis ◽

Low Density Lipoproteins ◽

High Density ◽

High Density Lipoproteins ◽

Low Density ◽

Genomic Information

Genomic prediction using multi-allelic haplotype models improved the prediction accuracy for all seven human phenotypes, the normality transformed high density lipoproteins, low density lipoproteins, total cholesterol, triglycerides, weight, and the original height and body mass index without normality transformation. Eight SNP sets with 40,941-380,705 SNPs were evaluated. The increase in prediction accuracy due to haplotypes was 1.86-8.12%. Haplotypes using fixed chromosome distances had the best prediction accuracy for four phenotypes, fixed number of SNPs for two phenotypes, and gene-based haplotypes for high density lipoproteins and height (tied for best). Haplotypes of coding genes were more accurate than haplotypes of all autosome genes that included both coding and noncoding genes for triglycerides and weight, and nearly the same as haplotypes of all autosome genes for the other phenotypes. Haplotypes of noncoding genes (mostly lncRNAs) only improved the prediction accuracy over the SNP models for high density lipoproteins, total cholesterol, and height. ChIP-seq haplotypes had better prediction accuracy than gene-based haplotypes for total cholesterol, body mass index and low density lipoproteins. The accuracy of ChIP-seq haplotypes was most striking for low density lipoproteins, where all four haplotype models with ChIP-seq haplotypes had similarly high prediction accuracy over the best prediction model with gene-based haplotypes. Haplotype epistasis was shown to be the reason for the increased accuracy due to haplotypes. Low density lipoproteins had the largest haplotype epistasis heritability that explained 14.70% of the phenotypic variance and was 31.27% of the SNP additive heritability, and the largest increase in prediction accuracy relative to the best SNP model (8.12%). 
Relative to the SNP additive heritability of the same regions, noncoding genes had the highest haplotype epistasis heritability, followed by coding genes and ChIP-seq for the seven phenotypes. SNP and haplotype heritability profiles showed that the integration of SNP and haplotype additive values compensated for the weakness of haplotypes in estimating SNP heritabilities for four phenotypes, whereas models with haplotype additive values fully accounted for SNP additive values for three phenotypes. These results showed that haplotype analysis can be a method to utilize functional and structural genomic information to improve the accuracy of genomic prediction.

Accuracies of direct genomic breeding values for birth and weaning weights of registered Charolais cattle in Mexico

Animal Production Science ◽

10.1071/an18363 ◽

2020 ◽

Vol 60 (6) ◽

pp. 772

Author(s):

Francisco J. Jahuey-Martínez ◽

Gaspar M. Parra-Bracamonte ◽

Dorian J. Garrick ◽

Nicolás López-Villalobos ◽

Juan C. Martínez-González ◽

...

Keyword(s):

Genomic Prediction ◽

Reference Population ◽

Single Step ◽

Bayesian Regression ◽

Nucleotide Polymorphisms ◽

Linear Unbiased Prediction ◽

Genomic Breeding ◽

Breeding Values ◽

Charolais Cattle ◽

Best Linear Unbiased

Context: Genomic prediction is now routinely used in many livestock species to rank individuals based on genomic breeding values (GEBV). Aims: This study reports the first assessment aimed to evaluate the accuracy of direct GEBV for birth (BW) and weaning (WW) weights of registered Charolais cattle in Mexico. Methods: The population assessed included 823 animals genotyped with an array of 77,000 single nucleotide polymorphisms. Genomic prediction used genomic best linear unbiased prediction (GBLUP), Bayes C (BC), and single-step Bayesian regression (SSBR) methods in comparison with a pedigree-based BLUP method. Key results: Our results show that the genomic prediction methods provided low and similar accuracies to BLUP. The prediction accuracy of GBLUP and BC were identical at 0.31 for BW and 0.29 for WW, similar to BLUP. Prediction accuracies of SSBR for BW and WW were up to 4% higher than those by BLUP. Conclusions: Genomic prediction is feasible under current conditions, and provides a slight improvement using SSBR. Implications: Some limitations on reference population size and structure were identified and need to be addressed to obtain more accurate predictions in liveweight traits under the prevalent cattle breeding conditions of Mexico.
