Evaluation of Bayesian Alphabet and GBLUP Based on Different Marker Density for Genomic Prediction in Alpine Merino Sheep

Author(s):  
Shaohua Zhu ◽  
Tingting Guo ◽  
Chao Yuan ◽  
Jianbin Liu ◽  
Jianye Li ◽  
...  

ABSTRACT Marker density, the heritability of the trait, and the statistical model adopted are critical to the accuracy of genomic prediction (GP) or genomic selection (GS). To fully exploit GP for optimizing breeding and selection, it is essential to examine these factors in real data as well as in simulated data, so that their impact on GP accuracy can be understood more clearly and intuitively. Herein, we studied genomic prediction of six wool traits in sheep using two classes of models: Bayesian Alphabet (BayesA, BayesB, BayesCπ, and Bayesian LASSO) and genomic best linear unbiased prediction (GBLUP). Accuracy was evaluated by 5-fold cross-validation on genotype data from Alpine Merino sheep (n = 821). The main aim was to study the influence and interaction of different models and marker densities on GP accuracy. Cross-validation showed that GP accuracy for the six traits ranged from 0.28 to 0.60. Increasing marker density improved GP accuracy, and the improvement was closely related to the model adopted and the heritability of the trait. Moreover, across the two marker densities, GBLUP predicted better for traits with low heritability, whereas the advantage of Bayesian Alphabet became more obvious as heritability increased; different GP models are therefore appropriate for different traits. These findings indicate the importance of choosing an appropriate model for GP and will assist further optimization of GP.
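To make the evaluation scheme concrete, the following is a minimal sketch of GBLUP accuracy estimated by 5-fold cross-validation. It is not the authors' pipeline: the VanRaden-style relationship matrix, the fixed heritability `h2` used to set the shrinkage ratio, and all function names are assumptions chosen for illustration. Accuracy is taken here as the Pearson correlation between predictions and phenotypes in each validation fold, averaged over folds.

```python
import numpy as np

def vanraden_g(M):
    """Genomic relationship matrix from an n x m genotype matrix coded 0/1/2.

    Assumption: allele frequencies are estimated from all animals in M.
    """
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0                  # allele frequency per marker
    Z = M - 2.0 * p                           # centre each marker column
    denom = 2.0 * np.sum(p * (1.0 - p))
    return Z @ Z.T / denom

def gblup_predict(G, y, train, test, h2):
    """Predict phenotypes of `test` animals from `train` animals.

    Assumption: h2 is known, so lambda = (1 - h2) / h2 is fixed rather than
    estimated (e.g. by REML) inside each fold.
    """
    lam = (1.0 - h2) / h2
    mu = y[train].mean()
    G_tt = G[np.ix_(train, train)]
    G_vt = G[np.ix_(test, train)]
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train)), y[train] - mu)
    return mu + G_vt @ alpha

def cv_accuracy(M, y, h2, k=5, seed=0):
    """Mean validation-fold correlation over a k-fold split."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    folds = np.array_split(rng.permutation(len(y)), k)
    G = vanraden_g(M)
    accs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        pred = gblup_predict(G, y, train, test, h2)
        accs.append(np.corrcoef(pred, y[test])[0, 1])
    return float(np.mean(accs))
```

Under these assumptions, comparing marker densities would amount to calling `cv_accuracy` on genotype matrices built from different SNP panels for the same animals and phenotypes.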

2021 ◽  
Author(s):  
Shaohua Zhu ◽  
Tingting Guo ◽  
Chao Yuan ◽  
Jianbin Liu ◽  
Jianye Li ◽  
...  

Abstract Background Marker density, the heritability of the trait, and the statistical model adopted are critical to the accuracy of genomic prediction (GP) or genomic selection (GS). Studies of the impact of these factors on GP accuracy have usually focused on comparing simulated datasets. To fully exploit GS for optimizing breeding and selection, it is essential to examine these factors in real data, so that their impact on GP accuracy can be understood more clearly and intuitively. Herein, we studied genomic prediction of six wool traits in sheep using two classes of models: genomic best linear unbiased prediction (GBLUP) and Bayes-Alphabet. Accuracy was evaluated by 5-fold cross-validation on genotype data from Alpine Merino sheep (n = 821). Results Cross-validation showed that GP accuracy for the six traits ranged from 0.28 to 0.60. Increasing marker density improved GP accuracy, and the improvement was closely related to the model adopted and the heritability of the trait. Moreover, across the two marker densities, GBLUP predicted better for traits with low heritability (the accuracy of GBLUP was up to 28.57% higher than that of Bayes-Alphabet), whereas the advantage of Bayes-Alphabet became more obvious as heritability increased; different GP models are therefore appropriate for different traits. Conclusion This is the first study in which optimization of GP has been applied to domesticated Alpine Merino sheep populations. The main aim was to study the influence and interaction of different models and marker densities on GP accuracy. These findings indicate the importance of choosing an appropriate model for GP and will assist further optimization of GP.


2021 ◽  
Author(s):  
Miguel Angel Raffo ◽  
Pernille Sarup ◽  
Xiangyu Guo ◽  
Huiming Liu ◽  
Jeppe Reitan Andersen ◽  
...  

Abstract Epistasis is the principal non-additive genetic effect in inbred wheat lines and can be used to develop cultivars based on total genetic merit. Correct models for variance components (VCs) estimation are needed to disentangle the genetic architecture of complex traits in wheat. We aimed to i) evaluate the performance of extended genomic best linear unbiased prediction (EG-BLUP) and the natural and orthogonal interactions approach (NOIA) for VCs estimation in a commercial wheat-breeding population, and ii) investigate whether including epistasis in genomic prediction enhance predictive ability (PA) for wheat breeding lines. In total, 2,060 sixth-generation (F6) lines from Nordic Seed A/S breeding company were phenotyped for grain yield over 21-year-x-location combinations in Denmark, and genotyped using 15K Illumina-BeadChip. Four models were used to estimate VCs and heritability at plot level: i) Baseline, ii) Genomic best linear unbiased prediction (G-BLUP), iii) EG-BLUP, and iv) NOIA. Narrow- and broad-sense heritabilities estimated with G-BLUP were 0.15 and 0.31, respectively. EG-BLUP and NOIA failed to achieve orthogonal partition of genetic variances. Even though NOIA removed Hardy-Weinberg equilibrium assumption, both models yielded very similar estimates, indicating that linkage disequilibrium causes the lack of orthogonality. The PA was studied using leave-one-line-out and leave-one-breeding-cycle-out cross-validations. Both EG-BLUP and NOIA increased PA significantly (16.5%) compared to G-BLUP in leave-one-line-out cross-validation. However, the improvement for including epistasis was not observed in the leave-one-breeding-cycle-out cross-validation. We conclude that although the variance partition into orthogonal genetic effects was not possible, epistatic models can be useful to enhance predictions of total genetic merit.


Genes ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 266
Author(s):  
Hossein Mehrban ◽  
Masoumeh Naserkheil ◽  
Deuk Hwan Lee ◽  
Chungil Cho ◽  
Taejeong Choi ◽  
...  

The weighted single-step genomic best linear unbiased prediction (GBLUP) method has been proposed to exploit information from genotyped and non-genotyped relatives, allowing the use of weights for single-nucleotide polymorphism in the construction of the genomic relationship matrix. The purpose of this study was to investigate the accuracy of genetic prediction using the following single-trait best linear unbiased prediction methods in Hanwoo beef cattle: pedigree-based (PBLUP), un-weighted (ssGBLUP), and weighted (WssGBLUP) single-step genomic methods. We also assessed the impact of alternative single and window weighting methods according to their effects on the traits of interest. The data was comprised of 15,796 phenotypic records for yearling weight (YW) and 5622 records for carcass traits (backfat thickness: BFT, carcass weight: CW, eye muscle area: EMA, and marbling score: MS). Also, the genotypic data included 6616 animals for YW and 5134 for carcass traits on the 43,950 single-nucleotide polymorphisms. The ssGBLUP showed significant improvement in genomic prediction accuracy for carcass traits (71%) and yearling weight (99%) compared to the pedigree-based method. The window weighting procedures performed better than single SNP weighting for CW (11%), EMA (11%), MS (3%), and YW (6%), whereas no gain in accuracy was observed for BFT. Besides, the improvement in accuracy between window WssGBLUP and the un-weighted method was low for BFT and MS, while for CW, EMA, and YW resulted in a gain of 22%, 15%, and 20%, respectively, which indicates the presence of relevant quantitative trait loci for these traits. These findings indicate that WssGBLUP is an appropriate method for traits with a large quantitative trait loci effect.
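As a rough illustration of how SNP weights can enter the genomic relationship matrix, the sketch below builds G = Z D Z' / (2 Σ p(1 − p)) with a diagonal weight matrix D. This is a simplified sketch, not the procedure used in the study: the rescaling of weights to an average of 1, the non-overlapping window averaging, and the function names `weighted_g` and `window_weights` are all assumptions for illustration. The single-step blending with pedigree relationships is not shown.

```python
import numpy as np

def weighted_g(M, weights=None):
    """Weighted genomic relationship matrix from genotypes coded 0/1/2.

    weights: one non-negative value per SNP (hypothetical input, e.g. derived
    from estimated SNP effects). With weights=None every SNP gets weight 1,
    which gives the un-weighted matrix.
    """
    M = np.asarray(M, dtype=float)
    m = M.shape[1]
    p = M.mean(axis=0) / 2.0
    Z = M - 2.0 * p
    if weights is None:
        d = np.ones(m)
    else:
        d = np.asarray(weights, dtype=float)
        d = d * m / d.sum()            # assumption: rescale to mean weight 1
    denom = 2.0 * np.sum(p * (1.0 - p))
    return (Z * d) @ Z.T / denom       # Z D Z' with D = diag(d)

def window_weights(single_weights, size):
    """Give every SNP the mean weight of its non-overlapping window.

    Assumption: windows are consecutive blocks of `size` SNPs in map order.
    """
    w = np.asarray(single_weights, dtype=float)
    out = np.empty_like(w)
    for start in range(0, len(w), size):
        out[start:start + size] = w[start:start + size].mean()
    return out
```

Averaging within windows smooths noisy single-SNP weights across neighbouring markers, which is one plausible reason a window scheme could behave differently from single-SNP weighting.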


2020 ◽  
Author(s):  
Fanny Mollandin ◽  
Andrea Rau ◽  
Pascal Croiseau

ABSTRACT Technological advances and decreasing costs have led to the rise of increasingly dense genotyping data, making feasible the identification of potential causal markers. Custom genotyping chips, which combine medium-density genotypes with a custom genotype panel, can capitalize on these candidates to potentially yield improved accuracy and interpretability in genomic prediction. A particularly promising model to this end is BayesR, which divides markers into four effect size classes. BayesR has been shown to yield accurate predictions and promise for quantitative trait loci (QTL) mapping in real data applications, but an extensive benchmarking in simulated data is currently lacking. Based on a set of real genotypes, we generated simulated data under a variety of genetic architectures and phenotype heritabilities, and we evaluated the impact of excluding or including causal markers among the genotypes. We define several statistical criteria for QTL mapping, including several based on sliding windows to account for linkage disequilibrium. We compare and contrast these statistics and their ability to accurately prioritize known causal markers. Overall, we confirm the strong predictive performance for BayesR in moderately to highly heritable traits, particularly for 50k custom data. In cases of low heritability or weak linkage disequilibrium with the causal marker in 50k genotypes, QTL mapping is a challenge, regardless of the criterion used. BayesR is a promising approach to simultaneously obtain accurate predictions and interpretable classifications of SNPs into effect size classes. We illustrated the performance of BayesR in a variety of simulation scenarios, and compared the advantages and limitations of each.


2021 ◽  
Vol 12 ◽  
Author(s):  
Zigui Wang ◽  
Hao Cheng

Genomic prediction has been widely used in multiple areas and various genomic prediction methods have been developed. The majority of these methods, however, focus on statistical properties and ignore the abundant useful biological information like genome annotation or previously discovered causal variants. Therefore, to improve prediction performance, several methods have been developed to incorporate biological information into genomic prediction, mostly in single-trait analysis. A commonly used method to incorporate biological information is allocating molecular markers into different classes based on the biological information and assigning separate priors to molecular markers in different classes. It has been shown that such methods can achieve higher prediction accuracy than conventional methods in some circumstances. However, these methods mainly focus on single-trait analysis, and available priors of these methods are limited. Thus, in both single-trait and multiple-trait analysis, we propose the multi-class Bayesian Alphabet methods, in which multiple Bayesian Alphabet priors, including RR-BLUP, BayesA, BayesB, BayesCΠ, and Bayesian LASSO, can be used for markers allocated to different classes. The superior performance of the multi-class Bayesian Alphabet in genomic prediction is demonstrated using both real and simulated data. The software tool JWAS offers open-source routines to perform these analyses.


Genetics ◽  
2020 ◽  
Vol 216 (1) ◽  
pp. 27-41
Author(s):  
Simon Rio ◽  
Laurence Moreau ◽  
Alain Charcosset ◽  
Tristan Mary-Huard

Populations structured into genetic groups may display group-specific linkage disequilibrium, mutations, and/or interactions between quantitative trait loci and the genetic background. These factors lead to heterogeneous marker effects affecting the efficiency of genomic prediction, especially for admixed individuals. Such individuals have a genome that is a mosaic of chromosome blocks from different origins, and may be of interest to combine favorable group-specific characteristics. We developed two genomic prediction models adapted to the prediction of admixed individuals in presence of heterogeneous marker effects: multigroup admixed genomic best linear unbiased prediction random individual (MAGBLUP-RI), modeling the ancestry of alleles; and multigroup admixed genomic best linear unbiased prediction random allele effect (MAGBLUP-RAE), modeling group-specific distributions of allele effects. MAGBLUP-RI can estimate the segregation variance generated by admixture while MAGBLUP-RAE can disentangle the variability that is due to main allele effects from the variability that is due to group-specific deviation allele effects. Both models were evaluated for their genomic prediction accuracy using a maize panel including lines from the Dent and Flint groups, along with admixed individuals. Based on simulated traits, both models proved their efficiency to improve genomic prediction accuracy compared to standard GBLUP models. For real traits, a clear gain was observed at low marker densities whereas it became limited at high marker densities. The interest of including admixed individuals in multigroup training sets was confirmed using simulated traits, but was variable using real traits. Both MAGBLUP models and admixed individuals are of interest whenever group-specific SNP allele effects exist.


2017 ◽  
Author(s):  
Haipeng Yu ◽  
Matthew L. Spangler ◽  
Ronald M. Lewis ◽  
Gota Morota

Abstract Genetic connectedness refers to a measure of genetic relatedness across management units (e.g., herds and flocks). With the presence of high genetic connectedness in management units, best linear unbiased prediction (BLUP) is known to provide reliable comparisons between genetic values. Genetic connectedness has been studied for pedigree-based BLUP; however, relatively little attention has been paid to using genomic information to measure connectedness. In this study, we assessed genome-based connectedness across management units by applying prediction error variance of difference (PEVD), coefficient of determination (CD), and prediction error correlation (r) to a combination of computer simulation and real data (mice and cattle). We found that genomic information (G) increased the estimate of connectedness among individuals from different management units compared to that based on pedigree (A). A disconnected design benefited the most. In both datasets, PEVD and CD statistics inferred increased connectedness across units when using G- rather than A-based relatedness, suggesting stronger connectedness. With r once using allele frequencies equal to one-half or scaling G to values between 0 and 2, which is intrinsic to A, connectedness also increased with genomic information. However, PEVD occasionally increased, and r decreased when obtained using the alternative form of G, instead suggesting less connectedness. Such inconsistencies were not found with CD. We contend that genomic relatedness strengthens measures of genetic connectedness across units and has the potential to aid genomic evaluation of livestock species.

The problem of connectedness or disconnectedness is particularly important in genetic evaluation of managed populations such as domesticated livestock.
When selecting among animals from different management units (e.g., herds and flocks), caution is needed; choosing one animal over others across management units may be associated with greater uncertainty than selection within management units. Such uncertainty is reduced if individuals from different management units are genetically linked or connected. In such a case, best linear unbiased prediction (BLUP) offers meaningful comparison of the breeding values across management units for genetic evaluation (e.g., Kuehn et al., 2007).

Structures of breeding programs have a direct influence on levels of connectedness. Wide use of artificial insemination (AI) programs generally increases genetic connectedness across management units. For example, dairy cattle populations are considered highly connected due to dissemination of genetic material from a small number of highly selected sires. The situation may be different for species with less use of AI and more use of natural service mating such as for beef cattle or sheep populations. Under these scenarios, the magnitude of connectedness across management units is reduced and genetic links are largely confined within management units.

Pedigree-based genetic connectedness has been evaluated and applied in practice (e.g., Kuehn et al., 2009; Eikje and Lewis, 2015). However, there is a relative paucity of use of genomic information such as single nucleotide polymorphisms (SNPs) to ascertain connectedness. It still remains elusive in what scenarios genomics can strengthen connectedness and how much gain can be expected relative to use of pedigree information alone. Connectedness statistics have been used to optimize selective genotyping and phenotyping in simulated livestock (Pszczola et al., 2012) and plant populations (Maenhout et al., 2010), and in real maize (Rincent et al., 2012; Isidro et al., 2015), and real rice data (Isidro et al., 2015).
These studies concluded that the greater the connectedness between the reference and validation populations, the greater the predictive performance. However, 1) connectedness among different management units and 2) differences in connectedness measures between pedigree and genomic relatedness were not explored in those studies. For better understanding of genome-based connectedness, it is critical to examine how the presence of management units comes into play. For instance, genomic relatedness provides relationships between distant individuals that appear disconnected according to the pedigree information. In addition, it captures Mendelian sampling that is not present in pedigree relationships (Hill and Weir, 2011). Thus, genomic information is expected to strengthen measures of connectedness, which in turn refines comparisons of genetic values across different management units. The objective of this study was to assess measures of genetic connectedness across management units with use of genomic information. We leveraged the combination of real data and computer simulation to compare gains in measures of connectedness when moving from pedigree to genomic relationships. First, we studied a heterogenous mice dataset stratified by cage. Then we investigated approaches to measure connectedness using real cattle data coupled with simulated management units to have greater control over the degree of confounding between fixed management groups and genetic relationships.
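The three connectedness statistics named in this abstract (PEVD, CD, and r) can be sketched from the prediction error variance matrix of a mixed model. The code below is a minimal illustration under stated assumptions, not the study's implementation: one record per individual, management unit as the only fixed effect, known variance components `var_u` and `var_e`, an invertible relationship matrix `K` (which could be either A or G), and a hypothetical function name `connectedness`.

```python
import numpy as np

def connectedness(K, unit, var_u, var_e):
    """Pairwise PEVD, CD and prediction error correlation r.

    K     : n x n relationship matrix (pedigree- or genome-based), invertible
    unit  : length-n array of management-unit labels (fixed effect)
    var_u : additive genetic variance (assumed known)
    var_e : residual variance (assumed known)
    """
    K = np.asarray(K, dtype=float)
    unit = np.asarray(unit)
    n = K.shape[0]
    levels = np.unique(unit)
    X = (unit[:, None] == levels[None, :]).astype(float)   # unit design matrix
    # Absorb the fixed effects: M = I - X (X'X)^- X'
    M = np.eye(n) - X @ np.linalg.pinv(X.T @ X) @ X.T
    lam = var_e / var_u
    # Prediction error (co)variances of the genetic values
    pev = var_e * np.linalg.inv(M + lam * np.linalg.inv(K))
    d = np.diag(pev)
    # PEVD_ij = PEV_ii + PEV_jj - 2 PEC_ij (smaller means more connected)
    pevd = d[:, None] + d[None, :] - 2.0 * pev
    # CD_ij = 1 - PEVD_ij / Var(u_i - u_j) (larger means more connected)
    k = np.diag(K)
    var_diff = var_u * (k[:, None] + k[None, :] - 2.0 * K)
    with np.errstate(divide="ignore", invalid="ignore"):
        cd = 1.0 - pevd / var_diff
    np.fill_diagonal(cd, np.nan)       # a contrast of i with itself is undefined
    # r_ij = PEC_ij / sqrt(PEV_ii PEV_jj) (larger means more connected)
    r = pev / np.sqrt(np.outer(d, d))
    return pevd, cd, r
```

Averaging each statistic over pairs of individuals in different management units, once with a pedigree-based `K` and once with a genome-based `K`, would give the kind of A-versus-G comparison the abstract describes.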


Animals ◽  
2020 ◽  
Vol 10 (4) ◽  
pp. 569
Author(s):  
Chen Wei ◽  
Hanpeng Luo ◽  
Bingru Zhao ◽  
Kechuan Tian ◽  
Xixia Huang ◽  
...  

Genomic evaluations are a method for improving the accuracy of breeding value estimation. This study aimed to compare estimates of genetic parameters and the accuracy of breeding values for wool traits in Merino sheep between pedigree-based best linear unbiased prediction (PBLUP) and single-step genomic best linear unbiased prediction (ssGBLUP) using Bayesian inference. Data were collected from 28,391 yearlings of Chinese Merino sheep (classified in 1992–2018) at the Xinjiang Gonaisi Fine Wool Sheep-Breeding Farm, China. Subjectively-assessed wool traits, namely, spinning count (SC), crimp definition (CRIM), oil (OIL), and body size (BS), and objectively-measured traits, namely, fleece length (FL), greasy fleece weight (GFW), mean fiber diameter (MFD), crimp number (CN), and body weight pre-shearing (BWPS), were analyzed. The estimates of heritability for wool traits were low to moderate. The largest h2 values were observed for FL (0.277) and MFD (0.290) with ssGBLUP. The heritabilities estimated for wool traits with ssGBLUP were slightly higher than those obtained with PBLUP. The accuracies of breeding values were low to moderate, ranging from 0.362 to 0.573 for the whole population and from 0.318 to 0.676 for the genotyped subpopulation. The correlation between the estimated breeding values (EBVs) and genomic EBVs (GEBVs) ranged from 0.717 to 0.862 for the whole population, and the relative increase in accuracy when comparing EBVs with GEBVs ranged from 0.372% to 7.486% for these traits. However, in the genotyped population, the rank correlation between the estimates obtained with PBLUP and ssGBLUP was reduced to 0.525 to 0.769, with increases in average accuracy of 3.016% to 11.736% for the GEBVs in relation to the EBVs. Thus, genomic information could allow us to more accurately estimate the relationships between animals and improve estimates of heritability and the accuracy of breeding values by ssGBLUP.


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Hailiang Song ◽  
Qin Zhang ◽  
Xiangdong Ding

Abstract Background Different production systems and climates could lead to genotype-by-environment (G × E) interactions between populations, and the inclusion of G × E interactions is becoming essential in breeding decisions. The objective of this study was to investigate the performance of multi-trait models in genomic prediction in a limited number of environments with G × E interactions. Results In total, 2,688 and 1,384 individuals with growth and reproduction phenotypes, respectively, from two Yorkshire pig populations with similar genetic backgrounds were genotyped with the PorcineSNP80 panel. Single- and multi-trait models with genomic best linear unbiased prediction (GBLUP) and BayesC π were implemented to investigate their genomic prediction abilities with 20 replicates of five-fold cross-validation. Our results regarding between-environment genetic correlations of growth and reproductive traits (ranging from 0.618 to 0.723) indicated the existence of G × E interactions between these two Yorkshire pig populations. For single-trait models, genomic prediction with GBLUP was only 1.1% more accurate on average in the combined population than in single populations, and no significant improvements were obtained by BayesC π for most traits. In addition, single-trait models with either GBLUP or BayesC π produced greater bias for the combined population than for single populations. However, multi-trait models with GBLUP and BayesC π better accommodated G × E interactions, yielding 2.2% – 3.8% and 1.0% – 2.5% higher prediction accuracies for growth and reproductive traits, respectively, compared to those for single-trait models of single populations and the combined population. The multi-trait models also yielded lower bias and larger gains in the case of a small reference population. 
The smaller improvement in prediction accuracy and larger bias obtained by the single-trait models in the combined population was mainly due to the low consistency of linkage disequilibrium between the two populations, which also caused the BayesC π method to always produce the largest standard error in marker effect estimation for the combined population. Conclusions In conclusion, our findings confirmed that directly combining populations to enlarge the reference population is not efficient in improving the accuracy of genomic prediction in the presence of G × E interactions, while multi-trait models perform better in a limited number of environments with G × E interactions.

