Evaluation of GBLUP and Bayes-Alphabet Based on Different Marker Density For Genomic Prediction in Alpine Merino Sheep

Author(s):  
Shaohua Zhu ◽  
Tingting Guo ◽  
Chao Yuan ◽  
Jianbin Liu ◽  
Jianye Li ◽  
...  

Abstract Background: Marker density, the heritability of the trait, and the statistical model adopted are critical to the accuracy of genomic prediction (GP) or genomic selection (GS). Studies of how these factors affect GP accuracy have usually relied on comparisons in simulated datasets. If the potential of GS is to be fully used to optimize breeding and selection, it is essential to examine these factors in real data in order to understand their impact on GP accuracy more clearly and intuitively. Herein, we studied genomic prediction of six wool traits in sheep using two classes of models: genomic best linear unbiased prediction (GBLUP) and Bayes-Alphabet. We used 5-fold cross-validation to evaluate accuracy based on genotyping data from Alpine Merino sheep (n=821). Results: Cross-validation showed that the GP accuracy of the six traits ranged from 0.28 to 0.60. The accuracy of GP could be improved by increasing the marker density, and the improvement was closely related to the model adopted and the heritability of the trait. Moreover, across the two marker densities, GBLUP predicted better for traits with low heritability (at its largest advantage, GBLUP accuracy was 28.57% higher than that of Bayes-Alphabet), whereas the advantage of Bayes-Alphabet became more apparent as heritability increased. Different GP models are therefore appropriate for different traits. Conclusion: This is the first study in which optimization of GP has been applied to domesticated Alpine Merino sheep populations. The main aim was to study the influence and interaction of different models and marker densities on GP accuracy. These findings indicate the importance of applying appropriate models for GP and will assist in further exploring the optimization of GP.

Author(s):  
Shaohua Zhu ◽  
Tingting Guo ◽  
Chao Yuan ◽  
Jianbin Liu ◽  
Jianye Li ◽  
...  

ABSTRACT Marker density, the heritability of the trait, and the statistical model adopted are critical to the accuracy of genomic prediction (GP) or genomic selection (GS). If the potential of GP is to be fully used to optimize breeding and selection, it is essential to examine these factors not only in simulated data but also in real data, in order to understand their impact on GP accuracy more clearly and intuitively. Herein, we studied genomic prediction of six wool traits in sheep using two classes of models: Bayesian Alphabet (BayesA, BayesB, BayesCπ and Bayesian LASSO) and genomic best linear unbiased prediction (GBLUP). We used 5-fold cross-validation to evaluate accuracy based on genotyping data from Alpine Merino sheep (n = 821). The main aim was to study the influence and interaction of different models and marker densities on GP accuracy. Cross-validation showed that the GP accuracy of the six traits ranged from 0.28 to 0.60. The accuracy of GP could be improved by increasing the marker density, and the improvement was closely related to the model adopted and the heritability of the trait. Moreover, across the two marker densities, GBLUP predicted better for traits with low heritability, whereas the advantage of Bayesian Alphabet became more apparent as heritability increased. Different GP models are therefore appropriate for different traits. These findings indicate the importance of applying appropriate models for GP and will assist in further exploring the optimization of GP.
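
The 5-fold cross-validation described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes accuracy is the Pearson correlation between predicted and observed phenotypes in each held-out fold (the abstract does not state the definition), and `fit_predict` is a hypothetical callback standing in for GBLUP or any Bayesian Alphabet model.

```python
import math
import random


def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cv_accuracy(phenotypes, fit_predict, k=5, seed=0):
    """Mean over folds of cor(predicted, observed) in the held-out fold.

    fit_predict(train_idx, test_idx) is a hypothetical model hook: it
    should train on train_idx and return predictions for test_idx.
    """
    folds = kfold_indices(len(phenotypes), k, seed)
    accuracies = []
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(phenotypes)) if i not in held_out]
        predicted = fit_predict(train, test)
        observed = [phenotypes[i] for i in test]
        accuracies.append(pearson(predicted, observed))
    return sum(accuracies) / k
```

With n = 821 and k = 5, this split yields one fold of 165 animals and four folds of 164, so each model is trained on roughly 80% of the animals and validated on the remaining 20%.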


2021 ◽  
Author(s):  
Miguel Angel Raffo ◽  
Pernille Sarup ◽  
Xiangyu Guo ◽  
Huiming Liu ◽  
Jeppe Reitan Andersen ◽  
...  

Abstract Epistasis is the principal non-additive genetic effect in inbred wheat lines and can be used to develop cultivars based on total genetic merit. Correct models for variance component (VC) estimation are needed to disentangle the genetic architecture of complex traits in wheat. We aimed to i) evaluate the performance of extended genomic best linear unbiased prediction (EG-BLUP) and the natural and orthogonal interactions approach (NOIA) for VC estimation in a commercial wheat-breeding population, and ii) investigate whether including epistasis in genomic prediction enhances predictive ability (PA) for wheat breeding lines. In total, 2,060 sixth-generation (F6) lines from the Nordic Seed A/S breeding company were phenotyped for grain yield over 21 year-by-location combinations in Denmark, and genotyped using a 15K Illumina BeadChip. Four models were used to estimate VCs and heritability at plot level: i) Baseline, ii) genomic best linear unbiased prediction (G-BLUP), iii) EG-BLUP, and iv) NOIA. Narrow- and broad-sense heritabilities estimated with G-BLUP were 0.15 and 0.31, respectively. EG-BLUP and NOIA failed to achieve an orthogonal partition of genetic variances. Even though NOIA removes the Hardy-Weinberg equilibrium assumption, both models yielded very similar estimates, indicating that linkage disequilibrium causes the lack of orthogonality. The PA was studied using leave-one-line-out and leave-one-breeding-cycle-out cross-validations. Both EG-BLUP and NOIA increased PA significantly (16.5%) compared to G-BLUP in leave-one-line-out cross-validation. However, no improvement from including epistasis was observed in the leave-one-breeding-cycle-out cross-validation. We conclude that although partitioning the variance into orthogonal genetic effects was not possible, epistatic models can be useful to enhance predictions of total genetic merit.


2021 ◽  
Vol 13 (1) ◽  
pp. 348
Author(s):  
Lukasz Skowron ◽  
Monika Sak-Skowron

The first research objective discussed in this article was to analyze how respondents with different levels of environmental sensitivity and awareness value particular factors influencing the purchase process in the smartphone industry, and to assess their knowledge about the impact of smartphones on the natural environment. The second objective was to determine whether the level of environmental sensitivity, awareness and knowledge about the impact of smartphones on the environment has a statistically significant influence on the respondents’ choice of smartphone brand. The survey was conducted using an on-line questionnaire, distributed by a specialized research agency to a representative sample of over 1000 Polish residents. To identify the various customer clusters, the expectation-maximization algorithm and v-fold cross-validation were used. Additionally, to analyze the significance of differences between clusters, the nonparametric Mann-Whitney U-test was carried out. The results show unequivocally that people with different approaches to ecological issues demonstrate statistically significant differences in their purchasing behaviors in the smartphone industry. Furthermore, when some smartphone brands are compared, there is a statistically confirmed difference in the environmental sensitivity and awareness of the customers who use them. Moreover, the research has shown that in Polish customers’ consciousness smartphones are mistakenly considered to be relatively safe and environmentally friendly products.


Genes ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 266
Author(s):  
Hossein Mehrban ◽  
Masoumeh Naserkheil ◽  
Deuk Hwan Lee ◽  
Chungil Cho ◽  
Taejeong Choi ◽  
...  

The weighted single-step genomic best linear unbiased prediction (GBLUP) method has been proposed to exploit information from genotyped and non-genotyped relatives, allowing the use of weights for single-nucleotide polymorphisms (SNPs) in the construction of the genomic relationship matrix. The purpose of this study was to investigate the accuracy of genetic prediction using the following single-trait best linear unbiased prediction methods in Hanwoo beef cattle: pedigree-based (PBLUP), un-weighted (ssGBLUP), and weighted (WssGBLUP) single-step genomic methods. We also assessed the impact of alternative single and window weighting methods according to their effects on the traits of interest. The data comprised 15,796 phenotypic records for yearling weight (YW) and 5622 records for carcass traits (backfat thickness: BFT, carcass weight: CW, eye muscle area: EMA, and marbling score: MS). The genotypic data included 6616 animals for YW and 5134 for carcass traits, genotyped on 43,950 SNPs. The ssGBLUP showed significant improvement in genomic prediction accuracy for carcass traits (71%) and yearling weight (99%) compared to the pedigree-based method. The window weighting procedures performed better than single SNP weighting for CW (11%), EMA (11%), MS (3%), and YW (6%), whereas no gain in accuracy was observed for BFT. In addition, the improvement in accuracy of window WssGBLUP over the un-weighted method was low for BFT and MS, whereas for CW, EMA, and YW it yielded gains of 22%, 15%, and 20%, respectively, which indicates the presence of relevant quantitative trait loci for these traits. These findings indicate that WssGBLUP is an appropriate method for traits with a large quantitative trait loci effect.
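
To make the idea of SNP weights in the genomic relationship matrix concrete, the sketch below builds a weighted matrix in the commonly used VanRaden-style form G = Z D Z' / (2 Σ p(1 − p)), where Z holds genotypes centered by twice the allele frequency and D is a diagonal matrix of SNP weights. This is an assumption about the general construction, not the exact procedure of this study: the abstract does not give its formula, weight normalization, or window definition, and the function name `weighted_grm` is hypothetical.

```python
def weighted_grm(genotypes, weights=None):
    """Weighted genomic relationship matrix for 0/1/2-coded genotypes.

    genotypes: list of rows (animals), each a list of allele counts per SNP.
    weights:   one weight per SNP; defaults to 1.0 (the un-weighted case).
    Assumes at least one SNP is polymorphic, so the denominator is nonzero.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    if weights is None:
        weights = [1.0] * m
    # Allele frequency of the counted allele at each SNP.
    p = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]
    # Center each genotype by twice the allele frequency.
    z = [[row[j] - 2.0 * p[j] for j in range(m)] for row in genotypes]
    # Scaling term: twice the sum of p(1 - p) over SNPs.
    denom = 2.0 * sum(pj * (1.0 - pj) for pj in p)
    return [
        [sum(z[a][j] * weights[j] * z[b][j] for j in range(m)) / denom
         for b in range(n)]
        for a in range(n)
    ]
```

Leaving `weights` unset gives the un-weighted matrix; raising the weight of a SNP (or of every SNP in a window) increases its contribution to the relationships between animals.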


2020 ◽  
Author(s):  
Fanny Mollandin ◽  
Andrea Rau ◽  
Pascal Croiseau

ABSTRACT Technological advances and decreasing costs have led to the rise of increasingly dense genotyping data, making feasible the identification of potential causal markers. Custom genotyping chips, which combine medium-density genotypes with a custom genotype panel, can capitalize on these candidates to potentially yield improved accuracy and interpretability in genomic prediction. A particularly promising model to this end is BayesR, which divides markers into four effect size classes. BayesR has been shown to yield accurate predictions and promise for quantitative trait loci (QTL) mapping in real data applications, but an extensive benchmarking in simulated data is currently lacking. Based on a set of real genotypes, we generated simulated data under a variety of genetic architectures and phenotype heritabilities, and we evaluated the impact of excluding or including causal markers among the genotypes. We define several statistical criteria for QTL mapping, including several based on sliding windows to account for linkage disequilibrium. We compare and contrast these statistics and their ability to accurately prioritize known causal markers. Overall, we confirm the strong predictive performance of BayesR for moderately to highly heritable traits, particularly for 50k custom data. In cases of low heritability or weak linkage disequilibrium with the causal marker in 50k genotypes, QTL mapping is a challenge, regardless of the criterion used. BayesR is a promising approach to simultaneously obtain accurate predictions and interpretable classifications of SNPs into effect size classes. We illustrated the performance of BayesR in a variety of simulation scenarios, and compared the advantages and limitations of each.


2013 ◽  
Vol 284-287 ◽  
pp. 3111-3114
Author(s):  
Hsiang Chuan Liu ◽  
Wei Sung Chen ◽  
Ben Chang Shia ◽  
Chia Chen Lee ◽  
Shang Ling Ou ◽  
...  

In this paper, a novel fuzzy measure, the high-order lambda measure, is proposed. Based on the Choquet integral with respect to this new measure, a novel composition forecasting model is also proposed, which combines the GM(1,1) forecasting model, the time series model and the exponential smoothing model. To evaluate the efficiency of this improved composition forecasting model, an experiment on real data was conducted using the 5-fold cross-validation mean square error. The following models were compared: the Choquet integral composition forecasting model with the P-measure, the Lambda-measure, the L-measure and the high-order lambda measure, respectively; a ridge regression composition forecasting model; a multiple linear regression composition forecasting model; and the traditional linear weighted composition forecasting model. The experimental results showed that the Choquet integral composition forecasting model with respect to the high-order lambda measure has the best performance.


2016 ◽  
Vol 43 (2) ◽  
pp. 159-173 ◽  
Author(s):  
Amer Al-Badarneh ◽  
Emad Al-Shawakfa ◽  
Basel Bani-Ismail ◽  
Khaleel Al-Rababah ◽  
Safwan Shatnawi

This paper investigates the impact of using different indexing approaches (full-word, stem, and root) when classifying Arabic text. In this study, the naïve Bayes classifier is used to construct the multinomial classification models and is evaluated using stratified k-fold cross-validation (k ranges from 2 to 10). It also uses a corpus that consists of 1000 normalized Arabic documents. The results of one experiment in this study show that significant accuracy improvements occurred when the full-word form was used in most k-folds. Further experiments show that the classifier achieved its highest accuracy with eight folds, using a 7/8–1/8 train–test ratio, regardless of the indexing approach used. The overall results of this study show that the classifier achieved a maximum micro-average accuracy of 99.36%, using either the full-word form or the stem form. This indicates that the stem is a better choice when classifying Arabic text, because it makes the corpus dataset smaller, which improves both processing time and storage utilization while still achieving the highest level of accuracy.


Genetics ◽  
2020 ◽  
Vol 216 (1) ◽  
pp. 27-41
Author(s):  
Simon Rio ◽  
Laurence Moreau ◽  
Alain Charcosset ◽  
Tristan Mary-Huard

Populations structured into genetic groups may display group-specific linkage disequilibrium, mutations, and/or interactions between quantitative trait loci and the genetic background. These factors lead to heterogeneous marker effects affecting the efficiency of genomic prediction, especially for admixed individuals. Such individuals have a genome that is a mosaic of chromosome blocks from different origins, and may be of interest to combine favorable group-specific characteristics. We developed two genomic prediction models adapted to the prediction of admixed individuals in presence of heterogeneous marker effects: multigroup admixed genomic best linear unbiased prediction random individual (MAGBLUP-RI), modeling the ancestry of alleles; and multigroup admixed genomic best linear unbiased prediction random allele effect (MAGBLUP-RAE), modeling group-specific distributions of allele effects. MAGBLUP-RI can estimate the segregation variance generated by admixture while MAGBLUP-RAE can disentangle the variability that is due to main allele effects from the variability that is due to group-specific deviation allele effects. Both models were evaluated for their genomic prediction accuracy using a maize panel including lines from the Dent and Flint groups, along with admixed individuals. Based on simulated traits, both models proved their efficiency to improve genomic prediction accuracy compared to standard GBLUP models. For real traits, a clear gain was observed at low marker densities whereas it became limited at high marker densities. The interest of including admixed individuals in multigroup training sets was confirmed using simulated traits, but was variable using real traits. Both MAGBLUP models and admixed individuals are of interest whenever group-specific SNP allele effects exist.


2020 ◽  
Author(s):  
Rafael Massahiro Yassue ◽  
José Felipe Gonzaga Sabadin ◽  
Giovanni Galli ◽  
Filipe Couto Alves ◽  
Roberto Fritsche-Neto

Abstract Usually, the comparison among genomic prediction models is based on validation schemes such as Repeated Random Subsampling (RRS) or K-fold cross-validation. Nevertheless, the design of the training and validation sets strongly affects how models are compared and how subjective that comparison is. The procedures cited above have an overlap across replicates that, because of resampling, may lead to overestimated results and a lack of independence among residuals, and hence to less accurate conclusions. Furthermore, post hoc tests, such as ANOVA, are not recommended because the assumption of independent residuals is not fulfilled. Thus, we propose a new way to sample observations to build training and validation sets based on a cross-validation alpha-based design (CV-α). The CV-α was meant to create several scenarios of validation (replicates x folds), regardless of the number of treatments. Using CV-α, the number of genotypes in the same fold across replicates was much lower than with K-fold, indicating higher residual independence. Therefore, based on the CV-α results, as proof of concept, via ANOVA, we could compare the proposed methodology to RRS and K-fold, applying four genomic prediction models to a simulated and a real dataset. Concerning predictive ability and bias, all validation methods showed similar performance. However, regarding the mean squared error and coefficient of variation, the CV-α method presented the best performance under the evaluated scenarios. Moreover, as it adds no cost or complexity, it is more reliable and allows the use of non-subjective methods to compare models and factors. Therefore, CV-α can be considered a more precise validation methodology for model selection.
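
The overlap across replicates that this abstract refers to can be illustrated with a small diagnostic for repeated K-fold cross-validation. The sketch counts how many pairs of genotypes land in the same fold in more than one replicate. It does not reproduce the CV-α design itself, which the abstract does not specify in detail; the function names and the choice of diagnostic are assumptions made for illustration only.

```python
import random
from collections import Counter
from itertools import combinations


def repeated_kfold(n, k, n_rep, seed=0):
    """Return n_rep independent random partitions of 0..n-1 into k folds."""
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_rep):
        idx = list(range(n))
        rng.shuffle(idx)
        # Sort each fold so that pairs are always emitted as (small, large).
        replicates.append([sorted(idx[i::k]) for i in range(k)])
    return replicates


def repeated_pairs(replicates):
    """Count genotype pairs that share a fold in more than one replicate."""
    counts = Counter()
    for folds in replicates:
        for fold in folds:
            counts.update(combinations(fold, 2))
    return sum(1 for c in counts.values() if c > 1)
```

A design that keeps this count low across replicates gives validation scenarios that share fewer genotype groupings, which is the property the abstract associates with higher residual independence.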

