PSI-B-21 Alternative SNP weighting for multi-step and single-step genomic BLUP in the presence of causative variants

2021 ◽  
Vol 99 (Supplement_3) ◽  
pp. 228-229
Author(s):  
Bruna Santana ◽  
Molly Riser ◽  
Breno O Fragomeni

Abstract This study aimed to evaluate the accuracy of genomic prediction with simulated data using SNP markers, causal quantitative trait nucleotides (QTN), and a combination of both. The methods used were genomic best linear unbiased prediction (GBLUP) and single-step GBLUP (ssGBLUP), with alternative SNP weights. Data were simulated with the R package AlphaSimR. Trait heritability was set to 0.3, and the genetic variance was fully accounted for by either 100 or 1000 QTN. A population with an effective size of 200 was selected, and 20 generations were simulated. The simulated genome mimicked the 29 bovine chromosomes and included 50k SNP markers evenly distributed across the genome. Approximately 16,800 genotypes were available from selected sires and dams in generations 16 to 19, plus 2,000 animals in generation 20. Phenotypes of the young animals were excluded from the analysis because these animals were used for validation. For GBLUP, three pseudo-phenotypes were considered: the raw phenotype, the true breeding value, and the true breeding value with added noise. The genomic relationship matrix was weighted using either quadratic weights, calculated from the SNP variance, or nonlinear A weights under different equation parameters. The scenario with causal variants only yielded accuracies close to 1 for the 100-QTN trait and slightly lower accuracies for the 1000-QTN trait. In the SNP + QTN scenario, quadratic weights produced larger accuracy gains than in the SNP-only scenario, especially for the 100-QTN trait. For the 100-QTN trait, accuracies converged to higher values with both quadratic and nonlinear A weights. For the 1000-QTN trait, quadratic weights diverged and reduced accuracy, whereas nonlinear A weights kept accuracy at its peak, depending on the equation parameters. The nonlinear A parameters that maximized accuracy differed across scenarios and types of analysis. Proportionally, accuracy gains were larger with GBLUP than with ssGBLUP.
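The abstract contrasts quadratic and nonlinear A SNP weights in a weighted genomic relationship matrix. The sketch below is a minimal illustration, not the authors' code. It assumes the forms commonly used in BLUPF90-style pipelines: quadratic weight d_j = u_j^2 * 2p_j(1 - p_j), nonlinear A weight d_j = CT^(|u_j|/sd(u) - 2) with CT = 1.125 as a default, weights rescaled to sum to the number of SNPs, and G = Z D Z' / (2 * sum p(1 - p)). The helper names `snp_weights` and `weighted_g` are hypothetical. In practice the SNP effects u would be back-solved from a previous (ss)GBLUP run and the weighting iterated.

```python
import numpy as np

def snp_weights(u, p, method="quadratic", ct=1.125):
    """Per-SNP weights for a weighted G (hypothetical helper, assumed formulas).

    u: estimated SNP effects; p: allele frequencies.
    "quadratic":  d_j = u_j**2 * 2 p_j (1 - p_j), the variance explained by SNP j
    "nonlinearA": d_j = ct ** (|u_j| / sd(u) - 2)
    Weights are rescaled to sum to the number of SNPs (an assumed convention).
    """
    u = np.asarray(u, dtype=float)
    p = np.asarray(p, dtype=float)
    if method == "quadratic":
        d = u ** 2 * 2.0 * p * (1.0 - p)
    elif method == "nonlinearA":
        d = ct ** (np.abs(u) / u.std() - 2.0)
    else:
        raise ValueError(f"unknown method: {method}")
    return d * d.size / d.sum()

def weighted_g(M, p, d):
    """G = Z D Z' / (2 * sum p(1-p)), with Z = M - 2p, D = diag(d), M coded 0/1/2."""
    Z = M - 2.0 * p
    # Z * d scales column j of Z by d_j, which equals Z @ diag(d).
    return (Z * d) @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

# Usage with random stand-in data (not the study's data).
rng = np.random.default_rng(1)
M = rng.binomial(2, 0.4, size=(6, 500)).astype(float)
p = M.mean(axis=0) / 2.0
u = rng.normal(size=500)  # stand-in for back-solved SNP effects
G_quad = weighted_g(M, p, snp_weights(u, p, "quadratic"))
G_nla = weighted_g(M, p, snp_weights(u, p, "nonlinearA", ct=1.125))
```

With CT = 1 every nonlinear A weight equals 1, so G reduces to the unweighted VanRaden matrix. Larger CT values push more weight onto SNPs with large estimated effects, which is why the optimal parameter can differ between a 100-QTN and a 1000-QTN trait.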

2019 ◽  
Vol 97 (Supplement_3) ◽  
pp. 51-51
Author(s):  
Sajjad Toghiani ◽  
Ling-Yun Chang ◽  
El H Hay ◽  
Andrew J Roberts ◽  
Samuel E Aggrey ◽  
...  

Abstract The dramatic advancement in genotyping technology has greatly reduced the complexity and cost of genotyping. However, the continuous increase in the density of marker panels has yielded little to no improvement in the accuracy of genomic selection. Direct inversion of the genomic relationship matrix is infeasible for some livestock populations because of its excessive computational cost. In addition, most animals in genetic evaluation programs are non-genotyped, and including them in a genomic evaluation with regression methods requires imputing their missing genotypes. To overcome these challenges, a hybrid approach is proposed that fits a subset of SNP markers, selected on the basis of FST scores, together with a classical polygenic effect. The method was first tested using only genotyped animals and then extended to accommodate non-genotyped animals. The proposed approach was evaluated using simulated data for traits with heritabilities of 0.1 and 0.4 and using weaning weight in a crossbred beef cattle population. When all animals were genotyped, the hybrid approach using only 2.5% of prioritized SNPs exceeded the prediction accuracies of BayesB, BayesC, and GBLUP by more than 7%. When non-genotyped animals were incorporated, the proposed approach significantly outperformed the ssGBLUP method in prediction accuracy under both simulated heritability scenarios. Although the results seem to depend on the genetic complexity of the trait, the proposed approach yielded higher prediction accuracies than current methods. Furthermore, its computational costs in CPU time and peak memory are substantially lower than those of current methods.
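The abstract does not state which groups the FST scores contrast. The sketch below assumes two hypothetical groups of genotyped animals (for example, the upper and lower tails of the phenotype distribution) and uses Wright's FST with equal group weights, FST = var(p) / (p̄(1 - p̄)), then keeps the top 2.5% of SNPs. The function names are illustrative, and this is not the authors' exact prioritization procedure.

```python
import numpy as np

def fst_two_groups(M1, M2):
    """Per-SNP Wright's FST between two groups of 0/1/2 genotypes (equal weights)."""
    p1 = M1.mean(axis=0) / 2.0
    p2 = M2.mean(axis=0) / 2.0
    pbar = (p1 + p2) / 2.0
    var_p = ((p1 - pbar) ** 2 + (p2 - pbar) ** 2) / 2.0
    denom = pbar * (1.0 - pbar)
    # SNPs monomorphic across both groups (denom == 0) get FST = 0.
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(denom > 0, var_p / denom, 0.0)

def top_snps(fst, fraction=0.025):
    """Indices of the highest-FST SNPs, keeping the given fraction (at least one)."""
    k = max(1, int(round(fraction * len(fst))))
    return np.argsort(fst)[::-1][:k]
```

The selected columns would then enter the model as fixed or random SNP effects alongside a pedigree-based polygenic term; how that combined model is fitted is beyond this sketch.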


2019 ◽  
Vol 97 (8) ◽  
pp. 3237-3245
Author(s):  
Amanda M Maiorano ◽  
Alula Assen ◽  
Piter Bijma ◽  
Ching-Yi Chen ◽  
Josineudson Augusto II Vasconcelos Silva ◽  
...  

Abstract Pooling semen from multiple boars is commonly used in swine production systems. Compared with single-boar systems, this technique changes the family structure by creating maternal half-sib families. The aim of this simulation study was to investigate how pooling semen affects the accuracy of estimating direct and maternal effects for individual piglet birth weight in purebred pigs. Different pooled-semen scenarios were simulated by allowing each female to be mated with 1 to 6 boars per insemination, whereas litter size was kept constant (N = 12). In each pooled-boar scenario, genomic information was used either to construct the genomic relationship matrix (G) only or to reconstruct the pedigree in addition to constructing G. Genotypes were generated for 60,000 SNPs evenly distributed across 18 autosomes. Of the 5 simulated generations, only animals from generations 3 to 5 were genotyped (N = 36,000). Direct and maternal true breeding values (TBV) were computed as the sum of the effects of 1,080 QTLs. Phenotypes were constructed as the sum of the direct TBV, the maternal TBV, an overall mean of 1.25 kg, and a residual effect. The simulated heritabilities for direct and maternal effects were 0.056 and 0.19, respectively, and the genetic correlation between the two effects was −0.25. All simulations were replicated 5 times. Variance components and direct and maternal heritabilities were estimated using average information REML. Predictions were computed via pedigree-based BLUP and single-step genomic BLUP (ssGBLUP). Genotyped littermates in the last generation were used for validation. Prediction accuracies were calculated as correlations between EBV and TBV for direct (accdirect) and maternal (accmat) effects. When boars were known, accdirect was 0.21 (1 boar) and 0.26 (6 boars) for BLUP, whereas for ssGBLUP it was 0.38 (1 boar) and 0.43 (6 boars). When boars were unknown, accdirect was lower in BLUP but similar in ssGBLUP.
For the scenario with known boars, accmat was 0.58 and 0.63 for 1 and 6 boars, respectively, under ssGBLUP. For unknown boars, accmat was 0.63 for 2 boars and 0.62 for 6 boars in ssGBLUP. In general, accdirect and accmat were lower in the single-boar scenario than in the pooled-semen scenarios, indicating that a half-sib structure is better suited to estimating direct and maternal effects. Using pooled semen from multiple boars can improve the accuracy of predicting direct and maternal effects when maternal half-sib families are larger than 2.
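The simulation design combines direct and maternal TBV with a genetic correlation of −0.25 and scores predictions as a correlation with TBV. The sketch below draws correlated direct/maternal values through a Cholesky factor of their 2 x 2 covariance matrix, scaled so that phenotypic variance is 1. It simplifies heavily by assumption: each record carries its own maternal value (no dam structure, no QTL or genotypes), which is not how the study built phenotypes. The function names are hypothetical.

```python
import numpy as np

def simulate_direct_maternal(n, h2_d=0.056, h2_m=0.19, r_g=-0.25, mean=1.25, seed=0):
    """Correlated direct/maternal TBV and phenotypes with phenotypic variance 1.

    Simplification: the maternal value acts on the same record (no dams), and the
    residual variance absorbs 1 - h2_d - h2_m - 2 * cov(direct, maternal).
    """
    rng = np.random.default_rng(seed)
    cov_dm = r_g * np.sqrt(h2_d * h2_m)
    G0 = np.array([[h2_d, cov_dm], [cov_dm, h2_m]])
    L = np.linalg.cholesky(G0)
    tbv = rng.standard_normal((n, 2)) @ L.T  # column 0: direct, column 1: maternal
    var_e = 1.0 - h2_d - h2_m - 2.0 * cov_dm
    e = rng.normal(0.0, np.sqrt(var_e), n)
    y = mean + tbv[:, 0] + tbv[:, 1] + e
    return tbv, y

def accuracy(ebv, tbv):
    """Prediction accuracy as the Pearson correlation between EBV and TBV."""
    return np.corrcoef(ebv, tbv)[0, 1]
```

Because the residual variance absorbs the negative covariance term, var_e here is about 0.81, so the simulated phenotype keeps unit variance while the direct and maternal components stay at their target heritabilities.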


2011 ◽  
Vol 93 (3) ◽  
pp. 203-219 ◽  
Author(s):  
KATHRYN E. KEMPER ◽  
DAVID L. EMERY ◽  
STEPHEN C. BISHOP ◽  
HUTTON ODDY ◽  
BENJAMIN J. HAYES ◽  
...  

Summary Genetic resistance to gastrointestinal worms is a complex trait of great importance in both livestock and humans. To gain insights into the genetic architecture of this trait, a mixed-breed population of sheep was artificially infected with Trichostrongylus colubriformis (n=3326) and then Haemonchus contortus (n=2669) to measure faecal worm egg count (WEC). The population was genotyped with the Illumina OvineSNP50 BeadChip, and 48 640 single nucleotide polymorphism (SNP) markers passed quality control. An independent population of 316 sires of mixed breeds with accurate estimated breeding values for WEC was genotyped for the same SNPs to assess the results obtained from the first population. We used principal components from the genomic relationship matrix among genotyped individuals to account for population stratification, and a novel approach to directly account for the sampling error associated with each SNP marker regression. The largest marker effects were estimated to explain an average of 0·48% (T. colubriformis) or 0·08% (H. contortus) of the phenotypic variance in WEC. These effects are small but consistent with results for other complex traits. We also demonstrated that methods that use all markers simultaneously can successfully predict genetic merit for resistance to worms, despite the small effects of individual markers. Correlations of genomic predictions with the breeding values of the industry sires reached a maximum of 0·32. We estimate that effective across-breed predictions of genetic merit with multi-breed populations will require an average marker spacing of approximately 10 kbp.
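The stratification correction described above uses principal components of the genomic relationship matrix as covariates in single-SNP regressions. A minimal sketch follows, assuming a VanRaden G, ordinary least squares, and 3 leading PCs (an arbitrary choice). It does not reproduce the authors' novel sampling-error adjustment; it only returns each SNP effect with its usual standard error. Function names are hypothetical.

```python
import numpy as np

def vanraden_g(M):
    """VanRaden (2008) G from a 0/1/2 genotype matrix (animals x SNPs)."""
    p = M.mean(axis=0) / 2.0
    Z = M - 2.0 * p
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def snp_regression_with_pcs(y, M, n_pcs=3):
    """Fit y = mu + PCs @ c + beta_j * x_j + e for each SNP j separately.

    The leading eigenvectors of G act as stratification covariates.
    Returns the SNP effects beta_j and their OLS standard errors.
    """
    vals, vecs = np.linalg.eigh(vanraden_g(M))        # eigenvalues in ascending order
    pcs = vecs[:, np.argsort(vals)[::-1][:n_pcs]]     # leading principal components
    n = len(y)
    base = np.column_stack([np.ones(n), pcs])
    betas, ses = [], []
    for j in range(M.shape[1]):
        X = np.column_stack([base, M[:, j]])
        coef = np.linalg.lstsq(X, y, rcond=None)[0]
        resid = y - X @ coef
        sigma2 = resid @ resid / (n - X.shape[1])
        cov = sigma2 * np.linalg.pinv(X.T @ X)
        betas.append(coef[-1])
        ses.append(np.sqrt(cov[-1, -1]))
    return np.array(betas), np.array(ses)
```

Since the columns of Z are centered, G has the vector of ones in its null space, so the leading PCs are orthogonal to the intercept and do not make the design matrix collinear.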


2014 ◽  
Vol 54 (5) ◽  
pp. 544 ◽  
Author(s):  
N. Moghaddar ◽  
A. A. Swan ◽  
J. H. J. van der Werf

The objective of this study was to predict the accuracy of genomic prediction for 26 traits, including weight, muscle, fat, and wool quantity and quality traits, in Australian sheep based on a large, multi-breed reference population. The reference population consisted of two research flocks, with the main breeds being Merino, Border Leicester (BL), Poll Dorset (PD), and White Suffolk (WS). The genomic estimated breeding value (GEBV) was based on GBLUP (genomic best linear unbiased prediction), using a genomic relationship matrix calculated from the 50K Ovine SNP chip marker genotypes. The accuracy of GEBV was evaluated as the Pearson correlation coefficient between GEBV and accurate estimated breeding values based on progeny records in a set of genotyped industry animals. Accuracies for weight traits were relatively low to moderate in the PD and WS breeds (0.11–0.27) and moderate to relatively high in BL and Merino (0.25–0.63). Accuracies for muscle and fat traits were moderate to relatively high across all breeds (0.21–0.55). The accuracy of GEBV for yearling and adult wool traits in Merino was, on average, high (0.33–0.75). The results showed that the accuracy of genomic prediction depends on trait heritability and the effective size of the reference population, whereas the observed GEBV accuracies were more closely related to the breed proportions in the multi-breed reference population. No extra gain in within-breed GEBV accuracy was observed from across-breed information. More investigation is required to determine the precise effect of across-breed information on within-breed genomic prediction.
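The GBLUP prediction and the Pearson-correlation accuracy described above can be sketched as follows. Assumptions: a VanRaden G built from 0/1/2 genotypes, one record per reference animal, the overall mean replaced by the reference mean, and λ = (1 − h²)/h². Under those assumptions the validation GEBV are G_vr (G_rr + λI)^-1 (y_r − mean). The helper names are hypothetical, and a full analysis would fit fixed effects in mixed-model equations instead.

```python
import numpy as np

def vanraden_g(M):
    """VanRaden (2008) G from a 0/1/2 genotype matrix (animals x SNPs)."""
    p = M.mean(axis=0) / 2.0
    Z = M - 2.0 * p
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_predict(M, y_ref, ref_idx, val_idx, h2):
    """GEBV of validation animals: G_vr (G_rr + lambda I)^-1 (y_ref - mean)."""
    G = vanraden_g(M)
    lam = (1.0 - h2) / h2
    G_rr = G[np.ix_(ref_idx, ref_idx)]
    G_vr = G[np.ix_(val_idx, ref_idx)]
    alpha = np.linalg.solve(G_rr + lam * np.eye(len(ref_idx)), y_ref - y_ref.mean())
    return G_vr @ alpha

def pearson_accuracy(gebv, target):
    """Accuracy as the Pearson correlation between GEBV and a target (EBV or TBV)."""
    return np.corrcoef(gebv, target)[0, 1]
```

In the study the target was a progeny-based EBV of industry animals. In a simulation, the true breeding value can play that role.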


2021 ◽  
Vol 10 (3) ◽  
pp. 202-207

The aim of this investigation was to develop a restricted selection index to improve 305-day yields of milk (MY), fat (FY), and protein (PY) while keeping the deterioration in days open (DO), calving interval (CI), and number of services per conception (NSPC) to a minimum in Holstein cows. The data comprised 3682 records of 1122 cows, daughters of 95 sires and 712 dams. The data were analyzed with a multi-trait animal model with repeated measures. Eight selection indexes (five unrestricted and three restricted) were derived using MY, FY, PY, CI, DO, and NSPC in various combinations as sources of information in the indexes. The true (aggregate) breeding value, however, included only MY, FY, and PY. The highest accuracy of selection (0.60) was obtained with the full index. Milk yield and NSPC appeared to be the most valuable traits in the full index. Combining these two traits in one index (the best reduced index) gave an accuracy of selection of 0.57. The index based on MY alone (the most accurate single-trait index) gave an accuracy of 0.53. It appears possible to reduce the expected genetic deterioration in the reproductive traits by restricting the full index to zero genetic change in NSPC (r_TI = 0.48). This restriction would allow the breeder to reduce the deterioration in DO and CI by 12 and 16 days, respectively, at the cost of part of the expected genetic improvement in the production traits (29, 40, and 48% in MY, FY, and PY, respectively).
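A restricted index of the kind described above is classically derived with the Kempthorne–Nordskog projection. Starting from the unrestricted Smith–Hazel weights b0 = P^-1 G a, it removes the component that would change the restricted traits, b = b0 − P^-1 G C (C' G' P^-1 G C)^-1 C' G' b0, so that C' G' b = 0. The sketch below assumes that formulation. The covariance matrices in the check are arbitrary illustrative numbers, not the study's estimates, and the function names are hypothetical.

```python
import numpy as np

def restricted_index(P, G, a, restricted):
    """Smith-Hazel weights b0 and Kempthorne-Nordskog restricted weights b.

    P: phenotypic (co)variances among index traits (n x n)
    G: genetic covariances between index traits and objective traits (n x m)
    a: economic weights of objective traits (m,)
    restricted: list of objective-trait indices whose expected change is forced to 0
    """
    b0 = np.linalg.solve(P, G @ a)
    GC = G[:, restricted]                 # G C, where C selects the restricted traits
    A = np.linalg.solve(P, GC)            # P^-1 G C
    b = b0 - A @ np.linalg.solve(GC.T @ A, GC.T @ b0)
    return b0, b

def expected_response(P, G, b, intensity=1.0):
    """Expected genetic change in each objective trait: i * G' b / sd(index)."""
    return intensity * (G.T @ b) / np.sqrt(b @ P @ b)
```

The restriction costs accuracy because b is pulled away from the optimal b0, which mirrors the drop from 0.60 to 0.48 reported for the index restricted on NSPC.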


2020 ◽  
Vol 10 (6) ◽  
pp. 2069-2078 ◽  
Author(s):  
Christos Palaiokostas ◽  
Shannon M. Clarke ◽  
Henrik Jeuthe ◽  
Rudiger Brauning ◽  
Timothy P. Bilton ◽  
...  

Arctic charr (Salvelinus alpinus) is a species of high economic value for the aquaculture industry and of high ecological value due to its Holarctic distribution in both marine and freshwater environments. Novel genome sequencing approaches enable the study of population and quantitative genetic parameters even in species with limited or no prior genomic resources. Low-coverage genotyping by sequencing (GBS) was applied to a selected strain of Arctic charr in Sweden originating from a landlocked freshwater population. In this study, animals from year classes 2013 (171 animals; parental population) and 2017 (759 animals; 13 full-sib families) were used as a template for identifying genome-wide single nucleotide polymorphisms (SNPs). GBS libraries were constructed using the PstI and MspI restriction enzymes. Approximately 14.5K SNPs passed quality control and were used to estimate a genomic relationship matrix. A range of analyses was then conducted to assess genetic diversity and to investigate the efficiency of genomic information for parentage assignment and breeding value estimation. Heterozygosity estimates for both year classes suggested a slight excess of heterozygotes. FST estimates among the families of year class 2017 ranged from 0.009 to 0.066. Principal component analysis (PCA) and discriminant analysis of principal components (DAPC) were applied to identify genetic clusters within the studied population. The results agreed with the pedigree records and allowed individual families to be identified. DNA parentage verification was also performed, and the results agreed with the pedigree records except for one putative dam, for which the full-sib genotypes suggested a potential recording error.
Breeding value estimation for juvenile growth using the estimated genomic relationship matrix clearly outperformed the pedigree-based equivalent in prediction accuracy (0.51 vs. 0.31). Overall, low-coverage GBS has proven to be a cost-effective genotyping platform that is expected to boost the selection efficiency of the Arctic charr breeding program.
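Two of the checks above, heterozygote excess and DNA parentage verification, can be illustrated with simple statistics on hard-called genotypes. The sketch compares observed heterozygosity with its Hardy–Weinberg expectation (F = 1 − Ho/He, where F < 0 signals a heterozygote excess) and counts opposing homozygotes (0 vs. 2) between a putative parent and offspring, a common exclusion criterion. It assumes complete 0/1/2 calls and ignores the read-depth uncertainty that low-coverage GBS methods model explicitly. The function names are hypothetical, and these are not necessarily the statistics the authors used.

```python
import numpy as np

def heterozygosity(M):
    """Observed vs expected heterozygosity for hard-called 0/1/2 genotypes (no missing)."""
    p = M.mean(axis=0) / 2.0
    ho = np.mean(M == 1)                 # share of heterozygous calls
    he = np.mean(2.0 * p * (1.0 - p))    # Hardy-Weinberg expectation, averaged over SNPs
    f = 1.0 - ho / he                    # F < 0 indicates an excess of heterozygotes
    return ho, he, f

def opposing_homozygotes(parent, offspring):
    """Number of SNPs at which parent and offspring are opposite homozygotes (0 vs 2)."""
    return int(np.sum(np.abs(np.asarray(parent) - np.asarray(offspring)) == 2))
```

A true parent should show close to zero opposing homozygotes apart from genotyping errors, so a putative dam with a count well above the error-driven background would be flagged, as happened for one dam in this study.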


2019 ◽  
Vol 51 (1) ◽  
Author(s):  
Øyvind Nordbø ◽  
Arne B. Gjuvsland ◽  
Leiv Sigbjørn Eikje ◽  
Theo Meuwissen

Abstract Background The main aim of single-step genomic prediction is to facilitate optimal selection in populations consisting of both genotyped and non-genotyped individuals. However, despite intensive research, biases still occur that make it difficult to perform optimal selection across groups of animals. The objective of this study was to investigate whether incomplete genotype datasets with errors could be a source of level bias between genotyped and non-genotyped animals, and between animals genotyped on different single nucleotide polymorphism (SNP) panels, in single-step genomic predictions. Results Incomplete and erroneous genotypes of young animals caused biases in breeding values between groups of animals. Systematic noise or missing data for less than 1% of the SNPs in the genotype data had substantial effects on the differences in breeding values between genotyped and non-genotyped animals, and between animals genotyped on different chips. The breeding values of young genotyped individuals were biased upward by up to 0.8 genetic standard deviations compared with those of non-genotyped individuals. Similarly, the magnitude of a small value added to the diagonal of the genomic relationship matrix affected the level of average breeding values between groups of genotyped and non-genotyped animals. Cross-validation accuracies and regression coefficients were not sensitive to these factors. Conclusions Because different SNP chips have historically been used to genotype different parts of a population, fine-tuning imputation within and across SNP chips and handling missing genotypes are crucial for reducing bias. Even when all the SNPs used for estimating breeding values are present on the chip used for genotyping young animals, incomplete genotypes and some genotype errors may lead to level biases in breeding values.
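Two of the factors examined above, how missing genotypes are handled and the small value added to the diagonal of G, enter when G is built. The sketch below assumes a VanRaden G in which missing calls (NaN) are mean-imputed, i.e. the centered genotype is set to 0, and a constant eps is then added to the diagonal. This is one common convention, not necessarily the one used in the study, and `g_with_missing` is a hypothetical name.

```python
import numpy as np

def g_with_missing(M, eps=0.0):
    """VanRaden-style G with missing calls (NaN) mean-imputed and eps added to the diagonal.

    Mean imputation sets a missing genotype to 2p, i.e. its centered value to 0.
    """
    p = np.nanmean(M, axis=0) / 2.0
    Z = np.nan_to_num(M - 2.0 * p, nan=0.0)   # missing -> centered value 0
    G = Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))
    return G + eps * np.eye(M.shape[0])
```

Zeroing a centered genotype removes its contribution to the animal's diagonal and to its relationships with others, so animals with more missing calls look systematically less related. This illustrates how small data-handling choices can shift group-level GEBV without affecting within-group ranking.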


2019 ◽  
Vol 97 (Supplement_3) ◽  
pp. 49-50
Author(s):  
Yvette Steyn ◽  
Daniela Lourenco ◽  
Ignacy Misztal

Abstract Multi-breed evaluations increase the size of the reference population for genomic evaluations and are quite simple; however, combining breeds usually has a negative impact on prediction accuracy. The aim of this study was to evaluate the use of a multi-breed genomic relationship matrix (G) in which the SNPs of each breed are non-shared. The multi-breed G is set up assuming known genotypes for one breed and missing genotypes for the remaining breeds. This setup may avoid spurious IBS relationships between breeds and accounts for breed-specific allele frequencies. This scenario was contrasted with multi-breed evaluations in which all SNPs are shared (i.e., the same SNPs are used for all breeds) and with single-breed evaluations. Different SNP densities (9K and 45K) and different effective population sizes (Ne) were tested. Five breeds mimicking recent beef cattle populations that diverged from the same historical population were simulated using different selection criteria. QTL effects were assumed to be the same across all breeds. For the recent population, approximately half of the animals in generations 1 to 9 were genotyped, whereas all 1200 animals were genotyped in generation 10. Genotyped animals in generation 10 were used for validation; therefore, each breed had a validation set. Analyses were performed using single-step GBLUP (ssGBLUP). Prediction accuracy was calculated as the correlation between true breeding values (TBV) and genomic estimated breeding values (GEBV). Accuracies of GEBV were lower with the larger Ne and the lower SNP density. All three scenarios using 45K resulted in similar accuracies, suggesting that this marker density is high enough to account for relationships and linkage disequilibrium with QTL. A shared multi-breed evaluation using 9K decreased accuracy by 0.08 for the smaller Ne and by 0.11 for the larger Ne. This loss was mostly avoided when markers were treated as non-shared within the same genomic relationship matrix.
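One way to picture the non-shared multi-breed G is to give each breed its own block of SNP columns, center each block with that breed's allele frequencies, and treat every other breed's genotypes for those columns as missing (centered value 0). The sketch below follows that reading. The per-breed scaling by 2 * sum p(1 − p) is an assumption, not a detail stated in the abstract. Under these assumptions the cross-breed blocks of G are exactly zero, which is one way the setup avoids spurious IBS relationships. The function name is hypothetical.

```python
import numpy as np

def nonshared_multibreed_g(genos_by_breed):
    """Multi-breed G with breed-specific (non-shared) SNP columns.

    genos_by_breed: list of (n_b x m_b) 0/1/2 arrays, one per breed.
    Each breed is centered and scaled with its own allele frequencies; its
    genotypes for other breeds' columns are treated as missing (centered value 0).
    """
    blocks = []
    for M in genos_by_breed:
        p = M.mean(axis=0) / 2.0
        blocks.append((M - 2.0 * p) / np.sqrt(2.0 * np.sum(p * (1.0 - p))))
    n = sum(Z.shape[0] for Z in blocks)
    m = sum(Z.shape[1] for Z in blocks)
    W = np.zeros((n, m))                 # animals x (all breed-specific SNP columns)
    r = c = 0
    for Z in blocks:
        W[r:r + Z.shape[0], c:c + Z.shape[1]] = Z
        r += Z.shape[0]
        c += Z.shape[1]
    return W @ W.T
```

In ssGBLUP, across-breed connections would then come only from the pedigree part of H, not from marker similarity.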


2021 ◽  
Author(s):  
Mitchell J. Feldmann ◽  
Hans-Peter Piepho ◽  
Steven J. Knapp

Many important traits in plants, animals, and microbes are polygenic and are therefore difficult to improve through traditional marker-assisted selection. Genomic prediction addresses this by enabling the inclusion of all genetic data in a mixed model framework. The main method for predicting breeding values is genomic best linear unbiased prediction (GBLUP), which uses the realized genomic relationship or kinship matrix (K) to connect genotype to phenotype. The use of relationship matrices allows information to be shared for estimating the genetic values of observed entries and predicting the genetic values of unobserved entries. One of the key parameters of such models is genomic heritability (h2g), the proportion of trait variance associated with a genome-wide sample of DNA polymorphisms. Here we discuss the relationship between several common methods for calculating the genomic relationship matrix and propose a new matrix based on the average semivariance that yields accurate estimates of genomic variance in the observed population, regardless of the focal population quality, as well as accurate breeding value predictions for unobserved samples. Notably, our proposed method is highly similar to the approach presented by Legarra (2016), despite different mathematical derivations and statistical perspectives, and deviates from the classic approach presented in VanRaden (2008) only by a scaling factor. With current approaches, we found that genomic heritability tends to be either over- or underestimated depending on the scaling and centering applied to the marker matrix (Z), the value of the average diagonal element of K, and the assortment of alleles and heterozygosity (H) in the observed population. Unlike its predecessors, our newly proposed kinship matrix KASV yields accurate estimates of h2g in the observed population, generalizes to larger populations, and produces BLUPs equivalent to those of common methods in plants and animals.
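The claim that KASV differs from VanRaden's matrix only by a scaling factor can be checked numerically. The average semivariance of a covariance matrix K over all distinct pairs, mean of 0.5 (K_ii + K_jj − 2 K_ij), equals tr(P K P)/(n − 1) with P = I − J/n. The sketch assumes KASV is the marker cross-product Z Z' (Z column-centered) rescaled so that this average semivariance is 1. Treat that as an illustrative reading of the abstract, not the authors' published code. Function names are hypothetical.

```python
import numpy as np

def vanraden_g(M):
    """VanRaden (2008) G from a 0/1/2 genotype matrix (animals x SNPs)."""
    p = M.mean(axis=0) / 2.0
    Z = M - 2.0 * p
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def asv(K):
    """Average semivariance of K over distinct pairs: tr(P K P) / (n - 1)."""
    n = K.shape[0]
    P = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    return np.trace(P @ K @ P) / (n - 1)

def k_asv(M):
    """Kinship from column-centered markers, scaled so that its average semivariance is 1."""
    Z = M - M.mean(axis=0)
    ZZt = Z @ Z.T
    return ZZt / asv(ZZt)
```

Because both matrices are multiples of the same Z Z', variance components estimated with one can be converted to the other by the ratio of their scaling constants. This is the sense in which the choice of scaling drives over- or underestimation of h2g.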

