PSVIII-27 A weighted genomic relationship matrix based on FST prioritized SNPs for genomic selection

2019 ◽  
Vol 97 (Supplement_3) ◽  
pp. 262-262
Author(s):  
Ling-Yun Chang ◽  
Sajjad Toghiani ◽  
El Hamidi Hay ◽  
Samuel E Aggrey ◽  
Romdhane Rekaya

Abstract Using low- to moderate-density SNP marker panels, a substantial increase in accuracy was achieved. The dramatic increase in the number of identified variants due to advances in next-generation sequencing was expected to significantly increase the accuracy of genomic selection (GS). Unfortunately, little to no improvement was observed. For mixed model-based approaches, using all SNPs in the panel to compute the observed relationship matrix (G) will not increase accuracy, as the additive relationships between individuals can be accurately estimated using a much smaller number of markers. Due to these limitations, variant prioritization has become a necessity to improve accuracy. Further, it has been shown that weighting SNPs when calculating G could be effective in improving the accuracy of GS. FST, as a measure of population differentiation, has been successfully used to identify genome segments under selection pressure. Consequently, FST could be used both to prioritize SNPs and to derive their relative weights in the calculation of the genomic relationship matrix. A population of 15,000 animals genotyped for 400K SNP markers uniformly distributed along 10 chromosomes was simulated. A trait with heritability 0.3 genetically controlled by two hundred QTL was generated. The top 20K SNPs based on their FST scores were used either alone or with the remaining 380K SNPs to compute G with or without weighting. When only the top 20K SNPs were used to compute G, two scenarios were considered: 1) equal weights for all SNPs or 2) weights proportional to the SNP FST scores. When all 400K SNP markers were used, different weighting scenarios were evaluated. The results clearly showed that prioritizing SNP markers based on their FST scores and using those scores to compute relative weights increased the genetic similarity between training and validation animals and resulted in more than 5% improvement in the accuracy of GS.
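The weighting idea in this abstract can be illustrated with a minimal Python sketch. It assumes genotypes coded 0/1/2 and a weighted form G = Z D Z' / sum(w_j 2 p_j (1 - p_j)), which is one common normalization and not necessarily the one the authors used. The function name `weighted_grm` and its arguments are hypothetical.

```python
import numpy as np

def weighted_grm(M, weights=None):
    """Weighted genomic relationship matrix (illustrative sketch).

    M       : (n_animals, n_snps) array of genotypes coded 0/1/2.
    weights : optional non-negative per-SNP weights (e.g. proportional
              to FST scores); equal weights when omitted.
    """
    M = np.asarray(M, dtype=float)
    n_snps = M.shape[1]
    p = M.mean(axis=0) / 2.0          # observed allele frequencies
    Z = M - 2.0 * p                   # center each SNP column
    w = np.ones(n_snps) if weights is None else np.asarray(weights, dtype=float)
    # Assumed normalization: weighted sum of expected heterozygosities.
    denom = np.sum(w * 2.0 * p * (1.0 - p))
    return (Z * w) @ Z.T / denom      # Z D Z' / denom, with D = diag(w)
```

With equal weights this reduces to the familiar unweighted form Z Z' / sum(2 p_j (1 - p_j)), and multiplying all weights by a constant leaves G unchanged, so only the relative weights matter.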

2019 ◽  
Vol 97 (Supplement_3) ◽  
pp. 51-51
Author(s):  
Sajjad Toghiani ◽  
Ling-Yun Chang ◽  
El H Hay ◽  
Andrew J Roberts ◽  
Samuel E Aggrey ◽  
...  

Abstract The dramatic advancement in genotyping technology has greatly reduced the complexity and cost of genotyping. The continuous increase in the density of marker panels is resulting in little to no improvement in the accuracy of genomic selection. Direct inversion of the genomic relationship matrix is infeasible for some livestock populations due to the excessive computational cost. In addition, most animals in genetic evaluation programs are non-genotyped. Including these animals in a genomic evaluation requires the imputation of the missing genotypes when using regression methods. To overcome these challenges, a hybrid approach is proposed. This approach fits a subset of SNP markers selected based on FST scores and a classical polygenic effect. The method was first tested using only genotyped animals and then extended to accommodate non-genotyped animals. The proposed approach was evaluated using simulated data for a trait with heritability of 0.1 and 0.4 and using weaning weight in a crossbred beef cattle population. When all animals were genotyped, the hybrid approach using only 2.5% of prioritized SNPs exceeded the prediction accuracies of BayesB, BayesC, and GBLUP by more than 7%. When non-genotyped animals were incorporated, the proposed approach significantly outperformed the ss-GBLUP method in terms of prediction accuracy under both simulated heritability scenarios. Although the results seem to depend on the genetic complexity of the trait, the proposed approach resulted in higher prediction accuracies than current methods. Furthermore, its computational costs in terms of CPU time and peak memory are substantially lower than those of current methods.


Genes ◽  
2019 ◽  
Vol 10 (11) ◽  
pp. 922
Author(s):  
Ling-Yun Chang ◽  
Sajjad Toghiani ◽  
El Hamidi Hay ◽  
Samuel E. Aggrey ◽  
Romdhane Rekaya

A dramatic increase in the density of marker panels has been expected to increase the accuracy of genomic selection (GS). Unfortunately, little to no improvement has been observed. Including all variants in the association model dramatically increases the dimensionality of the problem, which could reduce the statistical power. Using all single nucleotide polymorphisms (SNPs) to compute the genomic relationship matrix (G) does not necessarily increase accuracy, as the additive relationships can be accurately estimated using a much smaller number of markers. Due to these limitations, variant prioritization has become a necessity to improve accuracy. The fixation index (FST), as a measure of population differentiation, has been used to identify genome segments and variants under selection pressure. Using prioritized variants has increased the accuracy of GS. Additionally, FST can be used to weight the relative contribution of prioritized SNPs in computing G. In this study, relative weights based on FST scores were developed and incorporated into the calculation of G, and their impact on the estimation of variance components and accuracy was assessed. The results showed that prioritizing SNPs based on their FST scores resulted in an increase in the genetic similarity between training and validation animals and improved the accuracy of GS by more than 5%.
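As a rough illustration of FST-based prioritization, the Python sketch below computes a per-SNP fixation index with Nei's estimator, FST = (H_T - H_S) / H_T, using unweighted means over subpopulations, then keeps the top-k SNPs with weights proportional to their scores. How the subpopulations are formed, the choice of estimator, and the names `fst_per_snp` and `top_snps_with_weights` are assumptions for illustration; the abstract does not specify them.

```python
import numpy as np

def fst_per_snp(M, groups):
    """Per-SNP FST via Nei's (H_T - H_S) / H_T (illustrative sketch).

    M      : (n_animals, n_snps) genotypes coded 0/1/2.
    groups : length-n_animals array of subpopulation labels.
    """
    M = np.asarray(M, dtype=float)
    groups = np.asarray(groups)
    # Allele frequency of each SNP within each subpopulation.
    freqs = np.array([M[groups == g].mean(axis=0) / 2.0
                      for g in np.unique(groups)])
    p_bar = freqs.mean(axis=0)                     # unweighted mean frequency
    h_t = 2.0 * p_bar * (1.0 - p_bar)              # total heterozygosity
    h_s = (2.0 * freqs * (1.0 - freqs)).mean(axis=0)  # mean within-group
    fst = np.zeros_like(h_t)
    ok = h_t > 0                                   # monomorphic SNPs get 0
    fst[ok] = (h_t[ok] - h_s[ok]) / h_t[ok]
    return fst

def top_snps_with_weights(fst, k):
    """Indices of the k highest-FST SNPs and weights proportional to FST.

    Assumes at least one selected SNP has a positive score.
    """
    idx = np.argsort(fst)[::-1][:k]
    return idx, fst[idx] / fst[idx].sum()
```

The returned weights sum to one; any constant rescaling would serve equally well if the relationship matrix is normalized by the weighted heterozygosity.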


2021 ◽  
Author(s):  
Mitchell J. Feldmann ◽  
Hans-Peter Piepho ◽  
Steven J. Knapp

Many important traits in plants, animals, and microbes are polygenic and are therefore difficult to improve through traditional marker-assisted selection. Genomic prediction addresses this by enabling the inclusion of all genetic data in a mixed model framework. The main method for predicting breeding values is genomic best linear unbiased prediction (GBLUP), which uses the realized genomic relationship or kinship matrix (K) to connect genotype to phenotype. The use of relationship matrices allows information to be shared for estimating the genetic values for observed entries and predicting genetic values for unobserved entries. One of the key parameters of such models is genomic heritability (h2g), or the variance of a trait associated with a genome-wide sample of DNA polymorphisms. Here we discuss the relationship between several common methods for calculating the genomic relationship matrix and propose a new matrix based on the average semivariance that yields accurate estimates of genomic variance in the observed population regardless of the focal population quality as well as accurate breeding value predictions in unobserved samples. Notably, our proposed method is highly similar to the approach presented by Legarra (2016) despite different mathematical derivations and statistical perspectives and only deviates from the classic approach presented in VanRaden (2008) by a scaling factor. With current approaches, we found that the genomic heritability tends to be either over- or underestimated depending on the scaling and centering applied to the marker matrix (Z), the value of the average diagonal element of K, and the assortment of alleles and heterozygosity (H) in the observed population. Unlike its predecessors, our newly proposed kinship matrix KASV yields accurate estimates of h2g in the observed population, generalizes to larger populations, and produces BLUPs equivalent to common methods in plants and animals.
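The remark that the proposed matrix differs from the VanRaden (2008) form only by a scaling factor can be sketched in Python as follows. The rescaling shown, which divides K by tr(P K P) / (n - 1) with P the centering matrix, is one reading of the average-semivariance idea and is an assumption; the paper's exact definition of KASV may differ. The function names are hypothetical.

```python
import numpy as np

def vanraden_grm(M):
    """Classic relationship matrix: Z Z' / sum(2 p (1 - p))."""
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0
    Z = M - 2.0 * p                      # column-centered marker matrix
    return Z @ Z.T / np.sum(2.0 * p * (1.0 - p))

def rescale_to_unit_asv(K):
    """Rescale K so that tr(P K P) / (n - 1) equals 1.

    Assumption: this scalar normalization stands in for the
    average-semivariance scaling; it is not taken from the paper.
    """
    K = np.asarray(K, dtype=float)
    n = K.shape[0]
    P = np.eye(n) - np.ones((n, n)) / n  # centering (projection) matrix
    scale = np.trace(P @ K @ P) / (n - 1)
    return K / scale
```

Because the rescaling is a single scalar, predicted genetic values are unaffected up to the matching change in the estimated variance component, while the genomic variance estimate itself changes with the scale.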


2019 ◽  
Vol 51 (1) ◽  
Author(s):  
Ivan Pocrnic ◽  
Daniela A. L. Lourenco ◽  
Yutaka Masuda ◽  
Ignacy Misztal

Abstract Background The dimensionality of genomic information is limited by the number of independent chromosome segments (Me), which is a function of the effective population size. This dimensionality can be determined approximately by singular value decomposition of the gene content matrix, by eigenvalue decomposition of the genomic relationship matrix (GRM), or by the number of core animals in the algorithm for proven and young (APY) that maximizes the accuracy of genomic prediction. In the latter, core animals act as proxies to linear combinations of Me. Field studies indicate that a moderate accuracy of genomic selection is achieved with a small dataset, but that further improvement of the accuracy requires much more data. When only one quarter of the optimal number of core animals are used in the APY algorithm, the accuracy of genomic selection is only slightly below the optimal value. This suggests that genomic selection works on clusters of Me. Results The simulation included datasets with different population sizes and amounts of phenotypic information. Computations were done by genomic best linear unbiased prediction (GBLUP) with selected eigenvalues and corresponding eigenvectors of the GRM set to zero. About four eigenvalues in the GRM explained 10% of the genomic variation, and less than 2% of the total eigenvalues explained 50% of the genomic variation. With limited phenotypic information, the accuracy of GBLUP was close to the peak where most of the smallest eigenvalues were set to zero. With a large amount of phenotypic information, accuracy increased as smaller eigenvalues were added. Conclusions A small amount of phenotypic data is sufficient to estimate only the effects of the largest eigenvalues and the associated eigenvectors that contain a large fraction of the genomic information, and a very large amount of data is required to estimate the remaining eigenvalues that account for a limited amount of genomic information. 
Core animals in the APY algorithm act as proxies of almost the same number of eigenvalues. By using an eigenvalues-based approach, it was possible to explain why the moderate accuracy of genomic selection based on small datasets only increases slowly as more data are added.
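The eigenvalue view described in this abstract can be sketched in Python with a plain eigendecomposition: count how many leading eigenvalues are needed to reach a given share of the total, and rebuild the GRM with the smaller eigenvalues set to zero. This is a minimal sketch assuming a symmetric, positive semi-definite GRM; it is not the authors' software, and the function names are hypothetical.

```python
import numpy as np

def eigen_summary(G):
    """Eigenvalues (descending), eigenvectors, and cumulative share."""
    vals, vecs = np.linalg.eigh(np.asarray(G, dtype=float))
    order = np.argsort(vals)[::-1]
    vals = np.clip(vals[order], 0.0, None)   # drop tiny negative round-off
    vecs = vecs[:, order]
    cum = np.cumsum(vals) / vals.sum()
    return vals, vecs, cum

def n_eigen_for(cum, fraction):
    """Smallest number of leading eigenvalues reaching `fraction`."""
    return int(min(np.searchsorted(cum, fraction) + 1, len(cum)))

def truncate_grm(G, k):
    """Rebuild G keeping only the k largest eigenvalues."""
    vals, vecs, _ = eigen_summary(G)
    return (vecs[:, :k] * vals[:k]) @ vecs[:, :k].T
```

For example, `n_eigen_for(cum, 0.5)` gives the number of eigenvalues explaining half of the variation, the kind of quantity the Results section reports.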


2021 ◽  
Vol 53 (1) ◽  
Author(s):  
Richard Bernstein ◽  
Manuel Du ◽  
Andreas Hoppe ◽  
Kaspar Bienefeld

Abstract Background With the completion of a single nucleotide polymorphism (SNP) chip for honey bees, the technical basis of genomic selection is laid. However, for its application in practice, methods to estimate genomic breeding values need to be adapted to the specificities of the genetics and breeding infrastructure of this species. Drone-producing queens (DPQ) are used for mating control, and usually, they head non-phenotyped colonies that will be placed on mating stations. Breeding queens (BQ) head colonies that are intended to be phenotyped and used to produce new queens. Our aim was to evaluate different breeding program designs for the initiation of genomic selection in honey bees. Methods Stochastic simulations were conducted to evaluate the quality of the estimated breeding values. We developed a variation of the genomic relationship matrix to include genotypes of DPQ and tested different sizes of the reference population. The results were used to estimate genetic gain in the initial selection cycle of a genomic breeding program. This program was run over six years, and different numbers of genotyped queens per year were considered. Resources could be allocated to increase the reference population, or to perform genomic preselection of BQ and/or DPQ. Results Including the genotypes of 5000 phenotyped BQ increased the accuracy of predictions of breeding values by up to 173%, depending on the size of the reference population and the trait considered. To initiate a breeding program, genotyping a minimum number of 1000 queens per year is required. In this case, genetic gain was highest when genomic preselection of DPQ was coupled with the genotyping of 10–20% of the phenotyped BQ. For maximum genetic gain per used genotype, more than 2500 genotyped queens per year and preselection of all BQ and DPQ are required. 
Conclusions This study shows that the first priority in a breeding program is to genotype phenotyped BQ to obtain a sufficiently large reference population, which allows successful genomic preselection of queens. To maximize genetic gain, DPQ should be preselected, and their genotypes included in the genomic relationship matrix. We suggest that the developed methods for genomic prediction are suitable for implementation in genomic honey bee breeding programs.


Crop Science ◽  
2014 ◽  
Vol 54 (3) ◽  
pp. 1115-1123 ◽  
Author(s):  
Patricio R. Munoz ◽  
Marcio F. R. Resende ◽  
Dudley A. Huber ◽  
Tania Quesada ◽  
Marcos D. V. Resende ◽  
...  

2011 ◽  
Vol 5 (Suppl 7) ◽  
pp. P60 ◽  
Author(s):  
Jaime Zapata-Valenzuela ◽  
Fikret Isik ◽  
Christian Maltecca ◽  
Jill Wegryzn ◽  
David Neale ◽  
...  

2011 ◽  
Vol 93 (3) ◽  
pp. 203-219 ◽  
Author(s):  
KATHRYN E. KEMPER ◽  
DAVID L. EMERY ◽  
STEPHEN C. BISHOP ◽  
HUTTON ODDY ◽  
BENJAMIN J. HAYES ◽  
...  

Summary Genetic resistance to gastrointestinal worms is a complex trait of great importance in both livestock and humans. In order to gain insights into the genetic architecture of this trait, a mixed breed population of sheep was artificially infected with Trichostrongylus colubriformis (n=3326) and then Haemonchus contortus (n=2669) to measure faecal worm egg count (WEC). The population was genotyped with the Illumina OvineSNP50 BeadChip and 48 640 single nucleotide polymorphism (SNP) markers passed the quality controls. An independent population of 316 sires of mixed breeds with accurate estimated breeding values for WEC were genotyped for the same SNP to assess the results obtained from the first population. We used principal components from the genomic relationship matrix among genotyped individuals to account for population stratification, and a novel approach to directly account for the sampling error associated with each SNP marker regression. The largest marker effects were estimated to explain an average of 0·48% (T. colubriformis) or 0·08% (H. contortus) of the phenotypic variance in WEC. These effects are small but consistent with results from other complex traits. We also demonstrated that methods which use all markers simultaneously can successfully predict genetic merit for resistance to worms, despite the small effects of individual markers. Correlations of genomic predictions with breeding values of the industry sires reached a maximum of 0·32. We estimate that effective across-breed predictions of genetic merit with multi-breed populations will require an average marker spacing of approximately 10 kbp.

