PSVIII-27 A weighted genomic relationship matrix based on FST prioritized SNPs for genomic selection

Abstract Using low- to moderate-density SNP marker panels, a substantial increase in accuracy was achieved. The dramatic increase in the number of identified variants due to advances in next-generation sequencing was expected to significantly increase the accuracy of genomic selection (GS). Unfortunately, little to no improvement was observed. For mixed model-based approaches, using all SNPs in the panel to compute the observed relationship matrix (G) will not increase accuracy, as the additive relationships between individuals can be accurately estimated using a much smaller number of markers. Due to these limitations, variant prioritization has become a necessity to improve accuracy. Further, it has been shown that weighting SNPs when calculating G could be effective in improving the accuracy of GS. FST, as a measure of population differentiation, has been successfully used to identify genome segments under selection pressure. Consequently, FST could be used both to prioritize SNPs and to derive their relative weights in the calculation of the genomic relationship matrix. A population of 15,000 animals genotyped for 400K SNP markers uniformly distributed along 10 chromosomes was simulated. A trait with heritability 0.3, genetically controlled by 200 QTL, was generated. The top 20K SNPs based on their FST scores were used either alone or with the remaining 380K SNPs to compute G with or without weighting. When only the top 20K SNPs were used to compute G, two scenarios were considered: 1) equal weights for all SNPs or 2) weights proportional to the SNP FST scores. When all 400K SNP markers were used, different weighting scenarios were evaluated. The results clearly showed that prioritizing SNP markers based on their FST scores and using the latter to compute relative weights increased the genetic similarity between training and validation animals and resulted in more than 5% improvement in the accuracy of GS.
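The prioritization step described in this abstract (score each SNP by FST, then keep the top-ranked markers) could be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which FST estimator was used or how subpopulations were defined, so the sketch assumes Nei's (H_T - H_S) / H_T form, 0/1/2 genotype coding, and a user-supplied group label per animal. The function names `fst_per_snp` and `top_snps` are hypothetical.

```python
import numpy as np

def fst_per_snp(geno, groups):
    """Per-SNP FST, assuming Nei's (H_T - H_S) / H_T form.

    geno   : (n_animals, n_snps) array of 0/1/2 allele counts
    groups : (n_animals,) array of subpopulation labels (an assumption here)
    """
    geno = np.asarray(geno, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    sizes = np.array([(groups == g).sum() for g in labels], dtype=float)
    w = sizes / sizes.sum()                      # subpopulation weights
    # Allele frequency per subpopulation (rows) and SNP (columns)
    p = np.vstack([geno[groups == g].mean(axis=0) / 2.0 for g in labels])
    p_bar = w @ p                                # pooled allele frequency
    h_t = 2.0 * p_bar * (1.0 - p_bar)            # total expected heterozygosity
    h_s = w @ (2.0 * p * (1.0 - p))              # mean within-group heterozygosity
    fst = np.zeros_like(h_t)
    poly = h_t > 0                               # monomorphic SNPs keep FST = 0
    fst[poly] = (h_t[poly] - h_s[poly]) / h_t[poly]
    return fst

def top_snps(fst, k):
    """Column indices of the k SNPs with the largest FST scores."""
    return np.argsort(fst)[::-1][:k]
```

In the setting of the abstract, `k` would correspond to the 20K prioritized SNPs out of 400K; the resulting scores could then also serve as relative weights.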

330 A hybrid model for genomic selection using prioritized SNPs based on FST scores in the presence of non-genotyped animals

Journal of Animal Science ◽

10.1093/jas/skz258.102 ◽

2019 ◽

Vol 97 (Supplement_3) ◽

pp. 51-51

Author(s):

Sajjad Toghiani ◽

Ling-Yun Chang ◽

El H Hay ◽

Andrew J Roberts ◽

Samuel E Aggrey ◽

...

Keyword(s):

Genomic Selection ◽

Hybrid Approach ◽

Computational Cost ◽

Simulated Data ◽

Snp Markers ◽

Genomic Relationship Matrix ◽

Polygenic Effect ◽

Relationship Matrix ◽

Continuous Increase ◽

Missing Genotypes

Abstract The dramatic advancement in genotyping technology has greatly reduced the complexity and cost of genotyping. The continuous increase in the density of marker panels is resulting in little to no improvement in the accuracy of genomic selection. Direct inversion of the genomic relationship matrix is infeasible for some livestock populations due to the excessive computational cost. In addition, most animals in genetic evaluation programs are non-genotyped. Including these animals in a genomic evaluation requires the imputation of the missing genotypes when using regression methods. To overcome these challenges, a hybrid approach is proposed. This approach fits a subset of SNP markers selected based on FST scores and a classical polygenic effect. The method was first tested using only genotyped animals and then extended to accommodate non-genotyped animals. The proposed approach was evaluated using simulated data for a trait with heritability of 0.1 or 0.4 and using weaning weight in a crossbred beef cattle population. When all animals were genotyped, the hybrid approach using only 2.5% of prioritized SNPs exceeded the prediction accuracies of BayesB, BayesC, and GBLUP by more than 7%. When non-genotyped animals were incorporated, the proposed approach significantly outperformed the ss-GBLUP method in terms of prediction accuracy under both simulated heritability scenarios. Although the results seem to depend on the genetic complexity of the trait, the proposed approach resulted in higher prediction accuracies than current methods. Furthermore, its computational costs in terms of CPU time and peak memory are substantially lower than those of the current methods.
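One way to picture a model that fits a subset of SNPs together with a classical polygenic effect is the mixed model y = 1μ + Ms + u + e, solved through Henderson's mixed model equations. The sketch below is only an illustration under several assumptions not stated in the abstract: the selected SNP effects s are treated as random with a common variance, the polygenic effect u has covariance proportional to a pedigree relationship matrix A that is supplied by the caller, and both variance ratios (`lam_snp`, `lam_poly`) are taken as known. The function name `hybrid_blup` is hypothetical.

```python
import numpy as np

def hybrid_blup(y, snps, a_mat, lam_snp, lam_poly):
    """Solve y = 1*mu + M s + u + e via mixed model equations.

    y        : (n,) phenotypes
    snps     : (n, k) genotypes of the prioritized SNPs (0/1/2)
    a_mat    : (n, n) pedigree relationship matrix (assumed given)
    lam_snp  : residual variance / SNP-effect variance (assumed known)
    lam_poly : residual variance / polygenic variance (assumed known)
    """
    y = np.asarray(y, dtype=float)
    m = np.asarray(snps, dtype=float)
    m = m - m.mean(axis=0)                     # center genotype columns
    n, k = m.shape
    a_inv = np.linalg.inv(np.asarray(a_mat, dtype=float))
    # Design matrix: overall mean, SNP effects, one polygenic effect per animal
    w = np.hstack([np.ones((n, 1)), m, np.eye(n)])
    lhs = w.T @ w
    lhs[1:1 + k, 1:1 + k] += lam_snp * np.eye(k)   # shrink SNP effects
    lhs[1 + k:, 1 + k:] += lam_poly * a_inv        # shrink polygenic effects
    sol = np.linalg.solve(lhs, w.T @ y)
    mu, snp_eff, poly = sol[0], sol[1:1 + k], sol[1 + k:]
    ebv = m @ snp_eff + poly                   # total predicted genetic value
    return mu, snp_eff, poly, ebv
```

Because only the prioritized SNPs enter M, the SNP block of the equations stays small. Extending the polygenic term to non-genotyped animals, as the abstract describes, is not covered by this sketch.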

Using the genomic relationship matrix to predict the accuracy of genomic selection

Journal of Animal Breeding and Genetics ◽

10.1111/j.1439-0388.2011.00964.x ◽

2011 ◽

Vol 128 (6) ◽

pp. 409-421 ◽

Cited By ~ 170

Author(s):

M.E. Goddard ◽

B.J. Hayes ◽

T.H.E. Meuwissen

Keyword(s):

Genomic Selection ◽

Genomic Relationship Matrix ◽

Relationship Matrix ◽

Genomic Relationship

A Weighted Genomic Relationship Matrix Based on Fixation Index (FST) Prioritized SNPs for Genomic Selection

Genes ◽

10.3390/genes10110922 ◽

2019 ◽

Vol 10 (11) ◽

pp. 922

Author(s):

Ling-Yun Chang ◽

Sajjad Toghiani ◽

El Hamidi Hay ◽

Samuel E. Aggrey ◽

Romdhane Rekaya

Keyword(s):

Genomic Selection ◽

Statistical Power ◽

Fixation Index ◽

Genomic Relationship Matrix ◽

Relationship Matrix ◽

Nucleotide Polymorphisms ◽

Genomic Relationship ◽

Single Nucleotide ◽

Relative Contribution ◽

Estimation Of Variance

A dramatic increase in the density of marker panels has been expected to increase the accuracy of genomic selection (GS). Unfortunately, little to no improvement has been observed. Including all variants in the association model dramatically increases the dimensionality of the problem, which can reduce statistical power. Using all single nucleotide polymorphisms (SNPs) to compute the genomic relationship matrix (G) does not necessarily increase accuracy, as the additive relationships can be accurately estimated using a much smaller number of markers. Due to these limitations, variant prioritization has become a necessity to improve accuracy. The fixation index (FST), as a measure of population differentiation, has been used to identify genome segments and variants under selection pressure. Using prioritized variants has increased the accuracy of GS. Additionally, FST can be used to weight the relative contribution of prioritized SNPs in computing G. In this study, relative weights based on FST scores were developed and incorporated into the calculation of G, and their impact on the estimation of variance components and accuracy was assessed. The results showed that prioritizing SNPs based on their FST scores resulted in an increase in the genetic similarity between training and validation animals and improved the accuracy of GS by more than 5%.
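Incorporating per-SNP weights into G can be sketched with a weighted form of the VanRaden-style matrix, G = Z D Z' / Σ 2 p_j (1 - p_j) d_j, where D is a diagonal matrix of weights. The abstract does not give the exact weighting formula, so this form, the use of raw FST scores as weights d_j, and the function name `weighted_g` are assumptions made for illustration.

```python
import numpy as np

def weighted_g(geno, weights=None):
    """Weighted genomic relationship matrix (VanRaden-style, assumed form).

    geno    : (n_animals, n_snps) array of 0/1/2 allele counts
    weights : (n_snps,) non-negative SNP weights, e.g. FST scores;
              None gives the unweighted matrix
    """
    geno = np.asarray(geno, dtype=float)
    n, m = geno.shape
    p = geno.mean(axis=0) / 2.0                # observed allele frequencies
    z = geno - 2.0 * p                         # centered genotypes
    d = np.ones(m) if weights is None else np.asarray(weights, dtype=float)
    denom = np.sum(2.0 * p * (1.0 - p) * d)    # weighted scaling term
    return (z * d) @ z.T / denom               # Z D Z' / denom
```

With this scaling, multiplying all weights by the same constant leaves G unchanged, so only the relative sizes of the weights matter. Setting a weight to zero drops that SNP entirely, which makes "top SNPs only" a special case of the weighted matrix.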

Genomic Heritability: A Ragged Diagonal Between Bias and Variance

10.1101/2021.09.19.460999 ◽

2021 ◽

Author(s):

Mitchell J. Feldmann ◽

Hans-Peter Piepho ◽

Steven J. Knapp

Keyword(s):

Mixed Model ◽

Dna Polymorphisms ◽

Breeding Value ◽

Genomic Relationship Matrix ◽

Relationship Matrix ◽

Genomic Relationship ◽

Model Framework ◽

Kinship Matrix ◽

Genomic Heritability ◽

A Genome

Many important traits in plants, animals, and microbes are polygenic and are therefore difficult to improve through traditional marker-assisted selection. Genomic prediction addresses this by enabling the inclusion of all genetic data in a mixed model framework. The main method for predicting breeding values is genomic best linear unbiased prediction (GBLUP), which uses the realized genomic relationship or kinship matrix (K) to connect genotype to phenotype. The use of relationship matrices allows information to be shared for estimating the genetic values for observed entries and predicting genetic values for unobserved entries. One of the key parameters of such models is genomic heritability (h2g), or the variance of a trait associated with a genome-wide sample of DNA polymorphisms. Here we discuss the relationship between several common methods for calculating the genomic relationship matrix and propose a new matrix based on the average semivariance that yields accurate estimates of genomic variance in the observed population regardless of the focal population quality as well as accurate breeding value predictions in unobserved samples. Notably, our proposed method is highly similar to the approach presented by Legarra (2016) despite different mathematical derivations and statistical perspectives and only deviates from the classic approach presented in VanRaden (2008) by a scaling factor. With current approaches, we found that the genomic heritability tends to be either over- or underestimated depending on the scaling and centering applied to the marker matrix (Z), the value of the average diagonal element of K, and the assortment of alleles and heterozygosity (H) in the observed population and that, unlike its predecessors, our newly proposed kinship matrix KASV yields accurate estimates of h2g in the observed population, generalizes to larger populations, and produces BLUPs equivalent to common methods in plants and animals.
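The idea of scaling a kinship matrix by its average semivariance can be sketched as below. The abstract does not state the formula, so this assumes that the average semivariance of a matrix K is tr(PKP) / (n - 1) with P = I - 11'/n, and that the proposed matrix is ZZ' rescaled so this quantity equals one; the published definition should be checked against the paper. A VanRaden-style matrix is included only to show that the two differ by a constant factor when Z is centered the same way. All function names are hypothetical.

```python
import numpy as np

def average_semivariance(k):
    """tr(P K P) / (n - 1) with P = I - 11'/n (assumed definition)."""
    k = np.asarray(k, dtype=float)
    n = k.shape[0]
    # tr(P K P) simplifies to tr(K) - sum(K) / n because P is idempotent
    return (np.trace(k) - k.sum() / n) / (n - 1)

def asv_kinship(geno):
    """ZZ' from column-centered genotypes, scaled to unit average semivariance."""
    z = np.asarray(geno, dtype=float)
    z = z - z.mean(axis=0)
    zz = z @ z.T
    return zz / average_semivariance(zz)

def vanraden_kinship(geno):
    """ZZ' / (2 * sum p(1 - p)) using observed allele frequencies."""
    g = np.asarray(geno, dtype=float)
    p = g.mean(axis=0) / 2.0
    z = g - 2.0 * p
    return z @ z.T / (2.0 * np.sum(p * (1.0 - p)))
```

Under these assumptions both matrices share the same ZZ' numerator, so any difference in estimated genomic variance comes from the denominator alone.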

Accuracy of genomic BLUP when considering a genomic relationship matrix based on the number of the largest eigenvalues: a simulation study

Genetics Selection Evolution ◽

10.1186/s12711-019-0516-0 ◽

2019 ◽

Vol 51 (1) ◽

Cited By ~ 3

Author(s):

Ivan Pocrnic ◽

Daniela A. L. Lourenco ◽

Yutaka Masuda ◽

Ignacy Misztal

Keyword(s):

Genomic Selection ◽

Large Fraction ◽

Genomic Variation ◽

Eigenvalue Decomposition ◽

Genomic Relationship Matrix ◽

Relationship Matrix ◽

Genomic Relationship ◽

Genomic Information ◽

Phenotypic Information ◽

Largest Eigenvalues

Abstract Background The dimensionality of genomic information is limited by the number of independent chromosome segments (Me), which is a function of the effective population size. This dimensionality can be determined approximately by singular value decomposition of the gene content matrix, by eigenvalue decomposition of the genomic relationship matrix (GRM), or by the number of core animals in the algorithm for proven and young (APY) that maximizes the accuracy of genomic prediction. In the latter, core animals act as proxies to linear combinations of Me. Field studies indicate that a moderate accuracy of genomic selection is achieved with a small dataset, but that further improvement of the accuracy requires much more data. When only one quarter of the optimal number of core animals are used in the APY algorithm, the accuracy of genomic selection is only slightly below the optimal value. This suggests that genomic selection works on clusters of Me. Results The simulation included datasets with different population sizes and amounts of phenotypic information. Computations were done by genomic best linear unbiased prediction (GBLUP) with selected eigenvalues and corresponding eigenvectors of the GRM set to zero. About four eigenvalues in the GRM explained 10% of the genomic variation, and less than 2% of the total eigenvalues explained 50% of the genomic variation. With limited phenotypic information, the accuracy of GBLUP was close to the peak where most of the smallest eigenvalues were set to zero. With a large amount of phenotypic information, accuracy increased as smaller eigenvalues were added. Conclusions A small amount of phenotypic data is sufficient to estimate only the effects of the largest eigenvalues and the associated eigenvectors that contain a large fraction of the genomic information, and a very large amount of data is required to estimate the remaining eigenvalues that account for a limited amount of genomic information. Core animals in the APY algorithm act as proxies of almost the same number of eigenvalues. By using an eigenvalues-based approach, it was possible to explain why the moderate accuracy of genomic selection based on small datasets only increases slowly as more data are added.
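The two eigenvalue operations mentioned in this abstract, counting how many of the largest eigenvalues explain a given share of the genomic variation and rebuilding the GRM with the remaining eigenvalues set to zero, can be sketched with a symmetric eigendecomposition. This is an illustrative sketch only; the function names are hypothetical, and how the truncated matrix was then used inside GBLUP is not specified in the abstract.

```python
import numpy as np

def n_eigen_for_variance(g, fraction):
    """Smallest number of largest eigenvalues whose sum reaches `fraction`
    of the total eigenvalue sum of the symmetric matrix g."""
    vals = np.linalg.eigvalsh(np.asarray(g, dtype=float))[::-1]  # descending
    vals = np.clip(vals, 0.0, None)          # guard against tiny negative values
    cum = np.cumsum(vals) / vals.sum()
    # Small tolerance so round-off does not push the count up by one
    return int(np.argmax(cum >= fraction - 1e-12)) + 1

def truncate_grm(g, n_keep):
    """Rebuild g from its n_keep largest eigenvalues; the rest are set to zero."""
    vals, vecs = np.linalg.eigh(np.asarray(g, dtype=float))
    order = np.argsort(vals)[::-1][:n_keep]  # indices of the largest eigenvalues
    v = vecs[:, order]
    return (v * vals[order]) @ v.T
```

The truncated matrix is rank-deficient by construction, so it cannot be inverted directly; any use in a prediction model would need to work with the retained eigenvectors or a generalized inverse.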

Simulation studies to optimize genomic selection in honey bees

Genetics Selection Evolution ◽

10.1186/s12711-021-00654-x ◽

2021 ◽

Vol 53 (1) ◽

Author(s):

Richard Bernstein ◽

Manuel Du ◽

Andreas Hoppe ◽

Kaspar Bienefeld

Keyword(s):

Genomic Selection ◽

Genetic Gain ◽

Honey Bees ◽

Reference Population ◽

Breeding Program ◽

Genomic Relationship Matrix ◽

Relationship Matrix ◽

Genomic Relationship ◽

Genomic Breeding ◽

Breeding Values

Abstract Background With the completion of a single nucleotide polymorphism (SNP) chip for honey bees, the technical basis of genomic selection is laid. However, for its application in practice, methods to estimate genomic breeding values need to be adapted to the specificities of the genetics and breeding infrastructure of this species. Drone-producing queens (DPQ) are used for mating control, and usually, they head non-phenotyped colonies that will be placed on mating stations. Breeding queens (BQ) head colonies that are intended to be phenotyped and used to produce new queens. Our aim was to evaluate different breeding program designs for the initiation of genomic selection in honey bees. Methods Stochastic simulations were conducted to evaluate the quality of the estimated breeding values. We developed a variation of the genomic relationship matrix to include genotypes of DPQ and tested different sizes of the reference population. The results were used to estimate genetic gain in the initial selection cycle of a genomic breeding program. This program was run over six years, and different numbers of genotyped queens per year were considered. Resources could be allocated to increase the reference population, or to perform genomic preselection of BQ and/or DPQ. Results Including the genotypes of 5000 phenotyped BQ increased the accuracy of predictions of breeding values by up to 173%, depending on the size of the reference population and the trait considered. To initiate a breeding program, genotyping a minimum number of 1000 queens per year is required. In this case, genetic gain was highest when genomic preselection of DPQ was coupled with the genotyping of 10–20% of the phenotyped BQ. For maximum genetic gain per used genotype, more than 2500 genotyped queens per year and preselection of all BQ and DPQ are required. 
Conclusions This study shows that the first priority in a breeding program is to genotype phenotyped BQ to obtain a sufficiently large reference population, which allows successful genomic preselection of queens. To maximize genetic gain, DPQ should be preselected, and their genotypes included in the genomic relationship matrix. We suggest that the developed methods for genomic prediction are suitable for implementation in genomic honey bee breeding programs.

Genomic Relationship Matrix for Correcting Pedigree Errors in Breeding Populations: Impact on Genetic Parameters and Genomic Selection Accuracy

Crop Science ◽

10.2135/cropsci2012.12.0673 ◽

2014 ◽

Vol 54 (3) ◽

pp. 1115-1123 ◽

Cited By ~ 38

Author(s):

Patricio R. Munoz ◽

Marcio F. R. Resende ◽

Dudley A. Huber ◽

Tania Quesada ◽

Marcos D. V. Resende ◽

...

Keyword(s):

Genomic Selection ◽

Genetic Parameters ◽

Genomic Relationship Matrix ◽

Relationship Matrix ◽

Genomic Relationship ◽

Selection Accuracy ◽

Breeding Populations ◽

Pedigree Errors

Genomic selection using a realized genomic relationship matrix in a Pinus taeda L. cloned population

BMC Proceedings ◽

10.1186/1753-6561-5-s7-p60 ◽

2011 ◽

Vol 5 (Suppl 7) ◽

pp. P60 ◽

Cited By ~ 1

Author(s):

Jaime Zapata-Valenzuela ◽

Fikret Isik ◽

Christian Maltecca ◽

Jill Wegrzyn ◽

David Neale ◽

...

Keyword(s):

Genomic Selection ◽

Pinus Taeda ◽

Genomic Relationship Matrix ◽

Relationship Matrix ◽

Genomic Relationship ◽

Pinus Taeda L

The distribution of SNP marker effects for faecal worm egg count in sheep, and the feasibility of using these markers to predict genetic merit for resistance to worm infections

Genetics Research ◽

10.1017/s0016672311000097 ◽

2011 ◽

Vol 93 (3) ◽

pp. 203-219 ◽

Cited By ~ 51

Author(s):

Kathryn E. Kemper ◽

David L. Emery ◽

Stephen C. Bishop ◽

Hutton Oddy ◽

Benjamin J. Hayes ◽

...

Keyword(s):

Complex Traits ◽

Sampling Error ◽

Snp Markers ◽

Snp Marker ◽

Genomic Relationship Matrix ◽

Relationship Matrix ◽

Phenotypic Variance ◽

Breeding Values ◽

Genetic Merit ◽

Independent Population

Summary Genetic resistance to gastrointestinal worms is a complex trait of great importance in both livestock and humans. In order to gain insights into the genetic architecture of this trait, a mixed breed population of sheep was artificially infected with Trichostrongylus colubriformis (n=3326) and then Haemonchus contortus (n=2669) to measure faecal worm egg count (WEC). The population was genotyped with the Illumina OvineSNP50 BeadChip and 48 640 single nucleotide polymorphism (SNP) markers passed the quality controls. An independent population of 316 sires of mixed breeds with accurate estimated breeding values for WEC were genotyped for the same SNP to assess the results obtained from the first population. We used principal components from the genomic relationship matrix among genotyped individuals to account for population stratification, and a novel approach to directly account for the sampling error associated with each SNP marker regression. The largest marker effects were estimated to explain an average of 0·48% (T. colubriformis) or 0·08% (H. contortus) of the phenotypic variance in WEC. These effects are small but consistent with results from other complex traits. We also demonstrated that methods which use all markers simultaneously can successfully predict genetic merit for resistance to worms, despite the small effects of individual markers. Correlations of genomic predictions with breeding values of the industry sires reached a maximum of 0·32. We estimate that effective across-breed predictions of genetic merit with multi-breed populations will require an average marker spacing of approximately 10 kbp.

A recursive algorithm for decomposition and creation of the inverse of the genomic relationship matrix

Journal of Dairy Science ◽

10.3168/jds.2011-5249 ◽

2012 ◽

Vol 95 (10) ◽

pp. 6093-6102 ◽

Cited By ~ 6

Author(s):

P. Faux ◽

N. Gengler ◽

I. Misztal

Keyword(s):

Recursive Algorithm ◽

Genomic Relationship Matrix ◽

Relationship Matrix ◽

Genomic Relationship
