Accuracy of genomic BLUP when considering a genomic relationship matrix based on the number of the largest eigenvalues: a simulation study

Abstract

Background: The dimensionality of genomic information is limited by the number of independent chromosome segments (Me), which is a function of the effective population size. This dimensionality can be determined approximately by singular value decomposition of the gene content matrix, by eigenvalue decomposition of the genomic relationship matrix (GRM), or by the number of core animals in the algorithm for proven and young (APY) that maximizes the accuracy of genomic prediction. In the latter, core animals act as proxies to linear combinations of Me. Field studies indicate that a moderate accuracy of genomic selection is achieved with a small dataset, but that further improvement of the accuracy requires much more data. When only one quarter of the optimal number of core animals are used in the APY algorithm, the accuracy of genomic selection is only slightly below the optimal value. This suggests that genomic selection works on clusters of Me.

Results: The simulation included datasets with different population sizes and amounts of phenotypic information. Computations were done by genomic best linear unbiased prediction (GBLUP) with selected eigenvalues and corresponding eigenvectors of the GRM set to zero. About four eigenvalues in the GRM explained 10% of the genomic variation, and less than 2% of the total eigenvalues explained 50% of the genomic variation. With limited phenotypic information, the accuracy of GBLUP was close to the peak where most of the smallest eigenvalues were set to zero. With a large amount of phenotypic information, accuracy increased as smaller eigenvalues were added.

Conclusions: A small amount of phenotypic data is sufficient to estimate only the effects of the largest eigenvalues and the associated eigenvectors that contain a large fraction of the genomic information, and a very large amount of data is required to estimate the remaining eigenvalues that account for a limited amount of genomic information. Core animals in the APY algorithm act as proxies of almost the same number of eigenvalues. By using an eigenvalues-based approach, it was possible to explain why the moderate accuracy of genomic selection based on small datasets only increases slowly as more data are added.
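
As an illustration of the eigenvalue-based idea in this abstract, the following is a minimal sketch, not the authors' implementation. It assumes a GRM built with the common VanRaden-style scaling from a 0/1/2 gene content matrix, counts how many of the largest eigenvalues are needed to reach a given share of the total, and rebuilds a GRM with all remaining eigenvalues set to zero. The function names and the random toy genotypes are hypothetical.

```python
import numpy as np

def vanraden_grm(M):
    """GRM from an (animals x SNPs) matrix of 0/1/2 gene content (assumed scaling)."""
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0                  # observed allele frequencies
    Z = M - 2.0 * p                           # centre each SNP column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def n_eigen_for_variance(G, fraction):
    """Smallest number of largest eigenvalues whose sum reaches `fraction` of the total."""
    vals = np.linalg.eigvalsh(G)[::-1]        # eigvalsh is ascending; reverse it
    vals = np.clip(vals, 0.0, None)           # remove tiny negative round-off
    cum = np.cumsum(vals) / vals.sum()
    k = int(np.searchsorted(cum, fraction)) + 1
    return min(k, vals.size)

def truncate_grm(G, k):
    """Rebuild G from its k largest eigenvalues; all others are set to zero."""
    vals, vecs = np.linalg.eigh(G)
    keep = np.argsort(vals)[::-1][:k]
    return (vecs[:, keep] * vals[keep]) @ vecs[:, keep].T

# Toy data only: 50 animals, 500 unlinked SNPs (not the simulated populations of the study).
rng = np.random.default_rng(0)
M = rng.binomial(2, 0.3, size=(50, 500))
G = vanraden_grm(M)
k50 = n_eigen_for_variance(G, 0.50)
k90 = n_eigen_for_variance(G, 0.90)
G90 = truncate_grm(G, k90)
print(k50, k90)
```

With unlinked random markers the eigenvalues are fairly flat, so the toy counts will not resemble the steep spectrum reported for structured populations. The truncated matrix is singular by construction, so using it in GBLUP would need a solver that handles a reduced-rank GRM.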


34 Dimensionality of Genomic Information and Its Impact on GWA and Variant Selection: A Simulation Study

Journal of Animal Science ◽

10.1093/jas/skab235.033 ◽

2021 ◽

Vol 99 (Supplement_3) ◽

pp. 20-20

Author(s):

Sungbong Jang ◽

Shogo Tsuruta ◽

Natalia Leite ◽

Ignacy Misztal ◽

Daniela Lourenco

Keyword(s):

Effect Size ◽

Sequence Data ◽

Eigenvalue Decomposition ◽

P Value ◽

Genomic Relationship Matrix ◽

Relationship Matrix ◽

Variant Selection ◽

Genomic Information ◽

Largest Eigenvalues ◽

The Impact

Abstract The ability to identify true-positive variants increases as more genotyped animals are available. Although thousands of animals can be genotyped, the dimensionality of the genomic information is limited. Therefore, there is a certain number of animals that represent all chromosome segments (Me) segregating in the population. The number of Me can be approximated from the eigenvalue decomposition of the genomic relationship matrix (G). Thus, the limited dimensionality may help to identify the number of animals to be used in genome-wide association (GWA). The first objective of this study was to examine different discovery set sizes for GWA, with set sizes based on the number of largest eigenvalues explaining a certain proportion of variance in G. Additionally, we investigated the impact of adding variants selected from different set sizes to the regular SNP chip used for genomic prediction. Sequence data were simulated that contained 500k SNP and 2k QTL, where the genetic variance was fully explained by QTL. The GWA was conducted using a number of genotyped animals equal to the number of largest eigenvalues of G (EIG) explaining 50, 60, 70, 80, 90, 95, 98, and 99 percent of the variance in G. Significant SNP had a p-value lower than 0.05 with Bonferroni correction. Further, SNP with the largest effect sizes (top 10, 100, 500, 1k, 2k, and 4k) were also selected to be incorporated into the regular 50k chip. Genomic predictions using the 50k chip combined with selected SNP were conducted using single-step GBLUP (ssGBLUP). Using the number of animals corresponding to at least EIG98 enabled the identification of the largest effect size QTL. The greatest accuracy of prediction was obtained when the top 2k SNP were combined with the 50k chip. The dimensionality of genomic information should be taken into account for variant selection in GWAS.
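
The two selection rules mentioned in this abstract (a Bonferroni-corrected significance threshold and a top-k cut by effect size) can be sketched as follows. This is an illustrative sketch under assumptions: the abstract does not say how many tests entered the correction or how effect sizes were ranked, so the sketch divides 0.05 by the number of p-values supplied and ranks by absolute effect. Function names and the example numbers are hypothetical.

```python
import numpy as np

def bonferroni_significant(p_values, alpha=0.05):
    """Indices of tests with p-value below alpha divided by the number of tests."""
    p_values = np.asarray(p_values, dtype=float)
    threshold = alpha / p_values.size         # Bonferroni-corrected threshold
    return np.flatnonzero(p_values < threshold)

def top_k_by_effect(effects, k):
    """Indices of the k SNP with the largest absolute estimated effect."""
    effects = np.asarray(effects, dtype=float)
    order = np.argsort(-np.abs(effects), kind="stable")
    return order[:k]

# Made-up values for four SNP; the corrected threshold here is 0.05 / 4 = 0.0125.
p_vals = [1e-9, 0.04, 1e-3, 0.2]
effects = [0.1, -2.0, 0.5, 1.5]
significant = bonferroni_significant(p_vals)
top2 = top_k_by_effect(effects, 2)
print(significant, top2)
```

If, for example, 500,000 tests were run, the same rule would give a threshold of 0.05 / 500,000 = 1e-7.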


Using the genomic relationship matrix to predict the accuracy of genomic selection

Journal of Animal Breeding and Genetics ◽

10.1111/j.1439-0388.2011.00964.x ◽

2011 ◽

Vol 128 (6) ◽

pp. 409-421 ◽

Cited By ~ 170

Author(s):

M.E. Goddard ◽

B.J. Hayes ◽

T.H.E. Meuwissen

Keyword(s):

Genomic Selection ◽

Genomic Relationship Matrix ◽

Relationship Matrix ◽

Genomic Relationship


Accuracy of genomic selection predictions for hip height in Brahman cattle using different relationship matrices

Pesquisa Agropecuária Brasileira ◽

10.1590/s0100-204x2018000600008 ◽

2018 ◽

Vol 53 (6) ◽

pp. 717-726 ◽

Cited By ~ 1

Author(s):

Michel Marques Farah ◽

Marina Rufino Salinas Fortes ◽

Matthew Kelly ◽

Laercio Ribeiro Porto-Neto ◽

Camila Tangari Meira ◽

...

Keyword(s):

Allele Frequency ◽

High Density ◽

Genomic Relationship Matrix ◽

Relationship Matrix ◽

Genomic Relationship ◽

Genomic Information ◽

Pedigree Data ◽

Snp Chip ◽

Numerator Relationship Matrix ◽

Brahman Cattle

Abstract: The objective of this work was to evaluate the effects of genomic information on the genetic evaluation of hip height in Brahman cattle using different matrices built from genomic and pedigree data. Hip height measurements from 1,695 animals, genotyped with high-density SNP chip or imputed from 50 K high-density SNP chip, were used. The numerator relationship matrix (NRM) was compared with the H matrix, which incorporated the NRM and genomic relationship (G) matrix simultaneously. The genotypes were used to estimate three versions of G: observed allele frequency (HGOF), average minor allele frequency (HGMF), and frequency of 0.5 for all markers (HG50). For matrix comparisons, animal data were either used in full or divided into calibration (80% older animals) and validation (20% younger animals) datasets. The accuracy values for the NRM, HGOF, and HG50 were 0.776, 0.813, and 0.594, respectively. The NRM and HGOF showed similar minor variances for diagonal and off-diagonal elements, as well as for estimated breeding values. The use of genomic information resulted in relationship estimates similar to those obtained based on pedigree; however, HGOF is the best option for estimating the genomic relationship matrix and results in a higher prediction accuracy. The ranking of the top 20% animals was very similar for all matrices, but the ranking within them varies depending on the method used.
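
The three allele-frequency choices for G named in this abstract can be illustrated with a small sketch. It assumes VanRaden-style centring and scaling, and it reads "average minor allele frequency" as one mean MAF applied to every marker; both are assumptions, since the abstract does not give the formulas. The sketch builds G only, not the combined H matrix, and the dictionary keys are hypothetical labels.

```python
import numpy as np

def grm(M, freq):
    """G = ZZ' / (2 * sum p(1-p)) with Z = M - 2p; `freq` is a scalar or one value per SNP."""
    M = np.asarray(M, dtype=float)
    p = np.broadcast_to(np.asarray(freq, dtype=float), (M.shape[1],))
    Z = M - 2.0 * p
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def three_grms(M):
    """G under observed frequencies, a single mean MAF, and 0.5 for all markers."""
    M = np.asarray(M, dtype=float)
    p_obs = M.mean(axis=0) / 2.0
    maf = np.minimum(p_obs, 1.0 - p_obs)
    return {
        "observed": grm(M, p_obs),
        "mean_maf": grm(M, maf.mean()),
        "half": grm(M, 0.5),
    }

# Toy genotypes only (20 animals, 200 SNP), not the Brahman data.
rng = np.random.default_rng(1)
M = rng.binomial(2, 0.4, size=(20, 200))
Gs = three_grms(M)
for name, G in Gs.items():
    print(name, round(float(np.mean(np.diag(G))), 3))
```

Changing the assumed frequencies shifts both the centring and the scaling of G, which is one way the diagonal and off-diagonal elements can differ between versions.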


A Weighted Genomic Relationship Matrix Based on Fixation Index (FST) Prioritized SNPs for Genomic Selection

Genes ◽

10.3390/genes10110922 ◽

2019 ◽

Vol 10 (11) ◽

pp. 922

Author(s):

Ling-Yun Chang ◽

Sajjad Toghiani ◽

El Hamidi Hay ◽

Samuel E. Aggrey ◽

Romdhane Rekaya

Keyword(s):

Genomic Selection ◽

Statistical Power ◽

Fixation Index ◽

Genomic Relationship Matrix ◽

Relationship Matrix ◽

Nucleotide Polymorphisms ◽

Genomic Relationship ◽

Single Nucleotide ◽

Relative Contribution ◽

Estimation Of Variance

A dramatic increase in the density of marker panels was expected to increase the accuracy of genomic selection (GS). Unfortunately, little to no improvement has been observed. Including all variants in the association model dramatically increases the dimensionality of the problem, which could reduce the statistical power. Using all single nucleotide polymorphisms (SNPs) to compute the genomic relationship matrix (G) does not necessarily increase accuracy, as the additive relationships can be accurately estimated using a much smaller number of markers. Due to these limitations, variant prioritization has become a necessity to improve accuracy. The fixation index (FST), a measure of population differentiation, has been used to identify genome segments and variants under selection pressure. Using prioritized variants has increased the accuracy of GS. Additionally, FST can be used to weight the relative contribution of prioritized SNPs in computing G. In this study, relative weights based on FST scores were developed and incorporated into the calculation of G, and their impact on the estimation of variance components and accuracy was assessed. The results showed that prioritizing SNPs based on their FST scores resulted in an increase in the genetic similarity between training and validation animals and improved the accuracy of GS by more than 5%.
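
One common way to let per-SNP weights enter G is a diagonal weight matrix D, giving G_w = Z D Z' / (2 Σ p(1-p)). The sketch below assumes that form and rescales the weights to a mean of 1 so that equal weights reproduce the unweighted matrix. The abstract does not state the authors' exact weighting scheme, so this is an assumption, and the function name and toy weights are hypothetical.

```python
import numpy as np

def weighted_grm(M, weights):
    """Weighted GRM Z D Z' / (2 * sum p(1-p)), with weights rescaled to mean 1 (assumed form)."""
    M = np.asarray(M, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w * w.size / w.sum()                  # rescale so the weights average 1
    p = M.mean(axis=0) / 2.0                  # observed allele frequencies
    Z = M - 2.0 * p
    return (Z * w) @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

# Toy genotypes and arbitrary positive weights standing in for FST-based scores.
rng = np.random.default_rng(2)
M = rng.binomial(2, 0.35, size=(15, 100))
scores = rng.uniform(0.01, 1.0, size=100)
G_equal = weighted_grm(M, np.ones(100))
G_weighted = weighted_grm(M, scores)
print(np.allclose(G_equal, G_weighted))
```

Multiplying the columns of Z by the weights is equivalent to forming Z D Z' without building the diagonal matrix explicitly.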


295 Parameter estimation under genomic selection

Journal of Animal Science ◽

10.1093/jas/skaa278.056 ◽

2020 ◽

Vol 98 (Supplement_4) ◽

pp. 31-32

Author(s):

Ignacy Misztal

Keyword(s):

Parameter Estimation ◽

Genomic Selection ◽

Sparse Matrix ◽

Parameters Estimation ◽

Random Regression ◽

Genomic Relationship Matrix ◽

Correlated Response ◽

Relationship Matrix ◽

Genomic Information ◽

Over Time

Abstract Genetic parameters are important in animal breeding for many tasks, including as input to a model for genetic evaluation, to estimate genetic gain due to selection, and to estimate correlated response due to selection on major traits. Before the genomic era, parameter estimation was facilitated by the sparse structure of the mixed model equations. Methods such as AI REML with sparse matrix inversion or MCMC via Gibbs sampling could estimate parameters for populations exceeding 1 million animals. With genomic selection (GS) and single-step GBLUP, the genomic matrices are mostly dense, and the costs of parameter estimation have increased dramatically. The estimation with 20K genotyped animals can take many days. Details in matching pedigree and genomic information influence estimated parameters. Estimation without the genomic information when GS is practiced leads to biases due to genomic preselection. Truncating data to too few generations or to only genotyped animals leads to additional biases by excluding data on which the selection was practiced. Current studies indicate strong declines in heritability due to GS. Regular models for parameter estimation compute parameters only for the base population. Models that trace changes of parameters over time, such as a random regression model on year of birth or a multiple-trait model treating time slices as separate traits, are very expensive. A good compromise in parameter estimation under GS is to use slices of only 2–3 generations, with genotypes of young animals removed. When complete populations are genotyped, estimations with a large number of genotyped animals are possible either with a SNP model or with GBLUP (inversion of the genomic relationship matrix by the APY algorithm). For simple models, Method R can provide estimates for any data size. An indirect indication of changing parameters over time is reduced predictivity or lower genetic trend despite increased data. Parameter estimation in GS would benefit from new, efficient tools.


Simulation studies to optimize genomic selection in honey bees

Genetics Selection Evolution ◽

10.1186/s12711-021-00654-x ◽

2021 ◽

Vol 53 (1) ◽

Author(s):

Richard Bernstein ◽

Manuel Du ◽

Andreas Hoppe ◽

Kaspar Bienefeld

Keyword(s):

Genomic Selection ◽

Genetic Gain ◽

Honey Bees ◽

Reference Population ◽

Breeding Program ◽

Genomic Relationship Matrix ◽

Relationship Matrix ◽

Genomic Relationship ◽

Genomic Breeding ◽

Breeding Values

Abstract

Background: With the completion of a single nucleotide polymorphism (SNP) chip for honey bees, the technical basis of genomic selection is laid. However, for its application in practice, methods to estimate genomic breeding values need to be adapted to the specificities of the genetics and breeding infrastructure of this species. Drone-producing queens (DPQ) are used for mating control, and usually, they head non-phenotyped colonies that will be placed on mating stations. Breeding queens (BQ) head colonies that are intended to be phenotyped and used to produce new queens. Our aim was to evaluate different breeding program designs for the initiation of genomic selection in honey bees.

Methods: Stochastic simulations were conducted to evaluate the quality of the estimated breeding values. We developed a variation of the genomic relationship matrix to include genotypes of DPQ and tested different sizes of the reference population. The results were used to estimate genetic gain in the initial selection cycle of a genomic breeding program. This program was run over six years, and different numbers of genotyped queens per year were considered. Resources could be allocated to increase the reference population, or to perform genomic preselection of BQ and/or DPQ.

Results: Including the genotypes of 5000 phenotyped BQ increased the accuracy of predictions of breeding values by up to 173%, depending on the size of the reference population and the trait considered. To initiate a breeding program, genotyping a minimum number of 1000 queens per year is required. In this case, genetic gain was highest when genomic preselection of DPQ was coupled with the genotyping of 10–20% of the phenotyped BQ. For maximum genetic gain per used genotype, more than 2500 genotyped queens per year and preselection of all BQ and DPQ are required.

Conclusions: This study shows that the first priority in a breeding program is to genotype phenotyped BQ to obtain a sufficiently large reference population, which allows successful genomic preselection of queens. To maximize genetic gain, DPQ should be preselected, and their genotypes included in the genomic relationship matrix. We suggest that the developed methods for genomic prediction are suitable for implementation in genomic honey bee breeding programs.


Genomic Relationship Matrix for Correcting Pedigree Errors in Breeding Populations: Impact on Genetic Parameters and Genomic Selection Accuracy

Crop Science ◽

10.2135/cropsci2012.12.0673 ◽

2014 ◽

Vol 54 (3) ◽

pp. 1115-1123 ◽

Cited By ~ 38

Author(s):

Patricio R. Munoz ◽

Marcio F. R. Resende ◽

Dudley A. Huber ◽

Tania Quesada ◽

Marcos D. V. Resende ◽

...

Keyword(s):

Genomic Selection ◽

Genetic Parameters ◽

Genomic Relationship Matrix ◽

Relationship Matrix ◽

Genomic Relationship ◽

Selection Accuracy ◽

Breeding Populations ◽

Pedigree Errors


Genomic selection using a realized genomic relationship matrix in a Pinus taeda L. cloned population

BMC Proceedings ◽

10.1186/1753-6561-5-s7-p60 ◽

2011 ◽

Vol 5 (Suppl 7) ◽

pp. P60 ◽

Cited By ~ 1

Author(s):

Jaime Zapata-Valenzuela ◽

Fikret Isik ◽

Christian Maltecca ◽

Jill Wegryzn ◽

David Neale ◽

...

Keyword(s):

Genomic Selection ◽

Pinus Taeda ◽

Genomic Relationship Matrix ◽

Relationship Matrix ◽

Genomic Relationship ◽

Pinus Taeda L


PSVIII-27 A weighted genomic relationship matrix based on FST prioritized SNPs for genomic selection

Journal of Animal Science ◽

10.1093/jas/skz258.533 ◽

2019 ◽

Vol 97 (Supplement_3) ◽

pp. 262-262

Author(s):

Ling-Yun Chang ◽

Sajjad Toghiani ◽

El Hamidi Hay ◽

Samuel E Aggrey ◽

Romdhane Rekaya

Keyword(s):

Genomic Selection ◽

Mixed Model ◽

Relative Weight ◽

Snp Markers ◽

Genomic Relationship Matrix ◽

Relationship Matrix ◽

Genomic Relationship ◽

Improve Accuracy ◽

Increase In Accuracy ◽

Generation Sequencing

Abstract Using low to moderate density SNP marker panels, a substantial increase in accuracy was achieved. The dramatic increase in the number of identified variants due to advances in next generation sequencing was expected to significantly increase the accuracy of genomic selection (GS). Unfortunately, little to no improvement was observed. For mixed model-based approaches, using all SNPs in the panel to compute the observed relationship matrix (G) will not increase accuracy, as the additive relationships between individuals can be accurately estimated using a much smaller number of markers. Due to these limitations, variant prioritization has become a necessity to improve accuracy. Further, it has been shown that weighting SNPs when calculating G could be effective in improving the accuracy of GS. FST, a measure of population differentiation, has been successfully used to identify genome segments under selection pressure. Consequently, FST could be used both to prioritize SNPs and to derive their relative weight in the calculation of the genomic relationship matrix. A population of 15,000 animals genotyped for 400K SNP markers uniformly distributed along 10 chromosomes was simulated. A trait with heritability 0.3 genetically controlled by two hundred QTL was generated. The top 20K SNPs based on their FST scores were used either alone or with the remaining 380K SNPs to compute G with or without weighting. When only the top 20K SNPs were used to compute G, two scenarios were considered: 1) equal weights for all SNPs or 2) weights proportional to the SNP FST scores. When all 400K SNP markers were used, different weighting scenarios were evaluated. The results clearly showed that prioritizing SNP markers based on their FST scores and using the latter to compute relative weights increased the genetic similarity between training and validation animals and resulted in more than 5% improvement in the accuracy of GS.
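
The prioritization step described in this abstract can be sketched with a simple per-SNP FST between two groups, using the heterozygosity form FST = (H_T − H_S) / H_T with equal group weights. The abstract does not say which FST estimator or which groups of animals were used, so both are assumptions here, and the function names and toy genotypes are hypothetical.

```python
import numpy as np

def fst_per_snp(M_a, M_b):
    """Per-SNP FST between two groups from 0/1/2 genotypes, equal group weights (assumed)."""
    p_a = np.asarray(M_a, dtype=float).mean(axis=0) / 2.0
    p_b = np.asarray(M_b, dtype=float).mean(axis=0) / 2.0
    p_bar = (p_a + p_b) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)                  # expected heterozygosity, pooled
    h_s = p_a * (1.0 - p_a) + p_b * (1.0 - p_b)        # mean within-group heterozygosity
    fst = np.zeros_like(h_t)
    np.divide(h_t - h_s, h_t, out=fst, where=h_t > 0)  # monomorphic SNP get FST = 0
    return fst

def top_k_snps(scores, k):
    """Indices of the k SNP with the highest scores."""
    return np.argsort(-np.asarray(scores, dtype=float), kind="stable")[:k]

# Two tiny made-up groups: SNP 0 is fixed for opposite alleles, SNP 1 has equal frequencies.
group_a = [[2, 1], [2, 1]]
group_b = [[0, 1], [0, 1]]
fst = fst_per_snp(group_a, group_b)
best = top_k_snps(fst, 1)
print(fst, best)
```

The resulting scores could serve either to choose a subset of SNP (such as the top 20K) or as relative weights when computing G, matching the two uses named in the abstract.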
