genomic relationship matrix

BMC Genomics ◽  
2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Cornelius Nel ◽  
Phillip Gurman ◽  
Andrew Swan ◽  
Julius van der Werf ◽  
Margaretha Snyman ◽  
...  

Abstract Background South Africa and Australia share multiple important sheep breeds. For some of these breeds, genomic breeding values are provided to breeders in Australia, but not yet in South Africa. Combining genomic resources could facilitate development for across-country selection, but the influence of population structures could be important to the compatibility of genomic data from varying origins. The genetic structure within and across breeds, countries and strains was evaluated in this study by population genomic parameters derived from SNP-marker data. Populations were first analysed by breed and country of origin and then by subpopulations of South African and Australian Merinos. Results Mean estimated relatedness according to the genomic relationship matrix varied by breed (-0.11 to 0.16) and bloodline (-0.08 to 0.06) groups and depended on co-ancestry as well as recent genetic links. Measures of divergence across bloodlines (FST: 0.04–0.12) were sometimes more distant than across some breeds (FST: 0.05–0.24), but the divergence of common breeds from their across-country equivalents was weak (FST: 0.01–0.04). According to mean relatedness, FST, PCA and Admixture, the Australian Ultrafine line was better connected to the SA Cradock Fine Wool flock than with other AUS bloodlines. Levels of linkage disequilibrium (LD) between adjacent markers were generally low, but also varied across breeds (r2: 0.14–0.22) as well as bloodlines (r2: 0.15–0.19). Patterns of LD decay were also unique to breeds, but bloodlines differed only at the absolute level. Estimates of effective population size (Ne) showed genetic diversity to be high for the majority of breeds (Ne: 128–418) but also for bloodlines (Ne: 137–369). Conclusions This study reinforced the genetic complexity and diversity of important sheep breeds, especially the Merino breed. The results also showed that implications of isolation can be highly variable and extended beyond breed structures. 
However, knowledge of useful links across these population substructures allows for a fine-tuned approach in the combination of genomic resources. Isolation across country rarely proved restricting compared to other structures considered. Consequently, research into the accuracy of across-country genomic prediction is recommended.


Author(s):  
Osval Antonio Montesinos López ◽  
Abelardo Montesinos López ◽  
Jose Crossa

Abstract This data preparation chapter is of paramount importance for implementing statistical machine learning methods for genomic selection. We present the basic linear mixed model that gives rise to BLUE and BLUP and explain how to decide when to use fixed or random effects that give rise to best linear unbiased estimates (BLUE or BLUEs) and best linear unbiased predictors (BLUP or BLUPs). R code for fitting linear mixed models to the data is given in small examples. We emphasize tools for computing BLUEs and BLUPs for many linear combinations of interest in genomic-enabled prediction and plant breeding. We present tools for cleaning, imputing, and detecting minor and major allele frequency computation, marker recodification, frequency of heterogeneous, frequency of NAs, and three methods for computing the genomic relationship matrix. In addition, scaling and data compression of inputs are important in statistical machine learning. For a more extensive description of linear mixed models, see Chap. 10.1007/978-3-030-89010-0_5.
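The abstract does not say which three methods the chapter uses. As a hedged illustration only, the sketch below builds a genomic relationship matrix with one widely used approach (VanRaden's first method): markers coded 0/1/2 are centered by twice their allele frequency and the cross-product is scaled by 2Σp(1−p). The function name `vanraden_g` and the toy genotypes are assumptions made for this example, and it is written in Python rather than the chapter's R.

```python
import numpy as np

def vanraden_g(genotypes):
    """Genomic relationship matrix from 0/1/2 allele counts (VanRaden's first method)."""
    M = np.asarray(genotypes, dtype=float)   # rows: individuals, columns: markers
    p = M.mean(axis=0) / 2.0                 # allele frequency per marker, from the sample
    Z = M - 2.0 * p                          # center each marker by twice its frequency
    scale = 2.0 * np.sum(p * (1.0 - p))      # summed expected marker variance
    return Z @ Z.T / scale

# Toy data (made up for illustration): 4 individuals, 5 markers.
toy = np.array([
    [0, 1, 2, 1, 0],
    [1, 1, 2, 0, 0],
    [2, 0, 1, 1, 1],
    [1, 2, 0, 2, 1],
])
G = vanraden_g(toy)
```

Because allele frequencies are estimated from the same sample, each row of `G` sums to zero here; using base-population frequencies instead would change the values.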


Author(s):  
M Bermann ◽  
D Lourenco ◽  
I Misztal

Abstract The objectives of this study were to develop an efficient algorithm for calculating prediction error variances (PEV) for GBLUP models using the Algorithm for Proven and Young (APY), extend it to single-step GBLUP (ssGBLUP), and to apply this algorithm for approximating the theoretical reliabilities for single- and multiple-trait models in ssGBLUP. The PEV with APY was calculated by block-sparse inversion, efficiently exploiting the sparse structure of the inverse of the genomic relationship matrix with APY. Single-step GBLUP reliabilities were approximated by combining reliabilities with and without genomic information in terms of effective record contributions. Multi-trait reliabilities relied on single-trait results adjusted using the genetic and residual covariance matrices among traits. Tests involved two datasets provided by the American Angus Association. A small dataset (Data1) was used for comparing the approximated reliabilities with the reliabilities obtained by the inversion of the left-hand side of the mixed model equations. The large dataset (Data2) was used for evaluating the computational performance of the algorithm. Analyses with both datasets used single-trait and three-trait models. The number of animals in the pedigree ranged from 167,951 in Data1 to 10,213,401 in Data2, with 50,000 and 20,000 genotyped animals for single-trait and multiple-trait analysis, respectively, in Data1 and 335,325 in Data2. Correlations between estimated and exact reliabilities obtained by inversion ranged from 0.97 to 0.99, whereas the intercept and slope of the regression of the exact on the approximated reliabilities ranged from 0.00 to 0.04 and from 0.93 to 1.05, respectively. For the three-trait model with the largest dataset (Data2), the elapsed time for the reliability estimation was eleven minutes. The computational complexity of the proposed algorithm increased linearly with the number of genotyped animals and with the number of traits in the model. 
This algorithm can efficiently approximate the theoretical reliability of genomic estimated breeding values in ssGBLUP with APY for large numbers of genotyped animals at a low cost.


2021 ◽  
Author(s):  
Adam R Festa ◽  
Ross Whetten

Computer simulations of breeding strategies are an essential resource for tree breeders because they allow exploratory analyses into potential long-term impacts on genetic gain and inbreeding consequences without bearing the cost, time, or resource requirements of field experiments. Previous work has modeled the potential long-term implications on inbreeding and genetic gain using random mating and phenotypic selection. Reduction in sequencing costs has enabled the use of DNA marker-based relationship matrices in addition to or in place of pedigree-based allele sharing estimates; this has been shown to provide a significant increase in the accuracy of progeny breeding value prediction. A potential pitfall of genomic selection using genetic relationship matrices is increased coancestry among selections, leading to the accumulation of deleterious alleles and inbreeding depression. We used simulation to compare the relative genetic gain and risk of inbreeding depression within a breeding program similar to loblolly pine, utilizing pedigree-based or marker-based relationships over ten generations. We saw a faster rate of purging deleterious alleles when using a genomic relationship matrix based on markers that track identity-by-descent of segments of the genome. Additionally, we observed an increase in the rate of genetic gain when using a genomic relationship matrix instead of a pedigree-based relationship matrix. While the genetic variance of populations decreased more rapidly when using genomic-based relationship matrices as opposed to pedigree-based, there appeared to be no long-term consequences on the accumulation of deleterious alleles within the simulated breeding strategy.


Author(s):  
Rajiv Sharma ◽  
James Cockram ◽  
Keith A. Gardner ◽  
Joanne Russell ◽  
Luke Ramsay ◽  
...  

Abstract Key message Variety age and population structure detect novel QTL for yield and adaptation in wheat and barley without the need to phenotype. Abstract The process of crop breeding over the last century has delivered new varieties with increased genetic gains, resulting in higher crop performance and yield. However, in many cases, the alleles and genomic regions underpinning this success remain unknown. This is partly due to the difficulty of generating sufficient phenotypic data on large numbers of historical varieties to enable such analyses. Here we demonstrate the ability to circumvent such bottlenecks by identifying genomic regions selected over 100 years of crop breeding using age of a variety as a surrogate for yield. Rather than collecting phenotype data, we deployed ‘environmental genome-wide association scans’ (EnvGWAS) based on variety age in two of the world’s most important crops, wheat and barley, and detected strong signals of selection across both genomes. EnvGWAS identified 16 genomic regions in barley and 10 in wheat with contrasting patterns between spring and winter types of the two crops. To further examine changes in genome structure, we used the genomic relationship matrix of the genotypic data to derive eigenvectors for analysis in EigenGWAS. This detected seven major chromosomal introgressions that contributed to adaptation in wheat. EigenGWAS and EnvGWAS based on variety age avoid costly phenotyping and facilitate the identification of genomic tracts that have been under selection during breeding. Our results demonstrate the potential of using historical cultivar collections coupled with genomic data to identify chromosomal regions under selection and may help guide future plant breeding strategies to maximise the rate of genetic gain and adaptation.


Animals ◽  
2021 ◽  
Vol 11 (11) ◽  
pp. 3234
Author(s):  
José Cortes-Hernández ◽  
Adriana García-Ruiz ◽  
Carlos Gustavo Vásquez-Peláez ◽  
Felipe de Jesus Ruiz-Lopez

This study aimed to identify inbreeding coefficient (F) estimators useful for improvement programs in a small Holstein population through the evaluation of different methodologies in the Mexican Holstein population. F was estimated as follows: (a) from pedigree information (Fped); (b) through runs of homozygosity (Froh); (c) from the number of observed and expected homozygotic SNP in the individuals (Fgeno); (d) through the genomic relationship matrix (Fmg). The study included information from 4277 animals with pedigree records and 100,806 SNP. The average and standard deviation values of F were 3.11 ± 2.30 for Fped, −0.02 ± 3.55 for Fgeno, 2.77 ± 0.71 for Froh and 3.03 ± 3.05 for Fmg. The correlations between coefficients varied from 0.30 between Fped and Froh, to 0.96 between Fgeno and Fmg. Differences in the level of inbreeding among the parent’s country of origin were found regardless of the method used. The correlations among genomic inbreeding coefficients were high; however, they were low with Fped, so further research on this topic is required.
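Two of the genomic estimators named above can be sketched in a few lines. The formulas below are common textbook forms and are assumptions here, since the abstract does not give the exact definitions used: the homozygosity-based coefficient is taken as (observed − expected) / (markers − expected) homozygous loci, and the relationship-matrix coefficient as the diagonal of G minus one. The function name and toy genotypes are hypothetical.

```python
import numpy as np

def genomic_inbreeding(genotypes):
    """Return (F from homozygosity excess, F from the diagonal of G) per individual."""
    M = np.asarray(genotypes, dtype=float)        # rows: individuals, columns: SNP (0/1/2)
    n_markers = M.shape[1]
    p = M.mean(axis=0) / 2.0                      # allele frequencies from the sample

    # Homozygosity excess: observed vs. expected number of homozygous SNP.
    observed_hom = np.sum(M != 1, axis=1)
    expected_hom = np.sum(1.0 - 2.0 * p * (1.0 - p))
    f_hom = (observed_hom - expected_hom) / (n_markers - expected_hom)

    # Diagonal of a VanRaden-type G, minus one.
    Z = M - 2.0 * p
    g_diag = np.sum(Z * Z, axis=1) / (2.0 * np.sum(p * (1.0 - p)))
    f_grm = g_diag - 1.0
    return f_hom, f_grm

# Toy data (made up): individual 0 is fully homozygous, individual 1 fully heterozygous.
toy = np.array([
    [0, 2, 2, 0, 2],
    [1, 1, 1, 1, 1],
    [2, 0, 1, 2, 0],
    [1, 2, 0, 1, 1],
])
f_hom, f_grm = genomic_inbreeding(toy)
```

Both estimators depend on the allele frequencies used, which is one reason genomic coefficients can be negative and can disagree with pedigree-based values.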


2021 ◽  
Vol 12 ◽  
Author(s):  
Andre C. Araujo ◽  
Paulo L. S. Carneiro ◽  
Hinayah R. Oliveira ◽  
Flavio S. Schenkel ◽  
Renata Veroneze ◽  
...  

The level of genetic diversity in a population is inversely proportional to the linkage disequilibrium (LD) between individual single nucleotide polymorphisms (SNPs) and quantitative trait loci (QTLs), leading to lower predictive ability of genomic breeding values (GEBVs) in highly genetically diverse populations. Haplotype-based predictions could outperform individual SNP predictions by better capturing the LD between SNP and QTL. Therefore, we aimed to evaluate the accuracy and bias of individual-SNP- and haplotype-based genomic predictions under the single-step-genomic best linear unbiased prediction (ssGBLUP) approach in genetically diverse populations. We simulated purebred and composite sheep populations using literature parameters for moderate and low heritability traits. The haplotypes were created based on LD thresholds of 0.1, 0.3, and 0.6. Pseudo-SNPs from unique haplotype alleles were used to create the genomic relationship matrix (G) in the ssGBLUP analyses. Alternative scenarios were compared in which the pseudo-SNPs were combined with non-LD clustered SNPs, only pseudo-SNPs, or haplotypes fitted in a second G (two relationship matrices). The GEBV accuracies for the moderate heritability-trait scenarios fitting individual SNPs ranged from 0.41 to 0.55 and with haplotypes from 0.17 to 0.54 in the most (Ne ≅ 450) and less (Ne < 200) genetically diverse populations, respectively, and the bias fitting individual SNPs or haplotypes ranged between −0.14 and −0.08 and from −0.62 to −0.08, respectively. For the low heritability-trait scenarios, the GEBV accuracies fitting individual SNPs ranged from 0.24 to 0.32, and for fitting haplotypes, it ranged from 0.11 to 0.32 in the more (Ne ≅ 250) and less (Ne ≅ 100) genetically diverse populations, respectively, and the bias ranged between −0.36 and −0.32 and from −0.78 to −0.33 fitting individual SNPs or haplotypes, respectively. 
The lowest accuracies and largest biases were observed fitting only pseudo-SNPs from blocks constructed with an LD threshold of 0.3 (p < 0.05), whereas the best results were obtained using only SNPs or the combination of independent SNPs and pseudo-SNPs in one or two G matrices, in both heritability levels and all populations regardless of the level of genetic diversity. In summary, haplotype-based models did not improve the performance of genomic predictions in genetically diverse populations.


2021 ◽  
Vol 99 (Supplement_3) ◽  
pp. 20-20
Author(s):  
Sungbong Jang ◽  
Shogo Tsuruta ◽  
Natalia Leite ◽  
Ignacy Misztal ◽  
Daniela Lourenco

Abstract The ability to identify true-positive variants increases as more genotyped animals are available. Although thousands of animals can be genotyped, the dimensionality of the genomic information is limited. Therefore, there is a certain number of animals that represent all chromosome segments (Me) segregating in the population. The number of Me can be approximated from the eigenvalue decomposition of the genomic relationship matrix (G). Thus, the limited dimensionality may help to identify the number of animals to be used in genome-wide association (GWA). The first objective of this study was to examine different discovery set sizes for GWA, with set sizes based on the number of largest eigenvalues explaining a certain proportion of variance in G. Additionally, we investigated the impact of incorporating variants selected from different set sizes into the regular SNP chip used for genomic prediction. Sequence data were simulated that contained 500k SNP and 2k QTL, where the genetic variance was fully explained by QTL. The GWA was conducted using the number of genotyped animals equal to the number of largest eigenvalues of G (EIG) explaining 50, 60, 70, 80, 90, 95, 98, and 99 percent of the variance in G. Significant SNP had a p-value lower than 0.05 with Bonferroni correction. Further, SNP with the largest effect size (top 10, 100, 500, 1k, 2k, and 4k) were also selected to be incorporated into the 50k regular chip. Genomic predictions using the 50k combined with selected SNP were conducted using single-step GBLUP (ssGBLUP). Using the number of animals corresponding to at least EIG98 enabled the identification of the largest effect size QTL. The greatest accuracy of prediction was obtained when the top 2k SNP were combined with the 50k chip. The dimensionality of genomic information should be taken into account for variant selection in GWAS.
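The eigenvalue criterion described above can be illustrated with a short sketch: count how many of the largest eigenvalues of G are needed for their cumulative sum to reach a given share of the total. The function name `n_eigen_for_variance`, the simulated genotypes, and the sample-frequency construction of G are assumptions for this example, not the authors' pipeline.

```python
import numpy as np

def n_eigen_for_variance(G, proportion):
    """Number of largest eigenvalues of G whose sum reaches `proportion` of the total."""
    eigvals = np.linalg.eigvalsh(G)[::-1]            # symmetric solver, sorted descending
    eigvals = np.clip(eigvals, 0.0, None)            # guard against tiny negative round-off
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    count = int(np.searchsorted(cumulative, proportion)) + 1
    return min(count, len(eigvals))

# Simulated genotypes (illustrative only): 50 individuals, 200 markers.
rng = np.random.default_rng(0)
freqs = rng.uniform(0.1, 0.9, size=200)
M = rng.binomial(2, freqs, size=(50, 200)).astype(float)
p = M.mean(axis=0) / 2.0
Z = M - 2.0 * p
G = Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

n90 = n_eigen_for_variance(G, 0.90)
n98 = n_eigen_for_variance(G, 0.98)
```

In real populations with strong family structure, far fewer eigenvalues than animals are typically needed, which is the limited dimensionality the abstract refers to; unrelated simulated individuals like those above show much less of that effect.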


2021 ◽  
Vol 99 (Supplement_3) ◽  
pp. 228-229
Author(s):  
Bruna Santana ◽  
Molly Riser ◽  
Breno O Fragomeni

Abstract This study aimed to evaluate the accuracy of genomic prediction with simulated data, using SNP markers, causal quantitative trait nucleotide (QTN), and the combination of both. The methods used were genomic best linear unbiased prediction (GBLUP) and single-step GBLUP (ssGBLUP), with alternative SNP weights. Data were simulated using the package AlphaSimR. Trait heritability of 0.3 was assumed, and genetic variance was fully accounted for by 100 or 1000 QTNs. A population with an effective size of 200 was selected, and 20 generations were simulated. The genomic information mimicked the 29 bovine chromosomes and included 50k SNP markers evenly distributed across the genome. Approximately 16,800 genotypes were available from selected sires and dams in generations 16–19, and 2000 animals in generation 20. Phenotypes for young animals were not included in the analysis, as they were used in the validation. For GBLUP, three pseudo-phenotypes were considered: the raw phenotype, the true breeding value, and the true breeding value with noise added. The genomic relationship matrix was weighted using quadratic weights, calculated based on the SNP variance, and non-linear A, following different equation parameters. The scenario with exclusively causal variants presented accuracies close to 1 for 100 QTL, and slightly lower for 1000 QTL. For the SNP + QTN scenario, quadratic weights promoted higher accuracy gains than the SNPs alone, especially in the 100 QTN trait. Accuracies converged at higher values for both quadratic and non-linear A weights in the 100 QTN scenario. For the 1000 QTN trait, quadratic weights diverged and reduced accuracy, while non-linear A maintained accuracy at their peaks, depending on the equation parameters. Parameters of non-linear A for highest accuracy were different in each scenario and type of analysis. Proportionally, gains in accuracy were more prominent with GBLUP than with ssGBLUP.
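A weighted genomic relationship matrix is usually written as G = Z D Z′ / 2Σp(1−p), where D is a diagonal matrix of SNP weights. The sketch below shows that construction under stated assumptions: the quadratic weight is taken as the squared SNP effect times 2p(1−p), rescaled to average one, and the SNP effects themselves are invented numbers. Neither the exact weighting equations nor the non-linear A parameters from the study are reproduced here.

```python
import numpy as np

def weighted_g(genotypes, weights):
    """Weighted genomic relationship matrix Z D Z' / (2 * sum p(1-p))."""
    M = np.asarray(genotypes, dtype=float)     # rows: animals, columns: SNP (0/1/2)
    d = np.asarray(weights, dtype=float)       # one weight per SNP (diagonal of D)
    p = M.mean(axis=0) / 2.0
    Z = M - 2.0 * p
    scale = 2.0 * np.sum(p * (1.0 - p))
    return (Z * d) @ Z.T / scale               # Z * d scales each SNP column by its weight

# Toy genotypes and hypothetical SNP effect estimates (made up for illustration).
toy = np.array([
    [0, 1, 2, 1, 0],
    [1, 1, 2, 0, 0],
    [2, 0, 1, 1, 1],
    [1, 2, 0, 2, 1],
])
p = toy.mean(axis=0) / 2.0
snp_effects = np.array([0.5, -0.1, 0.0, 0.3, -0.2])

# Assumed quadratic weights: variance explained by each SNP, rescaled to mean 1.
quadratic = snp_effects**2 * 2.0 * p * (1.0 - p)
quadratic = quadratic / quadratic.mean()

G_weighted = weighted_g(toy, quadratic)
G_plain = weighted_g(toy, np.ones(toy.shape[1]))   # unit weights give the unweighted G
```

Rescaling the weights to average one keeps the weighted matrix on roughly the same scale as the unweighted one, so variance components remain comparable between the two.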


Author(s):  
Luke M Kramer ◽  
Ania Wolc ◽  
Hadi Esfandyari ◽  
Dinesh M Thekkoot ◽  
Chunyan Zhang ◽  
...  

Abstract For swine breeding programs, testing and selection programs are usually within purebred (PB) populations located in nucleus units that are generally managed differently and tend to have a higher health level than the commercial herds in which the crossbred (CB) descendants of these nucleus animals are expected to perform. This approach assumes that PB animals selected in the nucleus herd will have CB progeny that have superior performance at the commercial level. There is clear evidence that this may not be the case for all traits of economic importance and, thus, including data collected at the commercial herd level may increase the accuracy of selection for commercial CB performance at the nucleus level. The goal for this study was to estimate genetic parameters for five maternal reproductive traits between two PB maternal nucleus populations (Landrace and Yorkshire) and their CB offspring: Total Number Born (TNB), Number Born Alive (NBA), Number Born Alive > 1 kg (NBA>1kg), Total Number Weaned (TNW), and Litter Weight at Weaning (LWW). Estimates were based on single-step GBLUP by analyzing any two combinations of a PB and the CB population, and by analyzing all three populations jointly. The genomic relationship matrix between the three populations was generated by using within population allele frequencies for relationships within a population, and across population allele frequencies for relationships of the CB with the PB animals. Utilization of metafounders for the two PB populations had no effect on parameter estimates, so the two PB populations were assumed to be genetically unrelated. Joint analysis of two (one PB plus CB) versus three (both PB and CB) populations did not impact estimates of heritability, additive genetic variance, and genetic correlations. Heritabilities were generally similar between the PB and CB populations, except for LWW and TNW, for which PB populations had about four times larger estimates than CB. 
Purebred-crossbred genetic correlations were larger for Landrace than for Yorkshire, except for NBA>1kg. These estimates indicate that there is potential to improve selection of PB animals for CB performance by including CB information for all traits in the Yorkshire population, but that noticeable additional gains may only occur for NBA>1kg and TNW in the Landrace population.

