The Role of Different Linkage Disequilibrium Patterns in Genomic Prediction: The gBLUP Based Exploratory Method in Tehran Cardiometabolic Genetic Study

Author(s):  
Mahdei Akbarzadeh ◽  
Saeid Dehkordi ◽  
Mahmoud Roudbar ◽  
Parisa Riahi ◽  
Mehdi Sargolzaei ◽  
...  

Abstract Background: GWAS discoveries in recent decades have enabled new clinical applications, such as whole-genome risk estimation. Genetic prediction of traits has a substantial impact on public health care and disease prevention. This study aimed to investigate the effects of different linkage disequilibrium (LD) patterns on genomic prediction accuracy and SNP-based heritability estimation for four lipid profile traits. Results: This family-based study included 11,798 individuals aged 3 to 80 years from the Tehran Cardiometabolic Genetic Study (TCGS). Subsets of SNPs were created at different LD thresholds (0.01, 0.03, 0.05, 0.07, 0.09, 0.1, 0.2, 0.3, 0.5, 0.6, 0.7, 0.8, and 0.9). We compared the prediction accuracy and SNP-based heritability estimates of the SNPs selected at these thresholds with those of randomly selected SNP subsets of equal size. Subsets of SNPs selected based on LD patterns had higher prediction accuracy than randomly selected subsets, and the difference tended to zero as the LD threshold increased. The results were consistent across all traits when the prediction accuracy of the subsets was adjusted for their SNP numbers. For all traits, when the number of SNPs was adjusted, both prediction accuracy and SNP-based heritability rose dramatically between LD thresholds 0.01 and 0.2, peaked at an LD threshold between 0.2 and 0.3, and then declined steadily. Conclusions: This research indicated that subsets of SNPs selected based on an LD threshold always outperformed randomly selected SNPs for prediction. However, choosing a specific LD threshold for prediction remains debatable, because the threshold that gives the highest prediction accuracy depends on whether the number of SNPs is adjusted (in our case, 0.3 when the SNP number was adjusted and 0.9 when it was not).
Finally, we concluded that the LD threshold should be used with great care as a tool to boost genetic prediction accuracy.
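
The subset construction described in this abstract can be illustrated with a minimal sketch. The abstract does not say which LD measure or pruning algorithm was used, so the greedy pairwise r² pruning below, the `window` parameter, and both function names are assumptions made only for illustration, not the authors' procedure.

```python
import numpy as np

def ld_prune(genotypes, r2_threshold, window=50):
    """Greedy pruning sketch (assumed algorithm, not taken from the abstract).

    genotypes: (n_individuals, n_snps) array of 0/1/2 allele counts.
    A SNP is kept unless its squared correlation with an already kept SNP
    located at most `window` columns earlier exceeds r2_threshold.
    """
    n_snps = genotypes.shape[1]
    kept = []
    for j in range(n_snps):
        if genotypes[:, j].std() == 0:
            continue  # monomorphic SNPs carry no LD information; skip them
        in_high_ld = False
        for k in reversed(kept):
            if j - k > window:
                break  # earlier kept SNPs are even further away
            r = np.corrcoef(genotypes[:, j], genotypes[:, k])[0, 1]
            if r * r > r2_threshold:
                in_high_ld = True
                break
        if not in_high_ld:
            kept.append(j)
    return np.array(kept, dtype=int)

def random_subset_of_equal_size(n_snps, size, seed=0):
    """Random comparison subset with the same number of SNPs."""
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(n_snps, size=size, replace=False))
```

Running `ld_prune` once per threshold and pairing each result with `random_subset_of_equal_size` reproduces the shape of the comparison: one LD-based subset and one equally sized random subset per threshold.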

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Mahdi Akbarzadeh ◽  
Saeid Rasekhi Dehkordi ◽  
Mahmoud Amiri Roudbar ◽  
Mehdi Sargolzaei ◽  
Kamran Guity ◽  
...  

Abstract In recent decades, GWAS findings have led to new clinical advances, whole-genome risk prediction in particular. Here, we propose a method that integrates the traditional genomic best linear unbiased prediction (gBLUP) approach with GWAS information to boost genetic prediction accuracy and gene-based heritability estimation. This study was conducted within the framework of the Tehran Cardio-metabolic Genetic Study (TCGS), comprising 14,827 individuals and 649,932 SNP markers. Five SNP subsets were selected based on GWAS results: the top 1%, 5%, 10%, and 50% most significant SNPs, and SNPs reported as associated in previous studies. In addition, for each of the five subsets we randomly selected a subset of equal size. Prediction accuracy for lipid profile traits was assessed with the gBLUP method using tenfold cross-validation repeated 10 times. Our results revealed that genetic prediction based on SNP subsets selected from the dataset outperformed prediction based on previously reported SNPs. The selected SNP subsets gave more accurate predictions than the full SNP set and much more accurate predictions than randomly selected SNPs. Also, the common SNPs that captured the most prediction accuracy in the selected sets also captured the highest gene-based heritability. However, it should be kept in mind that a small number of SNPs obtained from GWAS results could capture a highly notable proportion of variance and prediction accuracy.
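
The workflow in this abstract (select SNPs by GWAS significance, build a genomic relationship matrix, evaluate gBLUP by repeated k-fold cross-validation) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the relationship matrix uses a VanRaden-style scaling, the variance ratio `lam` is treated as known rather than estimated, and all function names are hypothetical.

```python
import numpy as np

def top_fraction_snps(pvalues, fraction):
    """Indices of the most significant SNPs (smallest p-values)."""
    k = max(1, int(round(fraction * len(pvalues))))
    return np.sort(np.argsort(pvalues)[:k])

def grm(Z):
    """Genomic relationship matrix from 0/1/2 allele counts (assumed scaling)."""
    p = Z.mean(axis=0) / 2.0          # allele frequencies
    W = Z - 2.0 * p                   # centred genotypes
    return W @ W.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_predict(G, y, train, test, lam):
    """Predict test phenotypes; lam = residual / genetic variance (assumed known)."""
    mu = y[train].mean()
    K = G[np.ix_(train, train)] + lam * np.eye(len(train))
    alpha = np.linalg.solve(K, y[train] - mu)
    return mu + G[np.ix_(test, train)] @ alpha

def cv_accuracy(G, y, k=10, repeats=10, lam=1.0, seed=0):
    """Mean correlation between predicted and observed values over all folds."""
    rng = np.random.default_rng(seed)
    n = len(y)
    accs = []
    for _ in range(repeats):
        for test in np.array_split(rng.permutation(n), k):
            train = np.setdiff1d(np.arange(n), test)
            pred = gblup_predict(G, y, train, test, lam)
            accs.append(np.corrcoef(pred, y[test])[0, 1])
    return float(np.mean(accs))
```

To compare subsets, one would call `grm(Z[:, idx])` with `idx` from `top_fraction_snps` and again with a random index set of the same size, then pass each matrix to `cv_accuracy`. In a real analysis the SNP selection would have to be repeated inside each training fold to avoid leaking validation data into the GWAS step.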


2020 ◽  
Vol 11 ◽  
Author(s):  
Sohyoung Won ◽  
Jong-Eun Park ◽  
Ju-Hwan Son ◽  
Seung-Hwan Lee ◽  
Byeong Ho Park ◽  
...  

2021 ◽  
Vol 12 ◽  
Author(s):  
Sohyoung Won ◽  
Jong-Eun Park ◽  
Ju-Hwan Son ◽  
Seung-Hwan Lee ◽  
Byeong Ho Park ◽  
...  

Genetics ◽  
2021 ◽  
Author(s):  
Marco Lopez-Cruz ◽  
Gustavo de los Campos

Abstract Genomic prediction uses DNA sequences and phenotypes to predict genetic values. In homogeneous populations, theory indicates that the accuracy of genomic prediction increases with sample size. However, differences in allele frequencies and in linkage disequilibrium patterns can lead to heterogeneity in SNP effects. In this context, calibrating genomic predictions using a large, potentially heterogeneous, training data set may not lead to optimal prediction accuracy. Some studies tried to address this sample size/homogeneity trade-off using training set optimization algorithms; however, this approach assumes that a single training data set is optimum for all individuals in the prediction set. Here, we propose an approach that identifies, for each individual in the prediction set, a subset from the training data (i.e., a set of support points) from which predictions are derived. The methodology that we propose is a Sparse Selection Index (SSI) that integrates Selection Index methodology with sparsity-inducing techniques commonly used for high-dimensional regression. The sparsity of the resulting index is controlled by a regularization parameter (λ); the G-BLUP (the prediction method most commonly used in plant and animal breeding) appears as a special case which happens when λ = 0. In this study, we present the methodology and demonstrate (using two wheat data sets with phenotypes collected in ten different environments) that the SSI can achieve significant (anywhere between 5-10%) gains in prediction accuracy relative to the G-BLUP.
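
The idea of a sparse index that reduces to G-BLUP at λ = 0 can be sketched as below. The abstract does not state the penalty or the solver, so the L1 penalty, the coordinate-descent routine, and the names `P`, `g`, and `sparse_index_weights` are assumptions for illustration. Here `P` stands for the training relationship matrix plus a variance-ratio multiple of the identity, and `g` for the relationships between one prediction individual and the training individuals.

```python
import numpy as np

def soft_threshold(z, lam):
    """Soft-thresholding operator used by L1-penalised coordinate descent."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def sparse_index_weights(P, g, lam, n_iter=500, tol=1e-10):
    """Minimise 0.5 * b'Pb - g'b + lam * ||b||_1 over b (assumed objective).

    With lam = 0 the solution is P^{-1} g, i.e. the G-BLUP weights for one
    prediction individual; larger lam sets more weights exactly to zero,
    leaving a subset of training individuals as support points.
    """
    b = np.zeros(len(g), dtype=float)
    for _ in range(n_iter):
        b_old = b.copy()
        for j in range(len(b)):
            # contribution of all coordinates except j
            partial = g[j] - (P[j] @ b - P[j, j] * b[j])
            b[j] = soft_threshold(partial, lam) / P[j, j]
        if np.max(np.abs(b - b_old)) < tol:
            break
    return b
```

The prediction for that individual is then the weighted sum of centred training phenotypes, `b @ (y_train - y_train.mean())`, computed separately for each individual in the prediction set.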


PLoS ONE ◽  
2017 ◽  
Vol 12 (12) ◽  
pp. e0189775 ◽  
Author(s):  
S. Hong Lee ◽  
Sam Clark ◽  
Julius H. J. van der Werf

2020 ◽  
Vol 98 (Supplement_4) ◽  
pp. 245-246
Author(s):  
Cláudio U Magnabosco ◽  
Fernando Lopes ◽  
Valentina Magnabosco ◽  
Raysildo Lobo ◽  
Leticia Pereira ◽  
...  

Abstract The aim of the study was to evaluate prediction methods, validation approaches, and pseudo-phenotypes for the prediction of genomic breeding values of feed-efficiency-related traits in Nellore cattle. It used the phenotypic and genotypic information of 4,329 and 3,594 animals, respectively, which were tested for residual feed intake (RFI), dry matter intake (DMI), feed efficiency (FE), feed conversion ratio (FCR), residual body weight gain (RG), and residual intake and body weight gain (RIG). Six prediction methods were used: ssGBLUP, BayesA, BayesB, BayesCπ, BLASSO, and BayesR. Three validation approaches were used: 1) random: the data were randomly divided into ten subsets and each subset was used for validation in turn; 2) age: the training (2010 to 2016) and validation (2017) populations were defined by year of birth; 3) estimated breeding value (EBV) accuracy: animals with accuracy above 0.45 formed the training population and those below 0.45 the validation population. We checked the accuracy and bias of the genomic estimated breeding values (GEBV). The results showed that GEBV accuracy was highest when predictions were obtained with ssGBLUP (0.05 to 0.31) (Figure 1). The low heritability obtained, mainly for FE (0.07 ± 0.03) and FCR (0.09 ± 0.03), limited GEBV accuracy, which ranged from low to moderate. The regression coefficient estimates were close to 1 and similar across prediction methods, validation approaches, and pseudo-phenotypes. Cross-validation gave the most accurate predictions, ranging from 0.07 to 0.037. Prediction accuracy was higher for phenotypes adjusted for fixed effects than for EBV and deregressed EBV (30.0 and 34.3%, respectively). Genomic prediction can provide reliable estimates of genomic breeding values for RFI, DMI, RG, and RIG, which suggests that those traits may achieve higher genetic gain than FE and FCR.
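
The two validation metrics mentioned in this abstract can be computed as in the short sketch below. The abstract does not define them, so two common conventions are assumed here: accuracy is the Pearson correlation between GEBV and the pseudo-phenotype in the validation population, and bias is judged from the slope of the regression of the pseudo-phenotype on GEBV, where a slope near 1 indicates little inflation or deflation. The function name is hypothetical.

```python
import numpy as np

def accuracy_and_slope(gebv, pseudo_phenotype):
    """Validation metrics under the assumed definitions described above."""
    gebv = np.asarray(gebv, dtype=float)
    y = np.asarray(pseudo_phenotype, dtype=float)
    accuracy = np.corrcoef(gebv, y)[0, 1]
    # slope of the regression of y on gebv: cov(y, gebv) / var(gebv)
    slope = np.cov(y, gebv)[0, 1] / np.var(gebv, ddof=1)
    return accuracy, slope
```

The same function applies to each of the three validation approaches; only the way animals are assigned to the training and validation populations changes.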


2017 ◽  
Vol 130 (12) ◽  
pp. 2543-2555 ◽  
Author(s):  
Adam Norman ◽  
Julian Taylor ◽  
Emi Tanaka ◽  
Paul Telfer ◽  
James Edwards ◽  
...  

Author(s):  
Stefan McKinnon Edwards ◽  
Jaap B. Buntjer ◽  
Robert Jackson ◽  
Alison R. Bentley ◽  
Jacob Lage ◽  
...  

2020 ◽  
Author(s):  
Fanny Mollandin ◽  
Andrea Rau ◽  
Pascal Croiseau

ABSTRACT Technological advances and decreasing costs have led to the rise of increasingly dense genotyping data, making feasible the identification of potential causal markers. Custom genotyping chips, which combine medium-density genotypes with a custom genotype panel, can capitalize on these candidates to potentially yield improved accuracy and interpretability in genomic prediction. A particularly promising model to this end is BayesR, which divides markers into four effect size classes. BayesR has been shown to yield accurate predictions and promise for quantitative trait loci (QTL) mapping in real data applications, but an extensive benchmarking in simulated data is currently lacking. Based on a set of real genotypes, we generated simulated data under a variety of genetic architectures and phenotype heritabilities, and we evaluated the impact of excluding or including causal markers among the genotypes. We define several statistical criteria for QTL mapping, including several based on sliding windows to account for linkage disequilibrium. We compare and contrast these statistics and their ability to accurately prioritize known causal markers. Overall, we confirm the strong predictive performance of BayesR for moderately to highly heritable traits, particularly for 50k custom data. In cases of low heritability or weak linkage disequilibrium with the causal marker in 50k genotypes, QTL mapping is a challenge, regardless of the criterion used. BayesR is a promising approach to simultaneously obtain accurate predictions and interpretable classifications of SNPs into effect size classes. We illustrated the performance of BayesR in a variety of simulation scenarios, and compared the advantages and limitations of each.
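
The "four effect size classes" can be made concrete with a small sketch of drawing marker effects from a four-component mixture prior. The abstract gives no class variances or mixing proportions; the fractions 0, 1e-4, 1e-3, and 1e-2 of the genetic variance used below are the values commonly quoted for BayesR and are an assumption here, as are the function name and its parameters.

```python
import numpy as np

# Assumed per-class variances, as fractions of the total genetic variance.
# Class 0 is the null class (effect exactly zero).
CLASS_VARIANCE_FRACTIONS = np.array([0.0, 1e-4, 1e-3, 1e-2])

def sample_mixture_prior_effects(n_markers, mixing_props, sigma2_g, seed=0):
    """Draw a class label and an effect for each marker from the mixture prior.

    mixing_props: hypothetical prior proportions of the four classes (sum to 1).
    sigma2_g: total genetic variance that scales the class variances.
    """
    rng = np.random.default_rng(seed)
    mixing_props = np.asarray(mixing_props, dtype=float)
    classes = rng.choice(4, size=n_markers, p=mixing_props)
    sd = np.sqrt(CLASS_VARIANCE_FRACTIONS[classes] * sigma2_g)
    effects = rng.standard_normal(n_markers) * sd
    return classes, effects
```

In a fitted model the class labels are sampled from their posterior rather than the prior, and the posterior probability that a marker falls in a non-null class is the kind of per-SNP quantity a QTL-mapping criterion could be built on.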

