Single-Trait and Multiple-Trait Genomic Prediction From Multi-Class Bayesian Alphabet Models Using Biological Information

2021 ◽  
Vol 12 ◽  
Author(s):  
Zigui Wang ◽  
Hao Cheng

Genomic prediction has been widely used in many areas, and various genomic prediction methods have been developed. Most of these methods, however, focus on statistical properties and ignore abundant, useful biological information such as genome annotation or previously discovered causal variants. Therefore, to improve prediction performance, several methods have been developed to incorporate biological information into genomic prediction, mostly in single-trait analysis. A common way to incorporate biological information is to allocate molecular markers into different classes based on that information and to assign separate priors to the markers in each class. Such methods have been shown to achieve higher prediction accuracy than conventional methods in some circumstances. However, these methods mainly focus on single-trait analysis, and the priors available to them are limited. We therefore propose multi-class Bayesian Alphabet methods for both single-trait and multiple-trait analyses, in which different Bayesian Alphabet priors, including RR-BLUP, BayesA, BayesB, BayesCΠ, and Bayesian LASSO, can be used for markers allocated to different classes. The superior performance of the multi-class Bayesian Alphabet methods in genomic prediction is demonstrated using both real and simulated data. The software tool JWAS offers open-source routines to perform these analyses.
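The core idea of allocating markers to classes and giving each class its own prior can be illustrated with a small sketch. This is not JWAS code; the class names, the annotation set, and the prior settings below are hypothetical, and the variances are fixed here for simplicity, whereas a real Bayesian Alphabet analysis would place priors on them and sample them. One class uses a BayesC-style spike-and-slab prior (fixed π), the other an RR-BLUP-style normal prior.

```python
import random

# Hypothetical prior settings for two marker classes (illustrative values only).
CLASS_PRIORS = {
    "annotated": {"prior": "BayesC", "pi_zero": 0.5, "effect_var": 0.05},
    "other": {"prior": "RR-BLUP", "effect_var": 0.001},
}


def allocate_markers(marker_ids, annotated_ids):
    """Assign each marker to a class using a (hypothetical) annotation set."""
    return {m: ("annotated" if m in annotated_ids else "other") for m in marker_ids}


def draw_prior_effect(cls, rng):
    """Draw one marker effect from the prior attached to its class."""
    spec = CLASS_PRIORS[cls]
    sd = spec["effect_var"] ** 0.5
    if spec["prior"] == "BayesC":
        # Spike-and-slab: exactly zero with probability pi_zero,
        # otherwise normal with a common class variance.
        if rng.random() < spec["pi_zero"]:
            return 0.0
        return rng.gauss(0.0, sd)
    # RR-BLUP: every marker has a nonzero normal effect with a common variance.
    return rng.gauss(0.0, sd)


rng = random.Random(42)
markers = [f"snp{i}" for i in range(10)]
classes = allocate_markers(markers, annotated_ids={"snp1", "snp4", "snp7"})
effects = {m: draw_prior_effect(classes[m], rng) for m in markers}
```

Swapping the prior for one class (for example, BayesA or Bayesian LASSO) only changes `draw_prior_effect` for that class, which is the flexibility the multi-class formulation is after.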

Author(s):  
Shaohua Zhu ◽  
Tingting Guo ◽  
Chao Yuan ◽  
Jianbin Liu ◽  
Jianye Li ◽  
...  

ABSTRACT The marker density, the heritability of the trait, and the statistical model adopted are critical to the accuracy of genomic prediction (GP) or genomic selection (GS). To fully exploit GP for optimizing breeding and selection, these factors should be studied not only in simulated data but also in real data, where their impact on GP accuracy can be understood more clearly and intuitively. Herein, we studied genomic prediction of six wool traits in sheep using two classes of models: the Bayesian Alphabet (BayesA, BayesB, BayesCπ, and Bayesian LASSO) and genomic best linear unbiased prediction (GBLUP). We used 5-fold cross-validation to evaluate accuracy based on the genotyping data of Alpine Merino sheep (n = 821). The main aim was to study the influence and interaction of different models and marker densities on GP accuracy. Cross-validation showed that the GP accuracy of the six traits ranged from 0.28 to 0.60. Increasing the marker density could improve GP accuracy, and this improvement was closely related to the model adopted and the heritability of the trait. Moreover, at both marker densities, GBLUP gave better predictions for traits with low heritability, whereas the advantage of the Bayesian Alphabet became more evident as heritability increased; different GP models are therefore appropriate for different traits. These findings indicate the importance of applying an appropriate model for GP and should assist in further optimizing GP.
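A minimal sketch of k-fold cross-validation for genomic prediction follows, assuming accuracy is taken as the Pearson correlation between predicted and observed phenotypes in each validation fold (the abstract does not state its exact definition). Ridge regression on markers (SNP-BLUP) stands in for GBLUP here, since the two are closely related under equivalent variance assumptions; the shrinkage value `lam` and the toy data are hypothetical, not the sheep data.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Ridge-regression (SNP-BLUP-style) marker effects with fixed shrinkage lam."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    return beta, x_mean, y_mean


def kfold_accuracy(X, y, k=5, lam=1.0, seed=0):
    """Mean correlation between predicted and observed y over k validation folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    accs = []
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)
        beta, x_mean, y_mean = ridge_fit(X[train], y[train], lam)
        y_hat = y_mean + (X[test] - x_mean) @ beta
        accs.append(np.corrcoef(y_hat, y[test])[0, 1])
    return float(np.mean(accs))


# Toy simulated data: 200 animals, 50 SNPs coded 0/1/2, high heritability.
rng = np.random.default_rng(1)
X = rng.binomial(2, 0.5, size=(200, 50)).astype(float)
y = X @ rng.normal(0.0, 1.0, 50) + rng.normal(0.0, 1.0, 200)
acc = kfold_accuracy(X, y)
```

Replacing `ridge_fit` with a Bayesian Alphabet sampler, while keeping the same folds, is what allows models to be compared on equal terms.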


2017 ◽  
Author(s):  
Hao Cheng ◽  
Kadir Kizilkaya ◽  
Jian Zeng ◽  
Dorian Garrick ◽  
Rohan Fernando

ABSTRACT Bayesian multiple-regression methods incorporating different mixture priors for marker effects are widely used in genomic prediction. Improvements in prediction accuracy from these methods, such as BayesB, BayesC and BayesCπ, have been shown in single-trait analyses with both simulated and real data. These methods have been extended to multi-trait analyses, but only under the restrictive assumption that a locus affects either all of the traits or none of them. In this paper, we develop and implement the most general multi-trait BayesCΠ and BayesB methods, which allow a broader range of mixture priors. Further, we compare them with single-trait methods and with the “restricted” multi-trait formulation using real data. In these analyses, significantly higher prediction accuracies were sometimes observed from the new broad-based multi-trait Bayesian multiple-regression methods. The software tool JWAS offers routines to perform the analyses.
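The difference between the "restricted" and the general multi-trait mixture prior can be seen by listing the inclusion vectors a locus may take across traits. A small sketch, assuming three traits and hypothetical pattern probabilities (Π): the restricted prior allows only "affects no trait" or "affects all traits", whereas the general prior allows all 2^t patterns.

```python
import random
from itertools import product


def indicator_patterns(n_traits, restricted=False):
    """All 0/1 inclusion vectors a locus may take across n_traits traits."""
    if restricted:
        # Restricted formulation: a locus affects all traits or none of them.
        return [(0,) * n_traits, (1,) * n_traits]
    # General formulation: any subset of the traits may be affected.
    return list(product((0, 1), repeat=n_traits))


def sample_pattern(patterns, probs, rng):
    """Draw one inclusion vector for a locus given pattern probabilities."""
    return rng.choices(patterns, weights=probs, k=1)[0]


general = indicator_patterns(3)
restricted = indicator_patterns(3, restricted=True)
# Hypothetical Pi: the all-zero pattern (first in `general`) gets 0.9,
# the remaining mass is spread evenly over the other seven patterns.
probs = [0.9] + [0.1 / (len(general) - 1)] * (len(general) - 1)
delta = sample_pattern(general, probs, random.Random(0))
```

With three traits the general prior has 8 patterns against 2 for the restricted one; in a full sampler Π itself would be estimated rather than fixed.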


2019 ◽  
Vol 29 (1) ◽  
pp. 265-274
Author(s):  
Ali Kiadaliri ◽  
Monica Hernández Alava ◽  
Ewa M. Roos ◽  
Martin Englund

Abstract Purpose To develop a mapping model to estimate EQ-5D-3L from the Knee Injury and Osteoarthritis Outcome Score (KOOS). Methods Responses to the EQ-5D-3L and KOOS questionnaires (n = 40,459 observations) were obtained from the Swedish National anterior cruciate ligament (ACL) Register for patients aged ≥ 18 years with a knee ACL injury. We used linear regression (LR) and beta-mixture (BM) models for direct mapping and the generalized ordered probit model for response mapping (RM). We compared the distribution of the original data with the distributions of data generated using the estimated models. Results Models with individual KOOS subscales performed better than those with the average of KOOS subscale scores (KOOS5, KOOS4). LR had the poorest performance, both overall and across the range of disease severity, particularly at the extremes of the severity distribution. Compared with RM, BM performed better across the entire range of disease severity except the most severe range (KOOS5 < 25). Moving from the most to the least severe disease was associated with a 0.785 gain in the observed EQ-5D-3L. The corresponding values were 0.743, 0.772 and 0.782 for LR, BM and RM, respectively. LR generated simulated EQ-5D-3L values outside the feasible range. The distribution of simulated data generated from the BM model was almost identical to the original data. Conclusions We developed mapping models to estimate EQ-5D-3L from KOOS, facilitating the application of KOOS in cost-utility analyses. BM showed superior performance for estimating EQ-5D-3L from KOOS. Further validation of the estimated models in different independent samples is warranted.
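One reason a direct linear mapping can leave the feasible range is that an unbounded straight line is fitted to an outcome with a ceiling (EQ-5D-3L cannot exceed 1, the value for full health). The sketch below uses a single KOOS-like predictor and four made-up observations, not the registry data or the authors' subscale models, to show a least-squares line predicting above 1 at the best possible score.

```python
def fit_ols(x, y):
    """Closed-form simple linear regression y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


# Toy data: a KOOS-like score (0-100) and a utility that is capped at 1.
koos = [0.0, 50.0, 100.0, 100.0]
utility = [0.0, 0.9, 1.0, 1.0]
a, b = fit_ols(koos, utility)
pred_best = a + b * 100.0  # about 1.073, above the EQ-5D-3L maximum of 1
```

Bounded approaches such as the beta-mixture model respect the limits of the outcome by construction, which is consistent with the better fit reported for BM.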


Genetics ◽  
2001 ◽  
Vol 157 (3) ◽  
pp. 1369-1385 ◽  
Author(s):  
Z W Luo ◽  
C A Hackett ◽  
J E Bradshaw ◽  
J W McNicol ◽  
D Milbourne

Abstract This article presents methodology for the construction of a linkage map in an autotetraploid species, using either codominant or dominant molecular markers scored on two parents and their full-sib progeny. The steps of the analysis are as follows: identification of parental genotypes from the parental and offspring phenotypes; testing for independent segregation of markers; partition of markers into linkage groups using cluster analysis; maximum-likelihood estimation of the phase, recombination frequency, and LOD score for all pairs of markers in the same linkage group using the EM algorithm; ordering the markers and estimating distances between them; and reconstructing their linkage phases. The information from different marker configurations about the recombination frequency is examined and found to vary considerably, depending on the number of different alleles, the number of alleles shared by the parents, and the phase of the markers. The methods are applied to a simulated data set and to a small set of SSR and AFLP markers scored in a full-sib population of tetraploid potato.
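The LOD score compares the likelihood of linkage at recombination fraction r with that of free recombination (r = 0.5). A minimal sketch for the simplest fully informative case, where recombinant gametes can be counted directly so the maximum-likelihood estimate is simply R/N; this is a simplification, since in an autotetraploid with ambiguous dosage and phase the likelihood has no such closed form, which is why the article uses the EM algorithm. The counts below are hypothetical.

```python
import math


def lod_score(n_recomb, n_total, r):
    """LOD for linkage at recombination fraction r (0 < r < 1) versus r = 0.5."""
    ll_r = n_recomb * math.log10(r) + (n_total - n_recomb) * math.log10(1 - r)
    ll_free = n_total * math.log10(0.5)
    return ll_r - ll_free


# Fully informative toy case: 10 recombinants among 100 gametes.
r_hat = 10 / 100  # maximum-likelihood estimate of r in this simple case
lod = lod_score(10, 100, r_hat)  # roughly 15.98
```

A LOD of about 16 means the data are roughly 10^16 times more likely under linkage at r = 0.1 than under independent segregation.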


2020 ◽  
Author(s):  
Fanny Mollandin ◽  
Andrea Rau ◽  
Pascal Croiseau

ABSTRACT Technological advances and decreasing costs have led to increasingly dense genotyping data, making it feasible to identify potential causal markers. Custom genotyping chips, which combine medium-density genotypes with a custom genotype panel, can capitalize on these candidates to potentially yield improved accuracy and interpretability in genomic prediction. A particularly promising model to this end is BayesR, which divides markers into four effect-size classes. BayesR has been shown to yield accurate predictions and to hold promise for quantitative trait locus (QTL) mapping in real data applications, but extensive benchmarking on simulated data is currently lacking. Based on a set of real genotypes, we generated simulated data under a variety of genetic architectures and phenotype heritabilities, and we evaluated the impact of excluding or including causal markers among the genotypes. We define several statistical criteria for QTL mapping, including several based on sliding windows to account for linkage disequilibrium. We compare and contrast these statistics and their ability to accurately prioritize known causal markers. Overall, we confirm the strong predictive performance of BayesR for moderately to highly heritable traits, particularly for 50k custom data. In cases of low heritability or weak linkage disequilibrium with the causal marker in 50k genotypes, QTL mapping remains challenging regardless of the criterion used. BayesR is a promising approach to simultaneously obtain accurate predictions and interpretable classifications of SNPs into effect-size classes. We illustrated the performance of BayesR in a variety of simulation scenarios and compared the advantages and limitations of each.
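The classification of SNPs into effect-size classes comes from a Gibbs step that computes, for each marker, the posterior probability of each class with the marker effect integrated out. The sketch below assumes the commonly used BayesR class variances of 0, 10⁻⁴, 10⁻³ and 10⁻² times the genetic variance; the mixing proportions and the summary statistics passed in are hypothetical. Under class k, the right-hand side x'y (with y corrected for all other effects) is normal with mean 0 and variance (x'x)² v_k + x'x σ²_e.

```python
import math

# Effect-variance multipliers commonly used for BayesR's four classes
# (fractions of the genetic variance sigma2_g).
GAMMA = [0.0, 1e-4, 1e-3, 1e-2]


def class_probabilities(rhs, xtx, sigma2_e, sigma2_g, pi):
    """Posterior probability of each BayesR class for one marker.

    rhs = x'y_corrected; the marker effect is integrated out, so under
    class k, rhs ~ N(0, xtx**2 * v_k + xtx * sigma2_e), v_k = GAMMA[k] * sigma2_g.
    """
    log_post = []
    for gamma_k, pi_k in zip(GAMMA, pi):
        var = xtx ** 2 * gamma_k * sigma2_g + xtx * sigma2_e
        log_post.append(math.log(pi_k) - 0.5 * math.log(var) - rhs ** 2 / (2 * var))
    m = max(log_post)  # subtract the maximum before exponentiating, for stability
    w = [math.exp(v - m) for v in log_post]
    total = sum(w)
    return [v / total for v in w]


pi = [0.95, 0.03, 0.015, 0.005]  # hypothetical mixing proportions
p_null = class_probabilities(0.0, 100.0, 1.0, 1.0, pi)   # no signal
p_big = class_probabilities(100.0, 100.0, 1.0, 1.0, pi)  # strong signal
```

With no signal the zero-effect class dominates, while a strong association pushes the marker into the largest-variance class despite its small prior proportion.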


2019 ◽  
Author(s):  
Emily Jamieson ◽  
Roxanna Korologou-Linden ◽  
Robyn E. Wootton ◽  
Anna L. Guyatt ◽  
Thomas Battram ◽  
...  

Abstract Whether smoking-associated DNA methylation has a causal effect on lung function has not been thoroughly evaluated. We investigated the causal effects of 474 smoking-associated CpGs on forced expiratory volume in one second (FEV1) by two-sample Mendelian randomization (MR), using methylation quantitative trait loci and genome-wide association data for FEV1. We found evidence of a possible causal effect of DNA methylation on FEV1 at 18 CpGs (p < 1.2 × 10⁻⁴). Replication analysis supported a causal effect at three CpGs: cg21201401 (ZGPAT), cg19758448 (PGAP3) and cg12616487 (AHNAK) (p < 0.0028). DNA methylation did not clearly mediate the effect of smoking on FEV1, although DNA methylation at some sites may influence lung function via effects on smoking. Using multiple-trait colocalization, we found evidence of shared causal variants between lung function, gene expression and DNA methylation. These findings highlight potential therapeutic targets for improving lung function and possibly for smoking cessation, although large, tissue-specific datasets are required to confirm these results.
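In two-sample MR, a causal estimate is built from summary statistics of two separate studies: the instrument's association with the exposure (here, an mQTL with CpG methylation) and with the outcome (FEV1). The abstract does not name its estimator; the sketch below shows the Wald ratio, a standard choice when a single instrument is available, with a first-order standard error. The summary statistics are hypothetical.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-instrument MR estimate with a first-order standard error."""
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return estimate, se, p


# Hypothetical summary statistics for one mQTL instrument of one CpG.
est, se, p = wald_ratio(beta_exposure=0.5, beta_outcome=0.1, se_outcome=0.02)
```

Here the estimate is 0.2 with a standard error of 0.04 (z = 5). With several independent instruments per CpG, per-instrument ratios are typically combined by inverse-variance weighting instead.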


Biostatistics ◽  
2018 ◽  
Vol 21 (3) ◽  
pp. 610-624
Author(s):  
Ziyi Li ◽  
Changgee Chang ◽  
Suprateek Kundu ◽  
Qi Long

Summary Biclustering techniques can identify local patterns in a data matrix by clustering the feature space and the sample space simultaneously. Various biclustering methods have been proposed and successfully applied to the analysis of gene expression data. While existing biclustering methods have many desirable features, most were developed for continuous data, and few can efficiently handle -omics data of various types, for example, binomial data such as single nucleotide polymorphism data or negative binomial data such as RNA-seq data. In addition, none of the existing methods can utilize biological information such as that from functional genomics or proteomics. Recent work has shown that incorporating biological information can improve variable selection and prediction performance in analyses such as linear regression and multivariate analysis. In this article, we propose a novel Bayesian biclustering method that can handle multiple data types, including Gaussian, binomial, and negative binomial. In addition, our method uses a Bayesian adaptive structured shrinkage prior that enables feature selection guided by existing biological information. Our simulation studies and application to multi-omics datasets demonstrate robust and superior performance of the proposed method compared with other existing biclustering methods.


2010 ◽  
Vol 22 (05) ◽  
pp. 409-418 ◽  
Author(s):  
Ali Taalimi ◽  
Emad Fatemizadeh

Functional magnetic resonance imaging (fMRI) is widely used to detect the brain's neural activity. The signals and images acquired with this technique show the human brain's response to pre-scheduled tasks. Several studies of blood oxygenation level-dependent (BOLD) signal responses demonstrate nonlinear behavior in response to a stimulus. In this paper, we propose a new mathematical approach for modeling BOLD signal activity that can capture the nonlinear and time-varying behavior of this physiological system. We employ the Nonlinear AutoRegressive Moving Average (NARMA) model to describe the mathematical relationship between output signals and predesigned tasks. The model parameters can be used to distinguish between the rest and active states of a brain region. We applied the proposed method to active-region detection on both real and simulated data sets. The results show superior performance compared with existing methods.
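The abstract does not give the model order or terms used, so the following is only a sketch of the general idea: a polynomial model in lagged output and lagged stimulus, including a nonlinear cross term, fitted by least squares. It is a NARX-style simplification that omits the moving-average noise terms of a full NARMA model; the coefficients, the block design, and the noise-free simulation are hypothetical.

```python
import numpy as np


def simulate(u, a=0.5, b=0.3, c=0.1):
    """Noise-free recursion y_t = a*y_{t-1} + b*u_{t-1} + c*y_{t-1}*u_{t-1}."""
    y = np.zeros(len(u))
    for t in range(1, len(u)):
        y[t] = a * y[t - 1] + b * u[t - 1] + c * y[t - 1] * u[t - 1]
    return y


def fit_lagged_polynomial(y, u):
    """Least-squares estimates of (a, b, c) from lagged output, input and their product."""
    y_lag, u_lag = y[:-1], u[:-1]
    design = np.column_stack([y_lag, u_lag, y_lag * u_lag])
    coef, *_ = np.linalg.lstsq(design, y[1:], rcond=None)
    return coef


# Block-design stimulus: 10 rest scans then 10 task scans, repeated 4 times.
u = np.tile(np.r_[np.zeros(10), np.ones(10)], 4)
y = simulate(u)
coef = fit_lagged_polynomial(y, u)  # recovers approximately [0.5, 0.3, 0.1]
```

Fitted coefficients such as the stimulus gain `b` are the kind of per-region parameters that could then be compared between rest and active regions.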


Aquaculture ◽  
2021 ◽  
pp. 737069
Author(s):  
Sila Sukhavachana ◽  
Wansuk Senanan ◽  
Naruechon Pattarapanyawong ◽  
Chumpol Srithong ◽  
Weerakit Joerakate ◽  
...  
