Interpretable Artificial Neural Networks incorporating Bayesian Alphabet Models for Genome-wide Prediction and Association Studies

Author(s):  
Tianjing Zhao ◽  
Rohan Fernando ◽  
Hao Cheng

Abstract In conventional linear models for whole-genome prediction and genome-wide association studies (GWAS), it is usually assumed that the relationship between genotypes and phenotypes is linear. Bayesian neural networks have been used to account for non-linearity such as complex genetic architectures. Here, we introduce a method named NN-Bayes, where “NN” stands for neural networks and “Bayes” stands for Bayesian Alphabet models, a collection of Bayesian regression models such as BayesA, BayesB, BayesC, and Bayesian LASSO. NN-Bayes incorporates Bayesian Alphabet models into non-linear neural networks via hidden layers between SNPs and observed traits. Thus, NN-Bayes attempts to improve the performance of genome-wide prediction and GWAS by accommodating non-linear relationships between the hidden nodes and the observed trait, while maintaining genomic interpretability through the Bayesian regression models that connect the SNPs to the hidden nodes. For genomic interpretability, the posterior distribution of marker effects in NN-Bayes is inferred by Markov chain Monte Carlo (MCMC) approaches and used for inference of association through posterior inclusion probabilities (PIPs) and window posterior probability of association (WPPA). In simulation studies with dominance and epistatic effects, the performance of NN-Bayes was significantly better than that of conventional linear models for both GWAS and whole-genome prediction, and the differences in prediction accuracy were substantial in magnitude. In real data analyses, NN-Bayes achieved significantly higher prediction accuracies than conventional linear models for the soy dataset, while results from four other species showed that NN-Bayes had prediction performance similar to that of linear models, potentially because of the small sample sizes. Our NN-Bayes is optimized for high-dimensional genomic data and implemented in an open-source package called “JWAS”.
NN-Bayes can lead to greater use of Bayesian neural networks to account for non-linear relationships due to its interpretability and computational performance.
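
The two association summaries named in the abstract, PIPs and WPPA, can be computed from stored MCMC samples of marker effects. The following is a minimal sketch under stated assumptions, not the JWAS implementation: the function names, the toy arrays, and the 1% variance threshold are all hypothetical, and WPPA is taken here as the fraction of samples in which a window explains more than a chosen share of the genetic variance.

```python
import numpy as np

def posterior_inclusion_probabilities(effect_samples):
    """Fraction of MCMC samples in which each marker has a nonzero effect.

    effect_samples: array of shape (n_samples, n_markers).
    """
    return (effect_samples != 0).mean(axis=0)

def window_ppa(effect_samples, genotypes, window, threshold=0.01):
    """Fraction of samples in which the markers in `window` explain more than
    `threshold` of the genetic variance (threshold is an assumed value).

    genotypes: array of shape (n_individuals, n_markers).
    window: list of marker column indices.
    """
    total = genotypes @ effect_samples.T                        # genetic values per sample
    local = genotypes[:, window] @ effect_samples[:, window].T  # window-only genetic values
    total_var = total.var(axis=0)
    local_var = local.var(axis=0)
    # Samples with zero total genetic variance contribute a share of 0.
    share = np.divide(local_var, total_var,
                      out=np.zeros_like(local_var), where=total_var > 0)
    return float((share > threshold).mean())

# Toy data (hypothetical): 4 MCMC samples, 3 markers, 4 individuals.
effect_samples = np.array([[0.5, 0.0, 0.0],
                           [0.4, 0.0, 0.1],
                           [0.0, 0.0, 0.0],
                           [0.6, 0.0, 0.0]])
genotypes = np.array([[0.0, 1.0, 2.0],
                      [1.0, 1.0, 0.0],
                      [2.0, 0.0, 1.0],
                      [0.0, 2.0, 1.0]])

pip = posterior_inclusion_probabilities(effect_samples)
wppa = window_ppa(effect_samples, genotypes, window=[0])
print(pip, wppa)
```

In this toy example the first marker is nonzero in three of four samples, so its PIP and the WPPA of the single-marker window are both 0.75.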

2021 ◽  
Author(s):  
Tianjing Zhao ◽  
Rohan Fernando ◽  
Hao Cheng

ABSTRACT In conventional linear models for whole-genome prediction and genome-wide association studies (GWAS), it is usually assumed that the relationship between genotypes and phenotypes is linear. Bayesian neural networks have been used to account for non-linearity such as complex genetic architectures. Here, we introduce a method named NN-Bayes, where “NN” stands for neural networks and “Bayes” stands for Bayesian Alphabet models, a collection of Bayesian regression models such as BayesA, BayesB, BayesC, Bayesian LASSO, and BayesR. NN-Bayes incorporates Bayesian Alphabet models into non-linear neural networks via hidden layers between SNPs and observed traits. Thus, NN-Bayes attempts to improve the performance of genome-wide prediction and GWAS by accommodating non-linear relationships between the hidden nodes and the observed trait, while maintaining genomic interpretability through the Bayesian regression models that connect the SNPs to the hidden nodes. For genomic interpretability, the posterior distribution of marker effects in NN-Bayes is inferred by Markov chain Monte Carlo (MCMC) approaches and used for inference of association through posterior inclusion probabilities (PIPs) and window posterior probability of association (WPPA). In simulation studies with dominance and epistatic effects, the performance of NN-Bayes was significantly better than that of conventional linear models for both GWAS and whole-genome prediction, and the differences in prediction accuracy were substantial in magnitude. In real data analyses, NN-Bayes achieved significantly higher prediction accuracies than conventional linear models for the soy dataset, while results from four other species showed that NN-Bayes had prediction performance similar to that of linear models, potentially because of the small sample sizes. Our NN-Bayes is optimized for high-dimensional genomic data and implemented in an open-source package called “JWAS”.
NN-Bayes can lead to greater use of Bayesian neural networks to account for non-linear relationships due to its interpretability and computational performance.
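
The architecture described above, a linear SNP-to-hidden-node layer followed by a non-linear hidden-node-to-trait layer, can be sketched as a forward pass. This is an illustration only: the function name, the `tanh` activation, the array shapes, and the toy values are assumptions, not details taken from the paper or from JWAS.

```python
import numpy as np

def nn_bayes_forward(genotypes, marker_effects, hidden_weights,
                     bias=0.0, activation=np.tanh):
    """Predict a trait from SNP genotypes through one hidden layer.

    genotypes:      (n_individuals, n_markers)
    marker_effects: (n_markers, n_hidden), one linear regression per hidden node;
                    this is the layer whose effects stay interpretable per SNP.
    hidden_weights: (n_hidden,), weights applied after the activation.
    activation:     assumed non-linearity between hidden nodes and the trait.
    """
    hidden = genotypes @ marker_effects           # linear: SNPs -> hidden nodes
    return bias + activation(hidden) @ hidden_weights  # non-linear: hidden -> trait

# Toy values (hypothetical): 2 individuals, 2 markers, 2 hidden nodes.
genotypes = np.array([[0.0, 1.0],
                      [2.0, 1.0]])
marker_effects = np.array([[0.5, 0.0],
                           [0.0, -0.5]])
hidden_weights = np.array([1.0, 2.0])

predictions = nn_bayes_forward(genotypes, marker_effects, hidden_weights)
print(predictions)
```

Keeping the first layer linear is what lets each column of `marker_effects` be read as a set of per-SNP effects, in the same way as in a conventional Bayesian regression.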


2020 ◽  
Author(s):  
Meng Luo ◽  
Shiliang Gu

Abstract Although genome-wide association studies have successfully identified thousands of markers associated with various complex traits and diseases, our ability to predict such phenotypes remains limited. A perhaps overlooked explanation lies in the limitations of the genetic models and statistical techniques commonly used in association studies. However, using individual genotype data to perform accurate genetic prediction of complex traits can promote genomic selection in animal and plant breeding and can lead to the development of personalized medicine in humans. Because most complex traits have a polygenic architecture, accurate genetic prediction often requires modeling genetic variants together via polygenic methods. Here, we use our proposed polygenic method, which we refer to as the iterative screen regression (ISR) model, for genomic prediction. We compared ISR with several commonly used prediction methods in simulations. We further applied ISR to predicting 15 traits across five species: cattle, rice, wheat, maize, and mice. The results indicate that ISR performs better than several commonly used polygenic methods and does so stably.


2013 ◽  
Vol 113 (suppl_1) ◽  
Author(s):  
Christoph D Rau ◽  
Jessica Wang ◽  
Shuxun Ren ◽  
Zhihua Wang ◽  
Hongmei Ruan ◽  
...  

Heart failure is highly heterogeneous and, as a result, relatively few insights into the pathways and drivers of heart failure have been identified using system-wide methods such as genome-wide association studies (GWAS). We have developed a resource, the Hybrid Mouse Diversity Panel (HMDP), for high-resolution GWAS and systems genetics in mice. Eight-week-old female mice from 93 unique inbred strains of the HMDP were given 20 μg/g/day of isoproterenol through an abdominally implanted Alzet micropump. Three weeks post-implantation, all mice were sacrificed, along with age-matched controls. The mice exhibited widely varying degrees of hypertrophy and heart function. A portion of the left ventricle was processed and arrayed on an Illumina Mouse Ref 8.0 platform. To subdivide the expression data into a series of modules, we used Maximal Information Component Analysis, a novel method of network construction that allows for non-linear relationships between genes as well as non-binary partitioning of genes into sub-networks. To identify modules that may contribute to isoproterenol-induced hypertrophy and failure, we examined the correlation of each module with clinically relevant cardiac traits such as organ weights and echocardiographic parameters. We identified several modules with strong correlations to multiple heart failure-related clinical traits, including one module of 41 genes that contained several genes of interest, including Lgals3, a diagnostic marker for heart failure. Using eQTL hotspot analysis, we identified a locus that is involved in the regulation of this module. A gene within this locus, Magi2, regulates the turnover of the β-adrenergic receptor and represents a likely candidate for the response to isoproterenol.


2021 ◽  
Vol 18 (4) ◽  
pp. 280-296
Author(s):  
Abdel Razzaq Al Rababa’a ◽  
Zaid Saidat ◽  
Raed Hendawi

Different models have been used in the finance literature to predict stock market returns. However, it remains an open question whether non-linear models can outperform linear models while providing accurate predictions of future returns. This study compares the predictions of non-linear artificial neural network (ANN) models against those of baseline linear regression models. Specifically, it compares the prediction performance of regression models with different specifications and of static and dynamic ANN models. The analysis was conducted on a growing market, namely the Amman Stock Exchange. The results show that trading volume and interest rates on loans tend to explain the monthly returns the most, compared to other predictors in the regressions. Moreover, incorporating more variables is not found to help in explaining the fluctuations in stock market returns. More importantly, according to both the root mean square error (RMSE) and the mean absolute error statistical measures, the static ANN is the preferred model for forecasting. The associated forecasting errors are 0.0021 and 0.0005, respectively. Lastly, the analysis conducted with the dynamic ANN model produced its highest RMSE value, 0.0067, in the period since November 2018, following the amendment to the Jordanian income tax law. A similar observation holds for the period since the emergence of the COVID-19 outbreak (RMSE = 0.0042).
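
For reference, the two forecast-error measures used in this comparison follow their standard definitions and can be computed as below. This is a small sketch: the return values are made-up illustrations, not data from the study, and the function names are hypothetical.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared forecast error."""
    errors = np.asarray(actual) - np.asarray(predicted)
    return float(np.sqrt(np.mean(errors ** 2)))

def mae(actual, predicted):
    """Mean absolute error: mean of the absolute forecast errors."""
    errors = np.asarray(actual) - np.asarray(predicted)
    return float(np.mean(np.abs(errors)))

# Hypothetical monthly returns and model forecasts (illustration only).
actual_returns = [0.01, -0.02, 0.03]
forecast_returns = [0.02, -0.02, 0.01]

print(rmse(actual_returns, forecast_returns))
print(mae(actual_returns, forecast_returns))
```

Because RMSE squares the errors before averaging, it penalizes large misses more heavily than MAE, so the two measures can rank models differently when a few forecasts are far off.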


Animals ◽  
2020 ◽  
Vol 10 (11) ◽  
pp. 2009
Author(s):  
Ellen Lai ◽  
Alexa L. Danner ◽  
Thomas R. Famula ◽  
Anita M. Oberbauer

Digital dermatitis (DD) causes lameness in dairy cattle. To detect the quantitative trait loci (QTL) associated with DD, genome-wide association studies (GWAS) were performed using high-density single nucleotide polymorphism (SNP) genotypes and binary case/control, quantitative (average number of FW per hoof trimming record) and recurrent (cases with ≥2 DD episodes vs. controls) phenotypes from cows across four dairies (controls n = 129 vs. FW n = 85). Linear mixed model (LMM) and random forest (RF) approaches identified the top SNPs, which were used as predictors in Bayesian regression models to assess the SNP predictive value. The LMM and RF analyses identified QTL regions on Bos taurus autosome (BTA) 2 for the binary and recurrent phenotypes and on BTA7 and BTA20 for the quantitative phenotype; these regions contain candidate genes related to epidermal integrity, immune function, and wound healing. Although larger sample sizes are necessary to reaffirm these small-effect loci amidst a strong environmental effect, the sample cohort used in this study was sufficient for estimating SNP effects with a high predictive value.

