Opportunities and limits of combining microbiome and genome data for complex trait prediction

2020 ◽  
Author(s):  
Miguel Pérez-Enciso ◽  
Laura M. Zingaretti ◽  
Yuliaxis Ramayo-Caldas ◽  
Gustavo de los Campos

Abstract The analysis and prediction of complex traits using microbiome data combined with host genomic information is a topic of utmost interest. However, numerous questions remain to be answered: How useful can the microbiome be for complex trait prediction? Are microbiability estimates reliable? Can the underlying biological links between the host’s genome, microbiome, and the phenome be recovered? Here, we address these issues by (i) developing a novel simulation strategy that uses real microbiome and genotype data as input, and (ii) proposing a variance-component approach which, in the spirit of mediation analyses, quantifies the proportion of phenotypic variance explained by genome and microbiome, and dissects it into direct and indirect effects. The proposed simulation approach can mimic a genetic link between the microbiome and SNP data via a permutation procedure that retains the distributional properties of the data. Results suggest that microbiome data could significantly improve phenotype prediction accuracy, irrespective of whether some abundances are under direct genetic control by the host or not. Overall, random-effects linear methods appear robust for variance components estimation, despite the highly leptokurtic distribution of microbiota abundances. Nevertheless, we observed that accuracy depends in part on the number of microorganisms’ taxa influencing the trait of interest. While we conclude that overall genome-microbiome-links can be characterized via variance components, we are less optimistic about the possibility of identifying the causative effects, i.e., individual SNPs affecting abundances; power at this level would require much larger sample sizes than the ones typically available for genome-microbiome-phenome data.

Author summary The microbiome consists of the microorganisms that live in a particular environment, including those in our organism. There is consistent evidence that these communities play an important role in numerous traits of relevance, including disease susceptibility or feed efficiency. Moreover, it has been shown that the microbiome can be relatively stable throughout an individual’s life and that it is affected by the host genome. These reasons have prompted numerous studies to determine whether and how the microbiome can be used for prediction of complex phenotypes, either using the microbiome alone or in combination with the host’s genome data. However, numerous questions remain to be answered, such as the reliability of parameter estimates and the underlying relationship between microbiome, genome, and phenotype. The few available empirical studies do not provide a clear answer to these problems. Here we address these issues by developing a novel simulation strategy and we show that, although the microbiome can significantly help in prediction, it will be difficult to retrieve the actual biological basis of interactions between the microbiome and the trait.
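As a rough sketch of what a variance-component decomposition of this kind estimates, assume (a simplification for illustration, not the authors' full model) that the phenotype is the sum of independent genomic, microbiome, and residual terms. The symbols below are illustrative notation and do not come from the abstract:

```latex
y = g + m + e, \qquad
\sigma_y^2 = \sigma_g^2 + \sigma_m^2 + \sigma_e^2, \qquad
h^2 = \frac{\sigma_g^2}{\sigma_y^2}, \qquad
b^2 = \frac{\sigma_m^2}{\sigma_y^2}
```

Here h² plays the role of heritability and b² the role of microbiability. If some abundances are under host genetic control, g and m are no longer independent and the simple sum above does not hold, which is presumably why the abstract distinguishes direct from indirect effects.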

2021 ◽  
Vol 53 (1) ◽  
Author(s):  
Miguel Pérez-Enciso ◽  
Laura M. Zingaretti ◽  
Yuliaxis Ramayo-Caldas ◽  
Gustavo de los Campos

Abstract Background Analysis and prediction of complex traits using microbiome data combined with host genomic information is a topic of utmost interest. However, numerous questions remain to be answered: how useful can the microbiome be for complex trait prediction? Are estimates of microbiability reliable? Can the underlying biological links between the host’s genome, microbiome, and phenome be recovered? Methods Here, we address these issues by (i) developing a novel simulation strategy that uses real microbiome and genotype data as inputs, and (ii) using variance-component approaches (Bayesian Reproducing Kernel Hilbert Space (RKHS) and Bayesian variable selection methods (Bayes C)) to quantify the proportion of phenotypic variance explained by the genome and the microbiome. The proposed simulation approach can mimic genetic links between the microbiome and genotype data by a permutation procedure that retains the distributional properties of the data. Results Using real genotype and rumen microbiota abundances from dairy cattle, simulation results suggest that microbiome data can significantly improve the accuracy of phenotype predictions, regardless of whether some microbiota abundances are under direct genetic control by the host or not. This improvement depends logically on the microbiome being stable over time. Overall, random-effects linear methods appear robust for variance components estimation, in spite of the typically highly leptokurtic distribution of microbiota abundances. The predictive performance of Bayes C was higher but more sensitive to the number of causative effects than RKHS. Accuracy with Bayes C depended, in part, on the number of microorganisms’ taxa that influence the phenotype. 
Conclusions While we conclude that, overall, genome-microbiome-links can be characterized using variance component estimates, we are less optimistic about the possibility of identifying the causative host genetic effects that affect microbiota abundances, which would require much larger sample sizes than are typically available for genome-microbiome-phenome studies. The R code to replicate the analyses is in https://github.com/miguelperezenciso/simubiome.
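To illustrate how a permutation can link abundances to genotypes while leaving their distribution untouched, here is a minimal sketch. It is not taken from the authors' simubiome code; the function name, the rank-matching rule, and the simulated inputs are all assumptions made for illustration. The idea is that reordering observed abundances among individuals changes who carries which value, but not the set of values itself.

```python
import numpy as np


def link_abundance_to_score(abundance, score, noise_sd=1.0, seed=0):
    """Reorder `abundance` so its ranks follow a noisy version of `score`.

    Hypothetical helper: only the assignment of values to individuals
    changes, so the marginal distribution of `abundance` is preserved.
    """
    rng = np.random.default_rng(seed)
    abundance = np.asarray(abundance, dtype=float)
    score = np.asarray(score, dtype=float)
    # Noise controls how tightly abundances track the score.
    noisy = score + rng.normal(0.0, noise_sd, size=score.shape)
    # Individuals ordered from lowest to highest noisy score.
    order = np.argsort(noisy, kind="stable")
    permuted = np.empty_like(abundance)
    # Smallest abundance goes to the lowest noisy score, and so on.
    permuted[order] = np.sort(abundance)
    return permuted


# Simulated stand-ins for real data (assumptions, not the paper's data).
rng = np.random.default_rng(1)
genotype = rng.integers(0, 3, size=500)        # one SNP coded 0/1/2
abundance = rng.lognormal(0.0, 1.5, size=500)  # skewed abundances
linked = link_abundance_to_score(abundance, genotype, noise_sd=1.0)
```

A larger `noise_sd` weakens the induced genotype-abundance association; the sorted values of `linked` and `abundance` are identical in either case.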


1999 ◽  
Vol 22 (4) ◽  
pp. 577-582 ◽  
Author(s):  
Flavia França Teixeira ◽  
Magno Antonio Patto Ramalho ◽  
Ângela de Fátima Barbosa Abreu

More erect plant architecture has been a goal in the development of bean cultivars. Aiming to obtain more information about genetic control of traits related to plant architecture, this work was carried out between August 1995 and July 1997 in the southern and Alto São Francisco regions, in the State of Minas Gerais, Brazil. Initially, analyses were performed with individual plants of parents and different segregant generations from the crosses Carioca-MG x H-4 and Carioca x FT-Tarumã. In these experiments, besides degree of erectness, other traits were evaluated: ramification degree, internode length, internode diameter and height of insertion of the first pod. Mean and variance components and heritability at an individual level were estimated. Later, families derived from F2 or F3 plants of the same crosses were evaluated for degree of erectness. Genetic and phenotypic variance between family averages, heritabilities using variance components, and realized heritability were estimated. Of the morphological traits, internode length varied the most. There was a predominance of additive effect in the control of this trait. Evaluating plant architecture with individual plants for degree of erectness was not efficient. However, when families were used, genetic parameter estimates confirmed the possibility of successful selection, especially if evaluated for a few generations and/or environments, despite the strong environmental influence on trait expression.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Xuan Zhou ◽  
S. Hong Lee

Abstract Complementary to the genome, the concept of exposome has been proposed to capture the totality of human environmental exposures. While there has been some recent progress on the construction of the exposome, few tools exist that can integrate the genome and exposome for complex trait analyses. Here we propose a linear mixed model approach to bridge this gap, which jointly models the random effects of the two omics layers on phenotypes of complex traits. We illustrate our approach using traits from the UK Biobank (e.g., BMI and height for N ~ 35,000) with a small fraction of the exposome that comprises 28 lifestyle factors. The joint model of the genome and exposome explains substantially more phenotypic variance and significantly improves phenotypic prediction accuracy, compared to the model based on the genome alone. The additional phenotypic variance captured by the exposome includes its additive effects as well as non-additive effects such as genome–exposome (gxe) and exposome–exposome (exe) interactions. For example, 19% of variation in BMI is explained by additive effects of the genome, while an additional 7.2% is explained by additive effects of the exposome, 1.9% by exe interactions and 4.5% by gxe interactions. Correspondingly, the prediction accuracy for BMI, computed using Pearson’s correlation between the observed and predicted phenotypes, improves from 0.15 (based on the genome alone) to 0.35 (based on the genome and exposome). We also show, using established theories, that integrating genomic and exposomic data can be an effective way of attaining a clinically meaningful level of prediction accuracy for disease traits. In conclusion, the genomic and exposomic effects can contribute to phenotypic variation via their latent relationships, i.e. genome-exposome correlation, and gxe and exe interactions, and modelling these effects has the potential to improve phenotypic prediction accuracy and thus holds great promise for future clinical practice.
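The accuracy measure named in this abstract, Pearson's correlation between observed and predicted phenotypes, can be sketched as follows. The function name and the dictionary of components are illustrative assumptions; the percentages are the ones quoted for BMI, and their simple sum ignores any covariance term the abstract does not list.

```python
import numpy as np


def prediction_accuracy(observed, predicted):
    """Pearson correlation between observed and predicted phenotypes."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.corrcoef(observed, predicted)[0, 1])


# Percent of BMI variance per component, as quoted in the abstract.
components = {"genome": 19.0, "exposome": 7.2, "exe": 1.9, "gxe": 4.5}
total_listed = sum(components.values())  # about 32.6 percent

# Squared correlations for the two reported accuracies.
r2_genome_only = 0.15 ** 2  # about 0.0225
r2_joint = 0.35 ** 2        # about 0.1225
```

Squaring the two reported correlations gives a rough sense of the gain: about 2% of phenotypic variance predicted with the genome alone versus about 12% with the joint model.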


Genetics ◽  
2019 ◽  
Vol 211 (4) ◽  
pp. 1131-1141 ◽  
Author(s):  
Naomi R. Wray ◽  
Kathryn E. Kemper ◽  
Benjamin J. Hayes ◽  
Michael E. Goddard ◽  
Peter M. Visscher

2017 ◽  
Author(s):  
Jian Zeng ◽  
Ronald de Vlaming ◽  
Yang Wu ◽  
Matthew R Robinson ◽  
Luke Lloyd-Jones ◽  
...  

Abstract Estimation of the joint distribution of effect size and minor allele frequency (MAF) for genetic variants is important for understanding the genetic basis of complex trait variation and can be used to detect signatures of natural selection. We develop a Bayesian mixed linear model that simultaneously estimates SNP-based heritability, polygenicity (i.e. the proportion of SNPs with nonzero effects) and the relationship between effect size and MAF for complex traits in conventionally unrelated individuals using genome-wide SNP data. We apply the method to 28 complex traits in the UK Biobank data (N = 126,752), and show that on average across 28 traits, 6% of SNPs have nonzero effects, which in total explain 22% of phenotypic variance. We detect significant (p < 0.05/28 = 1.8 × 10⁻³) signatures of natural selection for 23 out of 28 traits including reproductive, cardiovascular, and anthropometric traits, as well as educational attainment. We further apply the method to 27,869 gene expression traits (N = 1,748), and identify 30 genes that show significant (p < 2.3 × 10⁻⁶) evidence of natural selection. All the significant estimates of the relationship between effect size and MAF in either complex traits or gene expression traits are consistent with a model of negative selection, as confirmed by forward simulation. We conclude that natural selection acts pervasively on human complex traits shaping genetic variation in the form of negative selection.
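The significance threshold quoted for the 28 traits, 0.05/28, has the form of a Bonferroni correction: the family-wise error rate divided by the number of tests. A minimal sketch of that arithmetic follows; the function name is an illustrative assumption.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


# 28 complex traits tested at a family-wise alpha of 0.05.
threshold = bonferroni_threshold(0.05, 28)
```

The result rounds to the 1.8 × 10⁻³ given in the abstract. The threshold quoted for the gene expression traits is left out here because the abstract does not state how it was derived.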


2019 ◽  
Vol 68 (1) ◽  
pp. 45-50
Author(s):  
Jun Tanabe ◽  
Ryota Endo ◽  
Satoru Kuroda ◽  
Futoshi Ishiguri ◽  
Tomohiro Narisawa ◽  
...  

Abstract Variance components of tree height (HT) and stem diameter at 1.3 m above the ground (DBH) were investigated for eight open-pollinated families of Zelkova serrata (Thunb.) Makino planted with three different initial planting spacings in a progeny test site, Chiba, Japan. Parent–offspring correlations were also evaluated by using these families and their mother trees. The smallest values of HT and DBH were observed in the narrowest initial planting spacing (1.10 x 1.10 m) compared to those in medium (1.30 x 1.36 m) and wide (2.00 x 1.80 m) spacings, suggesting that adverse effects of competition with neighboring trees occurred on both height and radial growth. As with HT and DBH, the initial planting spacings also affected the genetic parameter estimates: the ratio of family variance component to total phenotypic variance showed the highest value in narrow initial planting spacing for both HT and DBH. Thus, the family variance component might include competition effects, leading to biased genetic parameter estimates. In contrast, parent–offspring correlation coefficients showed the highest value in wide initial planting spacing, where the competition effect might be smaller. Therefore, the growth traits of Z. serrata might be inherited from the parent to the offspring when the competition effect was small.


Author(s):  
Xuan Zhou ◽  
S. Hong Lee

Abstract Complementary to the genome, the concept of exposome has been proposed to capture the totality of human environmental exposures. While there has been some recent progress on the construction of the exposome, few tools exist that can integrate the genome and exposome for complex trait analyses. Here we propose a linear mixed model approach to bridge this gap, which jointly models the random effects of the two omics layers on phenotypes of complex traits. We illustrate our approach using traits from the UK Biobank (e.g., BMI and height for N ~ 40,000) with a small fraction of the exposome that comprises 28 lifestyle factors. The joint model of the genome and exposome explains substantially more phenotypic variance and significantly improves phenotypic prediction accuracy, compared to the model based on the genome alone. The additional phenotypic variance captured by the exposome includes its additive effects as well as non-additive effects such as genome-exposome (gxe) and exposome-exposome (exe) interactions. For example, 19% of variation in BMI is explained by additive effects of the genome, while an additional 7.2% is explained by additive effects of the exposome, 1.9% by exe interactions and 4.5% by gxe interactions. Correspondingly, the prediction accuracy for BMI, computed using Pearson’s correlation between the observed and predicted phenotypes, improves from 0.15 (based on the genome alone) to 0.35 (based on the genome and exposome). We also show, using established theories, that integrating genomic and exposomic data is essential to attaining a clinically meaningful level of prediction accuracy for disease traits. In conclusion, the genomic and exposomic effects can contribute to phenotypic variation via their latent relationships, i.e. genome-exposome correlation, and gxe and exe interactions, and modelling these effects has great potential to improve phenotypic prediction accuracy and thus holds great promise for future clinical practice.


PLoS ONE ◽  
2015 ◽  
Vol 10 (10) ◽  
pp. e0138903 ◽  
Author(s):  
David C. Haws ◽  
Irina Rish ◽  
Simon Teyssedre ◽  
Dan He ◽  
Aurelie C. Lozano ◽  
...  

2020 ◽  
Vol 10 (12) ◽  
pp. 4599-4613
Author(s):  
Fabio Morgante ◽  
Wen Huang ◽  
Peter Sørensen ◽  
Christian Maltecca ◽  
Trudy F. C. Mackay

The ability to accurately predict complex trait phenotypes from genetic and genomic data is critical for the implementation of personalized medicine and precision agriculture; however, prediction accuracy for most complex traits is currently low. Here, we used data on whole genome sequences, deep RNA sequencing, and high quality phenotypes for three quantitative traits in the ∼200 inbred lines of the Drosophila melanogaster Genetic Reference Panel (DGRP) to compare the prediction accuracies of gene expression and genotypes for three complex traits. We found that expression levels (r = 0.28 and 0.38, for females and males, respectively) provided higher prediction accuracy than genotypes (r = 0.07 and 0.15, for females and males, respectively) for starvation resistance, similar prediction accuracy for chill coma recovery (null for both models and sexes), and lower prediction accuracy for startle response (r = 0.15 and 0.14 for female and male genotypes, respectively; and r = 0.12 and 0.11, for female and male transcripts, respectively). Models including both genotype and expression levels did not outperform the best single component model. However, accuracy increased considerably for all three traits when we included gene ontology (GO) category as an additional layer of information for both genomic variants and transcripts. We found strongly predictive GO terms for each of the three traits, some of which had a clear plausible biological interpretation. For example, for starvation resistance in females, GO:0033500 (r = 0.39 for transcripts) and GO:0032870 (r = 0.40 for transcripts) have been implicated in carbohydrate homeostasis and cellular response to hormone stimulus (including the insulin receptor signaling pathway), respectively. In summary, this study shows that integrating different sources of information improved prediction accuracy and helped elucidate the genetic architecture of three Drosophila complex phenotypes.

