Many options, few solutions: over 60 million years snakes converged on a few optimal venom formulations

2018 ◽  
Author(s):  
Agneesh Barua ◽  
Alexander S. Mikheyev

Abstract Gene expression changes contribute to complex trait variations in both individuals and populations. However, how gene expression influences changes of complex traits over macroevolutionary timescales remains poorly understood. Because they are proteinaceous cocktails, snake venoms are unique in that the expression of each toxin can be quantified and mapped to a distinct genomic locus and traced for millions of years. Using a phylogenetic generalized linear mixed model, we analysed expression data of toxin genes from 52 snake species spanning the three venomous snake families, and estimated phylogenetic covariance, which acts as a measure of evolutionary constraint. We find that evolution of toxin combinations is not constrained. However, while all combinations are in principle possible, the actual dimensionality of phylomorphic space is low, with envenomation strategies focused around only four major toxins: metalloproteases, three-finger toxins, serine proteases, and phospholipases A2. While most extant snakes prioritize either a single major toxin or a combination of major toxins, these toxins are repeatedly recruited and lost. We find that over macroevolutionary timescales the venom phenotypes were not shaped by phylogenetic constraints, which include important microevolutionary constraints such as epistasis and pleiotropy, but more likely by ecological filtering that permits a few optimal solutions. As a result, phenotypic optima were repeatedly attained by distantly related species. These results indicate that venoms evolve by selection on biochemistry of prey envenomation, which permits diversity through parallelism and imposes strong limits, since only a few of the theoretically possible strategies seem to work well and are observed in extant snakes.


2019 ◽  
Vol 36 (9) ◽  
pp. 1964-1974 ◽  
Author(s):  
Agneesh Barua ◽  
Alexander S Mikheyev

Abstract Gene expression changes contribute to complex trait variations in both individuals and populations. However, the evolution of gene expression underlying complex traits over macroevolutionary timescales remains poorly understood. Snake venoms are proteinaceous cocktails where the expression of each toxin can be quantified and mapped to a distinct genomic locus and traced for millions of years. Using a phylogenetic generalized linear mixed model, we analyzed expression data of toxin genes from 52 snake species spanning the 3 venomous snake families and estimated phylogenetic covariance, which acts as a measure of evolutionary constraint. We find that evolution of toxin combinations is not constrained. However, although all combinations are in principle possible, the actual dimensionality of phylomorphic space is low, with envenomation strategies focused around only four major toxin families: metalloproteases, three-finger toxins, serine proteases, and phospholipases A2. Although most extant snakes prioritize either a single or a combination of major toxin families, they are repeatedly recruited and lost. We find that over macroevolutionary timescales, the venom phenotypes were not shaped by phylogenetic constraints, which include important microevolutionary constraints such as epistasis and pleiotropy, but more likely by ecological filtering that permits a small number of optimal solutions. As a result, phenotypic optima were repeatedly attained by distantly related species. These results indicate that venoms evolve by selection on biochemistry of prey envenomation, which permit diversity through parallelism, and impose strong limits, since only a few of the theoretically possible strategies seem to work well and are observed in extant snakes.



2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Xuan Zhou ◽  
S. Hong Lee

Abstract Complementary to the genome, the concept of exposome has been proposed to capture the totality of human environmental exposures. While there has been some recent progress on the construction of the exposome, few tools exist that can integrate the genome and exposome for complex trait analyses. Here we propose a linear mixed model approach to bridge this gap, which jointly models the random effects of the two omics layers on phenotypes of complex traits. We illustrate our approach using traits from the UK Biobank (e.g., BMI and height for N ~ 35,000) with a small fraction of the exposome that comprises 28 lifestyle factors. The joint model of the genome and exposome explains substantially more phenotypic variance and significantly improves phenotypic prediction accuracy, compared to the model based on the genome alone. The additional phenotypic variance captured by the exposome includes its additive effects as well as non-additive effects such as genome–exposome (gxe) and exposome–exposome (exe) interactions. For example, 19% of variation in BMI is explained by additive effects of the genome, while additional 7.2% by additive effects of the exposome, 1.9% by exe interactions and 4.5% by gxe interactions. Correspondingly, the prediction accuracy for BMI, computed using Pearson’s correlation between the observed and predicted phenotypes, improves from 0.15 (based on the genome alone) to 0.35 (based on the genome and exposome). We also show, using established theories, that integrating genomic and exposomic data can be an effective way of attaining a clinically meaningful level of prediction accuracy for disease traits. In conclusion, the genomic and exposomic effects can contribute to phenotypic variation via their latent relationships, i.e. genome-exposome correlation, and gxe and exe interactions, and modelling these effects has a potential to improve phenotypic prediction accuracy and thus holds a great promise for future clinical practice.



Author(s):  
Xuan Zhou ◽  
S. Hong Lee

Abstract Complementary to the genome, the concept of exposome has been proposed to capture the totality of human environmental exposures. While there has been some recent progress on the construction of the exposome, few tools exist that can integrate the genome and exposome for complex trait analyses. Here we propose a linear mixed model approach to bridge this gap, which jointly models the random effects of the two omics layers on phenotypes of complex traits. We illustrate our approach using traits from the UK Biobank (e.g., BMI and height for N ~ 40,000) with a small fraction of the exposome that comprises 28 lifestyle factors. The joint model of the genome and exposome explains substantially more phenotypic variance and significantly improves phenotypic prediction accuracy, compared to the model based on the genome alone. The additional phenotypic variance captured by the exposome includes its additive effects as well as non-additive effects such as genome-exposome (gxe) and exposome-exposome (exe) interactions. For example, 19% of variation in BMI is explained by additive effects of the genome, while an additional 7.2% is explained by additive effects of the exposome, 1.9% by exe interactions and 4.5% by gxe interactions. Correspondingly, the prediction accuracy for BMI, computed using Pearson’s correlation between the observed and predicted phenotypes, improves from 0.15 (based on the genome alone) to 0.35 (based on the genome and exposome). We also show, using established theories, that integrating genomic and exposomic data is essential to attaining a clinically meaningful level of prediction accuracy for disease traits. In conclusion, the genomic and exposomic effects can contribute to phenotypic variation via their latent relationships, i.e. genome-exposome correlation, and gxe and exe interactions, and modelling these effects has great potential to improve phenotypic prediction accuracy and thus holds great promise for future clinical practice.



Author(s):  
Brenen M Wynd ◽  
Josef C Uyeda ◽  
Sterling J Nesbitt

Abstract Allometry—patterns of relative change in body parts—is a staple for examining how clades exhibit scaling patterns representative of evolutionary constraint on phenotype, or quantifying patterns of ontogenetic growth within a species. Reconstructing allometries from ontogenetic series is one of the few methods available to reconstruct growth in fossil specimens. However, many fossil specimens are deformed (twisted, flattened, displaced bones) during fossilization, changing their original morphology in unpredictable and sometimes undecipherable ways. To mitigate post-burial changes, paleontologists typically remove clearly distorted measurements from analyses. However, this can potentially remove evidence of individual variation and limits the number of samples amenable to study, which can negatively impact allometric reconstructions. Ordinary least squares regression (OLS) and major axis regression are common methods for estimating allometry, but they assume constant levels of residual variation across specimens, which is unlikely to be true when including both distorted and undistorted specimens. Alternatively, a generalized linear mixed model (GLMM) can attribute additional variation to terms in a model (e.g., fixed or random effects). We performed a simulation study based on an empirical analysis of the extinct cynodont, Exaeretodon argentinus, to test the efficacy of a GLMM on allometric data. We found that GLMMs estimate the allometry better using a full dataset than using only non-distorted data. We apply our approach to two empirical datasets, cranial measurements of actual specimens of E. argentinus (n = 16) and femoral measurements of the dinosaur Tawa hallae (n = 26). Taken together, our study suggests that a GLMM is better able than OLS to reconstruct patterns of allometry in datasets composed of extinct forms and should be standard protocol for anyone using distorted specimens.



2019 ◽  
Author(s):  
Jan A. Freudenthal ◽  
Markus J. Ankenbrand ◽  
Dominik G. Grimm ◽  
Arthur Korte

Abstract Motivation Genome-wide association studies (GWAS) are one of the most commonly used methods to detect associations between complex traits and genomic polymorphisms. As both genotyping and phenotyping of large populations have become easier, typical modern GWAS have to cope with massive amounts of data. Thus, the computational demand for these analyses has grown remarkably over the last decades. This is especially true if one wants to implement permutation-based significance thresholds instead of using the naïve Bonferroni threshold. Permutation-based methods have the advantage of providing an adjusted multiple hypothesis correction threshold that takes the underlying phenotypic distribution into account and thus removes the need to find the correct transformation for non-Gaussian phenotypes. To enable efficient analyses of large datasets and the possibility to compute permutation-based significance thresholds, we used the machine learning framework TensorFlow to develop a linear mixed model (GWAS-Flow) that can make use of the available CPU or GPU infrastructure to decrease the time of the analyses, especially for large datasets. Results We were able to show that our application GWAS-Flow outperforms custom GWAS scripts in terms of speed without losing accuracy. Apart from p-values, GWAS-Flow also computes summary statistics, such as the effect size and its standard error for each individual marker. The CPU-based version is the default choice for small data, while the GPU-based version of GWAS-Flow is especially suited for the analyses of big data. Availability GWAS-Flow is freely available on GitHub (https://github.com/Joyvalley/GWAS_Flow) and is released under the terms of the MIT-License.
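
The idea of a permutation-based significance threshold can be illustrated with a minimal NumPy sketch. This is not GWAS-Flow's implementation: it uses a plain per-marker squared correlation instead of a linear mixed model, it assumes no marker is monomorphic, and the function names `marker_stats` and `permutation_threshold` are hypothetical. The phenotype is shuffled many times, the largest statistic across all markers is recorded for each shuffle, and the threshold is the upper quantile of those maxima.

```python
import numpy as np

def marker_stats(genotypes, phenotype):
    """Squared Pearson correlation between each marker (column) and the phenotype.

    Simplifying assumption: a marginal correlation test stands in for the
    linear mixed model; every marker must vary across individuals.
    """
    G = genotypes - genotypes.mean(axis=0)
    y = phenotype - phenotype.mean()
    num = G.T @ y
    denom = np.sqrt((G ** 2).sum(axis=0) * (y ** 2).sum())
    return (num / denom) ** 2

def permutation_threshold(genotypes, phenotype, n_perm=200, alpha=0.05, seed=0):
    """Genome-wide threshold from the permutation distribution of the maximum statistic."""
    rng = np.random.default_rng(seed)
    max_stats = np.empty(n_perm)
    for i in range(n_perm):
        # Shuffling the phenotype breaks any genotype-phenotype association
        # while keeping the phenotype's own (possibly non-Gaussian) distribution.
        shuffled = rng.permutation(phenotype)
        max_stats[i] = marker_stats(genotypes, shuffled).max()
    # A marker is called significant if its statistic exceeds this quantile.
    return np.quantile(max_stats, 1.0 - alpha)
```

Because the threshold is derived from the observed phenotype values themselves, no transformation to normality is needed; the cost is that the full scan is repeated `n_perm` times, which is the computational burden the abstract refers to.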



2006 ◽  
Vol 18 (2) ◽  
pp. 239
Author(s):  
J. Piedrahita ◽  
S. Bischoff ◽  
J. Estrada ◽  
B. Freking ◽  
D. Nonneman ◽  
...  

Genomic imprinting arises from differential epigenetic markings including DNA methylation and histone modifications and results in one allele being expressed in a parent-of-origin specific manner. For further insight into the porcine epigenome, gene expression profiles of parthenogenetic (PRT; two maternally derived chromosome sets) and biparental embryos (BP; one maternal and one paternal set of chromosomes) were compared using microarrays. Comparison of the expression profiles of the two tissue types permits identification of both maternally and paternally imprinted genes and thus the degree of conservation of imprinted genes between swine and other mammalian species. Diploid porcine parthenogenetic fetuses were generated using follicular oocytes (BOMED, Madison, WI, USA). Oocytes with a visible polar body were activated using a single square pulse of direct current of 50 V/mm for 100 µs and diploidized by culture in 10 µg/mL cycloheximide for 6 h to limit extrusion of the second polar body. Following culture, BP embryos obtained by natural matings, and PRT embryos, were surgically transferred to oviducts on the first day of estrus. Fetuses recovered at 28-30 days of gestation were dissected to separate viscera including brain, liver, and placenta; the visceral tissues were then flash-frozen in liquid nitrogen. Porcine fibroblast tissue was obtained from the remaining carcass by mincing, trypsinization, and plating cells in α-MEM. Total RNA was extracted from frozen tissue or cell culture using RNA Aqueous kit (Ambion, Austin, TX, USA) according to the manufacturer's protocol. Gene expression differences between BP and PRT tissues were determined using the GeneChip® Porcine Genome Array (Affymetrix, Santa Clara, CA) containing 23 256 transcripts from Sus scrofa and representing 42 genes known to be imprinted in human and/or mice. Triplicate arrays were utilized for each tissue type, and for PRT versus BP combination. Significant differential gene expression was identified by a linear mixed model analysis using SAS 5.0 (SAS Institute, Cary, NC, USA). Storey's q-value method was used to correct for multiple testing at q ≤ 0.05. The following genes were classified as imprinted on the basis of their expression profiles: In fibroblasts, ARHI, HTR2A, MEST, NDN, NNAT, PEG3, PLAGL1, PEG10, SGCE, SNRPN, and UBE3A; in liver, IGF2, PEG3, PLAGL1, PEG10, and SNRPN; in placenta, HTR2A, IGF2, MEST, NDN, NNAT, PEG3, PLAGL1, PEG10, and SNRPN; and in brain, none. Additionally, several genes not known to be imprinted in humans/mice were highly differentially expressed between the two tissue types. Overall, utilizing the PRT models and gene expression profiles, we have identified thirteen genes where imprinting is conserved between swine and humans/mice, and several candidate genes that represent potentially imprinted genes. Presently, our efforts are focused on the identification of single nucleotide polymorphisms (SNPs) to more carefully evaluate the behavior of these genes in normal and abnormal gestations and to test whether the candidate genes are indeed imprinted. This research was supported by USDA-CSREES grant 524383 to J. P. and B. F.



2019 ◽  
Vol 13 ◽  
pp. 117793221988143 ◽  
Author(s):  
Kar-Fu Yeung ◽  
Yi Yang ◽  
Can Yang ◽  
Jin Liu

Genome-wide association study (GWAS) analyses have identified thousands of associations between genetic variants and complex traits. However, it is still a challenge to uncover the mechanisms underlying the association. With the growing availability of transcriptome data sets, it has become possible to perform statistical analyses targeted at identifying influential genes whose expression levels correlate with the phenotype. Methods such as PrediXcan and transcriptome-wide association study (TWAS) use the transcriptome data set to fit a predictive model for gene expression, with genetic variants as covariates. The gene expression levels for the GWAS data set are then ‘imputed’ using the prediction model, and the imputed expression levels are tested for their association with the phenotype. These methods fail to account for the uncertainty in the GWAS imputation step, and we propose a collaborative mixed model (CoMM) that addresses this limitation by jointly modelling the multiple analysis steps. We illustrate CoMM’s ability to identify relevant genes in the Northern Finland Birth Cohort 1966 data set and extend the model to handle the more widely available GWAS summary statistics.
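
The two-stage workflow that the abstract attributes to PrediXcan- and TWAS-style methods can be sketched as follows. This is an illustrative simplification, not CoMM and not the code of either tool: ridge regression is an assumed choice of prediction model, a plain correlation stands in for the association test, and the function names are hypothetical.

```python
import numpy as np

def fit_ridge(X, y, lam=1.0):
    """Ridge regression weights for centered predictors X and centered response y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def two_stage_association(G_ref, expr_ref, G_gwas, pheno, lam=1.0):
    """Stage 1: learn expression ~ genotype in the transcriptome reference panel.
    Stage 2: impute expression in the GWAS sample and correlate it with the phenotype.
    """
    mu_G = G_ref.mean(axis=0)
    weights = fit_ridge(G_ref - mu_G, expr_ref - expr_ref.mean(), lam)
    # The imputed values are treated as if they were observed without error;
    # the uncertainty in `weights` is not propagated into stage 2.
    imputed = (G_gwas - mu_G) @ weights
    r = np.corrcoef(imputed, pheno)[0, 1]
    return weights, imputed, r
```

The comment in stage 2 marks the point the abstract criticizes: the estimated weights are plugged in as fixed quantities, whereas a joint model estimates both steps together.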



Author(s):  
Yang Hai ◽  
Yalu Wen

Abstract Motivation Accurate disease risk prediction is essential for precision medicine. Existing models assume either that diseases are caused by groups of predictors with small-to-moderate effects or that they are caused by a few isolated predictors with large effects. Their performance can be sensitive to the underlying disease mechanisms, which are usually unknown in advance. Results We developed a Bayesian linear mixed model (BLMM), where genetic effects were modelled using a hybrid of the sparsity regression and linear mixed model with multiple random effects. The parameters in BLMM were inferred through a computationally efficient variational Bayes algorithm. The proposed method can resemble the shape of the true effect size distributions, capture the predictive effects from both common and rare variants, and remain robust against various disease models. Through extensive simulations and the application to a whole-genome sequencing dataset obtained from the Alzheimer’s Disease Neuroimaging Initiative, we have demonstrated that BLMM has better prediction performance than existing methods and can detect variables and/or genetic regions that are predictive. Availability The R-package is available at https://github.com/yhai943/BLMM Supplementary information Supplementary data are available at Bioinformatics online.



Genetics ◽  
2019 ◽  
Vol 212 (3) ◽  
pp. 919-929
Author(s):  
Daniel A. Skelly ◽  
Narayanan Raghupathy ◽  
Raymond F. Robledo ◽  
Joel H. Graber ◽  
Elissa J. Chesler

Systems genetic analysis of complex traits involves the integrated analysis of genetic, genomic, and disease-related measures. However, these data are often collected separately across multiple study populations, rendering direct correlation of molecular features to complex traits impossible. Recent transcriptome-wide association studies (TWAS) have harnessed gene expression quantitative trait loci (eQTL) to associate unmeasured gene expression with a complex trait in genotyped individuals, but this approach relies primarily on strong eQTL. We propose a simple and powerful alternative strategy for correlating independently obtained sets of complex traits and molecular features. In contrast to TWAS, our approach gains precision by correlating complex traits through a common set of continuous phenotypes instead of genetic predictors, and can identify transcript–trait correlations for which the regulation is not genetic. In our approach, a set of multiple quantitative “reference” traits is measured across all individuals, while measures of the complex trait of interest and transcriptional profiles are obtained in disjoint subsamples. A conventional multivariate statistical method, canonical correlation analysis, is used to relate the reference traits and traits of interest to identify gene expression correlates. We evaluate power and sample size requirements of this methodology, as well as performance relative to other methods, via extensive simulation and analysis of a behavioral genetics experiment in 258 Diversity Outbred mice involving two independent sets of anxiety-related behaviors and hippocampal gene expression. After splitting the data set and hiding one set of anxiety-related traits in half the samples, we identified transcripts correlated with the hidden traits using the other set of anxiety-related traits and exploiting the highest canonical correlation (R = 0.69) between the trait data sets. 
We demonstrate that this approach outperforms TWAS in identifying associated transcripts. Together, these results demonstrate the validity, reliability, and power of reference trait analysis for identifying relations between complex traits and their molecular substrates.
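
Canonical correlation analysis, the multivariate method named in the abstract, can be sketched in a few lines of NumPy. This is a generic textbook formulation (QR decomposition followed by a singular value decomposition), not the authors' pipeline; the function name is hypothetical and both trait matrices are assumed to have full column rank with more individuals than traits.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between two trait matrices (rows = individuals).

    Each matrix is centered, an orthonormal basis for its column space is
    obtained by QR decomposition, and the singular values of the product of
    the two bases are the canonical correlations, largest first.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    # Clip to [0, 1] to absorb floating-point round-off.
    return np.clip(s, 0.0, 1.0)
```

The first returned value corresponds to the "highest canonical correlation" reported in the abstract for the two sets of anxiety-related traits; here it would be computed on whatever trait matrices are supplied.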



2019 ◽  
Author(s):  
Yuhua Zhang ◽  
Corbin Quick ◽  
Ketian Yu ◽  
Alvaro Barbeira ◽  
Francesca Luca ◽  
...  

Abstract Transcriptome-wide association studies (TWAS), an integrative framework using expression quantitative trait loci (eQTLs) to construct proxies for gene expression, have emerged as a promising method to investigate the biological mechanisms underlying associations between genotypes and complex traits. However, challenges remain in interpreting TWAS results, especially regarding their causality implications. In this paper, we describe a new computational framework, probabilistic TWAS (PTWAS), to detect associations and investigate causal relationships between gene expression and complex traits. We use established concepts and principles from instrumental variables (IV) analysis to delineate and address the unique challenges that arise in TWAS. PTWAS utilizes probabilistic eQTL annotations derived from multi-variant Bayesian fine-mapping analysis, conferring higher power to detect TWAS associations than existing methods. Additionally, PTWAS provides novel functionalities to evaluate the causal assumptions and estimate tissue- or cell-type specific causal effects of gene expression on complex traits. These features make PTWAS uniquely suited for in-depth investigations of the biological mechanisms that contribute to complex trait variation. Using eQTL data across 49 tissues from GTEx v8, we apply PTWAS to analyze 114 complex traits using GWAS summary statistics from several large-scale projects, including the UK Biobank. Our analysis reveals an abundance of genes with strong evidence of eQTL-mediated causal effects on complex traits and highlights the heterogeneity and tissue-relevance of these effects across complex traits. We distribute software and eQTL annotations to enable users to perform rigorous TWAS analysis by leveraging the full potential of the latest GTEx multi-tissue eQTL data.


