Many options, few solutions: over 60 million years snakes converged on a few optimal venom formulations

Mapping Intimacies ◽

10.1101/459073 ◽

2018 ◽

Author(s):

Agneesh Barua ◽

Alexander S. Mikheyev

Keyword(s):

Gene Expression ◽

Complex Traits ◽

Serine Proteases ◽

Mixed Model ◽

Linear Mixed Model ◽

Complex Trait ◽

Evolutionary Constraint ◽

Genomic Locus ◽

Phospholipases A2 ◽

Ecological Filtering

Abstract Gene expression changes contribute to complex trait variation in both individuals and populations. However, how gene expression influences changes in complex traits over macroevolutionary timescales remains poorly understood. Because they are proteinaceous cocktails, snake venoms are unique in that the expression of each toxin can be quantified, mapped to a distinct genomic locus, and traced over millions of years. Using a phylogenetic generalized linear mixed model, we analysed expression data of toxin genes from 52 snake species spanning the three venomous snake families, and estimated phylogenetic covariance, which acts as a measure of evolutionary constraint. We find that evolution of toxin combinations is not constrained. However, while all combinations are in principle possible, the actual dimensionality of phylomorphic space is low, with envenomation strategies focused around only four major toxins: metalloproteases, three-finger toxins, serine proteases, and phospholipases A2. While most extant snakes prioritize either a single major toxin or a combination of them, these toxins are repeatedly recruited and lost. We find that over macroevolutionary timescales the venom phenotypes were not shaped by phylogenetic constraints, which include important microevolutionary constraints such as epistasis and pleiotropy, but more likely by ecological filtering that permits a few optimal solutions. As a result, phenotypic optima were repeatedly attained by distantly related species. These results indicate that venoms evolve by selection on the biochemistry of prey envenomation, which permits diversity through parallelism but imposes strong limits, since only a few of the theoretically possible strategies seem to work well and are observed in extant snakes.

Many Options, Few Solutions: Over 60 My Snakes Converged on a Few Optimal Venom Formulations

Molecular Biology and Evolution ◽

10.1093/molbev/msz125 ◽

2019 ◽

Vol 36 (9) ◽

pp. 1964-1974 ◽

Cited By ~ 12

Author(s):

Agneesh Barua ◽

Alexander S Mikheyev

Keyword(s):

Gene Expression ◽

Complex Traits ◽

Serine Proteases ◽

Mixed Model ◽

Linear Mixed Model ◽

Complex Trait ◽

Evolutionary Constraint ◽

Genomic Locus ◽

Phospholipases A2 ◽

Ecological Filtering

Abstract Gene expression changes contribute to complex trait variation in both individuals and populations. However, the evolution of gene expression underlying complex traits over macroevolutionary timescales remains poorly understood. Snake venoms are proteinaceous cocktails in which the expression of each toxin can be quantified, mapped to a distinct genomic locus, and traced over millions of years. Using a phylogenetic generalized linear mixed model, we analyzed expression data of toxin genes from 52 snake species spanning the 3 venomous snake families and estimated phylogenetic covariance, which acts as a measure of evolutionary constraint. We find that evolution of toxin combinations is not constrained. However, although all combinations are in principle possible, the actual dimensionality of phylomorphic space is low, with envenomation strategies focused around only four major toxin families: metalloproteases, three-finger toxins, serine proteases, and phospholipases A2. Although most extant snakes prioritize either a single major toxin family or a combination of them, these toxin families are repeatedly recruited and lost. We find that over macroevolutionary timescales, the venom phenotypes were not shaped by phylogenetic constraints, which include important microevolutionary constraints such as epistasis and pleiotropy, but more likely by ecological filtering that permits a small number of optimal solutions. As a result, phenotypic optima were repeatedly attained by distantly related species. These results indicate that venoms evolve by selection on the biochemistry of prey envenomation, which permits diversity through parallelism but imposes strong limits, since only a few of the theoretically possible strategies seem to work well and are observed in extant snakes.
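Phylogenetic mixed models of this kind usually give the species-level random effect a covariance proportional to a matrix of shared branch lengths, which is what Brownian-motion evolution along the tree predicts. The sketch below builds that matrix for a small hypothetical four-species tree; the tree, branch lengths, and function names are illustrative assumptions, and this is not the authors' pipeline.

```python
# Hypothetical rooted tree: node -> (parent, branch length to parent).
TREE = {
    "root": (None, 0.0),
    "n1": ("root", 2.0),
    "n2": ("root", 1.0),
    "A": ("n1", 3.0),
    "B": ("n1", 3.0),
    "C": ("n2", 4.0),
    "D": ("n2", 4.0),
}


def path_to_root(node, tree):
    """Nodes on the path from `node` up to, but excluding, the root."""
    path = []
    while tree[node][0] is not None:
        path.append(node)
        node = tree[node][0]
    return path


def shared_branch_length(a, b, tree):
    """Total length of the branches that tips a and b share below the root."""
    shared = set(path_to_root(a, tree)) & set(path_to_root(b, tree))
    return sum((tree[n][1] for n in shared), 0.0)


def brownian_covariance(tips, tree):
    """C[i][j] = shared branch length of tips i and j (unit-rate Brownian motion)."""
    return [[shared_branch_length(a, b, tree) for b in tips] for a in tips]


tips = ["A", "B", "C", "D"]
for tip, row in zip(tips, brownian_covariance(tips, TREE)):
    print(tip, row)
```

The diagonal holds each tip's root-to-tip distance and the off-diagonals hold shared history. In a multivariate model over several toxin families, this matrix is typically combined (via a Kronecker product) with a trait-by-trait covariance matrix, whose off-diagonal entries are the phylogenetic covariances between toxins that the abstract uses to gauge constraint.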

An integrative analysis of genomic and exposomic data for complex traits and phenotypic prediction

Scientific Reports ◽

10.1038/s41598-021-00427-y ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Xuan Zhou ◽

S. Hong Lee

Keyword(s):

Complex Traits ◽

Prediction Accuracy ◽

Mixed Model ◽

Linear Mixed Model ◽

Complex Trait ◽

Great Promise ◽

Phenotypic Variance ◽

Additive Effects ◽

Mixed Model Approach ◽

The UK

Abstract Complementary to the genome, the concept of the exposome has been proposed to capture the totality of human environmental exposures. While there has been some recent progress on the construction of the exposome, few tools exist that can integrate the genome and exposome for complex trait analyses. Here we propose a linear mixed model approach to bridge this gap, which jointly models the random effects of the two omics layers on phenotypes of complex traits. We illustrate our approach using traits from the UK Biobank (e.g., BMI and height for N ~ 35,000) with a small fraction of the exposome that comprises 28 lifestyle factors. The joint model of the genome and exposome explains substantially more phenotypic variance and significantly improves phenotypic prediction accuracy, compared to the model based on the genome alone. The additional phenotypic variance captured by the exposome includes its additive effects as well as non-additive effects such as genome–exposome (gxe) and exposome–exposome (exe) interactions. For example, 19% of variation in BMI is explained by additive effects of the genome, while an additional 7.2% is explained by additive effects of the exposome, 1.9% by exe interactions, and 4.5% by gxe interactions. Correspondingly, the prediction accuracy for BMI, computed using Pearson’s correlation between the observed and predicted phenotypes, improves from 0.15 (based on the genome alone) to 0.35 (based on the genome and exposome). We also show, using established theories, that integrating genomic and exposomic data can be an effective way of attaining a clinically meaningful level of prediction accuracy for disease traits. In conclusion, the genomic and exposomic effects can contribute to phenotypic variation via their latent relationships, i.e., genome–exposome correlation and gxe and exe interactions, and modelling these effects has the potential to improve phenotypic prediction accuracy and thus holds great promise for future clinical practice.
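A common way to give genome-by-exposome and exposome-by-exposome interactions their own random effects is to use element-wise (Hadamard) products of relationship matrices. The sketch below assumes that construction: a genomic relationship matrix G from standardized genotypes, an exposomic matrix E from standardized lifestyle factors, and interaction kernels G∘E and E∘E. The toy data and function names are hypothetical, and the abstract does not spell out the exact kernels the authors used.

```python
import statistics


def standardize_columns(X):
    """Center and scale each column of X (list of rows) to mean 0 and SD 1."""
    out_cols = []
    for col in zip(*X):
        mu = statistics.fmean(col)
        sd = statistics.pstdev(col)  # population SD; assumes no constant column
        out_cols.append([(v - mu) / sd for v in col])
    return [list(row) for row in zip(*out_cols)]


def relationship_matrix(X):
    """Return Z Z^T / p, where Z is the column-standardized n x p matrix X."""
    Z = standardize_columns(X)
    p = len(Z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / p for zj in Z] for zi in Z]


def hadamard(A, B):
    """Element-wise product of two equally sized matrices."""
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


# Toy data for 4 individuals (hypothetical): genotypes coded 0/1/2, lifestyle scores.
genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [1, 2, 1]]
exposures = [[3.0, 1.0], [2.0, 0.0], [5.0, 2.0], [1.0, 1.0]]

G = relationship_matrix(genotypes)  # genomic relationships
E = relationship_matrix(exposures)  # exposomic similarity
GxE = hadamard(G, E)  # kernel for genome-by-exposome interaction effects
ExE = hadamard(E, E)  # kernel for exposome-by-exposome interaction effects
```

Each kernel would then serve as the covariance of its own random effect in the mixed model, with one variance component estimated per kernel (for example by REML); dividing each component by the total phenotypic variance gives proportions like those reported for BMI.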

An integrative analysis of genomic and exposomic data for complex traits and phenotypic prediction

10.1101/2020.11.09.373704 ◽

2020 ◽

Cited By ~ 1

Author(s):

Xuan Zhou ◽

S. Hong Lee

Keyword(s):

Complex Traits ◽

Prediction Accuracy ◽

Mixed Model ◽

Linear Mixed Model ◽

Complex Trait ◽

Great Promise ◽

Phenotypic Variance ◽

Additive Effects ◽

Mixed Model Approach ◽

The UK

Abstract Complementary to the genome, the concept of the exposome has been proposed to capture the totality of human environmental exposures. While there has been some recent progress on the construction of the exposome, few tools exist that can integrate the genome and exposome for complex trait analyses. Here we propose a linear mixed model approach to bridge this gap, which jointly models the random effects of the two omics layers on phenotypes of complex traits. We illustrate our approach using traits from the UK Biobank (e.g., BMI and height for N ~ 40,000) with a small fraction of the exposome that comprises 28 lifestyle factors. The joint model of the genome and exposome explains substantially more phenotypic variance and significantly improves phenotypic prediction accuracy, compared to the model based on the genome alone. The additional phenotypic variance captured by the exposome includes its additive effects as well as non-additive effects such as genome-exposome (gxe) and exposome-exposome (exe) interactions. For example, 19% of variation in BMI is explained by additive effects of the genome, while an additional 7.2% is explained by additive effects of the exposome, 1.9% by exe interactions, and 4.5% by gxe interactions. Correspondingly, the prediction accuracy for BMI, computed using Pearson’s correlation between the observed and predicted phenotypes, improves from 0.15 (based on the genome alone) to 0.35 (based on the genome and exposome). We also show, using established theories, that integrating genomic and exposomic data is essential to attaining a clinically meaningful level of prediction accuracy for disease traits. In conclusion, the genomic and exposomic effects can contribute to phenotypic variation via their latent relationships, i.e., genome-exposome correlation and gxe and exe interactions, and modelling these effects has great potential to improve phenotypic prediction accuracy and thus holds great promise for future clinical practice.
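Two numbers in this abstract are easy to reproduce: prediction accuracy is Pearson's correlation between observed and predicted phenotypes, and the BMI variance components add up to the total explained by the joint model. The sketch below computes both; the component values come from the abstract, while the observed and predicted values are made up for illustration and are not UK Biobank data.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation between observed and predicted phenotypes."""
    n = len(observed)
    mo = sum(observed) / n
    mp = sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    so = math.sqrt(sum((o - mo) ** 2 for o in observed))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    return cov / (so * sp)


# BMI variance components as proportions of phenotypic variance (from the abstract).
components = {"genome": 0.19, "exposome": 0.072, "exe": 0.019, "gxe": 0.045}
total_explained = sum(components.values())  # about 0.326, i.e. roughly 32.6%

# Hypothetical observed and predicted BMI values for five individuals.
observed = [22.1, 27.4, 31.0, 24.8, 29.3]
predicted = [23.0, 26.1, 29.5, 25.9, 27.8]
print(f"explained = {total_explained:.3f}, accuracy r = {pearson_r(observed, predicted):.3f}")
```

Squaring an accuracy gives the share of phenotypic variance captured out of sample: r = 0.35 corresponds to r² ≈ 0.12, well below the 32.6% of variance the fitted components account for, which is the usual gap between in-sample variance decomposition and prediction in new individuals.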

Including distorted specimens in allometric studies: linear mixed models account for deformation

Integrative Organismal Biology ◽

10.1093/iob/obab017 ◽

2021 ◽

Author(s):

Brenen M Wynd ◽

Josef C Uyeda ◽

Sterling J Nesbitt

Keyword(s):

Mixed Model ◽

Linear Mixed Model ◽

Major Axis ◽

Ordinary Least Squares ◽

Evolutionary Constraint ◽

Body Parts ◽

Least Squares Regression ◽

Full Dataset ◽

Residual Variation ◽

Cranial Measurements

Abstract Allometry (patterns of relative change in body parts) is a staple for examining how clades exhibit scaling patterns representative of evolutionary constraint on phenotype, or for quantifying patterns of ontogenetic growth within a species. Reconstructing allometries from ontogenetic series is one of the few methods available to reconstruct growth in fossil specimens. However, many fossil specimens are deformed (twisted, flattened, displaced bones) during fossilization, changing their original morphology in unpredictable and sometimes undecipherable ways. To mitigate the effects of post-burial changes, paleontologists typically remove clearly distorted measurements from analyses. However, this can remove evidence of individual variation and limits the number of samples amenable to study, which can negatively impact allometric reconstructions. Ordinary least squares regression (OLS) and major axis regression are common methods for estimating allometry, but they assume constant levels of residual variation across specimens, which is unlikely to be true when including both distorted and undistorted specimens. Alternatively, a generalized linear mixed model (GLMM) can attribute the additional variation to extra terms in the model (e.g., fixed or random effects). We performed a simulation study based on an empirical analysis of the extinct cynodont Exaeretodon argentinus to test the efficacy of a GLMM on allometric data. We found that GLMMs fitted to the full dataset estimate allometry better than analyses that use only non-distorted data. We apply our approach to two empirical datasets: cranial measurements of actual specimens of E. argentinus (n = 16) and femoral measurements of the dinosaur Tawa hallae (n = 26). Taken together, our study suggests that a GLMM is better able than OLS to reconstruct patterns of allometry in datasets composed of extinct forms, and should be standard protocol for anyone using distorted specimens.
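The core problem described here is that OLS assumes one residual variance for every specimen. A minimal stand-in, assuming log-log allometry log y = a + b log x with one residual variance for undistorted and another for distorted specimens, is feasible generalized least squares: fit OLS, estimate a residual variance per group, refit with inverse-variance weights, and repeat. This is a simplified heteroscedastic regression, not the GLMM used in the study; the function names and synthetic data are hypothetical.

```python
import math


def weighted_fit(x, y, w):
    """Weighted least-squares intercept and slope for y = a + b * x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return my - b * mx, b


def fgls_allometry(size, trait, distorted, n_iter=20):
    """Fit log(trait) = a + b * log(size) with separate residual variances for
    undistorted (False) and distorted (True) specimens; both groups must be non-empty."""
    x = [math.log(s) for s in size]
    y = [math.log(t) for t in trait]
    w = [1.0] * len(x)  # the first pass is plain OLS
    for _ in range(n_iter):
        a, b = weighted_fit(x, y, w)
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        var = {}
        for flag in (False, True):
            group = [r for r, d in zip(resid, distorted) if d == flag]
            # Floor the variance so an exact fit does not cause division by zero.
            var[flag] = max(sum(r * r for r in group) / len(group), 1e-12)
        w = [1.0 / var[d] for d in distorted]  # inverse-variance weights
    return a, b, var


# Synthetic data: undistorted specimens follow trait = 2 * size**0.8 exactly;
# two distorted specimens are inflated or compressed by deformation.
sizes = [10, 20, 40, 80, 15, 60]
traits = [2 * s ** 0.8 for s in sizes[:4]] + [2 * 15 ** 0.8 * 1.5, 2 * 60 ** 0.8 * 0.6]
flags = [False] * 4 + [True] * 2

ols_slope = weighted_fit([math.log(s) for s in sizes], [math.log(t) for t in traits], [1.0] * 6)[1]
a, b, var = fgls_allometry(sizes, traits, flags)
print(f"OLS slope = {ols_slope:.3f}, heteroscedastic slope = {b:.3f} (true slope 0.8)")
```

Keeping the distorted specimens while down-weighting them is the point: they still contribute information, but their larger residual variance stops them from dragging the slope as they do under OLS.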

GWAS-Flow: A GPU accelerated framework for efficient permutation based genome-wide association studies

10.1101/783100 ◽

2019 ◽

Cited By ~ 2

Author(s):

Jan A. Freudenthal ◽

Markus J. Ankenbrand ◽

Dominik G. Grimm ◽

Arthur Korte

Keyword(s):

Complex Traits ◽

Mixed Model ◽

Linear Mixed Model ◽

Association Studies ◽

Large Datasets ◽

Genome Wide Association ◽

Small Data ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Non Gaussian

Abstract Motivation Genome-wide association studies (GWAS) are one of the most commonly used methods to detect associations between complex traits and genomic polymorphisms. As both genotyping and phenotyping of large populations have become easier, typical modern GWAS have to cope with massive amounts of data. Thus, the computational demand for these analyses has grown considerably over the last decades. This is especially true if one wants to implement permutation-based significance thresholds instead of using the naïve Bonferroni threshold. Permutation-based methods have the advantage of providing an adjusted multiple-hypothesis correction threshold that takes the underlying phenotypic distribution into account, and thus remove the need to find the correct transformation for non-Gaussian phenotypes. To enable efficient analyses of large datasets and the computation of permutation-based significance thresholds, we used the machine learning framework TensorFlow to develop a linear mixed model (GWAS-Flow) that can make use of the available CPU or GPU infrastructure to decrease analysis time, especially for large datasets. Results We were able to show that our application GWAS-Flow outperforms custom GWAS scripts in terms of speed without losing accuracy. Apart from p-values, GWAS-Flow also computes summary statistics, such as the effect size and its standard error for each individual marker. The CPU-based version is the default choice for small data, while the GPU-based version of GWAS-Flow is especially suited for the analyses of big data. Availability GWAS-Flow is freely available on GitHub (https://github.com/Joyvalley/GWAS_Flow) and is released under the terms of the MIT License.
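The permutation-based threshold contrasted with Bonferroni above works roughly as follows: shuffle the phenotype many times, record the strongest marker association in each shuffled dataset, and take the 95th percentile of those maxima as the genome-wide cutoff. For brevity this sketch scores each marker with an absolute Pearson correlation rather than GWAS-Flow's linear mixed model, and the genotypes, phenotype, and function names are simulated or hypothetical.

```python
import math
import random


def abs_corr(x, y):
    """Absolute Pearson correlation; returns 0.0 for a monomorphic marker."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def permutation_threshold(markers, phenotype, n_perm=200, alpha=0.05, seed=1):
    """(1 - alpha) quantile of the per-permutation maximum |r| across markers."""
    rng = random.Random(seed)
    y = list(phenotype)  # copy, so the caller's phenotype is not shuffled
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(y)  # breaks genotype-phenotype links, keeps the phenotype distribution
        maxima.append(max(abs_corr(m, y) for m in markers))
    maxima.sort()
    k = min(math.ceil((1 - alpha) * n_perm) - 1, n_perm - 1)
    return maxima[k]


rng = random.Random(0)
n_ind, n_snp = 50, 30
markers = [[rng.choice((0, 1, 2)) for _ in range(n_ind)] for _ in range(n_snp)]
phenotype = [rng.expovariate(1.0) for _ in range(n_ind)]  # deliberately non-Gaussian
threshold = permutation_threshold(markers, phenotype)
bonferroni_p = 0.05 / n_snp  # naive per-marker cutoff on the p-value scale
hits = [j for j, m in enumerate(markers) if abs_corr(m, phenotype) > threshold]
print(f"permutation |r| cutoff = {threshold:.3f}; Bonferroni p cutoff = {bonferroni_p:.4f}; hits = {hits}")
```

Because each permutation reuses the observed phenotype values, a skewed phenotype such as the exponential one here needs no transformation. The cost is n_perm full genome scans, which is the workload that GPU acceleration is meant to absorb.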

263 USE OF PORCINE PARTHENOTES AND GENE EXPRESSION PROFILING USING MICROARRAYS FOR IDENTIFICATION OF IMPRINTED GENES

Reproduction Fertility and Development ◽

10.1071/rdv18n2ab263 ◽

2006 ◽

Vol 18 (2) ◽

pp. 239

Author(s):

J. Piedrahita ◽

S. Bischoff ◽

J. Estrada ◽

B. Freking ◽

D. Nonneman ◽

...

Keyword(s):

Gene Expression ◽

Candidate Genes ◽

Mixed Model ◽

Linear Mixed Model ◽

Expression Profiles ◽

Mammalian Species ◽

Polar Body ◽

Gene Expression Profiles ◽

Tissue Type ◽

Imprinted Genes

Genomic imprinting arises from differential epigenetic markings including DNA methylation and histone modifications and results in one allele being expressed in a parent-of-origin specific manner. For further insight into the porcine epigenome, gene expression profiles of parthenogenetic (PRT; two maternally derived chromosome sets) and biparental embryos (BP; one maternal and one paternal set of chromosomes) were compared using microarrays. Comparison of the expression profiles of the two tissue types permits identification of both maternally and paternally imprinted genes and thus the degree of conservation of imprinted genes between swine and other mammalian species. Diploid porcine parthenogenetic fetuses were generated using follicular oocytes (BOMED, Madison, WI, USA). Oocytes with a visible polar body were activated using a single square pulse of direct current of 50 V/mm for 100 μs and diploidized by culture in 10 μg/mL cycloheximide for 6 h to limit extrusion of the second polar body. Following culture, BP embryos obtained by natural matings, and PRT embryos, were surgically transferred to oviducts on the first day of estrus. Fetuses recovered at 28-30 days of gestation were dissected to separate viscera including brain, liver, and placenta; the visceral tissues were then flash-frozen in liquid nitrogen. Porcine fibroblast tissue was obtained from the remaining carcass by mincing, trypsinization, and plating cells in α-MEM. Total RNA was extracted from frozen tissue or cell culture using RNA Aqueous kit (Ambion, Austin, TX, USA) according to the manufacturer's protocol. Gene expression differences between BP and PRT tissues were determined using the GeneChip® Porcine Genome Array (Affymetrix, Santa Clara, CA) containing 23 256 transcripts from Sus scrofa and representing 42 genes known to be imprinted in human and/or mice. Triplicate arrays were utilized for each tissue type, and for PRT versus BP combination.
Significant differential gene expression was identified by a linear mixed model analysis using SAS 5.0 (SAS Institute, Cary, NC, USA). Storey's q-value method was used to correct for multiple testing at q ≤ 0.05. The following genes were classified as imprinted on the basis of their expression profiles: In fibroblasts, ARHI, HTR2A, MEST, NDN, NNAT, PEG3, PLAGL1, PEG10, SGCE, SNRPN, and UBE3A; in liver, IGF2, PEG3, PLAGL1, PEG10, and SNRPN; in placenta, HTR2A, IGF2, MEST, NDN, NNAT, PEG3, PLAGL1, PEG10, and SNRPN; and in brain, none. Additionally, several genes not known to be imprinted in humans/mice were highly differentially expressed between the two tissue types. Overall, utilizing the PRT models and gene expression profiles, we have identified thirteen genes where imprinting is conserved between swine and humans/mice, and several candidate genes that represent potentially imprinted genes. Presently, our efforts are focused on identifying single nucleotide polymorphisms (SNPs) to evaluate more carefully the behavior of these genes in normal and abnormal gestations and to test whether the candidate genes are indeed imprinted. This research was supported by USDA-CSREES grant 524383 to J. P. and B. F.
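Storey's q-value correction, used above to control the false discovery rate at 0.05, can be sketched as follows, assuming the simple single-λ estimate of the null proportion π0 with λ = 0.5. The p-values and function name are hypothetical, and published q-value software typically uses a smoother π0 estimate by default.

```python
def storey_qvalues(pvalues, lam=0.5):
    """Storey q-values using the single-lambda estimate of pi0."""
    m = len(pvalues)
    # Estimated proportion of truly null tests: p-values above lambda are mostly nulls.
    pi0 = min(sum(p > lam for p in pvalues) / (m * (1 - lam)), 1.0)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values are monotone in p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[i] / rank)
        q[i] = running_min
    return q, pi0


pvals = [0.0004, 0.002, 0.01, 0.03, 0.2, 0.45, 0.6, 0.8, 0.9, 0.95]
qvals, pi0 = storey_qvalues(pvals)
significant = [i for i, qv in enumerate(qvals) if qv <= 0.05]
print(f"pi0 = {pi0:.2f}; tests with q <= 0.05: {significant}")
```

With π0 fixed at 1 this reduces to Benjamini-Hochberg adjusted p-values; estimating π0 below 1 makes the correction less conservative when many transcripts are genuinely differentially expressed.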

CoMM: A Collaborative Mixed Model That Integrates GWAS and eQTL Data Sets to Investigate the Genetic Architecture of Complex Traits

Bioinformatics and Biology Insights ◽

10.1177/1177932219881435 ◽

2019 ◽

Vol 13 ◽

pp. 117793221988143 ◽

Cited By ~ 1

Author(s):

Kar-Fu Yeung ◽

Yi Yang ◽

Can Yang ◽

Jin Liu

Keyword(s):

Gene Expression ◽

Association Study ◽

Genetic Variants ◽

Complex Traits ◽

Mixed Model ◽

Genome Wide Association Study ◽

Data Sets ◽

Transcriptome Data ◽

Data Set ◽

Expression Levels

Genome-wide association study (GWAS) analyses have identified thousands of associations between genetic variants and complex traits. However, it is still a challenge to uncover the mechanisms underlying these associations. With the growing availability of transcriptome data sets, it has become possible to perform statistical analyses targeted at identifying influential genes whose expression levels correlate with the phenotype. Methods such as PrediXcan and transcriptome-wide association study (TWAS) use the transcriptome data set to fit a predictive model for gene expression, with genetic variants as covariates. The gene expression levels for the GWAS data set are then ‘imputed’ using the prediction model, and the imputed expression levels are tested for their association with the phenotype. These methods fail to account for the uncertainty in the GWAS imputation step. We propose a collaborative mixed model (CoMM) that addresses this limitation by jointly modelling the multiple analysis steps. We illustrate CoMM’s ability to identify relevant genes in the Northern Finland Birth Cohort 1966 data set and extend the model to handle the more widely available GWAS summary statistics.
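The two-stage procedure attributed to PrediXcan and TWAS can be sketched as: learn SNP weights for a gene's expression in the eQTL (transcriptome) sample, impute expression in the GWAS sample, then test imputed expression against the phenotype. To stay short, the weights here are per-SNP marginal regression slopes rather than the penalized multi-SNP models those methods use, and all data, effect sizes, and names are simulated or hypothetical. Stage two treats the imputed values as if they were measured, which is exactly the uncertainty CoMM models jointly.

```python
import random
import statistics


def slope(x, y):
    """Marginal least-squares slope of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx


def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(42)
N_SNP = 3
TRUE_W = [0.8, 0.0, -0.5]  # hypothetical cis-eQTL effects on expression


def simulate_genotypes(n):
    """n individuals x N_SNP independent SNPs coded 0/1/2."""
    return [[rng.choice((0, 1, 2)) for _ in range(N_SNP)] for _ in range(n)]


def genetic_value(weights, row):
    return sum(w * g for w, g in zip(weights, row))


# Stage 1: eQTL sample with genotypes and measured expression gives SNP weights.
eqtl_geno = simulate_genotypes(200)
expr = [genetic_value(TRUE_W, row) + rng.gauss(0, 1) for row in eqtl_geno]
weights = [slope([row[j] for row in eqtl_geno], expr) for j in range(N_SNP)]

# Stage 2: GWAS sample has genotypes and phenotype only; expression is imputed.
gwas_geno = simulate_genotypes(300)
hidden_expr = [genetic_value(TRUE_W, row) + rng.gauss(0, 1) for row in gwas_geno]
phenotype = [0.3 * e + rng.gauss(0, 1) for e in hidden_expr]
imputed = [genetic_value(weights, row) for row in gwas_geno]

# Stage 3: association of imputed expression with the phenotype (here via correlation).
r = corr(imputed, phenotype)
print(f"weights = {[round(w, 2) for w in weights]}, r = {r:.3f}")
```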

A Bayesian linear mixed model for prediction of complex traits

Bioinformatics ◽

10.1093/bioinformatics/btaa1023 ◽

2020 ◽

Author(s):

Yang Hai ◽

Yalu Wen

Keyword(s):

Complex Traits ◽

Mixed Model ◽

Linear Mixed Model ◽

Rare Variants ◽

Disease Risk ◽

R Package ◽

Underlying Disease ◽

Supplementary Information ◽

True Effect Size ◽

Bayes Algorithm

Abstract Motivation Accurate disease risk prediction is essential for precision medicine. Existing models assume either that diseases are caused by groups of predictors with small-to-moderate effects or that they are caused by a few isolated predictors with large effects. Their performance can be sensitive to the underlying disease mechanisms, which are usually unknown in advance. Results We developed a Bayesian linear mixed model (BLMM), where genetic effects were modelled using a hybrid of the sparsity regression and linear mixed model with multiple random effects. The parameters of BLMM were inferred through a computationally efficient variational Bayes algorithm. The proposed method can mimic the shape of the true effect size distribution, captures the predictive effects of both common and rare variants, and is robust against various disease models. Through extensive simulations and an application to a whole-genome sequencing dataset obtained from the Alzheimer’s Disease Neuroimaging Initiative, we have demonstrated that BLMM has better prediction performance than existing methods and can detect predictive variables and/or genetic regions. Availability The R package is available at https://github.com/yhai943/BLMM Supplementary information Supplementary data are available at Bioinformatics online.
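The "hybrid of the sparsity regression and linear mixed model" can be pictured as a prior in which every variant gets a small polygenic effect and a few variants additionally get a large sparse effect. The sketch below only draws effect sizes from such a mixture, assuming a normal polygenic part plus a spike-and-slab sparse part with made-up hyperparameters (pi_sparse, sd_poly, sd_sparse are hypothetical names); it does not implement BLMM's variational Bayes inference.

```python
import random


def sample_hybrid_effects(n_variants, pi_sparse=0.02, sd_poly=0.01, sd_sparse=0.5, seed=7):
    """Draw beta_j = u_j + gamma_j * b_j for each variant, where
    u_j ~ N(0, sd_poly^2) is a small dense (polygenic) effect,
    gamma_j ~ Bernoulli(pi_sparse) flags a sparse large effect, and
    b_j ~ N(0, sd_sparse^2) is that large effect's size."""
    rng = random.Random(seed)
    effects, sparse_idx = [], []
    for j in range(n_variants):
        beta = rng.gauss(0.0, sd_poly)  # every variant contributes a little
        if rng.random() < pi_sparse:  # a few variants also carry a large effect
            sparse_idx.append(j)
            beta += rng.gauss(0.0, sd_sparse)
        effects.append(beta)
    return effects, sparse_idx


effects, sparse_idx = sample_hybrid_effects(1000)
print(f"{len(sparse_idx)} of {len(effects)} variants drew a sparse large effect")
```

Setting pi_sparse to 0 gives a purely polygenic, LMM-like prior, and shrinking sd_poly toward 0 gives a purely sparse one; a prior spanning both is intended to cover the two kinds of disease architecture the motivation contrasts.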

Reference Trait Analysis Reveals Correlations Between Gene Expression and Quantitative Traits in Disjoint Samples

Genetics ◽

10.1534/genetics.118.301865 ◽

2019 ◽

Vol 212 (3) ◽

pp. 919-929

Author(s):

Daniel A. Skelly ◽

Narayanan Raghupathy ◽

Raymond F. Robledo ◽

Joel H. Graber ◽

Elissa J. Chesler

Keyword(s):

Gene Expression ◽

Canonical Correlation ◽

Complex Traits ◽

Behavioral Genetics ◽

Association Studies ◽

Complex Trait ◽

Integrated Analysis ◽

Data Set ◽

Trait Analysis ◽

Molecular Features

Systems genetic analysis of complex traits involves the integrated analysis of genetic, genomic, and disease-related measures. However, these data are often collected separately across multiple study populations, rendering direct correlation of molecular features to complex traits impossible. Recent transcriptome-wide association studies (TWAS) have harnessed gene expression quantitative trait loci (eQTL) to associate unmeasured gene expression with a complex trait in genotyped individuals, but this approach relies primarily on strong eQTL. We propose a simple and powerful alternative strategy for correlating independently obtained sets of complex traits and molecular features. In contrast to TWAS, our approach gains precision by correlating complex traits through a common set of continuous phenotypes instead of genetic predictors, and can identify transcript–trait correlations for which the regulation is not genetic. In our approach, a set of multiple quantitative “reference” traits is measured across all individuals, while measures of the complex trait of interest and transcriptional profiles are obtained in disjoint subsamples. A conventional multivariate statistical method, canonical correlation analysis, is used to relate the reference traits and traits of interest to identify gene expression correlates. We evaluate power and sample size requirements of this methodology, as well as performance relative to other methods, via extensive simulation and analysis of a behavioral genetics experiment in 258 Diversity Outbred mice involving two independent sets of anxiety-related behaviors and hippocampal gene expression. After splitting the data set and hiding one set of anxiety-related traits in half the samples, we identified transcripts correlated with the hidden traits using the other set of anxiety-related traits and exploiting the highest canonical correlation (R = 0.69) between the trait data sets. 
We demonstrate that this approach outperforms TWAS in identifying associated transcripts. Together, these results demonstrate the validity, reliability, and power of reference trait analysis for identifying relations between complex traits and their molecular substrates.
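The canonical correlation step can be sketched for the smallest interesting case: two reference traits against two traits of interest. The first canonical correlation is the square root of the largest eigenvalue of Sxx⁻¹ Sxy Syy⁻¹ Syx, where the S terms are covariance blocks, and with 2 × 2 blocks that eigenvalue has a closed form. The function names and toy data are hypothetical; an analysis with many traits would use a linear-algebra library instead.

```python
import math


def cov_block(A, B):
    """Sample covariance matrix between the columns of A (n x p) and of B (n x q)."""
    n = len(A)
    ma = [sum(col) / n for col in zip(*A)]
    mb = [sum(col) / n for col in zip(*B)]
    return [[sum((ra[i] - ma[i]) * (rb[j] - mb[j]) for ra, rb in zip(A, B)) / (n - 1)
             for j in range(len(mb))] for i in range(len(ma))]


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def inv2(M):
    """Inverse of a non-singular 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def first_canonical_corr(X, Y):
    """Largest canonical correlation between two 2-column data sets."""
    Sxx, Syy, Sxy = cov_block(X, X), cov_block(Y, Y), cov_block(X, Y)
    Syx = [list(r) for r in zip(*Sxy)]  # transpose of Sxy
    M = matmul(matmul(inv2(Sxx), Sxy), matmul(inv2(Syy), Syx))
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = max(tr * tr - 4 * det, 0.0)  # guard against tiny negative rounding
    lam_max = (tr + math.sqrt(disc)) / 2  # largest eigenvalue = rho^2
    return math.sqrt(max(min(lam_max, 1.0), 0.0))


# Hypothetical reference traits and traits of interest for six individuals.
reference = [[1.0, 2.0], [2.0, 1.5], [3.0, 3.5], [4.0, 3.0], [5.0, 5.5], [6.0, 4.0]]
interest = [[1.2, 0.5], [1.9, 1.1], [3.3, 0.7], [3.8, 1.6], [5.1, 1.0], [6.2, 1.9]]
print(f"first canonical correlation = {first_canonical_corr(reference, interest):.3f}")
```

Because the reference traits are measured in everyone, the canonical variates they define can be computed in both disjoint subsamples, which is roughly how the approach links transcripts in one subsample to traits of interest in the other.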

Investigating tissue-relevant causal molecular mechanisms of complex traits using probabilistic TWAS analysis

10.1101/808295 ◽

2019 ◽

Cited By ~ 2

Author(s):

Yuhua Zhang ◽

Corbin Quick ◽

Ketian Yu ◽

Alvaro Barbeira ◽

Francesca Luca ◽

...

Keyword(s):

Gene Expression ◽

Complex Traits ◽

Large Scale ◽

Molecular Mechanisms ◽

Association Studies ◽

Complex Trait ◽

Causal Effects ◽

Biological Mechanisms ◽

Integrative Framework ◽

eQTL Data

Abstract Transcriptome-wide association studies (TWAS), an integrative framework using expression quantitative trait loci (eQTLs) to construct proxies for gene expression, have emerged as a promising method to investigate the biological mechanisms underlying associations between genotypes and complex traits. However, challenges remain in interpreting TWAS results, especially regarding their causal implications. In this paper, we describe a new computational framework, probabilistic TWAS (PTWAS), to detect associations and investigate causal relationships between gene expression and complex traits. We use established concepts and principles from instrumental variables (IV) analysis to delineate and address the unique challenges that arise in TWAS. PTWAS utilizes probabilistic eQTL annotations derived from multi-variant Bayesian fine-mapping analysis, which confer higher power to detect TWAS associations than existing methods. Additionally, PTWAS provides novel functionalities to evaluate the causal assumptions and estimate tissue- or cell-type-specific causal effects of gene expression on complex traits. These features make PTWAS uniquely suited for in-depth investigations of the biological mechanisms that contribute to complex trait variation. Using eQTL data across 49 tissues from GTEx v8, we apply PTWAS to analyze 114 complex traits using GWAS summary statistics from several large-scale projects, including the UK Biobank. Our analysis reveals an abundance of genes with strong evidence of eQTL-mediated causal effects on complex traits and highlights the heterogeneity and tissue relevance of these effects across complex traits. We distribute software and eQTL annotations to enable users to perform rigorous TWAS analyses that leverage the full potential of the latest GTEx multi-tissue eQTL data.
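The instrumental-variable idea can be sketched from summary statistics. Each independent eQTL j yields a ratio estimate β_gwas,j / β_eqtl,j of the gene's causal effect; these can be combined by inverse-variance weighting, and Cochran's Q measures how much the instruments disagree, which is one signal that IV assumptions may be violated. This is the generic inverse-variance-weighted (IVW) estimator from Mendelian randomization, used here as a stand-in: the abstract does not detail PTWAS's own estimator, which draws on probabilistic fine-mapping annotations. The numbers below are invented.

```python
def ivw_estimate(beta_eqtl, beta_gwas, se_gwas):
    """Inverse-variance-weighted causal effect from independent instruments.

    Per-instrument ratio estimate b_j = beta_gwas_j / beta_eqtl_j with
    first-order standard error se_gwas_j / |beta_eqtl_j|.
    Returns (estimate, standard error, Cochran's Q).
    """
    ratios, weights = [], []
    for bx, by, sy in zip(beta_eqtl, beta_gwas, se_gwas):
        ratios.append(by / bx)
        weights.append((bx / sy) ** 2)  # 1 / (se_gwas_j / bx)^2
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, ratios)) / total_w
    se = total_w ** -0.5
    # Cochran's Q: weighted spread of per-instrument estimates around the IVW estimate.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, ratios))
    return beta, se, q


beta, se, q = ivw_estimate(
    beta_eqtl=[0.40, 0.25, -0.30],
    beta_gwas=[0.060, 0.040, -0.042],
    se_gwas=[0.010, 0.012, 0.011],
)
print(f"causal effect = {beta:.3f} (SE {se:.3f}); Q = {q:.2f} on 2 df")
```

A large Q relative to a chi-squared distribution with (number of instruments - 1) degrees of freedom suggests the eQTLs are not acting on the trait only through this gene's expression.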