Prediction of gene expression from regulatory sequence composition enhances transcriptome-wide association studies

2021 ◽  
Author(s):  
Federico Marotta ◽  
Reza Mozafari ◽  
Elena Grassi ◽  
Alessandro Lussana ◽  
Elisa Mariella ◽  
...  

Transcriptome-wide association studies (TWAS) can prioritize trait-associated genes by finding correlations between a trait and the genetically regulated component of gene expression. A basic ingredient of a TWAS is a regression model, typically trained in an external reference data set, used to impute the genetically-regulated expression. We devised a model that improves the accuracy of the imputation by using, as predictors, not the genotypes directly but rather the sequence composition of the proximal gene regulatory region, expressed as its profile of affinities for a set of position weight matrices. When trained on 48 tissues from GTEx, the regression model showed improved performance compared with models regressing expression directly on the genotype. We imputed the expression levels in genotyped individuals from the ADNI data set, and used the imputed expression to perform a TWAS. We also developed a method to perform the TWAS based on summary statistics from genome-wide association studies, and applied it to 11 complex traits from the UK Biobank. The greater accuracy in the prediction of gene expression allowed us to report hundreds of new gene-phenotype association candidates.
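The two-stage logic described in this abstract (train an expression predictor in a reference panel, impute expression in a genotyped cohort, then test the imputed values against a trait) can be sketched as follows. This is a minimal illustration on simulated data, not the authors' implementation: the ridge penalty, the function name `fit_ridge`, and the random feature matrix standing in for position weight matrix affinities are all assumptions made for the example.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression; returns (intercept, coefficients)."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    penalty = alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(Xc.T @ Xc + penalty, Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return intercept, coef

rng = np.random.default_rng(0)
n_ref, n_target, n_features = 300, 1000, 20

# Hypothetical per-individual predictors for one gene (e.g. affinity profile
# of its regulatory region); simulated here as standard normal features.
true_weights = rng.normal(size=n_features)

# Stage 1: train the predictor in a reference panel with measured expression.
X_ref = rng.normal(size=(n_ref, n_features))
expr_ref = X_ref @ true_weights + rng.normal(size=n_ref)
intercept, coef = fit_ridge(X_ref, expr_ref, alpha=1.0)

# Stage 2: impute expression in a cohort where only predictors and a trait
# are available, then test imputed expression against the trait.
X_target = rng.normal(size=(n_target, n_features))
genetic_expr = X_target @ true_weights          # unobserved in practice
trait = 0.5 * genetic_expr + rng.normal(size=n_target)

imputed = intercept + X_target @ coef
r_imputation = np.corrcoef(imputed, genetic_expr)[0, 1]
r_trait = np.corrcoef(imputed, trait)[0, 1]
# t statistic for a Pearson correlation with n - 2 degrees of freedom.
t_stat = r_trait * np.sqrt((n_target - 2) / (1.0 - r_trait**2))
```

In this toy setting the association statistic depends directly on how well stage 1 recovers the genetically regulated component, which is the sense in which a more accurate predictor can yield more gene-trait candidates.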

2019 ◽  
Author(s):  
Tom G Richardson ◽  
Gibran Hemani ◽  
Tom R Gaunt ◽  
Caroline L Relton ◽  
George Davey Smith

Background: Developing insight into tissue-specific transcriptional mechanisms can help improve our understanding of how genetic variants exert their effects on complex traits and disease. By applying the principles of Mendelian randomization, we have undertaken a systematic analysis to evaluate transcriptome-wide associations between gene expression across 48 different tissue types and 395 complex traits. Results: Overall, we identified 100,025 gene-trait associations based on conventional genome-wide corrections (P < 5 × 10^-8) that also provided evidence of genetic colocalization. These results indicated that genetic variants which influence gene expression levels in multiple tissues are more likely to influence multiple complex traits. We identified many examples of tissue-specific effects, such as genetically predicted TPO, NR3C2 and SPATA13 expression only associating with thyroid disease in thyroid tissue. Additionally, FBN2 expression was associated with both cardiovascular and lung function traits, but only when analysed in heart and lung tissue respectively. We also demonstrate that conducting phenome-wide evaluations of our results can help flag adverse on-target side effects for therapeutic intervention, as well as propose drug repositioning opportunities. Moreover, we find that exploring the tissue-dependency of associations identified by genome-wide association studies (GWAS) can help elucidate the causal genes and tissues responsible for effects, as well as uncover putative novel associations. Conclusions: The atlas of tissue-dependent associations we have constructed should prove extremely valuable to future studies investigating the genetic determinants of complex disease. The follow-up analyses we have performed in this study are merely a guide for future research. Similar evaluations can be undertaken systematically at http://mrcieu.mrsoftware.org/Tissue_MR_atlas/.
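As an illustration of how Mendelian randomization can link expression to a trait, one common single-instrument estimator is the Wald ratio: the variant's effect on the trait divided by its effect on expression. The abstract does not state which estimator was used, so the sketch below is an assumption, as are the function name `wald_ratio`, the first-order standard error, and the example effect sizes.

```python
import math

def wald_ratio(beta_eqtl, beta_gwas, se_gwas):
    """Wald ratio estimate of the expression-to-trait effect.

    beta_eqtl: variant effect on gene expression (the instrument strength)
    beta_gwas: effect of the same variant on the trait
    se_gwas:   standard error of beta_gwas
    The standard error uses a first-order approximation that ignores
    uncertainty in beta_eqtl.
    """
    estimate = beta_gwas / beta_eqtl
    se = se_gwas / abs(beta_eqtl)
    z = estimate / se
    # Two-sided p-value under a standard normal null.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return estimate, se, p_value

# Made-up numbers, for illustration only.
estimate, se, p_value = wald_ratio(beta_eqtl=0.5, beta_gwas=0.1, se_gwas=0.01)
passes_threshold = p_value < 5e-8   # the genome-wide threshold quoted above
```

A hit under this threshold would still need the colocalization evidence mentioned in the abstract before being counted as a gene-trait association.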


Genetics ◽  
2019 ◽  
Vol 212 (3) ◽  
pp. 919-929
Author(s):  
Daniel A. Skelly ◽  
Narayanan Raghupathy ◽  
Raymond F. Robledo ◽  
Joel H. Graber ◽  
Elissa J. Chesler

Systems genetic analysis of complex traits involves the integrated analysis of genetic, genomic, and disease-related measures. However, these data are often collected separately across multiple study populations, rendering direct correlation of molecular features to complex traits impossible. Recent transcriptome-wide association studies (TWAS) have harnessed gene expression quantitative trait loci (eQTL) to associate unmeasured gene expression with a complex trait in genotyped individuals, but this approach relies primarily on strong eQTL. We propose a simple and powerful alternative strategy for correlating independently obtained sets of complex traits and molecular features. In contrast to TWAS, our approach gains precision by correlating complex traits through a common set of continuous phenotypes instead of genetic predictors, and can identify transcript–trait correlations for which the regulation is not genetic. In our approach, a set of multiple quantitative “reference” traits is measured across all individuals, while measures of the complex trait of interest and transcriptional profiles are obtained in disjoint subsamples. A conventional multivariate statistical method, canonical correlation analysis, is used to relate the reference traits and traits of interest to identify gene expression correlates. We evaluate power and sample size requirements of this methodology, as well as performance relative to other methods, via extensive simulation and analysis of a behavioral genetics experiment in 258 Diversity Outbred mice involving two independent sets of anxiety-related behaviors and hippocampal gene expression. After splitting the data set and hiding one set of anxiety-related traits in half the samples, we identified transcripts correlated with the hidden traits using the other set of anxiety-related traits and exploiting the highest canonical correlation (R = 0.69) between the trait data sets. 
We demonstrate that this approach outperforms TWAS in identifying associated transcripts. Together, these results demonstrate the validity, reliability, and power of reference trait analysis for identifying relations between complex traits and their molecular substrates.
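The reference trait idea can be sketched with a small canonical correlation analysis (CCA) on simulated data. This is an illustrative sketch under stated assumptions, not the authors' code: the function `cca_first_pair`, the single shared latent factor, the loadings, and the sample sizes are all invented for the example. CCA is fitted in the subsample that has both reference traits and the traits of interest, and the resulting reference-trait weights are then applied in the disjoint subsample that has expression data.

```python
import numpy as np

def cca_first_pair(X, Y, reg=1e-8):
    """First canonical pair via Cholesky whitening and SVD.

    Returns (x_weights, y_weights, canonical_correlation).
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    Lx = np.linalg.cholesky(Sxx)          # Sxx = Lx Lx^T
    Ly = np.linalg.cholesky(Syy)          # Syy = Ly Ly^T
    M = np.linalg.solve(Lx, Sxy)          # Lx^{-1} Sxy
    M = np.linalg.solve(Ly, M.T).T        # Lx^{-1} Sxy Ly^{-T}
    U, s, Vt = np.linalg.svd(M)
    a = np.linalg.solve(Lx.T, U[:, 0])    # weights for X
    b = np.linalg.solve(Ly.T, Vt[0])      # weights for Y
    return a, b, s[0]

rng = np.random.default_rng(1)
n = 200
latent = rng.normal(size=n)               # shared biology, unobserved
reference = np.outer(latent, [1.0, 0.8, 0.6, 0.9]) + 0.5 * rng.normal(size=(n, 4))
hidden = np.outer(latent, [1.0, 0.7, 0.5]) + 0.5 * rng.normal(size=(n, 3))
expression = latent + rng.normal(size=n)  # one transcript

half_a = slice(0, n // 2)                 # has reference + traits of interest
half_b = slice(n // 2, n)                 # has reference + expression only

a, b, rho = cca_first_pair(reference[half_a], hidden[half_a])
# Project subsample B's reference traits onto the canonical variate learned
# in subsample A, then correlate that score with expression.
score_b = (reference[half_b] - reference[half_a].mean(axis=0)) @ a
r_expr = np.corrcoef(score_b, expression[half_b])[0, 1]
```

The sign of a canonical variate is arbitrary, so only the magnitude of `r_expr` is meaningful here.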


2018 ◽  
Author(s):  
Xuanyao Liu ◽  
Yang I Li ◽  
Jonathan K Pritchard

Early genome-wide association studies (GWAS) led to the surprising discovery that, for typical complex traits, the most significant genetic variants contribute only a small fraction of the estimated heritability. Instead, it has become clear that a huge number of common variants, each with tiny effects, explain most of the heritability. Previously, we argued that these patterns conflict with standard conceptual models, and that new models are needed. Here we provide a formal model in which genetic contributions to complex traits can be partitioned into direct effects from core genes, and indirect effects from peripheral genes acting as trans-regulators. We argue that the central importance of peripheral genes is a direct consequence of the large contribution of trans-acting variation to gene expression variation. In particular, we propose that if the core genes for a trait are co-regulated – as seems likely – then the effects of peripheral variation can be amplified by these co-regulated networks such that nearly all of the genetic variance is driven by peripheral genes. Thus our model proposes a framework for understanding key features of the architecture of complex traits.


2016 ◽  
Author(s):  
Jimmy Z Liu ◽  
Yaniv Erlich ◽  
Joseph K Pickrell

The case-control association study is a powerful method for identifying genetic variants that influence disease risk. However, the collection of cases can be time-consuming and expensive; if a disease occurs late in life or is rapidly lethal, it may be more practical to identify family members of cases. Here, we show that replacing cases with their first-degree relatives enables genome-wide association studies by proxy (GWAX). In randomly-ascertained cohorts, this approach enables previously infeasible studies of diseases that are absent (or nearly absent) in the cohort. As an illustration, we performed GWAX of 12 common diseases in 116,196 individuals from the UK Biobank. By combining these results with published GWAS summary statistics in a meta-analysis, we replicated established risk loci and identified 17 newly associated risk loci: four in Alzheimer’s disease, eight in coronary artery disease, and five in type 2 diabetes. In addition to informing disease biology, our results demonstrate the utility of association mapping using family history of disease as a phenotype to be mapped. We anticipate that this approach will prove useful in future genetic studies of complex traits in large population cohorts.


2020 ◽  
Author(s):  
Xi Xia ◽  
Mei Ding ◽  
Jin-feng Xuan ◽  
Jia-xin Xing ◽  
Jun Yao ◽  
...  

Background: The 5-hydroxytryptamine 1B receptor (5-HT1B) plays an essential role in the serotonin (5-HT) system and is widely involved in a variety of brain activities. HTR1B is the gene encoding 5-HT1B. Genome-wide association studies have shown that HTR1B polymorphisms are closely related to multiple mental and behavioral disorders; however, the functional mechanisms underlying these associations are unknown. This study investigated the effect of several HTR1B haplotypes on regulation of gene expression in vitro and the functional sequences in the 5' regulatory region of HTR1B to determine their potential association with mental and behavioral disorders. Methods: Six haplotypes consisting of rs4140535, rs1778258, rs17273700, rs1228814, rs11568817, and rs130058 and several truncated fragments of the 5' regulatory region of HTR1B were transfected into SK-N-SH and HEK-293 cells. The relative fluorescence intensities of the different haplotypes and truncated fragments were detected using a dual-luciferase reporter assay system. Results: Compared to the major haplotype T-G-T-C-T-A, the relative fluorescence intensities of haplotypes C-A-T-C-T-A, C-G-T-C-T-A, C-G-C-A-G-T, and C-G-T-A-T-A were significantly lower, and that of haplotype C-G-C-A-G-A was significantly higher. Furthermore, the rs4140535T allele, the rs17273700C-rs11568817G linkage combination, and the rs1228814A allele gave significantly higher relative fluorescence intensities than their counterparts at each locus. Conversely, the rs1778258A and rs130058T alleles decreased the relative fluorescence intensities. In addition, we found that regions from -1587 to -1371 bp (TSS, +1), -1149 to -894 bp, -39 to +130 bp, +130 to +341 bp, and +341 to +505 bp upregulated gene expression. In contrast, regions -603 to -316 bp and +130 to +341 bp downregulated gene expression. Region +341 to +505 bp played a decisive role in gene transcription. Conclusions: HTR1B 5' regulatory region polymorphisms have regulatory effects on gene expression and potentially correlate with several pathological and physiological conditions. This study suggests that a crucial sequence for transcription is located in region +341 to +505 bp. Regions -1587 to -1371 bp, -1149 to -894 bp, -603 to -316 bp, -39 to +130 bp, and +130 to +341 bp contain functional sequences that can promote or suppress HTR1B gene expression.


Circulation ◽  
2020 ◽  
Vol 141 (Suppl_1) ◽  
Author(s):  
Anna Miller ◽  
Anlu Chen ◽  
David Buchner ◽  
Scott Williams

The genetic contribution of additive versus non-additive (epistasis) effects in the regulation of hematologic and other complex traits is unclear. Although many variants have been associated with a range of complex traits via genome-wide association studies (GWAS), these loci combined in additive models do not account for most of the trait heritability. GWAS-type analyses typically ignore gene-gene interactions, in part because of the difficulty in detecting them in complex multicellular organisms, especially humans. We have previously shown that mouse chromosome substitution strains (CSSs) are a powerful model for detecting epistasis, and that for certain complex traits the relative contribution of epistasis to heritability is as important as additivity. We have now used these CSSs to identify and map additive and epistatic loci that regulate a range of hematological-related traits and hepatic gene expression levels. A modified backcross was performed with CSS strains carrying the A/J-derived substituted chromosomes 4 and 6 on an otherwise C57BL/6J genetic background. By analyzing the transcriptomes of offspring from this cross, we identified and mapped additive quantitative trait loci (QTLs) that regulated the expression of 770 genes, and epistatic QTLs for 802 genes. Similarly, we performed a complete blood analysis of offspring from the cross and identified additive QTLs for platelets and percentage of granulocytes in the blood as well as epistatic QTLs controlling the percentage of lymphocytes in the blood (rs13477644, rs13478739; LOD = 3.4) and red cell distribution width (rs13477864, rs13478802; LOD = 3.7). The variance attributable to the epistatic QTLs was approximately equal to that of the additive QTLs, highlighting the importance of identifying genetic interactions.
Of note, even the SNPs associated with the most significant epistatic interactions were undetected in our single-locus GWAS-like association analyses, demonstrating the need to specifically test for gene-gene interactions in studies of complex traits. In summary, our studies identified epistatic loci in mice that are important regulators of hematological-related traits and gene expression. Additionally, our studies call attention to the importance of extending single-locus GWAS-type analyses to include analyses of gene-gene interactions to improve our ability to identify genetic variants that regulate complex traits.
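The point that an epistatic pair can be invisible to single-locus tests can be illustrated with a small regression sketch. The abstract reports LOD scores but not the exact model, so everything below is an assumption for illustration: the ±1 genotype coding, the simulated effect size, the helper names `rss` and `lod`, and the use of a regression-based LOD comparing nested linear models.

```python
import numpy as np

def rss(design, y):
    """Residual sum of squares from an ordinary least squares fit."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

def lod(rss_null, rss_alt, n):
    """Regression-based LOD score comparing two nested normal models."""
    return (n / 2.0) * np.log10(rss_null / rss_alt)

rng = np.random.default_rng(2)
n = 200
# Two loci in a backcross-like design, coded -1/+1 (hypothetical coding).
g1 = rng.choice([-1.0, 1.0], size=n)
g2 = rng.choice([-1.0, 1.0], size=n)
# Purely epistatic trait: no marginal effect of either locus on average.
y = 0.8 * g1 * g2 + rng.normal(size=n)

ones = np.ones(n)
rss_mean = rss(np.column_stack([ones]), y)
rss_g1 = rss(np.column_stack([ones, g1]), y)
rss_g2 = rss(np.column_stack([ones, g2]), y)
rss_additive = rss(np.column_stack([ones, g1, g2]), y)
rss_full = rss(np.column_stack([ones, g1, g2, g1 * g2]), y)

lod_g1 = lod(rss_mean, rss_g1, n)                 # single-locus scan, locus 1
lod_g2 = lod(rss_mean, rss_g2, n)                 # single-locus scan, locus 2
lod_interaction = lod(rss_additive, rss_full, n)  # interaction vs additive
```

In this simulation the interaction LOD is large while both single-locus LODs stay small, mirroring the abstract's observation that the interacting SNPs went undetected in single-locus analyses.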


2021 ◽  
Author(s):  
Roshni A. Patel ◽  
Shaila A. Musharoff ◽  
Jeffrey P. Spence ◽  
Harold Pimentel ◽  
Catherine Tcheandjieu ◽  
...  

Despite the growing number of genome-wide association studies (GWAS) for complex traits, it remains unclear whether effect sizes of causal genetic variants differ between populations. In principle, effect sizes of causal variants could differ between populations due to gene-by-gene or gene-by-environment interactions. However, comparing causal variant effect sizes is challenging: it is difficult to know which variants are causal, and comparisons of variant effect sizes are confounded by differences in linkage disequilibrium (LD) structure between ancestries. Here, we develop a method to assess causal variant effect size differences that overcomes these limitations. Specifically, we leverage the fact that segments of European ancestry shared between European-American and admixed African-American individuals have similar LD structure, allowing for unbiased comparisons of variant effect sizes in European ancestry segments. We apply our method to two types of traits: gene expression and low-density lipoprotein cholesterol (LDL-C). We find that causal variant effect sizes for gene expression are significantly different between European-Americans and African-Americans; for LDL-C, we observe a similar point estimate although this is not significant, likely due to lower statistical power. Cross-population differences in variant effect sizes highlight the role of genetic interactions in trait architecture and will contribute to the poor portability of polygenic scores across populations, reinforcing the importance of conducting GWAS on individuals of diverse ancestries and environments.


2020 ◽  
Author(s):  
Yanyu Liang ◽  
François Aguet ◽  
Alvaro Barbeira ◽  
Kristin Ardlie ◽  
Hae Kyung Im

Genome-wide association studies (GWAS) have been highly successful in identifying genomic loci associated with complex traits. However, identification of the causal genes that mediate these associations remains challenging, and many approaches integrating transcriptomic data with GWAS have been proposed. Yet there currently exist no computationally scalable methods that integrate total and allele-specific gene expression to maximize power to detect genetic effects on gene expression. Here, we describe a unified framework that is scalable to studies with thousands of samples. Using simulations and data from GTEx, we demonstrate an average power gain equivalent to a 29% increase in sample size for genes with sufficient allele-specific read coverage. We provide a suite of freely available tools, mixQTL, mixFine, and mixPred, that apply this framework for mapping of quantitative trait loci, fine-mapping, and prediction.


2018 ◽  
Author(s):  
Karl A. G. Kremling ◽  
Christine H. Diepenbrock ◽  
Michael A. Gore ◽  
Edward S. Buckler ◽  
Nonoy B. Bandillo

Modern improvement of complex traits in agricultural species relies on successful associations of heritable molecular variation with observable phenotypes. Historically, this pursuit has primarily been based on easily measurable genetic markers. The recent advent of new technologies allows assaying and quantifying biological intermediates (hereafter endophenotypes) which are now readily measurable at a large scale across diverse individuals. The potential of using endophenotypes for dissecting traits of interest remains underexplored in plants. The work presented here illustrated the utility of a large-scale (299 genotype and 7 tissue) gene expression resource to dissect traits across multiple levels of biological organization. Using single-tissue- and multi-tissue-based transcriptome-wide association studies (TWAS), we revealed that about half of the functional variation for agronomic and seed quality (carotenoid, tocochromanol) traits is regulatory. Comparing the efficacy of TWAS with genome-wide association studies (GWAS) and an ensemble approach that combines both GWAS and TWAS, we demonstrated that results of TWAS in combination with GWAS increase the power to detect known genes and aid in prioritizing likely causal genes. Using a variance partitioning approach in the independent maize Nested Association Mapping (NAM) population, we also showed that the most strongly associated genes identified by combining GWAS and TWAS explain more heritable variance for a majority of traits, beating the heritability captured by random genes and by the genes identified by GWAS or TWAS alone. This not only improves the ability to link genes to phenotypes, but also highlights the phenotypic consequences of regulatory variation in plants.
Author summary: We examined the ability to associate variability in gene expression directly with terminal phenotypes of interest, as a supplement to linking genotype to phenotype. We found that transcriptome-wide association studies (TWAS) are a useful accessory to genome-wide association studies (GWAS). In a combined test with GWAS results, TWAS improves the capacity to re-detect genes known to underlie quantitative trait loci for kernel and agronomic phenotypes. This not only improves the capacity to link genes to phenotypes, but also illustrates the widespread importance of regulation for phenotype.


2017 ◽  
Author(s):  
Jeremy J. Berg ◽  
Xinjun Zhang ◽  
Graham Coop

Our understanding of the genetic basis of human adaptation is biased toward loci of large phenotypic effect. Genome-wide association studies (GWAS) now enable the study of genetic adaptation in polygenic phenotypes. We test for polygenic adaptation among 187 world-wide human populations using polygenic scores constructed from GWAS of 34 complex traits. We identify signals of polygenic adaptation for anthropometric traits including height, infant head circumference (IHC), hip circumference and waist-to-hip ratio (WHR). Analysis of ancient DNA samples indicates that a north-south cline of height within Europe and a west-east cline across Eurasia can be traced to selection for increased height in two late Pleistocene hunter-gatherer populations living in western and west-central Eurasia. Our observation that IHC and WHR follow a latitudinal cline in Western Eurasia supports the role of natural selection driving Bergmann’s Rule in humans, consistent with thermoregulatory adaptation in response to latitudinal temperature variation.
Author’s Note on Failure to Replicate: After this preprint was posted, the UK Biobank dataset was released, providing a new and open GWAS resource. When attempting to replicate the height selection results from this preprint using GWAS data from the UK Biobank, we discovered that we could not. In subsequent analyses, we determined that both the GIANT consortium height GWAS data, as well as another dataset that was used for replication, were impacted by stratification issues that created or at a minimum substantially inflated the height selection signals reported here. The results of this second investigation, written together with additional coauthors, have now been published (https://elifesciences.org/articles/39725), along with another paper by a separate group of authors showing similar issues (https://elifesciences.org/articles/39702). A preliminary investigation shows that the other non-height-based results may suffer from similar issues. We stand by the theory and statistical methods reported in this paper, and the paper can be cited for these results. However, we have shown that the data on which the major empirical results were based are not sound, and so should be treated with caution until replicated.

