Projecting genetic associations through gene expression patterns highlights disease etiology and drug mechanisms

2021 ◽  
Author(s):  
Milton Pividori ◽  
Sumei Lu ◽  
Binglan Li ◽  
Chun Su ◽  
Matthew E. Johnson ◽  
...  

Understanding how dysregulated transcriptional processes result in tissue-specific pathology requires a mechanistic interpretation of expression regulation across different cell types. It has been shown that this insight is key for the development of new therapies. These mechanisms can be identified with transcriptome-wide association studies (TWAS), which have represented an important step forward to test the mediating role of gene expression in GWAS associations. However, due to pervasive eQTL sharing across tissues, TWAS has not been successful in identifying causal tissues, and other methods generally do not take advantage of the large amounts of RNA-seq data publicly available. Here we introduce a polygenic approach that leverages gene modules (genes with similar co-expression patterns) to project both gene-trait associations and pharmacological perturbation data into a common latent representation for a joint analysis. We observed that diseases were significantly associated with gene modules expressed in relevant cell types, such as hypothyroidism with T cells and thyroid, hypertension and lipids with adipose tissue, and coronary artery disease with cardiomyocytes. Our approach was more accurate in predicting known drug-disease pairs and revealed stable trait clusters, including a complex branch involving lipids with cardiovascular, autoimmune, and neuropsychiatric disorders. Furthermore, using a CRISPR-screen, we show that genes involved in lipid regulation exhibit more consistent trait associations through gene modules than individual genes. Our results suggest that a gene module perspective can contextualize genetic associations and prioritize alternative treatment targets when GWAS hits are not druggable.
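The projection idea in this abstract can be pictured with a small sketch. It assumes a gene-by-module loading matrix and uses a plain loading-weighted sum to move a gene-level vector into module space; the abstract does not give the authors' actual projection, so the matrix, the vectors, and the function names below are hypothetical.

```python
import math


def project(loadings, gene_scores):
    """Map a gene-level vector to module space.

    score_k = sum over genes g of loadings[g][k] * gene_scores[g]
    """
    n_modules = len(loadings[0])
    return [
        sum(row[k] * s for row, s in zip(loadings, gene_scores))
        for k in range(n_modules)
    ]


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Hypothetical loadings: 4 genes (rows) x 2 gene modules (columns).
loadings = [
    [0.9, 0.0],
    [0.8, 0.1],
    [0.0, 0.7],
    [0.1, 0.9],
]

# Made-up gene-trait association scores and drug-induced expression changes.
trait_genes = [2.0, 1.5, 0.1, 0.0]
drug_genes = [-1.0, -1.2, 0.0, 0.2]

# Both data types end up in the same module space and can be compared there.
trait_modules = project(loadings, trait_genes)
drug_modules = project(loadings, drug_genes)
similarity = cosine(trait_modules, drug_modules)
```

Once both vectors live in the same latent space, any vector similarity can be used to compare a trait with a drug; cosine similarity is only one possible choice.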

2019 ◽  
Author(s):  
Wen Zhang ◽  
Georgios Voloudakis ◽  
Veera M. Rajagopal ◽  
Ben Readhead ◽  
Joel T. Dudley ◽  
...  

Abstract Transcriptome-wide association studies integrate gene expression data with common risk variation to identify gene-trait associations. By incorporating epigenome data to estimate the functional importance of genetic variation on gene expression, we improve the accuracy of transcriptome prediction and the power to detect significant expression-trait associations. Joint analysis of 14 large-scale transcriptome datasets and 58 traits identifies 13,724 significant expression-trait associations that converge on biological processes and relevant phenotypes in human and mouse phenotype databases. We perform drug repurposing analysis and identify known and novel compounds that mimic or reverse trait-specific changes. We identify genes that exhibit agonistic pleiotropy for genetically correlated traits that converge on shared biological pathways and elucidate distinct processes in disease etiopathogenesis. Overall, this comprehensive analysis provides insight into the specificity and convergence of gene expression on susceptibility to complex traits.


2021 ◽  
Vol 22 (S3) ◽  
Author(s):  
Yuanyuan Li ◽  
Ping Luo ◽  
Yi Lu ◽  
Fang-Xiang Wu

Abstract Background With the development of single-cell sequencing technology, revealing homogeneity and heterogeneity between cells has become a new area of computational systems biology research. However, the clustering of cell types becomes more complex with the mutual penetration between different types of cells and the instability of gene expression. One way of overcoming this problem is to group similar, related single cells together by means of various clustering analysis methods. Although some methods such as spectral clustering can do well in the identification of cell types, they only consider the similarities between cells and ignore the influence of dissimilarities on clustering results. This may limit the performance of most conventional clustering algorithms in identifying clusters, so special methods need to be developed for high-dimensional sparse categorical data. Results Inspired by the observation that cells of the same type have similar gene expression patterns, whereas cells of different types have dissimilar gene expression patterns, we improve the existing spectral clustering method for single-cell data by basing it on both similarities and dissimilarities between cells. The method first measures the similarity/dissimilarity among cells, then constructs the incidence matrix by fusing the similarity matrix with the dissimilarity matrix, and finally uses the eigenvalues of the incidence matrix to perform dimensionality reduction and employs the K-means algorithm in the low-dimensional space to achieve clustering. The proposed improved spectral clustering method is compared with the conventional spectral clustering method in recognizing cell types on several real single-cell RNA-seq datasets.
Conclusions In summary, we show that adding intercellular dissimilarity can effectively improve accuracy and achieve robustness, and that the improved spectral clustering method outperforms the traditional spectral clustering method in grouping cells.
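The pipeline described in this abstract (measure similarity and dissimilarity, fuse them, reduce dimension spectrally, run K-means) can be sketched in a deliberately reduced form. The version below is a two-cluster simplification with a one-dimensional embedding. The fusion rule (Gaussian similarity minus its complement), the toy expression values, and all function names are assumptions for illustration, not the authors' formulas.

```python
import math


def fused_affinity(cells, sigma=1.0):
    """Similarity minus dissimilarity for every pair of cells (assumed fusion rule)."""
    n = len(cells)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(cells[i], cells[j]))
            sim = math.exp(-d2 / (2 * sigma ** 2))  # Gaussian similarity in (0, 1]
            dis = 1.0 - sim  # complementary dissimilarity
            w[i][j] = sim - dis
    return w


def leading_eigenvector(w, iters=200):
    """Power iteration for the eigenvector of the largest-magnitude eigenvalue."""
    n = len(w)
    v = [float(i + 1) for i in range(n)]  # fixed, non-constant starting vector
    for _ in range(iters):
        v = [sum(w[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v


def two_means_1d(values, iters=20):
    """K-means with k = 2 on one-dimensional values."""
    c0, c1 = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [0 if abs(x - c0) <= abs(x - c1) else 1 for x in values]
        g0 = [x for x, lab in zip(values, labels) if lab == 0]
        g1 = [x for x, lab in zip(values, labels) if lab == 1]
        if g0:
            c0 = sum(g0) / len(g0)
        if g1:
            c1 = sum(g1) / len(g1)
    return labels


# Made-up expression of two genes in six cells (two obvious groups).
cells = [
    [0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
    [5.0, 5.1], [5.2, 4.9], [4.9, 5.0],
]

affinity = fused_affinity(cells)
embedding = leading_eigenvector(affinity)  # 1-D spectral embedding
labels = two_means_1d(embedding)
```

Because dissimilar cells receive negative affinities, the leading eigenvector takes opposite signs on the two groups, which is what lets K-means separate them in the embedding. A fuller treatment would keep several eigenvectors and allow more than two clusters.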


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
John A. Halsall ◽  
Simon Andrews ◽  
Felix Krueger ◽  
Charlotte E. Rutledge ◽  
Gabriella Ficz ◽  
...  

Abstract Chromatin configuration influences gene expression in eukaryotes at multiple levels, from individual nucleosomes to chromatin domains several Mb long. Post-translational modifications (PTM) of core histones seem to be involved in chromatin structural transitions, but how remains unclear. To explore this, we used ChIP-seq and two cell types, HeLa and lymphoblastoid (LCL), to define how changes in chromatin packaging through the cell cycle influence the distributions of three transcription-associated histone modifications, H3K9ac, H3K4me3 and H3K27me3. We show that chromosome regions (bands) of 10–50 Mb, detectable by immunofluorescence microscopy of metaphase (M) chromosomes, are also present in G1 and G2. They comprise 1–5 Mb sub-bands that differ between HeLa and LCL but remain consistent through the cell cycle. The same sub-bands are defined by H3K9ac and H3K4me3, while H3K27me3 spreads more widely. We found little change between cell cycle phases, whether compared by 5 kb rolling windows or when analysis was restricted to functional elements such as transcription start sites and topologically associating domains. Only a small number of genes showed cell-cycle related changes: at genes encoding proteins involved in mitosis, H3K9 became highly acetylated in G2M, possibly because of ongoing transcription. In conclusion, modified histone isoforms H3K9ac, H3K4me3 and H3K27me3 exhibit a characteristic genomic distribution at resolutions of 1 Mb and below that differs between HeLa and lymphoblastoid cells but remains remarkably consistent through the cell cycle. We suggest that this cell-type-specific chromosomal bar-code is part of a homeostatic mechanism by which cells retain their characteristic gene expression patterns, and hence their identity, through multiple mitoses.


2020 ◽  
Author(s):  
Devanshi Patel ◽  
Xiaoling Zhang ◽  
John J. Farrell ◽  
Jaeyoon Chung ◽  
Thor D. Stein ◽  
...  

ABSTRACT Because regulation of gene expression is heritable and context-dependent, we investigated Alzheimer's disease (AD)-related gene expression patterns in cell-types in blood and brain. Cis-expression quantitative trait locus (eQTL) mapping was performed genome-wide in blood from 5,257 Framingham Heart Study (FHS) participants and in brain donated by 475 Religious Orders Study/Memory & Aging Project (ROSMAP) participants. The association of gene expression with genotypes for all cis SNPs within 1 Mb of genes was evaluated using linear regression models for unrelated subjects and linear mixed models for related subjects. Cell type-specific eQTL (ct-eQTL) models included an interaction term for expression of “proxy” genes that discriminate particular cell types. Ct-eQTL analysis identified 11,649 and 2,533 additional significant gene-SNP eQTL pairs in brain and blood, respectively, that were not detected in generic eQTL analysis. Of note, 386 unique target eGenes of significant eQTLs shared between blood and brain were enriched in apoptosis and Wnt signaling pathways. Five of these shared genes are established AD loci. The potential importance and relevance to AD of significant results in myeloid cell-types is supported by the observation that a large portion of genome-wide significant (GWS) ct-eQTLs map within 1 Mb of established AD loci and 58% (23/40) of the most significant eGenes in these eQTLs have previously been implicated in AD. This study identified cell-type specific expression patterns for established and potentially novel AD genes, found additional evidence for the role of myeloid cells in AD risk, and discovered potential novel blood and brain AD biomarkers that highlight the importance of cell-type specific analysis.
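The interaction model mentioned in this abstract can be illustrated with a minimal sketch for unrelated subjects. It fits expression ~ 1 + genotype + proxy + genotype × proxy by ordinary least squares. Covariates and the mixed-model case for related subjects are left out, and the data, coefficient values, and function names are invented for the example.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ct_eqtl(genotype, proxy, expression):
    """Least-squares fit of expression ~ 1 + genotype + proxy + genotype*proxy.

    Returns [intercept, genotype effect, proxy effect, interaction effect].
    """
    design = [[1.0, g, p, g * p] for g, p in zip(genotype, proxy)]
    k = 4
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, expression)) for i in range(k)]
    return solve(xtx, xty)


# Made-up data: SNP dosage (0/1/2) and expression of a cell-type proxy gene.
genotype = [0, 1, 2, 0, 1, 2, 0, 1, 2, 1]
proxy = [0.1, 0.4, 0.9, 1.2, 0.3, 0.7, 1.5, 1.1, 0.2, 0.8]
# Noise-free target expression generated with known coefficients.
expression = [1.0 + 0.2 * g + 0.5 * p + 0.8 * g * p for g, p in zip(genotype, proxy)]

intercept, beta_snp, beta_proxy, beta_interaction = fit_ct_eqtl(genotype, proxy, expression)
```

In this framing, the interaction coefficient is the quantity of interest: a nonzero value means the SNP effect on expression changes with the abundance of the cell type that the proxy gene tracks.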


Author(s):  
Arjun Bhattacharya ◽  
Yun Li ◽  
Michael I. Love

ABSTRACT Traditional predictive models for transcriptome-wide association studies (TWAS) consider only single nucleotide polymorphisms (SNPs) local to genes of interest and perform parameter shrinkage with a regularization process. These approaches ignore the effect of distal-SNPs or other molecular effects underlying the SNP-gene association. Here, we outline multi-omics strategies for transcriptome imputation from germline genetics to allow more powerful testing of gene-trait associations by prioritizing distal-SNPs to the gene of interest. In one extension, we identify mediating biomarkers (CpG sites, microRNAs, and transcription factors) highly associated with gene expression and train predictive models for these mediators using their local SNPs. Imputed values for mediators are then incorporated into the final predictive model of gene expression, along with local SNPs. In the second extension, we assess distal-eQTLs (SNPs associated with genes not in a local window around it) for their mediation effect through mediating biomarkers local to these distal-eSNPs. Distal-eSNPs with large indirect mediation effects are then included in the transcriptomic prediction model with the local SNPs around the gene of interest. Using simulations and real data from ROS/MAP brain tissue and TCGA breast tumors, we show considerable gains of percent variance explained (1-2% additive increase) of gene expression and TWAS power to detect gene-trait associations. This integrative approach to transcriptome-wide imputation and association studies aids in identifying the complex interactions underlying genetic regulation within a tissue and important risk genes for various traits and disorders.
AUTHOR SUMMARY Transcriptome-wide association studies (TWAS) are a powerful strategy to study gene-trait associations by integrating genome-wide association studies (GWAS) with gene expression datasets. TWAS increases study power and interpretability by mapping genetic variants to genes. However, traditional TWAS considers only variants that are close to a gene and thus ignores important variants far away from the gene that may be involved in complex regulatory mechanisms. Here, we present MOSTWAS (Multi-Omic Strategies for TWAS), a suite of tools that extends the TWAS framework to include these distal variants. MOSTWAS leverages multi-omic data of regulatory biomarkers (transcription factors, microRNAs, epigenetics) and borrows from techniques in mediation analysis to prioritize distal variants that are around these regulatory biomarkers. Using simulations and real public data from brain tissue and breast tumors, we show that MOSTWAS improves upon traditional TWAS in both predictive performance and power to detect gene-trait associations. MOSTWAS also aids in identifying possible mechanisms for gene regulation using a novel added-last test that assesses the added information gained from the distal variants beyond the local association. In conclusion, our method aids in detecting important risk genes for traits and disorders and the possible complex interactions underlying genetic regulation within a tissue.


Genetics ◽  
2020 ◽  
Vol 216 (4) ◽  
pp. 891-903
Author(s):  
Ishara S. Ariyapala ◽  
Jessica M. Holsopple ◽  
Ellen M. Popodi ◽  
Dalton G. Hartwick ◽  
Lily Kahsai ◽  
...  

The Drosophila adult midgut is a model epithelial tissue composed of a few major cell types with distinct regional identities. One of the limitations to its analysis is the lack of tools to manipulate gene expression based on these regional identities. To overcome this obstacle, we applied the intersectional split-GAL4 system to the adult midgut and report 653 driver combinations that label cells by region and cell type. We first identified 424 split-GAL4 drivers with midgut expression from ∼7300 drivers screened, and then evaluated the expression patterns of each of these 424 when paired with three reference drivers that report activity specifically in progenitor cells, enteroendocrine cells, or enterocytes. We also evaluated a subset of the drivers expressed in progenitor cells for expression in enteroblasts using another reference driver. We show that driver combinations can define novel cell populations by identifying a driver that marks a distinct subset of enteroendocrine cells expressing genes usually associated with progenitor cells. The regional cell type patterns associated with the entire set of driver combinations are documented in a freely available website, providing information for the design of thousands of additional driver combinations to experimentally manipulate small subsets of intestinal cells. In addition, we show that intestinal enhancers identified with the split-GAL4 system can confer equivalent expression patterns on other transgenic reporters. Altogether, the resource reported here will enable more precisely targeted gene expression for studying intestinal processes, epithelial cell functions, and diseases affecting self-renewing tissues.


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Reagon Karki ◽  
Alpha Tom Kodamullil ◽  
Charles Tapley Hoyt ◽  
Martin Hofmann-Apitius

Abstract Background Literature-derived knowledge assemblies have been used as an effective way of representing biological phenomena and understanding disease etiology in systems biology. These include canonical pathway databases such as KEGG, Reactome and WikiPathways and disease-specific network inventories such as the causal biological networks database, PD map and NeuroMMSig. The knowledge represented in these resources is qualitative and focuses mainly on the causal relationships between biological entities. Genes, the major constituents of knowledge representations, tend to be expressed differentially under different conditions such as cell types, brain regions and disease stages. A classical approach to interpreting a knowledge assembly is to explore the expression patterns of the individual genes. However, an approach that enables quantification of the overall impact of differentially expressed genes in the corresponding network is still lacking. Results Using the concept of heat diffusion, we have devised an algorithm that is able to calculate the magnitude of regulation of a biological network using expression datasets. We have demonstrated that molecular mechanisms specific to Alzheimer's disease (AD) and Parkinson's disease (PD) are regulated with different intensities across spatial and temporal resolutions. Our approach shows that mitochondrial dysfunction in PD is severe in the cortex and in advanced stages of PD patients. Similarly, we have shown that the intensity of aggregation of neurofibrillary tangles (NFTs) in AD increases as the disease progresses. This finding is in concordance with previous studies that explain the burden of NFTs in stages of AD. Conclusions This study is one of the first attempts to enable quantification of mechanisms represented as biological networks. We have been able to quantify the magnitude of regulation of a biological network and illustrate that the magnitudes are different across spatial and temporal resolutions.
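The heat-diffusion concept named in this abstract can be sketched generically. Each gene starts with "heat" taken from its expression change, and heat then spreads along network edges according to dh/dt = -L h, where L is the graph Laplacian. The abstract does not specify the authors' algorithm or scoring, so the toy network, the input values, and the final summary score below are assumptions.

```python
def graph_laplacian(n_nodes, edges):
    """Build the combinatorial Laplacian L = D - A of an undirected graph."""
    lap = [[0.0] * n_nodes for _ in range(n_nodes)]
    for u, v in edges:
        lap[u][u] += 1.0
        lap[v][v] += 1.0
        lap[u][v] -= 1.0
        lap[v][u] -= 1.0
    return lap


def diffuse(heat, lap, t=1.0, steps=1000):
    """Integrate dh/dt = -L h with explicit Euler steps up to time t."""
    dt = t / steps
    h = list(heat)
    n = len(h)
    for _ in range(steps):
        flow = [sum(lap[i][j] * h[j] for j in range(n)) for i in range(n)]
        h = [h[i] - dt * flow[i] for i in range(n)]
    return h


# Hypothetical 4-gene chain network: 0 - 1 - 2 - 3
edges = [(0, 1), (1, 2), (2, 3)]
lap = graph_laplacian(4, edges)

# Initial heat: absolute log fold change of each gene (made-up values).
initial = [2.0, 0.0, 0.0, 0.5]
final = diffuse(initial, lap, t=0.5)

# One possible network-level summary (an assumption, not the paper's score):
# the total amount of heat that moved between nodes during diffusion.
score = sum(abs(f - i) for f, i in zip(final, initial))
```

Total heat is conserved because every column of the Laplacian sums to zero, so the score reflects only how strongly the expression signal is redistributed over the network's topology.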


2020 ◽  
Vol 21 (23) ◽  
pp. 9052
Author(s):  
Indrek Teino ◽  
Antti Matvere ◽  
Martin Pook ◽  
Inge Varik ◽  
Laura Pajusaar ◽  
...  

Aryl hydrocarbon receptor (AHR) is a ligand-activated transcription factor that mediates the effects of a variety of environmental stimuli in multiple tissues. Recent advances in AHR biology have underlined its importance in cells with high developmental potency, including pluripotent stem cells. Nonetheless, there are few data on AHR expression and its role during the initial stages of stem cell differentiation. The purpose of this study was to investigate the temporal pattern of AHR expression during directed differentiation of human embryonic stem cells (hESC) into neural progenitor, early mesoderm and definitive endoderm cells. Additionally, we investigated the effect of the AHR agonist 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) on the gene expression profile in hESCs and differentiated cells by RNA-seq, accompanied by identification of AHR binding sites by ChIP-seq and epigenetic landscape analysis by ATAC-seq. We showed that AHR is differentially regulated in distinct lineages. We provided evidence that TCDD alters gene expression patterns in hESCs and during early differentiation. Additionally, we identified novel potential AHR target genes, which expand our understanding of the role of this protein in different cell types.


Blood ◽  
2008 ◽  
Vol 112 (11) ◽  
pp. 2453-2453
Author(s):  
Nicholas A. Watkins ◽  
Marloes R. Tijssen ◽  
Arief Gusnanto ◽  
Bernard de Bono ◽  
Subhajyoti De ◽  
...  

Abstract Haematopoiesis is a carefully controlled process that is regulated by complex networks of transcription factors that are, in part, controlled by signals resulting from ligand binding to cell surface receptors. In order to further understand haematopoiesis, we have compared gene expression profiles of human erythroblasts, megakaryocytes, B-cells, cytotoxic and helper T-cells, Natural Killer cells, granulocytes and monocytes using whole genome microarrays. A bioinformatics analysis of these data was performed focusing on transcription factors, immunoglobulin superfamily members and lineage-specific transcripts. We observed that the number of lineage-specific genes varies by two orders of magnitude, ranging from five for cytotoxic T cells to 878 for granulocytes. In addition, we have identified novel co-expression patterns for key transcription factors involved in haematopoiesis (e.g., GATA3–GFI1 and GATA2–KLF1). This study represents the most comprehensive analysis of gene expression in haematopoietic cells to date and has identified genes that play key roles in lineage commitment and cell function. The data, which are freely accessible, will be invaluable for future studies on haematopoiesis and the role of specific genes and will also aid the understanding of the recent genome-wide association studies.


2003 ◽  
Vol 4 (2) ◽  
pp. 208-215 ◽  
Author(s):  
David W. Galbraith

The tissues and organs of multicellular eukaryotes are frequently observed to comprise complex three-dimensional interspersions of different cell types. It is a reasonable assumption that different global patterns of gene expression are found within these different cell types. This review outlines general experimental strategies designed to characterize these global gene expression patterns, based on a combination of methods of transgenic fluorescent protein (FP) expression and targeting, of flow cytometry and sorting and of high-throughput gene expression analysis.

