Analysis of chromatin organization and gene expression in T cells identifies functional genes for rheumatoid arthritis

2019 ◽  
Author(s):  
Jing Yang ◽  
Amanda McGovern ◽  
Paul Martin ◽  
Kate Duffus ◽  
Xiangyu Ge ◽  
...  

Abstract Genome-wide association studies have identified genetic variation contributing to complex disease risk. However, assigning causal genes and mechanisms has been more challenging because disease-associated variants are often found in distal regulatory regions with cell-type specific behaviours. Here, we collect ATAC-seq, Hi-C, Capture Hi-C and nuclear RNA-seq data in stimulated CD4+ T-cells over 24 hours, to identify functional enhancers regulating gene expression. We characterise changes in DNA interaction and activity dynamics that correlate with changes in gene expression, and find that the strongest correlations are observed within 200 kb of promoters. Using rheumatoid arthritis as an example of T-cell mediated disease, we demonstrate interactions of expression quantitative trait loci with target genes, and confirm assigned genes or show complex interactions for 20% of disease associated loci, including FOXO1, which we confirm using CRISPR/Cas9.

2016 ◽  
Author(s):  
Farhad Hormozdiari ◽  
Martijn van de Bunt ◽  
Ayellet V. Segrè ◽  
Xiao Li ◽  
Jong Wha J Joo ◽  
...  

Abstract The vast majority of genome-wide association studies (GWAS) risk loci fall in non-coding regions of the genome. One possible hypothesis is that these GWAS risk loci alter the individual’s disease risk through their effect on gene expression in different tissues. In order to understand the mechanisms driving a GWAS risk locus, it is helpful to determine which gene is affected in specific tissue types. For example, the relevant gene and tissue may play a role in the disease mechanism if the same variant responsible for a GWAS locus also affects gene expression. Identifying whether or not the same variant is causal in both GWAS and eQTL studies is challenging due to the uncertainty induced by linkage disequilibrium (LD) and the fact that some loci harbor multiple causal variants. However, current methods that address this problem assume that each locus contains a single causal variant. In this paper, we present a new method, eCAVIAR, that is capable of accounting for LD while computing the quantity we refer to as the colocalization posterior probability (CLPP). The CLPP is the probability that the same variant is responsible for both the GWAS and eQTL signal. eCAVIAR has several key advantages. First, our method can account for more than one causal variant in any locus. Second, it can leverage summary statistics without accessing the individual genotype data. We use both simulated and real datasets to demonstrate the utility of our method. Utilizing publicly available eQTL data on 45 different tissues, we demonstrate that computing CLPP can prioritize likely relevant tissues and target genes for a set of glucose- and insulin-related trait loci. eCAVIAR is available at http://genetics.cs.ucla.edu/caviar/
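
The CLPP described in the abstract above can be illustrated with a minimal sketch. It assumes, as a simplification that the abstract itself does not state, that the per-variant posterior probabilities of causality have already been computed separately for the GWAS and the eQTL study, and that the two studies are independent so the joint probability is their product. The function name `clpp` and the example numbers are hypothetical; this is not the eCAVIAR implementation.

```python
def clpp(gwas_posteriors, eqtl_posteriors):
    """Per-variant colocalization score, under the independence assumption.

    Each input holds one posterior probability of causality per variant,
    in the same variant order. Both are assumed to be precomputed.
    """
    if len(gwas_posteriors) != len(eqtl_posteriors):
        raise ValueError("both studies must cover the same variants")
    # Probability that a variant is causal in both studies, if independent.
    return [g * e for g, e in zip(gwas_posteriors, eqtl_posteriors)]


# Hypothetical posteriors for three variants at one locus.
gwas = [0.05, 0.80, 0.15]
eqtl = [0.10, 0.70, 0.20]

scores = clpp(gwas, eqtl)
# Index of the variant with the strongest colocalization support.
best = max(range(len(scores)), key=scores.__getitem__)
```

A variant scores highly only when both studies support it, which is why a strong signal in one study alone does not produce a high colocalization score.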


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Xiangyu Ge ◽  
Mojca Frank-Bertoncelj ◽  
Kerstin Klein ◽  
Amanda McGovern ◽  
Tadeja Kuret ◽  
...  

Abstract Background Genome-wide association studies have reported more than 100 risk loci for rheumatoid arthritis (RA). These loci are shown to be enriched in immune cell-specific enhancers, but the analysis so far has excluded stromal cells, such as synovial fibroblasts (FLS), despite their crucial involvement in the pathogenesis of RA. Here we integrate DNA architecture, 3D chromatin interactions, DNA accessibility, and gene expression in FLS, B cells, and T cells with genetic fine mapping of RA loci. Results We identify putative causal variants, enhancers, genes, and cell types for 30–60% of RA loci and demonstrate that FLS account for up to 24% of RA heritability. TNF stimulation of FLS alters the organization of topologically associating domains, chromatin state, and the expression of putative causal genes such as TNFAIP3 and IFNAR1. Several putative causal genes constitute RA-relevant functional networks in FLS with roles in cellular proliferation and activation. Finally, we demonstrate that risk variants can have joint-specific effects on target gene expression in RA FLS, which may contribute to the development of the characteristic pattern of joint involvement in RA. Conclusion Overall, our research provides the first direct evidence for a causal role of FLS in the genetic susceptibility for RA accounting for up to a quarter of RA heritability.


2020 ◽  
Author(s):  
Xiangyu Ge ◽  
Mojca Frank-Bertoncelj ◽  
Kerstin Klein ◽  
Amanda McGovern ◽  
Tadeja Kuret ◽  
...  

Abstract Genome-wide association studies have reported >100 risk loci for rheumatoid arthritis (RA). These loci have been shown to be enriched in immune cell-specific enhancers, but analysis so far has excluded stromal cells, such as synovial fibroblasts (FLS), despite their crucial involvement in the pathogenesis of RA. Here we integrated DNA architecture (ChIP-seq), 3D chromatin interactions (HiC, capture HiC), DNA accessibility (ATAC-seq) and gene expression (RNA-seq) in FLS, B cells and T cells with genetic fine mapping of RA loci. We identified putative causal variants, enhancers, genes, and cell types for 30–60% of RA loci and demonstrated that FLS account for up to 24% of RA heritability. TNF stimulation of FLS altered the organization of topologically associating domains (TADs), chromatin state and the expression of putative causal genes (e.g. TNFAIP3, IFNAR1). Several putative causal genes constituted RA-relevant functional networks in FLS with roles in cellular proliferation and activation. Finally, we demonstrated that risk variants can have joint-specific effects on target gene expression in RA FLS, which may contribute to the development of the characteristic pattern of joint involvement in RA. Overall, our research provides the first direct evidence for a causal role of FLS in the genetic susceptibility for RA accounting for up to a quarter of RA heritability.


2021 ◽  
Vol 12 ◽  
Author(s):  
Martina Rauner ◽  
Ines Foessl ◽  
Melissa M. Formosa ◽  
Erika Kague ◽  
Vid Prijatelj ◽  
...  

The availability of large human datasets for genome-wide association studies (GWAS) and the advancement of sequencing technologies have boosted the identification of genetic variants in complex and rare diseases in the skeletal field. Yet, interpreting results from human association studies remains a challenge. To bridge the gap between genetic association and causality, a systematic functional investigation is necessary. Multiple unknowns exist for putative causal genes, including cellular localization of the molecular function. Intermediate traits (“endophenotypes”), e.g. molecular quantitative trait loci (molQTLs), are needed to identify mechanisms of underlying associations. Furthermore, index variants often reside in non-coding regions of the genome and are therefore challenging to interpret. Knowledge of non-coding variance (e.g. ncRNAs), repetitive sequences, and regulatory interactions between enhancers and their target genes is central for understanding causal genes in skeletal conditions. Animal models with deep skeletal phenotyping and cell culture models have already facilitated fine mapping of some association signals, elucidated gene mechanisms, and revealed disease-relevant biology. However, to accelerate research towards bridging the current gap between association and causality in skeletal diseases, alternative in vivo platforms need to be used and developed in parallel with the current -omics and traditional in vivo resources. Therefore, we argue that as a field we need to establish resource-sharing standards to collectively address complex research questions. These standards will promote data integration from various -omics technologies and functional dissection of human complex traits. In this mission statement, we review the currently available resources and as a group propose a consensus to facilitate resource sharing using existing and future resources. Such coordination efforts will maximize the acquisition of knowledge from different approaches and thus reduce redundancy and duplication of resources. These measures will help to understand the pathogenesis of osteoporosis and other skeletal diseases towards defining new and more efficient therapeutic targets.


2020 ◽  
Vol 117 (26) ◽  
pp. 15028-15035 ◽  
Author(s):  
Ronald Yurko ◽  
Max G’Sell ◽  
Kathryn Roeder ◽  
Bernie Devlin

To correct for a large number of hypothesis tests, most researchers rely on simple multiple testing corrections. Yet, new methodologies of selective inference could potentially improve power while retaining statistical guarantees, especially those that enable exploration of test statistics using auxiliary information (covariates) to weight hypothesis tests for association. We explore one such method, adaptive P-value thresholding (AdaPT), in the framework of genome-wide association studies (GWAS) and gene expression/coexpression studies, with particular emphasis on schizophrenia (SCZ). Selected SCZ GWAS association P values play the role of the primary data for AdaPT; single-nucleotide polymorphisms (SNPs) are selected because they are gene expression quantitative trait loci (eQTLs). This natural pairing of SNPs and genes allows us to map the following covariate values to these pairs: GWAS statistics from genetically correlated bipolar disorder, the effect size of SNP genotypes on gene expression, and gene–gene coexpression, captured by subnetwork (module) membership. In all, 24 covariates per SNP/gene pair were included in the AdaPT analysis using flexible gradient boosted trees. We demonstrate a substantial increase in power to detect SCZ associations using gene expression information from the developing human prefrontal cortex. We interpret these results in light of recent theories about the polygenic nature of SCZ. Importantly, our entire process for identifying enrichment and creating features with independent complementary data sources can be implemented in many different high-throughput settings to ultimately improve power.
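
As a point of reference for the "simple multiple testing corrections" mentioned in the abstract above, the following sketch shows the standard Benjamini-Hochberg step-up procedure, which uses P values alone and no covariates. It is a baseline illustration only and is not AdaPT; the function name and the example P values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per hypothesis: True if rejected at FDR level alpha."""
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest P value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose P value is at or below alpha * k / m.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k = rank
    # Reject the k hypotheses with the smallest P values.
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


# Hypothetical P values for four tests.
p = [0.01, 0.04, 0.03, 0.005]
bh_rejected = benjamini_hochberg(p, alpha=0.05)
# Bonferroni comparison: reject only when p <= alpha / m.
bonferroni_rejected = [pv <= 0.05 / len(p) for pv in p]
```

In this toy input the step-up procedure rejects more hypotheses than Bonferroni at the same nominal level. Covariate-aware methods such as the one the abstract describes aim to gain further power by using side information that this baseline ignores.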


Blood ◽  
2008 ◽  
Vol 112 (11) ◽  
pp. 2453-2453
Author(s):  
Nicholas A. Watkins ◽  
Marloes R. Tijssen ◽  
Arief Gusnanto ◽  
Bernard de Bono ◽  
Subhajyoti De ◽  
...  

Abstract Haematopoiesis is a carefully controlled process that is regulated by complex networks of transcription factors that are, in part, controlled by signals resulting from ligand binding to cell surface receptors. In order to further understand haematopoiesis, we have compared gene expression profiles of human erythroblasts, megakaryocytes, B-cells, cytotoxic and helper T-cells, Natural Killer cells, granulocytes and monocytes using whole genome microarrays. A bioinformatics analysis of this data was performed focusing on transcription factors, immunoglobulin superfamily members and lineage specific transcripts. We observed that the number of lineage specific genes varies by two orders of magnitude, ranging from five for cytotoxic T cells to 878 for granulocytes. In addition, we have identified novel co-expression patterns for key transcription factors involved in haematopoiesis (e.g. GATA3–GFI1 and GATA2–KLF1). This study represents the most comprehensive analysis of gene expression in haematopoietic cells to date and has identified genes that play key roles in lineage commitment and cell function. The data, which is freely accessible, will be invaluable for future studies on haematopoiesis and the role of specific genes and will also aid the understanding of the recent genome-wide association studies.


2017 ◽  
Vol 242 (13) ◽  
pp. 1325-1334 ◽  
Author(s):  
Yizhou Zhu ◽  
Cagdas Tazearslan ◽  
Yousin Suh

Genome-wide association studies have shown that the vast majority of disease-associated variants reside in the non-coding regions of the genome, suggesting that gene regulatory changes contribute to disease risk. Identifying truly causal non-coding variants and their affected target genes remains challenging but is a critical step in translating the genetic associations to molecular mechanisms and ultimately clinical applications. Here we review genomic/epigenomic resources and in silico tools that can be used to identify causal non-coding variants and experimental strategies to validate their functionalities. Impact statement Most signals from genome-wide association studies (GWASs) map to the non-coding genome, and functional interpretation of these associations remained challenging. We reviewed recent progress in methodologies of studying the non-coding genome and argued that no single approach allows one to effectively identify the causal regulatory variants from GWAS results. By illustrating the advantages and limitations of each method, our review potentially provided a guideline for taking a combinatorial approach to accurately predict, prioritize, and eventually experimentally validate the causal variants.


2021 ◽  
Author(s):  
Steven Gazal ◽  
Omer Weissbrod ◽  
Farhad Hormozdiari ◽  
Kushal Dey ◽  
Joseph Nasser ◽  
...  

Although genome-wide association studies (GWAS) have identified thousands of disease-associated common SNPs, these SNPs generally do not implicate the underlying target genes, as most disease SNPs are regulatory. Many SNP-to-gene (S2G) linking strategies have been developed to link regulatory SNPs to the genes that they regulate in cis, but it is unclear how these strategies should be applied in the context of interpreting common disease risk variants. We developed a framework for evaluating and combining different S2G strategies to optimize their informativeness for common disease risk, leveraging polygenic analyses of disease heritability to define and estimate their precision and recall. We applied our framework to GWAS summary statistics for 63 diseases and complex traits (average N=314K), evaluating 50 S2G strategies. Our optimal combined S2G strategy (cS2G) included 7 constituent S2G strategies (Exon, Promoter, 2 fine-mapped cis-eQTL strategies, EpiMap enhancer-gene linking, Activity-By-Contact (ABC), and Cicero), and achieved a precision of 0.75 and a recall of 0.33, more than doubling the precision and/or recall of any individual strategy; this implies that 33% of SNP-heritability can be linked to causal genes with 75% confidence. We applied cS2G to fine-mapping results for 49 UK Biobank diseases/traits to predict 7,111 causal SNP-gene-disease triplets (with S2G-derived functional interpretation) with high confidence. Finally, we applied cS2G to genome-wide fine-mapping results for these traits (not restricted to GWAS loci) to rank genes by the heritability linked to each gene, providing an empirical assessment of disease omnigenicity; averaging across traits, we determined that the top 200 (1%) of ranked genes explained roughly half of the heritability linked to all genes. Our results highlight the benefits of our cS2G strategy in providing functional interpretation of GWAS findings; we anticipate that precision and recall will increase further under our framework as improved functional assays lead to improved S2G strategies.


2019 ◽  
Author(s):  
James Boocock ◽  
Megan Leask ◽  
Yukinori Okada ◽  
Hirotaka Matsuo ◽  
Yusuke Kawamura ◽  
...  

Abstract Serum urate is the end-product of purine metabolism. Elevated serum urate is causal of gout and a predictor of renal disease, cardiovascular disease and other metabolic conditions. Genome-wide association studies (GWAS) have reported dozens of loci associated with serum urate control; however, there has been little progress in understanding the molecular basis of the associated loci. Here we employed trans-ancestral meta-analysis using data from European and East Asian populations to identify ten new loci for serum urate levels. Genome-wide colocalization with cis-expression quantitative trait loci (eQTL) identified a further five new loci. By cis- and trans-eQTL colocalization analysis we identified 24 and 20 genes respectively where the causal eQTL variant has a high likelihood of being shared with the serum urate-associated locus. One new locus identified was SLC22A9 that encodes organic anion transporter 7 (OAT7). We demonstrate that OAT7 is a very weak urate-butyrate exchanger. Newly implicated genes identified in the eQTL analysis include those encoding proteins that make up the dystrophin complex, a scaffold for signaling proteins and transporters at the cell membrane; MLXIP that, with the previously identified MLXIPL, is a transcription factor that may regulate serum urate via the pentose-phosphate pathway; and MRPS7 and IDH2 that encode proteins necessary for mitochondrial function. Trans-ancestral functional fine-mapping identified six loci (RREB1, INHBC, HLF, UBE2Q2, SFMBT1, HNF4G) with colocalized eQTL that contained putative causal SNPs (posterior probability of causality > 0.8). This systematic analysis of serum urate GWAS loci has identified candidate causal genes at 19 loci and a network of previously unidentified genes likely involved in control of serum urate levels, further illuminating the molecular mechanisms of urate control.
Author Summary High serum urate is a prerequisite for gout and a risk factor for metabolic disease. Previous GWAS have identified numerous loci that are associated with serum urate control; however, only a small handful of these loci have known molecular consequences. The majority of loci are within the non-coding regions of the genome and therefore it is difficult to ascertain how these variants might influence serum urate levels without tangible links to gene expression and / or protein function. We have applied a novel bioinformatic pipeline where we combined population-specific GWAS data with gene expression and genome connectivity information to identify putative causal genes for serum urate associated loci. Overall, we identified 15 novel serum urate loci and show that these loci along with previously identified loci are linked to the expression of 44 genes. We show that some of the variants within these loci have strong predicted regulatory function which can be further tested in functional analyses. This study expands on previous GWAS by identifying further loci implicated in serum urate control and new causal mechanisms supported by gene expression changes.

