Integrating regulatory DNA sequence and gene expression to predict genome-wide chromatin accessibility across cellular contexts

Bioinformatics ◽

10.1093/bioinformatics/btz352 ◽

2019 ◽

Vol 35 (14) ◽

pp. i108-i116 ◽

Cited By ~ 10

Author(s):

Surag Nair ◽

Daniel S Kim ◽

Jacob Perricone ◽

Anshul Kundaje

Keyword(s):

Gene Expression ◽

Dna Sequence ◽

Cell Types ◽

Chromatin Accessibility ◽

Supplementary Information ◽

Regulatory Sequence ◽

Specific Expression ◽

Genome Wide ◽

Context Specific ◽

Regulatory Dna

Abstract

Motivation: Genome-wide profiles of chromatin accessibility and gene expression in diverse cellular contexts are critical to decipher the dynamics of transcriptional regulation. Recently, convolutional neural networks have been used to learn predictive cis-regulatory DNA sequence models of context-specific chromatin accessibility landscapes. However, these context-specific regulatory sequence models cannot generalize predictions across cell types.

Results: We introduce multi-modal, residual neural network architectures that integrate cis-regulatory sequence and context-specific expression of trans-regulators to predict genome-wide chromatin accessibility profiles across cellular contexts. We show that the average accessibility of a genomic region across training contexts can be a surprisingly powerful predictor. We leverage this feature and employ novel strategies for training models to enhance genome-wide prediction of shared and context-specific chromatin accessible sites across cell types. We interpret the models to reveal insights into cis- and trans-regulation of chromatin dynamics across 123 diverse cellular contexts.

Availability and implementation: The code is available at https://github.com/kundajelab/ChromDragoNN.

Supplementary information: Supplementary data are available at Bioinformatics online.
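The observation that average accessibility across training contexts is a strong predictor can be illustrated with a minimal sketch. This is not the authors' model: the toy matrix, the function name `mean_accessibility_baseline` and the 0.5 threshold are all assumptions chosen for illustration.

```python
from statistics import mean


def mean_accessibility_baseline(train_matrix):
    """Return, for each region, its mean accessibility across training contexts.

    train_matrix[i][j] is 1 if region i is accessible in training context j, else 0.
    """
    return [mean(row) for row in train_matrix]


# Toy data (hypothetical): 4 genomic regions x 3 training contexts.
train = [
    [1, 1, 1],  # accessible in every training context (shared site)
    [0, 0, 0],  # closed in every training context
    [1, 0, 0],  # accessible in one context only (context-specific site)
    [1, 1, 0],  # accessible in most contexts
]

scores = mean_accessibility_baseline(train)

# Predict accessibility in a held-out context by thresholding the baseline
# score. The 0.5 cut-off is an arbitrary choice for this illustration.
predicted = [int(s >= 0.5) for s in scores]
print(predicted)
```

A baseline like this can only do well on sites shared across contexts; it has no way to predict context-specific sites, which is presumably where the sequence and expression inputs described in the abstract add value.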

Deciphering programs of transcriptional regulation by combined deconvolution of multiple omics layers

10.1101/199547 ◽

2017 ◽

Cited By ~ 5

Author(s):

Daniel Hübschmann ◽

Nils Kurzawa ◽

Sebastian Steinhauser ◽

Philipp Rentzsch ◽

Stephen Krämer ◽

...

Keyword(s):

Gene Expression ◽

Developmental Stages ◽

Cell Types ◽

Regulatory Elements ◽

Chromatin Accessibility ◽

Cell Type ◽

Specific Expression ◽

Input Size ◽

Human Blood Cells ◽

Cell Type Specific Expression

Abstract: Metazoans depend crucially on multiple layers of gene regulatory mechanisms, which allow them to control gene expression across developmental stages, tissues and cell types. Several recent research consortia have aimed to generate comprehensive datasets that profile the activity of these cell type- and condition-specific regulatory landscapes across many different cell lines and primary cells. However, extracting genes or regulatory elements specific to certain entities from these datasets remains challenging. Here we propose a novel method based on non-negative matrix factorization (NMF) for disentangling and associating huge multi-assay datasets, including chromatin accessibility and gene expression data. By taking advantage of implementations of NMF algorithms in the GPU CUDA environment, full datasets composed of tens of thousands of genes and hundreds of samples can be processed without prior feature selection to reduce the input size. Applying this framework to multiple layers of genomic data derived from human blood cells, we unravel mechanisms that regulate cell type-specific expression in T cells and monocytes.
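For readers unfamiliar with NMF, the following is a minimal CPU sketch of the factorization V ≈ WH using the standard multiplicative update rules for the squared-error objective. It is not the authors' GPU/CUDA implementation; the toy matrix, the rank, the iteration count and all function names are assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random initialisation.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    errors = [frobenius_error(v, w, h)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
        errors.append(frobenius_error(v, w, h))
    return w, h, errors


# Toy matrix (hypothetical): 4 features x 4 samples with two visible blocks.
v = [
    [5.0, 4.0, 0.0, 0.0],
    [4.0, 5.0, 0.0, 1.0],
    [0.0, 0.0, 3.0, 4.0],
    [0.0, 1.0, 4.0, 3.0],
]
w, h, errors = nmf(v, rank=2)
print(errors[0], errors[-1])
```

In an analysis of this kind, the columns of W would be read as feature signatures and the rows of H as the exposure of each sample to those signatures. Pure-Python loops like these do not scale to tens of thousands of genes, which is the motivation the abstract gives for a GPU implementation.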

Remodeling of the H3 nucleosomal landscape during mouse aging

10.1101/769489 ◽

2019 ◽

Author(s):

Yilin Chen ◽

Juan I. Bravo ◽

Bérénice A. Benayoun

Keyword(s):

Gene Expression ◽

Primary Cultures ◽

Biological Significance ◽

Cell Types ◽

Chromatin Accessibility ◽

Control Of Gene Expression ◽

Age Related ◽

Genome Wide ◽

Transcriptional Alterations ◽

Regulation Mechanisms

Abstract: In multi-cellular organisms, the control of gene expression is key not only for development but also for adult cellular homeostasis, and deregulation of gene expression correlates with aging. A key layer in the study of gene regulation mechanisms lies at the level of chromatin: cellular chromatin states (i.e. the ‘epigenome’) can tune transcriptional profiles, and, in line with the prevalence of transcriptional alterations with aging, accumulating evidence suggests that the chromatin landscape is altered with aging across cell types and species. However, although alterations in the chromatin make-up of cells are considered a hallmark of aging, little is known about the genomic loci that are specifically affected by age-related chromatin state remodeling or about their biological significance. Here, we report the analysis of genome-wide profiles of core histone H3 occupancy in tissues from aging male mice (i.e. heart, liver, cerebellum and olfactory bulb) and in primary cultures of neural stem cells. We find that, although no drastic changes in H3 levels are observed, local changes in H3 occupancy occur with aging across tissues and cells, with regions of both increased and decreased occupancy. These changes are compatible with a general increase in chromatin accessibility at pro-inflammatory genes and may thus mechanistically underlie known shifts in gene expression programs with aging.

Behavior-dependent cis regulation reveals genes and pathways associated with bower building in cichlid fishes

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1810140115 ◽

2018 ◽

Vol 115 (47) ◽

pp. E11081-E11090 ◽

Cited By ~ 15

Author(s):

Ryan A. York ◽

Chinar Patil ◽

Kawther Abdilleh ◽

Zachary V. Johnson ◽

Matthew A. Conte ◽

...

Keyword(s):

Gene Expression ◽

Neural Plasticity ◽

Genetic Basis ◽

Specific Expression ◽

Preferential Expression ◽

Cichlid Fishes ◽

Brain Gene Expression ◽

Genome Wide ◽

Allele Specific ◽

Genomic Regions

Many behaviors are associated with heritable genetic variation [Kendler and Greenspan (2006) Am J Psychiatry 163:1683–1694]. Genetic mapping has revealed genomic regions or, in a few cases, specific genes explaining part of this variation [Bendesky and Bargmann (2011) Nat Rev Gen 12:809–820]. However, the genetic basis of behavioral evolution remains unclear. Here we investigate the evolution of an innate extended phenotype, bower building, among cichlid fishes of Lake Malawi. Males build bowers of two types, pits or castles, to attract females for mating. We performed comparative genome-wide analyses of 20 bower-building species and found that these phenotypes have evolved multiple times with thousands of genetic variants strongly associated with this behavior, suggesting a polygenic architecture. Remarkably, F1 hybrids of a pit-digging and a castle-building species perform sequential construction of first a pit and then a castle bower. Analysis of brain gene expression in these hybrids showed that genes near behavior-associated variants display behavior-dependent allele-specific expression with preferential expression of the pit-digging species allele during pit digging and of the castle-building species allele during castle building. These genes are highly enriched for functions related to neurodevelopment and neural plasticity. Our results suggest that natural behaviors are associated with complex genetic architectures that alter behavior via cis-regulatory differences whose effects on gene expression are specific to the behavior itself.

Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types

10.1101/103069 ◽

2017 ◽

Cited By ~ 19

Author(s):

Hilary K. Finucane ◽

Yakir A. Reshef ◽

Verneri Anttila ◽

Kamil Slowikowski ◽

Alexander Gusev ◽

...

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Complex Disease ◽

Genome Wide Association Study ◽

Ex Vivo ◽

Cell Types ◽

Inhibitory Neurons ◽

Biliary Cirrhosis ◽

Expression Data ◽

Specific Expression

Abstract: Genetics can provide a systematic approach to discovering the tissues and cell types relevant for a complex disease or trait. Identifying these tissues and cell types is critical for following up on non-coding allelic function, developing ex-vivo models, and identifying therapeutic targets. Here, we analyze gene expression data from several sources, including the GTEx and PsychENCODE consortia, together with genome-wide association study (GWAS) summary statistics for 48 diseases and traits with an average sample size of 169,331, to identify disease-relevant tissues and cell types. We develop and apply an approach that uses stratified LD score regression to test whether disease heritability is enriched in regions surrounding genes with the highest specific expression in a given tissue. We detect tissue-specific enrichments at FDR < 5% for 34 diseases and traits across a broad range of tissues that recapitulate known biology. In our analysis of traits with observed central nervous system enrichment, we detect an enrichment of neurons over other brain cell types for several brain-related traits, enrichment of inhibitory over excitatory neurons for bipolar disorder but excitatory over inhibitory neurons for schizophrenia and body mass index, and enrichments in the cortex for schizophrenia and in the striatum for migraine. In our analysis of traits with observed immunological enrichment, we identify enrichments of T cells for asthma and eczema, B cells for primary biliary cirrhosis, and myeloid cells for Alzheimer's disease, which we validated with independent chromatin data. Our results demonstrate that our polygenic approach is a powerful way to leverage gene expression data for interpreting GWAS signal.

HCR-FlowFISH: A flexible CRISPR screening method to identify cis-regulatory elements and their target genes

10.1101/2020.05.11.078675 ◽

2020 ◽

Author(s):

SK Reilly ◽

SJ Gosai ◽

A Gutierrez ◽

JC Ulirsch ◽

M Kanai ◽

...

Keyword(s):

Gene Expression ◽

Target Genes ◽

Screening Method ◽

Cell Types ◽

Regulatory Elements ◽

Hybridization Chain Reaction ◽

Genome Wide ◽

Wide Range ◽

Causal Variants ◽

Endogenous Loci

Abstract: CRISPR screens for cis-regulatory elements (CREs) have shown unprecedented power to endogenously characterize the non-coding genome. To characterize CREs we developed HCR-FlowFISH (Hybridization Chain Reaction Fluorescent In-Situ Hybridization coupled with Flow Cytometry), which directly quantifies native transcripts within their endogenous loci following CRISPR perturbations of regulatory elements, eliminating the need for restrictive phenotypic assays such as growth or transcript-tagging. HCR-FlowFISH accurately quantifies gene expression across a wide range of transcript levels and cell types. We also developed CASA (CRISPR Activity Screen Analysis), a hierarchical Bayesian model to identify and quantify CRE activity. Using >270,000 perturbations, we identified CREs for GATA1, HDAC6, ERP29, LMO2, MEF2C, CD164, NMU, FEN1 and the FADS gene cluster. Our methods detect subtle gene expression changes and identify CREs regulating multiple genes, sometimes at different magnitudes and directions. We demonstrate the power of HCR-FlowFISH to parse genome-wide association signals by nominating causal variants and target genes.

Cell-type Specific Expression Quantitative Trait Loci Associated with Alzheimer Disease in Blood and Brain Tissue

10.1101/2020.11.23.20237008 ◽

2020 ◽

Author(s):

Devanshi Patel ◽

Xiaoling Zhang ◽

John J. Farrell ◽

Jaeyoon Chung ◽

Thor D. Stein ◽

...

Keyword(s):

Gene Expression ◽

Quantitative Trait ◽

Expression Patterns ◽

Regulation Of Gene Expression ◽

Cell Types ◽

Eqtl Analysis ◽

Cell Type ◽

Specific Expression ◽

Cell Type Specific Expression ◽

Cell Type Specific

Abstract: Because regulation of gene expression is heritable and context-dependent, we investigated AD-related gene expression patterns in cell types in blood and brain. Cis-expression quantitative trait locus (eQTL) mapping was performed genome-wide in blood from 5,257 Framingham Heart Study (FHS) participants and in brain donated by 475 Religious Orders Study/Memory & Aging Project (ROSMAP) participants. The association of gene expression with genotypes for all cis SNPs within 1Mb of genes was evaluated using linear regression models for unrelated subjects and linear mixed models for related subjects. Cell type-specific eQTL (ct-eQTL) models included an interaction term for expression of “proxy” genes that discriminate particular cell types. Ct-eQTL analysis identified 11,649 and 2,533 additional significant gene-SNP eQTL pairs in brain and blood, respectively, that were not detected in generic eQTL analysis. Of note, 386 unique target eGenes of significant eQTLs shared between blood and brain were enriched in apoptosis and Wnt signaling pathways. Five of these shared genes are established AD loci. The potential importance and relevance to AD of significant results in myeloid cell types is supported by the observation that a large portion of genome-wide significant (GWS) ct-eQTLs map within 1Mb of established AD loci and 58% (23/40) of the most significant eGenes in these eQTLs have previously been implicated in AD. This study identified cell type-specific expression patterns for established and potentially novel AD genes, found additional evidence for the role of myeloid cells in AD risk, and discovered potential novel blood and brain AD biomarkers that highlight the importance of cell type-specific analysis.
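The idea of an interaction term in a ct-eQTL model can be sketched as an ordinary least-squares fit of expression on genotype, proxy-gene expression, and their product. This is only an illustration for unrelated subjects under assumed, noiseless toy data. It omits covariates and the linear mixed models used for related subjects, and every name and coefficient below is hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations X^T X beta = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def design_row(genotype, proxy):
    """Columns: intercept, genotype, proxy expression, genotype x proxy interaction."""
    return [1.0, genotype, proxy, genotype * proxy]


# Toy data (hypothetical): allele dosage (0/1/2) and proxy-gene expression
# for ten unrelated subjects.
genotypes = [0, 1, 2, 0, 1, 2, 0, 1, 2, 1]
proxy_expr = [0.2, 0.5, 0.9, 1.4, 0.3, 1.1, 0.7, 1.8, 0.4, 1.0]

# Simulate noiseless expression from known coefficients so the fit can be checked.
true_beta = [1.0, 0.5, 0.2, 0.8]
rows = [design_row(g, p) for g, p in zip(genotypes, proxy_expr)]
expression = [sum(b * x for b, x in zip(true_beta, row)) for row in rows]

beta = fit_ols(rows, expression)
print(beta)
```

The last coefficient is the interaction effect: a non-zero value means the genotype effect on expression changes with the proxy gene's expression, which is what marks an eQTL as cell type-specific in this kind of model.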

Dynamic Epigenetic Landscapes Define Multiple Myeloma Progression and Drug Resistance

Blood ◽

10.1182/blood-2020-142872 ◽

2020 ◽

Vol 136 (Supplement 1) ◽

pp. 32-33

Author(s):

Rafael Renatino-Canevarolo ◽

Mark B. Meads ◽

Maria Silva ◽

Praneeth Reddy Sudalagunta ◽

Christopher Cubitt ◽

...

Keyword(s):

Gene Expression ◽

Drug Resistance ◽

Disease Progression ◽

Board Of Directors ◽

Research Funding ◽

Chromatin Accessibility ◽

Refractory Disease ◽

Advisory Committees ◽

Cytogenetic Abnormalities ◽

Genome Wide

Multiple myeloma (MM) is an incurable cancer of bone marrow-resident plasma cells, which evolves from a premalignant state, MGUS, to a form of active disease characterized by an initial response to therapy, followed by cycles of therapeutic successes and failures, culminating in a fatal multi-drug resistant cancer. The molecular mechanisms leading to disease progression and refractory disease in MM remain poorly understood. To address this question, we have generated a new database, consisting of 1,123 MM biopsies from patients treated at the H. Lee Moffitt Cancer Center. These samples ranged from MGUS to late relapsed/refractory (LR) disease, and were comprehensively characterized genetically (844 RNAseq, 870 WES, 7 scRNAseq), epigenetically (10 single-cell chromatin accessibility, scATAC-seq) and phenotypically (537 samples assessed for ex vivo drug resistance). Mutational analysis identified putative driver genes (e.g. NRAS, KRAS) among the most frequent mutations, as well as a steady increase in mutational load across progression from MGUS to LR samples. However, with the exception of KRAS, these genes did not reach statistical significance according to Fisher's exact test between different disease stages, suggesting that no single mutation is necessary or sufficient to drive MM progression or refractory disease, but rather a common "driver" biology is critical. Pathway analysis of differentially expressed genes identified cell adhesion, inflammatory cytokines and hematopoietic cell identity as under-expressed in active MM vs. MGUS, while cell cycle, metabolism, DNA repair, protein/RNA synthesis and degradation were over-expressed in LR. Using an unsupervised systems biology approach, we reconstructed a gene expression map to identify transcriptomic reprogramming events associated with disease progression and evolution of drug resistance. At an epigenetic regulatory level, these genes were enriched for histone modifications (e.g. H3K27me3 and H3K27ac). Furthermore, scATAC-seq confirmed genome-wide alterations in chromatin accessibility across MM progression, involving shifts in chromatin accessibility of the binding motifs of epigenetic regulator complexes, known to mediate formation of 3D structures (CTCF/YY1) of super enhancers (SE) and cell identity reprogramming (POU5F1/SOX2). Additionally, we have identified SE-regulated genes under- (EBF1, RB1, SPI1, KLF6) and over-expressed (PRDM1, IRF4) in MM progression, as well as over-expressed in LR (RFX5, YY1, NBN, CTCF, BCOR). We have found a correlation between cytogenetic abnormalities and mutations with differential gene expression observed in MM progression, suggesting groups of genetic events with equivalent transcriptomic effect: e.g. NRAS, KRAS, DIS3 and del13q are associated with transcriptomic changes observed during the transition from MGUS/SMOL to active MM (Figure 1). Taken together, our preliminary data suggest that multiple independent combinations of genetic and epigenetic events (e.g. mutations, cytogenetics, SE dysregulation) alter the balance of master epigenetic regulatory circuitry, leading to genome-wide transcriptional reprogramming, facilitating disease progression and emergence of drug resistance.

Figure 1: Topology of transcriptional regulation in MM depicts 16,738 genes whose expression is increased (red) or decreased (green) in the presence of a genetic abnormality. Differential expression associated with (A) hotspot mutations and (B) cytogenetic abnormalities confirms equivalence of expected pairs (e.g. NRAS and KRAS, BRAF and RAF1), but also proposes a novel transcriptomic dysregulation effect of clinically relevant cytogenetic abnormalities with an as yet uncharacterized molecular role in MM.

Disclosures: Kulkarni: M2GEN: Current Employment. Zhang: M2GEN: Current Employment. Hampton: M2GEN: Current Employment. Shain: GlaxoSmithKline: Speakers Bureau; Amgen: Speakers Bureau; Karyopharm: Research Funding, Speakers Bureau; AbbVie: Research Funding; Takeda: Honoraria, Speakers Bureau; Sanofi/Genzyme: Honoraria, Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; Janssen: Honoraria, Speakers Bureau; Celgene: Honoraria, Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; Adaptive: Consultancy, Honoraria; BMS: Honoraria, Membership on an entity's Board of Directors or advisory committees, Speakers Bureau. Siqueira Silva: AbbVie: Research Funding; Karyopharm: Research Funding; NIH/NCI: Research Funding.

A compendium of uniformly processed human gene expression and splicing quantitative trait loci

Nature Genetics ◽

10.1038/s41588-021-00924-w ◽

2021 ◽

Vol 53 (9) ◽

pp. 1290-1299

Author(s):

Nurlan Kerimov ◽

James D. Hayhurst ◽

Kateryna Peikova ◽

Jonathan R. Manning ◽

Peter Walter ◽

...

Keyword(s):

Gene Expression ◽

Quantitative Trait ◽

Target Genes ◽

Genome Wide Association Study ◽

Cell Types ◽

Summary Statistics ◽

Genome Wide ◽

Cell Type Specific ◽

Trait Locus ◽

Complex Human Traits

Abstract: Many gene expression quantitative trait locus (eQTL) studies have published their summary statistics, which can be used to gain insight into complex human traits by downstream analyses, such as fine mapping and co-localization. However, technical differences between these datasets are a barrier to their widespread use. Consequently, target genes for most genome-wide association study (GWAS) signals have still not been identified. In the present study, we present the eQTL Catalogue (https://www.ebi.ac.uk/eqtl), a resource of quality-controlled, uniformly re-computed gene expression and splicing QTLs from 21 studies. We find that, for matching cell types and tissues, the eQTL effect sizes are highly reproducible between studies. Although most QTLs were shared between most bulk tissues, we identified a greater diversity of cell-type-specific QTLs from purified cell types, a subset of which also manifested as new disease co-localizations. Our summary statistics are freely available to enable the systematic interpretation of human GWAS associations across many cell types and tissues.

Cell-Selective Regulation of CFTR Gene Expression: Relevance to Gene Editing Therapeutics

Genes ◽

10.3390/genes10030235 ◽

2019 ◽

Vol 10 (3) ◽

pp. 235 ◽

Cited By ~ 6

Author(s):

Hannah Swahn ◽

Ann Harris

Keyword(s):

Gene Expression ◽

Cystic Fibrosis ◽

Gene Editing ◽

Genetic Diseases ◽

3D Structure ◽

Three Dimensional ◽

Cell Types ◽

Regulatory Elements ◽

Cftr Gene ◽

Specific Expression

The cystic fibrosis transmembrane conductance regulator (CFTR) gene is an attractive target for gene editing approaches, which may yield novel therapeutic approaches for genetic diseases such as cystic fibrosis (CF). However, for gene editing to be effective, aspects of the three-dimensional (3D) structure and cis-regulatory elements governing the dynamic expression of CFTR need to be considered. In this review, we focus on the higher order chromatin organization required for normal CFTR locus function, together with the complex mechanisms controlling expression of the gene in different cell types impaired by CF pathology. Across all cells, the CFTR locus is organized into an invariant topologically associated domain (TAD) established by the architectural proteins CCCTC-binding factor (CTCF) and cohesin complex. Additional insulator elements within the TAD also recruit these factors. Although the CFTR promoter is required for basal levels of expression, cis-regulatory elements (CREs) in intergenic and intronic regions are crucial for cell-specific and temporal coordination of CFTR transcription. These CREs are recruited to the promoter through chromatin looping mechanisms and enhance cell-type-specific expression. These features of the CFTR locus should be considered when designing gene-editing approaches, since failure to recognize their importance may disrupt gene expression and reduce the efficacy of therapies.
