Comprehensive enhancer-target gene assignments improve gene set level interpretation of genome-wide regulatory data

2020 ◽  
Author(s):  
Tingting Qin ◽  
Christopher Lee ◽  
Raymond Cavalcante ◽  
Peter Orchard ◽  
Heming Yao ◽  
...  

Abstract: Revealing the gene targets of distal regulatory elements is challenging yet critical for interpreting regulome data. Experiment-derived enhancer-gene links are restricted to a small set of enhancers and/or cell types, while the accuracy of genome-wide approaches remains elusive due to the lack of a systematic evaluation. We combined multiple spatial and in silico approaches for defining enhancer locations and linking them to their target genes aggregated across >500 cell types, generating 1,860 human genome-wide distal Enhancer to Target gene Definitions (EnTDefs). To evaluate performance, we used gene set enrichment testing on 87 independent ENCODE ChIP-seq datasets of 34 transcription factors (TFs) and assessed concordance of results with known TF Gene Ontology (GO) annotations, assuming that greater concordance with TF-GO annotation signifies better enrichment results and thus more accurate enhancer-to-gene assignments. Notably, the top ranked 741 (40%) EnTDefs significantly outperformed the common, naïve approach of linking distal regions to the nearest genes (FDR < 0.05), and the top 10 ranked EnTDefs performed well when applied to ChIP-seq data of other cell types. These general EnTDefs also showed comparable performance to EnTDefs generated using cell-type-specific data. Our findings illustrate the power of our approach to provide genome-wide interpretation regardless of cell type.
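For orientation, the naïve baseline mentioned in this abstract (linking each distal region to the nearest gene) can be sketched in a few lines. This is only an illustrative sketch, not the authors' code: the function names `build_tss_index` and `nearest_gene`, the gene labels, and the coordinates are all hypothetical, and it assumes distance is measured from the region midpoint to a single transcription start site (TSS) per gene.

```python
from bisect import bisect_left

def build_tss_index(tss_records):
    """Group (chrom, position, gene) records into per-chromosome lists sorted by position."""
    index = {}
    for chrom, pos, gene in tss_records:
        index.setdefault(chrom, []).append((pos, gene))
    for chrom in index:
        index[chrom].sort()
    return index

def nearest_gene(index, chrom, start, end):
    """Return the gene whose TSS is closest to the region midpoint, or None if the chromosome has no TSS."""
    entries = index.get(chrom, [])
    if not entries:
        return None
    mid = (start + end) // 2
    positions = [pos for pos, _ in entries]
    i = bisect_left(positions, mid)
    # Only the TSS just left and just right of the midpoint can be nearest.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(entries)]
    # On a tie, min() keeps the first candidate, i.e. the TSS on the left.
    best = min(candidates, key=lambda j: abs(positions[j] - mid))
    return entries[best][1]

# Hypothetical TSS annotations (made-up coordinates and gene labels).
index = build_tss_index([
    ("chr1", 5000, "GENE_B"),
    ("chr1", 1000, "GENE_A"),
    ("chr1", 12000, "GENE_C"),
])
```

With these made-up annotations, a region spanning 8,900 to 9,100 on `chr1` has midpoint 9,000, which lies 4,000 bp from `GENE_B` and 3,000 bp from `GENE_C`, so it is assigned to `GENE_C`. The EnTDefs described above are meant to replace this distance-only rule with links derived from multiple data sources.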

2020 ◽  
Author(s):  
SK Reilly ◽  
SJ Gosai ◽  
A Gutierrez ◽  
JC Ulirsch ◽  
M Kanai ◽  
...  

Abstract: CRISPR screens for cis-regulatory elements (CREs) have shown unprecedented power to endogenously characterize the non-coding genome. To characterize CREs we developed HCR-FlowFISH (Hybridization Chain Reaction Fluorescent In-Situ Hybridization coupled with Flow Cytometry), which directly quantifies native transcripts within their endogenous loci following CRISPR perturbations of regulatory elements, eliminating the need for restrictive phenotypic assays such as growth or transcript-tagging. HCR-FlowFISH accurately quantifies gene expression across a wide range of transcript levels and cell types. We also developed CASA (CRISPR Activity Screen Analysis), a hierarchical Bayesian model to identify and quantify CRE activity. Using >270,000 perturbations, we identified CREs for GATA1, HDAC6, ERP29, LMO2, MEF2C, CD164, NMU, FEN1 and the FADS gene cluster. Our methods detect subtle gene expression changes and identify CREs regulating multiple genes, sometimes at different magnitudes and directions. We demonstrate the power of HCR-FlowFISH to parse genome-wide association signals by nominating causal variants and target genes.


2021 ◽  
Author(s):  
Yunhee Jeong ◽  
Reka Toth ◽  
Marlene Ganslmeier ◽  
Kersten Breuer ◽  
Christoph Plass ◽  
...  

DNA methylation sequencing is becoming increasingly popular, yielding genome-wide methylome data at single-base-pair resolution through novel cost- and labor-optimized protocols. It has tremendous potential for cell-type heterogeneity analysis, particularly in tumors, due to intrinsic read-level information. Although diverse deconvolution methods have been developed to infer cell-type composition from bulk sequencing-based methylomes, no systematic evaluation has been performed so far. Here, we thoroughly review and evaluate five previously published deconvolution methods: Bayesian epiallele detection (BED), PRISM, csmFinder + coMethy, ClubCpG and MethylPurify, together with two array-based methods, MeDeCom and Houseman, as a comparison group. Sequencing-based deconvolution methods consist of two main steps: informative region selection and cell-type composition estimation. Accordingly, we individually assessed the performance of each step and demonstrated the impact of the first step on the performance of the second. In conclusion, we identify the method with the highest accuracy across different samples and infer the factors affecting cell-type deconvolution performance, which differ according to the number of cell types in the mixture. Whereas selecting genomic regions similar to DMRs generally improved performance in bi-component mixtures, the uniformity of the cell-type distribution correlated strongly with performance in five-cell-type bulk analyses.


Author(s):  
Tiit Örd ◽  
Kadri Õunap ◽  
Lindsey Stolze ◽  
Rédouane Aherrahrou ◽  
Valtteri Nurminen ◽  
...  

Rationale: Genome-wide association studies (GWAS) have identified hundreds of loci associated with coronary artery disease (CAD). Many of these loci are enriched in cis-regulatory elements (CREs) but are linked neither to cardiometabolic risk factors nor to candidate causal genes, complicating their functional interpretation. Objective: Single nucleus chromatin accessibility profiling of human atherosclerotic lesions was used to investigate cell type-specific patterns of CREs, to understand transcription factors establishing cell identity and to interpret CAD-relevant, non-coding genetic variation. Methods and Results: We used single nucleus ATAC-seq to generate DNA accessibility maps in >7,000 cells derived from human atherosclerotic lesions. We identified five major lesional cell types including endothelial cells, smooth muscle cells, monocyte/macrophages, NK/T-cells and B-cells and further investigated subtype characteristics of macrophages and smooth muscle cells transitioning into fibromyocytes. We demonstrated that CAD associated genetic variants are particularly enriched in endothelial and smooth muscle cell-specific open chromatin. Using single cell co-accessibility and cis-eQTL information, we prioritized putative target genes and candidate regulatory elements for ~30% of all known CAD loci. Finally, we performed genome-wide experimental fine-mapping of the CAD GWAS variants using epigenetic QTL analysis in primary human aortic endothelial cells and STARR-Seq massively parallel reporter assay in smooth muscle cells. This analysis identified potential causal SNP(s) and the associated target gene for over 30 CAD loci. We present several examples where the chromatin accessibility and gene expression could be assigned to one cell type, predicting the cell type of action for CAD loci.
Conclusions: These findings highlight the potential of applying snATAC-seq to human tissues in revealing relative contributions of distinct cell types to diseases and in identifying genes likely to be influenced by non-coding GWAS variants.


Author(s):  
Tianshun Gao ◽  
Jiang Qian

Abstract: Enhancers are distal cis-regulatory elements that activate the transcription of their target genes. They regulate a wide range of important biological functions and processes, including embryogenesis, development, and homeostasis. As more and more large-scale technologies have been developed for enhancer identification, a comprehensive database is highly desirable for enhancer annotation based on various genome-wide profiling datasets across different species. Here, we present an updated database EnhancerAtlas 2.0 (http://www.enhanceratlas.org/indexv2.php), covering 586 tissue/cell types that include a large number of normal tissues, cancer cell lines, and cells at different development stages across nine species. Overall, the database contains 13 494 603 enhancers, which were obtained from 16 055 datasets using 12 high-throughput experiment methods (e.g. H3K4me1/H3K27ac, DNase-seq/ATAC-seq, P300, POLR2A, CAGE, ChIA-PET, GRO-seq, STARR-seq and MPRA). The updated version is a huge expansion of the first version, which only contains the enhancers in human cells. In addition, we predicted enhancer–target gene relationships in human, mouse and fly. Finally, users can search enhancers and enhancer–target gene relationships through five user-friendly, interactive modules. We believe the new annotation of enhancers in EnhancerAtlas 2.0 will help users perform useful functional analysis of enhancers in various genomes.


2019 ◽  
Author(s):  
Jill E. Moore ◽  
Henry Pratt ◽  
Michael Purcaro ◽  
Zhiping Weng

Abstract: Many genome-wide collections of candidate cis-regulatory elements (cCREs) have been defined using genomic and epigenomic data, but it remains a major challenge to connect these elements to their target genes. To facilitate the development of computational methods for predicting target genes, we developed a Benchmark of candidate Enhancer-Gene Interactions (BENGI) by integrating the Registry of cCREs we developed recently with experimentally-derived genomic interactions. We used BENGI to test several published computational methods for linking enhancers with genes, including signal correlation and the supervised learning methods TargetFinder and PEP. We found that while TargetFinder was the best performing method, it was only modestly better than a baseline distance method for most benchmark datasets when trained and tested within the same cell type, and that TargetFinder often did not outperform the distance method when applied across cell types. Our results suggest that current computational methods need to be improved and that BENGI presents a useful framework for method development and testing.


2020 ◽  
Author(s):  
Yang Eric Li ◽  
Sebastian Preissl ◽  
Xiaomeng Hou ◽  
Ziyang Zhang ◽  
Kai Zhang ◽  
...  

Abstract: The mammalian cerebrum performs high level sensory, motor control and cognitive functions through highly specialized cortical networks and subcortical nuclei. Recent surveys of mouse and human brains with single cell transcriptomics [1–3] and high-throughput imaging technologies [4,5] have uncovered hundreds of neuronal cell types and a variety of non-neuronal cell types distributed in different brain regions, but the cell-type-specific transcriptional regulatory programs responsible for the unique identity and function of each brain cell type have yet to be elucidated. Here, we probe the accessible chromatin in >800,000 individual nuclei from 45 regions spanning the adult mouse isocortex, olfactory bulb, hippocampus and cerebral nuclei, and use the resulting data to define 491,818 candidate cis regulatory DNA elements in 160 distinct sub-types. We link a significant fraction of them to putative target genes expressed in diverse cerebral cell types and uncover transcriptional regulators involved in a broad spectrum of molecular and cellular pathways in different neuronal and glial cell populations. Our results provide a foundation for comprehensive analysis of gene regulatory programs of the mammalian brain and assist in the interpretation of non-coding risk variants associated with various neurological diseases and traits in humans. To facilitate the dissemination of information, we have set up a web portal (http://catlas.org/mousebrain).


2016 ◽  
Author(s):  
Shashank Singh ◽  
Yang Yang ◽  
Barnabás Póczos ◽  
Jian Ma

Abstract: In the human genome, distal enhancers are involved in regulating target genes through proximal promoters by forming enhancer-promoter interactions. Although recently developed high-throughput experimental approaches have allowed us to recognize potential enhancer-promoter interactions genome-wide, it is still largely unclear to what extent the sequence-level information encoded in our genome helps guide such interactions. Here we report a new computational method (named “SPEID”) using deep learning models to predict enhancer-promoter interactions based on sequence-based features only, when the locations of putative enhancers and promoters in a particular cell type are given. Our results across six different cell types demonstrate that SPEID is effective in predicting enhancer-promoter interactions as compared to state-of-the-art methods that only use information from a single cell type. As a proof-of-principle, we also applied SPEID to identify somatic non-coding mutations in melanoma samples that may have reduced enhancer-promoter interactions in tumor genomes. This work demonstrates that deep learning models can help reveal that sequence-based features alone are sufficient to reliably predict enhancer-promoter interactions genome-wide.


eLife ◽  
2019 ◽  
Vol 8 ◽  
Author(s):  
Sinisa Hrvatin ◽  
Christopher P Tzeng ◽  
M Aurel Nagy ◽  
Hume Stroud ◽  
Charalampia Koutsioumpa ◽  
...  

Enhancers are the primary DNA regulatory elements that confer cell type specificity of gene expression. Recent studies characterizing individual enhancers have revealed their potential to direct heterologous gene expression in a highly cell-type-specific manner. However, it has not yet been possible to systematically identify and test the function of enhancers for each of the many cell types in an organism. We have developed PESCA, a scalable and generalizable method that leverages ATAC- and single-cell RNA-sequencing protocols, to characterize cell-type-specific enhancers that should enable genetic access and perturbation of gene function across mammalian cell types. Focusing on the highly heterogeneous mammalian cerebral cortex, we apply PESCA to find enhancers and generate viral reagents capable of accessing and manipulating a subset of somatostatin-expressing cortical interneurons with high specificity. This study demonstrates the utility of this platform for developing new cell-type-specific viral reagents, with significant implications for both basic and translational research.


2021 ◽  
Author(s):  
Sneha Gopalan ◽  
Yuqing Wang ◽  
Nicholas W. Harper ◽  
Manuel Garber ◽  
Thomas G Fazzio

Methods derived from CUT&RUN and CUT&Tag enable genome-wide mapping of the localization of proteins on chromatin from as few as one cell. These and other mapping approaches focus on one protein at a time, preventing direct measurements of co-localization of different chromatin proteins in the same cells and requiring prioritization of targets where samples are limiting. Here we describe multi-CUT&Tag, an adaptation of CUT&Tag that overcomes these hurdles by using antibody-specific barcodes to simultaneously map multiple proteins in the same cells. Highly specific multi-CUT&Tag maps of histone marks and RNA Polymerase II uncovered sites of co-localization in the same cells, active and repressed genes, and candidate cis-regulatory elements. Single-cell multi-CUT&Tag profiling facilitated identification of distinct cell types from a mixed population and characterization of cell type-specific chromatin architecture. In sum, multi-CUT&Tag increases the information content per cell of epigenomic maps, facilitating direct analysis of the interplay of different proteins on chromatin.


Author(s):  
Jieru Li ◽  
Alexandros Pertsinidis

Establishing cell-type-specific gene expression programs relies on the action of distal enhancers, cis-regulatory elements that can activate target genes over large genomic distances, up to megabases away. How distal enhancers physically relay regulatory information to target promoters has remained a mystery. Here, we review the latest developments and insights into promoter–enhancer communication mechanisms revealed by live-cell, real-time single-molecule imaging approaches.

