OCHROdb: a comprehensive, quality checked database of open chromatin regions from sequencing data

2018 ◽  
Author(s):  
Parisa Shooshtari ◽  
Samantha Feng ◽  
Viswateja Nelakuditi ◽  
Justin Foong ◽  
Michael Brudno ◽  
...  

Abstract: International consortia, including ENCODE, Roadmap Epigenomics, Genomics of Gene Regulation and Blueprint Epigenome, have made large-scale datasets of open chromatin regions publicly available. While these datasets are extremely useful for studying mechanisms of gene regulation in disease and cell development, they only identify open chromatin regions in individual samples. A uniform comparison of accessibility of the same regulatory sites across multiple samples is necessary to correlate open chromatin accessibility and expression of target genes across matched cell types. Additionally, although replicate samples are available for the majority of cell types, a comprehensive replication-based quality check of individual regulatory sites is still lacking. We have integrated 828 DNase-I hypersensitive sequencing samples, uniformly processed them, and clustered their regulatory regions across all samples. We checked the quality of open-chromatin regions using our replication test. This has resulted in a comprehensive, quality-checked database of Open CHROmatin (OCHROdb) regions for 194 unique human cell types and cell lines, which can serve as a reference for gene regulatory studies involving open chromatin. We have made this resource publicly available: users can download the whole database, or query it for their genomic regions of interest and visualize the results in an interactive genome browser.


2020 ◽  
Vol 48 (W1) ◽  
pp. W193-W199 ◽  
Author(s):  
Nina Baumgarten ◽  
Dennis Hecker ◽  
Sivarajan Karunanithi ◽  
Florian Schmidt ◽  
Markus List ◽  
...  

Abstract: A current challenge in genomics is to interpret non-coding regions and their role in transcriptional regulation of possibly distant target genes. Genome-wide association studies show that a large part of genomic variants are found in those non-coding regions, but their mechanisms of gene regulation are often unknown. An additional challenge is to reliably identify the target genes of the regulatory regions, which is an essential step in understanding their impact on gene expression. Here we present the EpiRegio web server, a resource of regulatory elements (REMs). REMs are genomic regions that exhibit variations in their chromatin accessibility profile associated with changes in expression of their target genes. EpiRegio incorporates both epigenomic and gene expression data for various human primary cell types and tissues, providing an integrated view of REMs in the genome. Our web server allows the analysis of genes and their associated REMs, including the REM’s activity and its estimated cell type-specific contribution to its target gene’s expression. Further, it is possible to explore genomic regions for their regulatory potential and to investigate overlapping REMs, and thereby to dissect regions of large epigenomic complexity. EpiRegio allows programmatic access through a REST API and is freely available at https://epiregio.de/.



2019 ◽  
Author(s):  
Eirene Markenscoff-Papadimitriou ◽  
Sean Whalen ◽  
Pawel Przytycki ◽  
Reuben Thomas ◽  
Fadya Binyameen ◽  
...  

Abstract: Gene expression differs between cell types and regions within complex tissues such as the developing brain. To discover regulatory elements underlying this specificity, we generated genome-wide maps of chromatin accessibility in eleven anatomically-defined regions of the developing human telencephalon, including upper and deep layers of the prefrontal cortex. We predicted a subset of open chromatin regions (18%) that are most likely to be active enhancers, many of which are dynamic, with 26% differing between early and late mid-gestation and 28% present in only one brain region. These region-specific predicted regulatory elements (pREs) are enriched proximal to genes with expression differences across regions and developmental stages and harbor distinct sequence motifs that suggest potential upstream regulators of regional and temporal transcription. We leverage this atlas to identify regulators of genes associated with autism spectrum disorder (ASD) including an enhancer of BCL11A, validated in mouse, and two functional de novo mutations in individuals with ASD in an enhancer of SLC6A1, validated in neuroblastoma cells. These applications demonstrate the utility of this atlas for decoding neurodevelopmental gene regulation in health and disease.

Summary: To discover regulatory elements driving the specificity of gene expression in different cell types and regions of the developing human brain, we generated an atlas of open chromatin from eleven dissected regions of the mid-gestation human telencephalon, including upper and deep layers of the prefrontal cortex. We identified a subset of open chromatin regions (OCRs), termed predicted regulatory elements (pREs), that are likely to function as developmental brain enhancers. pREs showed regional differences in chromatin accessibility, including many specific to one brain region, and were correlated with gene expression differences across the same regions and gestational ages. pREs allowed us to map neurodevelopmental disorder risk genes to developing telencephalic regions, and we identified three functional de novo noncoding variants in pREs that alter enhancer function. In addition, transgenic experiments in mouse validated enhancer activity for a pRE proximal to BCL11A, showing how this atlas serves as a resource for decoding neurodevelopmental gene regulation in health and disease.



eLife ◽  
2017 ◽  
Vol 6 ◽  
Author(s):  
Colleen E Hannon ◽  
Shelby A Blythe ◽  
Eric F Wieschaus

In Drosophila, graded expression of the maternal transcription factor Bicoid (Bcd) provides positional information to activate target genes at different positions along the anterior-posterior axis. We have measured the genome-wide binding profile of Bcd using ChIP-seq in embryos expressing single, uniform levels of Bcd protein, and grouped Bcd-bound targets into four classes based on occupancy at different concentrations. By measuring the biochemical affinity of target enhancers in these classes in vitro and genome-wide chromatin accessibility by ATAC-seq, we found that the occupancy of target sequences by Bcd is not primarily determined by Bcd binding sites, but by chromatin context. Bcd drives an open chromatin state at a subset of its targets. Our data support a model where Bcd influences chromatin structure to gain access to concentration-sensitive targets at high concentrations, while concentration-insensitive targets are found in more accessible chromatin and are bound at low concentrations. This may be a common property of developmental transcription factors that must gain early access to their target enhancers while the chromatin state of the genome is being remodeled during large-scale transitions in the gene regulatory landscape.



Science ◽  
2018 ◽  
Vol 361 (6409) ◽  
pp. 1380-1385 ◽  
Author(s):  
Junyue Cao ◽  
Darren A. Cusanovich ◽  
Vijay Ramani ◽  
Delasa Aghamirzaie ◽  
Hannah A. Pliner ◽  
...  

Although we can increasingly measure transcription, chromatin, methylation, and other aspects of molecular biology at single-cell resolution, most assays survey only one aspect of cellular biology. Here we describe sci-CAR, a combinatorial indexing–based coassay that jointly profiles chromatin accessibility and mRNA (CAR) in each of thousands of single cells. As a proof of concept, we apply sci-CAR to 4825 cells, including a time series of dexamethasone treatment, as well as to 11,296 cells from the adult mouse kidney. With the resulting data, we compare the pseudotemporal dynamics of chromatin accessibility and gene expression, reconstruct the chromatin accessibility profiles of cell types defined by RNA profiles, and link cis-regulatory sites to their target genes on the basis of the covariance of chromatin accessibility and transcription across large numbers of single cells.
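
The idea of linking cis-regulatory sites to target genes through the covariation of accessibility and transcription across cells can be illustrated with a small sketch. This is not the sci-CAR procedure itself: a plain Pearson correlation is swapped in as a simple stand-in, and the function names, the `threshold` parameter, and the toy values are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant signal carries no covariation
    return cov / sqrt(var_x * var_y)


def link_sites_to_gene(site_accessibility, gene_expression, threshold=0.5):
    """Keep sites whose accessibility co-varies positively with expression.

    site_accessibility: dict mapping a site name to per-cell accessibility.
    gene_expression: per-cell expression of one candidate target gene,
    with cells in the same order as the accessibility values.
    threshold: hypothetical cutoff on the correlation coefficient.
    """
    links = {}
    for site, accessibility in site_accessibility.items():
        r = pearson(accessibility, gene_expression)
        if r >= threshold:
            links[site] = r
    return links


if __name__ == "__main__":
    # Toy values for five cells; purely illustrative.
    expression = [1, 2, 3, 4, 5]
    sites = {"site_a": [0, 1, 1, 2, 2], "site_b": [2, 2, 1, 1, 0]}
    print(link_sites_to_gene(sites, expression))
```

A real analysis would also restrict candidate sites to a window around each gene and account for the sparsity of single-cell counts, neither of which this sketch attempts.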



Viruses ◽  
2021 ◽  
Vol 13 (7) ◽  
pp. 1235
Author(s):  
Leah D. Brandt ◽  
Shuang Guo ◽  
Kevin W. Joseph ◽  
Jana L. Jacobs ◽  
Asma Naqvi ◽  
...  

Efforts to cure HIV-1 infection require better quantification of the HIV-1 reservoir, particularly the clones of cells harboring replication-competent (intact) proviruses, termed repliclones. The digital droplet PCR assays commonly used to quantify intact proviruses do not differentiate among specific repliclones; thus, the dynamics of repliclones are not well defined. The major challenge in tracking repliclones is the relative rarity of the cells carrying specific intact proviruses. To date, detection and accurate quantification of repliclones require in-depth integration site sequencing. Here, we describe a simplified workflow using integration site-specific qPCR (IS-qPCR) to determine the frequencies of the proviruses integrated in individual repliclones. We designed IS-qPCR to determine the frequencies of repliclones and clones of cells that carry defective proviruses in samples from three donors. Comparing the results of IS-qPCR with deep integration site sequencing data showed that the two methods yielded concordant estimates of clone frequencies (r = 0.838). IS-qPCR is a potentially valuable tool that can be applied to multiple samples and cell types over time to measure the dynamics of individual repliclones and the efficacy of treatments designed to eliminate them.



eLife ◽  
2021 ◽  
Vol 10 ◽  
Author(s):  
Elliott Swanson ◽  
Cara Lord ◽  
Julian Reading ◽  
Alexander T Heubeck ◽  
Palak C Genge ◽  
...  

Single-cell measurements of cellular characteristics have been instrumental in understanding the heterogeneous pathways that drive differentiation, cellular responses to signals, and human disease. Recent advances have allowed paired capture of protein abundance and transcriptomic state, but a lack of epigenetic information in these assays has left a missing link to gene regulation. Using the heterogeneous mixture of cells in human peripheral blood as a test case, we developed a novel scATAC-seq workflow that increases signal-to-noise and allows paired measurement of cell surface markers and chromatin accessibility: integrated cellular indexing of chromatin landscape and epitopes, called ICICLE-seq. We extended this approach using a droplet-based multiomics platform to develop a trimodal assay that simultaneously measures transcriptomics (scRNA-seq), epitopes, and chromatin accessibility (scATAC-seq) from thousands of single cells, which we term TEA-seq. Together, these multimodal single-cell assays provide a novel toolkit to identify type-specific gene regulation and expression grounded in phenotypically defined cell types.



2018 ◽  
Author(s):  
Gary LeRoy ◽  
Ozgur Oksuz ◽  
Nicolas Descostes ◽  
Yuki Aoi ◽  
Rais Ganai ◽  
...  

FACT (facilitates chromatin transcription) is a protein complex that allows RNAPII to overcome the nucleosome-induced barrier to transcription. While abundant in undifferentiated cells and many cancers, FACT is not abundant or is absent in most tissues. Therefore, we screened for additional proteins that might replace FACT upon differentiation. Here we report the identification of two such proteins, LEDGF and HDGF2, each containing two HMGA-like AT-hooks and a PWWP methyl-lysine reading domain known to bind to H3K36me2 and H3K36me3. LEDGF and HDGF2 localize with H3K36me2/3 at genomic regions containing active genes, usually adjacent to H3K27me3 domains. In myoblasts, where FACT expression is low, LEDGF and HDGF2 are enriched on most active genes. Upon differentiation to myotubes, LEDGF levels decrease across the genome while HDGF2 levels are maintained. Moreover, HDGF2 is recruited to the majority of myotube up-regulated genes and is required for their proper expression. HDGF2 knockout myoblasts exhibit an accumulation of paused RNAPII proximal to the first nucleosome within the transcribed region of many HDGF2 target genes, indicating a defect in early elongation. We propose that LEDGF and HDGF2 substitute for FACT in differentiated tissues and that their distribution on the genome helps maintain transcriptional programs unique to particular cell types.



Genes ◽  
2019 ◽  
Vol 10 (7) ◽  
pp. 536 ◽  
Author(s):  
Xiaobo Zhao ◽  
Liming Gan ◽  
Caixia Yan ◽  
Chunjuan Li ◽  
Quanxi Sun ◽  
...  

Long non-coding RNAs (lncRNAs) are involved in various regulatory processes although they do not encode protein. Presently, there is little information regarding the identification of lncRNAs in peanut (Arachis hypogaea Linn.). In this study, 50,873 lncRNAs of peanut were identified from large-scale published RNA sequencing data that belonged to 124 samples involving 15 different tissues. The average lengths of lncRNA and mRNA were 4335 bp and 954 bp, respectively. Compared to the mRNAs, the lncRNAs were shorter, with fewer exons and lower expression levels. The 4713 co-expressed lncRNAs (expressed in all samples) were used to construct co-expression networks using weighted correlation network analysis (WGCNA). LncRNAs correlating with the growth and development of different peanut tissues were obtained, and target genes were predicted for 386 hub lncRNAs from the lncRNA co-expression networks. Taken together, these findings can provide a comprehensive identification of lncRNAs in peanut.



2020 ◽  
Vol 16 (11) ◽  
pp. e1008422
Author(s):  
Azusa Tanaka ◽  
Yasuhiro Ishitsuka ◽  
Hiroki Ohta ◽  
Akihiro Fujimoto ◽  
Jun-ichirou Yasunaga ◽  
...  

The huge amount of data acquired by high-throughput sequencing requires data reduction for effective analysis. Here we give a clustering algorithm for genome-wide open chromatin data using a new data reduction method. This method regards the genome as a string of 1s and 0s based on a set of peaks and calculates the Hamming distances between the strings. This algorithm with the systematically optimized set of peaks enables us to quantitatively evaluate differences between samples of hematopoietic cells and classify cell types, potentially leading to a better understanding of leukemia pathogenesis.
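
The data reduction described above can be sketched in a few lines: each sample becomes a 0/1 vector over a shared set of reference regions, and samples are compared by Hamming distance. This is a minimal illustration and not the authors' implementation. It assumes a single chromosome, half-open `(start, end)` intervals, and a reference region set that is already chosen; the function names and toy coordinates are hypothetical, and the paper's systematic optimization of the peak set is not reproduced.

```python
from itertools import combinations


def binarize(peaks, reference_regions):
    """Encode one sample as a 0/1 list over the reference regions.

    A position is 1 when the reference region overlaps at least one of
    the sample's peaks. Intervals are half-open (start, end) pairs.
    """
    vector = []
    for ref_start, ref_end in reference_regions:
        overlaps = any(start < ref_end and end > ref_start
                       for start, end in peaks)
        vector.append(1 if overlaps else 0)
    return vector


def hamming(a, b):
    """Number of positions at which two equal-length vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_hamming(samples, reference_regions):
    """Hamming distance for every pair of samples, keyed by sorted names."""
    vectors = {name: binarize(peaks, reference_regions)
               for name, peaks in samples.items()}
    return {(a, b): hamming(vectors[a], vectors[b])
            for a, b in combinations(sorted(vectors), 2)}


if __name__ == "__main__":
    # Toy coordinates; purely illustrative.
    regions = [(0, 100), (200, 300), (400, 500), (600, 700)]
    samples = {
        "A": [(10, 50), (210, 250)],
        "B": [(20, 60), (220, 260), (610, 650)],
        "C": [(410, 450), (650, 690)],
    }
    for pair, distance in pairwise_hamming(samples, regions).items():
        print(pair, distance)
```

The resulting distance table could then be passed to any standard clustering routine to group samples by cell type.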



2019 ◽  
Author(s):  
M-M Aynaud ◽  
O Mirabeau ◽  
N Gruel ◽  
S Grossetête ◽  
V Boeva ◽  
...  

Summary: EWSR1-FLI1, the chimeric oncogene specific for Ewing sarcoma (EwS), induces a cascade of signaling events leading to cell transformation. However, it remains elusive how genetically homogeneous EwS cells can drive heterogeneity of transcriptional programs. Here, we combined independent component analysis of single cell RNA-sequencing data from diverse cell types and model systems with time-resolved mapping of EWSR1-FLI1 binding sites and of open chromatin regions to characterize dynamic cellular processes associated with EWSR1-FLI1 activity. We thus defined an exquisitely specific and direct, super-enhancer-driven EWSR1-FLI1 program. In EwS tumors, cell proliferation was associated with a well-defined range of EWSR1-FLI1 activity; moreover, cells with a high EWSR1-FLI1 activity presented a strong oxidative phosphorylation metabolism. In contrast, a subpopulation of cells with EWSR1-FLI1 activity below or above the optimal range was characterized by increased hypoxia. Overall, our study reveals sources of intratumoral heterogeneity within Ewing tumors.


