TRlnc: a comprehensive database for human transcriptional regulatory information of lncRNAs

Author(s):  
Yanyu Li ◽  
Xuecang Li ◽  
Yongsan Yang ◽  
Meng Li ◽  
Fengcui Qian ◽  
...  

Abstract Long noncoding RNAs (lncRNAs) have been proven to play important roles in transcriptional processes and biological functions. With the increasing study of human diseases and biological processes, information in human H3K27ac ChIP-seq, ATAC-seq and DNase-seq datasets is accumulating rapidly, resulting in an urgent need to collect and process data to identify transcriptional regulatory regions of lncRNAs. We therefore developed a comprehensive database for human regulatory information of lncRNAs (TRlnc, http://bio.licpathway.net/TRlnc), which aims to collect available resources of transcriptional regulatory regions of lncRNAs and to annotate and illustrate their potential roles in the regulation of lncRNAs in a cell type-specific manner. The current version of TRlnc contains 8 683 028 typical enhancers/super-enhancers and 32 348 244 chromatin accessibility regions associated with 91 906 human lncRNAs. These regions were identified from over 900 human H3K27ac ChIP-seq, ATAC-seq and DNase-seq samples. Furthermore, TRlnc provides detailed genetic and epigenetic annotation information within transcriptional regulatory regions (promoter, enhancer/super-enhancer and chromatin accessibility regions) of lncRNAs, including common SNPs, risk SNPs, eQTLs, linkage disequilibrium SNPs, transcription factors, methylation sites, histone modifications and 3D chromatin interactions. It is anticipated that the use of TRlnc will help users to gain in-depth and useful insights into the transcriptional regulatory mechanisms of lncRNAs.

2020 ◽  
Vol 49 (D1) ◽  
pp. D1431-D1444 ◽  
Author(s):  
Qi Pan ◽  
Yue-Juan Liu ◽  
Xue-Feng Bai ◽  
Xiao-Le Han ◽  
Yong Jiang ◽  
...  

Abstract With the study of human diseases and biological processes increasing, a large number of non-coding variants have been identified and facilitated. The rapid accumulation of genetic and epigenomic information has resulted in an urgent need to collect and process data to explore the regulation of non-coding variants. Here, we developed a comprehensive variation annotation database for human (VARAdb, http://www.licpathway.net/VARAdb/), which specifically considers non-coding variants. VARAdb provides annotation information for 577,283,813 variations and novel variants, prioritizes variations based on scores using nine annotation categories, and supports pathway downstream analysis. Importantly, VARAdb integrates a large amount of genetic and epigenomic data into five annotation sections, which include ‘Variation information’, ‘Regulatory information’, ‘Related genes’, ‘Chromatin accessibility’ and ‘Chromatin interaction’. The detailed annotation information consists of motif changes, risk SNPs, LD SNPs, eQTLs, clinical variant-drug-gene pairs, sequence conservation, somatic mutations, enhancers, super enhancers, promoters, transcription factors, chromatin states, histone modifications, chromatin accessibility regions and chromatin interactions. The database provides a user-friendly interface to query, browse and visualize variations and related annotation information. VARAdb is a useful resource for selecting potential functional variations and interpreting their effects on human diseases and biological processes.


2018 ◽  
Author(s):  
Wei Vivian Li ◽  
Shan Li ◽  
Xin Tong ◽  
Ling Deng ◽  
Hubing Shi ◽  
...  

Abstract Genome-wide accurate identification and quantification of full-length mRNA isoforms is crucial for investigating transcriptional and post-transcriptional regulatory mechanisms of biological phenomena. Despite continuing efforts in developing effective computational tools to identify or assemble full-length mRNA isoforms from second-generation RNA-seq data, it remains a challenge to accurately identify mRNA isoforms from short sequence reads due to the substantial information loss in RNA-seq experiments. Here we introduce a novel statistical method, AIDE (Annotation-assisted Isoform DiscovEry), the first approach that directly controls false isoform discoveries by implementing the testing-based model selection principle. Solving the isoform discovery problem in a stepwise and conservative manner, AIDE prioritizes the annotated isoforms and precisely identifies novel isoforms whose addition significantly improves the explanation of observed RNA-seq reads. We evaluate the performance of AIDE based on multiple simulated and real RNA-seq datasets followed by a PCR-Sanger sequencing validation. Our results show that AIDE effectively leverages the annotation information to compensate for the information loss due to short read lengths. AIDE achieves the highest precision in isoform discovery and the lowest error rates in isoform abundance estimation, compared with three state-of-the-art methods: Cufflinks, SLIDE, and StringTie. As a robust bioinformatics tool for transcriptome analysis, AIDE will enable researchers to discover novel transcripts with high confidence.


2021 ◽  
Author(s):  
Susu Zhang ◽  
Jing Wang ◽  
Qi Liu ◽  
W Hayes McDonald ◽  
Monica Bomber ◽  
...  

Transcriptional control is a highly dynamic process that changes rapidly in response to various cellular and extracellular cues [1]. Thus, it is difficult to achieve a mechanistic understanding of transcription factor function using traditional genetic deletion or RNAi methods, because these slow approaches make it challenging to distinguish direct from indirect transcriptional effects. Here, we used a chemical-genetic approach to rapidly degrade a canonical transcriptional activator, PAX3-FOXO1 [2-6], to define how the t(2;13)(q35;q14) translocation disrupts normal gene expression programs to trigger cancer. By coupling rapid protein degradation with the analysis of nascent transcription over short time courses, we identified a core transcriptional network that rapidly collapsed upon PAX3-FOXO1 degradation. Moreover, loss of PAX3-FOXO1 impaired RNA polymerase pause release and transcription elongation at regulated gene targets. The activity of PAX3-FOXO1 at enhancers controlling this core network was surprisingly selective, and often only a single element within a complex super-enhancer was affected. In addition, fusion of the endogenous PAX3-FOXO1 with APEX2 identified proteins in close proximity with PAX3-FOXO1, including ARID1A and MYOD1. We found that continued expression of PAX3-FOXO1 was required to maintain chromatin accessibility and allow neighboring DNA binding proteins and chromatin remodeling complexes to associate with this small number of regulated enhancers. Overall, this work provides a detailed mechanism by which PAX3-FOXO1 maintains an oncogenic transcriptional regulatory network.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Mingsen Li ◽  
Huaxing Huang ◽  
Lingyu Li ◽  
Chenxi He ◽  
Liqiong Zhu ◽  
...  

Abstract Adult stem cell identity, plasticity, and homeostasis are precisely orchestrated by lineage-restricted epigenetic and transcriptional regulatory networks. Here, by integrating super-enhancer and chromatin accessibility landscapes, we delineate core transcription regulatory circuitries (CRCs) of limbal stem/progenitor cells (LSCs) and find that RUNX1 and SMAD3 are required for maintenance of corneal epithelial identity and homeostasis. RUNX1 or SMAD3 depletion inhibits PAX6 and induces LSCs to differentiate into epidermal-like epithelial cells. RUNX1, PAX6, and SMAD3 (RPS) interact with each other and synergistically establish a CRC to govern the lineage-specific cis-regulatory atlas. Moreover, RUNX1 shapes LSC chromatin architecture via modulating H3K27ac deposition. Disturbance of RPS cooperation results in cell identity switching and dysfunction of the corneal epithelium, which is strongly linked to various human corneal diseases. Our work highlights CRC TF cooperativity for establishment of stem cell identity and lineage commitment, and provides comprehensive regulatory principles for human stratified epithelial homeostasis and pathogenesis.


2020 ◽  
Author(s):  
Minjun Park ◽  
Salvi Singh ◽  
Francisco Jose Grisanti Canozo ◽  
Md. Abul Hassan Samee

Abstract Massively parallel reporter assays (MPRAs) have enabled the study of transcriptional regulatory mechanisms at an unprecedented scale and with high quantitative resolution. However, this realm lacks models that can discover sequence-specific signals de novo from the data and integrate them in a mechanistic way. We present MuSeAM (Multinomial CNNs for Sequence Activity Modeling), a convolutional neural network that overcomes this gap. MuSeAM utilizes multinomial convolutions that directly model sequence-specific motifs of protein-DNA binding. We demonstrate that MuSeAM fits MPRA data with high accuracy and generalizes over other tasks such as predicting chromatin accessibility and prioritizing potentially functional variants.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Frederique Ruf-Zamojski ◽  
Zidong Zhang ◽  
Michel Zamojski ◽  
Gregory R. Smith ◽  
Natalia Mendelev ◽  
...  

Abstract To provide a multi-omics resource and investigate transcriptional regulatory mechanisms, we profile the transcriptome, chromatin accessibility, and methylation status of over 70,000 single nuclei (sn) from adult mouse pituitaries. Paired snRNAseq and snATACseq datasets from individual animals highlight a continuum between developmental epigenetically-encoded cell types and transcriptionally-determined transient cell states. Co-accessibility analysis-based identification of a putative Fshb cis-regulatory domain that overlaps the fertility-linked rs11031006 human polymorphism, followed by experimental validation, illustrates the use of this resource for hypothesis generation. We also identify transcriptional and chromatin accessibility programs distinguishing each major cell type. Regulons, which are co-regulated gene sets sharing binding sites for a common transcription factor driver, recapitulate cell type clustering. We identify both cell type-specific and sex-specific regulons that are highly correlated with promoter accessibility, but not with methylation state, supporting the centrality of chromatin accessibility in shaping cell-defining transcriptional programs. The sn multi-omics atlas is accessible at snpituitaryatlas.princeton.edu.


2006 ◽  
Vol 04 (02) ◽  
pp. 469-482 ◽  
Author(s):  
NATALIA POLOULIAKH ◽  
TOHRU NATSUME ◽  
HAJIME HARADA ◽  
WATARU FUJIBUCHI ◽  
PAUL HORTON

The identification of cis-elements (motifs) in the regulatory regions of higher eukaryotes is an important and challenging problem in computational biology. Eukaryotic transcriptional regulatory mechanisms pose several difficulties for promoter analysis, including a high variance in the motif locations, frequently large divergence from motif consensus patterns, and a large number of repetitive elements (confusing to many motif finding procedures). One promising approach to this difficult problem involves cross-species comparison. In this work we analyzed the full-length regulatory regions of genes involved in the G-protein coupling MAP kinase pathway and compared the results with ribosomal genes using human, mouse and rat genomic data. We found 19 highly likely transcription factor (TF) candidates for the MAPK dataset and 12 for the ribosomal dataset. In the case of the MAPK dataset, regulatory regions of genes functionally grouped as receptors and MAP-core genes were found to be mostly highly conserved across the three species.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Sarah E. Pierce ◽  
Jeffrey M. Granja ◽  
William J. Greenleaf

Abstract Chromatin accessibility profiling can identify putative regulatory regions genome wide; however, pooled single-cell methods for assessing the effects of regulatory perturbations on accessibility are limited. Here, we report a modified droplet-based single-cell ATAC-seq protocol for perturbing and evaluating dynamic single-cell epigenetic states. This method (Spear-ATAC) enables simultaneous read-out of chromatin accessibility profiles and integrated sgRNA spacer sequences from thousands of individual cells at once. Spear-ATAC profiling of 104,592 cells representing 414 sgRNA knock-down populations reveals the temporal dynamics of epigenetic responses to regulatory perturbations in cancer cells and the associations between transcription factor binding profiles.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Yapeng Li ◽  
Junfeng Gao ◽  
Mohammad Kamran ◽  
Laura Harmacek ◽  
Thomas Danhorn ◽  
...  

Abstract Mast cells are critical effectors of allergic inflammation and protection against parasitic infections. We previously demonstrated that transcription factors GATA2 and MITF are the mast cell lineage-determining factors. However, it is unclear whether these lineage-determining factors regulate chromatin accessibility at mast cell enhancer regions. In this study, we demonstrate that GATA2 promotes chromatin accessibility at the super-enhancers of mast cell identity genes and primes both typical and super-enhancers at genes that respond to antigenic stimulation. We find that the number and densities of GATA2- but not MITF-bound sites at the super-enhancers are severalfold higher than those at the typical enhancers. Our studies reveal that GATA2 promotes robust gene transcription to maintain mast cell identity and respond to antigenic stimulation by binding to super-enhancer regions with dense GATA2 binding sites available at key mast cell genes.

