HRT Atlas v1.0 database: redefining human and mouse housekeeping genes and candidate reference transcripts by mining massive RNA-seq datasets

10.1101/787150 ◽

2019 ◽

Author(s):

Bidossessi Wilfried Hounkpe ◽

Francine Chenou ◽

Franciele Lima ◽

Erich Vinicius de Paula

Keyword(s):

Gene Expression ◽

Wild Type Mouse ◽

Housekeeping Genes ◽

Regulatory Elements ◽

Data Sets ◽

Rna Seq ◽

Cellular Functions ◽

Evolutionary Features ◽

Small Device ◽

Human And Mouse

Abstract Housekeeping (HK) genes are constitutively expressed genes that are required for the maintenance of basic cellular functions. Despite their importance in the calibration of gene expression, as well as the understanding of many genomic and evolutionary features, important discrepancies have been observed in studies that previously identified these genes. Here, we present Housekeeping Transcript Atlas (HRT Atlas v1.0, www.housekeeping.unicamp.br), a web-based database which addresses some of the previously observed limitations in the identification of these genes, and offers a more accurate database of human and mouse HK genes and transcripts. The database was generated by mining massive human and mouse RNA-seq data sets, including 12,482 and 507 high-quality RNA-seq samples from 82 human non-disease tissues/cells and 15 healthy tissues/cells of C57BL/6 wild type mouse, respectively. Users can visualize the expression and download lists of 2,158 human HK transcripts from 2,176 HK genes and 3,024 mouse HK transcripts from 3,277 mouse HK genes. HRT Atlas also offers the most stable and suitable tissue-selective candidate reference transcripts for normalization of qPCR experiments. Specific primers and predicted modifiers of gene expression for some of these HK transcripts are also proposed. HRT Atlas has also been integrated with regulatory elements from the Epiregio server. All of these resources can be accessed and downloaded from the web browser of any computer or small device.

HRT Atlas v1.0 database: redefining human and mouse housekeeping genes and candidate reference transcripts by mining massive RNA-seq datasets

Nucleic Acids Research ◽

10.1093/nar/gkaa609 ◽

2020 ◽

Cited By ~ 3

Author(s):

Bidossessi Wilfried Hounkpe ◽

Francine Chenou ◽

Franciele de Lima ◽

Erich Vinicius De Paula

Keyword(s):

Gene Expression ◽

Wild Type Mouse ◽

Housekeeping Genes ◽

Regulatory Elements ◽

Data Sets ◽

Rna Seq ◽

Cellular Functions ◽

Evolutionary Features ◽

Reference Transcript ◽

Human And Mouse

Abstract Housekeeping (HK) genes are constitutively expressed genes that are required for the maintenance of basic cellular functions. Despite their importance in the calibration of gene expression, as well as the understanding of many genomic and evolutionary features, important discrepancies have been observed in studies that previously identified these genes. Here, we present Housekeeping and Reference Transcript Atlas (HRT Atlas v1.0, www.housekeeping.unicamp.br), a web-based database which addresses some of the previously observed limitations in the identification of these genes, and offers a more accurate database of human and mouse HK genes and transcripts. The database was generated by mining massive human and mouse RNA-seq data sets, including 11 281 and 507 high-quality RNA-seq samples from 52 human non-disease tissues/cells and 14 healthy tissues/cells of C57BL/6 wild type mouse, respectively. Users can visualize the expression and download lists of 2158 human HK transcripts from 2176 HK genes and 3024 mouse HK transcripts from 3277 mouse HK genes. HRT Atlas also offers the most stable and suitable tissue-selective candidate reference transcripts for normalization of qPCR experiments. Specific primers and predicted modifiers of gene expression for some of these HK transcripts are also proposed. HRT Atlas has also been integrated with a regulatory elements resource from the Epiregio server.

A Global Vista of the Epigenomic State of the Mouse Submandibular Gland

Journal of Dental Research ◽

10.1177/00220345211012000 ◽

2021 ◽

pp. 002203452110120

Author(s):

C. Gluck ◽

S. Min ◽

A. Oyelakin ◽

M. Che ◽

E. Horeth ◽

...

Keyword(s):

Gene Expression ◽

Transcriptional Control ◽

Expression Patterns ◽

Global Gene Expression ◽

Regulatory Elements ◽

Specific Gene ◽

Submandibular Salivary Gland ◽

Data Sets ◽

Control Mechanisms ◽

Chromatin Immunoprecipitation Sequencing

The parotid, submandibular, and sublingual glands represent a trio of oral secretory glands whose primary function is to produce saliva, facilitate digestion of food, provide protection against microbes, and maintain oral health. While recent studies have begun to shed light on the global gene expression patterns and profiles of salivary glands, particularly those of mice, relatively little is known about the location and identity of transcriptional control elements. Here we have established the epigenomic landscape of the mouse submandibular salivary gland (SMG) by performing chromatin immunoprecipitation sequencing experiments for 4 key histone marks. Our analysis of the comprehensive SMG data sets and comparisons with those from other adult organs have identified critical enhancers and super-enhancers of the mouse SMG. By further integrating these findings with complementary RNA-sequencing based gene expression data, we have unearthed a number of molecular regulators such as members of the Fox family of transcription factors that are enriched and likely to be functionally relevant for SMG biology. Overall, our studies provide a powerful atlas of cis-regulatory elements that can be leveraged for better understanding the transcriptional control mechanisms of the mouse SMG, discovery of novel genetic switches, and modulating tissue-specific gene expression in a targeted fashion.

MUREN: a robust and multi-reference approach of RNA-seq transcript normalization

BMC Bioinformatics ◽

10.1186/s12859-021-04288-0 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Yance Feng ◽

Lei M. Li

Keyword(s):

Biological Significance ◽

Housekeeping Genes ◽

R Package ◽

Data Sets ◽

Statistical Regression ◽

Rna Seq ◽

Least Trimmed Squares ◽

Standard Data ◽

Wide Range ◽

Multiple References

Abstract Background: Normalization of RNA-seq data aims at identifying biological expression differentiation between samples by removing the effects of unwanted confounding factors. Explicitly or implicitly, the justification of normalization requires a set of housekeeping genes. However, the existence of housekeeping genes common to a very large collection of samples, especially under a wide range of conditions, is questionable. Results: We propose to carry out pairwise normalization with respect to multiple references, selected from representative samples. Then the pairwise intermediates are integrated based on a linear model that adjusts for the reference effects. Motivated by the notion of housekeeping genes and their statistical counterparts, we adopt the robust least trimmed squares regression in pairwise normalization. The proposed method (MUREN) is compared with other existing tools on some standard data sets. The assessment of normalization quality emphasizes preserving possible asymmetric differentiation, whose biological significance is exemplified by a single-cell data set on the cell cycle. MUREN is implemented as an R package. The code, under the GPL-3 license, is available on GitHub (github.com/hippo-yf/MUREN) and on conda (anaconda.org/hippo-yf/r-muren). Conclusions: MUREN performs RNA-seq normalization using a two-step statistical regression induced from a general principle. We propose that the densities of pairwise differentiations be used to evaluate the goodness of normalization. MUREN adjusts the mode of differentiation toward zero while preserving the skewness due to biological asymmetric differentiation. Moreover, by robustly integrating pre-normalized counts with respect to multiple references, MUREN is immune to individual outlier samples.
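To illustrate the idea of robust pairwise normalization mentioned in this abstract, here is a minimal Python sketch of a least trimmed squares (LTS) estimate of the log-scale shift between one sample and one reference. It is a simplified, location-only illustration and not MUREN's implementation (which is an R package); the function names, the `keep_fraction` parameter, and the pseudocount are assumptions made for this example.

```python
import math

def lts_location(values, keep_fraction=0.5):
    """Location-only least trimmed squares: the mean of the contiguous
    window of sorted values with the smallest sum of squared residuals.
    (Illustrative helper; name and default are assumptions.)"""
    xs = sorted(values)
    n = len(xs)
    h = max(1, math.ceil(keep_fraction * n))  # number of points kept
    # Prefix sums let each window's mean and SSE be computed in O(1).
    s, s2 = [0.0], [0.0]
    for v in xs:
        s.append(s[-1] + v)
        s2.append(s2[-1] + v * v)
    best_sse, best_mean = None, None
    for i in range(n - h + 1):
        total = s[i + h] - s[i]
        mean = total / h
        sse = (s2[i + h] - s2[i]) - h * mean * mean
        if best_sse is None or sse < best_sse:
            best_sse, best_mean = sse, mean
    return best_mean

def pairwise_log_shift(sample, reference, pseudocount=1.0, keep_fraction=0.5):
    """Robust log2 shift of `sample` relative to `reference`.
    Genes with large differential expression fall outside the kept
    window and so do not pull the estimate."""
    diffs = [
        math.log2(a + pseudocount) - math.log2(b + pseudocount)
        for a, b in zip(sample, reference)
    ]
    return lts_location(diffs, keep_fraction)

# Hypothetical counts: four genes doubled by sequencing depth,
# two genes strongly up-regulated on top of that.
reference = [8, 16, 32, 64, 128, 256]
sample = [16, 32, 64, 128, 2048, 8192]
shift = pairwise_log_shift(sample, reference, pseudocount=0.0)
normalized = [a / 2 ** shift for a in sample]
```

In this toy input the trimmed estimate recovers the depth-related shift from the unchanged genes, so the up-regulated genes remain up-regulated after normalization, which is the asymmetry-preserving behaviour the abstract describes.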

Application of qRT-PCR and RNA-Seq analysis for the identification of housekeeping genes useful for normalization of gene expression values during Striga hermonthica development

Molecular Biology Reports ◽

10.1007/s11033-012-2417-y ◽

2012 ◽

Vol 40 (4) ◽

pp. 3395-3407 ◽

Cited By ~ 22

Author(s):

M. Fernández-Aparicio ◽

K. Huang ◽

E. K. Wafula ◽

L. A. Honaas ◽

N. J. Wickett ◽

...

Keyword(s):

Gene Expression ◽

Housekeeping Genes ◽

Striga Hermonthica ◽

Rna Seq ◽

Qrt Pcr

Enhancing droplet-based single-nucleus RNA-seq resolution using the semi-supervised machine learning classifier DIEM

10.1101/786285 ◽

2019 ◽

Cited By ~ 4

Author(s):

Marcus Alvarez ◽

Elior Rahmani ◽

Brandon Jew ◽

Kristina M. Garske ◽

Zong Miao ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Cell Types ◽

Supervised Machine Learning ◽

Data Sets ◽

Rna Seq ◽

Novel Approach ◽

Single Nucleus ◽

Downstream Analysis

Abstract Single-nucleus RNA sequencing (snRNA-seq) measures gene expression in individual nuclei instead of cells, allowing for unbiased cell type characterization in solid tissues. Contrary to single-cell RNA-seq (scRNA-seq), we observe that snRNA-seq is commonly subject to contamination by high amounts of extranuclear background RNA, which can lead to identification of spurious cell types in downstream clustering analyses if overlooked. We present a novel approach to remove debris-contaminated droplets in snRNA-seq experiments, called Debris Identification using Expectation Maximization (DIEM). Our likelihood-based approach models the gene expression distribution of debris and cell types, which are estimated using EM. We evaluated DIEM using three snRNA-seq data sets: 1) human differentiating preadipocytes in vitro, 2) fresh mouse brain tissue, and 3) human frozen adipose tissue (AT) from six individuals. All three data sets showed various degrees of extranuclear RNA contamination. We observed that existing methods fail to account for contaminated droplets and lead to spurious cell types. When compared to filtering using these state-of-the-art methods, DIEM better removed droplets containing high levels of extranuclear RNA and led to higher quality clusters. Although DIEM was designed for snRNA-seq data, we also successfully applied DIEM to single-cell data. To conclude, our novel method DIEM removes debris-contaminated droplets from single-cell-based data fast and effectively, leading to cleaner downstream analysis. Our code is freely available for use at https://github.com/marcalva/diem.
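As a rough illustration of the likelihood-based idea in this abstract, the sketch below runs expectation maximization on a two-component multinomial mixture, with one component standing in for debris and one for a single cell type. It is a toy under stated assumptions, not the DIEM implementation: the function name, the smoothing constant, the starting labels, and the three-gene counts are all hypothetical.

```python
import math

def em_multinomial_mixture(counts, init_labels, n_iter=50, smoothing=1e-3):
    """EM for a 2-component multinomial mixture over droplets.
    counts: list of per-droplet gene-count lists.
    init_labels: starting guess per droplet (0 = debris, 1 = nucleus).
    Returns per-droplet posterior probabilities for each component."""
    n_genes = len(counts[0])
    k = 2
    # Start from hard assignments given by the initial labels.
    resp = [[1.0 if lab == c else 0.0 for c in range(k)] for lab in init_labels]
    for _ in range(n_iter):
        # M-step: mixing weights and per-component gene profiles.
        weights, profiles = [], []
        for c in range(k):
            weights.append(sum(r[c] for r in resp) / len(counts))
            gene_tot = [
                smoothing + sum(r[c] * x[g] for r, x in zip(resp, counts))
                for g in range(n_genes)
            ]
            z = sum(gene_tot)
            profiles.append([t / z for t in gene_tot])
        # E-step: posterior of each component given the droplet counts
        # (computed in log space for numerical stability).
        new_resp = []
        for x in counts:
            logp = [
                math.log(weights[c] + 1e-300)
                + sum(x[g] * math.log(profiles[c][g]) for g in range(n_genes))
                for c in range(k)
            ]
            m = max(logp)
            e = [math.exp(lp - m) for lp in logp]
            total = sum(e)
            new_resp.append([v / total for v in e])
        resp = new_resp
    return resp

# Hypothetical droplets over three genes; gene 0 plays the role of a
# background-enriched transcript. Two starting labels are deliberately wrong.
counts = [[9, 1, 0], [8, 1, 1], [10, 0, 0], [1, 5, 4], [0, 6, 4], [1, 4, 5]]
init_labels = [0, 0, 1, 1, 1, 0]
posteriors = em_multinomial_mixture(counts, init_labels)
labels = [0 if p[0] > p[1] else 1 for p in posteriors]
```

Droplets assigned to component 0 would be the ones filtered out before clustering. A realistic analysis would use many cell-type components and thousands of genes.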

Identification and classification of cis-regulatory elements in the amphipod crustacean Parhyale hawaiensis

10.1101/2021.09.16.460328 ◽

2021 ◽

Author(s):

Dennis A Sun ◽

Nipam H Patel

Keyword(s):

Gene Expression ◽

Genome Annotation ◽

Time Course ◽

Regulatory Elements ◽

Rna Seq ◽

Genome Wide ◽

Long Read ◽

Amphipod Crustacean ◽

Accessible Chromatin

Abstract Emerging research organisms enable the study of biology that cannot be addressed using classical “model” organisms. The development of novel data resources can accelerate research in such animals. Here, we present new functional genomic resources for the amphipod crustacean Parhyale hawaiensis, facilitating the exploration of gene regulatory evolution using this emerging research organism. We use Omni-ATAC-Seq, an improved form of the Assay for Transposase-Accessible Chromatin coupled with next-generation sequencing (ATAC-Seq), to identify accessible chromatin genome-wide across a broad time course of Parhyale embryonic development. This time course encompasses many major morphological events, including segmentation, body regionalization, gut morphogenesis, and limb development. In addition, we use short- and long-read RNA-Seq to generate an improved Parhyale genome annotation, enabling deeper classification of identified regulatory elements. We leverage a variety of bioinformatic tools to discover differential accessibility, predict nucleosome positioning, infer transcription factor binding, cluster peaks based on accessibility dynamics, classify biological functions, and correlate gene expression with accessibility. Using a Minos transposase reporter system, we demonstrate the potential to identify novel regulatory elements using this approach, including distal regulatory elements. This work provides a platform for the identification of novel developmental regulatory elements in Parhyale, and offers a framework for performing such experiments in other emerging research organisms.

Primary Findings

- Omni-ATAC-Seq identifies cis-regulatory elements genome-wide during crustacean embryogenesis
- Combined short- and long-read RNA-Seq improves the Parhyale genome annotation
- ImpulseDE2 analysis identifies dynamically regulated candidate regulatory elements
- NucleoATAC and HINT-ATAC enable inference of nucleosome occupancy and transcription factor binding
- Fuzzy clustering reveals peaks with distinct accessibility and chromatin dynamics
- Integration of accessibility and gene expression reveals possible enhancers and repressors
- Omni-ATAC can identify known and novel regulatory elements

Gene Expression Does Not Support the Developmental Hourglass Model in Three Animals with Spiralian Development

Molecular Biology and Evolution ◽

10.1093/molbev/msz065 ◽

2019 ◽

Vol 36 (7) ◽

pp. 1373-1383 ◽

Cited By ~ 1

Author(s):

Longjun Wu ◽

Kailey E Ferger ◽

J David Lambert

Keyword(s):

Gene Expression ◽

Large Fraction ◽

Molecular Data ◽

Development Stage ◽

Data Sets ◽

Rna Seq ◽

Developmental Evolution ◽

Phylotypic Stage ◽

Hourglass Model ◽

Almost All

Abstract It has been proposed that animals have a pattern of developmental evolution resembling an hourglass because the most conserved development stage—often called the phylotypic stage—is always in midembryonic development. Although the topic has been debated for decades, recent studies using molecular data such as RNA-seq gene expression data sets have largely supported the existence of periods of relative evolutionary conservation in middevelopment, consistent with the phylotypic stage and the hourglass concepts. However, so far this approach has only been applied to a limited number of taxa across the tree of life. Here, using established phylotranscriptomic approaches, we found a surprising reverse hourglass pattern in two molluscs and a polychaete annelid, representatives of the Spiralia, an understudied group that contains a large fraction of metazoan body plan diversity. These results suggest that spiralians have a divergent midembryonic stage, with more conserved early and late development, which is the inverse of the pattern seen in almost all other organisms where these phylotranscriptomic approaches have been reported. We discuss our findings in light of proposed reasons for the phylotypic stage and hourglass model in other systems.

Plant Soft Rot Development and Regulation from the Viewpoint of Transcriptomic Profiling

Plants ◽

10.3390/plants9091176 ◽

2020 ◽

Vol 9 (9) ◽

pp. 1176

Author(s):

Ivan Tsers ◽

Vladimir Gorshkov ◽

Natalia Gogoleva ◽

Olga Parfirova ◽

Olga Petrova ◽

...

Keyword(s):

Gene Expression ◽

Plant Cell Wall ◽

Soft Rot ◽

Regulatory Elements ◽

Plant Responses ◽

Gene Promoters ◽

Rna Seq ◽

Induced Plant Responses ◽

Regulatory Systems ◽

Genome Level

Soft rot caused by Pectobacterium species is a devastating plant disease poorly characterized in terms of host plant responses. In this study, changes in the transcriptome of tobacco plants after infection with Pectobacterium atrosepticum (Pba) were analyzed using RNA-Seq. To draw a comprehensive and nontrivially itemized picture of physiological events in Pba-infected plants and to reveal novel potential molecular “players” in plant–Pba interactions, an original functional gene classification was performed. The classifications present in various databases were merged, enriched by “missed” genes, and divided into subcategories. Particular changes in plant cell wall-related processes, perturbations in hormonal and other regulatory systems, and alterations in primary, secondary, and redox metabolism were elucidated in terms of gene expression. Special attention was paid to the prediction of transcription factors (TFs) involved in the disease’s development. Herewith, gene expression was analyzed within the predicted TF regulons assembled at the whole-genome level based on the presence of particular cis-regulatory elements (CREs) in gene promoters. Several TFs, whose regulons were enriched by differentially expressed genes, were considered to be potential master regulators of Pba-induced plant responses. Differential regulation of genes belonging to a particular multigene family and encoding cognate proteins was explained by the presence/absence of the particular CRE in gene promoters.

Cross-platform Data Analysis Reveals a Generic Gene Expression Signature for Microsatellite Instability in Colorectal Cancer

BioMed Research International ◽

10.1155/2019/6763596 ◽

2019 ◽

Vol 2019 ◽

pp. 1-9 ◽

Cited By ~ 3

Author(s):

Anna Pačínková ◽

Vlad Popovici

Keyword(s):

Gene Expression ◽

Colorectal Cancer ◽

Colon Cancer ◽

Microsatellite Instability ◽

Gene Expression Signature ◽

Data Sets ◽

Rna Seq ◽

Cancer Data ◽

Expression Signature ◽

Endometrial Cancers

The dysfunction of the DNA mismatch repair system results in microsatellite instability (MSI). MSI plays a central role in the development of multiple human cancers. In colon cancer, despite being associated with resistance to 5-fluorouracil treatment, MSI is a favourable prognostic marker. In gastric and endometrial cancers, its prognostic value is not so well established. Nevertheless, recognising MSI tumours may be important for predicting the therapeutic effect of immune checkpoint inhibitors. Several gene expression signatures were trained on microarray data sets to understand the regulatory mechanisms underlying microsatellite instability in colorectal cancer. A wealth of expression data already exists in the form of microarray data sets. However, RNA-seq has become routine for transcriptome analysis. The new MSI gene expression signature presented here is the first to be valid across two different platforms, microarrays and RNA-seq. In the case of colon cancer, its estimated performance was (i) AUC = 0.94, 95% CI = (0.90 – 0.97) on RNA-seq and (ii) AUC = 0.95, 95% CI = (0.92 – 0.97) on microarray. The 25-gene expression signature was also validated in two independent microarray colon cancer data sets. Despite being derived from colorectal cancer, the signature maintained good performance on RNA-seq and microarray gastric cancer data sets (AUC = 0.90, 95% CI = (0.85 – 0.94) and AUC = 0.83, 95% CI = (0.69 – 0.97), respectively). Furthermore, this classifier retained high concordance even when classifying RNA-seq endometrial cancers (AUC = 0.71, 95% CI = (0.62 – 0.81)). These results indicate that the new signature was able to remove the platform-specific differences while preserving the underlying biological differences between MSI/MSS phenotypes in colon cancer samples.
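For readers unfamiliar with the AUC values quoted in this abstract, the following Python sketch shows one standard way to compute the area under the ROC curve: as the fraction of (positive, negative) pairs that a score ranks correctly, with ties counted as half. The scores and labels are made-up illustrations and have no connection to the study's data; the function name is an assumption.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a randomly chosen positive sample
    (label 1, e.g. MSI) scores higher than a randomly chosen negative
    sample (label 0, e.g. MSS). Ties contribute 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical signature scores for four tumours.
example_scores = [0.9, 0.8, 0.4, 0.3]
example_labels = [1, 0, 1, 0]
example_auc = roc_auc(example_scores, example_labels)  # 3 of 4 pairs ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic, which is fine for a sketch but not for large cohorts.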

Identifying Transcription Regulatory Elements in the Human and Mouse Genomes Using Tissue-specific Gene Expression Profiles

Journal of Integrative Bioinformatics ◽

10.1515/jib-2007-59 ◽

2007 ◽

Vol 4 (2) ◽

pp. 1-23

Author(s):

Amitava Karmaker ◽

Kihoon Yoon ◽

Mark Doderer ◽

Russell Kruzelock ◽

Stephen Kwek

Keyword(s):

Gene Expression ◽

Transcription Factor ◽

Transcription Factors ◽

Binding Sites ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Regulatory Elements ◽

Specific Gene ◽

Tissue Specific Gene ◽

Human And Mouse

Summary Revealing the complex interaction between trans- and cis-regulatory elements and identifying these potential binding sites are fundamental problems in understanding gene expression. Progress in ChIP-chip technology facilitates identifying DNA sequences that are recognized by a specific transcription factor. However, protein-DNA binding is a necessary, but not sufficient, condition for transcription regulation. We need to demonstrate that their gene expression levels are correlated to further confirm a regulatory relationship. Here, instead of using a linear correlation coefficient, we used a non-linear function that seems to better capture possible regulatory relationships. By analyzing tissue-specific gene expression profiles of human and mouse, we delineate a list of transcription factor and gene pairs with highly correlated expression levels, which may have regulatory relationships. Using two closely related species (human and mouse), we perform comparative genome analysis to cross-validate the quality of our prediction. Our findings are confirmed by matching against publicly available TFBS databases (such as TRANSFAC and ConSite) and by reviewing the biological literature. For example, according to our analysis, 80% and 85.71% of the target genes associated with the E2F5 and RELB transcription factors, respectively, have the corresponding known binding sites. We also substantiated our results on some oncogenes with the biomedical literature. Moreover, we performed further analysis on them and found that BCR and DEK may be regulated by some common transcription factors. Similar results for the BTG1, FCGR2B and LCK genes were also reported.