Comparative transcriptomics reveal distinct patterns of gene expression conservation through vertebrate embryogenesis

10.1101/840801 ◽

2019 ◽

Author(s):

Megan E. Chan ◽

Pranav S. Bhamidipati ◽

Heather J. Goldsby ◽

Arend Hintze ◽

Hans A. Hofmann ◽

...

Keyword(s):

Gene Expression ◽

Regulatory Networks ◽

Comparative Approach ◽

Data Sets ◽

Rna Seq ◽

Mixed Support ◽

Gene Sets ◽

Phylotypic Stage ◽

Diversity Studies ◽

Relationship Of

Abstract Despite life’s diversity, studies of variation across animals often remind us of our shared evolutionary past. Abundant genome sequencing over the last ~25 years reveals remarkable conservation of genes, and recent analyses of gene regulatory networks illustrate that not only genes but entire pathways are conserved, reused, and elaborated in the evolution of diversity. Predating these discoveries, 19th-century embryologists observed that though morphology at birth varies tremendously, certain stages of embryogenesis appear remarkably similar across vertebrates. Specifically, while early and late stages are variable across species, the anatomy of mid-stage embryos (the ‘phylotypic’ stage) is conserved. This model of vertebrate development and diversification has found mixed support in recent analyses comparing gene expression across species, possibly owing to differences across studies in the species, embryonic stages, and gene sets compared. Here we perform a comparative analysis using 186 microarray and RNA-seq expression data sets covering embryogenesis in six vertebrate species spanning ~420 million years of evolution. We use an unbiased clustering approach to group stages of embryogenesis by transcriptomic similarity and ask whether gene expression similarity of clustered embryonic stages deviates from the null hypothesis of no relationship between timing and diversification. We use a phylogenetic comparative approach to characterize the expression conservation pattern (i.e., early conservation, hourglass, inverse hourglass, late conservation, or no relationship) of each gene at each evolutionary node. Across vertebrates, we find an enrichment of genes exhibiting early conservation, hourglass, and late conservation patterns and a large depletion of genes exhibiting no distinguishable pattern of conservation in both microarray and RNA-seq data sets. Enrichment of genes showing patterned conservation through embryogenesis indicates diversification of embryogenesis may be temporally constrained. However, the circumstances (e.g., gene groups, evolutionary nodes, species) under which each pattern emerges remain unknown and require both broad evolutionary sampling and systematic examination of embryogenesis across species.

Gene Expression Does Not Support the Developmental Hourglass Model in Three Animals with Spiralian Development

Molecular Biology and Evolution ◽

10.1093/molbev/msz065 ◽

2019 ◽

Vol 36 (7) ◽

pp. 1373-1383 ◽

Cited By ~ 1

Author(s):

Longjun Wu ◽

Kailey E Ferger ◽

J David Lambert

Keyword(s):

Gene Expression ◽

Large Fraction ◽

Molecular Data ◽

Development Stage ◽

Data Sets ◽

Rna Seq ◽

Developmental Evolution ◽

Phylotypic Stage ◽

Hourglass Model ◽

Almost All

Abstract It has been proposed that animals have a pattern of developmental evolution resembling an hourglass because the most conserved development stage—often called the phylotypic stage—is always in midembryonic development. Although the topic has been debated for decades, recent studies using molecular data such as RNA-seq gene expression data sets have largely supported the existence of periods of relative evolutionary conservation in middevelopment, consistent with the phylotypic stage and the hourglass concepts. However, so far this approach has only been applied to a limited number of taxa across the tree of life. Here, using established phylotranscriptomic approaches, we found a surprising reverse hourglass pattern in two molluscs and a polychaete annelid, representatives of the Spiralia, an understudied group that contains a large fraction of metazoan body plan diversity. These results suggest that spiralians have a divergent midembryonic stage, with more conserved early and late development, which is the inverse of the pattern seen in almost all other organisms where these phylotranscriptomic approaches have been reported. We discuss our findings in light of proposed reasons for the phylotypic stage and hourglass model in other systems.

Enhancing droplet-based single-nucleus RNA-seq resolution using the semi-supervised machine learning classifier DIEM

10.1101/786285 ◽

2019 ◽

Cited By ~ 4

Author(s):

Marcus Alvarez ◽

Elior Rahmani ◽

Brandon Jew ◽

Kristina M. Garske ◽

Zong Miao ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Cell Types ◽

Supervised Machine Learning ◽

Data Sets ◽

Rna Seq ◽

Novel Approach ◽

Single Nucleus ◽

Downstream Analysis

Abstract Single-nucleus RNA sequencing (snRNA-seq) measures gene expression in individual nuclei instead of cells, allowing for unbiased cell type characterization in solid tissues. In contrast to single-cell RNA-seq (scRNA-seq), we observe that snRNA-seq is commonly subject to contamination by high amounts of extranuclear background RNA, which can lead to identification of spurious cell types in downstream clustering analyses if overlooked. We present a novel approach to remove debris-contaminated droplets in snRNA-seq experiments, called Debris Identification using Expectation Maximization (DIEM). Our likelihood-based approach models the gene expression distribution of debris and cell types, which are estimated using EM. We evaluated DIEM using three snRNA-seq data sets: 1) human differentiating preadipocytes in vitro, 2) fresh mouse brain tissue, and 3) human frozen adipose tissue (AT) from six individuals. All three data sets showed various degrees of extranuclear RNA contamination. We observed that existing methods fail to account for contaminated droplets and lead to spurious cell types. When compared to filtering using these state-of-the-art methods, DIEM better removed droplets containing high levels of extranuclear RNA and led to higher quality clusters. Although DIEM was designed for snRNA-seq data, we also successfully applied DIEM to single-cell data. To conclude, our novel method DIEM removes debris-contaminated droplets from single-cell-based data quickly and effectively, leading to cleaner downstream analysis. Our code is freely available for use at https://github.com/marcalva/diem.
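The likelihood-based idea in this abstract can be illustrated with a minimal sketch. This is not the DIEM implementation: it assumes only two components (debris and a single cell profile), multinomial count profiles, and that some low-count droplets are labeled as debris up front to provide the semi-supervised signal. The function name `em_debris` and the `fixed_debris` mask are hypothetical.

```python
import numpy as np

def em_debris(X, fixed_debris, n_iter=50, eps=1e-9):
    """Two-component multinomial mixture fitted by EM (illustrative only).

    X            : droplets x genes matrix of counts
    fixed_debris : boolean mask of droplets assumed to be debris a priori
    Returns the posterior probability that each droplet is debris.
    """
    X = np.asarray(X, dtype=float)
    fixed = np.asarray(fixed_debris, dtype=bool)
    # Responsibility of the debris component: 1 for labeled droplets, 0.5 otherwise.
    r = np.where(fixed, 1.0, 0.5)
    for _ in range(n_iter):
        # M-step: responsibility-weighted gene profiles and mixing proportions.
        resp = np.column_stack([r, 1.0 - r])            # droplets x 2
        profiles = resp.T @ X + eps                     # 2 x genes
        profiles /= profiles.sum(axis=1, keepdims=True)
        pi = resp.mean(axis=0) + eps
        pi /= pi.sum()
        # E-step: multinomial log-likelihood per component, normalized stably.
        logp = X @ np.log(profiles).T + np.log(pi)      # droplets x 2
        logp -= logp.max(axis=1, keepdims=True)
        post = np.exp(logp)
        post /= post.sum(axis=1, keepdims=True)
        # Labeled debris droplets keep responsibility 1 (semi-supervised constraint).
        r = np.where(fixed, 1.0, post[:, 0])
    return r
```

Droplets with a high returned probability would be filtered out before clustering. A realistic model would need several cell-type components and a principled choice of the labeling threshold.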

Cross-platform Data Analysis Reveals a Generic Gene Expression Signature for Microsatellite Instability in Colorectal Cancer

BioMed Research International ◽

10.1155/2019/6763596 ◽

2019 ◽

Vol 2019 ◽

pp. 1-9 ◽

Cited By ~ 3

Author(s):

Anna Pačínková ◽

Vlad Popovici

Keyword(s):

Gene Expression ◽

Colorectal Cancer ◽

Colon Cancer ◽

Microsatellite Instability ◽

Gene Expression Signature ◽

Data Sets ◽

Rna Seq ◽

Cancer Data ◽

Expression Signature ◽

Endometrial Cancers

The dysfunction of the DNA mismatch repair system results in microsatellite instability (MSI). MSI plays a central role in the development of multiple human cancers. In colon cancer, despite being associated with resistance to 5-fluorouracil treatment, MSI is a favourable prognostic marker. In gastric and endometrial cancers, its prognostic value is not so well established. Nevertheless, recognising MSI tumours may be important for predicting the therapeutic effect of immune checkpoint inhibitors. Several gene expression signatures were trained on microarray data sets to understand the regulatory mechanisms underlying microsatellite instability in colorectal cancer. A wealth of expression data already exists in the form of microarray data sets. However, RNA-seq has become routine for transcriptome analysis. The new MSI gene expression signature presented here is the first to be valid across two different platforms, microarrays and RNA-seq. In the case of colon cancer, its estimated performance was (i) AUC = 0.94, 95% CI = (0.90 – 0.97) on RNA-seq and (ii) AUC = 0.95, 95% CI = (0.92 – 0.97) on microarray. The 25-gene expression signature was also validated in two independent microarray colon cancer data sets. Despite being derived from colorectal cancer, the signature maintained good performance on RNA-seq and microarray gastric cancer data sets (AUC = 0.90, 95% CI = (0.85 – 0.94) and AUC = 0.83, 95% CI = (0.69 – 0.97), respectively). Furthermore, this classifier retained high concordance even when classifying RNA-seq endometrial cancers (AUC = 0.71, 95% CI = (0.62 – 0.81)). These results indicate that the new signature was able to remove the platform-specific differences while preserving the underlying biological differences between MSI/MSS phenotypes in colon cancer samples.
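For readers unfamiliar with the reported metric, the sketch below shows one common way to compute an AUC with a 95% confidence interval. The abstract does not state how its intervals were obtained, so the percentile bootstrap here is an assumption, and the function names are hypothetical.

```python
import random

def auc(scores, labels):
    """AUC as the probability that a random positive (label 1) outscores
    a random negative (label 0); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def bootstrap_ci(scores, labels, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method, see lead-in)."""
    rng = random.Random(seed)
    idx = list(range(len(scores)))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(idx) for _ in idx]
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < len(ys):  # skip resamples containing a single class
            stats.append(auc([scores[i] for i in sample], ys))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

Here `scores` would be the signature's per-sample output and `labels` the MSI (1) or MSS (0) status. The pairwise loop is quadratic in sample count, which is acceptable for cohorts of a few hundred tumours.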

Non-Negative Matrix Factorization for the Analysis of Complex Gene Expression Data: Identification of Clinically Relevant Tumor Subtypes

Cancer Informatics ◽

10.4137/cin.s606 ◽

2008 ◽

Vol 6 ◽

pp. CIN.S606 ◽

Cited By ~ 23

Author(s):

Attila Frigyesi ◽

Mattias Höglund

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Matrix Factorization ◽

Biological Significance ◽

Data Sets ◽

Expression Data ◽

Microarray Expression Data ◽

Tumor Subtypes ◽

Gene Sets ◽

Non Negative Matrix Factorization

Non-negative matrix factorization (NMF) is a relatively new approach to analyzing gene expression data that models data by additive combinations of non-negative basis vectors (metagenes). The non-negativity constraint makes sense biologically as genes may either be expressed or not, but never show negative expression. We applied NMF to five different microarray data sets. We estimated the appropriate number of metagenes by comparing the residual error of NMF reconstruction of data to that of NMF reconstruction of permuted data, thus finding when a given solution contained more information than noise. This analysis also revealed that NMF could not factorize one of the data sets in a meaningful way. We used GO categories and predefined gene sets to evaluate the biological significance of the obtained metagenes. By analyses of metagenes specific for the same GO categories we could show that individual metagenes activated different aspects of the same biological processes. Several of the obtained metagenes correlated with tumor subtypes and tumors with characteristic chromosomal translocations, indicating that metagenes may correspond to specific disease entities. Hence, NMF extracts biologically relevant structures of microarray expression data and may thus contribute to a deeper understanding of tumor behavior.
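The rank-selection idea described above (compare the reconstruction error on real data with that on permuted data) can be sketched as follows. This is an illustration under assumptions, not the authors' code: it uses the standard multiplicative updates for the Frobenius norm, and it assumes that permutation means shuffling each gene's values independently across samples. All function names are hypothetical.

```python
import numpy as np

def nmf(V, k, n_iter=500, seed=0, eps=1e-9):
    """Factorize non-negative V (genes x samples) as W @ H with k metagenes,
    using multiplicative updates that keep W and H non-negative."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def residual(V, k, **kwargs):
    """Frobenius norm of the reconstruction error at rank k."""
    W, H = nmf(V, k, **kwargs)
    return float(np.linalg.norm(V - W @ H))

def permute_rows(V, rng):
    """Shuffle each gene's values across samples, destroying co-expression
    structure while preserving each gene's value distribution (assumed scheme)."""
    return np.stack([rng.permutation(row) for row in V])

def residual_curves(V, ks, seed=0):
    """Residuals for real and permuted data at each candidate rank in ks."""
    Vp = permute_rows(V, np.random.default_rng(seed))
    return [(k, residual(V, k), residual(Vp, k)) for k in ks]
```

A rank is informative while the real-data residual falls faster with increasing k than the permuted-data residual. Once the two curves decrease at a similar rate, additional metagenes are mostly fitting noise.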

HRT Atlas v1.1 database: redefining human and mouse housekeeping genes and candidate reference transcripts by mining massive RNA-seq datasets

10.1101/787150 ◽

2019 ◽

Author(s):

Bidossessi Wilfried Hounkpe ◽

Francine Chenou ◽

Franciele Lima ◽

Erich Vinicius de Paula

Keyword(s):

Gene Expression ◽

Wild Type Mouse ◽

Housekeeping Genes ◽

Regulatory Elements ◽

Data Sets ◽

Rna Seq ◽

Cellular Functions ◽

Evolutionary Features ◽

Small Device ◽

Human And Mouse

Abstract Housekeeping (HK) genes are constitutively expressed genes that are required for the maintenance of basic cellular functions. Despite their importance in the calibration of gene expression, as well as the understanding of many genomic and evolutionary features, important discrepancies have been observed in studies that previously identified these genes. Here, we present Housekeeping Transcript Atlas (HRT Atlas v1.0, www.housekeeping.unicamp.br), a web-based database which addresses some of the previously observed limitations in the identification of these genes, and offers a more accurate database of human and mouse HK genes and transcripts. The database was generated by mining massive human and mouse RNA-seq data sets, including 12,482 and 507 high-quality RNA-seq samples from 82 human non-disease tissues/cells and 15 healthy tissues/cells of C57BL/6 wild type mouse, respectively. Users can visualize the expression and download lists of 2,158 human HK transcripts from 2,176 HK genes and 3,024 mouse HK transcripts from 3,277 mouse HK genes. HRT Atlas also offers the most stable and suitable tissue selective candidate reference transcripts for normalization of qPCR experiments. Specific primers and predicted modifiers of gene expression for some of these HK transcripts are also proposed. HRT Atlas has also been integrated with regulatory elements from Epiregio server. All of these resources can be accessed and downloaded from any computer or small device web browser.

ISMARA: Completely automated inference of gene regulatory networks from high-throughput data

10.7287/peerj.preprints.3328 ◽

2017 ◽

Author(s):

Mikhail Pachkov ◽

Piotr J Balwierz ◽

Phil Arnold ◽

Andreas J Gruber ◽

Mihaela Zavolan ◽

...

Keyword(s):

Gene Expression ◽

High Throughput ◽

Regulatory Networks ◽

Target Genes ◽

Chromatin State ◽

Response Analysis ◽

Rna Seq ◽

High Throughput Data ◽

Micro Rnas ◽

Gene Regulatory

As the costs of high-throughput measurement technologies continue to fall, experimental approaches in biomedicine are increasingly data intensive and the advent of big data is justifiably seen as holding the promise to transform medicine. However, as data volumes mount, researchers increasingly realize that extracting concrete, reliable, and actionable biological predictions from high-throughput data can be very challenging. Our laboratory has pioneered a number of methods for inferring key gene regulatory interactions from high-throughput data. For example, we developed motif activity response analysis (MARA), which models genome-wide gene expression (RNA-Seq or microarray) and chromatin state (ChIP-Seq) data in terms of comprehensive predictions of regulatory sites for hundreds of mammalian regulators (TFs and micro-RNAs). Using these models, MARA identifies the key regulators driving gene expression and chromatin state changes, the activities of these regulators across the input samples, their target genes, and the sites on the genome through which these regulators act. We recently completely automated MARA in an integrated web-server (ismara.unibas.ch) that allows researchers to analyze their own data by simply uploading RNA-Seq or ChIP-Seq datasets, and provides results in an integrated web interface as well as in downloadable flat form.

Generation of guard cell RNA-seq transcriptomes during progressive drought and recovery using an adapted INTACT protocol for Arabidopsis thaliana shoot tissue

10.1101/2021.04.15.439991 ◽

2021 ◽

Author(s):

Anna van Weringh ◽

Asher Pasha ◽

Eddi Esteban ◽

Paul J. Gamueda ◽

Nicholas J. Provart

Keyword(s):

Gene Expression ◽

Arabidopsis Thaliana ◽

Drought Stress ◽

Guard Cell ◽

Crop Production ◽

Leaf Tissue ◽

Cell Types ◽

Severe Drought ◽

Data Sets ◽

Rna Seq

Drought is an important environmental stress that limits crop production. Guard cells (GC) act to control the rate of water loss. To better understand how GCs change their gene expression during a progressive drought we generated guard cell-specific RNA-seq transcriptomes during mild, moderate, and severe drought stress. We additionally sampled re-watered plants that had experienced severe drought stress. These transcriptomes were generated using the INTACT system to capture the RNA from GC nuclei. We optimized the INTACT protocol for Arabidopsis thaliana leaf tissue, incorporating fixation to preserve RNA during nuclear isolation. To be able to identify gene expression changes unique to GCs, we additionally generated transcriptomes from all cell types, using a 35S viral promoter to capture the nuclei of all cell types in leaves. These data sets highlight shared and unique gene expression changes between GCs and the bulk leaf tissue. The timing of gene expression changes is different between GCs and other cell types: we found that only GCs had detectable gene expression changes at the earliest drought time point. The drought responsive GC and leaf RNA-seq transcriptomes are available in the Arabidopsis ePlant at the Bio-Analytic Resource for Plant Biology website.

Integrative Genomics of the Mammalian Alveolar Macrophage Response to Intracellular Mycobacteria

10.21203/rs.3.rs-121955/v1 ◽

2020 ◽

Author(s):

Thomas J. Hall ◽

Michael P. Mullen ◽

Gillian P. McHugo ◽

Kate E. Killick ◽

Siobhán C. Ring ◽

...

Keyword(s):

Gene Expression ◽

Alveolar Macrophage ◽

Differential Expression ◽

Gene Expression Data ◽

Host Response ◽

Gwas Data ◽

Data Sets ◽

Expression Data ◽

Rna Seq ◽

Time Point

Abstract Background: Bovine TB (BTB), caused by infection with Mycobacterium bovis, is a major endemic disease affecting global cattle production, particularly in many developing countries. The key innate immune cell that first encounters the pathogen is the alveolar macrophage, previously shown to be substantially reprogrammed during intracellular infection by the pathogen. Here we use differential expression, and correlation- and interaction-based network approaches to analyse the host response to infection with M. bovis at the transcriptome level to identify core infection response pathways and gene modules. These outputs were then integrated with genome-wide association study (GWAS) data sets to enhance detection of genomic variants for susceptibility/resistance to M. bovis infection. Results: The host gene expression data consisted of bovine RNA-seq data from alveolar macrophages infected with M. bovis at 24 and 48 hours post-infection. These RNA-seq data were analysed using three distinct analysis pipelines, and novel response pathways and modules were further refined using cross-comparison and integration of the results. First, a differential expression analysis was carried out to determine the most significantly differentially expressed (DE) genes between conditions at each time point. Second, two networks were constructed at each time point using gene correlation patterns to determine changes in expression across conditions. Functional sub-modules within each correlation network were selected by statistical criteria for modularity. Third, a base gene interaction network of the mammalian host response to mycobacterial infection was generated using the GeneCards database and InnateDB. Differential gene expression data were superimposed on this base network to extract functional modules of interconnected DE genes. Conclusions: Bovine GWAS data were obtained from a published BTB susceptibility/resistance study. The results from the three parallel analyses were integrated with these data to determine which of the three approaches identified genes significantly enriched for SNPs associated with susceptibility/resistance to M. bovis infection. Results indicate distinct and significant overlap in SNP discovery, demonstrating that network-based integration of biologically relevant transcriptomics data can leverage substantial additional information from GWAS data sets.

HRT Atlas v1.0 database: redefining human and mouse housekeeping genes and candidate reference transcripts by mining massive RNA-seq datasets

Nucleic Acids Research ◽

10.1093/nar/gkaa609 ◽

2020 ◽

Cited By ~ 3

Author(s):

Bidossessi Wilfried Hounkpe ◽

Francine Chenou ◽

Franciele de Lima ◽

Erich Vinicius De Paula

Keyword(s):

Gene Expression ◽

Wild Type Mouse ◽

Housekeeping Genes ◽

Regulatory Elements ◽

Data Sets ◽

Rna Seq ◽

Cellular Functions ◽

Evolutionary Features ◽

Reference Transcript ◽

Human And Mouse

Abstract Housekeeping (HK) genes are constitutively expressed genes that are required for the maintenance of basic cellular functions. Despite their importance in the calibration of gene expression, as well as the understanding of many genomic and evolutionary features, important discrepancies have been observed in studies that previously identified these genes. Here, we present Housekeeping and Reference Transcript Atlas (HRT Atlas v1.0, www.housekeeping.unicamp.br), a web-based database which addresses some of the previously observed limitations in the identification of these genes, and offers a more accurate database of human and mouse HK genes and transcripts. The database was generated by mining massive human and mouse RNA-seq data sets, including 11 281 and 507 high-quality RNA-seq samples from 52 human non-disease tissues/cells and 14 healthy tissues/cells of C57BL/6 wild type mouse, respectively. Users can visualize the expression and download lists of 2158 human HK transcripts from 2176 HK genes and 3024 mouse HK transcripts from 3277 mouse HK genes. HRT Atlas also offers the most stable and suitable tissue selective candidate reference transcripts for normalization of qPCR experiments. Specific primers and predicted modifiers of gene expression for some of these HK transcripts are also proposed. HRT Atlas has also been integrated with a regulatory elements resource from Epiregio server.

Transposable elements contribute to the genomic response to insecticides in Drosophila melanogaster

Philosophical Transactions of the Royal Society B Biological Sciences ◽

10.1098/rstb.2019.0341 ◽

2020 ◽

Vol 375 (1795) ◽

pp. 20190341 ◽

Cited By ~ 4

Author(s):

Judit Salces-Ortiz ◽

Carlos Vargas-Chavez ◽

Lain Guio ◽

Gabriel E. Rech ◽

Josefa González

Keyword(s):

Gene Expression ◽

Drosophila Melanogaster ◽

Regulatory Networks ◽

Regulation Of Gene Expression ◽

Nucleotide Polymorphisms ◽

Rna Seq ◽

Organophosphate Insecticide ◽

Single Nucleotide ◽

Insecticide Exposure ◽

Genomic Response

Most genotype–phenotype analyses to date have centred on single nucleotide polymorphisms. However, transposable element (TE) insertions have arisen as a plausible addition to the study of the genotypic–phenotypic link because of their role in genome function and evolution. In this work, we investigate the contribution of TE insertions to the regulation of gene expression in response to insecticides. We exposed four Drosophila melanogaster strains to malathion, a commonly used organophosphate insecticide. By combining information from different approaches, including RNA-seq and ATAC-seq, we found that TEs can contribute to the regulation of gene expression under insecticide exposure by rewiring cis-regulatory networks. This article is part of a discussion meeting issue ‘Crossroads between transposons and gene regulation’.