Comparative transcriptomics reveal distinct patterns of gene expression conservation through vertebrate embryogenesis

2019 ◽  
Author(s):  
Megan E. Chan ◽  
Pranav S. Bhamidipati ◽  
Heather J. Goldsby ◽  
Arend Hintze ◽  
Hans A. Hofmann ◽  
...  

Abstract Despite life’s diversity, studies of variation across animals often remind us of our shared evolutionary past. Abundant genome sequencing over the last ~25 years reveals remarkable conservation of genes, and recent analyses of gene regulatory networks illustrate that not only genes but entire pathways are conserved, reused, and elaborated in the evolution of diversity. Predating these discoveries, 19th-century embryologists observed that though morphology at birth varies tremendously, certain stages of embryogenesis appear remarkably similar across vertebrates. Specifically, while early and late stages are variable across species, the anatomy of mid-stage embryos (the ‘phylotypic’ stage) is conserved. This model of vertebrate development and diversification has found mixed support in recent analyses comparing gene expression across species, possibly owing to differences across studies in the species, embryonic stages, and gene sets compared. Here we perform a comparative analysis using 186 microarray and RNA-seq expression data sets covering embryogenesis in six vertebrate species spanning ~420 million years of evolution. We use an unbiased clustering approach to group stages of embryogenesis by transcriptomic similarity and ask whether gene expression similarity of clustered embryonic stages deviates from the null hypothesis of no relationship between timing and diversification. We use a phylogenetic comparative approach to characterize the expression conservation pattern (i.e., early conservation, hourglass, inverse hourglass, late conservation, or no relationship) of each gene at each evolutionary node. Across vertebrates, we find an enrichment of genes exhibiting early conservation, hourglass, and late conservation patterns and a large depletion of genes exhibiting no distinguishable pattern of conservation in both microarray and RNA-seq data sets. Enrichment of genes showing patterned conservation through embryogenesis indicates diversification of embryogenesis may be temporally constrained. However, the circumstances (e.g., gene groups, evolutionary nodes, species) under which each pattern emerges remain unknown and require both broad evolutionary sampling and systematic examination of embryogenesis across species.


2019 ◽  
Vol 36 (7) ◽  
pp. 1373-1383 ◽  
Author(s):  
Longjun Wu ◽  
Kailey E Ferger ◽  
J David Lambert

Abstract It has been proposed that animals have a pattern of developmental evolution resembling an hourglass because the most conserved development stage—often called the phylotypic stage—is always in midembryonic development. Although the topic has been debated for decades, recent studies using molecular data such as RNA-seq gene expression data sets have largely supported the existence of periods of relative evolutionary conservation in middevelopment, consistent with the phylotypic stage and the hourglass concepts. However, so far this approach has only been applied to a limited number of taxa across the tree of life. Here, using established phylotranscriptomic approaches, we found a surprising reverse hourglass pattern in two molluscs and a polychaete annelid, representatives of the Spiralia, an understudied group that contains a large fraction of metazoan body plan diversity. These results suggest that spiralians have a divergent midembryonic stage, with more conserved early and late development, which is the inverse of the pattern seen in almost all other organisms where these phylotranscriptomic approaches have been reported. We discuss our findings in light of proposed reasons for the phylotypic stage and hourglass model in other systems.
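The abstract above does not say which phylotranscriptomic measure was used. One widely used measure is the transcriptome age index (TAI), the expression-weighted mean phylostratum (gene age rank) at each developmental stage: TAI_s = Σ_i ps_i · e_is / Σ_i e_is. The sketch below assumes that measure purely for illustration; the function name `transcriptome_age_index`, the stage labels, and the toy numbers are hypothetical and are not taken from the study.

```python
def transcriptome_age_index(phylostrata, expression_by_stage):
    """Expression-weighted mean gene age for each stage.

    phylostrata: one age rank per gene (1 = oldest).
    expression_by_stage: dict mapping stage name -> per-gene expression values.
    """
    tai = {}
    for stage, expr in expression_by_stage.items():
        total = sum(expr)
        if total == 0:
            raise ValueError(f"no expression at stage {stage!r}")
        tai[stage] = sum(ps * e for ps, e in zip(phylostrata, expr)) / total
    return tai


# Toy data (hypothetical): two old genes (rank 1) and one young gene (rank 3).
phylostrata = [1, 1, 3]
stages = {"early": [10, 10, 0], "mid": [0, 0, 5], "late": [5, 5, 10]}
print(transcriptome_age_index(phylostrata, stages))
```

Under this measure a lower TAI means an older, more conserved transcriptome, so a profile that is high at the middle stage and low at both ends (as in the toy data) corresponds to the reverse hourglass shape described above.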



2019 ◽  
Author(s):  
Marcus Alvarez ◽  
Elior Rahmani ◽  
Brandon Jew ◽  
Kristina M. Garske ◽  
Zong Miao ◽  
...  

Abstract Single-nucleus RNA sequencing (snRNA-seq) measures gene expression in individual nuclei instead of cells, allowing for unbiased cell type characterization in solid tissues. In contrast to single-cell RNA-seq (scRNA-seq), we observe that snRNA-seq is commonly subject to contamination by high amounts of extranuclear background RNA, which can lead to identification of spurious cell types in downstream clustering analyses if overlooked. We present a novel approach to remove debris-contaminated droplets in snRNA-seq experiments, called Debris Identification using Expectation Maximization (DIEM). Our likelihood-based approach models the gene expression distribution of debris and cell types, which are estimated using EM. We evaluated DIEM using three snRNA-seq data sets: 1) human differentiating preadipocytes in vitro, 2) fresh mouse brain tissue, and 3) human frozen adipose tissue (AT) from six individuals. All three data sets showed various degrees of extranuclear RNA contamination. We observed that existing methods failed to account for contaminated droplets and led to spurious cell types. When compared to filtering using these state-of-the-art methods, DIEM better removed droplets containing high levels of extranuclear RNA and led to higher quality clusters. Although DIEM was designed for snRNA-seq data, we also successfully applied DIEM to single-cell data. To conclude, our novel method DIEM removes debris-contaminated droplets from single-cell-based data fast and effectively, leading to cleaner downstream analysis. Our code is freely available for use at https://github.com/marcalva/diem.
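To make the idea of separating debris from nuclei with EM concrete, here is a minimal sketch. It is not the DIEM implementation: it assumes a two-component multinomial mixture (one debris profile, one nuclear profile), a hypothetical count threshold `init_threshold` for the initial guess, and a pseudocount to avoid zero probabilities. The function name `em_debris` and the toy counts are illustrative only.

```python
import math


def em_debris(counts, init_threshold, n_iter=20, pseudo=1.0):
    """Posterior probability that each droplet is debris.

    counts: list of droplets, each a list of per-gene read counts.
    init_threshold: assumed cutoff; droplets with fewer total counts start as debris.
    """
    n_genes = len(counts[0])
    # Initial hard assignment: component 0 = debris, component 1 = nuclei.
    resp = [[1.0, 0.0] if sum(c) < init_threshold else [0.0, 1.0] for c in counts]
    for _ in range(n_iter):
        # M-step: mixing weights and per-component gene expression profiles.
        weights = [sum(r[k] for r in resp) / len(counts) for k in range(2)]
        profiles = []
        for k in range(2):
            totals = [pseudo + sum(r[k] * c[g] for r, c in zip(resp, counts))
                      for g in range(n_genes)]
            norm = sum(totals)
            profiles.append([t / norm for t in totals])
        # E-step: posterior of each component given the droplet's counts.
        # The multinomial coefficient is identical for both components and cancels.
        resp = []
        for c in counts:
            logp = [math.log(weights[k] + 1e-300)
                    + sum(x * math.log(profiles[k][g]) for g, x in enumerate(c))
                    for k in range(2)]
            top = max(logp)  # subtract the maximum for numerical stability
            lik = [math.exp(lp - top) for lp in logp]
            z = sum(lik)
            resp.append([v / z for v in lik])
    return [r[0] for r in resp]


# Toy data (hypothetical): three low-count debris-like droplets, three nuclei.
droplets = [[8, 1, 1], [9, 1, 0], [7, 2, 1],
            [10, 40, 50], [5, 45, 50], [8, 50, 42]]
print([round(p, 3) for p in em_debris(droplets, init_threshold=50)])
```

Droplets with a high debris posterior would then be filtered out before clustering. A model with several cell-type components, as the abstract describes, would extend the same E-step and M-step to more than two profiles.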



2019 ◽  
Vol 2019 ◽  
pp. 1-9 ◽  
Author(s):  
Anna Pačínková ◽  
Vlad Popovici

The dysfunction of the DNA mismatch repair system results in microsatellite instability (MSI). MSI plays a central role in the development of multiple human cancers. In colon cancer, despite being associated with resistance to 5-fluorouracil treatment, MSI is a favourable prognostic marker. In gastric and endometrial cancers, its prognostic value is not so well established. Nevertheless, recognising MSI tumours may be important for predicting the therapeutic effect of immune checkpoint inhibitors. Several gene expression signatures were trained on microarray data sets to understand the regulatory mechanisms underlying microsatellite instability in colorectal cancer. A wealth of expression data already exists in the form of microarray data sets. However, RNA-seq has become routine for transcriptome analysis. A new MSI gene expression signature presented here is the first to be valid across two different platforms, microarrays and RNA-seq. In the case of colon cancer, its estimated performance was (i) AUC = 0.94, 95% CI = (0.90 – 0.97) on RNA-seq and (ii) AUC = 0.95, 95% CI = (0.92 – 0.97) on microarray. The 25-gene expression signature was also validated in two independent microarray colon cancer data sets. Despite being derived from colorectal cancer, the signature maintained good performance on RNA-seq and microarray gastric cancer data sets (AUC = 0.90, 95% CI = (0.85 – 0.94) and AUC = 0.83, 95% CI = (0.69 – 0.97), respectively). Furthermore, this classifier retained high concordance even when classifying RNA-seq endometrial cancers (AUC = 0.71, 95% CI = (0.62 – 0.81)). These results indicate that the new signature was able to remove the platform-specific differences while preserving the underlying biological differences between MSI/MSS phenotypes in colon cancer samples.
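The AUC values reported above can be read as the probability that a randomly chosen MSI sample receives a higher signature score than a randomly chosen MSS sample. The sketch below computes that quantity directly from pairwise comparisons. The function name `auc`, the label coding (1 = MSI, 0 = MSS), and the toy scores are assumptions for illustration; the abstract does not describe how the authors computed the AUC or its confidence intervals.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]  # MSI samples
    neg = [s for s, y in zip(scores, labels) if y == 0]  # MSS samples
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy signature scores (hypothetical): one MSI sample scores below one MSS sample.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(auc(scores, labels))
```

The pairwise form is quadratic in the number of samples, which is fine for cohort-sized data; a rank-based formula gives the same value more efficiently on large sets.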



2008 ◽  
Vol 6 ◽  
pp. CIN.S606 ◽  
Author(s):  
Attila Frigyesi ◽  
Mattias Höglund

Non-negative matrix factorization (NMF) is a relatively new approach to analyze gene expression data that models data by additive combinations of non-negative basis vectors (metagenes). The non-negativity constraint makes sense biologically as genes may either be expressed or not, but never show negative expression. We applied NMF to five different microarray data sets. We estimated the appropriate number of metagenes by comparing the residual error of NMF reconstruction of data to that of NMF reconstruction of permuted data, thus finding when a given solution contained more information than noise. This analysis also revealed that NMF could not factorize one of the data sets in a meaningful way. We used GO categories and predefined gene sets to evaluate the biological significance of the obtained metagenes. By analyses of metagenes specific for the same GO categories we could show that individual metagenes activated different aspects of the same biological processes. Several of the obtained metagenes correlated with tumor subtypes and tumors with characteristic chromosomal translocations, indicating that metagenes may correspond to specific disease entities. Hence, NMF extracts biologically relevant structures of microarray expression data and may thus contribute to a deeper understanding of tumor behavior.
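The comparison of residual errors on real versus permuted data can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it uses the standard Lee-Seung multiplicative updates for the Frobenius objective, and it permutes each gene's values across samples independently, a scheme the abstract does not specify. All function names and the toy matrix are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(v, k, n_iter=200, seed=0, eps=1e-12):
    """Factorize non-negative v (genes x samples) as w @ h with k metagenes."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # Update h: h <- h * (w^T v) / (w^T w h)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # Update w: w <- w * (v h^T) / (w h h^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return w, h


def residual(v, w, h):
    """Frobenius norm of v - w @ h."""
    approx = matmul(w, h)
    return sum((x - y) ** 2 for rv, ra in zip(v, approx)
               for x, y in zip(rv, ra)) ** 0.5


def permute_rows(v, seed=0):
    """Shuffle each gene's values across samples, destroying co-expression."""
    rng = random.Random(seed)
    out = []
    for row in v:
        row = list(row)
        rng.shuffle(row)
        out.append(row)
    return out


# Toy matrix (hypothetical) with exactly one underlying metagene.
v = [[g * s for s in (1, 2, 4, 8, 16)] for g in (1, 2, 3, 4, 5, 6)]
err_real = residual(v, *nmf(v, k=1))
v_perm = permute_rows(v)
err_perm = residual(v_perm, *nmf(v_perm, k=1))
print(err_real, err_perm)
```

Repeating this for increasing k and keeping the largest k at which the real-data residual still drops faster than the permuted-data residual is one way to read the selection rule described above.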



2019 ◽  
Author(s):  
Bidossessi Wilfried Hounkpe ◽  
Francine Chenou ◽  
Franciele Lima ◽  
Erich Vinicius de Paula

Abstract Housekeeping (HK) genes are constitutively expressed genes that are required for the maintenance of basic cellular functions. Despite their importance in the calibration of gene expression, as well as the understanding of many genomic and evolutionary features, important discrepancies have been observed in studies that previously identified these genes. Here, we present Housekeeping Transcript Atlas (HRT Atlas v1.0, www.housekeeping.unicamp.br), a web-based database which addresses some of the previously observed limitations in the identification of these genes, and offers a more accurate database of human and mouse HK genes and transcripts. The database was generated by mining massive human and mouse RNA-seq data sets, including 12,482 and 507 high-quality RNA-seq samples from 82 human non-disease tissues/cells and 15 healthy tissues/cells of C57BL/6 wild type mouse, respectively. Users can visualize the expression and download lists of 2,158 human HK transcripts from 2,176 HK genes and 3,024 mouse HK transcripts from 3,277 mouse HK genes. HRT Atlas also offers the most stable and suitable tissue-selective candidate reference transcripts for normalization of qPCR experiments. Specific primers and predicted modifiers of gene expression for some of these HK transcripts are also proposed. HRT Atlas has also been integrated with regulatory elements from Epiregio server. All of these resources can be accessed and downloaded from any computer or small device web browsers.



2017 ◽  
Author(s):  
Mikhail Pachkov ◽  
Piotr J Balwierz ◽  
Phil Arnold ◽  
Andreas J Gruber ◽  
Mihaela Zavolan ◽  
...  

As the costs of high-throughput measurement technologies continue to fall, experimental approaches in biomedicine are increasingly data intensive and the advent of big data is justifiably seen as holding the promise to transform medicine. However, as data volumes mount, researchers increasingly realize that extracting concrete, reliable, and actionable biological predictions from high-throughput data can be very challenging. Our laboratory has pioneered a number of methods for inferring key gene regulatory interactions from high-throughput data. For example, we developed motif activity response analysis (MARA), which models genome-wide gene expression (RNA-Seq, or microarray) and chromatin state (ChIP-Seq) data in terms of comprehensive predictions of regulatory sites for hundreds of mammalian regulators (TFs and micro-RNAs). Using these models, MARA identifies the key regulators driving gene expression and chromatin state changes, the activities of these regulators across the input samples, their target genes, and the sites on the genome through which these regulators act. We recently completely automated MARA in an integrated web-server (ismara.unibas.ch) that allows researchers to analyze their own data by simply uploading RNA-Seq or ChIP-Seq datasets, and provides results in an integrated web interface as well as in downloadable flat form.



2021 ◽  
Author(s):  
Anna van Weringh ◽  
Asher Pasha ◽  
Eddi Esteban ◽  
Paul J. Gamueda ◽  
Nicholas J. Provart

Drought is an important environmental stress that limits crop production. Guard cells (GCs) act to control the rate of water loss. To better understand how GCs change their gene expression during a progressive drought, we generated guard cell-specific RNA-seq transcriptomes during mild, moderate, and severe drought stress. We additionally sampled re-watered plants that had experienced severe drought stress. These transcriptomes were generated using the INTACT system to capture the RNA from GC nuclei. We optimized the INTACT protocol for Arabidopsis thaliana leaf tissue, incorporating fixation to preserve RNA during nuclear isolation. To be able to identify gene expression changes unique to GCs, we additionally generated transcriptomes from all cell types, using a 35S viral promoter to capture the nuclei of all cell types in leaves. These data sets highlight shared and unique gene expression changes between GCs and the bulk leaf tissue. The timing of gene expression changes is different between GCs and other cell types: we found that only GCs had detectable gene expression changes at the earliest drought time point. The drought-responsive GC and leaf RNA-seq transcriptomes are available in the Arabidopsis ePlant at the Bio-Analytic Resource for Plant Biology website.



2020 ◽  
Author(s):  
Thomas J. Hall ◽  
Michael P. Mullen ◽  
Gillian P. McHugo ◽  
Kate E. Killick ◽  
Siobhán C. Ring ◽  
...  

Abstract
Background: Bovine TB (BTB), caused by infection with Mycobacterium bovis, is a major endemic disease affecting global cattle production, particularly in many developing countries. The key innate immune cell that first encounters the pathogen is the alveolar macrophage, previously shown to be substantially reprogrammed during intracellular infection by the pathogen. Here we use differential expression, and correlation- and interaction-based network approaches to analyse the host response to infection with M. bovis at the transcriptome level to identify core infection response pathways and gene modules. These outputs were then integrated with genome-wide association study (GWAS) data sets to enhance detection of genomic variants for susceptibility/resistance to M. bovis infection.
Results: The host gene expression data consisted of bovine RNA-seq data from alveolar macrophages infected with M. bovis at 24 and 48 hours post-infection. These RNA-seq data were analysed using three distinct analysis pipelines and novel response pathways and modules were further refined using cross-comparison and integration of the results. First, a differential expression analysis was carried out to determine the most significantly differentially expressed (DE) genes between conditions at each time point. Second, two networks were constructed at each time point using gene correlation patterns to determine changes in expression across conditions. Functional sub-modules within each correlation network were selected by statistical criteria for modularity. Third, a base gene interaction network of the mammalian host response to mycobacterial infection was generated using the GeneCards database and InnateDB. Differential gene expression data were superimposed on this base network to extract functional modules of interconnected DE genes.
Conclusions: Bovine GWAS data was obtained from a published BTB susceptibility/resistance study. The results from the three parallel analyses were integrated with this data to determine which of the three approaches identified genes significantly enriched for SNPs associated with susceptibility/resistance to M. bovis infection. Results indicate distinct and significant overlap in SNP discovery, demonstrating that network-based integration of biologically relevant transcriptomics data can leverage substantial additional information from GWAS data sets.



Author(s):  
Bidossessi Wilfried Hounkpe ◽  
Francine Chenou ◽  
Franciele de Lima ◽  
Erich Vinicius De Paula

Abstract Housekeeping (HK) genes are constitutively expressed genes that are required for the maintenance of basic cellular functions. Despite their importance in the calibration of gene expression, as well as the understanding of many genomic and evolutionary features, important discrepancies have been observed in studies that previously identified these genes. Here, we present Housekeeping and Reference Transcript Atlas (HRT Atlas v1.0, www.housekeeping.unicamp.br), a web-based database which addresses some of the previously observed limitations in the identification of these genes, and offers a more accurate database of human and mouse HK genes and transcripts. The database was generated by mining massive human and mouse RNA-seq data sets, including 11 281 and 507 high-quality RNA-seq samples from 52 human non-disease tissues/cells and 14 healthy tissues/cells of C57BL/6 wild type mouse, respectively. Users can visualize the expression and download lists of 2158 human HK transcripts from 2176 HK genes and 3024 mouse HK transcripts from 3277 mouse HK genes. HRT Atlas also offers the most stable and suitable tissue-selective candidate reference transcripts for normalization of qPCR experiments. Specific primers and predicted modifiers of gene expression for some of these HK transcripts are also proposed. HRT Atlas has also been integrated with a regulatory elements resource from Epiregio server.



2020 ◽  
Vol 375 (1795) ◽  
pp. 20190341 ◽  
Author(s):  
Judit Salces-Ortiz ◽  
Carlos Vargas-Chavez ◽  
Lain Guio ◽  
Gabriel E. Rech ◽  
Josefa González

Most of the genotype–phenotype analyses to date have largely centred attention on single nucleotide polymorphisms. However, transposable element (TE) insertions have arisen as a plausible addition to the study of the genotypic–phenotypic link because of their role in genome function and evolution. In this work, we investigate the contribution of TE insertions to the regulation of gene expression in response to insecticides. We exposed four Drosophila melanogaster strains to malathion, a commonly used organophosphate insecticide. By combining information from different approaches, including RNA-seq and ATAC-seq, we found that TEs can contribute to the regulation of gene expression under insecticide exposure by rewiring cis-regulatory networks. This article is part of a discussion meeting issue ‘Crossroads between transposons and gene regulation’.


