EXPath 2.0: An Updated Database for Integrating High-Throughput Gene Expression Data with Biological Pathways

2020 ◽  
Vol 61 (10) ◽  
pp. 1818-1827
Author(s):  
Kuan-Chieh Tseng ◽  
Guan-Zhen Li ◽  
Yu-Cheng Hung ◽  
Chi-Nga Chow ◽  
Nai-Yun Wu ◽  
...  

Abstract: Co-expressed genes tend to have regulatory relationships and participate in similar biological processes. Construction of gene correlation networks from microarray or RNA-seq expression data has been widely applied to study transcriptional regulatory mechanisms and metabolic pathways under specific conditions. Furthermore, since transcription factors (TFs) are critical regulators of gene expression, it is worth investigating TFs on the promoters of co-expressed genes. Although co-expressed genes and their related metabolic pathways can be easily identified from previous resources, such as EXPath and EXPath Tool, these resources do not simultaneously identify the regulatory TFs of those genes. EXPath 2.0 is an updated database for the investigation of regulatory mechanisms in various plant metabolic pathways with 1,881 microarray and 978 RNA-seq samples. There are six significant improvements in EXPath 2.0: (i) the number of species has been extended from three to six to include Arabidopsis, rice, maize, Medicago, soybean and tomato; (ii) gene expression at various developmental stages has been added; (iii) correlation networks can be constructed from a group of genes; (iv) hierarchical figures of the enriched Gene Ontology (GO) terms are accessible; (v) promoter analysis of genes in a metabolic pathway or correlation network is provided; and (vi) users' own gene expression data can be uploaded and analyzed. Thus, EXPath 2.0 is an updated platform for investigating gene expression profiles and metabolic pathways under specific conditions. It helps users explore the regulatory mechanisms of plant biological processes. The new version is available at http://EXPath.itps.ncku.edu.tw.
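The idea of a gene correlation network mentioned in this abstract can be illustrated with a minimal sketch. It assumes Pearson correlation and a fixed cutoff, which the abstract does not specify; the gene names, expression values and the 0.9 threshold are hypothetical and are not taken from EXPath 2.0.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no correlation information
    return cov / (sx * sy)

def correlation_network(expression, threshold=0.9):
    """Return edges (gene_a, gene_b, r) whose |r| meets the threshold."""
    edges = []
    for a, b in combinations(sorted(expression), 2):
        r = pearson(expression[a], expression[b])
        if abs(r) >= threshold:
            edges.append((a, b, round(r, 3)))
    return edges

# Toy expression values across five samples (hypothetical gene names).
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],  # rises with geneA
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],  # falls as geneA rises
    "geneD": [3.0, 1.0, 4.0, 1.0, 5.0],  # no clear relationship
}
edges = correlation_network(expression, threshold=0.9)
```

Using the absolute value of r keeps both positively and negatively correlated pairs as edges; a tool that only wants co-expressed (positively correlated) genes would test r directly instead.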

2018 ◽  
Vol 16 (1) ◽  
Author(s):  
Cindy Perscheid ◽  
Bastien Grasnick ◽  
Matthias Uflacker

Abstract: The advance of high-throughput RNA-Sequencing techniques enables researchers to analyze the complete gene activity in particular cells. From the insights of such analyses, researchers can identify disease-specific expression profiles, thereby better understand complex diseases like cancer, and eventually develop effective measures for diagnosis and treatment. The high dimensionality of gene expression data poses challenges to its computational analysis, which are addressed with gene selection methods. Traditional gene selection approaches base their findings on statistical analyses of the actual expression levels, which entails several drawbacks when it comes to accurately identifying the underlying biological processes. In turn, integrative approaches include curated information on biological processes from external knowledge bases during gene selection, which promises to lead to better interpretability and improved predictive performance. Our work compares the performance of traditional and integrative gene selection approaches. Moreover, we propose a straightforward approach to integrate external knowledge with traditional gene selection approaches. We introduce a framework enabling automatic external knowledge integration, gene selection, and evaluation. Evaluation results prove our framework to be a useful tool for evaluation and show that integration of external knowledge improves overall analysis results.


2016 ◽  
Author(s):  
Kushal K Dey ◽  
Chiaowen Joyce Hsiao ◽  
Matthew Stephens

Abstract: Grade of membership models, also known as “admixture models”, “topic models” or “Latent Dirichlet Allocation”, are a generalization of cluster models that allow each sample to have membership in multiple clusters. These models are widely used in population genetics to model admixed individuals who have ancestry from multiple “populations”, and in natural language processing to model documents having words from multiple “topics”. Here we illustrate the potential for these models to cluster samples of RNA-seq gene expression data, measured on either bulk samples or single cells. We also provide methods to help interpret the clusters, by identifying genes that are distinctively expressed in each cluster. By applying these methods to several example RNA-seq applications we demonstrate their utility in identifying and summarizing structure and heterogeneity. Applied to data from the GTEx project on 53 human tissues, the approach highlights similarities among biologically-related tissues and identifies distinctively-expressed genes that recapitulate known biology. Applied to single-cell expression data from mouse preimplantation embryos, the approach highlights both discrete and continuous variation through early embryonic development stages, and highlights genes involved in a variety of relevant processes – from germ cell development, through compaction and morula formation, to the formation of inner cell mass and trophoblast at the blastocyst stage. The methods are implemented in the Bioconductor package CountClust.

Author Summary: The gene expression profile of a biological sample (either from single cells or pooled cells) results from a complex interplay of multiple related biological processes. Consequently, for example, distal tissue samples may share a similar gene expression profile through some common underlying biological processes. Our goal here is to illustrate that grade of membership (GoM) models – an approach widely used in population genetics to cluster admixed individuals who have ancestry from multiple populations – provide an attractive approach for clustering biological samples of RNA sequencing data. The GoM model allows each biological sample to have partial memberships in multiple biologically-distinct clusters, in contrast to traditional clustering methods that partition samples into distinct subgroups. We also provide methods for identifying genes that are distinctively expressed in each cluster to help biologically interpret the results. Applied to a dataset of 53 human tissues, the GoM approach highlights similarities among biologically-related tissues and identifies distinctively-expressed genes that recapitulate known biology. Applied to gene expression data of single cells from mouse preimplantation embryos, the approach highlights both discrete and continuous variation through early embryonic development stages, and genes involved in a variety of relevant processes. Our study highlights the potential of GoM models for elucidating biological structure in RNA-seq gene expression data.
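The contrast between partial membership and hard clustering can be shown with a small sketch. It assumes the usual topic-model formulation, in which a sample's per-gene read probabilities are a membership-weighted mixture of cluster expression profiles; the function name, the two clusters and all numbers are hypothetical, and this is not the CountClust API.

```python
def mixed_profile(memberships, cluster_profiles):
    """Per-gene read probabilities for one sample under a GoM-style model.

    memberships: proportions over clusters (must sum to 1).
    cluster_profiles: one list per cluster giving relative expression
    over genes (each list is assumed to sum to 1).
    """
    if abs(sum(memberships) - 1.0) > 1e-9:
        raise ValueError("membership proportions must sum to 1")
    n_genes = len(cluster_profiles[0])
    return [
        sum(q * profile[g] for q, profile in zip(memberships, cluster_profiles))
        for g in range(n_genes)
    ]

# Two hypothetical clusters described over three genes.
profiles = [
    [0.7, 0.2, 0.1],
    [0.1, 0.2, 0.7],
]
hard = mixed_profile([1.0, 0.0], profiles)     # ordinary single-cluster assignment
partial = mixed_profile([0.5, 0.5], profiles)  # equal membership in both clusters
```

A hard clustering corresponds to a membership vector with a single 1; any other vector gives a sample whose profile lies between the cluster profiles, which is what lets the model represent continuous variation.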


Processes ◽  
2019 ◽  
Vol 7 (5) ◽  
pp. 301
Author(s):  
Muying Wang ◽  
Satoshi Fukuyama ◽  
Yoshihiro Kawaoka ◽  
Jason E. Shoemaker

Motivation: Immune cell dynamics is a critical factor of disease-associated pathology (immunopathology) that also impacts the levels of mRNAs in diseased tissue. Deconvolution algorithms attempt to infer cell quantities in a tissue/organ sample based on gene expression profiles and are often evaluated using artificial, non-complex samples. Their accuracy in estimating cell counts from temporal tissue gene expression data remains poorly characterized and has never been characterized using diseased lung. Further, how to remove the effects of cell migration on transcript counts to improve discovery of disease factors is an open question. Results: Four cell count inference (i.e., deconvolution) tools are evaluated using microarray data from influenza-infected lung sampled at several time points post-infection. The analysis finds that inferred cell quantities are accurate only for select cell types and that algorithms tend to have a good relative fit (R²) but a poor absolute fit (normalized mean squared error; NMSE), which suggests that systemic biases exist. Nonetheless, using cell fraction estimates to adjust gene expression data, we show that genes associated with influenza virus replication and increased infection pathology are more likely to be identified as significant than when applying traditional statistical tests.
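The difference between a good relative fit and a poor absolute fit can be made concrete with a short sketch. It assumes R² is the squared Pearson correlation and that NMSE is the mean squared error divided by the variance of the true values; the abstract does not give the exact definitions used in the study, and the cell fractions below are invented for illustration.

```python
def r_squared(true, est):
    """Squared Pearson correlation: measures relative (linear) agreement."""
    n = len(true)
    mt, me = sum(true) / n, sum(est) / n
    cov = sum((t - mt) * (e - me) for t, e in zip(true, est))
    vt = sum((t - mt) ** 2 for t in true)
    ve = sum((e - me) ** 2 for e in est)
    return cov * cov / (vt * ve)

def nmse(true, est):
    """Mean squared error divided by the variance of the true values."""
    n = len(true)
    mt = sum(true) / n
    mse = sum((t - e) ** 2 for t, e in zip(true, est)) / n
    var = sum((t - mt) ** 2 for t in true) / n
    return mse / var

# Hypothetical true cell fractions over four time points, and an estimate
# that tracks them perfectly in shape but is scaled and shifted.
true_fraction = [0.10, 0.20, 0.30, 0.40]
estimated = [2 * t + 0.05 for t in true_fraction]

relative_fit = r_squared(true_fraction, estimated)
absolute_fit = nmse(true_fraction, estimated)
```

Here the estimate is a linear transform of the truth, so the relative fit is essentially perfect while the normalized error is several times the variance of the data, the pattern one would expect from a consistent bias in scale or offset.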


2015 ◽  
Vol 11 (1) ◽  
pp. 86-96 ◽  
Author(s):  
Aakash Chavan Ravindranath ◽  
Nolen Perualila-Tan ◽  
Adetayo Kasim ◽  
Georgios Drakakis ◽  
Sonia Liggi ◽  
...  

Integrating gene expression profiles with certain proteins can improve our understanding of the fundamental mechanisms in protein–ligand binding.


Author(s):  
Crescenzio Gallo

The possible applications of modeling and simulation in the field of bioinformatics are very extensive, ranging from understanding basic metabolic pathways to exploring genetic variability. Experiments carried out with DNA microarrays allow researchers to measure expression levels for thousands of genes simultaneously, across different conditions and over time. A key step in the analysis of gene expression data is the detection of groups of genes that manifest similar expression patterns. In this chapter, the authors examine various methods for analyzing gene expression data, addressing the important topics of (1) selecting the most differentially expressed genes, (2) grouping them by means of their relationships, and (3) classifying samples based on gene expression.


2019 ◽  
Vol 15 (2) ◽  
pp. e1006792 ◽  
Author(s):  
Brandon Monier ◽  
Adam McDermaid ◽  
Cankun Wang ◽  
Jing Zhao ◽  
Allison Miller ◽  
...  

eLife ◽  
2017 ◽  
Vol 6 ◽  
Author(s):  
Julien Racle ◽  
Kaat de Jonge ◽  
Petra Baumgaertner ◽  
Daniel E Speiser ◽  
David Gfeller

Immune cells infiltrating tumors can have important impact on tumor progression and response to therapy. We present an efficient algorithm to simultaneously estimate the fraction of cancer and immune cell types from bulk tumor gene expression data. Our method integrates novel gene expression profiles from each major non-malignant cell type found in tumors, renormalization based on cell-type-specific mRNA content, and the ability to consider uncharacterized and possibly highly variable cell types. Feasibility is demonstrated by validation with flow cytometry, immunohistochemistry and single-cell RNA-Seq analyses of human melanoma and colorectal tumor specimens. Altogether, our work not only improves accuracy but also broadens the scope of absolute cell fraction predictions from tumor gene expression data, and provides a unique novel experimental benchmark for immunogenomics analyses in cancer research (http://epic.gfellerlab.org).
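The renormalization step mentioned in this abstract can be sketched as follows. The sketch assumes that a deconvolution step has already produced mRNA proportions per cell type and that each cell type has a known relative mRNA content per cell; the function name and all values are hypothetical and are not taken from the published method.

```python
def mrna_to_cell_fractions(mrna_fractions, mrna_per_cell):
    """Convert mRNA proportions into cell proportions.

    Each mRNA proportion is divided by that cell type's relative mRNA
    content per cell, and the results are rescaled to sum to 1.
    """
    scaled = {c: mrna_fractions[c] / mrna_per_cell[c] for c in mrna_fractions}
    total = sum(scaled.values())
    return {c: value / total for c, value in scaled.items()}

# Hypothetical inputs: tumor cells carry twice the mRNA of lymphocytes.
mrna_fractions = {"tumor": 0.6, "T cell": 0.2, "B cell": 0.2}
mrna_per_cell = {"tumor": 2.0, "T cell": 1.0, "B cell": 1.0}

cell_fractions = mrna_to_cell_fractions(mrna_fractions, mrna_per_cell)
```

Without this correction, a cell type with high mRNA content per cell would appear more abundant than it is, because it contributes more transcripts per cell to the bulk sample.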


2019 ◽  
Vol 17 (04) ◽  
pp. 1950024 ◽  
Author(s):  
Tinghua Huang ◽  
Xiali Huang ◽  
Bomei Shi ◽  
Min Yao

Understanding how genes are expressed and regulated in different biological processes is a fundamental and challenging issue. Considerable progress has been made in studying the relationship between the expression and regulation of human genes. However, it is difficult to use these resources productively to analyze gene expression data. GEREDB (www.thua45.cn/geredb) has been developed to facilitate analyses that will provide insights into the regulation of genes that govern specific biological responses. GEREDB is a publicly available, manually curated biological database that stores data on the relationships between expression and regulation of human genes. To date, more than 39,000 links have been contextually annotated by reviewing more than 53,000 abstracts. GEREDB can be searched using the official NCBI gene symbol as a query, and it can be downloaded along with the GEREA software package. GEREDB can analyze user-supplied gene expression data in a causal-analysis-oriented manner using the GEREA bioinformatics tool.

