PathCORE-T: identifying and visualizing globally co-occurring pathways in large transcriptomic compendia

2017 ◽  
Author(s):  
Kathleen M. Chen ◽  
Jie Tan ◽  
Gregory P. Way ◽  
Georgia Doing ◽  
Deborah A. Hogan ◽  
...  

Abstract Background Investigators often interpret genome-wide data by analyzing the expression levels of genes within pathways. While this within-pathway analysis is routine, the products of any one pathway can affect the activity of other pathways. Past efforts to identify relationships between biological processes have evaluated overlap in knowledge bases or evaluated changes that occur after specific treatments. Individual experiments can highlight condition-specific pathway-pathway relationships; however, constructing a complete network of such relationships across many conditions requires analyzing results from many studies. Results We developed the PathCORE-T framework by implementing existing methods to identify pathway-pathway transcriptional relationships evident across a broad data compendium. PathCORE-T is applied to the output of feature construction algorithms; it identifies pairs of pathways that are observed together in features more often than expected by chance and treats them as functionally co-occurring. We demonstrate PathCORE-T by analyzing an existing eADAGE model of a microbial compendium and by building and analyzing NMF features from the TCGA dataset of 33 cancer types. The PathCORE-T framework includes a demonstration web interface, with source code, that users can launch to (1) visualize the network and (2) review the expression levels of associated genes in the original data. PathCORE-T creates and displays the network of globally co-occurring pathways based on features observed in a machine learning analysis of gene expression data. Conclusions The PathCORE-T framework identifies transcriptionally co-occurring pathways from the results of unsupervised analysis of gene expression data and visualizes the relationships between pathways as a network. PathCORE-T recapitulated previously described pathway-pathway relationships and suggested additional experimentally testable hypotheses that remain to be explored.
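The idea of flagging pathway pairs that appear together in features more often than expected by chance can be illustrated with a minimal sketch. The function names, the toy input, and the null model (each feature keeps its size but receives randomly drawn pathways) are assumptions made for illustration; they are not taken from the PathCORE-T software, and the framework's actual permutation scheme may differ.

```python
import random
from collections import Counter
from itertools import combinations


def cooccurrence_counts(feature_pathways):
    """Count, for each unordered pathway pair, the features containing both."""
    counts = Counter()
    for pathways in feature_pathways:
        # Deduplicate and sort so each pair has one canonical ordering.
        for pair in combinations(sorted(set(pathways)), 2):
            counts[pair] += 1
    return counts


def permutation_pvalues(feature_pathways, n_permutations=1000, seed=0):
    """Empirical p-value per observed pair under a simplified null model.

    Assumed null: every feature keeps its number of pathways, but the
    pathways are drawn at random (without replacement) from all pathways.
    """
    rng = random.Random(seed)
    observed = cooccurrence_counts(feature_pathways)
    all_pathways = sorted({p for f in feature_pathways for p in f})
    exceed = Counter()
    for _ in range(n_permutations):
        shuffled = [rng.sample(all_pathways, len(set(f))) for f in feature_pathways]
        null_counts = cooccurrence_counts(shuffled)
        for pair, obs in observed.items():
            if null_counts[pair] >= obs:
                exceed[pair] += 1
    # Add-one correction keeps p-values strictly positive.
    return {pair: (exceed[pair] + 1) / (n_permutations + 1) for pair in observed}


# Hypothetical input: pathways enriched in each constructed feature.
toy_features = [["A", "B"], ["A", "B", "C"], ["C", "D"], ["A", "B", "D"]]
toy_counts = cooccurrence_counts(toy_features)
toy_pvalues = permutation_pvalues(toy_features, n_permutations=200)
```

Pairs with small empirical p-values would become the edges of the co-occurrence network; with real data, a multiple-testing correction would normally follow.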


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Thomas Bartlett

Abstract Background Network models are well-established as very useful computational-statistical tools in cell biology. However, a genomic network model based only on gene expression data can, by definition, only infer gene co-expression networks. Hence, in order to infer gene regulatory patterns, it is necessary to also include data related to binding of regulatory factors to DNA. Results We propose a new dynamic genomic network model, for inferring patterns of genomic regulatory influence in dynamic processes such as development. Our model fuses experiment-specific gene expression data with publicly available DNA-binding data. The method we propose is computationally efficient, and can be applied to genome-wide data with tens of thousands of transcripts. Thus, our method is well suited for use as an exploratory tool for genome-wide data. We apply our method to data from human fetal cortical development, and our findings confirm genomic regulatory patterns which are recognised as being fundamental to neuronal development. Conclusions Our method provides a mathematical/computational toolbox which, when coupled with targeted experiments, will reveal and confirm important new functional genomic regulatory processes in mammalian development.



2021 ◽  
Author(s):  
Thomas E Bartlett

Network models are well-established as very useful computational-statistical tools in cell biology. However, a genomic network model based only on gene expression data can, by definition, only infer gene co-expression networks. Hence, in order to infer gene regulatory patterns, it is necessary to also include data related to binding of regulatory factors to DNA. We propose a new dynamic genomic network model, for inferring patterns of genomic regulatory influence in dynamic processes such as development. Our model fuses experiment-specific gene expression data with publicly available DNA-binding data. The method we propose is computationally efficient, and can be applied to genome-wide data with tens of thousands of transcripts. Thus, our method is well suited for use as an exploratory tool for genome-wide data. We apply our method to data from human fetal cortical development, and our findings confirm genomic regulatory patterns which are recognised as being fundamental to neuronal development. Our method provides a mathematical/computational toolbox which, when coupled with targeted experiments, will reveal and confirm important new functional genomic regulatory processes in mammalian development.



2016 ◽  
Vol 88 (6) ◽  
pp. 2095-2110 ◽  
Author(s):  
H. Xu ◽  
C. Li ◽  
Q. Zeng ◽  
I. Agrawal ◽  
X. Zhu ◽  
...  


2020 ◽  
Vol 14 ◽  
Author(s):  
Mette Soerensen ◽  
Dominika Marzena Hozakowska-Roszkowska ◽  
Marianne Nygaard ◽  
Martin J. Larsen ◽  
Veit Schwämmle ◽  
...  


Cancers ◽  
2019 ◽  
Vol 11 (7) ◽  
pp. 983 ◽  
Author(s):  
Otília Menyhart ◽  
Tatsuhiko Kakisaka ◽  
Lőrinc Sándor Pongor ◽  
Hiroyuki Uetake ◽  
Ajay Goel ◽  
...  

Background: Numerous driver mutations have been identified in colorectal cancer (CRC), but their relevance to the development of targeted therapies remains elusive. The secondary effects of pathogenic driver mutations on downstream signaling pathways offer a potential approach for the identification of therapeutic targets. We aimed to identify differentially expressed genes as potential drug targets linked to driver mutations. Methods: Somatic mutations and the gene expression data of 582 CRC patients were utilized, incorporating the mutational status of 39,916 and the expression levels of 20,500 genes. To uncover candidate targets, the expression levels of various genes in wild-type and mutant cases for the most frequent disruptive mutations were compared with a Mann–Whitney test. A survival analysis was performed in 2100 patients with transcriptomic gene expression data. Up-regulated genes associated with worse survival were filtered for potentially actionable targets. The most significant hits were validated in an independent set of 171 CRC patients. Results: Altogether, 426 disruptive mutation-associated upregulated genes were identified. Among these, 95 were linked to worse recurrence-free survival (RFS). Based on the druggability filter, 37 potentially actionable targets were revealed. We selected seven genes and validated their expression in 171 patient specimens. The best independently validated combinations were DUSP4 (p = 2.6 × 10^−12) in ACVR2A mutated (7.7%) patients; BMP4 (p = 1.6 × 10^−04) in SOX9 mutated (8.1%) patients; TRIB2 (p = 1.35 × 10^−14) in ACVR2A mutated patients; VSIG4 (p = 2.6 × 10^−05) in ANK3 mutated (7.6%) patients, and DUSP4 (p = 7.1 × 10^−04) in AMER1 mutated (8.2%) patients. Conclusions: The results uncovered potentially druggable genes in colorectal cancer. The identified mutations could enable future patient stratification for targeted therapy.
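The wild-type versus mutant comparison described in the Methods can be sketched with a small, dependency-free Mann–Whitney U test. The function name and the expression values below are hypothetical, and the p-value uses the normal approximation without tie or continuity correction, which is an assumption of this sketch rather than a detail reported in the abstract.

```python
import math


def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided p-value by normal approximation).

    Simplifications assumed here: no tie correction and no continuity
    correction, so the p-value is only approximate for small samples.
    """
    n, m = len(x), len(y)
    # U counts pairs where the x value exceeds the y value; ties count half.
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    mean_u = n * m / 2.0
    sd_u = math.sqrt(n * m * (n + m + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return u, p_value


# Hypothetical expression values of one gene in the two groups.
wild_type_expr = [1.0, 2.0, 3.0]
mutant_expr = [4.0, 5.0, 6.0]
u_stat, p_val = mann_whitney_u(wild_type_expr, mutant_expr)
```

In a genome-wide screen this test would be repeated per gene and per mutation, so the resulting p-values would normally be adjusted for multiple testing before candidates are filtered.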



2021 ◽  
Author(s):  
Huan-Huan Wei ◽  
Hui Lu ◽  
Hongyu Zhao

Abstract Many computational methods have been developed for inferring causality among genes using cross-sectional gene expression data, such as single-cell RNA sequencing (scRNA-seq) data. However, due to the limitations of scRNA-seq technologies, time-lagged causal relationships may be missed by existing methods. In this work, we propose a method, called causal inference with time-lagged information (CITL), to infer time-lagged causal relationships from scRNA-seq data by assessing conditional independence between the changing and current expression levels of genes. CITL estimates the changing expression levels of genes by “RNA velocity”. We demonstrate the accuracy and stability of CITL for inferring time-lagged causality on simulation data against other leading approaches. We have applied CITL to real scRNA-seq data and inferred 878 pairs of time-lagged causal relationships, with many of these inferred results supported by the literature. Author summary Computational causal inference is a promising way to survey causal relationships between genes efficiently. Though many causal inference methods have been applied to gene expression data, none considers time-lagged causal relationships, in which a gene takes some time to affect its target genes through several reactions. If relationships between genes are time-lagged, the assumptions of existing methods are violated and the relationships become challenging to recognize. We demonstrate that this is indeed the case through simulation. Therefore, we develop a method for inferring time-lagged causal relationships from single-cell gene expression data. We assume that a time-lagged causal relationship should present a strong association between the cause and the change in the effect. To calculate this association, we first estimate the derivative of gene expression using the information from unspliced transcripts. Then, we use conditional independence tests to search for gene pairs satisfying our assumption. Our results suggest that we can accurately infer time-lagged causal gene pairs that are validated by published literature. This method may complement gene regulatory analysis and provide candidate gene pairs for further controlled experiments.
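The notion of testing the association between a cause's current expression and the change in its target, while conditioning on other variables, can be illustrated with a first-order partial correlation. This is a swapped-in simplification, not the CITL procedure: the function names, the choice of conditioning variable (the target's current expression), and the toy numbers are all assumptions made for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical per-cell values (illustration only):
cause_current = [1.0, 2.0, 3.0, 4.0, 5.0]     # current expression of candidate cause
target_velocity = [2.0, 4.0, 6.0, 8.0, 10.0]  # estimated change of the target gene
target_current = [2.0, 1.0, 4.0, 3.0, 6.0]    # current expression of the target gene

pc = partial_correlation(cause_current, target_velocity, target_current)
```

A partial correlation near zero would be consistent with conditional independence under a linear-Gaussian assumption; a value far from zero would mark the pair as a candidate for a time-lagged relationship.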



BMC Genomics ◽  
2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Yuanyuan Li ◽  
David M. Umbach ◽  
Adrienna Bingham ◽  
Qi-Jing Li ◽  
Yuan Zhuang ◽  
...  

Abstract Background Tumor purity is the percent of cancer cells present in a sample of tumor tissue. The non-cancerous cells (immune cells, fibroblasts, etc.) have an important role in tumor biology. The ability to determine tumor purity is important to understand the roles of cancerous and non-cancerous cells in a tumor. Methods We applied a supervised machine learning method, XGBoost, to data from 33 TCGA tumor types to predict tumor purity using RNA-seq gene expression data. Results Across the 33 tumor types, the median correlation between observed and predicted tumor purity ranged from 0.75 to 0.87 with small root mean square errors, suggesting that tumor purity can be accurately predicted using expression data. We further confirmed that expression levels of a ten-gene set (CSF2RB, RHOH, C1S, CCDC69, CCL22, CYTIP, POU2AF1, FGR, CCL21, and IL7R) were predictive of tumor purity regardless of tumor type. We tested whether our set of ten genes could accurately predict tumor purity of a TCGA-independent data set. We showed that expression levels from our set of ten genes were highly correlated (ρ = 0.88) with the actual observed tumor purity. Conclusions Our analyses suggested that the ten-gene set may serve as a biomarker for tumor purity prediction using gene expression data.



Author(s):  
M. P. KURHEKAR ◽  
S. ADAK ◽  
S. JHUNJHUNWALA ◽  
K. RAGHUPATHY


Author(s):  
Orazio Palmieri ◽  
Teresa M. Creanza ◽  
Fabrizio Bossa ◽  
Orazio Palumbo ◽  
Rosalia Maglietta ◽  
...  

