Clustering analysis of tumor metabolic networks

Abstract. Background: Biological networks represent the diverse molecular interactions that occur within cells. Commonly studied biological networks model protein-protein interactions, gene regulation, and metabolic pathways. Among these, metabolic networks are probably the most studied, as they directly influence all physiological processes. Exploring biochemical pathways through a multigraph representation is important for understanding complex regulatory mechanisms. Feature extraction and clustering of these networks enable the grouping of samples obtained from different biological specimens, since clustering techniques separate networks according to their mutual similarity. Results: We present a clustering analysis of tissue-specific metabolic networks for single samples from three primary tumor sites: breast, lung, and kidney. The metabolic networks were obtained by integrating genome-scale metabolic models with gene expression data. We simplified the networks to reduce the time needed to compute network distances. We showed empirically that network clustering can characterize groups of patients in multiple conditions. Conclusions: We provide a computational methodology to explore and characterize the metabolic landscape of tumors, and thus a general methodology for integrating analytic metabolic models with gene expression data. This method represents a first attempt at clustering large-scale metabolic networks. Moreover, the approach can yield valuable information about how different conditions affect overall metabolism.
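
The abstract does not say which network distance or clustering algorithm was used, so the following is only a minimal sketch of the general idea: compute a pairwise distance between sample networks, then group samples by similarity. It assumes each network is reduced to a set of edges, uses the Jaccard distance as a stand-in metric, and applies average-linkage agglomerative clustering. The sample names and edges are hypothetical toy data.

```python
from itertools import combinations


def jaccard_distance(edges_a, edges_b):
    """Return 1 - |A ∩ B| / |A ∪ B| for two edge sets (assumed stand-in metric)."""
    union = edges_a | edges_b
    if not union:
        return 0.0
    return 1.0 - len(edges_a & edges_b) / len(union)


def cluster_networks(networks, n_clusters):
    """Average-linkage agglomerative clustering of networks given as edge sets."""
    names = list(networks)
    # Pairwise distances, keyed by the unordered pair of sample names.
    dist = {
        frozenset((a, b)): jaccard_distance(networks[a], networks[b])
        for a, b in combinations(names, 2)
    }
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        # Mean distance over all cross-cluster sample pairs.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Merge the two closest clusters; i < j, so deleting j keeps i valid.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(cluster) for cluster in clusters]


# Hypothetical toy networks: edges are (metabolite, metabolite) pairs.
toy_networks = {
    "breast_1": {("glc", "g6p"), ("g6p", "f6p"), ("f6p", "fbp")},
    "breast_2": {("glc", "g6p"), ("g6p", "f6p"), ("f6p", "pyr")},
    "kidney_1": {("cit", "icit"), ("icit", "akg"), ("akg", "succ")},
    "kidney_2": {("cit", "icit"), ("icit", "akg"), ("glc", "g6p")},
}
groups = cluster_networks(toy_networks, n_clusters=2)
print(groups)
```

With real genome-scale networks, the pairwise distance step would dominate the run time, which is consistent with the abstract's motivation for simplifying the networks first.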

Graph Convolutional Network for Drug Response Prediction Using Gene Expression Data

Mathematics ◽ 10.3390/math9070772 ◽ 2021 ◽ Vol 9 (7) ◽ pp. 772

Author(s): Seonghun Kim ◽ Seockhun Bae ◽ Yinhua Piao ◽ Kyuri Jo

Keyword(s): Gene Expression ◽ Gene Expression Data ◽ Large Scale ◽ Drug Response ◽ Response Prediction ◽ Biological Data ◽ Expression Data ◽ Convolutional Network ◽ Essential Information ◽ Protein Protein Interaction

Genomic profiles of cancer patients such as gene expression have become a major source to predict responses to drugs in the era of personalized medicine. As large-scale drug screening data with cancer cell lines are available, a number of computational methods have been developed for drug response prediction. However, few methods incorporate both gene expression data and the biological network, which can harbor essential information about the underlying process of the drug response. We proposed an analysis framework called DrugGCN for prediction of Drug response using a Graph Convolutional Network (GCN). DrugGCN first generates a gene graph by combining a Protein-Protein Interaction (PPI) network and gene expression data with feature selection of drug-related genes, and the GCN model detects the local features such as subnetworks of genes that contribute to the drug response by localized filtering. We demonstrated the effectiveness of DrugGCN using biological data showing its high prediction accuracy among the competing methods.
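
To make the idea of localized filtering on a gene graph concrete, here is a minimal NumPy sketch of a single graph convolution step. It is not DrugGCN's implementation: it assumes the widely used propagation rule H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W), and the three-gene graph, expression values, and weight matrix are made-up illustrations.

```python
import numpy as np


def normalized_adjacency(adj):
    """Symmetrically normalize an adjacency matrix after adding self-loops."""
    a_hat = adj + np.eye(adj.shape[0])
    degree = a_hat.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Equivalent to D^(-1/2) @ a_hat @ D^(-1/2) for diagonal D.
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(adj, features, weights):
    """One graph convolution: mix each node with its neighbors, then ReLU."""
    return np.maximum(normalized_adjacency(adj) @ features @ weights, 0.0)


# Hypothetical 3-gene path graph (gene0 - gene1 - gene2), e.g. from a PPI network.
adj = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])
expression = np.array([[1.0], [2.0], [3.0]])  # one expression value per gene
weights = np.array([[1.0]])                   # identity weight, for illustration only

hidden = gcn_layer(adj, expression, weights)
print(hidden)
```

Each output row depends only on a gene and its direct neighbors, which is the sense in which the filter is local. Stacking layers widens the neighborhood by one hop per layer.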

GENE DISCOVERY METHODS FROM LARGE-SCALE GENE EXPRESSION DATA

Quantum Bio-Informatics III ◽ 10.1142/9789814304061_0040 ◽ 2010

Author(s): AKIFUMI SHIMIZU ◽ KENTARO YANO

Keyword(s): Gene Expression ◽ Gene Expression Data ◽ Large Scale ◽ Gene Discovery ◽ Expression Data

LSTrAP-Crowd: Prediction of novel components of bacterial ribosomes with crowd-sourced analysis of RNA sequencing data

10.1101/2020.04.20.005249 ◽ 2020

Author(s): Benedict Hew ◽ Qiao Wen Tan ◽ William Goh ◽ Jonathan Wei Xiong Ng ◽ Kenny Koh ◽ ...

Keyword(s): Gene Expression ◽ Protein Synthesis ◽ RNA Sequencing ◽ Gene Expression Data ◽ Large Scale ◽ Bacterial Resistance ◽ Expression Data ◽ Sequencing Data ◽ Novel Proteins ◽ Novel Antibiotics

Abstract. Bacterial resistance to antibiotics is a growing problem that is projected to cause more deaths than cancer in 2050. Consequently, novel antibiotics are urgently needed. Since more than half of the available antibiotics target the bacterial ribosomes, proteins that are involved in protein synthesis are thus prime targets for the development of novel antibiotics. However, experimental identification of these potential antibiotic target proteins can be labor-intensive and challenging, as these proteins are likely to be poorly characterized and specific to few bacteria. In order to identify these novel proteins, we established a Large-Scale Transcriptomic Analysis Pipeline in Crowd (LSTrAP-Crowd), where 285 individuals processed 26 terabytes of RNA-sequencing data of the 17 most notorious bacterial pathogens. In total, the crowd processed 26,269 RNA-seq experiments and used the data to construct gene co-expression networks, which were used to identify more than a hundred uncharacterized genes that were transcriptionally associated with protein synthesis. We provide the identity of these genes together with the processed gene expression data. The data can be used to identify other vulnerabilities or bacteria, while our approach demonstrates how the processing of gene expression data can be easily crowdsourced.
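
As a rough illustration of how a gene co-expression network can be built from processed expression data, the sketch below links two genes when the Pearson correlation of their expression profiles across experiments exceeds a threshold. The abstract does not state the similarity measure or cutoff, so the choice of Pearson correlation, the 0.8 threshold, the function name `coexpression_edges`, and the toy matrix are all assumptions.

```python
import numpy as np


def coexpression_edges(expression, gene_ids, threshold=0.8):
    """Return (gene, gene, r) tuples for gene pairs with Pearson r >= threshold.

    `expression` has one row per gene and one column per experiment.
    """
    corr = np.corrcoef(expression)  # rows are treated as variables (genes)
    edges = []
    for i in range(len(gene_ids)):
        for j in range(i + 1, len(gene_ids)):
            if corr[i, j] >= threshold:
                edges.append((gene_ids[i], gene_ids[j], float(corr[i, j])))
    return edges


# Hypothetical expression matrix: 3 genes measured in 4 RNA-seq experiments.
gene_ids = ["geneA", "geneB", "geneC"]
expression = np.array([
    [1.0, 2.0, 3.0, 4.0],  # geneA
    [2.0, 4.1, 5.9, 8.0],  # geneB, rises together with geneA
    [4.0, 1.0, 3.0, 2.0],  # geneC, no consistent trend with geneA
])

edges = coexpression_edges(expression, gene_ids)
print(edges)
```

In such a network, an uncharacterized gene that is connected to many known protein-synthesis genes becomes a candidate for involvement in the same process.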

A Graph Feature Auto-Encoder for the prediction of unobserved node features on biological networks

BMC Bioinformatics ◽ 10.1186/s12859-021-04447-3 ◽ 2021 ◽ Vol 22 (1)

Author(s): Ramin Hasibi ◽ Tom Michoel

Keyword(s): Gene Expression ◽ Neural Networks ◽ Gene Expression Data ◽ Biological Networks ◽ Molecular Interaction ◽ Interaction Networks ◽ Omics Data ◽ Expression Data ◽ Molecular Interaction Networks ◽ Graph Neural Networks

Abstract. Background: Molecular interaction networks summarize complex biological processes as graphs, whose structure is informative of biological function at multiple scales. Simultaneously, omics technologies measure the variation or activity of genes, proteins, or metabolites across individuals or experimental conditions. Integrating the complementary viewpoints of biological networks and omics data is an important task in bioinformatics, but existing methods treat networks as discrete structures, which are intrinsically difficult to integrate with continuous node features or activity measures. Graph neural networks map graph nodes into a low-dimensional vector space representation, and can be trained to preserve both the local graph structure and the similarity between node features. Results: We studied the representation of transcriptional, protein–protein and genetic interaction networks in E. coli and mouse using graph neural networks. We found that such representations explain a large proportion of variation in gene expression data, and that using gene expression data as node features improves the reconstruction of the graph from the embedding. We further proposed a new end-to-end Graph Feature Auto-Encoder framework for the prediction of node features utilizing the structure of the gene networks, which is trained on the feature prediction task, and showed that it performs better at predicting unobserved node features than regular MultiLayer Perceptrons. When applied to the problem of imputing missing data in single-cell RNAseq data, the Graph Feature Auto-Encoder utilizing our new graph convolution layer called FeatGraphConv outperformed a state-of-the-art imputation method that does not use protein interaction information, showing the benefit of integrating biological networks and omics data with our proposed approach. Conclusion: Our proposed Graph Feature Auto-Encoder framework is a powerful approach for integrating and exploiting the close relation between molecular interaction networks and functional genomics data.

Clustering analysis for gene expression data

10.1117/12.347541 ◽ 1999

Author(s): Yidong Chen ◽ Olga Ermolaeva ◽ Michael L. Bittner ◽ Paul S. Meltzer ◽ Jeffrey M. Trent ◽ ...

Keyword(s): Gene Expression ◽ Gene Expression Data ◽ Clustering Analysis ◽ Expression Data

Defining transcription modules using large-scale gene expression data

Bioinformatics ◽ 10.1093/bioinformatics/bth166 ◽ 2004 ◽ Vol 20 (13) ◽ pp. 1993-2003 ◽ Cited By ~ 216

Author(s): J. Ihmels ◽ S. Bergmann ◽ N. Barkai

Keyword(s): Gene Expression ◽ Gene Expression Data ◽ Large Scale ◽ Expression Data

Large-Scale Integration of MicroRNA and Gene Expression Data for Identification of Enriched MicroRNA–mRNA Associations in Biological Systems

Methods in Molecular Biology - MicroRNAs and the Immune System ◽ 10.1007/978-1-60761-811-9_20 ◽ 2010 ◽ pp. 297-315 ◽ Cited By ~ 28

Author(s): Preethi H. Gunaratne ◽ Chad J. Creighton ◽ Michael Watson ◽ Jayantha B. Tennakoon

Keyword(s): Gene Expression ◽ Gene Expression Data ◽ Large Scale ◽ Biological Systems ◽ Expression Data ◽ Large Scale Integration ◽ Scale Integration

Processing Large-Scale, High-Dimension Genetic and Gene Expression Data

Handbook on Analyzing Human Genetic Data ◽ 10.1007/978-3-540-69264-5_11 ◽ 2009 ◽ pp. 307-330

Author(s): Cliona Molony ◽ Solveig K. Sieberts ◽ Eric E. Schadt

Keyword(s): Gene Expression ◽ Gene Expression Data ◽ High Dimension ◽ Large Scale ◽ Expression Data

Mining the Gene Expression Matrix: Inferring Gene Relationships from Large Scale Gene Expression Data

Information Processing in Cells and Tissues ◽ 10.1007/978-1-4615-5345-8_22 ◽ 1998 ◽ pp. 203-212 ◽ Cited By ~ 35

Author(s): Patrik D’haeseleer ◽ Xiling Wen ◽ Stefanie Fuhrman ◽ Roland Somogyi

Keyword(s): Gene Expression ◽ Gene Expression Data ◽ Large Scale ◽ Expression Data ◽ Gene Expression Matrix ◽ Expression Matrix

3145 An Evaluation of Machine Learning and Traditional Statistical Methods for Discovery in Large-Scale Translational Data

Journal of Clinical and Translational Science ◽ 10.1017/cts.2019.8 ◽ 2019 ◽ Vol 3 (s1) ◽ pp. 2-2

Author(s): Megan C Hollister ◽ Jeffrey D. Blume

Keyword(s): Gene Expression ◽ Machine Learning ◽ Random Forest ◽ Gene Expression Data ◽ Large Scale ◽ Second Generation ◽ A Priori ◽ Expression Data ◽ P Values ◽ Machine Learning Methods

OBJECTIVES/SPECIFIC AIMS: To examine and compare the claims in Bzdok, Altman, and Krzywinski under a broader set of conditions by using unbiased methods of comparison. To explore how to accurately use various machine learning and traditional statistical methods in large-scale translational research by estimating their accuracy statistics. We will then identify the methods with the best performance characteristics. METHODS/STUDY POPULATION: We conducted a simulation study with a microarray of gene expression data. We maintained the original structure proposed by Bzdok, Altman, and Krzywinski. The structure for gene expression data includes a total of 40 genes from 20 people, of whom 10 are phenotype positive and 10 are phenotype negative. In order to find a statistical difference, 25% of the genes were set to be dysregulated across phenotype. This dysregulation forced the positive and negative phenotypes to have different mean population expressions. Additional variance was included to simulate genetic variation across the population. We also allowed for within-person correlation across genes, which was not done in the original simulations. The following methods were used to determine the number of dysregulated genes in the simulated data set: unadjusted p-values, Benjamini-Hochberg adjusted p-values, Bonferroni adjusted p-values, random forest importance levels, neural net prediction weights, and second-generation p-values. RESULTS/ANTICIPATED RESULTS: Results vary depending on whether a pre-specified significance level is used or the top 10 ranked values are taken. When all methods are given the same prior information of 10 dysregulated genes, the Benjamini-Hochberg adjusted p-values and the second-generation p-values generally outperform all other methods. We were not able to reproduce or validate the finding that random forest importance levels via a machine learning algorithm outperform classical methods. Almost uniformly, the machine learning methods did not yield improved accuracy statistics, and they depend heavily on the a priori chosen number of dysregulated genes. DISCUSSION/SIGNIFICANCE OF IMPACT: In this context, machine learning methods do not outperform standard methods. Because of this and their additional complexity, machine learning approaches would not be preferable. Of all the approaches, the second-generation p-value appears to offer significant benefit for the cost of a priori defining a region of trivially null effect sizes. The choice of an analysis method for large-scale translational data is critical to the success of any statistical investigation, and our simulations clearly highlight the various tradeoffs among the available methods.
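
Two of the classical corrections compared above are easy to state in code. The sketch below implements the standard Bonferroni and Benjamini-Hochberg adjusted p-values in plain Python. It is not the authors' simulation code, and the four input p-values are made-up numbers chosen only to show how the two adjustments differ.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (step-up false discovery rate control)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the result.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical unadjusted p-values for four genes.
p_values = [0.01, 0.04, 0.03, 0.20]
bonf = bonferroni(p_values)
bh = benjamini_hochberg(p_values)
print(bonf)
print(bh)
```

At a 0.05 cutoff, Bonferroni keeps only the first gene in this toy input, while Benjamini-Hochberg also leaves the second and third just above the cutoff rather than far from it, which illustrates why the two procedures can rank as differently powered in a simulation.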
