Applying Intelligent Computing Techniques to Modeling Biological Networks from Expression Data

Abstract. Background: Molecular interaction networks summarize complex biological processes as graphs, whose structure is informative of biological function at multiple scales. Simultaneously, omics technologies measure the variation or activity of genes, proteins, or metabolites across individuals or experimental conditions. Integrating the complementary viewpoints of biological networks and omics data is an important task in bioinformatics, but existing methods treat networks as discrete structures, which are intrinsically difficult to integrate with continuous node features or activity measures. Graph neural networks map graph nodes into a low-dimensional vector space representation, and can be trained to preserve both the local graph structure and the similarity between node features. Results: We studied the representation of transcriptional, protein–protein and genetic interaction networks in E. coli and mouse using graph neural networks. We found that such representations explain a large proportion of variation in gene expression data, and that using gene expression data as node features improves the reconstruction of the graph from the embedding. We further proposed a new end-to-end Graph Feature Auto-Encoder framework for the prediction of node features utilizing the structure of the gene networks, which is trained on the feature prediction task, and showed that it performs better at predicting unobserved node features than regular MultiLayer Perceptrons. When applied to the problem of imputing missing data in single-cell RNAseq data, the Graph Feature Auto-Encoder utilizing our new graph convolution layer called FeatGraphConv outperformed a state-of-the-art imputation method that does not use protein interaction information, showing the benefit of integrating biological networks and omics data with our proposed approach.
Conclusion: Our proposed Graph Feature Auto-Encoder framework is a powerful approach for integrating and exploiting the close relation between molecular interaction networks and functional genomics data.
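
The idea of letting network structure inform node features can be illustrated with a single neighbour-averaging step, the kind of message passing that graph convolution layers build on. This is only a minimal sketch under stated assumptions: the toy network, the gene names, and the function `mean_aggregate` are hypothetical, there are no learned weights, and it is not the FeatGraphConv layer or the Graph Feature Auto-Encoder described in the abstract.

```python
def mean_aggregate(adjacency, features):
    """Replace each node's feature vector by the mean over itself and its neighbours."""
    smoothed = {}
    for node, neighbours in adjacency.items():
        group = [node] + list(neighbours)  # include the node itself (self-loop)
        dim = len(features[node])
        smoothed[node] = [
            sum(features[g][d] for g in group) / len(group) for d in range(dim)
        ]
    return smoothed


# Hypothetical toy interaction network (undirected, stored as adjacency lists)
adjacency = {"geneA": ["geneB", "geneC"], "geneB": ["geneA"], "geneC": ["geneA"]}
# Hypothetical expression values: two conditions per gene
features = {"geneA": [1.0, 0.0], "geneB": [3.0, 2.0], "geneC": [2.0, 4.0]}

smoothed = mean_aggregate(adjacency, features)
print(smoothed)
```

In a trained model, an aggregation of this kind would typically be followed by a learned linear map and a nonlinearity, and stacked over several layers.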

Clustering analysis of tumor metabolic networks

BMC Bioinformatics ◽

10.1186/s12859-020-03564-9 ◽

2020 ◽

Vol 21 (S10) ◽

Author(s):

Ichcha Manipur ◽

Ilaria Granata ◽

Lucia Maddalena ◽

Mario R. Guarracino

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Protein Interactions ◽

Biological Networks ◽

Clustering Analysis ◽

Large Scale ◽

Metabolic Networks ◽

Computational Time ◽

Expression Data ◽

Metabolic Models

Abstract. Background: Biological networks represent the diverse molecular interactions that occur within cells. Commonly studied biological networks include protein-protein interaction networks, gene regulatory networks, and metabolic pathways. Among these, metabolic networks are probably the most studied, as they directly influence all physiological processes. Exploring biochemical pathways through a multigraph representation is important for understanding complex regulatory mechanisms. Feature extraction and clustering of these networks enable grouping of samples obtained from different biological specimens. Clustering techniques separate networks according to their mutual similarity. Results: We present a clustering analysis of tissue-specific metabolic networks for single samples from three primary tumor sites: breast, lung, and kidney cancer. The metabolic networks were obtained by integrating genome-scale metabolic models with gene expression data. We simplified the networks to reduce the time needed to compute network distances. We showed empirically that network clustering can characterize groups of patients in multiple conditions. Conclusions: We provide a computational methodology to explore and characterize the metabolic landscape of tumors, and thus a general methodology for integrating analytic metabolic models with gene expression data. This method represents a first attempt at clustering large-scale metabolic networks. Moreover, the approach makes it possible to obtain valuable information on the effects of different conditions on overall metabolism.

Improvement of cancer subtype prediction by incorporating transcriptome expression data and heterogeneous biological networks

BMC Medical Genomics ◽

10.1186/s12920-018-0435-x ◽

2018 ◽

Vol 11 (S6) ◽

Cited By ~ 3

Author(s):

Yang Guo ◽

Yang Qi ◽

Zhanhuai Li ◽

Xuequn Shang

Keyword(s):

Biological Networks ◽

Expression Data ◽

Cancer Subtype ◽

Transcriptome Expression

A Fast Quad-Tree Based Two Dimensional Hierarchical Clustering

Bioinformatics and Biology Insights ◽

10.4137/bbi.s10383 ◽

2012 ◽

Vol 6 ◽

pp. BBI.S10383

Author(s):

Priscilla Rajadurai ◽

Swamynathan Sankaranarayanan

Keyword(s):

Gene Expression ◽

Hierarchical Clustering ◽

Gene Expression Data ◽

Biological Networks ◽

Processing Time ◽

Clustering Algorithm ◽

Expression Patterns ◽

Expression Data ◽

Important Time ◽

Analogous Expression

Microarray technologies have recently become a robust technique in genomics. An important step in the analysis of gene expression data is the identification of groups of genes with analogous expression patterns. Cluster analysis partitions a given dataset into groups based on specified features. Euclidean distance is a widely used similarity measure for gene expression data that reflects the magnitude of changes in gene expression. However, the huge number of genes and the intricacy of biological networks make the resulting groups harder to comprehend and interpret, and increase processing time. The proposed technique is a fast two-dimensional hierarchical clustering algorithm based on a quad-tree (QT). Constructing the closest-pair data structure at each level is an important time factor that determines the processing time of clustering. The proposed model reduces processing time and improves the analysis of gene expression data.
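
To make the role of the closest-pair search concrete, here is a naive agglomerative clustering sketch using Euclidean distance between cluster centroids. It is an illustration under assumptions: the function names and the toy profiles are hypothetical, and the closest pair is found by brute force at every level, which is exactly the cost a quad-tree index would aim to reduce. The quad-tree structure itself is not reproduced here.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(cluster):
    """Component-wise mean of the points in a cluster."""
    dim = len(cluster[0])
    return [sum(p[d] for p in cluster) / len(cluster) for d in range(dim)]


def agglomerate(points, n_clusters):
    """Merge the closest pair of clusters until n_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None
        # Brute-force closest-pair search over all cluster pairs
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = euclidean(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical two-dimensional expression profiles
profiles = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
clusters = agglomerate(profiles, 2)
print(clusters)
```

Each level of this loop scans every pair of clusters, so the search dominates the running time as the number of genes grows.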

Exploring the Operational Characteristics of Inference Algorithms for Transcriptional Networks by Means of Synthetic Data

Artificial Life ◽

10.1162/artl.2008.14.1.49 ◽

2008 ◽

Vol 14 (1) ◽

pp. 49-63 ◽

Cited By ~ 1

Author(s):

Koenraad Van Leemput ◽

Tim Van den Bulcke ◽

Thomas Dhollander ◽

Bart De Moor ◽

Kathleen Marchal ◽

...

Keyword(s):

Biological Networks ◽

Regulatory Networks ◽

Structure Learning ◽

Synthetic Data ◽

Network Size ◽

Transcriptional Networks ◽

Data Sets ◽

Expression Data ◽

Operational Characteristics ◽

Inference Algorithms

The development of structure-learning algorithms for gene regulatory networks depends heavily on the availability of synthetic data sets that contain both the original network and associated expression data. This article reports the application of SynTReN, an existing network generator that samples topologies from existing biological networks and uses Michaelis-Menten and Hill enzyme kinetics to simulate gene interactions. We illustrate the effects of different aspects of the expression data on the quality of the inferred network. The tested expression data parameters are network size, network topology, type and degree of noise, quantity of expression data, and interaction types between genes. This is done by applying three well-known inference algorithms to SynTReN data sets. The results show the power of synthetic data in revealing operational characteristics of inference algorithms that are unlikely to be discovered by means of biological microarray data only.
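
For readers unfamiliar with the kinetics named above, the standard Hill function gives a saturating response to a regulator's concentration, and Michaelis-Menten kinetics is the special case with Hill coefficient 1. The sketch below shows only this textbook form. The function name and parameter values are hypothetical, and the exact equations and parameters used by SynTReN are not given in the abstract.

```python
def hill_activation(regulator, v_max=1.0, k_half=1.0, n=1.0):
    """Hill response: v_max * s^n / (k_half^n + s^n).

    With n = 1 this reduces to the Michaelis-Menten form
    v_max * s / (k_half + s).
    """
    return v_max * regulator ** n / (k_half ** n + regulator ** n)


# Half-maximal response when the regulator level equals k_half
half = hill_activation(1.0)
# A larger Hill coefficient gives a steeper, more switch-like response
steep = hill_activation(2.0, n=2.0)
print(half, steep)
```

A repressing interaction is commonly modelled with the complementary form, in which the response falls from `v_max` toward zero as the regulator level rises.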

XGRN: Reconstruction of Biological Networks Based on Boosted Trees Regression

Computation ◽

10.3390/computation9040048 ◽

2021 ◽

Vol 9 (4) ◽

pp. 48

Author(s):

Georgios N. Dimitrakopoulos

Keyword(s):

Gene Expression ◽

Regression Model ◽

Gene Expression Data ◽

Biological Networks ◽

High Performance ◽

Regulatory Networks ◽

Target Genes ◽

Biological Information ◽

Expression Data ◽

Gene Regulatory

In Systems Biology, the complex relationships between different entities in the cells are modeled and analyzed using networks. To this end, a rich variety of gene regulatory network (GRN) inference algorithms has been developed in recent years. However, most algorithms rely solely on gene expression data to reconstruct the network. Because unrelated genes can have similar expression profiles, predictions can contain connections between biologically unrelated genes. Therefore, previously known biological information, such as experimentally validated interactions between transcription factors and target genes, should also be considered by computational methods to obtain more consistent results. In this work, we propose XGBoost for gene regulatory networks (XGRN), a supervised algorithm that combines gene expression data with previously known interactions for GRN inference. The key idea of our method is to train a regression model for each known interaction of the network and then use this model to predict new interactions. The regression is performed by XGBoost, a state-of-the-art algorithm using an ensemble of decision trees. In detail, XGRN learns a regression model based on the gene expression of the two interactors and then provides predictions using as input the gene expression of other candidate interactors. Application to benchmark datasets and to a large real single-cell RNA-Seq experiment yielded high performance compared with other unsupervised and supervised methods, demonstrating the ability of XGRN to provide reliable predictions.

Co-clustering of biological networks and gene expression data

Bioinformatics ◽

10.1093/bioinformatics/18.suppl_1.s145 ◽

2002 ◽

Vol 18 (Suppl 1) ◽

pp. S145-S154 ◽

Cited By ~ 176

Author(s):

D. Hanisch ◽

A. Zien ◽

R. Zimmer ◽

T. Lengauer

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Biological Networks ◽

Expression Data

Calculating the Statistical Significance of Changes in Pathway Activity From Gene Expression Data

Statistical Applications in Genetics and Molecular Biology ◽

10.2202/1544-6115.1055 ◽

2004 ◽

Vol 3 (1) ◽

pp. 1-29 ◽

Cited By ~ 58

Author(s):

Jörg Rahnenführer ◽

Francisco S Domingues ◽

Jochen Maydt ◽

Thomas Lengauer

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Biological Networks ◽

Permutation Test ◽

Statistical Significance ◽

Data Sets ◽

Expression Data ◽

Biologically Relevant ◽

Gene Sets ◽

Best Fitting

We present a statistical approach to scoring changes in activity of metabolic pathways from gene expression data. The method identifies the biologically relevant pathways with corresponding statistical significance. Based on gene expression data alone, only local structures of genetic networks can be recovered. Instead of inferring such a network, we propose a hypothesis-based approach. We use given knowledge about biological networks to improve sensitivity and interpretability of findings from microarray experiments.

Recently introduced methods test if members of predefined gene sets are enriched in a list of top-ranked genes in a microarray study. We improve this approach by defining scores that depend on all members of the gene set and that also take pairwise co-regulation of these genes into account. We calculate the significance of co-regulation of gene sets with a nonparametric permutation test. On two data sets the method is validated and its biological relevance is discussed. It turns out that useful measures for co-regulation of genes in a pathway can be identified adaptively.

We refine our method in two aspects specific to pathways. First, to overcome the ambiguity of enzyme-to-gene mappings for a fixed pathway, we introduce algorithms for selecting the best fitting gene for a specific enzyme in a specific condition. In selected cases, functional assignment of genes to pathways is feasible. Second, the sensitivity of detecting relevant pathways is improved by integrating information about pathway topology. The distance of two enzymes is measured by the number of reactions needed to connect them, and enzyme pairs with a smaller distance receive a higher weight in the score calculation.
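
A permutation test for gene-set co-regulation can be sketched as follows. The choices here are assumptions made for illustration, not the scores defined in the article: co-regulation is taken as the mean pairwise Pearson correlation within the set, the null distribution is obtained by permuting gene labels (equivalent to drawing random gene sets of the same size), and the expression values and gene names are hypothetical. The topology weighting by enzyme distance is not included.

```python
import itertools
import random


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def coregulation_score(expression, gene_set):
    """Mean pairwise correlation over all gene pairs in the set."""
    pairs = list(itertools.combinations(gene_set, 2))
    return sum(pearson(expression[g], expression[h]) for g, h in pairs) / len(pairs)


def permutation_p_value(expression, gene_set, n_perm=1000, seed=0):
    """Compare the observed score with scores of label-permuted gene sets."""
    rng = random.Random(seed)
    observed = coregulation_score(expression, gene_set)
    genes = list(expression)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(genes)  # permute gene labels
        if coregulation_score(expression, genes[: len(gene_set)]) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical expression matrix: gene -> values across five samples
expression = {
    "g1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "g2": [2.0, 4.0, 6.0, 8.0, 10.5],
    "g3": [1.1, 1.9, 3.2, 3.9, 5.1],
    "g4": [5.0, 1.0, 4.0, 2.0, 3.0],
    "g5": [2.0, 5.0, 1.0, 4.0, 3.0],
    "g6": [3.0, 3.5, 1.0, 5.0, 2.0],
}
observed, p_value = permutation_p_value(expression, ["g1", "g2", "g3"])
print(observed, p_value)
```

With only six genes the attainable p-values are coarse, since the tested set itself is drawn in a sizeable fraction of permutations. A realistic analysis would permute over a genome-wide background.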

Cluster Analysis of Gene Expression Data

Encyclopedia of Artificial Intelligence ◽

10.4018/978-1-59904-849-9.ch045 ◽

2011 ◽

pp. 289-296

Author(s):

Alan Wee-Chung Liew ◽

Ngai-Fong Law ◽

Hong Yan

Keyword(s):

Gene Expression ◽

Cluster Analysis ◽

Gene Expression Data ◽

Biological Networks ◽

Microarray Experiment ◽

Hybridization Experiment ◽

Cancer Prognosis ◽

Expression Data ◽

Functional Roles ◽

Exploratory Technique

Important insights into gene function can be gained by gene expression analysis. For example, some genes are turned on (expressed) or turned off (repressed) when there is a change in external conditions or stimuli. The expression of one gene is often regulated by the expression of other genes. A detailed analysis of gene expression information provides an understanding of the inter-networking of different genes and their functional roles. DNA microarray technology allows massively parallel, high-throughput, genome-wide profiling of gene expression in a single hybridization experiment (Lockhart & Winzeler, 2000). It has been widely used in numerous studies over a broad range of biological disciplines, such as cancer classification (Armstrong et al., 2002), identification of genes relevant to a certain diagnosis or therapy (Muro et al., 2003), and investigation of the mechanism of drug action and cancer prognosis (Kim et al., 2000; Duggan et al., 1999). Because of the large number of genes involved in microarray experiments and the complexity of biological networks, clustering is an important exploratory technique for gene expression data analysis. In this article, we present a succinct review of some of our work in cluster analysis of gene expression data.

Contextual Hub Analysis Tool (CHAT): A Cytoscape app for identifying contextually relevant hubs in biological networks

F1000Research ◽

10.12688/f1000research.9118.1 ◽

2016 ◽

Vol 5 ◽

pp. 1745 ◽

Cited By ~ 10

Author(s):

Tanja Muetze ◽

Ivan H. Goenawan ◽

Heather L. Wiencko ◽

Manuel Bernal-Llinares ◽

Kenneth Bryan ◽

...

Keyword(s):

Viral Infection ◽

Biological Networks ◽

Contextual Information ◽

Gene List ◽

Differentially Expressed ◽

Analysis Tool ◽

P Availability ◽

Expression Data ◽

Biologically Relevant

Highly connected nodes (hubs) in biological networks are topologically important to the structure of the network and have also been shown to be preferentially associated with a range of phenotypes of interest. The relative importance of a hub node, however, can change depending on the biological context. Here, we report a Cytoscape app, the Contextual Hub Analysis Tool (CHAT), which enables users to easily construct and visualize a network of interactions from a gene list of interest, integrate contextual information, such as gene expression data, and identify hub nodes that are more highly connected to contextual nodes (e.g. genes that are differentially expressed) than expected by chance. In a case study, we use CHAT to construct a network of genes that are differentially expressed in Dengue fever, a viral infection. CHAT was used to identify and compare contextual and degree-based hubs in this network. The top 20 degree-based hubs were enriched in pathways related to the cell cycle and cancer, which is likely due to the fact that proteins involved in these processes tend to be highly connected in general. In comparison, the top 20 contextual hubs were enriched in pathways commonly observed in a viral infection including pathways related to the immune response to viral infection. This analysis shows that such contextual hubs are considerably more biologically relevant than degree-based hubs and that analyses which rely on the identification of hubs solely based on their connectivity may be biased towards nodes that are highly connected in general rather than in the specific context of interest. Availability: CHAT is available for Cytoscape 3.0+ and can be installed via the Cytoscape App Store (http://apps.cytoscape.org/apps/chat).
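
One way to ask whether a hub has more contextual neighbours "than expected by chance" is a hypergeometric upper-tail test, sketched below. Treating the null as hypergeometric is an assumption made for illustration, since the abstract does not state the statistic CHAT uses. The function name and the network counts are hypothetical, and the sketch does not exclude the hub itself from the background.

```python
from math import comb


def hypergeom_upper_tail(k, total, contextual, degree):
    """P(X >= k) when `degree` neighbours are drawn without replacement
    from `total` nodes, of which `contextual` are contextual."""
    upper = min(contextual, degree)
    favourable = sum(
        comb(contextual, i) * comb(total - contextual, degree - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(total, degree)


# Hypothetical network: 100 nodes, 20 of them differentially expressed.
# A hub with 10 neighbours, 6 of which are differentially expressed.
p_hub = hypergeom_upper_tail(6, total=100, contextual=20, degree=10)
print(p_hub)
```

Under this null, a node with many neighbours is flagged only if an unusually large share of them are contextual, which is what separates a contextual hub from a purely degree-based one.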
