Impute Gene Expression Missing Values via Biological Networks: Optimal Fusion of Data and Knowledge

Author(s):  
Mingrong Xiang ◽  
Jingyu Hou ◽  
Wei Luo ◽  
Wenjing Tao ◽  
Deshou Wang
2019 ◽  
Author(s):  
Prasad U. Bandodkar ◽  
Hadel Al Asafen ◽  
Gregory T. Reeves

Abstract: A feed-forward loop (FFL) is commonly observed in several biological networks. The FFL network motif has mostly been studied with respect to variation of the input signal in time, with only a few studies of FFL activity in a spatially distributed system such as morphogen-mediated tissue patterning. However, most morphogen gradients also evolve in time. We studied the spatiotemporal behavior of a coherent FFL in two contexts: (1) a generic, oscillating morphogen gradient and (2) the dorsal-ventral patterning of the early Drosophila embryo by a gradient of the NF-κB homolog Dorsal with its early target Twist. In both models, we found features in the dynamics of the intermediate node – phase difference and noise filtering – that were largely independent of the parameterization of the models, and thus were functions of the structure of the FFL itself. In the Dorsal gradient model, we also found that the dynamics of Dorsal require the maternal pioneering factor Zelda for proper target gene expression.
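As a rough illustration of how the structure of a coherent FFL can filter brief input fluctuations, the sketch below integrates a textbook-style toy model with the Euler method. It is not the authors' model: the AND-gate logic, the unit production and decay rates, the threshold of 0.5, and the function name `simulate_ffl` are all assumptions made for this example.

```python
def simulate_ffl(pulse_length, t_end=20.0, dt=0.001, threshold=0.5):
    """Euler-integrate a toy coherent FFL (X -> Y, X AND Y -> Z).

    X is a square pulse of the given length. Returns the peak level of Z.
    All rates and the threshold are illustrative assumptions.
    """
    y = 0.0
    z = 0.0
    peak_z = 0.0
    steps = int(t_end / dt)
    for i in range(steps):
        t = i * dt
        x = 1.0 if t < pulse_length else 0.0
        # Y is produced while X is on and decays at unit rate.
        dy = x - y
        # Z is produced only while X is on AND Y exceeds its threshold.
        production = 1.0 if (x > 0.5 and y > threshold) else 0.0
        dz = production - z
        y += dt * dy
        z += dt * dz
        peak_z = max(peak_z, z)
    return peak_z


short_pulse_peak = simulate_ffl(pulse_length=0.3)  # brief input fluctuation
long_pulse_peak = simulate_ffl(pulse_length=5.0)   # persistent input
```

In this toy setting, the short pulse ends before Y reaches its threshold, so Z is never produced, whereas the persistent input drives Z close to its maximum.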


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Ramin Hasibi ◽  
Tom Michoel

Abstract: Background. Molecular interaction networks summarize complex biological processes as graphs, whose structure is informative of biological function at multiple scales. Simultaneously, omics technologies measure the variation or activity of genes, proteins, or metabolites across individuals or experimental conditions. Integrating the complementary viewpoints of biological networks and omics data is an important task in bioinformatics, but existing methods treat networks as discrete structures, which are intrinsically difficult to integrate with continuous node features or activity measures. Graph neural networks map graph nodes into a low-dimensional vector space representation, and can be trained to preserve both the local graph structure and the similarity between node features. Results. We studied the representation of transcriptional, protein–protein and genetic interaction networks in E. coli and mouse using graph neural networks. We found that such representations explain a large proportion of variation in gene expression data, and that using gene expression data as node features improves the reconstruction of the graph from the embedding. We further proposed a new end-to-end Graph Feature Auto-Encoder framework for the prediction of node features utilizing the structure of the gene networks, which is trained on the feature prediction task, and showed that it performs better at predicting unobserved node features than regular MultiLayer Perceptrons. When applied to the problem of imputing missing data in single-cell RNAseq data, the Graph Feature Auto-Encoder utilizing our new graph convolution layer called FeatGraphConv outperformed a state-of-the-art imputation method that does not use protein interaction information, showing the benefit of integrating biological networks and omics data with our proposed approach.
Conclusion. Our proposed Graph Feature Auto-Encoder framework is a powerful approach for integrating and exploiting the close relation between molecular interaction networks and functional genomics data.
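To make the idea of using network structure to predict unobserved node features concrete, here is a minimal sketch of the simplest possible neighbor aggregation: a missing expression value is replaced by the mean of the observed values of the gene's network neighbors. This is not the Graph Feature Auto-Encoder or the FeatGraphConv layer described above, which are trained models; the function name `impute_from_neighbors` and the toy gene names and values are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def impute_from_neighbors(expression, edges):
    """Fill missing (None) values with the mean of observed neighbor values.

    expression: dict mapping gene name -> float or None (missing).
    edges: iterable of (gene_a, gene_b) pairs from an undirected network.
    Genes with no observed neighbors are left as None.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    imputed = dict(expression)
    for gene, value in expression.items():
        if value is not None:
            continue
        observed = [
            expression[n]
            for n in neighbors[gene]
            if expression.get(n) is not None
        ]
        if observed:
            imputed[gene] = fmean(observed)
    return imputed


# Toy data (hypothetical): geneB is missing and has two observed neighbors;
# geneD is missing and has no neighbors in the network.
expression = {"geneA": 2.0, "geneB": None, "geneC": 4.0, "geneD": None}
edges = [("geneA", "geneB"), ("geneB", "geneC")]
result = impute_from_neighbors(expression, edges)
```

Here `result["geneB"]` becomes the mean of its two neighbors, while `geneD` stays missing because the network offers no information about it. A learned graph convolution replaces this fixed average with trainable weights.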


Genomics ◽  
2020 ◽  
Vol 112 (6) ◽  
pp. 5072-5085
Author(s):  
Abdulrahman Mujalli ◽  
Babajan Banaganapalli ◽  
Nuha Mohammad Alrayes ◽  
Noor A. Shaik ◽  
Ramu Elango ◽  
...  

Author(s):  
Kohbalan Moorthy ◽  
Aws Naser Jaber ◽  
Mohd Arfian Ismail ◽  
Ferda Ernawan ◽  
Mohd Saberi Mohamad ◽  
...  

Blood ◽  
2008 ◽  
Vol 112 (11) ◽  
pp. sci-51-sci-51
Author(s):  
Todd R. Golub

Genomics holds particular potential for the elucidation of biological networks that underlie disease. For example, gene expression profiles have been used to classify human cancers, and have more recently been used to predict graft rejection following organ transplantation. Such signatures thus hold promise both as diagnostic approaches and as tools with which to dissect biological mechanism. Such systems-based approaches are also beginning to impact the drug discovery process. For example, it is now feasible to measure gene expression signatures at low cost and high throughput, thereby allowing for the screening of small-molecule libraries in order to identify compounds capable of perturbing a signature of interest (even if the critical drivers of that signature are not yet known). This approach, known as Gene Expression-Based High Throughput Screening (GE-HTS), has been shown to identify candidate therapeutic approaches in AML, Ewing sarcoma, and neuroblastoma, and has identified tool compounds capable of inhibiting PDGF receptor signaling. A related approach, known as the Connectivity Map (www.broad.mit.edu/cmap), attempts to use gene expression profiles as a universal language with which to connect cellular states, gene product function, and drug action. In this manner, a gene expression signature of interest is used to computationally query a database of gene expression profiles of cells systematically treated with a large number of compounds (e.g., all off-patent FDA-approved drugs), thereby identifying potential new applications for existing drugs. Such systems-level approaches thus seek chemical modulators of cellular states, even when the molecular basis of such altered states is unknown.
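The signature-query idea can be sketched in a few lines. The example below is a deliberately simplified stand-in, not the scoring statistic actually used by the Connectivity Map: it scores each reference profile as the mean expression change of the signature's up-regulated genes minus that of its down-regulated genes, then ranks compounds so that those most strongly reversing the signature come first. The function names, gene names, compound names, and values are all hypothetical.

```python
from statistics import fmean


def connectivity_score(profile, up_genes, down_genes):
    """Simplified score: mean change of up genes minus mean change of down genes.

    A strongly negative score means the compound tends to reverse the signature.
    """
    up = fmean([profile[g] for g in up_genes])
    down = fmean([profile[g] for g in down_genes])
    return up - down


def rank_compounds(reference, up_genes, down_genes):
    """Return compound names ordered from most reversing to most mimicking."""
    scores = {
        name: connectivity_score(profile, up_genes, down_genes)
        for name, profile in reference.items()
    }
    return sorted(scores, key=scores.get)


# Toy reference database (hypothetical): expression changes per compound.
reference = {
    "compound_1": {"g1": -2.0, "g2": -1.0, "g3": 1.5},
    "compound_2": {"g1": 1.0, "g2": 2.0, "g3": -1.0},
}
# Query signature: g1 and g2 are up in the state of interest, g3 is down.
ranking = rank_compounds(reference, up_genes=["g1", "g2"], down_genes=["g3"])
```

With this toy data, `compound_1` ranks first because it lowers the signature's up-regulated genes and raises its down-regulated gene.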


2017 ◽  
Vol 14 (1) ◽  
Author(s):  
Hamid Hamzeiy ◽  
Rabia Suluyayla ◽  
Christoph Brinkrolf ◽  
Sebastian Jan Janowski ◽  
Ralf Hofestaedt ◽  
...  

Abstract: MicroRNAs (miRNAs) are small RNA molecules which are known to take part in post-transcriptional regulation of gene expression. Here, VANESA, an existing platform for reconstructing, visualizing, and analyzing large biological networks, has been further expanded to include all experimentally validated human miRNAs available within miRBase, TarBase and miRTarBase. This is done by integrating a custom hybrid miRNA database into DAWIS-M.D., VANESA’s main data source, enabling the visualization and analysis of miRNAs within large biological pathways such as those found within the Kyoto Encyclopedia of Genes and Genomes (KEGG). Interestingly, 99.15 % of human KEGG pathways either contain genes which are targeted by miRNAs or harbor them. This is mainly due to the high number of interaction partners that each miRNA could have (e.g., hsa-miR-335-5p targets 2544 genes and 71 miRNAs target


2010 ◽  
Vol 13 (02) ◽  
pp. 217-238 ◽  
Author(s):  
GRAINNE KERR ◽  
DIMITRI PERRIN ◽  
HEATHER J. RUSKIN ◽  
MARTIN CRANE

In recent years, considerable research efforts have been directed to micro-array technologies and their role in providing simultaneous information on expression profiles for thousands of genes. These data, when subjected to clustering and classification procedures, can assist in identifying patterns and providing insight on biological processes. To understand the properties of complex gene expression datasets, graphical representations can be used. Intuitively, the data can be represented in terms of a bipartite graph, with weighted edges corresponding to gene-sample node couples in the dataset. Biologically meaningful subgraphs can be sought, but performance can be influenced both by the search algorithm and by the graph-weighting scheme, and both merit rigorous investigation. In this paper, we focus on edge-weighting schemes for bipartite graphical representation of gene expression. Two novel methods are presented: the first is based on empirical evidence; the second on a geometric distribution. The schemes are compared for several real datasets, assessing efficiency of performance based on four essential properties: robustness to noise and missing values, discrimination, parameter influence on scheme efficiency and reusability. Recommendations and limitations are briefly discussed.
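The bipartite representation can be sketched as a weighted edge list linking each gene to each sample in which it was measured. The weighting used here (the absolute expression value) is only a placeholder assumption; it is neither of the two schemes proposed in the paper, whose details are not given in the abstract. The function name `bipartite_edges` and the toy matrix are hypothetical.

```python
def bipartite_edges(matrix, genes, samples, weight=abs):
    """Build (gene, sample, weight) edges from a genes-by-samples matrix.

    Missing measurements (None) produce no edge. The default weight,
    the absolute expression value, is a placeholder weighting scheme.
    """
    edges = []
    for gene, row in zip(genes, matrix):
        for sample, value in zip(samples, row):
            if value is None:
                continue  # missing value: no gene-sample edge
            edges.append((gene, sample, weight(value)))
    return edges


# Toy expression matrix (hypothetical): rows are genes, columns are samples.
genes = ["g1", "g2"]
samples = ["s1", "s2"]
matrix = [
    [1.5, None],
    [-2.0, 0.5],
]
edge_list = bipartite_edges(matrix, genes, samples)
```

Passing a different `weight` function is the hook where an alternative edge-weighting scheme would be swapped in and compared.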

