Metabolic pathway prediction using non-negative matrix factorization with improved precision

2020 ◽  
Author(s):  
Abdur Rahman M. A. Basher ◽  
Ryan J. McLaughlin ◽  
Steven J. Hallam

Abstract: Machine learning provides a probabilistic framework for metabolic pathway inference from genomic sequence information at different levels of complexity and completion. However, several challenges, including pathway feature engineering, multiple mapping of enzymatic reactions, and emergent or distributed metabolism within populations or communities of cells, can limit prediction performance. In this paper, we present triUMPF, triple non-negative matrix factorization (NMF) with community detection for metabolic pathway inference, which combines three stages of NMF to capture myriad relationships between enzymes and pathways within a graph network. This is followed by community detection to extract higher-order structure based on the clustering of vertices that share similar statistical properties. We evaluated triUMPF performance using experimental datasets manifesting diverse multi-label properties, including Tier 1 genomes from the BioCyc collection of organismal Pathway/Genome Databases and low-complexity microbial communities. The resulting performance metrics equaled or exceeded those of other prediction methods on organismal genomes, with improved precision on multi-organismal datasets.
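
As a rough illustration of the NMF building block mentioned in the abstract above, the following is a minimal sketch of a single factorization using the standard multiplicative update rules. It is not triUMPF's three-stage procedure; the toy matrix `X`, the function name `nmf`, and all parameter values are assumptions made for illustration only.

```python
import numpy as np

def nmf(X, rank, n_iter=500, eps=1e-9, seed=0):
    """Factor a non-negative matrix X (n x m) into W (n x rank) and H (rank x m)
    using multiplicative updates for the Frobenius-norm objective."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Each update keeps the factors non-negative by construction.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical toy association matrix (rows: pathways, columns: enzymes).
X = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

W, H = nmf(X, rank=2)
reconstruction_error = np.linalg.norm(X - W @ H)
print("reconstruction error:", reconstruction_error)
```

The rows of `W` and columns of `H` can be read as low-dimensional descriptions of pathways and enzymes respectively; a full method would go on to combine several such factorizations.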

2017 ◽  
Vol 381 ◽  
pp. 304-321 ◽  
Author(s):  
Xiao Liu ◽  
Wenjun Wang ◽  
Dongxiao He ◽  
Pengfei Jiao ◽  
Di Jin ◽  
...  

2020 ◽  
Author(s):  
Abdur Rahman M. A. Basher ◽  
Steven J. Hallam

Abstract: Metabolic pathway reconstruction from genomic sequence information is a key step in predicting regulatory and functional potential of cells at the individual, population and community levels of organization. Although the most common methods for metabolic pathway reconstruction are gene-centric, e.g. mapping annotated proteins onto known pathways using a reference database, pathway-centric methods based on heuristics or machine learning to infer pathway presence provide a powerful engine for hypothesis generation in biological systems. Such methods rely on rule sets or rich feature information that may not be known or readily accessible. Here, we present pathway2vec, a software package consisting of six representational learning based modules used to automatically generate features for pathway inference. Specifically, we build a three-layered network composed of compounds, enzymes, and pathways, where nodes within a layer manifest inter-interactions and nodes between layers manifest betweenness interactions. This layered architecture captures relevant relationships used to learn a neural embedding-based low-dimensional space of metabolic features. We benchmark pathway2vec performance based on node-clustering, embedding visualization and pathway prediction using MetaCyc as a trusted source. In the pathway prediction task, results indicate that it is possible to leverage embeddings to improve pathway prediction outcomes. Availability and implementation: The software package and installation instructions are published on github.com/[email protected]


2020 ◽  
Vol 24 (1) ◽  
pp. 119-139
Author(s):  
Shuaihui Wang ◽  
Guopeng Li ◽  
Guyu Hu ◽  
Hao Wei ◽  
Yu Pan ◽  
...  

Author(s):  
Abdur Rahman M. A. Basher ◽  
Steven J Hallam

Abstract. Motivation: Metabolic pathway reconstruction from genomic sequence information is a key step in predicting regulatory and functional potential of cells at the individual, population and community levels of organization. Although the most common methods for metabolic pathway reconstruction are gene-centric, e.g. mapping annotated proteins onto known pathways using a reference database, pathway-centric methods based on heuristics or machine learning to infer pathway presence provide a powerful engine for hypothesis generation in biological systems. Such methods rely on rule sets or rich feature information that may not be known or readily accessible. Results: Here, we present pathway2vec, a software package consisting of six representational learning modules used to automatically generate features for pathway inference. Specifically, we build a three-layered network composed of compounds, enzymes and pathways, where nodes within a layer manifest inter-interactions and nodes between layers manifest betweenness interactions. This layered architecture captures relevant relationships used to learn a neural embedding-based low-dimensional space of metabolic features. We benchmark pathway2vec performance based on node-clustering, embedding visualization and pathway prediction using MetaCyc as a trusted source. In the pathway prediction task, results indicate that it is possible to leverage embeddings to improve prediction outcomes. Availability and implementation: The software package and installation instructions are published on http://github.com/pathway2vec. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
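
To make the layered-network idea in the abstract above more concrete, here is a minimal sketch of a three-layer graph of compounds, enzymes and pathways, together with uniform random walks over it. Using random walks as input to an embedding model is an assumption about how such embeddings are commonly learned, not a description of pathway2vec's modules; every node name, edge, and parameter below is hypothetical.

```python
import random

# Hypothetical nodes, each tagged with the layer it belongs to.
layer = {
    "C1": "compound", "C2": "compound",
    "E1": "enzyme", "E2": "enzyme",
    "P1": "pathway",
}

# Hypothetical edges: some connect nodes within a layer, others across layers.
edges = [
    ("C1", "C2"), ("E1", "E2"),   # within-layer interactions
    ("C1", "E1"), ("C2", "E2"),   # compound-enzyme links
    ("E1", "P1"), ("E2", "P1"),   # enzyme-pathway links
]

# Undirected adjacency list.
adj = {node: [] for node in layer}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

def random_walk(start, length, rng):
    """Uniform random walk of `length` nodes starting from `start`."""
    walk = [start]
    while len(walk) < length:
        walk.append(rng.choice(adj[walk[-1]]))
    return walk

rng = random.Random(0)
walks = [random_walk(node, 5, rng) for node in adj for _ in range(3)]
print(len(walks), "walks sampled")
```

Each walk is a sequence of node identifiers that mixes the three layers; such sequences could then be passed to a skip-gram style model to obtain node embeddings.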


2016 ◽  
Vol 30 (20) ◽  
pp. 1650130 ◽  
Author(s):  
Xiao Liu ◽  
Yi-Ming Wei ◽  
Jian Wang ◽  
Wen-Jun Wang ◽  
Dong-Xiao He ◽  
...  

Community detection is an important task in the analysis of complex networks and has received considerable attention in various domains. Numerous studies have proposed methods for community detection. One particularly attractive approach is the two-step method, which first preprocesses the network and then identifies its communities. However, not all types of methods achieve satisfactory results with such a preprocessing strategy; non-negative matrix factorization (NMF) methods are one example. In this paper, rather than using the two-step method as most works do, we propose a graph-regularized model, namely NMFGR, specifically to improve NMF-based methods for community detection. In NMFGR, we introduce a similarity metric that contains both global and local information about the network to reflect the relationship between two nodes, so as to improve the accuracy of community detection. Experimental results on both artificial and real-world networks demonstrate the superior performance of NMFGR over some competing methods.
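
The general idea of graph-regularized NMF for community detection can be sketched as follows. This is not the NMFGR model itself, whose similarity metric is not specified in the abstract; it uses the widely known multiplicative updates for graph-regularized NMF, and the similarity matrix `S` (direct links plus scaled two-step paths), the toy graph, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

def graph_regularized_nmf(A, S, k, lam=0.1, n_iter=300, eps=1e-9, seed=0):
    """Approximate A by U @ V.T with non-negative factors, while a graph
    penalty lam * trace(V.T @ (D - S) @ V) pulls similar nodes together."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    U = rng.random((n, k)) + eps
    V = rng.random((n, k)) + eps
    D = np.diag(S.sum(axis=1))  # degree matrix of the similarity graph
    for _ in range(n_iter):
        U *= (A @ V) / (U @ (V.T @ V) + eps)
        V *= (A.T @ U + lam * (S @ V)) / (V @ (U.T @ U) + lam * (D @ V) + eps)
    return U, V

# Hypothetical toy network: two 4-node cliques joined by a single edge.
n = 8
A = np.zeros((n, n))
for group in (range(0, 4), range(4, 8)):
    for i in group:
        for j in group:
            if i != j:
                A[i, j] = 1.0
A[3, 4] = A[4, 3] = 1.0

# Assumed similarity: direct links (local) plus scaled two-step paths (wider context).
S = A + (A @ A) / n
np.fill_diagonal(S, 0.0)

U, V = graph_regularized_nmf(A, S, k=2)
labels = V.argmax(axis=1)  # each node joins the community with the largest weight
print(labels)
```

Setting `lam` to zero reduces this to plain NMF of the adjacency matrix, which makes it easy to compare results with and without the graph penalty.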

