WMGHMDA: a novel weighted meta-graph-based model for predicting human microbe-disease association on heterogeneous information network

2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Yahui Long ◽  
Jiawei Luo

Abstract Background: A growing body of biological and clinical evidence indicates that microorganisms are significantly involved in the pathological mechanisms of a wide variety of complex human diseases. Inferring potential disease-related microbes can not only promote disease prevention, diagnosis and treatment, but also provide valuable information for drug development. Because experimental methods are expensive and time-consuming, computational methods are a practical alternative. However, most existing methods are biased towards well-characterized diseases and microbes, and they are limited in predicting potential microbes for new diseases. Results: Here, we developed a novel computational model based on Weighted Meta-Graphs (WMGHMDA) to predict potential human microbe-disease associations (MDAs). We first constructed a heterogeneous information network (HIN) by combining the integrated microbe similarity network, the integrated disease similarity network and the known microbe-disease bipartite network. We then iteratively applied a pre-designed weighted meta-graph search algorithm on the HIN to uncover possible microbe-disease pairs, accumulating the contribution values of the weighted meta-graphs to each pair as its probability score. Depending on contribution potential, we described the contribution degree of each type of meta-graph to a microbe-disease pair with a bias rating; a meta-graph with a higher bias rating is assigned a greater weight when calculating probability scores. Conclusions: The experimental results showed that WMGHMDA outperformed some state-of-the-art methods, with average AUCs of 0.9288 and 0.9068 ± 0.0031 in global leave-one-out cross validation (LOOCV) and 5-fold cross validation (5-fold CV), respectively. In the case studies, 9, 19 and 37 of the top 10, 20 and 50 candidate microbes for asthma, and 10, 20 and 45 for inflammatory bowel disease (IBD), were manually verified by previous reports. Furthermore, three common human diseases (Crohn’s disease, liver cirrhosis and type 1 diabetes) were used to demonstrate that WMGHMDA can be efficiently applied to make predictions for new diseases. In summary, WMGHMDA has high potential for predicting microbe-disease associations.
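The HIN construction and the weighted accumulation of scores described in this abstract can be illustrated with a minimal sketch. It assumes the three networks are given as small dense matrices and stacks them into one block adjacency matrix; the function names `build_hin` and `weighted_score`, the toy values, and the meta-graph names are hypothetical, and the abstract does not specify the actual meta-graph search or the bias ratings.

```python
def build_hin(microbe_sim, disease_sim, assoc):
    """Stack the three networks into one block adjacency matrix.

    Layout (an assumption of this sketch): [[Sm, A], [A^T, Sd]], where
    Sm is microbe similarity, Sd is disease similarity and A is the
    known microbe-disease bipartite matrix (microbes x diseases).
    """
    n_m, n_d = len(microbe_sim), len(disease_sim)
    if len(assoc) != n_m or any(len(row) != n_d for row in assoc):
        raise ValueError("assoc must have shape (microbes, diseases)")
    # Microbe rows: similarities to microbes, then associations to diseases.
    top = [list(microbe_sim[i]) + list(assoc[i]) for i in range(n_m)]
    # Disease rows: transposed associations, then similarities to diseases.
    bottom = [
        [assoc[i][j] for i in range(n_m)] + list(disease_sim[j])
        for j in range(n_d)
    ]
    return top + bottom


def weighted_score(contributions, weights):
    """Sum per-meta-graph contributions, each scaled by its weight."""
    return sum(weights[name] * value for name, value in contributions.items())


# Toy data: 2 microbes, 3 diseases (values are illustrative only).
microbe_sim = [[1.0, 0.3], [0.3, 1.0]]
disease_sim = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 1.0]]
assoc = [[1, 0, 0], [0, 1, 0]]

hin = build_hin(microbe_sim, disease_sim, assoc)
# Hypothetical meta-graph names and weights for one microbe-disease pair.
score = weighted_score({"mg1": 0.5, "mg2": 0.25}, {"mg1": 2.0, "mg2": 4.0})
```

The block layout keeps microbes and diseases in a single node index space, which is one common way to represent a two-type HIN; other encodings, such as typed edge lists, would serve equally well.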

2020 ◽  
Author(s):  
Bo-Ya Ji ◽  
Zhu-Hong You ◽  
Han-Jing Jiang ◽  
Zhen-Hao Guo ◽  
Kai Zheng

Abstract Background: The prediction of potential drug-protein target interactions (DTIs) not only provides a better understanding of biological processes but is also critical for identifying new drugs. However, because traditional experiments are expensive and time-consuming, only a small fraction of the drug-target interactions in the database have been verified experimentally. It is therefore meaningful and important to develop new computational methods with good performance for DTI prediction. At present, many existing computational methods use only a single type of interaction between drugs and proteins, without paying attention to associations with, and influences from, other types of molecules. Methods: In this work, we developed a novel network embedding-based heterogeneous information integration model to predict potential drug-target interactions. Firstly, a heterogeneous information network is built by combining the known associations among proteins, drugs, lncRNAs, diseases, and miRNAs. Secondly, the Large-scale Information Network Embedding (LINE) model is used to learn the behavior information (associations with other nodes) of drugs and proteins in the network. Hence, each known drug-protein interaction pair can be represented as a combination of the attribute information (e.g. protein sequence information and drug molecular fingerprints) and the behavior information of its drug and protein. Thirdly, a Random Forest classifier is used for training and prediction. Results: Under 5-fold cross validation, our method obtained 85.83% prediction accuracy with 80.47% sensitivity and an AUC of 92.33%. Moreover, in the case studies of three common drugs, 8 (Caffeine), 7 (Clozapine) and 6 (Pioglitazone) of the top 10 candidate targets were verified to be associated with the corresponding drugs. Conclusions: In short, these results indicate that our method can be a powerful tool for predicting potential drug-protein interactions and finding unknown targets for certain drugs or unknown drugs for certain targets.
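The pair representation described in the Methods can be sketched as a simple concatenation of feature vectors. This is a minimal illustration: the function name `pair_features`, the concatenation order, the toy vectors and their dimensions are all assumptions, and the embedding step (LINE) and the classifier are not reproduced here.

```python
def pair_features(drug_attr, drug_behavior, protein_attr, protein_behavior):
    """Concatenate attribute and behavior vectors of a drug and a protein.

    The ordering (drug attribute, drug behavior, protein attribute,
    protein behavior) is an assumption of this sketch.
    """
    return (
        list(drug_attr) + list(drug_behavior)
        + list(protein_attr) + list(protein_behavior)
    )


# Toy vectors (illustrative values, not real fingerprints or embeddings).
drug_fingerprint = [1, 0, 1, 1]        # attribute information of the drug
drug_embedding = [0.12, -0.40]         # behavior information of the drug
protein_descriptor = [0.3, 0.1, 0.6]   # attribute information of the protein
protein_embedding = [0.05, 0.77]       # behavior information of the protein

features = pair_features(
    drug_fingerprint, drug_embedding, protein_descriptor, protein_embedding
)
```

Each known interaction pair would yield one such vector as a positive training sample for the classifier.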


2020 ◽  
Author(s):  
Bo-Ya Ji ◽  
Zhu-Hong You ◽  
Han-Jing Jiang ◽  
Zhen-Hao Guo ◽  
Kai Zheng

Abstract Background: The prediction of potential drug-protein target interactions (DTIs) not only provides a better understanding of biological processes but is also critical for identifying new drugs. However, because traditional experiments are costly and time-consuming, only a small fraction of the drug-target interactions in the database have been verified experimentally. It is therefore meaningful and important to develop new computational methods with good performance for predicting DTIs. At present, many existing computational methods use only a single type of molecule, without paying attention to the interactions with, and influences of, other types of molecules. Methods: In this work, we developed a novel network embedding-based heterogeneous information integration model to predict potential DTIs. Firstly, a heterogeneous information network is built by combining the known associations among proteins, drugs, lncRNAs, diseases, and miRNAs. Secondly, the Large-scale Information Network Embedding (LINE) model is used to learn the behavior information of nodes in the network. Hence, each known drug-protein interaction pair can be represented as a combination of the attribute information (e.g. protein sequence information and drug molecular fingerprints) and the behavior information of its drug and protein. Thirdly, a Random Forest classifier is used for training and prediction. Results: Under 5-fold cross validation, our method obtained 85.83% prediction accuracy with 80.47% sensitivity and an AUC of 92.33%. Moreover, in the case studies of three common drugs, 8 (Caffeine), 7 (Clozapine) and 6 (Pioglitazone) of the top 10 candidate targets were verified to be associated with the corresponding drugs. Conclusions: In short, these results indicate that our method can be a powerful tool for predicting drug-protein interactions and finding unknown targets for certain drugs or unknown drugs for certain targets.
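The evaluation terms used in the Results (5-fold cross validation, accuracy, sensitivity) can be made concrete with a short sketch. The helper names are hypothetical, the split is deterministic and unshuffled for simplicity, and the abstract does not state how the authors partitioned their data.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs; fold sizes differ by at most one."""
    indices = list(range(n_samples))
    fold_sizes = [
        n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)
    ]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def accuracy_and_sensitivity(y_true, y_pred):
    """Accuracy = (TP + TN) / N; sensitivity = TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, sensitivity


folds = list(k_fold_indices(12, k=5))
acc, sens = accuracy_and_sensitivity([1, 1, 0, 0], [1, 0, 0, 1])
```

In practice the samples would be shuffled before splitting, and the metrics would be averaged over the five test folds.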


2019 ◽  
Vol 17 (04) ◽  
pp. 1950020
Author(s):  
P. V. Sunil Kumar ◽  
G. Gopakumar

Recent findings from biological experiments demonstrate that long non-coding RNAs (lncRNAs) are actively involved in critical cellular processes and are associated with numerous diseases. Computational prediction of lncRNA–disease associations currently attracts considerable research attention. This paper proposes a machine learning model that predicts lncRNA–disease associations using a Heterogeneous Information Network (HIN) of lncRNAs and diseases. A Support Vector Machine classifier is developed using a feature set extracted from a meta-path-based parameter, the Association Index, derived from the HIN. The performance of the model is validated using standard statistical metrics; it achieved an AUC value of 0.87, which is better than that of the existing methods in the literature. The results are further validated against recent literature, and many of the predicted lncRNA–disease associations are confirmed to exist. This paper also proposes an HIN-based methodology to associate lncRNAs with pathways in which they may have biological influence. A case study on the pathway associations of four well-known lncRNAs (HOTAIR, TUG1, NEAT1, and MALAT1) has been conducted. It was observed that the same lncRNA is often associated with more than one biologically related pathway. Further exploration is needed to substantiate whether such lncRNAs have any role in determining the pathway interplay. The script and sample data for the model construction are freely available at http://bdbl.nitc.ac.in/LncDisPath/index.html .
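A meta-path-based feature can be illustrated by counting path instances between an lncRNA and a disease through an intermediate node type. The sketch below is not the paper's Association Index, whose definition is not given in the abstract; the intermediate type (gene), the function name `meta_path_counts` and the toy matrices are assumptions.

```python
def meta_path_counts(first_hop, second_hop):
    """Count two-hop path instances between source and target nodes.

    result[i][j] = sum over k of first_hop[i][k] * second_hop[k][j],
    i.e. the product of the two adjacency matrices.
    """
    inner = len(second_hop)
    if any(len(row) != inner for row in first_hop):
        raise ValueError("inner dimensions of the two matrices must match")
    n_cols = len(second_hop[0]) if second_hop else 0
    return [
        [sum(row[k] * second_hop[k][j] for k in range(inner))
         for j in range(n_cols)]
        for row in first_hop
    ]


# Toy HIN: 2 lncRNAs, 3 genes (hypothetical intermediate type), 2 diseases.
lnc_gene = [[1, 1, 0],
            [0, 1, 1]]
gene_disease = [[1, 0],
                [1, 1],
                [0, 1]]

# counts[i][j] = number of lncRNA_i -> gene -> disease_j path instances.
counts = meta_path_counts(lnc_gene, gene_disease)
```

Counts like these, computed for several meta-paths, could be assembled into a feature vector for a classifier.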


2021 ◽  
Vol 25 (3) ◽  
pp. 711-738
Author(s):  
Phu Pham ◽  
Phuc Do

Link prediction on heterogeneous information networks (HINs) is considered a challenging problem due to the complexity and diversity of node and link types. Challenges remain for meta-path-based link prediction in HINs. Previous work on link prediction in HINs via network embedding has mainly focused on exploiting node features rather than the existing relations between nodes in the form of meta-paths. Predicting the existence of new links between non-linked nodes on that basis alone is unconvincing. Moreover, recent HIN-based embedding models also lack a thorough evaluation of the topic similarity between text-based nodes along given meta-paths. To tackle these challenges, we propose a novel topic-driven, multiple meta-path-based HIN representation learning framework, namely W-MMP2Vec. Our model improves the quality of node representations by combining multiple meta-paths and by calculating a topic similarity weight for each meta-path during network embedding learning in content-based HINs. To validate our approach, we apply the W-MMP2Vec model to several link prediction tasks in both content-based and non-content-based HINs (DBLP, IMDB and BlogCatalog). The experimental results demonstrate the effectiveness of the proposed model, which outperforms recent state-of-the-art HIN representation learning models.
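One way to picture a topic similarity weight along a meta-path is to compare the topic vectors of the text-based nodes on a path instance. The sketch below assumes cosine similarity averaged over consecutive text-based nodes; this choice, the function names and the toy topic vectors are assumptions, since the abstract does not describe how the framework computes its weights.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def path_topic_weight(path, topics):
    """Mean cosine similarity between consecutive text-based nodes on a path.

    Nodes without a topic vector (e.g. authors) are skipped; a path with
    fewer than two text-based nodes gets weight 0.0.
    """
    text_nodes = [node for node in path if node in topics]
    sims = [
        cosine(topics[a], topics[b])
        for a, b in zip(text_nodes, text_nodes[1:])
    ]
    return sum(sims) / len(sims) if sims else 0.0


# Toy paper-author-paper instance; only papers carry topic vectors.
topics = {"paper1": [0.8, 0.2], "paper2": [0.6, 0.4]}
weight = path_topic_weight(["paper1", "author1", "paper2"], topics)
```

A weight near 1 indicates that the text nodes joined by the path discuss similar topics, so such a path could be given more influence when sampling or training embeddings.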

