Neural inductive matrix completion with graph convolutional networks for miRNA-disease association prediction

2020 ◽  
Vol 36 (8) ◽  
pp. 2538-2546 ◽  
Author(s):  
Jin Li ◽  
Sai Zhang ◽  
Tao Liu ◽  
Chenxi Ning ◽  
Zhuoxuan Zhang ◽  
...  

Abstract Motivation Predicting the association between microRNAs (miRNAs) and diseases plays an important role in identifying human disease-related miRNAs. As identification of miRNA-disease associations via biological experiments is time-consuming and expensive, computational methods are currently used as effective complements to determine the potential associations between disease and miRNA. Results We present a novel method of neural inductive matrix completion with graph convolutional network (NIMCGCN) for predicting miRNA-disease association. NIMCGCN first uses graph convolutional networks to learn miRNA and disease latent feature representations from the miRNA and disease similarity networks. Then, the learned features are input into a novel neural inductive matrix completion (NIMC) model to complete the association matrix. The parameters of NIMCGCN are learned from the known miRNA-disease association data in a supervised end-to-end way. We compared the proposed method with other state-of-the-art methods. The area under the receiver operating characteristic curve results showed that our method is significantly superior to existing methods. Furthermore, 50, 47 and 48 of the top 50 predicted miRNAs for three high-risk human diseases, namely, colon cancer, lymphoma and kidney cancer, were verified using experimental literature. Finally, 100% prediction accuracy was achieved when breast cancer was used as a case study to evaluate the ability of NIMCGCN for predicting a new disease without any known related miRNAs. Availability and implementation https://github.com/ljatynu/NIMCGCN/ Supplementary information Supplementary data are available at Bioinformatics online.
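The two-stage idea in the abstract above (graph convolution over similarity networks, then a bilinear completion of the association matrix) can be illustrated with a minimal NumPy sketch. This is not the NIMCGCN implementation: the single-layer architecture, the identity input features, the random weights, and all names (`normalize_adjacency`, `gcn_layer`, `sim_m`, `sim_d`) are assumptions made for illustration only.

```python
import numpy as np

def normalize_adjacency(sim):
    """Add self-loops and apply symmetric normalization D^-1/2 (A + I) D^-1/2."""
    a = sim + np.eye(sim.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_hat, features, weight):
    """One graph convolution: ReLU(A_hat @ X @ W)."""
    return np.maximum(a_hat @ features @ weight, 0.0)

rng = np.random.default_rng(0)
n_mirna, n_disease, hidden = 6, 4, 3

# Toy symmetric similarity matrices (hypothetical data, not from the paper).
sim_m = rng.random((n_mirna, n_mirna))
sim_m = (sim_m + sim_m.T) / 2
sim_d = rng.random((n_disease, n_disease))
sim_d = (sim_d + sim_d.T) / 2

a_hat_m = normalize_adjacency(sim_m)
a_hat_d = normalize_adjacency(sim_d)

# Untrained random weights; a real model would learn these end to end.
z_m = gcn_layer(a_hat_m, np.eye(n_mirna), rng.normal(size=(n_mirna, hidden)))
z_d = gcn_layer(a_hat_d, np.eye(n_disease), rng.normal(size=(n_disease, hidden)))

# Completed association scores: inner products of the latent features.
scores = z_m @ z_d.T
```

In a trained model the weights would be fitted so that `scores` matches the known association entries; here they are random, so only the shapes and data flow are meaningful.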

2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Shanchen Pang ◽  
Yu Zhuang ◽  
Xinzeng Wang ◽  
Fuyu Wang ◽  
Sibo Qiao

Abstract Background A large number of biological studies have shown that miRNAs are inextricably linked to many complex diseases. Studying miRNA-disease associations could provide a root-cause understanding of the underlying pathogenesis, which in turn promotes the progress of drug development. However, traditional biological experiments are very time-consuming and costly. Therefore, we propose an efficient model to address this challenge. Results In this work, we propose a deep learning model called EOESGC to predict potential miRNA-disease associations based on embedding of embedding and a simplified convolutional network. Firstly, integrated disease similarity, integrated miRNA similarity, and the miRNA-disease association network are used to construct a coupled heterogeneous graph, and the edges with low similarity are removed to simplify the graph structure and ensure the effectiveness of edges. Secondly, the embedding of embedding model (EOE) is used to learn edge information in the coupled heterogeneous graph. The training rule of the model is that associated nodes are close to each other and unassociated nodes are far away from each other. Based on this rule, the learned edge information is added into the node embedding as supplementary information to enrich node information. Then, the node embeddings trained by the EOE model are used as new features of miRNAs and diseases, and information aggregation is performed by a simplified graph convolution model, in which each level of convolution can aggregate multi-hop neighbor information. In this step, we only use the miRNA-disease association network to further simplify the graph structure, thus reducing the computational complexity. Finally, the feature embeddings of both miRNA and disease are concatenated and fed into an MLP for prediction. In the evaluation of EOESGC, the AUC, AUPR, and F1-score of our model are 0.9658, 0.8543 and 0.8644, respectively, under 5-fold cross-validation. Compared with the latest published models, our model shows better results. In addition, we predict the top 20 potential miRNAs for breast cancer and lung cancer, most of which are validated in the dbDEMC and HMDD3.2 databases. Conclusion The comprehensive experimental results show that EOESGC can effectively identify the potential miRNA-disease associations.
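The simplified graph convolution step described above can be sketched as repeated multiplication of node features by a normalized adjacency matrix, with no nonlinearity between hops. The sketch below assumes the standard simplified graph convolution form S^K X; the toy association matrix, the identity features, and the name `sgc_features` are hypothetical and do not come from the EOESGC code.

```python
import numpy as np

def sgc_features(adj, x, k):
    """Propagate features k hops: S^k @ X, with S = D^-1/2 (A + I) D^-1/2."""
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    s = a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    out = x
    for _ in range(k):
        out = s @ out  # each multiplication aggregates one more hop
    return out

# Toy miRNA-disease association matrix: 3 miRNAs x 2 diseases (hypothetical).
assoc = np.array([[1, 0],
                  [1, 1],
                  [0, 1]], dtype=float)

# Bipartite graph over all 5 nodes built from the association matrix only.
adj = np.block([[np.zeros((3, 3)), assoc],
                [assoc.T, np.zeros((2, 2))]])

x = np.eye(5)                      # placeholder for learned node embeddings
feats = sgc_features(adj, x, k=2)  # two-hop aggregated features
```

Because the propagation is linear, S^K X can be precomputed once, which is where the reduction in computational cost comes from.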


2021 ◽  
Vol 12 ◽  
Author(s):  
Yu-Tian Wang ◽  
Lei Li ◽  
Cun-Mei Ji ◽  
Chun-Hou Zheng ◽  
Jian-Cheng Ni

MicroRNAs (miRNAs) are small non-coding RNAs that have been demonstrated to be related to numerous complex human diseases. Numerous studies have suggested that miRNAs affect many complicated bioprocesses. Hence, the investigation of disease-related miRNAs by utilizing computational methods is warranted. In this study, we presented an improved label propagation for miRNA–disease association prediction (ILPMDA) method to identify disease-related miRNAs. First, we utilized similarity kernel fusion to integrate different types of biological information for generating miRNA and disease similarity networks. Second, we applied the weighted k-nearest known neighbor algorithm to update verified miRNA–disease association data. Third, we utilized improved label propagation in disease and miRNA similarity networks to make association predictions. Furthermore, we obtained final prediction scores by adopting an average ensemble method to integrate the two kinds of prediction results. To evaluate the prediction performance of ILPMDA, two types of cross-validation methods and case studies on three significant human diseases were implemented to determine the accuracy and effectiveness of ILPMDA. All results demonstrated that ILPMDA had the ability to discover potential miRNA–disease associations.
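The abstract does not give the propagation rule, so the following sketch assumes the standard label propagation update F ← αSF + (1 − α)Y on a row-normalized similarity matrix, run once on the miRNA side and once on the disease side and then averaged. The function name, the value of `alpha`, and the toy data are illustrative assumptions, not the ILPMDA method itself.

```python
import numpy as np

def label_propagation(sim, y, alpha=0.5, tol=1e-10, max_iter=1000):
    """Iterate F <- alpha * S @ F + (1 - alpha) * Y until the update is below tol."""
    s = sim / sim.sum(axis=1, keepdims=True)  # row-normalize the similarities
    f = y.astype(float)
    for _ in range(max_iter):
        f_next = alpha * s @ f + (1 - alpha) * y
        if np.abs(f_next - f).max() < tol:
            return f_next
        f = f_next
    return f

# Hypothetical toy data: 3 miRNAs, 2 diseases.
sim_m = np.array([[1.0, 0.8, 0.1],
                  [0.8, 1.0, 0.3],
                  [0.1, 0.3, 1.0]])
sim_d = np.array([[1.0, 0.4],
                  [0.4, 1.0]])
assoc = np.array([[1, 0],
                  [0, 0],
                  [0, 1]], dtype=float)

f_mirna = label_propagation(sim_m, assoc)        # propagate over miRNA network
f_disease = label_propagation(sim_d, assoc.T).T  # propagate over disease network
final_scores = (f_mirna + f_disease) / 2         # average ensemble of both sides
```

With alpha below 1 the iteration converges to the closed form (1 − α)(I − αS)⁻¹Y, so the loop and the linear solve are interchangeable on small matrices.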


2020 ◽  
Vol 20 (6) ◽  
pp. 452-460
Author(s):  
Lin Tang ◽  
Yu Liang ◽  
Xin Jin ◽  
Lin Liu ◽  
Wei Zhou

Background: Accumulating experimental studies have demonstrated that long non-coding RNAs (LncRNAs) play crucial roles in the occurrence and development of various complex human diseases. Nonetheless, only a small portion of LncRNA–disease associations have been experimentally verified at present. Automatically predicting LncRNA–disease associations based on computational models can save the huge cost of wet-lab experiments. Methods and Result: To develop effective computational models to integrate various heterogeneous biological data for the identification of potential disease-LncRNA associations, we propose a hierarchical extension based on the Boolean matrix for LncRNA-disease association prediction model (HEBLDA). HEBLDA discovers the intrinsic hierarchical correlation based on the property of the Boolean matrix from various relational sources. Then, HEBLDA integrates these hierarchical associated matrices by fusion weights. Finally, HEBLDA uses the hierarchical associated matrix to reconstruct the LncRNA–disease association matrix by hierarchical extending. HEBLDA is able to work for potential diseases or LncRNAs without known association data. In 5-fold cross-validation experiments, HEBLDA obtained an area under the receiver operating characteristic curve (AUC) of 0.8913, improving on previous classical methods. Besides, case studies show that HEBLDA can accurately predict candidate diseases for several LncRNAs. Conclusion: Based on its ability to discover the richer correlated structure of various data sources, we can anticipate that HEBLDA is a potential method that can obtain more comprehensive association prediction in a broad field.


2020 ◽  
Vol 36 (9) ◽  
pp. 2839-2847 ◽  
Author(s):  
Wenjuan Zhang ◽  
Hunan Xu ◽  
Xiaozhong Li ◽  
Qiang Gao ◽  
Lin Wang

Abstract Motivation One of the most important problems in drug discovery research is to precisely predict a new indication for an existing drug, i.e. drug repositioning. Recent recommendation system-based methods have tackled this problem using matrix completion models. The models identify latent factors contributing to known drug-disease associations, and then infer novel drug-disease associations by the correlations between latent factors. However, these models have not fully considered the various drug data sources and the sparsity of the drug-disease association matrix. In addition, using the global structure of the drug-disease association data may introduce noise, and consequently limit the prediction power. Results In this work, we propose a novel drug repositioning approach by using Bayesian inductive matrix completion (DRIMC). First, we embed four drug data sources into a drug similarity matrix and two disease data sources in a disease similarity matrix. Then, for each drug or disease, its feature is described by similarity values between it and its nearest neighbors, and these features for drugs and diseases are mapped onto a shared latent space. We model the association probability for each drug-disease pair by inductive matrix completion, where the properties of drugs and diseases are represented by projections of drugs and diseases, respectively. As the known drug-disease associations have been manually verified, they are more trustworthy and important than the unknown pairs. We assign higher confidence levels to known association pairs compared with unknown pairs. We perform comprehensive experiments on three benchmark datasets, and DRIMC improves prediction accuracy compared with six state-of-the-art approaches. Availability and implementation Source code and datasets are available at https://github.com/linwang1982/DRIMC. Supplementary information Supplementary data are available at Bioinformatics online.
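A rough sketch of the scoring and weighting ideas in this abstract: drug and disease features are projected into a shared latent space, the pair probability is a logistic function of the inner product, and known pairs receive a higher confidence weight in the loss. The logistic link, the weighting scheme, and every name here (`imc_probabilities`, `weighted_log_loss`, `confidence`) are assumptions for illustration; the Bayesian treatment used by DRIMC is not reproduced.

```python
import numpy as np

def imc_probabilities(x, y, u, v):
    """Association probability: sigmoid of (x_i U) . (y_j V) for every pair."""
    logits = (x @ u) @ (y @ v).T
    return 1.0 / (1.0 + np.exp(-logits))

def weighted_log_loss(p, assoc, confidence):
    """Negative log-likelihood where known pairs are weighted by `confidence`."""
    weights = np.where(assoc == 1, confidence, 1.0)
    eps = 1e-12  # guards against log(0)
    log_lik = assoc * np.log(p + eps) + (1 - assoc) * np.log(1 - p + eps)
    return -(weights * log_lik).sum()

rng = np.random.default_rng(0)
x = rng.random((5, 4))                 # hypothetical drug features (5 drugs)
y = rng.random((3, 6))                 # hypothetical disease features (3 diseases)
u = 0.1 * rng.normal(size=(4, 2))      # drug projection into the latent space
v = 0.1 * rng.normal(size=(6, 2))      # disease projection into the latent space
assoc = np.array([[1, 0, 0],
                  [0, 1, 0],
                  [0, 0, 0],
                  [1, 0, 1],
                  [0, 0, 0]], dtype=float)

p = imc_probabilities(x, y, u, v)
loss_plain = weighted_log_loss(p, assoc, confidence=1.0)
loss_weighted = weighted_log_loss(p, assoc, confidence=5.0)
```

Because the model scores a pair from features rather than from row and column indices, it can score a drug or disease that has no known associations, which is the inductive part.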


2019 ◽  
Vol 35 (14) ◽  
pp. i455-i463 ◽  
Author(s):  
Mengyun Yang ◽  
Huimin Luo ◽  
Yaohang Li ◽  
Jianxin Wang

Abstract Motivation Computational drug repositioning is a cost-effective strategy to identify novel indications for existing drugs. Drug repositioning is often modeled as a recommendation system problem. Taking advantage of the known drug–disease associations, the objective of the recommendation system is to identify new treatments by filling out the unknown entries in the drug–disease association matrix, which is known as matrix completion. Underpinned by the fact that common molecular pathways contribute to many different diseases, the recommendation system assumes that the underlying latent factors determining drug–disease associations are highly correlated. In other words, the drug–disease matrix to be completed is low-rank. Accordingly, matrix completion algorithms efficiently constructing low-rank drug–disease matrix approximations consistent with known associations can be of immense help in discovering the novel drug–disease associations. Results In this article, we propose to use a bounded nuclear norm regularization (BNNR) method to complete the drug–disease matrix under the low-rank assumption. Instead of strictly fitting the known elements, BNNR is designed to tolerate the noisy drug–drug and disease–disease similarities by incorporating a regularization term to balance the approximation error and the rank properties. Moreover, additional constraints are incorporated into BNNR to ensure that all predicted matrix entry values are within the specific interval. BNNR is carried out on an adjacency matrix of a heterogeneous drug–disease network, which integrates the drug–drug, drug–disease and disease–disease networks. It not only makes full use of available drugs, diseases and their association information, but also is capable of dealing with cold start naturally. Our computational results show that BNNR yields higher drug–disease association prediction accuracy than the current state-of-the-art methods. 
The most significant gain is in prediction precision measured as the fraction of the positive predictions that are truly positive, which is particularly useful in drug design practice. Case studies also confirm the accuracy and reliability of BNNR. Availability and implementation The code of BNNR is freely available at https://github.com/BioinformaticsCSU/BNNR. Supplementary information Supplementary data are available at Bioinformatics online.
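Two building blocks commonly used in nuclear-norm-regularized completion are singular value thresholding (the proximal operator of the nuclear norm) and clipping entries to a bounded interval. The sketch below shows both in NumPy as an illustration of the low-rank and bounded-entry ideas; it is not the BNNR solver, and the threshold `tau` and the [0, 1] interval are assumed values.

```python
import numpy as np

def singular_value_threshold(m, tau):
    """Shrink every singular value of m by tau, dropping those that fall below zero."""
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    return (u * np.maximum(s - tau, 0.0)) @ vt

def clip_to_interval(m, low=0.0, high=1.0):
    """Force every entry into [low, high], as a bounded-entry constraint would."""
    return np.clip(m, low, high)

# Toy matrix with singular values 3 and 1 (hypothetical data).
m = np.diag([3.0, 1.0])
low_rank = singular_value_threshold(m, tau=2.0)  # the smaller singular value vanishes
bounded = clip_to_interval(low_rank)
```

Thresholding with a larger `tau` removes more singular values and therefore yields a lower-rank approximation.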


2019 ◽  
Vol 17 (1) ◽  
Author(s):  
Guobo Xie ◽  
Zhiliang Fan ◽  
Yuping Sun ◽  
Cuiming Wu ◽  
Lei Ma

Abstract Background Recently, numerous biological experiments have indicated that microRNAs (miRNAs) play critical roles in exploring the pathogenesis of various human diseases. Since traditional experimental methods for miRNA-disease association detection are costly and time-consuming, it becomes urgent to design efficient and robust computational techniques for identifying undiscovered interactions. Methods In this paper, we proposed a computation framework named weighted bipartite network projection for miRNA-disease association prediction (WBNPMD). In this method, transfer weights were constructed by combining the known miRNA and disease similarities, and the initial information was properly configured. Then the two-step bipartite network algorithm was implemented to infer potential miRNA-disease associations. Results The proposed WBNPMD was applied to the known miRNA-disease association data, and leave-one-out cross-validation (LOOCV) and fivefold cross-validation were implemented to evaluate the performance of WBNPMD. As a result, our method achieved AUCs of 0.9321 and 0.9173 ± 0.0005 in LOOCV and fivefold cross-validation, respectively, and outperformed four other state-of-the-art methods. We also carried out two kinds of case studies on prostate neoplasm, colorectal neoplasm, and lung neoplasm, and most of the top 50 predicted miRNAs were confirmed to have an association with the corresponding diseases based on dbDeMC, miR2Disease, and HMDD V3.0 databases. Conclusions The experimental results demonstrate that WBNPMD can accurately infer potential miRNA-disease associations. We anticipate that the proposed WBNPMD could serve as a powerful tool for discovering potential miRNA-disease associations.
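The two-step bipartite projection mentioned above can be illustrated with the classic unweighted resource-allocation scheme: each miRNA splits its resource equally among its diseases, then each disease splits what it received equally among its miRNAs. WBNPMD replaces these uniform splits with similarity-based transfer weights, which this sketch does not attempt to reproduce; the function name and toy matrix are hypothetical.

```python
import numpy as np

def projection_scores(assoc):
    """Two-step resource allocation on a miRNA x disease bipartite graph.

    Assumes every miRNA and every disease has at least one known association.
    """
    a = assoc.astype(float)
    k_m = a.sum(axis=1)  # degree of each miRNA
    k_d = a.sum(axis=0)  # degree of each disease
    # w[j, i]: share of miRNA i's resource that ends up on miRNA j.
    w = (a / k_d) @ (a.T / k_m)
    # Initial resource for each disease is its column of known associations.
    return w @ a

# Toy association matrix: 4 miRNAs x 3 diseases (hypothetical data).
assoc = np.array([[1, 0, 1],
                  [1, 1, 0],
                  [0, 1, 1],
                  [1, 0, 0]])
scores = projection_scores(assoc)
```

For a given disease, unobserved miRNAs with the highest resulting scores are the candidate associations; the total resource per disease is conserved by construction.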


2021 ◽  
Vol 12 ◽  
Author(s):  
Jianlin Wang ◽  
Wenxiu Wang ◽  
Chaokun Yan ◽  
Junwei Luo ◽  
Ge Zhang

Drug repositioning is used to find new uses for existing drugs, effectively shortening the drug research and development cycle and reducing costs and risks. A new model of drug repositioning based on ensemble learning is proposed. This work develops a novel computational drug repositioning approach called CMAF to discover potential drug-disease associations. First, for new drugs and diseases or unknown drug-disease pairs, based on their known neighbor information, an association probability can be obtained by implementing the weighted K nearest known neighbors (WKNKN) method and improving the drug-disease association information. Then, a new drug similarity network and new disease similarity network can be constructed. Three prediction models are applied and ensembled to enable the final association of drug-disease pairs based on improved drug-disease association information and the constructed similarity network. The experimental results demonstrate that the developed approach outperforms recent state-of-the-art prediction models. Case studies further confirm the predictive ability of the proposed method. Our proposed method can effectively improve the prediction results.
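The WKNKN preprocessing step can be sketched as follows, using one common formulation: each row (or column) of the association matrix is estimated from its K most similar entities that have known associations, with weights that decay by a factor `eta` per neighbor rank, and the two estimates are averaged and merged with the original matrix by an elementwise maximum. The abstract does not spell out these details, so `k`, `eta`, the helper names, and the toy data are all assumptions.

```python
import numpy as np

def _row_estimates(assoc, sim, k, eta):
    """Estimate each row from its k most similar rows that have known associations."""
    known = np.flatnonzero(assoc.sum(axis=1) > 0)
    est = np.zeros_like(assoc, dtype=float)
    for i in range(assoc.shape[0]):
        cand = known[known != i]
        order = cand[np.argsort(-sim[i, cand])][:k]  # nearest known neighbors
        if order.size == 0:
            continue
        s = sim[i, order]
        z = s.sum()
        if z == 0:
            continue
        w = (eta ** np.arange(order.size)) * s  # weights decay with neighbor rank
        est[i] = w @ assoc[order] / z
    return est

def wknkn(assoc, sim_row, sim_col, k=2, eta=0.5):
    """Fill likely associations; known entries are never lowered."""
    y = assoc.astype(float)
    y_row = _row_estimates(y, sim_row, k, eta)
    y_col = _row_estimates(y.T, sim_col, k, eta).T
    return np.maximum(y, (y_row + y_col) / 2)

# Hypothetical toy data: the third drug and third disease have no known pairs.
assoc = np.array([[1, 0, 0],
                  [0, 1, 0],
                  [0, 0, 0]])
sim_drug = np.array([[1.0, 0.2, 0.6],
                     [0.2, 1.0, 0.3],
                     [0.6, 0.3, 1.0]])
sim_disease = np.array([[1.0, 0.5, 0.4],
                        [0.5, 1.0, 0.7],
                        [0.4, 0.7, 1.0]])
filled = wknkn(assoc, sim_drug, sim_disease)
```

The filled matrix gives new drugs and diseases non-zero interaction likelihoods, so downstream models do not have to learn from all-zero rows or columns.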


2019 ◽  
Author(s):  
Xiaoyong Pan ◽  
Hong-Bin Shen

Abstract MicroRNAs (miRNAs) play crucial roles in many biological processes involved in diseases. The associations between diseases and protein coding genes (PCGs) have been well investigated, and further the miRNAs interact with PCGs to trigger them to be functional. Thus, it is imperative to computationally infer disease-miRNA associations under the context of interaction networks. In this study, we present a computational method, DimiG, to infer miRNA-associated diseases using a semi-supervised Graph Convolutional Network model (GCN). DimiG is a multi-label framework to integrate PCG-PCG interactions, PCG-miRNA interactions, PCG-disease associations and tissue expression profiles. DimiG is trained on disease-PCG associations and a graph constructed from interaction networks of PCG-PCG and miRNA-PCG using semi-supervised GCN, which is further used to score associations between diseases and miRNAs. We evaluate DimiG on a benchmark set collected from verified disease-miRNA associations. Our results demonstrate that the new DimiG yields promising performance and outperforms the best published baseline method not trained on disease-miRNA associations by 11% and is also superior to two state-of-the-art supervised methods trained on disease-miRNA associations. Three case studies of prostate cancer, lung cancer and inflammatory bowel disease further demonstrate the efficacy of DimiG, where the top miRNAs predicted by DimiG for them are supported by literature or databases.


2021 ◽  
Author(s):  
Shanchen Pang ◽  
Yu Zhuang ◽  
Xinzeng Wang ◽  
Fuyu Wang ◽  
Sibo Qiao

Abstract Background: A large number of biological studies have shown that miRNAs are inextricably linked to many complex diseases. Studying miRNA-disease associations could provide a root-cause understanding of the underlying pathogenesis, which in turn promotes the progress of drug development. However, traditional biological experiments are very time-consuming and costly. Therefore, we propose a more efficient model to address this challenge. Results: In this work, we propose a deep learning model called EOESGC to predict potential miRNA-disease associations based on embedding of embedding and a simplified convolutional network. Firstly, a coupled heterogeneous graph is constructed from the integrated disease similarity, integrated miRNA similarity and miRNA-disease association networks, where connected edges with low similarity values are removed to simplify the graph structure. The initial feature representation of nodes in the graph is learned using the embedding of embedding model (EOE), based on the principle that nodes with associations are close to each other and nodes without associations are far from each other. The use of EOE can effectively learn the positional information among nodes and preserve the graph structure information to some extent. Then the initial features of the nodes are fed into the simplified graph convolutional network (SGC), and in this step we only use the miRNA-disease association network to further simplify the graph structure and thus reduce the computational complexity. Finally, the feature embeddings of both miRNA and disease are concatenated and fed into an MLP for prediction. The two graph simplifications of our model effectively reduce the computational difficulty, and the experimental results show that our model can indeed predict the potential miRNA-disease associations effectively. Compared with the latest published models, our model shows better results. In the evaluation of EOESGC, the AUC, AUPR and F1 of our model are 0.9658, 0.8543 and 0.8644, respectively, under 5-fold cross-validation. In addition, we predict the top 20 potential miRNAs for breast cancer and lung cancer, most of which are validated in the dbDEMC and HMDD3.2 databases. Conclusion: The comprehensive experimental results show that EOESGC can effectively identify the potential miRNA-disease associations.


2021 ◽  
Vol 21 ◽  
Author(s):  
Biao Du ◽  
Lin Tang ◽  
Lin Liu ◽  
Wei Zhou

Background: Increasing research reveals that long non-coding RNAs (lncRNAs) play an important role in various biological processes of human diseases. Nonetheless, only a handful of lncRNA-disease associations have been experimentally verified. The study of lncRNA-disease association prediction based on computational models has provided a preliminary basis for biological experiments, cutting down the huge cost of wet lab experiments. Objective: This study aims to learn the real distribution of lncRNA-disease associations from a limited number of known lncRNA-disease association data. This paper proposes a new lncRNA-disease association prediction model called LDA-GAN based on a generative adversarial network (GAN). Method: Aiming at the problems of slow convergence rate, training instabilities, and unavailability of discrete data in traditional GAN, LDA-GAN utilizes the Gumbel-softmax technique to construct a differentiable process for simulating discrete sampling. Meanwhile, the generator and the discriminator of LDA-GAN are integrated to establish the overall optimization goal based on the pairwise loss function. Results: Experiments on standard datasets demonstrate that LDA-GAN not only achieves high stability and high efficiency in the process of adversarial learning but also gives full play to the semi-supervised learning advantage of the generative adversarial learning framework for unlabeled data, which further improves the prediction accuracy of lncRNA-disease association. Besides, case studies show that LDA-GAN can accurately generate potential diseases for several lncRNAs.
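The Gumbel-softmax relaxation referred to in the Method section replaces a non-differentiable categorical sample with a softmax over logits perturbed by Gumbel noise. The sketch below shows only the forward sampling step in NumPy; the temperature `tau`, the toy logits, and the function name are assumed for illustration and are not taken from LDA-GAN.

```python
import numpy as np

def gumbel_softmax(logits, tau, rng):
    """Draw a relaxed one-hot sample: softmax((logits + Gumbel noise) / tau)."""
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    g = -np.log(-np.log(u))                # Gumbel(0, 1) noise
    z = (logits + g) / tau
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
# Hypothetical generator logits over 4 candidate diseases for 2 lncRNAs.
logits = np.array([[2.0, 0.5, 0.1, -1.0],
                   [0.0, 0.0, 3.0, 0.0]])
soft_sample = gumbel_softmax(logits, tau=0.5, rng=rng)
```

Lower temperatures push each row of `soft_sample` closer to a one-hot vector, while higher temperatures give smoother outputs; in an autodiff framework the same expression lets gradients flow back to the logits.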

