EOESGC: predicting miRNA-disease associations based on embedding of embedding and simplified graph convolutional network

Shanchen Pang; Yu Zhuang; Xinzeng Wang; Fuyu Wang; Sibo Qiao

doi:10.1186/s12911-021-01671-y

EOESGC: predicting miRNA-disease associations based on embedding of embedding and simplified graph convolutional network

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-021-01671-y ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Shanchen Pang ◽

Yu Zhuang ◽

Xinzeng Wang ◽

Fuyu Wang ◽

Sibo Qiao

Keyword(s):

Information Aggregation ◽

Disease Association ◽

Supplementary Information ◽

Graph Structure ◽

Association Network ◽

Convolutional Network ◽

Edge Information ◽

Biological Studies ◽

Convolution Model ◽

Disease Associations

Abstract Background A large number of biological studies have shown that miRNAs are inextricably linked to many complex diseases. Studying miRNA-disease associations could provide a root-cause understanding of the underlying pathogenesis, which in turn promotes the progress of drug development. However, traditional biological experiments are very time-consuming and costly. Therefore, we propose an efficient model to address this challenge. Results In this work, we propose a deep learning model called EOESGC to predict potential miRNA-disease associations based on embedding of embedding and a simplified graph convolutional network. Firstly, the integrated disease similarity, integrated miRNA similarity, and miRNA-disease association network are used to construct a coupled heterogeneous graph, and edges with low similarity are removed to simplify the graph structure and ensure the effectiveness of the remaining edges. Secondly, the embedding of embedding (EOE) model is used to learn edge information in the coupled heterogeneous graph. The training rule of the model is that associated nodes are placed close to each other and unassociated nodes far from each other. Based on this rule, the learned edge information is added to the node embeddings as supplementary information to enrich node information. Then, the node embeddings trained by the EOE model serve as new features of miRNAs and diseases, and information aggregation is performed by a simplified graph convolution model, in which each level of convolution can aggregate multi-hop neighbor information. In this step, we use only the miRNA-disease association network to further simplify the graph structure, thus reducing the computational complexity. Finally, the feature embeddings of miRNA and disease are concatenated and fed into an MLP for prediction. In the evaluation of EOESGC, the AUC, AUPR, and F1-score of our model under 5-fold cross-validation are 0.9658, 0.8543 and 0.8644, respectively. 
Compared with the latest published models, our model shows better results. In addition, we predict the top 20 potential miRNAs for breast cancer and lung cancer, most of which are validated in the dbDEMC and HMDD3.2 databases. Conclusion The comprehensive experimental results show that EOESGC can effectively identify potential miRNA-disease associations.
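
The multi-hop aggregation described in this abstract can be illustrated with the standard simplified graph convolution (SGC) propagation rule, which multiplies the node features by a power of the normalised adjacency matrix. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `sgc_propagate`, the toy association matrix, and the use of one-hot features in place of EOE embeddings are all hypothetical.

```python
import numpy as np

def sgc_propagate(adj: np.ndarray, features: np.ndarray, k: int = 2) -> np.ndarray:
    """Return S^k X, where S is the symmetrically normalised adjacency with self-loops."""
    a_tilde = adj + np.eye(adj.shape[0])      # add self-loops
    deg = a_tilde.sum(axis=1)                 # every degree is >= 1 because of self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    s = a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    out = features
    for _ in range(k):                        # k propagation steps reach k-hop neighbours
        out = s @ out
    return out

# Toy bipartite graph: 2 miRNAs (nodes 0, 1) and 2 diseases (nodes 2, 3).
assoc = np.array([[1.0, 0.0],
                  [1.0, 1.0]])
n_m, n_d = assoc.shape
adj = np.zeros((n_m + n_d, n_m + n_d))
adj[:n_m, n_m:] = assoc                       # miRNA -> disease edges
adj[n_m:, :n_m] = assoc.T                     # disease -> miRNA edges

x = np.eye(n_m + n_d)                         # placeholder features (hypothetical)
h = sgc_propagate(adj, x, k=2)
```

With `k=2`, the two miRNAs exchange information through the disease they share, even though no miRNA-miRNA edge is present in the association network.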


EOESGC: Predicting miRNA-disease Associations Based on Embedding of Embedding and Simplified Graph Convolutional Network

10.21203/rs.3.rs-831662/v1 ◽

2021 ◽

Author(s):

Shanchen Pang ◽

Yu Zhuang ◽

Xinzeng Wang ◽

Fuyu Wang ◽

Sibo Qiao

Keyword(s):

Positional Information ◽

Disease Association ◽

Feature Representation ◽

Experimental Results ◽

Graph Structure ◽

Convolutional Network ◽

Structure Information ◽

Biological Studies ◽

Disease Associations ◽

Computational Difficulty

Abstract Background: A large number of biological studies have shown that miRNAs are inextricably linked to many complex diseases. Studying miRNA-disease associations could provide a root-cause understanding of the underlying pathogenesis, which in turn promotes the progress of drug development. However, traditional biological experiments are very time-consuming and costly. Therefore, we propose a more efficient model to address this challenge. Results: In this work, we propose a deep learning model called EOESGC to predict potential miRNA-disease associations based on embedding of embedding and a simplified graph convolutional network. Firstly, a coupled heterogeneous graph is constructed from the integrated disease similarity, integrated miRNA similarity and miRNA-disease association networks, and edges with low similarity values are removed to simplify the graph structure. The initial feature representation of the nodes in the graph is learned using the embedding of embedding (EOE) model, based on the principle that nodes with associations are close to each other and nodes without associations are far from each other. EOE can effectively learn the positional information among nodes and preserve the graph structure information to some extent. Then the initial features of the nodes are fed into the simplified graph convolutional network (SGC); in this step we use only the miRNA-disease association network to further simplify the graph structure and thus reduce the computational complexity. Finally, the feature embeddings of miRNA and disease are concatenated and fed into an MLP for prediction. The two graph simplifications of our model effectively reduce the computational difficulty, and the experimental results show that our model can indeed predict potential miRNA-disease associations effectively. Compared with the latest published models, our model shows better results. 
In the evaluation of EOESGC, the AUC, AUPR and F1 of our model under 5-fold cross-validation are 0.9658, 0.8543 and 0.8644, respectively. In addition, we predict the top 20 potential miRNAs for breast cancer and lung cancer, most of which are validated in the dbDEMC and HMDD3.2 databases. Conclusion: The comprehensive experimental results show that EOESGC can effectively identify potential miRNA-disease associations.
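
The first step of this abstract, building a coupled heterogeneous graph and dropping low-similarity edges, can be sketched as a block matrix. This is an illustrative sketch only: the threshold value of 0.5, the function names `prune` and `coupled_graph`, and the toy matrices are assumptions, since the abstract does not state how the cutoff is chosen.

```python
import numpy as np

def prune(sim: np.ndarray, threshold: float) -> np.ndarray:
    """Keep similarity edges at or above the threshold; drop the rest and the diagonal."""
    pruned = np.where(sim >= threshold, sim, 0.0)   # np.where returns a new array
    np.fill_diagonal(pruned, 0.0)                   # no self-similarity edges
    return pruned

def coupled_graph(mirna_sim, disease_sim, assoc, threshold=0.5):
    """Stack miRNA similarity, disease similarity and associations into one adjacency."""
    m = prune(mirna_sim, threshold)
    d = prune(disease_sim, threshold)
    return np.block([[m, assoc],
                     [assoc.T, d]])

# Toy data: 3 miRNAs, 2 diseases (values are made up for illustration).
mirna_sim = np.array([[1.0, 0.8, 0.2],
                      [0.8, 1.0, 0.4],
                      [0.2, 0.4, 1.0]])
disease_sim = np.array([[1.0, 0.3],
                        [0.3, 1.0]])
assoc = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 1.0]])

g = coupled_graph(mirna_sim, disease_sim, assoc, threshold=0.5)
```

In the resulting 5 x 5 adjacency, only the miRNA pair with similarity 0.8 keeps its similarity edge, while all known associations are retained unchanged.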


Neural inductive matrix completion with graph convolutional networks for miRNA-disease association prediction

Bioinformatics ◽

10.1093/bioinformatics/btz965 ◽

2020 ◽

Vol 36 (8) ◽

pp. 2538-2546 ◽

Cited By ~ 9

Author(s):

Jin Li ◽

Sai Zhang ◽

Tao Liu ◽

Chenxi Ning ◽

Zhuoxuan Zhang ◽

...

Keyword(s):

Characteristic Curve ◽

Matrix Completion ◽

Disease Association ◽

Supplementary Information ◽

Convolutional Network ◽

Convolutional Networks ◽

Feature Representations ◽

Disease Similarity ◽

Disease Associations ◽

Association Data

Abstract Motivation Predicting the association between microRNAs (miRNAs) and diseases plays an important role in identifying human disease-related miRNAs. As identification of miRNA-disease associations via biological experiments is time-consuming and expensive, computational methods are currently used as effective complements to determine the potential associations between disease and miRNA. Results We present a novel method of neural inductive matrix completion with graph convolutional network (NIMCGCN) for predicting miRNA-disease association. NIMCGCN first uses graph convolutional networks to learn miRNA and disease latent feature representations from the miRNA and disease similarity networks. Then, the learned features are input into a novel neural inductive matrix completion (NIMC) model to generate a completed association matrix. The parameters of NIMCGCN are learned from the known miRNA-disease association data in a supervised end-to-end way. We compared the proposed method with other state-of-the-art methods. The area under the receiver operating characteristic curve results showed that our method is significantly superior to existing methods. Furthermore, 50, 47 and 48 of the top 50 predicted miRNAs for three high-risk human diseases, namely, colon cancer, lymphoma and kidney cancer, were verified using experimental literature. Finally, 100% prediction accuracy was achieved when breast cancer was used as a case study to evaluate the ability of NIMCGCN to predict miRNAs for a new disease without any known related miRNAs. Availability and implementation https://github.com/ljatynu/NIMCGCN/ Supplementary information Supplementary data are available at Bioinformatics online.
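
The idea behind inductive matrix completion is that each association score is an inner product of projected side features, so the completed matrix has low rank. The sketch below shows only that scoring step, with random placeholder weights and a single ReLU projection per side. It is an assumption-laden illustration, not the NIMCGCN code; the names `project` and `imc_scores`, the feature dimensions, and the absence of any training loop are all choices made here for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def project(features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """One nonlinear projection into the shared latent space (ReLU activation)."""
    return np.maximum(features @ weights, 0.0)

def imc_scores(mirna_feat, disease_feat, w_m, w_d):
    """Score every miRNA-disease pair as an inner product of projected features."""
    return project(mirna_feat, w_m) @ project(disease_feat, w_d).T

# Hypothetical sizes: 4 miRNAs with 6 features, 3 diseases with 5 features, latent dim 2.
mirna_feat = rng.normal(size=(4, 6))
disease_feat = rng.normal(size=(3, 5))
w_m = rng.normal(size=(6, 2))      # untrained placeholder weights
w_d = rng.normal(size=(5, 2))

scores = imc_scores(mirna_feat, disease_feat, w_m, w_d)
```

Because the scores depend only on features, a new disease with no known miRNAs can still be scored as long as its feature vector is available, which is the setting the breast cancer case study in the abstract evaluates.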


MVGCN: data integration through multi-view graph convolutional network for predicting links in biomedical bipartite networks

Bioinformatics ◽

10.1093/bioinformatics/btab651 ◽

2021 ◽

Author(s):

Haitao Fu ◽

Feng Huang ◽

Xuan Liu ◽

Yang Qiu ◽

Wen Zhang

Keyword(s):

Molecular Mechanisms ◽

Learning Strategy ◽

Information Aggregation ◽

Supplementary Information ◽

Bipartite Network ◽

Bipartite Networks ◽

Biomolecular Systems ◽

Convolutional Network ◽

Benchmark Datasets ◽

Node Attributes

Abstract Motivation There are various interaction/association bipartite networks in biomolecular systems. Identifying unobserved links in biomedical bipartite networks helps to understand the underlying molecular mechanisms of human complex diseases and thus benefits the diagnosis and treatment of diseases. Although a great number of computational methods have been proposed to predict links in biomedical bipartite networks, most of them heavily depend on features and structures involving the bioentities in one specific bipartite network, which limits the generalization capacity of applying the models to other bipartite networks. Meanwhile, bioentities usually have multiple features, and how to leverage them has also been challenging. Results In this study, we propose a novel multi-view graph convolution network (MVGCN) framework for link prediction in biomedical bipartite networks. We first construct a multi-view heterogeneous network (MVHN) by combining the similarity networks with the biomedical bipartite network, and then perform a self-supervised learning strategy on the bipartite network to obtain node attributes as initial embeddings. Further, a neighborhood information aggregation (NIA) layer is designed for iteratively updating the embeddings of nodes by aggregating information from inter- and intra-domain neighbors in every view of the MVHN. Next, we combine embeddings of multiple NIA layers in each view, and integrate multiple views to obtain the final node embeddings, which are then fed into a discriminator to predict the existence of links. Extensive experiments show MVGCN performs better than or on par with baseline methods and has the generalization capacity on six benchmark datasets involving three typical tasks. Availability and implementation Source code and data can be downloaded from https://github.com/fuhaitao95/MVGCN. Supplementary information Supplementary data are available at Bioinformatics online.


A network similarity integration method for predicting microRNA-disease associations

RSC Advances ◽

10.1039/c7ra05348g ◽

2017 ◽

Vol 7 (51) ◽

pp. 32216-32224 ◽

Cited By ~ 5

Author(s):

Xiaoying Li ◽

Yaping Lin ◽

Changlong Gu

Keyword(s):

Integration Method ◽

Disease Association ◽

Association Network ◽

Similarity Network ◽

Novel Mirna ◽

Disease Similarity ◽

Disease Associations ◽

Network Similarity

The NSIM integrates the disease similarity network, miRNA similarity network, and known miRNA-disease association network on the basis of cosine similarity to predict not only novel miRNA-disease associations but also associations for isolated diseases.
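
Assuming the similarity measure meant here is cosine similarity, one common way to apply it is to compare the association profiles (rows of the association matrix) of two entities. The sketch below shows that computation; it is not necessarily what the NSIM authors compute, and the function name and toy matrix are hypothetical.

```python
import numpy as np

def cosine_similarity(profiles: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows; all-zero rows get similarity 0."""
    norms = np.linalg.norm(profiles, axis=1, keepdims=True)
    safe = np.where(norms == 0, 1.0, norms)   # avoid division by zero for isolated rows
    unit = profiles / safe
    return unit @ unit.T

# Toy association matrix: 4 miRNAs x 3 diseases; the last miRNA has no known association.
assoc = np.array([[1.0, 0.0, 1.0],
                  [1.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0]])

sim = cosine_similarity(assoc)
```

The all-zero row shows why isolated entities are difficult: their profile carries no signal, so a method has to draw on the other similarity networks to make predictions for them.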


NMCMDA: neural multicategory MiRNA–disease association prediction

Briefings in Bioinformatics ◽

10.1093/bib/bbab074 ◽

2021 ◽

Author(s):

Jingru Wang ◽

Jin Li ◽

Kun Yue ◽

Li Wang ◽

Yuyun Ma ◽

...

Keyword(s):

Cost Effective ◽

Underlying Mechanism ◽

Disease Association ◽

Disease Category ◽

Convolutional Network ◽

Time Saving ◽

Disease Associations ◽

Multiple Category ◽

Latent Representations ◽

Main Components

Abstract Motivation There is growing evidence that the dysregulation of miRNAs causes diseases through various underlying mechanisms. Thus, predicting the multiple-category associations between microRNAs (miRNAs) and diseases plays an important role in investigating the roles of miRNAs in diseases. Moreover, in contrast with traditional biological experiments, which are time-consuming and expensive, computational approaches for the prediction of multicategory miRNA–disease associations are time-saving and cost-effective, and are therefore highly desirable. Results We present a novel data-driven end-to-end learning-based method of neural multiple-category miRNA–disease association prediction (NMCMDA) for predicting multiple-category miRNA–disease associations. The NMCMDA has two main components: (i) an encoder that operates directly on the miRNA–disease heterogeneous network and leverages a Graph Neural Network to learn miRNA and disease latent representations, respectively; (ii) a decoder that yields miRNA–disease association scores with the learned latent representations as input. Various kinds of encoders and decoders are proposed for NMCMDA. The NMCMDA with the Relational Graph Convolutional Network encoder and the neural multirelational decoder (NMR-RGCN) achieves the best prediction performance. We compared the NMCMDA with other baselines on three experimental datasets. The experimental results show that the NMR-RGCN is significantly superior to the state-of-the-art method TDRC in terms of Top-1 Precision, Top-1 Recall, and Top-1 F1. Additionally, case studies are provided for two high-risk human diseases (namely, breast cancer and lung cancer), and we also provide the prediction and validation of the top-10 miRNA–disease-category associations based on all known data of HMDD v3.2, which further validates the effectiveness and feasibility of the proposed method.


DRIMC: an improved drug repositioning approach using Bayesian inductive matrix completion

Bioinformatics ◽

10.1093/bioinformatics/btaa062 ◽

2020 ◽

Vol 36 (9) ◽

pp. 2839-2847 ◽

Cited By ~ 1

Author(s):

Wenjuan Zhang ◽

Hunan Xu ◽

Xiaozhong Li ◽

Qiang Gao ◽

Lin Wang

Keyword(s):

Drug Repositioning ◽

Matrix Completion ◽

Disease Association ◽

Data Sources ◽

Supplementary Information ◽

Similarity Matrix ◽

Latent Factors ◽

Discovery Research ◽

Disease Associations ◽

Novel Drug

Abstract Motivation One of the most important problems in drug discovery research is to precisely predict a new indication for an existing drug, i.e. drug repositioning. Recent recommendation system-based methods have tackled this problem using matrix completion models. The models identify latent factors contributing to known drug-disease associations, and then infer novel drug-disease associations by the correlations between latent factors. However, these models have not fully considered the various drug data sources and the sparsity of the drug-disease association matrix. In addition, using the global structure of the drug-disease association data may introduce noise, and consequently limit the prediction power. Results In this work, we propose a novel drug repositioning approach by using Bayesian inductive matrix completion (DRIMC). First, we embed four drug data sources into a drug similarity matrix and two disease data sources into a disease similarity matrix. Then, for each drug or disease, its feature is described by similarity values between it and its nearest neighbors, and these features for drugs and diseases are mapped onto a shared latent space. We model the association probability for each drug-disease pair by inductive matrix completion, where the properties of drugs and diseases are represented by projections of drugs and diseases, respectively. As the known drug-disease associations have been manually verified, they are more trustworthy and important than the unknown pairs. We assign higher confidence levels to known association pairs compared with unknown pairs. We perform comprehensive experiments on three benchmark datasets, and DRIMC improves prediction accuracy compared with six state-of-the-art approaches. Availability and implementation Source code and datasets are available at https://github.com/linwang1982/DRIMC. Supplementary information Supplementary data are available at Bioinformatics online.


Drug repositioning based on bounded nuclear norm regularization

Bioinformatics ◽

10.1093/bioinformatics/btz331 ◽

2019 ◽

Vol 35 (14) ◽

pp. i455-i463 ◽

Cited By ~ 18

Author(s):

Mengyun Yang ◽

Huimin Luo ◽

Yaohang Li ◽

Jianxin Wang

Keyword(s):

Recommendation System ◽

Drug Repositioning ◽

Matrix Completion ◽

Approximation Error ◽

Disease Association ◽

Nuclear Norm ◽

Low Rank ◽

Supplementary Information ◽

Disease Associations ◽

Nuclear Norm Regularization

Abstract Motivation Computational drug repositioning is a cost-effective strategy to identify novel indications for existing drugs. Drug repositioning is often modeled as a recommendation system problem. Taking advantage of the known drug–disease associations, the objective of the recommendation system is to identify new treatments by filling out the unknown entries in the drug–disease association matrix, which is known as matrix completion. Underpinned by the fact that common molecular pathways contribute to many different diseases, the recommendation system assumes that the underlying latent factors determining drug–disease associations are highly correlated. In other words, the drug–disease matrix to be completed is low-rank. Accordingly, matrix completion algorithms efficiently constructing low-rank drug–disease matrix approximations consistent with known associations can be of immense help in discovering the novel drug–disease associations. Results In this article, we propose to use a bounded nuclear norm regularization (BNNR) method to complete the drug–disease matrix under the low-rank assumption. Instead of strictly fitting the known elements, BNNR is designed to tolerate the noisy drug–drug and disease–disease similarities by incorporating a regularization term to balance the approximation error and the rank properties. Moreover, additional constraints are incorporated into BNNR to ensure that all predicted matrix entry values are within the specific interval. BNNR is carried out on an adjacency matrix of a heterogeneous drug–disease network, which integrates the drug–drug, drug–disease and disease–disease networks. It not only makes full use of available drugs, diseases and their association information, but also is capable of dealing with cold start naturally. Our computational results show that BNNR yields higher drug–disease association prediction accuracy than the current state-of-the-art methods. 
The most significant gain is in prediction precision, measured as the fraction of positive predictions that are truly positive, which is particularly useful in drug design practice. Case studies also confirm the accuracy and reliability of BNNR. Availability and implementation The code of BNNR is freely available at https://github.com/BioinformaticsCSU/BNNR. Supplementary information Supplementary data are available at Bioinformatics online.
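
Nuclear norm regularisation is usually handled through singular value thresholding (SVT), the proximal operator of the nuclear norm, which shrinks every singular value and so pushes the matrix toward low rank. The sketch below shows that single building block together with a clipping step for the bounded entries. It is not the full BNNR solver: the threshold `tau`, the interval [0, 1], and the function names are assumptions made for illustration.

```python
import numpy as np

def svt(matrix: np.ndarray, tau: float) -> np.ndarray:
    """Singular value thresholding: shrink each singular value by tau, flooring at zero."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return (u * s_shrunk) @ vt               # scale columns of u, then recombine

def clip_to_bounds(matrix: np.ndarray, low: float = 0.0, high: float = 1.0) -> np.ndarray:
    """Force every predicted entry into the allowed interval (assumed here to be [0, 1])."""
    return np.clip(matrix, low, high)

# Toy matrix with singular values 3.0, 1.0 and 0.2.
m = np.diag([3.0, 1.0, 0.2])
low_rank = svt(m, tau=0.5)                   # singular values become 2.5, 0.5, 0.0
bounded = clip_to_bounds(low_rank)
```

The smallest singular value is removed entirely, which lowers the rank from 3 to 2, while the clipping step keeps the scores interpretable as association strengths.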


EHAI: Enhanced Human Microbe-Disease Association Identification

Current Protein and Peptide Science ◽

10.2174/1389203721666200702150249 ◽

2020 ◽

Vol 21 (11) ◽

pp. 1078-1084

Author(s):

Ruizhi Fan ◽

Chenhua Dong ◽

Hu Song ◽

Yixin Xu ◽

Linsen Shi ◽

...

Keyword(s):

Microbial Community ◽

Human Health ◽

Complex Diseases ◽

Disease Association ◽

Computational Results ◽

Computational Approaches ◽

Disease Diagnostics ◽

Disease Associations ◽

Association Discovery ◽

Biological Tool

Recently, an increasing number of biological and clinical reports have demonstrated that imbalance of the microbial community plays an important role in several complex diseases affecting human health. Discovering potential microbe-disease relationships provides a better understanding of issues such as disease pathology and further boosts disease diagnostics and prognostics. Nevertheless, few computational approaches can meet the need for large-scale microbe-disease association discovery. In this work, we propose the EHAI model (Enhanced Human microbe-disease Association Identification). EHAI takes the known microbe-disease associations and uses Gaussian interaction profile kernel similarity to enhance the basic microbe-disease association data. In practice, only some microbe-disease associations are known, and a large number of associations remain unavailable in the datasets. A ‘super-microbe’ and a ‘super-disease’ are employed to enhance the model. Computational results demonstrate that these super-classes improve the performance of EHAI. Therefore, it is anticipated that EHAI can serve as an important biological tool in this field.
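
Gaussian interaction profile (GIP) kernel similarity treats each row of the association matrix as a profile and compares two profiles with a Gaussian kernel whose bandwidth is normalised by the mean squared profile norm. The following is a minimal sketch of that commonly used definition; the parameter name `gamma_prime`, its default of 1.0, and the toy matrix are assumptions, and the abstract does not say which settings EHAI uses.

```python
import numpy as np

def gip_kernel(profiles: np.ndarray, gamma_prime: float = 1.0) -> np.ndarray:
    """Gaussian interaction profile kernel between the rows of an association matrix."""
    sq_norms = (profiles ** 2).sum(axis=1)
    mean_sq = sq_norms.mean()
    # Bandwidth normalised by the average squared profile norm.
    gamma = gamma_prime / mean_sq if mean_sq > 0 else gamma_prime
    # Squared Euclidean distances via the expansion |a|^2 + |b|^2 - 2 a.b
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    sq_dists = np.maximum(sq_dists, 0.0)     # guard against tiny negative round-off
    return np.exp(-gamma * sq_dists)

# Toy association matrix: 3 microbes x 3 diseases.
assoc = np.array([[1.0, 0.0, 1.0],
                  [1.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0]])

k = gip_kernel(assoc)
```

Microbes with identical association profiles receive similarity 1, and the similarity decays as the profiles diverge. Applying the same function to the transposed matrix gives the disease-side kernel.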


Knowledge and Geo-Object Based Graph Convolutional Network for Remote Sensing Semantic Segmentation

Sensors ◽

10.3390/s21113848 ◽

2021 ◽

Vol 21 (11) ◽

pp. 3848

Author(s):

Wei Cui ◽

Meng Yao ◽

Yuanjie Hao ◽

Ziwei Wang ◽

Xin He ◽

...

Keyword(s):

Remote Sensing ◽

Prior Knowledge ◽

Contextual Information ◽

Information Aggregation ◽

Semantic Segmentation ◽

Spatial Correlations ◽

Convolutional Network ◽

Object Based ◽

Graph Neural Networks ◽

Salt And Pepper

Pixel-based semantic segmentation models fail to effectively express geographic objects and their topological relationships. Therefore, in semantic segmentation of remote sensing images, these models cannot avoid salt-and-pepper effects and cannot achieve high accuracy either. To solve these problems, object-based models such as graph neural networks (GNNs) are considered. However, traditional GNNs directly use similarity or spatial correlations between nodes to aggregate node information, and thus rely too much on the contextual information of the sample. The contextual information of the sample is often distorted, which reduces the node classification accuracy. To solve this problem, a knowledge and geo-object-based graph convolutional network (KGGCN) is proposed. The KGGCN uses superpixel blocks as nodes of the graph network and combines prior knowledge with spatial correlations during information aggregation. By incorporating the prior knowledge obtained from all samples of the study area, the receptive field of the node is extended from its sample context to the study area. Thus, the distortion of the sample context is overcome effectively. Experiments demonstrate that our model is improved by 3.7% compared with the baseline model named Cluster GCN and 4.1% compared with U-Net.


Genome-wide inferring gene–phenotype relationship by walking on the heterogeneous network

Bioinformatics ◽

10.1093/bioinformatics/btq108 ◽

2010 ◽

Vol 26 (9) ◽

pp. 1219-1224 ◽

Cited By ~ 238

Author(s):

Yongjin Li ◽

Jagdish C. Patra

Keyword(s):

Heterogeneous Network ◽

Gene Network ◽

Genetic Diseases ◽

Supplementary Information ◽

Disease Genes ◽

Phenotypic Data ◽

Disease Associations ◽

Improved Performance ◽

Leave One Out ◽

Phenotype Network

Abstract Motivation: Clinical diseases are characterized by distinct phenotypes. To identify disease genes is to elucidate the gene–phenotype relationships. Mutations in functionally related genes may result in similar phenotypes. It is reasonable to predict disease-causing genes by integrating phenotypic data and genomic data. Some genetic diseases are genetically or phenotypically similar. They may share common pathogenetic mechanisms. Identifying the relationships between diseases will facilitate a better understanding of the pathogenetic mechanisms of diseases. Results: In this article, we constructed a heterogeneous network by connecting the gene network and phenotype network using the phenotype–gene relationship information from the OMIM database. We extended the random walk with restart algorithm to the heterogeneous network. The algorithm prioritizes the genes and phenotypes simultaneously. We used leave-one-out cross-validation to evaluate the ability to find gene–phenotype relationships. Results showed improved performance over previous work. We also used the algorithm to disclose hidden disease associations that cannot be found by the gene network or phenotype network alone. We identified 18 hidden disease associations, most of which were supported by literature evidence. Availability: The MATLAB code of the program is available at http://www3.ntu.edu.sg/home/aspatra/research/Yongjin_BI2010.zip Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
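
Random walk with restart (RWR) ranks nodes by the steady-state probability of a walker that repeatedly steps to a neighbour or, with some probability, jumps back to the seed nodes. The sketch below runs plain RWR on a single combined adjacency matrix. It deliberately omits the paper's heterogeneous-network extension (for example, any separate treatment of jumps between the gene and phenotype layers), and the restart probability of 0.7, the function name, and the toy graph are assumptions.

```python
import numpy as np

def random_walk_with_restart(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) W p + r p0 until the L1 change falls below tol."""
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0            # leave isolated nodes as zero columns
    w = adj / col_sums                       # column-normalised transition matrix
    p0 = seeds / seeds.sum()                 # restart distribution over seed nodes
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (w @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Toy path graph: gene0 - gene1 - phenotype2 - phenotype3.
adj = np.array([[0.0, 1.0, 0.0, 0.0],
                [1.0, 0.0, 1.0, 0.0],
                [0.0, 1.0, 0.0, 1.0],
                [0.0, 0.0, 1.0, 0.0]])
seeds = np.array([1.0, 0.0, 0.0, 0.0])       # start the walk from gene0

scores = random_walk_with_restart(adj, seeds)
ranking = np.argsort(-scores)                # nodes ordered by proximity to the seed
```

Nodes closer to the seed receive higher scores, so sorting the steady-state vector yields a candidate ranking for genes and phenotypes at the same time.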
