LncRNA-disease association prediction based on neighborhood information aggregation in neural network

Author(s):  
Hongjie Chen ◽  
Xun Wang ◽  
Xuan Zhang ◽  
Xiangxiang Zeng ◽  
Tao Song ◽  
...  
2021 ◽  
Author(s):  
Lin Yuan ◽  
Jing Zhao ◽  
Tao Sun ◽  
Zhen Shen

Abstract Background: LncRNAs (long non-coding RNAs) are non-coding RNA molecules with transcripts longer than 200 nucleotides. LncRNAs have emerged as novel candidate biomarkers in cancer diagnosis and prognosis. However, it is difficult to discover the true association mechanism between lncRNAs and complex diseases. The unprecedented enrichment of multi-omics data and the rapid development of machine learning technology provide an opportunity to design a machine learning framework for studying the relationship between lncRNAs and complex diseases. Results: In this article, we propose a new machine learning approach, LGDLDA (LncRNA-Gene-Disease association networks based LncRNA-Disease Association prediction), which predicts disease-related lncRNAs from multi-omics data using machine learning methods and neural network neighborhood information aggregation. First, LGDLDA calculates similarity matrices for lncRNAs, genes and diseases. It calculates lncRNA similarity from the lncRNA expression profile matrix, the lncRNA-miRNA interaction matrix and the lncRNA-protein interaction matrix. It obtains the gene similarity matrix from the lncRNA-gene association matrix and the gene-disease association matrix. It obtains the disease similarity matrix from the disease ontology, the disease-miRNA association matrix, and Gaussian interaction profile kernel similarity. Second, LGDLDA integrates the neighborhood information in the similarity matrices through the nonlinear feature learning of a neural network. Third, LGDLDA uses the embedded node representations to approximate the observed matrices. Finally, LGDLDA ranks candidate lncRNA-disease pairs and selects potential disease-related lncRNAs. Conclusions: Compared with other lncRNA-disease prediction methods, LGDLDA takes more critical information into account and improves the performance of cancer-related lncRNA prediction. Randomly split data experiments show that LGDLDA is more stable than IDHI-MIRW, NCPLDA, LncDisAP and NCPHLDA. Results on different simulation data sets show that LGDLDA can accurately and effectively predict disease-related lncRNAs. Furthermore, we applied LGDLDA to three real cancer data sets (gastric cancer, colorectal cancer and breast cancer) to predict potential cancer-related lncRNAs.
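
The abstract names Gaussian interaction profile (GIP) kernel similarity as one ingredient of the disease similarity matrix but does not give its formula. The sketch below uses the commonly cited definition, K(i, j) = exp(-gamma * ||IP(i) - IP(j)||^2), with gamma set to gamma' divided by the mean squared norm of all interaction profiles. The function name, the `gamma_prime` default of 1.0 and the toy association matrix are assumptions made for illustration; they are not taken from the LGDLDA paper.

```python
import math

def gip_kernel_similarity(assoc, gamma_prime=1.0):
    """Gaussian interaction profile kernel similarity between rows.

    assoc: list of equal-length 0/1 lists; row i is the interaction
    profile of entity i (for example, one disease against all lncRNAs).
    gamma_prime: bandwidth scale (assumed default, not from the paper).
    """
    n = len(assoc)
    # Normalise the bandwidth by the mean squared norm of the profiles.
    mean_sq_norm = sum(sum(v * v for v in row) for row in assoc) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(assoc[i], assoc[j]))
            sim[i][j] = sim[j][i] = math.exp(-gamma * sq_dist)
    return sim

# Hypothetical toy matrix: 3 diseases (rows) x 4 lncRNAs (columns).
toy_assoc = [
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
]
gip = gip_kernel_similarity(toy_assoc)
```

Rows with identical profiles receive similarity 1, and similarity decays as the profiles diverge, so entities that share many known associations are treated as close.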


IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 50581-50587 ◽  
Author(s):  
Yingdong Wang ◽  
Gaoshan Deng ◽  
Nianyin Zeng ◽  
Xiao Song ◽  
Yuanying Zhuang

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Lin Yuan ◽  
Jing Zhao ◽  
Tao Sun ◽  
Zhen Shen

Abstract Background LncRNAs (long non-coding RNAs) are non-coding RNA molecules with transcripts longer than 200 nucleotides. LncRNAs have emerged as novel candidate biomarkers in cancer diagnosis and prognosis. However, it is difficult to discover the true association mechanism between lncRNAs and complex diseases. The unprecedented enrichment of multi-omics data and the rapid development of machine learning technology provide an opportunity to design a machine learning framework for studying the relationship between lncRNAs and complex diseases. Results In this article, we propose a new machine learning approach, LGDLDA (LncRNA-Gene-Disease association networks based LncRNA-Disease Association prediction), which predicts disease-related lncRNAs from multi-omics data using machine learning methods and neural network neighborhood information aggregation. First, LGDLDA calculates similarity matrices for lncRNAs, genes and diseases; lncRNA similarity is calculated from the lncRNA expression profile matrix, the lncRNA-miRNA interaction matrix and the lncRNA-protein interaction matrix. We obtain the gene similarity matrix from the lncRNA-gene association matrix and the gene-disease association matrix, and the disease similarity matrix from the disease ontology, the disease-miRNA association matrix, and Gaussian interaction profile kernel similarity. Second, LGDLDA integrates the neighborhood information in the similarity matrices through the nonlinear feature learning of a neural network. Third, LGDLDA uses the embedded node representations to approximate the observed matrices. Finally, LGDLDA ranks candidate lncRNA-disease pairs and selects potential disease-related lncRNAs. Conclusions Compared with other lncRNA-disease prediction methods, our proposed method takes more critical information into account and improves the performance of cancer-related lncRNA prediction. Randomly split data experiments show that LGDLDA is more stable than IDHI-MIRW, NCPLDA, LncDisAP and NCPHLDA. Results on different simulation data sets show that LGDLDA can accurately and effectively predict disease-related lncRNAs. Furthermore, we applied the method to three real cancer data sets (gastric cancer, colorectal cancer and breast cancer) to predict potential cancer-related lncRNAs.
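
The final step described above, ranking candidate lncRNA-disease pairs from embedded node representations, can be sketched as follows. The abstract does not say how embeddings are turned into scores, so the inner product used here is an assumption, and the function name, entity names (`lncA`, `disease1`) and embedding values are hypothetical placeholders rather than outputs of LGDLDA.

```python
def rank_candidates(lnc_emb, dis_emb, known, disease):
    """Rank lncRNAs not yet linked to `disease` by an inner-product score.

    lnc_emb / dis_emb: dicts mapping a name to its embedding (list of floats).
    known: set of (lncRNA, disease) pairs already observed; these are skipped.
    The inner-product scoring rule is an illustrative assumption.
    """
    d_vec = dis_emb[disease]
    scored = [
        (sum(a * b for a, b in zip(vec, d_vec)), name)
        for name, vec in lnc_emb.items()
        if (name, disease) not in known
    ]
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(name, score) for score, name in scored]

# Hypothetical embeddings for three lncRNAs and one disease.
lnc_emb = {"lncA": [1.0, 0.0], "lncB": [0.5, 0.5], "lncC": [0.0, 1.0]}
dis_emb = {"disease1": [0.2, 0.8]}
known = {("lncC", "disease1")}

ranking = rank_candidates(lnc_emb, dis_emb, known, "disease1")
```

Known associations are excluded so that the ranking contains only unobserved pairs, which is what a prediction of potential disease-related lncRNAs needs.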


2020 ◽  
Vol 34 (05) ◽  
pp. 9612-9619
Author(s):  
Zhao Zhang ◽  
Fuzhen Zhuang ◽  
Hengshu Zhu ◽  
Zhiping Shi ◽  
Hui Xiong ◽  
...  

The rapid proliferation of knowledge graphs (KGs) has changed the paradigm for various AI-related applications. Despite their large sizes, modern KGs are far from complete and comprehensive. This has motivated research in knowledge graph completion (KGC), which aims to infer missing values in incomplete knowledge triples. However, most existing KGC models treat the triples in KGs independently, without leveraging the inherent and valuable information in the local neighborhood surrounding an entity. To this end, we propose a Relational Graph neural network with Hierarchical ATtention (RGHAT) for the KGC task. The proposed model is equipped with a two-level attention mechanism: (i) the first level is the relation-level attention, which is inspired by the intuition that different relations have different weights for indicating an entity; (ii) the second level is the entity-level attention, which enables our model to highlight the importance of different neighboring entities under the same relation. The hierarchical attention mechanism allows our model to utilize the neighborhood information of an entity more effectively. Finally, we extensively validate the superiority of RGHAT against various state-of-the-art baselines.
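
To make the two-level attention idea concrete, here is a minimal sketch of hierarchical neighborhood aggregation. It is not RGHAT's actual formulation: attention scores are plain dot products rather than learned parameters, and every name (`hierarchical_aggregate`, `born_in`, `works_at`) and vector below is a hypothetical placeholder. The weight of each neighbor is the product of a relation-level weight and an entity-level weight computed among neighbors that share that relation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def hierarchical_aggregate(entity_vec, neighbors, relation_vecs):
    """Aggregate neighbor vectors with relation-level and entity-level attention.

    neighbors: dict mapping a relation name to a list of neighbor vectors.
    relation_vecs: dict mapping a relation name to a relation vector.
    Dot-product scoring is an assumption made for this sketch.
    """
    relations = list(neighbors)
    # Level 1: how much each relation matters for this entity.
    rel_att = softmax([dot(entity_vec, relation_vecs[r]) for r in relations])
    out = [0.0] * len(entity_vec)
    weights = {}
    for rel, alpha in zip(relations, rel_att):
        # Level 2: how much each neighbor matters within the same relation.
        ent_att = softmax([dot(entity_vec, nb) for nb in neighbors[rel]])
        weights[rel] = [alpha * beta for beta in ent_att]
        for w, nb in zip(weights[rel], neighbors[rel]):
            for k, value in enumerate(nb):
                out[k] += w * value
    return out, weights

# Hypothetical entity with two relations and three neighbors in total.
entity = [1.0, 0.0]
neighbors = {
    "born_in": [[0.9, 0.1]],
    "works_at": [[0.2, 0.8], [0.6, 0.4]],
}
relation_vecs = {"born_in": [1.0, 0.0], "works_at": [0.0, 1.0]}

aggregated, att_weights = hierarchical_aggregate(entity, neighbors, relation_vecs)
```

Because both levels are softmax distributions, the combined weights over all neighbors sum to 1, so the output is a convex combination of the neighbor vectors.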


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Shanchen Pang ◽  
Yu Zhuang ◽  
Xinzeng Wang ◽  
Fuyu Wang ◽  
Sibo Qiao

Abstract Background A large number of biological studies have shown that miRNAs are inextricably linked to many complex diseases. Studying miRNA-disease associations could provide a root-cause understanding of the underlying pathogenesis, which in turn promotes drug development. However, traditional biological experiments are very time-consuming and costly. Therefore, we propose an efficient model to address this challenge. Results In this work, we propose a deep learning model called EOESGC to predict potential miRNA-disease associations based on embedding of embedding and a simplified convolutional network. First, integrated disease similarity, integrated miRNA similarity, and the miRNA-disease association network are used to construct a coupled heterogeneous graph, and edges with low similarity are removed to simplify the graph structure and keep only informative edges. Second, the embedding of embedding (EOE) model is used to learn edge information in the coupled heterogeneous graph. The model is trained so that associated nodes are close to each other and unassociated nodes are far apart. Based on this rule, the learned edge information is added to the node embeddings as supplementary information. Then, the node embeddings trained by the EOE model are used as new features of miRNAs and diseases, and information aggregation is performed by a simplified graph convolution model, in which each level of convolution can aggregate multi-hop neighbor information. In this step, we use only the miRNA-disease association network to further simplify the graph structure, thus reducing the computational complexity. Finally, the feature embeddings of miRNAs and diseases are concatenated and fed into an MLP for prediction. In the evaluation, EOESGC achieves an AUC, AUPR, and F1-score of 0.9658, 0.8543 and 0.8644, respectively, under 5-fold cross-validation. Compared with the latest published models, our model shows better results. In addition, we predict the top 20 potential miRNAs for breast cancer and lung cancer, most of which are validated in the dbDEMC and HMDD3.2 databases. Conclusion The comprehensive experimental results show that EOESGC can effectively identify potential miRNA-disease associations.
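
The multi-hop aggregation attributed to the simplified graph convolution step can be illustrated with the standard simplified graph convolution (SGC) propagation rule: features are multiplied K times by the symmetrically normalized adjacency matrix with self-loops, S = D^(-1/2) (A + I) D^(-1/2). This is a sketch of that general rule, not of EOESGC's exact implementation; the function names and the three-node path graph are assumptions chosen for illustration.

```python
import math

def normalized_adjacency(adj):
    """Return S = D^(-1/2) (A + I) D^(-1/2) for a square adjacency matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain dense matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def sgc_propagate(adj, features, hops):
    """Apply S to the feature matrix `hops` times (no nonlinearity in between)."""
    s = normalized_adjacency(adj)
    out = features
    for _ in range(hops):
        out = matmul(s, out)
    return out

# Hypothetical path graph 0 - 1 - 2 with a one-hot feature on node 0.
path_adj = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
one_hot = [[1.0], [0.0], [0.0]]

one_hop = sgc_propagate(path_adj, one_hot, hops=1)
two_hop = sgc_propagate(path_adj, one_hot, hops=2)
```

After one hop, node 2 has received nothing from node 0; after two hops it has, which is the sense in which stacking propagation steps aggregates multi-hop neighbor information.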

