gGATLDA: lncRNA-disease association prediction based on graph-level graph attention network

2022
Vol 23 (1)
Author(s):
Li Wang
Cheng Zhong

Abstract Background Long non-coding RNAs (lncRNAs) are related to human diseases by regulating gene expression. Identifying lncRNA-disease associations (LDAs) will contribute to the diagnosis, treatment, and prognosis of diseases. However, identifying LDAs through biological experiments is time-consuming, costly and inefficient. Therefore, the development of efficient and high-accuracy computational methods for predicting LDAs is of great significance. Results In this paper, we propose a novel computational method (gGATLDA) to predict LDAs based on a graph-level graph attention network. Firstly, we extract the enclosing subgraph of each lncRNA-disease pair. Secondly, we construct the feature vectors by integrating lncRNA similarity and disease similarity as node attributes in the subgraphs. Finally, we train a graph neural network (GNN) model by feeding the subgraphs and feature vectors to it, and use the trained GNN model to predict potential lncRNA-disease association scores. The experimental results show that our method achieves higher area under the receiver operating characteristic curve (AUC), area under the precision-recall curve (AUPR), accuracy and F1-score than the state-of-the-art methods in five-fold cross-validation. Case studies show that our method can effectively identify lncRNAs associated with breast cancer, gastric cancer, prostate cancer, and renal cancer. Conclusion The experimental results indicate that our method is a useful approach for predicting potential LDAs.
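The first step of the abstract above, extracting an enclosing subgraph for a lncRNA-disease pair, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `enclosing_subgraph`, the toy graph, the `hops` parameter, and the choice to hide the target link itself are all assumptions.

```python
from collections import deque


def enclosing_subgraph(adj, lnc, dis, hops=1):
    """Return the nodes within `hops` of either target node and the induced edges."""

    def within(start):
        # Breadth-first search that stops expanding at depth `hops`.
        depth = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if depth[node] == hops:
                continue
            for nb in adj.get(node, ()):
                if nb not in depth:
                    depth[nb] = depth[node] + 1
                    queue.append(nb)
        return set(depth)

    nodes = within(lnc) | within(dis)
    edges = {
        tuple(sorted((u, v)))
        for u in nodes
        for v in adj.get(u, ())
        # Assumption: the target link is hidden so a model cannot read the label.
        if v in nodes and {u, v} != {lnc, dis}
    }
    return nodes, edges


# Hypothetical toy bipartite graph: lncRNAs "l*", diseases "d*".
adj = {
    "l1": {"d1", "d2"},
    "l2": {"d2"},
    "l3": {"d3"},
    "d1": {"l1"},
    "d2": {"l1", "l2"},
    "d3": {"l3"},
}
nodes, edges = enclosing_subgraph(adj, "l2", "d1", hops=1)
```

Each such subgraph, together with similarity-based node attributes, would then be passed to a graph-level classifier; that part is not shown here.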

2020
Vol 21 (S13)
Author(s):
Renyi Zhou
Zhangli Lu
Huimin Luo
Ju Xiang
Min Zeng
...

Abstract Background Drug discovery is known for the large amount of money and time it consumes and the high risk it carries. Drug repositioning has, therefore, become a popular approach to save time and cost by finding novel indications for approved drugs. In order to distinguish these novel indications accurately among the many latent associations between drugs and diseases, it is necessary to exploit abundant heterogeneous information about drugs and diseases. Results In this article, we propose a meta-path-based computational method called NEDD to predict novel associations between drugs and diseases using heterogeneous information. First, we construct a heterogeneous network as an undirected graph by integrating drug-drug similarity, disease-disease similarity, and known drug-disease associations. NEDD uses meta paths of different lengths to explicitly capture the indirect relationships, or high-order proximity, within drugs and diseases, by which the low-dimensional representation vectors of drugs and diseases are obtained. NEDD then uses a random forest classifier to predict novel associations between drugs and diseases. Conclusions The experiments on a gold standard dataset that contains 1933 validated drug–disease associations show that NEDD produces superior prediction results compared with the state-of-the-art approaches.
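To make the idea of meta paths of different lengths concrete, the sketch below counts path instances along drug → drug → disease and drug → drug → drug → disease by multiplying adjacency matrices. It illustrates only how longer paths expose indirect relationships; it is not NEDD's embedding procedure, and the toy matrices and the binarised similarity edges are assumptions.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Hypothetical toy data: 3 drugs, 2 diseases.
drug_drug = [  # assumption: similarity thresholded into 0/1 edges
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
drug_disease = [  # known drug-disease associations
    [1, 0],
    [0, 0],
    [0, 1],
]

# Number of drug -> drug -> disease path instances (length 2).
paths_len2 = matmul(drug_drug, drug_disease)
# Number of drug -> drug -> drug -> disease path instances (length 3).
paths_len3 = matmul(drug_drug, paths_len2)
```

In this toy network drug 1 has no known disease, yet `paths_len2[1]` is non-zero for both diseases because its two similar drugs are each associated with one of them.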


2018
Vol 19 (11)
pp. 3410
Author(s):
Xiujuan Lei
Zengqiang Fang
Luonan Chen
Fang-Xiang Wu

CircRNAs have a particular biological structure and have proven to play important roles in diseases. It is time-consuming and costly to identify circRNA-disease associations by biological experiments. Therefore, it is appealing to develop computational methods for predicting circRNA-disease associations. In this study, we propose a new path-weighted computational method (PWCDA) for predicting circRNA-disease associations. Firstly, we calculate the functional similarity scores of diseases based on disease-related gene annotations and the semantic similarity scores of circRNAs based on circRNA-related gene ontology, respectively. To address missing similarity scores of diseases and circRNAs, we calculate the Gaussian Interaction Profile (GIP) kernel similarity scores for diseases and circRNAs, respectively, based on the circRNA-disease associations downloaded from the circR2Disease database (http://bioinfo.snnu.edu.cn/CircR2Disease/). Then, we integrate disease functional similarity scores and circRNA semantic similarity scores with their related GIP kernel similarity scores to construct a heterogeneous network made up of three sub-networks: a disease similarity network, a circRNA similarity network and a circRNA-disease association network. Finally, we compute an association score for each circRNA-disease pair based on the paths connecting them in the heterogeneous network to determine whether this circRNA-disease pair is associated. We adopt leave-one-out cross-validation (LOOCV) and five-fold cross-validation to evaluate the performance of our proposed method. In addition, three common diseases, Breast Cancer, Gastric Cancer and Colorectal Cancer, are used for case studies. Experimental results illustrate the reliability and usefulness of our computational method in terms of different validation measures, which indicates that PWCDA can effectively predict potential circRNA-disease associations.
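The GIP kernel similarity mentioned above is usually computed from the rows (or columns) of the binary association matrix. The sketch below uses the commonly cited definition K(i, j) = exp(-γ‖IP(i) - IP(j)‖²) with γ = γ' / mean‖IP‖²; the abstract does not state the formula, so this definition, the default `gamma_prime=1.0`, and the toy profiles are assumptions.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile similarity between binary association profiles."""
    n = len(profiles)
    sq_norms = [sum(x * x for x in p) for p in profiles]
    mean_sq = sum(sq_norms) / n
    # Bandwidth normalised by the mean squared profile norm (0 if all profiles are empty).
    gamma = gamma_prime / mean_sq if mean_sq else 0.0
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            sim[i][j] = math.exp(-gamma * dist_sq)
    return sim


# Hypothetical circRNA profiles: one row per circRNA, one column per disease.
profiles = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]
circ_gip = gip_kernel(profiles)
```

Transposing the association matrix and calling the same function would give the disease-side GIP similarity.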


Cancers
2021
Vol 13 (11)
pp. 2595
Author(s):
Chen Bian
Xiu-Juan Lei
Fang-Xiang Wu

CircRNAs (circular RNAs) are a class of non-coding RNA molecules with a closed circular structure. CircRNAs are closely related to the occurrence and development of diseases. Due to the time-consuming nature of biological experiments, computational methods have become a better way to predict the interactions between circRNAs and diseases. In this study, we developed a novel computational method called GATCDA utilizing a graph attention network (GAT) to predict circRNA–disease associations with disease symptom similarity, network similarity, and information entropy similarity for both circRNAs and diseases. GAT learns representations for nodes on a graph by an attention mechanism, which assigns different weights to different nodes in a neighborhood. Considering that the circRNA–miRNA–mRNA axis plays an important role in the generation and development of diseases, circRNA–miRNA interactions and disease–mRNA interactions were adopted to construct features, in which mRNAs were related to 88% of miRNAs. As demonstrated by five-fold cross-validation, GATCDA yielded an AUC value of 0.9011. In addition, case studies showed that GATCDA can predict unknown circRNA–disease associations. In conclusion, GATCDA is a useful method for exploring associations between circRNAs and diseases.
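The attention mechanism described above, which assigns different weights to different nodes in a neighborhood, can be sketched for a single attention head. This is a simplified illustration and not the GATCDA code: the learned linear transform is omitted (features are assumed already transformed), and the vectors `a_src` and `a_dst`, the LeakyReLU slope of 0.2, and the toy features are assumptions.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def attention_weights(h, neighbors, i, a_src, a_dst):
    """Softmax over the neighborhood of i of LeakyReLU(a_src.h_i + a_dst.h_j)."""
    scores = {j: leaky_relu(dot(a_src, h[i]) + dot(a_dst, h[j])) for j in neighbors[i]}
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {j: math.exp(s - top) for j, s in scores.items()}
    total = sum(exps.values())
    return {j: e / total for j, e in exps.items()}


def aggregate(h, neighbors, i, a_src, a_dst):
    """Attention-weighted sum of neighbor features: the new representation of node i."""
    w = attention_weights(h, neighbors, i, a_src, a_dst)
    return [sum(w[j] * h[j][k] for j in w) for k in range(len(h[i]))]


# Hypothetical node features and neighborhood (node 0 includes a self-loop).
h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
neighbors = {0: [0, 1, 2]}
a_src, a_dst = [0.5, 0.5], [1.0, -1.0]

w = attention_weights(h, neighbors, 0, a_src, a_dst)
h0_new = aggregate(h, neighbors, 0, a_src, a_dst)
```

Because the weights come from a softmax, they are positive and sum to one, so the new representation is a convex combination of the neighbor features.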


2019
Vol 17 (1)
Author(s):
Han-Jing Jiang
Zhu-Hong You
Yu-An Huang

Abstract Background In the process of drug development, computational drug repositioning is effective and resource-saving because of its important role in identifying new drug–disease associations. Recent years have witnessed great progress in the field of data mining with the advent of deep learning. An increasing number of deep learning-based techniques have been proposed to develop computational tools in bioinformatics. Methods Along this promising direction, we here propose a computational drug repositioning method combining the techniques of Sigmoid Kernel and Convolutional Neural Network (SKCNN), which is able to learn new features effectively representing drug–disease associations via its hidden layers. Specifically, we first construct a similarity metric for drugs using drug sigmoid similarity and drug structural similarity, and one for diseases using disease sigmoid similarity and disease semantic similarity. Based on the combined similarities of drugs and diseases, we then use SKCNN to learn hidden representations for each drug-disease pair, whose labels are finally predicted by a classifier based on random forest. Results A series of experiments were implemented for performance evaluation, and their results show that the proposed SKCNN improves the prediction accuracy compared with other state-of-the-art approaches. Case studies of two selected diseases are also conducted, through which we demonstrate the superior performance of our method in terms of the actual discovery of potential drug indications. Conclusion The aim of this study was to establish an effective predictive model for finding new drug–disease associations. These experimental results show that SKCNN can effectively predict the association between drugs and diseases.
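The "sigmoid similarity" above is not defined in the abstract. A plausible reading is the standard sigmoid kernel tanh(γ⟨x, y⟩ + c) applied to association profiles; the sketch below follows that reading. The kernel form, the choice γ = 1 / (number of features), `coef0 = 0`, and the toy profiles are all assumptions.

```python
import math


def sigmoid_kernel(x, y, gamma, coef0=0.0):
    """Standard sigmoid kernel: tanh(gamma * <x, y> + coef0)."""
    return math.tanh(gamma * sum(a * b for a, b in zip(x, y)) + coef0)


def sigmoid_similarity(profiles, coef0=0.0):
    """Pairwise sigmoid-kernel similarity between association profiles."""
    gamma = 1.0 / len(profiles[0])  # assumption: scale by the number of features
    return [[sigmoid_kernel(p, q, gamma, coef0) for q in profiles] for p in profiles]


# Hypothetical drug profiles: one row per drug, one column per disease.
drug_profiles = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
]
drug_sigmoid = sigmoid_similarity(drug_profiles)
```

Drugs that share no known disease get a similarity of exactly zero under these settings, which is one reason such a profile-based measure is typically combined with an independent one such as structural similarity.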


2020
Vol 36 (8)
pp. 2538-2546
Author(s):
Jin Li
Sai Zhang
Tao Liu
Chenxi Ning
Zhuoxuan Zhang
...

Abstract Motivation Predicting the association between microRNAs (miRNAs) and diseases plays an important role in identifying human disease-related miRNAs. As identification of miRNA-disease associations via biological experiments is time-consuming and expensive, computational methods are currently used as effective complements to determine the potential associations between disease and miRNA. Results We present a novel method of neural inductive matrix completion with graph convolutional network (NIMCGCN) for predicting miRNA-disease association. NIMCGCN first uses graph convolutional networks to learn miRNA and disease latent feature representations from the miRNA and disease similarity networks. Then, the learned features are input into a novel neural inductive matrix completion (NIMC) model to generate a completed association matrix. The parameters of NIMCGCN are learned from the known miRNA-disease association data in a supervised end-to-end way. We compared the proposed method with other state-of-the-art methods. The area under the receiver operating characteristic curve results showed that our method is significantly superior to existing methods. Furthermore, 50, 47 and 48 of the top 50 predicted miRNAs for three high-risk human diseases, namely, colon cancer, lymphoma and kidney cancer, were verified using experimental literature. Finally, 100% prediction accuracy was achieved when breast cancer was used as a case study to evaluate the ability of NIMCGCN to make predictions for a new disease without any known related miRNAs. Availability and implementation https://github.com/ljatynu/NIMCGCN/ Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Zhouxin Yu
Feng Huang
Xiaohan Zhao
Wenjie Xiao
Wen Zhang

Abstract Background: Determining drug–disease associations is an integral part of the process of drug development. However, the identification of drug–disease associations through wet experiments is costly and inefficient. Hence, the development of efficient and high-accuracy computational methods for predicting drug–disease associations is of great significance. Results: In this paper, we propose a novel computational method named layer attention graph convolutional network (LAGCN) for drug–disease association prediction. Specifically, LAGCN first integrates the known drug–disease associations, drug–drug similarities and disease–disease similarities into a heterogeneous network, and applies the graph convolution operation to the network to learn the embeddings of drugs and diseases. Second, LAGCN combines the embeddings from multiple graph convolution layers using an attention mechanism. Third, the unobserved drug–disease associations are scored based on the integrated embeddings. Evaluated by 5-fold cross-validation, LAGCN achieves an area under the precision–recall curve of 0.3168 and an area under the receiver operating characteristic curve of 0.8750, which are better than the results of existing state-of-the-art prediction methods and baseline methods. The case study shows that LAGCN can discover novel associations that are not curated in our dataset. Conclusion: LAGCN is a useful tool for predicting drug–disease associations. This study reveals that embeddings from different convolution layers can reflect the proximities of different orders, and combining the embeddings by the attention mechanism can improve the prediction performance.
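The second step above, combining embeddings from several graph convolution layers with attention, can be sketched as a softmax-weighted sum of per-layer embedding matrices. This is only an illustration of the idea, not the LAGCN code: the per-layer scores would normally be learned parameters, and the names `combine_layers` and `layer_scores` and the toy embeddings are assumptions.

```python
import math


def softmax(xs):
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def combine_layers(layer_embeddings, layer_scores):
    """Weighted sum of per-layer node embeddings, with weights softmax(layer_scores)."""
    weights = softmax(layer_scores)
    n_nodes = len(layer_embeddings[0])
    dim = len(layer_embeddings[0][0])
    return [
        [sum(w * emb[i][k] for w, emb in zip(weights, layer_embeddings)) for k in range(dim)]
        for i in range(n_nodes)
    ]


# Hypothetical embeddings of two nodes from two convolution layers.
layer1 = [[1.0, 0.0], [0.0, 2.0]]
layer2 = [[3.0, 0.0], [0.0, 0.0]]
# Equal scores give each layer a weight of 0.5; training would adjust these.
combined = combine_layers([layer1, layer2], layer_scores=[0.0, 0.0])
```

Since each layer aggregates information from one hop further away, the learned weights indicate how much each order of proximity contributes to the final embedding.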


2019
Vol 21 (4)
pp. 1356-1367
Author(s):
Hang Wei
Bin Liu

Abstract Circular RNAs (circRNAs) are a group of newly discovered non-coding RNAs with a closed-loop structure, which play critical roles in various biological processes. Identifying associations between circRNAs and diseases is critical for exploring complex disease mechanisms and facilitating disease-targeted therapy. Although several computational predictors have been proposed, their performance is still limited. In this study, a novel computational method called iCircDA-MF is proposed. Because the circRNA-disease associations with experimental validation are very limited, the potential circRNA-disease associations are calculated based on the circRNA similarity and disease similarity extracted from the disease semantic information and the known associations of circRNA-gene, gene-disease and circRNA-disease. The circRNA-disease interaction profiles are then updated by the neighbour interaction profiles so as to correct the false negative associations. Finally, matrix factorization is performed on the updated circRNA-disease interaction profiles to predict the circRNA-disease associations. The experimental results on a widely used benchmark dataset showed that iCircDA-MF outperforms other state-of-the-art predictors and can identify new circRNA-disease associations effectively.


2019
Vol 12 (1)
Author(s):
Xiujuan Lei
Cheng Zhang

Abstract Background Increasing evidence has shown that metabolites can respond to pathological changes. However, identifying disease-related metabolites is a major challenge in biology and medicine. Traditional medical equipment is not only limited in accuracy but also expensive and time-consuming. Therefore, it is necessary to take advantage of computational methods for predicting potential associations between metabolites and diseases. Results In this study, we develop a computational method based on the KATZ algorithm to predict metabolite-disease associations (KATZMDA). Firstly, we extract data about metabolite-disease pairs from the latest version of the HMDB database as the material for prediction. Then we use disease semantic similarity and the improved disease Gaussian Interaction Profile (GIP) kernel similarity to obtain more reliable disease similarity and enhance the predictive performance of our proposed method. In addition, the KATZ algorithm is applied in the domain of metabolomics for the first time. Conclusions According to three kinds of cross-validation and case studies of three common diseases, KATZMDA can serve as an effective tool for predicting the potential associations between metabolites and diseases.
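The KATZ measure scores a pair of nodes by summing the walks between them, damping a walk of length l by β^l. A truncated version, S = Σ_{l=1..k} β^l A^l, can be sketched as below. The toy network, β = 0.5, and the cut-off k = 3 are assumptions; the abstract does not state which values or which network construction KATZMDA uses.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def katz_scores(adj, beta, max_len):
    """Truncated Katz index: sum over l = 1..max_len of beta**l * adj**l."""
    n = len(adj)
    scores = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in adj]  # adj ** 1
    for length in range(1, max_len + 1):
        weight = beta ** length
        for i in range(n):
            for j in range(n):
                scores[i][j] += weight * power[i][j]
        power = matmul(power, adj)
    return scores


# Hypothetical network: node 0 = metabolite m0, node 1 = metabolite m1, node 2 = disease d0.
# m0 and m1 are similar; m1 is known to be associated with d0.
adj = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
katz = katz_scores(adj, beta=0.5, max_len=3)
```

Here m0 has no direct link to d0, but the length-2 walk through m1 gives it a score of 0.25, while the known pair (m1, d0) scores 0.75.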


2021
Vol 22 (16)
pp. 8505
Author(s):
Cunmei Ji
Zhihao Liu
Yutian Wang
Jiancheng Ni
Chunhou Zheng

Circular RNAs (circRNAs) are a new class of endogenous non-coding RNAs with a covalently closed loop structure. Researchers have revealed that circRNAs play an important role in human diseases. As experimental identification of interactions between circRNAs and diseases is time-consuming and expensive, effective computational methods are urgently needed for predicting potential circRNA–disease associations. In this study, we proposed a novel computational method named GATNNCDA, which combines Graph Attention Network (GAT) and multi-layer neural network (NN) to infer disease-related circRNAs. Specifically, GATNNCDA first integrates disease semantic similarity, circRNA functional similarity and the respective Gaussian Interaction Profile (GIP) kernel similarities. The integrated similarities are used as initial node features, and then GAT is applied for further feature extraction in the heterogeneous circRNA–disease graph. Finally, the NN-based classifier is introduced for prediction. The results of fivefold cross-validation demonstrated that GATNNCDA achieved an average AUC of 0.9613 and AUPR of 0.9433 on the CircR2Disease dataset, and outperformed other state-of-the-art methods. In addition, case studies on breast cancer and hepatocellular carcinoma showed that 20 and 18 of the top 20 candidates were respectively confirmed in the validation datasets or published literature. Therefore, GATNNCDA is an effective and reliable tool for discovering circRNA–disease associations.
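The abstract does not say how the semantic or functional similarity is integrated with the GIP kernel similarity. One rule often used in this literature is to keep the knowledge-based score where it exists and fall back to the GIP score where it is missing (zero). The sketch below shows that rule only as an assumption; the function name and the toy matrices are hypothetical.

```python
def integrate_similarity(primary, gip):
    """Use the primary similarity where it is non-zero, otherwise the GIP similarity."""
    return [
        [p if p != 0 else g for p, g in zip(primary_row, gip_row)]
        for primary_row, gip_row in zip(primary, gip)
    ]


# Hypothetical disease semantic similarity with missing (zero) entries for disease 2.
semantic = [
    [1.0, 0.6, 0.0],
    [0.6, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
# Hypothetical GIP kernel similarity for the same three diseases.
gip = [
    [1.0, 0.2, 0.3],
    [0.2, 1.0, 0.5],
    [0.3, 0.5, 1.0],
]
integrated = integrate_similarity(semantic, gip)
```

Each row of the integrated matrix could then serve as the initial feature vector of the corresponding node.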


2020
Vol 27 (4)
pp. 329-336
Author(s):
Lei Xu
Guangmin Liang
Baowen Chen
Xu Tan
Huaikun Xiang
...

Background: Cell lytic enzymes are highly evolved proteins that can destroy the cell structure and kill bacteria. Compared with antibiotics, cell lytic enzymes do not cause serious drug resistance in pathogenic bacteria. Thus, the study of cell wall lytic enzymes aims at finding an efficient way to cure bacterial infections. With the use of antibiotics, the problem of drug resistance is becoming more serious. Therefore, cell lytic enzymes are a good choice for curing bacterial infections. Cell lytic enzymes include endolysins and autolysins, and the difference between them is the purpose of the break of the cell wall. Identifying the type of a cell lytic enzyme is meaningful for the study of cell wall enzymes. Objective: In this article, our motivation is to predict the type of cell lytic enzyme. Cell lytic enzymes help kill bacteria, so it is meaningful to study their type. However, it is time-consuming to detect the type of cell lytic enzyme by experimental methods. Thus, an efficient computational method for predicting the type of cell lytic enzyme is proposed in our work. Method: We propose a computational method for the prediction of endolysins and autolysins. First, a data set containing 27 endolysins and 41 autolysins is built. Then each protein is represented by its tripeptide composition. The features with larger confidence degree are selected. Finally, a support vector machine classifier is trained on the labeled vectors. The learned classifier is used to predict the type of cell lytic enzyme. Results: Following the proposed method, the experimental results show that the overall accuracy can attain 97.06% when 44 features are selected. Compared with Ding's method, our method improves the overall accuracy by nearly 4.5% ((97.06-92.9)/92.9). The performance of our proposed method is stable when the number of selected features is between 40 and 70. The overall accuracy of the tripeptide optimal feature set is 94.12%, and the overall accuracy of Chou's amphiphilic PseAAC method is 76.2%. The experimental results also demonstrate that the overall accuracy is improved by nearly 18% when using the tripeptide optimal feature set. Conclusion: The paper proposed an efficient method for identifying endolysins and autolysins. In this paper, a support vector machine is used to predict the type of cell lytic enzyme. The experimental results show that the overall accuracy of the proposed method is 94.12%, which is better than some existing methods. In conclusion, the selected 44 features can improve the overall accuracy for identification of the type of cell lytic enzyme. The support vector machine performs better than other classifiers when using the selected feature set on the benchmark data set.
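The tripeptide composition used as the protein representation above can be sketched as the normalised frequency of each of the 20³ = 8000 possible tripeptides in a sequence. The normalisation by the number of valid windows, the handling of non-standard residues, and the toy sequence are assumptions; feature selection and the support vector machine are not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
TRIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]
INDEX = {tri: i for i, tri in enumerate(TRIPEPTIDES)}


def tripeptide_composition(seq):
    """Return the 8000-dimensional vector of tripeptide frequencies for a protein sequence."""
    counts = [0] * len(TRIPEPTIDES)
    total = 0
    for k in range(len(seq) - 2):
        tri = seq[k:k + 3]
        if tri in INDEX:  # assumption: windows with non-standard residues are skipped
            counts[INDEX[tri]] += 1
            total += 1
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


# Hypothetical toy sequence with windows ACD, CDA, DAC, ACD.
vec = tripeptide_composition("ACDACD")
```

With only 68 proteins and 8000 raw features, selecting a small subset (44 in the abstract) before training is what keeps the classifier from being swamped by mostly empty dimensions.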

