A scalable random walk with restart on heterogeneous networks with Apache Spark for ranking disease-related genes through type-II fuzzy data fusion

2021 ◽  
Vol 115 ◽  
pp. 103688
Author(s):  
Mehdi Joodaki ◽  
Nasser Ghadiri ◽  
Zeinab Maleki ◽  
Maryam Lotfi Shahreza
2019 ◽  
Author(s):  
Mehdi Joodaki ◽  
Nasser Ghadiri ◽  
Zeinab Maleki ◽  
Maryam Lotfi Shahreza

Abstract: Prediction and discovery of disease-causing genes are among the main missions of biology and medicine. In recent years, researchers have developed several methods based on gene/protein networks for the detection of causative genes. However, because these networks contain false positives, the results of these methods often lack accuracy and reliability. This problem can be mitigated by using multiple genomic sources to reduce noise in the data, although the integration step itself can also affect the quality of the integrated network. In this paper, we present a method named RWRHN-FF: random walk with restart on a heterogeneous network (RWRHN) with fuzzy fusion. In this method, four gene-gene similarity networks are first constructed from different genomic sources and then integrated using the type-II fuzzy voter scheme. The resulting gene-gene network is then linked, through a bipartite disease-gene network, to a disease-disease similarity network that is itself constructed by integrating four sources. The product of this process is a reliable heterogeneous network, which is analyzed by the RWRHN algorithm. Leave-one-out cross-validation shows that RWRHN-FF outperforms both RWRHN and RWRH. The proposed method is used to predict new genes for prostate, breast, gastric and colon cancers. To reduce the run time, Apache Spark is used as a platform for parallel execution of the RWRHN algorithm on heterogeneous networks. In tests on heterogeneous networks of different sizes, this solution converges faster than the non-distributed implementations.
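
The random walk with restart on a heterogeneous network described in this abstract can be illustrated with a small, dependency-free sketch. It assumes the commonly used RWRH-style block transition matrix with a jump probability between the gene and disease networks; the function names, the `jump` and `restart` values, and the toy matrices are illustrative assumptions, not the authors' implementation, and neither the fuzzy fusion step nor the Spark parallelization is shown.

```python
def row_normalize(matrix):
    """Scale each row to sum to 1; all-zero rows stay zero."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([v / total if total > 0 else 0.0 for v in row])
    return out


def build_transition(gene_sim, disease_sim, gene_disease, jump=0.5):
    """Row-stochastic transition matrix over genes followed by diseases.

    `jump` (assumed value) is the probability of crossing to the other
    network for nodes that have at least one cross-network link.
    """
    disease_gene = [list(col) for col in zip(*gene_disease)]  # transpose
    blocks = []
    for within, across in ((gene_sim, gene_disease), (disease_sim, disease_gene)):
        rows = []
        for w_row, a_row in zip(row_normalize(within), row_normalize(across)):
            # Nodes without a cross-network link never jump.
            lam = jump if sum(a_row) > 0 else 0.0
            rows.append(([(1 - lam) * v for v in w_row], [lam * v for v in a_row]))
        blocks.append(rows)
    gene_rows = [w + a for w, a in blocks[0]]     # [gene-gene | gene-disease]
    disease_rows = [a + w for w, a in blocks[1]]  # [disease-gene | disease-disease]
    return gene_rows + disease_rows


def random_walk_with_restart(transition, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * M^T p + r * p0 until the L1 change is below tol."""
    n = len(transition)
    total = sum(seeds)
    p0 = [s / total for s in seeds]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(p[i] * transition[i][j] for i in range(n)) + restart * p0[j]
            for j in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy data (hypothetical): 3 genes, 2 diseases.
gene_sim = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
disease_sim = [[0, 1], [1, 0]]
gene_disease = [[1, 0], [0, 0], [0, 1]]

transition = build_transition(gene_sim, disease_sim, gene_disease)
scores = random_walk_with_restart(transition, seeds=[0, 0, 0, 1, 0])  # seed: disease 0
gene_ranking = sorted(range(3), key=lambda g: scores[g], reverse=True)
```

The first three entries of `scores` rank the genes for the seeded disease; a real pipeline would replace the toy matrices with the fused similarity networks.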


2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Liugen Wang ◽  
Min Shang ◽  
Qi Dai ◽  
Ping-an He

Abstract. Background: Growing evidence shows that long non-coding RNAs (lncRNAs) play important roles in the development and progression of complex human diseases. Predicting human lncRNA-disease associations is therefore a challenging and urgent task in bioinformatics for research on these diseases. Results: In this work, a global network-based computational framework called LRWRHLDA is proposed. First, four isomorphic networks were constructed: an lncRNA similarity network, a disease similarity network, a gene similarity network and a miRNA similarity network. Then, six heterogeneous networks, covering known lncRNA-disease, lncRNA-gene, lncRNA-miRNA, disease-gene, disease-miRNA and gene-miRNA associations, were used to build a multi-layer network. Finally, the Laplace normalized random walk with restart algorithm is applied to this global network to predict the relationships between lncRNAs and diseases. Conclusions: Ten-fold cross-validation is used to evaluate the performance of LRWRHLDA. LRWRHLDA achieves an AUC of 0.98402, which is higher than that of the other compared methods. Furthermore, LRWRHLDA can predict lncRNAs related to isolated diseases (and diseases related to isolated lncRNAs). The results for colorectal cancer, lung adenocarcinoma, stomach cancer and breast cancer have been verified by other studies. The case studies indicate that the method is effective.
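
The "Laplace normalized" step mentioned in this abstract is usually understood as symmetric normalization of the adjacency matrix, where each weight is divided by the square root of the product of the two node degrees. The sketch below shows that normalization for a square symmetric matrix; whether LRWRHLDA uses exactly this form is an assumption, and the function name is hypothetical.

```python
import math


def laplacian_normalize(adjacency):
    """Return M with M[i][j] = A[i][j] / sqrt(d_i * d_j), where d_i is the
    weighted degree (row sum) of node i. Isolated nodes yield zero entries."""
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    return [
        [
            adjacency[i][j] / math.sqrt(degrees[i] * degrees[j])
            if degrees[i] > 0 and degrees[j] > 0
            else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]


# Toy star graph (hypothetical): node 0 linked to nodes 1 and 2.
normalized = laplacian_normalize([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
```

Unlike row normalization, this keeps the matrix symmetric, so an edge carries the same weight in both directions.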


2021 ◽  
Vol 12 ◽  
Author(s):  
Jia Qu ◽  
Chun-Chun Wang ◽  
Shu-Bin Cai ◽  
Wen-Di Zhao ◽  
Xiao-Long Cheng ◽  
...  

Numerous experiments have shown that microRNAs (miRNAs) can be used as diagnostic biomarkers for many complex diseases. Predicting unobserved associations between miRNAs and diseases is therefore highly valuable for the medical field. Here, based on heterogeneous networks built from known miRNA–disease associations, miRNA functional similarity, disease semantic similarity, and Gaussian interaction profile kernel similarity for miRNAs and diseases, we developed a computational model of biased random walk with restart on multilayer heterogeneous networks for miRNA–disease association prediction (BRWRMHMDA), which applies a degree-based biased random walk with restart (BRWR). The model obtained an AUC of 0.8310 in local leave-one-out cross-validation (LOOCV), indicating good performance. In addition, we used BRWRMHMDA to prioritize candidate miRNAs for esophageal neoplasms based on HMDD v2.0, and for breast neoplasms based on HMDD v1.0. The local LOOCV results and the performance analysis of the case studies both showed that the proposed model has good and stable performance.
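
The Gaussian interaction profile (GIP) kernel similarity named in this abstract can be sketched as follows. The sketch uses the standard formulation, in which the kernel width is a bandwidth divided by the mean squared norm of the interaction profiles; the function name, the `bandwidth` default of 1.0, and the toy matrix are assumptions for illustration, not values taken from the paper.

```python
import math


def gip_kernel(association, bandwidth=1.0):
    """GIP kernel similarity between the rows of a binary association matrix.

    Rows are entities (e.g. miRNAs) and columns are partners (e.g. diseases).
    K[i][j] = exp(-gamma * ||row_i - row_j||^2), with
    gamma = bandwidth / mean(||row||^2).
    """
    n = len(association)
    mean_norm = sum(sum(v * v for v in row) for row in association) / n
    if mean_norm == 0:
        raise ValueError("association matrix has no known associations")
    gamma = bandwidth / mean_norm
    return [
        [
            math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(association[i], association[j])))
            for j in range(n)
        ]
        for i in range(n)
    ]


# Toy data (hypothetical): 3 miRNAs x 3 diseases.
kernel = gip_kernel([[1, 0, 1], [1, 0, 1], [0, 1, 0]])
```

Transposing the association matrix before the call gives the corresponding similarity between diseases.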


2017 ◽  
Author(s):  
Alberto Valdeolivas ◽  
Laurent Tichit ◽  
Claire Navarro ◽  
Sophie Perrin ◽  
Gaëlle Odelin ◽  
...  

ABSTRACT: Recent years have witnessed an exponential growth in the number of identified interactions between biological molecules. These interactions are usually represented as large and complex networks, calling for the development of appropriate tools to exploit the functional information they contain. Random walk with restart is the state-of-the-art guilt-by-association approach. It explores the network vicinity of gene/protein seeds to study their functions, based on the premise that nodes related to similar functions tend to lie close to each other in the networks. In the present study, we extended the random walk with restart algorithm to multiplex and heterogeneous networks. The walk can now explore different layers of physical and functional interactions between genes and proteins, such as protein-protein interactions and co-expression associations. In addition, the walk can also jump to a network containing different sets of edges and nodes, such as phenotype similarities between diseases. We devised a leave-one-out cross-validation strategy to evaluate the algorithm's ability to predict disease-associated genes. We demonstrate the increased performance of the multiplex-heterogeneous random walk with restart compared to several random walks on monoplex or heterogeneous networks. Overall, our framework is able to leverage the different interaction sources to outperform current approaches. Finally, we applied the algorithm to predict candidate genes for involvement in the Wiedemann-Rautenstrauch syndrome, and to explore the network vicinity of the SHORT syndrome. The source code and the software are freely available at: https://github.com/alberto-valdeolivas/RWR-MH.
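
A leave-one-out evaluation of the kind this abstract mentions can be outlined as below. This is a generic sketch, not the authors' protocol: each known disease gene is held out in turn, the remaining genes act as seeds, and the rank of the held-out gene among the non-seed candidates is recorded. The `score_fn` argument stands in for any scoring method (for example a random walk with restart), and the toy neighbor-count scorer is purely hypothetical.

```python
def loocv_ranks(known_genes, all_genes, score_fn):
    """Rank each held-out gene among non-seed candidates.

    Rank = 1 + number of candidates scoring strictly higher than the
    held-out gene, so ties are resolved in the held-out gene's favor.
    """
    ranks = {}
    for held_out in known_genes:
        seeds = [g for g in known_genes if g != held_out]
        scores = score_fn(seeds)  # mapping: gene -> score
        candidates = [g for g in all_genes if g not in seeds]
        target = scores.get(held_out, 0.0)
        ranks[held_out] = 1 + sum(1 for g in candidates if scores.get(g, 0.0) > target)
    return ranks


# Toy network (hypothetical): undirected edges a-b, a-c, a-d, b-d; e is isolated.
neighbors = {
    "a": {"b", "c", "d"},
    "b": {"a", "d"},
    "c": {"a"},
    "d": {"a", "b"},
    "e": set(),
}


def neighbor_count_score(seeds):
    """Stand-in scorer: number of seed genes adjacent to each gene."""
    seed_set = set(seeds)
    return {gene: len(adj & seed_set) for gene, adj in neighbors.items()}


ranks = loocv_ranks(["a", "b", "c"], list(neighbors), neighbor_count_score)
```

The resulting ranks can be summarized as a cumulative distribution or converted to an AUC to compare scoring methods.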


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Van Tinh Nguyen ◽  
Thi Tu Kien Le ◽  
Khoat Than ◽  
Dang Hung Tran

Abstract: Predicting miRNA–disease associations (MDAs) through laboratory experiments is costly and time-consuming. Developing an effective computational method for predicting MDAs is therefore essential and has attracted many computer scientists in recent years. In this paper, we propose a new computational method to predict miRNA–disease associations using an improved random walk with restart and integrating multiple similarities (RWRMMDA). We used the WKNKN algorithm as a pre-processing step to address the sparsity and incompleteness of the data and to reduce the negative impact of the large number of missing associations. Two heterogeneous networks, one in disease space and one in miRNA space, were built by integrating multiple similarity networks, and a different walk probability could be assigned to each linked neighbor of a disease or miRNA node according to its degree in the respective network. Finally, an improved extended random walk with restart algorithm based on the miRNA similarity-based and disease similarity-based heterogeneous networks was used to calculate miRNA–disease association prediction probabilities. The experiments showed that the proposed method achieved strong performance, with global LOOCV AUC (area under the ROC curve) and AUPR (area under the precision-recall curve) values of 0.9882 and 0.9066, respectively. Under fivefold cross-validation, the best AUC and AUPR values were 0.9855 and 0.8642, respectively, and these results were supported by statistical tests. Compared with previous related methods, it outperformed NTSHMDA, PMFMDA, IMCMDA and MCLPMDA in both AUC and AUPR. In case studies of Breast Neoplasms, Carcinoma Hepatocellular and Stomach Neoplasms, it inferred 1, 12 and 7 new associations, respectively, among the top 40 predicted miRNAs for each disease. All of these newly inferred associations have been confirmed in different databases or in the literature.
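
The AUC figures reported here summarize how well predicted scores separate known associations from unknown ones. As a minimal illustration, the AUC can be computed directly from its probabilistic definition: the chance that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half. The function name and the toy scores are assumptions; this is not the evaluation code used in the paper.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical): two known associations and two unknown pairs.
auc = roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

The pairwise form is quadratic in the number of samples, so large evaluations normally use a rank-based or library implementation instead.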


2021 ◽  
Vol 12 ◽  
Author(s):  
Yuhua Yao ◽  
Binbin Ji ◽  
Yaping Lv ◽  
Ling Li ◽  
Ju Xiang ◽  
...  

Studies have found that long non-coding RNAs (lncRNAs) play important roles in many human biological processes, and it is critical to explore potential lncRNA–disease associations, especially cancer-associated lncRNAs. However, traditional biological experiments are costly and time-consuming, so developing effective computational models is of great significance. We developed a random walk with restart algorithm on multiplex and heterogeneous networks of lncRNAs and diseases to predict lncRNA–disease associations (MHRWRLDA). First, multiple disease similarity networks were constructed by using different approaches to calculate similarity scores between diseases, and multiple lncRNA similarity networks were constructed in the same way from similarity scores between lncRNAs. Then, a multiplex and heterogeneous network was constructed by integrating the multiple disease similarity networks and multiple lncRNA similarity networks with the lncRNA–disease associations, and a random walk with restart was performed on this network to predict lncRNA–disease associations. Leave-one-out cross-validation (LOOCV) gave an area under the curve (AUC) of 0.68736, an improvement over classical algorithms from recent years. Finally, we confirmed by literature mining a few novel predicted lncRNAs associated with specific diseases such as colon cancer. In summary, MHRWRLDA contributes to the prediction of lncRNA–disease associations.


Author(s):  
Seyyed Mohammadreza Rahimi ◽  
Rodrigo Augusto de Oliveira e Silva ◽  
Behrouz Far ◽  
Xin Wang
