scholarly journals Benchmark of computational methods for predicting microRNA-disease associations

2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Zhou Huang ◽  
Leibo Liu ◽  
Yuanxu Gao ◽  
Jiangcheng Shi ◽  
Qinghua Cui ◽  
...  

Abstract Background A series of miRNA-disease association prediction methods have been proposed to prioritize potential disease-associated miRNAs. Independent benchmarking of these methods is warranted to assess their effectiveness and robustness. Results Based on more than 8000 novel miRNA-disease associations from the latest HMDD v3.1 database, we perform systematic comparison among 36 readily available prediction methods. Their overall performances are evaluated with rigorous precision-recall curve analysis, where 13 methods show acceptable accuracy (AUPRC > 0.200) while the top two methods achieve a promising AUPRC over 0.300, and most of these methods are also highly ranked when considering only the causal miRNA-disease associations as the positive samples. The potential of performance improvement is demonstrated by combining different predictors or adopting a more updated miRNA similarity matrix, which would result in up to 16% and 46% of AUPRC augmentations compared to the best single predictor and the predictors using the previous similarity matrix, respectively. Our analysis suggests a common issue of the available methods, which is that the prediction results are severely biased toward well-annotated diseases with many associated miRNAs known and cannot further stratify the positive samples by discriminating the causal miRNA-disease associations from the general miRNA-disease associations. Conclusion Our benchmarking results not only provide a reference for biomedical researchers to choose appropriate miRNA-disease association predictors for their purpose, but also suggest the future directions for the development of more robust miRNA-disease association predictors.

2021 ◽  
Vol 16 ◽  
Author(s):  
Yayan Zhang ◽  
Guihua Duan ◽  
Cheng Yan ◽  
Haolun Yi ◽  
Fang-Xiang Wu ◽  
...  

Background: Increasing evidence has indicated that miRNA-disease association prediction plays a critical role in the study of clinical drugs. Researchers have proposed many computational models for miRNA-disease prediction. However, there is no unified platform to compare and analyze the pros and cons or share the code and data of these models. Objective: In this study, we develop an easy-to-use platform (MDAPlatform) to construct and assess miRNA-disease association prediction method. Methods: MDAPlatform integrates the relevant data of miRNA, disease and miRNA-disease associations that are used in previous miRNA-disease association prediction studies. Based on the componentized model, it develops differet components of previous computational methods. Results: Users can conduct cross validation experiments and compare their methods with other methods, and the visualized comparison results are also provided. Conclusion: Based on the componentized model, MDAPlatform provides easy-to-operate interfaces to construct the miRNA-disease association method, which is beneficial to develop new miRNA-disease association prediction methods in the future.


Complexity ◽  
2017 ◽  
Vol 2017 ◽  
pp. 1-9 ◽  
Author(s):  
Zhen Shen ◽  
You-Hua Zhang ◽  
Kyungsook Han ◽  
Asoke K. Nandi ◽  
Barry Honig ◽  
...  

As one of the factors in the noncoding RNA family, microRNAs (miRNAs) are involved in the development and progression of various complex diseases. Experimental identification of miRNA-disease association is expensive and time-consuming. Therefore, it is necessary to design efficient algorithms to identify novel miRNA-disease association. In this paper, we developed the computational method of Collaborative Matrix Factorization for miRNA-Disease Association prediction (CMFMDA) to identify potential miRNA-disease associations by integrating miRNA functional similarity, disease semantic similarity, and experimentally verified miRNA-disease associations. Experiments verified that CMFMDA achieves intended purpose and application values with its short consuming-time and high prediction accuracy. In addition, we used CMFMDA on Esophageal Neoplasms and Kidney Neoplasms to reveal their potential related miRNAs. As a result, 84% and 82% of top 50 predicted miRNA-disease pairs for these two diseases were confirmed by experiment. Not only this, but also CMFMDA could be applied to new diseases and new miRNAs without any known associations, which overcome the defects of many previous computational methods.


2021 ◽  
Author(s):  
Ruben Chevez-Guardado ◽  
Lourdes Pena-Castillo

Promoters are genomic regions where the transcription machinery binds to initiate the transcription of specific genes. Computational tools for identifying bacterial promoters have been around for decades. However, most of these tools were designed to recognize promoters in one or few bacterial species. Here, we present Promotech, a machine-learning-based method for promoter recognition in a wide range of bacterial species. We compared Promotech's performance with the performance of five other promoter prediction methods. Promotech outperformed these other programs in terms of area under the precision-recall curve (AUPRC) or precision at the same level of recall. Promotech is available at https://github.com/BioinformaticsLabAtMUN/PromoTech.


RSC Advances ◽  
2017 ◽  
Vol 7 (51) ◽  
pp. 32216-32224 ◽  
Author(s):  
Xiaoying Li ◽  
Yaping Lin ◽  
Changlong Gu

The NSIM integrates the disease similarity network, miRNA similarity network, and known miRNA-disease association network on the basis of cousin similarity to predict not only novel miRNA-disease associations but also isolated diseases.


2020 ◽  
Vol 36 (9) ◽  
pp. 2839-2847 ◽  
Author(s):  
Wenjuan Zhang ◽  
Hunan Xu ◽  
Xiaozhong Li ◽  
Qiang Gao ◽  
Lin Wang

Abstract Motivation One of the most important problems in drug discovery research is to precisely predict a new indication for an existing drug, i.e. drug repositioning. Recent recommendation system-based methods have tackled this problem using matrix completion models. The models identify latent factors contributing to known drug-disease associations, and then infer novel drug-disease associations by the correlations between latent factors. However, these models have not fully considered the various drug data sources and the sparsity of the drug-disease association matrix. In addition, using the global structure of the drug-disease association data may introduce noise, and consequently limit the prediction power. Results In this work, we propose a novel drug repositioning approach by using Bayesian inductive matrix completion (DRIMC). First, we embed four drug data sources into a drug similarity matrix and two disease data sources in a disease similarity matrix. Then, for each drug or disease, its feature is described by similarity values between it and its nearest neighbors, and these features for drugs and diseases are mapped onto a shared latent space. We model the association probability for each drug-disease pair by inductive matrix completion, where the properties of drugs and diseases are represented by projections of drugs and diseases, respectively. As the known drug-disease associations have been manually verified, they are more trustworthy and important than the unknown pairs. We assign higher confidence levels to known association pairs compared with unknown pairs. We perform comprehensive experiments on three benchmark datasets, and DRIMC improves prediction accuracy compared with six stat-of-the-art approaches. Availability and implementation Source code and datasets are available at https://github.com/linwang1982/DRIMC. Supplementary information Supplementary data are available at Bioinformatics online.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Ruben Chevez-Guardado ◽  
Lourdes Peña-Castillo

AbstractPromoters are genomic regions where the transcription machinery binds to initiate the transcription of specific genes. Computational tools for identifying bacterial promoters have been around for decades. However, most of these tools were designed to recognize promoters in one or few bacterial species. Here, we present Promotech, a machine-learning-based method for promoter recognition in a wide range of bacterial species. We compare Promotech’s performance with the performance of five other promoter prediction methods. Promotech outperforms these other programs in terms of area under the precision-recall curve (AUPRC) or precision at the same level of recall. Promotech is available at https://github.com/BioinformaticsLabAtMUN/PromoTech.


2020 ◽  
Vol 21 (11) ◽  
pp. 1078-1084
Author(s):  
Ruizhi Fan ◽  
Chenhua Dong ◽  
Hu Song ◽  
Yixin Xu ◽  
Linsen Shi ◽  
...  

: Recently, an increasing number of biological and clinical reports have demonstrated that imbalance of microbial community has the ability to play important roles among several complex diseases concerning human health. Having a good knowledge of discovering potential of microbe-disease relationships, which provides the ability to having a better understanding of some issues, including disease pathology, further boosts disease diagnostics and prognostics, has been taken into account. Nevertheless, a few computational approaches can meet the need of huge scale of microbe-disease association discovery. In this work, we proposed the EHAI model, which is Enhanced Human microbe- disease Association Identification. EHAI employed the microbe-disease associations, and then Gaussian interaction profile kernel similarity has been utilized to enhance the basic microbe-disease association. Actually, some known microbe-disease associations and a large amount of associations are still unavailable among the datasets. The ‘super-microbe’ and ‘super-disease’ were employed to enhance the model. Computational results demonstrated that such super-classes have the ability to be helpful to the performance of EHAI. Therefore, it is anticipated that EHAI can be treated as an important biological tool in this field.


2018 ◽  
Vol 2018 ◽  
pp. 1-10 ◽  
Author(s):  
Bo Wang ◽  
Jing Zhang

Long noncoding RNAs (lncRNAs) have an important role in various life processes of the body, especially cancer. The analysis of disease prognosis is ignored in current prediction on lncRNA–disease associations. In this study, a multiple linear regression model was constructed for lncRNA–disease association prediction based on clinical prognosis data (MlrLDAcp), which integrated the cancer data of clinical prognosis and the expression quantity of lncRNA transcript. MlrLDAcp could realize not only cancer survival prediction but also lncRNA–disease association prediction. Ultimately, 60 lncRNAs most closely related to prostate cancer survival were selected from 481 alternative lncRNAs. Then, the multiple linear regression relationship between the prognosis survival of 176 patients with prostate cancer and 60 lncRNAs was also given. Compared with previous studies, MlrLDAcp had a predominant survival predictive ability and could effectively predict lncRNA–disease associations. MlrLDAcp had an area under the curve (AUC) value of 0.875 for survival prediction and an AUC value of 0.872 for lncRNA–disease association prediction. It could be an effective biological method for biomedical research.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Jinlong Li ◽  
Xingyu Chen ◽  
Qixing Huang ◽  
Yang Wang ◽  
Yun Xie ◽  
...  

Abstract Increasing evidence indicates that miRNAs play a vital role in biological processes and are closely related to various human diseases. Research on miRNA-disease associations is helpful not only for disease prevention, diagnosis and treatment, but also for new drug identification and lead compound discovery. A novel sequence- and symptom-based random forest algorithm model (Seq-SymRF) was developed to identify potential associations between miRNA and disease. Features derived from sequence information and clinical symptoms were utilized to characterize miRNA and disease, respectively. Moreover, the clustering method by calculating the Euclidean distance was adopted to construct reliable negative samples. Based on the fivefold cross-validation, Seq-SymRF achieved the accuracy of 98.00%, specificity of 99.43%, sensitivity of 96.58%, precision of 99.40% and Matthews correlation coefficient of 0.9604, respectively. The areas under the receiver operating characteristic curve and precision recall curve were 0.9967 and 0.9975, respectively. Additionally, case studies were implemented with leukemia, breast neoplasms and hsa-mir-21. Most of the top-25 predicted disease-related miRNAs (19/25 for leukemia; 20/25 for breast neoplasms) and 15 of top-25 predicted miRNA-related diseases were verified by literature and dbDEMC database. It is anticipated that Seq-SymRF could be regarded as a powerful high-throughput virtual screening tool for drug research and development. All source codes can be downloaded from https://github.com/LeeKamlong/Seq-SymRF.


Sign in / Sign up

Export Citation Format

Share Document