Benchmark of computational methods for predicting microRNA-disease associations

Background: A series of miRNA-disease association prediction methods have been proposed to prioritize potential disease-associated miRNAs. Independent benchmarking of these methods is warranted to assess their effectiveness and robustness. Results: Based on more than 8000 novel miRNA-disease associations from the latest HMDD v3.1 database, we perform a systematic comparison among 36 readily available prediction methods. Their overall performance is evaluated with rigorous precision-recall curve analysis: 13 methods show acceptable accuracy (AUPRC > 0.200), and the top two methods achieve a promising AUPRC over 0.300. Most of these methods are also highly ranked when only the causal miRNA-disease associations are considered as positive samples. The potential for performance improvement is demonstrated by combining different predictors or adopting a more updated miRNA similarity matrix, which yields AUPRC gains of up to 16% and 46% compared to the best single predictor and the predictors using the previous similarity matrix, respectively. Our analysis suggests a common issue of the available methods: the prediction results are severely biased toward well-annotated diseases with many known associated miRNAs, and the methods cannot further stratify the positive samples by discriminating the causal miRNA-disease associations from the general miRNA-disease associations. Conclusion: Our benchmarking results not only provide a reference for biomedical researchers to choose appropriate miRNA-disease association predictors for their purpose, but also suggest future directions for the development of more robust miRNA-disease association predictors.
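The AUPRC figures quoted above can be made concrete with a small sketch. The function below computes the step-wise area under the precision-recall curve (often called average precision) for a ranked list of candidate associations. It is an illustration only: the labels and scores are made up, ties between scores are not handled specially, and the benchmark itself may use a different interpolation.

```python
def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision).

    labels: 1 for a true association, 0 otherwise.
    scores: predicted association scores (higher means more confident).
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive sample")
    # Rank candidates from the most to the least confident prediction.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            # Precision at the rank where this positive is recovered.
            precision_sum += hits / rank
    return precision_sum / total_pos


# Hypothetical toy data: six candidate miRNA-disease pairs.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
auprc = average_precision(labels, scores)
print(f"AUPRC = {auprc:.4f}")
```

Because precision is sensitive to the ratio of positives to negatives, AUPRC is usually a stricter measure than ROC-based AUC when known associations are rare.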

MDAPlatform: a Component-based Platform for Constructing and Assessing miRNA-disease association Prediction Methods

Current Bioinformatics ◽

10.2174/1574893616999210120181506 ◽

2021 ◽

Vol 16 ◽

Author(s):

Yayan Zhang ◽

Guihua Duan ◽

Cheng Yan ◽

Haolun Yi ◽

Fang-Xiang Wu ◽

...

Keyword(s):

Computational Models ◽

Critical Role ◽

Prediction Method ◽

Disease Association ◽

Prediction Methods ◽

Comparison Results ◽

Disease Associations ◽

Pros And Cons ◽

Clinical Drugs ◽

Validation Experiments

Background: Increasing evidence has indicated that miRNA-disease association prediction plays a critical role in the study of clinical drugs. Researchers have proposed many computational models for miRNA-disease prediction. However, there is no unified platform to compare and analyze the pros and cons of these models or to share their code and data. Objective: In this study, we develop an easy-to-use platform (MDAPlatform) to construct and assess miRNA-disease association prediction methods. Methods: MDAPlatform integrates the relevant data on miRNAs, diseases and miRNA-disease associations used in previous miRNA-disease association prediction studies. Based on a componentized model, it implements the different components of previous computational methods. Results: Users can conduct cross-validation experiments and compare their methods with other methods, and visualized comparison results are also provided. Conclusion: Based on the componentized model, MDAPlatform provides easy-to-operate interfaces to construct miRNA-disease association methods, which will help the development of new miRNA-disease association prediction methods in the future.

miRNA-Disease Association Prediction with Collaborative Matrix Factorization

Complexity ◽

10.1155/2017/2498957 ◽

2017 ◽

Vol 2017 ◽

pp. 1-9 ◽

Cited By ~ 29

Author(s):

Zhen Shen ◽

You-Hua Zhang ◽

Kyungsook Han ◽

Asoke K. Nandi ◽

Barry Honig ◽

...

Keyword(s):

Matrix Factorization ◽

Noncoding RNA ◽

Esophageal Neoplasms ◽

Kidney Neoplasms ◽

Disease Association ◽

Computational Method ◽

Experimental Identification ◽

Novel miRNA ◽

Disease Associations ◽

High Prediction

As members of the noncoding RNA family, microRNAs (miRNAs) are involved in the development and progression of various complex diseases. Experimental identification of miRNA-disease associations is expensive and time-consuming. Therefore, it is necessary to design efficient algorithms to identify novel miRNA-disease associations. In this paper, we developed a computational method, Collaborative Matrix Factorization for miRNA-Disease Association prediction (CMFMDA), to identify potential miRNA-disease associations by integrating miRNA functional similarity, disease semantic similarity, and experimentally verified miRNA-disease associations. Experiments verified that CMFMDA achieves its intended purpose and has practical value, with a short running time and high prediction accuracy. In addition, we applied CMFMDA to Esophageal Neoplasms and Kidney Neoplasms to reveal their potential related miRNAs. As a result, 84% and 82% of the top 50 predicted miRNA-disease pairs for these two diseases, respectively, were confirmed by experiment. Moreover, CMFMDA can be applied to new diseases and new miRNAs without any known associations, which overcomes a limitation of many previous computational methods.

Promotech: A general tool for bacterial promoter recognition

10.1101/2021.07.16.452684 ◽

2021 ◽

Author(s):

Ruben Chevez-Guardado ◽

Lourdes Peña-Castillo

Keyword(s):

Bacterial Species ◽

Prediction Methods ◽

Promoter Prediction ◽

Computational Tools ◽

Wide Range ◽

Promoter Recognition ◽

Precision Recall Curve ◽

Genomic Regions ◽

General Tool ◽

Recall Curve

Promoters are genomic regions where the transcription machinery binds to initiate the transcription of specific genes. Computational tools for identifying bacterial promoters have been around for decades. However, most of these tools were designed to recognize promoters in one or few bacterial species. Here, we present Promotech, a machine-learning-based method for promoter recognition in a wide range of bacterial species. We compared Promotech's performance with the performance of five other promoter prediction methods. Promotech outperformed these other programs in terms of area under the precision-recall curve (AUPRC) or precision at the same level of recall. Promotech is available at https://github.com/BioinformaticsLabAtMUN/PromoTech.

A network similarity integration method for predicting microRNA-disease associations

RSC Advances ◽

10.1039/c7ra05348g ◽

2017 ◽

Vol 7 (51) ◽

pp. 32216-32224 ◽

Cited By ~ 5

Author(s):

Xiaoying Li ◽

Yaping Lin ◽

Changlong Gu

Keyword(s):

Integration Method ◽

Disease Association ◽

Association Network ◽

Similarity Network ◽

Novel miRNA ◽

Disease Similarity ◽

Disease Associations ◽

Network Similarity

The NSIM integrates the disease similarity network, miRNA similarity network, and known miRNA-disease association network on the basis of cousin similarity to predict not only novel miRNA-disease associations but also isolated diseases.

DRIMC: an improved drug repositioning approach using Bayesian inductive matrix completion

Bioinformatics ◽

10.1093/bioinformatics/btaa062 ◽

2020 ◽

Vol 36 (9) ◽

pp. 2839-2847 ◽

Cited By ~ 1

Author(s):

Wenjuan Zhang ◽

Hunan Xu ◽

Xiaozhong Li ◽

Qiang Gao ◽

Lin Wang

Keyword(s):

Drug Repositioning ◽

Matrix Completion ◽

Disease Association ◽

Data Sources ◽

Supplementary Information ◽

Similarity Matrix ◽

Latent Factors ◽

Discovery Research ◽

Disease Associations ◽

Novel Drug

Motivation: One of the most important problems in drug discovery research is to precisely predict a new indication for an existing drug, i.e. drug repositioning. Recent recommendation system-based methods have tackled this problem using matrix completion models. The models identify latent factors contributing to known drug-disease associations, and then infer novel drug-disease associations from the correlations between latent factors. However, these models have not fully considered the various drug data sources and the sparsity of the drug-disease association matrix. In addition, using the global structure of the drug-disease association data may introduce noise, and consequently limit the prediction power. Results: In this work, we propose a novel drug repositioning approach using Bayesian inductive matrix completion (DRIMC). First, we embed four drug data sources into a drug similarity matrix and two disease data sources into a disease similarity matrix. Then, for each drug or disease, its feature is described by the similarity values between it and its nearest neighbors, and these features for drugs and diseases are mapped onto a shared latent space. We model the association probability for each drug-disease pair by inductive matrix completion, where the properties of drugs and diseases are represented by projections of drugs and diseases, respectively. As the known drug-disease associations have been manually verified, they are more trustworthy and important than the unknown pairs, so we assign higher confidence levels to known association pairs than to unknown pairs. We perform comprehensive experiments on three benchmark datasets, and DRIMC improves prediction accuracy compared with six state-of-the-art approaches. Availability and implementation: Source code and datasets are available at https://github.com/linwang1982/DRIMC. Supplementary information: Supplementary data are available at Bioinformatics online.
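The nearest-neighbor feature step described in the abstract can be sketched roughly as follows. This is not the DRIMC implementation: the function name `knn_features`, the choice to exclude an entity from its own neighbor list, and the toy similarity matrix are all assumptions made for illustration. Each row keeps only the similarity values of its k most similar neighbors and sets the rest to zero.

```python
def knn_features(similarity, k):
    """Keep, for each row, only the similarities to its k nearest neighbours.

    similarity: square list-of-lists similarity matrix.
    k: number of neighbours to keep (the entity itself is excluded here,
       which is an assumption of this sketch).
    """
    features = []
    for i, row in enumerate(similarity):
        # Indices of all other entities, most similar first.
        others = [j for j in range(len(row)) if j != i]
        nearest = set(sorted(others, key=lambda j: row[j], reverse=True)[:k])
        # Zero out every entry that is not one of the k nearest neighbours.
        features.append([row[j] if j in nearest else 0.0 for j in range(len(row))])
    return features


# Hypothetical 3 x 3 drug similarity matrix.
drug_similarity = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]
drug_features = knn_features(drug_similarity, k=1)
print(drug_features)
```

Restricting each feature vector to local neighbors reflects the abstract's concern that using the global structure of the data may introduce noise.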

Promotech: a general tool for bacterial promoter recognition

Genome Biology ◽

10.1186/s13059-021-02514-9 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Ruben Chevez-Guardado ◽

Lourdes Peña-Castillo

Keyword(s):

Bacterial Species ◽

Prediction Methods ◽

Promoter Prediction ◽

Computational Tools ◽

Wide Range ◽

Promoter Recognition ◽

Precision Recall Curve ◽

Genomic Regions ◽

General Tool ◽

Recall Curve

Promoters are genomic regions where the transcription machinery binds to initiate the transcription of specific genes. Computational tools for identifying bacterial promoters have been around for decades. However, most of these tools were designed to recognize promoters in one or few bacterial species. Here, we present Promotech, a machine-learning-based method for promoter recognition in a wide range of bacterial species. We compare Promotech’s performance with the performance of five other promoter prediction methods. Promotech outperforms these other programs in terms of area under the precision-recall curve (AUPRC) or precision at the same level of recall. Promotech is available at https://github.com/BioinformaticsLabAtMUN/PromoTech.

EHAI: Enhanced Human Microbe-Disease Association Identification

Current Protein and Peptide Science ◽

10.2174/1389203721666200702150249 ◽

2020 ◽

Vol 21 (11) ◽

pp. 1078-1084

Author(s):

Ruizhi Fan ◽

Chenhua Dong ◽

Hu Song ◽

Yixin Xu ◽

Linsen Shi ◽

...

Keyword(s):

Microbial Community ◽

Human Health ◽

Complex Diseases ◽

Disease Association ◽

Computational Results ◽

Computational Approaches ◽

Disease Diagnostics ◽

Disease Associations ◽

Association Discovery ◽

Biological Tool

Recently, an increasing number of biological and clinical reports have demonstrated that imbalance of the microbial community plays important roles in several complex diseases affecting human health. Discovering potential microbe-disease relationships improves the understanding of issues such as disease pathology and, in turn, supports disease diagnostics and prognostics. Nevertheless, few computational approaches can meet the need for large-scale microbe-disease association discovery. In this work, we proposed the EHAI model (Enhanced Human microbe-disease Association Identification). EHAI employed the known microbe-disease associations, and Gaussian interaction profile kernel similarity was then used to enhance the basic microbe-disease associations. In practice, only some microbe-disease associations are known, and a large number of associations are still unavailable in the datasets. A ‘super-microbe’ and a ‘super-disease’ were employed to enhance the model. Computational results demonstrated that such super-classes improve the performance of EHAI. Therefore, it is anticipated that EHAI can serve as an important biological tool in this field.
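The abstract does not give the kernel formula, so the sketch below uses the commonly cited formulation of Gaussian interaction profile (GIP) kernel similarity, K(i, j) = exp(-γ ||IP(i) - IP(j)||²), where IP(i) is row i of the binary association matrix and γ is a bandwidth parameter divided by the mean squared norm of the profiles. Whether EHAI uses exactly this normalization is an assumption, and the function name, the `gamma_prime` parameter and the toy matrix are hypothetical.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of `profiles`.

    profiles: binary association matrix (one row per microbe, one column
              per disease) as a list of lists.
    gamma_prime: bandwidth parameter; 1.0 is a common default (assumption).
    """
    sq_norms = [sum(v * v for v in row) for row in profiles]
    mean_sq_norm = sum(sq_norms) / len(profiles)
    if mean_sq_norm == 0:
        raise ValueError("association matrix has no known associations")
    # Normalise the bandwidth by the average squared profile norm.
    gamma = gamma_prime / mean_sq_norm
    n = len(profiles)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * dist_sq)
    return kernel


# Hypothetical associations: 3 microbes (rows) x 3 diseases (columns).
associations = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]
microbe_similarity = gip_kernel(associations)
for row in microbe_similarity:
    print([round(value, 4) for value in row])
```

The same function applied to the transposed matrix would give a disease-disease similarity under the same assumptions.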

Critical assessment and performance improvement of plant–pathogen protein–protein interaction prediction methods

Briefings in Bioinformatics ◽

10.1093/bib/bbx123 ◽

2017 ◽

Vol 20 (1) ◽

pp. 274-287 ◽

Cited By ~ 12

Author(s):

Shiping Yang ◽

Hong Li ◽

Huaqin He ◽

Yuan Zhou ◽

Ziding Zhang

Keyword(s):

Performance Improvement ◽

Protein Interaction ◽

Plant Pathogen ◽

Critical Assessment ◽

Prediction Methods ◽

Interaction Prediction ◽

Protein Protein Interaction ◽

Pathogen Protein ◽

Protein Interaction Prediction ◽

And Performance

Multiple Linear Regression Analysis of lncRNA–Disease Association Prediction Based on Clinical Prognosis Data

BioMed Research International ◽

10.1155/2018/3823082 ◽

2018 ◽

Vol 2018 ◽

pp. 1-10 ◽

Cited By ~ 2

Author(s):

Bo Wang ◽

Jing Zhang

Keyword(s):

Prostate Cancer ◽

Linear Regression ◽

Multiple Linear Regression ◽

Cancer Survival ◽

Disease Association ◽

The Body ◽

Survival Prediction ◽

Clinical Prognosis ◽

Disease Associations ◽

AUC Value

Long noncoding RNAs (lncRNAs) have an important role in various life processes of the body, especially in cancer. Current prediction of lncRNA–disease associations ignores the analysis of disease prognosis. In this study, a multiple linear regression model was constructed for lncRNA–disease association prediction based on clinical prognosis data (MlrLDAcp), which integrated clinical prognosis data for cancer with the expression levels of lncRNA transcripts. MlrLDAcp could realize not only cancer survival prediction but also lncRNA–disease association prediction. Ultimately, the 60 lncRNAs most closely related to prostate cancer survival were selected from 481 candidate lncRNAs. Then, the multiple linear regression relationship between the survival of 176 patients with prostate cancer and the 60 lncRNAs was also given. Compared with previous studies, MlrLDAcp had a superior survival predictive ability and could effectively predict lncRNA–disease associations. MlrLDAcp had an area under the curve (AUC) value of 0.875 for survival prediction and an AUC value of 0.872 for lncRNA–disease association prediction. It could be an effective method for biomedical research.

Seq-SymRF: a random forest model predicts potential miRNA-disease associations based on information of sequences and clinical symptoms

Scientific Reports ◽

10.1038/s41598-020-75005-9 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Jinlong Li ◽

Xingyu Chen ◽

Qixing Huang ◽

Yang Wang ◽

Yun Xie ◽

...

Keyword(s):

Random Forest ◽

Breast Neoplasms ◽

Clinical Symptoms ◽

Characteristic Curve ◽

Vital Role ◽

Sequence Information ◽

Source Codes ◽

Drug Identification ◽

Disease Associations ◽

Precision Recall Curve

Increasing evidence indicates that miRNAs play a vital role in biological processes and are closely related to various human diseases. Research on miRNA-disease associations is helpful not only for disease prevention, diagnosis and treatment, but also for new drug identification and lead compound discovery. A novel sequence- and symptom-based random forest algorithm model (Seq-SymRF) was developed to identify potential associations between miRNAs and diseases. Features derived from sequence information and clinical symptoms were used to characterize miRNAs and diseases, respectively. Moreover, a clustering method based on the Euclidean distance was adopted to construct reliable negative samples. Based on fivefold cross-validation, Seq-SymRF achieved an accuracy of 98.00%, a specificity of 99.43%, a sensitivity of 96.58%, a precision of 99.40% and a Matthews correlation coefficient of 0.9604. The areas under the receiver operating characteristic curve and the precision-recall curve were 0.9967 and 0.9975, respectively. Additionally, case studies were carried out with leukemia, breast neoplasms and hsa-mir-21. Most of the top-25 predicted disease-related miRNAs (19/25 for leukemia; 20/25 for breast neoplasms) and 15 of the top-25 predicted miRNA-related diseases were verified by the literature and the dbDEMC database. It is anticipated that Seq-SymRF could be regarded as a powerful high-throughput virtual screening tool for drug research and development. All source code can be downloaded from https://github.com/LeeKamlong/Seq-SymRF.
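For readers less familiar with the metrics listed above, the following sketch shows how accuracy, sensitivity, specificity, precision and the Matthews correlation coefficient (MCC) follow from the four cells of a confusion matrix. The counts are hypothetical: they assume 10,000 positive and 10,000 negative samples and are chosen only so that sensitivity and specificity match the reported 96.58% and 99.43%. The abstract does not state the actual class sizes.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Return standard binary-classification metrics from confusion counts."""
    total = tp + fp + tn + fn
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # recall on the positive class
        "specificity": tn / (tn + fp),   # recall on the negative class
        "precision": tp / (tp + fp),
        # MCC is defined as 0 when any marginal total is empty.
        "mcc": (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0,
    }


# Hypothetical balanced counts (assumption, not taken from the paper).
metrics = confusion_metrics(tp=9658, fp=57, tn=9943, fn=342)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under this balanced-class assumption the derived accuracy, precision and MCC land close to the values reported in the abstract, which suggests the reported figures are mutually consistent for roughly equal numbers of positive and negative samples.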
