Decision-Tree Based Meta-Strategy Improved Accuracy of Disorder Prediction and Identified Novel Disordered Residues Inside Binding Motifs

2018 ◽  
Vol 19 (10) ◽  
pp. 3052 ◽  
Author(s):  
Bi Zhao ◽  
Bin Xue

Using computational techniques to identify intrinsically disordered residues is practical and effective in biological studies. Designing novel high-accuracy strategies therefore remains worthwhile while existing strategies leave considerable room for improvement. Among many possibilities, a meta-strategy that integrates the results of multiple individual predictors has been broadly used to improve overall predictive performance. Nonetheless, a simple and direct integration of individual predictors may not effectively improve performance. In this project, dual-threshold two-step significance voting and neural networks were used to integrate the predictive results of four individual predictors: DisEMBL, IUPred, VSL2, and ESpritz. The new meta-strategy significantly improved the prediction of intrinsically disordered residues compared to all four individual predictors and four other recently designed predictors. The improvement was validated using five-fold cross-validation and independent test datasets.
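To make the idea of a "simple and direct integration" concrete, the following minimal sketch averages per-residue disorder scores from several predictors and applies a single cutoff. It is only an illustration of the baseline that the abstract contrasts with; it is not the dual-threshold two-step significance voting or the neural-network integration used in the paper, whose details the abstract does not give. The function name `average_vote`, the 0.5 threshold, and the example scores are all assumptions.

```python
from typing import Dict, List


def average_vote(scores: Dict[str, List[float]], threshold: float = 0.5) -> List[int]:
    """Label a residue 1 (disordered) when the mean score across predictors reaches the threshold.

    `scores` maps a predictor name to its per-residue disorder scores.
    The threshold value is an illustrative assumption.
    """
    names = list(scores)
    length = len(scores[names[0]])
    if any(len(scores[name]) != length for name in names):
        raise ValueError("all predictors must score the same number of residues")
    labels = []
    for i in range(length):
        mean = sum(scores[name][i] for name in names) / len(names)
        labels.append(1 if mean >= threshold else 0)
    return labels


# Hypothetical scores for a three-residue sequence.
example = {
    "DisEMBL": [0.9, 0.2, 0.6],
    "IUPred": [0.8, 0.1, 0.4],
    "VSL2": [0.7, 0.3, 0.6],
    "ESpritz": [0.6, 0.2, 0.3],
}
print(average_vote(example))  # [1, 0, 0]
```

The third residue shows the weakness of plain averaging: two predictors score it above 0.5, yet the mean falls just below the cutoff, so the disagreement is lost.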

2005 ◽  
Vol 03 (01) ◽  
pp. 35-60 ◽  
Author(s):  
KANG PENG ◽  
SLOBODAN VUCETIC ◽  
PREDRAG RADIVOJAC ◽  
CELESTE J. BROWN ◽  
A. KEITH DUNKER ◽  
...  

Protein existing as an ensemble of structures, called intrinsically disordered, has been shown to be responsible for a wide variety of biological functions and to be common in nature. Here we focus on improving sequence-based predictions of long (>30 amino acid residues) regions lacking specific 3-D structure by means of four new neural-network-based Predictors Of Natural Disordered Regions (PONDRs): VL3, VL3H, VL3P, and VL3E. PONDR VL3 used several features from a previously introduced PONDR VL2, but benefitted from optimized predictor models and a slightly larger (152 vs. 145) set of disordered proteins that were cleaned of mislabeling errors found in the smaller set. PONDR VL3H utilized homologues of the disordered proteins in the training stage, while PONDR VL3P used attributes derived from sequence profiles obtained by PSI-BLAST searches. The measure of accuracy was the average between accuracies on disordered and ordered protein regions. By this measure, the 30-fold cross-validation accuracies of VL3, VL3H, and VL3P were, respectively, 83.6 ± 1.4%, 85.3 ± 1.4%, and 85.2 ± 1.5%. By combining VL3H and VL3P, the resulting PONDR VL3E achieved an accuracy of 86.7 ± 1.4%. This is a significant improvement over our previous PONDRs VLXT (71.6 ± 1.3%) and VL2 (80.9 ± 1.4%). The new disorder predictors with the corresponding datasets are freely accessible through the web server at .
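The accuracy measure described above, the average of the accuracies on disordered and ordered regions, can be sketched as follows. The 0/1 label encoding (1 for disordered, 0 for ordered) and the function name `balanced_accuracy` are assumptions made for illustration.

```python
from typing import Sequence


def balanced_accuracy(truth: Sequence[int], predicted: Sequence[int]) -> float:
    """Mean of the accuracy on disordered (1) and on ordered (0) residues."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    disordered_total = sum(1 for t in truth if t == 1)
    ordered_total = len(truth) - disordered_total
    if disordered_total == 0 or ordered_total == 0:
        raise ValueError("both classes must be present")
    disordered_correct = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 1)
    ordered_correct = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 0)
    return 0.5 * (disordered_correct / disordered_total + ordered_correct / ordered_total)


# Hypothetical labels: 2 disordered and 6 ordered residues.
truth = [1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 0, 0, 0, 0, 0, 0, 0]
print(balanced_accuracy(truth, predicted))  # 0.75
```

In this example the plain per-residue accuracy would be 7/8 = 0.875, while the averaged measure is 0.75, because the missed disordered residue counts for half of its class. This is why the measure is less sensitive to the imbalance between ordered and disordered residues.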


PLoS ONE ◽  
2021 ◽  
Vol 16 (5) ◽  
pp. e0251865
Author(s):  
Seonwoo Min ◽  
HyunGi Kim ◽  
Byunghan Lee ◽  
Sungroh Yoon

Heat shock proteins (HSPs) play a pivotal role as molecular chaperones against unfavorable conditions. Although HSPs are of great importance, their computational identification remains a significant challenge. Previous studies have two major limitations. First, they relied heavily on amino acid composition features, which inevitably limited their prediction performance. Second, their prediction performance was overestimated because of the independent two-stage evaluations and train-test data redundancy. To overcome these limitations, we introduce two novel deep learning algorithms: (1) time-efficient DeepHSP and (2) high-performance DeeperHSP. We propose a convolutional neural network (CNN)-based DeepHSP that classifies both non-HSPs and six HSP families simultaneously. It outperforms state-of-the-art algorithms, despite taking 14–15 times less time for both training and inference. We further improve the performance of DeepHSP by taking advantage of protein transfer learning. While DeepHSP is trained on raw protein sequences, DeeperHSP is trained on top of pre-trained protein representations. Therefore, DeeperHSP remarkably outperforms state-of-the-art algorithms increasing F1 scores in both cross-validation and independent test experiments by 20% and 10%, respectively. We envision that the proposed algorithms can provide a proteome-wide prediction of HSPs and help in various downstream analyses for pathology and clinical research.


2019 ◽  
Author(s):  
Yan-Li Lee ◽  
Ratha Pech ◽  
Maryna Po ◽  
Dong Hao ◽  
Tao Zhou

Abstract: MicroRNAs (miRNAs) play a crucial role in many important biological processes, e.g., the pathogenesis of diseases. Currently, the validated associations between miRNAs and diseases are insufficient compared to the hidden associations. Testing all these hidden associations by biological experiments is expensive, laborious, and time-consuming. Therefore, computationally inferring hidden associations from biological datasets for further laboratory experiments has attracted increasing interest from communities ranging from biological to computational science. In this work, we propose an effective and efficient method to predict associations between miRNAs and diseases, namely linear optimization (LOMDA). The proposed method uses a heterogeneous matrix incorporating miRNA functional similarity information, disease similarity information, and known miRNA-disease associations. Compared with the other methods, LOMDA performs best in terms of AUC (0.970), precision (0.566), and accuracy (0.971) on average over 15 diseases in local 5-fold cross-validation. Moreover, LOMDA has also been applied to two types of case studies. In the first case study, 30 predictions from breast neoplasms, 24 from colon neoplasms, and 26 from kidney neoplasms among top 30 predicted miRNAs are confirmed. In the second case study, for new diseases without any known associations, top 30 predictions from hepatocellular carcinoma and 29 from lung neoplasms among top 30 predicted miRNAs are confirmed.

Author summary: Identifying associations between miRNAs and diseases is significant for investigating the pathogenesis, diagnosis, treatment, and prevention of related diseases. Employing computational methods to predict hidden associations from known associations, and focusing on the predicted associations, can sharply reduce experimental costs. We developed a computational method, LOMDA, based on the linear optimization technique to predict the hidden associations. In addition to the observed associations, LOMDA can also employ auxiliary information (disease and miRNA similarity information) flexibly and effectively. Numerical experiments on global 5-fold cross-validation show that the use of the auxiliary information can greatly improve the prediction performance. Meanwhile, the result on local 5-fold cross-validation shows that LOMDA performs best among the seven related methods. We further test the prediction performance of LOMDA for two types of diseases based on HMDD v2.0 (2014): (i) diseases with all the known associations, and (ii) new diseases without known associations. Three independent or updated databases (dbDEMC, 2010; miR2Disease, 2009; HMDD v3.2, 2019) are introduced to evaluate the prediction results. As a result, most miRNAs for target diseases are confirmed by at least one of the three databases. We therefore believe that LOMDA can guide experiments to identify hidden miRNA-disease associations.


2019 ◽  
Vol 9 (17) ◽  
pp. 3538 ◽  
Author(s):  
Hailong Hu ◽  
Zhong Li ◽  
Arne Elofsson ◽  
Shangxin Xie

The prediction of protein secondary structure continues to be an active area of research in bioinformatics. In this paper, a Bi-LSTM based ensemble model is developed for the prediction of protein secondary structure. The ensemble model with dual loss function consists of five sub-models, which are finally joined by a Bi-LSTM layer. In contrast to existing ensemble methods, which generally train each sub-model and then join them as a whole, this ensemble model and its sub-models can be trained simultaneously, and the performance of each model can be observed and compared during the training process. Three independent test sets (data1199, the 513-protein Cuff & Barton set (CB513), and 203 proteins from the Critical Assessment of protein Structure Prediction (CASP203)) are employed to test the method. On average, the ensemble model achieved 84.3% in Q3 accuracy and 81.9% in segment overlap measure (SOV) score using 10-fold cross-validation. This is an improvement of up to 1% over some state-of-the-art methods for predicting protein secondary structure.
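For readers unfamiliar with the Q3 measure reported above, it is the percentage of residues whose predicted three-state label matches the observed one. A minimal sketch follows; the single-letter encoding (H for helix, E for strand, C for coil), the function name `q3_accuracy`, and the example strings are assumptions, and the SOV score is not covered here.

```python
def q3_accuracy(observed: str, predicted: str) -> float:
    """Percentage of residues whose predicted state (H, E, or C) equals the observed state."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * matches / len(observed)


# Hypothetical eight-residue example with one mismatch at position 5.
print(q3_accuracy("HHHEECCC", "HHHECCCC"))  # 87.5
```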


2015 ◽  
Vol 2015 ◽  
pp. 1-14 ◽  
Author(s):  
Guohua Huang ◽  
Yin Lu ◽  
Changhong Lu ◽  
Mingyue Zheng ◽  
Yu-Dong Cai

Discovering potential indications of novel or approved drugs is a key step in drug development. Previous computational approaches can be categorized as disease-centric or drug-centric according to their starting point, or as small-scale or large-scale applications according to the diversity of the datasets. Here, a classifier was constructed on a large drug indication dataset to predict the indications of a drug, based on the assumption that interacting/associated drugs or drugs with similar structures are more likely to target the same diseases. To examine the classifier, it was run five times on a dataset of 1,573 drugs retrieved from the Comprehensive Medicinal Chemistry database and evaluated by 5-fold cross-validation, yielding five 1st order prediction accuracies that were all approximately 51.48%. Meanwhile, the model yielded a 1st order prediction accuracy of 50.00% in an independent test on a dataset of 32 other drugs for which drug repositioning has been confirmed. Interestingly, some clinically repurposed drug indications that were not included in the datasets were successfully identified by our method. These results suggest that our method may become a useful tool to associate novel molecules with new indications or existing drugs with alternative indications.


2017 ◽  
Vol 31 (10) ◽  
pp. 2001-2019 ◽  
Author(s):  
Jonne Pohjankukka ◽  
Tapio Pahikkala ◽  
Paavo Nevalainen ◽  
Jukka Heikkonen

2020 ◽  
Author(s):  
Sergey Kucheryavskiy ◽  
Sergei Zhilin ◽  
Oxana Ye. Rodionova ◽  
Alexey L. Pomerantsev

In this paper we propose a new approach for the validation of chemometric models. It is based on the k-fold cross-validation algorithm, but, in contrast to conventional cross-validation, our approach makes it possible to create a new dataset that carries the sampling uncertainty estimated by the cross-validation procedure. This dataset, called the pseudo-validation set, can be used similarly to an independent test set, making it possible to compute residual distances, explained variance, scores, and other results that cannot be obtained in conventional cross-validation. The paper describes theoretical details of the proposed approach and its implementation, and presents experimental results obtained using simulated and real chemical datasets.
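As background for the approach above, the conventional k-fold partition it builds on can be sketched as follows. This covers only the standard splitting step; the construction of the pseudo-validation set is not described in the abstract and is not attempted here. The contiguous (unshuffled) fold assignment and the function name `k_fold_indices` are assumptions.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, validation) index lists for k contiguous folds.

    Fold sizes differ by at most one; the first n_samples % k folds get the extra sample.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size


folds = list(k_fold_indices(7, 3))
for train, validation in folds:
    print(train, validation)
```

Each sample appears in exactly one validation fold, so every sample receives one out-of-sample prediction across the k rounds.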


2021 ◽  
Vol 16 ◽  
Author(s):  
Lingzhi Zhu ◽  
Guihua Duan ◽  
Cheng Yan ◽  
Jianxin Wang

Background: Microbial communities have important influences on our health and disease. Identifying potential human microbe-drug associations will be greatly advantageous for exploring the complex mechanisms of microbes in drug discovery, combinations, and repositioning. Until now, the complex mechanism of microbe-drug associations has remained unknown. Objective: Computational models play an important role in discovering hidden microbe-drug associations, because biological experiments are time-consuming and expensive. Based on chemical structures of drugs and the KATZ measure, a new computational model (HMDAKATZ) is proposed for identifying potential Human Microbe-Drug Associations. Methods: In HMDAKATZ, the similarity between microbes is computed using the Gaussian Interaction Profile (GIP) kernel based on known human microbe-drug associations. The similarity between drugs is computed based on known human microbe-drug associations and chemical structures. Then, a microbe-drug heterogeneous network is constructed by integrating the microbe-microbe network, the drug-drug network, and a known microbe-drug association network. Finally, we apply KATZ to identify potential associations between microbes and drugs. Results: The experimental results showed that HMDAKATZ achieved area under the curve (AUC) values of 0.9010±0.0020, 0.9066±0.0015, and 0.9116 in 5-fold cross validation (5-fold CV), 10-fold cross validation (10-fold CV), and leave one out cross validation (LOOCV), respectively, outperforming four other computational models (SNMF, RLS, HGBI, and NBI). Conclusion: HMDAKATZ obtained better prediction performance than four other methods in 5-fold CV, 10-fold CV, and LOOCV. Furthermore, three case studies also illustrated that HMDAKATZ is an effective way to discover hidden microbe-drug associations.
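The two building blocks named in the Methods, the GIP kernel and the KATZ measure, can be sketched in a few lines. This is a generic illustration under stated assumptions, not the HMDAKATZ implementation: the bandwidth normalisation with `gamma_prime = 1.0`, the truncation of the Katz sum at a fixed walk length, the damping value `beta`, and all function names are assumptions, and the drug chemical-structure similarity and the assembly of the heterogeneous network are omitted.

```python
import math
from typing import List

Matrix = List[List[float]]


def gip_kernel(profiles: Matrix, gamma_prime: float = 1.0) -> Matrix:
    """Gaussian interaction profile similarity between the rows of `profiles`.

    Each row is the 0/1 association profile of one entity (e.g. one microbe).
    The bandwidth is gamma_prime divided by the mean squared norm of the rows.
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(x * x for x in row) for row in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("at least one known association is required")
    gamma = gamma_prime / mean_sq_norm
    return [
        [
            math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j])))
            for j in range(n)
        ]
        for i in range(n)
    ]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def truncated_katz(adjacency: Matrix, beta: float, max_length: int) -> Matrix:
    """Sum of beta**l * adjacency**l over walk lengths l = 1..max_length."""
    n = len(adjacency)
    total = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in adjacency]  # adjacency ** 1
    for length in range(1, max_length + 1):
        weight = beta ** length
        for i in range(n):
            for j in range(n):
                total[i][j] += weight * power[i][j]
        power = matmul(power, adjacency)
    return total


# Hypothetical data: 3 microbes x 2 drugs, and a 2-node network.
associations = [[1, 0], [1, 0], [0, 1]]
microbe_similarity = gip_kernel(associations)
katz_scores = truncated_katz([[0, 1], [1, 0]], beta=0.5, max_length=2)
print(microbe_similarity[0][1], katz_scores)
```

Microbes with identical association profiles get similarity 1, and the Katz score rewards node pairs connected by many short walks, with longer walks damped by increasing powers of `beta`.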


2021 ◽  
Vol 12 ◽  
Author(s):  
Cunmei Ji ◽  
Yutian Wang ◽  
Jiancheng Ni ◽  
Chunhou Zheng ◽  
Yansen Su

In recent years, more and more evidence has shown that microRNAs (miRNAs) play an important role in the regulation of post-transcriptional gene expression and are closely related to human diseases. Many studies have also revealed that miRNAs can serve as promising biomarkers for the potential diagnosis and treatment of human diseases. The interactions between miRNAs and human diseases have rarely been demonstrated, and the underlying mechanism of miRNAs is not clear. Therefore, computational approaches have attracted the attention of researchers, as they can not only save time and money but also improve the efficiency and accuracy of biological experiments. In this work, we proposed a Heterogeneous Graph Attention Networks (GAT) based method for miRNA-disease association prediction, named HGATMDA. We constructed a heterogeneous graph for miRNAs and diseases, and introduced weighted DeepWalk and GAT methods to extract features of miRNAs and diseases from the graph. Moreover, a fully connected neural network is used to predict correlation scores between miRNA-disease pairs. Experimental results under five-fold cross validation (five-fold CV) showed that HGATMDA achieved better prediction performance than other state-of-the-art methods. In addition, we performed three case studies on breast neoplasms, lung neoplasms, and kidney neoplasms. The results showed that for the three diseases mentioned above, 50 out of the top 50 candidates were confirmed by the validation datasets. Therefore, HGATMDA is suitable as an effective tool to identify potential disease-related miRNAs.

