Prediction of druggable proteins using machine learning and functional enrichment analysis: a focus on cancer-related proteins and RNA-binding proteins

2019 ◽  
Author(s):  
Andrés López-Cortés ◽  
Alejandro Cabrera-Andrade ◽  
Carlos M. Cruz-Segundo ◽  
Julian Dorado ◽  
Alejandro Pazos ◽  
...  

Background: Druggable proteins are a trending topic in drug design. The druggable proteome can be defined as the percentage of proteins that have the capacity to bind an antibody or small molecule with adequate chemical properties and affinity. Screening and in silico modeling are critical activities for reducing experimental costs. Methods: The current work proposes a unique prediction model for druggable proteins using amino acid composition descriptors of protein sequences and 13 linear and non-linear machine learning classifiers. After feature selection, the best classifier was obtained using the support vector machine method and 200 tri-amino acid composition descriptors. Results: The high performance of the model is shown by an area under the receiver operating characteristic curve (AUROC) of 0.975 ± 0.003 and an accuracy of 0.929 ± 0.006 (3-fold cross-validation). Regarding the prediction of cancer-associated proteins using this model, the best-ranked predicted druggable proteins in the breast cancer protein set were CDK4, AP1S1, POLE, HMMR, RPL5, PALB2, TIMP1, RPL22, NFKB1 and TOP2A; in the cancer-driving protein set were TLL2, FAM47C, SAGE1, HTR1E, MACC1, ZFR2, VMA21, DUSP9, CTNNA3 and GABRG1; and in the RNA-binding protein set were PLA2G1B, CPEB2, NOL6, LRRC47, CTTN, CORO1A, SCAF11, KCTD12, DDX43 and TMPO. Conclusions: This powerful model predicts several druggable proteins that should be studied in depth to find better therapeutic targets and thus improve clinical trials. The scripts are freely available at https://github.com/muntisa/machine-learning-for-druggable-proteins.
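
The tri-amino acid composition descriptors mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the common definition (the relative frequency of each of the 8000 possible residue triplets in a sequence); the function name `tripeptide_composition` is hypothetical, and the authors' own scripts in the linked repository may compute or normalize the descriptors differently.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def tripeptide_composition(sequence):
    """Return the relative frequency of every possible tripeptide.

    Assumption: overlapping windows of length 3 are counted, and windows
    containing a non-standard residue (e.g. 'X') are skipped.
    """
    seq = sequence.upper()
    windows = [seq[i:i + 3] for i in range(len(seq) - 2)]
    valid = [w for w in windows if all(c in AMINO_ACIDS for c in w)]
    counts = Counter(valid)
    total = len(valid)
    return {
        "".join(t): (counts["".join(t)] / total if total else 0.0)
        for t in product(AMINO_ACIDS, repeat=3)
    }


if __name__ == "__main__":
    features = tripeptide_composition("ACDACD")
    print(len(features), features["ACD"], features["CDA"])
```

In a pipeline like the one the abstract describes, a feature selection step would then reduce such a vector to a much smaller subset (200 descriptors in the reported model) before training the classifier.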

2021 ◽  
Vol 22 (23) ◽  
pp. 13124
Author(s):  
Phasit Charoenkwan ◽  
Chanin Nantasenamat ◽  
Md Mehedi Hasan ◽  
Mohammad Ali Moni ◽  
Balachandran Manavalan ◽  
...  

Umami ingredients have been identified as important factors in food seasoning and production. Traditional experimental methods for characterizing peptides exhibiting umami sensory properties (umami peptides) are time-consuming, laborious, and costly. As a result, it is preferable to develop computational tools for the large-scale identification of available sequences in order to identify novel peptides with umami sensory properties. Although a computational tool has been developed for this purpose, its predictive performance is still insufficient. In this study, we use a feature representation learning approach to create a novel machine-learning meta-predictor called UMPred-FRL for improved umami peptide identification. We combined six well-known machine learning algorithms (extremely randomized trees, k-nearest neighbor, logistic regression, partial least squares, random forest, and support vector machine) with seven different feature encodings (amino acid composition, amphiphilic pseudo-amino acid composition, dipeptide composition, composition-transition-distribution, and pseudo-amino acid composition) to develop the final meta-predictor. Extensive experimental results demonstrated that UMPred-FRL was effective and achieved more accurate performance on the benchmark dataset compared to its baseline models, and consistently outperformed the existing method on the independent test dataset. Finally, to aid in the high-throughput identification of umami peptides, the UMPred-FRL web server was established and made freely available online. It is expected that UMPred-FRL will be a powerful tool for the cost-effective large-scale screening of candidate peptides with potential umami sensory properties.


2020 ◽  
Vol 17 (5) ◽  
pp. 647-660 ◽  
Author(s):  
Shivananda Kandagalla ◽  
Sharath Belenahalli Shekarappa ◽  
Gollapalli Pavan ◽  
Umme Hani ◽  
Manjunatha Hanumanthappa

Background: Capsaicin is an active alkaloid and the principal component of red pepper responsible for the pungency of chili pepper. By changing intracellular redox homeostasis, capsaicin regulates a variety of signaling pathways, ultimately producing divergent cellular outcomes. Several reports have shown the potential of capsaicin against cancer metastasis; however, the underlying molecular mechanism remains unexplored and is still an active area of research. Several growth factors have a critical role during cancer metastasis; among them, TGF-β signaling plays a vital role. Methods: The present study aimed to analyze capsaicin modulation of TGF-β signaling using a network pharmacology approach. The chemical and protein interaction data of capsaicin were curated and abstracted using the STITCH4.0, PubChem and ChEMBL databases. The compiled data set was then subjected to pathway and functional enrichment analysis using Protein Analysis THrough Evolutionary Relationship (PANTHER) and the Database for Annotation, Visualization, and Integrated Discovery (DAVID). Meanwhile, the pattern of amino acid composition across the capsaicin targets was analyzed using the EMBOSS Pepstat tool. Capsaicin targets involved in TGF-β signaling were identified and their Protein-Protein Interaction (PPI) network was constructed using STRING v10 and Cytoscape (v 3.2.1). From this network, clusters were mined using the MCODE clustering algorithm, and finally the binding affinity of capsaicin with its targets involved in the TGF-β signaling pathway was analyzed using Autodock Vina. Results: The analysis explored capsaicin targets and their associated functional and pathway annotations. In addition, the analysis provides a detailed, distinct pattern of amino acid composition across the capsaicin targets. Pathway enrichment analysis identified the capsaicin targets MAPK14, JUN, SMAD3, MAPK3, MAPK1 and MYC as involved in the TGF-β signaling pathway. The binding mode analysis of capsaicin with its targets showed high affinity for MAPK3, MAPK1, JUN and MYC. Conclusion: The study explores the potential of capsaicin as a potent modulator of the TGF-β signaling pathway during cancer metastasis and proposes a new methodology and mechanism of action of capsaicin against the TGF-β signaling pathway.


2020 ◽  
Vol 2020 ◽  
pp. 1-11
Author(s):  
Lifu Zhang ◽  
Benzhi Dong ◽  
Zhixia Teng ◽  
Ying Zhang ◽  
Liran Juan

Enzymes are proteins that can efficiently catalyze specific biochemical reactions, and they are widely present in the human body. Developing an efficient method to identify human enzymes is vital to select enzymes from the vast number of human proteins and to investigate their functions. Nevertheless, only a limited amount of research has been conducted on the classification of human enzymes and nonenzymes. In this work, we developed a support vector machine- (SVM-) based predictor to classify human enzymes using the amino acid composition (AAC), the composition of k-spaced amino acid pairs (CKSAAP), and selected informative amino acid pairs through the use of a feature selection technique. A training dataset including 1117 human enzymes and 2099 nonenzymes and a test dataset including 684 human enzymes and 1270 nonenzymes were constructed to train and test the proposed model. The results of jackknife cross-validation showed that the overall accuracy was 76.46% for the training set and 76.21% for the test set, which are higher than the 72.6% achieved in previous research. Furthermore, various feature extraction methods and mainstream classifiers were compared in this task, and informative feature parameters of k-spaced amino acid pairs were selected and compared. The results suggest that our classifier can be used in human enzyme identification effectively and efficiently and can help to understand their functions and develop new drugs.
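
As an illustration of the CKSAAP encoding named in the abstract, the following Python sketch counts residue pairs separated by exactly k positions and normalizes them to frequencies. It assumes the usual definition (400 pair features per gap value k); the function name `cksaap` and the handling of non-standard residues are assumptions, not the authors' implementation.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def cksaap(sequence, k):
    """Composition of k-spaced amino acid pairs for a single gap size k.

    A pair (a, b) is counted when residue b occurs k + 1 positions after
    residue a, i.e. with exactly k residues in between. k = 0 gives the
    ordinary dipeptide composition.
    """
    seq = sequence.upper()
    counts = Counter()
    # range() is empty when the sequence is too short for this gap.
    for i in range(len(seq) - k - 1):
        a, b = seq[i], seq[i + k + 1]
        # Assumption: pairs involving non-standard residues are skipped.
        if a in AMINO_ACIDS and b in AMINO_ACIDS:
            counts[a + b] += 1
    total = sum(counts.values())
    return {
        a + b: (counts[a + b] / total if total else 0.0)
        for a in AMINO_ACIDS
        for b in AMINO_ACIDS
    }


if __name__ == "__main__":
    print(cksaap("ACAC", 1)["AA"], cksaap("ACAC", 0)["AC"])
```

Feature vectors for several gap sizes are typically concatenated, which is why a feature selection step, as described in the abstract, is useful for keeping only the informative pairs.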


2016 ◽  
Vol 2016 ◽  
pp. 1-5 ◽  
Author(s):  
Yun Wu ◽  
Yufei Zheng ◽  
Hua Tang

Conotoxins are a kind of neurotoxin that can specifically interact with potassium, sodium, and calcium channels. They have become potential drug candidates for treating diseases such as chronic pain, epilepsy, and cardiovascular diseases. Thus, correctly identifying the types of ion channel-targeted conotoxins will provide important clues for understanding their function and finding potential drugs. Based on this consideration, we developed a new computational method to rapidly and accurately predict the types of ion channel-targeted conotoxins. Three new kinds of residue properties were proposed for use in pseudo amino acid composition to formulate conotoxin samples. A support vector machine was used as the classifier. A feature selection technique based on the F-score was used to optimize the features. Jackknife cross-validation showed that an overall accuracy of 94.6% was achieved, which is higher than other published results, demonstrating that the proposed method outperforms published methods. Hence, the current method may play a complementary role to other existing methods for recognizing the types of ion channel-targeted conotoxins.
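
F-score-based feature ranking of the kind mentioned in the abstract can be sketched in Python as follows. The sketch assumes one widely used formulation, the ratio of between-class separation to summed within-class sample variance, generalized here to any number of classes; the abstract does not give the exact formula, so this definition and the function names `f_score` and `rank_features` are assumptions.

```python
from statistics import mean, variance


def f_score(values, labels):
    """F-score of one feature: between-class spread / within-class variance.

    Assumption: every class has at least two samples, because the sample
    variance (n - 1 denominator) is undefined for a single value.
    """
    overall = mean(values)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    between = sum((mean(g) - overall) ** 2 for g in groups.values())
    within = sum(variance(g) for g in groups.values())
    if within == 0:
        # Constant within every class: perfectly separating or uninformative.
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_features(rows, labels):
    """Return feature indices sorted from highest to lowest F-score."""
    scores = [f_score(list(column), labels) for column in zip(*rows)]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


if __name__ == "__main__":
    rows = [[1, 1], [2, 8], [3, 3], [7, 7], [8, 2], [9, 9]]
    labels = [0, 0, 0, 1, 1, 1]
    print(rank_features(rows, labels))
```

A classifier can then be retrained on the top-ranked features only, adding them one at a time and keeping the subset that gives the best cross-validated accuracy.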

