A Novel Amino Acid Sequence-based Computational Approach to Predicting Cell-penetrating Peptides

2019 ◽  
Vol 15 (3) ◽  
pp. 206-211 ◽  
Author(s):  
Jihui Tang ◽  
Jie Ning ◽  
Xiaoyan Liu ◽  
Baoming Wu ◽  
Rongfeng Hu

Introduction: Machine learning is a useful tool for predicting cell-penetrating compounds as drug candidates. Materials and Methods: In this study, we developed a novel method for predicting the membrane-penetrating capability of cell-penetrating peptides (CPPs). We used orthogonal encoding to encode the amino acids, treating each amino acid position as one variable. IBM SPSS Modeler and a dataset of 533 CPPs were then used for model screening. Results: The results indicated that the Support Vector Machine (SVM) model was suitable for predicting membrane-penetrating capability. For improvement, the three CPPs with the longest lengths were used to predict CPPs. The penetration capability can be predicted with an accuracy close to 95%. Conclusion: The results indicate that using amino acid position as a variable is a promising method for predicting the membrane-penetrating capability of CPPs.
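The orthogonal encoding mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes a 20-letter alphabet of standard residues, zero-padding up to a fixed `max_length`, and an arbitrary example peptide string; none of these details are given in the abstract, and the function name `one_hot_encode` is hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumed alphabet)

def one_hot_encode(sequence, max_length):
    """Return a flat 0/1 vector with 20 entries per sequence position.

    Positions beyond the end of the sequence, and residues outside the
    assumed alphabet, are left as all zeros (an assumed padding scheme).
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    vector = []
    for position in range(max_length):
        slot = [0] * len(AMINO_ACIDS)
        if position < len(sequence):
            residue = sequence[position].upper()
            if residue in index:
                slot[index[residue]] = 1
        vector.extend(slot)
    return vector

# Example peptide string chosen only for illustration.
encoded = one_hot_encode("RKKRRQRRR", max_length=12)
```

Each block of 20 entries corresponds to one sequence position, so a classifier such as an SVM sees every position as its own group of variables.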

Life ◽  
2021 ◽  
Vol 11 (8) ◽  
pp. 866
Author(s):  
Sony Hartono Wijaya ◽  
Farit Mochamad Afendi ◽  
Irmanida Batubara ◽  
Ming Huang ◽  
Naoaki Ono ◽  
...  

Background: We performed in silico prediction of the interactions between compounds of Jamu herbs and human proteins by utilizing data-intensive science and machine learning methods. Verifying the proteins that are targeted by compounds of natural herbs will be helpful to select natural herb-based drug candidates. Methods: Initially, data related to compounds, target proteins, and interactions between them were collected from open access databases. Compounds are represented by molecular fingerprints, whereas amino acid sequences are represented by numerical protein descriptors. Then, prediction models that predict the interactions between compounds and target proteins were constructed using support vector machine and random forest. Results: A random forest model constructed based on MACCS fingerprint and amino acid composition obtained the highest accuracy. We used the best model to predict target proteins for 94 important Jamu compounds and assessed the results by supporting evidence from published literature and other sources. There are 27 compounds that can be validated by professional doctors, and those compounds belong to seven efficacy groups. Conclusion: By comparing the efficacy of predicted compounds and the relations of the targeted proteins with diseases, we found that some compounds might be considered as drug candidates.
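As a rough illustration of the amino acid composition descriptor named in this abstract, the sketch below computes the fraction of each of the 20 standard residues in a sequence. The alphabet ordering, the handling of non-standard letters, and the example sequence are assumptions made for illustration, not details from the study.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 standard residues

def amino_acid_composition(sequence):
    """Return 20 fractions, one per standard residue, summing to 1.0.

    Letters outside the assumed alphabet are ignored (an assumption).
    """
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]

# Made-up sequence used only to show the shape of the output.
composition = amino_acid_composition("MKTAYIAKQR")
```

The resulting fixed-length vector can be concatenated with a compound fingerprint to form one input row for a classifier.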


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Ewerton Cristhian Lima de Oliveira ◽  
Kauê Santana ◽  
Luiz Josino ◽  
Anderson Henrique Lima e Lima ◽  
Claudomiro de Souza de Sales Júnior

Abstract: Cell-penetrating peptides (CPPs) are naturally able to cross the lipid bilayer membrane that protects cells. These peptides share common structural and physicochemical properties and show different pharmaceutical applications, among which drug delivery is the most important. Due to their ability to cross the membranes by pulling high-molecular-weight polar molecules, they are termed Trojan horses. In this study, we proposed a machine learning (ML)-based framework named BChemRF-CPPred (beyond chemical rules-based framework for CPP prediction) that uses an artificial neural network, a support vector machine, and a Gaussian process classifier to differentiate CPPs from non-CPPs, using structure- and sequence-based descriptors extracted from PDB and FASTA formats. The performance of our algorithm was evaluated by tenfold cross-validation and compared with those of previously reported prediction tools using an independent dataset. The BChemRF-CPPred satisfactorily identified CPP-like structures using natural and synthetic modified peptide libraries and also obtained better performance than those of previously reported ML-based algorithms, reaching the independent test accuracy of 90.66% (AUC = 0.9365) for PDB, and an accuracy of 86.5% (AUC = 0.9216) for FASTA input. Moreover, our analyses of the CPP chemical space demonstrated that these peptides break some molecular rules related to the prediction of permeability of therapeutic molecules in cell membranes. This is the first comprehensive analysis to predict synthetic and natural CPP structures and to evaluate their chemical space using an ML-based framework. Our algorithm is freely available for academic use at http://comptools.linc.ufpa.br/BChemRF-CPPred.
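The tenfold cross-validation mentioned in this abstract can be sketched as follows. This is a generic, unstratified index split written for illustration; the shuffling seed, the fold assignment, and the function name `k_fold_indices` are assumptions and do not describe how BChemRF-CPPred partitions its data.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Every sample lands in exactly one test fold; the remaining folds
    form the training set for that round.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Toy example: 25 samples split into 10 folds.
splits = list(k_fold_indices(25, k=10))
```

A model would be trained on each `train` list and scored on the matching `test` list, with the ten scores averaged at the end.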


PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e10381
Author(s):  
Rohit Nandakumar ◽  
Valentin Dinu

Throughout the history of drug discovery, an enzymatic-based approach for identifying new drug molecules has been primarily utilized. Recently, protein–protein interfaces that can be disrupted to identify small molecules that could be viable targets for certain diseases, such as cancer and the human immunodeficiency virus, have been identified. Existing studies computationally identify hotspots on these interfaces, with most models attaining accuracies of ~70%. Many studies do not effectively integrate information relating to amino acid chains and other structural information relating to the complex. Herein, (1) a machine learning model has been created and (2) its ability to integrate multiple features, such as those associated with amino-acid chains, has been evaluated to enhance the ability to predict protein–protein interface hotspots. Virtual drug screening analysis of a set of hotspots determined on the EphB2-ephrinB2 complex has also been performed. The predictive capabilities of this model offer an AUROC of 0.842, sensitivity/recall of 0.833, and specificity of 0.850. Virtual screening of a set of hotspots identified by the machine learning model developed in this study has identified potential medications to treat diseases caused by the overexpression of the EphB2-ephrinB2 complex, including prostate, gastric, colorectal and melanoma cancers which are linked to EphB2 mutations. The efficacy of this model has been demonstrated through its successful ability to predict drug-disease associations previously identified in literature, including cimetidine, idarubicin, and pralatrexate for these conditions. In addition, nadolol, a beta blocker, has also been identified in this study to bind to the EphB2-ephrinB2 complex, and the possibility of this drug treating multiple cancers is still relatively unexplored.


2018 ◽  
Vol 17 (8) ◽  
pp. 2715-2726 ◽  
Author(s):  
Balachandran Manavalan ◽  
Sathiyamoorthy Subramaniyam ◽  
Tae Hwan Shin ◽  
Myeong Ok Kim ◽  
Gwang Lee

2021 ◽  
Vol 28 ◽  
Author(s):  
Yuyang Xue ◽  
Xiucai Ye ◽  
Lesong Wei ◽  
Xin Zhang ◽  
Tetsuya Sakurai ◽  
...  

With its superior performance, the Transformer model, which is based on the 'Encoder-Decoder' paradigm, has become mainstream in natural language processing. Meanwhile, bioinformatics has embraced machine learning and made great progress in drug design and protein property prediction. Cell-penetrating peptides (CPPs) are a kind of permeable protein that can serve as a 'postman' in drug penetration tasks. However, only a small number of CPPs have been discovered, and even fewer have found practical application in drug permeability. Correctly identifying CPPs therefore opens up a new way to carry macromolecules into cells without other potentially harmful materials in the drug. Most previous work uses only trivial machine learning techniques and hand-crafted features to construct a simple classifier. In CPPFormer, we adopt the attention structure of the Transformer, rebuild the network to suit the short length of CPPs, and use an automatic feature extractor together with a few manually engineered features to jointly guide the prediction. Compared with all previous methods and other classic text classification models, our proposed deep model-based method achieved the best performance, with 92.16% accuracy on the CPP924 dataset, and passed various index tests.
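For readers unfamiliar with the attention structure this abstract refers to, here is a minimal pure-Python sketch of scaled dot-product attention, the core operation of the Transformer. It is a generic illustration only: the toy vectors are invented, and nothing here reflects the actual CPPFormer architecture, which the abstract does not specify.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """For each query, return a softmax-weighted average of the values.

    Weights come from query-key dot products scaled by sqrt(d_k).
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

# Toy self-attention over three 2-dimensional "residue" vectors (made up).
toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = scaled_dot_product_attention(toy, toy, toy)
```

In self-attention the same sequence supplies queries, keys, and values, so each position's output mixes information from every other position.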


Author(s):  
Monalisa Ghosh ◽  
Chetna Singhal

Video streaming services dominate internet traffic, creating a competitive environment in which providers strive to deliver the best quality of experience (QoE) to users. The standard codecs used in video transmission systems eliminate spatiotemporal redundancies to reduce the bandwidth requirement, which may adversely affect the perceptual quality of videos. Video quality can be rated using both subjective and objective parameters, so it is essential to construct frameworks that measure the integrity of video just as humans do. This chapter focuses on applying machine learning to evaluate QoE without human effort, achieving accuracies of 86% and 91% with linear regression and support vector regression, respectively. A machine learning model is developed to forecast, from subjective scores, the subjective quality of H.264 videos streamed over wireless networks.


2020 ◽  
Vol 9 (2) ◽  
pp. 343 ◽  
Author(s):  
Arash Kia ◽  
Prem Timsina ◽  
Himanshu N. Joshi ◽  
Eyal Klang ◽  
Rohit R. Gupta ◽  
...  

Early detection of patients at risk for clinical deterioration is crucial for timely intervention. Traditional detection systems rely on a limited set of variables and are unable to predict the time of decline. We describe a machine learning model called MEWS++ that enables the identification of patients at risk of escalation of care or death six hours prior to the event. A retrospective single-center cohort study was conducted from July 2011 to July 2017 of adult (age > 18) inpatients excluding psychiatric, parturient, and hospice patients. Three machine learning models were trained and tested: random forest (RF), linear support vector machine, and logistic regression. We compared the models’ performance to the traditional Modified Early Warning Score (MEWS) using sensitivity, specificity, and Area Under the Curve for Receiver Operating Characteristic (AUC-ROC) and Precision-Recall curves (AUC-PR). The primary outcome was escalation of care from a floor bed to an intensive care or step-down unit, or death, within 6 h. A total of 96,645 patients with 157,984 hospital encounters and 244,343 bed movements were included. Overall rate of escalation or death was 3.4%. The RF model had the best performance with sensitivity 81.6%, specificity 75.5%, AUC-ROC of 0.85, and AUC-PR of 0.37. Compared to traditional MEWS, sensitivity increased 37%, specificity increased 11%, and AUC-ROC increased 14%. This study found that using machine learning and readily available clinical data, clinical deterioration or death can be predicted 6 h prior to the event. The model we developed can warn of patient deterioration hours before the event, thus helping make timely clinical decisions.
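The sensitivity and specificity figures reported in this abstract follow from a confusion matrix. The sketch below shows the arithmetic on made-up binary labels; the function name and the toy data are assumptions for illustration and are unrelated to the MEWS++ cohort.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity

# Toy labels: 3 positives (events) and 5 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

Sensitivity is the share of true events that were flagged, and specificity is the share of non-events that were correctly left unflagged.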


Sensors ◽  
2020 ◽  
Vol 20 (22) ◽  
pp. 6491
Author(s):  
Le Zhang ◽  
Jeyan Thiyagalingam ◽  
Anke Xue ◽  
Shuwen Xu

Classification of clutter, especially in the context of shore based radars, plays a crucial role in several applications. However, the task of distinguishing and classifying the sea clutter from land clutter has been historically performed using clutter models and/or coastal maps. In this paper, we propose two machine learning, particularly neural network, based approaches for sea-land clutter separation, namely the regularized randomized neural network (RRNN) and the kernel ridge regression neural network (KRR). We use a number of features, such as energy variation, discrete signal amplitude change frequency, autocorrelation performance, and other statistical characteristics of the respective clutter distributions, to improve the performance of the classification. Our evaluation based on a unique mixed dataset, which is comprised of partially synthetic clutter data for land and real clutter data from sea, offers improved classification accuracy. More specifically, the RRNN and KRR methods offer 98.50% and 98.75% accuracy, outperforming the conventional support vector machine and extreme learning based solutions.


RSC Advances ◽  
2019 ◽  
Vol 9 (59) ◽  
pp. 34196-34206
Author(s):  
Zhe Li ◽  
Shunhao Huang ◽  
Juan Chen

A soft-measurement model of total chlorine is established using cyclic voltammetry curves, principal component analysis, and support vector regression.


2016 ◽  
Vol 2016 ◽  
pp. 1-5 ◽  
Author(s):  
Yun Wu ◽  
Yufei Zheng ◽  
Hua Tang

Conotoxins are a kind of neurotoxin that can specifically interact with potassium, sodium, and calcium channels. They have become potential drug candidates to treat diseases such as chronic pain, epilepsy, and cardiovascular diseases. Thus, correctly identifying the types of ion channel-targeted conotoxins will provide important clues for understanding their function and finding potential drugs. Based on this consideration, we developed a new computational method to rapidly and accurately predict the types of ion channel-targeted conotoxins. Three kinds of new residue properties were proposed for use in pseudo amino acid composition to formulate conotoxin samples. A support vector machine was used as the classifier. A feature selection technique based on F-score was used to optimize the features. Jackknife cross-validation showed an overall accuracy of 94.6%, which is higher than other published results, demonstrating that the proposed method is superior to published methods. Hence the current method may play a complementary role to other existing methods for recognizing the types of ion channel-targeted conotoxins.
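To give a sense of F-score-based feature ranking, the sketch below implements the commonly used two-class F-score: between-class separation of the feature means divided by the sum of within-class sample variances. This is an assumption for illustration. The abstract does not state the exact formula, and the study distinguishes more than two conotoxin types, so its definition may differ.

```python
def f_score(pos_values, neg_values):
    """Two-class F-score of one feature (assumed, commonly used definition).

    Larger values suggest the feature separates the two classes better.
    Each class needs at least two samples for the sample variance.
    """
    pos = list(pos_values)
    neg = list(neg_values)
    mean_all = sum(pos + neg) / (len(pos) + len(neg))
    mean_pos = sum(pos) / len(pos)
    mean_neg = sum(neg) / len(neg)
    between = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    var_pos = sum((x - mean_pos) ** 2 for x in pos) / (len(pos) - 1)
    var_neg = sum((x - mean_neg) ** 2 for x in neg) / (len(neg) - 1)
    within = var_pos + var_neg
    return between / within if within > 0 else float("inf")

# Made-up feature values: the first feature separates the classes well,
# the second overlaps heavily.
separating = f_score([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
overlapping = f_score([1.0, 5.0, 3.0], [2.0, 4.0, 6.0])
```

Features would be sorted by this score and added to the classifier one at a time, keeping the subset with the best cross-validated accuracy.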

