PASSer2.0: Accurate Prediction of Protein Allosteric Sites Through Automated Machine Learning

2021 ◽  
Author(s):  
Sian Xiao ◽  
Hao Tian ◽  
Peng Tao

Allostery is a fundamental process in regulating protein activity. The discovery, design, and development of allosteric drugs demand better identification of allosteric sites. Several computational methods have previously been developed to predict allosteric sites using static pocket features and protein dynamics. Here, we present a computational model that uses automated machine learning for allosteric site prediction. Our model, PASSer2.0, improves on previous results and performs well across multiple indicators, with 89.2% of allosteric pockets appearing among the top 3 positions. The trained machine learning model has been integrated into the Protein Allosteric Sites Server (https://passer.smu.edu) to facilitate allosteric drug discovery.
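The "top 3 positions" figure quoted above is a ranking metric. The following is a minimal sketch of how such a number could be computed, assuming each protein's pockets carry a predicted score and that the metric counts the fraction of known allosteric pockets ranked within the top k of their protein. The function name, the data layout, and the toy values are illustrative assumptions, not the authors' code or data.

```python
from typing import Dict, Set


def top_k_fraction(
    scores: Dict[str, Dict[str, float]],
    allosteric: Dict[str, Set[str]],
    k: int = 3,
) -> float:
    """Fraction of known allosteric pockets ranked in the top k of their protein.

    scores maps protein id -> {pocket id -> predicted score} (hypothetical layout).
    allosteric maps protein id -> set of pocket ids labeled as allosteric.
    """
    hits = 0
    total = 0
    for protein, pocket_scores in scores.items():
        # Rank pockets from highest to lowest predicted score.
        ranked = sorted(pocket_scores, key=pocket_scores.get, reverse=True)
        top_k = set(ranked[:k])
        for pocket in allosteric.get(protein, set()):
            total += 1
            if pocket in top_k:
                hits += 1
    return hits / total if total else 0.0


# Toy data, invented for illustration only.
toy_scores = {
    "protA": {"p1": 0.9, "p2": 0.4, "p3": 0.2, "p4": 0.1},
    "protB": {"p1": 0.1, "p2": 0.3, "p3": 0.5, "p4": 0.7, "p5": 0.05},
}
toy_allosteric = {"protA": {"p1"}, "protB": {"p5"}}

top3 = top_k_fraction(toy_scores, toy_allosteric, k=3)
```

In the toy data, the allosteric pocket of protA ranks first and that of protB ranks last, so only one of the two pockets falls within the top 3.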


IEEE Access ◽  
2021 ◽  
pp. 1-1
Author(s):  
Milos Kotlar ◽  
Marija Punt ◽  
Zaharije Radivojevic ◽  
Milos Cvetanovic ◽  
Veljko Milutinovic

PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e10381
Author(s):  
Rohit Nandakumar ◽  
Valentin Dinu

Throughout the history of drug discovery, an enzymatic-based approach to identifying new drug molecules has been the primary one. Recently, protein–protein interfaces have been identified that small molecules can disrupt, making them potentially viable targets in certain diseases, such as cancer and human immunodeficiency virus infection. Existing studies computationally identify hotspots on these interfaces, with most models attaining accuracies of ~70%. Many studies do not effectively integrate information about amino acid chains and other structural features of the complex. Herein, (1) a machine learning model has been created and (2) its ability to integrate multiple features, such as those associated with amino acid chains, has been evaluated to improve the prediction of protein–protein interface hotspots. Virtual drug screening of a set of hotspots determined on the EphB2-ephrinB2 complex has also been performed. The model achieves an AUROC of 0.842, a sensitivity/recall of 0.833, and a specificity of 0.850. Virtual screening of the hotspots identified by the model has identified potential medications to treat diseases caused by the overexpression of the EphB2-ephrinB2 complex, including prostate, gastric, and colorectal cancers and melanoma, which are linked to EphB2 mutations. The efficacy of the model has been demonstrated through its ability to predict drug-disease associations previously identified in the literature, including cimetidine, idarubicin, and pralatrexate for these conditions. In addition, nadolol, a beta blocker, has been identified in this study as binding to the EphB2-ephrinB2 complex, and the possibility of this drug treating multiple cancers is still relatively unexplored.
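For readers unfamiliar with the three metrics reported above, the sketch below shows the standard definitions of sensitivity, specificity, and AUROC for a binary hotspot classifier. It is a generic illustration in plain Python, not the study's evaluation code; the labels and scores are toy values invented for the example.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels, where 1 = hotspot."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    # Sensitivity (recall): share of true hotspots that were predicted as hotspots.
    # Specificity: share of true non-hotspots that were predicted as non-hotspots.
    return tp / (tp + fn), tn / (tn + fp)


def auroc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUROC as the probability that a positive outscores a negative (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy labels and scores, invented for illustration only.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(toy_true, toy_pred)
toy_auc = auroc([0.9, 0.8, 0.4], [0.7, 0.3])
```

The pairwise AUROC above is quadratic in the number of samples, which is fine for a small illustration; library implementations use a sorted-rank formulation instead.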


Author(s):  
Xinyi Liu ◽  
Shaoyong Lu ◽  
Kun Song ◽  
Qiancheng Shen ◽  
Duan Ni ◽  
...  

Allosteric regulation is one of the most direct and efficient ways to fine-tune protein function; it is induced by the binding of a ligand at an allosteric site that is topographically distinct from an orthosteric site. The Allosteric Database (ASD, available online at http://mdl.shsmu.edu.cn/ASD) was developed ten years ago to provide comprehensive information related to allosteric regulation. In recent years, allosteric regulation has received great attention in biological research, bioengineering, and drug discovery, leading to the emergence of entire allosteric landscapes as allosteromes. To facilitate research from the perspective of the allosterome, in ASD 2019, novel features were curated as follows: (i) >10 000 potential allosteric sites of human proteins were deposited for allosteric drug discovery; (ii) 7 human allosterome maps, including protease and ion channel maps, were built to reveal allosteric evolution within families; (iii) 1312 somatic missense mutations at allosteric sites were collected from patient samples from 33 cancer types; and (iv) 1493 pharmacophores extracted from allosteric sites were provided for modulator screening. Over the past ten years, the ASD has become a central resource for studying allosteric regulation and will play more important roles in both target identification and allosteric drug discovery in the future.


2020 ◽  
Author(s):  
Hao Tian ◽  
Xi Jiang ◽  
Peng Tao

Allostery is considered important in regulating protein activity. Drug development depends on the understanding of allosteric mechanisms, especially the identification of allosteric sites, which is a prerequisite in drug discovery and design. Many computational methods have been developed for allosteric site prediction using pocket features and dynamics information. Here, we provide a novel ensemble model, consisting of eXtreme gradient boosting (XGBoost) and a graph convolutional neural network (GCNN), to predict allosteric sites. Our model can learn both physical properties and topological structure without any prior information and performed well under several indicators. Prediction results show that 84.9% of allosteric pockets in the testing proteins appeared in the top 3 positions. The Protein Allosteric Sites Server (PASSer, https://passer.smu.edu), along with a command line interface (CLI, https://github.com/smutaogroup/passerCLI), provides insights for further analysis in drug discovery.
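The abstract does not say how the XGBoost and GCNN outputs are combined. As one possible reading, the sketch below assumes each model emits a per-pocket probability and that the ensemble takes their unweighted mean before ranking. The averaging rule, the function name, and the probabilities are all assumptions made for illustration, not the published method.

```python
from typing import Dict, List


def ensemble_rank(
    probs_a: Dict[str, float], probs_b: Dict[str, float]
) -> List[str]:
    """Rank pockets by the mean of two models' probabilities (assumed rule).

    probs_a and probs_b map pocket id -> probability from each model;
    only pockets scored by both models are ranked.
    """
    shared = probs_a.keys() & probs_b.keys()
    averaged = {p: (probs_a[p] + probs_b[p]) / 2.0 for p in shared}
    # Highest averaged probability first; ties broken by pocket id for determinism.
    return sorted(averaged, key=lambda p: (-averaged[p], p))


# Hypothetical per-pocket probabilities standing in for the two models' outputs.
toy_xgb = {"a": 0.75, "b": 0.5, "c": 0.25}
toy_gcnn = {"a": 0.25, "b": 0.75, "c": 0.25}
toy_ranking = ensemble_rank(toy_xgb, toy_gcnn)
```

Pocket "b" ranks first here because its averaged probability (0.625) exceeds that of "a" (0.5), even though the first model alone would have preferred "a".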


2020 ◽  
Author(s):  
Charalambos Themistocleous ◽  
Bronte Ficek ◽  
Kimberly Webster ◽  
Dirk-Bart den Ouden ◽  
Argye E. Hillis ◽  
...  

Background: The classification of patients with Primary Progressive Aphasia (PPA) into variants is time-consuming, costly, and requires combined expertise by clinical neurologists, neuropsychologists, speech pathologists, and radiologists. Objective: The aim of the present study is to determine whether acoustic and linguistic variables provide accurate classification of PPA patients into one of three variants: nonfluent PPA, semantic PPA, and logopenic PPA. Methods: In this paper, we present a machine learning model based on Deep Neural Networks (DNN) for the subtyping of patients with PPA into three main variants, using combined acoustic and linguistic information elicited automatically via acoustic and linguistic analysis. The performance of the DNN was compared to the classification accuracy of Random Forests, Support Vector Machines, and Decision Trees, as well as expert clinicians’ classifications. Results: The DNN model outperformed the other machine learning models with 80% classification accuracy, providing reliable subtyping of patients with PPA into variants, and it even outperformed auditory classification of patients into variants by clinicians. Conclusions: We show that the combined speech and language markers from connected speech productions provide information about symptoms and variant subtyping in PPA. The end-to-end automated machine learning approach we present can enable clinicians and researchers to provide an easy, quick and inexpensive classification of patients with PPA.


2018 ◽  
Vol 59 (3) ◽  
pp. 1221-1229 ◽  
Author(s):  
Antonius P. A. Janssen ◽  
Sebastian H. Grimm ◽  
Ruud H. M. Wijdeven ◽  
Eelke B. Lenselink ◽  
Jacques Neefjes ◽  
...  

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Abolfazl Zargari Khuzani ◽  
Morteza Heidari ◽  
S. Ali Shariati

Chest X-ray (CXR) radiography can be used as a first-line triage process for non-COVID-19 patients with pneumonia. However, the similarity between features of CXR images of COVID-19 and pneumonia caused by other infections makes the differential diagnosis by radiologists challenging. We hypothesized that machine learning-based classifiers can reliably distinguish the CXR images of COVID-19 patients from other forms of pneumonia. We used a dimensionality reduction method to generate a set of optimal features of CXR images to build an efficient machine learning classifier that can distinguish COVID-19 cases from non-COVID-19 cases with high accuracy and sensitivity. By using global features of the whole CXR images, we successfully implemented our classifier using a relatively small dataset of CXR images. We propose that our COVID-Classifier can be used in conjunction with other tests for optimal allocation of hospital resources by rapid triage of non-COVID-19 cases.


2021 ◽  
pp. 1-10
Author(s):  
Charalambos Themistocleous ◽  
Bronte Ficek ◽  
Kimberly Webster ◽  
Dirk-Bart den Ouden ◽  
Argye E. Hillis ◽  
...  

Background: The classification of patients with primary progressive aphasia (PPA) into variants is time-consuming, costly, and requires combined expertise by clinical neurologists, neuropsychologists, speech pathologists, and radiologists. Objective: The aim of the present study is to determine whether acoustic and linguistic variables provide accurate classification of PPA patients into one of three variants: nonfluent PPA, semantic PPA, and logopenic PPA. Methods: In this paper, we present a machine learning model based on deep neural networks (DNN) for the subtyping of patients with PPA into three main variants, using combined acoustic and linguistic information elicited automatically via acoustic and linguistic analysis. The performance of the DNN was compared to the classification accuracy of Random Forests, Support Vector Machines, and Decision Trees, as well as to expert clinicians’ classifications. Results: The DNN model outperformed the other machine learning models as well as expert clinicians’ classifications with 80% classification accuracy. Importantly, 90% of patients with nfvPPA and 95% of patients with lvPPA were identified correctly, providing reliable subtyping of these patients into their corresponding PPA variants. Conclusion: We show that the combined speech and language markers from connected speech productions can inform variant subtyping in patients with PPA. The end-to-end automated machine learning approach we present can enable clinicians and researchers to provide an easy, quick, and inexpensive classification of patients with PPA.
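The results above report both an overall accuracy and per-variant identification rates. As a rough illustration of how the two relate, the sketch below computes overall accuracy and per-class recall for a three-class problem. The variant labels follow the abbreviations in the abstract, with "svPPA" assumed for the semantic variant, and the predictions are toy values, not the study's data.

```python
from collections import Counter
from typing import Dict, Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Share of patients whose predicted variant matches the reference label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def per_class_recall(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """For each variant, the share of its patients that were identified correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Toy reference labels and predictions, invented for illustration only.
toy_true = ["nfvPPA", "nfvPPA", "svPPA", "svPPA", "lvPPA", "lvPPA", "lvPPA", "lvPPA"]
toy_pred = ["nfvPPA", "svPPA", "svPPA", "svPPA", "lvPPA", "lvPPA", "lvPPA", "nfvPPA"]

toy_accuracy = accuracy(toy_true, toy_pred)
toy_recall = per_class_recall(toy_true, toy_pred)
```

Per-class recall can differ substantially from overall accuracy when the variants are unevenly represented, which is why the two are usually reported together.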

