Can We Rely on Computational Predictions To Correctly Identify Ligand Binding Sites on Novel Protein Drug Targets? Assessment of Binding Site Prediction Methods and a Protocol for Validation of Predicted Binding Sites

2016 ◽  
Vol 75 (1) ◽  
pp. 15-23 ◽  
Author(s):  
Neal K. Broomhead ◽  
Mahmoud E. Soliman
2022 ◽  
Author(s):  
Adam Zemla ◽  
Jonathan E. Allen ◽  
Dan Kirshner ◽  
Felice C. Lightstone

We present a structure-based method for finding and evaluating structural similarities in protein regions relevant to ligand binding. PDBspheres comprises an exhaustive library of protein structure regions (spheres) adjacent to complexed ligands derived from the Protein Data Bank (PDB), along with methods to find and evaluate structural matches between a protein of interest and spheres in the library. Currently, PDBspheres library contains more than 2 million spheres, organized to facilitate searches by sequence and/or structure similarity of protein-ligand binding sites or interfaces between interacting molecules. PDBspheres uses the LGA structure alignment algorithm as the main engine for detecting structure similarities between the protein of interest and library spheres. An all-atom structure similarity metric ensures that sidechain placement is taken into account in the PDBspheres primary assessment of confidence in structural matches. In this paper, we (1) describe the PDBspheres method, (2) demonstrate how PDBspheres can be used to detect and characterize binding sites in protein structures, (3) compare PDBspheres use for binding site prediction with seven other binding site prediction methods using a curated dataset of 2,528 ligand-bound and ligand-free crystal structures, and (4) use PDBspheres to cluster pockets and assess structural similarities among protein binding sites of the 4,876 structures in the refined set of PDBbind 2019 dataset. The PDBspheres library is made publicly available for download at https://proteinmodel.org/AS2TS/PDBspheres
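The idea of a "sphere" (the protein region adjacent to a complexed ligand) can be illustrated with a minimal sketch. This is not the PDBspheres implementation: the function name `extract_sphere`, the input layout, and the radius values are assumptions made for illustration, and the abstract does not state the radius the library actually uses.

```python
import math

def extract_sphere(protein_atoms, ligand_atoms, radius=12.0):
    """Return sorted residue ids with at least one atom within `radius`
    angstroms of any ligand atom. The default radius is a placeholder."""
    selected = set()
    for res_id, coord in protein_atoms:
        if any(math.dist(coord, lig) <= radius for lig in ligand_atoms):
            selected.add(res_id)
    return sorted(selected)

# Toy coordinates (hypothetical): (residue id, (x, y, z)) per protein atom.
protein_atoms = [
    (1, (0.0, 0.0, 0.0)),
    (1, (1.5, 0.0, 0.0)),
    (2, (5.0, 0.0, 0.0)),
    (3, (30.0, 0.0, 0.0)),
]
ligand_atoms = [(2.0, 0.0, 0.0), (3.0, 1.0, 0.0)]

sphere = extract_sphere(protein_atoms, ligand_atoms, radius=4.0)
print(sphere)  # [1, 2]
```

Matching a query protein against such spheres would then be a structure alignment problem, which the abstract attributes to LGA and which this sketch does not attempt.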


2019 ◽  
Vol 47 (W1) ◽  
pp. W345-W349 ◽  
Author(s):  
Lukas Jendele ◽  
Radoslav Krivak ◽  
Petr Skoda ◽  
Marian Novotny ◽  
David Hoksza

Abstract PrankWeb is an online resource providing an interface to P2Rank, a state-of-the-art method for ligand binding site prediction. P2Rank is a template-free machine learning method based on the prediction of local chemical neighborhood ligandability centered on points placed on a solvent-accessible protein surface. Points with a high ligandability score are then clustered to form the resulting ligand binding sites. In addition, PrankWeb provides a web interface enabling users to easily carry out the prediction and visually inspect the predicted binding sites via an integrated sequence-structure view. Moreover, PrankWeb can determine sequence conservation for the input molecule and use this in both the prediction and result visualization steps. Alongside its online visualization options, PrankWeb also offers the possibility of exporting the results as a PyMOL script for offline visualization. The web frontend communicates with the server side via a REST API. In high-throughput scenarios, therefore, users can utilize the server API directly, bypassing the need for a web-based frontend or installation of the P2Rank application. PrankWeb is available at http://prankweb.cz/, while the web application source code and the P2Rank method can be accessed at https://github.com/jendelel/PrankWebApp and https://github.com/rdk/p2rank, respectively.
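The step in which high-scoring surface points are clustered into binding sites can be sketched as follows. This is a simplified illustration, not P2Rank's code: the score threshold, the linking distance, the single-linkage rule, and the ranking by summed score are all assumptions chosen for the example.

```python
import math

def cluster_ligandable_points(points, scores, score_threshold=0.5, link_distance=3.0):
    """Keep points scoring >= score_threshold, join them into single-linkage
    clusters (points closer than link_distance share a cluster), and return
    clusters as lists of point indices, best summed score first."""
    unvisited = {i for i, s in enumerate(scores) if s >= score_threshold}
    clusters = []
    while unvisited:
        seed = min(unvisited)
        unvisited.remove(seed)
        cluster, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            near = [j for j in unvisited
                    if math.dist(points[current], points[j]) <= link_distance]
            for j in near:
                unvisited.remove(j)
            cluster.extend(near)
            frontier.extend(near)
        clusters.append(sorted(cluster))
    clusters.sort(key=lambda c: sum(scores[i] for i in c), reverse=True)
    return clusters

# Toy surface points along a line, with made-up ligandability scores.
points = [(0, 0, 0), (2, 0, 0), (4, 0, 0), (20, 0, 0), (21, 0, 0), (50, 0, 0)]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.1]

sites = cluster_ligandable_points(points, scores)
print(sites)  # [[0, 1, 2], [3, 4]]
```

Points 0 and 2 are farther apart than the linking distance but end up in the same site because point 1 bridges them, which is the defining behaviour of single-linkage clustering.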


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Mingjian Jiang ◽  
Zhen Li ◽  
Yujie Bian ◽  
Zhiqiang Wei

Abstract Background Binding sites are the pockets of proteins that can bind drugs; the discovery of these pockets is a critical step in drug design. With the help of computers, protein pocket prediction can save manpower and financial resources. Results In this paper, a novel protein descriptor for the prediction of binding sites is proposed. Information on non-bonded interactions in the three-dimensional structure of a protein is captured by a combination of geometry-based and energy-based methods. Moreover, due to the rapid development of deep learning, all binding features are extracted to generate three-dimensional grids that are fed into a convolutional neural network. Two datasets were introduced into the experiment. The sc-PDB dataset was used for descriptor extraction and binding site prediction, and the PDBbind dataset was used only for testing and verification of the generalization of the method. The comparison with previous methods shows that the proposed descriptor is effective in predicting the binding sites. Conclusions A new protein descriptor is proposed for the prediction of the drug binding sites of proteins. This method combines the three-dimensional structure of a protein and non-bonded interactions with small molecules to involve important factors influencing the formation of a binding site. Analysis of the experiments indicates that the descriptor is robust for site prediction.
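Turning per-atom features into a three-dimensional grid suitable for a convolutional network can be sketched as below. This is only an illustration of voxelization in general: the grid size, the spacing, the single feature channel, and the function name `voxelize` are assumptions, and the abstract does not describe the descriptor at this level of detail.

```python
def voxelize(atoms, origin, grid_size=8, spacing=1.0):
    """Sum a per-atom feature value into a cubic grid.

    atoms  : list of ((x, y, z), value) pairs
    origin : (x, y, z) of the grid corner with the smallest coordinates
    Returns grid[ix][iy][iz]; atoms falling outside the grid are ignored.
    """
    grid = [[[0.0] * grid_size for _ in range(grid_size)]
            for _ in range(grid_size)]
    for coord, value in atoms:
        idx = [int((c - o) // spacing) for c, o in zip(coord, origin)]
        if all(0 <= i < grid_size for i in idx):
            grid[idx[0]][idx[1]][idx[2]] += value
    return grid

# Hypothetical atoms with one made-up feature value each.
atoms = [
    ((0.2, 0.2, 0.2), 1.0),
    ((0.8, 0.1, 0.3), 0.5),   # same voxel as the first atom
    ((3.5, 2.1, 7.9), 2.0),
    ((9.0, 0.0, 0.0), 4.0),   # outside an 8 x 8 x 8 grid, so dropped
]
grid = voxelize(atoms, origin=(0.0, 0.0, 0.0))
print(grid[0][0][0], grid[3][2][7])  # 1.5 2.0
```

A descriptor with several feature types would typically use one such grid per feature, stacked as input channels.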


2016 ◽  
Vol 82 (9) ◽  
pp. 2819-2832 ◽  
Author(s):  
Rongsui Gao ◽  
Jingxia Lin ◽  
Han Zhang ◽  
Youjun Feng

ABSTRACT Recently, our group along with others reported that the Vibrio FadR regulatory protein is unusual in that, unlike the prototypical fadR product of Escherichia coli, which has only one ligand-binding site, Vibrio FadR has two ligand-binding sites and represents a new mechanism for fatty acid sensing. The promoter region of the vc2105 gene, encoding a putative thioesterase, was mapped, and a putative FadR-binding site (AA CTG GTA AGA GCA CTT) was proposed. Different versions of the FadR regulatory proteins were prepared and purified to homogeneity. Both electrophoretic mobility shift assay (EMSA) and surface plasmon resonance (SPR) determined the direct interaction of the vc2105 gene with FadR proteins of various origins. Further, EMSAs illustrated that the addition of long-chain acyl-coenzyme A (CoA) species efficiently dissociates the vc2105 promoter from the FadR regulator. The expression level of the Vibrio cholerae vc2105 gene was elevated 2- to 3-fold in a fadR null mutant strain, validating that FadR is a repressor for the vc2105 gene. The β-galactosidase activity of a vc2105-lacZ transcriptional fusion was increased over 2-fold upon supplementation of growth medium with oleic acid. Unlike the fadD gene, a member of the Vibrio fad regulon, the VC2105 protein played no role in bacterial growth and virulence-associated gene expression of ctxAB (cholera toxin A/B) and tcpA (toxin coregulated pilus A). Given that the transcriptional regulation of vc2105 fits the criteria for fatty acid degradation (fad) genes, we suggested that it is a new member of the Vibrio fad regulon.
IMPORTANCE The Vibrio FadR regulator is unusual in that it has two ligand-binding sites. Different versions of the FadR regulatory proteins were prepared and characterized in vitro and in vivo. An auxiliary fad gene (vc2105) from Vibrio was proposed that encodes a putative thioesterase and has a predicted FadR-binding site (AAC TGG TA A GAG CAC TT). The function of this putative binding site was proved using both EMSA and SPR. Further in vitro and in vivo experiments revealed that the Vibrio FadR is a repressor for the vc2105 gene. Unlike fadD, a member of the Vibrio fad regulon, VC2105 played no role in bacterial growth and expression of the two virulence-associated genes (ctxAB and tcpA). Therefore, since transcriptional regulation of vc2105 fits the criteria for fad genes, it seems likely that vc2105 acts as a new auxiliary member of the Vibrio fad regulon.


PLoS ONE ◽  
2021 ◽  
Vol 16 (2) ◽  
pp. e0246181
Author(s):  
Matthew R. Freidel ◽  
Roger S. Armen

The 2019 emergence of SARS-CoV-2 has tragically taken an immense toll on human life and had far-reaching impacts on society. There is a need to identify effective antivirals with diverse mechanisms of action in order to accelerate preclinical development. This study focused on five of the most established drug target proteins for direct-acting small-molecule antivirals: Nsp5 Main Protease, Nsp12 RNA-dependent RNA polymerase, Nsp13 Helicase, Nsp16 2’-O methyltransferase and the S2 subunit of the Spike protein. A workflow of solvent mapping and free energy calculations was used to identify and characterize favorable small-molecule binding sites for an aromatic pharmacophore (benzene). After identifying the most favorable sites, calculated ligand efficiencies were compared utilizing computational fragment screening. The most favorable sites overall were located on Nsp12 and Nsp16, whereas the most favorable sites for Nsp13 and S2 Spike had lower ligand efficiencies than those on Nsp12 and Nsp16. Utilizing fragment screening on numerous possible sites on Nsp13 helicase, we identified a favorable allosteric site on the N-terminal zinc binding domain (ZBD) that may be amenable to virtual or biophysical fragment screening efforts. Recent structural studies of the Nsp12:Nsp13 replication-transcription complex experimentally corroborate ligand binding at this site, which is revealed to be a functional Nsp8:Nsp13 protein-protein interaction site in the complex. Detailed structural analysis of Nsp13 ZBD conformations shows the role of induced-fit flexibility in this ligand binding site and identifies which conformational states are associated with efficient ligand binding. We hope that this map of over 200 possible small-molecule binding sites for these drug targets may be of use for ongoing discovery, design, and drug repurposing efforts. This information may be used to prioritize screening efforts or aid in the process of deciphering how a screening hit may bind to a specific target protein.
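Ligand efficiency is commonly defined as the binding free energy divided by the number of heavy (non-hydrogen) atoms, with the sign flipped so that larger values are better. The sketch below assumes that common definition; the abstract does not state which variant the authors used, and the free energy value here is invented purely to show the arithmetic.

```python
def ligand_efficiency(delta_g_kcal_per_mol, heavy_atom_count):
    """Common definition: LE = -deltaG / N_heavy, in kcal/mol per heavy atom."""
    if heavy_atom_count <= 0:
        raise ValueError("heavy_atom_count must be positive")
    return -delta_g_kcal_per_mol / heavy_atom_count

# Benzene has 6 heavy atoms; the -3.0 kcal/mol is a made-up example value.
le_benzene = ligand_efficiency(-3.0, 6)
print(le_benzene)  # 0.5
```

Normalizing by size in this way is what lets a small probe such as benzene be compared across sites, or against larger fragments, on the same scale.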


2018 ◽  
Vol 14 (2) ◽  
Author(s):  
Daniele Toti ◽  
Gabriele Macari ◽  
Fabio Polticelli

Abstract After the onset of the genomic era, the detection of ligand binding sites in proteins has emerged over the last few years as a powerful tool for protein function prediction. Several approaches, both sequence and structure based, have been developed, but the full potential of the corresponding tools has not been exploited yet. Here, we describe the development and classification of a large, almost exhaustive, collection of protein-ligand binding sites to be used, in conjunction with the Ligand Binding Site Recognition Application Web Application developed in our laboratory, as an alternative to virtual screening through molecular docking simulations to identify novel lead compounds for known targets. Ligand binding sites derived from the Protein Data Bank have been clustered according to ligand similarity, and given a known ligand, the binding mode of related ligands to the same target can be predicted. The collection of ligand binding sites contains more than 200,000 sites corresponding to more than 20,000 different ligands. Furthermore, the ligand binding sites of all Food and Drug Administration-approved drugs have been classified as well, making it possible to investigate the binding of each of them (and related compounds) to a given target for drug repurposing and redesign initiatives. Sample usage cases are also described to demonstrate the effectiveness of this approach.


Genes ◽  
2019 ◽  
Vol 10 (12) ◽  
pp. 965 ◽  
Author(s):  
Ziqi Zhao ◽  
Yonghong Xu ◽  
Yong Zhao

The prediction of protein–ligand binding sites is important in drug discovery and drug design. Protein–ligand binding site prediction computational methods are inexpensive and fast compared with experimental methods. This paper proposes a new computational method, SXGBsite, which includes the synthetic minority over-sampling technique (SMOTE) and the Extreme Gradient Boosting (XGBoost). SXGBsite uses the position-specific scoring matrix discrete cosine transform (PSSM-DCT) and predicted solvent accessibility (PSA) to extract features containing sequence information. A new balanced dataset was generated by SMOTE to improve classifier performance, and a prediction model was constructed using XGBoost. The parallel computing and regularization techniques enabled high-quality and fast predictions and mitigated overfitting caused by SMOTE. An evaluation using 12 different types of ligand binding site independent test sets showed that SXGBsite performs similarly to the existing methods on eight of the independent test sets with a faster computation time. SXGBsite may be applied as a complement to biological experiments.
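The oversampling idea behind SMOTE can be sketched in a few lines: each synthetic minority sample is placed at a random position on the segment between a minority sample and one of its nearest minority neighbours. This is a teaching sketch and not SXGBsite's code; the function name, the parameter `k`, and the toy data are assumptions, and a real pipeline would more likely rely on an established library implementation.

```python
import math
import random

def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic samples by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        others = [p for i, p in enumerate(minority) if i != base_idx]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic

# Toy minority class: the four corners of the unit square.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_samples = smote(minority, n_new=5)
print(len(new_samples))  # 5
```

Because every synthetic point lies between two real minority samples, the new points stay inside the region the minority class already occupies, which is also why a regularized classifier is useful to limit overfitting to that region.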


2020 ◽  
Vol 36 (Supplement_2) ◽  
pp. i726-i734
Author(s):  
Charles A Santana ◽  
Sabrina de A Silveira ◽  
João P A Moraes ◽  
Sandro C Izidoro ◽  
Raquel C de Melo-Minardi ◽  
...  

Abstract Motivation The discovery of protein–ligand-binding sites is a major step for elucidating protein function and for investigating new functional roles. Detecting protein–ligand-binding sites experimentally is time-consuming and expensive. Thus, a variety of in silico methods to detect and predict binding sites have been proposed, as they can be scalable, fast and low cost. Results We proposed Graph-based Residue neighborhood Strategy to Predict binding sites (GRaSP), a novel residue centric and scalable method to predict ligand-binding site residues. It is based on a supervised learning strategy that models the residue environment as a graph at the atomic level. Results show that GRaSP made compatible or superior predictions when compared with methods described in the literature. GRaSP outperformed six other residue-centric methods, including the one considered as state-of-the-art. Also, our method achieved better results than the method from CAMEO independent assessment. GRaSP ranked second when compared with five state-of-the-art pocket-centric methods, which we consider a significant result, as it was not devised to predict pockets. Finally, our method proved scalable as it took 10–20 s on average to predict the binding site for a protein complex whereas the state-of-the-art residue-centric method takes 2–5 h on average. Availability and implementation The source code and datasets are available at https://github.com/charles-abreu/GRaSP. Supplementary information Supplementary data are available at Bioinformatics online.
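Modelling a residue's environment as a graph can be illustrated with a simple contact graph in which two residues are linked when any pair of their atoms lies within a distance cutoff. This is a generic sketch and not the GRaSP graph construction: the cutoff, the residue labels, and the function name are assumptions, and the abstract does not specify how nodes, edges, or features are defined.

```python
import math
from itertools import combinations

def residue_contact_graph(atoms, cutoff=4.5):
    """Build an undirected graph as {residue: set of neighbouring residues}.
    Two residues are neighbours if any of their atoms are within `cutoff`."""
    residues = {}
    for res_id, coord in atoms:
        residues.setdefault(res_id, []).append(coord)
    graph = {res_id: set() for res_id in residues}
    for a, b in combinations(residues, 2):
        if any(math.dist(p, q) <= cutoff
               for p in residues[a] for q in residues[b]):
            graph[a].add(b)
            graph[b].add(a)
    return graph

# Toy atoms (hypothetical): (residue label, (x, y, z)).
atoms = [
    ("A1", (0.0, 0.0, 0.0)),
    ("A1", (1.0, 0.0, 0.0)),
    ("A2", (4.0, 0.0, 0.0)),
    ("A3", (7.0, 0.0, 0.0)),
    ("A4", (30.0, 0.0, 0.0)),
]
graph = residue_contact_graph(atoms)
print(sorted(graph["A2"]))  # ['A1', 'A3']
```

A supervised model would then be trained on features computed from each residue's neighbourhood in such a graph, which is the part this sketch leaves out.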

