GRaSP: a graph-based residue neighborhood strategy to predict binding sites

Charles A Santana; Sabrina de A Silveira; João P A Moraes; Sandro C Izidoro; Raquel C de Melo-Minardi; António J M Ribeiro; Jonathan D Tyzack; Neera Borkakoti; Janet M Thornton

doi:10.1093/bioinformatics/btaa805

Bioinformatics ◽

10.1093/bioinformatics/btaa805 ◽

2020 ◽

Vol 36 (Supplement_2) ◽

pp. i726-i734

Author(s):

Charles A Santana ◽

Sabrina de A Silveira ◽

João P A Moraes ◽

Sandro C Izidoro ◽

Raquel C de Melo-Minardi ◽

...

Keyword(s):

Ligand Binding ◽

Binding Site ◽

Binding Sites ◽

Protein Function ◽

Learning Strategy ◽

State Of The Art ◽

Supplementary Information ◽

Major Step ◽

Ligand Binding Sites ◽

Residue Neighborhood

Abstract

Motivation: The discovery of protein–ligand-binding sites is a major step in elucidating protein function and in investigating new functional roles. Detecting protein–ligand-binding sites experimentally is time-consuming and expensive. Thus, a variety of in silico methods to detect and predict binding sites have been proposed, as they can be scalable, fast and low-cost.

Results: We proposed Graph-based Residue neighborhood Strategy to Predict binding sites (GRaSP), a novel residue-centric and scalable method to predict ligand-binding site residues. It is based on a supervised learning strategy that models the residue environment as a graph at the atomic level. Results show that GRaSP made comparable or superior predictions when compared with methods described in the literature. GRaSP outperformed six other residue-centric methods, including the one considered state of the art. Our method also achieved better results than the method from the CAMEO independent assessment. GRaSP ranked second when compared with five state-of-the-art pocket-centric methods, which we consider a significant result, as it was not devised to predict pockets. Finally, our method proved scalable: it took 10–20 s on average to predict the binding site for a protein complex, whereas the state-of-the-art residue-centric method takes 2–5 h on average.

Availability and implementation: The source code and datasets are available at https://github.com/charles-abreu/GRaSP.

Supplementary information: Supplementary data are available at Bioinformatics online.
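The sketch below illustrates the atom-level graph idea from the abstract. It makes several assumptions. The 5 Å distance cutoff and the toy coordinates are hypothetical. `atom_graph` and `residue_neighborhoods` are illustrative names, not part of GRaSP's code. The features and the supervised classifier that GRaSP builds on top of such a graph are not shown.

```python
import math
from collections import defaultdict

# Hypothetical sketch, not the GRaSP implementation.
# Each atom is (residue_id, atom_name, (x, y, z)).

def atom_graph(atoms, cutoff=5.0):
    """Connect every pair of atoms closer than `cutoff` (assumed 5 Å)."""
    edges = []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            if math.dist(atoms[i][2], atoms[j][2]) <= cutoff:
                edges.append((i, j))
    return edges

def residue_neighborhoods(atoms, edges):
    """Map each residue to the other residues it touches through an atom-atom edge."""
    neigh = defaultdict(set)
    for i, j in edges:
        ri, rj = atoms[i][0], atoms[j][0]
        if ri != rj:  # intra-residue edges do not define neighbours
            neigh[ri].add(rj)
            neigh[rj].add(ri)
    return dict(neigh)

atoms = [
    ("HIS1", "NE2", (0.0, 0.0, 0.0)),
    ("HIS1", "CA", (1.5, 0.0, 0.0)),
    ("ASP2", "OD1", (3.0, 0.0, 0.0)),
    ("GLY3", "CA", (20.0, 0.0, 0.0)),  # far away, so it has no neighbours
]
edges = atom_graph(atoms)
print(residue_neighborhoods(atoms, edges))  # {'HIS1': {'ASP2'}, 'ASP2': {'HIS1'}}
```

In a real pipeline, each residue's neighbourhood would be turned into a feature vector for a classifier. The choice of cutoff strongly affects how many neighbours each residue gets.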

BionoiNet: ligand-binding site classification with off-the-shelf deep neural network

Bioinformatics ◽

10.1093/bioinformatics/btaa094 ◽

2020 ◽

Vol 36 (10) ◽

pp. 3077-3083

Author(s):

Wentao Shi ◽

Jeffrey M Lemoine ◽

Abd-El-Monsif A Shawky ◽

Manali Singha ◽

Limeng Pu ◽

...

Keyword(s):

Neural Network ◽

Deep Learning ◽

Ligand Binding ◽

Binding Sites ◽

Protein Structures ◽

Supplementary Information ◽

Heme Binding ◽

Unseen Data ◽

Ligand Binding Sites ◽

Binding Pockets

Abstract

Motivation: Fast and accurate classification of ligand-binding sites in proteins with respect to the class of binding molecules is invaluable not only for the automatic functional annotation of large datasets of protein structures but also for projects in protein evolution, protein engineering and drug development. Deep learning techniques, which have already been applied successfully to challenging problems across various fields, are inherently suitable for classifying ligand-binding pockets. Our goal is to demonstrate that off-the-shelf deep learning models can be employed with minimal development effort to recognize nucleotide- and heme-binding sites with accuracy comparable to that of highly specialized, voxel-based methods.

Results: We developed BionoiNet, a new deep learning-based framework implementing a popular ResNet model for image classification. BionoiNet first transforms the molecular structures of ligand-binding sites into 2D Voronoi diagrams, which are then used as input to a pretrained convolutional neural network classifier. The ResNet model generalizes well to unseen data, achieving an accuracy of 85.6% for nucleotide-binding and 91.3% for heme-binding pockets. BionoiNet also computes significance scores for pocket atoms, called BionoiScores, to provide meaningful insights into their interactions with ligand molecules. BionoiNet is a lightweight alternative to computationally expensive 3D architectures.

Availability and implementation: BionoiNet is implemented in Python, with the source code freely available at https://github.com/CSBG-LSU/BionoiNet.

Supplementary information: Supplementary data are available at Bioinformatics online.
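The Voronoi step described above can be sketched as a discretised diagram: each pixel takes the index of its nearest projected atom. This is a minimal sketch with several assumptions:

- It uses a naive projection that simply drops the z coordinate.
- `voronoi_raster` is a hypothetical name.
- BionoiNet's actual projection, atom colouring and ResNet stage are not reproduced.

```python
import math

# Hypothetical sketch of a rasterised Voronoi diagram, not BionoiNet's code.

def voronoi_raster(points_2d, width, height, extent):
    """Label every pixel with the index of its nearest seed point."""
    (xmin, xmax), (ymin, ymax) = extent
    grid = []
    for r in range(height):
        y = ymin + (r + 0.5) * (ymax - ymin) / height  # pixel-centre y
        row = []
        for c in range(width):
            x = xmin + (c + 0.5) * (xmax - xmin) / width  # pixel-centre x
            nearest = min(range(len(points_2d)),
                          key=lambda k: math.dist((x, y), points_2d[k]))
            row.append(nearest)
        grid.append(row)
    return grid

atoms_3d = [(-1.0, 0.0, 0.3), (1.0, 0.0, -0.2)]
projected = [(x, y) for x, y, _ in atoms_3d]  # assumption: drop z
grid = voronoi_raster(projected, width=4, height=2, extent=((-2, 2), (-1, 1)))
print(grid)  # [[0, 0, 1, 1], [0, 0, 1, 1]]
```

Each cell index could then be mapped to a colour encoding an atom property. The resulting image is the kind of 2D input a pretrained image classifier can consume.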

Recognizing ion ligand binding sites by SMO algorithm

BMC Molecular and Cell Biology ◽

10.1186/s12860-019-0237-9 ◽

2019 ◽

Vol 20 (S3) ◽

Cited By ~ 2

Author(s):

Shan Wang ◽

Xiuzhen Hu ◽

Zhenxing Feng ◽

Xiaojin Zhang ◽

Liu Liu ◽

...

Keyword(s):

Ligand Binding ◽

Binding Sites ◽

Protein Function ◽

Metal Ion ◽

Cross Validation ◽

Sequence Information ◽

Sequential Minimal Optimization ◽

Ligand Binding Sites ◽

SMO Algorithm ◽

Fold Cross Validation

Abstract

Background: In many important life activities, the execution of protein function depends on the interaction between proteins and ligands. Because ions are an important class of protein-binding ligands, identifying ion ligand-binding sites plays an important role in the study of protein function.

Results: In this study, four acid radical ion ligands (NO₂⁻, CO₃²⁻, SO₄²⁻, PO₄³⁻) and ten metal ion ligands (Zn²⁺, Cu²⁺, Fe²⁺, Fe³⁺, Ca²⁺, Mg²⁺, Mn²⁺, Na⁺, K⁺, Co²⁺) were selected as the research objects. A method based on the sequential minimal optimization (SMO) algorithm and sequence information was proposed, and better prediction results were obtained under 5-fold cross-validation.

Conclusions: An efficient method for predicting ion ligand-binding sites was presented.
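SMO trains a support vector machine by repeatedly optimizing two Lagrange multipliers at a time in closed form. The sketch below is a simplified teaching variant for a linear-kernel SVM. It picks the second multiplier at random instead of using Platt's selection heuristics. The 2D toy features and the values of `C`, `tol` and `max_passes` are assumptions for illustration; the sequence-derived features used in the study are not reproduced.

```python
import random

# Simplified SMO (random choice of the second multiplier); illustrative only,
# not the authors' implementation.

def linear_kernel(a, b):
    return sum(x * y for x, y in zip(a, b))

def train_smo(X, y, C=1.0, tol=1e-3, max_passes=20, seed=0):
    rng = random.Random(seed)
    m = len(X)
    alphas, b = [0.0] * m, 0.0
    K = [[linear_kernel(X[i], X[j]) for j in range(m)] for i in range(m)]

    def f(i):  # current decision value for training sample i
        return sum(alphas[k] * y[k] * K[k][i] for k in range(m)) + b

    passes = 0
    while passes < max_passes:
        changed = 0
        for i in range(m):
            Ei = f(i) - y[i]
            # Only optimise samples that violate the KKT conditions.
            if (y[i] * Ei < -tol and alphas[i] < C) or (y[i] * Ei > tol and alphas[i] > 0):
                j = rng.choice([k for k in range(m) if k != i])
                Ej = f(j) - y[j]
                ai_old, aj_old = alphas[i], alphas[j]
                if y[i] != y[j]:
                    L, H = max(0.0, aj_old - ai_old), min(C, C + aj_old - ai_old)
                else:
                    L, H = max(0.0, ai_old + aj_old - C), min(C, ai_old + aj_old)
                eta = 2 * K[i][j] - K[i][i] - K[j][j]
                if L == H or eta >= 0:
                    continue
                # Closed-form update of alpha_j, clipped to the feasible box [L, H].
                aj = min(H, max(L, aj_old - y[j] * (Ei - Ej) / eta))
                if abs(aj - aj_old) < 1e-5:
                    continue
                ai = ai_old + y[i] * y[j] * (aj_old - aj)
                alphas[i], alphas[j] = ai, aj
                b1 = b - Ei - y[i] * (ai - ai_old) * K[i][i] - y[j] * (aj - aj_old) * K[i][j]
                b2 = b - Ej - y[i] * (ai - ai_old) * K[i][j] - y[j] * (aj - aj_old) * K[j][j]
                b = b1 if 0 < ai < C else b2 if 0 < aj < C else (b1 + b2) / 2
                changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alphas, b

def predict(X, y, alphas, b, x):
    s = sum(a * yi * linear_kernel(xi, x) for a, yi, xi in zip(alphas, y, X)) + b
    return 1 if s >= 0 else -1

X = [(-2.0, -2.0), (-1.0, -2.0), (-2.0, -1.0), (2.0, 2.0), (1.0, 2.0), (2.0, 1.0)]
y = [-1, -1, -1, 1, 1, 1]
alphas, b = train_smo(X, y)
print([predict(X, y, alphas, b, x) for x in X])
```

Library implementations add working-set heuristics and kernel caching. The core two-variable update is the same.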

Transcriptional Repression of the VC2105 Protein by Vibrio FadR Suggests that It Is a New Auxiliary Member of the fad Regulon

Applied and Environmental Microbiology ◽

10.1128/aem.00293-16 ◽

2016 ◽

Vol 82 (9) ◽

pp. 2819-2832 ◽

Cited By ~ 6

Author(s):

Rongsui Gao ◽

Jingxia Lin ◽

Han Zhang ◽

Youjun Feng

Keyword(s):

Transcriptional Regulation ◽

Fatty Acid ◽

Ligand Binding ◽

Binding Site ◽

Binding Sites ◽

Regulatory Proteins ◽

Content Type ◽

Ligand Binding Sites

ABSTRACT

Recently, our group along with others reported that the Vibrio FadR regulatory protein is unusual. Unlike the prototypical fadR product of Escherichia coli, which has only one ligand-binding site, Vibrio FadR has two ligand-binding sites and represents a new mechanism for fatty acid sensing. The promoter region of the vc2105 gene, encoding a putative thioesterase, was mapped, and a putative FadR-binding site (AACTGGTAAGAGCACTT) was proposed. Different versions of the FadR regulatory proteins were prepared and purified to homogeneity. Both electrophoretic mobility shift assay (EMSA) and surface plasmon resonance (SPR) demonstrated the direct interaction of the vc2105 gene with FadR proteins of various origins. Further, EMSAs illustrated that the addition of long-chain acyl-coenzyme A (CoA) species efficiently dissociates the vc2105 promoter from the FadR regulator. The expression level of the Vibrio cholerae vc2105 gene was elevated 2- to 3-fold in a fadR null mutant strain, validating that FadR is a repressor of the vc2105 gene. The β-galactosidase activity of a vc2105-lacZ transcriptional fusion increased more than 2-fold upon supplementation of the growth medium with oleic acid. Unlike the fadD gene, a member of the Vibrio fad regulon, the VC2105 protein played no role in bacterial growth. Nor did it affect the expression of the virulence-associated genes ctxAB (cholera toxin A/B) and tcpA (toxin-coregulated pilus A). Given that the transcriptional regulation of vc2105 fits the criteria for fatty acid degradation (fad) genes, we suggest that it is a new member of the Vibrio fad regulon.

IMPORTANCE

The Vibrio FadR regulator is unusual in that it has two ligand-binding sites. Different versions of the FadR regulatory proteins were prepared and characterized in vitro and in vivo. An auxiliary fad gene (vc2105) from Vibrio was proposed that encodes a putative thioesterase and has a predicted FadR-binding site (AACTGGTAAGAGCACTT). The function of this putative binding site was demonstrated using both EMSA and SPR. Further in vitro and in vivo experiments revealed that Vibrio FadR is a repressor of the vc2105 gene. Unlike fadD, a member of the Vibrio fad regulon, VC2105 played no role in bacterial growth or in the expression of the two virulence-associated genes (ctxAB and tcpA). Therefore, since the transcriptional regulation of vc2105 fits the criteria for fad genes, it seems likely that vc2105 acts as a new auxiliary member of the Vibrio fad regulon.

Protein-ligand binding site detection as an alternative route to molecular docking and drug repurposing

Bio-Algorithms and Med-Systems ◽

10.1515/bams-2018-0004 ◽

2018 ◽

Vol 14 (2) ◽

Cited By ~ 1

Author(s):

Daniele Toti ◽

Gabriele Macari ◽

Fabio Polticelli

Keyword(s):

Molecular Docking ◽

Ligand Binding ◽

Binding Site ◽

Binding Sites ◽

Web Application ◽

Drug Repurposing ◽

Full Potential ◽

Ligand Binding Site ◽

Docking Simulations ◽

Ligand Binding Sites

Abstract

After the onset of the genomic era, the detection of ligand-binding sites in proteins has emerged over the last few years as a powerful tool for protein function prediction. Several approaches, both sequence- and structure-based, have been developed, but the full potential of the corresponding tools has not yet been exploited. Here, we describe the development and classification of a large, almost exhaustive collection of protein–ligand binding sites. The collection is intended to be used with the Ligand Binding Site Recognition Application, a web application developed in our laboratory. Together, they offer an alternative to virtual screening through molecular docking simulations for identifying novel lead compounds for known targets. Ligand binding sites derived from the Protein Data Bank have been clustered according to ligand similarity, so that, given a known ligand, the binding mode of related ligands to the same target can be predicted. The collection contains more than 200,000 ligand binding sites corresponding to more than 20,000 different ligands. Furthermore, the ligand binding sites of all Food and Drug Administration-approved drugs have been classified as well. This makes it possible to investigate whether each of them (and related compounds) could bind a given target, supporting drug repurposing and redesign initiatives. Sample use cases are also described to demonstrate the effectiveness of this approach.
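Clustering binding sites by ligand similarity can be illustrated with a common cheminformatics pattern: Tanimoto similarity on fingerprints stored as sets of on-bits, followed by a greedy "leader" clustering pass. The fingerprints, the 0.5 threshold and the clustering scheme here are hypothetical. The abstract does not state which similarity measure or clustering algorithm the authors used.

```python
# Hypothetical sketch: Tanimoto similarity plus greedy leader clustering.
# Not the authors' pipeline; fingerprints are toy sets of on-bit indices.

def tanimoto(a, b):
    """Tanimoto similarity |A & B| / |A | B| between two on-bit sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def leader_cluster(fps, threshold=0.5):
    """Assign each ligand to the first cluster whose leader is similar enough."""
    leaders, clusters = [], []
    for name, fp in fps.items():
        for idx, leader in enumerate(leaders):
            if tanimoto(fp, fps[leader]) >= threshold:
                clusters[idx].append(name)
                break
        else:  # no sufficiently similar leader: start a new cluster
            leaders.append(name)
            clusters.append([name])
    return clusters

fps = {"lig_A": {1, 2, 3, 4}, "lig_B": {1, 2, 3, 5}, "lig_C": {10, 11, 12}}
print(leader_cluster(fps))  # [['lig_A', 'lig_B'], ['lig_C']]
```

Given a query ligand, the same similarity function could retrieve the most similar clustered ligands and, through them, candidate binding modes.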

SXGBsite: Prediction of Protein–Ligand Binding Sites Using Sequence Information and Extreme Gradient Boosting

Genes ◽

10.3390/genes10120965 ◽

2019 ◽

Vol 10 (12) ◽

pp. 965 ◽

Cited By ~ 1

Author(s):

Ziqi Zhao ◽

Yonghong Xu ◽

Yong Zhao

Keyword(s):

Ligand Binding ◽

Binding Site ◽

Binding Sites ◽

Gradient Boosting ◽

Sequence Information ◽

Ligand Binding Site ◽

Extreme Gradient Boosting ◽

Ligand Binding Sites ◽

Independent Test ◽

Test Sets

The prediction of protein–ligand binding sites is important in drug discovery and drug design. Computational methods for protein–ligand binding site prediction are inexpensive and fast compared with experimental methods. This paper proposes a new computational method, SXGBsite, which combines the synthetic minority over-sampling technique (SMOTE) with Extreme Gradient Boosting (XGBoost). SXGBsite uses the position-specific scoring matrix discrete cosine transform (PSSM-DCT) and predicted solvent accessibility (PSA) to extract features containing sequence information. A new balanced dataset was generated by SMOTE to improve classifier performance, and a prediction model was constructed using XGBoost. XGBoost's parallel computing and regularization techniques enabled high-quality, fast predictions and mitigated the overfitting caused by SMOTE. The method was evaluated on independent test sets covering 12 different types of ligand binding sites. SXGBsite performed similarly to existing methods on eight of these test sets, with a faster computation time. SXGBsite may be applied as a complement to biological experiments.
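SMOTE creates each synthetic minority sample by interpolating between a randomly chosen minority sample and one of its k nearest minority neighbours. A standard-library sketch follows. The choice of `k=2`, the seed and the toy 2D feature vectors are assumptions. In SXGBsite the inputs would be PSSM-DCT and PSA feature vectors, and in practice a library implementation such as imbalanced-learn's `SMOTE` would typically be used.

```python
import math
import random

# Minimal SMOTE sketch; toy features, illustrative parameters.

def smote(minority, n_synthetic, k=2, seed=0):
    """Return n_synthetic points interpolated between minority samples and their neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base sample (excluding itself)
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

minority = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
print(smote(minority, 5))
```

Because every new point lies on a segment between two real minority samples, the synthetic data stay within the region spanned by that class. Unlike plain duplication, however, interpolation can still encourage overfitting, which the abstract says XGBoost's regularization helps mitigate.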

De novo protein fold families expand the designable ligand binding site space

PLoS Computational Biology ◽

10.1371/journal.pcbi.1009620 ◽

2021 ◽

Vol 17 (11) ◽

pp. e1009620

Author(s):

Xingjie Pan ◽

Tanja Kortemme

Keyword(s):

Ligand Binding ◽

Binding Site ◽

Binding Sites ◽

De Novo ◽

Protein Structures ◽

Protein Fold ◽

Protein Families ◽

Existing Structures ◽

Fold Family ◽

Ligand Binding Sites

A major challenge in designing proteins de novo to bind user-defined ligands with high affinity is finding backbone structures into which a new binding site geometry can be engineered with high precision. Recent advances in methods to generate protein fold families de novo have expanded the space of accessible protein structures. However, it is not clear to what extent de novo proteins with diverse geometries also expand the space of designable ligand binding functions. We constructed a library of 25,806 high-quality ligand binding sites. We also developed a fast protocol to place (“match”) these binding sites into both naturally occurring and de novo protein families with two fold topologies: Rossmann and NTF2. Each matching step involves engineering new binding site residues into each protein “scaffold”, which is distinct from the problem of comparing already existing binding pockets. In total, 5,896 and 7,475 binding sites could be matched to the Rossmann and NTF2 fold families, respectively. De novo designed Rossmann and NTF2 protein families can support 1,791 and 678 binding sites, respectively, that cannot be matched to naturally existing structures with the same topologies. While the number of protein residues in ligand binding sites is the major determinant of matching success, ligand size and the primary sequence separation of binding site residues also play important roles. The number of matched binding sites is a power-law function of the number of members in a fold family. Our results suggest that de novo sampling of geometric variations on diverse fold topologies can significantly expand the space of designable ligand binding sites for a wealth of possible new protein functions.
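A power-law relationship of the form y = a·x^k is commonly estimated by a least-squares straight-line fit in log-log space, where the slope is the exponent k. The sketch below assumes made-up family sizes and match counts; they are not the study's data, and the authors' fitting procedure is not stated in the abstract.

```python
import math

# Log-log least-squares fit of y = a * x**k; data below are synthetic.

def fit_power_law(xs, ys):
    """Return (a, k) minimising squared error of log(y) = log(a) + k*log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    mx, my = sum(lx) / n, sum(ly) / n
    k = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / sum((u - mx) ** 2 for u in lx)
    a = math.exp(my - k * mx)
    return a, k

family_sizes = [1, 2, 4, 8, 16]                     # hypothetical fold-family sizes
matched = [3 * s ** 0.5 for s in family_sizes]      # hypothetical match counts
a, k = fit_power_law(family_sizes, matched)
print(round(a, 6), round(k, 6))
```

An exponent k < 1 would indicate diminishing returns: each additional family member contributes fewer newly matchable binding sites.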

Can We Rely on Computational Predictions To Correctly Identify Ligand Binding Sites on Novel Protein Drug Targets? Assessment of Binding Site Prediction Methods and a Protocol for Validation of Predicted Binding Sites

Cell Biochemistry and Biophysics ◽

10.1007/s12013-016-0769-y ◽

2016 ◽

Vol 75 (1) ◽

pp. 15-23 ◽

Cited By ~ 20

Author(s):

Neal K. Broomhead ◽

Mahmoud E. Soliman

Keyword(s):

Ligand Binding ◽

Binding Site ◽

Binding Sites ◽

Drug Targets ◽

Prediction Methods ◽

Protein Drug ◽

Site Prediction ◽

Ligand Binding Sites ◽

Computational Predictions ◽

Novel Protein

Assessment of Molecular Mechanics-based Zn2+ Models in Mono- and Bimetallic Ligand Binding Sites

10.1101/2021.06.28.450184 ◽

2021 ◽

Author(s):

Okke Melse ◽

Iris Antes

Keyword(s):

Ligand Binding ◽

Binding Site ◽

Molecular Mechanics ◽

Binding Sites ◽

MD Simulations ◽

Evaluation Study ◽

Induced Dipole ◽

Dummy Atom ◽

Ligand Binding Sites ◽

Biomolecular Simulations

Zn2+ ions play an important role in biology, but accurate sampling of metalloproteins using molecular mechanics remains challenging. Several models have been proposed to describe Zn2+ in biomolecular simulations. They range from nonbonded models, employing classical 12-6 Lennard-Jones (LJ) potentials or extended LJ potentials, to dummy-atom models and bonded models. We evaluated the performance of a large variety of these Zn2+ models in two challenging environments for which little is known about their performance. These are a monometallic ligand binding site (carbonic anhydrase II) and a bimetallic ligand binding site (metallo-β-lactamase VIM-2). We focused on properties that are important for a stable, correct binding site description during molecular dynamics (MD) simulations, because proper treatment of the metal coordination and forces is essential here. We observed that the strongest difference in performance among these Zn2+ models lies in the description of interactions between Zn2+ and non-charged ligating atoms, such as the imidazole nitrogen in histidine residues. We further show that the nonbonded (12-6 LJ) models struggle most in describing Zn2+–biomolecule interactions. By contrast, including ion-induced dipole effects strongly improves the description of interactions between Zn2+ and non-charged ligating atoms. The octahedral dummy-atom models result in highly stable simulations and correct Zn2+ coordination. They are therefore highly suitable for binding sites containing an octahedrally coordinated Zn2+ ion. The results of this evaluation study in ligand binding sites can guide structural studies of Zn2+-containing proteins, such as MD refinement of docked ligand poses and long-term MD simulations.
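The difference between a classical 12-6 LJ term and an extended form with an ion-induced dipole contribution can be written as U(r) = C12/r^12 - C6/r^6 - C4/r^4, where the extra attractive 1/r^4 term is absent in the 12-6 model. A minimal sketch follows. The coefficients are arbitrary hypothetical values, not parameters from any published Zn2+ model.

```python
# Nonbonded pair energies; coefficients in the example are arbitrary, not force-field values.

def lj_12_6(r, c12, c6):
    """Classical 12-6 Lennard-Jones energy: repulsive r^-12 and dispersive r^-6 terms."""
    return c12 / r**12 - c6 / r**6

def lj_12_6_4(r, c12, c6, c4):
    """12-6 LJ plus an attractive -C4/r^4 term representing ion-induced dipole effects."""
    return lj_12_6(r, c12, c6) - c4 / r**4

r = 2.0
print(lj_12_6(r, 1000.0, 50.0), lj_12_6_4(r, 1000.0, 50.0, 20.0))
```

Because the r^-4 term decays more slowly than r^-6, it adds attraction at typical coordination distances without changing the short-range repulsion. This is one way such models can better capture interactions with non-charged ligating atoms.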

Mapping Co-regulatory Interactions among Ligand Binding sites in RyR1

10.22541/au.162006886.62484150/v1 ◽

2021 ◽

Author(s):

Venkat Chirasani ◽

Konstantin Popov ◽

Gerhard Meissner ◽

Nikolay Dokholyan

Keyword(s):

Ligand Binding ◽

Binding Site ◽

Binding Sites ◽

Pore Region ◽

Release Channel ◽

Regulatory Interactions ◽

Allosteric Control ◽

Skeletal Muscle Contraction ◽

Ligand Binding Sites ◽

Intracellular Calcium Ion

Ryanodine receptor 1 (RyR1) is an intracellular calcium ion (Ca2+) release channel required for skeletal muscle contraction. Cryo-electron microscopy identified the binding sites of three coactivators: Ca2+, ATP and caffeine (CFF). However, the mechanism of co-regulation and synergy of these activators is unknown. Here, we report allosteric connections among the three ligand binding sites and the pore region in four RyR1 states: (i) Ca2+-bound closed, (ii) ATP/CFF-bound closed, (iii) Ca2+/ATP/CFF-bound closed and (iv) Ca2+/ATP/CFF-bound open. We identified two dominant interactions that mediate communication between the Ca2+ binding site and the pore region in the Ca2+-bound closed state. These partially overlapped with the pore communications in the ATP/CFF-bound closed RyR1 state. In the Ca2+/ATP/CFF-bound closed and open RyR1 states, co-regulatory interactions were analogous to communications in the Ca2+-bound closed and ATP/CFF-bound closed states. Both the ATP- and CFF-binding sites mediate communication between the Ca2+ binding site and the pore region in the Ca2+/ATP/CFF-bound open RyR1 structure. We conclude that Ca2+, ATP and CFF propagate their effects to the pore region through a network of overlapping interactions that mediate allosteric control and molecular synergy in channel regulation.

De novo protein fold families expand the designable ligand binding site space

10.1101/2021.01.13.426598 ◽

2021 ◽

Author(s):

Xingjie Pan ◽

Tanja Kortemme

Keyword(s):

Ligand Binding ◽

Binding Site ◽

Binding Sites ◽

Method Development ◽

De Novo ◽

Protein Structures ◽

Protein Fold ◽

Ligand Binding Site ◽

Protein Families ◽

Ligand Binding Sites

Abstract

A major challenge in designing proteins de novo to bind user-defined ligands with high specificity and affinity is finding backbone structures that can accommodate a desired binding site geometry with high precision. Recent advances in methods to generate protein fold families de novo have expanded the space of accessible protein structures. However, it is not clear to what extent de novo proteins with diverse geometries also expand the space of designable ligand binding functions. We constructed a library of 25,806 high-quality ligand binding sites. We also developed a fast protocol to place (“match”) these binding sites into both naturally occurring and de novo protein families with two fold topologies: Rossmann and NTF2. In total, 5,896 and 7,475 binding sites could be matched to the Rossmann and NTF2 fold families, respectively. De novo designed Rossmann and NTF2 protein families can support 1,791 and 678 binding sites, respectively, that cannot be matched to naturally existing structures with the same topologies. While the number of protein residues in ligand binding sites is the major determinant of matching success, ligand size and the primary sequence separation of binding site residues also play important roles. The number of matched binding sites is a power-law function of the number of members in a fold family. Our results suggest that de novo sampling of geometric variations on diverse fold topologies can significantly expand the space of designable ligand binding sites for a wealth of possible new protein functions.

Author summary

De novo design of proteins that can bind to novel and highly diverse user-defined small-molecule ligands could have broad biomedical and synthetic biology applications. Ligand binding site geometries must be accommodated by protein backbone scaffolds with high accuracy. The diversity of available scaffolds is therefore a major limitation for designing new ligand binding functions. Advances in computational protein structure design methods have significantly increased the number of accessible stable scaffold structures. Understanding how many new ligand binding sites can be accommodated by de novo scaffolds is important for designing novel ligand binding proteins. To answer this question, we constructed a large library of ligand binding sites from the Protein Data Bank (PDB). We tested the number of ligand binding sites that can be accommodated by de novo scaffolds and by naturally existing scaffolds with the same fold topologies. The results showed that de novo scaffolds significantly expanded the ligand binding space of their respective fold topologies. We also identified factors that affect the difficulty of binding site accommodation. In addition, we characterized the relationship between the number of scaffolds and the accessible ligand binding site space. We believe our findings will benefit future method development and applications of ligand binding protein design.
