De novo protein fold families expand the designable ligand binding site space

A major challenge in designing proteins de novo to bind user-defined ligands with high affinity is finding backbone structures into which a new binding site geometry can be engineered with high precision. Recent advances in methods to generate protein fold families de novo have expanded the space of accessible protein structures, but it is not clear to what extent de novo proteins with diverse geometries also expand the space of designable ligand binding functions. We constructed a library of 25,806 high-quality ligand binding sites and developed a fast protocol to place (“match”) these binding sites into both naturally occurring and de novo protein families with two fold topologies: Rossmann and NTF2. Each matching step involves engineering new binding site residues into each protein “scaffold”, which is distinct from the problem of comparing already existing binding pockets. 5,896 and 7,475 binding sites could be matched to the Rossmann and NTF2 fold families, respectively. De novo designed Rossmann and NTF2 protein families can support 1,791 and 678 binding sites, respectively, that cannot be matched to naturally existing structures with the same topologies. While the number of protein residues in ligand binding sites is the major determinant of matching success, ligand size and primary sequence separation of binding site residues also play important roles. The number of matched binding sites is a power-law function of the number of members in a fold family. Our results suggest that de novo sampling of geometric variations on diverse fold topologies can significantly expand the space of designable ligand binding sites for a wealth of possible new protein functions.

10.1101/2021.01.13.426598 ◽

2021 ◽

Author(s):

Xingjie Pan ◽

Tanja Kortemme

Keyword(s):

Ligand Binding ◽

Binding Site ◽

Binding Sites ◽

Method Development ◽

De Novo ◽

Protein Structures ◽

Protein Fold ◽

Ligand Binding Site ◽

Protein Families ◽

Ligand Binding Sites

Abstract: A major challenge in designing proteins de novo to bind user-defined ligands with high specificity and affinity is finding backbone structures that can accommodate a desired binding site geometry with high precision. Recent advances in methods to generate protein fold families de novo have expanded the space of accessible protein structures, but it is not clear to what extent de novo proteins with diverse geometries also expand the space of designable ligand binding functions. We constructed a library of 25,806 high-quality ligand binding sites and developed a fast protocol to place (“match”) these binding sites into both naturally occurring and de novo protein families with two fold topologies: Rossmann and NTF2. 5,896 and 7,475 binding sites could be matched to the Rossmann and NTF2 fold families, respectively. De novo designed Rossmann and NTF2 protein families can support 1,791 and 678 binding sites, respectively, that cannot be matched to naturally existing structures with the same topologies. While the number of protein residues in ligand binding sites is the major determinant of matching success, ligand size and primary sequence separation of binding site residues also play important roles. The number of matched binding sites is a power-law function of the number of members in a fold family. Our results suggest that de novo sampling of geometric variations on diverse fold topologies can significantly expand the space of designable ligand binding sites for a wealth of possible new protein functions.

Author summary: De novo design of proteins that can bind to novel and highly diverse user-defined small molecule ligands could have broad biomedical and synthetic biology applications. Because ligand binding site geometries need to be accommodated by protein backbone scaffolds at high accuracy, the diversity of scaffolds is a major limitation for designing new ligand binding functions. Advances in computational protein structure design methods have significantly increased the number of accessible stable scaffold structures. Understanding how many new ligand binding sites can be accommodated by the de novo scaffolds is important for designing novel ligand binding proteins. To answer this question, we constructed a large library of ligand binding sites from the Protein Data Bank (PDB). We tested the number of ligand binding sites that can be accommodated by de novo scaffolds and naturally existing scaffolds with the same fold topologies. The results showed that de novo scaffolds significantly expanded the ligand binding space of their respective fold topologies. We also identified factors that affect how difficult a binding site is to accommodate, as well as the relationship between the number of scaffolds and the accessible ligand binding site space. We believe our findings will benefit future method development and applications of ligand binding protein design.
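
The power-law relationship between fold-family size and the number of matched binding sites can be illustrated with a log-log linear regression. The following is a minimal sketch, not the authors' analysis: the function name `fit_power_law`, the family sizes, and the coefficients 50 and 0.4 are all hypothetical, and the data are synthetic values generated from an exact power law rather than results from the paper.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b by ordinary least squares on (log x, log y)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Slope of the regression line in log-log space is the exponent b.
    sxx = sum((x - mean_x) ** 2 for x in log_x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_x, log_y))
    b = sxy / sxx
    # Intercept in log space is log(a).
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic, illustrative data only (NOT taken from the paper):
# hypothetical family sizes and match counts following y = 50 * x**0.4.
family_sizes = [10, 30, 100, 300, 1000]
matched_sites = [50.0 * n ** 0.4 for n in family_sizes]

a, b = fit_power_law(family_sizes, matched_sites)
print(f"prefactor a = {a:.2f}, exponent b = {b:.2f}")
```

An exponent below 1 in such a fit would mean diminishing returns: each additional scaffold adds fewer newly matchable binding sites than the one before it.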

FTSite: high accuracy detection of ligand binding sites on unbound protein structures

Bioinformatics ◽

10.1093/bioinformatics/btr651 ◽

2011 ◽

Vol 28 (2) ◽

pp. 286-287 ◽

Cited By ~ 100

Author(s):

Chi-Ho Ngan ◽

David R. Hall ◽

Brandon Zerbe ◽

Laurie E. Grove ◽

Dima Kozakov ◽

...

Keyword(s):

Ligand Binding ◽

Binding Sites ◽

Protein Structures ◽

High Accuracy ◽

Ligand Binding Sites

BionoiNet: ligand-binding site classification with off-the-shelf deep neural network

Bioinformatics ◽

10.1093/bioinformatics/btaa094 ◽

2020 ◽

Vol 36 (10) ◽

pp. 3077-3083

Author(s):

Wentao Shi ◽

Jeffrey M Lemoine ◽

Abd-El-Monsif A Shawky ◽

Manali Singha ◽

Limeng Pu ◽

...

Keyword(s):

Neural Network ◽

Deep Learning ◽

Ligand Binding ◽

Binding Sites ◽

Protein Structures ◽

Supplementary Information ◽

Heme Binding ◽

Unseen Data ◽

Ligand Binding Sites ◽

Binding Pockets

Abstract

Motivation: Fast and accurate classification of ligand-binding sites in proteins with respect to the class of binding molecules is invaluable not only to the automatic functional annotation of large datasets of protein structures but also to projects in protein evolution, protein engineering and drug development. Deep learning techniques, which have already been successfully applied to address challenging problems across various fields, are inherently suitable to classify ligand-binding pockets. Our goal is to demonstrate that off-the-shelf deep learning models can be employed with minimum development effort to recognize nucleotide- and heme-binding sites with a comparable accuracy to highly specialized, voxel-based methods.

Results: We developed BionoiNet, a new deep learning-based framework implementing a popular ResNet model for image classification. BionoiNet first transforms the molecular structures of ligand-binding sites to 2D Voronoi diagrams, which are then used as the input to a pretrained convolutional neural network classifier. The ResNet model generalizes well to unseen data achieving the accuracy of 85.6% for nucleotide- and 91.3% for heme-binding pockets. BionoiNet also computes significance scores of pocket atoms, called BionoiScores, to provide meaningful insights into their interactions with ligand molecules. BionoiNet is a lightweight alternative to computationally expensive 3D architectures.

Availability and implementation: BionoiNet is implemented in Python with the source code freely available at: https://github.com/CSBG-LSU/BionoiNet.

Supplementary information: Supplementary data are available at Bioinformatics online.
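
The idea of turning a binding site into a 2D Voronoi image can be illustrated with a discrete Voronoi diagram, in which every pixel is labelled with its nearest site. This is a minimal sketch and not BionoiNet's implementation: it assumes the atoms have already been projected to 2D, and the function name `rasterize_voronoi`, the coordinates, and the 8x8 image size are hypothetical.

```python
import math


def rasterize_voronoi(points, width, height):
    """Return a height x width grid where each cell holds the index
    of the nearest 2D point (a discrete Voronoi diagram)."""
    grid = []
    for y in range(height):
        row = []
        for x in range(width):
            # Index of the point closest to pixel (x, y); ties go to the lowest index.
            nearest = min(
                range(len(points)),
                key=lambda i: math.dist((x, y), points[i]),
            )
            row.append(nearest)
        grid.append(row)
    return grid


# Hypothetical 2D-projected atom positions (illustrative values only).
atoms_2d = [(1.0, 1.0), (6.0, 2.0), (3.0, 6.0)]
image = rasterize_voronoi(atoms_2d, width=8, height=8)

for row in image:
    print("".join(str(cell) for cell in row))
```

In an image intended for a convolutional classifier, each cell index would typically be mapped to a colour encoding some atomic property; that mapping is left out here.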

BSAlign: A RAPID GRAPH-BASED ALGORITHM FOR DETECTING LIGAND-BINDING SITES IN PROTEIN STRUCTURES

Genome Informatics 2008 ◽

10.1142/9781848163324_0006 ◽

2008 ◽

Cited By ~ 5

Author(s):

ZEYAR AUNG ◽

JOO CHUAN TONG

Keyword(s):

Ligand Binding ◽

Binding Sites ◽

Protein Structures ◽

Ligand Binding Sites

webPDBinder: a server for the identification of ligand binding sites on protein structures

Nucleic Acids Research ◽

10.1093/nar/gkt457 ◽

2013 ◽

Vol 41 (W1) ◽

pp. W308-W313 ◽

Cited By ~ 3

Author(s):

Valerio Bianchi ◽

Iolanda Mangone ◽

Fabrizio Ferrè ◽

Manuela Helmer-Citterich ◽

Gabriele Ausiello

Keyword(s):

Ligand Binding ◽

Binding Sites ◽

Protein Structures ◽

Ligand Binding Sites

PDBspheres - a method for finding 3D similarities in local regions in proteins

10.1101/2022.01.04.474934 ◽

2022 ◽

Author(s):

Adam Zemla ◽

Jonathan E. Allen ◽

Dan Kirshner ◽

Felice C. Lightstone

Keyword(s):

Ligand Binding ◽

Binding Site ◽

Binding Sites ◽

Protein Structures ◽

Data Bank ◽

Structure Alignment ◽

Alignment Algorithm ◽

Site Prediction ◽

Binding Site Prediction ◽

Structure Similarity

We present a structure-based method for finding and evaluating structural similarities in protein regions relevant to ligand binding. PDBspheres comprises an exhaustive library of protein structure regions (spheres) adjacent to complexed ligands derived from the Protein Data Bank (PDB), along with methods to find and evaluate structural matches between a protein of interest and spheres in the library. Currently, PDBspheres library contains more than 2 million spheres, organized to facilitate searches by sequence and/or structure similarity of protein-ligand binding sites or interfaces between interacting molecules. PDBspheres uses the LGA structure alignment algorithm as the main engine for detecting structure similarities between the protein of interest and library spheres. An all-atom structure similarity metric ensures that sidechain placement is taken into account in the PDBspheres primary assessment of confidence in structural matches. In this paper, we (1) describe the PDBspheres method, (2) demonstrate how PDBspheres can be used to detect and characterize binding sites in protein structures, (3) compare PDBspheres use for binding site prediction with seven other binding site prediction methods using a curated dataset of 2,528 ligand-bound and ligand-free crystal structures, and (4) use PDBspheres to cluster pockets and assess structural similarities among protein binding sites of the 4,876 structures in the refined set of PDBbind 2019 dataset. The PDBspheres library is made publicly available for download at https://proteinmodel.org/AS2TS/PDBspheres
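
The notion of a protein "sphere" adjacent to a complexed ligand can be sketched as a distance-based residue selection. This is an illustrative sketch only, not the PDBspheres code: the function name `residues_near_ligand`, the 5.0 Å radius, and the coordinates are all assumptions, since the abstract does not state how the spheres are cut.

```python
import math


def residues_near_ligand(protein_atoms, ligand_atoms, radius):
    """Return the sorted ids of residues that have at least one atom
    within `radius` of any ligand atom.

    protein_atoms: list of (residue_id, (x, y, z)) tuples
    ligand_atoms:  list of (x, y, z) tuples
    """
    selected = set()
    for residue_id, coord in protein_atoms:
        if any(math.dist(coord, lig) <= radius for lig in ligand_atoms):
            selected.add(residue_id)
    return sorted(selected)


# Hypothetical toy coordinates in angstroms (illustrative values only).
protein_atoms = [
    (1, (0.0, 0.0, 0.0)),
    (1, (1.5, 0.0, 0.0)),
    (2, (6.0, 0.0, 0.0)),
    (3, (30.0, 0.0, 0.0)),
]
ligand_atoms = [(3.0, 0.0, 0.0), (4.0, 1.0, 0.0)]

# The radius is an assumed cutoff, not a value from the paper.
sphere = residues_near_ligand(protein_atoms, ligand_atoms, radius=5.0)
print(sphere)
```

The all-pairs loop is quadratic in the number of atoms; for whole-PDB libraries a spatial index such as a cell list or k-d tree would be the usual choice.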

DeepPocket: Ligand Binding Site Detection and Segmentation using 3D Convolutional Neural Networks

10.26434/chemrxiv.14611146 ◽

2021 ◽

Author(s):

Rishal Aggarwal ◽

Akash Gupta ◽

Vineeth Chelur ◽

C. V. Jawahar ◽

U. Deva Priyakumar

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Ligand Binding ◽

Binding Site ◽

Convolutional Neural Networks ◽

Binding Sites ◽

Protein Structures ◽

3D Structure ◽

Ligand Binding Site ◽

Structure Based Drug Design

A structure-based drug design pipeline involves the development of potential drug molecules or ligands that form stable complexes with a given receptor at its binding site. A prerequisite to this is finding druggable and functionally relevant binding sites on the 3D structure of the protein. Although several methods for detecting binding sites have been developed beforehand, a majority of them surprisingly fail in the identification and ranking of binding sites accurately. The rapid adoption and success of deep learning algorithms in various sections of structural biology beckons the usage of such algorithms for accurate binding site detection. As a combination of geometry based software and deep learning, we report a novel framework, DeepPocket, that utilises 3D convolutional neural networks for the rescoring of pockets identified by Fpocket and further segments these identified cavities on the protein surface. Apart from this, we also propose another dataset, SC6K, containing protein structures submitted in the Protein Data Bank (PDB) from January 2018 till February 2020 for ligand binding site (LBS) detection. DeepPocket's results on various binding site datasets and SC6K highlight its better performance over current state-of-the-art methods and good generalization ability over novel structures.

Systematic domain-based aggregation of protein structures highlights DNA-, RNA- and other ligand-binding positions

Nucleic Acids Research ◽

10.1093/nar/gky1224 ◽

2018 ◽

Vol 47 (2) ◽

pp. 582-593 ◽

Cited By ~ 5

Author(s):

Shilpa Nadimpalli Kobren ◽

Mona Singh

Keyword(s):

Ligand Binding ◽

Small Molecules ◽

Binding Sites ◽

Protein Structures ◽

Complex Structures ◽

Disease Etiology ◽

Systematic Assessment ◽

Protein Ligand Interactions ◽

Ligand Binding Sites ◽

Ligand Interactions

Abstract: Domains are fundamental subunits of proteins, and while they play major roles in facilitating protein–DNA, protein–RNA and other protein–ligand interactions, a systematic assessment of their various interaction modes is still lacking. A comprehensive resource identifying positions within domains that tend to interact with nucleic acids, small molecules and other ligands would expand our knowledge of domain functionality as well as aid in detecting ligand-binding sites within structurally uncharacterized proteins. Here, we introduce an approach to identify per-domain-position interaction ‘frequencies’ by aggregating protein co-complex structures by domain and ascertaining how often residues mapping to each domain position interact with ligands. We perform this domain-based analysis on ∼91000 co-complex structures, and infer positions involved in binding DNA, RNA, peptides, ions or small molecules across 4128 domains, which we refer to collectively as the InteracDome. Cross-validation testing reveals that ligand-binding positions for 2152 domains are highly consistent and can be used to identify residues facilitating interactions in ∼63–69% of human genes. Our resource of domain-inferred ligand-binding sites should be a great aid in understanding disease etiology: whereas these sites are enriched in Mendelian-associated and cancer somatic mutations, they are depleted in polymorphisms observed across healthy populations. The InteracDome is available at http://interacdome.princeton.edu.
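
The aggregation of per-domain-position interaction frequencies can be sketched as a simple count of how often each position contacts a ligand across structures. This is a simplified, unweighted illustration and not the InteracDome pipeline, which may weight or filter structures differently; the function name `binding_frequencies` and the toy instances are hypothetical.

```python
from collections import defaultdict


def binding_frequencies(instances):
    """Compute, for each domain position, the fraction of structures in
    which the residue mapped to that position contacts a ligand.

    instances: list of dicts mapping domain position -> bool (in contact).
    Positions missing from a dict were not aligned in that structure and
    are excluded from that position's denominator.
    """
    observed = defaultdict(int)
    contacts = defaultdict(int)
    for instance in instances:
        for position, in_contact in instance.items():
            observed[position] += 1
            if in_contact:
                contacts[position] += 1
    return {pos: contacts[pos] / observed[pos] for pos in sorted(observed)}


# Hypothetical domain instances from four co-complex structures
# (illustrative values only).
instances = [
    {1: False, 2: True, 3: True},
    {1: False, 2: True, 3: False},
    {1: False, 2: True},  # position 3 not aligned in this structure
    {1: True, 2: True, 3: False},
]

freqs = binding_frequencies(instances)
for position, freq in freqs.items():
    print(position, round(freq, 3))
```

Positions with a frequency close to 1 across many structures are the ones a resource of this kind would flag as consistently ligand-binding.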

Transcriptional Repression of the VC2105 Protein by Vibrio FadR Suggests that It Is a New Auxiliary Member of the fad Regulon

Applied and Environmental Microbiology ◽

10.1128/aem.00293-16 ◽

2016 ◽

Vol 82 (9) ◽

pp. 2819-2832 ◽

Cited By ~ 6

Author(s):

Rongsui Gao ◽

Jingxia Lin ◽

Han Zhang ◽

Youjun Feng

Keyword(s):

Transcriptional Regulation ◽

Fatty Acid ◽

Ligand Binding ◽

Binding Site ◽

Binding Sites ◽

Regulatory Proteins ◽

Content Type ◽

Ligand Binding Sites

ABSTRACT: Recently, our group along with others reported that the Vibrio FadR regulatory protein is unusual in that, unlike the prototypical fadR product of Escherichia coli, which has only one ligand-binding site, Vibrio FadR has two ligand-binding sites and represents a new mechanism for fatty acid sensing. The promoter region of the vc2105 gene, encoding a putative thioesterase, was mapped, and a putative FadR-binding site (AA CTG GTA AGA GCA CTT) was proposed. Different versions of the FadR regulatory proteins were prepared and purified to homogeneity. Both electrophoretic mobility shift assay (EMSA) and surface plasmon resonance (SPR) determined the direct interaction of the vc2105 gene with FadR proteins of various origins. Further, EMSAs illustrated that the addition of long-chain acyl-coenzyme A (CoA) species efficiently dissociates the vc2105 promoter from the FadR regulator. The expression level of the Vibrio cholerae vc2105 gene was elevated 2- to 3-fold in a fadR null mutant strain, validating that FadR is a repressor for the vc2105 gene. The β-galactosidase activity of a vc2105-lacZ transcriptional fusion was increased over 2-fold upon supplementation of growth medium with oleic acid. Unlike the fadD gene, a member of the Vibrio fad regulon, the VC2105 protein played no role in bacterial growth and virulence-associated gene expression of ctxAB (cholera toxin A/B) and tcpA (toxin coregulated pilus A). Given that the transcriptional regulation of vc2105 fits the criteria for fatty acid degradation (fad) genes, we suggested that it is a new member of the Vibrio fad regulon.

IMPORTANCE: The Vibrio FadR regulator is unusual in that it has two ligand-binding sites. Different versions of the FadR regulatory proteins were prepared and characterized in vitro and in vivo. An auxiliary fad gene (vc2105) from Vibrio was proposed that encodes a putative thioesterase and has a predicted FadR-binding site (AAC TGG TA A GAG CAC TT). The function of this putative binding site was proved using both EMSA and SPR. Further in vitro and in vivo experiments revealed that the Vibrio FadR is a repressor for the vc2105 gene. Unlike fadD, a member of the Vibrio fad regulon, VC2105 played no role in bacterial growth and expression of the two virulence-associated genes (ctxAB and tcpA). Therefore, since transcriptional regulation of vc2105 fits the criteria for fad genes, it seems likely that vc2105 acts as a new auxiliary member of the Vibrio fad regulon.
