Census of halide-binding sites in protein structures

2020 ◽  
Vol 36 (10) ◽  
pp. 3064-3071
Author(s):  
Rostislav K Skitchenko ◽  
Dmitrii Usoltsev ◽  
Mayya Uspenskaya ◽  
Andrey V Kajava ◽  
Albert Guskov

Abstract Motivation Halides are negatively charged ions of halogens, forming fluorides (F−), chlorides (Cl−), bromides (Br−) and iodides (I−). These anions are quite reactive and interact both specifically and non-specifically with proteins. Despite their ubiquitous presence and important roles in protein function, little is known about the binding preferences of halides in proteins. To address this problem, we analysed halide–protein interactions based on entries in the Protein Data Bank. Results We have compiled a pipeline for the quick analysis of halide-binding sites in proteins using available software. Our analysis revealed that all halides are strongly attracted by the guanidinium moiety of arginine side chains; however, individual halides also show certain preferences for other partners. Furthermore, there is a certain preference for coordination numbers in the binding sites, with a correlation between coordination numbers and amino acid composition. This pipeline can be used as a tool for the analysis of specific halide–protein interactions and can assist phasing experiments relying on halides as anomalous scatterers. Availability and implementation All data described in this article can be reproduced via the compiled pipeline published at https://github.com/rostkick/Halide_sites/blob/master/README.md. Supplementary information Supplementary data are available at Bioinformatics online.
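The idea of a coordination number for a halide site can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name, the tuple layout, the coordinates and the 4.0 Å cutoff are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def coordination_shell(ion_xyz, atoms, cutoff=4.0):
    """Return (residue, atom) pairs lying within `cutoff` angstroms of the ion.

    `atoms` is an iterable of (residue_name, atom_name, (x, y, z)) tuples.
    The 4.0 A default is an illustrative assumption, not a value from the paper.
    """
    return [
        (res, name)
        for res, name, xyz in atoms
        if math.dist(ion_xyz, xyz) <= cutoff
    ]

# Hypothetical coordinates for a chloride ion and nearby protein atoms.
chloride = (0.0, 0.0, 0.0)
atoms = [
    ("ARG", "NH1", (3.0, 0.0, 0.0)),
    ("ARG", "NH2", (0.0, 3.2, 0.0)),
    ("SER", "OG", (0.0, 0.0, 3.5)),
    ("LEU", "CD1", (6.0, 6.0, 6.0)),  # beyond the cutoff, so not counted
]

shell = coordination_shell(chloride, atoms)
coordination_number = len(shell)                    # atoms in the first shell
residue_counts = Counter(res for res, _ in shell)   # amino acid composition
```

Tallying `residue_counts` per coordination number across many sites is one plausible way to look for the kind of correlation with amino acid composition that the abstract describes.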

2018 ◽  
Vol 39 (3) ◽  
Author(s):  
Kyle T. Helzer ◽  
Mary Szatkowski Ozers ◽  
Mark B. Meyer ◽  
Nancy A. Benkusky ◽  
Natalia Solodin ◽  
...  

ABSTRACT Posttranslational modifications are key regulators of protein function, providing cues that can alter protein interactions and cellular location. Phosphorylation of estrogen receptor α (ER) at serine 118 (pS118-ER) occurs in response to multiple stimuli and is involved in modulating ER-dependent gene transcription. While the cistrome of ER is well established, surprisingly little is understood about how phosphorylation impacts ER-DNA binding activity. To define the pS118-ER cistrome, chromatin immunoprecipitation sequencing was performed on pS118-ER and ER in MCF-7 cells treated with estrogen. pS118-ER occupied a subset of ER binding sites which were associated with an active enhancer mark, acetylated H3K27. Unlike ER, pS118-ER sites were enriched in GRHL2 DNA binding motifs, and estrogen treatment increased GRHL2 recruitment to sites occupied by pS118-ER. Additionally, pS118-ER occupancy sites showed greater enrichment of full-length estrogen response elements relative to ER sites. In an in vitro DNA binding array of genomic binding sites, pS118-ER was more commonly associated with direct DNA binding events than indirect binding events. These results indicate that phosphorylation of ER at serine 118 promotes direct DNA binding at active enhancers and is a distinguishing mark for associated transcription factor complexes on chromatin.


2020 ◽  
Vol 36 (10) ◽  
pp. 3077-3083
Author(s):  
Wentao Shi ◽  
Jeffrey M Lemoine ◽  
Abd-El-Monsif A Shawky ◽  
Manali Singha ◽  
Limeng Pu ◽  
...  

Abstract Motivation Fast and accurate classification of ligand-binding sites in proteins with respect to the class of binding molecules is invaluable not only to the automatic functional annotation of large datasets of protein structures but also to projects in protein evolution, protein engineering and drug development. Deep learning techniques, which have already been successfully applied to address challenging problems across various fields, are inherently suitable to classify ligand-binding pockets. Our goal is to demonstrate that off-the-shelf deep learning models can be employed with minimum development effort to recognize nucleotide- and heme-binding sites with accuracy comparable to that of highly specialized, voxel-based methods. Results We developed BionoiNet, a new deep learning-based framework implementing a popular ResNet model for image classification. BionoiNet first transforms the molecular structures of ligand-binding sites to 2D Voronoi diagrams, which are then used as the input to a pretrained convolutional neural network classifier. The ResNet model generalizes well to unseen data, achieving an accuracy of 85.6% for nucleotide- and 91.3% for heme-binding pockets. BionoiNet also computes significance scores of pocket atoms, called BionoiScores, to provide meaningful insights into their interactions with ligand molecules. BionoiNet is a lightweight alternative to computationally expensive 3D architectures. Availability and implementation BionoiNet is implemented in Python with the source code freely available at: https://github.com/CSBG-LSU/BionoiNet. Supplementary information Supplementary data are available at Bioinformatics online.
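The step of turning a binding site into a 2D Voronoi diagram can be pictured with the following Python sketch. It is only an illustration of a rasterized Voronoi partition, not BionoiNet's code: the projection (simply dropping the z coordinate), the grid size and the coordinates are assumptions.

```python
def voronoi_raster(points, width, height):
    """Label each pixel of a width x height grid with the index of its nearest point."""
    grid = []
    for y in range(height):
        row = []
        for x in range(width):
            # Nearest point by squared Euclidean distance defines the Voronoi cell.
            nearest = min(
                range(len(points)),
                key=lambda i: (points[i][0] - x) ** 2 + (points[i][1] - y) ** 2,
            )
            row.append(nearest)
        grid.append(row)
    return grid

# Hypothetical pocket atoms; the projection below is an assumed simplification.
atoms_3d = [(1.0, 1.0, 0.5), (6.0, 1.0, -0.3), (3.5, 6.0, 1.2)]
atoms_2d = [(x, y) for x, y, _ in atoms_3d]

cells = voronoi_raster(atoms_2d, width=8, height=8)
```

Each cell index could then be mapped to a colour encoding an atom property, giving an image that a 2D classifier can consume.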


2019 ◽  
Vol 35 (22) ◽  
pp. 4854-4856 ◽  
Author(s):  
James D Stephenson ◽  
Roman A Laskowski ◽  
Andrew Nightingale ◽  
Matthew E Hurles ◽  
Janet M Thornton

Abstract Motivation Understanding the protein structural context of genomic variants, and how they pattern on proteins, can help to separate benign from pathogenic variants and reveal molecular consequences. However, mapping genomic coordinates to protein structures is non-trivial, complicated by alternative splicing and transcript evidence. Results Here we present VarMap, a web tool for mapping a list of chromosome coordinates to canonical UniProt sequences and associated protein 3D structures, including validation checks, and annotating them with structural information. Availability and implementation https://www.ebi.ac.uk/thornton-srv/databases/VarMap. Supplementary information Supplementary data are available at Bioinformatics online.


2003 ◽  
Vol 01 (01) ◽  
pp. 119-138 ◽  
Author(s):  
LIPING WEI ◽  
RUSS B. ALTMAN

The increase in known three-dimensional protein structures enables us to build statistical profiles of important functional sites in protein molecules. These profiles can then be used to recognize sites in large-scale automated annotations of new protein structures. We report an improved FEATURE system which recognizes functional sites in protein structures. FEATURE defines multi-level physico-chemical properties and recognizes sites based on the spatial distribution of these properties in the sites' microenvironments. It uses a Bayesian scoring function to compare a query region with the statistical profile built from known examples of sites and control nonsites. We have previously shown that FEATURE can accurately recognize calcium-binding sites and have reported interesting results scanning for calcium-binding sites in the entire Protein Data Bank. Here we report the ability of the improved FEATURE to characterize and recognize geometrically complex and asymmetric sites such as ATP-binding sites and disulfide bond-forming sites. FEATURE does not rely on conserved residues or conserved residue geometry of the sites. We also demonstrate that, in the absence of a statistical profile of the sites, FEATURE can use an artificially constructed profile based on a priori knowledge to recognize the sites in new structures, using redoxin active sites as an example.
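A Bayesian comparison of a query region against site and nonsite profiles can be sketched as a sum of log-likelihood ratios. The Python below is a generic naive-Bayes-style illustration, not the FEATURE implementation: the property names, bin labels, probabilities and pseudocount are all hypothetical.

```python
import math

def log_odds_score(query, site_profile, nonsite_profile, pseudo=1e-3):
    """Sum per-property log-likelihood ratios of site versus nonsite.

    Each profile maps property -> {bin_label: probability};
    `query` maps property -> the bin label observed in the query region.
    `pseudo` avoids division by zero for bins unseen in a profile.
    """
    score = 0.0
    for prop, observed_bin in query.items():
        p_site = site_profile[prop].get(observed_bin, 0.0) + pseudo
        p_non = nonsite_profile[prop].get(observed_bin, 0.0) + pseudo
        score += math.log(p_site / p_non)
    return score

# Hypothetical profiles built from known sites and control nonsites.
site_profile = {
    "oxygen_count": {"low": 0.1, "high": 0.9},
    "charge": {"neg": 0.8, "pos": 0.2},
}
nonsite_profile = {
    "oxygen_count": {"low": 0.7, "high": 0.3},
    "charge": {"neg": 0.4, "pos": 0.6},
}

site_like = log_odds_score({"oxygen_count": "high", "charge": "neg"},
                           site_profile, nonsite_profile)
nonsite_like = log_odds_score({"oxygen_count": "low", "charge": "pos"},
                              site_profile, nonsite_profile)
```

A positive score means the query resembles the site profile more than the nonsite profile; a threshold on the score would then decide whether a region is reported as a site.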


2021 ◽  
Author(s):  
Sandeep Kaur ◽  
Neblina Sikta ◽  
Andrea Schafferhans ◽  
Nicola Bordin ◽  
Mark J. Cowley ◽  
...  

Abstract Motivation Variant analysis is a core task in bioinformatics that requires integrating data from many sources. This process can be helped by using 3D structures of proteins, which provide a spatial context that can give insight into how variants affect function. Many available tools can help with mapping variants onto structures, but each has specific restrictions, with the result that many researchers fail to benefit from valuable insights that could be gained from structural data. Results To address this, we have created a streamlined system for incorporating 3D structures into variant analysis. Variants can be specified via URLs that are easily readable and writable, and use the notation recommended by the Human Genome Variation Society (HGVS). For example, ‘https://aquaria.app/SARS-CoV-2/S/?N501Y’ specifies the N501Y variant of SARS-CoV-2 S protein. In addition to mapping variants onto structures, our system provides summary information from multiple external resources, including COSMIC, CATH-FunVar, and PredictProtein. Furthermore, our system identifies and summarizes structures containing the variant, as well as the variant-position. Our system supports essentially any mutation for any well-studied protein, and uses all available structural data (including models inferred via very remote homology), integrated into a system that is fast and simple to use. By giving researchers easy, streamlined access to a wealth of structural information during variant analysis, our system will help in revealing novel insights into the molecular mechanisms underlying protein function in health and disease. Availability Our resource is freely available at the project home page (https://aquaria.app). After peer review, the code will be openly available via a GPL version 2 license at https://github.com/ODonoghueLab/Aquaria. PSSH2, the database of sequence-to-structure alignments, is also freely available for download at https://zenodo.org/record/[email protected] Supplementary information None.
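The URL scheme can be illustrated with a short Python sketch that reproduces the one example given in the abstract. The general pattern (organism, protein, then the variant as the query string) is inferred from that single example and is an assumption; the helper names are hypothetical, and the parser handles only simple one-letter substitutions such as N501Y.

```python
import re

# Matches simple substitutions like "N501Y": reference residue, position, new residue.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")

def parse_substitution(variant):
    """Split a one-letter substitution into (reference, position, alternate)."""
    match = _SUBSTITUTION.match(variant)
    if match is None:
        raise ValueError(f"not a simple substitution: {variant!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt

def build_variant_url(organism, protein, variant):
    """Assemble a URL following the pattern of the example in the abstract."""
    parse_substitution(variant)  # validate before building the URL
    return f"https://aquaria.app/{organism}/{protein}/?{variant}"

url = build_variant_url("SARS-CoV-2", "S", "N501Y")
parsed = parse_substitution("N501Y")
```

Validating the variant before building the URL keeps malformed input from producing a link that merely looks correct.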


2022 ◽  
Author(s):  
Adam Zemla ◽  
Jonathan E. Allen ◽  
Dan Kirshner ◽  
Felice C. Lightstone

We present a structure-based method for finding and evaluating structural similarities in protein regions relevant to ligand binding. PDBspheres comprises an exhaustive library of protein structure regions (spheres) adjacent to complexed ligands derived from the Protein Data Bank (PDB), along with methods to find and evaluate structural matches between a protein of interest and spheres in the library. Currently, the PDBspheres library contains more than 2 million spheres, organized to facilitate searches by sequence and/or structure similarity of protein-ligand binding sites or interfaces between interacting molecules. PDBspheres uses the LGA structure alignment algorithm as the main engine for detecting structure similarities between the protein of interest and library spheres. An all-atom structure similarity metric ensures that sidechain placement is taken into account in the PDBspheres primary assessment of confidence in structural matches. In this paper, we (1) describe the PDBspheres method, (2) demonstrate how PDBspheres can be used to detect and characterize binding sites in protein structures, (3) compare the use of PDBspheres for binding site prediction with seven other binding site prediction methods using a curated dataset of 2,528 ligand-bound and ligand-free crystal structures, and (4) use PDBspheres to cluster pockets and assess structural similarities among protein binding sites of the 4,876 structures in the refined set of the PDBbind 2019 dataset. The PDBspheres library is made publicly available for download at https://proteinmodel.org/AS2TS/PDBspheres


IUCrJ ◽  
2018 ◽  
Vol 5 (5) ◽  
pp. 585-594 ◽  
Author(s):  
Bart van Beusekom ◽  
Krista Joosten ◽  
Maarten L. Hekkelman ◽  
Robbie P. Joosten ◽  
Anastassis Perrakis

Inherent protein flexibility, poor or low-resolution diffraction data or poorly defined electron-density maps often inhibit the building of complete structural models during X-ray structure determination. However, recent advances in crystallographic refinement and model building often allow completion of previously missing parts. This paper presents algorithms that identify regions missing in a certain model but present in homologous structures in the Protein Data Bank (PDB), and 'graft' these regions of interest. These new regions are refined and validated in a fully automated procedure. Including these developments in the PDB-REDO pipeline has enabled the building of 24 962 missing loops in the PDB. The models and the automated procedures are publicly available through the PDB-REDO databank and webserver. More complete protein structure models enable a higher quality public archive but also a better understanding of protein function, better comparison between homologous structures and more complete data mining in structural bioinformatics projects.


2019 ◽  
Vol 47 (W1) ◽  
pp. W350-W356 ◽  
Author(s):  
Nur Syatila Ab Ghani ◽  
Effirul Ikhwan Ramlan ◽  
Mohd Firdaus-Raih

Abstract A common drug repositioning strategy is the re-application of an existing drug to address alternative targets. A crucial aspect to enable such repurposing is that the drug's binding site on the original target is similar to that on the alternative target. Based on the assumption that proteins with similar binding sites may bind to similar drugs, the 3D substructure similarity data can be used to identify similar sites in other proteins that are not known targets. The Drug ReposER (DRug REPOSitioning Exploration Resource) web server is designed to identify potential targets for drug repurposing based on sub-structural similarity to the binding interfaces of known drug binding sites. The application has pre-computed amino acid arrangements from protein structures in the Protein Data Bank that are similar to the 3D arrangements of known drug binding sites thus allowing users to explore them as alternative targets. Users can annotate new structures for sites that are similarly arranged to the residues found in known drug binding interfaces. The search results are presented as mappings of matched sidechain superpositions. The results of the searches can be visualized using an integrated NGL viewer. The Drug ReposER server has no access restrictions and is available at http://mfrlab.org/drugreposer/.


2019 ◽  
Author(s):  
Riccardo Delli Ponti ◽  
Alexandros Armaos ◽  
Andrea Vandelli ◽  
Gian Gaetano Tartaglia

Abstract Motivation RNA structure is difficult to predict in vivo due to interactions with enzymes and other molecules. Here we introduce CROSSalive, an algorithm to predict the single- and double-stranded regions of RNAs in vivo using predictions of protein interactions. Results Trained on icSHAPE data in the presence (m6a+) and absence (m6a-) of N6-methyladenosine modification, CROSSalive achieves cross-validation accuracies between 0.70 and 0.88 in identifying high-confidence single- and double-stranded regions. The algorithm was applied to the long non-coding RNA Xist (17 900 nt, not present in the training set) and shows an area under the ROC curve of 0.83 in predicting structured regions. Availability and implementation CROSSalive webserver is freely accessible at http://service.tartaglialab.com/new_submission/crossalive Contact [email protected] Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Wei Li

Protein is the proteios building block of life. Evolutionarily, its sequence is not as conserved as its structure, making it more reasonable for protein structure, instead of protein sequence, to be the descriptor of protein function. Yet, in the National Center for Biotechnology Information (NCBI) database, the number of experimentally identified protein sequences is in great excess of that of experimentally determined protein structures inside the almost half-a-century-old Protein Data Bank (PDB). For instance, GPR151 is a proton-sensing G-protein coupled receptor (GPCR) originally identified as homologous to galanin receptors. As of March 19, 2020, GPR151’s structure has not yet been experimentally determined and deposited in the PDB. Thus, an ab initio modelling approach was employed here to build a three-dimensional structure of GPR151. Overall, the ab initio GPR151 model presented herein constitutes the first structural hypothesis of GPR151, to be tested experimentally in the future alongside previously published, currently ongoing and future GPR151 studies.

