PROCARB: A Database of Known and Modelled Carbohydrate-Binding Protein Structures with Sequence-Based Prediction Tools

Understanding the three-dimensional structures of proteins that interact with carbohydrates covalently (glycoproteins) as well as noncovalently (protein-carbohydrate complexes) is essential for understanding many biological processes, in which these proteins play a significant role in both normal and disease-associated functions. It is important to have a central repository of knowledge about these protein-carbohydrate complexes, together with preprocessed data on predicted structures. Such a repository can be significantly enhanced by de novo tools that predict the carbohydrate-binding sites of proteins when no experimentally determined structure of the binding site is available. PROCARB is an open-access database comprising three independently working components: (i) the Core PROCARB module, consisting of three-dimensional structures of protein-carbohydrate complexes taken from the Protein Data Bank (PDB); (ii) the Homology Models module, consisting of manually developed three-dimensional models of N-linked and O-linked glycoproteins whose three-dimensional structures are unknown; and (iii) the CBS-Pred prediction module, consisting of web servers that predict carbohydrate-binding sites from a single sequence or from a server-generated PSSM (position-specific scoring matrix). Several precomputed structural and functional properties of the complexes are also included for quick analysis, in particular function, secondary structure, solvent accessibility, hydrogen bonds, and literature references. In addition, each protein in the database is mapped to UniProt, Pfam, PDB, and other resources.
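The abstract does not describe the features CBS-Pred uses. As a hedged illustration of one common way single-sequence predictors represent each residue, the sketch below one-hot encodes a sliding window centred on the residue of interest. The window size, the function name `window_features`, and the extra "gap" column for positions beyond the termini are assumptions for illustration, not CBS-Pred's documented encoding.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def window_features(sequence, position, half_window=3):
    """One-hot encode the residues in a window centred on `position` (0-based).

    Hypothetical encoding: each window slot gets 21 columns, 20 for the standard
    amino acids plus one 'gap' column used for positions past either terminus
    or for non-standard residues.
    """
    features = []
    for offset in range(-half_window, half_window + 1):
        i = position + offset
        column = [0] * (len(AMINO_ACIDS) + 1)
        if 0 <= i < len(sequence) and sequence[i] in AMINO_ACIDS:
            column[AMINO_ACIDS.index(sequence[i])] = 1
        else:
            column[-1] = 1  # off-sequence or non-standard residue
        features.extend(column)
    return features
```

Each residue of a query sequence would yield one such vector for a classifier to score; with a PSSM as input, the one-hot columns would instead hold the 20 profile scores of each position.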

Solvent Accessibility of Residues Undergoing Pathogenic Variations in Humans: From Protein Structures to Protein Sequences

Frontiers in Molecular Biosciences · 10.3389/fmolb.2020.626363 · 2021 · Vol 7

Author(s): Castrense Savojardo, Matteo Manfredi, Pier Luigi Martelli, Rita Casadio

Keyword(s): Solvent Accessibility, Protein Structures, Three Dimensional, Protein Sequences, Large Data, Human Protein, Dimensional Structure, Wild Type, Solvent Exposure, Data Set

Solvent accessibility (SASA) is a key feature of proteins for determining their folding and stability. SASA is computed from protein structures with different algorithms, and from protein sequences with machine-learning-based approaches trained on solved structures. Here we ask to what extent the solvent exposure of residues can be associated with the pathogenicity of a variation. In this way, the SASA of the wild-type residue acquires a role in the functional annotation of protein single-residue variations (SRVs). By mapping variations onto a curated database of human protein structures, we found that residues targeted by disease-related SRVs are less accessible to solvent than residues involved in polymorphisms. The disease association is not evenly distributed among residue types: SRVs targeting glycine, tryptophan, tyrosine, and cysteine are more frequently disease-associated than others. For all residues, the proportion of disease-related SRVs increases markedly when the wild-type residue is buried and decreases when it is exposed; the extent of the increase depends on the residue type. With an in-house predictor based on deep learning and performing at the state of the art, we confirm this tendency on a large data set of residues subjected to variation in 12,494 human protein sequences (derived from HUMSAVAR) that still lack a three-dimensional structure. Our data support the notion that surface accessibility is a distinguishing property of residues that undergo variation, and that pathogenicity is more frequently associated with the buried state than with the exposed one.
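The central comparison in this study, the proportion of disease-related SRVs among buried versus exposed wild-type residues, can be sketched as follows. Buried and exposed are usually defined through relative solvent accessibility (RSA): a residue's SASA divided by a reference maximum for its residue type. The 20% RSA threshold, the small table of maximum values (roughly the theoretical maxima reported by Tien et al., 2013), and the toy variants are assumptions for illustration, not the authors' settings or data.

```python
# Assumed reference maximum ASA values (Å^2) for a few residue types; illustrative only.
MAX_ASA = {"GLY": 104.0, "TRP": 285.0, "TYR": 263.0, "CYS": 167.0, "LEU": 201.0}
BURIED_THRESHOLD = 0.20  # hypothetical RSA cut-off separating buried from exposed

def relative_accessibility(residue, asa):
    """RSA = absolute SASA of the residue / reference maximum for its type."""
    return asa / MAX_ASA[residue]

def disease_fraction_by_exposure(variants):
    """Fraction of disease-related variants among buried and exposed wild-type residues.

    variants: iterable of (residue_type, wild_type_asa, is_disease) tuples.
    """
    counts = {"buried": [0, 0], "exposed": [0, 0]}  # [disease, total] per state
    for residue, asa, is_disease in variants:
        rsa = relative_accessibility(residue, asa)
        state = "buried" if rsa < BURIED_THRESHOLD else "exposed"
        counts[state][1] += 1
        if is_disease:
            counts[state][0] += 1
    return {state: (d / t if t else 0.0) for state, (d, t) in counts.items()}

# Toy, made-up variants: (residue, ASA of the wild-type residue, disease-related?)
toy = [("GLY", 5.0, True), ("TRP", 20.0, True), ("LEU", 150.0, False),
       ("TYR", 200.0, True), ("CYS", 10.0, False)]
fractions = disease_fraction_by_exposure(toy)
```

For these toy inputs two of the three buried variants and one of the two exposed variants are disease-related; a real analysis would use the full residue table and curated variant labels.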

MRPC (Missing Regions in Polypeptide Chains): a knowledgebase

Journal of Applied Crystallography · 10.1107/s1600576719012330 · 2019 · Vol 52 (6) · pp. 1422-1426

Author(s): Rajendran Santhosh, Namrata Bankoti, Adgonda Malgonnavar Padmashri, Daliah Michael, Jeyaraman Jeyakanthan, ...

Keyword(s): Protein Structures, Three Dimensional, Protein Molecule, Data Bank, Protein Crystal, Dimensional Structure, Protein Structure Analysis, Three Dimensional Structure, X-Ray Crystallography, Polypeptide Chains

Missing regions in protein crystal structures are regions that cannot be resolved, mainly owing to poor electron density (when the three-dimensional structure was solved by X-ray crystallography). These missing regions are known to have high B factors and may represent loops that could be part of the protein's active site. They are therefore likely to provide valuable information and to play a crucial role in inhibitor and drug design and in protein structure analysis. With this in view, an online database, Missing Regions in Polypeptide Chains (MRPC), has been developed that provides information about the missing regions in protein structures available in the Protein Data Bank. The database also lets users obtain these data for non-homologous protein structures (at 25 and 90% sequence identity). A user-friendly graphical interface with various options has been incorporated, including a provision to view the three-dimensional structure of the protein together with its missing regions using JSmol. The MRPC database is updated regularly (currently once every three months) and can be accessed freely at http://cluster.physics.iisc.ac.in/mrpc.
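The abstract does not describe how MRPC detects missing regions. A minimal sketch, assuming residues are numbered consecutively from 1 to the full chain length of the deposited sequence, is to report every run of residue numbers that has no modelled coordinates. Real PDB entries need more care (numbering offsets, insertion codes), and PDB-format files also list unobserved residues explicitly in REMARK 465 records. The function name `missing_regions` is hypothetical.

```python
def missing_regions(full_length, observed):
    """Return (start, end) runs of residue numbers 1..full_length absent from `observed`.

    Assumes sequential numbering starting at 1 with no insertion codes.
    """
    observed = set(observed)
    regions, start = [], None
    for number in range(1, full_length + 1):
        if number not in observed:
            if start is None:          # a new gap begins here
                start = number
        elif start is not None:        # the current gap just ended
            regions.append((start, number - 1))
            start = None
    if start is not None:              # gap running to the C-terminus
        regions.append((start, full_length))
    return regions
```

For a 10-residue chain with coordinates only for residues 1-3, 6-7 and 10, the function reports the gaps (4, 5) and (8, 9).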

Prediction and Analysis of Surface Hydrophobic Residues in Tertiary Structure of Proteins

The Scientific World Journal · 10.1155/2014/971258 · 2014 · Vol 2014 · pp. 1-7 · Cited by ~19

Author(s): Shambhu Malleshappa Gowder, Jhinuk Chatterjee, Tanusree Chaudhuri, Kusum Paul

Keyword(s): Tertiary Structure, Structural Information, Solvent Accessibility, Conservation Score, Protein Structures, Three Dimensional, Dimensional Structure, Data Set, Hydrophobic Residues, Monomeric Proteins

The analysis of protein structures provides plenty of information about the factors governing protein folding and stability, the amino acids preferred in the protein environment, the location of residues in the interior or on the surface of a protein, and so forth. In general, hydrophobic residues such as Val, Leu, Ile, Phe, and Met tend to be buried in the interior, while polar side chains tend to be exposed to solvent. The present work uses both sequence and structural information and aims to understand the nature of hydrophobic residues on protein surfaces. It is based on a nonredundant data set of 218 monomeric proteins. The solvent accessibility of each protein was determined with the NACCESS software; homologous sequences were then obtained to assess how well solvent-exposed and buried hydrophobic residues are evolutionarily conserved, and each hydrophobic residue was assigned a confidence score for being buried or solvent exposed, based on its conservation score and on knowledge of its flanking regions. In the absence of a three-dimensional structure, the ability to predict the surface accessibility of hydrophobic residues directly from sequence is of great help in choosing sites for chemical modification or specific mutations, and in studies of protein stability and molecular interactions.
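The abstract does not specify which conservation score was used. As a hedged stand-in, the sketch below computes the Shannon entropy of each alignment column (lower entropy means a more conserved position) and reports it for every hydrophobic residue (Val, Leu, Ile, Phe, Met) of the query, assumed to be the first sequence of a gapped alignment of homologues. Combining this score with NACCESS accessibility and flanking-region information, as the authors did, is not shown; the function names are hypothetical.

```python
import math
from collections import Counter

HYDROPHOBIC = set("VLIFM")  # the hydrophobic residues named in the text

def column_entropy(column):
    """Shannon entropy (bits) of one alignment column, ignoring gap characters '-'."""
    residues = [aa for aa in column if aa != "-"]
    counts = Counter(residues)
    total = len(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def hydrophobic_conservation(alignment):
    """Map each hydrophobic position (0-based) of the query to its column entropy.

    alignment: list of equal-length aligned sequences; alignment[0] is the query.
    """
    query = alignment[0]
    scores = {}
    for i, aa in enumerate(query):
        if aa in HYDROPHOBIC:
            scores[i] = column_entropy([seq[i] for seq in alignment])
    return scores
```

For the toy alignment `["MKLV", "MRLI", "MKLV"]`, positions 0 and 2 are fully conserved (entropy 0), while position 3 (V, I, V) has an entropy of about 0.92 bits.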

TOP: a new method for protein structure comparisons and similarity searches

Journal of Applied Crystallography · 10.1107/s0021889899012339 · 2000 · Vol 33 (1) · pp. 176-183 · Cited by ~149

Author(s): Guoguang Lu

Keyword(s): User Interface, Protein Structure, Protein Structures, Three Dimensional, Data Bank, Structure Alignment, Dimensional Structure, Protein Structure Alignment, Protein Structure Analysis, Structure Comparison

To facilitate the three-dimensional structure comparison of proteins, software has been developed for comparing protein structures and for searching databases for structurally similar proteins. The program identifies the residues that share similar positions of both main-chain and side-chain atoms between two proteins. Unique functions of the software also include database processing via Internet- and Web-based servers for different types of users. The method and its user-friendly interface cope with many problems that frequently occur in protein structure comparison, such as detecting structurally equivalent residues, misalignment caused by coincident matches of Cα atoms, circular sequence permutations, tedious repetition of access, maintenance of the most recent database, and inconvenient user interfaces. The program is also designed to cooperate with other tools in structural bioinformatics, such as the 3DB Browser software [Prilusky (1998). Protein Data Bank Q. Newslett. 84, 3–4] and the SCOP database [Murzin, Brenner, Hubbard & Chothia (1995). J. Mol. Biol. 247, 536–540], for convenient molecular modelling and protein structure analysis. A similarity ranking score, 'structure diversity', is proposed to estimate the evolutionary distance between proteins from comparisons of their three-dimensional structures. The program has been used as part of an automated program for multiple protein structure alignment. In this paper, the algorithm of the program and the results of systematic tests are presented and discussed.
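TOP's own algorithm compares both main-chain and side-chain atoms and is not detailed in the abstract. The sketch below shows only a generic building block shared by many structure-comparison methods: optimal rigid-body superposition of paired Cα coordinates with the Kabsch algorithm, followed by flagging residue pairs whose superposed Cα atoms lie within a distance cut-off. It assumes the residue correspondence is already known and uses NumPy; the function names and the 3.0 Å cut-off are hypothetical.

```python
import numpy as np

def kabsch_superpose(mobile, target):
    """Superpose `mobile` onto `target` (both N x 3 arrays of paired Cα coordinates)."""
    mobile_centroid = mobile.mean(axis=0)
    target_centroid = target.mean(axis=0)
    p = mobile - mobile_centroid
    q = target - target_centroid
    h = p.T @ q                                # 3 x 3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))     # guard against a reflection
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return p @ rotation.T + target_centroid

def equivalent_residues(coords_a, coords_b, cutoff=3.0):
    """Indices of paired residues whose superposed Cα atoms lie within `cutoff` Å."""
    distances = np.linalg.norm(coords_a - coords_b, axis=1)
    return [int(i) for i in np.flatnonzero(distances <= cutoff)]

# Usage: a rotated and translated copy of a random 'structure' superposes back exactly.
rng = np.random.default_rng(0)
target = rng.normal(size=(10, 3)) * 5.0
theta = np.pi / 6
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta), np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
mobile = target @ rot.T + np.array([5.0, -2.0, 1.0])
aligned = kabsch_superpose(mobile, target)
```

Handling problems the abstract lists, such as coincident Cα matches or circular permutations, requires an alignment search on top of this step and is not attempted here.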

BiRDS - Binding Residue Detection from Protein Sequences using Deep ResNets

10.33774/chemrxiv-2021-013gn-v2 · 2021

Author(s): Vineeth Chelur, U. Deva Priyakumar

Keyword(s): Binding Site, Binding Sites, Tertiary Structure, Solvent Accessibility, Three Dimensional, Dimensional Structure, Relative Solvent Accessibility, Single Chain, Sequence Alignments, Multiple Sequence

Protein-drug interactions play important roles in many biological processes and therapeutics. Predicting the active binding site of a protein helps to discover and optimise these interactions, leading to the design of better ligand molecules. The tertiary structure of a protein determines the binding sites available to a drug molecule, so a quick and accurate prediction of the binding site from sequence alone, without the three-dimensional structure, is challenging. Deep learning has been applied to a variety of biochemical tasks with great success. In this paper, a residual neural network (ResNet, which uses skip connections) is implemented to predict a protein's most active binding site. sc-PDB, an annotated database of druggable binding sites from the Protein Data Bank, is used to train the network. Features extracted from multiple sequence alignments (MSAs) of the protein generated with DeepMSA, such as the position-specific scoring matrix (PSSM), three-state secondary structure (SS3), and relative solvent accessibility (RSA), are provided as input to the network. A weighted binary cross-entropy loss function is used to counter the substantial imbalance between the two classes of binding and non-binding residues. The network performs very well on single-chain proteins, providing a pocket that interacts well with a ligand.
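A weighted binary cross-entropy up-weights the rare positive class (binding residues) so that the network is not rewarded for predicting "non-binding" everywhere. The minimal sketch below averages the per-residue loss over predicted probabilities. The abstract does not say how BiRDS sets its weight, so the negative-to-positive ratio returned by the hypothetical `positive_class_weight` is an assumed, commonly used heuristic. Deep-learning frameworks offer the same idea built in, for example the `pos_weight` argument of PyTorch's `BCEWithLogitsLoss`.

```python
import math

def weighted_bce(probs, labels, pos_weight, eps=1e-7):
    """Mean binary cross-entropy with the positive (binding) term scaled by pos_weight.

    probs: predicted binding probabilities; labels: 1 = binding, 0 = non-binding.
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)

def positive_class_weight(labels):
    """Assumed heuristic: number of non-binding residues per binding residue."""
    positives = sum(labels)
    return (len(labels) - positives) / positives
```

With a weight of 3.0 for a chain in which one residue in four binds, a missed binding residue costs as much as three equally confident false alarms.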

Ab Initio Modelling the Structure of Proton-Sensing G-Protein Coupled Receptor GPR151

10.20944/preprints202003.0304.v1 · 2020

Author(s): Wei Li

Keyword(s): Ab Initio, G Protein, Protein Function, Protein Structures, Three Dimensional, Data Bank, Dimensional Structure, G Protein Coupled Receptor, Great Excess, G Protein Coupled

Protein (from the Greek proteios, "primary") is the primary building block of life. Evolutionarily, protein sequence is less conserved than protein structure, which makes structure, rather than sequence, the more reasonable descriptor of protein function. Yet the number of experimentally identified protein sequences in the National Center for Biotechnology Information (NCBI) database greatly exceeds the number of experimentally determined protein structures in the almost half-century-old Protein Data Bank (PDB). For instance, GPR151 is a proton-sensing G-protein coupled receptor (GPCR) originally identified as homologous to galanin receptors. As of March 19, 2020, the structure of GPR151 had not been experimentally determined and deposited in the PDB. An ab initio modelling approach was therefore employed here to build a three-dimensional structure of GPR151. Overall, the ab initio GPR151 model presented here constitutes the first structural hypothesis for GPR151, to be tested experimentally in the light of previously published, ongoing, and future GPR151 studies.

Support for a three-dimensional structure predicting a Cys-Glu-Lys catalytic triad for Pseudomonas aeruginosa amidase comes from site-directed mutagenesis and mutations altering substrate specificity

Biochemical Journal · 10.1042/bj20011714 · 2002 · Vol 365 (3) · pp. 731-738 · Cited by ~32

Author(s): Carlos Novo, Sebastien Farnaud, Renée Tata, Alda Clemente, Paul R. Brown

Keyword(s): Pseudomonas aeruginosa, Substrate Specificity, Catalytic Mechanism, Three Dimensional, Data Bank, Catalytic Triad, Site Directed Mutagenesis, Dimensional Structure, Amino Acid Residues, Three Dimensional Models

The aliphatic amidase from Pseudomonas aeruginosa belongs to the nitrilase superfamily, and Cys166 is the nucleophile of its catalytic mechanism. A model of the amidase was built by comparative modelling, using as a template the crystal structure of the worm nitrilase–fragile histidine triad fusion protein (NitFhit; Protein Data Bank accession number 1EMS). The amidase model predicted a catalytic triad (Cys-Glu-Lys) situated at the bottom of a pocket and identical to the presumptive catalytic triad of NitFhit. Three-dimensional models of other amidases belonging to the nitrilase superfamily also predicted Cys-Glu-Lys catalytic triads. Support for the P. aeruginosa amidase model came from site-directed mutagenesis and from the locations of amino acid residues that altered substrate specificity or binding when mutated.

A Comparative Study of Protein Tertiary Structure Prediction Methods

International Journal of Computer Science and Informatics · 10.47893/ijcsi.2014.1168 · 2014 · pp. 15-18

Author(s): Chandrayani N. Rokde, Manali Kshirsagar

Keyword(s): Protein Structure, Structure Prediction, Tertiary Structure, Sequence Data, Protein Structures, Three Dimensional, Data Bank, Dimensional Structure, X-Ray Crystallography, Protein Tertiary Structure Prediction

Protein structure prediction (PSP) from amino acid sequence is one of the most intensively studied problems in bioinformatics today, because the biological function of a protein is determined by its three-dimensional structure. Understanding protein structures is vital for determining the function of a protein and its interactions with DNA, RNA, and enzymes. Protein structure is therefore a fundamental area of computational biology. Its importance is heightened by the large amount of sequence data now available compared with the structures deposited in the PDB (Protein Data Bank), and by the fact that experimental methods for determining protein structures, such as X-ray crystallography and nuclear magnetic resonance (NMR), remain very expensive and time-consuming. In this paper, different types of protein structure and methods for their prediction are described.

Mapping Mutations in Proteins of SARS CoV-2 Indian Isolates on to the Three-Dimensional Structures

10.26434/chemrxiv.12683771 · 2020

Author(s): Kunchur Guruprasad

Keyword(s): RNA Polymerase, Binding Sites, Three Dimensional, Data Bank, Drug Binding, Dimensional Structure, RNA-Dependent RNA Polymerase, Indian Isolates, Drug Binding Sites, Structural Insights

The amino acid residue mutations observed in the SARS-CoV-2 RNA-dependent RNA polymerase, helicase, endoRNAse, and spike proteins from Indian isolates, relative to the reference SARS-CoV-2 proteins from the Wuhan Hu-1 isolate, were mapped onto the three-dimensional structure templates of these proteins available in the Protein Data Bank. The secondary structure conformations at the mutated positions, their locations, and their proximity to functionally important residues in these proteins and to the drug-binding sites in the RNA-dependent RNA polymerase and endoRNAse targets were analysed. Our analyses provide structural insights into the mutations in these SARS-CoV-2 proteins.
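Mapping a mutation onto a template structure involves checking that the wild-type residue matches the reference sequence and measuring how close the mutated position lies to functionally important residues. The sketch below does this with Cα coordinates supplied as a dictionary keyed by residue number. The "A23V" mutation notation, the 8 Å proximity cut-off, the function names, and the toy inputs are assumptions, not the author's protocol; the template and reference are assumed to share the same residue numbering.

```python
import math
import re

MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")

def parse_mutation(code):
    """Split a substitution such as 'A23V' into (wild_type, position, mutant)."""
    match = MUTATION.match(code)
    if match is None:
        raise ValueError(f"unrecognised mutation: {code}")
    wild, pos, mutant = match.groups()
    return wild, int(pos), mutant

def map_mutations(mutations, reference, ca_coords, functional, cutoff=8.0):
    """Return (code, wild_type_matches, nearest_distance, near_functional) per mutation.

    reference: reference protein sequence (positions are 1-based).
    ca_coords: {residue_number: (x, y, z)} Cα coordinates of the template.
    functional: residue numbers of functionally important residues.
    """
    report = []
    for code in mutations:
        wild, pos, _mutant = parse_mutation(code)
        matches = 0 < pos <= len(reference) and reference[pos - 1] == wild
        if pos not in ca_coords:
            report.append((code, matches, None, False))  # residue not modelled in template
            continue
        nearest = min(math.dist(ca_coords[pos], ca_coords[f]) for f in functional)
        report.append((code, matches, nearest, nearest <= cutoff))
    return report
```

A real analysis would take coordinates from the PDB template and the functional residues from the literature; secondary-structure assignment at each position would be an additional step.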

Structure of Looped Regions in β-α- and α-β-Arches in abCd-Units of Globular Proteins

Mathematical Biology and Bioinformatics · 10.17537/2016.11.159 · 2016 · Vol 11 (2) · pp. 159-169

Author(s): E.V. Brazhnikov

Keyword(s): Amino Acid, De Novo, Protein Structures, Three Dimensional, Structural Motif, Dimensional Structure, Amino Acid Residues, Homologous Proteins, Reverse Turn, Structure of Proteins

Conformations of about 600 looped regions (loops) in β-α- and α-β-arches of a structural motif occurring in the abCd-units of proteins were analyzed. In total, 258 abCd-units with a reverse turn of the polypeptide chain (236 PDB files) and 69 abCd-units with a direct turn (65 PDB files) were selected from non-homologous proteins. Four types of arches were studied: β-α- and α-β-arches at a direct turn of the chain, and β-α- and α-β-arches at a reverse turn of the chain. For each type of arch, the frequencies of occurrence of loops of different lengths were determined and the corresponding histograms were plotted. It was found that abCd-units with loops up to three amino acid residues long occur most frequently (57%). In β-α-arches with a direct turn of the chain, loops of two amino acid residues occur most often (44%), and in 86% of cases they have the βmαβαn conformation. These loops contain no Gly or Pro residues, and position β is occupied by an Asn residue. In this type of arch, one-residue loops (βmεαn or βmαLαn conformation) most frequently contain a Gly residue. α-β-Arches with a direct turn of the chain most commonly (18%) have loops of four amino acid residues; in this case there is no predominant loop conformation. In β-α-arches with a reverse turn of the chain, loops of seven amino acid residues are the most common (17%), and most of them (88%) have the βmαLββααββαn conformation. α-β-Arches with a reverse turn of the chain most frequently (32%) contain loops of one amino acid residue (all Gly), with the arch conformation αmεβn or αmαLβn. This structural analysis of the abCd-unit provides useful information for predicting the three-dimensional structure of proteins and for the molecular simulation of de novo protein design.
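The loop-length frequencies behind the histograms described above can be tallied from a per-residue secondary-structure string. The sketch below assumes DSSP-style single-letter codes in which "E" marks β-strand and "H" α-helix residues, and treats every other character as loop. The study itself assigns loops from backbone conformations (the α, β, αL and ε regions), so this is only an approximation; arches with no loop residues are not counted, and the function names are hypothetical.

```python
import re
from collections import Counter

def arch_loop_lengths(ss, first, second):
    """Lengths of loops lying between a `first` element and a following `second` element.

    ss: secondary-structure string ('E' = strand, 'H' = helix, anything else = loop).
    Use first='E', second='H' for beta-alpha arches; first='H', second='E' for alpha-beta.
    """
    pattern = re.compile(rf"(?<={re.escape(first)})[^HE]+(?={re.escape(second)})")
    return [len(m.group(0)) for m in pattern.finditer(ss)]

def loop_length_histogram(ss_strings, first, second):
    """Count how often each loop length occurs across many chains."""
    counts = Counter()
    for ss in ss_strings:
        counts.update(arch_loop_lengths(ss, first, second))
    return counts
```

For the toy string `"EEEE--HHHHH---EEEEE-HHH"`, the β-α loops have lengths 2 and 1 and the single α-β loop has length 3.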
