BiRDS - Binding Residue Detection from Protein Sequences using Deep ResNets

Author(s):  
Vineeth Chelur ◽  
U. Deva Priyakumar

Protein-drug interactions play important roles in many biological processes and therapeutics. Prediction of the active binding site of a protein helps discover and optimise these interactions leading to the design of better ligand molecules. The tertiary structure of a protein determines the binding sites available to the drug molecule. A quick and accurate prediction of the binding site from sequence alone without utilising the three-dimensional structure is challenging. Deep Learning has been used in a variety of biochemical tasks and has been hugely successful. In this paper, a Residual Neural Network (leveraging skip connections) is implemented to predict a protein's most active binding site. An Annotated Database of Druggable Binding Sites from the Protein DataBank, sc-PDB, is used for training the network. Features extracted from the Multiple Sequence Alignments (MSAs) of the protein generated using DeepMSA, such as Position-Specific Scoring Matrix (PSSM), Secondary Structure (SS3), and Relative Solvent Accessibility (RSA), are provided as input to the network. A weighted binary cross-entropy loss function is used to counter the substantial imbalance in the two classes of binding and non-binding residues. The network performs very well on single-chain proteins, providing a pocket that has good interactions with a ligand.
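The weighted binary cross-entropy mentioned in the abstract can be illustrated with a minimal, dependency-free sketch. The abstract does not state how the class weight is chosen, so the `balanced_pos_weight` heuristic (ratio of non-binding to binding residues) and both function names are assumptions made here for illustration, not the authors' implementation.

```python
import math


def weighted_bce(y_true, y_prob, pos_weight, eps=1e-7):
    """Mean binary cross-entropy with the positive (binding) class up-weighted."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def balanced_pos_weight(y_true):
    """Hypothetical heuristic: number of non-binding residues per binding residue."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    return n_neg / n_pos if n_pos else 1.0


# Toy per-residue labels (1 = binding) and predicted probabilities.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
probs = [0.1, 0.2, 0.1, 0.05, 0.3, 0.1, 0.2, 0.1, 0.4, 0.6]

weight = balanced_pos_weight(labels)
loss = weighted_bce(labels, probs, weight)
```

With a weight above 1, errors on the rare binding residues contribute more to the loss than errors on the abundant non-binding residues, which is the imbalance correction the abstract describes.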

2020 ◽  
Vol 36 (11) ◽  
pp. 3372-3378
Author(s):  
Alexander Gress ◽  
Olga V Kalinina

Motivation: In proteins, solvent accessibility of individual residues is a factor contributing to their importance for protein function and stability. Hence one might wish to calculate solvent accessibility in order to predict the impact of mutations, their pathogenicity and for other biomedical applications. A direct computation of solvent accessibility is only possible if all atoms of a protein three-dimensional structure are reliably resolved. Results: We present SphereCon, a new precise measure that can estimate residue relative solvent accessibility (RSA) from limited data. The measure is based on calculating the volume of intersection of a sphere with a cone cut out in the direction opposite of the residue with surrounding atoms. We propose a method for estimating the position and volume of residue atoms in cases when they are not known from the structure, or when the structural data are unreliable or missing. We show that in cases of reliable input structures, SphereCon correlates almost perfectly with the directly computed RSA, and outperforms other previously suggested indirect methods. Moreover, SphereCon is the only measure that yields accurate results when the identities of amino acids are unknown. A significant novel feature of SphereCon is that it can estimate RSA from inter-residue distance and contact matrices, without any information about the actual atom coordinates. Availability and implementation: https://github.com/kalininalab/spherecon. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.


2014 ◽  
Vol 2014 ◽  
pp. 1-7 ◽  
Author(s):  
Shambhu Malleshappa Gowder ◽  
Jhinuk Chatterjee ◽  
Tanusree Chaudhuri ◽  
Kusum Paul

The analysis of protein structures provides plenty of information about the factors governing the folding and stability of proteins, the preferred amino acids in the protein environment, the location of residues in the interior or on the surface of a protein, and so forth. In general, hydrophobic residues such as Val, Leu, Ile, Phe, and Met tend to be buried in the interior, while polar side chains are exposed to solvent. The present work draws on both sequence and structural information and aims to understand the nature of hydrophobic residues on protein surfaces. It is based on a nonredundant data set of 218 monomeric proteins. The solvent accessibility of each protein was determined using the NACCESS software, and homologous sequences were then obtained to assess how well solvent-exposed and buried hydrophobic residues are evolutionarily conserved. Confidence scores for a hydrophobic residue being buried or solvent exposed were assigned based on its conservation score and knowledge of its flanking regions. In the absence of a three-dimensional structure, the ability to predict surface accessibility of hydrophobic residues directly from the sequence is of great help in choosing the sites of chemical modification or specific mutations and in the studies of protein stability and molecular interactions.


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Mingjian Jiang ◽  
Zhen Li ◽  
Yujie Bian ◽  
Zhiqiang Wei

Background: Binding sites are the pockets of proteins that can bind drugs; the discovery of these pockets is a critical step in drug design. Computational prediction of protein pockets can save manpower and financial resources. Results: In this paper, a novel protein descriptor for the prediction of binding sites is proposed. Information on non-bonded interactions in the three-dimensional structure of a protein is captured by a combination of geometry-based and energy-based methods. Moreover, building on the rapid development of deep learning, all binding features are extracted to generate three-dimensional grids that are fed into a convolutional neural network. Two datasets were used in the experiments. The sc-PDB dataset was used for descriptor extraction and binding site prediction, and the PDBbind dataset was used only for testing and verification of the generalization of the method. The comparison with previous methods shows that the proposed descriptor is effective in predicting the binding sites. Conclusions: A new protein descriptor is proposed for the prediction of the drug binding sites of proteins. This method combines the three-dimensional structure of a protein and non-bonded interactions with small molecules to capture important factors influencing the formation of the binding site. Analysis of the experiments indicates that the descriptor is robust for site prediction.
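As a rough illustration of turning per-atom features into a three-dimensional grid suitable for a convolutional network, the sketch below bins atoms into a cubic voxel grid. The function name `voxelize`, the grid size, the resolution, and the single scalar feature per atom are all hypothetical choices made here; the abstract does not specify the descriptor's actual channels or grid parameters.

```python
import math


def voxelize(atoms, center, box_size=16, resolution=1.0):
    """Sum a scalar feature per voxel.

    atoms: iterable of (x, y, z, value) tuples, coordinates in angstroms.
    center: (x, y, z) of the grid center.
    Returns a box_size x box_size x box_size nested list of floats.
    """
    grid = [[[0.0] * box_size for _ in range(box_size)] for _ in range(box_size)]
    half = box_size * resolution / 2.0
    for x, y, z, value in atoms:
        # Map each coordinate to a voxel index along its axis.
        idx = [
            math.floor((coord - c + half) / resolution)
            for coord, c in zip((x, y, z), center)
        ]
        # Atoms outside the box are ignored.
        if all(0 <= k < box_size for k in idx):
            grid[idx[0]][idx[1]][idx[2]] += value
    return grid


# Toy input: two atoms inside a 4 x 4 x 4 box and one far outside it.
toy_atoms = [(0.5, 0.5, 0.5, 1.0), (-1.5, 0.2, 1.9, 0.5), (10.0, 0.0, 0.0, 1.0)]
toy_grid = voxelize(toy_atoms, center=(0.0, 0.0, 0.0), box_size=4, resolution=1.0)
```

In practice one such grid would be built per feature channel and the channels stacked before being passed to the network; that stacking step is omitted here.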


2021 ◽  
Author(s):  
Allan Costa ◽  
Manvitha Ponnapati ◽  
Joseph M Jacobson ◽  
Pranam Chatterjee

Determining the structure of proteins has been a long-standing goal in biology. Language models have been recently deployed to capture the evolutionary semantics of protein sequences. Enriched with multiple sequence alignments (MSA), these models can encode protein tertiary structure. In this work, we introduce an attention-based graph architecture that exploits MSA Transformer embeddings to directly produce three-dimensional folded structures from protein sequences. We envision that this pipeline will provide a basis for efficient, end-to-end protein structure prediction.


2010 ◽  
Vol 2010 ◽  
pp. 1-9 ◽  
Author(s):  
Adeel Malik ◽  
Ahmad Firoz ◽  
Vivekanand Jha ◽  
Shandar Ahmad

Understanding the three-dimensional structures of proteins that interact with carbohydrates covalently (glycoproteins) as well as noncovalently (protein-carbohydrate complexes) is essential to many biological processes and plays a significant role in normal and disease-associated functions. It is important to have a central repository of knowledge available about these protein-carbohydrate complexes as well as preprocessed data of predicted structures. This can be significantly enhanced by de novo tools that can predict carbohydrate-binding sites for proteins in the absence of a structure or an experimentally known binding site. PROCARB is an open-access database comprising three independently working components, namely, (i) Core PROCARB module, consisting of three-dimensional structures of protein-carbohydrate complexes taken from Protein Data Bank (PDB), (ii) Homology Models module, consisting of manually developed three-dimensional models of N-linked and O-linked glycoproteins of unknown three-dimensional structure, and (iii) CBS-Pred prediction module, consisting of web servers to predict carbohydrate-binding sites using single sequence or server-generated PSSM. Several precomputed structural and functional properties of complexes are also included in the database for quick analysis. In particular, information about function, secondary structure, solvent accessibility, hydrogen bonds, literature references, and so forth, is included. In addition, each protein in the database is mapped to UniProt, Pfam, PDB, and so forth.


2021 ◽  
Author(s):  
Richard John Wheeler

AlphaFold2 and RoseTTAfold represent a transformative advance for predicting protein structure. They are able to make very high-quality predictions given a high-quality alignment of the protein sequence with related proteins. These predictions are now readily available via the AlphaFold database of predicted structures and AlphaFold/RoseTTAfold Colaboratory notebooks for custom predictions. However, predictions for some species tend to be lower confidence than for model organisms. This includes Trypanosoma cruzi and Leishmania infantum: important unicellular eukaryotic human parasites in an early-branching eukaryotic lineage. The cause appears to be poor sampling of this branch of life in the protein sequence databases used for the AlphaFold database and ColabFold. Here, by comprehensively gathering openly available protein sequence data for species from this lineage, significant improvements to AlphaFold2 protein structure prediction over the AlphaFold database and ColabFold are demonstrated. This is made available as an easy-to-use tool for the parasitology community in the form of Colaboratory notebooks for generating multiple sequence alignments and AlphaFold2 predictions of protein structure for Trypanosoma, Leishmania and related species.


PLoS ONE ◽  
2021 ◽  
Vol 16 (11) ◽  
pp. e0259871
Author(s):  
Richard John Wheeler

AlphaFold2 and RoseTTAfold represent a transformative advance for predicting protein structure. They are able to make very high-quality predictions given a high-quality alignment of the protein sequence with related proteins. These predictions are now readily available via the AlphaFold database of predicted structures and AlphaFold or RoseTTAfold Colaboratory notebooks for custom predictions. However, predictions for some species tend to be lower confidence than for model organisms. Problematic species include Trypanosoma cruzi and Leishmania infantum: important unicellular eukaryotic human parasites in an early-branching eukaryotic lineage. The cause appears to be poor sampling of this branch of life (Discoba) in the protein sequence databases used for the AlphaFold database and ColabFold. Here, by comprehensively gathering openly available protein sequence data for Discoba species, significant improvements to AlphaFold2 protein structure prediction over the AlphaFold database and ColabFold are demonstrated. This is made available as an easy-to-use tool for the parasitology community in the form of Colaboratory notebooks for generating multiple sequence alignments and AlphaFold2 predictions of protein structure for Trypanosoma, Leishmania and related species.


Author(s):  
M. Boublik ◽  
W. Hellmann ◽  
F. Jenkins

The present knowledge of the three-dimensional structure of ribosomes is far too limited to enable a complete understanding of the various roles which ribosomes play in protein biosynthesis. The spatial arrangement of proteins and ribonucleic acids in ribosomes can be analysed in many ways. Determining the binding sites for individual proteins on ribonucleic acid, and locating the mutual positions of proteins on the ribosome using labeling with fluorescent dyes, cross-linking reagents, neutron diffraction or antibodies against ribosomal proteins, seem to be the most successful approaches. Structure and function of ribosomes can be correlated by depleting the complete ribosomes of some proteins to the functionally inactive core and by subsequent partial reconstitution in order to regain active ribosomal particles.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Farhan Quadir ◽  
Raj S. Roy ◽  
Randal Halfmann ◽  
Jianlin Cheng

Deep learning methods that achieved great success in predicting intrachain residue-residue contacts have been applied to predict interchain contacts between proteins. However, these methods require multiple sequence alignments (MSAs) of a pair of interacting proteins (dimers) as input, which are often difficult to obtain because too few protein complexes are known to generate MSAs of sufficient depth for a pair of proteins. Recognizing that the MSA of a monomer that forms homomultimers contains the co-evolutionary signals of both intrachain and interchain residue pairs in contact, we applied DNCON2 (a deep learning-based protein intrachain residue-residue contact predictor) to predict both intrachain and interchain contacts for homomultimers from the MSA and other co-evolutionary features of a single monomer. Interchain and intrachain contacts are then discriminated according to the tertiary structure of the monomer. We name this tool DNCON2_Inter. Allowing true-positive predictions within two residue shifts, the best average precision was obtained for the Top-L/10 predictions: 22.9% for homodimers and 17.0% for higher-order homomultimers. In some instances, especially where interchain contact densities are high, DNCON2_Inter predicted interchain contacts with 100% precision. We also developed Con_Complex, a complex structure reconstruction tool that uses predicted contacts to produce the structure of the complex. Using Con_Complex, we show that the predicted contacts can be used to accurately construct the structure of some complexes. Our experiment demonstrates that monomeric multiple sequence alignments can be used with deep learning to predict interchain contacts of homomeric proteins.
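The Top-L/10 precision with a two-residue shift tolerance reported above can be sketched as follows. The abstract does not define the tolerance rule precisely, so this sketch assumes a predicted pair (i, j) counts as a true positive when some true contact (a, b) satisfies |i - a| <= shift and |j - b| <= shift; the function name `top_precision` and the toy data are hypothetical.

```python
def top_precision(scored_pairs, true_contacts, seq_len, divisor=10, shift=2):
    """Precision of the top L/divisor predicted contacts.

    scored_pairs: iterable of (i, j, score) predictions.
    true_contacts: collection of (a, b) residue index pairs in contact.
    seq_len: sequence length L.
    shift: assumed tolerance, in residues, applied to both indices.
    """
    k = max(1, seq_len // divisor)
    ranked = sorted(scored_pairs, key=lambda t: t[2], reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = 0
    for i, j, _ in ranked:
        if any(abs(i - a) <= shift and abs(j - b) <= shift for a, b in true_contacts):
            hits += 1
    return hits / len(ranked)


# Toy example: L = 30, so the top 3 predictions are evaluated.
predictions = [(5, 20, 0.9), (7, 22, 0.8), (1, 10, 0.7), (3, 3, 0.1)]
contacts = {(6, 21)}

relaxed = top_precision(predictions, contacts, seq_len=30)          # shift of 2
strict = top_precision(predictions, contacts, seq_len=30, shift=0)  # exact match only
```

Comparing `relaxed` with `strict` shows how much of the reported precision depends on the tolerance, which is worth keeping in mind when comparing against predictors evaluated on exact matches.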


1994 ◽  
Vol 126 (2) ◽  
pp. 433-443 ◽  
Author(s):  
A McGough ◽  
M Way ◽  
D DeRosier

The three-dimensional structure of actin filaments decorated with the actin-binding domain of chick smooth muscle alpha-actinin (alpha A1-2) has been determined to 21-Å resolution. The shape and location of alpha A1-2 was determined by subtracting maps of F-actin from the reconstruction of decorated filaments. alpha A1-2 resembles a bell that measures approximately 38 Å at its base and extends 42 Å from its base to its tip. In decorated filaments, the base of alpha A1-2 is centered about the outer face of subdomain 2 of actin and contacts subdomain 1 of two neighboring monomers along the long-pitch (two-start) helical strands. Using the atomic model of F-actin (Lorenz, M., D. Popp, and K. C. Holmes. 1993. J. Mol. Biol. 234:826-836.), we have been able to test directly the likelihood that specific actin residues, which have been previously identified by others, interact with alpha A1-2. Our results indicate that residues 86-117 and 350-375 comprise distinct binding sites for alpha-actinin on adjacent actin monomers.

