BiRDS - Binding Residue Detection from Protein Sequences using Deep ResNets

Mapping Intimacies ◽

10.33774/chemrxiv-2021-013gn-v2 ◽

2021 ◽

Author(s):

Vineeth Chelur ◽

U. Deva Priyakumar

Keyword(s):

Binding Site ◽

Binding Sites ◽

Tertiary Structure ◽

Solvent Accessibility ◽

Three Dimensional ◽

Dimensional Structure ◽

Relative Solvent Accessibility ◽

Single Chain ◽

Sequence Alignments ◽

Multiple Sequence

Protein-drug interactions play important roles in many biological processes and therapeutics. Prediction of the active binding site of a protein helps discover and optimise these interactions leading to the design of better ligand molecules. The tertiary structure of a protein determines the binding sites available to the drug molecule. A quick and accurate prediction of the binding site from sequence alone without utilising the three-dimensional structure is challenging. Deep Learning has been used in a variety of biochemical tasks and has been hugely successful. In this paper, a Residual Neural Network (leveraging skip connections) is implemented to predict a protein's most active binding site. An Annotated Database of Druggable Binding Sites from the Protein DataBank, sc-PDB, is used for training the network. Features extracted from the Multiple Sequence Alignments (MSAs) of the protein generated using DeepMSA, such as Position-Specific Scoring Matrix (PSSM), Secondary Structure (SS3), and Relative Solvent Accessibility (RSA), are provided as input to the network. A weighted binary cross-entropy loss function is used to counter the substantial imbalance in the two classes of binding and non-binding residues. The network performs very well on single-chain proteins, providing a pocket that has good interactions with a ligand.
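
The weighted binary cross-entropy mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the inverse-frequency choice of weight, and the toy labels are assumptions made here for clarity.

```python
import math

def inverse_frequency_weight(labels):
    """Hypothetical weighting scheme: ratio of non-binding to binding residues."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return n_neg / max(n_pos, 1)

def weighted_bce(labels, probs, pos_weight, eps=1e-7):
    """Mean binary cross-entropy with the positive (binding) term scaled by pos_weight."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)

# Toy example: 1 binding residue among 10, uninformative predictions of 0.5.
labels = [0] * 9 + [1]
probs = [0.5] * 10
w = inverse_frequency_weight(labels)
loss = weighted_bce(labels, probs, w)
```

With this weighting, the single binding residue contributes as much to the loss as the nine non-binding residues combined, which is the intended effect when the classes are heavily imbalanced.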

SphereCon—a method for precise estimation of residue relative solvent accessible area from limited structural information

Bioinformatics ◽

10.1093/bioinformatics/btaa159 ◽

2020 ◽

Vol 36 (11) ◽

pp. 3372-3378

Author(s):

Alexander Gress ◽

Olga V Kalinina

Keyword(s):

Protein Function ◽

Structural Information ◽

Solvent Accessibility ◽

Three Dimensional ◽

Structural Data ◽

Supplementary Information ◽

Dimensional Structure ◽

Relative Solvent Accessibility ◽

Precise Measure ◽

The Impact

Motivation: In proteins, solvent accessibility of individual residues is a factor contributing to their importance for protein function and stability. Hence one might wish to calculate solvent accessibility in order to predict the impact of mutations, their pathogenicity and for other biomedical applications. A direct computation of solvent accessibility is only possible if all atoms of a protein three-dimensional structure are reliably resolved. Results: We present SphereCon, a new precise measure that can estimate residue relative solvent accessibility (RSA) from limited data. The measure is based on calculating the volume of intersection of a sphere with a cone cut out in the direction opposite of the residue with surrounding atoms. We propose a method for estimating the position and volume of residue atoms in cases when they are not known from the structure, or when the structural data are unreliable or missing. We show that in cases of reliable input structures, SphereCon correlates almost perfectly with the directly computed RSA, and outperforms other previously suggested indirect methods. Moreover, SphereCon is the only measure that yields accurate results when the identities of amino acids are unknown. A significant novel feature of SphereCon is that it can estimate RSA from inter-residue distance and contact matrices, without any information about the actual atom coordinates. Availability and implementation: https://github.com/kalininalab/spherecon. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
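
For orientation, RSA is conventionally the ratio of a residue's accessible surface area to a reference maximum for that residue type, $\mathrm{RSA} = \mathrm{ASA} / \mathrm{ASA}_{\max}$. The geometry named in the abstract can be illustrated with the elementary volumes below. This is only a sketch under the assumption that the cone has its apex at the sphere centre and half-angle $\theta$; the symbols $r$, $\theta$ and $V_{\text{ref}}$ are introduced here and the precise construction used by SphereCon may differ.

```latex
% Assumption: cone apex at the sphere centre, half-angle \theta, sphere radius r.
\begin{align}
V_{\text{sphere}} &= \tfrac{4}{3}\pi r^{3}, \\
V_{\text{cone}}(\theta) &= \tfrac{2}{3}\pi r^{3}\,(1 - \cos\theta), \\
V_{\text{ref}}(\theta) &= V_{\text{sphere}} - V_{\text{cone}}(\theta)
                        = \tfrac{2}{3}\pi r^{3}\,(1 + \cos\theta).
\end{align}
```

Here $V_{\text{cone}}$ is the spherical sector removed from the sphere, and $V_{\text{ref}}$ is the remaining volume that surrounding atoms can intersect. As a check, $\theta = \pi/2$ leaves a hemisphere, $V_{\text{ref}} = \tfrac{2}{3}\pi r^{3}$.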

Prediction and Analysis of Surface Hydrophobic Residues in Tertiary Structure of Proteins

The Scientific World JOURNAL ◽

10.1155/2014/971258 ◽

2014 ◽

Vol 2014 ◽

pp. 1-7 ◽

Cited By ~ 19

Author(s):

Shambhu Malleshappa Gowder ◽

Jhinuk Chatterjee ◽

Tanusree Chaudhuri ◽

Kusum Paul

Keyword(s):

Tertiary Structure ◽

Structural Information ◽

Solvent Accessibility ◽

Conservation Score ◽

Protein Structures ◽

Three Dimensional ◽

Dimensional Structure ◽

Data Set ◽

Hydrophobic Residues ◽

Monomeric Proteins

The analysis of protein structures provides plenty of information about the factors governing the folding and stability of proteins, the preferred amino acids in the protein environment, the location of the residues in the interior or on the surface of a protein, and so forth. In general, hydrophobic residues such as Val, Leu, Ile, Phe, and Met tend to be buried in the interior, whereas polar side chains tend to be exposed to solvent. The present work depends on sequence as well as structural information of the protein and aims to understand the nature of hydrophobic residues on protein surfaces. It is based on a nonredundant data set of 218 monomeric proteins. Solvent accessibility of each protein was determined using the NACCESS software, and homologous sequences were then obtained to assess how well solvent-exposed and buried hydrophobic residues are evolutionarily conserved. Confidence scores for hydrophobic residues being buried or solvent exposed were assigned based on conservation scores and knowledge of the flanking regions of the hydrophobic residues. In the absence of a three-dimensional structure, the ability to predict surface accessibility of hydrophobic residues directly from the sequence is of great help in choosing the sites of chemical modification or specific mutations and in studies of protein stability and molecular interactions.

A novel protein descriptor for the prediction of drug binding sites

BMC Bioinformatics ◽

10.1186/s12859-019-3058-0 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 1

Author(s):

Mingjian Jiang ◽

Zhen Li ◽

Yujie Bian ◽

Zhiqiang Wei

Keyword(s):

Binding Site ◽

Binding Sites ◽

Three Dimensional ◽

Drug Binding ◽

Dimensional Structure ◽

Three Dimensional Structure ◽

Site Prediction ◽

Drug Binding Sites ◽

Protein Descriptor ◽

Novel Protein

Background: Binding sites are the pockets of proteins that can bind drugs; the discovery of these pockets is a critical step in drug design. Computational prediction of protein pockets can save manpower and financial resources. Results: In this paper, a novel protein descriptor for the prediction of binding sites is proposed. Information on non-bonded interactions in the three-dimensional structure of a protein is captured by a combination of geometry-based and energy-based methods. Moreover, due to the rapid development of deep learning, all binding features are extracted to generate three-dimensional grids that are fed into a convolutional neural network. Two datasets were introduced into the experiment. The sc-PDB dataset was used for descriptor extraction and binding site prediction, and the PDBbind dataset was used only for testing and verification of the generalization of the method. The comparison with previous methods shows that the proposed descriptor is effective in predicting the binding sites. Conclusions: A new protein descriptor is proposed for the prediction of the drug binding sites of proteins. This method combines the three-dimensional structure of a protein and non-bonded interactions with small molecules to involve important factors influencing the formation of a binding site. Analysis of the experiments indicates that the descriptor is robust for site prediction.
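
The step of turning structural features into three-dimensional grids can be illustrated with a simple occupancy voxelization. This is a sketch only: the paper's descriptor combines geometry-based and energy-based channels, whereas the hypothetical `voxelize` function below records nothing more than an atom count per voxel, and the grid size, origin and resolution are arbitrary choices made here.

```python
def voxelize(coords, origin, size, resolution=1.0):
    """Count atoms per voxel in a cubic grid with `size` voxels along each axis.

    coords: iterable of (x, y, z) atom positions.
    origin: (x, y, z) of the grid's minimum corner.
    resolution: voxel edge length, in the same units as the coordinates.
    Atoms falling outside the grid are ignored.
    """
    grid = [[[0] * size for _ in range(size)] for _ in range(size)]
    for atom in coords:
        idx = [int((c - o) // resolution) for c, o in zip(atom, origin)]
        if all(0 <= i < size for i in idx):
            grid[idx[0]][idx[1]][idx[2]] += 1
    return grid

# Toy coordinates (not real protein data); two atoms share a voxel, two lie outside.
atoms = [(0.5, 0.5, 0.5), (0.9, 0.1, 0.2), (2.5, 1.5, 0.5), (-1.0, 0.0, 0.0), (10.0, 10.0, 10.0)]
grid = voxelize(atoms, origin=(0.0, 0.0, 0.0), size=4)
```

A grid of this shape, typically with several feature channels instead of a single count, is the kind of input a 3D convolutional network consumes.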

Distillation of MSA Embeddings to Folded Protein Structures with Graph Transformers

10.1101/2021.06.02.446809 ◽

2021 ◽

Author(s):

Allan Costa ◽

Manvitha Ponnapati ◽

Joseph M Jacobson ◽

Pranam Chatterjee

Keyword(s):

Structure Prediction ◽

Tertiary Structure ◽

Protein Structures ◽

Three Dimensional ◽

Protein Sequences ◽

Language Models ◽

Sequence Alignments ◽

Multiple Sequence ◽

Multiple Sequence Alignments ◽

Folded Structures

Determining the structure of proteins has been a long-standing goal in biology. Language models have been recently deployed to capture the evolutionary semantics of protein sequences. Enriched with multiple sequence alignments (MSA), these models can encode protein tertiary structure. In this work, we introduce an attention-based graph architecture that exploits MSA Transformer embeddings to directly produce three-dimensional folded structures from protein sequences. We envision that this pipeline will provide a basis for efficient, end-to-end protein structure prediction.

PROCARB: A Database of Known and Modelled Carbohydrate-Binding Protein Structures with Sequence-Based Prediction Tools

Advances in Bioinformatics ◽

10.1155/2010/436036 ◽

2010 ◽

Vol 2010 ◽

pp. 1-9 ◽

Cited By ~ 14

Author(s):

Adeel Malik ◽

Ahmad Firoz ◽

Vivekanand Jha ◽

Shandar Ahmad

Keyword(s):

Binding Sites ◽

De Novo ◽

Solvent Accessibility ◽

Protein Structures ◽

Three Dimensional ◽

Data Bank ◽

Dimensional Structure ◽

Carbohydrate Binding ◽

Structural And Functional Properties ◽

Three Dimensional Models

Understanding of the three-dimensional structures of proteins that interact with carbohydrates covalently (glycoproteins) as well as noncovalently (protein-carbohydrate complexes) is essential to many biological processes and plays a significant role in normal and disease-associated functions. It is important to have a central repository of knowledge available about these protein-carbohydrate complexes as well as preprocessed data of predicted structures. This can be significantly enhanced by de novo tools that can predict carbohydrate-binding sites for proteins in the absence of the structure of an experimentally known binding site. PROCARB is an open-access database comprising three independently working components, namely, (i) the Core PROCARB module, consisting of three-dimensional structures of protein-carbohydrate complexes taken from the Protein Data Bank (PDB), (ii) the Homology Models module, consisting of manually developed three-dimensional models of N-linked and O-linked glycoproteins of unknown three-dimensional structure, and (iii) the CBS-Pred prediction module, consisting of web servers to predict carbohydrate-binding sites using a single sequence or a server-generated PSSM. Several precomputed structural and functional properties of complexes are also included in the database for quick analysis. In particular, information about function, secondary structure, solvent accessibility, hydrogen bonds, literature references, and so forth is included. In addition, each protein in the database is mapped to UniProt, Pfam, PDB, and so forth.

A resource for improved predictions of Trypanosoma and Leishmania protein three-dimensional structure

10.1101/2021.09.02.458674 ◽

2021 ◽

Author(s):

Richard John Wheeler

Keyword(s):

Protein Structure ◽

Protein Sequence ◽

Structure Prediction ◽

Sequence Data ◽

Three Dimensional ◽

Model Organisms ◽

Dimensional Structure ◽

Sequence Alignments ◽

High Quality ◽

Multiple Sequence

AlphaFold2 and RoseTTAfold represent a transformative advance for predicting protein structure. They are able to make very high-quality predictions given a high-quality alignment of the protein sequence with related proteins. These predictions are now readily available via the AlphaFold database of predicted structures and AlphaFold/RoseTTAfold Colaboratory notebooks for custom predictions. However, predictions for some species tend to be of lower confidence than those for model organisms. This includes Trypanosoma cruzi and Leishmania infantum: important unicellular eukaryotic human parasites in an early-branching eukaryotic lineage. The cause appears to be poor sampling of this branch of life in the protein sequence databases used for the AlphaFold database and ColabFold. Here, by comprehensively gathering openly available protein sequence data for species from this lineage, significant improvements to AlphaFold2 protein structure prediction over the AlphaFold database and ColabFold are demonstrated. This is made available as an easy-to-use tool for the parasitology community in the form of Colaboratory notebooks for generating multiple sequence alignments and AlphaFold2 predictions of protein structure for Trypanosoma, Leishmania and related species.

A resource for improved predictions of Trypanosoma and Leishmania protein three-dimensional structure

PLoS ONE ◽

10.1371/journal.pone.0259871 ◽

2021 ◽

Vol 16 (11) ◽

pp. e0259871

Author(s):

Richard John Wheeler

Keyword(s):

Protein Structure ◽

Protein Sequence ◽

Structure Prediction ◽

Sequence Data ◽

Three Dimensional ◽

Model Organisms ◽

Dimensional Structure ◽

Sequence Alignments ◽

High Quality ◽

Multiple Sequence

AlphaFold2 and RoseTTAfold represent a transformative advance for predicting protein structure. They are able to make very high-quality predictions given a high-quality alignment of the protein sequence with related proteins. These predictions are now readily available via the AlphaFold database of predicted structures and AlphaFold or RoseTTAfold Colaboratory notebooks for custom predictions. However, predictions for some species tend to be of lower confidence than those for model organisms. Problematic species include Trypanosoma cruzi and Leishmania infantum: important unicellular eukaryotic human parasites in an early-branching eukaryotic lineage. The cause appears to be poor sampling of this branch of life (Discoba) in the protein sequence databases used for the AlphaFold database and ColabFold. Here, by comprehensively gathering openly available protein sequence data for Discoba species, significant improvements to AlphaFold2 protein structure prediction over the AlphaFold database and ColabFold are demonstrated. This is made available as an easy-to-use tool for the parasitology community in the form of Colaboratory notebooks for generating multiple sequence alignments and AlphaFold2 predictions of protein structure for Trypanosoma, Leishmania and related species.

Conformation of Ribosomes from Escherichia Coli

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s0424820100051062 ◽

1974 ◽

Vol 32 ◽

pp. 210-211

Author(s):

M. Boublik ◽

W. Hellmann ◽

F. Jenkins

Keyword(s):

Binding Sites ◽

Ribosomal Proteins ◽

Fluorescent Dyes ◽

Protein Biosynthesis ◽

Three Dimensional ◽

Spatial Arrangement ◽

Dimensional Structure ◽

Complete Understanding ◽

And Function

The present knowledge of the three-dimensional structure of ribosomes is far too limited to enable a complete understanding of the various roles which ribosomes play in protein biosynthesis. The spatial arrangement of proteins and ribonucleic acids in ribosomes can be analysed in many ways. Determination of binding sites for individual proteins on ribonucleic acid and location of the mutual positions of proteins on the ribosome using labeling with fluorescent dyes, cross-linking reagents, neutron diffraction or antibodies against ribosomal proteins seem to be the most successful approaches. Structure and function of ribosomes can be correlated by depleting the complete ribosomes of some proteins to the functionally inactive core and by subsequent partial reconstitution in order to regain active ribosomal particles.

DNCON2_Inter: predicting interchain contacts for homodimeric and homomultimeric protein complexes using multiple sequence alignments of monomers and deep learning

Scientific Reports ◽

10.1038/s41598-021-91827-7 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Farhan Quadir ◽

Raj S. Roy ◽

Randal Halfmann ◽

Jianlin Cheng

Keyword(s):

Deep Learning ◽

Tertiary Structure ◽

Protein Complexes ◽

Complex Structure ◽

Great Success ◽

Sequence Alignments ◽

Multiple Sequence ◽

Multiple Sequence Alignments ◽

Residue Contacts ◽

Evolutionary Features

Deep learning methods that achieved great success in predicting intrachain residue-residue contacts have been applied to predict interchain contacts between proteins. However, these methods require multiple sequence alignments (MSAs) of a pair of interacting proteins (dimers) as input, which are often difficult to obtain because there are not many known protein complexes available to generate MSAs of sufficient depth for a pair of proteins. In recognizing that multiple sequence alignments of a monomer that forms homomultimers contain the co-evolutionary signals of both intrachain and interchain residue pairs in contact, we applied DNCON2 (a deep learning-based protein intrachain residue-residue contact predictor) to predict both intrachain and interchain contacts for homomultimers using multiple sequence alignment (MSA) and other co-evolutionary features of a single monomer followed by discrimination of interchain and intrachain contacts according to the tertiary structure of the monomer. We name this tool DNCON2_Inter. Allowing true-positive predictions within two residue shifts, the best average precision was obtained for the Top-L/10 predictions of 22.9% for homodimers and 17.0% for higher-order homomultimers. In some instances, especially where interchain contact densities are high, DNCON2_Inter predicted interchain contacts with 100% precision. We also developed Con_Complex, a complex structure reconstruction tool that uses predicted contacts to produce the structure of the complex. Using Con_Complex, we show that the predicted contacts can be used to accurately construct the structure of some complexes. Our experiment demonstrates that monomeric multiple sequence alignments can be used with deep learning to predict interchain contacts of homomeric proteins.
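
The Top-L/10 precision with a residue-shift tolerance can be sketched as below. This is an illustrative reading of the metric, not the authors' evaluation code: the function name is hypothetical, and it is assumed here that a predicted pair (i, j) counts as correct when some true contact (i', j') satisfies |i - i'| <= shift and |j - j'| <= shift.

```python
def top_l_precision(scored_pairs, true_contacts, seq_len, divisor=10, shift=2):
    """Precision of the top L/divisor scored pairs, tolerating small index shifts.

    scored_pairs: iterable of (i, j, score) predicted contacts.
    true_contacts: iterable of (i, j) reference contacts.
    seq_len: sequence length L.
    """
    k = max(1, seq_len // divisor)
    ranked = sorted(scored_pairs, key=lambda t: t[2], reverse=True)[:k]
    if not ranked:
        return 0.0
    true_set = set(true_contacts)

    def is_hit(i, j):
        # Assumed tolerance rule: any true contact within `shift` of both indices.
        return any((i + di, j + dj) in true_set
                   for di in range(-shift, shift + 1)
                   for dj in range(-shift, shift + 1))

    hits = sum(1 for i, j, _ in ranked if is_hit(i, j))
    return hits / len(ranked)

# Toy data: L = 30, so the top 3 predictions are evaluated.
preds = [(1, 20, 0.9), (5, 25, 0.8), (10, 15, 0.7), (2, 8, 0.1)]
truth = [(1, 21), (10, 15)]
relaxed = top_l_precision(preds, truth, seq_len=30, shift=2)
strict = top_l_precision(preds, truth, seq_len=30, shift=0)
```

In the toy data the pair (1, 20) is one position away from the true contact (1, 21), so it counts as a hit only under the relaxed setting.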

Determination of the alpha-actinin-binding site on actin filaments by cryoelectron microscopy and image analysis.

The Journal of Cell Biology ◽

10.1083/jcb.126.2.433 ◽

1994 ◽

Vol 126 (2) ◽

pp. 433-443 ◽

Cited By ~ 124

Author(s):

A McGough ◽

M Way ◽

D DeRosier

Keyword(s):

Image Analysis ◽

Smooth Muscle ◽

Binding Sites ◽

Actin Filaments ◽

Three Dimensional ◽

Actin Binding ◽

Dimensional Structure ◽

Outer Face ◽

Actin Binding Domain

The three-dimensional structure of actin filaments decorated with the actin-binding domain of chick smooth muscle alpha-actinin (alpha A1-2) has been determined to 21-Å resolution. The shape and location of alpha A1-2 were determined by subtracting maps of F-actin from the reconstruction of decorated filaments. alpha A1-2 resembles a bell that measures approximately 38 Å at its base and extends 42 Å from its base to its tip. In decorated filaments, the base of alpha A1-2 is centered about the outer face of subdomain 2 of actin and contacts subdomain 1 of two neighboring monomers along the long-pitch (two-start) helical strands. Using the atomic model of F-actin (Lorenz, M., D. Popp, and K. C. Holmes. 1993. J. Mol. Biol. 234:826-836.), we have been able to test directly the likelihood that specific actin residues, which have been previously identified by others, interact with alpha A1-2. Our results indicate that residues 86-117 and 350-375 comprise distinct binding sites for alpha-actinin on adjacent actin monomers.
