SphereCon—a method for precise estimation of residue relative solvent accessible area from limited structural information

Alexander Gress; Olga V Kalinina

doi:10.1093/bioinformatics/btaa159

Bioinformatics, 2020, Vol 36 (11), pp. 3372-3378

Keyword(s): Protein Function; Structural Information; Solvent Accessibility; Three Dimensional; Structural Data; Supplementary Information; Dimensional Structure; Relative Solvent Accessibility; Precise Measure; The Impact

Motivation: In proteins, the solvent accessibility of individual residues is a factor contributing to their importance for protein function and stability. Hence one might wish to calculate solvent accessibility in order to predict the impact of mutations and their pathogenicity, and for other biomedical applications. A direct computation of solvent accessibility is only possible if all atoms of a protein three-dimensional structure are reliably resolved.

Results: We present SphereCon, a new precise measure that can estimate residue relative solvent accessibility (RSA) from limited data. The measure is based on calculating the volume of the intersection of surrounding atoms with a sphere from which a cone has been cut out in the direction opposite to the residue. We propose a method for estimating the position and volume of residue atoms in cases when they are not known from the structure, or when the structural data are unreliable or missing. We show that for reliable input structures, SphereCon correlates almost perfectly with the directly computed RSA and outperforms other previously suggested indirect methods. Moreover, SphereCon is the only measure that yields accurate results when the identities of amino acids are unknown. A significant novel feature of SphereCon is that it can estimate RSA from inter-residue distance and contact matrices, without any information about the actual atom coordinates.

Availability and implementation: https://github.com/kalininalab/spherecon.

Supplementary information: Supplementary data are available at Bioinformatics online.
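
SphereCon is described above in terms of a volume: the part of a sphere, with a cone cut out of it, that is occupied by surrounding atoms. The Python sketch below only illustrates how such an occupied volume fraction can be estimated numerically by Monte Carlo sampling; it is not the SphereCon implementation. The function names, the 10 Å radius, the 45° cone half-angle, the cone axis supplied by the caller and the representation of atoms as (centre, radius) pairs are all assumptions chosen for illustration, and the way the paper maps such a volume to RSA is not reproduced here.

```python
import math
import random
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]
Atom = Tuple[Vec, float]  # (centre, radius): hypothetical atom representation


def in_sphere_minus_cone(p: Vec, centre: Vec, axis: Vec,
                         radius: float, half_angle: float) -> bool:
    """True if p lies within `radius` of `centre` but outside the cone with
    apex `centre`, unit axis `axis` and opening half-angle `half_angle` (rad)."""
    v = [p[i] - centre[i] for i in range(3)]
    r = math.sqrt(sum(x * x for x in v))
    if r > radius:
        return False
    if r == 0.0:
        return True  # the apex itself is counted as part of the region
    cos_to_axis = sum(v[i] * axis[i] for i in range(3)) / r
    return cos_to_axis < math.cos(half_angle)  # outside the cut-out cone


def occupied_fraction(centre: Vec, axis: Vec, atoms: Sequence[Atom],
                      radius: float = 10.0, half_angle: float = math.pi / 4,
                      n_samples: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of the fraction of the sphere-minus-cone region
    that lies inside at least one atom sphere (assumed parameters)."""
    norm = math.sqrt(sum(a * a for a in axis))
    unit_axis = tuple(a / norm for a in axis)
    rng = random.Random(seed)  # fixed seed makes the estimate reproducible
    inside = covered = 0
    for _ in range(n_samples):
        # Sample uniformly in the bounding cube; keep points in the region.
        p = tuple(centre[i] + rng.uniform(-radius, radius) for i in range(3))
        if not in_sphere_minus_cone(p, centre, unit_axis, radius, half_angle):
            continue
        inside += 1
        if any(sum((p[i] - c[i]) ** 2 for i in range(3)) <= r_atom ** 2
               for c, r_atom in atoms):
            covered += 1
    return covered / inside if inside else 0.0
```

A larger occupied fraction would indicate a more buried residue. An atom lying entirely inside the cut-out cone, for example `((0, 0, 8), 1.0)` with axis `(0, 0, 1)`, contributes nothing, whereas atoms on the opposite side of the centre do; increasing `n_samples` reduces the Monte Carlo error.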

Prediction and Analysis of Surface Hydrophobic Residues in Tertiary Structure of Proteins

The Scientific World Journal, 2014, Vol 2014, pp. 1-7. doi:10.1155/2014/971258

Author(s): Shambhu Malleshappa Gowder, Jhinuk Chatterjee, Tanusree Chaudhuri, Kusum Paul

Keyword(s): Tertiary Structure; Structural Information; Solvent Accessibility; Conservation Score; Protein Structures; Three Dimensional; Dimensional Structure; Data Set; Hydrophobic Residues; Monomeric Proteins

The analysis of protein structures provides plenty of information about the factors governing the folding and stability of proteins, the amino acids preferred in the protein environment, the location of residues in the interior or on the surface of a protein, and so forth. In general, hydrophobic residues such as Val, Leu, Ile, Phe and Met tend to be buried in the interior, whereas polar side chains tend to be exposed to solvent. The present work draws on both sequence and structural information and aims to understand the nature of hydrophobic residues on protein surfaces. It is based on a nonredundant data set of 218 monomeric proteins. The solvent accessibility of each protein was determined using the NACCESS software. Homologous sequences were then obtained to assess how well solvent-exposed and buried hydrophobic residues are evolutionarily conserved, and hydrophobic residues were assigned confidence scores for being buried or solvent-exposed based on their conservation scores and on knowledge of their flanking regions. In the absence of a three-dimensional structure, the ability to predict the surface accessibility of hydrophobic residues directly from sequence is of great help in choosing sites for chemical modification or specific mutations, and in studies of protein stability and molecular interactions.
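
Labelling residues as buried or exposed is commonly done by normalising each residue's absolute accessible surface area (for example as computed by NACCESS) by a maximal value for that residue type and applying a cutoff. The sketch below assumes that convention; the 25% cutoff, the function name `label_hydrophobic` and the `max_asa` table are hypothetical inputs, and the numbers in the usage example are placeholders rather than a published reference scale.

```python
from typing import Dict, Iterable, Tuple

# Hydrophobic residue types named in the abstract.
HYDROPHOBIC = {"VAL", "LEU", "ILE", "PHE", "MET"}


def label_hydrophobic(residues: Iterable[Tuple[str, str, float]],
                      max_asa: Dict[str, float],
                      cutoff: float = 0.25) -> Dict[str, str]:
    """Label hydrophobic residues as 'exposed' or 'buried'.

    residues: (residue_id, three-letter residue name, absolute ASA in Å^2)
    max_asa:  hypothetical reference maximal ASA per residue type (Å^2)
    cutoff:   relative accessibility at or above which a residue is 'exposed'
    """
    labels = {}
    for res_id, name, asa in residues:
        if name not in HYDROPHOBIC:
            continue  # only hydrophobic residues are labelled
        rsa = asa / max_asa[name]  # relative solvent accessibility
        labels[res_id] = "exposed" if rsa >= cutoff else "buried"
    return labels


# Placeholder reference values, for illustration only.
example_max = {"LEU": 200.0, "VAL": 160.0}
print(label_hydrophobic([("A10", "LEU", 100.0), ("A11", "VAL", 20.0),
                         ("A12", "SER", 90.0)], example_max))
# {'A10': 'exposed', 'A11': 'buried'}
```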

BiRDS - Binding Residue Detection from Protein Sequences using Deep ResNets

2021. doi:10.33774/chemrxiv-2021-013gn-v2

Author(s): Vineeth Chelur, U. Deva Priyakumar

Keyword(s): Binding Site; Binding Sites; Tertiary Structure; Solvent Accessibility; Three Dimensional; Dimensional Structure; Relative Solvent Accessibility; Single Chain; Sequence Alignments; Multiple Sequence

Protein-drug interactions play important roles in many biological processes and therapeutics. Predicting the active binding site of a protein helps to discover and optimise these interactions, leading to the design of better ligand molecules. The tertiary structure of a protein determines the binding sites available to a drug molecule, so predicting the binding site quickly and accurately from sequence alone, without the three-dimensional structure, is challenging. Deep learning has been used in a variety of biochemical tasks and has been hugely successful. In this paper, a residual neural network (which uses skip connections) is implemented to predict a protein's most active binding site. The network is trained on sc-PDB, an annotated database of druggable binding sites from the Protein Data Bank. Features derived from multiple sequence alignments (MSAs) of the protein generated with DeepMSA, such as the position-specific scoring matrix (PSSM), three-state secondary structure (SS3) and relative solvent accessibility (RSA), are provided as input to the network. A weighted binary cross-entropy loss function is used to counter the substantial imbalance between the two classes of binding and non-binding residues. The network performs very well on single-chain proteins, providing a pocket that has good interactions with a ligand.
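
A weighted binary cross-entropy counters class imbalance by scaling the loss on the rare class. The sketch below shows one common form, in which only positive (binding) residues receive a weight, here set from the ratio of negatives to positives. The abstract does not state the weighting actually used in BiRDS, so the function names and this weighting scheme are assumptions.

```python
import math
from typing import Sequence


def balanced_pos_weight(labels: Sequence[int]) -> float:
    """Ratio of negative to positive labels (assumed weighting scheme)."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive (binding) residue")
    return (len(labels) - n_pos) / n_pos


def weighted_bce(probs: Sequence[float], labels: Sequence[int],
                 pos_weight: float, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy with positive examples scaled by pos_weight.

    probs:  predicted probability that each residue is a binding residue
    labels: 1 for binding, 0 for non-binding
    """
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and equally long")
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


labels = [1, 0, 0, 0]  # one binding residue among four
w = balanced_pos_weight(labels)  # 3.0
print(weighted_bce([0.5, 0.5, 0.5, 0.5], labels, w))
```

With `pos_weight` equal to the negative-to-positive ratio, the single binding residue contributes as much to the total loss as the three non-binding residues combined.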

Streamlined use of protein structures in variant analysis

2021. doi:10.1101/2021.09.10.459756

Author(s): Sandeep Kaur, Neblina Sikta, Andrea Schafferhans, Nicola Bordin, Mark J. Cowley, ...

Keyword(s): Protein Function; Molecular Mechanisms; Structural Information; Protein Structures; Structural Data; Supplementary Information; 3D Structures; Link Type; Variant Analysis; Many Sources

Motivation: Variant analysis is a core task in bioinformatics that requires integrating data from many sources. 3D protein structures can help in this process by providing a spatial context that offers insight into how variants affect function. Many available tools can help with mapping variants onto structures, but each has specific restrictions, with the result that many researchers fail to benefit from valuable insights that could be gained from structural data.

Results: To address this, we have created a streamlined system for incorporating 3D structures into variant analysis. Variants can be specified via URLs that are easy to read and write and that use the notation recommended by the Human Genome Variation Society (HGVS). For example, ‘https://aquaria.app/SARS-CoV-2/S/?N501Y’ specifies the N501Y variant of the SARS-CoV-2 S protein. In addition to mapping variants onto structures, our system provides summary information from multiple external resources, including COSMIC, CATH-FunVar and PredictProtein. Furthermore, our system identifies and summarizes structures containing either the variant or the variant position. Our system supports essentially any mutation for any well-studied protein and uses all available structural data, including models inferred via very remote homology, in a system that is fast and simple to use. By giving researchers easy, streamlined access to a wealth of structural information during variant analysis, our system will help reveal novel insights into the molecular mechanisms underlying protein function in health and disease.

Availability: Our resource is freely available at the project home page (https://aquaria.app). After peer review, the code will be openly available under a GPL version 2 license at https://github.com/ODonoghueLab/Aquaria. PSSH2, the database of sequence-to-structure alignments, is also freely available for download at https://zenodo.org/record/…

Supplementary information: None.
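
The example URL encodes the organism and protein in the path and the variant in the query string. Assuming that layout, which is inferred from the single example above rather than from Aquaria's documentation, a minimal Python parser for single amino-acid substitutions in one-letter short form could look like this. `parse_variant_url` and `Variant` are hypothetical names, and fuller HGVS protein syntax (three-letter codes such as p.Asn501Tyr, deletions, insertions) is not handled.

```python
import re
from typing import NamedTuple
from urllib.parse import unquote, urlsplit


class Variant(NamedTuple):
    organism: str
    protein: str
    ref: str       # wild-type residue (one-letter code)
    position: int  # residue number in the protein sequence
    alt: str       # substituted residue (one-letter code)


# One-letter substitution such as N501Y (20 standard amino acids only).
_SUBSTITUTION = re.compile(r"([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])")


def parse_variant_url(url: str) -> Variant:
    """Split a URL of the assumed form https://host/<organism>/<protein>/?<variant>."""
    parts = urlsplit(url)
    segments = [unquote(s) for s in parts.path.split("/") if s]
    if len(segments) != 2:
        raise ValueError(f"expected /<organism>/<protein>/ in path: {parts.path!r}")
    match = _SUBSTITUTION.fullmatch(parts.query)
    if match is None:
        raise ValueError(f"unrecognised variant: {parts.query!r}")
    ref, pos, alt = match.groups()
    return Variant(segments[0], segments[1], ref, int(pos), alt)


print(parse_variant_url("https://aquaria.app/SARS-CoV-2/S/?N501Y"))
# Variant(organism='SARS-CoV-2', protein='S', ref='N', position=501, alt='Y')
```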

The impact of protein architecture on adaptive evolution

2019. doi:10.1101/560185

Author(s): Ana Filipa Moutinho, Fernanda Fontes Trancoso, Julien Yann Dutheil

Keyword(s): Amino Acid; Adaptive Evolution; Protein Function; Population Genomics; Solvent Accessibility; Protein Biosynthesis; Relative Solvent Accessibility; Adaptive Mutations; Protein Architecture; The Impact

Adaptive mutations play an important role in molecular evolution. However, the frequency and nature of these mutations at the intra-molecular level are poorly understood. To address this, we analysed the impact of protein architecture on the rate of adaptive substitutions, aiming to understand how protein biophysics influences fitness and adaptation. Using Drosophila melanogaster and Arabidopsis thaliana population genomics data, we fitted models of the distribution of fitness effects and estimated the rate of adaptive amino-acid substitutions at both the protein and the amino-acid residue level. We performed a comprehensive analysis covering genome, gene and protein structure, by exploring a multitude of factors with a plausible impact on the rate of adaptive evolution, such as intron number, protein length, secondary structure, relative solvent accessibility, intrinsic protein disorder, chaperone affinity, gene expression, protein function and protein-protein interactions. We found that the relative solvent accessibility is a major driver of adaptive evolution, with most adaptive mutations occurring at the surface of proteins. Moreover, we observe that the rate of adaptive substitutions differs between protein functional classes, with genes encoding for protein biosynthesis and degradation signalling exhibiting the fastest rates of protein adaptation. Overall, our results suggest that adaptive evolution in proteins is mainly driven by inter-molecular interactions, with host-pathogen coevolution likely playing a major role.

The Impact of Protein Architecture on Adaptive Evolution

Molecular Biology and Evolution, 2019, Vol 36 (9), pp. 2013-2028. doi:10.1093/molbev/msz134

Author(s): Ana Filipa Moutinho, Fernanda Fontes Trancoso, Julien Yann Dutheil

Keyword(s): Amino Acid; Adaptive Evolution; Protein Function; Population Genomics; Solvent Accessibility; Protein Biosynthesis; Relative Solvent Accessibility; Adaptive Mutations; Protein Architecture; The Impact

Adaptive mutations play an important role in molecular evolution. However, the frequency and nature of these mutations at the intramolecular level are poorly understood. To address this, we analyzed the impact of protein architecture on the rate of adaptive substitutions, aiming to understand how protein biophysics influences fitness and adaptation. Using Drosophila melanogaster and Arabidopsis thaliana population genomics data, we fitted models of the distribution of fitness effects and estimated the rate of adaptive amino-acid substitutions at both the protein and the amino-acid residue level. We performed a comprehensive analysis covering genome, gene, and protein structure, by exploring a multitude of factors with a plausible impact on the rate of adaptive evolution, such as intron number, protein length, secondary structure, relative solvent accessibility, intrinsic protein disorder, chaperone affinity, gene expression, protein function, and protein–protein interactions. We found that the relative solvent accessibility is a major determinant of adaptive evolution, with most adaptive mutations occurring at the surface of proteins. Moreover, we observe that the rate of adaptive substitutions differs between protein functional classes, with genes encoding for protein biosynthesis and degradation signaling exhibiting the fastest rates of protein adaptation. Overall, our results suggest that adaptive evolution in proteins is mainly driven by intermolecular interactions, with host–pathogen coevolution likely playing a major role.
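
The study estimates the rate of adaptive substitutions by fitting models of the distribution of fitness effects, which is more involved than can be shown here. As a simpler point of reference, the classic McDonald-Kreitman-based estimator of the proportion of adaptive non-synonymous substitutions, α = 1 - (Ds·Pn)/(Dn·Ps), can be computed from divergence (Dn, Ds) and polymorphism (Pn, Ps) counts. The sketch below is that textbook estimator, not the authors' method, and the counts in the usage lines are hypothetical; it ignores slightly deleterious polymorphisms, which is one reason DFE-based approaches are preferred.

```python
def mk_alpha(dn: int, ds: int, pn: int, ps: int) -> float:
    """Proportion of adaptive non-synonymous substitutions.

    dn, ds: non-synonymous and synonymous substitutions (divergence)
    pn, ps: non-synonymous and synonymous polymorphisms
    alpha = 1 - (ds * pn) / (dn * ps)
    """
    if dn <= 0 or ps <= 0:
        raise ValueError("dn and ps must be positive")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts (dn, ds, pn, ps) for residues binned by solvent exposure.
bins = {"buried": (12, 10, 6, 10), "exposed": (20, 10, 5, 10)}
for name, counts in bins.items():
    print(name, mk_alpha(*counts))
# buried 0.5
# exposed 0.75
```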

Ancestral sequence reconstruction: accounting for structural information by averaging over replacement matrices

Bioinformatics, 2018, Vol 35 (15), pp. 2562-2568. doi:10.1093/bioinformatics/bty1031

Author(s): Asher Moshe, Tal Pupko

Keyword(s): Structural Information; Solvent Accessibility; 3D Structure; Three Dimensional; Ancestral Sequence; Supplementary Information; Ancestral Sequence Reconstruction; Ancestral Sequences; Sequence Reconstruction; And Function

Motivation: Ancestral sequence reconstruction (ASR) is widely used to understand protein evolution, structure and function. Current ASR methodologies do not fully consider differences in evolutionary constraints among positions imposed by the three-dimensional (3D) structure of the protein.

Results: Here, we developed an ASR algorithm that allows different protein sites to evolve according to different mixtures of replacement matrices. We show that assigning replacement matrices to protein positions based on their solvent accessibility leads to ASR with higher log-likelihoods compared to naïve models that assume a single replacement matrix for all sites. Improved ASR log-likelihoods are also demonstrated when solvent accessibility is predicted from protein sequences rather than inferred from a known 3D structure. Finally, we show that using such structure-aware mixture models results in substantial differences in the inferred ancestral sequences.

Availability and implementation: http://fastml.tau.ac.il.

Supplementary information: Supplementary data are available at Bioinformatics online.
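
Under a mixture of replacement matrices, the likelihood of a site is a weighted sum of its likelihoods under each matrix, and the per-site weights can depend on solvent accessibility. The sketch below shows only that combination step, computed stably in log space. The per-matrix site log-likelihoods (which would come from Felsenstein's pruning algorithm on the tree) are taken as given, and the two-matrix setup, the class names and the weights are hypothetical, not those used in FastML.

```python
import math
from typing import Dict, Sequence


def mixture_log_likelihood(log_liks: Sequence[float],
                           weights: Sequence[float]) -> float:
    """log(sum_k w_k * L_k) from the log L_k, using the log-sum-exp trick."""
    if len(log_liks) != len(weights):
        raise ValueError("one weight per replacement matrix is required")
    terms = [math.log(w) + ll for w, ll in zip(weights, log_liks) if w > 0]
    if not terms:
        raise ValueError("at least one weight must be positive")
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


# Hypothetical mixture weights over two matrices, chosen by exposure class.
WEIGHTS = {"buried": (0.8, 0.2), "exposed": (0.2, 0.8)}


def alignment_log_likelihood(site_log_liks: Sequence[Sequence[float]],
                             site_classes: Sequence[str],
                             weights: Dict[str, Sequence[float]] = WEIGHTS) -> float:
    """Sum of per-site mixture log-likelihoods (sites assumed independent)."""
    return sum(mixture_log_likelihood(ll, weights[c])
               for ll, c in zip(site_log_liks, site_classes))
```

Working in log space avoids underflow, since per-site likelihoods on large trees can be far smaller than the smallest representable float.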

Solvent Accessibility of Residues Undergoing Pathogenic Variations in Humans: From Protein Structures to Protein Sequences

Frontiers in Molecular Biosciences, 2021, Vol 7. doi:10.3389/fmolb.2020.626363

Author(s): Castrense Savojardo, Matteo Manfredi, Pier Luigi Martelli, Rita Casadio

Keyword(s): Solvent Accessibility; Protein Structures; Three Dimensional; Protein Sequences; Large Data; Human Protein; Dimensional Structure; Wild Type; Solvent Exposure; Data Set

Solvent accessibility, measured as solvent-accessible surface area (SASA), is a key feature of proteins for determining their folding and stability. SASA is computed from protein structures with different algorithms, and from protein sequences with machine-learning-based approaches trained on solved structures. Here we ask to what extent the solvent exposure of residues can be associated with the pathogenicity of a variation. In this way, the SASA of the wild-type residue acquires a role in the functional annotation of protein single-residue variations (SRVs). By mapping variations onto a curated database of human protein structures, we found that residues targeted by disease-related SRVs are less accessible to solvent than residues involved in polymorphisms. The disease association is not evenly distributed among the different residue types: SRVs targeting glycine, tryptophan, tyrosine and cysteine are more frequently disease-associated than others. For all residues, the proportion of disease-related SRVs largely increases when the wild-type residue is buried and decreases when it is exposed. The extent of the increase depends on the residue type. With the aid of an in-house predictor, based on a deep learning procedure and performing at the state of the art, we confirm this tendency by analyzing a large data set of residues subjected to variations and occurring in some 12,494 human protein sequences that still lack a three-dimensional structure (derived from HUMSAVAR). Our data support the notion that surface accessibility is a distinguishing property of residues that undergo variation, and that pathogenicity is more frequently associated with buried than with exposed residues.
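
The central comparison here, the proportion of disease-related SRVs among buried versus exposed wild-type residues, is a stratified count once each variant has a relative solvent accessibility and a label. The sketch below assumes RSA values in [0, 1] and a 20% burial cutoff; the cutoff, the function name and the toy data are assumptions, not the settings or data used by the authors.

```python
import math
from typing import Iterable, Tuple


def disease_fraction_by_burial(variants: Iterable[Tuple[float, bool]],
                               cutoff: float = 0.2) -> Tuple[float, float]:
    """Fraction of disease-related variants among buried and exposed sites.

    variants: (RSA of the wild-type residue in [0, 1], is_disease_related)
    cutoff:   RSA below which a wild-type residue counts as buried (assumed)
    Returns (fraction among buried, fraction among exposed); NaN if a class is empty.
    """
    counts = {"buried": [0, 0], "exposed": [0, 0]}  # [disease-related, total]
    for rsa, is_disease in variants:
        key = "buried" if rsa < cutoff else "exposed"
        counts[key][1] += 1
        counts[key][0] += int(is_disease)

    def fraction(key: str) -> float:
        disease, total = counts[key]
        return disease / total if total else math.nan

    return fraction("buried"), fraction("exposed")


toy = [(0.05, True), (0.10, True), (0.15, False), (0.50, False),
       (0.60, True), (0.90, False), (0.70, False)]  # hypothetical variants
print(disease_fraction_by_burial(toy))  # (0.6666666666666666, 0.25)
```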

A current assessment of photosystem II structure

Bioscience Reports, 1996, Vol 16 (2), pp. 159-187. doi:10.1007/bf01206204

Author(s): William V. Nicholson, Robert C. Ford, Andreas Holzenburg

Keyword(s): Photosystem II; Structural Information; Three Dimensional; Structural Data; X-Ray Diffraction; Membrane Protein Complex; Fast Absorption; Current Assessment; Electron Microscopical; A Current

This review covers the recent progress in elucidating the structure of photosystem II (PSII). Because much of the structural information for this membrane protein complex has been revealed by electron microscopy (EM), the review also considers the specific technical and interpretation problems that arise with EM where they are of particular relevance to the structural data. Most recent reviews of PSII structure have either concentrated on molecular studies of the PSII genes and the likely roles of the subunits they encode, or been mainly concerned with biophysical data and fast absorption spectroscopy, largely relating to electron transfer in various purified PSII preparations. In this review, we focus on approaches to the three-dimensional architecture of the complex and of the lipid bilayer in which it is located (the thylakoid membrane), with special emphasis on electron microscopical studies of PSII-containing thylakoid membranes. There are a few reports of 3D crystals of PSII and of associated X-ray diffraction measurements. Although little structural information has so far been obtained from such studies, because 3D crystals of sufficient quality are lacking, the prospects for such studies are also assessed.

Protein Structure Determination in Living Cells

International Journal of Molecular Sciences, 2019, Vol 20 (10), pp. 2442. doi:10.3390/ijms20102442

Author(s): Teppei Ikeya, Peter Güntert, Yutaka Ito

Keyword(s): Protein Structure; Structure Determination; Structure Prediction; Structural Information; Nuclear Overhauser Effect; Protein Structures; Three Dimensional; Structural Data; Sample Tube; In Cells

To date, in-cell NMR has elucidated various aspects of protein behaviour by associating structures in physiological conditions. However, most current studies using this method have deduced protein states in cells exclusively from ‘indirect’ structural information, such as peak patterns and chemical shift changes, rather than from ‘direct’ data that explicitly include interatomic distances and angles. To fully understand the functions and physical properties of proteins inside cells, it is essential to obtain explicit structural data or to determine the three-dimensional (3D) structures of proteins in cells. Although the short lifetime of cells in a sample tube, low sample concentrations and massive background signals make it difficult to observe NMR signals from proteins inside cells, several methodological advances help to overcome these problems. Paramagnetic effects have outstanding potential for in-cell structural analysis. Combining a limited amount of experimental in-cell data with software for ab initio protein structure prediction opens an avenue to visualise 3D protein structures inside cells. Conventional nuclear Overhauser effect spectroscopy (NOESY)-based structure determination is advantageous for elucidating the conformations of side-chain atoms as well as global structures. In this article, we review current progress in the structural analysis of proteins in living systems and discuss the feasibility of future work in this area.

Susceptibility of protein therapeutics to spontaneous chemical modifications by oxidation, cyclization, and elimination reactions

Amino Acids, 2019, Vol 51 (10-12), pp. 1409-1431. doi:10.1007/s00726-019-02787-2

Author(s): Luigi Grassi, Chiara Cabrele

Keyword(s): Small Molecules; Three Dimensional; Drug Market; Structure And Function; Dimensional Structure; Chemical Modifications; Small Molecule Drugs; And Function; The Impact

Peptides and proteins are increasingly prominent in the drug market, as shown by the growing number of biopharmaceuticals already approved or under development. Biomolecules such as recombinant monoclonal antibodies have high therapeutic efficacy and offer a valuable alternative to small-molecule drugs. However, because of their complex three-dimensional structure and the presence of many functional groups, spontaneous conformational and chemical changes occur much more frequently in peptides and proteins than in small molecules. The characterization of biotherapeutics with modern and sophisticated analytical methods has revealed the presence of contaminants that mainly arise from oxidation- and elimination-prone amino-acid side chains. This review focuses on protein chemical modifications that may take place during storage due to (1) oxidation (methionine, cysteine, histidine, tyrosine, tryptophan and phenylalanine), (2) intra- and inter-residue cyclization (aspartic and glutamic acid, asparagine, glutamine, N-terminal dipeptidyl motifs) and (3) β-elimination reactions (serine, threonine, cysteine, cystine). It also includes some examples of the impact of such modifications on protein structure and function.
