How do I get the most out of my protein sequence using bioinformatics tools?

Biochemical and biophysical experiments are essential for uncovering the three-dimensional structure and biological role of a protein of interest. However, meaningful predictions can frequently also be made using bioinformatics resources that transfer knowledge from a well-studied protein to an uncharacterized protein based on their evolutionary relatedness. These predictions are helpful in developing specific hypotheses to guide wet-laboratory experiments. Commonly used bioinformatics resources include methods to identify and predict conserved sequence motifs, protein domains, transmembrane segments, signal sequences, and secondary as well as tertiary structure. Here, several such methods available through the MPI Bioinformatics Toolkit (https://toolkit.tuebingen.mpg.de) are described, and their combined use is shown to provide meaningful information on a protein of unknown function. The focus is on the identification of homologs of known structure using HHpred, internal repeats using HHrepID, coiled coils using PCOILS and DeepCoil, and transmembrane segments using Quick2D.
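As a rough illustration of sequence-based transmembrane segment detection, the sketch below computes a sliding-window Kyte-Doolittle hydropathy profile. This is a classic heuristic swapped in for illustration; it is not the method used by Quick2D or any other tool named above. The function names, the 19-residue window, the 1.6 cutoff, and the toy sequence are assumptions chosen for the example.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence.

    Only the 20 standard residues are supported; any other letter
    raises KeyError.
    """
    values = [KD[aa] for aa in seq.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def candidate_tm_windows(seq, window=19, threshold=1.6):
    """0-based start indices of windows whose mean hydropathy exceeds threshold."""
    profile = hydropathy_profile(seq, window)
    return [i for i, h in enumerate(profile) if h > threshold]


# Hypothetical toy sequence: polar flank, 19 hydrophobic residues, polar flank.
polar_flank = "DEKRSTNQ" * 3
hydrophobic_core = "LIVAFLLIVGALLVFIALL"
toy = polar_flank + hydrophobic_core + polar_flank
print(candidate_tm_windows(toy))
```

Windows flagged this way are only candidates; dedicated predictors and homology-based evidence would be needed before drawing conclusions about topology.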

The first crystal structure of the peptidase domain of the U32 peptidase family

Acta Crystallographica Section D Biological Crystallography ◽

10.1107/s1399004715019549 ◽

2015 ◽

Vol 71 (12) ◽

pp. 2505-2512 ◽

Cited By ~ 4

Author(s):

Magdalena Schacherl ◽

Angelika A. M. Montada ◽

Elena Brunstein ◽

Ulrich Baumann

Keyword(s):

Crystal Structure ◽

Catalytic Mechanism ◽

Quaternary Structure ◽

Catalytic Domain ◽

Three Dimensional ◽

Zinc Ion ◽

Crystal Structure Analysis ◽

Dimensional Structure ◽

Sequence Motifs ◽

Conserved Sequence

The U32 family is a collection of over 2500 annotated peptidases in the MEROPS database with unknown catalytic mechanism. They mainly occur in bacteria and archaea, but a few representatives have also been identified in eukarya. Many of the U32 members have been linked to pathogenicity, such as proteins from Helicobacter and Salmonella. The first crystal structure analysis of a U32 catalytic domain from Methanopyrus kandleri (gene mk0906) reveals a modified (βα)8 TIM-barrel fold with some unique features. The connecting segment between strands β7 and β8 is extended and helix α7 is located on top of the C-terminal end of the barrel body. The protein exhibits a dimeric quaternary structure in which a zinc ion is symmetrically bound by histidine and cysteine side chains from both monomers. These residues reside in conserved sequence motifs. No typical proteolytic motifs are discernible in the three-dimensional structure, and biochemical assays failed to demonstrate proteolytic activity. A tunnel in which an acetate ion is bound is located in the C-terminal part of the β-barrel, and two hydrophobic grooves lead to this tunnel. One of the grooves binds to a Strep-Tag II of another dimer in the crystal lattice. Thus, these grooves may be binding sites for hydrophobic peptides or other ligands.

Protein Folding: Search for Basic Physical Models

The Scientific World JOURNAL ◽

10.1100/tsw.2003.50 ◽

2003 ◽

Vol 3 ◽

pp. 623-635 ◽

Cited By ~ 3

Author(s):

Ivan Y. Torshin ◽

Robert W. Harrison

Keyword(s):

Protein Folding ◽

Tertiary Structure ◽

Three Dimensional ◽

Physical Models ◽

Dimensional Structure ◽

Computational Techniques ◽

Linear Sequence ◽

Physical Forces

How a unique three-dimensional structure is rapidly formed from the linear sequence of a polypeptide is one of the important questions in contemporary science. Apart from the biological context of in vivo protein folding (which has been studied only for a few proteins), the roles of the fundamental physical forces in in vitro folding remain largely unstudied. Despite a degree of success in using descriptions based on statistical and/or thermodynamic approaches, few of the current models explicitly include more basic physical forces (such as electrostatics and van der Waals forces). Moreover, present-day models rarely take into account that protein folding is, essentially, a rapid process that produces a highly specific architecture. This review considers several physical models that may provide more direct links between sequence and tertiary structure in terms of the physical forces. In particular, elaboration of such simple models is likely to produce extremely effective computational techniques with value for modern genomics.

Construction of A Preliminary Three-Dimensional Structure Simian betaretrovirus Serotype-2 (SRV-2) Reverse Transcriptase Isolated from Indonesian Cynomolgus Monkey

Tropical Life Sciences Research ◽

10.21315/tlsr2020.31.3.4 ◽

2020 ◽

Vol 31 (3) ◽

pp. 47-61

Author(s):

Uus Saepuloh ◽

Diah Iskandriati ◽

Joko Pamungkas ◽

Dedy Duryadi Solihin ◽

Sela Septima Mariya ◽

...

Keyword(s):

Reverse Transcriptase ◽

Cynomolgus Monkey ◽

Tertiary Structure ◽

Vaccine Development ◽

Three Dimensional ◽

Dimensional Structure ◽

Nucleotide Position ◽

Structure Model ◽

Three Dimensional Structure ◽

Hiv 1

Simian betaretrovirus serotype-2 (SRV-2) is an important pathogenic agent in Asian macaques. It is a potential confounding variable in biomedical research. SRV-2 also provides a valuable viral model, alongside other retroviruses, for understanding many aspects of retroviral-host interactions and immunosuppression, infection mechanism, retroviral structure, and antiretroviral and vaccine development. In this study, we isolated the gene encoding the reverse transcriptase enzyme (RT) of SRV-2 that infected an Indonesian cynomolgus monkey (Mf ET1006) and predicted a three-dimensional structure model using the iterative threading assembly refinement (I-TASSER) computational programme. This SRV-2 RT Mf ET1006 consisted of 547 amino acids at nucleotide position 3284–4925 of the SRV-2 whole genome. The polymerase active site is located in the finger/palm subdomain, is characterised by three conserved catalytic aspartates (Asp90, Asp165, Asp166), and has a highly conserved YMDD motif formed by Tyr163, Met164, Asp165 and Asp166. The estimated accuracy of this SRV-2 RT Mf ET1006 structure is a template modelling score (TM-score) of 0.90 ± 0.06 and a root mean square deviation (RMSD) of 4.7 ± 3.1 Å, indicating that this model can be trusted; the accuracy can also be seen from the appearance of the protein fold in the tertiary structure. Superpositions of SRV-2 RT Mf ET1006 and Human Immunodeficiency Virus-1 (HIV-1) RT were performed to predict the structure in detail and to optimise the best fits for illustrations. This SRV-2 RT Mf ET1006 structure model has the highest homology to HIV-1 RT (2B6A.pdb), with estimated accuracy at TM-score 0.911, RMSD 1.85 Å, and coverage of 0.953. This preliminary study of SRV-2 RT Mf ET1006 structure modelling is intriguing and provides some information for exploring the molecular characteristics and biochemical mechanism of this enzyme.
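The two accuracy measures quoted above can be illustrated with a minimal sketch. It assumes the two structures are already superposed and aligned residue by residue, so it evaluates a single fixed superposition rather than searching for the optimal one as TM-score programs do. The function names and toy coordinates are hypothetical, and the 0.5 Å floor on d0 is an assumption following common practice.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equal-length lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def tm_score_fixed(coords_a, coords_b, target_length=None):
    """TM-score of one fixed superposition, normalised by target_length.

    Uses d0 = 1.24 * (L - 15)^(1/3) - 1.8, with a floor of 0.5 (an
    assumption made here to keep short chains well defined).
    """
    n = target_length if target_length is not None else len(coords_a)
    d0 = 0.5
    if n > 15:
        d0 = max(0.5, 1.24 * (n - 15) ** (1.0 / 3.0) - 1.8)
    total = sum(
        1.0 / (1.0 + (math.dist(a, b) / d0) ** 2)
        for a, b in zip(coords_a, coords_b)
    )
    return total / n


# Hypothetical toy data: 100 points on a line, and a copy shifted by 3 units in x.
model = [(float(i), 0.0, 0.0) for i in range(100)]
shifted = [(x + 3.0, y, z) for x, y, z in model]
print(rmsd(model, shifted), tm_score_fixed(model, shifted))
```

Because RMSD averages squared distances, a few badly placed residues can dominate it, whereas each residue contributes at most 1/L to the TM-score; this is one reason the two numbers are usually reported together.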

Selection of sequence motifs and generative Hopfield-Potts models for protein families

10.1101/652784 ◽

2019 ◽

Cited By ~ 1

Author(s):

Kai Shimagaki ◽

Martin Weigt

Keyword(s):

Amino Acid ◽

Ad Hoc ◽

Three Dimensional ◽

Generative Models ◽

Amino Acid Sequences ◽

Dimensional Structure ◽

Sequence Motifs ◽

Restricted Boltzmann Machines ◽

Potts Models ◽

Maximum Entropy Models

Statistical models for families of evolutionarily related proteins have recently gained interest: in particular pairwise Potts models, such as those inferred by the Direct-Coupling Analysis, have been able to extract information about the three-dimensional structure of folded proteins, and about the effect of amino-acid substitutions in proteins. These models are typically required to reproduce the one- and two-point statistics of the amino-acid usage in a protein family, i.e. to capture the so-called residue conservation and covariation statistics of proteins of common evolutionary origin. Pairwise Potts models are the maximum-entropy models achieving this. While successful, these models depend on huge numbers of ad hoc introduced parameters, which have to be estimated from a finite amount of data and whose biophysical interpretation remains unclear. Here we propose an approach to parameter reduction, which is based on selecting collective sequence motifs. It naturally leads to the formulation of statistical sequence models in terms of Hopfield-Potts models. These models can be accurately inferred using a mapping to restricted Boltzmann machines and persistent contrastive divergence. We show that, when applied to protein data, even 20-40 patterns are sufficient to obtain statistically close-to-generative models. The Hopfield patterns form interpretable sequence motifs and may be used to cluster amino-acid sequences into functional sub-families. However, the distributed collective nature of these motifs intrinsically limits the ability of Hopfield-Potts models to predict contact maps, showing the necessity of developing models going beyond the Hopfield-Potts models discussed here.
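The one- and two-point statistics mentioned above can be made concrete with a small sketch that computes column frequencies f_i(a), pair frequencies f_ij(a, b), and the connected correlation f_ij(a, b) - f_i(a) f_j(b) from a multiple sequence alignment. It is a bare illustration: sequence reweighting and pseudocounts, which are commonly used in practice, are omitted, and the function names and toy alignment are hypothetical.

```python
from collections import Counter
from itertools import combinations


def single_site_frequencies(msa):
    """f_i(a): fraction of sequences carrying residue a in column i."""
    n = len(msa)
    length = len(msa[0])
    return [
        {a: count / n for a, count in Counter(seq[i] for seq in msa).items()}
        for i in range(length)
    ]


def pair_frequencies(msa):
    """f_ij(a, b) for every column pair i < j."""
    n = len(msa)
    length = len(msa[0])
    return {
        (i, j): {
            ab: count / n
            for ab, count in Counter((seq[i], seq[j]) for seq in msa).items()
        }
        for i, j in combinations(range(length), 2)
    }


def connected_correlation(msa, i, j, a, b):
    """c_ij(a, b) = f_ij(a, b) - f_i(a) * f_j(b), for i < j."""
    f1 = single_site_frequencies(msa)
    f2 = pair_frequencies(msa)
    return f2[(i, j)].get((a, b), 0.0) - f1[i].get(a, 0.0) * f1[j].get(b, 0.0)


# Hypothetical toy alignment: columns 0 and 1 covary, column 2 is fully conserved.
toy_msa = ["AKL", "AKL", "GEL", "GEL"]
print(single_site_frequencies(toy_msa))
print(connected_correlation(toy_msa, 0, 1, "A", "K"))
```

In the toy alignment the conserved column carries no covariation signal, while the first two columns do; a pairwise maximum-entropy model is fitted so that its marginals match exactly these frequencies.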

Prediction of Structural and Functional Aspects of Protein

Advances in Secure Computing, Internet Services, and Applications - Advances in Information Security, Privacy, and Ethics ◽

10.4018/978-1-4666-4940-8.ch016 ◽

2014 ◽

pp. 317-333

Author(s):

Arun G. Ingale

Keyword(s):

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Tertiary Structure ◽

Protein Structures ◽

Three Dimensional ◽

Dimensional Structure ◽

Sequence Information ◽

Predict Protein Structure ◽

Basic Ideas

Predicting the structure of a protein from its primary amino acid sequence is computationally difficult. An investigation of the methods and algorithms used to predict protein structure and a thorough knowledge of the function and structure of proteins are critical for the advancement of biology and the life sciences as well as the development of better drugs, higher-yield crops, and even synthetic bio-fuels. To that end, this chapter sheds light on the methods used for protein structure prediction. It covers the applications of modeled protein structures and examines the relationship between pure sequence information and three-dimensional structure, which continues to be one of the greatest challenges in molecular biology. It presents a broad examination of the problems, methods, tools, servers, databases, and applications of protein structure prediction, giving insight into the future applications of modeled protein structures. Current protein structure prediction methods are reviewed, covering background on structure prediction, the prediction of structural fundamentals, tertiary structure prediction, and functional insights. The basic ideas and advances of these directions are discussed in detail.

An Uncharacterized Member of the Ribokinase Family in Thermococcus kodakarensis Exhibits myo-Inositol Kinase Activity

Journal of Biological Chemistry ◽

10.1074/jbc.m113.457259 ◽

2013 ◽

Vol 288 (29) ◽

pp. 20856-20867 ◽

Cited By ~ 7

Author(s):

Takaaki Sato ◽

Masahiro Fujihashi ◽

Yukika Miyamoto ◽

Keiko Kuwata ◽

Eriko Kusaka ◽

...

Keyword(s):

Kinase Activity ◽

Three Dimensional ◽

Fold Increase ◽

Adenosine Kinase ◽

Dimensional Structure ◽

Uncharacterized Protein ◽

Thermococcus Kodakarensis ◽

Sugar Phosphates ◽

Biochemical Analyses ◽

Physiological Substrates

Here we performed structural and biochemical analyses on the TK2285 gene product, an uncharacterized protein annotated as a member of the ribokinase family, from the hyperthermophilic archaeon Thermococcus kodakarensis. The three-dimensional structure of the TK2285 protein resembled those of previously characterized members of the ribokinase family including ribokinase, adenosine kinase, and phosphofructokinase. Conserved residues characteristic of this protein family were located in a cleft of the TK2285 protein as in other members whose structures have been determined. We thus examined the kinase activity of the TK2285 protein toward various sugars recognized by well characterized ribokinase family members. Although activity with sugar phosphates and nucleosides was not detected, kinase activity was observed toward d-allose, d-lyxose, d-tagatose, d-talose, d-xylose, and d-xylulose. Kinetic analyses with the six sugar substrates revealed high Km values, suggesting that they were not the true physiological substrates. By examining activity toward amino sugars, sugar alcohols, and disaccharides, we found that the TK2285 protein exhibited prominent kinase activity toward myo-inositol. Kinetic analyses with myo-inositol revealed a greater kcat and much lower Km value than those obtained with the monosaccharides, resulting in over a 2,000-fold increase in kcat/Km values. TK2285 homologs are distributed among members of Thermococcales, and in most species, the gene is positioned close to a myo-inositol monophosphate synthase gene. Our results suggest the presence of a novel subfamily of the ribokinase family whose members are present in Archaea and recognize myo-inositol as a substrate.
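The comparison of catalytic efficiencies above is simple arithmetic, sketched here for clarity. The kinetic constants in the example are hypothetical placeholders chosen only to give a 2,000-fold ratio; they are not values from the study, and the function names are illustrative.

```python
def catalytic_efficiency(kcat, km):
    """Return kcat/Km; units follow the inputs (e.g. s^-1 and mM give s^-1 mM^-1)."""
    if km <= 0:
        raise ValueError("Km must be positive")
    return kcat / km


def fold_increase(kcat_new, km_new, kcat_ref, km_ref):
    """Ratio of catalytic efficiencies for a new substrate versus a reference."""
    return catalytic_efficiency(kcat_new, km_new) / catalytic_efficiency(kcat_ref, km_ref)


# Hypothetical constants (NOT from the paper): a tenfold higher kcat combined
# with a 200-fold lower Km yields a 2,000-fold gain in kcat/Km.
reference = {"kcat": 10.0, "km": 50.0}
candidate = {"kcat": 100.0, "km": 0.25}
print(fold_increase(candidate["kcat"], candidate["km"],
                    reference["kcat"], reference["km"]))
```

The example shows why a modest rise in kcat can coincide with a very large rise in kcat/Km: the change in Km enters the ratio multiplicatively.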

Prediction and Analysis of Surface Hydrophobic Residues in Tertiary Structure of Proteins

The Scientific World JOURNAL ◽

10.1155/2014/971258 ◽

2014 ◽

Vol 2014 ◽

pp. 1-7 ◽

Cited By ~ 19

Author(s):

Shambhu Malleshappa Gowder ◽

Jhinuk Chatterjee ◽

Tanusree Chaudhuri ◽

Kusum Paul

Keyword(s):

Tertiary Structure ◽

Structural Information ◽

Solvent Accessibility ◽

Conservation Score ◽

Protein Structures ◽

Three Dimensional ◽

Dimensional Structure ◽

Data Set ◽

Hydrophobic Residues ◽

Monomeric Proteins

The analysis of protein structures provides plenty of information about the factors governing the folding and stability of proteins, the preferred amino acids in the protein environment, the location of the residues in the interior or on the surface of a protein, and so forth. In general, hydrophobic residues such as Val, Leu, Ile, Phe, and Met tend to be buried in the interior, and polar side chains tend to be exposed to solvent. The present work depends on sequence as well as structural information of the protein and aims to understand the nature of hydrophobic residues on protein surfaces. It is based on a nonredundant data set of 218 monomeric proteins. Solvent accessibility of each protein was determined using NACCESS software, and homologous sequences were then obtained to understand how well solvent-exposed and buried hydrophobic residues are evolutionarily conserved. Confidence scores for being buried or solvent exposed were assigned to hydrophobic residues based on the conservation score and knowledge of the flanking regions of hydrophobic residues. In the absence of a three-dimensional structure, the ability to predict surface accessibility of hydrophobic residues directly from the sequence is of great help in choosing the sites of chemical modification or specific mutations and in studies of protein stability and molecular interactions.
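The first step described above, separating buried from solvent-exposed hydrophobic residues, might be sketched as follows. The sketch assumes per-residue relative accessibility values (in percent) are already available, for example from an accessibility program; the 25% cutoff, the function name, and the toy input are assumptions, not parameters reported in the study.

```python
# The five hydrophobic residues named in the text: Val, Leu, Ile, Phe, Met.
HYDROPHOBIC = set("VLIFM")


def classify_hydrophobic(seq, rel_accessibility, exposed_cutoff=25.0):
    """Label each hydrophobic residue as 'exposed' or 'buried'.

    rel_accessibility holds one relative solvent accessibility value (%)
    per residue. The cutoff is a hypothetical choice for illustration.
    Returns a list of (1-based position, residue, label) tuples.
    """
    if len(seq) != len(rel_accessibility):
        raise ValueError("sequence and accessibility list differ in length")
    labels = []
    for position, (residue, rsa) in enumerate(zip(seq, rel_accessibility), start=1):
        if residue in HYDROPHOBIC:
            state = "exposed" if rsa >= exposed_cutoff else "buried"
            labels.append((position, residue, state))
    return labels


# Hypothetical toy input: four residues with made-up accessibility values.
print(classify_hydrophobic("MKVL", [60.0, 80.0, 5.0, 25.0]))
```

Conservation scoring across homologs would then be applied to each labelled position; that step is not shown here.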

Single particle cryo-EM structure of the outer hair cell motor protein prestin

10.1101/2021.08.03.454998 ◽

2021 ◽

Author(s):

Carmen Butan ◽

Qiang Song ◽

Jun-ping Bai ◽

Winston Tan ◽

Dhasakumar S Navaratnam ◽

...

Keyword(s):

Hair Cell ◽

Binding Site ◽

Outer Hair Cell ◽

Three Dimensional ◽

Structural Features ◽

Meriones Unguiculatus ◽

Dimensional Structure ◽

Transmembrane Segments ◽

Anion Transporter ◽

Mechanistic Basis

The mammalian outer hair cell (OHC) protein prestin (Slc26a5), a member of the solute carrier 26 (Slc26) family of membrane proteins, differs from other members of the family owing to its unique piezoelectric-like property that drives OHC electromotility. Prestin is required by OHCs for cochlear amplification, a process that enhances mammalian hearing. Despite substantial biophysical characterization, the mechanistic basis for prestin's electromechanical behavior is not fully understood. To gain insight into such behavior, we have used cryo-electron microscopy at subnanometer resolution (overall resolution of 4.0 Å) to investigate the three-dimensional structure of prestin from gerbil (Meriones unguiculatus). Our studies show that prestin dimerizes with a 3D architecture strikingly similar to the dimeric conformation observed in the Slc26a9 anion transporter in an inside open/intermediate state, which we infer, based on patch clamp recordings, to reflect the contracted state of prestin. The structure shows two well-separated transmembrane (TM) subunits and two cytoplasmic sulfate transporter and anti-sigma factor antagonist (STAS) domains forming a swapped dimer. The dimerization interface is defined by interactions between the domain-swapped STAS dimer and the transmembrane domains of the opposing half unit, further strengthened by an antiparallel beta strand at its N terminus. The structure also shows that each of its two transmembrane subunits consists of 14 transmembrane segments organized in two inverted 7-segment repeats with a topology that was first observed in the structure of the bacterial symporter UraA (Lu F, et al., Nature 472, 2011). Finally, the structural features of the solved anion binding site of prestin are quite similar to those of SLC26a9 and other family members.
Despite this similarity, we find that SLC26a9 lacks the characteristic displacement currents (or nonlinear capacitance (NLC)) found with prestin, and we show that mutation of prestin's Cl- binding site removes salicylate competition with anions in the face of normal NLC, thus refuting the yet accepted extrinsic voltage sensor hypothesis and any associated transport-like requirements for voltage-driven electromotility.

BiRDS - Binding Residue Detection from Protein Sequences using Deep ResNets

10.33774/chemrxiv-2021-013gn-v2 ◽

2021 ◽

Author(s):

Vineeth Chelur ◽

U. Deva Priyakumar

Keyword(s):

Binding Site ◽

Binding Sites ◽

Tertiary Structure ◽

Solvent Accessibility ◽

Three Dimensional ◽

Dimensional Structure ◽

Relative Solvent Accessibility ◽

Single Chain ◽

Sequence Alignments ◽

Multiple Sequence

Protein-drug interactions play important roles in many biological processes and therapeutics. Prediction of the active binding site of a protein helps discover and optimise these interactions, leading to the design of better ligand molecules. The tertiary structure of a protein determines the binding sites available to the drug molecule. A quick and accurate prediction of the binding site from sequence alone, without utilising the three-dimensional structure, is challenging. Deep learning has been used in a variety of biochemical tasks and has been hugely successful. In this paper, a Residual Neural Network (leveraging skip connections) is implemented to predict a protein's most active binding site. sc-PDB, an annotated database of druggable binding sites from the Protein Data Bank, is used for training the network. Features extracted from the Multiple Sequence Alignments (MSAs) of the protein generated using DeepMSA, such as Position-Specific Scoring Matrix (PSSM), Secondary Structure (SS3), and Relative Solvent Accessibility (RSA), are provided as input to the network. A weighted binary cross-entropy loss function is used to counter the substantial imbalance in the two classes of binding and non-binding residues. The network performs very well on single-chain proteins, providing a pocket that has good interactions with a ligand.
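A weighted binary cross-entropy of the kind mentioned above can be sketched in a few lines. The abstract does not state how the weight is chosen, so the negative-to-positive ratio used below is an assumption, as are the function names and the toy labels; a deep learning framework would normally supply this loss in practice.

```python
import math


def weighted_bce(y_true, y_prob, pos_weight, eps=1e-7):
    """Mean binary cross-entropy with the positive class scaled by pos_weight."""
    total = 0.0
    for label, prob in zip(y_true, y_prob):
        prob = min(max(prob, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= pos_weight * label * math.log(prob) + (1 - label) * math.log(1.0 - prob)
    return total / len(y_true)


def balanced_pos_weight(y_true):
    """Hypothetical weighting rule: number of negatives divided by number of positives."""
    positives = sum(y_true)
    if positives == 0:
        raise ValueError("no positive labels to balance against")
    return (len(y_true) - positives) / positives


# Hypothetical toy labels: one binding residue among ten.
labels = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
uniform_guess = [0.5] * len(labels)
weight = balanced_pos_weight(labels)
print(weight, weighted_bce(labels, uniform_guess, weight))
```

With this weighting, the single binding residue contributes as much to the loss as the nine non-binding residues combined, so a model cannot reach a low loss simply by predicting "non-binding" everywhere.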

Efficient targeting to storage granules of human proinsulins with altered propeptide domain.

The Journal of Cell Biology ◽

10.1083/jcb.106.6.1843 ◽

1988 ◽

Vol 106 (6) ◽

pp. 1843-1851 ◽

Cited By ~ 44

Author(s):

S K Powell ◽

L Orci ◽

C S Craik ◽

H P Moore

Keyword(s):

High Efficiency ◽

Tertiary Structure ◽

Three Dimensional ◽

Dimensional Structure ◽

Structural Domains ◽

C Peptide ◽

Three Dimensional Structure ◽

Morphological Criteria ◽

Cellular Machinery ◽

Storage Granules

In neuronal and endocrine cells, peptide hormones are selectively segregated into storage granules, while other proteins are exported continuously without storage. Sorting of hormones by cellular machinery involves the recognition of specific structural domains on prohormone molecules. Since the propeptide of insulin is known to play an important role in its three-dimensional structure, it is reasonable to speculate that targeting of proinsulin to storage granules would require a functional connecting peptide. To test this hypothesis, we constructed two mutations in human proinsulin with different predicted structures. In one mutation, Ins delta C, the entire C peptide was deleted, resulting in an altered insulin in which the B and the A chains are joined contiguously. In the other mutation, Ins/IGF, the C peptide of proinsulin was replaced with the unrelated 12-amino acid connecting peptide of human insulin-like growth factor-I; this substitution should permit correct folding of the B and A chains to form a tertiary structure similar to that of proinsulin. By several biochemical and morphological criteria, we found that Ins/IGF is efficiently targeted to storage granules, suggesting that the C peptide of proinsulin does not contain necessary sorting information. Unexpectedly, Ins delta C, which presumably cannot fold properly, is also targeted to granules at a high efficiency. These results imply that either the targeting machinery can tolerate changes in the tertiary structure of transported proteins, or that the B and A chains of insulin can form a relatively intact three-dimensional structure even in the absence of C peptide.
