A Web Database IR-PDB for Sequence Repeats of Proteins in the Protein Data Bank

Author(s):  
Selvaraj Samuel ◽  
Mary Rajathei

Amino acid repeats play significant roles in the evolution of the structure and function of many large proteins. Analyzing internal repeats in proteins of known structure helps clarify the importance of those repeats. The IR-PDB database of sequence repeats in PDB proteins has been developed to support analysis of the impact of repeats in proteins. Using the state-of-the-art repeat detection method RADAR, internal repeats were detected in 148202 of the 285714 sequences belonging to 115031 PDB structures. The identified sequence repeats were annotated with secondary structure information in order to analyze the structural consequences and conservation of the repeats. The tertiary structure of the repeats and their functional involvement can be explored through web links to PDB, PDBsum and Pfam. IR-PDB systematically annotates the PDB proteins that contain sequence repeats, together with their structure, and the dataset can be accessed interactively through web services.

Biotechnology ◽  
2019 ◽  
pp. 1166-1176

2020 ◽  
Vol 0 (0) ◽  
Author(s):  
David Mary Rajathei ◽  
Subbiah Parthasarathy ◽  
Samuel Selvaraj

Abstract: Amino acid repeats play important roles in both the structure and the function of proteins. They are found in all kingdoms of life, especially in eukaryotes, and a large fraction of human proteins is composed of repeats. Further, abnormal expansions of shorter repeats cause various diseases in humans. Therefore, analyzing the repeats of the entire human proteome along with functional, mutational and disease information would help to better understand their roles in proteins. To fulfill this need, we developed a web database, HPREP (http://bioinfo.bdu.ac.in/hprep), for human proteome repeats using Perl and HTML programming. We identified different categories of well-characterized repeats and domain repeats present in the human proteome of UniProtKB/Swiss-Prot using in-house Perl programs, and novel repeats using the repeat detection tool T-REKS as well as the XSTREAM web server. Further, these proteins are annotated with functional, mutational and disease information and grouped according to specific repeat types. The database enables users to search by specific repeat type in order to understand the involvement of repeats in proteins. Thus, the HPREP database is expected to be a useful resource for gaining better insight into the different repeats in the human proteome and their biological roles.
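As a rough illustration of the simplest kind of repeat scan, the sketch below finds runs of a single amino acid (for example, a polyglutamine stretch) in a one-letter protein sequence. It is not the T-REKS or XSTREAM algorithm and not the authors' Perl code; the function name `find_homorepeats` and the `min_length` default are hypothetical choices made only for this example.

```python
import re


def find_homorepeats(sequence, min_length=5):
    """Return (residue, start, end) for each run of one repeated residue.

    Positions are 1-based and inclusive. `min_length` is an illustrative
    cutoff, not a value taken from the HPREP paper.
    """
    # ([A-Z]) captures one residue; \1{n,} requires at least n more copies.
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_length - 1))
    return [(m.group(1), m.start() + 1, m.end())
            for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    # Toy sequence, not a real protein.
    toy = "MA" + "Q" * 7 + "KL" + "A" * 5 + "G"
    for residue, start, end in find_homorepeats(toy):
        print(residue, start, end)
```

Dedicated tools are still needed for imperfect or longer-period repeats, which a literal pattern match like this cannot detect.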


2005 ◽  
Vol 11 (5) ◽  
pp. 535-546 ◽  
Author(s):  
Anna Kondakov ◽  
Buko Lindner

Bacterial glycolipids are complex amphiphilic molecules which are, on the one hand, of utmost importance for the organization and function of bacterial membranes and which, on the other hand, play a major role in the activation of cells of the innate and adaptive immune system of the host. Even small alterations to their chemical structure may influence the biological activity tremendously. Due to their intrinsic biological heterogeneity [number and type of fatty acids, saccharide structures and substitution with, for example, phosphate (P), 2-aminoethyl-(pyro)phosphate groups (P-Etn) or 4-amino-4-deoxyarabinose (Ara4N)], separation of the different components is a prerequisite for unequivocal chemical and nuclear magnetic resonance structural analyses. In this contribution, the structural information which can be obtained from heterogeneous samples of glycolipids by Fourier transform (FT) ion cyclotron resonance mass spectrometric methods is described. By means of recently analysed complex biological samples, the possibilities of high-resolution electrospray ionization FT-MS are demonstrated. Capillary skimmer dissociation, as well as tandem mass spectrometry (MS/MS) analysis utilizing collision-induced dissociation and infrared multiphoton dissociation, are compared and their advantages in providing structural information of diagnostic importance are discussed.


1996 ◽  
Vol 270 (4) ◽  
pp. L650-L658 ◽  
Author(s):  
M. Ikegami ◽  
T. Ueda ◽  
W. Hull ◽  
J. A. Whitsett ◽  
R. C. Mulligan ◽  
...  

Mice made granulocyte macrophage-colony stimulating factor (GM-CSF)-deficient by homologous recombination maintain normal steady-state hematopoiesis but have an alveolar accumulation of surfactant lipids and protein that is similar to pulmonary alveolar proteinosis in humans. We asked how GM-CSF deficiency alters surfactant metabolism and function in mice. Alveolar and lung tissue saturated phosphatidylcholine (Sat PC) were increased six- to eightfold in 7- to 9-wk-old GM-CSF-deficient mice relative to controls. Incorporation of radiolabeled palmitate and choline into Sat PC was higher in GM-CSF deficient mice than control mice, and no loss of labeled Sat PC occurred from the lungs of GM-CSF-deficient mice. Secretion of radiolabeled Sat PC to the alveolus was similar in GM-CSF-deficient and control mice. Labeled Sat PC and surfactant protein A (SP-A) given by tracheal instillation were cleared rapidly in control mice, but there was no measurable loss from the lungs of GM-CSF-deficient mice. The function of the surfactant from GM-CSF-deficient mice was normal when tested in preterm surfactant-deficient rabbits. GM-CSF deficiency results in a catabolic defect for Sat PC and SP-A.


2002 ◽  
Vol 66 (3) ◽  
pp. 460-485 ◽  
Author(s):  
M. Clelia Ganoza ◽  
Michael C. Kiel ◽  
Hiroyuki Aoki

Summary: Current X-ray diffraction and cryoelectron microscopic data of ribosomes of eubacteria have shed considerable light on the molecular mechanisms of translation. Structural studies of the protein factors that activate ribosomes also point to many common features in the primary sequence and tertiary structure of these proteins. The reconstitution of the complex apparatus of translation has also revealed new information important to the mechanisms. Surprisingly, the latter approach has uncovered a number of proteins whose sequence and/or structure and function are conserved in all cells, indicating that the mechanisms are indeed conserved. The possible mechanisms of a new initiation factor and two elongation factors are discussed in this context.


2014 ◽  
Vol 70 (a1) ◽  
pp. C491-C491
Author(s):  
Jürgen Haas ◽  
Alessandro Barbato ◽  
Tobias Schmidt ◽  
Steven Roth ◽  
Andrew Waterhouse ◽  
...  

Computational modeling and prediction of three-dimensional macromolecular structures and complexes from their sequence has been a long-standing goal in structural biology. Over the last two decades, a paradigm shift has occurred: starting from a large "knowledge gap" between the huge number of protein sequences and the small number of experimentally known structures, today some form of structural information, either experimental or computational, is available for the majority of amino acids encoded by common model organism genomes. Methods for structure modeling and prediction have made substantial progress over the last decades, and template-based homology modeling techniques have matured to a point where they are now routinely used to complement experimental techniques. However, computational modeling and prediction techniques often fall short in accuracy compared to high-resolution experimental structures, and it is often difficult to convey the expected accuracy and structural variability of a specific model. Retrospectively assessing the quality of blind structure prediction in comparison to experimental reference structures allows benchmarking the state of the art in structure prediction and identifying areas which need further development. The Critical Assessment of Structure Prediction (CASP) experiment has for the last 20 years assessed the progress in the field of protein structure modeling based on predictions for ca. 100 blind prediction targets per experiment, which are carefully evaluated by human experts. The "Continuous Model EvaluatiOn" (CAMEO) project aims to provide a fully automated blind assessment for prediction servers based on weekly pre-released sequences of the Protein Data Bank (PDB). CAMEO has been made possible by the development of novel scoring methods such as lDDT, which are robust against domain movements and so allow automated continuous structure comparison without human intervention.
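To make the idea of a superposition-free score concrete, here is a minimal sketch of an lDDT-style calculation restricted to C-alpha atoms. It is a simplification under stated assumptions, not the reference lDDT implementation: the published score works on all atoms and handles details such as stereochemistry checks, whereas this version assumes one coordinate per residue and identical residue ordering in model and reference. The function name `ca_lddt` is hypothetical; the 15 Å inclusion radius and the 0.5, 1, 2 and 4 Å tolerances follow the commonly cited lDDT defaults.

```python
import math


def ca_lddt(reference, model, radius=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified C-alpha-only lDDT-style score in the range [0, 1].

    `reference` and `model` are equal-length lists of (x, y, z) tuples,
    one per residue, in the same residue order (an assumption of this sketch).
    """
    if len(reference) != len(model):
        raise ValueError("reference and model must have the same length")
    preserved = 0
    total = 0
    n = len(reference)
    for i in range(n):
        for j in range(i + 1, n):
            d_ref = math.dist(reference[i], reference[j])
            if d_ref >= radius:
                continue  # only local contacts in the reference are scored
            diff = abs(d_ref - math.dist(model[i], model[j]))
            for tolerance in thresholds:
                total += 1
                if diff < tolerance:
                    preserved += 1
    return preserved / total if total else 0.0


if __name__ == "__main__":
    # Toy straight chain with 3.8 Å spacing, and a rigidly shifted copy.
    ref = [(3.8 * k, 0.0, 0.0) for k in range(4)]
    shifted = [(x + 10.0, y - 5.0, z + 2.0) for x, y, z in ref]
    print(ca_lddt(ref, shifted))
```

Because only internal distances are compared, a rigid shift or rotation of the model leaves the score unchanged, and no superposition step is needed. A relative movement between two domains that are far apart affects few scored pairs, which is the robustness property the abstract refers to.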


2021 ◽  
Author(s):  
Joseph H. Lubin ◽  
Christopher Markosian ◽  
D. Balamurugan ◽  
Renata Pasqualini ◽  
Wadih Arap ◽  
...  

There is enormous ongoing interest in characterizing the binding properties of the SARS-CoV-2 Omicron Variant of Concern (VOC) (B.1.1.529), which continues to spread towards potential dominance worldwide. To aid these studies, based on the wealth of available structural information about several SARS-CoV-2 variants in the Protein Data Bank (PDB) and a modeling pipeline we have previously developed for tracking the ongoing global evolution of SARS-CoV-2 proteins, we provide a set of computed structural models (henceforth models) of the Omicron VOC receptor-binding domain (omRBD) bound to its corresponding receptor Angiotensin-Converting Enzyme (ACE2) and a variety of therapeutic entities, including neutralizing and therapeutic antibodies targeting previously-detected viral strains. We generated bound omRBD models using both experimentally-determined structures in the PDB as well as machine learning-based structure predictions as starting points. Examination of ACE2-bound omRBD models reveals an interdigitated multi-residue interaction network formed by omRBD-specific substituted residues (R493, S496, Y501, R498) and ACE2 residues at the interface, which was not present in the original Wuhan-Hu-1 RBD-ACE2 complex. Emergence of this interaction network suggests optimization of a key region of the binding interface, and positive cooperativity among various sites of residue substitutions in omRBD mediating ACE2 binding. Examination of neutralizing antibody complexes for Barnes Class 1 and Class 2 antibodies modeled with omRBD highlights an overall loss of interfacial interactions (with gain of new interactions in rare cases) mediated by substituted residues. Many of these substitutions have previously been found to independently dampen or even ablate antibody binding, and perhaps mediate antibody-mediated neutralization escape (e.g., K417N). 
We observe little compensation of corresponding interaction loss at interfaces when potential escape substitutions occur in combination. A few selected antibodies (e.g., Barnes Class 3 S309), however, feature largely unaltered or modestly affected protein-protein interfaces. While we stress that only qualitative insights can be obtained directly from our models at this time, we anticipate that they can provide starting points for more detailed and quantitative computational characterization, and, if needed, redesign of monoclonal antibodies for targeting the Omicron VOC Spike protein. In the broader context, the computational pipeline we developed provides a framework for rapidly and efficiently generating retrospective and prospective models for other novel variants of SARS-CoV-2 bound to entities of virological and therapeutic interest, in the setting of a global pandemic.


F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 1395
Author(s):  
Shahram Mesdaghi ◽  
David L. Murphy ◽  
Filomeno Sánchez Rodríguez ◽  
J. Javier Burgos-Mármol ◽  
Daniel J. Rigden

Background: Recent strides in computational structural biology have opened up an opportunity to understand previously uncharacterised proteins. The under-representation of transmembrane proteins in the Protein Data Bank highlights the need to apply new and advanced bioinformatics methods to shed light on their structure and function. This study focuses on a family of transmembrane proteins containing the Pfam domain PF09335 ('SNARE_ASSOC'/'VTT'/'Tvp38'). One prominent member, Tmem41b, has been shown to be involved in early stages of autophagosome formation and is vital in mouse embryonic development as well as being identified as a viral host factor of SARS-CoV-2. Methods: We used evolutionary covariance-derived information to construct and validate ab initio models, make domain boundary predictions and infer local structural features. Results: The results from the structural bioinformatics analysis of Tmem41b and its homologues showed that they contain a tandem repeat that is clearly visible in evolutionary covariance data but much less so by sequence analysis. Furthermore, cross-referencing of other prediction data with covariance analysis showed that the internal repeat features two-fold rotational symmetry. Ab initio modelling of Tmem41b and homologues reinforces these structural predictions. Local structural features predicted to be present in Tmem41b were also present in Cl-/H+ antiporters. Conclusions: The results of this study strongly point to Tmem41b and its homologues being transporters for an as-yet uncharacterised substrate and possibly using H+ antiporter activity as its mechanism for transport.


Author(s):  
Miroslaw Gilski ◽  
Jianbo Zhao ◽  
Marcin Kowiel ◽  
Dariusz Brzezinski ◽  
Douglas H. Turner ◽  
...  

Geometrical restraints provide key structural information for the determination of biomolecular structures at lower resolution by experimental methods such as crystallography or cryo-electron microscopy. In this work, restraint targets for nucleic acid bases are derived from three different sources and compared: small-molecule crystal structures in the Cambridge Structural Database (CSD), ultrahigh-resolution structures in the Protein Data Bank (PDB) and quantum-mechanical (QM) calculations. The best parameters are those based on CSD structures. After over two decades, the standard library of Parkinson et al. [(1996), Acta Cryst. D52, 57–64] is still valid, but improvements are possible with the use of the current CSD. The CSD-derived geometry is fully compatible with Watson–Crick base pairs, as comparisons with QM results for isolated and paired bases clearly show that the CSD targets closely correspond to proper base pairing. While the QM results are capable of distinguishing between single and paired bases, their level of accuracy is, on average, nearly two times lower than for the CSD-derived targets when gauged by root-mean-square deviations from ultrahigh-resolution structures in the PDB. Nevertheless, the accuracy of QM results appears sufficient to provide stereochemical targets for synthetic base pairs where no reliable experimental structural information is available. To enable future tests for this approach, QM calculations are provided for isocytosine, isoguanine and the iCiG base pair.


2020 ◽  
Author(s):  
Junwen Luo ◽  
Yi Cai ◽  
Jialin Wu ◽  
Hongmin Cai ◽  
Xiaofeng Yang ◽  
...  

Abstract: In recent years, deep learning has been increasingly used to decipher the relationships among protein sequence, structure, and function. Thus far deep learning of proteins has mostly utilized protein primary sequence information, while the vast amount of protein tertiary structural information remains unused. In this study, we devised a self-supervised representation learning framework to extract the fundamental features of unlabeled protein tertiary structures (PtsRep), and the embedded representations were transferred to two commonly recognized protein engineering tasks, protein stability and GFP fluorescence prediction. On both tasks, PtsRep significantly outperformed the two benchmark methods (UniRep and TAPE-BERT), which are based on protein primary sequences. Protein clustering analyses demonstrated that PtsRep can capture the structural signals in proteins. PtsRep reveals an avenue for general protein structural representation learning, and for exploring protein structural space for protein engineering and drug design.

