Matching of PDB chain sequences to information in public databases as a prerequisite for 3D functional site visualization

2004 ◽  
Vol 1 (1) ◽  
pp. 80-89
Author(s):  
Guido Dieterich ◽  
Dirk W. Heinz ◽  
Joachim Reichelt

Abstract The 3D structures of biomacromolecules stored in the Protein Data Bank [1] were correlated with external biological information from public databases. We have matched the feature tables of SWISS-PROT [2] entries, as well as InterPro [3] domains and functional sites, with the corresponding 3D structures. OMIM [4] (Online Mendelian Inheritance in Man) records, which contain information on genetic disorders, were extracted and linked to the structures. The exhaustive all-against-all 3D structure comparison of protein structures stored in DALI [5] was condensed into a single file for each PDB entry. Results are stored in XML format, facilitating their incorporation into related software. The resulting annotation of the protein structures allows functional sites to be identified upon visualization.
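
Matching a SWISS-PROT feature table to a structure comes down to translating UniProt sequence positions into PDB residue numbers. The abstract does not say how this was done, so the sketch below assumes a plain global (Needleman-Wunsch) alignment between the UniProt sequence and the chain sequence, with illustrative scores. The function names, the (name, start, end) feature layout and the toy sequences are hypothetical, and production mappings such as SIFTS also handle modified residues and insertion codes, which this ignores.

```python
# Hypothetical sketch: map 1-based UniProt feature positions onto PDB residue
# numbers through a global (Needleman-Wunsch) alignment. Scores are illustrative.


def _sub(x: str, y: str, match: int, mismatch: int) -> int:
    return match if x == y else mismatch


def align(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Return aligned (i, j) index pairs (0-based) from a global alignment of a and b."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + _sub(a[i - 1], b[j - 1], match, mismatch),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell, preferring diagonal moves.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + _sub(a[i - 1], b[j - 1], match, mismatch):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


def map_features(uniprot_seq, chain_seq, chain_resnums, features):
    """features: list of (name, start, end) in 1-based UniProt numbering.
    Returns name -> PDB residue numbers; only identical aligned residues are mapped."""
    pos_map = {i + 1: chain_resnums[j]
               for i, j in align(uniprot_seq, chain_seq)
               if uniprot_seq[i] == chain_seq[j]}
    return {name: [pos_map[p] for p in range(start, end + 1) if p in pos_map]
            for name, start, end in features}


mapped = map_features("MKTAYIAKQR", "TAYIAKQ", list(range(10, 17)),
                      [("site", 4, 6), ("signal", 1, 2)])
print(mapped)  # {'site': [11, 12, 13], 'signal': []}
```

Mapping only identical residues is a conservative choice: a feature that falls on an unobserved or mutated position simply drops out instead of being attached to the wrong residue.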


2014 ◽  
Vol 70 (a1) ◽  
pp. C491-C491
Author(s):  
Jürgen Haas ◽  
Alessandro Barbato ◽  
Tobias Schmidt ◽  
Steven Roth ◽  
Andrew Waterhouse ◽  
...  

Computational modeling and prediction of three-dimensional macromolecular structures and complexes from their sequence has been a long-standing goal in structural biology. Over the last two decades, a paradigm shift has occurred: starting from a large "knowledge gap" between the huge number of protein sequences and the small number of experimentally determined structures, today some form of structural information, either experimental or computational, is available for the majority of amino acids encoded by common model organism genomes. Methods for structure modeling and prediction have made substantial progress over the last decades, and template-based homology modeling techniques have matured to a point where they are now routinely used to complement experimental techniques. However, computational models often fall short in accuracy compared to high-resolution experimental structures, and it is often difficult to convey the expected accuracy and structural variability of a specific model. Retrospectively assessing the quality of blind structure predictions against experimental reference structures allows benchmarking the state of the art in structure prediction and identifying areas that need further development. The Critical Assessment of Structure Prediction (CASP) experiment has, for the last 20 years, assessed progress in protein structure modeling based on predictions for ca. 100 blind prediction targets per experiment, which are carefully evaluated by human experts. The "Continuous Model EvaluatiOn" (CAMEO) project aims to provide a fully automated blind assessment of prediction servers based on the weekly pre-released sequences of the Protein Data Bank (PDB). CAMEO has been made possible by the development of novel scoring methods such as lDDT, which are robust against domain movements and thus allow automated, continuous structure comparison without human intervention.
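
lDDT compares inter-atomic distances rather than superposed coordinates, which is why a domain that moves as a rigid body costs little. A minimal sketch of the idea, assuming C-alpha atoms only, the commonly used 15 Å inclusion radius and tolerances of 0.5, 1, 2 and 4 Å. The reference implementation works on all atoms, excludes intra-residue pairs and adds stereochemistry checks, none of which are reproduced here; the coordinates are toy values.

```python
import math
from itertools import combinations


def lddt_ca(ref, model, radius=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified C-alpha-only lDDT: fraction of reference distances (< radius)
    reproduced by the model within each tolerance, averaged over tolerances.
    No superposition is performed, so rigid moves of the whole model do not matter."""
    if len(ref) != len(model):
        raise ValueError("reference and model must have the same residues")
    pairs = [(i, j) for i, j in combinations(range(len(ref)), 2)
             if math.dist(ref[i], ref[j]) < radius]
    if not pairs:
        return 0.0
    preserved = 0
    for i, j in pairs:
        diff = abs(math.dist(ref[i], ref[j]) - math.dist(model[i], model[j]))
        preserved += sum(diff < t for t in thresholds)
    return preserved / (len(pairs) * len(thresholds))


# Toy coordinates (hypothetical): three C-alpha atoms 3.8 Å apart.
ref = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
shifted = [(x + 10.0, y, z) for x, y, z in ref]              # rigid translation
bent = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 3.0, 0.0)]   # last residue displaced
print(lddt_ca(ref, shifted), lddt_ca(ref, bent))  # 1.0 0.75
```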



2021 ◽  
Vol 8 ◽  
Author(s):  
Sundeep Chaitanya Vedithi ◽  
Sony Malhotra ◽  
Marta Acebrón-García-de-Eulate ◽  
Modestas Matusevicius ◽  
Pedro Henrique Monteiro Torres ◽  
...  

Leprosy, caused by Mycobacterium leprae (M. leprae), is treated with a multidrug regimen comprising Dapsone, Rifampicin, and Clofazimine. These drugs exhibit bacteriostatic, bactericidal and anti-inflammatory properties, respectively, and control the dissemination of infection in the host. However, the current treatment is not cost-effective, does not favor patient compliance due to its long duration (12 months), and does not protect against the accompanying nerve damage, which is a severe complication of leprosy. The chronic infectious peripheral neuropathy associated with the disease is primarily due to bacterial components infiltrating the Schwann cells that protect neuronal axons, thereby inducing a demyelinating phenotype. There is a need to discover novel or repurposed drugs that can act as short-duration, effective alternatives to the existing treatment regimens, preventing nerve damage and the consequent disability associated with the disease. Because M. leprae is an obligate pathogen, the bacillus cannot be cultivated in vitro, which limits drug discovery efforts to repositioning screens in mouse footpad models. The dearth of knowledge about the structural proteomics of M. leprae, coupled with emerging antimicrobial resistance to all three drugs in the multidrug therapy, calls for concerted efforts in novel drug discovery. A comprehensive understanding of the proteomic landscape of M. leprae is indispensable for uncovering druggable targets that are essential for bacterial survival and for its predilection for human Schwann cells. Of the 1,614 protein-coding genes in the genome of M. leprae, only 17 protein structures are available in the Protein Data Bank. In this review, we discuss efforts to model the proteome of M. leprae using a suite of protein modeling software developed in the Blundell laboratory.
Precise template selection using sequence-structure homology recognition software, multi-template modeling of the monomeric models, and accurate quality assessment are the hallmarks of the modeling process. Tools that map interfaces and enable the building of homo-oligomers are discussed in the context of interface stability. Other software is described that determines the druggable proteome using information from chokepoint analysis of metabolic pathways, gene essentiality, homology to human proteins, functional sites, druggable pockets, and fragment hotspot maps.
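
Chokepoint analysis is commonly defined on a metabolic network: a reaction is a chokepoint if it is the only reaction that consumes some metabolite or the only one that produces it, so its inhibition cannot be bypassed for that metabolite. The sketch below applies that definition to a hypothetical toy network given as name -> (substrates, products). It is not the Blundell laboratory software, and real analyses also filter out currency metabolites such as ATP or water and treat reversible reactions explicitly.

```python
from collections import defaultdict


def chokepoints(reactions):
    """reactions: dict name -> (substrates, products).
    A reaction is flagged if it is the only consumer or the only producer of some metabolite."""
    consumers, producers = defaultdict(set), defaultdict(set)
    for name, (substrates, products) in reactions.items():
        for met in substrates:
            consumers[met].add(name)
        for met in products:
            producers[met].add(name)
    flagged = set()
    for index in (consumers, producers):
        for rxns in index.values():
            if len(rxns) == 1:
                flagged |= rxns
    return flagged


# Hypothetical toy network: B is consumed by both R2 and R3, C is produced by both R2 and R4.
toy = {
    "R1": ({"A"}, {"B"}),
    "R2": ({"B"}, {"C"}),
    "R3": ({"B"}, {"D"}),
    "R4": ({"E"}, {"C"}),
}
print(sorted(chokepoints(toy)))  # ['R1', 'R3', 'R4']
```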



2009 ◽  
Vol 07 (05) ◽  
pp. 755-771 ◽  
Author(s):  
ZAIXIN LU ◽  
ZHIYU ZHAO ◽  
SERGIO GARCIA ◽  
KRISHNAKUMAR KRISHNASWAMY ◽  
BIN FU

We have developed an algorithm and web tool to search for similar protein structures in the PDB (Protein Data Bank). The algorithm combines a series of methods, including protein classification, geometric feature extraction, sequence alignment, and 3D structure alignment. Given a protein structure, the tool can efficiently discover similar structures among the hundreds of thousands of structures stored in the PDB. Our experimental results show that it is more accurate than other well-known protein search systems, including PSI-BLAST, 3D-BLAST, and SSM, in finding proteins that are structurally similar to the query protein, and its speed is competitive with those systems. The algorithm has been fully implemented and is accessible online at the address , which is supported by a cluster of computers.



2003 ◽  
Vol 01 (01) ◽  
pp. 119-138 ◽  
Author(s):  
LIPING WEI ◽  
RUSS B. ALTMAN

The increase in known three-dimensional protein structures enables us to build statistical profiles of important functional sites in protein molecules. These profiles can then be used to recognize sites in large-scale automated annotations of new protein structures. We report an improved FEATURE system which recognizes functional sites in protein structures. FEATURE defines multi-level physico-chemical properties and recognizes sites based on the spatial distribution of these properties in the sites' microenvironments. It uses a Bayesian scoring function to compare a query region with the statistical profile built from known examples of sites and control nonsites. We have previously shown that FEATURE can accurately recognize calcium-binding sites and have reported interesting results scanning for calcium-binding sites in the entire Protein Data Bank. Here we report the ability of the improved FEATURE to characterize and recognize geometrically complex and asymmetric sites such as ATP-binding sites and disulfide bond-forming sites. FEATURE does not rely on conserved residues or conserved residue geometry of the sites. We also demonstrate that, in the absence of a statistical profile of the sites, FEATURE can use an artificially constructed profile based on a priori knowledge to recognize the sites in new structures, using redoxin active sites as an example.
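
FEATURE's comparison of a query microenvironment against site and nonsite profiles can be illustrated with a generic naive-Bayes log-odds score. The sketch assumes each microenvironment has already been reduced to a vector of discretized property values (for example, a property counted in a concentric shell and then binned). The binning, the add-one smoothing, and all names and data are assumptions, not FEATURE's published parameters.

```python
import math
from collections import Counter


def train_profile(examples, n_bins):
    """examples: equal-length vectors of bin indices (one entry per property/shell).
    Returns per-feature bin probabilities with add-one (Laplace) smoothing."""
    profile = []
    for f in range(len(examples[0])):
        counts = Counter(ex[f] for ex in examples)
        total = len(examples) + n_bins
        profile.append([(counts[b] + 1) / total for b in range(n_bins)])
    return profile


def log_odds(query, site_profile, nonsite_profile):
    """Naive-Bayes log-likelihood ratio; positive values favour 'site'."""
    return sum(math.log(site_profile[f][b] / nonsite_profile[f][b])
               for f, b in enumerate(query))


sites = [[2, 2], [2, 1], [2, 2]]      # hypothetical binned vectors around known sites
nonsites = [[0, 0], [0, 1], [1, 0]]   # hypothetical control nonsites
site_p, nonsite_p = train_profile(sites, 3), train_profile(nonsites, 3)
print(log_odds([2, 2], site_p, nonsite_p) > 0 > log_odds([0, 0], site_p, nonsite_p))  # True
```

Because each property contributes an independent term, the score also shows which properties drive a prediction, at the cost of ignoring correlations between them.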



2000 ◽  
Vol 33 (1) ◽  
pp. 176-183 ◽  
Author(s):  
Guoguang Lu

In order to facilitate the three-dimensional structure comparison of proteins, software for making comparisons and searching for similarities to protein structures in databases has been developed. The program identifies the residues that share similar positions of both main-chain and side-chain atoms between two proteins. The unique functions of the software also include database processing via Internet- and Web-based servers for different types of users. The developed method and its friendly user interface cope with many of the problems that frequently occur in protein structure comparisons, such as detecting structurally equivalent residues, misalignment caused by coincident matches of Cα atoms, circular sequence permutations, tedious repetition of access, maintenance of the most recent database, and inconvenient user interfaces. The program is also designed to cooperate with other tools in structural bioinformatics, such as the 3DB Browser software [Prilusky (1998). Protein Data Bank Q. Newslett. 84, 3–4] and the SCOP database [Murzin, Brenner, Hubbard & Chothia (1995). J. Mol. Biol. 247, 536–540], for convenient molecular modelling and protein structure analysis. A similarity ranking score of 'structure diversity' is proposed in order to estimate the evolutionary distance between proteins based on the comparison of their three-dimensional structures. The program has been used as part of an automated program for multiple protein structure alignment. In this paper, the algorithm of the program and the results of systematic tests are presented and discussed.
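
The criterion of matching residues on both main-chain and side-chain positions, rather than on Cα atoms alone, can be sketched as follows. The sketch assumes the two structures are already superposed, pairs each residue with the nearest Cα in the other structure, and uses hypothetical cutoffs of 1.0 Å (main chain) and 2.0 Å (side-chain centroid). The published program's superposition, scoring and 'structure diversity' measure are not reproduced.

```python
import math

MAIN_CHAIN = ("N", "CA", "C", "O")


def centroid(points):
    return tuple(sum(p[k] for p in points) / len(points) for k in range(3))


def equivalent_residues(res_a, res_b, main_cutoff=1.0, side_cutoff=2.0):
    """res_a, res_b: lists of residues, each a dict atom_name -> (x, y, z), assumed
    already superposed. Each residue of A is paired with the residue of B whose CA is
    nearest; the pair is kept only if every shared main-chain atom and the side-chain
    centroid also lie within the (hypothetical) cutoffs."""
    pairs = []
    for i, ra in enumerate(res_a):
        j = min(range(len(res_b)), key=lambda k: math.dist(ra["CA"], res_b[k]["CA"]))
        rb = res_b[j]
        main_ok = all(math.dist(ra[atom], rb[atom]) <= main_cutoff
                      for atom in MAIN_CHAIN if atom in ra and atom in rb)
        side_a = [xyz for name, xyz in ra.items() if name not in MAIN_CHAIN]
        side_b = [xyz for name, xyz in rb.items() if name not in MAIN_CHAIN]
        if side_a and side_b:
            side_ok = math.dist(centroid(side_a), centroid(side_b)) <= side_cutoff
        else:
            side_ok = not side_a and not side_b  # e.g. Gly vs Gly: main chain decides
        if main_ok and side_ok:
            pairs.append((i, j))
    return pairs


res = {"N": (0.0, 0.0, 0.0), "CA": (1.0, 0.0, 0.0), "C": (2.0, 0.0, 0.0),
       "O": (2.0, 1.0, 0.0), "CB": (1.0, -1.0, 0.0)}
close = {name: (x + 0.2, y, z) for name, (x, y, z) in res.items()}
flipped = dict(res, CB=(1.0, 1.5, 0.0))  # same main chain, side chain on the other side
print(equivalent_residues([res], [close]), equivalent_residues([res], [flipped]))  # [(0, 0)] []
```

The flipped case is the kind of coincident Cα match the abstract warns about: the backbone agrees, but the side chains point in different directions, so the residues are not counted as equivalent.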



Molecules ◽  
2020 ◽  
Vol 25 (7) ◽  
pp. 1522 ◽  
Author(s):  
Mikhail Yu. Lobanov ◽  
Ilya V. Likhachev ◽  
Oxana V. Galzitskaya

We created a new library of disordered patterns and disordered residues in the Protein Data Bank (PDB). To obtain such datasets, we clustered the PDB, obtained groups of chains at different identity levels, and marked disordered residues. We developed a new procedure for finding disordered patterns and created a new version of the library. This library includes three sets of patterns: unique patterns, patterns consisting of two kinds of amino acids, and homo-repeats. Using this database, the user can: (1) find homologues in the entire Protein Data Bank; (2) perform a statistical analysis of disordered residues in protein structures; (3) search for disordered patterns and homo-repeats; (4) search for disordered regions in different chains of the same protein; (5) download clusters of protein chains with different identity from our database and library of disordered patterns; and (6) view 3D structures interactively using MView. The new library of disordered patterns will help improve the accuracy of predicting which residues in a given region will be structured or unstructured.
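
Two of the pattern classes in the library, homo-repeats and stretches built from two kinds of amino acids, can be located with simple sequence scans. The sketch assumes a minimum repeat length of 5 and a window width of 8, both arbitrary choices; the authors' procedure additionally relies on clustering the PDB and on disorder annotations, which are not modeled here. The sequence is hypothetical.

```python
import re


def homo_repeats(seq, min_len=5):
    """Runs of a single residue type of length >= min_len, as (start, end, residue), end exclusive."""
    pattern = r"([A-Z])\1{%d,}" % (min_len - 1)
    return [(m.start(), m.end(), m.group(1)) for m in re.finditer(pattern, seq)]


def two_residue_windows(seq, width=8):
    """Start indices of windows made of exactly two residue types (e.g. GS-rich stretches)."""
    return [i for i in range(len(seq) - width + 1)
            if len(set(seq[i:i + width])) == 2]


seq = "MSQQQQQQAGSGSGSGSK"  # hypothetical sequence
print(homo_repeats(seq), two_residue_windows(seq))  # [(2, 8, 'Q')] [9]
```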



2009 ◽  
Vol 43 (1) ◽  
pp. 196-199 ◽  
Author(s):  
K. Hemavathi ◽  
M. Kalaivani ◽  
A. Udayakumar ◽  
G. Sowmiya ◽  
J. Jeyakanthan ◽  
...  

MIPS (metal interactions in protein structures) is a database of metals in the three-dimensional macromolecular structures available in the Protein Data Bank. Bound metal ions in proteins have both catalytic and structural functions. The proposed database serves as an open resource for the analysis and visualization of all metals and their interactions with macromolecular (protein and nucleic acid) structures. MIPS can be searched via a user-friendly interface, and the interactions between metals and protein molecules, as well as the geometric parameters, can be viewed in both textual and graphical format using the freely available graphics plug-in Jmol. MIPS is updated regularly by means of automated scripts that find metal-containing proteins among newly released protein structures. The database is useful for studying the properties of coordination between metals and protein molecules. It also helps to improve understanding of the relationship between macromolecular structure and function. This database is intended to serve the scientific community working in the areas of chemical and structural biology, and is freely available to all users, around the clock, at http://dicsoft2.physics.iisc.ernet.in/mips/.
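
The basic query behind a metal-interaction database is a distance search around each metal ion, plus simple geometric parameters such as ligand-metal-ligand angles. The sketch assumes coordinates have already been parsed into tuples and uses an arbitrary 3.0 Å cutoff; MIPS's actual selection criteria are not stated in the abstract, and the residue labels and coordinates are hypothetical.

```python
import math


def metal_contacts(metals, atoms, cutoff=3.0):
    """metals: list of (label, (x, y, z)); atoms: list of (residue_id, atom_name, (x, y, z)).
    Returns label -> sorted [(distance, residue_id, atom_name)] for atoms within cutoff (Å)."""
    contacts = {}
    for label, m_xyz in metals:
        near = []
        for res_id, atom_name, xyz in atoms:
            d = math.dist(m_xyz, xyz)
            if d <= cutoff:
                near.append((round(d, 2), res_id, atom_name))
        contacts[label] = sorted(near)
    return contacts


def ligand_angle(metal, lig1, lig2):
    """Ligand-metal-ligand angle in degrees."""
    v1 = [p - q for p, q in zip(lig1, metal)]
    v2 = [p - q for p, q in zip(lig2, metal)]
    cos = sum(a * b for a, b in zip(v1, v2)) / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))  # clamp rounding noise


zn = ("ZN1", (0.0, 0.0, 0.0))
atoms = [("HIS96", "NE2", (2.1, 0.0, 0.0)),
         ("CYS99", "SG", (0.0, 2.3, 0.0)),
         ("GLY10", "O", (5.0, 0.0, 0.0))]
print(metal_contacts([zn], atoms))  # {'ZN1': [(2.1, 'HIS96', 'NE2'), (2.3, 'CYS99', 'SG')]}
print(ligand_angle(zn[1], atoms[0][2], atoms[1][2]))  # 90.0
```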



2015 ◽  
Vol 112 (40) ◽  
pp. E5486-E5495 ◽  
Author(s):  
Atanas Kamburov ◽  
Michael S. Lawrence ◽  
Paz Polak ◽  
Ignaty Leshchiner ◽  
Kasper Lage ◽  
...  

Large-scale tumor sequencing projects have enabled the identification of many new cancer gene candidates through computational approaches. Here, we describe a general method to detect cancer genes based on significant 3D clustering of mutations relative to the structure of the encoded protein products. The approach can also be used to search for proteins with an enrichment of mutations at binding interfaces with a protein, nucleic acid, or small molecule partner. We applied this approach to systematically analyze the PanCancer compendium of somatic mutations from 4,742 tumors relative to all known 3D structures of human proteins in the Protein Data Bank. We detected significant 3D clustering of missense mutations in several previously known oncoproteins, including HRAS, EGFR, and PIK3CA. Although clustering of missense mutations is often regarded as a hallmark of oncoproteins, we observed that a number of tumor suppressors, including FBXW7, VHL, and STK11, also showed such clustering. Besides these known cases, we also identified significant 3D clustering of missense mutations in NUF2, which encodes a component of the kinetochore, that could affect chromosome segregation and lead to aneuploidy. Analysis of interaction interfaces revealed enrichment of mutations in the interfaces between FBXW7-CCNE1, HRAS-RASA1, CUL4B-CAND1, OGT-HCFC1, PPP2R1A-PPP2R5C/PPP2R2A, DICER1-Mg2+, MAX-DNA, SRSF2-RNA, and others. Together, our results indicate that systematic consideration of 3D structure can assist in the identification of cancer genes and in understanding the functional role of their mutations.
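
A 3D clustering test can be framed as a score that rewards mutated residues lying close together, compared against random placements of the same mutations on the same structure. The sketch below uses a Gaussian-weighted pair score with an illustrative 6 Å length scale and a permutation p-value. It is written in the spirit of the method described, not as the authors' exact statistic (which may, for instance, normalize mutation counts differently), and the straight-line "structure" is a toy.

```python
import math
import random


def pair_score(positions, counts, coords, t=6.0):
    """Sum over distinct mutated-residue pairs of n_q * n_r * exp(-d^2 / (2 t^2))."""
    score = 0.0
    for a in range(len(positions)):
        for b in range(a + 1, len(positions)):
            d = math.dist(coords[positions[a]], coords[positions[b]])
            score += counts[a] * counts[b] * math.exp(-d * d / (2 * t * t))
    return score


def permutation_pvalue(positions, counts, coords, n_perm=1000, seed=0):
    """Empirical p-value: how often randomly placed mutations cluster at least as tightly."""
    rng = random.Random(seed)
    observed = pair_score(positions, counts, coords)
    residues = list(coords)
    hits = sum(pair_score(rng.sample(residues, len(positions)), counts, coords) >= observed
               for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)


# Toy "structure" (hypothetical): 20 residues on a straight line, 3.8 Å apart.
coords = {r: (3.8 * r, 0.0, 0.0) for r in range(1, 21)}
print(permutation_pvalue([1, 2, 3], [5, 5, 5], coords))    # tight cluster: small p
print(permutation_pvalue([1, 10, 20], [5, 5, 5], coords))  # spread out: large p
```

Shuffling mutation positions within the same protein keeps the protein's size and shape fixed, so the null model reflects only where the mutations fall, not how large the structure is.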



2019 ◽  
Author(s):  
Michael A Stiffler ◽  
Frank J Poelwijk ◽  
Kelly Brock ◽  
Richard R Stein ◽  
Joan Teyra ◽  
...  

Abstract Natural evolution encodes rich information about the structure and function of biomolecules in the genetic record. Previously, statistical analysis of co-variation patterns in natural protein families has enabled the accurate computation of 3D structures. Here, we explored whether similar information can be generated by laboratory evolution, starting from a single gene and performing multiple cycles of mutagenesis and functional selection. We evolved two bacterial antibiotic resistance proteins, β-lactamase PSE1 and acetyltransferase AAC6, and obtained hundreds of thousands of diverse functional sequences. Using evolutionary coupling analysis, we inferred residue interactions in good agreement with contacts in the crystal structures, confirming the genetic encoding of structural constraints in the selected sequences. Computational protein folding with contact constraints yielded 3D structures with the same fold as that of natural relatives. Evolution experiments combined with inference of residue interactions from sequence information open the door to a new experimental method for the determination of protein structures.
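
Co-variation between alignment columns is the signal that evolutionary coupling analysis exploits. As a deliberately simpler stand-in, the sketch scores column pairs by mutual information. Evolutionary coupling methods typically fit a global statistical model instead, which separates direct from indirect (transitive) correlations in a way plain mutual information cannot. The alignment is hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def column_mi(alignment, i, j):
    """Mutual information (nats) between alignment columns i and j."""
    n = len(alignment)
    pi = Counter(s[i] for s in alignment)
    pj = Counter(s[j] for s in alignment)
    pij = Counter((s[i], s[j]) for s in alignment)
    return sum((c / n) * math.log((c / n) / ((pi[a] / n) * (pj[b] / n)))
               for (a, b), c in pij.items())


def top_pairs(alignment, k=3):
    """Highest-scoring column pairs as (mi, i, j)."""
    length = len(alignment[0])
    scored = [(column_mi(alignment, i, j), i, j)
              for i, j in combinations(range(length), 2)]
    return sorted(scored, reverse=True)[:k]


# Hypothetical selected sequences: columns 0 and 3 co-vary (A<->D, K<->E).
aln = ["AGLD", "KGIE", "AGVD", "KGLE", "AGID", "KGVE"]
print(top_pairs(aln, k=1))
```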



2017 ◽  
Author(s):  
Yang Liu ◽  
Qing Ye ◽  
Liwei Wang ◽  
Jian Peng

Abstract Motivation: Understanding the relationship between protein structure and function is a fundamental problem in protein science. Given a protein of unknown function, fast identification of similar protein structures from the Protein Data Bank (PDB) is a critical step for inferring its biological function. Such structural neighbors can provide evolutionary insights into protein conformation, interfaces and binding sites that are not detectable from sequence similarity. However, performing pairwise structural alignment against all structures in the PDB is prohibitively expensive. Alignment-free approaches have been introduced to enable fast but coarse comparisons by representing each protein as a vector of structure features or fingerprints and computing similarity only between vectors. As a notable example, FragBag represents each protein by a “bag of fragments”, a vector of frequencies of contiguous short backbone fragments from a predetermined library. Results: Here we present a new approach to learning effective structural motif representations using deep learning. We developed DeepFold, a deep convolutional neural network model that extracts structural motif features of a protein structure. Similar to FragBag, DeepFold represents each protein structure or fold using a vector of learned structural motif features. We demonstrate that DeepFold substantially outperforms FragBag on protein structural search on a non-redundant protein structure database and a set of newly released structures. Remarkably, DeepFold not only extracts meaningful backbone segments but also finds important long-range interacting motifs for structural comparison. We expect that DeepFold will provide new insights into the evolution and hierarchical organization of protein structural motifs. Availability: https://github.com/largelymfs/
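
The FragBag baseline described in the Motivation can be sketched in a few lines: assign every backbone window to its closest library fragment, count the assignments, and compare proteins by the cosine similarity of their count vectors. This is not DeepFold, which learns its features with a convolutional network. The fragment library, the fragment length, and the use of internal-distance differences instead of superposition RMSD for assignment are all assumptions made to keep the example dependency-free.

```python
import math
from itertools import combinations


def internal_distances(fragment):
    return [math.dist(p, q) for p, q in combinations(fragment, 2)]


def nearest_fragment(window, library):
    """Library index whose internal C-alpha distances best match the window
    (a superposition-free stand-in for RMSD-based assignment)."""
    dw = internal_distances(window)
    return min(range(len(library)),
               key=lambda k: sum((a - b) ** 2
                                 for a, b in zip(dw, internal_distances(library[k]))))


def bag_of_fragments(ca_coords, library, frag_len):
    """Count how often each library fragment is the best match along the chain."""
    bag = [0] * len(library)
    for start in range(len(ca_coords) - frag_len + 1):
        bag[nearest_fragment(ca_coords[start:start + frag_len], library)] += 1
    return bag


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical two-fragment library: an extended strand and a compact turn.
library = [
    [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)],
]
strand = [(3.8 * i, 0.0, 0.0) for i in range(6)]
turn = library[1]
bag_s, bag_t = bag_of_fragments(strand, library, 4), bag_of_fragments(turn, library, 4)
print(bag_s, bag_t, cosine(bag_s, bag_t))  # [3, 0] [0, 1] 0.0
```

Once every protein is a fixed-length vector, a database search costs one dot product per entry, which is the speed advantage over pairwise structural alignment that the abstract refers to.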


