Learning structural motif representations for efficient protein structure search

2017 ◽  
Author(s):  
Yang Liu ◽  
Qing Ye ◽  
Liwei Wang ◽  
Jian Peng

Abstract Motivation Understanding the relationship between protein structure and function is a fundamental problem in protein science. Given a protein of unknown function, fast identification of similar protein structures from the Protein Data Bank (PDB) is a critical step for inferring its biological function. Such structural neighbors can provide evolutionary insights into protein conformation, interfaces and binding sites that are not detectable from sequence similarity. However, performing pairwise structural alignment against all structures in the PDB is computationally prohibitive. Alignment-free approaches have been introduced to enable fast but coarse comparisons by representing each protein as a vector of structure features or fingerprints and only computing similarity between vectors. As a notable example, FragBag represents each protein by a “bag of fragments”, which is a vector of frequencies of contiguous short backbone fragments from a predetermined library. Results Here we present a new approach to learning effective structural motif representations using deep learning. We develop DeepFold, a deep convolutional neural network model to extract structural motif features of a protein structure. Similar to FragBag, DeepFold represents each protein structure or fold using a vector of learned structural motif features. We demonstrate that DeepFold substantially outperforms FragBag on protein structural search on a non-redundant protein structure database and a set of newly released structures. Remarkably, DeepFold not only extracts meaningful backbone segments but also finds important long-range interacting motifs for structural comparison. We expect that DeepFold will provide new insights into the evolution and hierarchical organization of protein structural motifs. Availability https://github.com/largelymfs/[email protected]
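
The alignment-free idea described in this abstract (one fixed-length vector per protein, then vector similarity instead of structural alignment) can be illustrated with a minimal Python sketch. It assumes that each backbone window has already been assigned the index of its closest library fragment; the function names and the choice of cosine similarity are illustrative assumptions, not the actual FragBag or DeepFold implementation.

```python
from collections import Counter
import math


def bag_of_fragments(fragment_ids, library_size):
    """Build a frequency vector: entry k counts occurrences of library fragment k.

    fragment_ids is assumed to hold, for each backbone window, the index of the
    closest fragment in a predetermined library (that assignment is not shown).
    """
    counts = Counter(fragment_ids)
    return [counts.get(k, 0) for k in range(library_size)]


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_neighbors(query, database):
    """Return database keys sorted from most to least similar to the query vector."""
    return sorted(
        database,
        key=lambda name: cosine_similarity(query, database[name]),
        reverse=True,
    )
```

Because every protein is reduced to a vector once, a search against a large database costs one vector comparison per entry rather than one structural alignment per entry.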


Author(s):  
Mark Lorch

This chapter examines proteins, which make up the dominant proportion of cellular machinery, and the relationship between protein structure and function. The multitude of biological processes needed to keep cells functioning are managed in the organism or cell by a massive cohort of proteins, together known as the proteome. The twenty amino acids that make up the bulk of proteins produce the vast array of protein structures. However, amino acids alone do not provide quite enough chemical variety to complete all of the biochemical activity of a cell, so the chapter also explores post-translational modifications. It finishes by looking at some dynamic aspects of proteins, including enzyme kinetics and the protein folding problem.



2020 ◽  
Vol 49 (D1) ◽  
pp. D452-D457
Author(s):  
Lisanna Paladin ◽  
Martina Bevilacqua ◽  
Sara Errigo ◽  
Damiano Piovesan ◽  
Ivan Mičetić ◽  
...  

Abstract The RepeatsDB database (URL: https://repeatsdb.org/) provides annotations and classification for protein tandem repeat structures from the Protein Data Bank (PDB). Protein tandem repeats are ubiquitous in all branches of the tree of life. The accumulation of solved repeat structures provides new possibilities for classification and detection, but also increases the need for annotation. Here we present RepeatsDB 3.0, which addresses these challenges and presents an extended classification scheme. The major conceptual change compared to the previous version is the hierarchical classification combining top levels based solely on structural similarity (Class > Topology > Fold) with two new levels (Clan > Family) requiring sequence similarity and describing repeat motifs in collaboration with Pfam. Data growth has been addressed with improved mechanisms for browsing the classification hierarchy. A new UniProt-centric view unifies the increasingly frequent annotation of structures from identical or similar sequences. This update of RepeatsDB aligns with our commitment to develop a resource that extracts, organizes and distributes specialized information on tandem repeat protein structures.
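
The five-level hierarchy named in this abstract can be pictured as a path of labels per entry. The following Python sketch is only an illustration of that idea: the level names come from the abstract, while the entry ids, the label values and the `group_by_level` helper are hypothetical and do not reflect the RepeatsDB data model or API.

```python
from collections import defaultdict

# Level names follow the abstract; everything else here is illustrative.
LEVELS = ("Class", "Topology", "Fold", "Clan", "Family")
STRUCTURE_BASED = LEVELS[:3]  # top levels: structural similarity only
SEQUENCE_BASED = LEVELS[3:]   # new levels: additionally require sequence similarity


def group_by_level(entries, level):
    """Group entry ids by their classification path truncated at `level`.

    entries: dict mapping a (hypothetical) entry id to a 5-tuple of labels,
    one label per level in LEVELS order.
    """
    depth = LEVELS.index(level) + 1
    groups = defaultdict(list)
    for entry_id, path in entries.items():
        if len(path) != len(LEVELS):
            raise ValueError(f"{entry_id}: expected {len(LEVELS)} labels")
        groups[path[:depth]].append(entry_id)
    return dict(groups)
```

Browsing the hierarchy then amounts to calling `group_by_level` with progressively deeper levels, for example `"Fold"` and then `"Clan"`.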



2000 ◽  
Vol 33 (1) ◽  
pp. 176-183 ◽  
Author(s):  
Guoguang Lu

In order to facilitate the three-dimensional structure comparison of proteins, software for making comparisons and searching for similarities to protein structures in databases has been developed. The program identifies the residues that share similar positions of both main-chain and side-chain atoms between two proteins. The unique functions of the software also include database processing via Internet- and Web-based servers for different types of users. The developed method and its friendly user interface cope with many of the problems that frequently occur in protein structure comparisons, such as detecting structurally equivalent residues, misalignment caused by coincident match of Cα atoms, circular sequence permutations, tedious repetition of access, maintenance of the most recent database, and inconvenience of user interface. The program is also designed to cooperate with other tools in structural bioinformatics, such as the 3DB Browser software [Prilusky (1998). Protein Data Bank Q. Newslett. 84, 3–4] and the SCOP database [Murzin, Brenner, Hubbard & Chothia (1995). J. Mol. Biol. 247, 536–540], for convenient molecular modelling and protein structure analysis. A similarity ranking score of 'structure diversity' is proposed in order to estimate the evolutionary distance between proteins based on the comparisons of their three-dimensional structures. The function of the program has been utilized as a part of an automated program for multiple protein structure alignment. In this paper, the algorithm of the program and results of systematic tests are presented and discussed.



2004 ◽  
Vol 1 (1) ◽  
pp. 80-89
Author(s):  
Guido Dieterich ◽  
Dirk W. Heinz ◽  
Joachim Reichelt

Abstract The 3D structures of biomacromolecules stored in the Protein Data Bank [1] were correlated with different external, biological information from public databases. We have matched the feature table of SWISS-PROT [2] entries as well as InterPro [3] domains and function sites with the corresponding 3D structures. OMIM [4] (Online Mendelian Inheritance in Man) records, containing information on genetic disorders, were extracted and linked to the structures. The exhaustive all-against-all 3D structure comparison of protein structures stored in DALI [5] was condensed into single files for each PDB entry. Results are stored in XML format, facilitating their incorporation into related software. The resulting annotation of the protein structures allows functional sites to be identified upon visualization.



2019 ◽  
Vol 5 (8) ◽  
pp. eaax4621 ◽  
Author(s):  
Hongyi Xu ◽  
Hugo Lebrette ◽  
Max T. B. Clabbers ◽  
Jingjing Zhao ◽  
Julia J. Griese ◽  
...  

Microcrystal electron diffraction (MicroED) has recently shown potential for structural biology. It enables the study of biomolecules from micrometer-sized 3D crystals that are too small to be studied by conventional x-ray crystallography. However, to date, MicroED has only been applied to redetermine protein structures that had already been solved previously by x-ray diffraction. Here, we present the first new protein structure—an R2lox enzyme—solved using MicroED. The structure was phased by molecular replacement using a search model of 35% sequence identity. The resulting electrostatic scattering potential map at 3.0-Å resolution was of sufficient quality to allow accurate model building and refinement. The dinuclear metal cofactor could be located in the map and was modeled as a heterodinuclear Mn/Fe center based on previous studies. Our results demonstrate that MicroED has the potential to become a widely applicable tool for revealing novel insights into protein structure and function.



2020 ◽  
Vol 37 (9) ◽  
pp. 2711-2726
Author(s):  
Ashar J Malik ◽  
Anthony M Poole ◽  
Jane R Allison

Abstract For evaluating the deepest evolutionary relationships among proteins, sequence similarity is too low for application of sequence-based homology search or phylogenetic methods. In such cases, comparison of protein structures, which are often better conserved than sequences, may provide an alternative means of uncovering deep evolutionary signal. Although major protein structure databases such as SCOP and CATH hierarchically group protein structures, they do not describe the specific evolutionary relationships within a hierarchical level. Structural phylogenies have the potential to fill this gap. However, it is difficult to assess evolutionary relationships derived from structural phylogenies without some means of assessing confidence in such trees. We therefore address two shortcomings in the application of structural data to deep phylogeny. First, we examine whether phylogenies derived from pairwise structural comparisons are sensitive to differences in protein length and shape. We find that structural phylogenetics is best employed where structures have very similar lengths, and that shape fluctuations generated during molecular dynamics simulations impact pairwise comparisons, but not so drastically as to eliminate evolutionary signal. Second, we address the absence of statistical support for structural phylogeny. We present a method for assessing confidence in a structural phylogeny using shape fluctuations generated via molecular dynamics or Monte Carlo simulations of proteins. Our approach will aid the evolutionary reconstruction of relationships across structurally defined protein superfamilies. With the Protein Data Bank now containing in excess of 158,000 entries (December 2019), we predict that structural phylogenetics will become a useful tool for ordering the protein universe.
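
The idea of using simulated shape fluctuations to put a confidence value on a structural grouping can be sketched in Python as a simple resampling loop. This is a deliberately simplified stand-in, not the authors' method: instead of rebuilding a full tree per replicate, it only checks whether two proteins are mutual nearest neighbours. The `distance` callable, the `ensembles` layout and the function name are assumptions made for the example.

```python
import random


def sister_pair_support(ensembles, distance, pair, replicates=100, seed=0):
    """Fraction of replicates in which the two proteins in `pair` are mutual
    nearest neighbours.

    ensembles: dict mapping a protein name to a list of conformations
               (e.g. snapshots from a simulation); the conformation type is
               whatever the supplied `distance` function accepts.
    distance:  user-supplied pairwise structural distance (hypothetical here).
    """
    rng = random.Random(seed)
    a, b = pair
    names = list(ensembles)
    hits = 0
    for _ in range(replicates):
        # Draw one conformation per protein to form a replicate data set.
        sample = {name: rng.choice(ensembles[name]) for name in names}

        def nearest(x):
            others = (n for n in names if n != x)
            return min(others, key=lambda n: distance(sample[x], sample[n]))

        if nearest(a) == b and nearest(b) == a:
            hits += 1
    return hits / replicates
```

A value near 1.0 indicates that the grouping survives the sampled fluctuations; a full analysis would build a tree from each replicate distance matrix and count how often each clade recurs.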



2009 ◽  
Vol 43 (1) ◽  
pp. 196-199 ◽  
Author(s):  
K. Hemavathi ◽  
M. Kalaivani ◽  
A. Udayakumar ◽  
G. Sowmiya ◽  
J. Jeyakanthan ◽  
...  

MIPS (metal interactions in protein structures) is a database of metals in the three-dimensional macromolecular structures available in the Protein Data Bank. Bound metal ions in proteins have both catalytic and structural functions. The proposed database serves as an open resource for the analysis and visualization of all metals and their interactions with macromolecular (protein and nucleic acid) structures. MIPS can be searched via a user-friendly interface, and the interactions between metals and protein molecules, and the geometric parameters, can be viewed in both textual and graphical format using the freely available graphics plug-in Jmol. MIPS is updated regularly, by means of programmed scripts to find metal-containing proteins from newly released protein structures. The database is useful for studying the properties of coordination between metals and protein molecules. It also helps to improve understanding of the relationship between macromolecular structure and function. This database is intended to serve the scientific community working in the areas of chemical and structural biology, and is freely available to all users, around the clock, at http://dicsoft2.physics.iisc.ernet.in/mips/.



2020 ◽  
Vol 36 (12) ◽  
pp. 3758-3765 ◽  
Author(s):  
Xiaoqiang Huang ◽  
Robin Pearce ◽  
Yang Zhang

Abstract Motivation Protein structure and function are essentially determined by how the side-chain atoms interact with each other. Thus, accurate protein side-chain packing (PSCP) is a critical step toward protein structure prediction and protein design. Despite the importance of the problem, however, the accuracy and speed of current PSCP programs are still not satisfactory. Results We present FASPR for fast and accurate PSCP by using an optimized scoring function in combination with a deterministic searching algorithm. The performance of FASPR was compared with four state-of-the-art PSCP methods (CISRR, RASP, SCATD and SCWRL4) on both native and non-native protein backbones. For the assessment on native backbones, FASPR achieved a good performance by correctly predicting 69.1% of all the side-chain dihedral angles using a stringent tolerance criterion of 20°, which compares favorably with SCWRL4, CISRR, RASP and SCATD, which successfully predicted 68.8%, 68.6%, 67.8% and 61.7%, respectively. Additionally, FASPR achieved the highest speed, packing the 379 test protein structures in only 34.3 s, which was significantly faster than the control methods. For the assessment on non-native backbones, FASPR showed an equivalent or better performance on I-TASSER predicted backbones and the backbones perturbed from experimental structures. Detailed analyses showed that the major advantage of FASPR lies in the optimal combination of the dead-end elimination and tree decomposition with a well optimized scoring function, which makes FASPR of practical use for both protein structure modeling and protein design studies. Availability and implementation The web server, source code and datasets are freely available at https://zhanglab.ccmb.med.umich.edu/FASPR and https://github.com/tommyhuangthu/FASPR. Supplementary information Supplementary data are available at Bioinformatics online.
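
The accuracy figure quoted in this abstract (the share of side-chain dihedral angles predicted within a 20° tolerance) can be made concrete with a small Python sketch. It rests on assumptions the abstract does not spell out: each angle is scored independently, a deviation exactly equal to the tolerance counts as correct, and symmetric side chains receive no special handling. The function names are illustrative and are not part of FASPR.

```python
def angular_difference(a, b):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def dihedral_accuracy(predicted, native, tolerance=20.0):
    """Fraction of predicted dihedral angles within `tolerance` degrees of native.

    Angles are periodic, so -179 and 179 differ by 2 degrees, not 358.
    """
    if len(predicted) != len(native):
        raise ValueError("predicted and native must have the same length")
    if not predicted:
        return 0.0
    correct = sum(
        1 for p, n in zip(predicted, native)
        if angular_difference(p, n) <= tolerance
    )
    return correct / len(predicted)
```

The periodic wrap in `angular_difference` matters in practice, because many side-chain rotamers sit near ±180°.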



2017 ◽  
Vol 2017 ◽  
pp. 1-12 ◽  
Author(s):  
Ke Yan ◽  
Bing Wang ◽  
Holun Cheng ◽  
Zhiwei Ji ◽  
Jing Huang ◽  
...  

Molecular skin surface (MSS), proposed by Edelsbrunner, is a C2 continuous smooth surface modeling approach for biological macromolecules. Compared to the traditional methods of molecular surface representation (e.g., the solvent excluded surface), MSS has distinctive advantages, including having no self-intersections and being decomposable and transformable. To further promote MSS in the field of bioinformatics, a transformation between different MSS representations that mimics macromolecular dynamics is needed. The transformation process helps biologists visually understand macromolecular dynamics at the atomic level, which is important in studying protein structures and binding sites for optimizing drug design. However, modeling the transformation between different MSSs suffers from high computational cost because the traditional approaches reconstruct every intermediate MSS from the respective intermediate union of balls. In this study, we propose a novel computational framework, named general MSS transformation framework (GMSSTF), for transforming between two MSSs without the assistance of a union of balls. To evaluate the effectiveness of GMSSTF, we applied it to the popular public database PDB (Protein Data Bank) and compared the existing MSS algorithms with and without GMSSTF. The simulation results show that the proposed GMSSTF effectively improves the computational efficiency and is potentially useful for macromolecular dynamic simulations.



2011 ◽  
Vol 09 (03) ◽  
pp. 367-382 ◽  
Author(s):  
ALEKSANDAR POLEKSIC

The problem of finding an optimal structural alignment for a pair of superimposed proteins is often amenable to the Smith–Waterman dynamic programming algorithm, which runs in time proportional to the product of lengths of the sequences being aligned. While the quadratic running time is acceptable for computing a single alignment of two fixed protein structures, the time complexity becomes a bottleneck when running the Smith–Waterman routine multiple times in order to find a globally optimal superposition and alignment of the input proteins. We present a subquadratic running time algorithm capable of computing an alignment that optimizes one of the most widely used measures of protein structure similarity, defined as the number of pairs of residues in two proteins that can be superimposed under a predefined distance cutoff. The algorithm presented in this article can be used to significantly improve the speed–accuracy tradeoff in a number of popular protein structure alignment methods.
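
The similarity measure described in this abstract (the number of residue pairs that lie within a distance cutoff after superposition) can be computed for one fixed superposition with a quadratic dynamic program. The Python sketch below shows that quadratic baseline only; it is not the subquadratic algorithm the article presents. It uses an LCS-style recurrence, a simplified relative of Smith–Waterman with unit match score and no gap penalty, and assumes one representative coordinate per residue that has already been superimposed.

```python
import math


def max_superimposed_pairs(coords_a, coords_b, cutoff):
    """Maximum number of order-preserving residue pairs within `cutoff`.

    coords_a, coords_b: sequences of (x, y, z) tuples, one per residue,
    assumed to be already superimposed in a common frame.
    Runs in O(len(coords_a) * len(coords_b)) time.
    """
    n, m = len(coords_a), len(coords_b)
    # best[i][j]: best count using the first i residues of A and first j of B.
    best = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            close = math.dist(coords_a[i - 1], coords_b[j - 1]) <= cutoff
            best[i][j] = max(
                best[i - 1][j],                           # skip residue i of A
                best[i][j - 1],                           # skip residue j of B
                best[i - 1][j - 1] + (1 if close else 0)  # pair them if close
            )
    return best[n][m]
```

Searching for a globally optimal superposition means repeating this routine for many candidate superpositions, which is why the quadratic cost per call becomes the bottleneck the abstract mentions.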


