DeepTracer: Predicting Backbone Atomic Structure from High Resolution Cryo-EM Density Maps of Protein Complexes

2020 ◽  
Author(s):  
Jonas Pfab ◽  
Dong Si

Abstract Motivation: Accurately determining the atomic structure of proteins represents a fundamental problem in the field of structural bioinformatics. A solution would be significant as protein structure information could be utilized in the medical field, e.g. in the development of vaccines for new viruses. This paper focuses on predicting the protein structure based on 3D images of the proteins captured through cryogenic electron microscopes (cryo-EM). A fully automated computationally efficient protein structure prediction method would be particularly beneficial in the field of cryo-EM as the technology allows researchers to photograph multiple large protein complexes in a single study, which means that a fast prediction method could allow for a high throughput of derived protein structures. We present a deep learning approach, DeepTracer, for predicting locations of the backbone atoms, secondary structure elements, and the amino acid types. In order to connect the predicted amino acids into chains, we applied a modified traveling salesman algorithm. Results: We trained our deep learning model on experimental cryo-EM density maps and tested it on a set of 50 density maps. We found that our new approach predicted protein structures with an average RMSD value of 1.18 and a coverage of 87.5%. Furthermore, we detected secondary structure information for 87.2% of amino acids correctly. We also showed preliminarily that 25.2% of amino acid types could be predicted directly from the 3D cryo-EM density map, considering 20 different types in total. Finally, we noted that the prediction runtime of DeepTracer is significantly improved compared to other methods. It predicts a large protein complex structure of more than 30,000 amino acids in only 2 hours. Availability: The repository of this project will be [email protected]. Supplementary information: Supplementary data will be available at Bioinformatics online.
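The RMSD and coverage figures quoted in this abstract can be illustrated with a small sketch. The abstract does not state how predicted atoms are matched to the reference structure, so the nearest-neighbour matching rule, the `cutoff` value, and the function name `coverage_and_rmsd` below are assumptions made for illustration only, not the authors' evaluation procedure.

```python
import math

def coverage_and_rmsd(predicted, reference, cutoff=3.0):
    """Match each reference atom to its nearest predicted atom.

    `cutoff` (same units as the coordinates) is a hypothetical matching
    threshold, not a value taken from the paper. Several reference atoms
    may match the same predicted atom in this simplified rule.
    """
    squared = []
    for ref in reference:
        nearest = min(math.dist(ref, p) for p in predicted)
        if nearest <= cutoff:
            squared.append(nearest ** 2)
    # Coverage: fraction of reference atoms with a predicted atom nearby.
    coverage = len(squared) / len(reference)
    # RMSD: root-mean-square distance over the matched pairs only.
    rmsd = math.sqrt(sum(squared) / len(squared)) if squared else float("nan")
    return coverage, rmsd

# Toy coordinates: four reference atoms, three predicted atoms.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 1.0, 0.0), (3.8, 0.0, 1.0), (7.6, 1.0, 0.0)]
coverage, rmsd = coverage_and_rmsd(predicted, reference)
print(coverage, rmsd)
```

In this toy case the fourth reference atom has no predicted atom within the cutoff, so it lowers coverage without contributing to the RMSD.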

2020 ◽  
Author(s):  
Arian R. Jamasb ◽  
Pietro Lió ◽  
Tom L. Blundell

Abstract Graphein is a Python library for constructing graph and surface-mesh representations of protein structures for computational analysis. The library interfaces with popular geometric deep learning libraries: DGL, PyTorch Geometric and PyTorch3D. Geometric deep learning is emerging as a popular methodology in computational structural biology. As feature engineering is a vital step in a machine learning project, the library is designed to be highly flexible, allowing the user to parameterise the graph construction; scalable, to facilitate working with large protein complexes; and to contain useful pre-processing tools for preparing experimental structure files. Graphein is also designed to facilitate network-based and graph-theoretic analyses of protein structures in a high-throughput manner. As example workflows, we make available two new protein structure-related datasets, previously unused by the geometric deep learning community. Availability and implementation: Graphein is written in Python. Source code, example usage and datasets, and documentation are made freely available under an MIT License at the following URL: https://github.com/a-r-j/graphein


2020 ◽  
Author(s):  
Xing Zhang ◽  
Junwen Luo ◽  
Yi Cai ◽  
Wei Zhu ◽  
Xiaofeng Yang ◽  
...  

Abstract Deep learning has been increasingly used in protein tertiary structure prediction, a major goal in life science. However, most algorithms developed so far use protein sequences as input, whereas the vast amount of protein tertiary structure information available in the Protein Data Bank (PDB) database remains largely unused because of the inherent complexity of 3D data computation. In this study, we propose Protein Structure Camera (PSC) as an approach to convert protein structures into images. As a case study, we developed a deep learning method incorporating PSC (DeepPSC) to reconstruct protein backbone structures from alpha carbon traces. DeepPSC outperformed all the methods currently available for this task. This PSC approach provides a useful tool for protein structure representation, and for the application of deep learning in protein structure prediction and protein engineering.


2020 ◽  
Vol 15 (7) ◽  
pp. 732-740
Author(s):  
Neetu Kumari ◽  
Anshul Verma

Background: Proteins are the basic building blocks of the body; they are complex systems whose structure plays a key role in activation, catalysis, messaging and disease states. Therefore, careful investigation of protein structure is necessary for the diagnosis of diseases and for drug design. Protein structures are described at different levels of complexity: primary (chain), secondary (helical), tertiary (3D), and quaternary structure. Analyzing the complex 3D structure of a protein is a difficult task, but it can be analyzed as a network of interconnections between its components, where amino acids are considered nodes and the interconnections between them are edges. Objective: Many published works have shown that the small-world network concept provides new opportunities to investigate networks of biological systems. The objective of this paper is to analyze protein structure using the small-world concept. Methods: The protein is analyzed using the small-world network concept, specifically the extreme case in which the degree distribution follows a power law. To verify the proposed approach, a dataset of the Oncogene protein structure is analyzed using Python programming. Results: The protein structure is plotted as a network of amino acids (Residue Interaction Graph (RIG)) using the distance matrix of nodes with a given threshold; then various centrality measures (i.e., degree distribution, Degree-Betweenness correlation, and Betweenness-Closeness correlation) are calculated for 1323 nodes and graphs are plotted. Conclusion: Ultimately, it is concluded that there exist a small number of hubs with high degree centrality, and they are expected to be robust toward harmful effects of mutations with new functions.
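A residue interaction graph of the kind described in this abstract can be sketched in a few lines of Python. The coordinates, the 5.0 threshold, and the helper names `residue_interaction_graph` and `closeness` are hypothetical; the abstract does not give the threshold or the code the authors used, and betweenness is omitted here to keep the sketch dependency-free.

```python
import math
from collections import deque

def residue_interaction_graph(coords, threshold):
    """Connect two residues when their distance is within `threshold`."""
    n = len(coords)
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= threshold:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency

def closeness(adjacency, source):
    """Closeness centrality of `source` within its connected component."""
    hops = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search gives shortest hop counts
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    total = sum(hops.values())
    return (len(hops) - 1) / total if total else 0.0

# Toy coordinates standing in for one representative atom per residue.
coords = [(0, 0, 0), (4, 0, 0), (8, 0, 0), (4, 4, 0), (20, 0, 0)]
graph = residue_interaction_graph(coords, threshold=5.0)
degrees = {node: len(neighbours) for node, neighbours in graph.items()}
print(degrees, closeness(graph, 1))
```

Residue 1 acts as a small hub in this toy graph: it has the highest degree and the highest closeness, while the distant residue 4 is isolated.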


2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Pablo Mier ◽  
Miguel A. Andrade-Navarro

Abstract According to the amino acid composition of natural proteins, it could be expected that all possible sequences of three or four amino acids will occur at least once in large protein datasets purely by chance. However, in some species or cellular contexts, specific short amino acid motifs are missing for unknown reasons. We describe these as Avoided Motifs, short amino acid combinations missing from biological sequences. Here we identify 209 human and 154 bacterial Avoided Motifs of length four amino acids, and discuss their possible functionality according to their presence in other species. Furthermore, we determine two Avoided Motifs of length three amino acids in human proteins specifically located in the cytoplasm, and two more in secreted proteins. Our results support the hypothesis that the characterization of Avoided Motifs in particular contexts can provide us with information about functional motifs, pointing to a new approach in the use of molecular sequences for the discovery of protein function.
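The basic idea of finding motifs that never occur in a sequence set can be sketched as a set difference between all possible k-mers and the observed ones. This is only an illustration under simple assumptions: the function name `avoided_motifs` and the toy sequences are hypothetical, and the authors' actual datasets and filtering steps are not described in the abstract.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes

def avoided_motifs(sequences, k):
    """Return every length-k motif that never occurs in `sequences`."""
    observed = set()
    for seq in sequences:
        for start in range(len(seq) - k + 1):
            observed.add(seq[start:start + k])
    all_motifs = {"".join(letters) for letters in product(AMINO_ACIDS, repeat=k)}
    # Motifs containing non-standard letters are ignored by the difference.
    return all_motifs - observed

# Toy input; a real analysis would use a full proteome.
missing = avoided_motifs(["ACDA", "WWY"], k=2)
print(len(missing))
```

For k = 4 there are 20^4 = 160,000 candidate motifs, so this exhaustive enumeration remains cheap at the motif lengths discussed in the abstract.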


Author(s):  
Mark Lorch

This chapter examines proteins, the dominant proportion of cellular machinery, and the relationship between protein structure and function. The multitude of biological processes needed to keep cells functioning are managed in the organism or cell by a massive cohort of proteins, together known as the proteome. The twenty amino acids that make up the bulk of proteins produce the vast array of protein structures. However, amino acids alone do not provide quite enough chemical variety to complete all of the biochemical activity of a cell, so the chapter also explores post-translational modifications. It finishes by looking at some dynamic aspects of proteins, including enzyme kinetics and the protein folding problem.


2019 ◽  
Vol 20 (4) ◽  
pp. 931 ◽  
Author(s):  
Jean-Marc Jeckelmann ◽  
Dimitrios Fotiadis

Heteromeric amino acid transporters (HATs) are protein complexes that catalyze the transport of amino acids across plasma membranes. HATs are composed of two subunits, a heavy and a light subunit, which belong to the solute carrier (SLC) families SLC3 and SLC7. The two subunits are linked by a conserved disulfide bridge. Several human diseases are associated with loss of function or overexpression of specific HATs making them drug targets. The human HAT 4F2hc-LAT2 (SLC3A2-SLC7A8) is specific for the transport of large neutral L-amino acids and specific amino acid-related compounds. Human 4F2hc-LAT2 can be functionally overexpressed in the methylotrophic yeast Pichia pastoris and pure recombinant protein purified. Here we present the first cryo-electron microscopy (cryo-EM) 3D-map of a HAT, i.e., of the human 4F2hc-LAT2 complex. The structure could be determined at ~13 Å resolution using direct electron detector and Volta phase plate technologies. The 3D-map displays two prominent densities of different sizes. The available X-ray structure of the 4F2hc ectodomain fitted nicely into the smaller density revealing the relative position of 4F2hc with respect to LAT2 and the membrane plane.


2019 ◽  
Vol 20 (18) ◽  
pp. 4436 ◽  
Author(s):  
Piotr Fabian ◽  
Katarzyna Stapor ◽  
Mateusz Banach ◽  
Magdalena Ptak-Kaczor ◽  
Leszek Konieczny ◽  
...  

Protein structure is the result of the high synergy of all amino acids present in the protein. This synergy is the result of an overall strategy for adapting a specific protein structure. It is a compromise between two trends: The optimization of non-binding interactions and the directing of the folding process by an external force field, whose source is the water environment. The geometric parameters of the structural form of the polypeptide chain in the form of a local radius of curvature that is dependent on the orientation of adjacent peptide bond planes (result of the respective Phi and Psi rotation) allow for a comparative analysis of protein structures. Certain levels of their geometry are the criteria for comparison. In particular, they can be used to assess the differences between the structural form of biologically active proteins and their amyloid forms. On the other hand, the application of the fuzzy oil drop model allows the assessment of the role of amino acids in the construction of tertiary structure through their participation in the construction of a hydrophobic core. The combination of these two models—the geometric structure of the backbone and the determining of the participation in the construction of the tertiary structure that is applied for the comparative analysis of biologically active and amyloid forms—is presented.


Author(s):  
Toshio Iwasaki ◽  
Yoshiharu Miyajima-Nakano ◽  
Risako Fukazawa ◽  
Myat T Lin ◽  
Shin-Ichi Matsushita ◽  
...  

Abstract A set of C43(DE3) and BL21(DE3) Escherichia coli host strains that are auxotrophic for various amino acids is briefly reviewed. These strains require the addition of a defined set of one or more amino acids in the growth medium, and have been specifically designed for overproduction of membrane or water-soluble proteins selectively labeled with stable isotopes such as 2H, 13C and 15N. The strains described here are available for use and have been deposited into public strain banks. Although they cannot fully eliminate the possibility of isotope dilution and mixing, metabolic scrambling of the different amino acid types can be minimized through a careful consideration of the bacterial metabolic pathways. The use of a suitable auxotrophic expression host strain with an appropriately isotopically labeled growth medium ensures high levels of isotope labeling efficiency as well as selectivity for providing deeper insight into protein structure-function relationships.


F1000Research ◽  
2014 ◽  
Vol 3 ◽  
pp. 217 ◽  
Author(s):  
Sandeep Chakraborty ◽  
Basuthkar J. Rao ◽  
Bjarni Asgeirsson ◽  
Ravindra Venkatramani ◽  
Abhaya M. Dandekar

The remarkable diversity in biological systems is rooted in the ability of the twenty naturally occurring amino acids to perform multifarious catalytic functions by creating unique structural scaffolds known as the active site. Finding such structural motifs within the protein structure is a key aspect of many computational methods. The algorithm for obtaining combinations of motifs of a certain length, although polynomial in complexity, runs in non-trivial computer time. Also, the search space expands considerably if stereochemically equivalent residues are allowed to replace an amino acid in the motif. In the present work, we propose a method to precompile all possible motifs comprising a set (n=4 in this case) of predefined amino acid residues from a protein structure that occur within a specified distance (R) of each other (PREMONITION). PREMONITION rolls a sphere of radius R along the protein fold centered at the C atom of each residue, and all possible motifs are extracted within this sphere. The number of residues that can occur within a sphere centered around a residue is bounded by physical constraints, thus setting an upper limit on the processing times. After such a pre-compilation step, the computational time required for querying a protein structure with multiple motifs is considerably reduced. Previously, we had proposed a computational method to estimate the promiscuity of proteins with known active site residues and 3D structure using a database of known active sites in proteins (CSA) by querying each protein with the active site motif of every other residue. The runtime for such a comparison is reduced from days to hours using the PREMONITION methodology.
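The pre-compilation step described in this abstract can be sketched roughly as follows. This is a simplified reading, not the PREMONITION implementation: the function name `precompile_motifs`, the input layout, and the toy coordinates are assumptions, and the sketch only requires each motif member to lie within R of the sphere's center residue rather than enforcing every pairwise distance.

```python
import math
from collections import defaultdict
from itertools import combinations

def precompile_motifs(residues, radius, n=4):
    """Index all n-residue groups found inside a sphere around each residue.

    `residues` is a list of (residue_type, (x, y, z)) tuples, one position
    per residue. The result maps a sorted tuple of residue types to the
    set of residue-index tuples that realise it.
    """
    index = defaultdict(set)
    for i, (_, center) in enumerate(residues):
        neighbours = [j for j, (_, pos) in enumerate(residues)
                      if j != i and math.dist(center, pos) <= radius]
        # Every motif from this sphere contains the center residue i.
        for others in combinations(neighbours, n - 1):
            members = tuple(sorted((i,) + others))
            key = tuple(sorted(residues[m][0] for m in members))
            index[key].add(members)
    return index

# Toy structure: four residues close together, one far away.
residues = [("H", (0, 0, 0)), ("D", (3, 0, 0)), ("S", (0, 3, 0)),
            ("G", (0, 0, 3)), ("K", (30, 0, 0))]
index = precompile_motifs(residues, radius=5.0)
hits = index.get(("D", "G", "H", "S"), set())
print(hits)
```

Once the index is built, querying a motif is a dictionary lookup on the sorted residue types, which is where the reduction in query time would come from; the physical limit on how many residues fit in one sphere bounds the number of combinations per center.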

