DeepTracer: Predicting Backbone Atomic Structure from High Resolution Cryo-EM Density Maps of Protein Complexes


10.1101/2020.02.12.946772 ◽

2020 ◽

Author(s):

Jonas Pfab ◽

Dong Si

Keyword(s):

Amino Acids ◽

Deep Learning ◽

Protein Structure ◽

Amino Acid ◽

Protein Complexes ◽

Protein Structures ◽

Prediction Method ◽

Large Protein ◽

Structure Information ◽

Density Maps

Abstract Motivation: Accurately determining the atomic structure of proteins represents a fundamental problem in the field of structural bioinformatics. A solution would be significant as protein structure information could be utilized in the medical field, e.g. in the development of vaccines for new viruses. This paper focuses on predicting the protein structure based on 3D images of the proteins captured through cryogenic electron microscopes (cryo-EM). A fully automated computationally efficient protein structure prediction method would be particularly beneficial in the field of cryo-EM as the technology allows researchers to photograph multiple large protein complexes in a single study, which means that a fast prediction method could allow for a high throughput of derived protein structures. We present a deep learning approach, DeepTracer, for predicting locations of the backbone atoms, secondary structure elements, and the amino acid types. In order to connect the predicted amino acids into chains, we applied a modified traveling salesman algorithm. Results: We trained our deep learning model on experimental cryo-EM density maps and tested it on a set of 50 density maps. We found that our new approach predicted protein structures with an average RMSD value of 1.18 and a coverage of 87.5%. Furthermore, we detected secondary structure information for 87.2% of amino acids correctly. We also showed preliminarily that 25.2% of amino acid types could be predicted directly from the 3D cryo-EM density map, considering 20 different types in total. Finally, we noted that the prediction runtime of DeepTracer is significantly improved compared to other methods. It predicts a large protein complex structure of more than 30,000 amino acids in only 2 hours. Availability: The repository of this project will be [email protected] information: Supplementary data will be available at Bioinformatics online.
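
To make the two headline metrics concrete, the following is a minimal sketch of how an RMSD and a coverage value could be computed for a predicted backbone. It is not the paper's evaluation protocol: the nearest-neighbour matching rule, the 3.0 cutoff, the function name `rmsd_and_coverage` and the toy coordinates are all assumptions made for illustration.

```python
import math

def rmsd_and_coverage(predicted, reference, cutoff=3.0):
    """Match each reference atom to its nearest predicted atom.

    Returns (RMSD over matched pairs, fraction of reference atoms matched).
    The cutoff is an illustrative assumption, not a value from the paper.
    `predicted` must contain at least one point.
    """
    squared = []
    for ref in reference:
        # Distance from this reference atom to the closest predicted atom.
        nearest = min(math.dist(ref, p) for p in predicted)
        if nearest <= cutoff:
            squared.append(nearest ** 2)
    coverage = len(squared) / len(reference)
    rmsd = math.sqrt(sum(squared) / len(squared)) if squared else float("nan")
    return rmsd, coverage

# Toy data: four reference atoms, three predicted atoms (hypothetical values).
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 1.0, 0.0), (3.8, 0.0, 1.0), (7.6, 1.0, 0.0)]
rmsd, coverage = rmsd_and_coverage(predicted, reference)
```

In this toy case three of the four reference atoms find a predicted atom within the cutoff, so coverage reflects the unmatched fourth atom while RMSD is computed over the matched pairs only.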


Graphein - a Python Library for Geometric Deep Learning and Network Analysis on Protein Structures

10.1101/2020.07.15.204701 ◽

2020 ◽

Author(s):

Arian R. Jamasb ◽

Pietro Lió ◽

Tom L. Blundell

Keyword(s):

Deep Learning ◽

Learning Community ◽

Protein Complexes ◽

Protein Structures ◽

Surface Mesh ◽

Large Protein ◽

Graph Theoretic ◽

Experimental Structure ◽

New Protein ◽

High Throughput Manner

Abstract Graphein is a Python library for constructing graph and surface-mesh representations of protein structures for computational analysis. The library interfaces with popular geometric deep learning libraries: DGL, PyTorch Geometric and PyTorch3D. Geometric deep learning is emerging as a popular methodology in computational structural biology. As feature engineering is a vital step in a machine learning project, the library is designed to be highly flexible, allowing the user to parameterise the graph construction, scalable to facilitate working with large protein complexes, and containing useful pre-processing tools for preparing experimental structure files. Graphein is also designed to facilitate network-based and graph-theoretic analyses of protein structures in a high-throughput manner. As example workflows, we make available two new protein structure-related datasets, previously unused by the geometric deep learning community. Availability and implementation: Graphein is written in Python. Source code, example usage and datasets, and documentation are made freely available under an MIT License at the following URL: https://github.com/a-r-j/graphein


DeepPSC (protein structure camera): computer vision-based protein backbone structure reconstruction from alpha carbon trace as a case study

10.1101/2020.08.12.247312 ◽

2020 ◽

Author(s):

Xing Zhang ◽

Junwen Luo ◽

Yi Cai ◽

Wei Zhu ◽

Xiaofeng Yang ◽

...

Keyword(s):

Deep Learning ◽

Protein Structure ◽

Structure Prediction ◽

Tertiary Structure ◽

Protein Structures ◽

Protein Backbone ◽

Protein Tertiary Structure ◽

Structure Information ◽

Alpha Carbon

Abstract Deep learning has been increasingly used in protein tertiary structure prediction, a major goal in life science. However, most algorithms developed so far use protein sequences as input, whereas the vast amount of protein tertiary structure information available in the Protein Data Bank (PDB) database remains largely unused, because of the inherent complexity of 3D data computation. In this study, we propose Protein Structure Camera (PSC) as an approach to convert protein structures into images. As a case study, we developed a deep learning method incorporating PSC (DeepPSC) to reconstruct protein backbone structures from alpha carbon traces. DeepPSC outperformed all the methods currently available for this task. This PSC approach provides a useful tool for protein structure representation, and for the application of deep learning in protein structure prediction and protein engineering.


Analysis of Oncogene Protein Structure Using Small World Network Concept

Current Bioinformatics ◽

10.2174/1574893614666191113143840 ◽

2020 ◽

Vol 15 (7) ◽

pp. 732-740

Author(s):

Neetu Kumari ◽

Anshul Verma

Keyword(s):

Amino Acids ◽

Protein Structure ◽

Degree Distribution ◽

Protein Structures ◽

Small World ◽

Extreme Condition ◽

Centrality Measures ◽

Small World Network ◽

Network Concept ◽

Oncogene Protein

Background: Proteins are the basic building blocks of the body. A protein is a complex system whose structure plays a key role in activation, catalysis, messaging and disease states. Careful investigation of protein structure is therefore necessary for diagnosing diseases and for drug design. Protein structures are described at different levels of complexity: primary (chain), secondary (helical), tertiary (3D), and quaternary structure. Analyzing the complex 3D structure of a protein is difficult, but the structure can be treated as a network of interconnections between its components, in which amino acids are the nodes and the interconnections between them are the edges. Objective: Many published works have shown that the small world network concept provides new opportunities to investigate networks of biological systems. The objective of this paper is to analyze protein structure using the small world concept. Methods: Proteins are analyzed using the small world network concept, specifically the extreme condition in which the degree distribution follows a power law. To verify the proposed approach, the dataset of the oncogene protein structure is analyzed using Python programming. Results: The protein structure is plotted as a network of amino acids (Residue Interaction Graph (RIG)) using the distance matrix of nodes with a given threshold; various centrality measures (i.e., degree distribution, Degree-Betweenness correlation, and Betweenness-Closeness correlation) are then calculated for 1323 nodes, and graphs are plotted. Conclusion: It is concluded that there exist hubs with higher degree centrality but fewer in number, and they are expected to be robust toward harmful effects of mutations with new functions.
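
The graph construction described in the Results can be sketched as follows. This is a hedged illustration rather than the authors' code: the 7.0 distance threshold, the function names and the toy coordinates are assumptions, and only the degree distribution is computed here.

```python
import math
from collections import Counter

def residue_interaction_graph(coords, threshold=7.0):
    """Link every pair of residues whose distance is at most `threshold`.

    `coords` holds one representative 3D point per residue; the threshold
    value is an illustrative assumption, not taken from the paper.
    """
    n = len(coords)
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= threshold:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency

def degree_distribution(adjacency):
    """Map each degree value to the number of nodes that have it."""
    return Counter(len(neighbours) for neighbours in adjacency.values())

# Toy coordinates (hypothetical): residue 1 sits close to residues 0, 2 and 3.
coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0),
          (5.0, 5.0, 0.0), (30.0, 0.0, 0.0)]
graph = residue_interaction_graph(coords)
degrees = degree_distribution(graph)
```

Residue 1 ends up as the only node with three neighbours, a small-scale picture of the "few hubs with high degree" pattern the Conclusion describes.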


Avoided motifs: short amino acid strings missing from protein datasets

Biological Chemistry ◽

10.1515/hsz-2020-0383 ◽

2021 ◽

Vol 0 (0) ◽

Author(s):

Pablo Mier ◽

Miguel A. Andrade-Navarro

Keyword(s):

Amino Acids ◽

Amino Acid ◽

Protein Function ◽

Large Protein ◽

New Approach ◽

Cellular Context ◽

Human Proteins ◽

Context Specific ◽

Protein Datasets

Abstract According to the amino acid composition of natural proteins, it could be expected that all possible sequences of three or four amino acids will occur at least once in large protein datasets purely by chance. However, in some species or cellular contexts, specific short amino acid motifs are missing for unknown reasons. We describe these as Avoided Motifs, short amino acid combinations missing from biological sequences. Here we identify 209 human and 154 bacterial Avoided Motifs of length four amino acids, and discuss their possible functionality according to their presence in other species. Furthermore, we determine two Avoided Motifs of length three amino acids in human proteins specifically located in the cytoplasm, and two more in secreted proteins. Our results support the hypothesis that the characterization of Avoided Motifs in particular contexts can provide us with information about functional motifs, pointing to a new approach in the use of molecular sequences for the discovery of protein function.
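
The core idea, listing every possible motif of a given length and subtracting those observed in a dataset, can be sketched in a few lines. This is an assumption-laden illustration, not the authors' pipeline: the function name `avoided_motifs` and the toy sequences are hypothetical, and no statistical filtering is applied.

```python
from itertools import product

# The twenty standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def avoided_motifs(sequences, k, alphabet=AMINO_ACIDS):
    """Return all length-k strings over `alphabet` absent from `sequences`."""
    observed = set()
    for seq in sequences:
        # Slide a window of width k along the sequence.
        for i in range(len(seq) - k + 1):
            observed.add(seq[i:i + k])
    all_motifs = {"".join(p) for p in product(alphabet, repeat=k)}
    return all_motifs - observed

# Toy example on a two-letter alphabet so the result is easy to check by hand.
toy = ["ACAC", "CCA"]
missing = avoided_motifs(toy, 2, alphabet="AC")
```

With the full alphabet there are 8,000 possible motifs of length three and 160,000 of length four, so the exhaustive enumeration stays cheap at the motif lengths the abstract considers.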


3. Proteins

Biochemistry: A Very Short Introduction ◽

10.1093/actrade/9780198833871.003.0003 ◽

2021 ◽

pp. 34-51

Author(s):

Mark Lorch

Keyword(s):

Amino Acids ◽

Protein Folding ◽

Protein Structure ◽

Protein Structures ◽

Structure And Function ◽

Vast Array ◽

A Cell ◽

Cellular Machinery ◽

And Function ◽

The Relationship

This chapter examines proteins, the dominant proportion of cellular machinery, and the relationship between protein structure and function. The multitude of biological processes needed to keep cells functioning are managed in the organism or cell by a massive cohort of proteins, together known as the proteome. The twenty amino acids that make up the bulk of proteins produce the vast array of protein structures. However, amino acids alone do not provide quite enough chemical variety to complete all of the biochemical activity of a cell, so the chapter also explores post-translation modifications. It finishes by looking at some dynamic aspects of proteins, including enzyme kinetics and the protein folding problem.


Volta Phase Plate Cryo-EM Structure of the Human Heterodimeric Amino Acid Transporter 4F2hc-LAT2

International Journal of Molecular Sciences ◽

10.3390/ijms20040931 ◽

2019 ◽

Vol 20 (4) ◽

pp. 931 ◽

Cited By ~ 9

Author(s):

Jean-Marc Jeckelmann ◽

Dimitrios Fotiadis

Keyword(s):

Amino Acids ◽

Amino Acid ◽

Drug Targets ◽

Plasma Membranes ◽

Protein Complexes ◽

Methylotrophic Yeast ◽

Disulfide Bridge ◽

Phase Plate ◽

Amino Acid Transporters ◽

Loss Of Function

Heteromeric amino acid transporters (HATs) are protein complexes that catalyze the transport of amino acids across plasma membranes. HATs are composed of two subunits, a heavy and a light subunit, which belong to the solute carrier (SLC) families SLC3 and SLC7. The two subunits are linked by a conserved disulfide bridge. Several human diseases are associated with loss of function or overexpression of specific HATs, making them drug targets. The human HAT 4F2hc-LAT2 (SLC3A2-SLC7A8) is specific for the transport of large neutral L-amino acids and specific amino acid-related compounds. Human 4F2hc-LAT2 can be functionally overexpressed in the methylotrophic yeast Pichia pastoris, and the recombinant protein can be purified. Here we present the first cryo-electron microscopy (cryo-EM) 3D-map of a HAT, i.e., of the human 4F2hc-LAT2 complex. The structure could be determined at ~13 Å resolution using direct electron detector and Volta phase plate technologies. The 3D-map displays two prominent densities of different sizes. The available X-ray structure of the 4F2hc ectodomain fitted nicely into the smaller density, revealing the relative position of 4F2hc with respect to LAT2 and the membrane plane.


Different Synergy in Amyloids and Biologically Active Forms of Proteins

International Journal of Molecular Sciences ◽

10.3390/ijms20184436 ◽

2019 ◽

Vol 20 (18) ◽

pp. 4436 ◽

Cited By ~ 2

Author(s):

Piotr Fabian ◽

Katarzyna Stapor ◽

Mateusz Banach ◽

Magdalena Ptak-Kaczor ◽

Leszek Konieczny ◽

...

Keyword(s):

Amino Acids ◽

Comparative Analysis ◽

Protein Structure ◽

Tertiary Structure ◽

Protein Structures ◽

Water Environment ◽

Biologically Active ◽

Radius Of Curvature ◽

Hydrophobic Core ◽

Structural Form

Protein structure is the result of the high synergy of all amino acids present in the protein. This synergy is the result of an overall strategy for adapting a specific protein structure. It is a compromise between two trends: The optimization of non-binding interactions and the directing of the folding process by an external force field, whose source is the water environment. The geometric parameters of the structural form of the polypeptide chain in the form of a local radius of curvature that is dependent on the orientation of adjacent peptide bond planes (result of the respective Phi and Psi rotation) allow for a comparative analysis of protein structures. Certain levels of their geometry are the criteria for comparison. In particular, they can be used to assess the differences between the structural form of biologically active proteins and their amyloid forms. On the other hand, the application of the fuzzy oil drop model allows the assessment of the role of amino acids in the construction of tertiary structure through their participation in the construction of a hydrophobic core. The combination of these two models—the geometric structure of the backbone and the determining of the participation in the construction of the tertiary structure that is applied for the comparative analysis of biologically active and amyloid forms—is presented.


Escherichia coli amino acid auxotrophic expression host strains for investigating protein structure-function relationships

The Journal of Biochemistry ◽

10.1093/jb/mvaa140 ◽

2020 ◽

Author(s):

Toshio Iwasaki ◽

Yoshiharu Miyajima-Nakano ◽

Risako Fukazawa ◽

Myat T Lin ◽

Shin-Ichi Matsushita ◽

...

Keyword(s):

Escherichia Coli ◽

Amino Acids ◽

Protein Structure ◽

Amino Acid ◽

Structure Function ◽

Growth Medium ◽

Isotope Labeling ◽

Water Soluble ◽

Expression Host ◽

Host Strains

Abstract A set of C43(DE3) and BL21(DE3) Escherichia coli host strains that are auxotrophic for various amino acids is briefly reviewed. These strains require the addition of a defined set of one or more amino acids in the growth medium, and have been specifically designed for overproduction of membrane or water-soluble proteins selectively labeled with stable isotopes such as 2H, 13C and 15N. The strains described here are available for use and have been deposited into public strain banks. Although they cannot fully eliminate the possibility of isotope dilution and mixing, metabolic scrambling of the different amino acid types can be minimized through a careful consideration of the bacterial metabolic pathways. The use of a suitable auxotrophic expression host strain with an appropriately isotopically labeled growth medium ensures high levels of isotope labeling efficiency as well as selectivity for providing deeper insight into protein structure-function relationships.


3G1124 Development of prediction method for β-sheet formation using amino acid pairing propensity (2) (3G Protein: Structure 3, The 49th Annual Meeting of the Biophysical Society of Japan)

Seibutsu Butsuri ◽

10.2142/biophys.51.s130_2 ◽

2011 ◽

Vol 51 (supplement) ◽

pp. S130

Author(s):

Hiromi Suzuki

Keyword(s):

Protein Structure ◽

Amino Acid ◽

Annual Meeting ◽

Prediction Method ◽

Biophysical Society ◽

Β Sheet


PREMONITION - Preprocessing motifs in protein structures for search acceleration

F1000Research ◽

10.12688/f1000research.5166.1 ◽

2014 ◽

Vol 3 ◽

pp. 217 ◽

Cited By ~ 3

Author(s):

Sandeep Chakraborty ◽

Basuthkar J. Rao ◽

Bjarni Asgeirsson ◽

Ravindra Venkatramani ◽

Abhaya M. Dandekar

Keyword(s):

Protein Structure ◽

Amino Acid ◽

Active Site ◽

Active Sites ◽

Protein Structures ◽

3D Structure ◽

Search Space ◽

Computational Method ◽

Computational Time ◽

Active Site Residues

The remarkable diversity in biological systems is rooted in the ability of the twenty naturally occurring amino acids to perform multifarious catalytic functions by creating unique structural scaffolds known as the active site. Finding such structural motifs within the protein structure is a key aspect of many computational methods. The algorithm for obtaining combinations of motifs of a certain length, although polynomial in complexity, runs in non-trivial computer time. Also, the search space expands considerably if stereochemically equivalent residues are allowed to replace an amino acid in the motif. In the present work, we propose a method to precompile all possible motifs comprising a set (n=4 in this case) of predefined amino acid residues from a protein structure that occur within a specified distance (R) of each other (PREMONITION). PREMONITION rolls a sphere of radius R along the protein fold centered at the C atom of each residue, and all possible motifs are extracted within this sphere. The number of residues that can occur within a sphere centered around a residue is bounded by physical constraints, thus setting an upper limit on the processing times. After such a pre-compilation step, the computational time required for querying a protein structure with multiple motifs is considerably reduced. Previously, we had proposed a computational method to estimate the promiscuity of proteins with known active site residues and 3D structure using a database of known active sites in proteins (CSA) by querying each protein with the active site motif of every other residue. The runtimes for such a comparison are reduced from days to hours using the PREMONITION methodology.
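
The pre-compilation step can be pictured with a small sketch. It is a simplified reading of the abstract, not the PREMONITION implementation: the function name `precompile_motifs`, the index layout and the toy residues are hypothetical, membership is tested only against the sphere centre rather than pairwise, and the example uses n=3 instead of the n=4 reported above to keep it short.

```python
import math
from collections import defaultdict
from itertools import combinations

def precompile_motifs(residues, radius, n):
    """Index every n-residue group found inside a sphere around each residue.

    `residues` is a list of (residue_type, (x, y, z)) tuples.
    Returns a dict mapping a sorted tuple of residue types to the set of
    residue-index tuples that realise it.
    """
    index = defaultdict(set)
    for c, (_, center) in enumerate(residues):
        # Residues falling inside the sphere of the given radius around c.
        inside = [i for i, (_, xyz) in enumerate(residues)
                  if i != c and math.dist(center, xyz) <= radius]
        # Every group consists of the centre plus n - 1 neighbours.
        for others in combinations(inside, n - 1):
            members = tuple(sorted((c,) + others))
            key = tuple(sorted(residues[i][0] for i in members))
            index[key].add(members)
    return index

# Toy structure (hypothetical coordinates): three close residues, one far away.
residues = [("HIS", (0.0, 0.0, 0.0)), ("ASP", (3.0, 0.0, 0.0)),
            ("SER", (0.0, 3.0, 0.0)), ("GLY", (20.0, 0.0, 0.0))]
index = precompile_motifs(residues, radius=5.0, n=3)
```

Once such an index exists, querying a structure for a motif becomes a dictionary lookup on the sorted residue types, which is the kind of saving the abstract attributes to pre-compilation.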
