Annotating precision for integrative structural models using deep learning

Mapping Intimacies ◽ 10.1101/2021.06.22.449385 ◽ 2021

Author(s): Nikhil Kasukurthi ◽ Shruthi Viswanath

Keyword(s): Deep Learning ◽ High Precision ◽ Protein Complexes ◽ Source Code ◽ Structural Models ◽ Protein Assemblies ◽ Large Protein ◽ Single Precision ◽ Input Information ◽ Gradient Based

Motivation: Integrative modeling of macromolecular structures usually results in an ensemble of models that satisfy the input information. The model precision, or variability among these models, is estimated globally, i.e., a single precision value is reported for the model. However, it would be useful to identify regions of high and low precision. For instance, low-precision regions can suggest where the next experiments could be performed, and high-precision regions can be used for further analysis, e.g., suggesting mutations. Results: We develop PrISM (Precision for Integrative Structural Models), which uses autoencoders to efficiently and accurately annotate precision for integrative models. The method is benchmarked and tested on five examples of binary protein complexes and five examples of large protein assemblies. The annotated precision is shown to be consistent with, and more informative than, localization densities. The generated networks are also interpreted by gradient-based attention analysis. Availability: Source code is at https://github.com/isblab/prism.
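To make the idea of region-level precision concrete, the following is a minimal sketch that computes one spread value per bead across an ensemble. It is not the PrISM autoencoder method; it only illustrates the quantity being annotated. The function name `per_bead_precision`, the input layout, and the toy coordinates are assumptions made for this example, and the models are assumed to be already superposed.

```python
import math

def per_bead_precision(ensemble):
    """Return one spread value per bead: the RMS deviation from the
    ensemble-mean position (lower value = higher precision).

    ensemble: list of models; each model is a list of (x, y, z) tuples.
    Assumption: all models are pre-aligned and share the same bead order.
    """
    n_models = len(ensemble)
    n_beads = len(ensemble[0])
    spreads = []
    for b in range(n_beads):
        coords = [model[b] for model in ensemble]
        # Mean position of this bead over all models.
        mean = [sum(c[k] for c in coords) / n_models for k in range(3)]
        # Mean squared distance of the bead from its mean position.
        msd = sum(
            sum((c[k] - mean[k]) ** 2 for k in range(3)) for c in coords
        ) / n_models
        spreads.append(math.sqrt(msd))
    return spreads

# Toy ensemble (hypothetical): bead 0 never moves, bead 1 shifts along x.
toy = [
    [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (12.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (14.0, 0.0, 0.0)],
]
spreads = per_bead_precision(toy)
print(spreads)
```

In this toy case bead 0 would be a high-precision region and bead 1 a low-precision one, whereas a single global value would hide that difference.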

Nuclear Pore Scaffold Structure Analyzed by Super-Resolution Microscopy and Particle Averaging

Science ◽ 10.1126/science.1240672 ◽ 2013 ◽ Vol 341 (6146) ◽ pp. 655-658 ◽ Cited By ~ 274

Author(s): Anna Szymborska ◽ Alex de Marco ◽ Nathalie Daigle ◽ Volker C. Cordes ◽ John A. G. Briggs ◽ ...

Keyword(s): Protein Complexes ◽ Super Resolution ◽ Molecular Organization ◽ Nuclear Pore ◽ Protein Assemblies ◽ Large Protein ◽ Whole Cells ◽ Molecular Machinery ◽ Scaffold Structure ◽ Super Resolution Microscopy

Much of life’s essential molecular machinery consists of large protein assemblies that currently pose challenges for structure determination. A prominent example is the nuclear pore complex (NPC), for which the organization of its individual components remains unknown. By combining stochastic super-resolution microscopy, to directly resolve the ringlike structure of the NPC, with single particle averaging, to use information from thousands of pores, we determined the average positions of fluorescent molecular labels in the NPC with a precision well below 1 nanometer. Applying this approach systematically to the largest building block of the NPC, the Nup107-160 subcomplex, we assessed the structure of the NPC scaffold. Thus, light microscopy can be used to study the molecular organization of large protein complexes in situ in whole cells.
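The gain from averaging many particles can be illustrated with the standard error of the mean. This is an idealized sketch, not the analysis used in the paper: it assumes independent, unbiased localization errors, and the function name `averaged_precision` and the 20 nm single-measurement spread are hypothetical values chosen for illustration.

```python
import math

def averaged_precision(single_sigma_nm, n_particles):
    """Standard error of a mean position estimated from n independent
    measurements that each have standard deviation single_sigma_nm."""
    if n_particles < 1:
        raise ValueError("n_particles must be at least 1")
    return single_sigma_nm / math.sqrt(n_particles)

# Hypothetical single-measurement spread of 20 nm.
sigma = 20.0
for n in (1, 100, 1000, 10000):
    print(n, round(averaged_precision(sigma, n), 3))
```

Under these assumptions the uncertainty shrinks with the square root of the number of averaged particles, which is why pooling thousands of pores can in principle reach a sub-nanometer precision for the mean position.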

Structures of core eukaryotic protein complexes

10.1101/2021.09.30.462231 ◽ 2021 ◽ Cited By ~ 2

Author(s): Ian R. Humphreys ◽ Jimin Pei ◽ Minkyung Baek ◽ Aditya Krishnakumar ◽ Ivan Anishchenko ◽ ...

Keyword(s): Deep Learning ◽ Amino Acid ◽ Protein Complexes ◽ Structure Modeling ◽ Protein Assemblies ◽ Sequence Alignments ◽ Multiple Sequence ◽ Eukaryotic Protein ◽ Recent Advances ◽ Coevolution Analysis

Abstract: Protein-protein interactions play critical roles in biology, but despite decades of effort, the structures of many eukaryotic protein complexes are unknown, and there are likely many interactions that have not yet been identified. Here, we take advantage of recent advances in proteome-wide amino acid coevolution analysis and deep-learning-based structure modeling to systematically identify and build accurate models of core eukaryotic protein complexes, as represented within the Saccharomyces cerevisiae proteome. We use a combination of RoseTTAFold and AlphaFold to screen through paired multiple sequence alignments for 8.3 million pairs of S. cerevisiae proteins and build models for strongly predicted protein assemblies with two to five components. Comparison to existing interaction and structural data suggests that these predictions are likely to be quite accurate. We provide structure models spanning almost all key processes in eukaryotic cells for 104 protein assemblies which have not been previously identified, and 608 which have not been structurally characterized. One-sentence summary: We take advantage of recent advances in proteome-wide amino acid coevolution analysis and deep-learning-based structure modeling to systematically identify and build accurate models of core eukaryotic protein complexes.

Graphein - a Python Library for Geometric Deep Learning and Network Analysis on Protein Structures

10.1101/2020.07.15.204701 ◽ 2020

Author(s): Arian R. Jamasb ◽ Pietro Lió ◽ Tom L. Blundell

Keyword(s): Deep Learning ◽ Learning Community ◽ Protein Complexes ◽ Protein Structures ◽ Surface Mesh ◽ Large Protein ◽ Graph Theoretic ◽ Experimental Structure ◽ New Protein ◽ High Throughput Manner

Abstract: Graphein is a Python library for constructing graph and surface-mesh representations of protein structures for computational analysis. The library interfaces with popular geometric deep learning libraries: DGL, PyTorch Geometric and PyTorch3D. Geometric deep learning is emerging as a popular methodology in computational structural biology. As feature engineering is a vital step in a machine learning project, the library is designed to be highly flexible, allowing the user to parameterise the graph construction; scalable, to facilitate working with large protein complexes; and equipped with useful pre-processing tools for preparing experimental structure files. Graphein is also designed to facilitate network-based and graph-theoretic analyses of protein structures in a high-throughput manner. As example workflows, we make available two new protein structure-related datasets, previously unused by the geometric deep learning community. Availability and implementation: Graphein is written in Python. Source code, example usage and datasets, and documentation are made freely available under an MIT License at the following URL: https://github.com/a-r-j/graphein
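As an illustration of what a parameterised graph construction from a protein structure can look like, here is a minimal standard-library sketch. It does not use or reproduce the Graphein API; the function name `build_distance_graph`, the 8.0 distance cutoff, and the toy coordinates are assumptions for this example only.

```python
import math
from itertools import combinations

def build_distance_graph(coords, cutoff=8.0):
    """Return an adjacency dict in which residues i and j are linked
    when the distance between their coordinates is <= cutoff.

    coords: list of (x, y, z) tuples, one per residue (e.g. C-alpha atoms).
    cutoff: distance threshold; the default is an illustrative choice.
    """
    graph = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            graph[i].add(j)
            graph[j].add(i)
    return graph

# Toy coordinates (hypothetical): three residues in a row plus one far away.
ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
graph = build_distance_graph(ca, cutoff=8.0)
degrees = {i: len(nbrs) for i, nbrs in graph.items()}
print(degrees)
```

The cutoff is the kind of construction parameter a user would tune; node degree is one simple graph-theoretic quantity that can then be read off the result.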

DeepTracer: Predicting Backbone Atomic Structure from High Resolution Cryo-EM Density Maps of Protein Complexes

10.1101/2020.02.12.946772 ◽ 2020

Author(s): Jonas Pfab ◽ Dong Si

Keyword(s): Amino Acids ◽ Deep Learning ◽ Protein Structure ◽ Amino Acid ◽ Protein Complexes ◽ Protein Structures ◽ Prediction Method ◽ Large Protein ◽ Structure Information ◽ Density Maps

Motivation: Accurately determining the atomic structure of proteins represents a fundamental problem in the field of structural bioinformatics. A solution would be significant as protein structure information could be utilized in the medical field, e.g. in the development of vaccines for new viruses. This paper focuses on predicting the protein structure based on 3D images of the proteins captured through cryogenic electron microscopes (cryo-EM). A fully automated, computationally efficient protein structure prediction method would be particularly beneficial in the field of cryo-EM as the technology allows researchers to photograph multiple large protein complexes in a single study, which means that a fast prediction method could allow for a high throughput of derived protein structures. We present a deep learning approach, DeepTracer, for predicting locations of the backbone atoms, secondary structure elements, and the amino acid types. In order to connect the predicted amino acids into chains, we applied a modified traveling salesman algorithm. Results: We trained our deep learning model on experimental cryo-EM density maps and tested it on a set of 50 density maps. We found that our new approach predicted protein structures with an average RMSD value of 1.18 and a coverage of 87.5%. Furthermore, we detected secondary structure information for 87.2% of amino acids correctly. We also showed preliminarily that 25.2% of amino acid types could be predicted directly from the 3D cryo-EM density map, considering 20 different types in total. Finally, we noted that the prediction runtime of DeepTracer is significantly improved compared to other methods. It predicts a large protein complex structure of more than 30,000 amino acids in only 2 hours. Availability: The repository of this project will be [email protected] Supplementary information: Supplementary data will be available at Bioinformatics online.
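The two headline metrics, RMSD and coverage, can be sketched as follows. This is not DeepTracer's evaluation protocol: the nearest-neighbour matching, the 3.0 distance cutoff, the function names, and the toy coordinates are all assumptions made for illustration, and the sketch allows one predicted atom to match more than one reference atom.

```python
import math

def match_predictions(predicted, reference, max_dist=3.0):
    """For each reference atom, find the nearest predicted atom and keep
    the distance if it is within max_dist (an assumed matching cutoff)."""
    matched = []
    for ref in reference:
        nearest = min(math.dist(ref, p) for p in predicted)
        if nearest <= max_dist:
            matched.append(nearest)
    return matched

def rmsd(distances):
    """Root-mean-square of a list of matched distances."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))

# Toy backbone (hypothetical): four reference atoms, three predicted atoms.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 1.0, 0.0), (3.8, 0.0, 1.0), (7.6, 0.0, 0.0)]

matched = match_predictions(predicted, reference)
coverage = len(matched) / len(reference)  # fraction of reference atoms matched
print(round(rmsd(matched), 3), coverage)
```

Reporting both numbers together matters because RMSD is computed only over matched atoms, so a low RMSD with low coverage would describe an accurate but incomplete model.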

Faculty Opinions recommendation of Rapid analysis of large protein-protein complexes using NMR-derived orientational constraints: the 95 kDa complex of LpxA with acyl carrier protein.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽ 10.3410/f.1020790.250631 ◽ 2004

Author(s): Antonio Rosato

Keyword(s): Acyl Carrier Protein ◽ Protein Complexes ◽ Rapid Analysis ◽ Carrier Protein ◽ Large Protein

Faculty Opinions recommendation of Structural analysis of large protein complexes using solvent paramagnetic relaxation enhancements.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽ 10.3410/f.10384956.11203054 ◽ 2011

Author(s): Gottfried Otting

Keyword(s): Structural Analysis ◽ Protein Complexes ◽ Paramagnetic Relaxation ◽ Large Protein ◽ Paramagnetic Relaxation Enhancements

Literature survey of deep learning-based vulnerability analysis on source code

IET Software ◽ 10.1049/iet-sen.2020.0084 ◽ 2020 ◽ Vol 14 (6) ◽ pp. 654-664

Author(s): Abubakar Omari Abdallah Semasaba ◽ Wei Zheng ◽ Xiaoxue Wu ◽ Samuel Akwasi Agyemang

Keyword(s): Deep Learning ◽ Source Code ◽ Vulnerability Analysis ◽ Literature Survey

DNCON2_Inter: predicting interchain contacts for homodimeric and homomultimeric protein complexes using multiple sequence alignments of monomers and deep learning

Scientific Reports ◽ 10.1038/s41598-021-91827-7 ◽ 2021 ◽ Vol 11 (1)

Author(s): Farhan Quadir ◽ Raj S. Roy ◽ Randal Halfmann ◽ Jianlin Cheng

Keyword(s): Deep Learning ◽ Tertiary Structure ◽ Protein Complexes ◽ Complex Structure ◽ Great Success ◽ Sequence Alignments ◽ Multiple Sequence ◽ Multiple Sequence Alignments ◽ Residue Contacts ◽ Evolutionary Features

Abstract: Deep learning methods that achieved great success in predicting intrachain residue-residue contacts have been applied to predict interchain contacts between proteins. However, these methods require multiple sequence alignments (MSAs) of a pair of interacting proteins (dimers) as input, which are often difficult to obtain because there are not many known protein complexes available to generate MSAs of sufficient depth for a pair of proteins. Recognizing that the MSA of a monomer that forms homomultimers contains the co-evolutionary signals of both intrachain and interchain residue pairs in contact, we applied DNCON2 (a deep learning-based protein intrachain residue-residue contact predictor) to predict both intrachain and interchain contacts for homomultimers, using the MSA and other co-evolutionary features of a single monomer, followed by discrimination of interchain and intrachain contacts according to the tertiary structure of the monomer. We name this tool DNCON2_Inter. Allowing true-positive predictions within two residue shifts, the best average precision was obtained for the Top-L/10 predictions: 22.9% for homodimers and 17.0% for higher-order homomultimers. In some instances, especially where interchain contact densities are high, DNCON2_Inter predicted interchain contacts with 100% precision. We also developed Con_Complex, a complex structure reconstruction tool that uses predicted contacts to produce the structure of the complex. Using Con_Complex, we show that the predicted contacts can be used to accurately construct the structure of some complexes. Our experiment demonstrates that monomeric multiple sequence alignments can be used with deep learning to predict interchain contacts of homomeric proteins.
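The Top-L/10 precision with a residue-shift tolerance can be sketched as below. This is an illustrative reading of the metric, not the authors' evaluation code: the function name `top_precision`, the interpretation of "within two residue shifts" as both indices lying within ±2 of a true contact, and the toy data are assumptions.

```python
def top_precision(scored_contacts, true_contacts, seq_len, divisor=10, shift=2):
    """Precision of the top L/divisor predicted contacts.

    scored_contacts: list of (i, j, score) predictions.
    true_contacts:   set of (i, j) residue pairs known to be in contact.
    shift:           a prediction counts as a hit if some true contact has
                     both indices within +/- shift (assumed interpretation).
    """
    k = max(1, seq_len // divisor)
    top = sorted(scored_contacts, key=lambda c: c[2], reverse=True)[:k]

    def is_hit(i, j):
        return any(
            abs(i - ti) <= shift and abs(j - tj) <= shift
            for ti, tj in true_contacts
        )

    hits = sum(1 for i, j, _ in top if is_hit(i, j))
    return hits / len(top)

# Toy data (hypothetical) for a sequence of length 40, so L/10 = 4.
true_contacts = {(5, 30), (12, 33)}
predictions = [
    (5, 31, 0.9),   # one residue off a true contact
    (20, 25, 0.8),  # far from any true contact
    (12, 33, 0.7),  # exact match
    (1, 2, 0.6),    # far from any true contact
    (8, 9, 0.1),    # outside the top 4, never evaluated
]
strict = top_precision(predictions, true_contacts, seq_len=40, shift=0)
relaxed = top_precision(predictions, true_contacts, seq_len=40, shift=2)
print(strict, relaxed)
```

Comparing the strict and relaxed values shows how much of the reported precision depends on the shift tolerance.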
