Annotating precision for integrative structural models using deep learning

2021 ◽  
Author(s):  
Nikhil Kasukurthi ◽  
Shruthi Viswanath

Motivation: Integrative modeling of macromolecular structures usually results in an ensemble of models that satisfy the input information. The model precision, or variability among these models, is estimated globally, i.e., a single precision value is reported for the model. However, it would be useful to identify regions of high and low precision. For instance, low-precision regions can suggest where the next experiments could be performed, and high-precision regions can be used for further analysis, e.g., suggesting mutations. Results: We develop PrISM (Precision for Integrative Structural Models), which uses autoencoders to efficiently and accurately annotate precision for integrative models. The method is benchmarked and tested on five examples of binary protein complexes and five examples of large protein assemblies. The annotated precision is shown to be consistent with, and more informative than, localization densities. The generated networks are also interpreted by gradient-based attention analysis. Availability: Source code is at https://github.com/isblab/prism.
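The idea of regional (rather than global) precision can be illustrated with a much simpler quantity than the one PrISM computes. The sketch below is not the PrISM autoencoder method; it is a minimal baseline that assumes the models are already superposed in a common frame and are represented as lists of bead coordinates. The function name `per_bead_spread` and the toy ensemble are hypothetical.

```python
import math

def per_bead_spread(models):
    """Return, for each bead, the RMS deviation of its position from the
    ensemble-mean position (a simple stand-in for local precision)."""
    n_models = len(models)
    n_beads = len(models[0])
    spreads = []
    for b in range(n_beads):
        # Mean position of bead b over all models.
        mean = [sum(m[b][k] for m in models) / n_models for k in range(3)]
        # Mean squared distance of bead b from that mean position.
        msd = sum(
            sum((m[b][k] - mean[k]) ** 2 for k in range(3)) for m in models
        ) / n_models
        spreads.append(math.sqrt(msd))
    return spreads

# Toy ensemble (assumed data): 3 models, 2 beads.
# Bead 0 never moves; bead 1 varies along x.
ensemble = [
    [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (16.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (22.0, 0.0, 0.0)],
]
spreads = per_bead_spread(ensemble)
```

A small spread marks a bead whose position is consistent across the ensemble (high precision), while a large spread marks a low-precision region.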

Science ◽  
2013 ◽  
Vol 341 (6146) ◽  
pp. 655-658 ◽  
Author(s):  
Anna Szymborska ◽  
Alex de Marco ◽  
Nathalie Daigle ◽  
Volker C. Cordes ◽  
John A. G. Briggs ◽  
...  

Much of life’s essential molecular machinery consists of large protein assemblies that currently pose challenges for structure determination. A prominent example is the nuclear pore complex (NPC), for which the organization of its individual components remains unknown. By combining stochastic super-resolution microscopy, to directly resolve the ringlike structure of the NPC, with single particle averaging, to use information from thousands of pores, we determined the average positions of fluorescent molecular labels in the NPC with a precision well below 1 nanometer. Applying this approach systematically to the largest building block of the NPC, the Nup107-160 subcomplex, we assessed the structure of the NPC scaffold. Thus, light microscopy can be used to study the molecular organization of large protein complexes in situ in whole cells.


Author(s):  
Ian R. Humphreys ◽  
Jimin Pei ◽  
Minkyung Baek ◽  
Aditya Krishnakumar ◽  
Ivan Anishchenko ◽  
...  

Abstract: Protein-protein interactions play critical roles in biology, but despite decades of effort, the structures of many eukaryotic protein complexes are unknown, and there are likely many interactions that have not yet been identified. Here, we take advantage of recent advances in proteome-wide amino acid coevolution analysis and deep-learning-based structure modeling to systematically identify and build accurate models of core eukaryotic protein complexes, as represented within the Saccharomyces cerevisiae proteome. We use a combination of RoseTTAFold and AlphaFold to screen through paired multiple sequence alignments for 8.3 million pairs of S. cerevisiae proteins and build models for strongly predicted protein assemblies with two to five components. Comparison to existing interaction and structural data suggests that these predictions are likely to be quite accurate. We provide structure models spanning almost all key processes in eukaryotic cells for 104 protein assemblies which have not been previously identified, and 608 which have not been structurally characterized. One-sentence summary: We take advantage of recent advances in proteome-wide amino acid coevolution analysis and deep-learning-based structure modeling to systematically identify and build accurate models of core eukaryotic protein complexes.


2020 ◽  
Author(s):  
Arian R. Jamasb ◽  
Pietro Lió ◽  
Tom L. Blundell

Abstract: Graphein is a Python library for constructing graph and surface-mesh representations of protein structures for computational analysis. The library interfaces with popular geometric deep learning libraries: DGL, PyTorch Geometric and PyTorch3D. Geometric deep learning is emerging as a popular methodology in computational structural biology. As feature engineering is a vital step in a machine learning project, the library is designed to be highly flexible, allowing the user to parameterise the graph construction, scalable to facilitate working with large protein complexes, and containing useful pre-processing tools for preparing experimental structure files. Graphein is also designed to facilitate network-based and graph-theoretic analyses of protein structures in a high-throughput manner. As example workflows, we make available two new protein structure-related datasets, previously unused by the geometric deep learning community. Availability and implementation: Graphein is written in Python. Source code, example usage and datasets, and documentation are made freely available under an MIT License at the following URL: https://github.com/a-r-j/graphein
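To make "graph representation of a protein structure" concrete, here is a minimal sketch of one common construction: residues become nodes, and an edge joins two residues whose representative atoms lie within a distance cutoff. This does not use Graphein's API; the function `contact_graph`, the 8.0 cutoff, and the toy coordinates are all assumptions chosen for illustration.

```python
import math

def contact_graph(coords, cutoff=8.0):
    """Build an adjacency list linking residues whose representative atoms
    lie within `cutoff` (same units as `coords`) of each other."""
    graph = {i: [] for i in range(len(coords))}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            # math.dist gives the Euclidean distance between two points.
            if math.dist(coords[i], coords[j]) <= cutoff:
                graph[i].append(j)
                graph[j].append(i)
    return graph

# Toy coordinates (assumed data): three residues spaced 3.8 units apart
# along x, plus one distant residue that should remain isolated.
ca_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
graph = contact_graph(ca_coords, cutoff=8.0)
```

The cutoff is the kind of parameter a user would tune when parameterising graph construction; the all-pairs loop is quadratic in the number of residues, so large complexes would call for a spatial index instead.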


2020 ◽  
Author(s):  
Jonas Pfab ◽  
Dong Si

Abstract: Motivation: Accurately determining the atomic structure of proteins represents a fundamental problem in the field of structural bioinformatics. A solution would be significant as protein structure information could be utilized in the medical field, e.g. in the development of vaccines for new viruses. This paper focuses on predicting the protein structure based on 3D images of the proteins captured through cryogenic electron microscopes (cryo-EM). A fully automated computationally efficient protein structure prediction method would be particularly beneficial in the field of cryo-EM as the technology allows researchers to photograph multiple large protein complexes in a single study, which means that a fast prediction method could allow for a high throughput of derived protein structures. We present a deep learning approach, DeepTracer, for predicting locations of the backbone atoms, secondary structure elements, and the amino acid types. In order to connect the predicted amino acids into chains, we applied a modified traveling salesman algorithm. Results: We trained our deep learning model on experimental cryo-EM density maps and tested it on a set of 50 density maps. We found that our new approach predicted protein structures with an average RMSD value of 1.18 and a coverage of 87.5%. Furthermore, we detected secondary structure information for 87.2% of amino acids correctly. We also showed preliminarily that 25.2% of amino acid types could be predicted directly from the 3D cryo-EM density map, considering 20 different types in total. Finally, we noted that the prediction runtime of DeepTracer is significantly improved compared to other methods. It predicts a large protein complex structure of more than 30,000 amino acids in only 2 hours. Availability: The repository of this project will be [email protected]. Supplementary information: Supplementary data will be available at Bioinformatics online.
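The abstract does not describe how the traveling salesman algorithm was modified, so the following is only a sketch of the underlying idea: ordering predicted residue positions into a chain by treating the task as a shortest-path problem. It uses the plain nearest-neighbour heuristic, which is a swapped-in stand-in and not DeepTracer's algorithm; `greedy_trace` and the toy points are hypothetical.

```python
import math

def greedy_trace(points, start=0):
    """Order points into a chain by repeatedly stepping to the nearest
    unvisited point (nearest-neighbour heuristic for a TSP-like path)."""
    unvisited = set(range(len(points)))
    order = [start]
    unvisited.remove(start)
    while unvisited:
        last = points[order[-1]]
        # Pick the unvisited point closest to the current chain end.
        nxt = min(unvisited, key=lambda i: math.dist(last, points[i]))
        order.append(nxt)
        unvisited.remove(nxt)
    return order

# Toy predicted positions (assumed data), deliberately listed out of
# chain order along the x axis.
predicted_ca = [(0.0, 0.0, 0.0), (7.6, 0.0, 0.0), (3.8, 0.0, 0.0), (11.4, 0.0, 0.0)]
chain = greedy_trace(predicted_ca)
```

The greedy heuristic can make poor choices where chains pass close to each other, which is one reason a real tracer would need a more careful formulation than this sketch.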


IET Software ◽  
2020 ◽  
Vol 14 (6) ◽  
pp. 654-664
Author(s):  
Abubakar Omari Abdallah Semasaba ◽  
Wei Zheng ◽  
Xiaoxue Wu ◽  
Samuel Akwasi Agyemang

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Farhan Quadir ◽  
Raj S. Roy ◽  
Randal Halfmann ◽  
Jianlin Cheng

Abstract: Deep learning methods that achieved great success in predicting intrachain residue-residue contacts have been applied to predict interchain contacts between proteins. However, these methods require multiple sequence alignments (MSAs) of a pair of interacting proteins (dimers) as input, which are often difficult to obtain because there are not many known protein complexes available to generate MSAs of sufficient depth for a pair of proteins. Recognizing that the MSA of a monomer that forms homomultimers contains the co-evolutionary signals of both intrachain and interchain residue pairs in contact, we applied DNCON2 (a deep learning-based protein intrachain residue-residue contact predictor) to predict both intrachain and interchain contacts for homomultimers using the MSA and other co-evolutionary features of a single monomer. Interchain and intrachain contacts are then discriminated according to the tertiary structure of the monomer. We name this tool DNCON2_Inter. Allowing true-positive predictions within two residue shifts, the best average precision was obtained for the Top-L/10 predictions: 22.9% for homodimers and 17.0% for higher-order homomultimers. In some instances, especially where interchain contact densities are high, DNCON2_Inter predicted interchain contacts with 100% precision. We also developed Con_Complex, a complex structure reconstruction tool that uses predicted contacts to produce the structure of the complex. Using Con_Complex, we show that the predicted contacts can be used to accurately construct the structure of some complexes. Our experiment demonstrates that monomeric multiple sequence alignments can be used with deep learning to predict interchain contacts of homomeric proteins.
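The Top-L/10 precision with a shift tolerance can be sketched as follows. This is one plausible reading of the metric, not the authors' evaluation code: it assumes L is the monomer sequence length, that predictions are `(i, j, score)` triples, and that a prediction counts as correct when some true contact lies within two residue positions of it on both indices. The name `top_k_precision` and the toy data are hypothetical.

```python
def top_k_precision(scored_pairs, true_contacts, seq_len, divisor=10, shift=2):
    """Precision of the top L/divisor predictions, counting a prediction
    (i, j) as correct if a true contact lies within +/- shift of both indices."""
    k = max(1, seq_len // divisor)
    # Keep the k highest-scoring predicted pairs.
    top = sorted(scored_pairs, key=lambda p: p[2], reverse=True)[:k]
    if not top:
        return 0.0
    hits = 0
    for i, j, _score in top:
        if any(abs(i - a) <= shift and abs(j - b) <= shift
               for a, b in true_contacts):
            hits += 1
    return hits / len(top)

# Toy example (assumed data): L = 30, so the top 3 predictions are scored.
predictions = [(5, 20, 0.9), (10, 12, 0.8), (1, 28, 0.7), (3, 3, 0.6)]
true_contacts = {(6, 22), (15, 25)}
precision = top_k_precision(predictions, true_contacts, seq_len=30)
```

Here only the prediction (5, 20) falls within two residues of a true contact, (6, 22), so one of the three top-ranked predictions counts as a hit. Setting `shift=0` recovers the strict exact-match precision.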


Mitochondrion ◽  
2015 ◽  
Vol 21 ◽  
pp. 27-32 ◽  
Author(s):  
Yang Xu ◽  
Ashim Malhotra ◽  
Steven M. Claypool ◽  
Mindong Ren ◽  
Michael Schlame
