DeepDriveMD: Deep-Learning Driven Adaptive Molecular Simulations for Protein Folding

Abstract

Motivation: From evolutionary inference and function annotation to structural prediction, protein sequence comparison has provided crucial biological insights. While many sequence alignment algorithms have been developed, existing approaches often cannot detect hidden structural relationships in the ‘twilight zone’ of low sequence identity. To address this critical problem, we introduce a computational algorithm that performs protein Sequence Alignments from deep-Learning of Structural Alignments (SAdLSA, silent ‘d’). The key idea is to implicitly learn the protein folding code from many thousands of structural alignments using experimentally determined protein structures.

Results: To demonstrate that the folding code was learned, we first show that SAdLSA trained on pure α-helical proteins successfully recognizes pairs of structurally related pure β-sheet protein domains. Subsequent training and benchmarking on larger, highly challenging datasets show significant improvement over established approaches. For challenging cases, SAdLSA is ∼150% better than HHsearch for generating pairwise alignments and ∼50% better for identifying the proteins with the best alignments in a sequence library. The time complexity of SAdLSA is O(N) thanks to GPU acceleration.

Availability and implementation: Datasets and source code of SAdLSA are available free of charge for academic users at http://sites.gatech.edu/cssb/sadlsa/.

Supplementary information: Supplementary data are available at Bioinformatics online.

TorchMD: A Deep Learning Framework for Molecular Simulations

Journal of Chemical Theory and Computation ◽ 10.1021/acs.jctc.0c01343 ◽ 2021

Author(s): Stefan Doerr ◽ Maciej Majewski ◽ Adrià Pérez ◽ Andreas Krämer ◽ Cecilia Clementi ◽ ...

Keyword(s): Deep Learning ◽ Molecular Simulations ◽ Learning Framework

Molecular simulations of cotranslational protein folding: fragment stabilities, folding cooperativity and trapping in the ribosome tunnel

PLoS Computational Biology ◽ 10.1371/journal.pcbi.0020098.eor ◽ 2005 ◽ Vol preprint (2006) ◽ pp. e98

Author(s): Adrian Elcock

Keyword(s): Protein Folding ◽ Molecular Simulations

A generalized deep learning approach for local structure identification in molecular simulations

Chemical Science ◽ 10.1039/c9sc02097g ◽ 2019 ◽ Vol 10 (32) ◽ pp. 7503-7515 ◽ Cited By ~ 12

Author(s): Ryan S. DeFever ◽ Colin Targonski ◽ Steven W. Hall ◽ Melissa C. Smith ◽ Sapna Sarupria

Keyword(s): Deep Learning ◽ Local Structure ◽ Molecular Simulations ◽ Structure Identification ◽ Learning Approach ◽ Atomic Coordinates

We demonstrate a PointNet-based deep learning approach to classify local structure in molecular simulations, learning features directly from atomic coordinates.
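The core idea behind a PointNet-style model can be illustrated with a minimal sketch: the same small layer is applied to every atom's coordinates, and the per-atom features are max-pooled, so the result does not depend on the order in which neighboring atoms are listed. This is not the authors' network; the function names, the layer size, and the random weights are all hypothetical and chosen only for illustration.

```python
import random


def shared_layer(point, weights, biases):
    """Apply the same linear map followed by ReLU to one (x, y, z) point."""
    return [max(0.0, sum(w * c for w, c in zip(row, point)) + b)
            for row, b in zip(weights, biases)]


def global_feature(points, weights, biases):
    """Max-pool the per-point features over all points.

    Because max is symmetric, the output is independent of point order.
    """
    per_point = [shared_layer(p, weights, biases) for p in points]
    return [max(column) for column in zip(*per_point)]


# Hypothetical, untrained weights: 4 output features from 3 coordinates.
rng = random.Random(0)
weights = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
biases = [rng.uniform(-1, 1) for _ in range(4)]

# A made-up local environment: coordinates of four neighboring atoms.
neighbors = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 0.5, 0.2)]
shuffled = neighbors[:]
rng.shuffle(shuffled)

# Reordering the atoms leaves the pooled feature vector unchanged.
assert global_feature(neighbors, weights, biases) == global_feature(shuffled, weights, biases)
```

In a real classifier the pooled feature vector would be passed to further trained layers that output a structure label; that part is omitted here.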

Molecular Simulations of Cotranslational Protein Folding: Fragment Stabilities, Folding Cooperativity, and Trapping in the Ribosome

PLoS Computational Biology ◽ 10.1371/journal.pcbi.0020098 ◽ 2006 ◽ Vol 2 (7) ◽ pp. e98 ◽ Cited By ~ 90

Author(s): Adrian H Elcock

Keyword(s): Protein Folding ◽ Molecular Simulations

Distance-based protein folding powered by deep learning

Proceedings of the National Academy of Sciences ◽ 10.1073/pnas.1821309116 ◽ 2019 ◽ Vol 116 (34) ◽ pp. 16856-16865 ◽ Cited By ~ 72

Author(s): Jinbo Xu

Keyword(s): Protein Folding ◽ Deep Learning ◽ Family Size ◽ Experimental Validation ◽ Distance Matrix ◽ Data Bank ◽ 3D Models ◽ Geometric Constraints ◽ Central Processing ◽ Direct Coupling Analysis

Direct coupling analysis (DCA) for protein folding has made very good progress, but it is not effective for proteins that lack many sequence homologs, even when coupled with time-consuming conformation sampling with fragments. We show that we can accurately predict the interresidue distance distribution of a protein by deep learning, even for proteins with ∼60 sequence homologs. Using only the geometric constraints given by the resulting distance matrix, we can construct 3D models without extensive conformation sampling. Our method successfully folded 21 of the 37 CASP12 hard targets, with a median family size of 58 effective sequence homologs, within 4 h on a Linux computer with 20 central processing units. In contrast, DCA-predicted contacts cannot be used to fold any of these hard targets in the absence of extensive conformation sampling, and the best CASP12 group folded only 11 of them by integrating DCA-predicted contacts into fragment-based conformation sampling. Rigorous experimental validation in CASP13 shows that our distance-based folding server successfully folded 17 of 32 hard targets (with a median family size of 36 sequence homologs) and obtained 70% precision on the top L/5 long-range predicted contacts. The latest experimental validation in CAMEO shows that our server predicted correct folds for 2 membrane proteins while all of the other servers failed. These results demonstrate that it is now feasible to predict the correct fold for many more proteins that lack similar structures in the Protein Data Bank, even on a personal computer.
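The "top L/5 long-range" precision quoted in this abstract can be illustrated with a minimal sketch, where L is the number of residues. It assumes the common CASP-style convention that two residues are in contact when they are closer than 8 Å and that "long-range" means a sequence separation of at least 24 residues; neither threshold is stated in the abstract. The function name and its arguments are hypothetical.

```python
def top_l5_long_range_precision(pred_prob, true_dist, min_sep=24, cutoff=8.0):
    """Precision of the top L/5 long-range predicted contacts.

    pred_prob[i][j]: predicted contact probability for residues i and j.
    true_dist[i][j]: native distance (in angstroms) for the same pair.
    min_sep and cutoff are assumed conventions, not values from the abstract.
    """
    n = len(pred_prob)
    # Collect residue pairs i < j that are far apart in sequence.
    pairs = [(pred_prob[i][j], i, j)
             for i in range(n)
             for j in range(i + min_sep, n)]
    # Rank by predicted probability, highest first, and keep the top L/5.
    pairs.sort(key=lambda item: item[0], reverse=True)
    top = pairs[:max(1, n // 5)]
    if not top:
        return 0.0  # protein too short to have any long-range pair
    # A prediction counts as correct if the native distance is below the cutoff.
    hits = sum(1 for _, i, j in top if true_dist[i][j] < cutoff)
    return hits / len(top)
```

Under these assumptions, a 30-residue protein would be scored on its 6 highest-ranked long-range pairs, and the function returns the fraction of those pairs that are true contacts in the native structure.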
