Sampling the conformational landscapes of transporters and receptors with AlphaFold2

Equilibrium fluctuations and triggered conformational changes often underlie the functional cycles of membrane proteins. For example, transporters mediate the passage of molecules across cell membranes by alternating between inward-facing (IF) and outward-facing (OF) states, while receptors undergo intracellular structural rearrangements that initiate signaling cascades. Although the conformational plasticity of these proteins has historically posed a challenge for traditional de novo protein structure prediction pipelines, the recent success of AlphaFold2 (AF2) in CASP14 culminated in the modeling of a transporter in multiple conformations to high accuracy. Given that AF2 was designed to predict static structures of proteins, it remains unclear if this result represents an underexplored capability to accurately predict multiple conformations and/or structural heterogeneity. Here, we present an approach to drive AF2 to sample alternative conformations of topologically diverse transporters and G-protein coupled receptors (GPCRs) that are absent from the AF2 training set. Whereas models generated using the default AF2 pipeline are conformationally homogeneous and nearly identical to one another, reducing the depth of the input multiple sequence alignments (MSAs) led to the generation of accurate models in multiple conformations. In our benchmark, these conformations were observed to span the range between two experimental structures of interest, suggesting that our protocol allows sampling of the conformational landscape at the energy minimum. Nevertheless, our results also highlight the need for the next generation of deep learning algorithms to be designed to predict ensembles of biophysically relevant states.
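
The abstract above reports that shallower input MSAs led to more diverse models, but it does not say how depth was reduced. As a minimal sketch, assuming simple random subsampling of alignment rows (the function name `subsample_msa` and the toy alignment are hypothetical and not taken from the paper or from AF2), the idea could look like this:

```python
import random


def subsample_msa(msa, depth, seed=0):
    """Return the query (first row) plus up to depth - 1 randomly chosen other rows.

    `msa` is a list of aligned sequences of equal length; row 0 is assumed to be
    the query and is always kept. The original row order is preserved.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    if not msa:
        return []
    query, rest = msa[0], msa[1:]
    rng = random.Random(seed)
    k = min(depth - 1, len(rest))
    picked = sorted(rng.sample(range(len(rest)), k))
    return [query] + [rest[i] for i in picked]


# Toy alignment (made-up sequences, for illustration only).
toy_msa = ["MKTAY", "MKSAY", "MRTAY", "MK-AY", "MKTAF", "LKTAY"]
shallow = subsample_msa(toy_msa, depth=3, seed=1)
```

Different seeds give different shallow alignments, which is one plausible way to obtain several distinct inputs for repeated prediction runs.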

Using de novo protein structure predictions to measure the quality of very large multiple sequence alignments

Bioinformatics ◽

10.1093/bioinformatics/btv592 ◽

2015 ◽

Vol 32 (6) ◽

pp. 814-820 ◽

Cited By ~ 14

Author(s):

Gearóid Fox ◽

Fabian Sievers ◽

Desmond G. Higgins

Keyword(s):

Protein Structure ◽

Structure Prediction ◽

De Novo ◽

Biological Data ◽

Supplementary Information ◽

Test Case ◽

Sequence Alignments ◽

Progressive Alignment ◽

Multiple Sequence ◽

Multiple Sequence Alignments

Abstract Motivation: Multiple sequence alignments (MSAs) with large numbers of sequences are now commonplace. However, current multiple alignment benchmarks are ill-suited for testing these types of alignments, as test cases either contain a very small number of sequences or are based purely on simulation rather than empirical data. Results: We take advantage of recent developments in protein structure prediction methods to create a benchmark (ContTest) for protein MSAs containing many thousands of sequences in each test case and which is based on empirical biological data. We rank popular MSA methods using this benchmark and verify a recent result showing that chained guide trees increase the accuracy of progressive alignment packages on datasets with thousands of proteins. Availability and implementation: Benchmark data and scripts are available for download at http://www.bioinf.ucd.ie/download/ContTest.tar.gz. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

EVfold.org: Evolutionary Couplings and Protein 3D Structure Prediction

10.1101/021022 ◽

2015 ◽

Cited By ~ 14

Author(s):

Robert Sheridan ◽

Robert J. Fieldhouse ◽

Sikander Hayat ◽

Yichao Sun ◽

Yevgeniy Antipin ◽

...

Keyword(s):

Protein Function ◽

Structure Prediction ◽

De Novo ◽

3D Structure ◽

Sequence Information ◽

Major Advance ◽

Sequence Alignments ◽

Multiple Sequence ◽

Genomic Databases ◽

Multiple Sequence Alignments

Recently developed maximum entropy methods infer evolutionary constraints on protein function and structure from the millions of protein sequences available in genomic databases. The EVfold web server (at EVfold.org) makes these methods available to predict functional and structural interactions in proteins. The key algorithmic development has been to disentangle direct and indirect residue-residue correlations in large multiple sequence alignments and derive direct residue-residue evolutionary couplings (EVcouplings or ECs). For proteins of unknown structure, distance constraints obtained from evolutionary couplings between residue pairs are used to predict all-atom 3D structures de novo, often to good accuracy. Given sufficient sequence information in a protein family, this is a major advance toward solving the problem of computing the native 3D fold of proteins from sequence information alone. Availability: EVfold server at http://evfold.org/ Contact: [email protected]
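
The distinction between direct and indirect correlations can be illustrated with a toy Gaussian analogue. This is not the EVfold algorithm, which fits a maximum entropy model over amino-acid states; it is only a sketch of the underlying idea under that simplifying assumption. In a three-site chain where site 1 influences site 2 and site 2 influences site 3, sites 1 and 3 are correlated, yet the inverse of the covariance matrix has a zero in the (1, 3) position, which shows that their correlation is entirely indirect. The helper `invert` and the example matrix are hypothetical.

```python
from fractions import Fraction


def invert(matrix):
    """Invert a small square matrix exactly by Gauss-Jordan elimination.

    Raises StopIteration if the matrix is singular (no non-zero pivot found).
    """
    n = len(matrix)
    aug = [
        [Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Covariance of a chain 1 -> 2 -> 3 with neighbour correlation r = 1/2.
r = Fraction(1, 2)
cov = [
    [1, r, r * r],
    [r, 1, r],
    [r * r, r, 1],
]
precision = invert(cov)

# cov[0][2] is non-zero (sites 1 and 3 are correlated),
# but precision[0][2] is zero (no direct coupling between them).
```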

Tertiary structure prediction of the KIX domain of CBP using Monte Carlo simulations driven by restraints derived from multiple sequence alignments

Proteins Structure Function and Bioinformatics ◽

10.1002/(sici)1097-0134(19980215)30:3<287::aid-prot8>3.0.co;2-h ◽

1998 ◽

Vol 30 (3) ◽

pp. 287-294 ◽

Cited By ~ 13

Author(s):

Angel R. Ortiz ◽

Andrzej Kolinski ◽

Jeffrey Skolnick

Keyword(s):

Monte Carlo ◽

Monte Carlo Simulations ◽

Structure Prediction ◽

Tertiary Structure ◽

Sequence Alignments ◽

Tertiary Structure Prediction ◽

Multiple Sequence ◽

Multiple Sequence Alignments ◽

Kix Domain

QuanTest2: benchmarking multiple sequence alignments using secondary structure prediction

Bioinformatics ◽

10.1093/bioinformatics/btz552 ◽

2019 ◽

Cited By ~ 3

Author(s):

Fabian Sievers ◽

Desmond G Higgins

Keyword(s):

Secondary Structure ◽

Structure Prediction ◽

Secondary Structure Prediction ◽

Reference Sequence ◽

Supplementary Information ◽

Sequence Alignments ◽

Multiple Sequence ◽

Multiple Sequence Alignments ◽

Reference Sequences ◽

Selection Of

Abstract Motivation: Secondary structure prediction accuracy (SSPA) in the QuanTest benchmark can be used to measure the accuracy of a multiple sequence alignment. SSPA correlates well with the sum-of-pairs (SP) score if the results are averaged over many alignments, but not on an alignment-by-alignment basis. This is due to a sub-optimal selection of reference and non-reference sequences in QuanTest. Results: We develop an improved strategy for selecting reference and non-reference sequences for a new benchmark, QuanTest2. In QuanTest2, SSPA and SP correlate better on an alignment-by-alignment basis than in QuanTest. Guide trees for QuanTest2 are more balanced with respect to reference sequences than in QuanTest. QuanTest2 scores correlate well with other well-established benchmarks. Availability and implementation: QuanTest2 is available at http://bioinf.ucd.ie/quantest2.tar and comprises reference and non-reference sequence sets and a scoring script. Supplementary information: Supplementary data are available at Bioinformatics online.
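
The abstract does not define the sum-of-pairs score. A minimal sketch, assuming the common definition (the fraction of residue pairs aligned in a reference alignment that are also aligned in the test alignment), is given below; the function names and toy alignments are hypothetical and this is not the QuanTest2 scoring script.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of residue pairs that share a column.

    Each residue is identified as (sequence index, residue index), where the
    residue index counts non-gap characters from the start of the sequence.
    Rows are assumed to have equal length and to use '-' for gaps.
    """
    counters = [0] * len(alignment)
    pairs = set()
    for column in zip(*alignment):
        present = []
        for seq_index, char in enumerate(column):
            if char != "-":
                present.append((seq_index, counters[seq_index]))
                counters[seq_index] += 1
        pairs.update(combinations(present, 2))
    return pairs


def sum_of_pairs_score(test, reference):
    """Fraction of reference-aligned residue pairs reproduced in the test alignment."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)


# Toy two-sequence example: the test alignment misplaces the final G.
reference = ["AC-G", "ACTG"]
test = ["ACG-", "ACTG"]
score = sum_of_pairs_score(test, reference)  # 2 of 3 reference pairs recovered
```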

AttentiveDist: Protein Inter-Residue Distance Prediction Using Deep Learning with Attention on Quadruple Multiple Sequence Alignments

10.1101/2020.11.24.396770 ◽

2020 ◽

Author(s):

Aashish Jain ◽

Genki Terashi ◽

Yuki Kagaya ◽

Sai Raghavendra Maddhuri Venkata Subramaniya ◽

Charles Christoffer ◽

...

Keyword(s):

Deep Learning ◽

Structure Prediction ◽

Prediction Models ◽

3D Structure ◽

Evolutionary Information ◽

Sequence Alignments ◽

Multiple Sequence ◽

Contact Prediction ◽

Multiple Sequence Alignments ◽

Distance Prediction

Abstract: Protein 3D structure prediction has advanced significantly in recent years due to improving contact prediction accuracy. This improvement has been largely due to deep learning approaches that predict inter-residue contacts and, more recently, distances using multiple sequence alignments (MSAs). In this work we present AttentiveDist, a novel approach that uses different MSAs generated with different E-values in a single model to increase the co-evolutionary information provided to the model. To determine the importance of each MSA’s feature at the inter-residue level, we added an attention layer to the deep neural network. The model is trained in a multi-task fashion to also predict backbone and orientation angles, further improving the inter-residue distance prediction. We show that AttentiveDist outperforms the top methods for contact prediction in the CASP13 structure prediction competition. To aid in structure modeling we also developed two new deep learning-based sidechain center distance and peptide-bond nitrogen-oxygen distance prediction models. Together these led to a 12% increase in TM-score from the best server method in CASP13 for structure prediction.
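
The attention step described above can be pictured, for a single residue pair, as a softmax-weighted combination of the features derived from each MSA. The sketch below is only an illustration of that general mechanism, not the AttentiveDist architecture: in the real model the scores would be produced by learned layers, whereas here they are passed in as plain numbers, and the names `softmax` and `attend` are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw scores to positive weights that sum to 1."""
    largest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, scores):
    """Combine one feature vector per MSA into a single vector.

    `features` holds one equal-length vector per MSA for a given residue pair;
    `scores` holds one raw importance score per MSA.
    """
    weights = softmax(scores)
    dim = len(features[0])
    combined = [
        sum(w * vec[d] for w, vec in zip(weights, features)) for d in range(dim)
    ]
    return weights, combined


# Four MSAs (e.g. built with four E-value cutoffs), two made-up features each.
pair_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
weights, combined = attend(pair_features, scores=[0.0, 0.0, 0.0, 0.0])
```

With equal scores every MSA receives the same weight, so the combined vector is the plain average; unequal scores shift the result toward the MSAs judged more informative for that residue pair.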

Computational Methods for Protein Secondary Structure Prediction Using Multiple Sequence Alignments

Current Protein and Peptide Science ◽

10.2174/1389203003381324 ◽

2000 ◽

Vol 1 (3) ◽

pp. 273-301 ◽

Cited By ~ 21

Author(s):

Jaap Heringa

Keyword(s):

Secondary Structure ◽

Computational Methods ◽

Structure Prediction ◽

Secondary Structure Prediction ◽

Protein Secondary Structure ◽

Protein Secondary Structure Prediction ◽

Sequence Alignments ◽

Multiple Sequence ◽

Multiple Sequence Alignments

Faculty Opinions recommendation of QuanTest2: benchmarking multiple sequence alignments using secondary structure prediction.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.736183723.793577501 ◽

2020 ◽

Author(s):

Janusz Bujnicki ◽

Pritha Ghosh

Keyword(s):

Secondary Structure ◽

Structure Prediction ◽

Secondary Structure Prediction ◽

Sequence Alignments ◽

Multiple Sequence ◽

Multiple Sequence Alignments

The influence of gapped positions in multiple sequence alignments on secondary structure prediction methods

Computational Biology and Chemistry ◽

10.1016/j.compbiolchem.2004.09.005 ◽

2004 ◽

Vol 28 (5-6) ◽

pp. 351-366 ◽

Cited By ~ 13

Author(s):

V.A. Simossis ◽

J. Heringa

Keyword(s):

Secondary Structure ◽

Structure Prediction ◽

Secondary Structure Prediction ◽

Prediction Methods ◽

Sequence Alignments ◽

Multiple Sequence ◽

Multiple Sequence Alignments

Analysis of the Effects of Multiple Sequence Alignments in Protein Secondary Structure Prediction

Advances in Bioinformatics and Computational Biology - Lecture Notes in Computer Science ◽

10.1007/11532323_14 ◽

2005 ◽

pp. 128-140

Author(s):

Georgios Joannis Pappas ◽

Shankar Subramaniam

Keyword(s):

Secondary Structure ◽

Structure Prediction ◽

Secondary Structure Prediction ◽

Protein Secondary Structure ◽

Protein Secondary Structure Prediction ◽

Sequence Alignments ◽

Multiple Sequence ◽

Multiple Sequence Alignments

Remote homology search with hidden Potts models

10.1101/2020.06.23.168153 ◽

2020 ◽

Cited By ~ 3

Author(s):

Grey W. Wilburn ◽

Sean R. Eddy

Keyword(s):

Structure Prediction ◽

Statistical Physics ◽

Probability Model ◽

3D Structure ◽

Homology Search ◽

Sequence Alignments ◽

Multiple Sequence ◽

Potts Models ◽

Multiple Sequence Alignments ◽

Insertion And Deletion

Abstract: Most methods for biological sequence homology search and alignment work with primary sequence alone, neglecting higher-order correlations. Recently, statistical physics models called Potts models have been used to infer all-by-all pairwise correlations between sites in deep multiple sequence alignments, and these pairwise couplings have improved 3D structure predictions. Here we extend the use of Potts models from structure prediction to sequence alignment and homology search by developing what we call a hidden Potts model (HPM) that merges a Potts emission process to a generative probability model of insertion and deletion. Because an HPM is incompatible with efficient dynamic programming alignment algorithms, we develop an approximate algorithm based on importance sampling, using simpler probabilistic models as proposal distributions. We test an HPM implementation on RNA structure homology search benchmarks, where we can compare directly to exact alignment methods that capture nested RNA base-pairing correlations (stochastic context-free grammars). HPMs perform promisingly in these proof of principle experiments.

Author summary: Computational homology search and alignment tools are used to infer the functions and evolutionary histories of biological sequences. Most widely used tools for sequence homology searches, such as BLAST and HMMER, rely on primary sequence conservation alone. It should be possible to make more powerful search tools by also considering higher-order covariation patterns induced by 3D structure conservation. Recent advances in 3D protein structure prediction have used a class of statistical physics models called Potts models to infer pairwise correlation structure in multiple sequence alignments. However, Potts models assume alignments are given and cannot build new alignments, limiting their use in homology search. We have extended Potts models to include a probability model of insertion and deletion so they can be applied to sequence alignment and remote homology search using a new model we call a hidden Potts model (HPM). Tests of our prototype HPM software show promising results in initial benchmarking experiments, though more work will be needed to use HPMs in practical tools.
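
Importance sampling, which the abstract names as the basis of the approximate algorithm, estimates a sum that is too expensive to enumerate by drawing states from a simpler proposal distribution and reweighting them. The sketch below shows the generic estimator on a tiny discrete example; it is not the HPM software, and the function name, state labels, and weights are all hypothetical.

```python
import random


def importance_estimate(target_weight, proposal_probs, n_samples, seed=0):
    """Estimate the sum of target_weight(s) over all states s.

    `proposal_probs` maps each state to its proposal probability; the values
    must sum to 1 and be non-zero wherever the target weight is non-zero.
    Each draw contributes target_weight(s) / proposal_probs[s].
    """
    rng = random.Random(seed)
    states = list(proposal_probs)
    probs = [proposal_probs[s] for s in states]
    draws = rng.choices(states, weights=probs, k=n_samples)
    return sum(target_weight(s) / proposal_probs[s] for s in draws) / n_samples


# Toy target: unnormalised weights over two "alignments"; the true sum is 8.
weights = {"a": 2.0, "b": 6.0}

# A proposal proportional to the target gives a zero-variance estimate.
matched = importance_estimate(weights.get, {"a": 0.25, "b": 0.75}, n_samples=100)

# A cruder (uniform) proposal still converges, but with sampling noise.
uniform = importance_estimate(weights.get, {"a": 0.5, "b": 0.5}, n_samples=20000)
```

The closer the proposal is to the target, the lower the variance of the estimate, which is presumably why simpler probabilistic models of the same sequences are natural proposal candidates.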
