Predicting direct physical interactions in multimeric proteins with deep learning
Accurate descriptions of protein-protein interactions are essential for understanding biological systems. AlphaFold2 was recently shown to predict the atomic structures of individual proteins with remarkable accuracy. Here, we demonstrate that the same neural network models developed for AlphaFold2 can be adapted to predict the structures of multimeric protein complexes without retraining. Unlike common approaches, our method, AF2Complex, does not require paired multiple sequence alignments. It achieves higher accuracy than elaborate strategies that combine AlphaFold2 with protein-protein docking. We then introduce new metrics for predicting direct protein-protein interactions between arbitrary protein pairs. The approach is validated on several challenging CASP14 multimeric targets, a small but appropriate benchmark set, and the E. coli proteome. Lastly, using the cytochrome c biogenesis system as an example, we present high-confidence models of three sought-after assemblies formed by eight members of this system.