Structure-aware Protein Solubility Prediction From Sequence Through Graph Convolutional Network And Predicted Contact Map
AbstractMotivationProtein solubility is significant in producing new soluble proteins that can reduce the cost of biocatalysts or therapeutic agents. Therefore, a computational model is highly desired to accurately predict protein solubility from the amino acid sequence. Many methods have been developed, but they are mostly based on the one-dimensional embedding of amino acids that is limited to catch spatially structural information.ResultsIn this study, we have developed a new structure-aware method to predict protein solubility by attentive graph convolutional network (GCN), where the protein topology attribute graph was constructed through predicted contact maps from the sequence. GraphSol was shown to substantially out-perform other sequence-based methods. The model was proven to be stable by consistent R2 of 0.48 in both the cross-validation and independent test of the eSOL dataset. To our best knowledge, this is the first study to utilize the GCN for sequence-based predictions. More importantly, this architecture could be extended to other protein prediction tasks.AvailabilityThe package is available at http://[email protected] informationSupplementary data are available at Bioinformatics online.