G-domain prediction across the diversity of G protein families
Abstract
Guanine nucleotide-binding proteins (G proteins) are characterized by a structurally and mechanistically conserved GTP-binding domain (G domain) that is indispensable for binding GTP. The G domain comprises five consensus motifs, called G boxes, which occur in a fixed order and are separated by amino acid spacers of varying length. The G proteins discovered to date are diverse in both function and sequence. This sequence diversity extends to the G box motifs (particularly the G5 box) and to the lengths of the inter-G box spacers. The Spacers and Mismatch Algorithm (SMA) introduced in this study predicts G domains in a given protein sequence from user-specified constraints on approximate G box patterns and inter-box gaps for each G protein family. The SMA parameters can be customized as more G proteins are discovered and structurally characterized. Family-specific G box motifs, including the less well characterized G5 box, were predicted with higher accuracy. Overall, our analysis suggests that G protein families may be classified by their family-specific G box sequences and inter-G box spacer lengths. SMA is available through a web-based server at https://labs.iitgn.ac.in/datascience/gboxes/
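To make the idea of "approximate G box patterns and inter-box gaps" concrete, the following is a minimal sketch of a spacer-and-mismatch search. It is not the authors' SMA implementation. The function names (`parse_pattern`, `find_box`, `predict_g_domain`), the parameter layout, and the synthetic toy sequence are all assumptions made for illustration. The three patterns are commonly cited Ras-like consensus forms for the G1, G3, and G4 boxes; G2 and G5 are omitted for brevity.

```python
from typing import List, Optional, Set, Tuple

def parse_pattern(pattern: str) -> List[Optional[Set[str]]]:
    """Turn a pattern such as 'GXXXXGK[ST]' into per-position residue sets.
    'X' means any residue (stored as None); '[ST]' means S or T."""
    positions: List[Optional[Set[str]]] = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "[":
            j = pattern.index("]", i)
            positions.append(set(pattern[i + 1:j]))
            i = j + 1
        else:
            positions.append(None if pattern[i] == "X" else {pattern[i]})
            i += 1
    return positions

def find_box(seq: str, positions, max_mismatch: int, lo: int, hi: int) -> List[int]:
    """Start indices in [lo, hi] where the window differs from the pattern
    at no more than max_mismatch constrained positions."""
    n = len(positions)
    hits = []
    for s in range(max(lo, 0), min(hi, len(seq) - n) + 1):
        window = seq[s:s + n]
        bad = sum(1 for r, ok in zip(window, positions) if ok is not None and r not in ok)
        if bad <= max_mismatch:
            hits.append(s)
    return hits

def predict_g_domain(seq: str,
                     boxes: List[Tuple[str, int]],
                     gaps: List[Tuple[int, int]]) -> List[Tuple[int, ...]]:
    """boxes: (pattern, allowed mismatches) in N- to C-terminal order.
    gaps: (min, max) spacer length between consecutive boxes.
    Returns every tuple of box start positions satisfying all constraints."""
    parsed = [(parse_pattern(p), m) for p, m in boxes]
    results: List[Tuple[int, ...]] = []

    def extend(k: int, prev_end: int, chosen: List[int]) -> None:
        if k == len(parsed):
            results.append(tuple(chosen))
            return
        positions, m = parsed[k]
        if k == 0:
            lo, hi = 0, len(seq)          # first box may start anywhere
        else:
            gmin, gmax = gaps[k - 1]      # spacer measured from end of previous box
            lo, hi = prev_end + gmin, prev_end + gmax
        for s in find_box(seq, positions, m, lo, hi):
            extend(k + 1, s + len(positions), chosen + [s])

    extend(0, 0, [])
    return results

# Synthetic toy sequence (not a real protein): three boxes joined by alanine spacers.
toy = "MA" + "GAGGVGKS" + "A" * 10 + "DTAG" + "A" * 8 + "NKCD" + "AA"
boxes = [("GXXXXGK[ST]", 0), ("DXXG", 0), ("[NT]KXD", 0)]  # illustrative parameters
gaps = [(5, 15), (5, 12)]                                   # illustrative spacer ranges
print(predict_g_domain(toy, boxes, gaps))
```

In this sketch, family-specific behavior would come entirely from the `boxes` and `gaps` arguments: tightening a gap range or raising the mismatch allowance for one box changes which candidates are reported, without changing the search itself.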