Large Scale Discovery of Microbial Fibrillar Adhesins and Identification of Novel Adhesive Domain Families
Fibrillar adhesins are bacterial cell surface proteins that mediate interactions with host cells during colonisation and with other bacteria during biofilm formation. These proteins are characterised by a stalk that projects the adhesive domain closer to the binding target. Fibrillar adhesins evolve quickly and can therefore be difficult to identify computationally, yet they are an important component for understanding bacteria-host interactions. To detect novel fibrillar adhesins, we developed a random forest prediction approach based on common characteristics we identified for this protein class. We applied this approach to Firmicute and Actinobacterial proteomes, yielding over 4,000 confidently predicted fibrillar adhesins. To verify the approach, we investigated predicted fibrillar adhesins that lacked a known adhesive domain. Based on these proteins, we identified 21 sequence clusters representing potential novel adhesive domains. We used AlphaFold to verify that 14 clusters showed structural similarity to known adhesive domains such as the TED domain. Overall, our study substantially increases the number of known fibrillar adhesins and has enabled us to identify novel adhesive domain families involved in bacterial pathogenesis.