Aim:
Both bacterial infection and viral infection involve a large number of protein-protein
interactions (PPIs) between a pathogen and its target host.
Background:
So far, many computational
methods have focused on predicting PPIs within the same species rather than PPIs across different species.
Methods:
From an extensive analysis of PPIs between Yersinia pestis and humans, we recently discovered a linear relation between amino acid composition and sequence length in many proteins involved in PPIs. We have built a support vector machine
(SVM) model that predicts PPIs between humans and bacteria using two feature types derived from
the relation. The two feature types used in the SVM are the amino acid composition group (AACG) and
the difference in amino acid composition between host and pathogen proteins.
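As a rough illustration of the second feature type, the difference in amino acid composition between a host protein and a pathogen protein could be computed as in the sketch below. The function names and the per-residue fraction encoding are assumptions made for this example; the exact definition of the AACG feature is not given here, so it is not reproduced.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature vectors line up.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq):
    """Return the fraction of each standard amino acid in seq (20 values).

    Hypothetical helper: non-standard symbols (e.g. X) are ignored.
    """
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def composition_difference(host_seq, pathogen_seq):
    """Return host composition minus pathogen composition, per residue.

    Assumed encoding of the 'difference in amino acid composition'
    feature; the study may define it differently.
    """
    host = aa_composition(host_seq)
    pathogen = aa_composition(pathogen_seq)
    return [h - p for h, p in zip(host, pathogen)]
```

A vector of this kind, one per host-pathogen protein pair, is the sort of input an SVM classifier would take.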
Result:
The SVM model
achieved high performance in predicting bacteria-human PPIs. The model showed an accuracy of 96%,
sensitivity of 94%, and specificity of 98% in predicting PPIs between humans and Yersinia pestis, in
which there is a strong relation between amino acid composition and sequence length. The SVM model
was also tested in predicting PPIs between humans and viruses, including Ebola, HCV, and SARS-CoV-2, and showed good performance.
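For readers less familiar with the three reported measures, the sketch below shows how accuracy, sensitivity, and specificity follow from the counts of a binary confusion matrix. The function name and the counts in the usage lines are illustrative assumptions, not data from the study.

```python
def binary_metrics(tp, fn, tn, fp):
    """Compute accuracy, sensitivity, and specificity from confusion counts.

    tp/fn: interacting pairs predicted correctly / missed.
    tn/fp: non-interacting pairs predicted correctly / wrongly flagged.
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Hypothetical counts chosen only for illustration (not the study's data):
# 100 interacting and 100 non-interacting pairs.
metrics = binary_metrics(tp=94, fn=6, tn=98, fp=2)
```

With equal numbers of positive and negative pairs, as in these made-up counts, accuracy is the mean of sensitivity and specificity.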
Conclusion:
The feature types identified in our study are simple yet powerful in predicting pathogen-human PPIs. Although preliminary, our method will be useful
for finding unknown target host proteins or pathogen proteins and designing in vitro or in vivo experiments.