Background:
Bioluminescence is a unique and significant phenomenon in nature. It is important
for the life cycle of some organisms and is valuable in biomedical research, including
gene expression analysis and bioluminescence imaging. In recent years, researchers have
developed a number of methods for predicting bioluminescent proteins (BLPs). Their accuracy
has increased but could be further improved.
Method:
In this paper, we propose a new bioluminescent protein prediction method based on a
voting algorithm. We used four methods of feature extraction based on the amino acid sequence and
extracted 314-dimensional features in total from amino acid composition, physicochemical properties
and k-spacer amino acid pair composition. To establish the optimal prediction model, we sought the
highest Matthews correlation coefficient (MCC) value and then used a voting algorithm to build the
model. To create the best-performing model, we also discuss the selection of base classifiers and
vote-counting rules.
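As an illustration of two of the ideas named above, the following minimal Python sketch computes amino acid composition and k-spacer amino acid pair composition for a sequence, and combines base-classifier outputs by simple majority voting. It is not the implementation used in this paper: the function names are hypothetical, the sketch does not reproduce the 314-dimensional feature set, and the tie-breaking rule (the label seen first wins) is an assumption.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids, in a fixed order that defines the feature layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return the fraction of each standard residue in seq (20 values)."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS)
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]


def k_spacer_pair_composition(seq, k):
    """Return the frequency of residue pairs separated by k residues (400 values).

    k = 0 corresponds to adjacent residues (ordinary dipeptide composition).
    """
    pairs = Counter()
    n_pairs = 0
    for i in range(len(seq) - k - 1):
        a, b = seq[i], seq[i + k + 1]
        # Skip pairs containing non-standard residue symbols.
        if a in AMINO_ACIDS and b in AMINO_ACIDS:
            pairs[a + b] += 1
            n_pairs += 1
    return [
        pairs[a + b] / n_pairs if n_pairs else 0.0
        for a, b in product(AMINO_ACIDS, repeat=2)
    ]


def majority_vote(predictions):
    """Hard voting: return the label predicted by the most base classifiers.

    Assumption: on a tie, the label that appears first in predictions wins.
    """
    return Counter(predictions).most_common(1)[0][0]
```

For example, `majority_vote([1, 0, 1])` combines three hypothetical base-classifier outputs for one protein into a single label. Other vote-counting rules, such as weighted or probability-averaged voting, would replace only this last function.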
Results:
Our proposed model achieved 93.4% accuracy, 93.4% sensitivity and 91.7% specificity on
the test set, outperforming the other methods. We also applied our model-building method to a
previous prediction task for bioluminescent proteins in three lineages, which greatly
improved accuracy.
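For reference, the evaluation measures named in this abstract (accuracy, sensitivity, specificity and MCC) can be computed from a binary confusion matrix as in the minimal sketch below. The function name is hypothetical, the convention of returning 0.0 for an undefined ratio is an assumption, and the counts in the usage note are illustrative only and are not results from this paper.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Compute accuracy, sensitivity, specificity and MCC from confusion counts.

    tp, tn, fp, fn are the numbers of true positives, true negatives,
    false positives and false negatives. Undefined ratios are reported
    as 0.0 (an assumption made for this sketch).
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }
```

Calling `binary_metrics(tp=9, tn=8, fp=2, fn=1)` on these made-up counts gives an accuracy of 0.85, a sensitivity of 0.9 and a specificity of 0.8. Unlike accuracy, MCC uses all four cells of the confusion matrix, which makes it a common selection criterion when classes are imbalanced.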