Background:
An in-depth study of the principles underlying the heat resistance of thermophilic proteins is of great significance for understanding protein folding, structure, function, and evolution, and for the directed design and modification of protein molecules in protein processing.
Objective:
To address the low accuracy and low efficiency of thermophilic protein prediction, a thermophilic protein prediction model based on the Stacking method is proposed.
Method:
Following the idea of Stacking, this paper characterizes the protein sequences of the selected standard data set with five feature extraction methods: amino acid composition, g-gap dipeptide, encoding based on grouped weight, entropy density, and autocorrelation coefficient. A support vector machine (SVM) with a Gaussian kernel function is then used to design the classification prediction model for each feature set. The prediction results of the five methods are taken as the second-layer input, and a logistic regression model integrates them to build a thermophilic protein prediction model based on the Stacking method.
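As an illustration of the first two encodings named above, the following minimal sketch computes amino acid composition and g-gap dipeptide frequencies for a single sequence. The function names, the default gap value, and the decision to drop non-standard residues before counting are assumptions made for this example, not details taken from the paper; the other three encodings and the SVM and logistic regression layers are not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq):
    """Return the 20 relative residue frequencies of a protein sequence."""
    # Assumption: non-standard symbols (e.g. X, B, Z) are simply dropped.
    seq = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    if not seq:
        raise ValueError("sequence contains no standard residues")
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def g_gap_dipeptide(seq, g=0):
    """Return the 400 frequencies of residue pairs separated by g positions.

    g=0 gives ordinary adjacent dipeptides; g=1 skips one residue, and so on.
    """
    # Assumption: non-standard symbols are dropped before pairs are formed.
    seq = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    n_pairs = len(seq) - g - 1
    if n_pairs <= 0:
        raise ValueError("sequence too short for the chosen gap")
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    for i in range(n_pairs):
        counts[seq[i] + seq[i + g + 1]] += 1
    # Same ordering as the dictionary keys: AA, AC, AD, ..., YY.
    return [counts[a + b] / n_pairs for a, b in product(AMINO_ACIDS, repeat=2)]
```

Each function returns a fixed-length vector (20 and 400 values respectively), so sequences of different lengths map to inputs of the same dimension for a first-layer classifier.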
Results:
Verified by the Jackknife method, the proposed method reaches an accuracy of up to 93.75%. Several performance evaluation indexes are higher than those of other models, and the overall performance is better than that of most reported methods.
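For readers unfamiliar with the Jackknife test, it amounts to leave-one-out validation: each sample is held out once and predicted by a model trained on all the others. The sketch below shows that procedure under stated assumptions. The `fit_predict` callback interface and the `nearest_centroid` classifier are hypothetical stand-ins chosen to keep the example self-contained; they are not the Stacking model described in this paper.

```python
def jackknife_accuracy(X, y, fit_predict):
    """Leave-one-out accuracy over a list of feature vectors X and labels y.

    fit_predict(train_X, train_y, x) must train on the given samples and
    return the predicted label for the single held-out vector x.
    """
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]  # every sample except the i-th
        train_y = y[:i] + y[i + 1:]
        correct += fit_predict(train_X, train_y, X[i]) == y[i]
    return correct / len(X)


def nearest_centroid(train_X, train_y, x):
    """Toy stand-in classifier (not the paper's model): predict the class
    whose mean feature vector is closest to x in squared Euclidean distance."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, t in zip(train_X, train_y) if t == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```

Because the model is retrained once per sample, the cost grows with the size of the data set, which is the usual trade-off for the low-variance estimate the Jackknife test provides.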
Conclusion:
The model presented in this paper is robust and can significantly improve the prediction performance for thermophilic proteins.