CSBPI_Site: Multi-Information Sources of Features to RNA Binding Sites Prediction
Background: RNA-binding proteins establish posttranscriptional gene regulation by coordinating the maturation, editing, transport, stability, and translation of cellular RNAs. Immunoprecipitation experiments can identify interactions between RNA and proteins, but they are limited by the experimental environment and materials. Therefore, it is essential to construct computational models to identify the functional sites. Objective: Although some computational methods have been proposed to predict RNA binding sites, their accuracy could be further improved. Moreover, a dataset with more samples is needed to design a reliable model. Here we present a computational model based on multi-information sources to identify RNA binding sites. Method: We construct an accurate computational model named CSBPI_Site, based on extreme gradient boosting. The specifically designed 15-dimensional feature vector captures four types of information (chemical shift, chemical bond, chemical properties, and position information). Results: A satisfactory accuracy of 0.86 and an AUC of 0.89 were obtained by leave-one-out cross-validation. Meanwhile, the accuracies differed only slightly (ranging from 0.83 to 0.85) among three classifier algorithms, which shows that the novel features are stable and suit multiple classifiers. These results show that the proposed method is effective and robust for identifying noncoding RNA binding sites. Conclusion: Our method based on multi-information sources effectively represents the binding site information of ncRNAs. The satisfactory prediction results for the Diels-Alder ribozyme obtained with CSBPI_Site indicate that our model is valuable for identifying functional sites.
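The evaluation protocol named in the Results (leave-one-out cross-validation scored by accuracy and AUC) can be illustrated with a minimal sketch. This is not the CSBPI_Site implementation. To keep the example dependency-free, a simple nearest-centroid scorer is swapped in for extreme gradient boosting, and the 15-dimensional vectors are synthetic random numbers rather than the chemical shift, chemical bond, chemical property, and position features. All function names here are hypothetical.

```python
import random


def centroid_score(train_X, train_y, x):
    """Score x with a nearest-centroid rule (a stand-in for the real classifier).

    A positive score means x lies closer to the class-1 centroid.
    """
    dims = len(x)
    dist = {}
    for label in (0, 1):
        rows = [r for r, t in zip(train_X, train_y) if t == label]
        centroid = [sum(r[d] for r in rows) / len(rows) for d in range(dims)]
        dist[label] = sum((a - b) ** 2 for a, b in zip(x, centroid))
    return dist[0] - dist[1]


def leave_one_out_scores(X, y):
    """Hold out each sample once, fit on the rest, and score the held-out sample."""
    scores = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        scores.append(centroid_score(train_X, train_y, X[i]))
    return scores


def accuracy(scores, y, threshold=0.0):
    """Fraction of samples whose thresholded score matches the true label."""
    preds = [1 if s > threshold else 0 for s in scores]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


def auc(scores, y):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def make_toy_data(n_per_class=20, dims=15, seed=0):
    """Synthetic 15-dimensional samples; purely illustrative, not real site features."""
    rng = random.Random(seed)
    X, y = [], []
    for label, mean in ((0, 0.0), (1, 1.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(mean, 1.0) for _ in range(dims)])
            y.append(label)
    return X, y


X, y = make_toy_data()
scores = leave_one_out_scores(X, y)
print("LOO accuracy:", accuracy(scores, y))
print("LOO AUC:", auc(scores, y))
```

Because every sample is scored by a model that never saw it during fitting, the accuracy and AUC computed from the pooled held-out scores estimate performance on unseen sites. The same loop would apply with a gradient boosting classifier in place of `centroid_score`.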