Identifying protein subcellular location with embedding features learned from networks

2020 ◽  
Vol 17 ◽  
Author(s):  
Hongwei Liu ◽  
Bin Hu ◽  
Lei Chen ◽  
Lin Lu

Background: Identification of protein subcellular location is an important problem because the subcellular location is closely related to protein function. Determining the locations with biological experiments is fundamental; however, these experiments are costly and time-consuming. An alternative way to address this problem is to design effective computational methods. Objective: To date, several computational methods have been proposed in this regard. However, these methods mainly adopted features derived from the proteins themselves. On the other hand, with the development of network techniques, several embedding algorithms have been proposed that encode the nodes of a network into feature vectors. Such algorithms bridge networks and traditional classification algorithms, and thus provide a new way to construct models for predicting protein subcellular location. Method: In this study, we analyzed features produced by three network embedding algorithms (DeepWalk, Node2vec and Mashup) applied to one or multiple protein networks. The obtained features were fed to one machine learning algorithm (support vector machine or random forest) to construct the model. All constructed models were evaluated with cross-validation. Results: The cross-validation results showed that the embedding features yielded by Mashup on multiple networks were quite informative for predicting protein subcellular location. The model based on these features was superior to some classic models. Conclusion: Embedding features yielded by a proper and powerful network embedding algorithm were effective for building models to predict protein subcellular location, providing new pipelines for building more efficient models.
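DeepWalk turns a network into a corpus of "sentences" by running truncated random walks from every node. It then trains a Skip-gram model on those walks to obtain node vectors. The sketch below covers only the walk-generation step, using uniform neighbour sampling. It is not the authors' pipeline. The toy interaction network and the protein IDs P1 to P4 are hypothetical, and the Skip-gram training and the downstream SVM or random forest are omitted. Node2vec differs mainly in replacing the uniform choice with transitions biased by its return and in-out parameters (p, q).

```python
import random
from typing import Dict, List


def random_walks(adj: Dict[str, List[str]], walks_per_node: int,
                 walk_length: int, seed: int = 0) -> List[List[str]]:
    """Generate DeepWalk-style truncated, uniform random walks.

    Each walk starts at a node and repeatedly moves to a uniformly chosen
    neighbour; it stops early if the current node has no neighbours.
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks: List[List[str]] = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit start nodes in a fresh random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adj.get(walk[-1], [])
                if not neighbours:
                    break
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


# Hypothetical toy protein-protein interaction network (undirected adjacency lists).
ppi = {"P1": ["P2", "P3"], "P2": ["P1"], "P3": ["P1", "P4"], "P4": ["P3"]}
corpus = random_walks(ppi, walks_per_node=2, walk_length=5)
```

The resulting `corpus` would then be passed to a word2vec-style Skip-gram trainer. The learned vector for each protein becomes its feature vector for the classifier.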

2015 ◽  
Vol 9 (1) ◽  
pp. 107-114
Author(s):  
Zhou Shengquan ◽  
Zhao Xiaolong ◽  
Yao Zhaoming

In order to forecast the displacement of deep foundation pit support, this document proposes a new method that combines the cross-validation method and the support vector machine (SVM) based on random small samples. Because random small samples of monitoring data are difficult to fit and forecast, the cross-validation method and different kernel functions of the SVM algorithm are used repeatedly to establish and optimize a displacement prediction model for the underground continuous wall. Validation samples are then used to test the accuracy of the models. The results show that this method meets the precision requirements relatively well, and that the Cauchy kernel function performs better than the others. In terms of model fitting and prediction accuracy, this method has great advantages and can be applied to practical engineering.
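The abstract compares SVM kernels through cross-validation and singles out the Cauchy kernel. The sketch below shows two ingredients under stated assumptions. The first is the commonly used form of the Cauchy kernel, k(x, y) = 1 / (1 + ||x - y||^2 / sigma^2). The second is a plain k-fold split of sample indices. The bandwidth `sigma`, the contiguous (unshuffled) folds and the helper names are illustrative choices, not the authors' settings.

```python
from typing import List, Sequence, Tuple


def cauchy_kernel(x: Sequence[float], y: Sequence[float], sigma: float = 1.0) -> float:
    """Cauchy kernel: k(x, y) = 1 / (1 + ||x - y||^2 / sigma^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1.0 / (1.0 + sq_dist / sigma ** 2)


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds.

    Returns one (train_indices, validation_indices) pair per fold; the first
    n % k folds receive one extra sample so that every index is used once.
    """
    base, extra = divmod(n, k)
    folds = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        folds.append((train, val))
        start += size
    return folds
```

To compare kernels, one could fit an SVM on each training split with a Gram matrix built from `cauchy_kernel` and score it on the matching validation split. scikit-learn's `SVC(kernel="precomputed")` is one option for that. The kernel with the best average validation error would then be kept.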


2015 ◽  
Vol 9 (1) ◽  
pp. 53-60
Author(s):  
Zhou Shengquan ◽  
Zhao Xiaolong ◽  
Yao Zhaoming

In order to forecast the displacement of deep foundation pit support, this document proposes a new method that combines the cross-validation method and the support vector machine (SVM) based on random small samples. Because random small samples of monitoring data are difficult to fit and forecast, the cross-validation method and different kernel functions of the SVM algorithm are used repeatedly to establish and optimize a displacement prediction model for the underground continuous wall. Validation samples are then used to test the accuracy of the models. The results show that this method meets the precision requirements relatively well, and that the Cauchy kernel function performs better than the others. In terms of model fitting and prediction accuracy, this method has great advantages and can be applied to practical engineering.


2020 ◽  
Vol 2020 ◽  
pp. 1-8 ◽  
Author(s):  
Feng-Min Li ◽  
Xiao-Wei Gao

There are many bacteria in the environment, and Gram-positive bacteria are the most common. Some Gram-positive bacteria are very harmful to the human body, so predicting the subcellular location of Gram-positive bacterial proteins is significant. Such identification is also important for developing effective drugs. In this paper, a new Gram-positive bacterial protein subcellular location dataset was established. The following were selected as characteristic parameters and then combined:

- the amino acid composition,
- the gene ontology annotation information,
- the hydropathy dipeptide composition,
- the amino acid dipeptide composition,
- the autocovariance average chemical shift information.

The locations of Gram-positive bacterial proteins were predicted by the Support Vector Machine (SVM) algorithm. The overall accuracy (OA) reached 86.1% under the Jackknife test, which is higher than that of existing methods. This improved method may be helpful for protein function prediction.
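Two of the listed sequence features are straightforward to compute from the sequence alone: amino acid composition (20 values) and amino acid dipeptide composition (400 values). The sketch below uses the common frequency-normalized definitions. Whether the paper normalizes them the same way, and how it concatenates them with the other features, are assumptions. The gene ontology, hydropathy and chemical-shift features are not shown.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def amino_acid_composition(seq: str) -> List[float]:
    """Fraction of each standard residue in seq (20 values, summing to 1)."""
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(aa) / n if n else 0.0 for aa in AMINO_ACIDS]


def dipeptide_composition(seq: str) -> List[float]:
    """Fraction of each adjacent residue pair in seq (400 values, summing to 1)."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    pairs = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for p in pairs:
        counts[p] += 1
    total = len(pairs)
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]
```

Under the Jackknife (leave-one-out) test, each protein is held out once and predicted by a model trained on all the others. The overall accuracy is the fraction of held-out proteins that are predicted correctly.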


2020 ◽  
Vol 15 (6) ◽  
pp. 517-527
Author(s):  
Yunyun Liang ◽  
Shengli Zhang

Background: Apoptosis proteins have a key role in the development and homeostasis of the organism and are very important for understanding the mechanisms of cell proliferation and death. The function of an apoptosis protein is closely related to its subcellular location. Objective: Prediction of apoptosis protein subcellular localization is a meaningful task. Methods: In this study, we predict apoptosis protein subcellular location using the PSSM-based second-order moving average descriptor, nonnegative matrix factorization based on Kullback-Leibler divergence, and over-sampling algorithms. The model, named SOMAPKLNMF-OS, is constructed on the ZD98, ZW225 and CL317 benchmark datasets. The support vector machine is adopted as the classifier, and the bias-free jackknife test is used to evaluate accuracy. Results: Our prediction system achieves favorable and promising overall accuracy on the three datasets and outperforms the other listed models. Conclusion: The results show that our model offers a high-throughput tool for identifying apoptosis protein subcellular localization.
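Nonnegative matrix factorization based on Kullback-Leibler divergence is usually solved with the Lee-Seung multiplicative updates. These updates never increase the generalized KL divergence D(V || WH). The pure-Python sketch below shows those standard updates on a small matrix. How SOMAPKLNMF-OS builds its input matrix from the PSSM descriptor, and which rank and stopping rule it uses, are not stated in the abstract. The matrices, `rank`, `iters` and `eps` here are illustrative assumptions.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def kl_divergence(v: Matrix, wh: Matrix) -> float:
    """Generalized KL divergence D(V || WH) = sum(V log(V/WH) - V + WH)."""
    total = 0.0
    for v_row, wh_row in zip(v, wh):
        for x, y in zip(v_row, wh_row):
            total += (x * math.log(x / y) if x > 0 else 0.0) - x + y
    return total


def nmf_kl(v: Matrix, rank: int, iters: int = 200, seed: int = 0,
           eps: float = 1e-12) -> Tuple[Matrix, Matrix]:
    """Factor a nonnegative n x m matrix V ~= W H with Lee-Seung KL updates."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        wh = matmul(w, h)
        # H[a][j] <- H[a][j] * sum_i W[i][a] V[i][j] / WH[i][j] / sum_i W[i][a]
        for a in range(rank):
            w_col_sum = sum(w[i][a] for i in range(n))
            for j in range(m):
                num = sum(w[i][a] * v[i][j] / (wh[i][j] + eps) for i in range(n))
                h[a][j] *= num / (w_col_sum + eps)
        wh = matmul(w, h)
        # W[i][a] <- W[i][a] * sum_j H[a][j] V[i][j] / WH[i][j] / sum_j H[a][j]
        for a in range(rank):
            h_row_sum = sum(h[a])
            for i in range(n):
                num = sum(h[a][j] * v[i][j] / (wh[i][j] + eps) for j in range(m))
                w[i][a] *= num / (h_row_sum + eps)
    return w, h
```

In a pipeline like the one described, the rows of W (or a projection onto H) would serve as reduced features for the SVM. This wiring is likewise an assumption.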


2019 ◽  
Vol 20 (9) ◽  
pp. 2344
Author(s):  
Yang Yang ◽  
Huiwen Zheng ◽  
Chunhua Wang ◽  
Wanyue Xiao ◽  
Taigang Liu

To reveal the working pattern of programmed cell death, knowledge of the subcellular location of apoptosis proteins is essential. Besides the costly and time-consuming method of experimental determination, research into computational locating schemes, focusing mainly on the innovation of representation techniques on protein sequences and the selection of classification algorithms, has become popular in recent decades. In this study, a novel tri-gram encoding model is proposed, which is based on using the protein overlapping property matrix (POPM) for predicting apoptosis protein subcellular location. Next, a 1000-dimensional feature vector is built to represent a protein. Finally, with the help of support vector machine-recursive feature elimination (SVM-RFE), we select the optimal features and put them into a support vector machine (SVM) classifier for predictions. The results of jackknife tests on two benchmark datasets demonstrate that our proposed method can achieve satisfactory prediction performance level with less computing capacity required and could work as a promising tool to predict the subcellular locations of apoptosis proteins.
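A 1000-dimensional tri-gram vector is consistent with counting ordered triples over 10 property classes, since 10^3 = 1000. The sketch below shows that counting scheme with overlapping classes. Because one residue may carry several properties, every combination of the three residues' property sets is counted in each window. The residue-to-property mapping is HYPOTHETICAL and is not the published POPM. Raw counts are used, and the paper's normalization, if any, is an assumption left out.

```python
from itertools import product
from typing import Dict, List, Set

NUM_PROPERTIES = 10  # 10 ** 3 = 1000 tri-gram dimensions

# HYPOTHETICAL residue-to-property mapping for illustration only (not the real POPM).
# A residue may belong to several property classes, i.e. the classes overlap.
TOY_PROPERTIES: Dict[str, Set[int]] = {
    "A": {0, 1},
    "C": {2},
    "G": {0},
    "K": {3, 4},
}


def trigram_features(seq: str,
                     props: Dict[str, Set[int]] = TOY_PROPERTIES) -> List[int]:
    """Count property tri-grams over every window of three consecutive residues.

    Residues missing from `props` contribute no property, so windows containing
    them add nothing.
    """
    counts = [0] * NUM_PROPERTIES ** 3
    for i in range(len(seq) - 2):
        sets = [props.get(aa, set()) for aa in seq[i:i + 3]]
        for p1, p2, p3 in product(*sets):
            counts[p1 * NUM_PROPERTIES ** 2 + p2 * NUM_PROPERTIES + p3] += 1
    return counts
```

For example, the window "ACG" maps to the property sets {0, 1}, {2} and {0}. It therefore increments the tri-grams (0, 2, 0) and (1, 2, 0), at indices 20 and 120.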


2019 ◽  
Vol 16 (5) ◽  
pp. 383-391 ◽  
Author(s):  
Hao Cui ◽  
Lei Chen

Background: Identification of the Enzyme Commission (EC) numbers of enzymes is quite important for understanding the metabolic processes that produce enough energy to sustain life. Previous studies mainly focused on predicting the six main functional classes or the sub-functional classes, i.e., the first two digits of the EC number. Objective: In this study, a binary classifier was proposed to identify the full EC number (four digits) of enzymes. Methods: Enzymes and their known EC numbers were paired as positive samples, and an equal number of negative samples was randomly generated. The associations between any two samples were evaluated by integrating the linkages between enzymes and EC numbers. The classic machine learning algorithm, Support Vector Machine (SVM), was adopted as the prediction engine. Results: The five-fold cross-validation test on five datasets indicated that the overall accuracy, Matthews correlation coefficient and F1-measure were about 0.786, 0.576 and 0.771, respectively, suggesting the utility of the proposed classifier. In addition, the effectiveness of the classifier was demonstrated by comparing it with classifiers based on other classic machine learning algorithms. Conclusion: The proposed classifier was quite effective for predicting the EC numbers of enzymes. It was specially designed for the problem addressed in this study and was tested on five datasets containing randomly generated samples.
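The three reported measures (overall accuracy, Matthews correlation coefficient and F1-measure) all derive from a binary confusion matrix. The sketch below uses the standard definitions. The counts in the example are hypothetical and do not come from the study.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, Matthews correlation coefficient and F1 from confusion counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "mcc": mcc, "f1": f1}


# Hypothetical counts for a balanced test fold (as many negatives as positives).
example = binary_metrics(tp=40, fp=10, tn=40, fn=10)
```

With these counts, accuracy, precision, recall and F1 are all 0.8. MCC is (1600 - 100) / 2500 = 0.6. The example shows that MCC is stricter than accuracy on the same predictions, which matches the gap between the reported values.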