Prediction of apoptosis protein subcellular location based on position-specific scoring matrix and isometric mapping algorithm

Background: Apoptosis proteins play a key role in the development and homeostasis of an organism and are important for understanding the mechanisms of cell proliferation and death. The function of an apoptosis protein is closely related to its subcellular location. Objective: Predicting the subcellular localization of apoptosis proteins is therefore a meaningful task. Methods: In this study, we predict apoptosis protein subcellular location using a PSSM-based second-order moving average descriptor, nonnegative matrix factorization based on Kullback-Leibler divergence, and over-sampling algorithms. The model, named SOMAPKLNMF-OS, is constructed on the ZD98, ZW225 and CL317 benchmark datasets. A support vector machine is adopted as the classifier, and the bias-free jackknife test is used to evaluate accuracy. Results: Our prediction system achieves favorable overall accuracy on the three datasets and outperforms the other listed models. Conclusion: The results show that our model offers a high-throughput tool for identifying apoptosis protein subcellular localization.
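
As a rough illustration of the factorization step named in the abstract, the sketch below implements nonnegative matrix factorization under the generalized Kullback-Leibler divergence with the standard multiplicative update rules. It is not the authors' code: the function names, the rank, the iteration count and the random input are assumptions, and the input matrix is assumed to be already nonnegative (raw PSSM scores can be negative, so some shifting or scaling, not specified here, would be needed first).

```python
import numpy as np

def kl_divergence(V, WH, eps=1e-12):
    """Generalized KL divergence D(V || WH) between nonnegative matrices."""
    return float(np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH))

def nmf_kl(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Factor V (n x m, nonnegative) into W (n x rank) and H (rank x m)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    # Strictly positive random start so the multiplicative updates can move.
    W = rng.random((n, rank)) + 0.1
    H = rng.random((rank, m)) + 0.1
    for _ in range(n_iter):
        # Update H with W fixed: scale by the W-weighted ratio V / (WH).
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        # Update W with the new H fixed.
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H

# Hypothetical input: 12 proteins described by 20 nonnegative features.
V = np.random.default_rng(1).random((12, 20))
W, H = nmf_kl(V, rank=3)
print(W.shape, H.shape, kl_divergence(V, W @ H))
```

In a pipeline of this kind, the rows of W would serve as the reduced feature vectors passed to the classifier; whether the paper uses W in exactly this way is not stated in the abstract.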

iAPSL-IF: Identification of Apoptosis Protein Subcellular Location Using Integrative Features Captured from Amino Acid Sequences

International Journal of Molecular Sciences ◽

10.3390/ijms19041190 ◽

2018 ◽

Vol 19 (4) ◽

pp. 1190 ◽

Cited By ~ 1

Author(s):

Yadong Tang ◽

Lu Xie ◽

Lanming Chen

Keyword(s):

Amino Acid ◽

Subcellular Location ◽

Amino Acid Sequences ◽

Protein Subcellular Location ◽

Apoptosis Protein

Predicting Apoptosis Protein Subcellular Location with PseAAC by Incorporating Tripeptide Composition

Protein and Peptide Letters ◽

10.2174/092986611797200931 ◽

2011 ◽

Vol 18 (11) ◽

pp. 1086-1092 ◽

Cited By ~ 20

Author(s):

Bo Liao ◽

Jun-Bao Jiang ◽

Qing-Guang Zeng ◽

Wen Zhu

Keyword(s):

Subcellular Location ◽

Protein Subcellular Location ◽

Apoptosis Protein

Predicting Apoptosis Protein Subcellular Locations based on the Protein Overlapping Property Matrix and Tri-Gram Encoding

International Journal of Molecular Sciences ◽

10.3390/ijms20092344 ◽

2019 ◽

Vol 20 (9) ◽

pp. 2344

Author(s):

Yang Yang ◽

Huiwen Zheng ◽

Chunhua Wang ◽

Wanyue Xiao ◽

Taigang Liu

Keyword(s):

Support Vector Machine ◽

Subcellular Location ◽

Recursive Feature Elimination ◽

Support Vector ◽

Svm Classifier ◽

Protein Subcellular Location ◽

Promising Tool ◽

Apoptosis Protein ◽

Benchmark Datasets ◽

Apoptosis Proteins

To reveal the working pattern of programmed cell death, knowledge of the subcellular location of apoptosis proteins is essential. Because experimental determination is costly and time-consuming, research into computational locating schemes, focusing mainly on new representation techniques for protein sequences and on the selection of classification algorithms, has become popular in recent decades. In this study, a novel tri-gram encoding model based on the protein overlapping property matrix (POPM) is proposed for predicting apoptosis protein subcellular location. First, a 1000-dimensional feature vector is built to represent a protein. Then, with the help of support vector machine-recursive feature elimination (SVM-RFE), we select the optimal features and feed them into a support vector machine (SVM) classifier for prediction. The results of jackknife tests on two benchmark datasets demonstrate that the proposed method achieves satisfactory prediction performance with less computing capacity and could serve as a promising tool for predicting the subcellular locations of apoptosis proteins.
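
To make the 1000-dimensional tri-gram idea concrete, here is a simplified sketch that counts tri-grams over a reduced alphabet of 10 residue groups, giving 10 x 10 x 10 = 1000 features. It is not the POPM encoding from the paper: the grouping below is an arbitrary placeholder with no biochemical meaning, the groups do not overlap (unlike the overlapping properties the paper's name implies), and all names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Placeholder grouping: consecutive letter pairs -> group index 0..9.
# A real encoding would group residues by physicochemical properties.
GROUP_OF = {aa: i // 2 for i, aa in enumerate(AMINO_ACIDS)}

def trigram_features(seq):
    """Return a 1000-long vector of normalized tri-gram frequencies."""
    counts = [0] * 1000
    # Residues outside the 20 standard letters are dropped before counting.
    codes = [GROUP_OF[aa] for aa in seq if aa in GROUP_OF]
    for a, b, c in zip(codes, codes[1:], codes[2:]):
        counts[100 * a + 10 * b + c] += 1
    total = sum(counts)
    return [x / total for x in counts] if total else counts

vec = trigram_features("ACDA")
print(len(vec), vec[1], vec[10])
```

A feature-selection step such as SVM-RFE would then rank these 1000 entries and keep only the most informative ones before classification.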

Predictions of Apoptosis Proteins by Integrating Different Features Based on Improving Pseudo-Position-Specific Scoring Matrix

BioMed Research International ◽

10.1155/2020/4071508 ◽

2020 ◽

Vol 2020 ◽

pp. 1-13

Author(s):

Xiaoli Ruan ◽

Dongming Zhou ◽

Rencan Nie ◽

Yanbu Guo

Keyword(s):

Prediction Accuracy ◽

Classification Model ◽

Position Specific Scoring Matrix ◽

Support Vector ◽

Data Imbalance ◽

Apoptosis Protein ◽

Scoring Matrix ◽

The Impact ◽

Apoptosis Proteins

Apoptosis proteins are strongly related to many diseases and play an indispensable role in maintaining the dynamic balance between cell death and division in vivo. Obtaining localization information on apoptosis proteins is necessary for understanding their function. To date, few researchers have addressed the imbalance of apoptosis protein data before classification, although this imbalance tends to cause misclassification. Therefore, in this work, we introduce a method to resolve this problem and enhance prediction accuracy. Firstly, features of the protein sequence are captured from the position-specific scoring matrix by combining the Improving Pseudo-Position-Specific Scoring Matrix (IM-Psepssm) with the Bidirectional Correlation Coefficient (Bid-CC) algorithm. Secondly, different feature fusion and resampling strategies are used to reduce the impact of imbalance in apoptosis protein datasets. Finally, the feature vectors are fed to a Support Vector Machine (SVM) to train the classification model, and prediction accuracy is evaluated by jackknife cross-validation tests. The experimental results indicate that, with the same feature vector, adopting resampling methods markedly improves many key indicators relative to the method without resampling when predicting the localization of apoptosis proteins in the ZD98, ZW225, and CL317 databases. Additionally, we present new user-friendly local software for readers to apply; the code and software can be freely accessed at https://github.com/ruanxiaoli/Im-Psepssm.
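
The resampling idea can be illustrated with random oversampling, the simplest strategy of this family. The abstract does not say which resampling methods were used, so the sketch below is only an assumption-laden example: the function name and the toy data are hypothetical, and minority classes are simply duplicated at random until every class matches the largest one.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate minority-class rows at random until all classes are equal in size."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    picked = []
    for c, n in zip(classes, counts):
        members = np.flatnonzero(y == c)
        # Keep every original row, then draw the shortfall with replacement.
        extra = rng.choice(members, size=target - n, replace=True)
        picked.append(np.concatenate([members, extra]))
    idx = np.concatenate(picked)
    return X[idx], y[idx]

# Toy imbalanced dataset: four samples of class 0, one each of classes 1 and 2.
X = np.arange(12).reshape(6, 2)
y = np.array([0, 0, 0, 0, 1, 2])
Xb, yb = random_oversample(X, y)
print(np.unique(yb, return_counts=True))
```

When combined with a jackknife test, resampling is normally applied to the training portion only, so that copies of the held-out protein cannot leak into training.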

Prediction of apoptosis protein subcellular location using improved hybrid approach and pseudo-amino acid composition

Journal of Theoretical Biology ◽

10.1016/j.jtbi.2007.05.019 ◽

2007 ◽

Vol 248 (2) ◽

pp. 377-381 ◽

Cited By ~ 131

Author(s):

Ying-Li Chen ◽

Qian-Zhong Li

Keyword(s):

Amino Acid ◽

Amino Acid Composition ◽

Acid Composition ◽

Hybrid Approach ◽

Subcellular Location ◽

Pseudo Amino Acid Composition ◽

Protein Subcellular Location ◽

Apoptosis Protein

Predicting subcellular location of protein with evolution information and sequence-based deep learning

BMC Bioinformatics ◽

10.1186/s12859-021-04404-0 ◽

2021 ◽

Vol 22 (S10) ◽

Author(s):

Zhijun Liao ◽

Gaofeng Pan ◽

Chao Sun ◽

Jijun Tang

Keyword(s):

Deep Learning ◽

Short Term Memory ◽

Protein Sequences ◽

Subcellular Location ◽

Protein Subcellular Location ◽

Benchmark Datasets ◽

Memory Network ◽

Scoring Matrix ◽

Protein Subcellular Locations ◽

Protein Subcellular Localization Prediction

Background: Protein subcellular localization prediction plays an important role in biology research. Since traditional methods are laborious and time-consuming, many machine learning-based prediction methods have been proposed. However, most of the proposed methods ignore the evolution information of proteins. To improve prediction accuracy, we present a deep learning-based method to predict protein subcellular locations. Results: Our method uses not only the amino acid sequences of proteins but also their evolution matrices. It combines a bidirectional long short-term memory network that processes the entire protein sequence with a convolutional neural network that extracts features from protein sequences. The position-specific scoring matrix is used as a supplement to the protein sequences. Our method was trained and tested on two benchmark datasets. The experimental results show that our method yields accurate results on the two datasets, with an average precision of 0.7901, a ranking loss of 0.0758 and a coverage of 1.2848. Conclusion: The experimental results show that our method outperforms five currently available methods and is an acceptable alternative for predicting protein subcellular location.
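
Ranking loss and coverage are multi-label ranking metrics, and a small worked example may help in reading the reported numbers. The sketch below uses common textbook definitions; the abstract does not state which variants were used, so the conventions here are assumptions (in particular, coverage is counted 1-based, and ties between a true and a false label count as errors).

```python
import numpy as np

def ranking_loss(y_true, scores):
    """Mean fraction of (true, false) label pairs that are ranked in the wrong order."""
    losses = []
    for t, s in zip(y_true, scores):
        pos, neg = s[t == 1], s[t == 0]
        if len(pos) == 0 or len(neg) == 0:
            continue  # undefined for this sample, so it is skipped
        wrong = (neg[None, :] >= pos[:, None]).sum()
        losses.append(wrong / (len(pos) * len(neg)))
    return float(np.mean(losses))

def coverage(y_true, scores):
    """Mean depth of the ranked label list needed to include every true label."""
    depths = []
    for t, s in zip(y_true, scores):
        if not t.any():
            continue
        lowest_true = s[t == 1].min()
        depths.append((s >= lowest_true).sum())
    return float(np.mean(depths))

# Two proteins, three candidate locations; the second protein has two locations.
y_true = np.array([[1, 0, 0], [0, 1, 1]])
scores = np.array([[0.9, 0.2, 0.1], [0.8, 0.6, 0.3]])
print(ranking_loss(y_true, scores))  # 0.5
print(coverage(y_true, scores))      # 2.0
```

In the example, the first protein is ranked perfectly, while the second has its single false location scored above both true ones, which gives a per-sample ranking loss of 1.0 and a coverage depth of 3.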

Advances in the Prediction of Protein Subcellular Locations with Machine Learning

Current Bioinformatics ◽

10.2174/1574893614666181217145156 ◽

2019 ◽

Vol 14 (5) ◽

pp. 406-421 ◽

Cited By ~ 3

Author(s):

Ting-He Zhang ◽

Shao-Wu Zhang

Keyword(s):

Machine Learning ◽

Feature Fusion ◽

Protein Sequences ◽

Subcellular Location ◽

Automated Analysis ◽

Cellular Level ◽

Machine Learning Algorithms ◽

Feature Representation ◽

Protein Subcellular Location ◽

Protein Subcellular Locations

Background: Revealing the subcellular location of a newly discovered protein can bring insight into its function and guide research at the cellular level. The experimental methods currently used to identify protein subcellular locations are both time-consuming and expensive. Thus, it is highly desirable to develop computational methods for identifying protein subcellular locations efficiently and effectively. In particular, the rapidly increasing number of protein sequences entering the genome databases has called for the development of automated analysis methods. Methods: In this review, we describe recent advances in predicting protein subcellular locations with machine learning from the following aspects: i) protein subcellular location benchmark dataset construction, ii) protein feature representation and feature descriptors, iii) common machine learning algorithms, iv) cross-validation test methods and assessment metrics, v) web servers. Result & Conclusion: Given the large number of protein sequences generated by high-throughput technologies, four future directions for predicting protein subcellular locations with machine learning deserve attention. The first is the selection of novel and effective features (e.g., statistical, physicochemical, evolutionary) from the sequences and structures of proteins. The second is the feature fusion strategy. The third is the design of a powerful predictor, and the fourth is the prediction of proteins with multiple location sites.
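
Among the cross-validation test methods mentioned, the jackknife (leave-one-out) test is the one used throughout the apoptosis protein studies listed on this page. The sketch below shows its structure only: each protein is held out once and predicted from all the others. A 1-nearest-neighbour rule stands in for the classifier purely to keep the example dependency-free; the listed papers use SVMs, and the function name and toy data are hypothetical.

```python
import numpy as np

def jackknife_accuracy(X, y):
    """Leave-one-out accuracy using a 1-nearest-neighbour stand-in classifier."""
    n = len(y)
    correct = 0
    for i in range(n):
        # Train on every sample except the i-th, then predict the i-th.
        train = np.arange(n) != i
        dists = np.linalg.norm(X[train] - X[i], axis=1)
        pred = y[train][np.argmin(dists)]
        correct += int(pred == y[i])
    return correct / n

# Toy data: two well-separated clusters standing in for two locations.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])
print(jackknife_accuracy(X, y))  # 1.0
```

Because every sample is tested exactly once and the split involves no randomness, the jackknife yields a single reproducible accuracy for a given dataset, at the cost of training one model per sample.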

The Conservation of Low Complexity Regions in Bacterial Proteins Depends on the Pathogenicity of the Strain and Subcellular Location of the Protein

Genes ◽

10.3390/genes12030451 ◽

2021 ◽

Vol 12 (3) ◽

pp. 451

Author(s):

Pablo Mier ◽

Miguel A. Andrade-Navarro

Keyword(s):

Membrane Proteins ◽

Outer Membrane ◽

Bacterial Species ◽

Outer Membrane Proteins ◽

Subcellular Location ◽

Low Complexity ◽

Extracellular Proteins ◽

Bacterial Strains ◽

Bacterial Proteins ◽

Protein Subcellular Location

Low complexity regions (LCRs) in proteins are characterized by amino acid frequencies that differ from the average. These regions evolve faster and tend to be less conserved between homologs than globular domains. They are not common in bacteria, as compared to their prevalence in eukaryotes. Studying their conservation could help provide hypotheses about their function. To obtain the appropriate evolutionary focus for this rapidly evolving feature, here we study the conservation of LCRs in bacterial strains and compare their high variability to the closeness of the strains. For this, we selected 20 taxonomically diverse bacterial species and obtained the completely sequenced proteomes of two strains per species. We calculated all orthologous pairs for each of the 20 strain pairs. Per orthologous pair, we computed the conservation of two types of LCRs: compositionally biased regions (CBRs) and homorepeats (polyX). Our results show that, in bacteria, Q-rich CBRs are the most conserved, while A-rich CBRs and polyA are the most variable. LCRs have generally higher conservation when comparing pathogenic strains. However, this result depends on protein subcellular location: LCRs accumulate in extracellular and outer membrane proteins, with conservation increased in the extracellular proteins of pathogens, and decreased for polyX in the outer membrane proteins of pathogens. We conclude that these dependencies support the functional importance of LCRs in host–pathogen interactions.
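
As a small illustration of one of the two LCR types, the sketch below locates homorepeats (polyX) as runs of a single repeated residue. The abstract does not give the detection criteria, so the definition used here (an uninterrupted run of at least `min_len` identical residues, with `min_len` defaulting to 4) is an assumption, as are the function name and the example sequence.

```python
import re

def find_homorepeats(seq, min_len=4):
    """Return (residue, start, length) for each run of >= min_len identical residues."""
    # (.) captures one residue; \1{k,} requires at least k further copies of it.
    pattern = re.compile(r"(.)\1{%d,}" % (min_len - 1))
    return [(m.group(1), m.start(), m.end() - m.start())
            for m in pattern.finditer(seq)]

print(find_homorepeats("MKQQQQQLAAAAV"))
# [('Q', 2, 5), ('A', 8, 4)]
```

Comparing the repeats found at aligned positions in two orthologous proteins would then give a simple per-pair measure of polyX conservation.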
