AllerCatPro – Prediction of protein allergenicity potential from the protein sequence

Background: N-Glycosylation is one of the most important post-translational mechanisms in eukaryotes. N-glycosylation predominantly occurs in N-X-[S/T] sequon where X is any amino acid other than proline. However, not all N-X-[S/T] sequons in proteins are glycosylated. Therefore, accurate prediction of N-glycosylation sites is essential to understand Nglycosylation mechanism. Objective: In this article, our motivation is to develop a computational method to predict Nglycosylation sites in eukaryotic protein sequences. Methods: In this article, we report a random forest method, Nglyc, to predict N-glycosylation site from protein sequence, using 315 sequence features. The method was trained using a dataset of 600 N-glycosylation sites and 600 non-glycosylation sites and tested on the dataset containing 295 Nglycosylation sites and 253 non-glycosylation sites. Nglyc prediction was compared with NetNGlyc, EnsembleGly and GPP methods. Further, the performance of Nglyc was evaluated using human and mouse N-glycosylation sites. Results: Nglyc method achieved an overall training accuracy of 0.8033 with all 315 features. Performance comparison with NetNGlyc, EnsembleGly and GPP methods shows that Nglyc performs better than the other methods with high sensitivity and specificity rate. Conclusion: Our method achieved an overall accuracy of 0.8248 with 0.8305 sensitivity and 0.8182 specificity. Comparison study shows that our method performs better than the other methods. Applicability and success of our method was further evaluated using human and mouse N-glycosylation sites. Nglyc method is freely available at https://github.com/bioinformaticsML/ Ngly.

Download Full-text

Prediction of Nitration Sites Based on FCBF Method and Stacking Ensemble Model

Current Proteomics ◽

10.2174/1570164618999210101222637 ◽

2021 ◽

Vol 18 ◽

Author(s):

Min Liu ◽

Lu Zhang ◽

Xinyi Qin ◽

Tao Huang ◽

Ziwei Xu ◽

...

Keyword(s):

Prediction Model ◽

Protein Sequence ◽

Cross Validation ◽

Transplant Rejection ◽

Sampling Technique ◽

Single Type ◽

Post Translational Modification ◽

Tyrosine Residues ◽

Under Sampling ◽

Fusion Features

Background: Nitration is one of the important Post-Translational Modification (PTM) occurring on the tyrosine residues of proteins. The occurrence of protein tyrosine nitration under disease conditions is inevitable and represents a shift from the signal transducing physiological actions of -NO to oxidative and potentially pathogenic pathways. Abnormal protein nitration modification can lead to serious human diseases, including neurodegenerative diseases, acute respiratory distress, organ transplant rejection and lung cancer. Objective: It is necessary and important to identify the nitration sites in protein sequences. Predicting that which tyrosine residues in the protein sequence are nitrated and which are not is of great significance for the study of nitration mechanism and related diseases. Methods: In this study, a prediction model of nitration sites based on the over-under sampling strategy and the FCBF method was proposed by stacking ensemble learning and fusing multiple features. Firstly, the protein sequence sample was encoded by 2701-dimensional fusion features (PseAAC, PSSM, AAIndex, CKSAAP, Disorder). Secondly, the ranked feature set was generated by the FCBF method according to the symmetric uncertainty metric. Thirdly, in the process of model training, use the over- and under- sampling technique was used to tackle the imbalanced dataset. Finally, the Incremental Feature Selection (IFS) method was adopted to extract an optimal classifier based on 10-fold cross-validation. Results and Conclusion: Results show that the model has significant performance advantages in indicators such as MCC, Recall and F1-score, no matter in what way the comparison was conducted with other classifiers on the independent test set, or made by cross-validation with single-type feature or with fusion-features on the training set. By integrating the FCBF feature ranking methods, over- and under- sampling technique and a stacking model composed of multiple base classifiers, an effective prediction model for nitration PTM sites was build, which can achieve a better recall rate when the ratio of positive and negative samples is highly imbalanced.

Download Full-text

Protein Structural Class Prediction Based on Distance-related Statistical Features from Graphical Representation of Predicted Secondary Structure

Letters in Organic Chemistry ◽

10.2174/1570178615666180914110451 ◽

2019 ◽

Vol 16 (4) ◽

pp. 317-324

Author(s):

Liang Kong ◽

Lichao Zhang ◽

Xiaodong Han ◽

Jinfeng Lv

Keyword(s):

Feature Extraction ◽

Secondary Structure ◽

Protein Sequence ◽

Function Analysis ◽

Superior Performance ◽

Support Vector ◽

Chaos Game Representation ◽

Class Prediction ◽

Structural Class ◽

Protein Structural Class

Protein structural class prediction is beneficial to protein structure and function analysis. Exploring good feature representation is a key step for this prediction task. Prior works have demonstrated the effectiveness of the secondary structure based feature extraction methods especially for lowsimilarity protein sequences. However, the prediction accuracies still remain limited. To explore the potential of secondary structure information, a novel feature extraction method based on a generalized chaos game representation of predicted secondary structure is proposed. Each protein sequence is converted into a 20-dimensional distance-related statistical feature vector to characterize the distribution of secondary structure elements and segments. The feature vectors are then fed into a support vector machine classifier to predict the protein structural class. Our experiments on three widely used lowsimilarity benchmark datasets (25PDB, 1189 and 640) show that the proposed method achieves superior performance to the state-of-the-art methods. It is anticipated that our method could be extended to other graphical representations of protein sequence and be helpful in future protein research.

Download Full-text

Biological activities of a synthetic peptide composed of two unlinked domains from a retroviral transmembrane protein sequence.

Journal of Virology ◽

10.1128/jvi.64.4.1429-1436.1990 ◽

1990 ◽

Vol 64 (4) ◽

pp. 1429-1436 ◽

Cited By ~ 9

Author(s):

D E Wegemer ◽

K G Kabat ◽

W S Kloetzer

Keyword(s):

Protein Sequence ◽

Synthetic Peptide ◽

Biological Activities ◽

Transmembrane Protein

Download Full-text