Enhancement of conformational B-cell epitope prediction using CluSMOTE

PeerJ Computer Science ◽

10.7717/peerj-cs.275 ◽

2020 ◽

Vol 6 ◽

pp. e275

Author(s):

Binti Solihah ◽

Azhari Azhari ◽

Aina Musdholifah

Keyword(s):

Support Vector Machine ◽

Decision Tree ◽

B Cell ◽

Prediction Models ◽

Cell Epitope ◽

Epitope Prediction ◽

Conformational Epitope ◽

Support Vector ◽

B Cell Epitope ◽

General Protein

Background: A conformational B-cell epitope is one of the main components of vaccine design. It consists of segments that are separate in the sequence but spatially close in the antigen chain. The availability of Ag-Ab complex data in the Protein Data Bank allows for the development of predictive methods. Several epitope prediction models have been developed, including learning-based methods, but their performance is still not optimal. The main problem in learning-based prediction models is class imbalance. Methods: This study proposes CluSMOTE, a combination of a cluster-based undersampling method and the Synthetic Minority Oversampling Technique (SMOTE). The approach generates additional samples so that the conformational epitope dataset is balanced. The Hierarchical DBSCAN algorithm is used to identify clusters in the majority class. A random subset of the data is drawn from each cluster, in a quantity that takes the oversampling degree into account, and combined with the minority class data. The balanced data are used as the training dataset for a conformational epitope prediction model. Two binary classification methods, Support Vector Machine and Decision Tree, are used separately to build the prediction model and to evaluate the performance of CluSMOTE in predicting conformational B-cell epitopes. The experiment focuses on determining the best parameters for CluSMOTE. Two independent datasets are used to compare the proposed prediction model with state-of-the-art methods. The first and second datasets represent general protein antigens and glycoprotein antigens, respectively. Results: The experimental results show that CluSMOTE with Decision Tree outperformed CluSMOTE with Support Vector Machine in terms of AUC and G-mean. The mean AUC of CluSMOTE Decision Tree on the Kringelum and SEPPA 3 test sets is 0.83 and 0.766, respectively. This shows that CluSMOTE Decision Tree is better than other methods on the general protein antigens and comparable with SEPPA 3 on the glycoprotein antigens.
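
The sampling idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`cluster_undersample`, `smote`, `clusmote_like`), the proportional per-cluster quota, and the parameters `oversample_factor` and `k` are all assumptions made for illustration. Cluster labels are taken as an input here, on the assumption that a Hierarchical DBSCAN implementation has already produced them.

```python
import random
from collections import defaultdict


def cluster_undersample(majority, labels, n_keep, rng):
    """Randomly keep about n_keep majority samples, spread across clusters."""
    clusters = defaultdict(list)
    for sample, label in zip(majority, labels):
        clusters[label].append(sample)  # a noise label is treated as one more cluster
    kept = []
    for members in clusters.values():
        # Assumed rule: each cluster contributes in proportion to its size.
        quota = max(1, round(n_keep * len(members) / len(majority)))
        kept.extend(rng.sample(members, min(quota, len(members))))
    return kept


def smote(minority, n_new, k, rng):
    """Create n_new synthetic points by interpolating towards a near neighbour."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [m for m in minority if m is not base]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


def clusmote_like(majority, labels, minority, oversample_factor, k=3, seed=0):
    """Oversample the minority class, then undersample the majority to match it."""
    rng = random.Random(seed)
    minority = list(minority)
    n_new = int(len(minority) * (oversample_factor - 1))
    new_minority = minority + smote(minority, n_new, k, rng)
    kept_majority = cluster_undersample(majority, labels, len(new_minority), rng)
    return kept_majority, new_minority
```

Because the per-cluster quotas are rounded, the retained majority set only approximately matches the size of the oversampled minority set; a production version would need a rule for distributing the remainder.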

Data curation to improve the pattern recognition performance of B-cell epitope prediction by support vector machine

Pure and Applied Chemistry ◽

10.1515/pac-2020-1107 ◽

2021 ◽

Vol 0 (0) ◽

Author(s):

Li Cen Lim ◽

Yee Ying Lim ◽

Yee Siew Choong

Keyword(s):

Pattern Recognition ◽

Support Vector Machine ◽

B Cell ◽

Recognition Performance ◽

Cell Epitope ◽

Epitope Prediction ◽

Support Vector ◽

B Cell Epitopes ◽

Prediction Module ◽

B Cell Epitope

Abstract: B-cell epitopes are recognized and bound by receptors on the surface of B lymphocytes, triggering an immune response; they are therefore vital elements in epitope-based vaccine design, antibody production, and therapeutic development. However, experimental approaches to epitope mapping are time consuming and costly. Computational prediction could offer an unbiased preliminary selection that reduces the number of epitopes requiring experimental validation. The B-cell epitopes deposited in databases are experimentally determined positive or negative peptides, and some are ambiguous because different experimental methods gave different results. The available dataset therefore needs to be handled with care before a B-cell epitope prediction module is developed. In this work, we first pre-processed the B-cell epitope dataset and then predicted B-cell epitopes by pattern recognition using a support vector machine (SVM). Using only the absolute epitopes and non-epitopes, the datasets were classified into five pathogen categories and processed as 6-mer peptide sequences. Pre-processing the datasets improved B-cell epitope prediction performance to as much as 99.1% accuracy and gave a significant improvement in cross-validation results. The approach could be useful when combined with physicochemical propensity ranking in the future development of a B-cell epitope prediction module.
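
As an illustration of the two pre-processing steps mentioned in this abstract, the sketch below drops peptides with conflicting labels and cuts sequences into 6-mers. It reflects one reading of "absolute epitopes and non-epitopes" (peptides whose recorded assays all agree); the function names, the record layout, and the example peptides are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def drop_ambiguous(records):
    """Keep only peptides whose recorded labels all agree.

    records: iterable of (peptide, label) pairs, one per experimental assay.
    Returns a dict mapping each unambiguous peptide to its single label.
    """
    seen = defaultdict(set)
    for peptide, label in records:
        seen[peptide].add(label)
    return {p: next(iter(ls)) for p, ls in seen.items() if len(ls) == 1}


def kmers(sequence, k=6):
    """Return every overlapping window of length k (empty if the sequence is shorter)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# Hypothetical assay records: the second peptide has conflicting results.
records = [
    ("ACDEFGHI", "epitope"),
    ("KLMNPQRS", "epitope"),
    ("KLMNPQRS", "non-epitope"),
    ("TVWYACDE", "non-epitope"),
]
clean = drop_ambiguous(records)
windows = {peptide: kmers(peptide) for peptide in clean}
```

Each 6-mer would then be encoded as a feature vector before being passed to an SVM; the encoding is not specified in the abstract and is left out here.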

Poster: Linear B-cell epitope prediction based on Support Vector Machine and propensity scales

2011 IEEE 1st International Conference on Computational Advances in Bio and Medical Sciences (ICCABS) ◽

10.1109/iccabs.2011.5729918 ◽

2011 ◽

Author(s):

Hsin-Wei Wang ◽

Ya-Chi Lin ◽

Tun-Wen Pai ◽

Hao-Teng Chang

Keyword(s):

Support Vector Machine ◽

B Cell ◽

Cell Epitope ◽

Epitope Prediction ◽

Support Vector ◽

B Cell Epitope ◽

Linear B

The Empirical Comparison of Machine Learning Algorithm for the Class Imbalanced Problem in Conformational Epitope Prediction

JUITA Jurnal Informatika ◽

10.30595/juita.v9i1.9969 ◽

2021 ◽

Vol 9 (1) ◽

pp. 131

Author(s):

Binti Solihah ◽

Azhari Azhari ◽

Aina Musdholifah

Keyword(s):

Decision Tree ◽

B Cell ◽

Sampling Method ◽

Prediction Models ◽

Learning Algorithm ◽

Class Imbalance ◽

Epitope Prediction ◽

Conformational Epitope ◽

Ensemble Model ◽

Under Sampling

A conformational epitope is a part of a protein-based vaccine. It is challenging to identify experimentally, so computational models are developed to support identification. However, class imbalance is one of the constraints on achieving optimal performance in conformational B-cell epitope prediction. In this paper, we compare several conformational B-cell epitope prediction models built with non-ensemble and ensemble approaches. To build a non-ensemble model, a sampling method (random undersampling, SMOTE, or cluster-based undersampling) is combined with a decision tree or an SVM. A random forest model and several variants of the bagging method are used to construct the ensemble models. The models are validated with 10-fold cross-validation. The experimental results show that the combination of cluster-based undersampling and a decision tree outperformed the other sampling methods combined with either non-ensemble or ensemble classifiers. This study provides a baseline for improving existing models that deal with class imbalance in conformational epitope prediction.
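
A minimal sketch of how the 10 folds might be formed on an imbalanced dataset is given below. Stratifying by class, so that every fold keeps roughly the original class ratio, is an assumption here; the abstract only states that 10-fold cross-validation was used. The function name `stratified_folds` is hypothetical.

```python
import random


def stratified_folds(labels, n_folds=10, seed=0):
    """Split sample indices into n_folds folds that preserve the class ratio."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class out round-robin so every fold gets its share.
        for pos, idx in enumerate(indices):
            folds[pos % n_folds].append(idx)
    return folds


# Hypothetical imbalanced labels: 20 epitope residues, 80 non-epitope residues.
labels = [1] * 20 + [0] * 80
folds = stratified_folds(labels)
for test_fold in folds:
    train = [i for fold in folds if fold is not test_fold for i in fold]
    # Resample `train` only, fit the classifier, then score on `test_fold`.
```

Applying the sampling method to the training indices only, as the comment in the loop suggests, is a common precaution against leaking resampled data into the test fold; whether the study did this is not stated in the abstract.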

PREDICTION OF B-CELL DISCONTINUOUS EPITOPES ON MATRIX PROTEIN OF H5N1 VIRUS

Science and Technology Development Journal ◽

10.32508/stdj.v12i9.2283 ◽

2009 ◽

Vol 12 (9) ◽

pp. 31-37

Author(s):

Vinh Ngoc Tran ◽

Quy Cam Vo ◽

Thuoc Linh Tran

Keyword(s):

B Cell ◽

H5N1 Virus ◽

Matrix Protein ◽

Cell Epitope ◽

Epitope Prediction ◽

Conformational Epitope ◽

Viral Antigens ◽

B Cell Epitope ◽

Electrically Charged ◽

Discontinuous Epitopes

Although discontinuous epitopes make up 90% of all B-cell epitopes, most current B-cell epitope prediction methods focus on continuous epitopes because methods for predicting discontinuous ones are difficult to develop. To support vaccine development against the H5N1 virus, we have been studying the in silico prediction of continuous T- and B-cell epitopes as well as discontinuous B-cell epitopes on H5N1 viral antigens. In this study, we used homology modeling to generate structures of the matrix protein of the H5N1 virus and predicted its discontinuous B-cell epitopes. Of the 72 predicted residues, 60 agreed with those reported by the CEP (Conformational Epitope Prediction) method. All predicted amino acid residues were hydrophilic, polar, electrically charged, and located on the surface of the antigen structures.
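
The agreement figure quoted in this abstract, 60 of 72 residues or about 83%, can be reproduced with a simple set overlap. The sketch below assumes that agreement means the same residue position appears in both predictions, which the abstract does not spell out; the residue numbers are placeholders, not the positions reported in the paper.

```python
def agreement(predicted, reference):
    """Fraction of predicted residues that also appear in the reference set."""
    predicted, reference = set(predicted), set(reference)
    return len(predicted & reference) / len(predicted)


# Placeholder residue positions chosen so that 60 of 72 overlap.
predicted_residues = range(1, 73)
cep_residues = range(13, 100)
share = agreement(predicted_residues, cep_residues)
```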
