Data curation to improve the pattern recognition performance of B-cell epitope prediction by support vector machine

2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Li Cen Lim ◽  
Yee Ying Lim ◽  
Yee Siew Choong

Abstract B-cell epitopes are recognized by and bind to receptors on the surface of B-lymphocytes to trigger an immune response, and are thus vital elements in epitope-based vaccine design, antibody production and therapeutic development. However, experimental approaches to mapping epitopes are time-consuming and costly. Computational prediction could offer an unbiased preliminary selection to reduce the number of epitopes for experimental validation. The B-cell epitopes deposited in the databases are peptides experimentally determined to be positive or negative, and some are ambiguous as a result of different experimental methods. Prior to the development of a B-cell epitope prediction module, the available datasets need to be handled with care. In this work, we first pre-processed the B-cell epitope dataset prior to B-cell epitope prediction based on pattern recognition using a support vector machine (SVM). Using only the absolute epitopes and non-epitopes, the datasets were classified into five pathogen categories and processed as 6-mer peptide sequences. The pre-processing of the datasets improved the B-cell epitope prediction performance up to 99.1% accuracy and showed significant improvement in cross-validation results. It could be useful when incorporated with physicochemical propensity ranking in the future for the development of a B-cell epitope prediction module.
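The curation and 6-mer steps described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the function names, the rule for dropping ambiguous peptides (any peptide seen with both labels), and the one-hot encoding are all assumptions, since the abstract does not state how peptides were encoded for the SVM.

```python
from typing import Dict, Iterable, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def drop_ambiguous(records: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Keep only peptides that carry a single, consistent label.

    Assumption: a peptide reported as both epitope (1) and non-epitope (0)
    counts as ambiguous and is discarded.
    """
    seen: Dict[str, set] = {}
    for peptide, label in records:
        seen.setdefault(peptide, set()).add(label)
    return {p: next(iter(s)) for p, s in seen.items() if len(s) == 1}


def kmers(sequence: str, k: int = 6) -> List[str]:
    """Return every overlapping k-mer of the sequence (window step of 1)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def one_hot(peptide: str) -> List[int]:
    """Encode a peptide as a flat 0/1 vector of length 20 * len(peptide).

    Assumption: one-hot encoding is used here only as a simple stand-in
    for whatever features the original study fed to the SVM.
    """
    vector = [0] * (len(AMINO_ACIDS) * len(peptide))
    for position, residue in enumerate(peptide):
        vector[position * len(AMINO_ACIDS) + INDEX[residue]] = 1
    return vector
```

The resulting vectors could then be passed to any SVM implementation; the choice of kernel and parameters is not given in the abstract and is left open here.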

2020 ◽  
Vol 6 ◽  
pp. e275
Author(s):  
Binti Solihah ◽  
Azhari Azhari ◽  
Aina Musdholifah

Background A conformational B-cell epitope is one of the main components of vaccine design. It contains separate segments in its sequence, which are spatially close in the antigen chain. The availability of Ag-Ab complex data in the Protein Data Bank allows for the development of predictive methods. Several epitope prediction models have also been developed, including learning-based methods. However, the performance of these models is still not optimal. The main problem in learning-based prediction models is class imbalance. Methods This study proposes CluSMOTE, which is a combination of a cluster-based undersampling method and the Synthetic Minority Oversampling Technique. The approach is used to generate additional sample data to ensure that the conformational epitope dataset is balanced. The Hierarchical DBSCAN algorithm is used to identify the clusters in the majority class. Some randomly selected data is taken from each cluster, considering the oversampling degree, and combined with the minority class data. The balanced data is used as the training dataset to develop a conformational epitope prediction model. Furthermore, two binary classification methods, Support Vector Machine and Decision Tree, are used separately to develop prediction models and to evaluate the performance of CluSMOTE in predicting conformational B-cell epitopes. The experiment is focused on determining the best parameters for optimal CluSMOTE. Two independent datasets are used to compare the proposed prediction model with state-of-the-art methods. The first and the second datasets represent the general protein and the glycoprotein antigens, respectively. Results The experimental results show that CluSMOTE Decision Tree outperformed the Support Vector Machine in terms of AUC and Gmean as performance measurements. The mean AUCs of CluSMOTE Decision Tree on the Kringelum and the SEPPA 3 test sets are 0.83 and 0.766, respectively. This shows that CluSMOTE Decision Tree is better than other methods on the general protein antigen, and comparable with SEPPA 3 on the glycoprotein antigen.
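The balancing idea described in this abstract, cluster-based undersampling of the majority class combined with SMOTE-style oversampling of the minority class, might be sketched as below. This is an illustrative approximation, not the published CluSMOTE code: cluster labels are assumed to be precomputed (for example by Hierarchical DBSCAN), the proportional per-cluster allocation is an assumption, and all function names are hypothetical. The G-mean helper uses the standard definition, the geometric mean of sensitivity and specificity.

```python
import math
import random
from collections import defaultdict


def undersample_by_cluster(majority, cluster_ids, n_keep, rng):
    """Draw roughly n_keep majority samples, proportionally from each cluster.

    Assumption: cluster_ids were produced beforehand by a clustering step.
    """
    clusters = defaultdict(list)
    for sample, cid in zip(majority, cluster_ids):
        clusters[cid].append(sample)
    kept = []
    for members in clusters.values():
        share = max(1, round(n_keep * len(members) / len(majority)))
        kept.extend(rng.sample(members, min(share, len(members))))
    return kept


def smote(minority, n_new, k, rng):
    """Create n_new synthetic minority points by linear interpolation.

    Each new point lies between a random minority sample and one of its
    k nearest minority neighbours. Requires at least two minority samples.
    """
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (t - b) for b, t in zip(base, target)))
    return synthetic


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity and specificity."""
    return math.sqrt((tp / (tp + fn)) * (tn / (tn + fp)))
```

A balanced training set would then be the kept majority samples plus the original and synthetic minority samples; how the oversampling degree is tuned is the subject of the paper's experiments and is not reproduced here.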


2016 ◽  
Vol 2016 ◽  
pp. 1-11 ◽  
Author(s):  
Lenka Potocnakova ◽  
Mangesh Bhide ◽  
Lucia Borszekova Pulzova

Identification of B-cell epitopes is a fundamental step in the development of epitope-based vaccines, therapeutic antibodies, and diagnostic tools. Epitope-based antibodies are currently the most promising class of biopharmaceuticals. In the last decade, in-depth in silico analysis and categorization of the experimentally identified epitopes stimulated development of algorithms for epitope prediction. Recently, various in silico tools have been employed in attempts to predict B-cell epitopes based on sequence and/or structural data. The main objective of epitope identification is to replace an antigen in the immunization, antibody production, and serodiagnosis. The accurate identification of B-cell epitopes still presents major challenges for immunologists. Advances in B-cell epitope mapping and computational prediction have yielded molecular insights into the process of biorecognition and formation of the antigen-antibody complex, which may help to localize B-cell epitopes more precisely. In this paper, we have comprehensively reviewed state-of-the-art experimental methods for B-cell epitope identification, existing databases for epitopes, and novel in silico resources and prediction tools available online. We have also elaborated on new trends in antibody-based epitope prediction. The aim of this review is to assist researchers in the identification of B-cell epitopes.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Kanokporn Polyiam ◽  
Waranyoo Phoolcharoen ◽  
Namphueng Butkhot ◽  
Chanya Srisaowakarn ◽  
Arunee Thitithanyanont ◽  
...  

Abstract SARS-CoV-2 continues to infect an ever-expanding number of people, resulting in an increase in the number of deaths globally. With the emergence of new variants, there is a corresponding decrease in the currently available vaccine efficacy, highlighting the need for greater insights into the viral epitope profile for both vaccine design and assessment. In this study, three immunodominant linear B cell epitopes in the SARS-CoV-2 spike receptor-binding domain (RBD) were identified by immunoinformatics prediction, and confirmed by ELISA with sera from Macaca fascicularis vaccinated with a SARS-CoV-2 RBD subunit vaccine. Further immunoinformatics analyses of these three epitopes gave rise to a method of linear B cell epitope prediction and selection. B cell epitopes in the spike (S), membrane (M), and envelope (E) proteins were subsequently predicted and confirmed using convalescent sera from COVID-19 infected patients. Immunodominant epitopes were identified in three regions of the S2 domain, one region at the S1/S2 cleavage site and one region at the C-terminus of the M protein. Epitope mapping revealed that most of the amino acid changes found in variants of concern are located within B cell epitopes in the NTD, RBD, and S1/S2 cleavage site. This work provides insights into B cell epitopes of SARS-CoV-2 as well as immunoinformatics methods for B cell epitope prediction, which will improve and enhance SARS-CoV-2 vaccine development against emergent variants.


2010 ◽  
Vol 2010 ◽  
pp. 1-14 ◽  
Author(s):  
Salvador Eugenio C. Caoili

To better support the design of peptide-based vaccines, refinement of methods to predict B-cell epitopes necessitates meaningful benchmarking against empirical data on the cross-reactivity of polyclonal antipeptide antibodies with proteins, such that the positive data reflect functionally relevant cross-reactivity (which is consistent with antibody-mediated change in protein function) and the negative data reflect genuine absence of cross-reactivity (rather than apparent absence of cross-reactivity due to artifactual masking of B-cell epitopes in immunoassays). These data are heterogeneous in view of multiple factors that complicate B-cell epitope prediction, notably physicochemical factors that define key structural differences between immunizing peptides and their cognate proteins (e.g., unmatched electrical charges along the peptide-protein sequence alignments). If the data are partitioned with respect to these factors, iterative parallel benchmarking against the resulting subsets of data provides a basis for systematically identifying and addressing the limitations of methods for B-cell epitope prediction as applied to vaccine design.


2019 ◽  
Vol 14 (3) ◽  
pp. 226-233 ◽  
Author(s):  
Cangzhi Jia ◽  
Hongyan Gong ◽  
Yan Zhu ◽  
Yixia Shi

Background: B-cell epitope prediction is an essential tool for a variety of immunological studies. For identifying such epitopes, several computational predictors have been proposed in the past 10 years. Objective: In this review, we summarized the representative computational approaches developed for the identification of linear B-cell epitopes. Methods: We mainly discuss the datasets, feature extraction methods and classification methods used in the previous work. Results: The performance of the existing methods was not very satisfying, and so more effective approaches should be proposed by considering the structural information of proteins. Conclusion: We consider existing challenges and future perspectives for developing reliable methods for predicting linear B-cell epitopes.


2013 ◽  
Vol 2013 ◽  
pp. 1-11 ◽  
Author(s):  
Pingping Sun ◽  
Haixu Ju ◽  
Zhenbang Liu ◽  
Qiao Ning ◽  
Jian Zhang ◽  
...  

Identification of epitopes which invoke strong humoral responses is an essential issue in the field of immunology. Localizing epitopes by experimental methods is expensive in terms of time, cost, and effort; therefore, computational methods, which offer low cost and high speed, have been employed to predict B-cell epitopes. In this paper, we review recent advances in bioinformatics resources and tools for conformational B-cell epitope prediction, including databases, algorithms, web servers, and their applications in solving problems in related areas. To stimulate the development of better tools, some promising directions are also extensively discussed.


2014 ◽  
Vol 4 (3) ◽  
pp. 248
Author(s):  
Kevin Khoo Kit Keong ◽  
Theam Soon Lim ◽  
Gee Jun Tye ◽  
Yee Siew Choong

Author(s):  
Maximilian Collatz ◽  
Florian Mock ◽  
Emanuel Barth ◽  
Martin Hölzer ◽  
Konrad Sachse ◽  
...  

Abstract Motivation By binding to specific structures on antigenic proteins, the so-called epitopes, B-cell antibodies can neutralize pathogens. The identification of B-cell epitopes is of great value for the development of specific serodiagnostic assays and the optimization of medical therapy. However, identifying diagnostically or therapeutically relevant epitopes is a challenging task that usually involves extensive laboratory work. In this study, we show that the time, cost and labor-intensive process of epitope detection in the lab can be significantly reduced using in silico prediction. Results Here, we present EpiDope, a Python tool which uses a deep neural network to detect linear B-cell epitope regions on individual protein sequences. With an area under the curve of 0.67 ± 0.07 in the receiver operating characteristic curve, EpiDope exceeds all other currently used linear B-cell epitope prediction tools. Our software is shown to reliably predict linear B-cell epitopes of a given protein sequence, thus contributing to a significant reduction of laboratory experiments and costs required for the conventional approach. Availability and implementation EpiDope is available on GitHub (http://github.com/mcollatz/EpiDope). Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Author(s):  
Maximilian Collatz ◽  
Florian Mock ◽  
Martin Hölzer ◽  
Emanuel Barth ◽  
Konrad Sachse ◽  
...  

ABSTRACT By binding to specific structures on antigenic proteins, the so-called epitopes, B-cell antibodies can neutralize pathogens. The identification of B-cell epitopes is of great value for the development of specific serodiagnostic assays and the optimization of medical therapy. However, identifying diagnostically or therapeutically relevant epitopes is a challenging task that usually involves extensive laboratory work. In this study, we show that the time, cost and labor-intensive process of epitope detection in the lab can be significantly shortened by using in silico prediction. Here we present EpiDope, a Python tool which uses a deep neural network to detect B-cell epitope regions on individual protein sequences (github.com/mcollatz/EpiDope). With an area under the curve (AUC) of 0.67 ± 0.07 in the ROC curve, EpiDope exceeds all other currently used B-cell prediction tools. Moreover, for AUC10% (AUC for a false-positive rate < 0.1), EpiDope improves the prediction accuracy in comparison to other state-of-the-art methods. Our software is shown to reliably predict linear B-cell epitopes of a given protein sequence, thus contributing to a significant reduction of laboratory experiments and costs required for the conventional approach.
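The AUC10% measure mentioned in this abstract restricts the ROC area to false-positive rates below 0.1. A minimal sketch of how such a partial AUC could be computed is given below. It assumes binary labels (1 for epitope, 0 for non-epitope) with both classes present, uses the trapezoidal rule, and returns the raw, unnormalized area; whether EpiDope rescales this value is not stated in the abstract, and the function names are hypothetical.

```python
def roc_points(labels, scores):
    """Return ROC points (fpr, tpr), sweeping the threshold from high to low.

    Samples with tied scores are added together so they yield one point.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points, tp, fp, i = [(0.0, 0.0)], 0, 0, 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            tp += ranked[i][1]
            fp += 1 - ranked[i][1]
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def partial_auc(points, max_fpr=1.0):
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr]."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Linearly interpolate the TPR at the cut-off and clip the segment.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

Calling `partial_auc(points)` with the default cut-off gives the ordinary AUC, while `partial_auc(points, 0.1)` gives the area over the low false-positive region only, which emphasizes predictions a laboratory would act on first.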

