A Novel Ensemble Machine Learning Model for Prediction of Zika Virus T-Cell Epitopes

2021 ◽  
pp. 275-292
Author(s):  
Syed Nisar Hussain Bukhari ◽  
Amit Jain ◽  
Ehtishamul Haq
2021 ◽  
Vol 15 (8) ◽  
pp. 878-888
Author(s):  
Yang Liu ◽  
Xia-hui Ouyang ◽  
Zhi-Xiong Xiao ◽  
Le Zhang ◽  
Yang Cao

Background: T lymphocytes mount an immune response by recognizing antigen peptides (also known as T cell epitopes) presented by major histocompatibility complex (MHC) molecules. The immunogenicity of a T cell epitope depends on its source and on the stability of its complex with the MHC molecule. Binding of the peptide to MHC is the most selective step, so predicting peptide-MHC binding affinity is the principal step in predicting T cell epitopes. Identifying epitopes is of great significance for vaccine design and for research on T cell immune responses. Objective: The traditional way to identify epitopes is to synthesize peptides and test their binding activity experimentally, which is both time-consuming and expensive. In silico methods for predicting peptide-MHC binding have emerged to pre-select candidate peptides for experimental testing, which greatly reduces time and cost. By summarizing and analyzing these methods, we hope to gain better insight and provide guidance for future directions. Methods: To date, a number of methods based on various principles have been developed to predict the binding of peptides to MHC. Some employ matrix models or machine learning models built on sequence characteristics of the peptides or the MHC. Others use three-dimensional structural information of the peptides or the MHC, for example by extracting structural features to construct a feature matrix or machine learning model, or by directly applying protein structure prediction or molecular docking to predict the binding mode of peptide and MHC. Results: Although methods based on a feature matrix or machine learning model can achieve high-throughput prediction, their accuracy depends heavily on the sequence characteristics of confirmed binding peptides. In addition, they cannot provide insight into the mechanism of antigen specificity. Such methods therefore have certain limitations in practical applications. Methods based on structure prediction or molecular docking are computationally intensive compared with those based on a feature matrix or machine learning model, and the challenge is how to predict a reliable structural model. Conclusion: This paper reviews the principles, advantages and disadvantages of peptide-MHC binding prediction methods and discusses future directions for achieving more accurate predictions.
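The matrix models mentioned in the abstract above can be illustrated with a position-specific scoring matrix (PSSM). The following is a minimal sketch, not a method taken from the reviewed paper: the function names `build_pssm` and `score_peptide`, the uniform background frequency, the pseudocount, and the toy peptide sequences are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(binders, pseudocount=1.0):
    """Build a log-odds scoring matrix from equal-length binder peptides.

    Assumption: a uniform background frequency over the 20 amino acids.
    """
    length = len(binders[0])
    if any(len(p) != length for p in binders):
        raise ValueError("all binder peptides must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(binders) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        counts = Counter(p[pos] for p in binders)
        # Log-odds of the smoothed observed frequency against the background.
        column = {
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        }
        pssm.append(column)
    return pssm


def score_peptide(pssm, peptide):
    """Sum the per-position log-odds scores for one candidate peptide."""
    if len(peptide) != len(pssm):
        raise ValueError("peptide length does not match the matrix")
    return sum(column[aa] for column, aa in zip(pssm, peptide))


# Toy 9-mers invented for illustration; they are not experimental binders.
toy_binders = ["ALAAAAAAV", "ALCAAAAAV", "GLAAAAAAV"]
pssm = build_pssm(toy_binders)
binder_like = score_peptide(pssm, "ALAAAAAAV")
unrelated = score_peptide(pssm, "WWWWWWWWW")
```

A peptide that resembles the training binders scores higher than an unrelated one, which also shows the limitation noted in the abstract: the matrix only reflects the sequence characteristics of the confirmed binders it was built from.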


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 145968-145983 ◽  
Author(s):  
Amirhosein Mosavi ◽  
Ataollah Shirzadi ◽  
Bahram Choubin ◽  
Fereshteh Taromideh ◽  
Farzaneh Sajedi Hosseini ◽  
...  

2019 ◽  
Author(s):  
Flavio Pazos ◽  
Pablo Soto ◽  
Martín Palazzo ◽  
Gustavo Guerberoff ◽  
Patricio Yankilevich ◽  
...  

Abstract Background. Assembly and function of neuronal synapses require the coordinated expression of a yet undetermined set of genes. Previously, we trained an ensemble machine learning model to assign a probability of having synaptic function to every protein-coding gene in Drosophila melanogaster. This approach resulted in the publication of a catalogue of 893 genes that was postulated to be highly enriched in genes with still undocumented synaptic functions. Since then, the scientific community has experimentally identified 79 new synaptic genes. Here we used these new empirical data to evaluate the predictive power of the catalogue. We then implemented a series of improvements to the training scheme and the ensemble rules of our model and added the new synaptic genes to the training set to obtain a new, enhanced catalogue of putative synaptic genes. Results. The retrospective analysis demonstrated that our original catalogue was indeed highly enriched in genes with unknown synaptic function. The changes to the training scheme and the ensemble rules resulted in a catalogue with better predictive power. Finally, by training this improved model with an updated training set that includes all the new synaptic genes, we obtained a new, enhanced catalogue of putative synaptic genes. We present it here and announce a regularly updated version that will be available online at: http://synapticgenes.bnd.edu.uy. Conclusions. We show that training a machine learning model solely with the whole-body temporal transcription profiles of known synaptic genes resulted in a catalogue with a significant enrichment in undiscovered synaptic genes. Using new empirical data, we validated our original approach, improved our model and obtained a better catalogue. The utility of this approach is that it reduces the number of genes to be tested through hypothesis-driven experimentation.
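One common way to quantify the kind of enrichment described in the abstract above is a hypergeometric test. The sketch below is an assumption about how such a check could be done, not the authors' procedure. The catalogue size (893) and the number of newly identified genes (79) come from the abstract; `total_genes` and `overlap` are placeholder values that the abstract does not report, and the function names are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """Return P(X >= k) for X ~ Hypergeometric(N population, K successes, n draws)."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def fold_enrichment(k, N, K, n):
    """Observed hit rate in the catalogue divided by the genome-wide rate."""
    return (k / n) / (K / N)


catalogue_size = 893   # from the abstract
new_synaptic = 79      # from the abstract
total_genes = 14000    # PLACEHOLDER: not stated in the abstract
overlap = 20           # PLACEHOLDER: not stated in the abstract

p_value = hypergeom_sf(overlap, total_genes, new_synaptic, catalogue_size)
fold = fold_enrichment(overlap, total_genes, new_synaptic, catalogue_size)
```

Replacing the two placeholders with the actual gene count and the actual number of newly identified genes found in the catalogue would give the probability of seeing at least that overlap by chance.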


2020 ◽  
Vol 30 (12) ◽  
pp. 1835-1845
Author(s):  
Li Tang ◽  
Matthew C. Hill ◽  
Jun Wang ◽  
Jianxin Wang ◽  
James F. Martin ◽  
...  
