Evaluation of machine learning methods to predict peptide binding to MHC Class I proteins

Abstract

Binding of peptides to Major Histocompatibility Complex (MHC) proteins is a critical step in immune response. Peptides bound to MHCs are recognized by CD8+ (MHC Class I) and CD4+ (MHC Class II) T-cells. Successful prediction of which peptides will bind to specific MHC alleles would benefit many cancer immunotherapy applications. Currently, supervised machine learning is the leading computational approach to predict peptide-MHC binding, and a number of methods, trained using results of binding assays, have been published. Many clinical researchers are dissatisfied with the sensitivity and specificity of currently available methods and the limited number of alleles for which they can be applied. We evaluated several recent methods to predict peptide-MHC Class I binding affinities and a new method of our own design (MHCnuggets). We used a high-quality benchmark set of 51 alleles, which has been applied previously. The neural network methods NetMHC, NetMHCpan, MHCflurry, and MHCnuggets achieved similar best-in-class prediction performance in our testing, and of these methods MHCnuggets was significantly faster. MHCnuggets is a gated recurrent neural network, and the only method to our knowledge that can handle peptides of any length, without artificial lengthening and shortening. Seventeen alleles were problematic for all tested methods. Prediction difficulties could be explained by deficiencies in the training and testing examples in the benchmark, suggesting that biological differences in allele-specific binding properties are not as important as previously claimed. Advances in accuracy and speed of computational methods to predict peptide-MHC affinity are urgently needed. These methods will be at the core of pipelines to identify patients who will benefit from immunotherapy, based on tumor-derived somatic mutations. Machine learning methods, such as MHCnuggets, which efficiently handle peptides of any length will be increasingly important for the challenges of predicting immunogenic response for MHC Class II alleles.

Author Summary

Machine learning methods are a popular approach for predicting whether a peptide will bind to Major Histocompatibility Complex (MHC) proteins, a critical step in activation of cytotoxic T-cells. The input to these methods is a peptide sequence and an MHC allele of interest, and the output is the predicted binding affinity. MHC Class I and II proteins bind peptides of 8-11 amino acids and 16-26 amino acids respectively. This has been an obstacle for machine learning, because the methods used to date can only handle fixed-length inputs. We show that a recently developed technique known as gated recurrent neural networks can handle peptides of variable length and predict peptide-MHC binding as well as or better than existing methods, at substantially faster speeds. Our results have implications for the hundreds of MHC alleles that cannot be predicted with current methods.
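The variable-length point in the summary above can be illustrated with a toy gated recurrent unit (GRU) cell. This is only a sketch under stated assumptions, not the MHCnuggets implementation: the weights are random and untrained, bias terms are omitted, the hidden size of 4 is arbitrary, and the class name `ToyGRUCell` and the example peptide strings are chosen purely for illustration. It shows that a recurrent cell reduces a peptide of any length to a state vector of fixed size, with no padding or truncation.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(residue):
    """Encode one amino acid as a 20-dimensional indicator vector."""
    return [1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class ToyGRUCell:
    """Single GRU cell with random, untrained weights (illustration only)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]

        self.hidden_size = hidden_size
        # Input-to-hidden (W) and hidden-to-hidden (U) weights for the
        # update gate "z", reset gate "r", and candidate state "h".
        self.W = {g: rand_matrix(hidden_size, input_size) for g in "zrh"}
        self.U = {g: rand_matrix(hidden_size, hidden_size) for g in "zrh"}

    def step(self, x, h):
        """Consume one residue vector x and return the next hidden state."""
        wz, uz = matvec(self.W["z"], x), matvec(self.U["z"], h)
        wr, ur = matvec(self.W["r"], x), matvec(self.U["r"], h)
        z = [sigmoid(a + b) for a, b in zip(wz, uz)]  # update gate
        r = [sigmoid(a + b) for a, b in zip(wr, ur)]  # reset gate
        gated_h = [ri * hi for ri, hi in zip(r, h)]
        wh, uh = matvec(self.W["h"], x), matvec(self.U["h"], gated_h)
        candidate = [math.tanh(a + b) for a, b in zip(wh, uh)]
        # Blend the previous state with the candidate state.
        return [(1 - zi) * hi + zi * ci
                for zi, hi, ci in zip(z, h, candidate)]

    def encode(self, peptide):
        """Run the cell over a whole peptide, one residue at a time."""
        h = [0.0] * self.hidden_size
        for residue in peptide:
            h = self.step(one_hot(residue), h)
        return h


cell = ToyGRUCell(input_size=20, hidden_size=4)
for peptide in ["SIINFEKL", "GILGFVFTL", "YLEPGPVTALA"]:
    state = cell.encode(peptide)
    print(len(peptide), [round(v, 3) for v in state])
```

In a trained predictor the final state would feed an output layer that estimates binding affinity; that part is omitted here because the abstract does not describe it.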

Novel Machine Learning Methods for MHC Class I Binding Prediction

Pattern Recognition in Bioinformatics - Lecture Notes in Computer Science ◽

10.1007/978-3-642-16001-1_9 ◽

2010 ◽

pp. 98-109 ◽

Cited By ~ 3

Author(s):

Christian Widmer ◽

Nora C. Toussaint ◽

Yasemin Altun ◽

Oliver Kohlbacher ◽

Gunnar Rätsch

Keyword(s):

Machine Learning ◽

Mhc Class I ◽

Class I ◽

Binding Prediction ◽

Learning Methods ◽

Machine Learning Methods

Possibility of Autonomous Estimation of Shiba Goat’s Estrus and Non-Estrus Behavior by Machine Learning Methods

Animals ◽

10.3390/ani10050771 ◽

2020 ◽

Vol 10 (5) ◽

pp. 771

Author(s):

Toshiya Arakawa

Keyword(s):

Neural Network ◽

Machine Learning ◽

Random Forest ◽

Markov Models ◽

Tracking System ◽

Video Tracking ◽

Training Data ◽

Support Vector ◽

Learning Methods ◽

Machine Learning Methods

Mammalian behavior is typically monitored by observation. However, direct observation requires substantial effort and time when the number of mammals to be observed is large or when the observation is conducted over a prolonged period. In this study, machine learning methods such as hidden Markov models (HMMs), random forests, support vector machines (SVMs), and neural networks were applied to estimate whether a goat is in estrus based on its behavior, and the adequacy of each method was verified. Tracking data for the goats were obtained using a video tracking system and used to estimate whether the goats, which were in “estrus” or “non-estrus”, were in either of two states: “approaching the male” or “standing near the male”. Overall, the percentage concordance (PC) of random forest appears to be the highest. However, its PC for goats whose data were not used in the training data sets is relatively low, which suggests that random forest tends to over-fit the training data. Apart from random forest, the PCs of HMMs and SVMs are high. Considering calculation time and the HMM's advantage of being a time-series model, the HMM is the better method. The PC of the neural network is low overall; however, if more goat data were acquired, the neural network could become an adequate method for estimation.

Landslide susceptibility mapping based on convolutional neural network and conventional machine learning methods

10.21203/rs.3.rs-190195/v1 ◽

2021 ◽

Author(s):

Rui Liu ◽

Xin Yang ◽

Chong Xu ◽

Luyao Li ◽

Xiangqiang Zeng

Keyword(s):

Neural Network ◽

Machine Learning ◽

Convolutional Neural Network ◽

Landslide Susceptibility ◽

Susceptibility Mapping ◽

Landslide Susceptibility Mapping ◽

Support Vector ◽

Learning Methods ◽

Machine Learning Methods ◽

Conventional Machine

Landslide susceptibility mapping (LSM) is a useful tool to estimate the probability of landslide occurrence, providing a scientific basis for natural hazards prevention, land use planning, and economic development in landslide-prone areas. To date, a large number of machine learning methods have been applied to LSM, and recently the Convolutional Neural Network (CNN) has been gradually adopted to enhance the prediction accuracy of LSM. The objective of this study is to introduce a CNN-based model in LSM and systematically compare its overall performance with the conventional machine learning models of random forest, logistic regression, and support vector machine. Herein, we selected the Jiuzhaigou region in Sichuan Province, China as the study area. A total of 710 landslides and 12 predisposing factors were stacked to form spatial datasets for LSM. ROC analysis and several statistical metrics, such as accuracy, root mean square error (RMSE), Kappa coefficient, sensitivity, and specificity, were used to evaluate the performance of the models on the training and validation datasets. Finally, the trained models were applied and the landslide susceptibility zones were mapped. Results suggest that both the CNN and the conventional machine-learning-based models perform satisfactorily (AUC: 85.72% to 90.17%). The CNN-based model exhibits excellent goodness-of-fit and prediction capability; it not only achieves the highest performance (AUC: 90.17%) but also significantly reduces the salt-and-pepper effect, which indicates its great potential for application to LSM.
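The evaluation metrics named in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the labels and predicted probabilities are invented toy values (not data from the study), the 0.5 decision threshold is an arbitrary choice, and the function names are hypothetical. The Kappa coefficient is computed as Cohen's kappa for two classes.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0 and 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, and Cohen's kappa."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    # Agreement expected by chance, from the marginal totals.
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_no = ((tn + fn) / n) * ((tn + fp) / n)
    p_chance = p_yes + p_no
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


def rmse(y_true, y_score):
    """Root mean square error between labels and predicted scores."""
    squared = [(t - s) ** 2 for t, s in zip(y_true, y_score)]
    return math.sqrt(sum(squared) / len(squared))


# Toy data (assumed): 1 = landslide cell, 0 = non-landslide cell.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.6, 0.3, 0.4, 0.2, 0.1, 0.7]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]

metrics = binary_metrics(y_true, y_pred)
metrics["rmse"] = rmse(y_true, y_prob)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The AUC reported in the abstract additionally requires sweeping the threshold over all score values, which is not shown here.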

H-2M3a violates the paradigm for major histocompatibility complex class I peptide binding.

Journal of Experimental Medicine ◽

10.1084/jem.181.5.1817 ◽

1995 ◽

Vol 181 (5) ◽

pp. 1817-1825 ◽

Cited By ~ 22

Author(s):

J M Vyas ◽

J R Rodgers ◽

R R Rich

Keyword(s):

Amino Acids ◽

Mhc Class I ◽

Temperature Shift ◽

Class I ◽

Major Histocompatibility ◽

Histocompatibility Complex ◽

Compensatory Adaptation ◽

Chemotactic Peptides ◽

The Stability ◽

Formyl Peptides

The major histocompatibility (MHC) class I-b molecule H-2M3a binds and presents N-formylated peptides to cytotoxic T lymphocytes. This requirement potentially places severe constraints on the number of peptides that M3a can present to the immune system. Consistent with this idea, the M3a-Ld MHC class I chimera is expressed at very low levels on the cell surface, but can be induced significantly by the addition of specific peptides at 27 degrees C. Using this assay, we show that M3a binds many very short N-formyl peptides, including N-formyl chemotactic peptides and canonical octapeptides. This observation is in sharp contrast to the paradigmatic size range of peptides of 8-10 amino acids binding to most class I-a molecules and the class I-b molecule Qa-2. Stabilization by fMLF-benzyl amide could be detected at peptide concentrations as low as 100 nM. While N-formyl peptides as short as two amino acids in length stabilized expression of M3a-Ld, increasing the length of these peptides added to the stability of peptide-MHC complexes as determined by 27-37 degrees C temperature shift experiments. We propose that relaxation of the length rule may represent a compensatory adaptation to maximize the number of peptides that can be presented by H-2M3a.

Detecting Items with the Biggest Weight Based on Neural Network and Machine Learning Methods

Communications in Computer and Information Science - Data Stream Mining & Processing ◽

10.1007/978-3-030-61656-4_26 ◽

2020 ◽

pp. 383-396

Author(s):

Vitaliy Danylyk ◽

Victoria Vysotska ◽

Vasyl Lytvyn ◽

Svitlana Vyshemyrska ◽

Iryna Lurie ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Learning Methods ◽

Machine Learning Methods

Convolutional Neural Network Model in Machine Learning Methods and Computer Vision for Image Recognition: A Review

Journal of Applied Sciences Research ◽

10.22587/jasr.2018.14.6.5 ◽

2018 ◽

Keyword(s):

Neural Network ◽

Machine Learning ◽

Computer Vision ◽

Convolutional Neural Network ◽

Network Model ◽

Image Recognition ◽

Neural Network Model ◽

Learning Methods ◽

Machine Learning Methods

A Comparison of Machine Learning Methods in a High-Dimensional Classification Problem

Business Systems Research Journal ◽

10.2478/bsrj-2014-0021 ◽

2014 ◽

Vol 5 (3) ◽

pp. 82-96 ◽

Cited By ~ 3

Author(s):

Marijana Zekić-Sušac ◽

Sanja Pfeifer ◽

Nataša Šarlija

Keyword(s):

Neural Network ◽

Machine Learning ◽

Classification Accuracy ◽

Classification Problem ◽

High Dimensional ◽

Nearest Neighbour ◽

Learning Methods ◽

Machine Learning Methods ◽

Dimensional Classification ◽

Artificial Neural

Background: Large-dimensional data modelling often relies on variable reduction methods in the pre-processing and in the post-processing stage. However, such a reduction usually provides less information and yields a lower accuracy of the model. Objectives: The aim of this paper is to assess the high-dimensional classification problem of recognizing entrepreneurial intentions of students by machine learning methods. Methods/Approach: Four methods were tested on the same dataset in order to compare their efficiency in terms of classification accuracy: artificial neural networks, CART classification trees, support vector machines, and k-nearest neighbour. The performance of each method was compared on ten subsamples in a 10-fold cross-validation procedure in order to assess the sensitivity and specificity of each model. Results: The artificial neural network model based on multilayer perceptron yielded a higher classification rate than the models produced by other methods. The pairwise t-test showed a statistically significant difference between the artificial neural network and the k-nearest neighbour model, while the difference among other methods was not statistically significant. Conclusions: Tested machine learning methods are able to learn fast and achieve high classification accuracy. However, further advancement can be assured by testing a few additional methodological refinements in machine learning methods.
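The 10-fold cross-validation procedure mentioned in this abstract can be sketched as an index split. This is an illustration only: the sample count of 23, the random seed, and the function name `k_fold_indices` are assumptions, and the abstract does not say whether the authors shuffled or stratified their subsamples.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so that fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(n_samples=23, k=10))
for i, (train, test) in enumerate(folds):
    print(f"fold {i}: {len(train)} train, {len(test)} test")
```

Each sample is held out exactly once, so per-fold sensitivity and specificity can be averaged over the ten folds or compared between models with a paired test.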

Monitoring the Variation of Vegetation Water Content with Machine Learning Methods: Point–Surface Fusion of MODIS Products and GNSS-IR Observations

Remote Sensing ◽

10.3390/rs11121440 ◽

2019 ◽

Vol 11 (12) ◽

pp. 1440 ◽

Cited By ~ 1

Author(s):

Qiangqiang Yuan ◽

Shuwen Li ◽

Linwei Yue ◽

Tongwen Li ◽

Huanfeng Shen ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Water Content ◽

Linear Regression Method ◽

Learning Methods ◽

Machine Learning Methods ◽

Vegetation Water Content ◽

Drought Prediction ◽

Vegetation Water ◽

Surface Fusion

Vegetation water content (VWC) is recognized as an important parameter in vegetation growth studies, natural disasters such as forest fires, and drought prediction. Recently, the Global Navigation Satellite System Interferometric Reflectometry (GNSS-IR) has emerged as an important technique for monitoring vegetation information. The normalized microwave reflection index (NMRI) was developed to reflect the change of VWC based on this fact. However, NMRI uses local site-based data, and the sparse distribution hinders the application of NMRI. In this study, we obtained a 500 m spatially continuous NMRI product by integrating GNSS-IR site data with other VWC-related products using the point–surface fusion technique. The auxiliary data in the fusion process include the normalized difference vegetation index (NDVI), gross primary productivity (GPP), and precipitation. Meanwhile, the fusion performance of three machine learning methods, i.e., the back-propagation neural network (BPNN), generalized regression neural network (GRNN), and random forest (RF), is compared and analyzed. The machine learning methods achieve satisfactory results, with cross-validation R values of 0.71–0.83 and RMSEs of 0.025–0.037. The results show a clear improvement over the traditional multiple linear regression method, which achieves R (RMSE) values of only about 0.4 (0.045). This indicates that the machine learning methods can better learn the complex nonlinear relationship between NMRI and the input VWC-related indices. Among the machine learning methods, the RF model obtained the best results. Long time-series NMRI images with a 500 m spatial resolution in the western part of the continental U.S. were then obtained. The results show that the spatial distribution of the NMRI product is consistent with a drought situation from 2012 to 2014 in the U.S., which verifies the feasibility of analyzing and predicting drought times and distribution ranges by using the 500 m fusion product.
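The R values quoted in this abstract are correlation coefficients between predicted and observed NMRI. A minimal sketch of that computation, assuming R denotes the Pearson correlation coefficient (the abstract does not state this explicitly) and using invented toy numbers rather than any data from the study:

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Covariance and variances (the common 1/n factor cancels out).
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


# Toy values (assumed) standing in for held-out NMRI observations
# and the corresponding model predictions.
observed = [0.10, 0.15, 0.22, 0.18, 0.30, 0.27]
predicted = [0.12, 0.14, 0.20, 0.21, 0.26, 0.29]

print(f"R = {pearson_r(observed, predicted):.3f}")
```

In a cross-validation setting, the predictions for each held-out fold would be pooled (or scored per fold) before computing R.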

The Major Histocompatibility Complex of Old World Camels—A Synopsis

Cells ◽

10.3390/cells8101200 ◽

2019 ◽

Vol 8 (10) ◽

pp. 1200 ◽

Cited By ~ 1

Author(s):

Plasil ◽

Wijkmark ◽

Elbers ◽

Oppelt ◽

Burger ◽

...

Keyword(s):

Major Histocompatibility Complex ◽

Mhc Class I ◽

Mhc Class Ii ◽

Class Ii ◽

Class I ◽

Class Iii ◽

Old World ◽

Histocompatibility Complex ◽

Mhc Class Iii ◽

Antigen Presenting

This study brings new information on major histocompatibility complex (MHC) class III sub-region genes in Old World camels and integrates current knowledge of the MHC region into a comprehensive overview for Old World camels. Out of the MHC class III genes characterized, TNFA and the LY6 gene family showed high levels of conservation, characteristic for MHC class III loci in general. For comparison, an MHC class II gene, TAP1, which does not code for antigen presenting molecules but is functionally related to MHC antigen presenting functions, was studied. TAP1 had many SNPs, even more than the MHC class I and II genes encoding antigen presenting molecules. Based on this knowledge and using new camel genomic resources, we constructed an improved genomic map of the entire MHC region of Old World camels. The MHC class III sub-region shows a standard organization similar to that of pig or cattle. The overall genomic structure of the camel MHC is more similar to pig MHC than to cattle MHC. This conclusion is supported by differences in the organization of the MHC class II sub-region, absence of functional DY genes, different organization of MIC genes in the MHC class I sub-region, and generally closer evolutionary relationships of camel and porcine MHC gene sequences analyzed so far.

Major Histocompatibility Complex Class II and Programmed Death Ligand 1 Expression Predict Outcome After Programmed Death 1 Blockade in Classic Hodgkin Lymphoma

Journal of Clinical Oncology ◽

10.1200/jco.2017.77.3994 ◽

2018 ◽

Vol 36 (10) ◽

pp. 942-950 ◽

Cited By ~ 106

Author(s):

Margaretha G.M. Roemer ◽

Robert A. Redd ◽

Fathima Zumla Cader ◽

Christine J. Pak ◽

Sara Abdelrahman ◽

...

Keyword(s):

Mhc Class I ◽

Mhc Class Ii ◽

Class Ii ◽

Class I ◽

Histocompatibility Complex ◽

Classic Hodgkin Lymphoma ◽

Programmed Death ◽

Clinical Responses ◽

Programmed Death 1 ◽

Cell Expression

Purpose: Hodgkin Reed-Sternberg (HRS) cells evade antitumor immunity by multiple means, including gains of 9p24.1/CD274(PD-L1)/PDCD1LG2(PD-L2) and perturbed antigen presentation. Programmed death 1 (PD-1) receptor blockade is active in classic Hodgkin lymphoma (cHL) despite reported deficiencies of major histocompatibility complex (MHC) class I expression on HRS cells. Herein, we assess bases of sensitivity to PD-1 blockade in patients with relapsed/refractory cHL who were treated with nivolumab (anti–PD-1) in the CheckMate 205 trial. Methods: HRS cells from archival tumor biopsies were evaluated for 9p24.1 alterations by fluorescence in situ hybridization and for expression of PD ligand 1 (PD-L1) and the antigen presentation pathway components (β2-microglobulin, MHC class I, and MHC class II) by immunohistochemistry. These parameters were correlated with clinical responses and progression-free survival (PFS) after PD-1 blockade. Results: Patients with higher-level 9p24.1 copy gain and increased PD-L1 expression on HRS cells had superior PFS. HRS cell expression of β2-microglobulin/MHC class I was not predictive for complete remission or PFS after nivolumab therapy. In contrast, HRS cell expression of MHC class II was predictive for complete remission. In patients with a > 12-month interval between myeloablative autologous stem-cell transplantation and nivolumab therapy, HRS cell expression of MHC class II was associated with prolonged PFS. Conclusion: Genetically driven PD-L1 expression and MHC class II positivity on HRS cells are potential predictors of favorable outcome after PD-1 blockade. In cHL, clinical responses to nivolumab were not dependent on HRS cell expression of MHC class I.
