OnionMHC: A Deep Learning Model for Peptide - HLA-A*02:01 Binding Predictions using both Structure and Sequence Feature Sets

DOI: 10.21203/rs.3.rs-124695/v1
Year: 2020
Author(s): Shikhar Saxena, Sambhavi Animesh, Melissa Fullwood, Yuguang Mu
Keyword(s): Machine Learning, Deep Learning, Binding Affinity, State Of The Art, Learning Model, Peptide Sequence, Sequencing Analysis, Art Performance, Deep Learning Model, Human Receptor

Background: Peptide binding to Major Histocompatibility Complex (MHC) proteins is an important step in the antigen-presentation pathway. Thus, predicting the binding potential of peptides with MHC is essential for the design of peptide-based therapeutics. Most of the available machine learning-based models predict peptide-MHC binding from the amino acid sequence alone. Given the importance of structural information in determining the stability of the complex, we have used both the complex structure and the peptide sequence features to predict the binding affinity of peptides to the human receptor HLA-A*02:01. To our knowledge, no previous model for the human HLA receptor incorporates both structure- and sequence-based features. Results: We applied machine learning techniques from natural language processing (NLP) together with a convolutional neural network to design a model that performs comparably with the existing state-of-the-art models. Our model shows that combining information from both the sequence and structure domains improves binding prediction compared with information from one domain alone. Testing on 18 weekly benchmark datasets provided by the Immune Epitope Database (IEDB), as well as on experimentally validated peptides from whole-exome sequencing analysis of breast cancer patients, indicates that our model achieves state-of-the-art performance. Conclusion: We have developed a deep-learning model (OnionMHC) that incorporates both structure- and sequence-based features to predict the binding affinity of peptides with the human receptor HLA-A*02:01. The model demonstrates state-of-the-art performance on the IEDB benchmark dataset as well as on the experimentally validated peptides. It can be used to screen potential neo-epitopes for the development of cancer vaccines or to design peptides for peptide-based therapeutics.
OnionMHC is freely available at https://github.com/shikhar249/OnionMHC

Attentive deep learning-based tumor-only somatic mutation classifier achieves high accuracy agnostic of tissue type and capture kit.

DOI: 10.1101/2021.12.07.471513
Year: 2021
Author(s): R. Tyler McLaughlin, Maansi Asthana, Marc Di Meo, Michele Ceccarelli, Howard J. Jacob, ...
Keyword(s): Machine Learning, Deep Learning, State Of The Art, Variant Calling, Learning Model, Tissue Type, Germline Variants, Art Performance, Deep Learning Model, Tumor Dna

In precision oncology, reliable identification of tumor-specific DNA mutations requires sequencing tumor DNA and non-tumor DNA (the so-called "matched normal") from the same patient. The normal sample allows researchers to distinguish acquired (somatic) from hereditary (germline) variants. This distinction facilitates estimation of tumor mutation burden (TMB), which is a recently FDA-approved pan-cancer marker for highly successful cancer immunotherapies. In tumor-only variant calling (i.e., without a matched normal), the difficulty of discriminating germline from somatic variants results in inflated and unreliable TMB estimates. We apply machine learning to somatic vs. germline classification in tumor-only samples using TabNet, a recently developed attentive deep learning model for tabular data that has achieved state-of-the-art performance in multiple classification tasks (Arik and Pfister 2019). We constructed a training set for supervised classification using features derived from tumor-only variant calling, drawing somatic and germline truth labels from an independent pipeline that incorporates the patient-matched normal samples. Our trained model achieved state-of-the-art performance on two hold-out test datasets: a TCGA dataset including sarcoma, breast adenocarcinoma, and endometrial carcinoma samples (F1-score: 88.3) and a metastatic melanoma dataset (F1-score: 79.8). Concordance between matched-normal and tumor-only TMB improves from R² = 0.006 to 0.705 with the addition of our classifier. Importantly, this approach generalizes across tumor tissue types and capture kits and has a call rate of 100%. The interpretable feature masks of the attentive deep learning model explain the reasons for misclassified variants. We reproduce the recent finding that tumor-only TMB estimates for Black patients are extremely inflated relative to those for White patients because of the racial biases of germline databases. We show that our machine learning approach appreciably reduces this racial bias in tumor-only variant calling.
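
To make the inflation effect concrete, here is a minimal sketch of how TMB is typically computed (mutations per megabase of sequenced territory) and how dropping predicted-germline calls changes it. The variant records, the `predicted_label` field, and the 2 Mb covered region are hypothetical illustrations; the abstract does not describe the authors' pipeline at this level of detail.

```python
def tumor_mutation_burden(variants, covered_megabases, somatic_only=True):
    """Return mutations per megabase, optionally counting only somatic calls."""
    if covered_megabases <= 0:
        raise ValueError("covered_megabases must be positive")
    counted = [
        v for v in variants
        if not somatic_only or v["predicted_label"] == "somatic"
    ]
    return len(counted) / covered_megabases


# Hypothetical tumor-only call set: 3 somatic and 7 germline variants.
calls = (
    [{"predicted_label": "somatic"}] * 3
    + [{"predicted_label": "germline"}] * 7
)

# Without a classifier every call is counted, which inflates the estimate.
unfiltered = tumor_mutation_burden(calls, covered_megabases=2.0, somatic_only=False)
# With a somatic/germline classifier only the somatic calls are counted.
filtered = tumor_mutation_burden(calls, covered_megabases=2.0)
print(unfiltered, filtered)
```

In this toy case the unfiltered estimate is more than three times the filtered one, which is the kind of gap a somatic vs. germline classifier is meant to close.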

Non-invasive cuff-less blood pressure estimation using a hybrid deep learning model

Journal: Optical and Quantum Electronics
DOI: 10.1007/s11082-020-02667-0
Year: 2021
Vol 53 (2)
Author(s): Sen Yang, Yaping Zhang, Siu-Yeung Cho, Ricardo Correia, Stephen P. Morgan
Keyword(s): Machine Learning, Blood Pressure, Deep Learning, Absolute Error, Learning Model, Physiological Measurement, British Hypertension Society, Non Invasive, Blood Pressure Estimation, Deep Learning Model

Conventional blood pressure (BP) measurement methods have various drawbacks, such as being invasive, cuff-based, or requiring manual operation. There is significant interest in developing non-invasive, cuff-less, and continual BP measurement based on physiological measurement. However, in these methods, extracting features from signals is challenging in the presence of noise or signal distortion. When using machine learning, errors in feature extraction result in errors in BP estimation; therefore, this study explores the use of raw signals as a direct input to a deep learning model. To enable comparison with traditional machine learning models that use features from the photoplethysmogram and electrocardiogram, a hybrid deep learning model that utilises both raw signals and physical characteristics (age, height, weight and gender) is developed. This hybrid model performs best for both diastolic BP (DBP) and systolic BP (SBP), with mean absolute errors of 3.23 ± 4.75 mmHg and 4.43 ± 6.09 mmHg, respectively. DBP and SBP meet the Grade A and Grade B performance requirements of the British Hypertension Society, respectively.
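
The two evaluation criteria named above can be sketched as follows. This is an illustration, not the authors' evaluation code: the readings are invented, and the grade thresholds are the commonly cited British Hypertension Society cumulative-error criteria (percentage of readings within 5, 10 and 15 mmHg of the reference).

```python
# Minimum percentage of readings whose absolute error must fall within
# 5, 10 and 15 mmHg for each grade (commonly cited BHS criteria).
BHS_GRADES = [("A", (60, 85, 95)), ("B", (50, 75, 90)), ("C", (40, 65, 85))]


def absolute_errors(reference, estimated):
    return [abs(r - e) for r, e in zip(reference, estimated)]


def mean_absolute_error(reference, estimated):
    errors = absolute_errors(reference, estimated)
    return sum(errors) / len(errors)


def bhs_grade(reference, estimated):
    """Return (grade, percentages within 5/10/15 mmHg)."""
    errors = absolute_errors(reference, estimated)
    within = tuple(
        100.0 * sum(err <= limit for err in errors) / len(errors)
        for limit in (5, 10, 15)
    )
    for grade, minimums in BHS_GRADES:
        if all(w >= m for w, m in zip(within, minimums)):
            return grade, within
    return "D", within


# Hypothetical systolic readings in mmHg (reference device vs. model output).
reference = [120, 118, 125, 130, 110, 135, 122, 128, 115, 119]
estimated = [122, 117, 131, 127, 111, 147, 120, 129, 118, 135]

mae = mean_absolute_error(reference, estimated)
grade, within = bhs_grade(reference, estimated)
print(mae, grade, within)
```

A grade is awarded only if all three cumulative percentages meet the grade's minimums, so a few large outliers can lower the grade even when the mean absolute error is small.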

Semantic segmentation of PolSAR image data using advanced deep learning model

Journal: Scientific Reports
DOI: 10.1038/s41598-021-94422-y
Year: 2021
Vol 11 (1)
Author(s): Rajat Garg, Anil Kumar, Nikunj Bansal, Manish Prateek, Shashi Kumar
Keyword(s): Machine Learning, Remote Sensing, Deep Learning, Urban Area, Urban Areas, Learning Algorithms, Semantic Segmentation, Learning Model, Machine Learning Algorithms, Deep Learning Model

Urban area mapping is an important application of remote sensing that aims at estimating land cover, and changes in land cover, within urban areas. A major challenge in analyzing Synthetic Aperture Radar (SAR) based remote sensing data is that highly vegetated urban areas and oriented urban targets closely resemble actual vegetation. This similarity leads to misclassification of urban areas as forest cover. The present work is a precursor study for the dual-frequency L- and S-band NASA-ISRO Synthetic Aperture Radar (NISAR) mission and aims at minimizing the misclassification of such highly vegetated and oriented urban targets into the vegetation class with the help of deep learning. In this study, three machine learning algorithms, Random Forest (RF), K-Nearest Neighbour (KNN), and Support Vector Machine (SVM), have been implemented along with a deep learning model, DeepLabv3+, for semantic segmentation of Polarimetric SAR (PolSAR) data. It is generally perceived that a large dataset is required for the successful implementation of any deep learning model, but in the field of SAR-based remote sensing a major issue is the unavailability of a large labeled benchmark dataset for training deep learning algorithms from scratch. The current work shows that, using transfer learning, a pre-trained DeepLabv3+ model outperforms the machine learning algorithms on the land use and land cover (LULC) classification task even with a small dataset. DeepLabv3+ achieved the highest pixel accuracy of 87.78% and an overall pixel accuracy of 85.65%. Random Forest performs best among the machine learning algorithms with an overall pixel accuracy of 77.91%, while SVM and KNN trail with overall accuracies of 77.01% and 76.47%, respectively. The highest precision, 0.9228, is recorded for the urban class in the semantic segmentation task with DeepLabv3+, while the machine learning algorithms SVM and RF gave comparable results with precisions of 0.8977 and 0.8958, respectively.
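
For readers unfamiliar with the two metrics reported above, the sketch below derives overall pixel accuracy and per-class precision from a confusion matrix. The three class names and all pixel counts are hypothetical and are not taken from the paper.

```python
def overall_pixel_accuracy(confusion):
    """Fraction of pixels on the diagonal (rows: true class, columns: predicted)."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def class_precision(confusion, cls):
    """Of all pixels predicted as `cls`, the fraction that truly belong to it."""
    predicted = sum(row[cls] for row in confusion)
    return confusion[cls][cls] / predicted if predicted else 0.0


# Hypothetical pixel counts for classes 0 = urban, 1 = vegetation, 2 = other.
confusion = [
    [90, 8, 2],    # true urban
    [10, 80, 10],  # true vegetation
    [0, 12, 88],   # true other
]

acc = overall_pixel_accuracy(confusion)
urban_precision = class_precision(confusion, 0)
print(acc, urban_precision)
```

In this toy matrix the ten vegetation pixels predicted as urban are what pull urban precision below 1.0, the mirror image of the urban-as-forest confusion the study sets out to reduce.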

BEHRT-HF: an interpretable transformer-based, deep learning model for prediction of incident heart failure

Journal: European Heart Journal
DOI: 10.1093/ehjci/ehaa946.3553
Year: 2020
Vol 41 (Supplement_2)
Author(s): S Rao, Y Li, R Ramakrishnan, A Hassaine, D Canoy, ...
Keyword(s): Heart Failure, Deep Learning, State Of The Art, Failure Prediction, Predictive Performance, Learning Model, Learning Framework, Incident Heart Failure, Ablation Study, Deep Learning Model

Background/Introduction: Predicting incident heart failure has been challenging. Deep learning models, when applied to rich electronic health records (EHR), offer some theoretical advantages. However, empirical evidence for their superior performance is limited, and they commonly remain uninterpretable, hampering their wider use in medical practice. Purpose: We developed a deep learning framework for more accurate yet interpretable prediction of incident heart failure. Methods: We used longitudinally linked EHR from practices across England, involving 100,071 patients, 13% of whom had been diagnosed with incident heart failure during follow-up. We investigated the predictive performance of a novel transformer deep learning model, “Transformer for Heart Failure” (BEHRT-HF), and validated it using both an external held-out dataset and internal five-fold cross-validation, measuring area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC). Predictor groups included all outpatient and inpatient diagnoses within their temporal context, medications, age, and calendar year for each encounter. Treating diagnoses as anchors, we alternately removed different modalities (ablation study) to understand the importance of individual modalities to the performance of incident heart failure prediction. Using perturbation-based techniques, we investigated the importance of associations between selected predictors and heart failure to improve model interpretability. Results: BEHRT-HF achieved high accuracy, with AUROC 0.932 and AUPRC 0.695 for external validation, and AUROC 0.933 (95% CI: 0.928, 0.938) and AUPRC 0.700 (95% CI: 0.682, 0.718) for internal validation. BEHRT-HF outperformed the state-of-the-art recurrent deep learning model, RETAIN-EX, by 0.079 in AUPRC and 0.030 in AUROC. The ablation study showed that medications were strong predictors and that calendar year was more important than age. Utilising perturbation, we identified and ranked the intensity of associations between diagnoses and heart failure. For instance, the method showed that established risk factors, including myocardial infarction, atrial fibrillation and flutter, and hypertension, were all strongly associated with the heart failure prediction. Additionally, when the population was stratified into age groups, incident occurrence of a given disease generally contributed more to heart failure prediction at younger ages than when diagnosed later in life. Conclusions: Our state-of-the-art deep learning framework outperforms the predictive performance of existing models whilst enabling a data-driven way of exploring the relative contribution of a range of risk factors in the context of other temporal information. Funding Acknowledgement: Type of funding source: Private grant(s) and/or Sponsorship. Main funding source(s): National Institute for Health Research, Oxford Martin School, Oxford Biomedical Research Centre
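
As a reminder of what the AUROC figures measure, the sketch below computes it directly from its ranking interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The labels and scores are invented for illustration and have no connection to the BEHRT-HF data.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    # Each positive/negative pair contributes 1 if ranked correctly, 0.5 if tied.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes (1 = incident heart failure) and model risk scores.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.6, 0.7, 0.2, 0.8]

result = auroc(labels, scores)
print(result)
```

A model that ranks at random scores about 0.5 on this measure regardless of prevalence, whereas the random baseline for AUPRC is roughly the prevalence itself (about 0.13 for the cohort described above), which is why the two numbers are not directly comparable.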

A Hybrid Prognostics Deep Learning Model for Remaining Useful Life Prediction

Journal: Electronics
DOI: 10.3390/electronics10010039
Year: 2020
Vol 10 (1), pp. 39
Author(s): Zhiyuan Xie, Shichang Du, Jun Lv, Yafei Deng, Shiyao Jia
Keyword(s): Machine Learning, Deep Learning, Learning Model, Recurrent Network, Remaining Useful Life, Support Vector, Second Phase, Learning Methods, Useful Life, Deep Learning Model

Remaining Useful Life (RUL) prediction is significant in indicating the health status of sophisticated equipment, and it requires historical data because of its complexity. The number and complexity of environmental parameters such as vibration and temperature can cause non-linear states in the data, making prediction tremendously difficult. Conventional machine learning models such as the support vector machine (SVM), random forest, and back propagation neural network (BPNN), however, have limited capacity to predict accurately. In this paper, a two-phase deep learning model, the attention-convolutional forget-gate recurrent network (AM-ConvFGRNET), is proposed for RUL prediction. The first phase, a forget-gate convolutional recurrent network (ConvFGRNET), is based on a one-dimensional analog of long short-term memory (LSTM) that removes all the gates except the forget gate and uses chrono-initialized biases. The second phase is the attention mechanism, which enables the model to extract more specific features for generating an output, compensating for the black-box nature of the FGRNET and improving interpretability. The performance and effectiveness of AM-ConvFGRNET for RUL prediction are validated by comparing it with other machine learning and deep learning methods on the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) dataset and a dataset from a ball screw experiment.

Deep Learning for the Automated Classification of Functional Brain Networks in fMRI (Preprint)

DOI: 10.2196/preprints.33825
Year: 2021
Author(s): Lukman Ismael, Pejman Rasti, Florian Bernard, Philippe Menei, Aram Ter Minassian, ...
Keyword(s): Machine Learning, Deep Learning, Learning Algorithms, Brain Networks, Learning Model, Machine Learning Algorithms, Functional Networks, Functional Brain, Functional Brain Networks, Deep Learning Model

BACKGROUND: Functional MRI (fMRI) is an essential tool for the presurgical planning of brain tumor removal, allowing the identification of functional brain networks in order to preserve the patient’s neurological functions. One fMRI technique used to identify functional brain networks is resting-state fMRI (rsfMRI). However, this technique is not routinely used because an expert reviewer must manually identify each functional network. OBJECTIVE: We aimed to automate the detection of functional brain networks in rsfMRI data using deep learning and machine learning algorithms. METHODS: We used the rsfMRI data of 82 healthy patients to test the diagnostic performance of our proposed end-to-end deep learning model against the reference functional networks identified manually by 2 expert reviewers. RESULTS: The proposed deep learning architecture achieved the best performance, an 86% correct recognition rate, outperforming the other machine learning algorithms tested on this classification task. CONCLUSIONS: The proposed end-to-end deep learning model was the best-performing machine learning algorithm. Using this model to automate functional network detection in rsfMRI may broaden the use of rsfMRI, allowing the presurgical identification of these networks and thus helping to preserve the patient’s neurological status. CLINICALTRIAL: Comité de protection des personnes Ouest II (decision reference CPP 2012-25)

A Physics-Infused Deep Learning Model for the Prediction of Refractive Indices and Its Use for the Large-Scale Screening of Organic Compound Space

DOI: 10.26434/chemrxiv.8796950
Year: 2019
Author(s): Mojtaba Haghighatlari, Gaurav Vishwakarma, Mohammad Atif Faiz Afzal, Johannes Hachmann
Keyword(s): Machine Learning, Deep Learning, Large Scale, Organic Molecules, Learning Model, Training Data, Refractive Indices, Learning Models, Deep Learning Model, Machine Learning Models

We present a multitask, physics-infused deep learning model to accurately and efficiently predict refractive indices (RIs) of organic molecules, and we apply it to a library of 1.5 million compounds. We show that it outperforms earlier machine learning models by a significant margin, and that incorporating known physics into data-derived models provides valuable guardrails. Using a transfer learning approach, we augment the model to reproduce results consistent with higher-level computational chemistry training data, but with a considerably reduced number of corresponding calculations. Prediction errors of machine learning models are typically smallest for commonly observed target property values, consistent with the distribution of the training data. However, since our goal is to identify candidates with unusually large RI values, we propose a strategy to boost the performance of our model in the remoter areas of the RI distribution: we bias the model with respect to the under-represented classes of molecules that have values in the high-RI regime. By adopting a metric popular in web search engines, we evaluate our effectiveness in ranking top candidates. We confirm that the models developed in this study can reliably predict the RIs of the top 1,000 compounds, and are thus able to capture their ranking. We believe that this is the first study to develop a data-derived model that ensures the reliability of RI predictions by model augmentation in the extrapolation region on such a large scale. These results underscore the tremendous potential of machine learning in facilitating molecular (hyper)screening approaches on a massive scale and in accelerating the discovery of new compounds and materials, such as organic molecules with high RI for applications in opto-electronics.

Case Study of Deep Learning Model of Temperature-Induced Deflection of a Cable-Stayed Bridge Driven by Data Knowledge

Journal: Symmetry
DOI: 10.3390/sym13122293
Year: 2021
Vol 13 (12), pp. 2293
Author(s): Zixiang Yue, Youliang Ding, Hanwei Zhao, Zhiwen Wang
Keyword(s): Neural Network, Machine Learning, Deep Learning, Prior Knowledge, Learning Model, Bridge Management, Learning Network, Cable Stayed Bridge, Main Girder, Deep Learning Model

A cable-stayed bridge is a typical symmetrical structure, and symmetry affects the deformation characteristics of such bridges. The main girder of a cable-stayed bridge deflects noticeably under temperature changes. A regression model of temperature-induced deflection can provide a comparison value for bridge evaluation, so it is worthwhile to establish a correlation model between temperature and temperature-induced deflection from the temperature and deflection data collected by a bridge's health monitoring system. It is difficult to build a high-quality model from the girder temperature alone. In this paper, temperature features based on prior knowledge of the mechanical mechanism are used as the input information. At the same time, to strengthen the nonlinear modeling capability, this paper selects an independent recurrent neural network (IndRNN) for modeling. The deep learning neural network is compared with machine learning neural networks to demonstrate the advantage of deep learning. When only the average temperature of the main girder is input, the calculation accuracy is not high regardless of whether the deep learning network or the machine learning network is used. When the temperature information extracted using prior knowledge is input, the average error of the IndRNN model is only 2.53%, less than those of the BPNN model and the traditional RNN. Of the schemes compared, combining prior knowledge with deep learning gave the best model. The deep learning model can provide a comparison value of bridge deformation for bridge management.
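
The abstract does not spell out the IndRNN update, so the following is a sketch of the rule as it is commonly formulated, h_t = ReLU(W x_t + u ⊙ h_{t-1} + b), in which each hidden unit has its own scalar recurrent weight instead of a full recurrent matrix. The ReLU activation, the weights, and the two input temperature features are assumptions for illustration, not values from the paper.

```python
def indrnn_step(x, h_prev, W, u, b):
    """One IndRNN time step: unit i only sees its own previous state h_prev[i]."""
    h_new = []
    for i in range(len(h_prev)):
        input_term = sum(W[i][j] * x[j] for j in range(len(x)))
        recurrent_term = u[i] * h_prev[i]  # element-wise, no cross-unit mixing
        h_new.append(max(0.0, input_term + recurrent_term + b[i]))  # ReLU
    return h_new


# Hypothetical inputs at one time step, e.g. two temperature features.
x = [1.0, 2.0]
h_prev = [2.0, 1.0]

# Hypothetical parameters for a layer with two hidden units.
W = [[0.5, 0.0], [0.0, -1.0]]  # input weights
u = [0.5, 2.0]                 # one recurrent weight per hidden unit
b = [0.0, 1.0]                 # biases

h = indrnn_step(x, h_prev, W, u, b)
print(h)
```

Because the recurrent connection is element-wise, the behaviour of each unit over time depends on a single scalar `u[i]`, which is what distinguishes this update from a traditional RNN step with a full hidden-to-hidden matrix.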

Comparison between Deep Learning and Tree-Based Machine Learning Approaches for Landslide Susceptibility Mapping

Journal: Water
DOI: 10.3390/w13192664
Year: 2021
Vol 13 (19), pp. 2664
Author(s): Sunil Saha, Jagabandhu Roy, Tusar Kanti Hembram, Biswajeet Pradhan, Abhirup Dikshit, ...
Keyword(s): Neural Network, Machine Learning, Deep Learning, Landslide Susceptibility, Learning Model, Susceptibility Mapping, Landslide Susceptibility Mapping, Learning Approaches, Statistical Measures, Deep Learning Model

Deep learning and tree-based machine learning approaches have gained immense popularity in various fields owing to their efficiency. A deep learning model, the convolutional neural network (CNN), an artificial neural network (ANN), and four tree-based machine learning models, namely the alternating decision tree (ADTree), classification and regression tree (CART), functional tree (FTree) and logistic model tree (LMT), were used for landslide susceptibility mapping in the East Sikkim Himalaya region of India, and the results were compared. Landslide areas were delimited and mapped as a landslide inventory (LIM) after gathering information from historical records and periodic field investigations. In the LIM, 91 landslides were plotted and randomly divided into training (64 landslides) and testing (27 landslides) subsets to train and validate the models. A total of 21 landslide conditioning factors (LCFs) were considered as model inputs, and the results of each model were categorised into five susceptibility classes. The receiver operating characteristic curve and 21 statistical measures were used to evaluate and prioritise the models. The CNN deep learning model achieved priority rank 1, with areas under the curve of 0.918 and 0.933 on the training and testing data, respectively, and classified 23.02% and 14.40% of the area as very highly and highly susceptible, respectively; it was followed by the ANN, ADTree, CART, FTree and LMT models. This research might be useful in landslide studies, especially in locations with comparable geophysical and climatological characteristics, to aid decision making for land use planning.
