Prediction of specific TCR-peptide binding from large dictionaries of TCR-peptide pairs

2019 ◽  
Author(s):  
Ido Springer ◽  
Hanan Besser ◽  
Nili Tickotsky-Moskovitz ◽  
Shirit Dvorkin ◽  
Yoram Louzoun

Current sequencing methods allow for detailed samples of T cell receptor (TCR) repertoires. To determine from a repertoire whether its host has been exposed to a target, computational tools that predict TCR-epitope binding are required. Current tools are based on conserved motifs and are applied to peptides with many known binding TCRs. Given any TCR and peptide, we employ new NLP-based methods to predict whether they bind. We combined large-scale TCR-peptide dictionaries with deep learning methods to produce ERGO (pEptide tcR matchinG predictiOn), a highly specific and generic TCR-peptide binding predictor. A set of standard tests is defined for the performance of peptide-TCR binding prediction, including detecting TCRs that bind to a given peptide/antigen, choosing among a set of candidate peptides for a given TCR, and determining whether any given TCR-peptide pair binds. ERGO significantly outperforms current methods in these tests even when not trained specifically for each test. The software implementation and data sets are available at https://github.com/louzounlab/ERGO
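As a rough illustration of what treating TCR and peptide sequences with NLP-style methods can involve, the sketch below turns each amino acid sequence into a fixed-length list of integer tokens, the usual first step before an embedding layer. It is not ERGO's actual preprocessing: the names `encode` and `encode_pair`, the padding lengths, and the example CDR3 string are all assumptions made for illustration.

```python
# Hypothetical sketch: integer-encode a TCR CDR3 sequence and a peptide so a
# sequence model could consume them. This is NOT taken from the ERGO code base.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
PAD = 0  # index reserved for padding
AA_TO_INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def encode(sequence, max_len):
    """Map a sequence to integer indices, right-padded with PAD to max_len."""
    if len(sequence) > max_len:
        raise ValueError("sequence longer than max_len")
    indices = [AA_TO_INDEX[aa] for aa in sequence]
    return indices + [PAD] * (max_len - len(indices))


def encode_pair(tcr, peptide, tcr_len=20, peptide_len=12):
    """Encode one (TCR, peptide) pair as two fixed-length index lists."""
    return encode(tcr, tcr_len), encode(peptide, peptide_len)


# Illustrative inputs only: an assumed CDR3-like string and a 9-residue peptide.
tcr_ids, peptide_ids = encode_pair("CASSLGQAYEQYF", "GILGFVFTL")
```

A binding predictor would then map the two encoded sequences to a single score; that model is outside the scope of this sketch.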

2021 ◽  
Author(s):  
Ronghui You ◽  
Wei Qu ◽  
Hiroshi Mamitsuka ◽  
Shanfeng Zhu

Computationally predicting MHC-peptide binding affinity is an important problem in immunological bioinformatics. Recent cutting-edge deep learning-based methods for this problem are unable to achieve satisfactory performance for MHC class II molecules. This is because such methods generate the input by simply concatenating the two given sequences, (the estimated binding core of) a peptide and (the pseudo sequence of) an MHC class II molecule, ignoring the biological knowledge behind the interactions of the two molecules. We thus propose a binding core-aware deep learning-based model, DeepMHCII, with a binding interaction convolution layer (BICL), which allows integrating all potential binding cores (in a given peptide) and the MHC pseudo (binding) sequence by modeling the interaction with multiple convolutional kernels. Extensive empirical experiments with four large-scale datasets demonstrate that DeepMHCII significantly outperformed four state-of-the-art methods under numerous settings, such as five-fold cross-validation, leave one molecule out, validation with independent testing sets, and binding core prediction. All these results, together with visualization of the predicted binding cores, indicate the effectiveness and importance of properly modeling biological facts in deep learning for high performance and knowledge discovery. DeepMHCII is publicly available at https://weilab.sjtu.edu.cn/DeepMHCII/.
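To make "all potential binding cores (in a given peptide)" concrete, the sketch below enumerates every contiguous window of a peptide as a candidate core. It assumes the conventional 9-residue core length for MHC class II, which the abstract does not state, and the function name `candidate_cores` is hypothetical. It does not reproduce the BICL itself, only the set of windows such a layer would have to consider.

```python
# Hypothetical sketch: list every candidate binding core in a peptide.
CORE_LENGTH = 9  # assumed: the conventional MHC class II core length


def candidate_cores(peptide, core_length=CORE_LENGTH):
    """Return (offset, core) for every contiguous window of core_length."""
    return [
        (start, peptide[start:start + core_length])
        for start in range(len(peptide) - core_length + 1)
    ]


# A 13-residue example peptide yields 13 - 9 + 1 = 5 candidate cores.
cores = candidate_cores("PKYVKQNTLKLAT")
```

A peptide shorter than the core length produces an empty list, so callers may want to handle that case explicitly.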


2019 ◽  
Vol 277 ◽  
pp. 02007
Author(s):  
Qingzhi Zhang ◽  
Panfeng Wu ◽  
Xiaohui Du ◽  
Hualiang Sun ◽  
Lijia Yu

With the extensive application of deep learning in the field of human rehabilitation, skeleton-based rehabilitation recognition is receiving more and more attention, together with large-scale skeleton data sets. The key factors of this task are two representations, the intra-frame and the inter-frame, which are combined. In this paper, an inter-frame representation method based on RNN is proposed. The position of each joint is coded, and the joints are then assembled into a semantic representation in both spatial and temporal domains. We introduce a global spatial aggregation, which is able to learn superior joint co-occurrence features over local aggregation.


2017 ◽  
Author(s):  
Christoph Sommer ◽  
Rudolf Hoefler ◽  
Matthias Samwer ◽  
Daniel W. Gerlich

Supervised machine learning is a powerful and widely used method to analyze high-content screening data. Despite its accuracy, efficiency, and versatility, supervised machine learning has drawbacks, most notably its dependence on a priori knowledge of expected phenotypes and time-consuming classifier training. We provide a solution to these limitations with CellCognition Explorer, a generic novelty detection and deep learning framework. Application to several large-scale screening data sets on nuclear and mitotic cell morphologies demonstrates that CellCognition Explorer enables discovery of rare phenotypes without user training, which has broad implications for improved assay development in high-content screening.


2021 ◽  
Vol 11 (4) ◽  
pp. 1529
Author(s):  
Xiaohong Sun ◽  
Jinan Gu ◽  
Meimei Wang ◽  
Yanhua Meng ◽  
Huichao Shi

In the wheel hub industry, quality control of the product surface determines the subsequent processing, and it can be realized through hub defect image recognition based on deep learning. Although existing deep learning methods have reached human-level performance, they rely on large-scale training sets and are completely unable to cope with situations in which no samples are available. Therefore, in this paper, a generalized zero-shot learning framework for hub defect image recognition was built. First, a reverse mapping strategy was adopted to reduce the hubness problem; then a domain adaptation measure was employed to alleviate the projection domain shift problem; and finally, a scaling calibration strategy was used to avoid the recognition preference for seen defects. The proposed model was validated using two data sets, VOC2007 and a self-built hub defect data set, and the results showed that the method performed better than the current popular methods.
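The bias toward seen classes mentioned above can be illustrated with a small sketch. The abstract does not specify how its scaling calibration works, so the version below is an assumption: it multiplies the scores of seen classes by a factor below 1 before taking the arg-max. The function name, the factor, and the defect labels are hypothetical.

```python
# Hypothetical sketch: down-weight seen-class scores so that unseen classes
# are not systematically out-scored in generalized zero-shot prediction.
def calibrated_predict(scores, seen_classes, seen_scale):
    """Return the best label after scaling seen-class scores by seen_scale."""
    adjusted = {
        label: (score * seen_scale if label in seen_classes else score)
        for label, score in scores.items()
    }
    return max(adjusted, key=adjusted.get)


# Illustrative compatibility scores for one image (made-up values).
scores = {"scratch": 0.55, "pit": 0.50}
seen = {"scratch"}

uncalibrated = calibrated_predict(scores, seen, seen_scale=1.0)
calibrated = calibrated_predict(scores, seen, seen_scale=0.8)
```

With no scaling the seen class wins by a narrow margin; with a factor of 0.8 the unseen class is chosen instead. In practice the factor would be tuned on held-out data.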


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Douwe van der Wal ◽  
Iny Jhun ◽  
Israa Laklouk ◽  
Jeff Nirschl ◽  
Lara Richer ◽  
...  

Biology has become a prime area for the deployment of deep learning and artificial intelligence (AI), enabled largely by the massive data sets that the field can generate. Key to most AI tasks is the availability of a sufficiently large, labeled data set with which to train AI models. In the context of microscopy, it is easy to generate image data sets containing millions of cells and structures. However, it is challenging to obtain large-scale high-quality annotations for AI models. Here, we present HALS (Human-Augmenting Labeling System), a human-in-the-loop data labeling AI, which begins uninitialized and learns annotations from a human in real time. Using a multi-part AI composed of three deep learning models, HALS learns from just a few examples and immediately decreases the workload of the annotator, while increasing the quality of their annotations. Using a highly repetitive use case (annotating cell types) and running experiments with seven pathologists (experts at the microscopic analysis of biological specimens), we demonstrate a manual work reduction of 90.60% and an average data-quality boost of 4.34%, measured across four use cases and two tissue stain types.


2017 ◽  
Vol 28 (23) ◽  
pp. 3428-3436 ◽  
Author(s):  
Christoph Sommer ◽  
Rudolf Hoefler ◽  
Matthias Samwer ◽  
Daniel W. Gerlich

Supervised machine learning is a powerful and widely used method for analyzing high-content screening data. Despite its accuracy, efficiency, and versatility, supervised machine learning has drawbacks, most notably its dependence on a priori knowledge of expected phenotypes and time-consuming classifier training. We provide a solution to these limitations with CellCognition Explorer, a generic novelty detection and deep learning framework. Application to several large-scale screening data sets on nuclear and mitotic cell morphologies demonstrates that CellCognition Explorer enables discovery of rare phenotypes without user training, which has broad implications for improved assay development in high-content screening.


GigaScience ◽  
2020 ◽  
Vol 9 (8) ◽  
Author(s):  
Yeping Lina Qiu ◽  
Hong Zheng ◽  
Olivier Gevaert

Background: As missing values are frequently present in genomic data, practical methods to handle missing data are necessary for downstream analyses that require complete data sets. State-of-the-art imputation techniques, including methods based on singular value decomposition and K-nearest neighbors, can be computationally expensive for large data sets, and it is difficult to modify these algorithms to handle certain cases not missing at random. Results: In this work, we use a deep-learning framework based on the variational auto-encoder (VAE) for genomic missing value imputation and demonstrate its effectiveness in transcriptome and methylome data analysis. We show that in the vast majority of our testing scenarios, VAE achieves similar or better performances than the most widely used imputation standards, while having a computational advantage at evaluation time. When dealing with data missing not at random (e.g., few values are missing), we develop simple yet effective methodologies to leverage the prior knowledge about missing data. Furthermore, we investigate the effect of varying latent space regularization strength in VAE on the imputation performances and, in this context, show why VAE has a better imputation capacity compared to a regular deterministic auto-encoder. Conclusions: We describe a deep learning imputation framework for transcriptome and methylome data using a VAE and show that it can be a preferable alternative to traditional methods for data imputation, especially in the setting of large-scale data and certain missing-not-at-random scenarios.
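One common way to compare imputation methods is to hide known values, impute them, and score the error on the hidden positions only. The sketch below shows that evaluation loop under stated assumptions: it uses column-mean imputation as a stand-in baseline rather than the VAE described above, it scores with RMSE (a metric the abstract does not name), and the function names and the tiny matrix are made up for illustration.

```python
import math


def column_mean_impute(rows):
    """Replace None entries with the mean of the observed values in that column.

    Assumes every column has at least one observed value.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return [
        [means[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]


def masked_rmse(truth, imputed, mask):
    """RMSE computed only over the (row, column) positions listed in mask."""
    squared = [(truth[i][j] - imputed[i][j]) ** 2 for i, j in mask]
    return math.sqrt(sum(squared) / len(squared))


# Made-up complete matrix; two entries are hidden to simulate missingness.
truth = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
mask = [(0, 0), (2, 1)]
corrupted = [row[:] for row in truth]
for i, j in mask:
    corrupted[i][j] = None

imputed = column_mean_impute(corrupted)
error = masked_rmse(truth, imputed, mask)
```

Any other imputer, including a trained model, could be swapped in for `column_mean_impute` and scored with the same `masked_rmse` call.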


2021 ◽  
Author(s):  
Xian Xian Liu ◽  
Gloria Li ◽  
Wei Lou ◽  
Juntao Gao ◽  
Simon Fong

[Background]: An emerging type of cancer treatment, known as cell immunotherapy, is gaining popularity over chemotherapy and radiation therapy, which cause widespread damage to the body. One favourable approach in cell immunotherapy is the use of neoantigens as targets that help the body's immune system distinguish cancer cells from healthy cells. Neoantigens, which are non-autologous proteins with individual specificity, are generated by non-synonymous mutations in the tumor cell genome. Owing to their strong immunogenicity and lack of expression in normal tissues, they are now an important target for tumor immunotherapy. Neoantigens are special protein fragments that appear on the surface of cancer cells as a by-product of DNA mutation in the tumour. In cancer immunotherapies, certain neoantigens that exist only on cancer cells elicit responses from white blood cells (the body's defenders, anti-cancer T cells) that fight the cancer cells while leaving healthy cells alone. Personalized cancer vaccines can therefore be designed de novo for each individual patient once the specific neoantigens relevant to his or her tumour are found. The vaccine, which is usually coded in synthetic long peptides, RNA or DNA representing the neoantigens, triggers an immune response in the body to destroy the cancer cells (tumour). The specific neoantigens can be found by a complex process of biopsy and genome sequencing. Alternatively, modern technologies use AI to predict the right neoantigen candidates algorithmically. However, determining the binding and non-binding of neoantigens to T-cell receptors (TCR) is a challenging computational task because of its very large search space.
[Objective]: To enhance the efficiency and accuracy of traditional deep learning tools that serve the same purpose of finding potential responsiveness to immunotherapy through correctly predicted neoantigens. Deep learning is known to be able to explore which novel neoantigens bind to T-cell receptors and which do not. The exploration may be technically expensive and time-consuming, since deep learning is an inherently computational method. One can use putative neoantigen peptide sequences to guide personalized cancer vaccine design.
[Methods]: These models all proceed through complex feature engineering, including feature extraction, dimension reduction and so on. In this study, we derived 4 features to facilitate prediction and classification of 4 HLA-peptide binding, namely AAC and DC from the global sequence, and LAAC and LDC from the local sequence information. Based on the patterns of sequence formation, a nested structure of bidirectional long short-term memory neural network, called the local information module, is used to extract context-based features around every residue. Another BiLSTM network layer, called the global information module, is introduced above the local information module layer to integrate the context-based features of all residues in the same HLA-peptide binding chain, thereby involving inter-residue relationships in the training process.
[Results]: Finally, a more effective model is obtained by fusing the above two modules and the 4-feature matrix. The method performs significantly better than previous prediction schemes: the overall r-square increased to 0.0125 and 0.1064 on the training datasets and to 0.0782 and 0.2926 on the test datasets. The RMSE of our proposed models decreased to approximately 0.0745 and 1.1034, respectively, on the training datasets, and to 0.6712 and 1.6506 on the test datasets.
[Conclusion]: Our work refines a machine-learning model to improve neoantigen identification and prediction using the determinants of neoantigen identification. The final experimental results show that our method is more effective than existing methods for predicting peptide types, which can help laboratory researchers identify the type of novel HLA-peptide binding.
Keywords: machine learning; Cancer Cell Immunology; HLA-peptide binding Neoantigen Prediction; HLA; Data Visualization; Novel Neoantigen and TCR Pairing Discovery; Vector representation
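The abstract above names AAC and DC as global sequence features without expanding the abbreviations. Assuming they stand for amino acid composition and dipeptide composition, as is common in sequence feature engineering, a minimal sketch of both follows. The function names and the example peptide are illustrative, and the local variants (LAAC, LDC) are not attempted because the abstract does not define them.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
# All 400 ordered residue pairs, in a fixed order.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(sequence):
    """Assumed meaning of AAC: fraction of each amino acid (20 values)."""
    return [sequence.count(aa) / len(sequence) for aa in AMINO_ACIDS]


def dipeptide_composition(sequence):
    """Assumed meaning of DC: fraction of each adjacent pair (400 values)."""
    pairs = [sequence[i:i + 2] for i in range(len(sequence) - 1)]
    return [pairs.count(dipeptide) / len(pairs) for dipeptide in DIPEPTIDES]


# Illustrative 9-residue peptide.
aac = amino_acid_composition("GILGFVFTL")
dc = dipeptide_composition("GILGFVFTL")
```

Both vectors sum to 1 and have a fixed length regardless of peptide length, which is what makes them convenient inputs alongside sequence models. `dipeptide_composition` needs a sequence of at least two residues.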


2021 ◽  
Author(s):  
Jun Liu ◽  
Feng Deng ◽  
Geng Yuan ◽  
Xue Lin ◽  
Houbing Song ◽  
...  

Recently, model interpretability has become a hot topic in deep learning research. In medical imaging in particular, the requirements for safety are extremely high, and it is very important for a model to be able to explain its output. However, the existing solutions for left ventricular segmentation by convolutional neural networks are black boxes, explainable CNNs remain a challenge, and explaining deep learning models is a task often overlooked in the data science lifecycle by data scientists and deep learning engineers. Because medical imaging data are very limited, most current solutions use transfer learning, fine-tuning models that were trained on large-scale benchmark data sets (such as ImageNet) for medical imaging. Consequently, a large number of useless parameters are generated, which creates a further barrier to the model providing a convincing explanation. This paper presents a novel method to automatically segment the left ventricle in cardiac MRI using explainable convolutional neural networks whose size and parameters are optimized by our enhanced Deep Learning GPU Training System (DIGITS). The resulting model is very suitable for deployment on mobile devices. We simplify deep learning tasks on DIGITS systems, monitor performance, and display the heat map of each layer of the network with advanced visualizations in real time. Our experimental results demonstrate that the proposed method is feasible and efficient.


2018 ◽  
Author(s):  
Gokmen Altay

In this study, we first present a TensorFlow-based Deep Learning (DL) model that provides high performance in predicting the binding of peptides to major histocompatibility complex (MHC) class I protein. Second, we provide the Python code necessary to run the model and to easily input large train and test peptide binding benchmark datasets. Third, we provide a Snakemake-based workflow that runs the model and the performance analysis over all the different test alleles at once, in parallel, on computers and clusters. We also provide a comparison of the performance of various models. Finally, to help attain the best possible DL model through a community effort, this work is intended to be a ready-to-modify base model and workflow for the global Deep Learning community with no domain knowledge of the MHC-peptide binding problem, and it thus provides all the necessary reference code templates and benchmarking data sets for further development of the presented model architecture. All the reproducible Python code, the Snakemake workflow, the benchmark data sets and a tutorial are available online at https://github.com/altayg/Deep-Learning-MHCI.

