On generative models of T-cell receptor sequences

2019 ◽  
Author(s):  
Giulio Isacchini ◽  
Zachary Sethna ◽  
Yuval Elhanati ◽  
Armita Nourmohammad ◽  
Aleksandra M. Walczak ◽  
...  

T-cell receptors (TCR) are key proteins of the adaptive immune system, generated randomly in each individual, whose diversity underlies our ability to recognize infections and malignancies. Modeling the distribution of TCR sequences is of key importance for immunology and medical applications. Here, we compare two inference methods trained on high-throughput sequencing data: a knowledge-guided approach, which accounts for the details of sequence generation, supplemented by a physics-inspired model of selection; and a knowledge-free Variational Auto-Encoder based on deep artificial neural networks. We show that the knowledge-guided model outperforms the deep network approach at predicting TCR probabilities, while being more interpretable, at a lower computational cost.

2020 ◽  
Vol 2 (3) ◽  
Author(s):  
Joshua D Podlevsky ◽  
Corey M Hudson ◽  
Jerilyn A Timlin ◽  
Kelly P Williams

CRISPR arrays and CRISPR-associated (Cas) proteins comprise a widespread adaptive immune system in bacteria and archaea. These systems function as a defense against exogenous parasitic mobile genetic elements that include bacteriophages, plasmids and foreign nucleic acids. With the continuous spread of antibiotic resistance, knowledge of pathogen susceptibility to bacteriophage therapy is becoming more critical. Additionally, gene-editing applications would benefit from the discovery of new cas genes with favorable properties. While next-generation sequencing has produced staggering quantities of data, transitioning from raw sequencing reads to the identification of CRISPR/Cas systems has remained challenging. This is especially true for metagenomic data, which has the highest potential for identifying novel cas genes. We report a comprehensive computational pipeline, CasCollect, for the targeted assembly and annotation of cas genes and CRISPR arrays (even isolated arrays) from raw sequencing reads. Benchmarking shows that our targeted assembly pipeline runs almost two orders of magnitude faster than conventional assembly and annotation, while retaining the ability to detect CRISPR arrays and cas genes. CasCollect is a highly versatile pipeline and can be used for targeted assembly of any specialty gene set, reconfigurable for user-provided Hidden Markov Models and/or reference nucleotide sequences.


2016 ◽  
Vol 32 (20) ◽  
pp. 3098-3106 ◽  
Author(s):  
Bram Gerritsen ◽  
Aridaman Pandit ◽  
Arno C. Andeweg ◽  
Rob J. de Boer

2016 ◽  
Author(s):  
Thierry Mora ◽  
Aleksandra M. Walczak

To recognize pathogens, B and T lymphocytes are endowed with a wide repertoire of receptors generated stochastically by V(D)J recombination. Measuring and estimating the diversity of these receptors is of great importance for understanding adaptive immunity. In this chapter we review recent modeling approaches for analyzing receptor diversity from high-throughput sequencing data. We first clarify the various existing notions of diversity, with its many competing mathematical indices, and the different biological levels at which it can be evaluated. We then describe inference methods for characterizing the statistical diversity of receptors at different stages of their history: generation, selection and somatic evolution. We discuss the intrinsic difficulty of estimating the diversity of receptors realized in a given individual from incomplete samples. Finally, we emphasize the limitations of diversity defined at the level of receptor sequences, and advocate the more relevant notion of functional diversity relative to the set of recognized antigens.
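As an illustration of the "competing mathematical indices" mentioned in this abstract, the following is a minimal sketch of a few standard diversity measures (richness, Shannon entropy, Simpson index, and the corresponding Hill numbers) computed from a sample of clonotype labels. The function name `diversity_indices` and the toy input are assumptions made for this example; the chapter's own estimators, in particular any correction for incomplete sampling, are not reproduced here.

```python
import math
from collections import Counter


def diversity_indices(clonotypes):
    """Plug-in diversity indices for a non-empty list of clonotype labels.

    Hypothetical helper for illustration only: it uses raw sample
    frequencies and does not correct for unseen clonotypes.
    """
    counts = Counter(clonotypes)
    total = sum(counts.values())
    freqs = [c / total for c in counts.values()]

    shannon = -sum(p * math.log(p) for p in freqs)  # entropy in nats
    simpson = sum(p * p for p in freqs)  # probability two draws coincide

    return {
        "richness": len(freqs),        # number of distinct clonotypes
        "shannon_entropy": shannon,
        "simpson_index": simpson,
        "hill_1": math.exp(shannon),   # effective number of clonotypes, order 1
        "hill_2": 1.0 / simpson,       # effective number of clonotypes, order 2
    }


if __name__ == "__main__":
    # Toy sample: the labels stand in for receptor sequences.
    sample = ["CASSL", "CASSL", "CASSF", "CAWSV"]
    for name, value in diversity_indices(sample).items():
        print(f"{name}: {value:.4f}")
```

For a perfectly even sample all three effective numbers agree with the richness; they diverge as the clone-size distribution becomes more skewed, which is one reason different indices can rank the same repertoires differently.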


Author(s):  
Pieter Meysman ◽  
Anna Postovskaya ◽  
Nicolas De Neuter ◽  
Benson Ogunjimi ◽  
Kris Laukens

Much is still not understood about the human adaptive immune response to SARS-CoV-2, the causative agent of COVID-19. In this paper, we demonstrate the use of machine learning to classify SARS-CoV-2 epitope-specific T-cell clonotypes in T-cell receptor (TCR) sequencing data. We apply these models to public TCR data and show how they can be used to study T-cell longitudinal profiles in COVID-19 patients to characterize how the adaptive immune system reacts to the SARS-CoV-2 virus. Our findings confirm prior knowledge that SARS-CoV-2 reactive T-cell diversity increases over the course of disease progression. However, our results show a difference between T cells that react to epitopes unique to SARS-CoV-2, which show a more prominent increase, and T cells that react to epitopes common to other coronaviruses, which begin at a higher baseline.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
John-William Sidhom ◽  
H. Benjamin Larman ◽  
Drew M. Pardoll ◽  
Alexander S. Baras

Deep learning algorithms have been utilized to achieve enhanced performance in pattern-recognition tasks. The ability to learn complex patterns in data has tremendous implications in immunogenomics. T-cell receptor (TCR) sequencing assesses the diversity of the adaptive immune system and allows for modeling its sequence determinants of antigenicity. We present DeepTCR, a suite of unsupervised and supervised deep learning methods able to model highly complex TCR sequencing data by learning a joint representation of a TCR by its CDR3 sequences and V/D/J gene usage. We demonstrate the utility of deep learning to provide an improved ‘featurization’ of the TCR across multiple human and murine datasets, including improved classification of antigen-specific TCRs and extraction of antigen-specific TCRs from noisy single-cell RNA-Seq and T-cell culture-based assays. Our results highlight the flexibility and capacity for deep neural networks to extract meaningful information from complex immunogenomic data for both descriptive and predictive purposes.


2018 ◽  
Author(s):  
John-William Sidhom ◽  
H. Benjamin Larman ◽  
Petra Ross-MacDonald ◽  
Megan Wind-Rotolo ◽  
Drew M. Pardoll ◽  
...  

Deep learning algorithms have been utilized to achieve enhanced performance in pattern-recognition tasks, such as in image and vocal recognition [1,2]. The ability to learn complex patterns in data has tremendous implications in the genomics and immunology worlds, where sequence motifs become learned ‘features’ that can be used to predict functionality, guiding our understanding of disease and basic biology [3–6]. T-cell receptor (TCR) sequencing assesses the diversity of the adaptive immune system, where complex structural patterns in the TCR can be used to model its antigenic interaction. We present DeepTCR, a broad collection of unsupervised and supervised deep learning methods able to uncover structure in highly complex and large TCR sequencing data by learning a joint representation of a given TCR by its CDR3 sequences, V/D/J gene usage, and the HLA background in which the T cells reside. We demonstrate the utility of deep learning to provide an improved ‘featurization’ of the TCR across multiple human and murine datasets, including improved classification of antigen-specific TCRs in both unsupervised and supervised learning tasks, understanding immunotherapy-related shaping of the repertoire in the murine setting, and predicting response to checkpoint blockade immunotherapy from pre-treatment tumor biopsies in a clinical trial of melanoma. Our results show the flexibility and capacity for deep neural networks to handle the complexity of high-dimensional TCR genomic data for both descriptive and predictive purposes across basic science and clinical research.


2021 ◽  
Vol 12 ◽  
Author(s):  
William D. Chronister ◽  
Austin Crinklaw ◽  
Swapnil Mahajan ◽  
Randi Vita ◽  
Zeynep Koşaloğlu-Yalçın ◽  
...  

The adaptive immune system in vertebrates has evolved to recognize non-self antigens, such as proteins expressed by infectious agents and mutated cancer cells. T cells play an important role in antigen recognition by expressing a diverse repertoire of antigen-specific receptors, which bind epitopes to mount targeted immune responses. Recent advances in high-throughput sequencing have enabled the routine generation of T-cell receptor (TCR) repertoire data. Identifying the specific epitopes targeted by different TCRs in these data would be valuable. To accomplish that, we took advantage of the ever-increasing number of TCRs with known epitope specificity curated in the Immune Epitope Database (IEDB) since 2004. We compared seven metrics of sequence similarity to determine their power to predict if two TCRs have the same epitope specificity. We found that a comprehensive k-mer matching approach produced the best results, which we have implemented into TCRMatch, an openly accessible tool (http://tools.iedb.org/tcrmatch/) that takes TCR β-chain CDR3 sequences as an input, identifies TCRs with a match in the IEDB, and reports the specificity of each match. We anticipate that this tool will provide new insights into T cell responses captured in receptor repertoire and single cell sequencing experiments and will facilitate the development of new strategies for monitoring and treatment of infectious, allergic, and autoimmune diseases, as well as cancer.
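To make the idea of scoring two CDR3 sequences by their shared k-mers concrete, here is a deliberately simplified sketch. It counts exactly matching k-mers of every length and normalizes so that identical sequences score 1. This is an assumption-laden illustration, not TCRMatch's scoring: the abstract does not specify the metric's details, and the function names (`kmer_counts`, `shared_kmers`, `kmer_similarity`) and the example sequences are hypothetical.

```python
import math
from collections import Counter


def kmer_counts(seq, k_max=None):
    """Count every substring of length 1..k_max (default: the full length)."""
    k_max = len(seq) if k_max is None else min(k_max, len(seq))
    return Counter(
        seq[i:i + k]
        for k in range(1, k_max + 1)
        for i in range(len(seq) - k + 1)
    )


def shared_kmers(a, b, k_max=None):
    """Number of k-mers shared by a and b, counted with multiplicity."""
    common = kmer_counts(a, k_max) & kmer_counts(b, k_max)  # multiset intersection
    return sum(common.values())


def kmer_similarity(a, b, k_max=None):
    """Shared k-mer count normalized to [0, 1]; identical sequences give 1.0.

    Simplification: only exact k-mer matches count, so similar but
    non-identical amino acids contribute nothing.
    """
    denom = math.sqrt(shared_kmers(a, a, k_max) * shared_kmers(b, b, k_max))
    return shared_kmers(a, b, k_max) / denom if denom else 0.0


if __name__ == "__main__":
    # Made-up CDR3-like strings, used only to exercise the functions.
    query = "CASSLGQAYEQYF"
    for candidate in ["CASSLGQAYEQYF", "CASSLGQSYEQYF", "CAWSVGGNTEAFF"]:
        print(candidate, round(kmer_similarity(query, candidate), 3))
```

In a matching workflow of the kind described, a query sequence would be compared against every curated receptor and the hits above a chosen threshold reported along with their known epitopes; the threshold itself would have to be calibrated on labeled data.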


2020 ◽  
Author(s):  
William D Chronister ◽  
Austin Crinklaw ◽  
Swapnil Mahajan ◽  
Randi Vita ◽  
Zeynep Kosaloglu-Yalcin ◽  
...  

The adaptive immune system in vertebrates has evolved to recognize non-self antigens, such as proteins expressed by infectious agents and mutated cancer cells. T cells play an important role in antigen recognition by expressing a diverse repertoire of antigen-specific receptors, which bind epitopes to mount targeted immune responses. Recent advances in high-throughput sequencing have enabled the routine generation of T-cell receptor (TCR) repertoire data. Identifying the specific epitopes targeted by different TCRs in these data would be valuable. To accomplish that, we took advantage of the ever-increasing number of TCRs with known epitope specificity curated in the Immune Epitope Database (IEDB) since 2004. We compared six metrics of sequence similarity to determine their power to predict if two TCRs have the same epitope specificity. We found that a comprehensive k-mer matching approach produced the best results, which we have implemented into TCRMatch, an openly accessible tool (http://tools.iedb.org/tcrmatch/) that takes TCR β-chain CDR3 sequences as an input, identifies TCRs with a match in the IEDB, and reports the specificity of each match. We anticipate that this tool will provide new insights into T cell responses captured in receptor repertoire and single cell sequencing experiments and will facilitate the development of new strategies for monitoring and treatment of infectious, allergic, and autoimmune diseases, as well as cancer.

