ErrorX: automated error correction for immune repertoire sequencing datasets

2020 ◽  
Author(s):  
Alexander M Sevy

Abstract Motivation Recent advances in DNA sequencing technology have allowed deep profiling of B- and T-cell receptor sequences on an unprecedented scale. However, sequencing errors pose a significant challenge in expanding the scope of these experiments. Errors can arise both from PCR during library preparation and from miscalled bases on the sequencing instrument itself. These errors compromise the validity of biological conclusions drawn from the data. Results To address these concerns, I have developed ErrorX, software for automated error correction of B- and T-cell receptor NGS datasets. ErrorX uses deep learning to automatically identify bases that have a high probability of being erroneous. In benchmark studies, ErrorX reduced the overall error rate of public datasets by up to 36% with a false positive rate of 0.05% or less. Because ErrorX is a purely bioinformatic approach, it can be applied directly to any existing antibody or T-cell receptor sequencing dataset to infer sites of probable error, without any changes to library preparation. Availability ErrorX is free for non-commercial use, with both a command-line interface and a GUI available for Mac, Linux, and Windows, and full documentation is available. Pre-compiled binaries are available at https://endeavorbio.com/downloads/.
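As a rough illustration of what "identifying bases that have a high probability of being erroneous" can mean downstream, the sketch below assumes per-base error probabilities have already been produced by some upstream model. It masks every base at or above a cutoff with N. The function name `mask_probable_errors`, the 0.5 threshold, and the use of N as the mask character are all hypothetical. The sketch does not reproduce ErrorX's network or its actual output format.

```python
def mask_probable_errors(seq: str, error_probs: list[float], threshold: float = 0.5) -> str:
    """Replace bases whose predicted error probability is >= threshold with 'N'.

    error_probs is assumed to come from an upstream model (hypothetical here),
    with exactly one probability per base of seq.
    """
    if len(seq) != len(error_probs):
        raise ValueError("expected one error probability per base")
    return "".join("N" if p >= threshold else base for base, p in zip(seq, error_probs))


# Toy input: the probabilities are made up for illustration.
print(mask_probable_errors("ACGTAC", [0.01, 0.92, 0.03, 0.40, 0.75, 0.02]))  # ANGTNC
```

Masking rather than substituting a "best guess" base is one conservative design choice. It removes a suspect call without asserting what the true base was. The threshold trades false positives against missed errors.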

eLife ◽  
2018 ◽  
Vol 7 ◽  
Author(s):  
William S DeWitt ◽  
Anajane Smith ◽  
Gary Schoch ◽  
John A Hansen ◽  
Frederick A Matsen ◽  
...  

The T cell receptor (TCR) repertoire encodes immune exposure history through the dynamic formation of immunological memory. Statistical analysis of repertoire sequencing data has the potential to decode disease associations from large cohorts with measured phenotypes. However, the repertoire perturbation induced by a given immunological challenge is conditioned on genetic background via major histocompatibility complex (MHC) polymorphism. We explore associations between MHC alleles, immune exposures, and shared TCRs in a large human cohort. Using a previously published repertoire sequencing dataset augmented with high-resolution MHC genotyping, our analysis reveals rich structure: striking imprints of common pathogens, clusters of co-occurring TCRs that may represent markers of shared immune exposures, and substantial variations in TCR-MHC association strength across MHC loci. Guided by atomic contacts in solved TCR:peptide-MHC structures, we identify sequence covariation between TCR and MHC. These insights and our analysis framework lay the groundwork for further explorations into TCR diversity.
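A one-sided Fisher's exact test on a 2×2 presence/absence table is one simple way to quantify the association between a shared TCR and an MHC allele across a cohort. The sketch below is a minimal standard-library implementation under that assumption. The function name and table layout are hypothetical, and the test is not claimed to match the statistical procedure used in this study.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test p-value for a positive association.

    Hypothetical 2x2 layout (counts of subjects):
                      allele present   allele absent
        TCR present         a                b
        TCR absent          c                d

    Returns P(X >= a), where X follows the hypergeometric distribution
    implied by the fixed row and column totals.
    """
    tcr_pos = a + b      # subjects in whom the TCR is observed
    allele_pos = a + c   # subjects carrying the MHC allele
    n = a + b + c + d    # cohort size
    total = comb(n, allele_pos)
    upper = min(tcr_pos, allele_pos)
    # Sum hypergeometric probabilities for tables at least as extreme as observed.
    tail = sum(
        comb(tcr_pos, k) * comb(n - tcr_pos, allele_pos - k)
        for k in range(a, upper + 1)
    )
    return tail / total


# Toy counts, not from the study: the TCR is seen in 3 of 3 allele carriers
# and in 0 of 3 non-carriers.
print(fisher_exact_greater(3, 0, 0, 3))  # 0.05
```

In practice, `scipy.stats.fisher_exact(table, alternative="greater")` computes the same one-sided p-value. When many TCR and allele pairs are tested, the resulting p-values would also need multiple-testing correction.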


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Dawit A. Yohannes ◽  
Katri Kaukinen ◽  
Kalle Kurppa ◽  
Päivi Saavalainen ◽  
Dario Greco

Abstract Background Deep immune receptor sequencing, RepSeq, provides unprecedented opportunities for identifying and studying condition-associated T-cell clonotypes, represented by T-cell receptor (TCR) CDR3 sequences. However, due to the immense diversity of the immune repertoire, identification of condition-relevant TCR CDR3s from total repertoires has mostly been limited to either “public” CDR3 sequences or to comparisons of CDR3 frequencies observed in a single individual. A methodology for the identification of condition-associated TCR CDR3s by direct population-level comparison of RepSeq samples is currently lacking. Results We present a method for direct population-level comparison of RepSeq samples using immune repertoire sub-units (or sub-repertoires) that are shared across individuals. The method first performs unsupervised clustering of CDR3s within each sample. It then finds matching clusters across samples, called immune sub-repertoires, and performs statistical differential abundance testing at the level of the identified sub-repertoires. Finally, it ranks CDR3s in differentially abundant sub-repertoires by relevance to the condition. We applied the method to total TCR CDR3β RepSeq datasets of celiac disease patients, as well as to public datasets of yellow fever vaccination. The method successfully identified celiac disease-associated CDR3β sequences, as evidenced by considerable agreement of TRBV-gene and positional amino acid usage patterns in the detected CDR3β sequences with previously known CDR3βs specific to gluten in celiac disease. It also recovered significantly higher numbers of previously known CDR3β sequences relevant to each condition than would be expected by chance. Conclusion We conclude that immune sub-repertoires of similar immuno-genomic features shared across unrelated individuals can serve as viable units of immune repertoire comparison and as a proxy for identifying condition-associated CDR3s.
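The first step, unsupervised clustering of CDR3s within a sample, can be sketched as single-linkage grouping of same-length sequences that differ by at most one amino acid substitution. This similarity criterion, the union-find implementation, and the function names are assumptions for illustration only. The published method's features and clustering algorithm may differ. Cross-sample cluster matching and differential abundance testing are not shown.

```python
from collections import defaultdict


def within_one_substitution(s: str, t: str) -> bool:
    """True if equal-length strings differ at no more than one position."""
    return len(s) == len(t) and sum(x != y for x, y in zip(s, t)) <= 1


def cluster_cdr3s(cdr3s: list[str]) -> list[list[str]]:
    """Single-linkage clusters of CDR3s connected by chains of <=1 substitution."""
    parent = list(range(len(cdr3s)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # O(n^2) pairwise comparison; fine for a sketch, too slow for full repertoires.
    for i in range(len(cdr3s)):
        for j in range(i + 1, len(cdr3s)):
            if within_one_substitution(cdr3s[i], cdr3s[j]):
                parent[find(i)] = find(j)

    groups: dict[int, list[str]] = defaultdict(list)
    for i, seq in enumerate(cdr3s):
        groups[find(i)].append(seq)
    # Largest clusters first; members keep their input order.
    return sorted(groups.values(), key=len, reverse=True)


# Toy sequences (shorter than real CDR3s) chosen to show chaining.
print(cluster_cdr3s(["CASSLG", "CASSLA", "CASRLA", "CATTTT"]))
# [['CASSLG', 'CASSLA', 'CASRLA'], ['CATTTT']]
```

Single linkage lets CASSLG and CASRLA fall into one cluster even though they differ at two positions, because each is one substitution away from CASSLA. That chaining is a deliberate property of this choice, and it can merge loosely related sequences in dense repertoires.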


2017 ◽  
Vol 19 (suppl_6) ◽  
pp. vi186-vi186
Author(s):  
Oleg Yegorov ◽  
Yanina Yegorova ◽  
Anjelika Dechkovskaia ◽  
Jianping Huang ◽  
Sridharan Gururangan ◽  
...  

2018 ◽  
Author(s):  
Janelle M. Montagne ◽  
Xuwen Alice Zheng ◽  
Iago Pinal-Fernandez ◽  
Jose C. Milisenda ◽  
Lisa Christopher-Stine ◽  
...  

Abstract: T cell receptor (TCR) repertoire sequencing is increasingly employed to characterize adaptive immune responses. However, current TCR sequencing methodologies are complex and expensive, limiting the scale of feasible studies. Here we present Framework Region 3 AmplifiKation sequencing (FR3AK-seq), a simplified multiplex PCR-based approach for the ultra-efficient analysis of TCR complementarity determining region 3 (CDR3) repertoires. Using minimal primer sets that target a conserved region adjacent to CDR3, FR3AK-seq produces undistorted amplicons that are analyzed via short-read, single-end sequencing. We find that FR3AK-seq is sensitive and quantitative, performing comparably to two industry standards. FR3AK-seq was used to quickly and inexpensively characterize the T cell infiltrates of muscle biopsies obtained from 145 patients with idiopathic inflammatory myopathies and controls. A cluster of related TCRs was identified in samples from patients with sporadic inclusion body myositis, suggesting the presence of a shared antigen-driven response. The ease and minimal cost of FR3AK-seq remove critical barriers to routine, large-scale TCR CDR3 repertoire analyses.


2019 ◽  
Vol 48 (D1) ◽  
pp. D1057-D1062 ◽  
Author(s):  
Dmitry V Bagaev ◽  
Renske M A Vroomans ◽  
Jerome Samir ◽  
Ulrik Stervbo ◽  
Cristina Rius ◽  
...  

Abstract Here, we report an update of the VDJdb database with a substantial increase in the number of T-cell receptor (TCR) sequences and their cognate antigens. The update further provides a new database infrastructure featuring two additional analysis modes that facilitate database querying and real-world data analysis. The increased yield of TCR specificity identification methods and the overall increase in the number of studies in the field have allowed us to expand the database more than 5-fold. Furthermore, several new analysis methods are included. For example, batch annotation of TCR repertoire sequencing samples allows large datasets to be annotated online. Using recently developed bioinformatic methods for TCR motif mining, we have built a reduced set of high-quality TCR motifs that can be used both for training TCR specificity predictors and for matching against TCRs of interest. These additions enhance the versatility of VDJdb in exploring T-cell antigen specificities. The database is available at https://vdjdb.cdr3.net.
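Batch annotation of a repertoire against a specificity database amounts to looking up each sample CDR3 among the reference entries. The sketch below assumes the reference has already been loaded locally as (CDR3, epitope) pairs and allows up to one amino acid substitution. It does not use or describe VDJdb's actual web service, file format, or matching rules. The function names and the mismatch tolerance are hypothetical.

```python
def mismatches(s: str, t: str) -> int:
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(s, t))


def annotate(
    repertoire: list[str],
    reference: list[tuple[str, str]],
    max_mismatches: int = 1,
) -> dict[str, list[str]]:
    """Map each repertoire CDR3 to the epitopes of near-identical reference CDR3s.

    reference holds (cdr3, epitope) pairs standing in for a locally loaded
    export of a specificity database. Only same-length sequences are compared,
    so insertions and deletions are ignored.
    """
    hits: dict[str, list[str]] = {}
    for cdr3 in repertoire:
        epitopes = sorted({
            epitope
            for ref_cdr3, epitope in reference
            if len(ref_cdr3) == len(cdr3) and mismatches(ref_cdr3, cdr3) <= max_mismatches
        })
        if epitopes:
            hits[cdr3] = epitopes
    return hits


# Toy data; the epitope labels are placeholders, not real database records.
reference = [("CASSIRSSYEQYF", "epitope_A"), ("CASSLAPGATNEKLFF", "epitope_B")]
print(annotate(["CASSIRSAYEQYF", "CASSQDRGYGYTF"], reference))
# {'CASSIRSAYEQYF': ['epitope_A']}
```

A linear scan keeps the sketch short. For large repertoires, indexing the reference by CDR3 length, or by single-substitution "wildcard" keys, would avoid comparing every pair.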


2020 ◽  
Vol 66 (9) ◽  
pp. 1228-1237
Author(s):  
Gustav Johansson ◽  
Melita Kaltak ◽  
Cristiana Rîmniceanu ◽  
Avadhesh K Singh ◽  
Jan Lycke ◽  
...  

Abstract Background Immune repertoire sequencing of the T-cell receptor can identify clonotypes that have expanded as a result of antigen recognition or hematological malignancies. However, current sequencing protocols display limitations, including nonuniform amplification and polymerase-induced errors during sequencing. Here, we developed a sequencing method that overcomes these issues and applied it to γδ T cells, a cell type that plays a unique role in immunity, autoimmunity, homeostasis of the intestine, skin, and adipose tissue, and cancer biology. Methods The ultrasensitive immune repertoire sequencing method used PCR-introduced unique molecular identifiers. We constructed a 32-panel assay that captured the full diversity of the recombined T-cell receptor delta loci in γδ T cells. The protocol was validated on synthetic reference molecules and blood samples of healthy individuals. Results The 32-panel assay displayed wide dynamic range, high reproducibility, and analytical sensitivity with single-nucleotide resolution. The method corrected for sequencing-dependent quantification bias and polymerase-induced errors and could be applied to both enriched and nonenriched cells. Healthy donors displayed oligoclonal expansion of γδ T cells, and similar frequencies of clonotypes were detected in both enriched and nonenriched samples. Conclusions This ultrasensitive immune repertoire sequencing strategy enables quantification of individual and specific clonotypes in a background and can be applied to clinical as well as basic research areas. Our approach is simple and flexible and can easily be implemented in any molecular laboratory.
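The method's use of PCR-introduced unique molecular identifiers (UMIs) can be illustrated with a common consensus strategy. Reads are grouped by UMI, and the majority base is taken at each position, so a polymerase or sequencing error present in only a minority of a UMI family is outvoted. The sketch below assumes reads are already aligned and of equal length within a family. The `min_reads` cutoff of 3, the tie-breaking rule, and the function name are hypothetical and are not taken from the published protocol.

```python
from collections import Counter, defaultdict


def umi_consensus(reads: list[tuple[str, str]], min_reads: int = 3) -> dict[str, str]:
    """Collapse (umi, sequence) reads into one majority-vote consensus per UMI.

    Families with fewer than min_reads reads are discarded, as are families
    whose reads differ in length (they cannot be compared position by position).
    """
    families: dict[str, list[str]] = defaultdict(list)
    for umi, seq in reads:
        families[umi].append(seq)

    consensus: dict[str, str] = {}
    for umi, seqs in families.items():
        if len(seqs) < min_reads:
            continue
        if len({len(s) for s in seqs}) != 1:
            continue
        # Majority base per column; on ties, Counter.most_common returns
        # the base that was encountered first.
        consensus[umi] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


# Toy reads: one UMI family carries a single-read error at position 3;
# the second UMI has too few reads to form a consensus.
reads = [("AAAT", "ACGT"), ("AAAT", "ACGT"), ("AAAT", "ACTT"), ("GGGC", "TTTT")]
print(umi_consensus(reads))  # {'AAAT': 'ACGT'}
```

Requiring several reads per UMI is what makes the vote meaningful. A family of one or two reads cannot distinguish a true base from an early PCR error.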


2019 ◽  
Author(s):  
Timothy J Looney ◽  
Dzifa Y Duose ◽  
Geoffrey Lowman ◽  
Elizabeth Linch ◽  
Joud Hajjar ◽  
...  
