An ultra-sensitive T-cell receptor detection method for TCR-Seq and RNA-Seq data

2020, Vol 36 (15), pp. 4255-4262
Author(s): Si-Yi Chen, Chun-Jie Liu, Qiong Zhang, An-Yuan Guo

Motivation: T-cell receptors (TCRs) recognize antigens and play vital roles in T-cell immunology. Surveying TCR repertoires by characterizing complementarity-determining region 3 (CDR3) is a key task. Owing to the high diversity of CDR3 and technological limitations, accurate characterization of CDR3 repertoires remains a great challenge. Results: We propose a computational method named CATT for ultra-sensitive and precise detection of TCR CDR3 sequences. CATT can be applied to TCR sequencing, RNA-Seq and single-cell TCR(RNA)-Seq data to characterize CDR3 repertoires. CATT integrates a de Bruijn graph-based micro-assembly algorithm, a data-driven error correction model and a Bayesian inference algorithm to characterize CDR3 repertoires self-adaptively and ultra-sensitively. Benchmarks on in silico and experimental datasets demonstrated that CATT achieved superior recall and precision compared with existing tools, especially for data with short read length and small size and for single-cell sequencing data. Thus, CATT will be a useful tool for TCR analysis in cancer and immunology research. Availability and implementation: http://bioinfo.life.hust.edu.cn/CATT or https://github.com/GuoBioinfoLab/CATT. Supplementary information: Supplementary data are available at Bioinformatics online.
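
To illustrate the de Bruijn graph idea mentioned in the abstract above, here is a minimal Python sketch that builds a k-mer graph from overlapping reads and greedily extends a seed into a contig. It is not CATT's implementation: the function names, the toy reads, the seed and the greedy extension rule are all assumptions made for illustration, and the error correction and Bayesian inference steps are not modelled.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to its successor (k-1)-mers, counting supporting k-mers."""
    graph = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]][kmer[1:]] += 1
    return graph


def extend_greedy(graph, seed, max_steps=100):
    """Extend a seed (k-1)-mer by repeatedly following the best-supported edge."""
    contig = seed
    node = seed
    visited = {seed}
    for _ in range(max_steps):
        successors = graph.get(node)
        if not successors:
            break  # dead end: no k-mer continues this node
        nxt = max(successors, key=successors.get)
        if nxt in visited:
            break  # stop rather than loop around a cycle
        contig += nxt[-1]
        visited.add(nxt)
        node = nxt
    return contig


# Hypothetical toy reads that overlap along one underlying sequence.
reads = ["TGTGCCAGT", "CCAGTTCGG", "TTCGGAACT"]
graph = build_de_bruijn(reads, k=5)
contig = extend_greedy(graph, seed="TGTG")
print(contig)
```

In this toy case the three short reads are stitched back into one longer sequence, which is the reason micro-assembly can help with short read lengths.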

2020, Vol 36 (18), pp. 4817-4818
Author(s): Gregor Sturm, Tamas Szabo, Georgios Fotakis, Marlene Haider, Dietmar Rieder, ...

Summary: Advances in single-cell technologies have enabled the investigation of T-cell phenotypes and repertoires at unprecedented resolution and scale. Bioinformatic methods for the efficient analysis of these large-scale datasets are instrumental for advancing our understanding of adaptive immune responses. However, while well-established solutions are accessible for the processing of single-cell transcriptomes, no streamlined pipelines are available for the comprehensive characterization of T-cell receptors. Here, we propose single-cell immune repertoires in Python (Scirpy), a scalable Python toolkit that provides simplified access to the analysis and visualization of immune repertoires from single cells and seamless integration with transcriptomic data. Availability and implementation: Scirpy source code and documentation are available at https://github.com/icbi-lab/scirpy. Supplementary information: Supplementary data are available at Bioinformatics online.


2020
Author(s): Gregor Sturm, Tamas Szabo, Georgios Fotakis, Marlene Haider, Dietmar Rieder, ...

Summary: Advances in single-cell technologies have enabled the investigation of T cell phenotypes and repertoires at unprecedented resolution and scale. Bioinformatic methods for the efficient analysis of these large-scale datasets are instrumental for advancing our understanding of adaptive immune responses in cancer, but also in infectious diseases like COVID-19. However, while well-established solutions are accessible for the processing of single-cell transcriptomes, no streamlined pipelines are available for the comprehensive characterization of T cell receptors. Here we propose Scirpy, a scalable Python toolkit that provides simplified access to the analysis and visualization of immune repertoires from single cells and seamless integration with transcriptomic data. Availability and implementation: Scirpy source code and documentation are available at https://github.com/icbi-lab/scirpy.


2020, Vol 5, pp. 42
Author(s): Suet Ling Felce, Gillian Farnie, Michael L. Dustin, James H. Felce

Background: The leukaemia-derived Jurkat E6.1 cell line has been used as a model T cell in the study of many aspects of T cell biology, most notably activation in response to T cell receptor (TCR) engagement. Methods: We present whole-transcriptome RNA-Sequencing data for Jurkat E6.1 cells in the resting state and two hours post-activation via TCR and CD28. We compare early transcriptional responses in the presence and absence of the chemokines CXCL12 and CCL19, and perform a basic comparison between observed transcriptional responses in Jurkat E6.1 cells and those in primary human T cells using publicly deposited data. Results: Jurkat E6.1 cells have many of the hallmarks of standard T cell transcriptional responses to activation, but lack most of the depth of responses in primary cells. Conclusions: These data indicate that Jurkat E6.1 cells represent only a highly simplified model of early T cell transcriptional responses.


2019
Author(s): Alemu Takele Assefa, Jo Vandesompele, Olivier Thas

Summary: SPsimSeq is a semi-parametric simulation method for bulk and single-cell RNA sequencing data. It simulates data from a good estimate of the actual distribution of a given real RNA-seq dataset. In contrast to existing approaches that assume a particular data distribution, our method constructs an empirical distribution of gene expression data from a given source RNA-seq experiment to faithfully capture the characteristics of real data. Importantly, our method can be used to simulate a wide range of scenarios, such as single or multiple biological groups, systematic variations (e.g. confounding batch effects), and different sample sizes. It can also be used to simulate different gene expression units resulting from different library preparation protocols, such as read counts or UMI counts. Availability and implementation: The R package and associated documentation are available from https://github.com/CenterForStatistics-UGent/SPsimSeq. Supplementary information: Supplementary data are available at bioRxiv online.
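
The contrast between assuming a parametric distribution and sampling from an empirical one can be sketched in a few lines of Python. The example below uses plain inverse-transform sampling from the empirical quantiles of one gene, which is a simplified stand-in and not the density estimator used by the SPsimSeq R package; the function name and the observed values are hypothetical.

```python
import random


def sample_from_empirical(values, n, rng):
    """Draw n values by inverse-transform sampling from the empirical
    distribution, interpolating linearly between sorted observations."""
    xs = sorted(values)
    out = []
    for _ in range(n):
        pos = rng.random() * (len(xs) - 1)  # position along the sorted data
        lo = int(pos)
        hi = min(lo + 1, len(xs) - 1)
        frac = pos - lo
        out.append(xs[lo] + frac * (xs[hi] - xs[lo]))
    return out


rng = random.Random(0)  # fixed seed so the sketch is reproducible
observed = [2.1, 3.4, 3.9, 5.0, 7.2, 8.8]  # hypothetical log-expression values
simulated = sample_from_empirical(observed, 1000, rng)
print(min(simulated), max(simulated))
```

Because the draws come from the observed values themselves, the simulated data inherit the shape of the source data without a negative binomial or other parametric assumption.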


2020, Vol 36 (10), pp. 3115-3123
Author(s): Teng Fei, Tianwei Yu

Motivation: Batch effect is a frequent challenge in deep sequencing data analysis that can lead to misleading conclusions. Existing methods do not correct batch effects satisfactorily, especially with single-cell RNA sequencing (RNA-seq) data. Results: We present scBatch, a numerical algorithm for batch-effect correction on bulk and single-cell RNA-seq data with emphasis on improving both clustering and gene differential expression analysis. scBatch is not restricted by assumptions on the mechanism of batch-effect generation. As shown in simulations and real data analyses, scBatch outperforms benchmark batch-effect correction methods. Availability and implementation: The R package is available at github.com/tengfei-emory/scBatch. The code to generate results and figures in this article is available at github.com/tengfei-emory/scBatch-paper-scripts. Supplementary information: Supplementary data are available at Bioinformatics online.


2020, Vol 36 (18), pp. 4810-4812
Author(s): Qingxi Meng, Idoia Ochoa, Mikel Hernaez

Motivation: Sequencing data are often summarized at different annotation levels for further analysis, generally using the general feature format (GFF) or its descendants, gene transfer format (GTF) and GFF3. Existing utilities for accessing these files, like gffutils and gffread, do not focus on reducing the storage space, significantly increasing it in some cases. We propose GPress, a framework for querying GFF files in a compressed form. GPress can also incorporate and compress expression files from both bulk and single-cell RNA-Seq experiments, supporting simultaneous queries on both the GFF and expression files. In brief, GPress applies transformations to the data, which are then compressed with the general lossless compressor BSC. To support queries, GPress compresses the data in blocks and creates several index tables for fast retrieval. Results: We tested GPress on several GFF files of different organisms, and showed that it achieves on average a 61% reduction in size with respect to gzip (the current de facto compressor for GFF files) while being able to retrieve all annotations for a given identifier or a range of coordinates in a few seconds (when run on a common laptop). In contrast, gffutils provides faster retrieval but doubles the size of the GFF files. When additionally linking an expression file, we show that GPress can reduce its size by more than 68% when compared to gzip (for both bulk and single-cell RNA-Seq experiments), while still retrieving the information within seconds. Finally, applying BSC to the data streams generated by GPress instead of to the original file shows a size reduction of more than 44% on average. Availability and implementation: GPress is freely available at https://github.com/qm2/gpress. Supplementary information: Supplementary data are available at Bioinformatics online.
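
The block-plus-index design described above can be sketched as follows. This is an illustrative assumption of how such a scheme might look, not GPress itself: Python's standard zlib stands in for BSC, the records are toy tab-separated lines with a hypothetical identifier in the first column rather than real GFF, and the function names are invented for the example.

```python
import zlib


def compress_blocks(lines, block_size):
    """Compress records in fixed-size blocks and index identifiers by block."""
    blocks, index = [], {}
    for start in range(0, len(lines), block_size):
        chunk = lines[start:start + block_size]
        block_id = len(blocks)
        for line in chunk:
            identifier = line.split("\t")[0]
            index.setdefault(identifier, set()).add(block_id)
        blocks.append(zlib.compress("\n".join(chunk).encode("utf-8")))
    return blocks, index


def query(blocks, index, identifier):
    """Return all records for an identifier, decompressing only indexed blocks."""
    hits = []
    for block_id in sorted(index.get(identifier, ())):
        text = zlib.decompress(blocks[block_id]).decode("utf-8")
        hits.extend(
            line for line in text.split("\n")
            if line.split("\t")[0] == identifier
        )
    return hits


# Hypothetical toy records: identifier, chromosome, start, end.
records = [
    "geneA\tchr1\t100\t500",
    "geneB\tchr1\t700\t900",
    "geneC\tchr2\t50\t300",
    "geneA\tchr1\t100\t250",
]
blocks, index = compress_blocks(records, block_size=2)
print(query(blocks, index, "geneA"))
```

The trade-off is that smaller blocks make each query cheaper to decompress but usually compress less well overall, which is one reason a real tool would tune the block size.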


2021
Author(s): Federico Agostinis, Chiara Romualdi, Gabriele Sales, Davide Risso

Summary: We present NewWave, a scalable R/Bioconductor package for the dimensionality reduction and batch effect removal of single-cell RNA sequencing data. To achieve scalability, NewWave uses mini-batch optimization and can work with out-of-memory data, enabling users to analyze datasets with millions of cells. Availability and implementation: NewWave is implemented as an open-source R package available through the Bioconductor project at https://bioconductor.org/packages/NewWave/. Supplementary information: Supplementary data are available at Bioinformatics online.


2018
Author(s): Davis J. McCarthy, Raghd Rostom, Yuanhua Huang, Daniel J. Kunz, Petr Danecek, ...

Decoding the clonal substructures of somatic tissues sheds light on cell growth, development and differentiation in health, ageing and disease. DNA-sequencing, either using bulk or using single-cell assays, has enabled the reconstruction of clonal trees from frequency and co-occurrence patterns of somatic variants. However, approaches to systematically characterize phenotypic and functional variations between individual clones are not established. Here we present cardelino (https://github.com/PMBio/cardelino), a computational method for inferring the clone of origin of individual cells that have been assayed using single-cell RNA-seq (scRNA-seq). After validating our model using simulations, we apply cardelino to matched scRNA-seq and exome sequencing data from 32 human dermal fibroblast lines, identifying hundreds of differentially expressed genes between cells from different somatic clones. These genes are frequently enriched for cell cycle and proliferation pathways, indicating a key role for cell division genes in non-neutral somatic evolution. Key findings: A novel approach for integrating DNA-seq and single-cell RNA-seq data to reconstruct clonal substructure for single-cell transcriptomes. Evidence for non-neutral evolution of clonal populations in human fibroblasts. Proliferation and cell cycle pathways are commonly distorted in mutated clonal populations.


PLoS ONE, 2021, Vol 16 (10), pp. e0258029
Author(s): Ying Yao, Łukasz Wyrożemski, Knut E. A. Lundin, Geir Kjetil Sandve, Shuo-Wang Qiao

Gluten-specific CD4+ T cells drive the pathogenesis of celiac disease, and circulating gluten-specific T cells can be identified by staining with HLA-DQ:gluten tetramers. In this first single-cell RNA-seq study of tetramer-sorted T cells from the blood of untreated celiac disease patients, we found that gluten-specific T cells showed distinct transcriptomic profiles consistent with activated effector memory T cells that shared features with Th1 and follicular helper T cells. Compared to non-specific cells, gluten-specific T cells showed differential expression of several genes involved in T-cell receptor signaling, translational processes, apoptosis, fatty acid transport, and redox potentials. Many of the gluten-specific T cells studied shared a T-cell receptor with one another, indicating that circulating gluten-specific T cells belong to a limited number of clones. Moreover, cells that shared the same clonal origin were transcriptionally more similar to one another than clonally unrelated gluten-specific cells were.
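
The inference from shared receptors to a limited number of clones rests on grouping cells by receptor sequence. A minimal Python sketch of that grouping is given below, assuming a clonotype is defined by an identical pair of alpha and beta chain CDR3 sequences; the definition, the field names and the sequences are hypothetical and are not taken from the study.

```python
from collections import Counter


def clonotype_sizes(cells):
    """Count cells per clonotype, keyed by the (TRA CDR3, TRB CDR3) pair."""
    return Counter((cell["tra_cdr3"], cell["trb_cdr3"]) for cell in cells)


# Hypothetical cells with made-up CDR3 amino acid sequences.
cells = [
    {"id": "c1", "tra_cdr3": "CAVRDSNYQLIW", "trb_cdr3": "CASSLGQAYEQYF"},
    {"id": "c2", "tra_cdr3": "CAVRDSNYQLIW", "trb_cdr3": "CASSLGQAYEQYF"},
    {"id": "c3", "tra_cdr3": "CAASGGSYIPTF", "trb_cdr3": "CASSPDRGNTEAFF"},
    {"id": "c4", "tra_cdr3": "CAVRDSNYQLIW", "trb_cdr3": "CASSLGQAYEQYF"},
]
sizes = clonotype_sizes(cells)
expanded = {clone: n for clone, n in sizes.items() if n > 1}
print(len(sizes), "clonotypes;", len(expanded), "expanded")
```

When many cells collapse into few keys, as in this toy input, the population is dominated by a small number of clones.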


2021, Vol 12 (1)
Author(s): John-William Sidhom, H. Benjamin Larman, Drew M. Pardoll, Alexander S. Baras

Deep learning algorithms have been utilized to achieve enhanced performance in pattern-recognition tasks. The ability to learn complex patterns in data has tremendous implications in immunogenomics. T-cell receptor (TCR) sequencing assesses the diversity of the adaptive immune system and allows for modeling its sequence determinants of antigenicity. We present DeepTCR, a suite of unsupervised and supervised deep learning methods able to model highly complex TCR sequencing data by learning a joint representation of a TCR by its CDR3 sequences and V/D/J gene usage. We demonstrate the utility of deep learning to provide an improved ‘featurization’ of the TCR across multiple human and murine datasets, including improved classification of antigen-specific TCRs and extraction of antigen-specific TCRs from noisy single-cell RNA-Seq and T-cell culture-based assays. Our results highlight the flexibility and capacity for deep neural networks to extract meaningful information from complex immunogenomic data for both descriptive and predictive purposes.

