Echidna: integrated simulations of single-cell immune receptor repertoires and transcriptomes

2021 ◽  
Author(s):  
Jiami Han ◽  
Raphael Kuhn ◽  
Chrysa Papadopoulou ◽  
Andreas Agrafiotis ◽  
Victor Kreiner ◽  
...  

Single-cell sequencing now enables the recovery of full-length immune repertoires [B cell receptor (BCR) and T cell receptor (TCR) repertoires], in addition to gene expression information. The feature-rich datasets produced from such experiments require extensive and diverse computational analyses, each of which can significantly influence the downstream immunological interpretations, such as clonal selection and expansion. Simulations produce validated standard datasets, where the underlying generative model can be precisely defined and furthermore perturbed to investigate specific questions of interest. Currently, there is no tool that can be used to simulate a comprehensive ground truth single-cell dataset that incorporates both immune receptor repertoires and gene expression. Therefore, we developed Echidna, an R package that simulates immune receptors and transcriptomes at single-cell resolution. Our simulation tool generates annotated single-cell sequencing data with user-tunable parameters controlling a wide range of features such as clonal expansion, germline gene usage, somatic hypermutation, and transcriptional phenotypes. Echidna can additionally simulate time-resolved B cell evolution, producing mutational networks with complex selection histories incorporating class-switching and B cell subtype information. Finally, we demonstrate the benchmarking potential of Echidna by simulating clonal lineages and comparing the known simulated networks with those inferred from only the BCR sequences as input. Together, Echidna provides a framework that can incorporate experimental data to simulate single-cell immune repertoires to aid software development and bioinformatic benchmarking of clonotyping, phylogenetics, transcriptomics and machine learning strategies.
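The benchmarking idea in this abstract (simulate a lineage whose true mutational network is known, then compare it with an inferred one) can be illustrated with a toy sketch. This is not Echidna's API or its generative model: the function names `mutate` and `simulate_lineage`, the uniform substitution model, and the parameter values are all assumptions made for illustration.

```python
import random

BASES = "ACGT"

def mutate(seq, rate, rng):
    """Substitute each base independently with probability `rate` (toy SHM model)."""
    return "".join(
        rng.choice([b for b in BASES if b != base]) if rng.random() < rate else base
        for base in seq
    )

def simulate_lineage(germline, n_divisions, rate=0.02, seed=0):
    """Grow one clonal lineage and record the true parent of every cell."""
    rng = random.Random(seed)
    cells = [{"id": 0, "parent": None, "seq": germline}]  # unmutated root
    for _ in range(n_divisions):
        parent = rng.choice(cells)  # any existing cell may divide
        cells.append({
            "id": len(cells),
            "parent": parent["id"],  # ground-truth edge of the network
            "seq": mutate(parent["seq"], rate, rng),
        })
    return cells

# Hypothetical 40-nt germline sequence; real receptor sequences are much longer.
lineage = simulate_lineage("ACGT" * 10, n_divisions=20)
true_edges = [(c["parent"], c["id"]) for c in lineage[1:]]
```

Because `true_edges` is known by construction, a network inferred from the sequences alone could be scored against it, which is the kind of comparison the abstract describes.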

2019 ◽  
Author(s):  
Alemu Takele Assefa ◽  
Jo Vandesompele ◽  
Olivier Thas

Summary: SPsimSeq is a semi-parametric simulation method for bulk and single-cell RNA sequencing data. It simulates data from a good estimate of the actual distribution of a given real RNA-seq dataset. In contrast to existing approaches that assume a particular data distribution, our method constructs an empirical distribution of gene expression data from a given source RNA-seq experiment to faithfully capture the characteristics of real data. Importantly, our method can be used to simulate a wide range of scenarios, such as single or multiple biological groups, systematic variations (e.g. confounding batch effects), and different sample sizes. It can also be used to simulate different gene expression units resulting from different library preparation protocols, such as read counts or UMI counts. Availability and implementation: The R package and associated documentation are available from https://github.com/CenterForStatistics-UGent/SPsimSeq. Supplementary information: Supplementary data are available at bioRχiv online.
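To make the contrast with parametric simulators concrete, the sketch below draws new values from the empirical distribution of one gene's observed expression, using inverse-CDF sampling with linear interpolation between sorted observations. This is a deliberately simplified stand-in, not SPsimSeq's semi-parametric density estimator; the function name `sample_empirical` and the example values are hypothetical.

```python
import random

def sample_empirical(observed, n, seed=0):
    """Draw n values from the interpolated empirical distribution of `observed`."""
    rng = random.Random(seed)
    xs = sorted(observed)
    draws = []
    for _ in range(n):
        pos = rng.random() * (len(xs) - 1)  # fractional rank among sorted values
        lo = int(pos)
        hi = min(lo + 1, len(xs) - 1)
        frac = pos - lo
        draws.append(xs[lo] + frac * (xs[hi] - xs[lo]))  # interpolate neighbours
    return draws

# Hypothetical log-expression values for one gene in a source dataset.
source_expr = [0.0, 0.0, 1.2, 2.5, 2.7, 3.1, 4.8, 7.9]
simulated = sample_empirical(source_expr, n=100)
```

No distributional family is assumed here: the simulated values inherit features such as the excess of zeros directly from the source data.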


2020 ◽  
Author(s):  
Shreya Johri ◽  
Deepali Jain ◽  
Ishaan Gupta

Abstract: Besides severe respiratory distress, recent reports in Covid-19 patients have found a strong association between platelet counts and patient survival. Along with hemodynamic changes such as prolonged clotting time, high fibrin degradation products and D-dimers, increased levels of monocytes with disturbed morphology have also been identified. In this study, through an integrated analysis of bulk RNA-sequencing data from Covid-19 patients with data from single-cell sequencing studies on lung tissues, we found that most of the cell-types that contributed to the altered gene expression were of hematopoietic origin. We also found that differentially expressed genes in Covid-19 patients formed a significant share of the genes expressed in phagocytic cells such as Monocytes and platelets. Interestingly, while we observed a general enrichment for Monocytes in Covid-19 patients, we found that the signal for FCGR3A+ Monocytes was depleted. Further, we found evidence that age-associated gene expression changes in Monocytes and platelets, associated with inflammation, mirror gene expression changes in Covid-19 patients, suggesting that pro-inflammatory signalling during aging may worsen the infection in older patients. We identified more than 20 genes that change in the same direction between Covid-19 infection and aging cells that may act as potential therapeutic targets. Of particular interest were IL2RG, GNLY and GZMA expressed in platelets, which facilitate cytokine signalling in Monocytes through an interaction with platelets. To understand whether infection can directly manipulate the biology of Monocytes and platelets, we hypothesize that these non-ACE2 expressing cells may be infected by the virus through the phagocytic route. We observed that phagocytic cells such as Monocytes, T-cells, and platelets have a significantly higher expression of genes that are a part of the Covid-19 viral interactome. Hence these cell-types may have an active rather than a reactive role in viral pathogenesis to manifest clinical symptoms such as coagulopathy. Therefore, our results present molecular evidence for pursuing both anti-inflammatory and anticoagulation therapy for better patient management, especially in older patients.


2018 ◽  
Author(s):  
Yue Hu ◽  
Xuegong Zhang

With the development of single-cell sequencing technologies, parallel sequencing of the transcriptome and genome is becoming available and will bring us the opportunity to uncover associations between genotype and phenotype at the single-cell level. Due to the special characteristics of single-cell sequencing data, new methods are needed to identify eQTL from single-cell data. We developed an R package, SCeQTL, that uses zero-inflated negative binomial regression to do eQTL analysis on single-cell data. It can distinguish two types of gene-expression differences among different genotype groups. It can also be used for finding gene expression variations associated with other grouping factors like cell lineages or cell types.
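For readers unfamiliar with the zero-inflated negative binomial (ZINB) distribution named above, the sketch below evaluates its probability mass function: a point mass at zero mixed with a negative binomial count component. The parameterization (mean `mu`, dispersion `theta`, zero-inflation probability `pi`) is one common convention and is an assumption here; this is not SCeQTL's code, and no regression fitting is shown.

```python
import math

def nb_logpmf(y, mu, theta):
    """Log-pmf of a negative binomial with mean mu and dispersion (size) theta."""
    return (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
            + theta * math.log(theta / (theta + mu))
            + y * math.log(mu / (theta + mu)))

def zinb_pmf(y, mu, theta, pi):
    """P(Y = y): extra zeros occur with probability pi, NB counts otherwise."""
    nb = math.exp(nb_logpmf(y, mu, theta))
    return pi * (y == 0) + (1.0 - pi) * nb

# Probability of observing a zero count under hypothetical parameter values.
p_zero = zinb_pmf(0, mu=2.0, theta=1.0, pi=0.5)
```

In a regression setting, `mu` and `pi` would each be linked to covariates such as genotype group, so a group can differ in its proportion of zeros, in its non-zero expression level, or in both.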


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Boying Gong ◽  
Yun Zhou ◽  
Elizabeth Purdom

Abstract: A growing number of single-cell sequencing platforms enable joint profiling of multiple omics from the same cells. We present a novel method that not only allows for analyzing the data from joint-modality platforms, but also provides a coherent framework for the integration of multiple datasets measured on different modalities. We demonstrate its performance on multi-modality data of gene expression and chromatin accessibility and illustrate its integration abilities by jointly analyzing this multi-modality data with single-cell RNA-seq and ATAC-seq datasets.


2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Shadi Darvish Shafighi ◽  
Szymon M. Kiełbasa ◽  
Julieta Sepúlveda-Yáñez ◽  
Ramin Monajemi ◽  
Davy Cats ◽  
...  

Abstract. Background: Drawing genotype-to-phenotype maps in tumors is of paramount importance for understanding tumor heterogeneity. Assignment of single cells to their tumor clones of origin can be approached by matching the genotypes of the clones to the mutations found in RNA sequencing of the cells. The confidence of the cell-to-clone mapping can be increased by accounting for additional measurements. Follicular lymphoma, a malignancy of mature B cells that continuously acquire mutations in parallel in the exome and in B cell receptor loci, presents a unique opportunity to join exome-derived mutations with B cell receptor sequences as independent sources of evidence for clonal evolution. Methods: Here, we propose CACTUS, a probabilistic model that leverages the information from an independent genomic clustering of cells and exploits the scarce single-cell RNA sequencing data to map single cells to given imperfect genotypes of tumor clones. Results: We apply CACTUS to two follicular lymphoma patient samples, integrating three measurements: whole exome, single-cell RNA, and B cell receptor sequencing. CACTUS outperforms a predecessor model by confidently assigning cells and B cell receptor-based clusters to the tumor clones. Conclusions: The integration of independent measurements increases model certainty and is the key to improving model performance in the challenging task of charting the genotype-to-phenotype maps in tumors. CACTUS opens the avenue to study the functional implications of tumor heterogeneity and the origins of resistance to targeted therapies. CACTUS is written in R, and the source code, along with all supporting files, is available on GitHub (https://github.com/LUMC/CACTUS).
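The core step described in the Background, matching a cell's observed mutations to clone genotypes, can be sketched as a small Bayesian calculation. The code below is a simplified illustration under strong assumptions (uniform prior over clones, binomial read counts, fixed hypothetical rates `p_mut` and `p_err`, perfectly known genotypes); it is not the CACTUS model, which additionally uses the independent clustering of cells and allows for imperfect genotypes.

```python
import math

def log_binom_pmf(k, n, p):
    """Log-probability of k successes in n trials with success probability p."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))

def clone_posterior(alt, depth, genotypes, p_mut=0.5, p_err=0.01):
    """Posterior over clones for one cell, assuming a uniform prior.

    alt[i], depth[i]: variant-supporting and total reads at mutation i.
    genotypes[c][i]:  1 if clone c carries mutation i, else 0.
    """
    log_lik = [
        sum(log_binom_pmf(a, d, p_mut if has_mut else p_err)
            for a, d, has_mut in zip(alt, depth, genotype))
        for genotype in genotypes
    ]
    peak = max(log_lik)  # subtract the maximum for numerical stability
    weights = [math.exp(x - peak) for x in log_lik]
    total = sum(weights)
    return [w / total for w in weights]

# Two hypothetical clones over two mutations; the cell shows reads for mutation 0 only.
posterior = clone_posterior(alt=[3, 0], depth=[5, 4], genotypes=[(1, 0), (0, 1)])
```

With sparse single-cell RNA coverage, many sites have zero depth and contribute nothing to the likelihood, which is why additional evidence such as B cell receptor clusters can sharpen the assignment.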


F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 47
Author(s):  
Nicholas Borcherding ◽  
Nicholas L. Bormann

Single-cell sequencing is an emerging technology in the fields of immunology and oncology that allows researchers to couple RNA quantification and other modalities, like immune cell receptor profiling, at the level of an individual cell. A number of workflows and software packages have been created to process and analyze single-cell transcriptomic data. These packages allow users to take the vast dimensionality of the data generated in single-cell-based experiments and distill the data into novel insights. Unlike the transcriptomic field, there is a lack of software options that allow for single-cell immune receptor profiling. To enable users to easily combine mRNA and immune profiling, scRepertoire was built to process data derived from 10x Genomics Chromium Immune Profiling for both T-cell receptor (TCR) and immunoglobulin (Ig) enrichment workflows and to subsequently interact with the popular Seurat R package. The scRepertoire R package and processed data are open source and available on GitHub, along with in-depth tutorials on the capabilities of the package.


Author(s):  
Roger Volden ◽  
Christopher Vollmers

Abstract: Single-cell transcriptome analysis elucidates facets of cell biology that have previously been out of reach. However, the high-throughput analysis of thousands of single-cell transcriptomes has been limited by sample preparation and sequencing technology. High-throughput single-cell analysis today is facilitated by protocols like the 10X Genomics platform or Drop-Seq, which generate cDNA pools in which the origin of a transcript is encoded at its 5’ or 3’ end. These cDNA pools are currently analyzed by short-read Illumina sequencing, which can identify the cellular origin of a transcript and what gene it was transcribed from. However, these methods fail to retrieve isoform information. In principle, cDNA pools prepared using these approaches can be analyzed with Pacific Biosciences and Oxford Nanopore long-read sequencers to retrieve isoform information, but all current implementations rely heavily on Illumina short reads for the analysis in addition to long reads. Here, we used R2C2 to sequence and demultiplex 9 million full-length cDNA molecules generated by the 10X Chromium platform from ∼3000 peripheral blood mononuclear cells (PBMCs). We used these reads, independently of Illumina data, to cluster cells into B cells, T cells, and Monocytes and to generate isoform-level transcriptomes for these cell-types. We also generated isoform-level transcriptomes for all single cells and used this information to identify a wide range of isoform diversity between genes. Finally, we designed a computational workflow to extract the paired adaptive immune receptor sequences, T cell receptor (TCR) and B cell receptor (BCR), unique to each T and B cell. This work represents a new, simple, and powerful approach that, using a single sequencing method, can extract an unprecedented amount of information from thousands of single cells.


2021 ◽  
Author(s):  
Boying Gong ◽  
Yun Zhou ◽  
Elizabeth Purdom

Abstract: Single-cell measurements of different cellular features or modalities from cells from the same system allow for a comprehensive understanding of a biological process. While the most common single-cell sequencing technologies require separate input cells for different modalities, there are a growing number of platforms that allow for measuring several modalities on a single cell. We present a novel method, Cobolt, for analyzing such multi-modality single-cell sequencing datasets. Cobolt jointly models the multiple modalities via a novel application of Multimodal Variational Autoencoder (MVAE) to a hierarchical generative model. We first demonstrate its performance on data from the multi-modality platform SNARE-seq, consisting of measurements of gene expression and chromatin accessibility on the same cells. We then illustrate the ability of Cobolt to integrate multi-modality platforms with single-modality platforms by jointly analyzing a SNARE-seq dataset, a single-cell gene expression dataset, and a single-cell chromatin accessibility dataset. We compared Cobolt with current options for analyzing such datasets and show that Cobolt provides robust and flexible results for integration of single-cell data on multiple modalities.
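One way multimodal variational autoencoders are commonly built is to fuse per-modality Gaussian posteriors over the latent variable with a product of experts, which lets a cell measured in only one modality still receive a latent estimate. The sketch below shows that fusion rule for a one-dimensional latent. It is an assumption that this matches Cobolt's implementation in detail; the function name and the example numbers are hypothetical, and no neural network encoders are included.

```python
def product_of_experts(experts, include_prior=True):
    """Fuse 1-D Gaussian experts given as (mean, variance) pairs.

    The product of Gaussians is Gaussian: precisions add, and the fused mean
    is the precision-weighted average of the expert means.
    """
    if include_prior:
        experts = [(0.0, 1.0)] + list(experts)  # standard normal prior expert
    precision = sum(1.0 / var for _, var in experts)
    mean = sum(mu / var for mu, var in experts) / precision
    return mean, 1.0 / precision

# A cell measured in both modalities (hypothetical encoder outputs).
joint = product_of_experts([(1.0, 1.0), (3.0, 1.0)], include_prior=False)

# A cell with expression data only: the missing modality is simply left out.
rna_only = product_of_experts([(1.0, 1.0)])
```

Dropping an expert rather than imputing it is what makes this rule convenient for combining joint-modality and single-modality datasets in one latent space.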


Author(s):  
C Ganier ◽  
X Du-Harpur ◽  
N Harun ◽  
B Wan ◽  
C Arthurs ◽  
...  

Abstract: Coronavirus disease 2019 (COVID-19) is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and is associated with a wide range of systemic manifestations. Several observations support a role for vascular endothelial dysfunction in the pathogenesis, including an increased incidence of thrombotic events and coagulopathy and the presence of vascular risk factors as an independent predictor of poor prognosis. It has recently been reported that endothelitis is associated with viral inclusion bodies, suggesting a direct role for SARS-CoV-2 in the pathogenesis. The ACE2 receptor has been shown to mediate SARS-CoV-2 uptake, and it has been proposed that CD147 (BSG) can function as an alternative cell surface receptor. To define the endothelial cell populations that are susceptible to infection with SARS-CoV-2, we investigated the expression of ACE2 as well as other genes implicated in the cellular entry of SARS-CoV-2 in the vascular endothelium through the analysis of single-cell sequencing data derived from multiple human tissues (skin, liver, kidney, lung and intestine). We found that CD147 (BSG) but not ACE2 is detectable in vascular endothelial cells within single-cell sequencing datasets derived from multiple tissues in healthy individuals. This implies that either ACE2 is not expressed in healthy tissue but is instead induced in response to SARS-CoV-2, or that SARS-CoV-2 enters endothelial cells via an alternative receptor such as CD147.


2021 ◽  
Author(s):  
Kevin Verstaen ◽  
Ines Lammens ◽  
Jana Roels ◽  
Yvan Saeys ◽  
Bart N Lambrecht ◽  
...  

Single-cell RNA sequencing is instrumental in unraveling the cellular and transcriptomic heterogeneity of T and B cells in health and disease. Recent technological advances add further layers of information, allowing researchers to simultaneously explore the transcriptomic, surface protein and immune receptor diversity during adaptive immune responses. The increasing data complexity adds to the workload of bioinformaticians, who are often not familiar with the specificities and biology of immune receptor profiling. The wet-lab modalities and sequencing capabilities have currently outpaced bioinformatics solutions, which forms an ever-increasing barrier for many biologists to analyze their datasets. Here, we present DALI (Diversity AnaLysis Interface), a software package to identify and analyze T cell and B cell receptor diversity in high-throughput single-cell sequencing data. DALI aims to support bioinformaticians with a functional toolbox, allowing seamless integration of multimodal scRNAseq and immune receptor profiling data. The R-based package builds further on workflows using the Seurat package and other existing tools for BCR/TCR analyses. In addition, DALI is designed to engage immunologists having limited coding experience with their data, using a browser-based interactive graphical user interface. The implementation of DALI can effectively lead to a two-way communication between wet-lab scientists and bioinformaticians to advance the analysis of complex datasets.
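As a small illustration of what "receptor diversity" can mean quantitatively, the sketch below computes the Shannon diversity of clonotype labels across cells, together with the fraction of cells belonging to expanded clonotypes. These are generic repertoire statistics chosen for illustration; the function names and example labels are hypothetical and do not reflect DALI's interface.

```python
import math
from collections import Counter

def shannon_diversity(clonotypes):
    """Shannon entropy (natural log) of clonotype frequencies across cells."""
    counts = Counter(clonotypes)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def expanded_fraction(clonotypes):
    """Fraction of cells whose clonotype is shared with at least one other cell."""
    counts = Counter(clonotypes)
    return sum(c for c in counts.values() if c > 1) / len(clonotypes)

# One clonotype label per cell (hypothetical labels).
cells = ["clone1", "clone1", "clone1", "clone2", "clone3", "clone4"]
diversity = shannon_diversity(cells)
expansion = expanded_fraction(cells)
```

A repertoire dominated by a few expanded clones yields low entropy and a high expanded fraction, whereas a repertoire of unique receptors yields the maximum entropy, the natural log of the number of cells.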

