Normalizing and denoising protein expression data from droplet-based single cell profiling

Author(s):  
Matthew P. Mulè ◽  
Andrew J. Martins ◽  
John S. Tsang

Abstract: Recent methods enable simultaneous measurement of protein expression with the transcriptome in single cells by combining protein labeling with DNA barcoded antibodies followed by droplet-based single cell capture and sequencing (e.g. CITE-seq). While data normalization and denoising have received considerable attention for single cell RNA-seq data, such methods for protein data have been less explored. Here we showed that a major source of noise in CITE-seq data originated from unbound antibody encapsulated in droplets. We also found that the counts of isotype controls and those of the “negative” population inferred from all protein counts of each cell are significantly correlated, suggesting that their covariation likely reflects cell-to-cell differences due to technical factors such as non-specific antibody binding and droplet-to-droplet differences in capture efficiency of the DNA tags. Motivated by these observations, we developed a normalization method for CITE-seq protein expression data called Denoised and Scaled by Background (DSB). DSB corrects for 1) protein-specific background noise as reflected by empty droplets and 2) the technical cell-to-cell variation as captured by the latent noise component described above. DSB normalization improves separation between positive and negative populations for each protein, centers the negative-staining population around zero, and can improve unbiased protein expression-based clustering. DSB is available through the open source R package “DSB” via a single function call and can be readily integrated with existing single cell analysis workflows, including those in Bioconductor and Seurat.
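To make the first of the two corrections concrete, here is a minimal Python sketch of rescaling each protein by the background seen in empty droplets. It is an illustration only, not the R package's interface: the function name `ambient_correct`, the dictionary layout, the `pseudocount` value, and the toy counts are all assumptions, and the second correction (the per-cell technical component derived from isotype controls and the inferred negative population) is deliberately left out.

```python
import math
import statistics


def ambient_correct(cell_counts, empty_counts, pseudocount=1.0):
    """Rescale each protein by the log-scale mean and SD of empty droplets.

    cell_counts and empty_counts map a protein name to a list of raw counts,
    one per cell-containing droplet or per empty droplet respectively.
    The pseudocount is an assumed value chosen for this sketch.
    """
    corrected = {}
    for protein, counts in cell_counts.items():
        # Background distribution for this protein, on the log scale.
        background = [math.log(c + pseudocount) for c in empty_counts[protein]]
        mu = statistics.mean(background)
        sd = statistics.stdev(background)
        # Express each cell's value in SDs above the empty-droplet mean.
        corrected[protein] = [
            (math.log(c + pseudocount) - mu) / sd for c in counts
        ]
    return corrected


# Hypothetical toy data: two background-level cells and two CD3-positive cells.
cells = {"CD3": [4, 6, 250, 300], "IgG1_isotype": [3, 5, 4, 6]}
empty = {"CD3": [3, 5, 4, 6, 5], "IgG1_isotype": [2, 4, 3, 5, 4]}
normalized = ambient_correct(cells, empty)
```

In this toy input the two background-level cells land near zero for CD3 while the two high-count cells sit many standard deviations above it, which is the "negative population centered around zero" behaviour the abstract describes.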

2021 ◽  
Author(s):  
Zachary J. DeBruine ◽  
Karsten Melcher ◽  
Timothy J. Triche

Abstract: Non-negative matrix factorization (NMF) is an intuitively appealing method to extract additive combinations of measurements from noisy or complex data. NMF is applied broadly to text and image processing, time-series analysis, and genomics, where recent technological advances permit sequencing experiments to measure the representation of tens of thousands of features in millions of single cells. In these experiments, a count of zero for a given feature in a given cell may indicate either the absence of that feature or an insufficient read coverage to detect that feature (“dropout”). In contrast to spectral decompositions such as the Singular Value Decomposition (SVD), the strictly positive imputation of signal by NMF is an ideal fit for single-cell data with ambiguous zeroes. Nevertheless, most single-cell analysis pipelines apply SVD or Principal Component Analysis (PCA) on transformed counts because these implementations are fast while current NMF implementations are slow. To address this need, we present an accessible NMF implementation that is much faster than PCA and rivals the runtimes of state-of-the-art SVD. NMF models learned with our implementation from raw count matrices yield intuitive summaries of complex biological processes, capturing coordinated gene activity and enrichment of sample metadata. Our NMF implementation, available in the RcppML (Rcpp Machine Learning library) R package, improves upon current NMF implementations by introducing a scaling diagonal to enable convex L1 regularization for feature engineering, reproducible factor scalings, and symmetric factorizations. RcppML NMF easily handles sparse datasets with millions of samples, making NMF an attractive replacement for PCA in the analysis of single-cell experiments.
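For readers unfamiliar with NMF, the following sketch factorizes a small count matrix V into non-negative factors W and H using the textbook multiplicative-update rule. This is not the RcppML solver and does not include its scaling diagonal or L1 regularization; the function names, the toy matrix, and the iteration count are assumptions made purely for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum(
        (v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0]))
    )


def nmf_multiplicative(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Approximate V (n x m) as W (n x rank) times H (rank x m), all >= 0."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random start so the updates can move every entry.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    initial_error = frobenius_error(v, w, h)
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise.
        wt = transpose(w)
        numer = matmul(wt, v)
        denom = matmul(matmul(wt, w), h)
        h = [[h[k][j] * numer[k][j] / (denom[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        numer = matmul(v, ht)
        denom = matmul(w, matmul(h, ht))
        w = [[w[i][k] * numer[i][k] / (denom[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h, initial_error


# Hypothetical counts: rows are cells, columns are features, two additive programs.
counts = [
    [5, 4, 0, 0],
    [10, 8, 0, 0],
    [0, 0, 3, 6],
    [0, 0, 1, 2],
    [5, 4, 3, 6],
]
w, h, initial_error = nmf_multiplicative(counts, rank=2)
final_error = frobenius_error(counts, w, h)
```

Because every update multiplies by a ratio of non-negative quantities, the factors never become negative, which is the property that makes the result read as additive combinations of features.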


2021 ◽  
Vol 39 (15_suppl) ◽  
pp. e20013-e20013
Author(s):  
Sanjay deMel ◽  
Jonathan Scolnick ◽  
Xiaojing Huo ◽  
Cinnie Soekojo ◽  
Fangfang Song ◽  
...  

e20013 Background: Multiple Myeloma (MM) is an incurable plasma cell (PC) malignancy and high risk (HR) MM remains an unmet clinical need. Translocation 4;14 occurs in 15% of MM and is associated with an adverse prognosis. A deeper understanding of the biology and immune micro-environment of t(4;14) MM is necessary for the development of effective targeted therapies. Single cell multi-omics provides a new tool for phenotypic characterization of MM. Here we used Proteona’s ESCAPE™ single cell multi-omics platform to study a cohort of patients with t(4;14) MM. Methods: Diagnostic bone marrow (BM) samples from 14 patients with t(4;14) MM were analysed using the ESCAPE platform from Proteona, which simultaneously measures gene and cell surface protein expression in single cells. Cryopreserved BM samples were stained with 65 DNA barcoded antibodies and subsequently sorted on CD138 expression. The CD138 positive and negative fractions were recombined at a known ratio for analysis using the 10x Genomics 3’ RNAseq kit. Resulting data were analyzed with Seurat and MapCell. Results: The patients had a median age of 63 years. All received novel agent-based induction. Median progression free and overall survival (PFS and OS) were 22 and 34 months respectively. MMSET was overexpressed in all PCs while FGFR3 expression could be categorized into zero cells expressing FGFR3, low expression (< 10% of cells expressing FGFR3) or high expression (> 80% of cells expressing FGFR3). We also found heterogeneity in the expression of cancer testis antigens (CTA) such as FA133A and CTAG2 between PC clusters across samples. Variation in the immune microenvironment of the BM was seen across all patient samples with no correlation between cell types and PFS or OS. However, an analysis of BM samples at diagnosis and relapse in one patient showed a shift in the ratio of T cells to CD14 monocytes with a ratio of 5.7 at diagnosis compared to 0.6 at relapse. Further analysis of PCs in this patient found 8 PC populations, each containing variable numbers of cells from both the diagnostic and relapse samples. This suggests that all populations present at relapse were also present at diagnosis, although at variable proportions. Increased expression of RCAN3 (associated with cereblon depletion) was detected at relapse. Conclusions: We present the first application of single cell multi-omics immune profiling in high risk MM. The heterogeneity in expression of CTA has implications for the application of immunotherapies, while the upregulation of RCAN3 may explain failure of immunomodulatory therapy. Our small sample size may explain the lack of correlation between gene or protein expression and clinical outcomes. We propose that t(4;14) MM is a genomically and immunologically heterogeneous disease. Single cell analysis of larger cohorts is required to build on our findings.


Micromachines ◽  
2019 ◽  
Vol 10 (6) ◽  
pp. 409 ◽  
Author(s):  
Bing Deng ◽  
Heyi Wang ◽  
Zhaoyi Tan ◽  
Yi Quan

Single-cell capture microfluidic chips offer many advantages, including low cost, high throughput, easy manufacturing, integration, non-toxicity and good stability. Because of these characteristics, they are increasingly becoming an important platform for life science research and pharmaceutical analysis. Single-cell analysis promises the pairing, fusion and disruption of captured single cells and the analysis of their intracellular components. Capture based on fluid dynamics is an important way to realize these operations in microfluidic chips. The aim of this study was to compare the ability of three fluid dynamics-based microfluidic chip structures to capture cells. We compared cell growth and distribution after capture by the different chip structures, as well as the subsequent observation and analysis of single cells on the chip. The experimental results show that the U-shaped structure is the most suitable for single-cell capture. It enables single-cell capture as well as long-term continuous culture and observation of the captured cells. In contrast, cells captured by the microcavity structure easily overlapped during culture, which affected the subsequent single-cell analysis. The flow shortcut structure can also be used to capture and observe single cells; however, the fluid shear force caused by the chip structure is likely to deform the cultured cells. By comparing the cell capture efficiency of the three chips, the reagent loss during culture and the growth state of the captured cells, we provide theoretical support for the design of single-cell capture microfluidic chips and a reference for future studies of single-cell capture.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Leonardo Morelli ◽  
Valentina Giansanti ◽  
Davide Cittaro

Abstract: Single cell profiling has proven to be a powerful tool in molecular biology for understanding the complex behaviours of heterogeneous systems. The definition of the properties of single cells is the primary endpoint of such analysis; cells are typically clustered to underpin the common determinants that can be used to describe functional properties of the cell mixture under investigation. Several approaches have been proposed to identify cell clusters; while this is a matter of active research, one popular approach is based on community detection in neighbourhood graphs by optimisation of modularity. In this paper we propose an alternative and principled solution to this problem, based on Stochastic Block Models. We show that such an approach is not only suitable for identification of cell groups but also provides a solid framework to perform other relevant tasks in single cell analysis, such as label transfer. To encourage the use of Stochastic Block Models, we developed a python library, , that is compatible with the popular framework.
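As background for the modularity-based approach that this abstract contrasts with Stochastic Block Models, the sketch below computes the standard (Newman) modularity score of a given partition of an unweighted, undirected graph. It illustrates the quantity being optimised in the popular approach, not the authors' method or library; the function name `modularity` and the toy graph are assumptions.

```python
from collections import defaultdict


def modularity(edges, membership):
    """Modularity Q of a partition of an unweighted, undirected graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    membership: dict mapping every node to its community label.
    Q = sum over communities of  L_c / m - (d_c / (2 m))^2,
    where m is the number of edges, L_c the number of edges inside
    community c and d_c the summed degree of its nodes.
    """
    m = len(edges)
    degree = defaultdict(int)
    internal = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if membership[u] == membership[v]:
            internal[membership[u]] += 1
    degree_sum = defaultdict(int)
    for node, d in degree.items():
        degree_sum[membership[node]] += d
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Hypothetical neighbourhood graph: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "all" for node in range(6)}

q_split = modularity(edges, two_groups)
q_single = modularity(edges, one_group)
```

Splitting the graph into its two triangles scores higher than lumping all nodes together, which is why a modularity optimiser would prefer the two-cluster solution here.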


2019 ◽  
Author(s):  
Johan Reimegård ◽  
Marcus Danielsson ◽  
Marcel Tarbier ◽  
Jens Schuster ◽  
Sathishkumar Baskaran ◽  
...  

Abstract: Combined measurements of mRNA and protein expression in single cells enable in-depth analysis of cellular states. We present single-cell protein and RNA co-profiling (SPARC), an approach to simultaneously measure global mRNA and large sets of intracellular protein in individual cells. Using SPARC, we show that mRNA expression fails to accurately reflect protein abundance at the time of measurement in human embryonic stem cells, although the directions of change in mRNA and protein expression agree during cellular differentiation. Moreover, protein levels of transcription factors better predict their downstream effects than do the corresponding transcripts. We further show that changes in the balance between protein and mRNA expression levels can be applied to infer expression kinetic trajectories, revealing future states of individual cells. Finally, we highlight that mRNA expression may be more varied among cells than levels of the corresponding proteins. Overall, our results demonstrate that mRNA and protein measurements in single cells provide different and complementary information regarding cell states. Accordingly, SPARC can offer valuable insights into gene expression programs of single cells.


2018 ◽  
Author(s):  
Stephanie M. Linker ◽  
Lara Urban ◽  
Stephen Clark ◽  
Mariya Chhatriwala ◽  
Shradha Amatya ◽  
...  

Abstract: Background: Alternative splicing is a key regulatory mechanism in eukaryotic cells and increases the effective number of functionally distinct gene products. Using bulk RNA sequencing, splicing variation has been studied across human tissues and in genetically diverse populations. This has identified disease-relevant splicing events, as well as associations between splicing and genomic variations, including sequence composition and conservation. However, variability in splicing between single cells from the same tissue or cell type and its determinants remain poorly understood. Results: We applied parallel DNA methylation and transcriptome sequencing to differentiating human induced pluripotent stem cells to characterize splicing variation (exon skipping) and its determinants. Our results show that variation in single-cell splicing can be accurately predicted based on local sequence composition and genomic features. We observe moderate but consistent contributions from local DNA methylation profiles to splicing variation across cells. A combined model that is built based on sequence as well as DNA methylation information accurately predicts different splicing modes of individual cassette exons (AUC=0.85). These categories include the conventional inclusion and exclusion patterns, but also more subtle modes of cell-to-cell variation in splicing. Finally, we identified and characterized associations between DNA methylation and splicing changes during cell differentiation. Conclusions: Our study yields new insights into alternative splicing at the single-cell level and reveals a previously underappreciated link between DNA methylation variation and splicing.


2021 ◽  
Author(s):  
Konrad Thorner ◽  
Aaron M. Zorn ◽  
Praneet Chaturvedi

Abstract: Annotation of single cells has become an important step in the single cell analysis framework. With advances in sequencing technology, thousands to millions of cells can be processed to understand the intricacies of the biological system in question. Annotation through manual curation of markers based on a priori knowledge is cumbersome given this exponential growth. There are currently ~200 computational tools available to help researchers automatically annotate single cells using supervised/unsupervised machine learning, cell type markers, or tissue-based markers from bulk RNA-seq. But with the expansion of publicly available data there is also a need for a tool which can help integrate multiple references into a unified atlas and understand how annotations between datasets compare. Here we present ELeFHAnt: Ensemble learning for harmonization and annotation of single cells. ELeFHAnt is an easy-to-use R package that employs support vector machine and random forest algorithms together to perform three main functions: 1) CelltypeAnnotation, 2) LabelHarmonization and 3) DeduceRelationship. CelltypeAnnotation is a function to annotate cells in a query Seurat object using a reference Seurat object with annotated cell types. LabelHarmonization can be utilized to integrate multiple cell atlases (references) into a unified cellular atlas with harmonized cell types. Finally, DeduceRelationship is a function that compares cell types between two scRNA-seq datasets. ELeFHAnt can be accessed from GitHub at https://github.com/praneet1988/ELeFHAnt.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Jeremy A. Lombardo ◽  
Marzieh Aliaghaei ◽  
Quy H. Nguyen ◽  
Kai Kessenbrock ◽  
Jered B. Haun

Abstract: Tissues are complex mixtures of different cell subtypes, and this diversity is increasingly characterized using high-throughput single cell analysis methods. However, these efforts are hindered, as tissues must first be dissociated into single cell suspensions using methods that are often inefficient, labor-intensive, highly variable, and potentially biased towards certain cell subtypes. Here, we present a microfluidic platform consisting of three tissue processing technologies that combine tissue digestion, disaggregation, and filtration. The platform is evaluated using a diverse array of tissues. For kidney and mammary tumor, microfluidic processing produces 2.5-fold more single cells. Single cell RNA sequencing further reveals that endothelial cells, fibroblasts, and basal epithelium are enriched without affecting stress response. For liver and heart, processing time is dramatically reduced. We also demonstrate that recovery of cells from the system at periodic intervals during processing increases hepatocyte and cardiomyocyte numbers, as well as increases reproducibility from batch-to-batch for all tissues.


2019 ◽  
Vol 116 (13) ◽  
pp. 5979-5984 ◽  
Author(s):  
Yahui Ji ◽  
Dongyuan Qi ◽  
Linmei Li ◽  
Haoran Su ◽  
Xiaojie Li ◽  
...  

Extracellular vesicles (EVs) are important intercellular mediators regulating health and diseases. Conventional methods for EV surface marker profiling, which were based on population measurements, masked the cell-to-cell heterogeneity in the quantity and phenotypes of EV secretion. Herein, by using spatially patterned antibody barcodes, we realized multiplexed profiling of single-cell EV secretion from more than 1,000 single cells simultaneously. Applying this platform to profile human oral squamous cell carcinoma (OSCC) cell lines led to a deep understanding of previously undifferentiated single-cell heterogeneity underlying EV secretion. Notably, we observed that the decrement of certain EV phenotypes (e.g., CD63+ EV) was associated with the invasive feature of both OSCC cell lines and primary OSCC cells. We also realized multiplexed detection of EV secretion and cytokine secretion simultaneously from the same single cells to investigate the multidimensional spectrum of cellular communications, from which we resolved tiered functional subgroups with distinct secretion profiles by visualized clustering and principal component analysis. In particular, we found that different cell subgroups dominated EV secretion and cytokine secretion. The technology introduced here enables a comprehensive evaluation of EV secretion heterogeneity at the single-cell level, which may become an indispensable tool to complement current single-cell analysis and EV research.


eLife ◽  
2013 ◽  
Vol 2 ◽  
Author(s):  
Daniel R Larson ◽  
Christoph Fritzsch ◽  
Liang Sun ◽  
Xiuhau Meng ◽  
David S Lawrence ◽  
...  

Single-cell analysis has revealed that transcription is dynamic and stochastic, but tools are lacking that can determine the mechanism operating at a single gene. Here we utilize single-molecule observations of RNA in fixed and living cells to develop a single-cell model of steroid-receptor mediated gene activation. We determine that steroids drive mRNA synthesis by frequency modulation of transcription. This digital behavior in single cells gives rise to the well-known analog dose response across the population. To test this model, we developed a light-activation technology to turn on a single steroid-responsive gene and follow dynamic synthesis of RNA from the activated locus.

