Fast and robust non-negative matrix factorization for single-cell experiments

2021 ◽  
Author(s):  
Zachary J. DeBruine ◽  
Karsten Melcher ◽  
Timothy J. Triche

Non-negative matrix factorization (NMF) is an intuitively appealing method to extract additive combinations of measurements from noisy or complex data. NMF is applied broadly to text and image processing, time-series analysis, and genomics, where recent technological advances permit sequencing experiments to measure tens of thousands of features in millions of single cells. In these experiments, a count of zero for a given feature in a given cell may indicate either the absence of that feature or read coverage insufficient to detect it (“dropout”). In contrast to spectral decompositions such as the Singular Value Decomposition (SVD), NMF imputes signal with non-negative values, which makes it a natural fit for single-cell data with ambiguous zeroes. Nevertheless, most single-cell analysis pipelines apply SVD or Principal Component Analysis (PCA) to transformed counts because these implementations are fast, while current NMF implementations are slow. To address this need, we present an accessible NMF implementation that is much faster than PCA and rivals the runtimes of state-of-the-art SVD. NMF models learned with our implementation from raw count matrices yield intuitive summaries of complex biological processes, capturing coordinated gene activity and enrichment of sample metadata. Our NMF implementation, available in the RcppML (Rcpp Machine Learning library) R package, improves upon current NMF implementations by introducing a scaling diagonal that enables convex L1 regularization for feature engineering, reproducible factor scalings, and symmetric factorizations. RcppML NMF easily handles sparse datasets with millions of samples, making NMF an attractive replacement for PCA in the analysis of single-cell experiments.
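The idea of a scaling diagonal can be illustrated with a small pure-Python sketch that factors a non-negative matrix as A ≈ W·diag(d)·H. This is not RcppML's solver: the sketch assumes Lee and Seung multiplicative updates, applies the same L1 penalty `l1` to both W and H, and uses hypothetical helper names (`nmf_scaled`, `matmul`, `transpose`). After every iteration, the columns of W and rows of H are rescaled to sum to 1 and the removed scale is moved into d, so the product is unchanged while the penalty acts on factors of comparable magnitude.

```python
import random


def matmul(X, Y):
    """Dense product of list-of-lists matrices X (a x b) and Y (b x c)."""
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def nmf_scaled(A, k, l1=0.0, iters=200, seed=0, eps=1e-10):
    """Hypothetical sketch: approximate non-negative A (m x n) as W diag(d) H.

    Uses multiplicative updates with an L1 penalty (assumption: same l1 on
    W and H), then normalizes factors to unit sum and stores scale in d.
    """
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() for _ in range(k)] for _ in range(m)]
    H = [[rng.random() for _ in range(n)] for _ in range(k)]
    d = [1.0] * k
    for _ in range(iters):
        # Update H with W' = W diag(d) held fixed.
        Wd = [[W[i][f] * d[f] for f in range(k)] for i in range(m)]
        Wdt = transpose(Wd)
        num = matmul(Wdt, A)
        den = matmul(matmul(Wdt, Wd), H)
        H = [[H[f][j] * num[f][j] / (den[f][j] + l1 + eps) for j in range(n)]
             for f in range(k)]
        # Update W with H' = diag(d) H held fixed.
        dH = [[d[f] * H[f][j] for j in range(n)] for f in range(k)]
        dHt = transpose(dH)
        num = matmul(A, dHt)
        den = matmul(W, matmul(dH, dHt))
        W = [[W[i][f] * num[i][f] / (den[i][f] + l1 + eps) for f in range(k)]
             for i in range(m)]
        # Move each factor's scale into d; W diag(d) H is unchanged.
        for f in range(k):
            sw = sum(W[i][f] for i in range(m))
            sh = sum(H[f])
            if sw > 0 and sh > 0:
                for i in range(m):
                    W[i][f] /= sw
                H[f] = [h / sh for h in H[f]]
                d[f] *= sw * sh
    return W, d, H
```

Setting `l1 > 0` shrinks small entries of W and H toward zero; because every factor is normalized before the next update, the same penalty has a comparable effect on each factor rather than depending on its arbitrary scale.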

2019 ◽  
Vol 116 (13) ◽  
pp. 5979-5984 ◽  
Author(s):  
Yahui Ji ◽  
Dongyuan Qi ◽  
Linmei Li ◽  
Haoran Su ◽  
Xiaojie Li ◽  
...  

Extracellular vesicles (EVs) are important intercellular mediators regulating health and disease. Conventional methods for EV surface marker profiling, which are based on population measurements, mask the cell-to-cell heterogeneity in the quantity and phenotypes of EV secretion. Here, by using spatially patterned antibody barcodes, we achieved multiplexed profiling of single-cell EV secretion from more than 1,000 single cells simultaneously. Applying this platform to profile human oral squamous cell carcinoma (OSCC) cell lines revealed previously unresolved single-cell heterogeneity underlying EV secretion. Notably, we observed that a decrease in certain EV phenotypes (e.g., CD63+ EVs) was associated with the invasive features of both OSCC cell lines and primary OSCC cells. We also achieved simultaneous multiplexed detection of EV secretion and cytokine secretion from the same single cells to investigate the multidimensional spectrum of cellular communication, resolving tiered functional subgroups with distinct secretion profiles through visualized clustering and principal component analysis. In particular, we found that different cell subgroups dominated EV secretion and cytokine secretion. The technology introduced here enables a comprehensive evaluation of EV secretion heterogeneity at the single-cell level and may become an indispensable tool to complement current single-cell analysis and EV research.


2019 ◽  
Author(s):  
Wu Liu ◽  
Mehmet U. Caglar ◽  
Zhangming Mao ◽  
Andrew Woodman ◽  
Jamie J. Arnold ◽  
...  

Development of antiviral therapeutics emphasizes minimizing the effective dose and maximizing the toxic dose, first in cell culture and later in animal models. The long-term success of an antiviral therapeutic is determined not only by its efficacy but also by the time required for drug resistance to evolve. We have developed a microfluidic device composed of ~6000 wells, each containing a microstructure to capture single cells. We have used this device to characterize enterovirus inhibitors with distinct mechanisms of action. In contrast to population methods, single-cell analysis reveals that each class of inhibitor interferes with the viral infection cycle in a manner that can be distinguished by principal component analysis. Single-cell analysis of antiviral candidates reveals not only efficacy but also properties of the members of the viral population most sensitive to the drug, the stage of the lifecycle most affected by the drug, and perhaps even whether the drug targets an interaction of the virus with its host.


2020 ◽  
Vol 3 (1) ◽  
Author(s):  
Maeve O’Huallachain ◽  
Felice-Alessio Bava ◽  
Mary Shen ◽  
Carolina Dallett ◽  
Sri Paladugu ◽  
...  

Single-cell omics provide insight into cellular heterogeneity and function. Recent technological advances have accelerated single-cell analyses, but workflows remain expensive and complex. We present a method enabling simultaneous, ultra-high throughput single-cell barcoding of millions of cells for targeted analysis of proteins and RNAs. Quantum barcoding (QBC) avoids isolation of single cells by building cell-specific oligo barcodes dynamically within each cell. With minimal instrumentation (four 96-well plates and a multichannel pipette), cell-specific codes are added to each tagged molecule within cells through sequential rounds of classical split-pool synthesis. Here we show the utility of this technology in mouse and human model systems for as many as 50 antibodies to targeted proteins and, separately, >70 targeted RNA regions. We demonstrate that this method can be applied to multi-modal protein and RNA analyses. It can be scaled by expanding the split-pool process and effectively turns sequencing instruments into versatile multi-parameter flow cytometers.
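The split-pool principle can be sketched as a small simulation: in each round a cell lands in one well and receives that well's code, so R rounds over 96 wells give 96^R possible barcodes. This is a hypothetical illustration, not the QBC protocol; it assumes, for illustration only, that the four 96-well plates correspond to four rounds and that cells distribute uniformly across wells. The names `split_pool`, `collision_stats`, and `expected_collision_fraction` are invented for this sketch.

```python
import random
from collections import Counter


def split_pool(n_cells, rounds=4, wells=96, seed=0):
    """Simulate split-pool barcoding: each round, every cell lands in a
    uniformly random well; its barcode is the sequence of wells visited."""
    rng = random.Random(seed)
    return [tuple(rng.randrange(wells) for _ in range(rounds))
            for _ in range(n_cells)]


def collision_stats(barcodes):
    """Return (distinct barcodes, fraction of cells sharing a barcode)."""
    counts = Counter(barcodes)
    shared = sum(c for c in counts.values() if c > 1)
    return len(counts), shared / len(barcodes)


def expected_collision_fraction(n_cells, rounds=4, wells=96):
    """Probability that a given cell shares its barcode with another cell."""
    space = wells ** rounds  # 96**4 = 84,934,656 possible barcodes
    return 1 - (1 - 1 / space) ** (n_cells - 1)
```

Under these assumptions, `expected_collision_fraction(1_000_000)` is roughly 0.012, so about 1.2% of a million cells would share a barcode with another cell; each additional round divides the collision rate by roughly the number of wells.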


Micromachines ◽  
2019 ◽  
Vol 10 (5) ◽  
pp. 311 ◽  
Author(s):  
Iordania Constantinou ◽  
Michael Jendrusch ◽  
Théo Aspert ◽  
Frederik Görlitz ◽  
André Schulze ◽  
...  

Single-cell analysis commonly requires the confinement of cell suspensions in an analysis chamber or the precise positioning of single cells in small channels. Hydrodynamic flow focusing has been broadly utilized to achieve stream confinement in microchannels for such applications. As imaging flow cytometry gains popularity, the need for imaging-compatible microfluidic devices that allow for precise confinement of single cells in small volumes becomes increasingly important. At the same time, high-throughput single-cell imaging of cell populations produces vast amounts of complex data, which gives rise to the need for versatile algorithms for image analysis. In this work, we present a microfluidics-based platform for single-cell imaging in-flow and subsequent image analysis using variational autoencoders for unsupervised characterization of cellular mixtures. We use simple and robust Y-shaped microfluidic devices and demonstrate precise 3D particle confinement towards the microscope slide for high-resolution imaging. To demonstrate applicability, we use these devices to confine heterogeneous mixtures of yeast species, brightfield-image them in-flow and demonstrate fully unsupervised, as well as few-shot classification of single-cell images with 88% accuracy.


2021 ◽  
Author(s):  
Haotian Zhuang ◽  
Zhicheng Ji

Principal component analysis (PCA) is widely used in analyzing single-cell genomic data. Selecting the optimal number of PCs is a crucial step for downstream analyses. The elbow method is most commonly used for this task, but it requires one to visually inspect the elbow plot and manually choose the elbow point. To address this limitation, we developed six methods to automatically select the optimal number of PCs based on the elbow method. We evaluated the performance of these methods on real single-cell RNA-seq data from multiple human and mouse tissues. The perpendicular line method, applied to the first 20 PCs, has the best overall performance, and its results are highly consistent with the numbers of PCs identified manually. We implemented the six methods in an R package, findPC, that objectively selects the number of PCs and can be easily incorporated into any automatic analysis pipeline.
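A minimal sketch of the perpendicular line idea: join the first and last points of the scree curve with a straight line and select the PC farthest from it. This is an illustrative reimplementation, not findPC's code; the function name `elbow_perpendicular` is hypothetical, and the sketch measures distance on the raw axes (PC index versus variance or standard deviation) without rescaling, which findPC may handle differently.

```python
import math


def elbow_perpendicular(values):
    """Return the 1-based index of the scree-curve point farthest from the
    straight line joining the first and last points (hypothetical helper)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three points")
    x1, y1, x2, y2 = 1, values[0], n, values[-1]
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    best_i, best_dist = 1, -1.0
    for i, y in enumerate(values, start=1):
        # Perpendicular distance from (i, y) to the line through both endpoints.
        dist = abs(dy * i - dx * y + x2 * y1 - y2 * x1) / length
        if dist > best_dist:
            best_i, best_dist = i, dist
    return best_i
```

For example, `elbow_perpendicular([10, 5, 2, 1.8, 1.6, 1.4, 1.2, 1.0])` selects PC 3, where the curve bends most sharply; a perfectly linear curve has no elbow and the function falls back to the first PC.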


2021 ◽  
Author(s):  
Konrad Thorner ◽  
Aaron M. Zorn ◽  
Praneet Chaturvedi

Annotation of single cells has become an important step in the single cell analysis framework. With advances in sequencing technology thousands to millions of cells can be processed to understand the intricacies of the biological system in question. Annotation through manual curation of markers based on a priori knowledge is cumbersome given this exponential growth. There are currently ~200 computational tools available to help researchers automatically annotate single cells using supervised/unsupervised machine learning, cell type markers, or tissue-based markers from bulk RNA-seq. But with the expansion of publicly available data there is also a need for a tool which can help integrate multiple references into a unified atlas and understand how annotations between datasets compare. Here we present ELeFHAnt: Ensemble learning for harmonization and annotation of single cells. ELeFHAnt is an easy-to-use R package that employs support vector machine and random forest algorithms together to perform three main functions: 1) CelltypeAnnotation 2) LabelHarmonization 3) DeduceRelationship. CelltypeAnnotation is a function to annotate cells in a query Seurat object using a reference Seurat object with annotated cell types. LabelHarmonization can be utilized to integrate multiple cell atlases (references) into a unified cellular atlas with harmonized cell types. Finally, DeduceRelationship is a function that compares cell types between two scRNA-seq datasets. ELeFHAnt can be accessed from GitHub at https://github.com/praneet1988/ELeFHAnt.


Author(s):  
Matthew P. Mulè ◽  
Andrew J. Martins ◽  
John S. Tsang

Recent methods enable simultaneous measurement of protein expression and the transcriptome in single cells by combining protein labeling with DNA-barcoded antibodies followed by droplet-based single-cell capture and sequencing (e.g., CITE-seq). While data normalization and denoising have received considerable attention for single-cell RNA-seq data, such methods for protein data have been less explored. Here we show that a major source of noise in CITE-seq data originates from unbound antibody encapsulated in droplets. We also found that the counts of isotype controls and those of the “negative” population inferred from all protein counts of each cell are significantly correlated, suggesting that their covariation likely reflects cell-to-cell differences due to technical factors such as non-specific antibody binding and droplet-to-droplet differences in capture efficiency of the DNA tags. Motivated by these observations, we developed a normalization method for CITE-seq protein expression data called Denoised and Scaled by Background (DSB). DSB corrects for 1) protein-specific background noise as reflected by empty droplets and 2) technical cell-to-cell variation as captured by the latent noise component described above. DSB normalization improves separation between positive and negative populations for each protein, centers the negative-staining population around zero, and can improve unbiased protein expression-based clustering. DSB is available through the open source R package “DSB” via a single function call and can be readily integrated with existing single-cell analysis workflows, including those in Bioconductor and Seurat.
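The two DSB correction steps can be approximated in a short pure-Python sketch. This is not the DSB package's implementation: the pseudocount, the use of a population standard deviation, and the technical covariate are assumptions. In particular, the method described above combines isotype controls with the inferred “negative” population of each cell, whereas this simplified sketch uses only the per-cell mean of isotype controls. The function names `background_standardize` and `remove_technical` are hypothetical.

```python
import math
from statistics import mean, pstdev


def background_standardize(cells, empty, pseudocount=1.0):
    """Step 1 (sketch): log-transform counts and express each protein in
    standard deviations above its mean level in empty droplets.

    cells, empty: lists of dicts mapping protein name -> raw count.
    """
    proteins = list(cells[0])
    stats = {}
    for p in proteins:
        bg = [math.log(drop[p] + pseudocount) for drop in empty]
        sd = pstdev(bg)
        stats[p] = (mean(bg), sd if sd > 0 else 1.0)
    return [{p: (math.log(c[p] + pseudocount) - stats[p][0]) / stats[p][1]
             for p in proteins} for c in cells]


def remove_technical(norm, isotypes):
    """Step 2 (simplified): regress each non-isotype protein on a per-cell
    technical covariate (here, the mean of the isotype controls) and keep
    the residuals, preserving each protein's mean across cells."""
    tech = [mean(c[i] for i in isotypes) for c in norm]
    t_bar = mean(tech)
    var_t = sum((t - t_bar) ** 2 for t in tech)
    out = [dict(c) for c in norm]
    for p in norm[0]:
        if p in isotypes or var_t == 0:
            continue
        y = [c[p] for c in norm]
        y_bar = mean(y)
        beta = sum((t - t_bar) * (v - y_bar) for t, v in zip(tech, y)) / var_t
        for c, t in zip(out, tech):
            c[p] -= beta * (t - t_bar)
    return out
```

After step 1, a value of 0 means a protein is detected at the level typical of empty droplets, which is why the negative-staining population ends up centered near zero.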


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Xiao Li ◽  
Guangjie Zeng ◽  
Angsheng Li ◽  
Zhihua Zhang

Topologically associating domains (TADs) are a key structure of the 3D mammalian genome. However, the prevalence and dynamics of TAD-like domains in single cells remain elusive. Here we develop a new algorithm, named deTOKI, to decode TAD-like domains from single-cell Hi-C data. Using non-negative matrix factorization, deTOKI seeks regions that insulate the genome into blocks with minimal chance of clustering. deTOKI outperforms competing tools and reliably identifies TAD-like domains in single cells. Finally, we find that TAD-like domains are not only prevalent but also subject to tight regulation in single cells.


2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Yixin Kong ◽  
Ariangela Kozik ◽  
Cindy H. Nakatsu ◽  
Yava L. Jones-Hall ◽  
Hyonho Chun

A latent factor model for count data is widely applied to deconvolve mixed signals in biological data, as exemplified by sequencing data from transcriptome or microbiome studies. The availability of pure samples, such as single-cell transcriptome data, could greatly improve the accuracy of the estimates. However, this advantage quickly disappears in the presence of excessive zeros. To correctly account for this phenomenon in both mixed and pure samples, we propose a zero-inflated non-negative matrix factorization and derive an effective multiplicative parameter updating rule. In simulation studies, our method yielded the smallest bias. We applied our approach to brain gene expression as well as fecal microbiome datasets, illustrating the superior performance of the approach. Our method is implemented as a publicly available R package, iNMF.
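To illustrate how a multiplicative update can account for excess zeros, here is a hedged pure-Python sketch of zero-inflated Poisson NMF fitted by a generalized EM loop. It is not the update rule derived for iNMF: it assumes a Poisson count model with a single global zero-inflation probability `pi`, computes posterior weights `Q` that down-weight zeros likely to be structural, and then applies weighted Poisson (KL) multiplicative updates. The names `zip_nmf`, `rates`, and `zip_loglik` are hypothetical.

```python
import math
import random


def rates(W, H):
    """Poisson rates lambda = W @ H for list-of-lists factors."""
    k, n = len(H), len(H[0])
    return [[sum(W[i][f] * H[f][j] for f in range(k)) for j in range(n)]
            for i in range(len(W))]


def zip_nmf(X, k, iters=100, seed=0, eps=1e-12):
    """Hypothetical zero-inflated Poisson NMF via generalized EM.

    A zero in X is either structural (probability pi) or a Poisson zero;
    Q[i][j] is the posterior probability that entry (i, j) is Poisson.
    """
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    pi = 0.1
    for _ in range(iters):
        lam = rates(W, H)
        # E-step: non-zero counts are always Poisson; zeros are ambiguous.
        Q = [[1.0 if X[i][j] > 0 else
              (1 - pi) * math.exp(-lam[i][j])
              / (pi + (1 - pi) * math.exp(-lam[i][j]))
              for j in range(n)] for i in range(m)]
        # M-step for pi: expected fraction of structural zeros.
        pi = sum(1 - q for row in Q for q in row) / (m * n)
        # M-step for H: multiplicative update for the Q-weighted Poisson loss.
        R = [[Q[i][j] * X[i][j] / (lam[i][j] + eps) for j in range(n)]
             for i in range(m)]
        H = [[H[f][j] * sum(W[i][f] * R[i][j] for i in range(m))
              / (sum(W[i][f] * Q[i][j] for i in range(m)) + eps)
              for j in range(n)] for f in range(k)]
        # M-step for W, using the updated H.
        lam = rates(W, H)
        R = [[Q[i][j] * X[i][j] / (lam[i][j] + eps) for j in range(n)]
             for i in range(m)]
        W = [[W[i][f] * sum(R[i][j] * H[f][j] for j in range(n))
              / (sum(Q[i][j] * H[f][j] for j in range(n)) + eps)
              for f in range(k)] for i in range(m)]
    return W, H, pi


def zip_loglik(X, W, H, pi):
    """Observed-data log-likelihood of X under the zero-inflated model."""
    lam = rates(W, H)
    ll = 0.0
    for i, row in enumerate(X):
        for j, x in enumerate(row):
            if x == 0:
                ll += math.log(pi + (1 - pi) * math.exp(-lam[i][j]))
            else:
                ll += (math.log(1 - pi) + x * math.log(lam[i][j])
                       - lam[i][j] - math.lgamma(x + 1))
    return ll
```

Because the `pi` step is an exact maximization and the weighted multiplicative updates do not decrease the expected complete-data log-likelihood, each iteration should not decrease `zip_loglik`, which makes it a convenient convergence check.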


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Jeremy A. Lombardo ◽  
Marzieh Aliaghaei ◽  
Quy H. Nguyen ◽  
Kai Kessenbrock ◽  
Jered B. Haun

Tissues are complex mixtures of different cell subtypes, and this diversity is increasingly characterized using high-throughput single-cell analysis methods. However, these efforts are hindered because tissues must first be dissociated into single-cell suspensions using methods that are often inefficient, labor-intensive, highly variable, and potentially biased towards certain cell subtypes. Here, we present a microfluidic platform consisting of three tissue processing technologies that combine tissue digestion, disaggregation, and filtration. The platform is evaluated using a diverse array of tissues. For kidney and mammary tumor, microfluidic processing produces 2.5-fold more single cells. Single-cell RNA sequencing further reveals that endothelial cells, fibroblasts, and basal epithelium are enriched without affecting stress response. For liver and heart, processing time is dramatically reduced. We also demonstrate that recovering cells from the system at periodic intervals during processing increases hepatocyte and cardiomyocyte numbers and improves batch-to-batch reproducibility for all tissues.

