Weakly Supervised Learning of Single-Cell Feature Embeddings

Abstract: We study the problem of learning representations for single cells in microscopy images to discover biological relationships between their experimental conditions. Many new applications in drug discovery and functional genomics require capturing the morphology of individual cells as comprehensively as possible. Deep convolutional neural networks (CNNs) can learn powerful visual representations, but they require ground-truth labels for training, which are rarely available in biomedical profiling experiments. While we do not know which experimental treatments produce cells that look alike, we do know that cells exposed to the same experimental treatment should generally look similar. We therefore explore training CNNs with a weakly supervised approach that uses this information for feature learning. In addition, the training stage is regularized to control for unwanted variations using mixup or RNNs. We conduct experiments on two different datasets; the proposed approach yields single-cell embeddings that are more accurate than the widely adopted classical features and are competitive with previously proposed transfer learning approaches.
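The weakly supervised setup above can be sketched as follows. The sketch assumes each cell crop is a flattened list of pixel values and each treatment is an integer class index. The treatment label serves as the weak training target. Mixup (Zhang et al.) then blends random pairs of crops and their one-hot targets with a weight λ drawn from Beta(α, α). The function names and the default α = 0.2 are illustrative assumptions, not the paper's settings, and the CNN itself is omitted.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def one_hot(label: int, num_classes: int) -> Vector:
    """Encode a treatment index as a one-hot target (the weak label)."""
    target = [0.0] * num_classes
    target[label] = 1.0
    return target


def mixup_batch(
    images: Sequence[Vector],
    labels: Sequence[int],
    num_classes: int,
    alpha: float = 0.2,  # hypothetical default; would need tuning per dataset
    seed: int = 0,
) -> Tuple[List[Vector], List[Vector]]:
    """Return mixed images and soft targets for one training batch.

    Each example i is blended with a randomly permuted partner j:
        x = lam * x_i + (1 - lam) * x_j
        y = lam * y_i + (1 - lam) * y_j,   lam ~ Beta(alpha, alpha)
    """
    rng = random.Random(seed)
    targets = [one_hot(label, num_classes) for label in labels]
    partners = list(range(len(images)))
    rng.shuffle(partners)

    mixed_images: List[Vector] = []
    mixed_targets: List[Vector] = []
    for i, j in enumerate(partners):
        lam = rng.betavariate(alpha, alpha)
        mixed_images.append([lam * a + (1 - lam) * b for a, b in zip(images[i], images[j])])
        mixed_targets.append([lam * a + (1 - lam) * b for a, b in zip(targets[i], targets[j])])
    return mixed_images, mixed_targets
```

Because the targets become soft class mixtures, the network is discouraged from relying on sharp, treatment-specific artifacts. That is the intuition behind using mixup as a regularizer here.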

propeller: testing for differences in cell type proportions in single cell data

10.1101/2021.11.28.470236

2021

Author(s): Belinda Phipson, Choon Boon Sim, Enzo R. Porrello, Alex W Hewitt, Joseph Powell, ...

Keyword(s): Single Cell, Single Cells, R Package, Cell Type, Experimental Conditions, Cell Type Composition, Type Composition, Biological Replication, Cell Data, Different Sources

Single-cell RNA sequencing (scRNA-seq) has rapidly gained popularity over the last few years for profiling the transcriptomes of thousands to millions of single cells. To date, more than a thousand software packages have been developed to analyse scRNA-seq data. These focus predominantly on visualization, dimensionality reduction and cell type identification. Single cell technology is now being used to analyse experiments with complex designs, including biological replication. Single cell experiments can answer a question that bulk RNA-seq data could not: whether cell type proportions differ between two or more experimental conditions. As well as gene expression changes, the relative depletion or enrichment of a particular cell type can be the functional consequence of disease or treatment. However, cell type proportion estimates from scRNA-seq data are variable. Statistical methods that correctly account for different sources of variability are therefore needed to confidently identify statistically significant shifts in cell type composition between experimental conditions. We present propeller, a robust and flexible method that leverages biological replication to find statistically significant differences in cell type proportions between groups. The propeller method is publicly available in the open source speckle R package (https://github.com/Oshlack/speckle).
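To make the replicate-level testing idea concrete, here is a minimal sketch. It is not propeller's implementation: propeller is an R function in speckle that builds on limma's linear models and offers arcsine square-root and logit transforms. The sketch assumes two groups of biological replicates, each replicate given as a list of per-cell type labels. It computes per-replicate proportions, applies the arcsine square-root transform, and reports a plain Welch t statistic per cell type rather than limma's moderated statistics or p-values. All function names are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, variance
from typing import Dict, Sequence

Sample = Sequence[str]  # cell type label for every cell in one replicate


def proportions(sample: Sample, cell_types: Sequence[str]) -> Dict[str, float]:
    """Fraction of cells in one replicate assigned to each cell type."""
    counts = Counter(sample)
    total = len(sample)
    return {ct: counts[ct] / total for ct in cell_types}


def asin_sqrt(p: float) -> float:
    """Arcsine square-root transform to stabilise the variance of a proportion."""
    return math.asin(math.sqrt(p))


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t statistic; each group needs at least two replicates."""
    se2 = variance(a) / len(a) + variance(b) / len(b)
    diff = mean(a) - mean(b)
    if se2 == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(se2)


def compare_proportions(
    group_a: Sequence[Sample], group_b: Sequence[Sample], cell_types: Sequence[str]
) -> Dict[str, float]:
    """Per cell type t statistic on transformed replicate-level proportions."""
    result: Dict[str, float] = {}
    for ct in cell_types:
        a = [asin_sqrt(proportions(s, cell_types)[ct]) for s in group_a]
        b = [asin_sqrt(proportions(s, cell_types)[ct]) for s in group_b]
        result[ct] = welch_t(a, b)
    return result
```

The key design choice is that the unit of replication is the sample, not the cell. Variability between replicates therefore enters the test directly instead of being hidden by pooling thousands of cells.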

DeLTA 2.0: A deep learning pipeline for quantifying single-cell spatial and temporal dynamics

10.1101/2021.08.10.455795

2021

Author(s): Owen M. O'Connor, Razan N. Alnahhas, Jean-Baptiste Lugagne, Mary Dunlop

Keyword(s): Single Cell, Temporal Dynamics, Single Cells, Time Lapse, Error Rates, Two Dimensions, Age And Growth, Two Dimensional, Spatial Effects, Deep Convolutional Neural Networks

Improvements in microscopy software and hardware have dramatically increased the pace of image acquisition, making analysis a major bottleneck in generating quantitative, single-cell data. Although tools for segmenting and tracking bacteria within time-lapse images exist, most require human input, are specialized to the experimental setup, or lack accuracy. Here, we introduce DeLTA 2.0, a purely Python workflow that can rapidly and accurately analyze single cells on two-dimensional surfaces to quantify gene expression and cell growth. The algorithm uses deep convolutional neural networks to extract single-cell information from time-lapse images, requiring no human input after training. DeLTA 2.0 retains all the functionality of the original version, which was optimized for bacteria growing in the mother machine microfluidic device, but extends it to two-dimensional growth environments. Two-dimensional environments represent an important class of data for three reasons: they are more straightforward to implement experimentally, they offer the potential for studies using co-cultures of cells, and they can be used to quantify spatial effects and multi-generational phenomena. However, segmentation and tracking are significantly more challenging in two dimensions because the number of cells that must be tracked increases exponentially. To showcase this new functionality, we analyze mixed populations of antibiotic-resistant and susceptible cells, and also track pole age and growth rate across generations. In addition to the two-dimensional capabilities, we introduce several major improvements to the code that increase accessibility, including the ability to accept many standard microscopy file formats and arbitrary image sizes as inputs. DeLTA 2.0 is rapid, with run times of less than 10 minutes for complete movies with hundreds of cells. It is also highly accurate, with error rates around 1%, making it a powerful tool for analyzing time-lapse microscopy data.
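Growth rate is commonly estimated from a tracked cell's length over time. It is taken as the least-squares slope of ln(length) against time, which assumes exponential elongation between divisions. The sketch below shows only that post-processing step; it is not DeLTA 2.0 code. The input format (frame times in minutes and per-frame lengths for one tracked cell) and the function names are assumptions.

```python
import math
from typing import Sequence


def exponential_growth_rate(times: Sequence[float], lengths: Sequence[float]) -> float:
    """Least-squares slope of ln(length) versus time (units: 1/time)."""
    if len(times) != len(lengths) or len(times) < 2:
        raise ValueError("need at least two paired time points")
    logs = [math.log(length) for length in lengths]
    t_mean = sum(times) / len(times)
    y_mean = sum(logs) / len(logs)
    covariance = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    spread = sum((t - t_mean) ** 2 for t in times)
    if spread == 0:
        raise ValueError("time points must not all be identical")
    return covariance / spread


def doubling_time(rate: float) -> float:
    """Time for the cell length to double at a constant exponential rate."""
    return math.log(2) / rate
```

Fitting in log space turns the exponential model into a straight line. A single outlier frame (for example, a segmentation error just before division) therefore shifts the slope less than it would a fit on raw lengths.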

Automated population identification and sorting algorithms for high-dimensional single-cell data

10.1101/046664

2016

Cited By ~ 1

Author(s): Benedict Anchang, Sylvia K. Plevritis

Keyword(s): Single Cell, Cell Sorting, Intracellular Signaling, Expert Knowledge, Single Cells, Cell Type, Experimental Conditions, Cell Type Specific, Cell Subpopulations, Cell Data

Abstract: Cell sorting, or gating homogeneous subpopulations from single-cell data, enables cell-type-specific characterization, such as cell-type genomic profiling, as well as the study of tumor progression. This highlight summarizes recently developed automated gating algorithms that are optimized both for population identification and for sorting homogeneous single cells in heterogeneous single-cell data. Data-driven gating strategies identify and/or sort homogeneous subpopulations from a heterogeneous population without relying on expert knowledge, thereby removing human bias and variability. We further describe an optimized cell sorting strategy called CCAST, based on Clustering, Classification and Sorting Trees. CCAST identifies the relevant gating markers, the gating hierarchy and the partitions that define underlying cell subpopulations. In several applications it identifies more homogeneous subpopulations than prior sorting strategies, and it reveals simultaneous intracellular signaling across different lineage subtypes under different experimental conditions.
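One step of a sorting tree can be illustrated as a search for a single marker and threshold (a "gate"). The chosen gate is the one that best separates cells already grouped by an upstream clustering step. This sketch scores candidate gates by the weighted Gini impurity of the cluster labels on each side. The impurity criterion, the dictionary-of-markers input format and the function names are assumptions for illustration, and CCAST's actual procedure may differ.

```python
from collections import Counter
from typing import Dict, Optional, Sequence, Tuple

Cell = Dict[str, float]  # marker name -> measured intensity


def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a set of cluster labels (0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_gate(
    cells: Sequence[Cell], clusters: Sequence[str], markers: Sequence[str]
) -> Tuple[Optional[str], Optional[float], float]:
    """Return (marker, threshold, weighted impurity) of the best single split."""
    best: Tuple[Optional[str], Optional[float], float] = (None, None, float("inf"))
    n = len(cells)
    for marker in markers:
        values = sorted({cell[marker] for cell in cells})
        # Candidate thresholds lie midway between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [k for cell, k in zip(cells, clusters) if cell[marker] <= threshold]
            right = [k for cell, k in zip(cells, clusters) if cell[marker] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (marker, threshold, score)
    return best
```

Applying the same search recursively to the cells on each side of the chosen gate would produce a gating hierarchy, which is the kind of output described above.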

Quantifying the effect of experimental perturbations at single-cell resolution

10.1101/532846

2019

Cited By ~ 9

Author(s): Daniel B. Burkhardt, Jay S. Stanley, Alexander Tong, Ana Luisa Perdigoto, Scott A. Gigante, ...

Keyword(s): Single Cell, Ground Truth, Likelihood Estimate, Cell Populations, Continuous Measure, Single Cell Level, Cell Level, Experimental Conditions, Transcriptional State, Multiple Conditions

Abstract: Current methods for comparing scRNA-seq datasets collected in multiple conditions focus on discrete regions of the transcriptional state space, such as clusters of cells. Here, we quantify the effects of perturbations at the single-cell level using a continuous measure of the effect of a perturbation across the transcriptomic space. We describe this space as a manifold and use graph signal processing to develop a relative likelihood estimate of observing each cell in each of the experimental conditions. This likelihood estimate can be used to identify cell populations specifically affected by a perturbation. We also develop vertex frequency clustering to extract populations of affected cells at the level of granularity that matches the perturbation response. In identifying clusters of cells that are enriched or depleted in each condition, our algorithm is on average 57% more accurate than the next best-performing algorithm tested. Gene signatures derived from these clusters are more accurate than those from six alternative algorithms in ground-truth comparisons.
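The graph-signal idea can be illustrated with a small sketch in four steps:

1. Build a k-nearest-neighbour graph over cells.
2. Treat each condition's membership indicator, normalized by condition size, as a signal on that graph.
3. Low-pass filter each signal by repeated neighbour averaging.
4. Normalize across conditions to obtain a per-cell relative likelihood.

The averaging filter, k, the number of steps and all names here are assumptions. The published method uses its own graph filter, which is not reproduced here.

```python
import math
from typing import Dict, List, Sequence, Set

Point = Sequence[float]


def knn_graph(points: Sequence[Point], k: int) -> List[Set[int]]:
    """Symmetric k-nearest-neighbour adjacency sets (Euclidean distance)."""
    neighbours: List[Set[int]] = [set() for _ in points]
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for _, j in ranked[:k]:
            neighbours[i].add(j)
            neighbours[j].add(i)
    return neighbours


def smooth(signal: Sequence[float], graph: List[Set[int]], steps: int, alpha: float = 0.5) -> List[float]:
    """Low-pass filter: repeatedly blend each value with its neighbours' mean."""
    values = list(signal)
    for _ in range(steps):
        updated = []
        for i, value in enumerate(values):
            if graph[i]:
                neighbour_mean = sum(values[j] for j in graph[i]) / len(graph[i])
                updated.append((1 - alpha) * value + alpha * neighbour_mean)
            else:
                updated.append(value)  # an isolated cell keeps its value
        values = updated
    return values


def relative_likelihood(
    points: Sequence[Point], conditions: Sequence[str], k: int = 3, steps: int = 10
) -> Dict[str, List[float]]:
    """Per-cell relative likelihood of each condition (sums to 1 per cell)."""
    graph = knn_graph(points, k)
    labels = sorted(set(conditions))
    smoothed: Dict[str, List[float]] = {}
    for label in labels:
        size = sum(1 for c in conditions if c == label)
        # Normalize by condition size so unequal sample sizes do not bias the result.
        indicator = [1.0 / size if c == label else 0.0 for c in conditions]
        smoothed[label] = smooth(indicator, graph, steps)
    result: Dict[str, List[float]] = {label: [] for label in labels}
    for i in range(len(points)):
        total = sum(smoothed[label][i] for label in labels)
        for label in labels:
            result[label].append(smoothed[label][i] / total if total else 1.0 / len(labels))
    return result
```

Smoothing over the graph lets each cell borrow evidence from transcriptionally similar neighbours. A cell in a region dominated by treated cells therefore receives a high treated likelihood even if its own label is control.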

Single Cell Systems Analysis: Decision Geometry in Outliers

10.31219/osf.io/2zagn

2019

Author(s): Lianne Abrahams

Keyword(s): Decision Making, Graph Theory, Single Cell, Cancer Biology, Single Cells, Optimal Level, Clinical Relapse, Learning Approaches, Cell Systems, Substrate Variation

Anti-cancer therapeutics of the highest calibre currently focus on combinatorial targeting of specific oncoproteins and tumour suppressors. Clinical relapse depends upon intratumoral heterogeneity, which serves as substrate variation during the evolution of resistance to therapeutic regimens. The present review advocates single cell systems biology as the optimal level of analysis for remediation of clinical relapse. Graph theory approaches to understanding decision-making in single cells may be abstracted one level further, to the geometry of decision-making in outlier cells, in order to define evolution-resistant cancer biomarkers. Systems biologists currently working with omics data are invited to consider phase portrait analysis as a mediator between graph theory and deep learning approaches. Perhaps counter-intuitively, the tangible clinical needs of cancer patients may depend upon the adoption of higher-level mathematical abstractions of cancer biology.

Cluster similarity spectrum integration of single-cell genomics data

10.1101/2020.02.27.968560

2020

Cited By ~ 2

Author(s): Zhisong He, Agnieska Brazovskaja, Sebastian Ebert, J. Gray Camp, Barbara Treutlein

Keyword(s): Single Cell, Single Cells, Data Representation, Cellular Heterogeneity, Biological Information, Sequencing Data, Experimental Conditions, Complex Biological System, Free Data, Powerful Approach

Technologies to sequence the transcriptome, genome or epigenome from thousands of single cells in an experiment provide extraordinary resolution into the molecular states present within a complex biological system at any given moment. However, it is a major challenge to integrate single-cell sequencing data across experiments, conditions, batches, timepoints and other technical considerations. New computational methods are required that can integrate samples while simultaneously preserving biological information. Here, we propose an unsupervised reference-free data representation, Cluster Similarity Spectrum (CSS), where each cell is represented by its similarities to clusters independently identified across samples. We show that CSS can be used to assess cellular heterogeneity and enable reconstruction of differentiation trajectories from cerebral organoid single-cell transcriptomic data, and to integrate data across experimental conditions and human individuals. We compare CSS to other integration algorithms and show that it can outperform other methods in certain integration scenarios. We also show that CSS allows projection of single-cell genomic data of different modalities to the CSS-represented reference atlas for visualization and cell type identity prediction. In summary, CSS provides a straightforward and powerful approach to understand and integrate challenging single-cell multi-omic data.
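A minimal sketch of the representation described above follows. It assumes that expression profiles are plain lists of gene values in a shared gene order and that clusters have already been found separately in each sample. The steps are:

1. Average each cluster into a centroid.
2. Correlate a cell with every centroid of a sample.
3. Z-score those similarities within the sample.
4. Concatenate the results across samples.

Pearson correlation and the helper names are assumptions made for brevity. The published CSS implementation may use a different similarity measure and normalization.

```python
import math
from typing import Dict, List, Sequence

Profile = Sequence[float]  # expression values for a fixed, shared gene order


def pearson(a: Profile, b: Profile) -> float:
    """Pearson correlation of two profiles (0.0 if either is constant)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom else 0.0


def zscore(values: Sequence[float]) -> List[float]:
    """Standardize values to zero mean and unit (population) standard deviation."""
    m = sum(values) / len(values)
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / len(values))
    return [(v - m) / sd if sd else 0.0 for v in values]


def cluster_centroids(profiles: Sequence[Profile], labels: Sequence[str]) -> List[List[float]]:
    """Average profile of each cluster within one sample (sorted label order)."""
    groups: Dict[str, List[Profile]] = {}
    for profile, label in zip(profiles, labels):
        groups.setdefault(label, []).append(profile)
    return [
        [sum(col) / len(members) for col in zip(*members)]
        for _, members in sorted(groups.items())
    ]


def css_embedding(cell: Profile, per_sample_centroids: Sequence[Sequence[Profile]]) -> List[float]:
    """Concatenate z-scored similarities to every sample's cluster centroids."""
    embedding: List[float] = []
    for centroids in per_sample_centroids:
        embedding.extend(zscore([pearson(cell, c) for c in centroids]))
    return embedding
```

Normalizing within each sample's block of similarities is what makes the representation less sensitive to sample-specific shifts. A cell is described by which clusters it resembles most, relative to that sample's other clusters, rather than by its raw expression values.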

Mis-Classified Vector Guided Softmax Loss for Face Recognition

Proceedings of the AAAI Conference on Artificial Intelligence

10.1609/aaai.v34i07.6906

2020

Vol 34 (07)

pp. 12241-12248

Cited By ~ 3

Author(s): Xiaobo Wang, Shifeng Zhang, Shuo Wang, Tianyu Fu, Hailin Shi, ...

Keyword(s): Face Recognition, Loss Function, State Of The Art, Feature Learning, Ground Truth, Significant Progress, Deep Convolutional Neural Networks, Face Features, Discriminative Feature, Feature Mining

Face recognition, whose central task is improving feature discrimination, has witnessed significant progress due to advances in deep convolutional neural networks (CNNs). To this end, several margin-based softmax loss functions (e.g., with angular, additive and additive angular margins) have been proposed to increase the feature margin between different classes. However, although great progress has been made, these losses suffer from three main issues: 1) they ignore the importance of mining informative features for discriminative learning; 2) they enforce the feature margin only from the ground-truth class, ignoring the discriminability offered by the non-ground-truth classes; 3) the feature margin between different classes is set to be the same and fixed, which may not adapt well to different situations. To cope with these issues, this paper develops a novel loss function that adaptively emphasizes the mis-classified feature vectors to guide discriminative feature learning. This addresses all of the above issues and yields more discriminative face features. To the best of our knowledge, this is the first attempt to combine the advantages of feature margin and feature mining in a unified loss function. Experimental results on several benchmarks demonstrate the effectiveness of our method over state-of-the-art alternatives. Our code is available at http://www.cbsr.ia.ac.cn/users/xiaobowang/.
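The idea of emphasizing mis-classified vectors can be illustrated with an additive-margin softmax loss for one sample. In this sketch, a non-target class k counts as mis-classified when its cosine exceeds the margin-adjusted target cosine, and its logit is then raised by a fixed amount t. This follows our reading of the paper's fixed-weight variant and should be checked against the paper itself. The inputs (cosine similarities between a normalized feature and each normalized class weight) and the values of s, m and t are illustrative assumptions.

```python
import math
from typing import Sequence


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def mv_am_softmax_loss(
    cosines: Sequence[float], target: int, s: float = 32.0, m: float = 0.35, t: float = 0.2
) -> float:
    """Loss for one sample given cos(theta_k) to every class weight.

    s: scale, m: additive margin on the target class,
    t: extra emphasis applied to mis-classified non-target classes.
    """
    target_cos = cosines[target] - m  # margin-penalised ground-truth cosine
    logits = [s * target_cos]
    for k, cos_k in enumerate(cosines):
        if k == target:
            continue
        misclassified = cos_k > target_cos  # indicator I_k
        logits.append(s * (cos_k + t) if misclassified else s * cos_k)
    # Cross-entropy with the ground-truth logit stored at index 0.
    return log_sum_exp(logits) - logits[0]
```

With t = 0 the function reduces to a plain additive-margin (AM) softmax loss. With t > 0 only samples that have mis-classified classes are penalized more, which is how the loss focuses training on hard examples.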

Binarization of music score with complex background by deep convolutional neural networks

Multimedia Tools and Applications

10.1007/s11042-020-10272-2

2021

Author(s): Minh-Trieu Tran, Quang-Nhat Vo, Guee-Sang Lee

Keyword(s): Neural Networks, Convolutional Neural Networks, Feature Learning, Ground Truth, Learning Ability, Dense Layer, Deep Convolutional Neural Networks, Music Score, Complex Background, Network Backbone

Abstract: Binarization is an important step in most document analysis systems. In music score images with a complex background, clutter with a variety of shapes and colors makes binarization challenging. This paper presents a model that binarizes complex-background music score images by fusing deep convolutional neural networks. Our model is trained directly on image regions, using pixel values as inputs and the binary ground truth as labels. The proposed network structures combine the generalization capability of the residual network backbone with the feature learning ability of the dense layer. This lets them differentiate foreground pixels from background clutter, reduce the risk of overfitting, and thus handle the complex background noise appearing in music score images. Compared with traditional algorithms, our method produces binary images with a cleaner background and better-preserved strokes. Experiments with captured and synthetic music score images show promising results compared to existing methods.
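The abstract compares against "traditional algorithms" without naming them. A common classical baseline for document binarization is Otsu's global threshold, which picks the grey level that maximizes the between-class variance of the histogram. The sketch below implements that baseline for 8-bit greyscale pixels, assuming dark strokes are foreground. It is shown only for comparison and is not the CNN model described in the paper; which baselines the authors actually used is not stated here.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Grey level (0-255) maximizing between-class variance; pixels <= it form one class."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    weight_low, sum_low = 0, 0.0
    best_level, best_var = 0, -1.0
    for level in range(256):
        weight_low += hist[level]
        if weight_low == 0:
            continue
        weight_high = total - weight_low
        if weight_high == 0:
            break
        sum_low += level * hist[level]
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        between_var = weight_low * weight_high * (mean_low - mean_high) ** 2
        if between_var > best_var:
            best_var, best_level = between_var, level
    return best_level


def binarize(image: Sequence[Sequence[int]], threshold: int) -> List[List[int]]:
    """Mark dark pixels (<= threshold) as foreground 1 and the rest as background 0."""
    return [[1 if p <= threshold else 0 for p in row] for row in image]
```

A single global threshold like this tends to struggle when background clutter overlaps the stroke intensities. That is the situation the abstract targets with a learned, pixel-wise model.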

Single Cell Systems Analysis: Decision Geometry In Outliers

Bioinformatics

10.1093/bioinformatics/btaa1078

2020

Author(s): Lianne Abrahams

Keyword(s): Decision Making, Graph Theory, Single Cell, Cancer Biology, Single Cells, Cancer Therapeutics, Supplementary Information, Clinical Relapse, Learning Approaches, Cell Systems

Abstract. Motivation: Anti-cancer therapeutics of the highest calibre currently focus on combinatorial targeting of specific oncoproteins and tumour suppressors. Clinical relapse depends upon intratumoral heterogeneity, which serves as substrate variation during the evolution of resistance to therapeutic regimens. Results: The present review advocates single cell systems biology as the optimal level of analysis for remediation of clinical relapse. Graph theory approaches to understanding decision-making in single cells may be abstracted one level further, to the geometry of decision-making in outlier cells, in order to define evolution-resistant cancer biomarkers. Systems biologists currently working with omics data are invited to consider phase portrait analysis as a mediator between graph theory and deep learning approaches. Perhaps counter-intuitively, the tangible clinical needs of cancer patients may depend upon the adoption of higher-level mathematical abstractions of cancer biology. Supplementary information: Supplementary data are available at Bioinformatics online.

High-throughput imaging of mRNA at the single-cell level in human primary immune cells

10.1101/2020.11.10.377283

2020

Author(s): Manasi Gadkari, Jing Sun, Adrian Carcamo, Hugh Alessi, Zonghui Hu, ...

Keyword(s): Gene Expression, Single Cell, Immune Cells, Single Cells, Transcript Abundance, Hybridization Chain Reaction, Single Cell Level, Cell Level, Experimental Conditions, General Applicability

Abstract: Measurement of gene expression at the single-cell level has led to important advances in the study of transcriptional regulation programs in healthy and disease states. In particular, single-cell gene expression approaches have shed light on the high level of transcriptional heterogeneity of individual cells, both at baseline and in response to experimental or environmental perturbations. We have developed a method for High-Content Imaging (HCI)-based quantification of transcript abundance at the single-cell level in primary human immune cells, and have validated its performance under multiple experimental conditions to demonstrate its general applicability. This method, which we abbreviate as hcHCR, combines the high sensitivity of the hybridization chain reaction (HCR) for visualizing mRNA molecules in single cells with the speed, scalability, and technical reproducibility of HCI. We first tested eight microscopy-compatible attachment substrates for short-term culture of primary human B cells, T cells, monocytes, or neutrophils. We then miniaturized HCR in a 384-well format. Using HCI, we documented the method's ability to detect increased or decreased transcript abundance at the single-cell level in thousands of cells for each experimental condition. Furthermore, we demonstrated the feasibility of multiplexing gene expression measurements by simultaneously assaying the abundance of two transcripts per cell, both at baseline and in response to an experimental stimulus. Finally, we tested the robustness of the assay to technical and biological variation. We anticipate that hcHCR will be a suitable and cost-effective assay for low- to medium-throughput chemical, genetic or functional genomic screens in primary human cells. It also opens the possibility of personalized screens, or of screens on cells obtained from patients with a specific disease.
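Here is a minimal sketch of how per-cell readouts from such a screen might be summarized. It assumes image analysis has already produced an HCR spot count per cell for each well; the spot-count readout, the pseudocount and the helper names are assumptions, not the authors' pipeline. The sketch maps 384-well indices to plate coordinates (16 rows, A to P, by 24 columns). It then reports a log2 fold change in mean transcript count per cell between a stimulated and a baseline condition.

```python
import math
import string
from statistics import mean
from typing import Sequence

ROWS, COLS = 16, 24  # standard 384-well plate layout


def well_name(index: int) -> str:
    """Map a 0-based, row-major well index to a name such as 'A1' or 'P24'."""
    if not 0 <= index < ROWS * COLS:
        raise ValueError("index outside a 384-well plate")
    row, col = divmod(index, COLS)
    return f"{string.ascii_uppercase[row]}{col + 1}"


def log2_fold_change(
    stimulated: Sequence[int], baseline: Sequence[int], pseudocount: float = 1.0
) -> float:
    """log2 ratio of mean per-cell transcript counts; positive means increased abundance."""
    return math.log2((mean(stimulated) + pseudocount) / (mean(baseline) + pseudocount))
```

The pseudocount keeps the ratio finite when a condition has no detected transcripts. Its value is an arbitrary choice in this sketch and shrinks fold changes for low-abundance transcripts.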
