Deciphering regulatory protein activity in human pancreatic islets via reverse engineering of single-cell sequencing data

Background: Single-cell sequencing technology has provided an unprecedented ability to interrogate cancer. It reveals significant insights into intratumoral heterogeneity, metastasis, and therapeutic resistance, which facilitates target discovery and validation in cancer treatment. With rapid advancements in throughput and strategies, a single immuno-oncology study can produce multi-omics profiles for several thousand individual cells. This overflow of single-cell data poses formidable challenges, including standardizing data formats across studies, reanalyzing individual datasets, and performing meta-analysis.

Methods: N/A

Results: We present BioTuring Browser, an interactive platform for accessing and reanalyzing published single-cell omics data. The platform currently hosts a curated database of more than 10 million cells from 247 projects, covering more than 120 immune cell types and subtypes, and 15 different cancer types. All data are processed and annotated with standardized labels of cell types, diseases, therapeutic responses, etc., so that they can be instantly accessed and explored in a uniform visualization and analytics interface. Based on this curated database, BioTuring Browser supports searching for similar expression profiles, querying a target across datasets, and automatic cell type annotation. The platform supports single-cell RNA-seq, CITE-seq and TCR-seq data. BioTuring Browser is now available for download at www.bioturing.com.

Conclusions: N/A

Reference-free inference of tumor phylogenies from single-cell sequencing data

2014 IEEE 4th International Conference on Computational Advances in Bio and Medical Sciences (ICCABS) ◽

10.1109/iccabs.2014.6863944 ◽

2014 ◽

Author(s):

Ayshwarya Subramanian ◽

Russell Schwartz

Keyword(s):

Single Cell ◽

Sequencing Data ◽

Single Cell Sequencing

Effective clustering for single cell sequencing cancer data

10.1101/586545 ◽

2019 ◽

Cited By ~ 2

Author(s):

Simone Ciccolella ◽

Murray Patterson ◽

Paola Bonizzoni ◽

Gianluca Della Vedova

Keyword(s):

Single Cell ◽

Categorical Data ◽

Euclidean Distance ◽

Missing Values ◽

False Negative ◽

Ground Truth ◽

Sequencing Data ◽

Large Space ◽

Cancer Data ◽

Single Cell Sequencing

Abstract

Background: Single cell sequencing (SCS) technologies provide a level of resolution that makes them indispensable for inferring, from a sequenced tumor, evolutionary trees or phylogenies representing an accumulation of cancerous mutations. A drawback of SCS is elevated false negative and missing value rates, which result in a large space of possible solutions and in turn make some approaches and tools infeasible. While this has not inhibited the development of methods for inferring phylogenies from SCS data, the continuing increase in the size and resolution of these data is beginning to strain such methods.

One possible solution is to reduce the size of an SCS instance (usually represented as a matrix of presence, absence and missing values of the mutations found in the different sequenced cells) and infer the tree from this reduced-size instance. Previous approaches have used k-means to this end, clustering groups of mutations and/or cells, and using these means as the reduced instance. Such an approach typically uses the Euclidean distance for computing means. However, since the values in these matrices are of a categorical nature (having the three categories: present, absent and missing), we explore applying techniques for clustering categorical data, commonly used in data mining and machine learning, to SCS data, with this goal in mind.

Results: In this work, we present a new clustering procedure, called celluloid, aimed at clustering categorical vector or matrix data, here representing SCS instances. We demonstrate that celluloid clusters mutations with high precision, never pairing too many mutations that are unrelated in the ground truth, and that it also obtains accurate results in terms of the phylogeny inferred downstream from the reduced instance produced by this method.

Finally, we demonstrate the usefulness of a clustering step by applying the entire pipeline (clustering + inference method) to a real dataset, showing a significant reduction in the runtime and considerably raising the upper bound on the size of SCS instances that can be solved in practice.

Availability: Our approach, celluloid (clustering single cell sequencing data around centroids), is available at https://github.com/AlgoLab/celluloid/ under an MIT license.
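The idea of clustering a presence/absence/missing matrix with categorical centroids rather than Euclidean means can be illustrated with a minimal k-modes-style sketch. This is not the celluloid implementation: the encoding (0 = absent, 1 = present, 2 = missing), the function names, the "ignore missing entries" dissimilarity, and the naive initialization are all assumptions made for illustration.

```python
from collections import Counter

MISSING = 2  # assumed encoding: 0 = absent, 1 = present, 2 = missing


def dissimilarity(a, b):
    """Count positions where both values are observed and disagree."""
    return sum(
        1 for x, y in zip(a, b)
        if x != MISSING and y != MISSING and x != y
    )


def mode_centroid(rows):
    """Per-column most common observed value; MISSING if nothing was observed."""
    centroid = []
    for column in zip(*rows):
        observed = [v for v in column if v != MISSING]
        if observed:
            centroid.append(Counter(observed).most_common(1)[0][0])
        else:
            centroid.append(MISSING)
    return centroid


def k_modes(rows, k, max_iter=20):
    """Cluster categorical rows around mode centroids (hypothetical helper)."""
    centroids = [list(r) for r in rows[:k]]  # naive init: first k rows
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: dissimilarity(row, centroids[c]))
            for row in rows
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = mode_centroid(members)
    return labels, centroids


# Toy instance: each row is one mutation's profile across four cells.
profiles = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 2],
    [0, 2, 1, 1],
]
labels, centroids = k_modes(profiles, k=2)
```

The centroids form the reduced-size instance that a downstream phylogeny inference method would consume; a real tool would need a more careful initialization and treatment of false negatives.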

K-mer counting with low memory consumption enables fast clustering of single-cell sequencing data without read alignment

10.1101/723833 ◽

2019 ◽

Author(s):

Christina Huan Shi ◽

Kevin Y. Yip

Keyword(s):

Single Cell ◽

State Of The Art ◽

Rna Seq ◽

Sequencing Data ◽

Memory Consumption ◽

Analysis Pipeline ◽

Cell Clusters ◽

Single Cell Sequencing ◽

Sequencing Errors ◽

Full Analysis

Abstract: K-mer counting has many applications in sequencing data processing and analysis. However, sequencing errors can produce many false k-mers that substantially increase the memory requirement during counting. We propose a fast k-mer counting method, CQF-deNoise, which has a novel component for dynamically identifying and removing false k-mers while preserving counting accuracy. Compared with four state-of-the-art k-mer counting methods, CQF-deNoise consumed 49-76% less memory than the second best method, but still ran competitively fast. The k-mer counts from CQF-deNoise produced cell clusters from single-cell RNA-seq data highly consistent with CellRanger but required only 5% of the running time at the same memory consumption, suggesting that CQF-deNoise can be used for a preview of cell clusters for an early detection of potential data problems, before running a much more time-consuming full analysis pipeline.
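As a rough illustration of why error-derived k-mers inflate memory, the sketch below counts k-mers with a plain dictionary and then discards those seen fewer than a threshold number of times. It is not CQF-deNoise: the function names and the fixed `min_count` threshold are assumptions, and the filtering here happens after counting rather than dynamically during it.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def drop_rare_kmers(counts, min_count=2):
    """Keep only k-mers seen at least min_count times (crude error filter)."""
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


# Two identical reads plus one read carrying a single substitution (T -> A).
reads = ["ACGTAC", "ACGTAC", "ACGAAC"]
counts = count_kmers(reads, k=3)
filtered = drop_rare_kmers(counts, min_count=2)
```

In this toy input the single substitution introduces three k-mers that each occur once, so the filter removes them while keeping the four k-mers supported by multiple reads.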

scTree: An R package to generate antibody-compatible classifiers from single-cell sequencing data

The Journal of Open Source Software ◽

10.21105/joss.02061 ◽

2020 ◽

Vol 5 (48) ◽

pp. 2061

Author(s):

J. Paez ◽

Michael Wendt ◽

Nadia Lanman

Keyword(s):

Single Cell ◽

R Package ◽

Sequencing Data ◽

Single Cell Sequencing

Tumor Copy Number Deconvolution Integrating Bulk and Single-Cell Sequencing Data

Journal of Computational Biology ◽

10.1089/cmb.2019.0302 ◽

2020 ◽

Vol 27 (4) ◽

pp. 565-598 ◽

Cited By ~ 1

Author(s):

Haoyun Lei ◽

Bochuan Lyu ◽

E. Michael Gertz ◽

Alejandro A. Schäffer ◽

Xulian Shi ◽

...

Keyword(s):

Single Cell ◽

Copy Number ◽

Sequencing Data ◽

Single Cell Sequencing

Comparison of single cell sequencing data between two whole genome amplification methods on two sequencing platforms

Scientific Reports ◽

10.1038/s41598-018-23325-2 ◽

2018 ◽

Vol 8 (1) ◽

Cited By ~ 6

Author(s):

DaYang Chen ◽

HeFu Zhen ◽

Yong Qiu ◽

Ping Liu ◽

Peng Zeng ◽

...

Keyword(s):

Single Cell ◽

Whole Genome Amplification ◽

Whole Genome ◽

Sequencing Data ◽

Genome Amplification ◽

Single Cell Sequencing ◽

Sequencing Platforms

rCASC: reproducible classification analysis of single-cell sequencing data

GigaScience ◽

10.1093/gigascience/giz105 ◽

2019 ◽

Vol 8 (9) ◽

Cited By ~ 5

Author(s):

Luca Alessandrì ◽

Francesca Cordero ◽

Marco Beccuti ◽

Maddalena Arigoni ◽

Martina Olivero ◽

...

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Cellular Heterogeneity ◽

Cell Subpopulation ◽

Integrated Analysis ◽

Specific Gene ◽

Sequencing Data ◽

Single Cell Sequencing ◽

Cell Stability ◽

Reproducible Analysis

Abstract

Background: Single-cell RNA sequencing is essential for investigating cellular heterogeneity and highlighting cell subpopulation-specific signatures. Single-cell sequencing applications have spread from conventional RNA sequencing to epigenomics, e.g., ATAC-seq. Many related algorithms and tools have been developed, but few computational workflows provide analysis flexibility while also achieving functional reproducibility (i.e., information about the data and the tools used is saved as metadata) and computational reproducibility (i.e., a real image of the computational environment used to generate the data is stored) through a user-friendly environment.

Findings: rCASC is a modular workflow providing an integrated analysis environment (from count generation to cell subpopulation identification) that exploits Docker containerization to achieve both functional and computational reproducibility in data analysis. rCASC provides preprocessing tools to remove low-quality cells and/or specific bias, e.g., cell cycle. Subpopulation discovery can then be achieved using different clustering techniques based on different distance metrics. Cluster quality is estimated through the new metric "cell stability score" (CSS), which describes the stability of a cell in a cluster as a consequence of a perturbation induced by removing a random set of cells from the cell population. CSS provides better cluster robustness information than the silhouette metric. Moreover, rCASC's tools can identify cluster-specific gene signatures.

Conclusions: rCASC is a modular workflow with new features that could help researchers define cell subpopulations and detect subpopulation-specific markers. It uses Docker for ease of installation and to achieve a computation-reproducible analysis. A Java GUI is provided to welcome users without computational skills in R.
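The perturbation idea behind a per-cell stability score can be sketched as follows. This is an illustrative definition only, not rCASC's CSS formula: the function name, the Jaccard-based comparison of a cell's cluster mates before and after perturbation, and all parameters are assumptions. The clustering routine is supplied by the caller and is assumed to return a mapping from cell ID to cluster label.

```python
import random


def cell_stability(cells, cluster_fn, n_runs=20, drop_fraction=0.1, seed=0):
    """Average, per cell, how well its cluster mates survive random cell removal."""
    rng = random.Random(seed)
    base = cluster_fn(cells)  # reference clustering on all cells
    totals = {c: 0.0 for c in cells}
    kept_runs = {c: 0 for c in cells}
    n_drop = max(1, int(len(cells) * drop_fraction))

    for _ in range(n_runs):
        dropped = set(rng.sample(cells, n_drop))
        kept = [c for c in cells if c not in dropped]
        perturbed = cluster_fn(kept)  # recluster without the dropped cells
        for cell in kept:
            # Both mate sets include the cell itself, so the union is never empty.
            old_mates = {c for c in kept if base[c] == base[cell]}
            new_mates = {c for c in kept if perturbed[c] == perturbed[cell]}
            totals[cell] += len(old_mates & new_mates) / len(old_mates | new_mates)
            kept_runs[cell] += 1

    # None marks a cell that happened to be dropped in every run.
    return {
        c: (totals[c] / kept_runs[c] if kept_runs[c] else None)
        for c in cells
    }


# Toy example: a deterministic "clustering" that splits cells by one value.
values = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 5.0, "e": 5.1, "f": 5.2}


def threshold_clusters(ids):
    return {i: int(values[i] > 1.0) for i in ids}


scores = cell_stability(list(values), threshold_clusters, n_runs=10)
```

Because the toy clustering does not depend on which other cells are present, every retained cell scores 1.0; a clustering method that is sensitive to the removed cells would produce lower scores for cells near cluster boundaries.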

Knowledge-primed neural networks enable biologically interpretable deep learning on single-cell sequencing data

Genome Biology ◽

10.1186/s13059-020-02100-5 ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

Nikolaus Fortelny ◽

Christoph Bock

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Single Cell ◽

Sequencing Data ◽

Single Cell Sequencing
