A divide and conquer metacell algorithm for scalable scRNA-seq analysis
Scaling scRNA-seq to profile millions of cells is increasingly feasible. Such data are crucial for constructing high-resolution maps of transcriptional manifolds. However, current analysis strategies, in particular dimensionality reduction and two-phase clustering, offer only limited scalability and sensitivity for defining such manifolds. Here we introduce Metacell-2, a recursive divide and conquer algorithm that efficiently decomposes scRNA-seq datasets of any size into small, cohesive groups of cells called metacells. We show that the algorithm outperforms current solutions in run time, memory use and quality. Importantly, Metacell-2 also improves outlier cell detection and rare cell type identification, as we exemplify by analyzing a human bone marrow cell atlas and mouse embryonic data. Metacell-2 is implemented on top of the scanpy framework for easy integration into any analysis pipeline.
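To make the general divide and conquer idea concrete, the toy sketch below recursively splits a set of cells into groups of bounded size. It is not the Metacell-2 algorithm. It assumes cells are given as plain feature vectors and replaces Metacell-2's actual grouping with a simple recursive split along the highest-variance feature (a k-d-tree-style partition). The function `divide_and_conquer_groups` and its parameter `max_group_size` are hypothetical names used only for illustration.

```python
from statistics import pvariance
from typing import List, Optional, Sequence


def divide_and_conquer_groups(
    cells: Sequence[Sequence[float]],
    max_group_size: int = 4,
    indices: Optional[List[int]] = None,
) -> List[List[int]]:
    """Hypothetical toy: recursively split cells into groups of at most max_group_size.

    Each split sorts the current cells along the feature with the highest
    variance and cuts the list in half. This illustrates divide and conquer
    only; it is not how Metacell-2 builds metacells.
    """
    if max_group_size < 1:
        raise ValueError("max_group_size must be at least 1")
    if indices is None:
        indices = list(range(len(cells)))
    if not indices:
        return []
    # Base case: the subset is already small enough to keep as one group.
    if len(indices) <= max_group_size:
        return [indices]
    n_features = len(cells[indices[0]])
    # Divide: choose the feature that varies most within this subset.
    split_dim = max(
        range(n_features),
        key=lambda d: pvariance([cells[i][d] for i in indices]),
    )
    ordered = sorted(indices, key=lambda i: cells[i][split_dim])
    half = len(ordered) // 2  # both halves are non-empty, so recursion terminates
    # Conquer: solve each half independently and concatenate the groups.
    return (
        divide_and_conquer_groups(cells, max_group_size, ordered[:half])
        + divide_and_conquer_groups(cells, max_group_size, ordered[half:])
    )


# Two well-separated toy "cell types" of five cells each (2 features per cell).
cells = [
    (0.0, 0.0), (0.1, 1.0), (0.2, 2.0), (0.1, 3.0), (0.0, 4.0),
    (10.0, 0.0), (10.1, 1.0), (10.2, 2.0), (10.1, 3.0), (10.0, 4.0),
]
groups = divide_and_conquer_groups(cells, max_group_size=4)
print(groups)  # [[0, 1], [2, 3, 4], [5, 6], [7, 8, 9]]
```

Splitting at a sorted position rather than at a value threshold guarantees that both halves are non-empty even when many cells share identical values. The toy captures only the recursion and the bounded group size; the cohesion criteria that make metacells biologically meaningful are not modeled here.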