EmptyNN: A neural network based on positive-unlabeled learning to remove cell-free droplets and recover lost cells in single-cell RNA sequencing data
ABSTRACT
Droplet-based single-cell RNA sequencing (scRNA-seq) has significantly increased the number of cells profiled per experiment and revolutionized the study of individual transcriptomes. However, to maximize the biological signal, robust computational methods are needed to distinguish cell-free droplets from cell-containing droplets. Here, we introduce EmptyNN, a novel cell-calling algorithm that trains a neural network based on positive-unlabeled learning to improve the filtering of barcodes. We leveraged cell hashing and genetic variation to provide ground truth. EmptyNN accurately removed cell-free droplets while recovering lost cell clusters, achieving an area under the receiver operating characteristic curve (AUROC) of 94.73% and 96.30% on the cell hashing and genetic variation datasets, respectively. Comparisons with current state-of-the-art cell-calling algorithms demonstrated the superior performance of EmptyNN, as measured by the number of recovered cell-containing droplets and cell types. EmptyNN was further applied to two additional datasets, where it also performed well. EmptyNN therefore represents a powerful tool to enhance scRNA-seq quality control analyses.
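To make the idea of positive-unlabeled learning concrete, the following is a minimal sketch on synthetic two-dimensional data. It is not the EmptyNN network or its training procedure: it swaps in a plain logistic regression with the Elkan-Noto calibration step, and it assumes, purely for illustration, that the labeled set consists of known cell-free droplets while the unlabeled set mixes cell-free and cell-containing droplets. All function names, cluster positions, and parameters are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(w, b, X):
    # Model output g(x) = p(s=1 | x) for every row of X.
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) for x in X]


def fit_logistic(X, s, lr=0.1, epochs=500):
    # Full-batch gradient descent on the log loss of "labeled vs. unlabeled".
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, g, label in zip(X, predict(w, b, X), s):
            err = g - label
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def make_toy_droplets(n_labeled=200, n_hidden_empty=200, n_cells=300, seed=0):
    # Hypothetical toy data: "empty" droplets cluster near (0, 0),
    # "cell" droplets near (5, 5). Real barcodes would be expression profiles.
    rng = random.Random(seed)

    def draw(mean):
        return [rng.gauss(mean, 1.0), rng.gauss(mean, 1.0)]

    labeled = [draw(0.0) for _ in range(n_labeled)]
    unlabeled = [draw(0.0) for _ in range(n_hidden_empty)]
    unlabeled += [draw(5.0) for _ in range(n_cells)]
    truth = [1] * n_hidden_empty + [0] * n_cells  # 1 = empty, 0 = cell
    return labeled, unlabeled, truth


def pu_classify(labeled, unlabeled):
    # Step 1: train a classifier to separate labeled from unlabeled barcodes.
    X = labeled + unlabeled
    s = [1] * len(labeled) + [0] * len(unlabeled)
    w, b = fit_logistic(X, s)
    # Step 2 (Elkan-Noto): estimate c = p(s=1 | empty) as the mean score
    # on the labeled set (ideally a held-out part), then rescale.
    scores_labeled = predict(w, b, labeled)
    c = sum(scores_labeled) / len(scores_labeled)
    return [min(1.0, g / c) for g in predict(w, b, unlabeled)]
```

A call such as `pu_classify(*make_toy_droplets()[:2])` returns one estimated probability of being cell-free per unlabeled barcode; thresholding at 0.5 gives a cell call. The rescaling step matters because the raw classifier only learns how likely a barcode is to be labeled, which underestimates how likely it is to be empty.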