A Universal Deep Neural Network for In-Depth Cleaning of Single-Cell RNA-Seq Data
Abstract
Single-cell RNA sequencing (scRNA-Seq) has been widely used in biomedical research and has generated data of enormous volume and diversity. The raw data contain multiple types of noise and technical artifacts and need thorough cleaning. Existing denoising and imputation methods largely focus on a single type of noise (i.e., dropouts) and make strong distribution assumptions, which greatly limit their performance and application. We designed and developed the AutoClass model, which integrates two deep neural network components, an autoencoder and a classifier, so as to maximize both noise removal and signal retention. AutoClass is free of distribution assumptions and hence can effectively clean a wide range of noise and artifacts. AutoClass outperforms state-of-the-art methods in multiple types of scRNA-Seq data analyses, including data recovery, differential expression analysis, clustering analysis and batch effect removal. Importantly, AutoClass is robust to key hyperparameter settings, including bottleneck layer size, pre-clustering number and classifier weight. AutoClass is open source and available at https://github.com/datapplab/AutoClass.
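As a rough illustration of how an autoencoder and a classifier might share one training objective, the sketch below computes a per-cell loss that mixes a reconstruction term with a classification term against a pre-cluster label. The function names, the choice of mean squared error and cross-entropy, and the `(1 - w)` / `w` weighting are assumptions made for illustration only; the abstract does not specify them, and this is not the AutoClass implementation.

```python
import math

def mse(original, reconstructed):
    """Mean squared error between input values and their reconstruction."""
    return sum((x - y) ** 2 for x, y in zip(original, reconstructed)) / len(original)

def cross_entropy(class_probs, label):
    """Negative log-probability assigned to the (pre-cluster) label."""
    return -math.log(class_probs[label])

def combined_loss(original, reconstructed, class_probs, label, classifier_weight):
    """Hypothetical objective: weighted mix of the two terms (assumed form)."""
    w = classifier_weight
    return (1 - w) * mse(original, reconstructed) + w * cross_entropy(class_probs, label)

# Toy values for one cell with three genes and two pre-clusters.
loss = combined_loss(
    original=[1.0, 0.0, 3.0],
    reconstructed=[1.0, 1.0, 3.0],
    class_probs=[0.5, 0.5],
    label=0,
    classifier_weight=0.5,
)
```

Under this assumed form, a classifier weight of 0 reduces the objective to a plain autoencoder loss, while larger weights push the model to preserve the structure captured by the pre-cluster labels.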