Expression-Based Species Deconvolution and Realignment Removes Misalignment Error in Multi-species Single Cell Data
Background: Although single-cell RNA-seq of xenograft samples is widely used, there is no comprehensive pipeline for analyzing mixed human and mouse single-cell data. Method: We used public data to assess the misalignment error that arises when a combined human and mouse reference is used, and built a pipeline that removes this error through expression-based species deconvolution followed by realignment to the species-matched reference. We also found false-positive signals, presumed to originate from ambient RNA of the other species, and used a computational method to remove them. Result: Misaligned reads accounted for 0.5% of total reads on average, but the expression of a few genes was strongly affected, leading to as much as a 99.8% loss in expression. Mixed human and mouse single-cell data analyzed with our pipeline clustered well with unmixed data. We also applied the pipeline to a multi-species, multi-sample single-cell library containing breast cancer xenograft tissue and successfully identified all identities along with the diverse cell types of the tumor microenvironment. Conclusion: We present a pipeline for mixed human and mouse single-cell data that can also be applied to pooled libraries to obtain cost-effective single-cell data. We also discuss points to consider when analyzing mixed single-cell data, to inform future development.
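As an illustration of the expression-based species deconvolution step, the following minimal sketch assigns each cell barcode to a species from its read counts against the human and mouse genes of a combined reference, then groups barcodes by species so that each group could be realigned to its own reference. The function names, the input layout, and the 0.9 purity threshold are assumptions made for this example and are not taken from the pipeline itself.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def assign_species(human_count: int, mouse_count: int, threshold: float = 0.9) -> str:
    """Label one cell by the share of its counts coming from each species.

    The threshold is a hypothetical value chosen for illustration only.
    """
    total = human_count + mouse_count
    if total == 0:
        return "unassigned"  # no counts from either species
    if human_count / total >= threshold:
        return "human"
    if mouse_count / total >= threshold:
        return "mouse"
    return "ambiguous"  # neither species dominates; set aside for inspection


def split_barcodes(
    counts: Dict[str, Tuple[int, int]], threshold: float = 0.9
) -> Dict[str, List[str]]:
    """Group barcodes by assigned species.

    `counts` maps barcode -> (human_count, mouse_count), both obtained
    from an alignment against the combined reference.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for barcode, (human_count, mouse_count) in counts.items():
        groups[assign_species(human_count, mouse_count, threshold)].append(barcode)
    return dict(groups)


if __name__ == "__main__":
    # Toy counts, invented for the example.
    example = {
        "AAAC": (950, 50),
        "AAAG": (10, 990),
        "AAAT": (500, 500),
        "AACC": (0, 0),
    }
    for species, barcodes in split_barcodes(example).items():
        print(species, barcodes)
```

In this sketch, the reads of the "human" and "mouse" groups would then be realigned separately to the matching single-species reference, while "ambiguous" barcodes are kept apart rather than forced into either group.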