Correlation Imputation in Single cell RNA-seq using Auxiliary Information and Ensemble Learning

Author(s): Luqin Gan, Giuseppe Vinci, Genevera I. Allen
2020

Abstract

Single cell RNA sequencing is a powerful technique that measures the gene expression of individual cells in a high-throughput fashion. However, because of sequencing inefficiency, the data is made unreliable by dropout events, technical artifacts in which genes erroneously appear to have zero expression. Many data imputation methods have been proposed to alleviate this issue. Yet effective imputation can be difficult and biased because the data is sparse and high-dimensional, and biased imputation can cause major distortions in downstream analyses. In this paper, we propose a novel approach that imputes the gene-by-gene correlations rather than the data itself. We call this method SCENA: Single cell RNA-seq Correlation completion by ENsemble learning and Auxiliary information. The SCENA gene-by-gene correlation matrix estimate is obtained by model stacking of multiple imputed correlation matrices based on known auxiliary information about gene connections. In an extensive simulation study based on real scRNA-seq data, we demonstrate that SCENA not only accurately imputes gene correlations but also outperforms existing imputation approaches in downstream analyses such as dimension reduction, cell clustering, and graphical model estimation.
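To make the motivation concrete, the following is a minimal illustrative sketch, not the SCENA method. It simulates dropout on synthetic data and compares a naive gene-by-gene correlation (zeros kept as values) with a correlation computed, for each gene pair, only from cells where both genes are nonzero. The function name `pairwise_complete_corr`, the 40% dropout rate, the Gaussian expression model, and the assumption that every zero is a dropout are all choices made for this illustration and do not come from the paper.

```python
import numpy as np

def pairwise_complete_corr(X):
    """Gene-by-gene Pearson correlation on an array X of shape (cells, genes).

    For each gene pair, only cells where both genes are nonzero are used
    (illustrative assumption: every zero is a dropout). Pairs with fewer
    than 3 shared cells, or with zero variance, are left as NaN.
    """
    n_genes = X.shape[1]
    C = np.full((n_genes, n_genes), np.nan)
    observed = X != 0
    for i in range(n_genes):
        for j in range(i, n_genes):
            both = observed[:, i] & observed[:, j]
            if both.sum() < 3:
                continue
            xi, xj = X[both, i], X[both, j]
            if xi.std() == 0 or xj.std() == 0:
                continue
            C[i, j] = C[j, i] = np.corrcoef(xi, xj)[0, 1]
    return C

rng = np.random.default_rng(0)
n_cells = 2000

# Synthetic "expression": genes 0 and 1 have true correlation 0.8,
# gene 2 is independent of both.
cov = np.array([[1.0, 0.8, 0.0],
                [0.8, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
latent = rng.multivariate_normal(mean=[5.0, 5.0, 5.0], cov=cov, size=n_cells)

# Simulated dropout: each entry is independently set to zero with probability 0.4.
dropout = rng.random(latent.shape) < 0.4
observed_data = np.where(dropout, 0.0, latent)

naive = np.corrcoef(observed_data, rowvar=False)   # zeros treated as real values
pairwise = pairwise_complete_corr(observed_data)   # zeros treated as missing

print("gene 0 vs gene 1: naive %.2f, pairwise-complete %.2f"
      % (naive[0, 1], pairwise[0, 1]))
```

In this toy setting the naive estimate is pulled toward zero by the artificial zeros, while the pairwise-complete estimate stays close to the simulated value. With real data many gene pairs share too few cells for this to work, which is one reason a method would need to complete the missing correlation entries rather than simply skip them.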

