Improved Unsupervised Representation Learning of Spatial Transcriptomic Data with Sparse Filtering
We have developed representation learning methods designed specifically to address the constraints and exploit the advantages of complex spatial data. Sparse filtering (SFt) uses principles of sparsity and mutual information to build representations from both global and local features using a minimal list of samples. Critically, the samples that comprise each representation are listed and ranked by informativeness. We used the Allen Mouse Brain Atlas gene expression data for prototyping and established performance metrics based on how accurately each representation matches labeled anatomy. SFt, implemented with the PyTorch machine learning library for Python, returned the most accurate reconstruction of anatomical ground truth of any method tested. SFt-generated gene lists could be further compressed, retaining 95% of informativeness with only 580 genes. Finally, we built classifiers capable of parsing anatomy with >95% accuracy using only 10 derived genes. Sparse learning is a powerful but underexplored means to derive biologically meaningful representations from complex datasets, and it provides a quantitative basis for compressed sensing of classifiable phenomena. SFt should be considered as an alternative to PCA or manifold learning for any high-dimensional dataset and as the basis for future spatial learning algorithms.
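
For readers unfamiliar with sparse filtering, the following is a minimal sketch of the objective as it is commonly formulated: a soft absolute value of a linear projection, L2 normalization of each feature across samples, L2 normalization of each sample across features, and an L1 penalty on the result. It is not the implementation described above. It uses NumPy rather than PyTorch to stay dependency-light, it omits the mutual information component and the ranking of genes, and the function name, matrix layout (genes by voxels), sizes, and random data are all illustrative assumptions.

```python
import numpy as np


def sparse_filtering_objective(W, X, eps=1e-8):
    """Evaluate a generic sparse filtering objective.

    W : (n_features, n_genes) weight matrix (hypothetical layout).
    X : (n_genes, n_voxels) expression matrix (hypothetical layout).
    Returns the scalar L1 loss and the normalized feature matrix.
    """
    # Soft absolute value keeps the objective smooth near zero.
    F = np.sqrt((W @ X) ** 2 + eps)
    # Normalize each feature (row) to unit L2 norm across voxels.
    F = F / np.linalg.norm(F, axis=1, keepdims=True)
    # Normalize each voxel (column) to unit L2 norm across features.
    F = F / np.linalg.norm(F, axis=0, keepdims=True)
    # L1 penalty: small when each voxel activates only a few features.
    return F.sum(), F


# Synthetic stand-in data; not Allen Mouse Brain Atlas values.
rng = np.random.default_rng(0)
X = rng.random((50, 200))          # 50 genes, 200 voxels (assumed sizes)
W = rng.standard_normal((8, 50))   # 8 learned features (assumed size)
loss, F = sparse_filtering_objective(W, X)
```

In practice `W` would be optimized to minimize this loss, for example with a gradient-based optimizer. Because every column of `F` has unit L2 norm and non-negative entries, the loss for a column is lowest when a single feature dominates, which is what drives the learned representation toward sparsity.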