CpG Transformer for imputation of single-cell methylomes
Motivation: The adoption of current single-cell DNA methylation sequencing protocols is hindered by incomplete coverage, underscoring the need for effective imputation techniques. Imputing single-cell (methylation) data requires models to build an understanding of the underlying biological processes. Current approaches compress intercellular methylation dependencies in some way and hence do not provide a general-purpose way of learning interactions between neighboring CpG sites, both within and between cells.
Results: We adapt the transformer neural network architecture to operate on methylation matrices by introducing a novel 2D sliding window self-attention. The resulting CpG Transformer displays state-of-the-art performance on a wide range of scBS-seq and scRRBS-seq datasets. Furthermore, we demonstrate the interpretability of CpG Transformer and illustrate its rapid transfer learning properties, which allow practitioners to train models on new datasets with a limited computational and time budget.
Availability and Implementation: CpG Transformer is freely available at https://github.com/gdewael/cpg-transformer.
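To make the idea of a 2D sliding window self-attention over a methylation matrix more concrete, the following is a minimal NumPy sketch and not the CpG Transformer implementation. It assumes one possible reading of the mechanism: each matrix entry (cell i, CpG site j) attends to nearby sites in the same cell within a fixed window, and to the same site in all other cells. The function name `sliding_window_attention_2d`, the `window` parameter, and the use of the raw embeddings as queries, keys, and values (no learned projections, no multiple heads, no positional encoding) are all illustrative assumptions.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1D score vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def sliding_window_attention_2d(x, window):
    """Hypothetical sketch of windowed attention on a cells x sites grid.

    x      : array of shape (n_cells, n_sites, d), one embedding per entry.
    window : number of neighboring CpG sites on each side to attend to.

    Assumption: queries, keys, and values are the embeddings themselves;
    a real model would use learned projections.
    """
    n_cells, n_sites, d = x.shape
    out = np.zeros_like(x)
    for i in range(n_cells):
        for j in range(n_sites):
            lo, hi = max(0, j - window), min(n_sites, j + window + 1)
            # Same cell, neighboring sites (includes the entry itself).
            row_ctx = x[i, lo:hi]
            # Same site, all other cells.
            col_ctx = np.delete(x[:, j], i, axis=0)
            ctx = np.concatenate([row_ctx, col_ctx], axis=0)
            # Scaled dot-product attention over the local context only.
            scores = ctx @ x[i, j] / np.sqrt(d)
            out[i, j] = softmax(scores) @ ctx
    return out


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 10, 8))  # 4 cells, 10 CpG sites, 8-dim embeddings
y = sliding_window_attention_2d(x, window=2)
print(y.shape)  # (4, 10, 8)
```

In this sketch the context of each entry grows with the window size and the number of cells rather than with the full matrix size, which is the usual motivation for restricting attention to a window instead of attending over every entry.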