Hubness reduction improves clustering and trajectory inference in single-cell transcriptomic data
Background. Single-cell RNA-seq (scRNA-seq) datasets are characterized by large ambient dimensionality, and their analyses, such as clustering or cell trajectory inference, can be affected by various manifestations of the curse of dimensionality. One of these manifestations is the hubness phenomenon, i.e. the existence of data points with a surprisingly large incoming connectivity degree in the neighbourhood graph. The conventional approach to dampening the unwanted effects of high dimension consists in applying drastic dimensionality reduction, a step that is especially critical in scRNA-seq. It remains unexplored whether this step can be avoided by directly correcting an effect of the high dimension, thereby retaining more information than is contained in the low-dimensional projections.

Results. We investigate the phenomenon of hubness in scRNA-seq data and its manifestation in spaces of increasing dimensionality. We also link increasing hubness to increased levels of dropout in sequencing data. We show that hub cells do not represent any visible technical or biological bias. We investigate the effect of various hubness reduction methods on the visualization, clustering and trajectory inference tasks in scRNA-seq datasets. We show that applying hubness reduction generates neighbourhood graphs with properties more suitable for applying machine learning methods, and that hubness reduction outperforms other state-of-the-art methods for improving neighbourhood graphs. As a consequence, clustering, trajectory inference and visualization perform better, especially for datasets characterized by large intrinsic dimensionality.

Conclusion. Hubness is an important phenomenon in sequencing data. Reducing hubness can be beneficial for the analysis of scRNA-seq data characterized by large intrinsic dimensionality, in which case it can be used as an alternative to drastic dimensionality reduction.
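The notion of hubness used in the Background can be made concrete with a short sketch. The code below counts, for each point, how many other points list it among their k nearest neighbours (its incoming degree in the k-nearest-neighbour graph), and summarizes the distribution of these counts by its skewness, a commonly used hubness score. This is an illustration under stated assumptions, not the procedure used in this work: the data are synthetic Gaussian samples rather than scRNA-seq counts, and the names `k_occurrence`, `skewness` and `skew_by_dim` are hypothetical helpers introduced here.

```python
import numpy as np


def k_occurrence(X, k=10):
    """In-degree of each point in the directed k-nearest-neighbour graph.

    Hypothetical helper: uses brute-force squared Euclidean distances,
    which is only practical for a few thousand points.
    """
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a point is never its own neighbour
    # Indices of the k closest points for every row (order within the k is arbitrary).
    nn = np.argpartition(d2, k, axis=1)[:, :k]
    # Count how often each index appears in someone else's neighbour list.
    return np.bincount(nn.ravel(), minlength=X.shape[0])


def skewness(x):
    """Sample skewness (third standardized moment)."""
    x = np.asarray(x, dtype=float)
    c = x - x.mean()
    return np.mean(c ** 3) / np.mean(c ** 2) ** 1.5


# Synthetic stand-in for expression data (assumption: i.i.d. Gaussian features).
rng = np.random.default_rng(0)
n_points, k = 1000, 10
skew_by_dim = {}
for dim in (3, 100):
    X = rng.standard_normal((n_points, dim))
    n_k = k_occurrence(X, k=k)
    skew_by_dim[dim] = skewness(n_k)
    print(f"dim={dim:3d}  skewness={skew_by_dim[dim]:.2f}  max in-degree={n_k.max()}")
```

In this toy setting the in-degree distribution is expected to be markedly more right-skewed in 100 dimensions than in 3, meaning that a few points become hubs. Every point still has exactly k outgoing neighbours, so the mean in-degree stays at k, and only the shape of the distribution changes.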