Hyperbolic geometry of gene expression
Abstract
Understanding the patterns of gene expression is key to elucidating the differences between cell types and across disease conditions. The overwhelmingly large number of genes involved generally makes this problem intractable. Yet, we find that gene expression patterns in five different datasets can all be described using a small number of variables. These variables describe differences between cells according to a hyperbolic metric. We reach this conclusion by developing methods that, starting with an initial assumption of a Euclidean geometry, can detect the presence of other geometries in the data. The Euclidean metric is used in most current studies of gene expression, primarily because non-linear metrics are difficult to use in high-dimensional spaces. The hyperbolic metric is much more suitable for describing data produced by a hierarchically organized network, which is relevant for many biological processes. We find that the hyperbolic effects, but not the space dimensionality, increase with the number of genes that are taken into account. The hyperbolic curvature was smallest for mouse embryonic stem cells, larger for mouse kidney, lung and brain cells, and largest in a set of human cells integrated from multiple sources. We show that taking into account hyperbolic geometry strongly improves the visualization of gene expression data compared to leading visualization methods. These results demonstrate the advantages of knowing the underlying geometry when analyzing high-dimensional data.
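To make the contrast between the two metrics concrete, the following minimal sketch compares Euclidean and hyperbolic distances between points in a disk. It assumes the Poincaré ball model with curvature fixed at -1, which is one standard representation of hyperbolic space and is not necessarily the one used in this work; the function names `poincare_distance` and `euclidean_distance` are hypothetical and chosen for illustration only.

```python
import math


def euclidean_distance(u, v):
    """Ordinary straight-line distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def poincare_distance(u, v):
    """Hyperbolic distance between two points strictly inside the unit ball.

    Assumption: Poincare ball model with curvature -1 (illustrative choice).
    """
    norm_u_sq = sum(x * x for x in u)
    norm_v_sq = sum(x * x for x in v)
    if norm_u_sq >= 1.0 or norm_v_sq >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    diff_sq = sum((a - b) ** 2 for a, b in zip(u, v))
    # Standard closed form: arcosh(1 + 2|u-v|^2 / ((1-|u|^2)(1-|v|^2)))
    arg = 1.0 + 2.0 * diff_sq / ((1.0 - norm_u_sq) * (1.0 - norm_v_sq))
    return math.acosh(arg)


# Two points near the boundary, in perpendicular directions from the origin.
origin = (0.0, 0.0)
a = (0.9, 0.0)
b = (0.0, 0.9)

# Ratio of the direct distance to the detour through the origin.
euclid_ratio = euclidean_distance(a, b) / (
    euclidean_distance(a, origin) + euclidean_distance(origin, b)
)
hyper_ratio = poincare_distance(a, b) / (
    poincare_distance(a, origin) + poincare_distance(origin, b)
)
print(f"Euclidean direct/detour ratio:  {euclid_ratio:.3f}")
print(f"Hyperbolic direct/detour ratio: {hyper_ratio:.3f}")
```

In this model, the direct hyperbolic path between two peripheral points is nearly as long as the detour through the center, much as the path between two leaves of a tree passes through their common ancestor. This is the sense in which a hyperbolic metric suits hierarchically organized data.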