Hypoxia classifier for transcriptome datasets
Molecular gene signatures are useful tools for characterizing the physiological state of cell populations from their gene expression profiles. However, most signatures have been developed under a very limited set of conditions and cell types, and many consist only of a list of gene identities linked to an event or biological process, making it necessary to develop and test additional procedures before they can be applied to new data. Focusing on the transcriptional response to hypoxia, we aimed to generate widely applicable classifiers capable of detecting hypoxic samples while remaining transparent, easy to use, and easy to interpret. Here we describe several tree-based classifiers derived from a meta-analysis of 69 differential expression datasets comprising 425 individual RNA-seq experiments from 33 different human cell types exposed to different degrees of hypoxia (0.1-5% O2) for between 2 and 48 h. These decision trees include both the identities of key genes in the response to hypoxia and defined quantitative boundaries, allowing individual samples to be classified without a control or normoxic reference. Despite their simplicity, these classifiers achieve over 95% accuracy in cross-validation and over 80% accuracy when applied to additional challenging datasets. Moreover, the explicit structure of the trees allowed us to identify relevant biological features in cases where prediction was inaccurate. Finally, we demonstrate that the classifiers can be applied to spatial gene expression data to identify hypoxic regions within histological sections. Although we have focused on the identification of hypoxia, this method can be applied to detect the activation of other processes or cellular states.
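To illustrate how a tree with explicit gene identities and quantitative boundaries can label a single sample without a normoxic reference, the following is a minimal sketch. It is not one of the trees described here: the gene names (GENE_A, GENE_B), the thresholds, and the assumption that each gene is summarized as a per-sample score between 0 and 1 are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    """Terminal node carrying the predicted class label."""
    label: str


@dataclass
class Node:
    """Internal node: compare one gene's score against a fixed boundary."""
    gene: str
    threshold: float
    low: Union["Node", Leaf]   # branch taken when score <= threshold
    high: Union["Node", Leaf]  # branch taken when score > threshold


def classify(tree: Union[Node, Leaf], sample: Dict[str, float]) -> str:
    """Walk the tree using only the values measured in this one sample."""
    node = tree
    while isinstance(node, Node):
        node = node.high if sample[node.gene] > node.threshold else node.low
    return node.label


# Hypothetical two-gene tree; genes and thresholds are placeholders,
# not values taken from the classifiers described in the text.
TOY_TREE = Node(
    gene="GENE_A",
    threshold=0.70,
    low=Node(
        gene="GENE_B",
        threshold=0.85,
        low=Leaf("normoxic"),
        high=Leaf("hypoxic"),
    ),
    high=Leaf("hypoxic"),
)

# A single sample described by hypothetical per-gene scores.
sample = {"GENE_A": 0.40, "GENE_B": 0.90}
print(classify(TOY_TREE, sample))
```

Because every decision is a comparison against a fixed boundary, the path taken through the tree for a misclassified sample can be read off directly, which is what makes this kind of classifier easy to inspect.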