Enhanced Cancer Subtyping via Pan-Transcriptomics Data Fusion, Monte-Carlo Consensus Clustering, and Auto Classifier Creation
ABSTRACT
Subtyping of tumor transcriptome expression profiles is routinely used to characterize tumor heterogeneity. Unsupervised clustering techniques are often combined with survival analysis to decipher the relationship between genes and patient survival times. However, the reproducibility of these subtyping-based studies is poor: multiple reports present conflicting subtypes and conflicting gene-survival time relationships. In this study, we examine the issues underlying the lack of reproducibility in transcriptomic subtyping studies. The problem arises from the routine analysis of small cohorts (< 100 individuals) and the use of biased traditional consensus clustering techniques. Our approach carefully combines multiple RNA-sequencing and microarray datasets, then performs subtyping via Monte-Carlo Consensus Clustering and creates deep subtyping classifiers. We demonstrate this improved subtyping methodology on pancreatic ductal adenocarcinoma. Importantly, our methodology identifies six biologically novel pancreatic ductal adenocarcinoma subtypes. Through our pancreatic ductal adenocarcinoma classifier, PDACNet, our approach also enables a degree of reproducibility that classical subtyping studies have failed to establish.