Subtype-GAN: a deep learning approach for integrative cancer subtyping of multi-omics data
Abstract

Motivation: Identifying cancer subtypes can help explore cancer pathogenesis, inform clinically actionable treatment decisions, and improve patients' survival rates. However, owing to the diversity and complexity of multi-omics data, developing integrated clustering algorithms for tumor molecular subtyping remains challenging.

Results: We propose Subtype-GAN, a deep adversarial learning approach based on a multiple-input multiple-output neural network that models complex omics data accurately. Using the latent variables extracted from the neural network, Subtype-GAN applies consensus clustering and a Gaussian mixture model to identify the molecular subtypes of tumor samples. Compared with other state-of-the-art subtyping approaches, Subtype-GAN achieved outstanding performance on benchmark data sets consisting of ∼4,000 TCGA tumors from 10 types of cancer. On the comparison data set, we found that the clustering scheme of Subtype-GAN is not always similar to that of the deep learning method AE but is identical to that of NEMO, MCCA, VAE, and other excellent approaches. Finally, we applied Subtype-GAN to the BRCA data set and automatically obtained the number of subtypes and the subtype labels of 1031 BRCA tumors. A detailed analysis showed that the identified subtypes are clinically meaningful and show distinct patterns in the feature space, demonstrating the practicality of Subtype-GAN.

Availability: The source code and the clustering results of Subtype-GAN across the benchmark data sets are available at https://github.com/haiyang1986/Subtype-GAN.

Supplementary information: Supplementary data are available at Bioinformatics online.
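
To make the clustering stage concrete, the following is a minimal sketch of how consensus clustering and a Gaussian mixture model might be combined on latent variables. It is not the authors' implementation (see the repository above for that). The function names `consensus_matrix` and `cluster_latent`, the subsampling fraction, the number of runs, the diagonal covariance, and the synthetic three-group "latent" matrix are all assumptions made for illustration, and the number of subtypes `k` is fixed by hand here rather than selected automatically.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def consensus_matrix(latent, k, n_runs=20, frac=0.8, seed=0):
    """Fraction of subsampled GMM runs in which each pair of samples co-clusters.

    Illustrative assumption: each run fits a GMM on a random 80% subsample.
    """
    rng = np.random.default_rng(seed)
    n = latent.shape[0]
    together = np.zeros((n, n))  # times samples i and j received the same label
    sampled = np.zeros((n, n))   # times samples i and j were drawn in the same run
    for run in range(n_runs):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        gmm = GaussianMixture(n_components=k, covariance_type="diag",
                              random_state=run)
        labels = gmm.fit_predict(latent[idx])
        same = (labels[:, None] == labels[None, :]).astype(float)
        together[np.ix_(idx, idx)] += same
        sampled[np.ix_(idx, idx)] += 1.0
    # Pairs that were never drawn together keep a consensus of 0.
    return np.divide(together, sampled,
                     out=np.zeros_like(together), where=sampled > 0)


def cluster_latent(latent, k, seed=0):
    """Assign each sample to a subtype with a GMM fitted on all samples."""
    gmm = GaussianMixture(n_components=k, covariance_type="diag",
                          n_init=5, random_state=seed)
    return gmm.fit_predict(latent)


if __name__ == "__main__":
    # Synthetic stand-in for latent variables: three well-separated groups.
    rng = np.random.default_rng(42)
    centers = np.array([[0.0] * 4, [20.0] * 4, [-20.0] * 4])
    latent = np.vstack([c + rng.normal(size=(30, 4)) for c in centers])

    cons = consensus_matrix(latent, k=3)
    labels = cluster_latent(latent, k=3)
    print("consensus matrix shape:", cons.shape)
    print("samples per subtype:", np.bincount(labels))
```

In a sketch like this, consensus entries close to 0 or 1 indicate that pairs of samples are grouped consistently across subsamples, which is one common way to judge whether a candidate number of subtypes is stable.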