Multi-cancer classification; an analysis of neural network complexity
Background: Cancer identification is generally framed as binary classification, normally discrimination of a control group from a single cancer group. However, such models lack any cancer-specific information, as they are only trained on one cancer type. The models fail to account for competing cancer risks. For example, an ostensibly healthy individual may have any number of different cancer types, and a tumor may originate from one of several primary sites. Pan-cancer evaluation requires a model trained on multiple cancer types, and controls, simultaneously, so that a physician can be directed to the correct area of the body for further testing. Methods: We introduce novel neural network models to address multi-cancer classification problems across several data types commonly applied in cancer prediction, including circulating miRNA expression, protein, and mRNA. In particular, we present an analysis of neural network depth and complexity, and investigate how this relates to classification performance. Comparisons of our models with state-of-the-art neural networks from the literature are also presented. Results: Our analysis evidences that shallow, feed-forward neural net architectures offer greater performance when compared to more complex deep feed-forward, Convolutional Neural Network (CNN), and Graph CNN (GCNN) architectures considered in the literature. Conclusion: The results show that multiple cancers and controls can be classified accurately using the proposed models, across a range of expression technologies in cancer prediction. Impact: This study addresses the important problem of pan-cancer classification, which is often overlooked in the literature. The promising results highlight the urgency for further research.