Although therapy outcomes are expected to be heterogeneous, especially for complex diseases such as cancer, differential response to experimental therapies in a randomized clinical trial (RCT) is typically analyzed by dividing patients into responders and non-responders, usually on the basis of a single endpoint. Given the biological and pathophysiological differences among metastatic colorectal cancer (mCRC) patients, we hypothesized that a data-driven analysis of an RCT population's outcomes can identify patient sub-types defined by differential response to panitumumab, a fully human monoclonal antibody directed against the epidermal growth factor receptor.
Outcome and response data from the RCT population were mined with heuristic, distance-based and model-based unsupervised clustering algorithms. The population sub-groups obtained by the best-performing clustering approach were then examined in terms of their molecular and clinical characteristics. The utility of this characterization was compared with that of the sub-groups obtained by conventional responder analysis, and then contrasted with aetiological evidence on mCRC heterogeneity and biological functioning.
The Partitioning Around Medoids (PAM) clustering method identified seven sub-types of patients, statistically distinct from each other in survival outcomes, prognostic biomarkers and genetic characteristics. Conventional responder analysis proved inferior in uncovering relationships between physical attributes, clinical history, genetic attributes and differential treatment resistance mechanisms. Combined with improved characterization of the molecular subtypes of CRC, applying machine learning techniques such as unsupervised clustering to the wealth of data already collected by previous RCTs can support the design of further targeted, more efficient RCTs and better identification of the patient groups who will respond to a given intervention.
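As an illustration of the medoid-based clustering named above, the following is a minimal sketch of a greedy BUILD step followed by a SWAP step, written in plain Python. It is not the analysis pipeline used in the study: the function names (`pairwise_distances`, `total_cost`, `pam`), the Euclidean distance, and the six two-dimensional toy points are all assumptions made for illustration, and no trial data are involved.

```python
import math


def pairwise_distances(points):
    """Full Euclidean distance matrix (assumed metric, for illustration only)."""
    return [[math.dist(p, q) for q in points] for p in points]


def total_cost(dist, medoids):
    """Sum over all points of the distance to the nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in dist)


def pam(dist, k, max_iter=100):
    """Medoid clustering on a precomputed distance matrix (hypothetical helper)."""
    n = len(dist)
    # BUILD: start from the most central point, then greedily add the
    # candidate that lowers the total cost the most.
    medoids = [min(range(n), key=lambda c: sum(dist[c]))]
    while len(medoids) < k:
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(
            min(candidates, key=lambda c: total_cost(dist, medoids + [c]))
        )
    cost = total_cost(dist, medoids)
    # SWAP: repeatedly apply the single medoid/non-medoid exchange that
    # reduces the cost the most; stop when no exchange improves it.
    for _ in range(max_iter):
        best_cost, best_swap = cost, None
        for i in range(k):
            for c in range(n):
                if c in medoids:
                    continue
                trial = medoids[:i] + [c] + medoids[i + 1:]
                trial_cost = total_cost(dist, trial)
                if trial_cost < best_cost - 1e-12:
                    best_cost, best_swap = trial_cost, (i, c)
        if best_swap is None:
            break
        medoids[best_swap[0]] = best_swap[1]
        cost = best_cost
    # Assign every point to the index of its nearest medoid.
    labels = [min(range(k), key=lambda j: dist[p][medoids[j]]) for p in range(n)]
    return medoids, labels, cost


# Toy data (made up): two well-separated groups of three points each.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.3)]
dist = pairwise_distances(points)
medoids, labels, cost = pam(dist, k=2)
print(medoids, labels)
```

The swap search here is exhaustive, which is adequate for toy data; for a trial-sized dataset an established implementation, together with an internal validity index for choosing the number of clusters, would be the more sensible choice.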