Prediction of biomarker status, diagnosis and outcome from histology slides using deep learning-based hypothesis-free feature extraction.
3140
Background: Histological pattern signatures obtained from diagnostic H&E images have recently been shown to predict mutation status, biomarker status, or outcome. We report a novel deep learning-based framework designed to identify and extract predictive histological signatures. We applied this framework in three experiments, predicting the microsatellite status (MSS) of colorectal cancer (CRC), breast cancer (BC) micrometastasis in lymph nodes (LN), and pathologic complete response (pCR) in BC diagnostic biopsies.
Methods: Our deep learning-based algorithm was trained on histology images at 20X magnification. A binary classifier was trained for each of the three cohorts, using 75% of the images for training and the remaining 25% for testing. Cohort details are as follows. MSS for CRC: H&E-stained tissue images from 94 patients in the Roche internal CRC80 dataset (MSS n = 24; MSI n = 70). BC LN: H&E-stained tissue images from 270 patients in the CAMELYON16 dataset (LN(+) n = 110; LN(-) n = 160). pCR for BC: H&E-stained tissue images from 225 patients in the Tryphaena study (BO22280), a neoadjuvant trial of trastuzumab/pertuzumab in combination with chemotherapy (pCR n = 111; non-pCR n = 114).
Results: We assessed algorithm performance on each cohort by the area under the curve (AUC). Prediction of MSS in the CRC80 cohort yielded an AUC of 0.9. Prediction of LN status on the CAMELYON16 dataset yielded an AUC of 0.85. Prediction of pCR in the Tryphaena cohort yielded an AUC of 0.8.
Conclusions: We present a new approach to generating predictive signatures from conventional diagnostic H&E images using a novel machine learning framework. The CRC80 and CAMELYON16 cohorts served as confidence-building experiments, with predictive features that are well known to clinicians and were visually confirmed. The predictive algorithm for pCR in the Tryphaena cohort yielded both a response prediction and the fields of view (FOVs) with high predictive value. These included tissue patterns not previously considered relevant to the prediction of pCR.
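The Methods state only that 75% of the images were used for training and 25% for testing; the abstract does not say how that split was drawn. As an illustration only, the following Python sketch assumes one slide per patient and a split stratified by class label, so that both sets keep roughly the cohort's class balance. The function name `stratified_split`, the fixed seed, and the stratification itself are assumptions, not the authors' procedure.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_frac=0.75, seed=0):
    """Hypothetical helper: split patient indices into train/test sets per class.

    Assumes one label per patient (one slide per patient). Returns
    (train_idx, test_idx) as sorted lists of indices into `labels`.
    """
    # Group patient indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = list(by_class[label])
        rng.shuffle(members)
        # Take floor(train_frac * n) of each class for training; the rest go to testing.
        n_train = int(train_frac * len(members))
        train_idx.extend(members[:n_train])
        test_idx.extend(members[n_train:])
    return sorted(train_idx), sorted(test_idx)


# Class counts for the CRC80 cohort as reported in the abstract: 24 MSS, 70 MSI.
labels = ["MSS"] * 24 + ["MSI"] * 70
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

With the CRC80 class counts, this yields 70 training and 24 test patients, 6 of them MSS. Splitting at the patient level, rather than per image tile, would keep tiles from the same patient from appearing in both sets.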
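The Results are reported as AUC. Assuming this is the area under the ROC curve, as is usual for binary classifiers, it equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The Python sketch below computes it directly from that definition; `roc_auc` is a hypothetical helper, and the scores in the example are made up, not study data.

```python
def roc_auc(labels, scores):
    """Hypothetical helper: rank-based ROC AUC (Mann-Whitney formulation).

    labels: 0/1 ground truth per test patient; scores: classifier outputs.
    Returns P(score of random positive > score of random negative), ties = 1/2.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC requires at least one positive and one negative case")

    wins = 0.0
    # Compare every positive/negative pair.
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative scores only: 3 of the 4 positive/negative pairs are ranked correctly.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

The pairwise loop costs O(n_pos * n_neg), which is adequate for test sets of tens of patients; scikit-learn's `roc_auc_score` computes the same quantity more efficiently.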