Prediction of Pathologic Complete Response to Neoadjuvant Chemotherapy Using Machine Learning Models in Patients with Breast Cancer
Abstract
Background: The aim of this study was to develop a machine learning (ML)-based model to accurately predict pathologic complete response (pCR) to neoadjuvant chemotherapy (NAC) in breast cancer (BC), using pretreatment clinical and pathological characteristics from electronic medical record (EMR) data.
Methods: EMR data from patients who were diagnosed with early or locally advanced BC and who received NAC followed by curative surgery were reviewed. A total of 16 clinical and pathological characteristics were selected to develop the ML model. We trained six ML models with default settings for multivariate analysis of the extracted variables.
Results: In total, 2,065 patients were included in this analysis. Overall, 30.6% (n=632) of patients achieved pCR. Among the six ML models, LightGBM had the highest area under the curve (AUC) for pCR prediction. After hyperparameter tuning with Bayesian optimization, the AUC was 0.810. The performance of the pCR prediction model was compared across histology-based subtypes. The AUC was highest in the HR+/HER2- subgroup and lowest in the HR-/HER2- subgroup (HR+/HER2- 0.841, HR+/HER2+ 0.716, HR-/HER2+ 0.753, HR-/HER2- 0.653).
Conclusions: An ML-based pCR prediction model using pretreatment clinical and pathological characteristics provided useful information for predicting pCR to NAC. This prediction model would help to determine the treatment strategy for patients with BC who are planned to receive NAC.
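The subgroup comparison reported above rests on computing the AUC separately within each subtype. As a minimal sketch of that calculation, the Python below computes the AUC as the probability that a randomly chosen pCR case receives a higher predicted score than a randomly chosen non-pCR case, with ties counted as one half. The function names `roc_auc` and `auc_by_subgroup` and the toy records are illustrative assumptions, not the study's code or data.

```python
from collections import defaultdict


def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_by_subgroup(records):
    """Group (subtype, label, score) records by subtype and compute AUC per group."""
    groups = defaultdict(lambda: ([], []))
    for subtype, label, score in records:
        groups[subtype][0].append(label)
        groups[subtype][1].append(score)
    return {name: roc_auc(ys, ss) for name, (ys, ss) in groups.items()}


# Hypothetical records: (subtype, pCR label, predicted probability).
# These values are made up for illustration only.
toy_records = [
    ("HR+/HER2-", 1, 0.9), ("HR+/HER2-", 0, 0.2),
    ("HR+/HER2-", 0, 0.4), ("HR+/HER2-", 1, 0.7),
    ("HR-/HER2-", 1, 0.6), ("HR-/HER2-", 0, 0.7),
    ("HR-/HER2-", 1, 0.8), ("HR-/HER2-", 0, 0.3),
]

subgroup_auc = auc_by_subgroup(toy_records)
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a sketch; a rank-based implementation would be preferable for a cohort of the size reported here.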