Machine Learning Models for Risk Prediction of Lymph Nodes Metastasis in Non-Small Cell Lung Cancer: Development and Validation Study
Abstract
Background: This study aimed to develop and validate machine learning models for predicting the risk of lymph node metastasis (LNM) in non-small cell lung cancer (NSCLC) using clinicopathologic parameters and immunohistochemical features.
Methods: From January 2010 to December 2019, data from 639 consecutive patients were collected at Nanfang Hospital. We extracted immunohistochemical and clinicopathological features from the patients' electronic medical records. We established two models (a full model and a selection model) and implemented three algorithms (random forest, support vector machine, and penalized logistic regression). Model performance was evaluated in terms of discrimination (area under the receiver operating characteristic curve, AUC), calibration, and decision curve analysis.
Results: AUC analysis and calibration curves showed that the selection model (AUC 0.843 in training and 0.840 in testing) and the full model (AUC 0.855 in training and 0.863 in testing) constructed using random forest performed best among all models. Decision curve analysis showed that the full model and the selection model using random forest were clinically useful. The performance of the full model and the selection model was comparable.
Conclusion: The random forest model using clinicopathologic and immunohistochemical features can predict LNM in patients with NSCLC.