Using machine learning modeling to explore new immune-related prognostic markers in non-small cell lung cancer
Abstract BACKGROUND To find new immune-related prognostic markers for non-small cell lung cancer (NSCLC). METHODS We identified a suitable NSCLC microarray dataset (GSE14814) in the GEO database. Samples in the non-small cell lung cancer observation (NSCLC-OBS) group were scored for immunity and divided into high- and low-score groups according to the immune score, and differentially expressed genes between the two groups were screened. Univariate Cox regression analysis was performed to select the genes related to prognosis. A prognostic model was constructed by machine learning, Receiver Operating Characteristic (ROC) analysis was used to test whether the model could predict prognosis, and the association between the selected prognostic genes and patient prognosis was then tested. The non-small cell lung cancer chemotherapy (NSCLC-ACT) samples from the same dataset were used as a validation set, on which the same model validation and prognostic analysis were repeated. The relative infiltration scores of 24 immune cell types in NSCLC-ACT patients were compared between the high- and low-risk groups. Genes co-expressed with the hub genes were obtained by Pearson correlation analysis and subjected to gene enrichment, functional enrichment, and protein interaction analysis, and the correlation between the prognostic genes and immune checkpoints was further analyzed. Tumor samples from patients at different clinical stages were examined by immunohistochemistry, and the expression of the prognostic genes in tumor tissue was compared between stages. RESULTS By screening, we found that LYN, C3, COPG2IT1, HLA.DQA1, and TNFRSF17 are closely related to prognosis.
ROC analysis of the machine-learning immune prognostic model built from these five genes gave AUC values greater than 0.9 at 1, 3, and 5 years, and overall survival in the low-risk group defined by these five hub genes was significantly better than in the high-risk group. The Kaplan–Meier curves showed that increased expression of COPG2IT1 and HLA.DQA1 and decreased expression of LYN, C3, and TNFRSF17 were significantly associated with shorter survival. The results of the prognostic and ROC analyses in the ACT samples were consistent with those in the OBS group. The hub genes were most highly expressed in fibroblasts, but infiltration of the 24 immune cell types did not differ significantly between the high- and low-risk groups. The co-expressed genes were mainly involved in the B cell receptor signaling pathway and were mainly enriched in terms such as apoptotic cell clearance and intestinal immune network for IgA production. The key prognostic genes were highly correlated with the immune checkpoints PDCD1, PDCD1LG2, LAG3, and CTLA4 (p < 0.05). Immunohistochemistry showed that, compared with stage I, expression of COPG2IT1 and HLA.DQA1 was significantly increased in stage III and expression of LYN, C3, and TNFRSF17 was significantly decreased. These experimental results are consistent with the preceding analysis. CONCLUSION LYN, C3, COPG2IT1, HLA.DQA1, and TNFRSF17 may be new immune-related markers for judging the prognosis of patients with non-small cell lung cancer.
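To make the risk-group and ROC steps concrete, here is a minimal Python sketch of one common way such a five-gene model is scored. It is not the authors' implementation. The abstract does not say how the risk score is computed or where the high/low cut-off lies, so the linear combination and the median split are assumptions. The coefficient values are hypothetical: only their signs follow the reported direction of each gene's association with survival. The AUC function is a simplified rank-based estimate at a fixed time horizon that ignores censoring, unlike a full time-dependent ROC analysis.

```python
from statistics import median

# Hypothetical coefficients (assumption): magnitudes are invented for
# illustration. Signs follow the abstract: higher COPG2IT1 / HLA.DQA1 and
# lower LYN / C3 / TNFRSF17 expression go with shorter survival.
COEFS = {
    "LYN": -0.4,
    "C3": -0.3,
    "TNFRSF17": -0.5,
    "COPG2IT1": 0.6,
    "HLA.DQA1": 0.2,
}


def risk_score(expr):
    """Linear risk score: sum of coefficient * expression over the five genes."""
    return sum(COEFS[gene] * expr[gene] for gene in COEFS)


def split_by_median(scores):
    """Label each score 'high' if above the cohort median, else 'low' (assumed cut-off)."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]


def auc(scores, labels):
    """Rank-based AUC: chance that a random positive case outscores a random
    negative case, counting ties as 0.5. labels: 1 = event by the time horizon,
    0 = no event. Censoring is ignored in this simplified version."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy expression profiles (invented values, not taken from GSE14814).
    patients = [
        {"LYN": 2.0, "C3": 2.0, "TNFRSF17": 1.5, "COPG2IT1": 0.5, "HLA.DQA1": 0.5},
        {"LYN": 1.5, "C3": 1.0, "TNFRSF17": 1.0, "COPG2IT1": 1.0, "HLA.DQA1": 1.0},
        {"LYN": 0.5, "C3": 0.5, "TNFRSF17": 0.5, "COPG2IT1": 2.0, "HLA.DQA1": 1.5},
        {"LYN": 0.2, "C3": 0.4, "TNFRSF17": 0.3, "COPG2IT1": 2.5, "HLA.DQA1": 2.0},
    ]
    died_by_horizon = [0, 0, 1, 1]  # toy outcome labels
    scores = [risk_score(p) for p in patients]
    print(split_by_median(scores))
    print(auc(scores, died_by_horizon))
```

In an actual analysis the coefficients would come from the fitted model, and the AUC at 1, 3, and 5 years would be estimated with a method that accounts for censored follow-up.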