BACKGROUND
Lung cancer is the leading cause of cancer death worldwide. Clinical staging of lung cancer plays a crucial role in treatment decision-making and prognosis evaluation. In clinical practice, however, the clinical stage is inconsistent with the pathological stage in about one-half of lung cancer patients. Chest computed tomography (CT) is one of the most important diagnostic modalities for staging, and CT reports contain a wealth of staging information, but their free-text nature hinders computerized use.
OBJECTIVE
We aim to automatically extract staging-related information from CT reports to support accurate clinical staging.
METHODS
We developed an information extraction (IE) system to extract staging-related information from CT reports. The system consists of three parts: named entity recognition (NER), relation classification (RC), and question reasoning (QR). We first summarized 22 questions about lung cancer staging based on the TNM staging guideline. We then implemented two state-of-the-art NER algorithms to recognize the entities of interest. Next, we proposed a novel RC method that uses relation constraints to classify the relations between entities. Finally, we built a rule-based QR module that answers all questions by reasoning over the NER and RC results.
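The interplay between relation constraints and rule-based question reasoning can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the system described here: the entity types (`Mass`, `Size`, `Location`), the relation labels, the `ALLOWED` constraint table, and the 3 cm example question are all hypothetical. The sketch treats relation constraints as a table of relation labels permitted for each pair of entity types, which is one plausible reading of the idea.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    id: int
    type: str  # hypothetical entity type, e.g. "Mass" or "Size"
    text: str  # surface text taken from the report


# Hypothetical constraint table: relation labels permitted for each
# (head type, tail type) pair. Illustrative only.
ALLOWED = {
    ("Mass", "Size"): {"has_size", "no_relation"},
    ("Mass", "Location"): {"located_in", "no_relation"},
}


def constrained_predict(head, tail, scores):
    """Return the highest-scoring relation label that the constraints allow."""
    allowed = ALLOWED.get((head.type, tail.type), {"no_relation"})
    candidates = {label: s for label, s in scores.items() if label in allowed}
    if not candidates:
        return "no_relation"
    return max(candidates, key=candidates.get)


def mass_larger_than(entities, relations, threshold_cm=3.0):
    """Hypothetical rule: is any mass linked to a size above the threshold?"""
    by_id = {e.id: e for e in entities}
    for head_id, label, tail_id in relations:
        if label != "has_size" or by_id[head_id].type != "Mass":
            continue
        number = re.search(r"\d+(?:\.\d+)?", by_id[tail_id].text)
        if number and float(number.group()) > threshold_cm:
            return True
    return False


mass = Entity(0, "Mass", "mass")
size = Entity(1, "Size", "3.5 cm")
# Made-up classifier scores: without constraints "located_in" would win.
scores = {"located_in": 0.6, "has_size": 0.3, "no_relation": 0.1}
label = constrained_predict(mass, size, scores)
answer = mass_larger_than([mass, size], [(mass.id, label, size.id)])
```

In this toy case the constraint table rules out `located_in` for a `Mass`/`Size` pair, so the lower-scoring but type-compatible `has_size` label is chosen, and the rule can then answer the question from the linked size.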
RESULTS
We evaluated the IE system on a clinical dataset of 392 chest CT reports collected from the Department of Thoracic Surgery II of Peking University Cancer Hospital. For the NER task, Bi-LSTM-CRF outperformed ID-CNN-CRF, with macro F1 scores of 77.27% and 89.96% under the exact and inexact matching schemes, respectively. For the RC task, the proposed method, Attention-Bi-LSTM with relation constraints, achieved the best performance compared with CNN-MF and Attention-Bi-LSTM, with a micro F1 score of 96.53% and a macro F1 score of 98.27%. The rule-based QR module, reasoning over the extracted NER and RC results, answered the 22 staging questions with a macro F1 score of 93.56% and a micro F1 score of 94.73%.
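For readers unfamiliar with the metrics, the following sketch shows how macro and micro F1 scores differ and how exact and inexact matching can change entity-level counts. It is not the evaluation code used in this study. It assumes that inexact matching means "same label and any character overlap" and that predictions are paired with gold entities greedily; the spans and labels are invented.

```python
from collections import Counter


def spans_match(pred, gold, exact=True):
    """Compare two (start, end, label) spans; end is exclusive."""
    if pred[2] != gold[2]:
        return False
    if exact:
        return pred[:2] == gold[:2]
    return pred[0] < gold[1] and gold[0] < pred[1]  # any overlap counts


def count_matches(preds, golds, exact=True):
    """Greedily pair predictions with gold spans; return per-label TP, FP, FN."""
    tp, fp, fn = Counter(), Counter(), Counter()
    unmatched = list(golds)
    for p in preds:
        hit = next((g for g in unmatched if spans_match(p, g, exact)), None)
        if hit is not None:
            unmatched.remove(hit)
            tp[p[2]] += 1
        else:
            fp[p[2]] += 1
    for g in unmatched:
        fn[g[2]] += 1
    return tp, fp, fn


def f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(tp, fp, fn):
    """Macro F1 averages per-label F1; micro F1 pools the counts first."""
    labels = set(tp) | set(fp) | set(fn)
    if not labels:
        return 0.0, 0.0
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Invented example: the predicted "Size" span is one character too long.
golds = [(0, 2, "Mass"), (5, 7, "Size"), (10, 12, "Mass")]
preds = [(0, 2, "Mass"), (5, 8, "Size"), (10, 12, "Mass")]
exact_scores = macro_micro_f1(*count_matches(preds, golds, exact=True))
inexact_scores = macro_micro_f1(*count_matches(preds, golds, exact=False))
```

Under exact matching the single boundary error costs the rare `Size` label its only true positive, so the macro F1 (0.5) falls below the micro F1 (about 0.67); under inexact matching both scores are 1.0. This illustrates why macro and micro scores can diverge when label frequencies are unbalanced.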
CONCLUSIONS
The developed IE system can effectively and accurately extract lung cancer staging information from CT reports. The experimental results suggest that the extracted information has great potential for further use in stage verification and prediction to facilitate accurate clinical staging.