BACKGROUND
Liver cancer remains to be a substantial disease burden in China. As one of the primary diagnostic means for liver cancer, the dynamic enhanced computed tomography (CT) scan provides detailed diagnosis evidence that is recorded in the free-text radiology reports.
OBJECTIVE
In this study, we combined knowledge-driven deep learning methods and data-driven natural language processing (NLP) methods to extract the radiological features from these reports, and designed a computer-aided liver cancer diagnosis framework.In this study, we combined knowledge-driven deep learning methods and data-driven natural language processing (NLP) methods to extract the radiological features from these reports, and designed a computer-aided liver cancer diagnosis framework.
METHODS
We collected 1089 CT radiology reports in Chinese. We proposed a pre-trained fine-tuning BERT (Bidirectional Encoder Representations from Transformers) language model for word embedding. The embedding served as the inputs for BiLSTM (Bidirectional Long Short-Term Memory) and CRF (Conditional Random Field) model (BERT-BiLSTM-CRF) to extract features of hyperintense enhancement in the arterial phase (APHE) and hypointense in the portal and delayed phases (PDPH). Furthermore, we also extracted features using the traditional rule-based NLP method based on the content of radiology reports. We then applied random forest for liver cancer diagnosis and calculated the Gini impurity for the identification of diagnosis evidence.
RESULTS
The BERT-BiLSTM-CRF predicted the features of APHE and PDPH with an F1 score of 98.40% and 90.67%, respectively. The prediction model using combined features had a higher performance (F1 score, 88.55%) than those using the single kind of features obtained by BERT-BiLSTM-CRF (84.88%) or traditional rule-based NLP method (83.52%). The features of APHE and PDPH were the top two essential features for the liver cancer diagnosis.
CONCLUSIONS
We proposed a BERT-based deep learning method for diagnosis evidence extraction based on clinical knowledge. With the recognized features of APHE and PDPH, the liver cancer diagnosis could get a high performance, which was further increased by combining with the radiological features obtained by the traditional rule-based NLP method. The BERT-BiLSTM-CRF had achieved the state-of-the-art performance in this study, which could be extended to other kinds of Chinese clinical texts.
CLINICALTRIAL
None