Identifying Diagnosis Evidence of Liver Cancer in Chinese Radiology Reports Using BERT-based Deep Learning Method (Preprint)

2020 ◽  
Author(s):  
Hui Chen ◽  
Honglei Liu ◽  
Ni Wang ◽  
Yanqun Huang ◽  
Zhiqiang Zhang ◽  
...  

BACKGROUND Liver cancer remains a substantial disease burden in China. As one of the primary diagnostic means for liver cancer, dynamic contrast-enhanced computed tomography (CT) provides detailed diagnostic evidence that is recorded in free-text radiology reports.
OBJECTIVE In this study, we combined knowledge-driven deep learning methods and data-driven natural language processing (NLP) methods to extract radiological features from these reports, and designed a computer-aided liver cancer diagnosis framework.
METHODS We collected 1089 CT radiology reports in Chinese. We proposed a fine-tuned, pre-trained BERT (Bidirectional Encoder Representations from Transformers) language model for word embedding. The embeddings served as inputs to a BiLSTM (Bidirectional Long Short-Term Memory) and CRF (Conditional Random Field) model (BERT-BiLSTM-CRF) that extracts the features of hyperintense enhancement in the arterial phase (APHE) and hypointensity in the portal and delayed phases (PDPH). We also extracted features with a traditional rule-based NLP method based on the content of the radiology reports. We then applied a random forest for liver cancer diagnosis and calculated the Gini impurity to identify diagnostic evidence.
RESULTS The BERT-BiLSTM-CRF predicted the APHE and PDPH features with F1 scores of 98.40% and 90.67%, respectively. The prediction model using the combined features performed better (F1 score, 88.55%) than models using only the features obtained by BERT-BiLSTM-CRF (84.88%) or only those obtained by the traditional rule-based NLP method (83.52%). APHE and PDPH were the two most important features for liver cancer diagnosis.
CONCLUSIONS We proposed a BERT-based deep learning method, guided by clinical knowledge, for extracting diagnostic evidence. Using the recognized APHE and PDPH features, liver cancer diagnosis achieved high performance, which was further improved by combining them with the radiological features obtained by the traditional rule-based NLP method. The BERT-BiLSTM-CRF achieved state-of-the-art performance in this study and could be extended to other kinds of Chinese clinical texts. CLINICALTRIAL None
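The abstract ranks diagnostic evidence by Gini impurity inside a random forest. As a minimal sketch of the quantity involved, assuming binary report-level features (1 if the report mentions the finding): the Gini impurity of a node is 1 - Σ p_k², and a random forest's impurity-based feature importance accumulates the weighted impurity decrease of every split on that feature, averaged over trees. The code below computes that decrease for a single split only. The function names and toy data are hypothetical; the abstract does not describe the authors' implementation.

```python
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def impurity_decrease(feature, labels):
    """Weighted Gini decrease when splitting `labels` on a binary (0/1) feature."""
    n = len(labels)
    left = [y for x, y in zip(feature, labels) if x == 0]
    right = [y for x, y in zip(feature, labels) if x == 1]
    children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - children

# Hypothetical toy data: 1 = liver cancer, 0 = not. Feature columns are 1 if
# the report mentions the finding. Illustrative values only, not study data.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
aphe = [1, 1, 1, 1, 0, 0, 0, 1]
noise = [1, 0, 1, 0, 1, 0, 1, 0]

for name, feat in [("APHE", aphe), ("noise", noise)]:
    print(name, round(impurity_decrease(feat, labels), 3))
```

On this toy data, splitting on the APHE column lowers impurity by about 0.3 while the uninformative column lowers it by 0.0; summed over many trees and splits, this kind of contrast is what places a feature at the top of an impurity-based ranking.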

2021 ◽  
Vol 11 (13) ◽  
pp. 5832
Author(s):  
Wei Gou ◽  
Zheng Chen

Chinese Spelling Error Correction is a hot topic in natural language processing. Researchers have already produced many strong solutions, from early rule-based approaches to current deep learning methods. At present, SpellGCN, proposed by Alibaba's team, achieves the best results, with a character-level precision of 98.4% on SIGHAN2013. However, when we apply this algorithm to practical error correction tasks, it produces many false corrections. We believe this is because the corpus used for model training contains significantly more errors than the texts the model corrects in practice. To address this problem, we propose adding a post-processing step to error correction tasks. We take the initial model's output as a candidate character, extract various features of the character itself and of its context, and then use a classification model to filter out the initial model's false corrections. The post-processing idea introduced in this paper can be applied to most Chinese Spelling Error Correction models to improve their performance on practical error correction tasks.
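The post-processing step can be pictured as a binary classifier that accepts or rejects each correction proposed by the initial model. The sketch below is a minimal illustration under loud assumptions: the two features (the model's confidence in the proposed character and its margin over the original character) are hypothetical stand-ins for the paper's unspecified character and context features, and the hand-set logistic weights stand in for a trained classification model.

```python
import math
from dataclasses import dataclass

@dataclass
class Candidate:
    """One correction proposed by the initial correction model."""
    position: int         # index of the character in the sentence
    proposed: str         # character the initial model wants to write
    confidence: float     # model probability of `proposed` (assumed available)
    original_prob: float  # model probability of the original character (assumed)

# Hypothetical hand-set weights standing in for a trained binary classifier.
WEIGHTS = (2.0, 4.0)
BIAS = -4.0

def features(c):
    """Hypothetical features: confidence and its margin over the original char."""
    return (c.confidence, c.confidence - c.original_prob)

def keep_probability(c):
    """Logistic score that the proposed correction is a true correction."""
    z = BIAS + sum(w * f for w, f in zip(WEIGHTS, features(c)))
    return 1.0 / (1.0 + math.exp(-z))

def filter_corrections(text, candidates, threshold=0.5):
    """Apply only the candidates the classifier accepts; leave the rest unchanged."""
    chars = list(text)
    for c in candidates:
        if keep_probability(c) >= threshold:
            chars[c.position] = c.proposed
    return "".join(chars)

sentence = "他今天很高心"  # "高心" is a typo for "高兴"
candidates = [
    Candidate(position=0, proposed="她", confidence=0.52, original_prob=0.45),
    Candidate(position=5, proposed="兴", confidence=0.90, original_prob=0.05),
]
print(filter_corrections(sentence, candidates))
```

Here the low-margin change of 他 to 她 is rejected as a likely false correction, while the confident fix of 心 to 兴 is kept, so the sentence becomes 他今天很高兴. Because the filter only consumes the initial model's candidates, it is independent of which correction model produced them.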


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 159110-159119
Author(s):  
Honglei Liu ◽  
Yan Xu ◽  
Zhiqiang Zhang ◽  
Ni Wang ◽  
Yanqun Huang ◽  
...  

2018 ◽  
Vol 30 (1) ◽  
pp. 90 ◽  
Author(s):  
Peng Zhang ◽  
Xinnan Xu ◽  
Hongwei Wang ◽  
Yuanli Feng ◽  
Haozhe Feng ◽  
...  

2020 ◽  
pp. 1-22 ◽  
Author(s):  
D. Sykes ◽  
A. Grivas ◽  
C. Grover ◽  
R. Tobin ◽  
C. Sudlow ◽  
...  

Abstract Using natural language processing, it is possible to extract structured information from raw text in the electronic health record (EHR) with reasonably high accuracy. However, accurately distinguishing negated from non-negated mentions of clinical terms remains a challenge. EHR text includes cases where diseases are stated not to be present or are only hypothesised, meaning a disease can be mentioned in a report without being reported as present. This makes tasks such as document classification and summarisation more difficult. We have developed the rule-based EdIE-R-Neg, part of an existing text mining pipeline called EdIE-R (Edinburgh Information Extraction for Radiology reports), which was developed to process brain imaging reports (https://www.ltg.ed.ac.uk/software/edie-r/), and two machine learning approaches: one using a bidirectional long short-term memory network and another using a feedforward neural network. These were developed on data from the Edinburgh Stroke Study (ESS) and tested on data from routine reports from NHS Tayside (Tayside). Both datasets consist of written reports from medical scans. These models are compared with two existing rule-based models: pyConText (Harkema et al. 2009. Journal of Biomedical Informatics 42(5), 839–851), a Python implementation of a generalisation of NegEx, and NegBio (Peng et al. 2017. NegBio: A high-performance tool for negation and uncertainty detection in radiology reports. arXiv e-prints, p. arXiv:1712.05898), which identifies negation scopes through patterns applied to a syntactic representation of the sentence. On both the ESS test set, drawn from the dataset on which our models were developed, and the largely similar Tayside test set, the neural network models and our custom-built rule-based system outperformed the existing methods.
EdIE-R-Neg achieved the highest F1 score, particularly on the Tayside test set, from which no development data were used in these experiments, showing the power of custom-built rule-based systems for negation detection on datasets of this size. The performance gap between the machine learning models and EdIE-R-Neg on the Tayside test set was reduced by adding Tayside development data to the ESS training set, demonstrating the adaptability of the neural network models.
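To make the rule-based side of the comparison concrete, the following is a minimal, heavily simplified sketch in the spirit of NegEx: a finding counts as negated if a negation trigger appears within a few tokens before it, unless a scope-breaking word such as "but" intervenes. The trigger lists, window size, and function names are hypothetical illustrations, not the rules of EdIE-R-Neg, pyConText, or NegBio.

```python
import re

# Hypothetical, heavily simplified trigger lists in the spirit of NegEx;
# these are NOT the rules used by EdIE-R-Neg, pyConText, or NegBio.
PRE_NEGATION = [("no", "evidence", "of"), ("no",), ("without",), ("not",)]
SCOPE_BREAKERS = {"but", "however", "although"}
WINDOW = 5  # maximum number of tokens scanned before the finding

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def is_negated(sentence, finding):
    """True if `finding` falls in a negation scope, False if affirmed,
    None if the finding is not mentioned in the sentence."""
    tokens, target = tokenize(sentence), tokenize(finding)
    starts = [i for i in range(len(tokens) - len(target) + 1)
              if tokens[i:i + len(target)] == target]
    if not starts:
        return None
    start = starts[0]
    # Walk backwards from the finding, stopping at a scope breaker.
    for j in range(start - 1, max(-1, start - 1 - WINDOW), -1):
        if tokens[j] in SCOPE_BREAKERS:
            return False
        for trigger in PRE_NEGATION:
            k = len(trigger)
            if j - k + 1 >= 0 and tuple(tokens[j - k + 1:j + 1]) == trigger:
                return True
    return False

for s, f in [
    ("No evidence of acute infarct.", "acute infarct"),
    ("Old infarct in the left parietal lobe.", "infarct"),
    ("No haemorrhage, but there is an infarct.", "infarct"),
    ("No haemorrhage, but there is an infarct.", "haemorrhage"),
]:
    print(f, "->", is_negated(s, f))
```

In the last two examples the same sentence negates "haemorrhage" but affirms "infarct", because "but" closes the scope of "No". Distinctions like this are why a disease can be mentioned without being reported as present, and why custom-built rules tuned to a report style can perform well on small datasets.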


2019 ◽  
Vol 2019 ◽  
pp. 1-12 ◽  
Author(s):  
Yan Wang ◽  
Hao Zhang ◽  
Zhanliang Sang ◽  
Lingwei Xu ◽  
Conghui Cao ◽  
...  

Automatic modulation recognition has successfully used various machine learning methods and achieved some success. Deep learning, a subarea of machine learning, has made remarkable progress in recent years, especially in image and language processing. Deep learning requires large amounts of data, so communication, a field that generates large amounts of data, has an inherent advantage for applying it. However, deep learning has not yet been widely applied in communication, especially in underwater acoustic communication. In this paper, we mainly discuss modulation recognition, an important part of the communication process, using a deep learning method. Unlike common machine learning methods, which require feature extraction, the deep learning method does not require feature extraction and achieves better results than common machine learning methods.


2021 ◽  
Author(s):  
Jacob Johnson ◽  
Kaneel Senevirathne ◽  
Lawrence Ngo

Here, we developed and validated a highly generalizable natural language processing algorithm based on deep learning. Our algorithm was trained and tested on a highly diverse dataset from over 2,000 hospital sites and 500 radiologists. The resulting algorithm achieved an AUROC of 0.96 for the presence or absence of liver lesions while achieving a specificity of 0.99 and a sensitivity of 0.6.
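An AUROC of 0.96 alongside a sensitivity of 0.6 and a specificity of 0.99 is consistent because sensitivity and specificity describe one decision threshold, whereas AUROC summarizes how well the scores rank positives above negatives across all thresholds. The sketch below illustrates this with hypothetical toy scores (not data from the study), computing AUROC with the Mann-Whitney pairwise formulation.

```python
def sensitivity_specificity(y_true, y_score, threshold):
    """Sensitivity and specificity when predicting positive for score >= threshold."""
    pairs = list(zip(y_true, y_score))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def auroc(y_true, y_score):
    """AUROC as the probability that a random positive outscores a random
    negative (Mann-Whitney formulation; ties count as one half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy scores: 1 = report describes a liver lesion, 0 = it does not.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_score = [0.95, 0.90, 0.80, 0.55, 0.40, 0.60, 0.30, 0.20, 0.10, 0.05]

print("AUROC:", auroc(y_true, y_score))
for t in (0.85, 0.5):
    sens, spec = sensitivity_specificity(y_true, y_score, t)
    print("threshold", t, "-> sensitivity", sens, "specificity", spec)
```

With the same ranking (AUROC 0.92 here), a strict threshold of 0.85 gives sensitivity 0.4 and specificity 1.0, while a looser threshold of 0.5 gives 0.8 for both; choosing a high-specificity operating point trades away sensitivity without changing AUROC.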

