scholarly journals Chinese Clinical Named Entity Recognition in Electronic Medical Records: Development of a Lattice Long Short-Term Memory Model With Contextualized Character Representations (Preprint)

2020 ◽  
Author(s):  
Yongbin Li ◽  
Xiaohua Wang ◽  
Linhu Hui ◽  
Liping Zou ◽  
Hongjin Li ◽  
...  

BACKGROUND Clinical named entity recognition (CNER), whose goal is to automatically identify clinical entities in electronic medical records (EMRs), is an important research direction of clinical text data mining and information extraction. The promotion of CNER can provide support for clinical decision making and medical knowledge base construction, which could then improve overall medical quality. Compared with English CNER, and due to the complexity of Chinese word segmentation and grammar, Chinese CNER was implemented later and is more challenging. OBJECTIVE With the development of distributed representation and deep learning, a series of models have been applied in Chinese CNER. Different from the English version, Chinese CNER is mainly divided into character-based and word-based methods that cannot make comprehensive use of EMR information and cannot solve the problem of ambiguity in word representation. METHODS In this paper, we propose a lattice long short-term memory (LSTM) model combined with a variant contextualized character representation and a conditional random field (CRF) layer for Chinese CNER: the Embeddings from Language Models (ELMo)-lattice-LSTM-CRF model. The lattice LSTM model can effectively utilize the information from characters and words in Chinese EMRs; in addition, the variant ELMo model uses Chinese characters as input instead of the character-encoding layer of the ELMo model, so as to learn domain-specific contextualized character embeddings. RESULTS We evaluated our method using two Chinese CNER datasets from the China Conference on Knowledge Graph and Semantic Computing (CCKS): the CCKS-2017 CNER dataset and the CCKS-2019 CNER dataset. We obtained F1 scores of 90.13% and 85.02% on the test sets of these two datasets, respectively. CONCLUSIONS Our results show that our proposed method is effective in Chinese CNER. In addition, the results of our experiments show that variant contextualized character representations can significantly improve the performance of the model.

10.2196/19848 ◽  
2020 ◽  
Vol 8 (9) ◽  
pp. e19848
Author(s):  
Yongbin Li ◽  
Xiaohua Wang ◽  
Linhu Hui ◽  
Liping Zou ◽  
Hongjin Li ◽  
...  

Background Clinical named entity recognition (CNER), whose goal is to automatically identify clinical entities in electronic medical records (EMRs), is an important research direction of clinical text data mining and information extraction. The promotion of CNER can provide support for clinical decision making and medical knowledge base construction, which could then improve overall medical quality. Compared with English CNER, and due to the complexity of Chinese word segmentation and grammar, Chinese CNER was implemented later and is more challenging. Objective With the development of distributed representation and deep learning, a series of models have been applied in Chinese CNER. Different from the English version, Chinese CNER is mainly divided into character-based and word-based methods that cannot make comprehensive use of EMR information and cannot solve the problem of ambiguity in word representation. Methods In this paper, we propose a lattice long short-term memory (LSTM) model combined with a variant contextualized character representation and a conditional random field (CRF) layer for Chinese CNER: the Embeddings from Language Models (ELMo)-lattice-LSTM-CRF model. The lattice LSTM model can effectively utilize the information from characters and words in Chinese EMRs; in addition, the variant ELMo model uses Chinese characters as input instead of the character-encoding layer of the ELMo model, so as to learn domain-specific contextualized character embeddings. Results We evaluated our method using two Chinese CNER datasets from the China Conference on Knowledge Graph and Semantic Computing (CCKS): the CCKS-2017 CNER dataset and the CCKS-2019 CNER dataset. We obtained F1 scores of 90.13% and 85.02% on the test sets of these two datasets, respectively. Conclusions Our results show that our proposed method is effective in Chinese CNER. In addition, the results of our experiments show that variant contextualized character representations can significantly improve the performance of the model.


2019 ◽  
Vol 9 (18) ◽  
pp. 3658 ◽  
Author(s):  
Jianliang Yang ◽  
Yuenan Liu ◽  
Minghui Qian ◽  
Chenghua Guan ◽  
Xiangfei Yuan

Clinical named entity recognition is an essential task for humans to analyze large-scale electronic medical records efficiently. Traditional rule-based solutions need considerable human effort to build rules and dictionaries; machine learning-based solutions need laborious feature engineering. For the moment, deep learning solutions like Long Short-term Memory with Conditional Random Field (LSTM–CRF) achieved considerable performance in many datasets. In this paper, we developed a multitask attention-based bidirectional LSTM–CRF (Att-biLSTM–CRF) model with pretrained Embeddings from Language Models (ELMo) in order to achieve better performance. In the multitask system, an additional task named entity discovery was designed to enhance the model’s perception of unknown entities. Experiments were conducted on the 2010 Informatics for Integrating Biology & the Bedside/Veterans Affairs (I2B2/VA) dataset. Experimental results show that our model outperforms the state-of-the-art solution both on the single model and ensemble model. Our work proposes an approach to improve the recall in the clinical named entity recognition task based on the multitask mechanism.


2018 ◽  
Author(s):  
Yudi Wibisono ◽  
Masayu Leylia Khodra

Pengenalan entitas bernama (named-entity recognition atau NER) adalah proses otomatis mengekstraksi entitas bernama yang dianggap penting di dalam sebuah teks dan menentukan kategorinya ke dalam kategori terdefinisi. Sebagai contoh, untuk teks berita, NER dapat mengekstraksi nama orang, nama organisasi, dan nama lokasi. NER bermanfaat dalam berbagai aplikasi analisis teks, misalnya pencarian, sistem tanya jawab, peringkasan teks dan mesin penerjemah. Tantangan utama NER adalah penanganan ambiguitas makna karena konteks kata pada kalimat, misalnya kata “Cendana” dapat merupakan nama lokasi (Jalan Cendana), atau nama organisasi (Keluarga Cendana), atau nama tanaman. Tantangan lainnya adalah penentuan batas entitas, misalnya “[Istora Senayan] [Jakarta]”. Berbagai kakas NER telah dikembangkan untuk berbagai bahasa terutama Bahasa Inggris dengan kinerja yang baik, tetapi kakas NER bahasa Indonesia masih memiliki kinerja yang belum baik. Makalah ini membahas pendekatan berbasis pembelajaran mesin untuk menghasilkan model NER bahasa Indonesia. Pendekatan ini sangat bergantung pada korpus yang menjadi sumber belajar, dan teknik pembelajaran mesin yang digunakan. Teknik yang akan digunakan adalah LSTM - CRF (Long Short Term Memory – Conditional Random Field). Hasil terbaik (F-measure = 0.72) didapatkan dengan menggunakan word embedding GloVe Wikipedia Bahasa Indonesia.


Sign in / Sign up

Export Citation Format

Share Document