Research on Named Entity Recognition of Electronic Medical Records Based on RoBERTa and Radical-Level Feature

Clinical named entity recognition (CNER) identifies entities in unstructured medical records and classifies them into predefined categories, which is of great significance for follow-up clinical studies. Most existing CNER methods give insufficient attention to Chinese radical-level characteristics and to the specialized nature of the Chinese medical field. This paper proposes the Ra-RC model, which combines radical features with a deep learning structure to address this problem. RoBERTa, a robustly optimized variant of bidirectional encoder representations from transformers (BERT), is used to learn medical features thoroughly. In parallel, a bidirectional long short-term memory (BiLSTM) network extracts radical-level information to capture the internal relevance of characters, and its output is concatenated with the feature vectors generated by RoBERTa. In addition, a conditional random field (CRF) takes the relationship between labels into account to obtain the optimal tag sequence. The experimental results demonstrate that the proposed Ra-RC model achieves F1 scores of 93.26% and 82.87% on the CCKS2017 and CCKS2019 datasets, respectively.
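
The fusion step described in this abstract can be pictured as a per-character concatenation of two feature vectors. The sketch below is only a toy illustration under stated assumptions: the `RADICALS` table, the `RADICAL_VECS` values, and the `contextual_vector` function are hypothetical stand-ins for the radical-level BiLSTM output and the RoBERTa output, not the Ra-RC implementation.

```python
# Toy sketch: join a contextual vector with a radical-level vector per character.
# Every table and function here is a hypothetical stand-in, not the Ra-RC model.

RADICALS = {"病": "疒", "痛": "疒", "肝": "月", "胃": "月"}  # illustrative lookup

# Stand-in for the radical-level BiLSTM output: one small vector per radical.
RADICAL_VECS = {"疒": [1.0, 0.0], "月": [0.0, 1.0], "<unk>": [0.0, 0.0]}


def contextual_vector(char, position):
    """Stand-in for a RoBERTa output vector (3 dimensions here)."""
    return [float(ord(char) % 7), float(position), 1.0]


def fuse(sentence):
    """Return one concatenated feature vector per character."""
    fused = []
    for i, ch in enumerate(sentence):
        radical = RADICALS.get(ch, "<unk>")
        # List concatenation plays the role of vector concatenation.
        fused.append(contextual_vector(ch, i) + RADICAL_VECS[radical])
    return fused


features = fuse("胃痛")
print(len(features), len(features[0]))  # 2 characters, 3 + 2 = 5 dimensions each
```

In a real model both parts would be learned tensors; the point here is only that the fused vector has the combined width of its two sources.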

Enhanced character embedding for Chinese named entity recognition

Measurement and Control ◽

10.1177/0020294020952456 ◽

2020 ◽

pp. 002029402095245

Author(s):

Bingjing Jia ◽

Zhongli Wu ◽

Bin Wu ◽

Yutong Liu ◽

Pengpeng Zhou

Keyword(s):

Short Term Memory ◽

Conditional Random Field ◽

Named Entity Recognition ◽

Entity Recognition ◽

Short Term ◽

Named Entity ◽

Ancient Literature ◽

Conditional Random Field Model ◽

Level Information ◽

Long Short Term Memory

Traditional named entity recognition methods mainly rely on hand-crafted features. With the popularity of deep learning, neural networks have been introduced to capture deep features for named entity recognition. However, most existing methods target only modern corpora. Named entity recognition in ancient literature is challenging because the names in it have evolved over time. In this paper, we attempt to recognise entities by exploring the characteristics of characters and strokes. The enhanced character embedding model, named ECEM, is built on bidirectional encoder representations from transformers and on strokes. First, ECEM generates semantic vectors dynamically according to the context of the words. Second, the proposed algorithm introduces morphological-level information of Chinese words. Finally, the enhanced character embedding is fed into a bidirectional long short-term memory-conditional random field model for training. To evaluate the proposed algorithm, experiments are carried out on both ancient literature and a modern corpus. The results indicate that our algorithm is more effective than traditional ones.

Information Extraction from Electronic Medical Records Using Multitask Recurrent Neural Network with Contextual Word Embedding

Applied Sciences ◽

10.3390/app9183658 ◽

2019 ◽

Vol 9 (18) ◽

pp. 3658 ◽

Cited By ~ 6

Author(s):

Jianliang Yang ◽

Yuenan Liu ◽

Minghui Qian ◽

Chenghua Guan ◽

Xiangfei Yuan

Keyword(s):

Electronic Medical Records ◽

Medical Records ◽

Large Scale ◽

Short Term Memory ◽

Conditional Random Field ◽

Named Entity Recognition ◽

Recognition Task ◽

Entity Recognition ◽

Language Models ◽

Named Entity

Clinical named entity recognition is an essential task for analyzing large-scale electronic medical records efficiently. Traditional rule-based solutions need considerable human effort to build rules and dictionaries; machine learning-based solutions need laborious feature engineering. Currently, deep learning solutions such as Long Short-term Memory with Conditional Random Field (LSTM–CRF) achieve strong performance on many datasets. In this paper, we developed a multitask attention-based bidirectional LSTM–CRF (Att-biLSTM–CRF) model with pretrained Embeddings from Language Models (ELMo) in order to achieve better performance. In the multitask system, an additional task named entity discovery was designed to enhance the model’s perception of unknown entities. Experiments were conducted on the 2010 Informatics for Integrating Biology & the Bedside/Veterans Affairs (I2B2/VA) dataset. Experimental results show that our model outperforms the state-of-the-art solution both as a single model and as an ensemble model. Our work proposes an approach to improve recall in the clinical named entity recognition task based on the multitask mechanism.
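
One plausible way to build targets for an auxiliary entity discovery task is to strip the entity type from each tag, so the auxiliary head only learns where entities are. The abstract does not give the label scheme, so the BIO format and the `to_discovery_tags` helper below are assumptions made for illustration.

```python
def to_discovery_tags(tags):
    """Drop the entity type from BIO tags: 'B-problem' -> 'B'; 'O' stays 'O'."""
    return [tag.split("-", 1)[0] for tag in tags]


# Typed tags for the main task (hypothetical example sentence).
typed = ["O", "B-problem", "I-problem", "O", "B-treatment"]

# Untyped tags for the auxiliary entity discovery task.
print(to_discovery_tags(typed))  # ['O', 'B', 'I', 'O', 'B']
```

In a multitask setup both tag sequences would be predicted from a shared encoder, with the two losses summed during training.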

Automatic Named Entity Recognition for Indonesian Using a Machine Learning Approach (Pengenalan Entitas Bernama Otomatis untuk Bahasa Indonesia dengan Pendekatan Pembelajaran Mesin)

10.31227/osf.io/vud2p ◽

2018 ◽

Cited By ~ 2

Author(s):

Yudi Wibisono ◽

Masayu Leylia Khodra

Keyword(s):

Short Term Memory ◽

Conditional Random Field ◽

Named Entity Recognition ◽

Entity Recognition ◽

Short Term ◽

Term Memory ◽

Named Entity ◽

Long Short Term Memory ◽

F Measure ◽

Bahasa Indonesia

Named-entity recognition (NER) is the automatic process of extracting the named entities considered important in a text and assigning them to predefined categories. For example, for news text, NER can extract person names, organization names, and location names. NER is useful in various text analysis applications, such as search, question answering systems, text summarization, and machine translation. The main challenge of NER is handling ambiguity of meaning that depends on a word's context in the sentence; for example, the word “Cendana” can be a location name (Jalan Cendana), an organization name (Keluarga Cendana), or a plant name. Another challenge is determining entity boundaries, for example “[Istora Senayan] [Jakarta]”. Various NER tools have been developed for many languages, especially English, with good performance, but NER tools for Indonesian still perform poorly. This paper discusses a machine learning-based approach to building an Indonesian NER model. The approach depends heavily on the corpus used as the learning source and on the machine learning technique used. The technique used is LSTM-CRF (Long Short Term Memory - Conditional Random Field). The best result (F-measure = 0.72) was obtained using GloVe word embeddings trained on Indonesian Wikipedia.
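
The F-measure reported here is the harmonic mean of precision and recall. A minimal sketch of an entity-level version follows, assuming BIO tags and exact span matching; the paper's own evaluation script is not described in the abstract, and the tags in the example (including the `LOC` and `PER` types) are hypothetical. The example reuses the boundary case “[Istora Senayan] [Jakarta]”: merging the two entities into one span counts as an error.

```python
def bio_spans(tags):
    """Return a set of (start, end, type) spans from BIO tags; end is exclusive."""
    spans, start, kind = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        continues = tag.startswith("I-") and kind == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, kind))
            start, kind = None, None
        if tag.startswith("B-"):
            start, kind = i, tag[2:]
    return spans


def f_measure(gold, pred):
    """Entity-level F-measure: harmonic mean of span precision and recall."""
    g, p = bio_spans(gold), bio_spans(pred)
    correct = len(g & p)
    precision = correct / len(p) if p else 0.0
    recall = correct / len(g) if g else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Tokens: Istora, Senayan, Jakarta, <other>, <person>
gold = ["B-LOC", "I-LOC", "B-LOC", "O", "B-PER"]
pred = ["B-LOC", "I-LOC", "I-LOC", "O", "B-PER"]  # boundary error: spans merged
print(round(f_measure(gold, pred), 2))  # precision 1/2, recall 1/3 -> 0.4
```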

Chinese Named Entity Recognition Method in History and Culture Field Based on BERT

International Journal of Computational Intelligence Systems ◽

10.1007/s44196-021-00019-8 ◽

2021 ◽

Vol 14 (1) ◽

Author(s):

Shuang Liu ◽

Hui Yang ◽

Jiayi Li ◽

Simon Kolmanič

Keyword(s):

Short Term Memory ◽

Conditional Random Field ◽

Language Model ◽

Named Entity Recognition ◽

Entity Recognition ◽

Knowledge Graph ◽

Recognition Method ◽

Short Term ◽

Named Entity ◽

Long Short Term Memory

With the rapid development of the Internet, the way people obtain information has changed tremendously. In recent years, the knowledge graph has become a popular tool for the public to acquire knowledge. To build knowledge graphs of Chinese history and culture, most researchers have adopted traditional named entity recognition methods to extract entity information from unstructured historical text data. However, traditional named entity recognition methods have certain defects and tend to ignore the association between entities. To extract entities from a large amount of historical and cultural information more accurately and efficiently, this paper proposes a named entity recognition model combining Bidirectional Encoder Representations from Transformers and Bidirectional Long Short-Term Memory-Conditional Random Field (BERT-BiLSTM-CRF). First, a BERT pre-trained language model encodes each character to obtain its vector representation. Then a Bidirectional Long Short-Term Memory (BiLSTM) layer semantically encodes the input text. Finally, the Conditional Random Field (CRF) layer outputs the labels with the highest probability to obtain each character’s category. The model uses the BERT pre-trained language model in place of static word vectors trained in the traditional way; BERT can dynamically generate semantic vectors according to the context of words, which improves the representation ability of the word vectors. The experimental results show that the proposed model achieves excellent results on named entity recognition in the field of historical culture. Compared with existing named entity recognition methods, the precision, recall, and $F_1$ value are significantly improved.
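
The final CRF step is usually decoded with the Viterbi algorithm, which finds the highest-scoring label sequence given per-position label scores and label-to-label transition scores. The following is a minimal pure-Python sketch with hypothetical scores; in the described model the emission scores would come from the BiLSTM layer and the transition scores would be learned by the CRF.

```python
def viterbi(emissions, transitions):
    """Return the best label path.

    emissions: list over positions of {label: score}
    transitions: {(prev_label, cur_label): score}, missing pairs score 0.0
    """
    labels = list(emissions[0])
    best = {lab: emissions[0][lab] for lab in labels}
    back = []
    for scores in emissions[1:]:
        step, pointers = {}, {}
        for cur in labels:
            # Pick the predecessor that maximizes the running score.
            prev = max(labels, key=lambda p: best[p] + transitions.get((p, cur), 0.0))
            step[cur] = best[prev] + transitions.get((prev, cur), 0.0) + scores[cur]
            pointers[cur] = prev
        best = step
        back.append(pointers)
    # Follow back-pointers from the best final label.
    path = [max(labels, key=lambda lab: best[lab])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


emissions = [
    {"O": 1.0, "B": 0.5, "I": 0.0},
    {"O": 0.2, "B": 0.1, "I": 1.0},
]
transitions = {("O", "I"): -10.0}  # an I tag should not follow O
print(viterbi(emissions, transitions))  # ['B', 'I']
```

Choosing each label independently would give `['O', 'I']`, an invalid sequence; the transition penalty is what steers decoding to `['B', 'I']`.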

Chinese Clinical Named Entity Recognition in Electronic Medical Records: Development of a Lattice Long Short-Term Memory Model With Contextualized Character Representations (Preprint)

10.2196/preprints.19848 ◽

2020 ◽

Author(s):

Yongbin Li ◽

Xiaohua Wang ◽

Linhu Hui ◽

Liping Zou ◽

Hongjin Li ◽

...

Keyword(s):

Electronic Medical Records ◽

Medical Records ◽

Short Term Memory ◽

Named Entity Recognition ◽

Entity Recognition ◽

Language Models ◽

Short Term ◽

Term Memory ◽

Named Entity ◽

Long Short Term Memory

BACKGROUND: Clinical named entity recognition (CNER), whose goal is to automatically identify clinical entities in electronic medical records (EMRs), is an important research direction of clinical text data mining and information extraction. The promotion of CNER can provide support for clinical decision making and medical knowledge base construction, which could then improve overall medical quality. Compared with English CNER, and due to the complexity of Chinese word segmentation and grammar, Chinese CNER was implemented later and is more challenging. OBJECTIVE: With the development of distributed representation and deep learning, a series of models have been applied in Chinese CNER. Different from the English version, Chinese CNER is mainly divided into character-based and word-based methods that cannot make comprehensive use of EMR information and cannot solve the problem of ambiguity in word representation. METHODS: In this paper, we propose a lattice long short-term memory (LSTM) model combined with a variant contextualized character representation and a conditional random field (CRF) layer for Chinese CNER: the Embeddings from Language Models (ELMo)-lattice-LSTM-CRF model. The lattice LSTM model can effectively utilize the information from characters and words in Chinese EMRs; in addition, the variant ELMo model uses Chinese characters as input instead of the character-encoding layer of the ELMo model, so as to learn domain-specific contextualized character embeddings. RESULTS: We evaluated our method using two Chinese CNER datasets from the China Conference on Knowledge Graph and Semantic Computing (CCKS): the CCKS-2017 CNER dataset and the CCKS-2019 CNER dataset. We obtained F1 scores of 90.13% and 85.02% on the test sets of these two datasets, respectively. CONCLUSIONS: Our results show that our proposed method is effective in Chinese CNER. In addition, the results of our experiments show that variant contextualized character representations can significantly improve the performance of the model.

Chinese Clinical Named Entity Recognition in Electronic Medical Records: Development of a Lattice Long Short-Term Memory Model With Contextualized Character Representations

JMIR Medical Informatics ◽

10.2196/19848 ◽

2020 ◽

Vol 8 (9) ◽

pp. e19848

Author(s):

Yongbin Li ◽

Xiaohua Wang ◽

Linhu Hui ◽

Liping Zou ◽

Hongjin Li ◽

...

Keyword(s):

Electronic Medical Records ◽

Medical Records ◽

Short Term Memory ◽

Named Entity Recognition ◽

Entity Recognition ◽

Language Models ◽

Short Term ◽

Term Memory ◽

Named Entity ◽

Long Short Term Memory

Background: Clinical named entity recognition (CNER), whose goal is to automatically identify clinical entities in electronic medical records (EMRs), is an important research direction of clinical text data mining and information extraction. The promotion of CNER can provide support for clinical decision making and medical knowledge base construction, which could then improve overall medical quality. Compared with English CNER, and due to the complexity of Chinese word segmentation and grammar, Chinese CNER was implemented later and is more challenging. Objective: With the development of distributed representation and deep learning, a series of models have been applied in Chinese CNER. Different from the English version, Chinese CNER is mainly divided into character-based and word-based methods that cannot make comprehensive use of EMR information and cannot solve the problem of ambiguity in word representation. Methods: In this paper, we propose a lattice long short-term memory (LSTM) model combined with a variant contextualized character representation and a conditional random field (CRF) layer for Chinese CNER: the Embeddings from Language Models (ELMo)-lattice-LSTM-CRF model. The lattice LSTM model can effectively utilize the information from characters and words in Chinese EMRs; in addition, the variant ELMo model uses Chinese characters as input instead of the character-encoding layer of the ELMo model, so as to learn domain-specific contextualized character embeddings. Results: We evaluated our method using two Chinese CNER datasets from the China Conference on Knowledge Graph and Semantic Computing (CCKS): the CCKS-2017 CNER dataset and the CCKS-2019 CNER dataset. We obtained F1 scores of 90.13% and 85.02% on the test sets of these two datasets, respectively. Conclusions: Our results show that our proposed method is effective in Chinese CNER. In addition, the results of our experiments show that variant contextualized character representations can significantly improve the performance of the model.
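
A lattice model draws on both characters and words by attaching every lexicon word that matches a span of the character sequence. The sketch below shows only that matching step, under stated assumptions: the `lattice_edges` helper, the `max_len` limit, and the three-word lexicon are hypothetical, and the LSTM cell that consumes these edges is not shown.

```python
def lattice_edges(chars, lexicon, max_len=4):
    """Return (start, end, word) for each lexicon word of 2+ characters in chars.

    `end` is exclusive, so chars[start:end] == word.
    """
    edges = []
    for start in range(len(chars)):
        longest_end = min(len(chars), start + max_len)
        for end in range(start + 2, longest_end + 1):
            word = chars[start:end]
            if word in lexicon:
                edges.append((start, end, word))
    return edges


# "左肺下叶" (left lung, lower lobe) with a small hypothetical lexicon.
lexicon = {"左肺", "下叶", "肺下叶"}
print(lattice_edges("左肺下叶", lexicon))
# [(0, 2, '左肺'), (1, 4, '肺下叶'), (2, 4, '下叶')]
```

Overlapping matches such as `肺下叶` and `下叶` are kept on purpose: the lattice lets the model weigh competing segmentations instead of committing to one up front.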

Thai Named-Entity Recognition Using Variational Long Short-Term Memory with Conditional Random Field

Advances in Intelligent Systems and Computing - Advances in Intelligent Informatics, Smart Technology and Natural Language Processing ◽

10.1007/978-3-319-94703-7_8 ◽

2018 ◽

pp. 82-92

Author(s):

Can Udomcharoenchaikit ◽

Peerapon Vateekul ◽

Prachya Boonkwan

Keyword(s):

Random Field ◽

Short Term Memory ◽

Conditional Random Field ◽

Named Entity Recognition ◽

Entity Recognition ◽

Short Term ◽

Term Memory ◽

Named Entity ◽

Long Short Term Memory

Named entity recognition for Chinese judgment documents based on BiLSTM and CRF

EURASIP Journal on Image and Video Processing ◽

10.1186/s13640-020-00539-x ◽

2020 ◽

Vol 2020 (1) ◽

Author(s):

Wenming Huang ◽

Dengrui Hu ◽

Zhenrong Deng ◽

Jianyun Nie

Keyword(s):

Viterbi Algorithm ◽

Short Term Memory ◽

Conditional Random Field ◽

Named Entity Recognition ◽

Input Sequence ◽

Entity Recognition ◽

Computational Time ◽

Short Term ◽

Named Entity ◽

Long Short Term Memory

Chinese named entity recognition (CNER) in the judicial domain is an important and fundamental task in the analysis of judgment documents. However, only a few studies have been devoted to this task so far. For Chinese named entity recognition in judgment documents, we propose the use of a bidirectional long short-term memory (BiLSTM) model, which uses character vectors and sentence vectors trained by the distributed memory model of paragraph vectors (PV-DM). The output of the BiLSTM is used by a conditional random field (CRF) to tag the input sequence. We also improve the Viterbi algorithm to increase the efficiency of the model by cutting the path with the lowest score. Finally, a novel dataset with manual annotations is constructed. The experimental results on our corpus show that the proposed method is effective not only in reducing the computational time, but also in improving the effectiveness of named entity recognition in the judicial domain.
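
The abstract does not spell out how the lowest-scoring path is cut, so the sketch below shows one generic form of pruning rather than the authors' rule: before each step, only the `keep` best partial paths are extended. The function name, the `keep` parameter, and all scores are hypothetical.

```python
def pruned_viterbi(emissions, transitions, keep=2):
    """Viterbi-style search that extends only the `keep` best partial paths.

    emissions: list over positions of {label: score}
    transitions: {(prev_label, cur_label): score}, missing pairs score 0.0
    """
    beams = {lab: (score, [lab]) for lab, score in emissions[0].items()}
    for scores in emissions[1:]:
        # Prune: discard the lowest-scoring partial paths before extending.
        survivors = sorted(beams.items(), key=lambda kv: kv[1][0], reverse=True)[:keep]
        beams = {}
        for cur, emit in scores.items():
            total, path = max(
                (s + transitions.get((prev, cur), 0.0) + emit, p)
                for prev, (s, p) in survivors
            )
            beams[cur] = (total, path + [cur])
    return max(beams.values())[1]


emissions = [
    {"O": 1.0, "B": 0.5, "I": 0.0},
    {"O": 0.2, "B": 0.1, "I": 1.0},
]
transitions = {("O", "I"): -10.0}  # an I tag should not follow O
print(pruned_viterbi(emissions, transitions, keep=2))  # ['B', 'I']
```

With `keep` equal to the number of labels this reduces to exact Viterbi decoding; smaller values trade a possible loss of the optimal path for fewer score computations per step.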

The effect of morphology in named entity recognition with sequence tagging

Natural Language Engineering ◽

10.1017/s1351324918000281 ◽

2018 ◽

Vol 25 (1) ◽

pp. 147-169 ◽

Cited By ~ 3

Author(s):

ONUR GÜNGÖR ◽

TUNGA GÜNGÖR ◽

SUZAN ÜSKÜDARLI

Keyword(s):

Morphological Analysis ◽

Short Term Memory ◽

Conditional Random Field ◽

Named Entity Recognition ◽

Entity Recognition ◽

Short Term ◽

Named Entity ◽

Long Short Term Memory ◽

Morphologically Rich Languages ◽

The Impact

This work proposes a sequential tagger for named entity recognition in morphologically rich languages. Several schemes for representing the morphological analysis of a word in the context of named entity recognition are examined. Word representations are formed by concatenating word and character embeddings with the morphological embeddings based on these schemes. The impact of these representations is measured by training and evaluating a sequential tagger composed of a conditional random field layer on top of a bidirectional long short-term memory layer. Experiments with Turkish, Czech, Hungarian, Finnish and Spanish produce state-of-the-art results for all these languages, indicating that the representation of morphological information improves performance.
