Extraction of Traditional Chinese Medicine Entity: Design of a Novel Span-Level Named Entity Recognition Method With Distant Supervision (Preprint)

Mapping Intimacies ◽

10.2196/preprints.28219 ◽

2021 ◽

Author(s):

Qi Jia ◽

Dezheng Zhang ◽

Haifeng Xu ◽

Yonghong Xie

Keyword(s):

Chinese Medicine ◽

Gold Standard ◽

False Negative ◽

Named Entity Recognition ◽

Entity Recognition ◽

Data Set ◽

Standard Data ◽

Named Entity ◽

Medical Entity ◽

Clinical Records

BACKGROUND: Traditional Chinese medicine (TCM) clinical records contain patients' symptoms, diagnoses, and doctors' subsequent treatments. These records are important resources for research and analysis of TCM diagnostic knowledge. However, most TCM clinical records are unstructured text, so a method for automatically extracting medical entities from them is indispensable. OBJECTIVE: Training a medical entity extraction model requires a large annotated corpus. Annotation is very costly, and gold-standard data sets for supervised learning are lacking. We therefore used distantly supervised named entity recognition (NER) to address this challenge. METHODS: We propose a span-level, distantly supervised NER approach for extracting TCM medical entities. It uses a pretrained language model and a simple multilayer neural network as the classifier to detect and classify entities. We also designed a negative sampling strategy for the span-level model: it randomly selects negative samples in every epoch and periodically filters out possible false-negative samples, which reduces their harmful influence on training. RESULTS: We compared our method with baseline methods on a gold-standard data set. Our method achieved an F1 score of 77.34, outperforming the baselines. CONCLUSIONS: We developed a distantly supervised NER approach for extracting medical entities from TCM clinical records and evaluated it on a TCM clinical record data set. The experimental results indicate that the proposed approach performs better than the baselines.
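
As a rough illustration of what span-level distant supervision can look like, the sketch below enumerates every candidate span up to a maximum length and labels it by exact lookup in a lexicon. The abstract does not say how the distant labels were produced, so the lexicon lookup, the function names (`enumerate_spans`, `distant_labels`), the entity types, and the English example text are all assumptions made for illustration; they are not the authors' implementation.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def enumerate_spans(text: str, max_len: int) -> List[Span]:
    """List every candidate span of 1..max_len characters."""
    return [
        (start, end)
        for start in range(len(text))
        for end in range(start + 1, min(start + max_len, len(text)) + 1)
    ]


def distant_labels(text: str, lexicon: Dict[str, str], max_len: int) -> Dict[Span, str]:
    """Label each span by exact lexicon lookup; unmatched spans get "O"."""
    return {
        (start, end): lexicon.get(text[start:end], "O")
        for start, end in enumerate_spans(text, max_len)
    }


# Hypothetical lexicon mapping a surface form to an entity type.
lexicon = {"headache": "SYMPTOM", "ginseng": "HERB"}
text = "headache, ginseng"
labels = distant_labels(text, lexicon, max_len=8)

# Spans with a lexicon match become positives; all others are unlabeled.
positives = {span: tag for span, tag in labels.items() if tag != "O"}
print(positives)
```

Spans labeled "O" here are only unlabeled, not confirmed negatives: an incomplete lexicon leaves real entities unmatched, which is the false-negative problem the negative sampling strategy targets.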


Extraction of Traditional Chinese Medicine Entity: Design of a Novel Span-Level Named Entity Recognition Method With Distant Supervision

JMIR Medical Informatics ◽

10.2196/28219 ◽

2021 ◽

Vol 9 (6) ◽

pp. e28219

Author(s):

Qi Jia ◽

Dezheng Zhang ◽

Haifeng Xu ◽

Yonghong Xie

Keyword(s):

Chinese Medicine ◽

Gold Standard ◽

False Negative ◽

Named Entity Recognition ◽

Entity Recognition ◽

Data Set ◽

Standard Data ◽

Named Entity ◽

Medical Entity ◽

Clinical Records

Background: Traditional Chinese medicine (TCM) clinical records contain patients' symptoms, diagnoses, and doctors' subsequent treatments. These records are important resources for research and analysis of TCM diagnostic knowledge. However, most TCM clinical records are unstructured text, so a method for automatically extracting medical entities from them is indispensable. Objective: Training a medical entity extraction model requires a large annotated corpus. Annotation is very costly, and gold-standard data sets for supervised learning are lacking. We therefore used distantly supervised named entity recognition (NER) to address this challenge. Methods: We propose a span-level, distantly supervised NER approach for extracting TCM medical entities. It uses a pretrained language model and a simple multilayer neural network as the classifier to detect and classify entities. We also designed a negative sampling strategy for the span-level model: it randomly selects negative samples in every epoch and periodically filters out possible false-negative samples, which reduces their harmful influence on training. Results: We compared our method with baseline methods on a gold-standard data set. Our method achieved an F1 score of 77.34, outperforming the baselines. Conclusions: We developed a distantly supervised NER approach for extracting medical entities from TCM clinical records and evaluated it on a TCM clinical record data set. The experimental results indicate that the proposed approach performs better than the baselines.
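
The negative sampling strategy described in the Methods can be sketched roughly as follows. The abstract gives no criterion for spotting a possible false negative, so this sketch assumes one: an unlabeled span that the current model scores above a threshold is dropped from the sampling pool. The threshold, the filtering period of three epochs, the sample size, and the stand-in scoring function `fake_prob` are all hypothetical.

```python
import random
from typing import Callable, List, Set, Tuple

Span = Tuple[int, int]


def sample_negatives(candidates: List[Span], excluded: Set[Span], k: int,
                     rng: random.Random) -> List[Span]:
    """Draw up to k unlabeled spans, skipping suspected false negatives."""
    pool = [span for span in candidates if span not in excluded]
    return rng.sample(pool, min(k, len(pool)))


def suspected_false_negatives(candidates: List[Span],
                              entity_prob: Callable[[Span], float],
                              threshold: float) -> Set[Span]:
    """Unlabeled spans that the current model scores as likely entities."""
    return {span for span in candidates if entity_prob(span) >= threshold}


# Toy run: spans without a distant label, and a stand-in for the model.
unlabeled = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
fake_prob = lambda span: 0.95 if span == (1, 3) else 0.05

rng = random.Random(0)
excluded: Set[Span] = set()
for epoch in range(6):
    if epoch > 0 and epoch % 3 == 0:  # hypothetical filtering period
        excluded |= suspected_false_negatives(unlabeled, fake_prob, 0.9)
    # Fresh random negatives are drawn in every epoch.
    negatives = sample_negatives(unlabeled, excluded, k=3, rng=rng)
    # A real loop would train on the positives plus `negatives` here.
```

Resampling every epoch keeps any single mislabeled span from being seen as a negative in every pass, and the periodic filter removes the spans the model itself has come to regard as entities.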


Text Mining of Hazard and Operability Analysis Reports Based on Active Learning

Processes ◽

10.3390/pr9071178 ◽

2021 ◽

Vol 9 (7) ◽

pp. 1178

Author(s):

Zhenhua Wang ◽

Beike Zhang ◽

Dong Gao

Keyword(s):

Active Learning ◽

Named Entity Recognition ◽

Entity Recognition ◽

Chemical System ◽

Chemical Safety ◽

High Quality ◽

Data Set ◽

Final Model ◽

Named Entity ◽

Sampling Algorithms

In the field of chemical safety, a named entity recognition (NER) model based on deep learning can mine valuable information from hazard and operability analysis (HAZOP) text. This information can guide experts in carrying out a new round of HAZOP analysis and help practitioners mitigate hidden dangers in the system, and it is of great significance for improving the safety of the whole chemical system. However, because chemical safety analysis text is highly standardized and specialized, it is difficult to improve the performance of traditional models. To solve this problem, this study proposes an improved method based on active learning and designs three novel sampling algorithms, Variation of Token Entropy (VTE), HAZOP Confusion Entropy (HCE), and Amplification of Least Confidence (ALC), which improve the model's ability to understand HAZOP text. In this method, part of the data is used to establish the initial model. The sampling algorithm is then used to select high-quality samples from the data set. Finally, these high-quality samples are used to retrain the whole model to obtain the final model. The experimental results show that the VTE, HCE, and ALC algorithms perform better than random sampling. In addition, compared with other methods, the proposed method effectively improves the performance of the traditional model, which indicates that the method is reliable and advanced.
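
The abstract does not define VTE, HCE, or ALC, so the sketch below does not attempt them. It shows only the two standard uncertainty scores their names allude to, least confidence and token entropy, plus the selection step of an active learning round. The least-confidence score assumes independent per-token label distributions, and all function names and the toy `pool` are hypothetical.

```python
import math
from typing import Callable, Dict, List

TokenProbs = List[List[float]]  # one label distribution per token


def least_confidence(probs: TokenProbs) -> float:
    """1 minus the product of each token's top label probability."""
    confidence = 1.0
    for dist in probs:
        confidence *= max(dist)
    return 1.0 - confidence


def mean_token_entropy(probs: TokenProbs) -> float:
    """Average Shannon entropy (in nats) of the per-token distributions."""
    total = 0.0
    for dist in probs:
        total -= sum(p * math.log(p) for p in dist if p > 0.0)
    return total / len(probs)


def select_for_annotation(pool: Dict[str, TokenProbs], k: int,
                          score: Callable[[TokenProbs], float] = least_confidence) -> List[str]:
    """Return the ids of the k sentences the model is least sure about."""
    return sorted(pool, key=lambda sid: score(pool[sid]), reverse=True)[:k]


# Toy predictions from an initial model: sentence id -> per-token distributions.
pool = {
    "s1": [[0.9, 0.1], [0.8, 0.2]],
    "s2": [[0.5, 0.5], [0.6, 0.4]],
    "s3": [[0.99, 0.01], [0.97, 0.03]],
}
print(select_for_annotation(pool, k=2))
```

In a full round, the selected sentences would be annotated, added to the training set, and used to retrain the model before the next selection.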


ChemTok: A New Rule Based Tokenizer for Chemical Named Entity Recognition

BioMed Research International ◽

10.1155/2016/4248026 ◽

2016 ◽

Vol 2016 ◽

pp. 1-9 ◽

Cited By ~ 5

Author(s):

Abbas Akkasi ◽

Ekrem Varoğlu ◽

Nazife Dimililer

Keyword(s):

Conditional Random Fields ◽

Named Entity Recognition ◽

Classification Performance ◽

Entity Recognition ◽

Support Vector ◽

Learning Approaches ◽

Data Set ◽

Rule Based ◽

Named Entity ◽

Vector Machines

Named Entity Recognition (NER) from text constitutes the first step in many text mining applications. The most important preliminary step for NER systems using machine learning approaches is tokenization, where raw text is segmented into tokens. This study proposes an enhanced rule-based tokenizer, ChemTok, which utilizes rules extracted mainly from the training data set. The main novelty of ChemTok is the use of the extracted rules to merge the tokens split in the previous steps, thus producing longer and more discriminative tokens. ChemTok is compared to the tokenization methods utilized by ChemSpot and tmChem. Support Vector Machines and Conditional Random Fields are employed as the learning algorithms. The experimental results show that the classifiers trained on the output of ChemTok outperform all classifiers trained on the output of the other two tokenizers in terms of both classification performance and the number of incorrectly segmented entities.
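
The split-then-merge idea can be illustrated with a small sketch. It is not ChemTok: the two merge rules below are hand-written for this example rather than extracted from training data, and the names `MERGE_RULES`, `coarse_tokenize`, and `merge_tokens` are hypothetical. A coarse pass splits text into word runs and single punctuation marks, then a second pass re-joins adjacent pieces that match a rule.

```python
import re
from typing import List, Tuple

Token = Tuple[str, int, int]  # (text, start offset, end offset)

# Hypothetical merge rules: (pattern on the left token, pattern on the right token).
MERGE_RULES = [
    (re.compile(r"\d$"), re.compile(r"^[-,]$")),    # "2" + "-",  "1" + ","
    (re.compile(r"[-,]$"), re.compile(r"^\w+$")),   # "2-" + "propanol"
]


def coarse_tokenize(text: str) -> List[Token]:
    """Split into runs of word characters and single punctuation marks."""
    return [(m.group(), m.start(), m.end())
            for m in re.finditer(r"\w+|[^\w\s]", text)]


def merge_tokens(tokens: List[Token]) -> List[Token]:
    """Re-join tokens that touch in the text and match a merge rule."""
    merged: List[Token] = []
    for tok, start, end in tokens:
        if merged:
            prev, prev_start, prev_end = merged[-1]
            touching = prev_end == start  # no whitespace in between
            if touching and any(left.search(prev) and right.search(tok)
                                for left, right in MERGE_RULES):
                merged[-1] = (prev + tok, prev_start, end)
                continue
        merged.append((tok, start, end))
    return merged


sentence = "Add 2-propanol and 1,2-dichloroethane."
tokens = [tok for tok, _, _ in merge_tokens(coarse_tokenize(sentence))]
print(tokens)
```

Keeping a chemical name such as `1,2-dichloroethane` in one token spares the downstream classifier from having to label its fragments consistently.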


Evaluating the impact of pre-annotation on annotation speed and potential bias: natural language processing gold standard development for clinical named entity recognition in clinical trial announcements

Journal of the American Medical Informatics Association ◽

10.1136/amiajnl-2013-001837 ◽

2014 ◽

Vol 21 (3) ◽

pp. 406-413 ◽

Cited By ~ 20

Author(s):

Todd Lingren ◽

Louise Deleger ◽

Katalin Molnar ◽

Haijun Zhai ◽

Jareen Meinzen-Derr ◽

...

Keyword(s):

Clinical Trial ◽

Natural Language Processing ◽

Language Processing ◽

Gold Standard ◽

Named Entity Recognition ◽

Entity Recognition ◽

Potential Bias ◽

Named Entity ◽

The Impact ◽

Standard Development


Named Entity Recognition (NER) for Tibetan and Mongolian Newspapers

10.33774/coe-2021-xhw9l-v2 ◽

2021 ◽

Author(s):

Robert Barnett ◽

Christian Faggionato ◽

Marieke Meelen ◽

Sargai Yunshaab ◽

Tsering Samdrup ◽

...

Keyword(s):

Gold Standard ◽

Named Entity Recognition ◽

The State ◽

Training Data ◽

Entity Recognition ◽

People's Republic Of China ◽

Republic Of China ◽

Named Entity ◽

Policy Analysts ◽

Standard Training

Modern Tibetan and Vertical (Traditional) Mongolian are scripts used by c. 11 million people, mostly within the People’s Republic of China. In terms of publicly available tools for NLP, these languages and their scripts are extremely low-resourced and under-researched. We set out firstly to survey the state of NLP for these languages, and secondly to facilitate research by historians and policy analysts working on Tibetan newspapers. Their primary need is to be able to carry out Named Entity Recognition (NER) in Modern Tibetan, a script which has no word or sentence boundaries and for which no segmenters have been developed. Working on LightTag, an online tagger using character-based modelling, we were able to produce gold-standard training data for NER for use with Modern Tibetan.


RUREBUS-2020 SHARED TASK: RUSSIAN RELATION EXTRACTION FOR BUSINESS

Computational Linguistics and Intellectual Technologies ◽

10.28995/2075-7182-2020-19-416-431 ◽

2020 ◽

Cited By ~ 1

Author(s):

V. A. Ivanin ◽


E. L. Artemova ◽

T. V. Batura ◽

V. V. Ivanov ◽

...

Keyword(s):

Named Entity Recognition ◽

Relation Extraction ◽

Business Case ◽

Entity Recognition ◽

Quality Data ◽

Shared Task ◽

Annotation Scheme ◽

Data Set ◽

Named Entity ◽

Short Time

In this paper, we present a shared task on two core information extraction problems: named entity recognition and relation extraction. In contrast to popular shared tasks on related problems, we try to move away from strictly academic rigor and instead model a business case. As a source of textual data we chose a corpus of Russian strategic documents, which we annotated according to our own annotation scheme. To speed up the annotation process, we exploited various active learning techniques. In total we ended up with more than two hundred annotated documents, and thus managed to create a high-quality data set in a short time. The shared task consisted of three tracks, devoted to 1) named entity recognition, 2) relation extraction and 3) joint named entity recognition and relation extraction. We provided the annotated texts as well as a set of unannotated texts, which could have been used in any way to improve solutions. In the paper we give an overview of and compare the solutions submitted by the shared task participants. We release both raw and annotated corpora along with annotation guidelines, evaluation scripts and results at https://github.com/dialogue-evaluation/RuREBus.


A Nested Named Entity Recognition Method for Traditional Chinese Medicine Records

Advances in Artificial Intelligence and Security - Communications in Computer and Information Science ◽

10.1007/978-3-030-78615-1_43 ◽

2021 ◽

pp. 488-497

Author(s):

Haifeng Xu ◽

Honglan Liu ◽

Qi Jia ◽

Yuxiao Zhan ◽

Yan Zhang ◽

...

Keyword(s):

Chinese Medicine ◽

Traditional Chinese Medicine ◽

Named Entity Recognition ◽

Entity Recognition ◽

Recognition Method ◽

Named Entity


Named Entity Recognition Model of Chinese Clinical Electronic Medical Record Based on XLNet-BiLSTM

10.21203/rs.3.rs-218833/v1 ◽

2021 ◽

Author(s):

Shen Zhou Feng ◽

Su Qian Min ◽

Guo Jing Lei

Keyword(s):

Medical Record ◽

Electronic Medical Record ◽

Named Entity Recognition ◽

Entity Recognition ◽

Vector Model ◽

Global Maximum ◽

Data Set ◽

Recognition Model ◽

Named Entity ◽

Feature Sequence

The recognition of named entities in Chinese clinical electronic medical records is one of the basic tasks for realizing smart medical care. To address the insufficient semantic representation of text in traditional word vector models and the inability of recurrent neural network (RNN) models to handle long-term dependencies, a named entity recognition model for Chinese clinical electronic medical records, XLNet-BiLSTM-MHA-CRF, is proposed. The XLNet pretrained language model serves as the embedding layer, vectorizing the medical record text to resolve ambiguity. The gating units of a bidirectional long short-term memory network (BiLSTM) capture the forward and backward semantic features of each sentence. The feature sequence is then fed into a multi-head attention (MHA) layer, which obtains information represented in different subspaces of the feature sequence, strengthens contextual semantic relevance, and suppresses noise. Finally, a conditional random field (CRF) identifies the globally optimal sequence. The experimental results show that the XLNet-BiLSTM-MHA-CRF model achieves good results on the CCKS-2017 named entity recognition data set.
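
The final CRF step, finding the globally optimal tag sequence, is conventionally done with Viterbi decoding. The sketch below is a minimal pure-Python version under simplifying assumptions: per-token emission scores stand in for the output of the upstream layers, start and end transition scores are omitted, and the tag set and all scores are made up for illustration.

```python
from typing import Dict, List


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the tag sequence with the highest total score."""
    tags = list(emissions[0])
    score = dict(emissions[0])  # best score of any path ending in each tag
    backpointers: List[Dict[str, str]] = []
    for emit in emissions[1:]:
        new_score: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for tag in tags:
            # Best previous tag for a path that ends in `tag` at this position.
            best_prev = max(tags, key=lambda p: score[p] + transitions[p][tag])
            new_score[tag] = score[best_prev] + transitions[best_prev][tag] + emit[tag]
            pointers[tag] = best_prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag.
    path = [max(tags, key=lambda t: score[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy scores: the transition O -> I is heavily penalized as invalid.
transitions = {
    "O": {"O": 0.0, "B": 0.0, "I": -10.0},
    "B": {"O": 0.0, "B": 0.0, "I": 0.0},
    "I": {"O": 0.0, "B": 0.0, "I": 0.0},
}
emissions = [
    {"O": 1.0, "B": 0.8, "I": 0.0},
    {"O": 0.2, "B": 0.1, "I": 1.0},
]
greedy = [max(e, key=e.get) for e in emissions]
best_path = viterbi(emissions, transitions)
print(greedy, best_path)
```

Picking the best tag per token independently yields the invalid sequence O, I here, while decoding over the whole sequence returns B, I, which is why a CRF layer is typically placed on top of per-token scores.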


Named Entity Recognition in Traditional Chinese Medicine Clinical Cases Combining BiLSTM-CRF with Knowledge Graph

Knowledge Science, Engineering and Management - Lecture Notes in Computer Science ◽

10.1007/978-3-030-29551-6_48 ◽

2019 ◽

pp. 537-548 ◽

Cited By ~ 2

Author(s):

Zhe Jin ◽

Yin Zhang ◽

Haodan Kuang ◽

Liang Yao ◽

Wenjin Zhang ◽

...

Keyword(s):

Chinese Medicine ◽

Traditional Chinese Medicine ◽

Named Entity Recognition ◽

Entity Recognition ◽

Knowledge Graph ◽

Named Entity ◽

Clinical Cases


NER for Tibetan and Mongolian Newspapers

10.33774/coe-2021-xhw9l ◽

2021 ◽

Author(s):

Robert Barnett ◽

Christian Faggionato ◽

Marieke Meelen ◽

Sargai Yunshaab ◽

Tsering Samdrup ◽

...

Keyword(s):

Gold Standard ◽

Named Entity Recognition ◽

The State ◽

Training Data ◽

Entity Recognition ◽

People's Republic Of China ◽

Republic Of China ◽

Named Entity ◽

Policy Analysts ◽

Standard Training
