Chinese agricultural diseases and pests named entity recognition with multi-scale local context features and self-attention mechanism

2020 ◽  
Vol 179 ◽  
pp. 105830
Author(s):  
Xuchao Guo ◽  
Han Zhou ◽  
Jie Su ◽  
Xia Hao ◽  
Zhan Tang ◽  
...  
Author(s):  
Hui Chen ◽  
Zijia Lin ◽  
Guiguang Ding ◽  
Jianguang Lou ◽  
Yusen Zhang ◽  
...  

The dominant approaches for named entity recognitionm (NER) mostly adopt complex recurrent neural networks (RNN), e.g., long-short-term-memory (LSTM). However, RNNs are limited by their recurrent nature in terms of computational efficiency. In contrast, convolutional neural networks (CNN) can fully exploit the GPU parallelism with their feedforward architectures. However, little attention has been paid to performing NER with CNNs, mainly owing to their difficulties in capturing the long-term context information in a sequence. In this paper, we propose a simple but effective CNN-based network for NER, i.e., gated relation network (GRN), which is more capable than common CNNs in capturing long-term context. Specifically, in GRN we firstly employ CNNs to explore the local context features of each word. Then we model the relations between words and use them as gates to fuse local context features into global ones for predicting labels. Without using recurrent layers that process a sentence in a sequential manner, our GRN allows computations to be performed in parallel across the entire sentence. Experiments on two benchmark NER datasets (i.e., CoNLL2003 and Ontonotes 5.0) show that, our proposed GRN can achieve state-of-the-art performance with or without external knowledge. It also enjoys lower time costs to train and test.


2020 ◽  
Vol 72 (4) ◽  
pp. 52-64
Author(s):  
Thiyagu Meenachisundaram ◽  
Manjula Dhanabalachandran

Biomedical Named Entity Recognition (BNER) is identification of entities such as drugs, genes, and chemicals from biomedical text, which help in information extraction from the domain literature. It would allow extracting information such as drug profiles, similar or related drugs and associations between drugs and their targets. This venue presents opportunities for improvement even though many machine learning methods have been applied. The efficiency can be improved in case of biological related chemical entities as there are varied structure and properties. This new approach combines two state-of-the-art algorithms and aims to improve the performance by applying it to varied sets of features including linguistic, orthographic, Morphological, domain features and local context features. It uses the sequence tagging capability of CRF to identify the boundary of the entity and classification efficiency of SVM to detect subtypes in BNER. The method is tested on two different datasets 1) GENIA and 2) CHEMDNER corpus with different types of entities. The result shows that proposed hybrid method enhances the BNER compared to the conventional machine learning algorithms. Moreover the detailed study of SVM and the methodologies has been discussed clearly. The linear and non linear text classification can be mapped clearly in the section 3. The final section describes the results and the evaluation of the proposed method.


2019 ◽  
Vol 11 (8) ◽  
pp. 180
Author(s):  
Fei Liao ◽  
Liangli Ma ◽  
Jingjing Pei ◽  
Linshan Tan

Military named entity recognition (MNER) is one of the key technologies in military information extraction. Traditional methods for the MNER task rely on cumbersome feature engineering and specialized domain knowledge. In order to solve this problem, we propose a method employing a bidirectional long short-term memory (BiLSTM) neural network with a self-attention mechanism to identify the military entities automatically. We obtain distributed vector representations of the military corpus by unsupervised learning and the BiLSTM model combined with the self-attention mechanism is adopted to capture contextual information fully carried by the character vector sequence. The experimental results show that the self-attention mechanism can improve effectively the performance of MNER task. The F-score of the military documents and network military texts identification was 90.15% and 89.34%, respectively, which was better than other models.


Information ◽  
2020 ◽  
Vol 11 (1) ◽  
pp. 45 ◽  
Author(s):  
Shardrom Johnson ◽  
Sherlock Shen ◽  
Yuanchen Liu

Usually taken as linguistic features by Part-Of-Speech (POS) tagging, Named Entity Recognition (NER) is a major task in Natural Language Processing (NLP). In this paper, we put forward a new comprehensive-embedding, considering three aspects, namely character-embedding, word-embedding, and pos-embedding stitched in the order we give, and thus get their dependencies, based on which we propose a new Character–Word–Position Combined BiLSTM-Attention (CWPC_BiAtt) for the Chinese NER task. Comprehensive-embedding via the Bidirectional Llong Short-Term Memory (BiLSTM) layer can get the connection between the historical and future information, and then employ the attention mechanism to capture the connection between the content of the sentence at the current position and that at any location. Finally, we utilize Conditional Random Field (CRF) to decode the entire tagging sequence. Experiments show that CWPC_BiAtt model we proposed is well qualified for the NER task on Microsoft Research Asia (MSRA) dataset and Weibo NER corpus. A high precision and recall were obtained, which verified the stability of the model. Position-embedding in comprehensive-embedding can compensate for attention-mechanism to provide position information for the disordered sequence, which shows that comprehensive-embedding has completeness. Looking at the entire model, our proposed CWPC_BiAtt has three distinct characteristics: completeness, simplicity, and stability. Our proposed CWPC_BiAtt model achieved the highest F-score, achieving the state-of-the-art performance in the MSRA dataset and Weibo NER corpus.


2021 ◽  
pp. 1-12
Author(s):  
Qinghui Zhang ◽  
Meng Wu ◽  
Pengtao Lv ◽  
Mengya Zhang ◽  
Hongwei Yang

In the medical field, Named Entity Recognition (NER) plays a crucial role in the process of information extraction through electronic medical records and medical texts. To address the problems of long distance entity, entity confusion, and difficulty in boundary division in the Chinese electronic medical record NER task, we propose a Chinese electronic medical record NER method based on the multi-head attention mechanism and character-word fusion. This method uses a new character-word joint feature representation based on the pre-training model BERT and self-constructed domain dictionary, which can accurately divide the entity boundary and solve the impact of unregistered words. Subsequently, on the basis of the BiLSTM-CRF model, a multi-head attention mechanism is introduced to learn the dependency relationship between remote entities and entity information in different semantic spaces, which effectively improves the performance of the model. Experiments show that our models have better performance and achieves significant improvement compared to baselines. The specific performance is that the F1 value on the Chinese electronic medical record data set reaches 95.22%, which is 2.67%higher than the F1 value of the baseline model.


Sign in / Sign up

Export Citation Format

Share Document