Named Entity Recognition of Chinese Diabetic Literature by Integrating General Domain Knowledge

Author(s):  
Duochuan Zhang
Author(s):  
Elena Álvarez-Mellado ◽  
María Luisa Díez-Platas ◽  
Pablo Ruiz-Fabo ◽  
Helena Bermúdez ◽  
Salvador Ros ◽  
...  

Abstract Medieval documents are a rich source of historical data. Performing named-entity recognition (NER) on this genre of texts can provide us with valuable historical evidence. However, traditional NER categories and schemes are usually designed with modern documents in mind (i.e. journalistic text) and the general-domain NER annotation schemes fail to capture the nature of medieval entities. In this paper we explore the challenges of performing named-entity annotation on a corpus of Spanish medieval documents: we discuss the mismatches that arise when applying traditional NER categories to a corpus of Spanish medieval documents and we propose a novel humanist-friendly TEI-compliant annotation scheme and guidelines intended to capture the particular nature of medieval entities.


2019 ◽  
Vol 11 (8) ◽  
pp. 180
Author(s):  
Fei Liao ◽  
Liangli Ma ◽  
Jingjing Pei ◽  
Linshan Tan

Military named entity recognition (MNER) is one of the key technologies in military information extraction. Traditional methods for the MNER task rely on cumbersome feature engineering and specialized domain knowledge. To solve this problem, we propose a method that employs a bidirectional long short-term memory (BiLSTM) neural network with a self-attention mechanism to identify military entities automatically. We obtain distributed vector representations of the military corpus by unsupervised learning, and the BiLSTM model combined with the self-attention mechanism is adopted to fully capture the contextual information carried by the character vector sequence. The experimental results show that the self-attention mechanism can effectively improve performance on the MNER task. The F-scores for entity identification in military documents and network military texts were 90.15% and 89.34%, respectively, which were better than those of other models.
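
The self-attention step mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' model: it assumes plain scaled dot-product attention with no learned projection matrices, and the `hidden_states` values are made-up stand-ins for BiLSTM outputs over a character sequence.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1 (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention without learned projections.

    Each position attends to every position in the sequence and returns
    a weighted average of all vectors, so every output mixes in context.
    """
    dim = len(vectors[0])
    output = []
    for query in vectors:
        # Similarity of this position to every position, scaled by sqrt(dim).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # Context vector: attention-weighted sum of all input vectors.
        context = [
            sum(w * vec[i] for w, vec in zip(weights, vectors))
            for i in range(dim)
        ]
        output.append(context)
    return output


# Hypothetical hidden states for a three-character sequence.
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual = self_attention(hidden_states)
```

In a full tagger, the context vectors would typically be passed to a classification layer that assigns an entity label to each character; that part is omitted here.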


2015 ◽  
Vol 7 (S1) ◽  
Author(s):  
Tsendsuren Munkhdalai ◽  
Meijing Li ◽  
Khuyagbaatar Batsuren ◽  
Hyeon Ah Park ◽  
Nak Hyeon Choi ◽  
...  

2012 ◽  
Vol 3 (1) ◽  
pp. 55-71 ◽  
Author(s):  
O. Isaac Osesina ◽  
John Talburt

Over the past decade, huge volumes of valuable information have become available to organizations. However, the existence of a substantial part of the information in unstructured form makes the automated extraction of business intelligence and decision support information from it difficult. By identifying the entities and their roles within unstructured text in a process known as semantic named entity recognition, unstructured text can be made more readily available for traditional business processes. The authors present a novel NER approach that is independent of the text language and subject domain, making it applicable within different organizations. It departs from the natural language and machine learning methods in that it leverages the wide availability of huge amounts of data as well as high-performance computing to provide a data-intensive solution. Also, it does not rely on external resources such as dictionaries and gazetteers for the language or domain knowledge.


2021 ◽  
Vol 22 (S1) ◽  
Author(s):  
Cong Sun ◽  
Zhihao Yang ◽  
Lei Wang ◽  
Yin Zhang ◽  
Hongfei Lin ◽  
...  

Abstract Background The recognition of pharmacological substances, compounds and proteins is essential for biomedical relation extraction, knowledge graph construction, drug discovery, as well as medical question answering. Although considerable efforts have been made to recognize biomedical entities in English texts, to date only a few limited attempts have been made to recognize them in biomedical texts in other languages. PharmaCoNER is a named entity recognition challenge to recognize pharmacological entities from Spanish texts. Because abundant resources are currently available in the field of natural language processing, how to leverage these resources for the PharmaCoNER challenge is worth studying. Methods Inspired by the success of deep learning with language models, we compare and explore various representative BERT models to promote the development of the PharmaCoNER task. Results The experimental results show that deep learning with language models can effectively improve model performance on the PharmaCoNER dataset. Our method achieves state-of-the-art performance on the PharmaCoNER dataset, with a max F1-score of 92.01%. Conclusion For the BERT models on the PharmaCoNER dataset, biomedical domain knowledge has a greater impact on model performance than the native language (i.e., Spanish). The BERT models can obtain competitive performance by using WordPiece to alleviate the out-of-vocabulary limitation. The performance of the BERT model can be further improved by constructing a specific vocabulary based on domain knowledge. Moreover, the character case also has a certain impact on model performance.
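
The way WordPiece alleviates the out-of-vocabulary limitation can be shown with a short sketch of greedy longest-match-first subword splitting. This is an illustration only, not the tokenizer used in the paper: the function name `wordpiece_tokenize` and the tiny `toy_vocab` are hypothetical, and real BERT vocabularies contain tens of thousands of entries.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Split one word into subword pieces by greedy longest-match-first.

    Pieces that do not start the word carry a "##" prefix. If some part
    of the word cannot be matched at all, the whole word becomes `unk`.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary; a domain-specific vocabulary would hold
# more whole drug names and so produce fewer fragments.
toy_vocab = {"para", "##ce", "##ta", "##mol", "ibu", "##profeno"}

print(wordpiece_tokenize("paracetamol", toy_vocab))
# prints ['para', '##ce', '##ta', '##mol']
```

A word absent from the vocabulary is thus still represented by known pieces instead of a single unknown token, which is why a vocabulary built from domain text can further reduce fragmentation of drug and compound names.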


2020 ◽  
Author(s):  
Yuanhe Tian ◽  
Wang Shen ◽  
Yan Song ◽  
Fei Xia ◽  
Min He ◽  
...  

Abstract Background Biomedical named entity recognition (BioNER) is an important task for understanding biomedical texts. The task can be challenging due to the lack of large-scale labeled training data and domain knowledge. Previous studies have shown that syntactic information can be useful for named entity recognition; however, most of them fail to weigh that information with respect to its contribution, as they treat the syntactic information as a gold reference. Results In this paper, we propose BioKMNER, a BioNER model for biomedical texts with key-value memory networks to incorporate syntactic information, which is extracted from syntactic structures automatically generated by existing toolkits. Our approach outperforms baselines without memories and achieves new state-of-the-art results on four biomedical datasets compared with previous studies, i.e., 85.67% on BC2GM, 94.22% on BC5CDR-chemical, 90.11% on NCBI-disease, and 76.33% on Species-800. Conclusion Experimental results on four benchmark datasets demonstrate the effectiveness of our method, where the state-of-the-art performance is achieved on all of them.

