HTLinker: A Head-to-Tail Linker for Nested Named Entity Recognition

Symmetry, 2021, Vol 13 (9), pp. 1596
Author(s): Xiang Li, Junan Yang, Hui Liu, Pengjiang Hu

Named entity recognition (NER) aims to extract entities from unstructured text, and nested structures often exist between entities. However, most previous studies have focused on flat named entity recognition while ignoring nested entities. The importance of words in the text should vary across different entity categories. In this paper, we propose a head-to-tail linker for nested NER. The proposed model exploits the extracted entity head as conditional information to locate the corresponding entity tails under different entity categories. This strategy takes part of the symmetric boundary information of the entity as a condition and effectively leverages the information in the text to improve entity boundary recognition. The proposed model also accounts for the variability in semantic correlation between tokens for different entity heads under different entity categories. To verify the effectiveness of the model, extensive experiments were conducted on three datasets: ACE2004, ACE2005, and GENIA, yielding F1-scores of 80.5%, 79.3%, and 76.4%, respectively. The experimental results show that our model is the most effective of all the compared methods.
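The head-to-tail linking strategy can be pictured as two pointer-style classifiers: one scores candidate entity heads, and a second, conditioned on a chosen head, scores every token as a possible tail for each entity category. The sketch below only illustrates that decoding idea under assumed tensor shapes and module names (HTLinkerSketch, num_categories, and the thresholding rule are not taken from the paper); it is not the authors' implementation.

```python
# Hypothetical sketch of head-to-tail linking for nested NER (not the authors' code).
import torch
import torch.nn as nn

class HTLinkerSketch(nn.Module):
    def __init__(self, hidden_size: int, num_categories: int):
        super().__init__()
        self.head_scorer = nn.Linear(hidden_size, 1)                    # is this token an entity head?
        self.tail_scorer = nn.Linear(hidden_size * 2, num_categories)   # tail score per entity category

    def forward(self, token_repr: torch.Tensor):
        # token_repr: (batch, seq_len, hidden) from any encoder, e.g. BERT
        head_logits = self.head_scorer(token_repr).squeeze(-1)          # (batch, seq_len)

        # Condition every token on every candidate head by concatenation.
        b, n, h = token_repr.shape
        heads = token_repr.unsqueeze(2).expand(b, n, n, h)              # index i = candidate head
        tails = token_repr.unsqueeze(1).expand(b, n, n, h)              # index j = candidate tail
        pair = torch.cat([heads, tails], dim=-1)                        # (batch, n, n, 2*hidden)
        tail_logits = self.tail_scorer(pair)                            # (batch, n, n, num_categories)
        return head_logits, tail_logits

# Entities are read off as (head i, tail j, category c) triples where both
# head_logits[b, i] and tail_logits[b, i, j, c] exceed a threshold and j >= i.
```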

Information, 2020, Vol 11 (2), pp. 79
Author(s): Xiaoyu Han, Yue Zhang, Wenkai Zhang, Tinglei Huang

Relation extraction is a vital task in natural language processing. It aims to identify the relationship between two specified entities in a sentence. Besides the information contained in the sentence, additional information about the entities has been shown to be helpful for relation extraction. Additional information such as entity types obtained by NER (Named Entity Recognition) and descriptions provided by a knowledge base both have their limitations. Nevertheless, there is another way to provide additional information that can overcome these limitations in Chinese relation extraction, since Chinese characters usually have explicit meanings and can carry more information than English letters. We suggest that the characters constituting the entities can provide additional information that is helpful for the relation extraction task, especially on large-scale datasets. This assumption has never been verified before; the main obstacle is the lack of large-scale Chinese relation datasets. In this paper, we first generate a large-scale Chinese relation extraction dataset based on a Chinese encyclopedia. Second, we propose an attention-based model using the characters that compose the entities. The results on the generated dataset show that these characters provide useful information for Chinese relation extraction. Using this information, the attention mechanism can recognize the crucial part of the sentence that expresses the relation. The proposed model outperforms other baseline models on our Chinese relation extraction dataset.
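One way to read the proposed idea: the characters making up the two entities act as a query in an attention layer over the sentence, so the model can highlight the part of the sentence that expresses the relation. The snippet below is a hedged sketch of such an entity-character attention mechanism with invented module names and shapes (EntityCharAttention, the pooling of entity characters); the paper's exact architecture may differ.

```python
# Illustrative entity-character attention for relation classification (assumed design).
import torch
import torch.nn as nn

class EntityCharAttention(nn.Module):
    def __init__(self, hidden: int, num_relations: int):
        super().__init__()
        self.attn = nn.Linear(hidden * 2, 1)
        self.classifier = nn.Linear(hidden, num_relations)

    def forward(self, sent_repr, entity_char_repr):
        # sent_repr: (batch, seq_len, hidden)  encoded sentence characters
        # entity_char_repr: (batch, hidden)    pooled representation of the characters
        #                                      that compose the two entities
        b, n, h = sent_repr.shape
        query = entity_char_repr.unsqueeze(1).expand(b, n, h)
        scores = self.attn(torch.cat([sent_repr, query], dim=-1)).squeeze(-1)  # (batch, n)
        weights = torch.softmax(scores, dim=-1)                                # focus on relation-bearing tokens
        context = torch.bmm(weights.unsqueeze(1), sent_repr).squeeze(1)        # (batch, hidden)
        return self.classifier(context)                                        # relation logits
```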


2021, Vol 21 (1)
Author(s): Ming Cheng, Shufeng Xiong, Fei Li, Pan Liang, Jianbo Gao

Abstract Background Named entity recognition (NER) on Chinese electronic medical/healthcare records has attracted significant attention, as it can be applied to building applications that understand these records. Most previous methods have been purely data-driven, requiring high-quality, large-scale labeled medical data. However, labeled data are expensive to obtain, and these data-driven methods struggle to handle rare and unseen entities. Methods To tackle these problems, this study presents a novel multi-task deep neural network model for Chinese NER in the medical domain. We incorporate dictionary features into the neural network, and a general secondary named entity segmentation task is used as an auxiliary task to improve the performance of the primary named entity recognition task. Results To evaluate the proposed method, we compare it with other currently popular methods on three benchmark datasets. Two of the datasets are publicly available, and the third was constructed by us. Experimental results show that the proposed model achieves an average F-measure of 91.07% on the two public datasets and an F-measure of 87.05% on the private dataset. Conclusions The comparison of different models demonstrates the effectiveness of our model, which outperforms traditional statistical models.
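The multi-task setup can be summarised as a shared encoder with two output heads: the primary head predicts typed NER labels, while an auxiliary head predicts only entity segmentation (e.g. untyped B/I/O), and dictionary-match features are concatenated to the input representations. The following is a minimal sketch under those assumptions; the layer choices and the weighting factor aux_weight are illustrative, not taken from the paper.

```python
# Sketch of multi-task NER with dictionary features and an auxiliary segmentation head.
import torch
import torch.nn as nn

class MultiTaskNER(nn.Module):
    def __init__(self, hidden: int, dict_feat_dim: int, num_ner_tags: int, num_seg_tags: int = 3):
        super().__init__()
        self.encoder = nn.LSTM(hidden + dict_feat_dim, hidden, batch_first=True, bidirectional=True)
        self.ner_head = nn.Linear(hidden * 2, num_ner_tags)   # primary task: typed BIO tags
        self.seg_head = nn.Linear(hidden * 2, num_seg_tags)   # auxiliary task: untyped B/I/O

    def forward(self, embeddings, dict_feats):
        # embeddings: (batch, seq, hidden); dict_feats: (batch, seq, dict_feat_dim) dictionary-match flags
        x, _ = self.encoder(torch.cat([embeddings, dict_feats], dim=-1))
        return self.ner_head(x), self.seg_head(x)

def joint_loss(ner_logits, seg_logits, ner_gold, seg_gold, aux_weight: float = 0.3):
    ce = nn.CrossEntropyLoss()
    main = ce(ner_logits.flatten(0, 1), ner_gold.flatten())
    aux = ce(seg_logits.flatten(0, 1), seg_gold.flatten())
    return main + aux_weight * aux   # the auxiliary task regularises the shared encoder
```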


Author(s): Zeqi Tan, Yongliang Shen, Shuai Zhang, Weiming Lu, Yueting Zhuang

Named entity recognition (NER) is a widely studied task in natural language processing. Recently, a growing number of studies have focused on nested NER. Span-based methods, which treat entity recognition as a span classification task, can handle nested entities naturally, but they suffer from a huge search space and a lack of interaction between entities. To address these issues, we propose a novel sequence-to-set neural network for nested NER. Instead of specifying candidate spans in advance, we provide a fixed set of learnable vectors to learn the patterns of valuable spans. We utilize a non-autoregressive decoder to predict the final set of entities in one pass, which allows us to capture dependencies between entities. Compared with sequence-to-sequence methods, our model is better suited to this unordered recognition task because it is insensitive to label order. In addition, we use a loss function based on bipartite matching to compute the overall training loss. Experimental results show that our proposed model achieves state-of-the-art results on three nested NER corpora: ACE 2004, ACE 2005, and KBP 2017. The code is available at https://github.com/zqtan1024/sequence-to-set.
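The bipartite-matching loss mentioned here is typically computed by finding, for each sentence, the minimum-cost one-to-one assignment between the fixed set of predicted entity queries and the gold entities (e.g. with the Hungarian algorithm) and then accumulating the loss over matched pairs. Below is a hedged sketch of that matching step using SciPy's linear_sum_assignment; the particular cost terms are assumptions, not the paper's exact formulation.

```python
# Illustrative bipartite matching between predicted entity queries and gold entities.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_predictions(pred_class_probs, pred_spans, gold_classes, gold_spans):
    """pred_class_probs: (num_queries, num_classes), pred_spans: (num_queries, 2),
    gold_classes: (num_gold,), gold_spans: (num_gold, 2). Returns matched index pairs."""
    num_queries, num_gold = pred_class_probs.shape[0], gold_classes.shape[0]
    cost = np.zeros((num_queries, num_gold))
    for i in range(num_queries):
        for j in range(num_gold):
            class_cost = -pred_class_probs[i, gold_classes[j]]        # prefer the right label
            span_cost = np.abs(pred_spans[i] - gold_spans[j]).sum()   # prefer the right boundaries
            cost[i, j] = class_cost + span_cost
    rows, cols = linear_sum_assignment(cost)   # Hungarian algorithm
    return list(zip(rows.tolist(), cols.tolist()))

# The training loss is then accumulated only over the matched (prediction, gold) pairs,
# with unmatched queries pushed toward a "no entity" class.
```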


2012, Vol 3 (1), pp. 55-71
Author(s): O. Isaac Osesina, John Talburt

Over the past decade, huge volumes of valuable information have become available to organizations. However, because a substantial part of this information exists in unstructured form, automatically extracting business intelligence and decision-support information from it is difficult. By identifying entities and their roles within unstructured text, in a process known as semantic named entity recognition, unstructured text can be made more readily available to traditional business processes. The authors present a novel NER approach that is independent of the text's language and subject domain, making it applicable within different organizations. It departs from natural language processing and machine learning methods in that it leverages the wide availability of huge amounts of data, as well as high-performance computing, to provide a data-intensive solution. It also does not rely on external resources such as dictionaries and gazetteers for language or domain knowledge.


2021, Vol 9 (3), pp. 435
Author(s): Ni Putu Ayu Sherly Anggita S, Ngurah Agus Sanjaya ER

In Natural Language Processing (NLP), Named Entity Recognition (NER) is a widely studied subtask. NER's main task is to identify and detect named entities in a sentence, such as person names, locations, organizations, and many other entity types. In this paper, we present a location NER system for Balinese texts using a rule-based approach. NER on Balinese documents is an essential and challenging task because no prior research exists on it. The rule-based approach, which uses hand-crafted rules to extract entity names, is one of the most common approaches alongside machine learning. The system aims to identify proper names in the corpus and classify them into the location class. Precision, recall, and F-measure are used for the evaluation. Our results show that the proposed model is reliable, with average recall, precision, and F-measure values for the location entity of 0.935, 0.936, and 0.92, respectively. These results demonstrate that our system is capable of recognizing named entities in Balinese texts.
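A rule-based location recogniser of this kind typically combines a small list of location cue words with capitalisation patterns, then scores the output with precision, recall, and F-measure. The toy sketch below only illustrates that general shape; the cue words and the regular expression are invented examples and are not the rules used in the paper.

```python
# Toy illustration of rule-based location NER and its evaluation (invented rules).
import re

# Hypothetical cue words assumed to precede place names; not the paper's rule set.
LOCATION_CUES = {"ring", "saking", "ka"}

def extract_locations(tokens):
    """Mark a capitalised token as a location if it directly follows a cue word."""
    found = set()
    for i in range(1, len(tokens)):
        if tokens[i - 1].lower() in LOCATION_CUES and re.match(r"^[A-Z]\w+", tokens[i]):
            found.add(i)
    return found

def prf(predicted, gold):
    """Precision, recall and F-measure over predicted vs. gold location token indices."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```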


2014, Vol 571-572, pp. 339-344
Author(s): Yong He Lu, Ming Hui Liang

The answer extraction model has a direct impact on the performance of an Automatic Question Answering (QA) system. In this paper, an answer extraction model based on named entity recognition is presented. It mainly answers specific questions whose answers are related to named entities. First, it classifies questions according to their answer types. It then identifies named entities of the suitable types in the retrieved text fragments. Finally, it selects the final answer based on scores. The experiments in the paper show that the model can accurately answer the questions provided by the Text REtrieval Conference (TREC). The proposed model is thus easy to implement and performs well on specific questions.
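The pipeline described here has three steps: map the question to an expected answer type, run NER over the retrieved text fragments, and score candidate entities of the matching type. The sketch below is a simplified, assumed version of that pipeline; the question patterns, entity types, and frequency-based scoring are illustrative and not the paper's exact rules.

```python
# Simplified answer-extraction pipeline based on NER (assumed patterns and scoring).
from collections import Counter

# Hypothetical mapping from question cue word to the expected answer entity type.
QUESTION_TYPE_RULES = [
    ("who", "PERSON"),
    ("where", "LOCATION"),
    ("when", "DATE"),
]

def expected_answer_type(question):
    q = question.lower()
    for cue, entity_type in QUESTION_TYPE_RULES:
        if q.startswith(cue):
            return entity_type
    return None

def extract_answer(question, passage_entities):
    """passage_entities: (entity_text, entity_type) pairs produced by any NER system
    over the retrieved text fragments."""
    wanted = expected_answer_type(question)
    candidates = [text for text, etype in passage_entities if etype == wanted]
    if not candidates:
        return None
    # Score candidates by how often they occur across the retrieved fragments.
    return Counter(candidates).most_common(1)[0][0]
```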


2020, Vol 10 (11), pp. 3740
Author(s): Hongjin Kim, Harksoo Kim

In well-spaced Korean sentences, morphological analysis is the first step in natural language processing: a Korean sentence is segmented into a sequence of morphemes, and the parts of speech of the segmented morphemes are determined. Named entity recognition is a natural language processing task carried out to obtain morpheme sequences with specific meanings, such as person, location, and organization names. Although morphological analysis and named entity recognition are closely associated with each other, they have been studied independently and have exhibited the inevitable error propagation problem. Hence, we propose an integrated model based on label attention networks that simultaneously performs morphological analysis and named entity recognition. The proposed model comprises two closely associated layers of neural network models: the lower layer performs morphological analysis, whereas the upper layer performs named entity recognition. In our experiments on a public gold-labeled dataset, the proposed model outperformed previous state-of-the-art models for morphological analysis and named entity recognition. Furthermore, the results indicate that the integrated architecture alleviates the error propagation problem.
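The integrated architecture can be thought of as two stacked taggers: the lower layer predicts morphological labels, and the upper layer consumes that label distribution when predicting named-entity tags, so NER is no longer fed hard, possibly wrong morphological decisions. The following is a minimal sketch of that idea with assumed module names and layer choices; the actual label attention network in the paper is more elaborate.

```python
# Sketch of a two-layer model: morphological tagging feeds named entity recognition.
import torch
import torch.nn as nn

class JointMorphNER(nn.Module):
    def __init__(self, hidden: int, num_morph_tags: int, num_ner_tags: int):
        super().__init__()
        self.lower = nn.LSTM(hidden, hidden, batch_first=True, bidirectional=True)
        self.morph_head = nn.Linear(hidden * 2, num_morph_tags)
        # The upper layer sees both the encoder states and the *soft* morphological
        # label distribution instead of a hard, error-prone morphological decision.
        self.upper = nn.LSTM(hidden * 2 + num_morph_tags, hidden, batch_first=True, bidirectional=True)
        self.ner_head = nn.Linear(hidden * 2, num_ner_tags)

    def forward(self, embeddings):
        low, _ = self.lower(embeddings)                    # (batch, seq, 2*hidden)
        morph_logits = self.morph_head(low)
        morph_dist = torch.softmax(morph_logits, dim=-1)   # soft labels passed upward
        high, _ = self.upper(torch.cat([low, morph_dist], dim=-1))
        ner_logits = self.ner_head(high)
        return morph_logits, ner_logits                    # trained with a joint loss
```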


2017, Vol 5, pp. 247-261
Author(s): Gábor Berend

In this paper we propose and carefully evaluate a sequence labeling framework that solely utilizes sparse indicator features derived from dense distributed word representations. The proposed model obtains (near) state-of-the-art performance for both part-of-speech tagging and named entity recognition across a variety of languages. Our model relies only on a few thousand sparse-coding-derived features, without any modification of the word representations employed for the different tasks. The proposed model has favorable generalization properties, as it retains over 89.8% of its average POS tagging accuracy when trained on 1.2% of the total available training data, i.e., 150 sentences per language.
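The feature pipeline can be reproduced in spirit as: learn a sparse dictionary over pre-trained word embeddings, sparse-code each word, and use the indices of the non-zero coefficients as binary indicator features for a linear sequence tagger. Below is a hedged sketch with scikit-learn; the dictionary size, sparsity level, and downstream tagger are assumptions rather than the paper's settings.

```python
# Sparse-coding word embeddings into indicator features (illustrative settings).
import numpy as np
from sklearn.decomposition import DictionaryLearning

def sparse_indicator_features(embeddings: np.ndarray, n_atoms: int = 1024, nonzeros: int = 5):
    """embeddings: (vocab_size, dim) dense word vectors.
    Returns, per word, the set of dictionary atoms with non-zero coefficients."""
    learner = DictionaryLearning(
        n_components=n_atoms,                 # overcomplete dictionary
        transform_algorithm="omp",            # orthogonal matching pursuit
        transform_n_nonzero_coefs=nonzeros,   # at most a handful of active atoms per word
    )
    codes = learner.fit_transform(embeddings)  # (vocab_size, n_atoms), sparse rows
    # Each word is then described by a few binary "atom k fired" features,
    # which can be plugged into a linear CRF or perceptron tagger.
    return [set(np.flatnonzero(row)) for row in codes]
```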


TEM Journal, 2021, pp. 82-94
Author(s): Maganti Syamala, N.J. Nalini

Aspect-based sentiment analysis (ABSA) is one of the current research problems in Natural Language Processing (NLP). Traditional ABSA requires manual aspect assignment for aspect extraction and sentiment analysis. In this paper, to automate the process, a domain-independent dynamic ABSA model is proposed that fuses an Efficient Named Entity Recognition (E-NER) guided dependency parsing technique with neural networks. The aspects and sentiment terms extracted by E-NER are fed to a Convolutional Neural Network (CNN) trained with a word embedding technique. Aspect category-based polarity prediction is evaluated using the NLTK VADER sentiment package. The proposed model was compared to a traditional rule-based approach, and the dynamic model yielded results better by 17% when validated in terms of correctly classified instances, accuracy, precision, recall, and F-score using machine learning algorithms.
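The extraction stage described here pairs entity or aspect mentions with nearby opinion words via dependency relations and then scores polarity. The fragment below sketches only that pairing-and-polarity step using spaCy and NLTK's VADER analyser; the dependency relation checked and the aspect/opinion heuristic are assumptions, and the paper's E-NER and CNN components are not reproduced here.

```python
# Sketch: dependency-guided aspect-opinion pairing with VADER polarity (assumed heuristics).
import spacy
from nltk.sentiment.vader import SentimentIntensityAnalyzer  # requires nltk.download("vader_lexicon")

nlp = spacy.load("en_core_web_sm")   # any spaCy pipeline with NER and a dependency parser
vader = SentimentIntensityAnalyzer()

def aspect_opinion_pairs(text: str):
    doc = nlp(text)
    pairs = []
    for token in doc:
        # Rough rule: adjectival modifiers of noun or proper-noun heads act as opinion terms.
        if token.dep_ == "amod" and token.head.pos_ in {"NOUN", "PROPN"}:
            polarity = vader.polarity_scores(token.text)["compound"]
            pairs.append((token.head.text, token.text, polarity))
    return pairs

# Example: aspect_opinion_pairs("The phone has an excellent battery and a terrible camera.")
# might yield pairs such as ("battery", "excellent", 0.57) under these rough rules.
```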

