HAMNER: Headword Amplified Multi-Span Distantly Supervised Method for Domain Specific Named Entity Recognition

2020 · Vol 34 (05) · pp. 8401-8408
Author(s): Shifeng Liu, Yifang Sun, Bing Li, Wei Wang, Xiang Zhao

To tackle Named Entity Recognition (NER) tasks, supervised methods need sufficient cleanly annotated data, which is labor- and time-consuming to obtain. In contrast, distantly supervised methods use dictionaries to produce automatically annotated data and thereby alleviate this requirement. Unfortunately, the limited coverage of dictionaries, especially in specific domains, hinders the effectiveness of distantly supervised NER. In this paper, we address the limitations of dictionary usage and mention boundary detection. We generalize distant supervision by extending the dictionary with headword-based non-exact matching, and we apply a weighting function to better score the matched entity mentions. We propose a span-level model that classifies all possible spans and then infers the selected spans with a proposed dynamic programming algorithm. Experiments on three benchmark datasets demonstrate that our method outperforms previous state-of-the-art distantly supervised methods.
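
A minimal sketch of the span-level inference step, assuming a span classifier that already assigns a score to every candidate span: dynamic programming then selects a non-overlapping set of spans with maximal total score. The function name and score interface are illustrative assumptions, not the authors' implementation.

```python
# Sketch: select non-overlapping entity spans with maximal total score via DP.
from typing import Callable, List, Tuple

def select_spans(n_tokens: int,
                 max_width: int,
                 span_score: Callable[[int, int], float]) -> List[Tuple[int, int]]:
    """span_score(i, j) is the classifier's score for tokens[i:j] being an entity
    (assumed > 0 only for spans the classifier considers entities)."""
    best = [0.0] * (n_tokens + 1)            # best[j]: best total score for the prefix of length j
    back = [None] * (n_tokens + 1)           # the span chosen at position j, if any
    for j in range(1, n_tokens + 1):
        best[j], back[j] = best[j - 1], None         # option 1: token j-1 lies outside any entity
        for i in range(max(0, j - max_width), j):
            s = best[i] + span_score(i, j)           # option 2: tokens[i:j] form an entity
            if s > best[j]:
                best[j], back[j] = s, (i, j)
    spans, j = [], n_tokens                  # recover the selected spans by walking backwards
    while j > 0:
        if back[j] is None:
            j -= 1
        else:
            spans.append(back[j])
            j = back[j][0]
    return list(reversed(spans))
```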

2021 · pp. 1-12
Author(s): Yingwen Fu, Nankai Lin, Xiaotian Lin, Shengyi Jiang

Named entity recognition (NER) is fundamental to natural language processing (NLP). Most state-of-the-art NER research is based on pre-trained language models (PLMs) or classic neural models, and it is mainly oriented to high-resource languages such as English. For Indonesian, related resources (both datasets and technology) are not yet well developed. Moreover, affixation is an important word-formation process in Indonesian, which makes character and token features essential for token-wise Indonesian NLP tasks; yet the features extracted by current top-performing models are insufficient. Targeting the Indonesian NER task, we build an Indonesian NER dataset (IDNER) comprising over 50 thousand sentences (over 670 thousand tokens) to alleviate the shortage of labeled Indonesian resources. Furthermore, we construct a hierarchical structured-attention-based model (HSA) for Indonesian NER that extracts sequence features from different perspectives. Specifically, we use an enhanced convolutional structure as well as an enhanced attention structure to extract deeper features from characters and tokens. Experimental results show that HSA achieves competitive performance on IDNER and three benchmark datasets.
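
A minimal PyTorch sketch of the general idea of combining character-level and token-level features before a sequence encoder. The `CharTokenEncoder` name, the plain CNN/BiLSTM layers, and all sizes are assumptions for illustration; the paper's HSA model uses enhanced convolutional and attention structures beyond this.

```python
import torch
import torch.nn as nn

class CharTokenEncoder(nn.Module):
    """Character CNN features concatenated with token embeddings, then a BiLSTM."""
    def __init__(self, n_chars, n_tokens, char_dim=32, tok_dim=100, char_filters=64):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.tok_emb = nn.Embedding(n_tokens, tok_dim, padding_idx=0)
        self.char_cnn = nn.Conv1d(char_dim, char_filters, kernel_size=3, padding=1)
        self.encoder = nn.LSTM(tok_dim + char_filters, 128,
                               batch_first=True, bidirectional=True)

    def forward(self, token_ids, char_ids):
        # token_ids: (batch, seq_len); char_ids: (batch, seq_len, word_len)
        b, t, w = char_ids.shape
        chars = self.char_emb(char_ids).view(b * t, w, -1).transpose(1, 2)
        char_feat = torch.relu(self.char_cnn(chars)).max(dim=2).values  # max-pool over characters
        char_feat = char_feat.view(b, t, -1)
        tokens = self.tok_emb(token_ids)
        out, _ = self.encoder(torch.cat([tokens, char_feat], dim=-1))
        return out  # (batch, seq_len, 256) contextual features for a tagging head
```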


2021 · Vol 54 (1) · pp. 1-39
Author(s): Zara Nasar, Syed Waqar Jaffry, Muhammad Kamran Malik

With the advent of Web 2.0, many online platforms produce massive amounts of textual data. With ever-increasing textual data at hand, it is of immense importance to extract information nuggets from it. One approach towards effectively harnessing this unstructured text is to transform it into structured text. Hence, this study presents an overview of approaches for extracting key insights from textual data in a structured way, focusing on Named Entity Recognition and Relation Extraction. The former deals with the identification of named entities, while the latter deals with the problem of extracting relations between sets of entities. The study covers early approaches as well as later developments based on machine learning models. The survey finds that deep-learning-based hybrid and joint models currently govern the state of the art. It also observes that annotated benchmark datasets for several textual-data sources, such as Twitter and other social forums, are not available; this scarcity has resulted in relatively little progress in those domains. Additionally, the majority of state-of-the-art techniques are offline and computationally expensive. Last, with the increasing focus on deep-learning frameworks, there is a need to understand and explain the underlying processes in deep architectures.
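
A toy illustration (not taken from the survey) of the structured output these two tasks produce: NER marks typed spans in the raw text, and relation extraction then links pairs of recognized entities.

```python
# Hypothetical example of turning unstructured text into a structured record.
text = "Marie Curie was born in Warsaw."

# Step 1: NER marks character spans with entity types.
entities = [
    {"text": "Marie Curie", "type": "PERSON", "start": 0, "end": 11},
    {"text": "Warsaw", "type": "LOCATION", "start": 24, "end": 30},
]

# Step 2: relation extraction links pairs of recognized entities.
relations = [
    {"head": "Marie Curie", "relation": "born_in", "tail": "Warsaw"},
]
```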


2020 · Vol 34 (05) · pp. 8164-8171
Author(s): Bing Li, Shifeng Liu, Yifang Sun, Wei Wang, Xiang Zhao

Recently, there has been increasing interest in identifying named entities with nested structures. Existing models make independent typing decisions on the entire entity span while ignoring strong modification relations between sub-entity types. In this paper, we present a novel Recursively Binary Modification model for nested named entity recognition. Our model utilizes the modification relations among sub-entity types to infer the head component on top of a Bayesian framework, and it uses the entity head as strong evidence to determine the type of the entity span. The process is recursive, allowing lower-level entities to help better model those at the outer level. To the best of our knowledge, our work is the first to use modification relations in the nested NER task. Extensive experiments on four benchmark datasets demonstrate that our model outperforms state-of-the-art models on nested NER tasks and delivers competitive results with state-of-the-art models on the flat NER task, without relying on any extra annotations or NLP tools.
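
A highly simplified sketch of the core intuition, assuming hypothetical co-occurrence counts: the type of the head sub-entity serves as strong evidence for the type of the enclosing span. The paper's actual model infers the head recursively within a Bayesian framework; every name and count below is illustrative.

```python
from collections import Counter

# Hypothetical modification statistics: how often a head sub-entity of type X
# appears inside an outer entity of type Y in the training data.
head_to_outer = Counter({
    ("protein", "protein"): 120,
    ("protein", "DNA"): 15,
    ("cell_type", "cell_type"): 80,
})

def infer_outer_type(head_type: str, candidates: list) -> str:
    """Pick the outer-entity type most compatible with the head sub-entity type."""
    return max(candidates, key=lambda outer: head_to_outer[(head_type, outer)])

print(infer_outer_type("protein", ["protein", "DNA", "cell_type"]))  # -> "protein"
```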


2021 · Vol 21 (1)
Author(s): Ming Cheng, Shufeng Xiong, Fei Li, Pan Liang, Jianbo Gao

Background Named entity recognition (NER) on Chinese electronic medical/healthcare records has attracted significant attention, as it can be applied to building applications that understand these records. Most previous methods have been purely data-driven, requiring high-quality, large-scale labeled medical data. However, labeled data is expensive to obtain, and such data-driven methods struggle to handle rare and unseen entities. Methods To tackle these problems, this study presents a novel multi-task deep neural network model for Chinese NER in the medical domain. We incorporate dictionary features into neural networks, and a general secondary named entity segmentation is used as an auxiliary task to improve the performance of the primary named entity recognition task. Results To evaluate the proposed method, we compare it with other currently popular methods on three benchmark datasets. Two of the datasets are publicly available, and the other one was constructed by us. Experimental results show that the proposed model achieves a 91.07% average F-measure on the two public datasets and an 87.05% F-measure on the private dataset. Conclusions The comparison of different models demonstrates the effectiveness of our model, which outperforms traditional statistical models.
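
A minimal PyTorch sketch of a multi-task layout in this spirit: a shared encoder over token embeddings concatenated with dictionary-match features feeds two heads, one for the primary NER tags and one for the auxiliary segmentation tags, trained with a weighted joint loss. All module names, sizes, and the auxiliary weight are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MultiTaskNER(nn.Module):
    def __init__(self, vocab_size, n_ner_tags, n_seg_tags, dict_feat_dim=4, emb_dim=100):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.encoder = nn.LSTM(emb_dim + dict_feat_dim, 128,
                               batch_first=True, bidirectional=True)
        self.ner_head = nn.Linear(256, n_ner_tags)   # primary task: entity typing
        self.seg_head = nn.Linear(256, n_seg_tags)   # auxiliary task: entity segmentation

    def forward(self, token_ids, dict_feats):
        # dict_feats: (batch, seq_len, dict_feat_dim) dictionary-match indicators
        x = torch.cat([self.emb(token_ids), dict_feats], dim=-1)
        h, _ = self.encoder(x)
        return self.ner_head(h), self.seg_head(h)

def joint_loss(ner_logits, seg_logits, ner_gold, seg_gold, aux_weight=0.3):
    """Weighted sum of the primary and auxiliary tagging losses."""
    ce = nn.CrossEntropyLoss(ignore_index=-100)
    ner = ce(ner_logits.transpose(1, 2), ner_gold)   # (batch, C, seq) vs. (batch, seq)
    seg = ce(seg_logits.transpose(1, 2), seg_gold)
    return ner + aux_weight * seg
```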


2016 · Vol 43 (2) · pp. 269-274
Author(s): Seong-hee Lee, Yeong-kil Song, Hark-soo Kim

Author(s): Xianwen Liao, Yongzhong Huang, Peng Yang, Lei Chen

By defining a computable word segmentation unit and studying its probability characteristics, we establish an unsupervised statistical language model (SLM) for a new pre-trained sequence labeling framework. The proposed SLM is an optimization model whose objective is to maximize the total binding force of all candidate word segmentation units in a sentence, without annotated datasets or vocabularies. To solve the SLM, we design a recursive divide-and-conquer dynamic programming algorithm. By integrating the SLM with popular sequence labeling models, we perform Vietnamese word segmentation, part-of-speech tagging, and named entity recognition experiments. The results show that our SLM effectively improves the performance of sequence labeling tasks. Using less than 10% of the training data and no dictionary, our sequence labeling framework outperforms the state-of-the-art Vietnamese word segmentation toolkit VnCoreNLP on the cross-dataset test. The SLM has no hyper-parameters to tune, is completely unsupervised, and is applicable to any other analytic language, giving it good domain adaptability.
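
A minimal sketch of segmenting a sentence by dynamic programming so that the total score of the chosen units is maximal, in the spirit of the binding-force objective. The `binding` scoring function is assumed to come from unsupervised corpus statistics; this is an illustration, not the authors' SLM or their divide-and-conquer solver.

```python
from typing import Callable, List

def segment(sentence: str, max_len: int,
            binding: Callable[[str], float]) -> List[str]:
    """binding(unit) is an unsupervised score for a candidate segmentation unit,
    e.g. derived from co-occurrence statistics of the raw corpus."""
    n = len(sentence)
    best = [float("-inf")] * (n + 1)         # best[j]: best total score for the prefix of length j
    best[0] = 0.0
    cut = [0] * (n + 1)                      # cut[j]: start of the last unit in the best prefix
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            s = best[i] + binding(sentence[i:j])
            if s > best[j]:
                best[j], cut[j] = s, i
    units, j = [], n                         # backtrack the optimal cut points
    while j > 0:
        units.append(sentence[cut[j]:j])
        j = cut[j]
    return list(reversed(units))
```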

