Fine-Grained Named Entity Recognition Using a Multi-Stacked Feature Fusion and Dual-Stacked Output in Korean

Hongjin Kim; Harksoo Kim

doi:10.3390/app112210795

Applied Sciences ◽

10.3390/app112210795 ◽

2021 ◽

Vol 11 (22) ◽

pp. 10795

Author(s):

Hongjin Kim ◽

Harksoo Kim

Keyword(s):

Feature Fusion ◽

Named Entity Recognition ◽

Entity Recognition ◽

Coarse Grained ◽

Experimental Result ◽

Unbalanced Data ◽

Fine Grained ◽

Named Entity ◽

Proposed Model ◽

Different Levels

Named entity recognition (NER) is a natural language processing task that identifies spans mentioning named entities and annotates them with predefined named entity classes. Although many machine learning-based NER models have been proposed, their performance on fine-grained NER tasks has been less than acceptable. This is because the training data of a fine-grained NER task are much more unbalanced than those of a coarse-grained NER task. To overcome the unbalanced data problem, we propose a fine-grained NER model that compensates for the sparseness of fine-grained NEs by using the contextual information of coarse-grained NEs. From another viewpoint, many NER models have used different levels of features, such as part-of-speech tags and gazetteer look-up results, in a nonhierarchical manner. Unfortunately, these models suffer from the feature interference problem. Our solution is to adopt a multi-stacked feature fusion scheme, which accepts different levels of features as its input. The proposed model is based on multi-stacked long short-term memories (LSTMs) with a multi-stacked feature fusion layer for acquiring multilevel embeddings and a dual-stacked output layer for predicting fine-grained NEs based on the categorical information of coarse-grained NEs. Our experiments indicate that the proposed model is capable of state-of-the-art performance. The results show that the proposed model can effectively alleviate the unbalanced data problem that frequently occurs in fine-grained NER tasks. In addition, the multi-stacked feature fusion layer contributes to improved NER performance, confirming that the proposed model can alleviate the feature interference problem. Based on these experimental results, we conclude that the proposed model is well designed for NER tasks.
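As a rough illustration of the dual-stacked output idea in this abstract, the sketch below conditions fine-grained label scores on a coarse-grained prediction. It is not the authors' implementation: the label names, the `FINE_TO_COARSE` mapping, the toy scores, and the additive log-probability combination are all assumptions made for illustration.

```python
import math

# Hypothetical label hierarchy (not from the paper): fine label -> coarse parent.
FINE_TO_COARSE = {
    "PERSON_ACTOR": "PERSON",
    "PERSON_POLITICIAN": "PERSON",
    "LOCATION_CITY": "LOCATION",
    "LOCATION_COUNTRY": "LOCATION",
}

def log_softmax(scores):
    """Return log-probabilities for a dict of raw scores."""
    m = max(scores.values())
    log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return {label: s - log_z for label, s in scores.items()}

def predict_fine(coarse_scores, fine_scores):
    """Pick the fine label whose own score plus its parent's log-probability is highest."""
    coarse_logp = log_softmax(coarse_scores)
    combined = {
        fine: score + coarse_logp[FINE_TO_COARSE[fine]]
        for fine, score in fine_scores.items()
    }
    return max(combined, key=combined.get)

# Toy scores (assumed): the fine-grained scores alone favour LOCATION_CITY,
# but the confident coarse-grained PERSON prediction changes the outcome.
coarse = {"PERSON": 4.0, "LOCATION": 0.0}
fine = {
    "PERSON_ACTOR": 1.0,
    "PERSON_POLITICIAN": 0.2,
    "LOCATION_CITY": 1.5,
    "LOCATION_COUNTRY": 0.1,
}
best = predict_fine(coarse, fine)
print(best)
```

The point of the sketch is only that a sparse fine-grained class can borrow evidence from its better-represented coarse-grained parent; how the published model combines the two output layers may differ.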

Innovative Deep Neural Network Modeling for Fine-Grained Chinese Entity Recognition

Electronics ◽

10.3390/electronics9061001 ◽

2020 ◽

Vol 9 (6) ◽

pp. 1001 ◽

Cited By ~ 1

Author(s):

Jingang Liu ◽

Chunhe Xia ◽

Haihua Yan ◽

Wenjing Xu

Keyword(s):

Neural Network ◽

Language Processing ◽

Short Term Memory ◽

Named Entity Recognition ◽

Training Model ◽

Entity Recognition ◽

Coarse Grained ◽

Neural Network Modeling ◽

Fine Grained ◽

Named Entity

Named entity recognition (NER) is a basic but crucial task in natural language processing (NLP) and big data analysis. Recognizing named entities in Chinese text is more complicated and difficult than in English, which makes Chinese NER more challenging. Fine-grained named entity recognition is harder still than traditional named entity recognition, mainly because fine-grained tasks place higher demands on the automatic feature extraction and information representation abilities of deep neural models. In this paper, we propose an innovative neural network model named En2BiLSTM-CRF to improve performance on fine-grained Chinese entity recognition tasks. The proposed model, which comprises an initial encoding layer, an enhanced encoding layer, and a decoding layer, combines the advantages of pre-trained model encoding, dual bidirectional long short-term memory (BiLSTM) networks, and a residual connection mechanism. Hence, it can encode information multiple times and extract contextual features hierarchically. We conducted extensive experiments on two representative datasets using multiple important metrics and compared the results with other advanced baselines. The results are promising: the proposed En2BiLSTM-CRF shows better performance and better generalization ability in both fine-grained and coarse-grained Chinese entity recognition tasks.

An Attention-Based Model Using Character Composition of Entities in Chinese Relation Extraction

Information ◽

10.3390/info11020079 ◽

2020 ◽

Vol 11 (2) ◽

pp. 79 ◽

Cited By ~ 2

Author(s):

Xiaoyu Han ◽

Yue Zhang ◽

Wenkai Zhang ◽

Tinglei Huang

Keyword(s):

Language Processing ◽

Large Scale ◽

Named Entity Recognition ◽

Relation Extraction ◽

Entity Recognition ◽

Additional Information ◽

Named Entity ◽

Proposed Model ◽

The Relationship ◽

Crucial Part

Relation extraction is a vital task in natural language processing. It aims to identify the relationship between two specified entities in a sentence. Besides the information contained in the sentence, additional information about the entities has been shown to help relation extraction. Additional information such as the entity type obtained by named entity recognition (NER) and the description provided by a knowledge base both have their limitations. Nevertheless, there is another way to provide additional information that can overcome these limitations in Chinese relation extraction. Because Chinese characters usually have explicit meanings and can carry more information than English letters, we suggest that the characters that constitute the entities can provide additional information that is helpful for the relation extraction task, especially on large-scale datasets. This assumption has never been verified before. The main obstacle is the lack of large-scale Chinese relation datasets. In this paper, we first generate a large-scale Chinese relation extraction dataset based on a Chinese encyclopedia. Second, we propose an attention-based model that uses the characters that compose the entities. The results on the generated dataset show that these characters can provide useful information for the Chinese relation extraction task. Using this information, the attention mechanism can recognize the crucial part of the sentence that expresses the relation. The proposed model outperforms other baseline models on our Chinese relation extraction dataset.
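The idea of letting entity characters guide attention over the sentence can be sketched in a few lines. This is a minimal illustration, not the model from the paper: it assumes a plain dot-product attention whose query is the mean of the entity's character vectors, and the 2-dimensional vectors are made up.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(token_vectors, entity_char_vectors):
    """Weight sentence tokens by similarity to the mean entity-character vector."""
    dim = len(entity_char_vectors[0])
    count = len(entity_char_vectors)
    # Query: average of the vectors of the characters that make up the entity.
    query = [sum(v[i] for v in entity_char_vectors) / count for i in range(dim)]
    weights = softmax([dot(query, t) for t in token_vectors])
    # Context: attention-weighted sum of the token vectors.
    context = [sum(w * t[i] for w, t in zip(weights, token_vectors)) for i in range(dim)]
    return weights, context

# Toy vectors (assumed, for illustration only).
token_vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
entity_char_vectors = [[1.0, 0.0], [0.8, 0.2]]
weights, context = attend(token_vectors, entity_char_vectors)
print(weights)
```

Tokens whose vectors resemble the entity characters receive larger weights, which is one simple way an attention layer could highlight the part of the sentence that expresses the relation.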

Fine-grained Dutch named entity recognition

Language Resources and Evaluation ◽

10.1007/s10579-013-9255-y ◽

2013 ◽

Vol 48 (2) ◽

pp. 307-343 ◽

Cited By ~ 7

Author(s):

Bart Desmet ◽

Véronique Hoste

Keyword(s):

Named Entity Recognition ◽

Entity Recognition ◽

Fine Grained ◽

Named Entity

Fine-Grained Named Entity Recognition in Question Answering with DBpedia

Journal of Physics Conference Series ◽

10.1088/1742-6596/1087/3/032003 ◽

2018 ◽

Vol 1087 ◽

pp. 032003

Author(s):

Shimin Zhong ◽

Yajun Du ◽

Zhen Wei Gao

Keyword(s):

Question Answering ◽

Named Entity Recognition ◽

Entity Recognition ◽

Fine Grained ◽

Named Entity

Fine-Grained Mechanical Chinese Named Entity Recognition Based on ALBERT-AttBiLSTM-CRF and Transfer Learning

Symmetry ◽

10.3390/sym12121986 ◽

2020 ◽

Vol 12 (12) ◽

pp. 1986

Author(s):

Liguo Yao ◽

Haisong Huang ◽

Kuan-Wei Wang ◽

Shih-Huan Chen ◽

Qiaoqiao Xiong

Keyword(s):

Active Learning ◽

Manufacturing Industry ◽

Learning Strategy ◽

Named Entity Recognition ◽

Entity Recognition ◽

Utilization Rate ◽

Data Types ◽

Fine Grained ◽

Named Entity ◽

Model Transfer

Manufacturing text often exists as unlabeled data, its entities are fine-grained, and extraction is difficult. As a result, the utilization rate of manufacturing industry knowledge is low. This paper proposes a novel Chinese fine-grained NER (named entity recognition) method based on symmetry lightweight deep multinetwork collaboration (ALBERT-AttBiLSTM-CRF) and model transfer considering active learning (MTAL) to study fine-grained named entity recognition on Chinese textual data types with few labels. The method is divided into two stages. In the first stage, ALBERT-AttBiLSTM-CRF was applied and verified on the CLUENER2020 dataset (a public dataset) to obtain a pretrained model; the experiments show that the model obtains an F1 score of 0.8962, which is better than the best baseline algorithm, an improvement of 9.2%. In the second stage, the pretrained model was transferred to the Manufacturing-NER dataset (our dataset), and we used an active learning strategy to optimize the model. The final F1 score on Manufacturing-NER was 0.8931 after the model transfer, up from 0.8576 before it, an improvement of 3.55 percentage points. Our method effectively transfers existing knowledge from public source data to scientific target data, addressing the problem of named entity recognition with scarce labeled domain data.
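The gain from model transfer can be checked with a short calculation. The sketch below uses only the two F1 values quoted in this abstract; it shows that 3.55 is the absolute difference in percentage points, while the relative gain (a figure computed here, not stated in the abstract) is somewhat larger.

```python
f1_before = 0.8576  # Manufacturing-NER F1 before model transfer (from the abstract)
f1_after = 0.8931   # Manufacturing-NER F1 after model transfer (from the abstract)

# Absolute difference, expressed in percentage points.
absolute_gain_points = (f1_after - f1_before) * 100
# Relative improvement over the pre-transfer score, in percent.
relative_gain_percent = (f1_after - f1_before) / f1_before * 100

print(f"absolute gain: {absolute_gain_points:.2f} points")
print(f"relative gain: {relative_gain_percent:.2f}%")
```

The same check cannot be repeated for the 9.2% figure on CLUENER2020, because the abstract does not give the baseline F1 score.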

Fine-grained Chinese Named Entity Recognition in Entertainment News Using Adversarial Multi-task Learning

2019 IEEE 5th International Conference on Computer and Communications (ICCC) ◽

10.1109/iccc47050.2019.9064233 ◽

2019 ◽

Author(s):

Xu Man ◽

Peng Yang

Keyword(s):

Named Entity Recognition ◽

Entity Recognition ◽

Fine Grained ◽

Named Entity ◽

Task Learning

Named entity recognition for Polish

Poznan Studies in Contemporary Linguistics ◽

10.1515/psicl-2019-0010 ◽

2019 ◽

Vol 55 (2) ◽

pp. 239-269

Author(s):

Michał Marcińczuk ◽

Aleksander Wawer

Keyword(s):

Open Source ◽

State Of The Art ◽

Proper Names ◽

Named Entity Recognition ◽

Entity Recognition ◽

Coarse Grained ◽

Named Entity ◽

Current State ◽

Annotated Corpora ◽

Available Resources

In this article we discuss the current state of the art in named entity recognition for Polish. We present publicly available resources and open-source tools for named entity recognition. The overview includes various kinds of resources, i.e., guidelines, annotated corpora (NKJP, KPWr, CEN, PST) and lexicons (NELexiconS, PNET, Gazetteer). We present the major NER tools for Polish (Sprout, NERF, Liner2, Parallel LSTM-CRFs and PolDeepNer) and discuss their performance on the reference datasets. We cover the identification of named entity mentions in running text, local and global entity categorization, fine- and coarse-grained categorization, and the lemmatization of proper names.

Fine-grained named entity recognition and relation extraction for question answering

Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '07 ◽

10.1145/1277741.1277915 ◽

2007 ◽

Cited By ~ 14

Author(s):

Changki Lee ◽

Yi-Gyu Hwang ◽

Myung-Gil Jang

Keyword(s):

Question Answering ◽

Named Entity Recognition ◽

Relation Extraction ◽

Entity Recognition ◽

Fine Grained ◽

Named Entity

Fine-Grained Named Entity Recognition in Legal Documents

Lecture Notes in Computer Science - Semantic Systems. The Power of AI and Knowledge Graphs ◽

10.1007/978-3-030-33220-4_20 ◽

2019 ◽

pp. 272-287 ◽

Cited By ~ 2

Author(s):

Elena Leitner ◽

Georg Rehm ◽

Julian Moreno-Schneider

Keyword(s):

Named Entity Recognition ◽

Entity Recognition ◽

Fine Grained ◽

Named Entity ◽

Legal Documents

HTLinker: A Head-to-Tail Linker for Nested Named Entity Recognition

Symmetry ◽

10.3390/sym13091596 ◽

2021 ◽

Vol 13 (9) ◽

pp. 1596

Author(s):

Xiang Li ◽

Junan Yang ◽

Hui Liu ◽

Pengjiang Hu

Keyword(s):

Named Entity Recognition ◽

Entity Recognition ◽

Semantic Correlation ◽

Named Entity ◽

Unstructured Text ◽

Boundary Recognition ◽

Proposed Model ◽

Boundary Information ◽

Nested Structure ◽

Conditional Information

Named entity recognition (NER) aims to extract entities from unstructured text, and a nested structure often exists between entities. However, most previous studies paid more attention to flat named entity recognition while ignoring nested entities. The importance of words in the text should vary for different entity categories. In this paper, we propose a head-to-tail linker for nested NER. The proposed model exploits the extracted entity head as conditional information to locate the corresponding entity tails under different entity categories. This strategy takes part of the symmetric boundary information of the entity as a condition and effectively leverages the information from the text to improve entity boundary recognition. The proposed model considers the variability in the semantic correlation between tokens for different entity heads under different entity categories. To verify the effectiveness of the model, numerous experiments were conducted on three datasets, ACE2004, ACE2005, and GENIA, yielding F1-scores of 80.5%, 79.3%, and 76.4%, respectively. The experimental results show that our model is the most effective of all the methods used for comparison.
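The head-to-tail decoding step described in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the HTLinker implementation: the function name `link_heads_to_tails`, the threshold, and the hard-coded `toy_scores` that stand in for a trained model's outputs are all hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

Span = Tuple[int, int, str]  # (head index, tail index, category), indices inclusive

def link_heads_to_tails(
    n_tokens: int,
    heads: Dict[str, Set[int]],
    tail_score: Callable[[str, int, int], float],
    threshold: float = 0.5,
) -> List[Span]:
    """For each predicted head, keep every tail at or after it whose
    score, conditioned on that head and category, reaches the threshold."""
    spans: List[Span] = []
    for category, head_positions in heads.items():
        for head in sorted(head_positions):
            for tail in range(head, n_tokens):
                if tail_score(category, head, tail) >= threshold:
                    spans.append((head, tail, category))
    return spans

tokens = ["the", "University", "of", "Seoul", "campus"]
# Hypothetical model outputs, hard-coded for illustration.
heads = {"ORG": {1}, "LOC": {3}}
toy_scores = {("ORG", 1, 3): 0.9, ("LOC", 3, 3): 0.8}

def tail_score(category: str, head: int, tail: int) -> float:
    """Stand-in for a learned tail scorer conditioned on (category, head)."""
    return toy_scores.get((category, head, tail), 0.0)

spans = link_heads_to_tails(len(tokens), heads, tail_score)
for head, tail, category in spans:
    print(category, " ".join(tokens[head:tail + 1]))
```

Because each head is linked to its tails independently, the LOC span "Seoul" can sit inside the ORG span "University of Seoul", which is how this style of decoding accommodates nested entities.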