Neural Chinese Named Entity Recognition via CNN-LSTM-CRF and Joint Training with Word Segmentation

Named Entity Recognition (NER) is a fundamental task in Natural Language Processing (NLP) and the initial step in building a Knowledge Graph (KG). Recently, BERT (Bidirectional Encoder Representations from Transformers), a pre-trained model, has achieved state-of-the-art (SOTA) results in various NLP tasks, including NER. However, Chinese NER remains challenging for BERT because Chinese text has no explicit separators between words, and BERT obtains only the representations of Chinese characters. Character-level representations alone cannot handle Chinese NER well, because the meaning of a Chinese word can be quite different from the meanings of the characters that make it up. ERNIE (Enhanced Representation through kNowledge IntEgration), an improved pre-trained model based on BERT, is better suited to Chinese NER because it is designed to learn language representations enhanced by a knowledge masking strategy. However, the potential of ERNIE has not been fully explored: when performing NER, ERNIE uses only token-level features and ignores sentence-level features. In this paper, we propose ERNIE-Joint, a joint model based on ERNIE that uses both sentence-level and token-level features by jointly training the NER and text classification tasks. To use the raw NER datasets for joint training and avoid additional annotation, we define the text classification task by the number of entities in each sentence. Experiments are conducted on two datasets, MSRA-NER and Weibo, which contain Chinese news data and Chinese social media data, respectively. The results demonstrate that ERNIE-Joint not only outperforms BERT and ERNIE but also achieves SOTA results on both datasets.
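The abstract derives the sentence-level classification labels from the existing NER annotations by counting the entities in each sentence, so no extra labeling is required. Below is a minimal sketch of that labeling step, assuming BIO tags (B- begins an entity, I- continues it, O is outside). The function names and the choice to cap the count at a fixed number of classes are hypothetical illustrations, not details taken from the paper.

```python
from typing import List


def count_entities(bio_tags: List[str]) -> int:
    """Count entity spans in one BIO-tagged sentence.

    A span starts at every B- tag, and also at an I- tag that does not
    continue an entity of the same type (a common tolerance for malformed BIO).
    """
    count = 0
    prev_type = None  # entity type of the previous token, or None if outside
    for tag in bio_tags:
        if tag.startswith("B-"):
            count += 1
            prev_type = tag[2:]
        elif tag.startswith("I-"):
            if prev_type != tag[2:]:
                count += 1  # orphan I- tag: treat it as the start of a new span
            prev_type = tag[2:]
        else:
            prev_type = None  # "O" ends any open span
    return count


def sentence_label(bio_tags: List[str], max_class: int = 3) -> int:
    """Hypothetical bucketing: classes 0..max_class, where the last class
    means "max_class or more entities"."""
    return min(count_entities(bio_tags), max_class)
```

For example, `sentence_label(["B-PER", "I-PER", "O", "B-LOC"])` returns 2, so the same tagged corpus can supervise both the token-level NER head and a sentence-level classifier.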

Chinese Word Segmentation and Named Entity Recognition: A Pragmatic Approach

Computational Linguistics, Vol 31 (4), pp. 531-574, 2005
DOI: 10.1162/089120105775299177
Cited by ~95

Author(s): Jianfeng Gao, Mu Li, Chang-Ning Huang, Andi Wu

Keyword(s): Language Processing, Named Entity Recognition, Word Segmentation, Computer Applications, Entity Recognition, Chinese Word, Chinese Word Segmentation, Pragmatic Approach, Named Entity, Test Sets

This article presents a pragmatic approach to Chinese word segmentation. It differs from most previous approaches mainly in three respects. First, while theoretical linguists have defined Chinese words using various linguistic criteria, Chinese words in this study are defined pragmatically as segmentation units whose definition depends on how they are used and processed in realistic computer applications. Second, we propose a pragmatic mathematical framework in which segmenting known words and detecting unknown words of different types (i.e., morphologically derived words, factoids, named entities, and other unlisted words) can be performed simultaneously in a unified way. These tasks are usually conducted separately in other systems. Finally, we do not assume the existence of a universal word segmentation standard that is application-independent. Instead, we argue for the necessity of multiple segmentation standards due to the pragmatic fact that different natural language processing applications might require different granularities of Chinese words. These pragmatic approaches have been implemented in an adaptive Chinese word segmenter, called MSRSeg, which will be described in detail. It consists of two components: (1) a generic segmenter that is based on the framework of linear mixture models and provides a unified approach to the five fundamental features of word-level Chinese language processing: lexicon word processing, morphological analysis, factoid detection, named entity recognition, and new word identification; and (2) a set of output adaptors for adapting the output of (1) to different application-specific standards. Evaluation on five test sets with different standards shows that the adaptive system achieves state-of-the-art performance on all the test sets.
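The abstract describes a unified framework in which a linear mixture model scores segmentations while known-word segmentation and unknown-word detection happen together. The sketch below illustrates only that general idea under stated assumptions: each candidate word is scored by a weighted sum of feature functions, and dynamic programming selects the highest-scoring segmentation. The two toy features (a lexicon log-probability with a per-character floor for unlisted strings, and an unlisted-word indicator), the weights, and the names `make_features` and `segment` are hypothetical. MSRSeg's actual feature set (covering morphology, factoids, named entities, and new words) and its output adaptors are not reproduced here.

```python
import math
from typing import Callable, Dict, List, Sequence

Feature = Callable[[str], float]


def make_features(lexicon: Dict[str, float], char_prob: float = 1e-3) -> List[Feature]:
    """Build two toy feature functions from a word-count lexicon (hypothetical)."""
    total = sum(lexicon.values())

    def lex_logprob(w: str) -> float:
        # Log relative frequency for listed words; a per-character floor for
        # unlisted strings, so longer unknown strings are not favored.
        if w in lexicon:
            return math.log(lexicon[w] / total)
        return len(w) * math.log(char_prob)

    def is_unlisted(w: str) -> float:
        # Indicator feature; a negative weight turns it into an OOV penalty.
        return 0.0 if w in lexicon else 1.0

    return [lex_logprob, is_unlisted]


def segment(text: str, features: Sequence[Feature], weights: Sequence[float],
            max_len: int = 4) -> List[str]:
    """Return the segmentation maximizing the sum over words of sum_k weight_k * f_k(word)."""
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[j]: best score for text[:j]
    back = [0] * (n + 1)          # back[j]: start index of the last word in that path
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            w = text[i:j]
            score = best[i] + sum(lam * f(w) for lam, f in zip(weights, features))
            if score > best[j]:
                best[j], back[j] = score, i
    # Follow back-pointers from the end of the string to recover the words.
    words: List[str] = []
    j = n
    while j > 0:
        i = back[j]
        words.append(text[i:j])
        j = i
    return words[::-1]
```

With the lexicon `{"北京": 50, "大学": 40, "北京大学": 10}` and weights `[1.0, -5.0]`, `segment("北京大学", ...)` returns `["北京", "大学"]`, while `segment("北京x", ...)` returns `["北京", "x"]`. Raising the count of `北京大学` in the lexicon would make the coarser single-word segmentation win, which shows how the scoring, not a fixed rule, determines granularity.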

Improving Thai word segmentation with Named Entity Recognition

2010 10th International Symposium on Communications and Information Technologies, 2010
DOI: 10.1109/iscit.2010.5665124
Cited by ~5

Author(s): Sayan Tepdang, Choochart Haruechaiyasak, Rachada Kongkachandra

Keyword(s): Named Entity Recognition, Word Segmentation, Entity Recognition, Named Entity

Isarn Dharma Word Segmentation Using a Statistical Approach with Named Entity Recognition

ACM Transactions on Asian and Low-Resource Language Information Processing, Vol 19 (2), pp. 1-16, 2020
DOI: 10.1145/3359990
Cited by ~1

Author(s): Sittichai Somsap, Pusadee Seresangtakul

Keyword(s): Statistical Approach, Named Entity Recognition, Word Segmentation, Entity Recognition, Named Entity

A Comparison of Architectures and Pretraining Methods for Contextualized Multilingual Word Embeddings

Proceedings of the AAAI Conference on Artificial Intelligence, Vol 34 (05), pp. 9090-9097, 2020
DOI: 10.1609/aaai.v34i05.6443

Author(s): Niels Van der Heijden, Samira Abnar, Ekaterina Shutova

Keyword(s): Language Processing, State Of The Art, Named Entity Recognition, Entity Recognition, Word Embeddings, Named Entity, Pos Tagging, Part Of Speech, Joint Training, Comprehensive Comparison

The lack of annotated data in many languages is a well-known challenge within the field of multilingual natural language processing (NLP). Therefore, many recent studies focus on zero-shot transfer learning and joint training across languages to overcome data scarcity for low-resource languages. In this work, we (i) perform a comprehensive comparison of state-of-the-art multilingual word and sentence encoders on the tasks of named entity recognition (NER) and part-of-speech (POS) tagging; and (ii) propose a new method for creating multilingual contextualized word embeddings, compare it to multiple baselines, and show that it performs at or above state-of-the-art level in zero-shot transfer settings. Finally, we show that our method allows for better knowledge sharing across languages in a joint training setting.
