Weakly Supervised Cross-Lingual Named Entity Recognition via Effective Annotation and Representation Projection

For languages with no annotated resources, transferring knowledge from rich-resource languages is an effective solution for named entity recognition (NER). While all existing methods directly transfer from source-learned model to a target language, in this paper, we propose to fine-tune the learned model with a few similar examples given a test case, which could benefit the prediction by leveraging the structural and semantic information conveyed in such similar examples. To this end, we present a meta-learning algorithm to find a good model parameter initialization that could fast adapt to the given test case and propose to construct multiple pseudo-NER tasks for meta-training by computing sentence similarities. To further improve the model's generalization ability across different languages, we introduce a masking scheme and augment the loss function with an additional maximum term during meta-training. We conduct extensive experiments on cross-lingual named entity recognition with minimal resources over five target languages. The results show that our approach significantly outperforms existing state-of-the-art methods across the board.

Download Full-text

Towards a Novel Weakly Supervised Joint Approach of Named Entity Recognition and Normalization for Noisy Text

SSRN Electronic Journal ◽

10.2139/ssrn.3176123 ◽

2018 ◽

Author(s):

Assia Mezhar ◽

Mohammed Ramdani ◽

Amal El Mzabi

Keyword(s):

Named Entity Recognition ◽

Entity Recognition ◽

Named Entity ◽

Joint Approach ◽

Weakly Supervised ◽

Noisy Text

Download Full-text

Finding next of kin: Cross-lingual embedding spaces for related languages

Natural Language Engineering ◽

10.1017/s1351324919000354 ◽

2019 ◽

Vol 26 (2) ◽

pp. 163-182 ◽

Cited By ~ 1

Author(s):

Serge Sharoff

Keyword(s):

Similarity Measure ◽

Named Entity Recognition ◽

Entity Recognition ◽

Levenshtein Distance ◽

Next Of Kin ◽

Named Entity ◽

Genre Classification ◽

Lexical Similarity ◽

Cross Lingual ◽

Embedding Methods

AbstractSome languages have very few NLP resources, while many of them are closely related to better-resourced languages. This paper explores how the similarity between the languages can be utilised by porting resources from better- to lesser-resourced languages. The paper introduces a way of building a representation shared across related languages by combining cross-lingual embedding methods with a lexical similarity measure which is based on the weighted Levenshtein distance. One of the outcomes of the experiments is a Panslavonic embedding space for nine Balto-Slavonic languages. The paper demonstrates that the resulting embedding space helps in such applications as morphological prediction, named-entity recognition and genre classification.

Download Full-text