Cross-Lingual Part-of-Speech Tagging through Ambiguous Learning

We propose a novel approach to cross-lingual part-of-speech tagging and dependency parsing for truly low-resource languages. Our annotation projection-based approach yields tagging and parsing models for over 100 languages. All that is needed are freely available parallel texts, and taggers and parsers for resource-rich languages. The empirical evaluation across 30 test languages shows that our method consistently provides top-level accuracies, close to established upper bounds, and outperforms several competitive baselines.
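
The annotation projection idea can be illustrated with a minimal sketch. It assumes that word alignments between each source sentence and the target sentence are already available, and that competing projections are resolved by a simple majority vote. The voting scheme, the function name `project_tags`, and the toy data are illustrative assumptions, not details taken from the abstract.

```python
from collections import Counter


def project_tags(target_len, sources):
    """Project POS tags onto a target sentence by majority vote.

    sources: list of (source_tags, alignment) pairs, where alignment is a
    list of (source_index, target_index) word-alignment links.
    Returns one tag per target token, or None where no link casts a vote.
    """
    votes = [Counter() for _ in range(target_len)]
    for source_tags, alignment in sources:
        for src_i, tgt_i in alignment:
            votes[tgt_i][source_tags[src_i]] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]


# Hypothetical toy input: three tagged source sentences aligned to one
# four-token target sentence.
english = (["DET", "NOUN", "VERB"], [(0, 0), (1, 1), (2, 2)])
german = (["DET", "NOUN", "VERB"], [(0, 0), (1, 1)])
french = (["DET", "ADJ", "VERB"], [(1, 1), (2, 2)])

projected = project_tags(4, [english, german, french])
```

Target tokens that receive no alignment link stay untagged (`None`), which is one reason projected annotations are usually treated as partial and noisy supervision.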

Unsupervised Cross-Lingual Part-of-Speech Tagging for Truly Low-Resource Scenarios

10.18653/v1/2020.emnlp-main.391 ◽ 2020

Author(s): Ramy Eskander ◽ Smaranda Muresan ◽ Michael Collins

Keyword(s): Low Resource ◽ Part Of Speech Tagging ◽ Part Of Speech ◽ Cross Lingual ◽ Speech Tagging

Token and Type Constraints for Cross-Lingual Part-of-Speech Tagging

Transactions of the Association for Computational Linguistics ◽ 10.1162/tacl_a_00205 ◽ 2013 ◽ Vol 1 ◽ pp. 1-12 ◽ Cited By ~ 23

Author(s): Oscar Täckström ◽ Dipanjan Das ◽ Slav Petrov ◽ Ryan McDonald ◽ Joakim Nivre

Keyword(s): Conditional Random Field ◽ Target Language ◽ Part Of Speech Tagging ◽ Part Of Speech ◽ European Languages ◽ Partially Observed ◽ Resource Poor ◽ Conditional Random Field Model ◽ Cross Lingual ◽ Speech Tagging

We consider the construction of part-of-speech taggers for resource-poor languages. Recently, manually constructed tag dictionaries from Wiktionary and dictionaries projected via bitext have been used as type constraints to overcome the scarcity of annotated data in this setting. In this paper, we show that additional token constraints can be projected from a resource-rich source language to a resource-poor target language via word-aligned bitext. We present several models to this end; in particular a partially observed conditional random field model, where coupled token and type constraints provide a partial signal for training. Averaged across eight previously studied Indo-European languages, our model achieves a 25% relative error reduction over the prior state of the art. We further present successful results on seven additional languages from different families, empirically demonstrating the applicability of coupled token and type constraints across a diverse set of languages.
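
As a rough illustration of how a type constraint (a tag dictionary entry) and a token constraint (a tag projected through word alignment) might be coupled, the sketch below computes the set of tags allowed for a single target token. The specific coupling rule, the function name `allowed_tags`, and the toy dictionary are assumptions made for illustration; the abstract states only that coupled token and type constraints provide a partial signal for training.

```python
ALL_TAGS = frozenset({"NOUN", "VERB", "ADJ", "DET"})


def allowed_tags(word, projected_tag, tag_dict, all_tags=ALL_TAGS):
    """Return the set of tags permitted for one target-language token.

    Assumed coupling rule (illustrative only):
      1. If the word is in the tag dictionary and the projected tag agrees
         with it, keep only the projected tag.
      2. If the word is in the dictionary but the projected tag disagrees
         (or is missing), fall back to the dictionary tags.
      3. If the word is not in the dictionary, trust the projected tag.
      4. With no information at all, allow every tag.
    """
    dict_tags = tag_dict.get(word)
    if dict_tags:
        if projected_tag in dict_tags:
            return {projected_tag}
        return set(dict_tags)
    if projected_tag is not None:
        return {projected_tag}
    return set(all_tags)


# Hypothetical tag dictionary for the target language.
tag_dict = {"book": {"NOUN", "VERB"}, "the": {"DET"}}

agree = allowed_tags("book", "VERB", tag_dict)    # projection agrees
disagree = allowed_tags("book", "ADJ", tag_dict)  # projection is overruled
unknown = allowed_tags("zorp", None, tag_dict)    # nothing known
```

A model trained on partially observed data could then treat each token's allowed set as its supervision, ruling out tags outside the set without committing to a single gold tag.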

Cross-lingual Annotation Projection Is Effective for Neural Part-of-Speech Tagging

10.18653/v1/w19-1425 ◽ 2019 ◽ Cited By ~ 1

Author(s): Matthias Huck ◽ Diana Dutka ◽ Alexander Fraser

Keyword(s): Part Of Speech Tagging ◽ Part Of Speech ◽ Cross Lingual ◽ Speech Tagging

A Cross-lingual Part-of-Speech Tagging for Malay Language

Proceedings of the International Conference on Agents and Artificial Intelligence ◽ 10.5220/0005150602320240 ◽ 2015

Author(s): Norshuhani Zamin ◽ Zainab Abu Bakar

Keyword(s): Part Of Speech Tagging ◽ Part Of Speech ◽ Cross Lingual ◽ Speech Tagging

Lexical Rule and Lexicon Effect for Part of Speech Tagging Bahasa Madura

Matrik Jurnal Manajemen Teknik Informatika dan Rekayasa Komputer ◽ 10.30812/matrik.v18i1.332 ◽ 2018 ◽ Vol 18 (1) ◽ pp. 65-72

Author(s): Nindian Puspa Dewi ◽ Ubaidi Ubaidi

Keyword(s): Text Processing ◽ Part Of Speech Tagging ◽ Pos Tagging ◽ Part Of Speech ◽ Speech Tagging ◽ Bahasa Indonesia

POS tagging is the foundation for developing text processing for a language. In this study we examine how using a lexicon and accounting for morphological changes in words affect the choice of the correct tag for a word. Rules based on word morphology, such as prefixes, suffixes, and infixes, are commonly called lexical rules. This study applies lexical rules learned with the Brill Tagger algorithm. Madurese (Bahasa Madura) is a regional language spoken on Madura Island and several other islands in East Java. The study focuses on Madurese, which has far more affixation variants than Indonesian (Bahasa Indonesia). Here the lexicon is used not only to look up Madurese root words but also as one stage of POS tagging. Experiments with the lexicon reached an accuracy of 86.61%, whereas without the lexicon accuracy reached only 28.95%. We conclude that the lexicon has a strong effect on POS tagging.
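
The interplay between lexicon lookup and affix-based lexical rules can be sketched as follows. This is a simplified, hand-written stand-in: it does not implement the Brill learner, and the function name `tag_word`, the lexicon entries, the affixes, and the default tag are all invented placeholders, not real Madurese data.

```python
def tag_word(word, lexicon, affix_rules, default="X"):
    """Tag one word: lexicon lookup first, then ordered affix rules.

    lexicon: dict mapping a lowercased word form to its tag.
    affix_rules: ordered list of (kind, affix, tag) triples, where kind is
    "prefix" or "suffix"; the first matching rule wins.
    """
    word = word.lower()
    if word in lexicon:
        return lexicon[word]
    for kind, affix, tag in affix_rules:
        if kind == "prefix" and word.startswith(affix):
            return tag
        if kind == "suffix" and word.endswith(affix):
            return tag
    return default


# Invented toy lexicon and affix rules (placeholders only).
lexicon = {"taba": "NOUN", "miru": "VERB"}
affix_rules = [("prefix", "zu", "VERB"), ("suffix", "ix", "ADJ")]

tags = [tag_word(w, lexicon, affix_rules)
        for w in ["Taba", "zutaba", "mirix", "qqq"]]
```

Words found in the lexicon are tagged directly, so the affix rules only handle out-of-lexicon forms. This ordering mirrors the abstract's observation that the lexicon carries most of the tagging accuracy.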