Joint Model of Korean Part-of-Speech Tagging and Dependency Parsing with Partial Tagged Corpus

Joint morphological and syntactic analysis has been proposed as a way of improving parsing accuracy for richly inflected languages. Starting from a transition-based model for joint part-of-speech tagging and dependency parsing, we explore different ways of integrating morphological features into the model. We also investigate the use of rule-based morphological analyzers to provide hard or soft lexical constraints and the use of word clusters to tackle the sparsity of lexical features. Evaluation on five morphologically rich languages (Czech, Finnish, German, Hungarian, and Russian) shows consistent improvements in both morphological and syntactic accuracy for joint prediction over a pipeline model, with further improvements thanks to lexical constraints and word clusters. The final results improve the state of the art in dependency parsing for all languages.
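As a rough illustration of the difference between hard and soft lexical constraints mentioned in this abstract, the sketch below adjusts a tagger's candidate scores using the tag set proposed by a morphological analyzer. It is not the authors' implementation: the function name `constrain_tag_scores`, the `bonus` parameter, and the fallback behaviour for unknown words are all assumptions made for illustration.

```python
def constrain_tag_scores(scores, allowed, mode="hard", bonus=1.0):
    """Adjust tag scores using the tags an analyzer allows for a word.

    scores  -- dict mapping tag -> model score (hypothetical values)
    allowed -- set of tags proposed by a rule-based analyzer (may be empty)
    mode    -- "hard": discard tags the analyzer does not allow
               "soft": keep all tags but add `bonus` to allowed ones
    """
    if not allowed:
        # Assumption: words unknown to the analyzer are left unconstrained.
        return dict(scores)
    if mode == "hard":
        filtered = {t: s for t, s in scores.items() if t in allowed}
        # Assumption: fall back to the full set if the filter removes everything.
        return filtered or dict(scores)
    if mode == "soft":
        return {t: s + (bonus if t in allowed else 0.0) for t, s in scores.items()}
    raise ValueError(f"unknown mode: {mode!r}")


def best_tag(scores):
    """Return the highest-scoring tag."""
    return max(scores, key=scores.get)
```

With a hard constraint the analyzer can override the model outright, whereas a soft constraint only shifts the scores, so a sufficiently confident model prediction can still win.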


To Augment or Not to Augment? A Comparative Study on Text Augmentation Techniques for Low-Resource NLP

Computational Linguistics ◽

10.1162/coli_a_00425 ◽

2021 ◽

pp. 1-38

Author(s):

Gözde Gül Şahin

Keyword(s):

Language Models ◽

Semantic Role ◽

Semantic Role Labeling ◽

Dependency Parsing ◽

Low Resource ◽

Part Of Speech Tagging ◽

High Resource ◽

Part Of Speech ◽

Augmentation Techniques ◽

Speech Tagging

Data-hungry deep neural networks have established themselves as the de facto standard for many NLP tasks, including the traditional sequence tagging ones. Despite their state-of-the-art performance on high-resource languages, they still fall behind their statistical counterparts in low-resource scenarios. One way to counter this problem is text augmentation, i.e., generating new synthetic training data points from existing data. Although NLP has recently seen a large number of text augmentation techniques, the field still lacks a systematic performance analysis on a diverse set of languages and sequence tagging tasks. To fill this gap, we investigate three categories of text augmentation methods, which make changes at the syntax level (e.g., cropping sub-sentences), the token level (e.g., random word insertion), and the character level (e.g., character swapping). We systematically compare the methods on part-of-speech tagging, dependency parsing, and semantic role labeling for a diverse set of language families, using various models, including architectures that rely on pretrained multilingual contextualized language models such as mBERT. Augmentation most significantly improves dependency parsing, followed by part-of-speech tagging and semantic role labeling. We find the tested techniques to be effective on morphologically rich languages in general, rather than on analytic languages such as Vietnamese. Our results suggest that the augmentation techniques can further improve over strong baselines based on mBERT, especially for dependency parsing. We identify the character-level methods as the most consistent performers, while synonym replacement and syntactic augmenters provide inconsistent improvements. Finally, we discuss how the results depend most heavily on the task and language pair (e.g., syntactic-level techniques mostly benefit higher-level tasks and morphologically richer languages) and on the model type (e.g., token-level augmentation provides significant improvements for BPE, while character-level methods give generally higher scores for char and mBERT-based models).
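The character-level category can be illustrated with a minimal sketch of character swapping for a tagged sentence. This is not the paper's code: the function names, the `prob` and `seed` parameters, and the choice to keep the first and last character of each token fixed are assumptions made for illustration. Because the number of tokens does not change, the tag sequence can be reused as-is.

```python
import random


def swap_adjacent_chars(token, rng):
    """Swap one random pair of adjacent interior characters.

    Tokens shorter than four characters are returned unchanged, and the
    first and last characters are never moved (an assumption of this sketch).
    """
    if len(token) < 4:
        return token
    i = rng.randrange(1, len(token) - 2)  # i and i + 1 are both interior
    chars = list(token)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def augment_sentence(tokens, tags, prob=0.3, seed=0):
    """Return a noisy copy of `tokens` plus an unchanged copy of `tags`.

    Each token is perturbed with probability `prob`; the token count is
    preserved, so the labels stay aligned with the tokens.
    """
    rng = random.Random(seed)
    new_tokens = [
        swap_adjacent_chars(t, rng) if rng.random() < prob else t
        for t in tokens
    ]
    return new_tokens, list(tags)
```

Calling `augment_sentence(["The", "parser", "handles", "morphology"], ["DET", "NOUN", "VERB", "NOUN"], prob=1.0)` yields a new training pair in which the longer tokens contain one swapped character pair and the tags are unchanged.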


Women’s Syntactic Resilience and Men’s Grammatical Luck: Gender-Bias in Part-of-Speech Tagging and Dependency Parsing

10.18653/v1/p19-1339 ◽

2019 ◽

Cited By ~ 1

Author(s):

Aparna Garimella ◽

Carmen Banea ◽

Dirk Hovy ◽

Rada Mihalcea

Keyword(s):

Gender Bias ◽

Dependency Parsing ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging


Neural Joint Model for Part-of-Speech Tagging and Entity Extraction

2021 13th International Conference on Machine Learning and Computing ◽

10.1145/3457682.3457718 ◽

2021 ◽

Author(s):

Wazir Ali ◽

Rajesh Kumar ◽

Yong Dai ◽

Jay Kumar ◽

Saifullah Tumrani

Keyword(s):

Joint Model ◽

Entity Extraction ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging


LVBERT: Transformer-Based Model for Latvian Language Understanding

Frontiers in Artificial Intelligence and Applications - Human Language Technologies – The Baltic Perspective ◽

10.3233/faia200610 ◽

2020 ◽

Author(s):

Artūrs Znotiņš ◽

Guntis Barzdiņš

Keyword(s):

State Of The Art ◽

Language Model ◽

Named Entity Recognition ◽

Entity Recognition ◽

Future Research ◽

Dependency Parsing ◽

Named Entity ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging

This paper presents LVBERT, the first publicly available monolingual language model pre-trained for Latvian. We show that LVBERT improves the state of the art for three Latvian NLP tasks: part-of-speech tagging, named entity recognition, and Universal Dependency parsing. We release LVBERT to facilitate future research and downstream applications for Latvian NLP.


A Joint Model for Vietnamese Part-of-Speech Tagging Using Dual Decomposition

Knowledge-Based Information Systems in Practice - Smart Innovation, Systems and Technologies ◽

10.1007/978-3-319-13545-8_20 ◽

2015 ◽

pp. 353-367

Author(s):

Ngo Xuan Bach ◽

Kunihiko Hiraishi ◽

Nguyen Le Minh ◽

Akira Shimazu

Keyword(s):

Joint Model ◽

Dual Decomposition ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging


Exploiting meta features for dependency parsing and part-of-speech tagging

Artificial Intelligence ◽

10.1016/j.artint.2015.09.002 ◽

2016 ◽

Vol 230 ◽

pp. 173-191 ◽

Cited By ~ 3

Author(s):

Wenliang Chen ◽

Min Zhang ◽

Yue Zhang ◽

Xiangyu Duan

Keyword(s):

Dependency Parsing ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging


Lexical Rule and Lexicon Effect for Part of Speech Tagging Bahasa Madura

Matrik Jurnal Manajemen Teknik Informatika dan Rekayasa Komputer ◽

10.30812/matrik.v18i1.332 ◽

2018 ◽

Vol 18 (1) ◽

pp. 65-72

Author(s):

Nindian Puspa Dewi ◽

Ubaidi Ubaidi

Keyword(s):

Text Processing ◽

Part Of Speech Tagging ◽

Pos Tagging ◽

Part Of Speech ◽

Speech Tagging ◽

Bahasa Indonesia

POS tagging is the foundation for developing text processing for a language. In this study, we examine how the use of a lexicon and of morphological changes in words affects the choice of the correct tag for a word. Rules based on word morphology, such as prefixes, suffixes, and infixes, are commonly called lexical rules. This study applies lexical rules learned with the Brill Tagger algorithm. Madurese (Bahasa Madura) is a regional language spoken on Madura Island and several other islands in East Java. The object of this study is Madurese, which has far more affixation variants than Indonesian (Bahasa Indonesia). In this study, the lexicon is used not only to look up Madurese base words but also as one of the stages of POS tagging. Experiments with the lexicon reached an accuracy of 86.61%, whereas without the lexicon accuracy reached only 28.95%. From this it can be concluded that the lexicon has a strong effect on POS tagging.
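The combination of lexicon lookup and affix-based lexical rules can be sketched as follows. This is a simplified illustration, not the Brill Tagger and not the authors' system: the lexicon entries, affix rules, and tag names are invented placeholders rather than real Madurese data, and infix rules are omitted for brevity.

```python
DEFAULT_TAG = "NN"  # assumption: unknown words default to a noun tag


def tag_word(word, lexicon, affix_rules, default=DEFAULT_TAG):
    """Tag one word: lexicon first, then affix rules, then the default tag.

    lexicon     -- dict mapping lowercase word -> tag
    affix_rules -- ordered list of (kind, affix, tag), kind is "prefix" or "suffix"
    """
    w = word.lower()
    if w in lexicon:
        return lexicon[w]
    for kind, affix, tag in affix_rules:
        if len(w) <= len(affix):
            continue  # the affix must attach to a non-empty stem
        if kind == "prefix" and w.startswith(affix):
            return tag
        if kind == "suffix" and w.endswith(affix):
            return tag
    return default


def tag_sentence(words, lexicon, affix_rules):
    """Tag every word in a sentence independently."""
    return [(w, tag_word(w, lexicon, affix_rules)) for w in words]


# Invented placeholder data, for illustration only.
toy_lexicon = {"foo": "NN", "bar": "VB"}
toy_rules = [("prefix", "xa", "VB"), ("suffix", "zi", "JJ")]
```

Passing an empty lexicon to `tag_sentence` leaves only the affix rules and the default tag, which mirrors the with-lexicon versus without-lexicon comparison described in the abstract.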
