Part-of-Speech Tagging

Mapping Intimacies ◽

10.1093/oxfordhb/9780199276349.013.0011 ◽

2012 ◽

Author(s):

Atro Voutilainen

Keyword(s):

Markov Models ◽

Language Model ◽

Language Models ◽

Symbolic Language ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Text Corpora ◽

History Of ◽

General Architecture ◽

Speech Tagging

This article outlines recently used methods for designing part-of-speech taggers: computer programs that assign contextually appropriate grammatical descriptors to words in texts. It begins with a description of the general architecture and task setting, gives an overview of the history of tagging, and describes the central approaches to tagging. These approaches are: taggers based on handwritten local rules, taggers based on n-grams automatically derived from text corpora, taggers based on hidden Markov models, taggers using automatically generated symbolic language models derived using methods from machine learning, taggers based on handwritten global rules, and hybrid taggers, which combine the advantages of handwritten and automatically generated taggers. The article focuses on handwritten tagging rules. Well-tagged training corpora are a valuable resource for testing and improving a language model, and the text corpus reminds the grammarian of any oversights made while designing a rule.
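As an illustration of one of the approaches listed in this abstract, the following is a minimal sketch of a hidden Markov model tagger decoded with the Viterbi algorithm. It is not taken from the article: the tag set, the probability tables, and the `UNSEEN` floor are toy values assumed for the example, where a real tagger would estimate them from a tagged corpus.

```python
import math

# Toy, hand-set parameters (assumptions for illustration, not corpus estimates).
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
EMIT = {
    "DET": {"the": 0.9, "a": 0.1},
    "NOUN": {"dog": 0.4, "walk": 0.2, "park": 0.4},
    "VERB": {"walk": 0.7, "dog": 0.1, "park": 0.2},
}
UNSEEN = 1e-6  # crude floor for word/tag pairs missing from EMIT


def emission(tag, word):
    """Log-probability of `word` given `tag`, with a floor for unseen pairs."""
    return math.log(EMIT[tag].get(word, UNSEEN))


def viterbi(words):
    """Return the most probable tag sequence for `words` under the toy HMM."""
    if not words:
        return []
    # best[t] holds the log-probability of the best path ending in tag t.
    best = {t: math.log(START[t]) + emission(t, words[0]) for t in TAGS}
    backpointers = []
    for word in words[1:]:
        new_best, pointers = {}, {}
        for t in TAGS:
            prev = max(TAGS, key=lambda p: best[p] + math.log(TRANS[p][t]))
            new_best[t] = best[prev] + math.log(TRANS[prev][t]) + emission(t, word)
            pointers[t] = prev
        best = new_best
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag to recover the path.
    tag = max(TAGS, key=lambda t: best[t])
    path = [tag]
    for pointers in reversed(backpointers):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]


print(viterbi(["the", "dog", "walk"]))
```

Log-probabilities are used so that long sentences do not underflow; the ambiguous words "dog" and "walk" are resolved by the transition probabilities of their neighbours.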


A Text Categorization Model Based on Hidden Markov Models

Proceedings of the Annual Conference of CAIS / Actes du congrès annuel de l'ACSI ◽

10.29173/cais539 ◽

2013 ◽

Cited By ~ 1

Author(s):

Kwan Yi ◽

Jamshid Beheshti

Keyword(s):

Text Categorization ◽

Classification Scheme ◽

Markov Models ◽

Hidden Markov ◽

Part Of Speech Tagging ◽

Digital Documents ◽

Part Of Speech ◽

Speech Tagging ◽

Standard Library ◽

Categorization Model

The hidden Markov model (HMM) has been successfully used for speech recognition, part-of-speech tagging, and pattern recognition. In this study, we apply the HMM to automatically categorize digital documents into a standard library classification scheme. In the proposed framework, an HMM-based system is viewed as a model to generate a list of words and each document is seen as...


To Augment or Not to Augment? A Comparative Study on Text Augmentation Techniques for Low-Resource NLP

Computational Linguistics ◽

10.1162/coli_a_00425 ◽

2021 ◽

pp. 1-38

Author(s):

Gözde Gül Şahin

Keyword(s):

Language Models ◽

Semantic Role ◽

Semantic Role Labeling ◽

Dependency Parsing ◽

Low Resource ◽

Part Of Speech Tagging ◽

High Resource ◽

Part Of Speech ◽

Augmentation Techniques ◽

Speech Tagging

Data-hungry deep neural networks have established themselves as the de facto standard for many NLP tasks, including the traditional sequence tagging ones. Despite their state-of-the-art performance on high-resource languages, they still fall behind their statistical counterparts in low-resource scenarios. One methodology to counter this problem is text augmentation, i.e., generating new synthetic training data points from existing data. Although NLP has recently witnessed a large number of textual augmentation techniques, the field still lacks a systematic performance analysis on a diverse set of languages and sequence tagging tasks. To fill this gap, we investigate three categories of text augmentation methodologies which perform changes on the syntax (e.g., cropping sub-sentences), token (e.g., random word insertion) and character (e.g., character swapping) levels. We systematically compare the methods on part-of-speech tagging, dependency parsing and semantic role labeling for a diverse set of language families using various models, including architectures that rely on pretrained multilingual contextualized language models such as mBERT. Augmentation most significantly improves dependency parsing, followed by part-of-speech tagging and semantic role labeling. We find the experimented techniques to be effective on morphologically rich languages in general rather than analytic languages such as Vietnamese. Our results suggest that the augmentation techniques can further improve over strong baselines based on mBERT, especially for dependency parsing. We identify the character-level methods as the most consistent performers, while synonym replacement and syntactic augmenters provide inconsistent improvements. Finally, we discuss that the results most heavily depend on the task, language pair (e.g., syntactic-level techniques mostly benefit higher-level tasks and morphologically richer languages), and the model type (e.g., token-level augmentation provides significant improvements for BPE, while character-level ones give generally higher scores for char and mBERT based models).
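Two of the augmentation families named in this abstract, character swapping and random word insertion, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the `vocabulary` argument, and the use of a seeded `random.Random` are choices made for the example, and the sketch leaves out how tag labels would be kept aligned with the modified tokens.

```python
import random


def swap_adjacent_chars(word, rng):
    """Character-level augmentation: swap one random pair of adjacent characters."""
    if len(word) < 2:
        return word  # nothing to swap
    i = rng.randrange(len(word) - 1)
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def insert_random_word(tokens, vocabulary, rng):
    """Token-level augmentation: insert one word from `vocabulary` at a random position."""
    position = rng.randrange(len(tokens) + 1)
    return tokens[:position] + [rng.choice(vocabulary)] + tokens[position:]


rng = random.Random(0)  # seeded so that runs are repeatable
sentence = ["taggers", "assign", "labels"]
print([swap_adjacent_chars(token, rng) for token in sentence])
print(insert_random_word(sentence, ["often", "also"], rng))
```

Both functions return new objects and leave their inputs unchanged, so the original sentence can be kept alongside its synthetic variants.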


Amazigh Part-of-Speech Tagging Using Markov Models and Decision Trees

International Journal of Computer Science and Information Technology ◽

10.5121/ijcsit.2016.8505 ◽

2016 ◽

Vol 8 (5) ◽

pp. 61-71 ◽

Cited By ~ 3

Author(s):

Samir AMRI ◽

Lahbib ZENKOUAR ◽

Mohamed OUTAHAJALA

Keyword(s):

Decision Trees ◽

Markov Models ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging


Morpheme based Language Model for Part-of-Speech Tagging

Polibits ◽

10.17562/pb-38-2 ◽

2008 ◽

Vol 38 ◽

pp. 19-25 ◽

Cited By ~ 6

Author(s):

S. Lakshmana Pandian ◽

T.V. Geetha

Keyword(s):

Language Model ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging


Self-organizing Markov models and their application to part-of-speech tagging

10.3115/1075096.1075134 ◽

2003 ◽

Cited By ~ 3

Author(s):

Jin-Dong Kim ◽

Hae-Chang Rim ◽

Jun'ichi Tsujii

Keyword(s):

Markov Models ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging ◽

Self Organizing


LVBERT: Transformer-Based Model for Latvian Language Understanding

Frontiers in Artificial Intelligence and Applications - Human Language Technologies – The Baltic Perspective ◽

10.3233/faia200610 ◽

2020 ◽

Author(s):

Artūrs Znotiņš ◽

Guntis Barzdiņš

Keyword(s):

State Of The Art ◽

Language Model ◽

Named Entity Recognition ◽

Entity Recognition ◽

Future Research ◽

Dependency Parsing ◽

Named Entity ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging

This paper presents LVBERT – the first publicly available monolingual language model pre-trained for Latvian. We show that LVBERT improves the state-of-the-art for three Latvian NLP tasks including Part-of-Speech tagging, Named Entity Recognition and Universal Dependency parsing. We release LVBERT to facilitate future research and downstream applications for Latvian NLP.


Lexicalized hidden Markov models for part-of-speech tagging

Proceedings of the 18th conference on Computational linguistics ◽

10.3115/990820.990890 ◽

2000 ◽

Cited By ~ 13

Author(s):

Sang-Zoo Lee ◽

Jun-ichi Tsujii ◽

Hae-Chang Rim

Keyword(s):

Hidden Markov Models ◽

Markov Models ◽

Hidden Markov ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging


Two-Stage Model Selection with Parameters Weighted Hidden Markov Models and Likelihood Ratio for Part-of-Speech Tagging

Neural Network World ◽

10.14311/nnw.2012.22.014 ◽

2012 ◽

Vol 22 (3) ◽

pp. 245-262

Author(s):

Shichang Sun ◽

Hongbo Liu ◽

Pixi Zhao ◽

Hongfei Lin

Keyword(s):

Model Selection ◽

Hidden Markov Models ◽

Likelihood Ratio ◽

Markov Models ◽

Hidden Markov ◽

Stage Model ◽

Two Stage ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging


Lexical Rule and Lexicon Effect for Part of Speech Tagging Bahasa Madura

Matrik Jurnal Manajemen Teknik Informatika dan Rekayasa Komputer ◽

10.30812/matrik.v18i1.332 ◽

2018 ◽

Vol 18 (1) ◽

pp. 65-72

Author(s):

Nindian Puspa Dewi ◽

Ubaidi Ubaidi

Keyword(s):

Text Processing ◽

Part Of Speech Tagging ◽

Pos Tagging ◽

Part Of Speech ◽

Speech Tagging ◽

Bahasa Indonesia

POS tagging is the foundation for developing text processing for a language. In this study we examine how the use of a lexicon and morphological changes to words affect the choice of the correct tag for a word. Rules based on word morphology, such as prefixes, suffixes, and infixes, are commonly called lexical rules. This study applies lexical rules produced by a learner using the Brill Tagger algorithm. Madurese (Bahasa Madura) is a regional language spoken on Madura Island and several other islands in East Java. The object of this study is Madurese, which has far more affixation variants than Indonesian (Bahasa Indonesia). In this study, the lexicon is used not only to look up Madurese base words but also as one stage of POS tag assignment. Experiments using the lexicon reached an accuracy of 86.61%, whereas without the lexicon accuracy reached only 28.95%. From this it can be concluded that the lexicon has a strong effect on POS tagging.
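The two-stage idea in this abstract, a lexicon lookup backed by affix-based lexical rules, can be sketched as below. This is an assumption-laden illustration and not the study's system: the lexicon entries, suffix rules, and default tag are hypothetical English placeholders, and the rules are written by hand, whereas the study learns its lexical rules with the Brill Tagger algorithm.

```python
# Hypothetical toy resources; a real system would load a full lexicon
# and use rules learned from tagged data.
LEXICON = {"the": "DET", "run": "VERB", "house": "NOUN"}
SUFFIX_RULES = [("ing", "VERB"), ("ly", "ADV"), ("s", "NOUN")]
DEFAULT_TAG = "NOUN"


def tag_word(word):
    """Tag one word: lexicon first, then suffix rules, then a default tag."""
    lower = word.lower()
    if lower in LEXICON:
        return LEXICON[lower]
    for suffix, tag in SUFFIX_RULES:
        if lower.endswith(suffix):
            return tag
    return DEFAULT_TAG


def tag_sentence(tokens):
    """Return (token, tag) pairs for a list of tokens."""
    return [(token, tag_word(token)) for token in tokens]


print(tag_sentence(["The", "house", "quickly", "running"]))
```

Emptying `LEXICON` leaves only the affix rules and the default tag, which is one way to reproduce the with-lexicon versus without-lexicon comparison the abstract reports.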
