FLORS: Fast and Simple Domain Adaptation for Part-of-Speech Tagging

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00162 ◽

2014 ◽

Vol 2 ◽

pp. 15-26 ◽

Cited By ~ 9

Author(s):

Tobias Schnabel ◽

Hinrich Schütze

Keyword(s):

Domain Adaptation ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Unknown Words ◽

Speech Tagging

We present FLORS, a new part-of-speech tagger for domain adaptation. FLORS uses robust representations that work especially well for unknown words and for known words with unseen tags. FLORS is simpler and faster than previous domain adaptation methods, yet it has significantly better accuracy than several baselines.

Improving performance of natural language processing part-of-speech tagging on clinical narratives through domain adaptation

Journal of the American Medical Informatics Association ◽

10.1136/amiajnl-2012-001453 ◽

2013 ◽

Vol 20 (5) ◽

pp. 931-939 ◽

Cited By ~ 16

Author(s):

Jeffrey P Ferraro ◽

Hal Daumé ◽

Scott L DuVall ◽

Wendy W Chapman ◽

Henk Harkema ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Domain Adaptation ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging

Domain adaptation for part-of-speech tagging of noisy user-generated text

10.18653/v1/n19-1345 ◽

2019 ◽

Cited By ~ 1

Author(s):

Luisa März ◽

Dietrich Trautmann ◽

Benjamin Roth

Keyword(s):

Domain Adaptation ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging

Part-of-speech tagging for Chinese unknown words in a domain-specific small corpus using morphological and contextual rules

Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering (NLPKE-2010) ◽

10.1109/nlpke.2010.5587771 ◽

2010 ◽

Author(s):

Tao-Hsing Chang ◽

Fu-Yuan Hsu ◽

Chia-Hoang Lee ◽

Hahn-Ming Lee

Keyword(s):

Domain Specific ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Unknown Words ◽

Speech Tagging

Pelabelan Kelas Kata Bahasa Jawa Menggunakan Hidden Markov Model

Mobile and Forensics ◽

10.12928/mf.v2i2.2450 ◽

2020 ◽

Vol 2 (2) ◽

pp. 71-83

Author(s):

Mohammad Mursyit ◽

Aji Prasetya Wibawa ◽

Ilham Ari Elbaith Zaeni ◽

Harits Ar Rosyid

Keyword(s):

Short Stories ◽

Markov Model ◽

Hidden Markov Model ◽

Hidden Markov ◽

Part Of Speech Tagging ◽

Pos Tagging ◽

Part Of Speech ◽

Improve Accuracy ◽

Unknown Words ◽

Speech Tagging

Part-of-speech (POS) tagging is the process of automatically assigning a label to each word in a sentence. This study uses the Hidden Markov Model (HMM) algorithm for POS tagging, with unknown words handled by Most Probable POS-Tag. The dataset consists of 10 Javanese short stories totaling 10,180 words, annotated with a Javanese tagset. Two scenarios are evaluated: the first uses the HMM without any treatment for unknown words, and the second uses the HMM with Most Probable POS-Tag for unknown words. The first scenario yields an accuracy of 45.5% and the second an accuracy of 70.78%. Most Probable POS-Tag improves POS tagging accuracy but does not always produce correct labels. It can remove zero-valued probabilities from the HMM POS tagger. The results indicate that HMM-based POS tagging is influenced by the treatment of unknown words, the vocabulary, and the relationships between word labels in the dataset.
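The zero-probability problem described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes that "Most Probable POS-Tag" means an unknown word is emitted only by the tag that is most frequent in the training data, and the function names (`train`, `viterbi`), the add-one transition smoothing, and the toy English corpus in the usage note are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict

def train(tagged_sentences):
    """Collect count-based HMM statistics from [(word, tag), ...] sentences."""
    trans = defaultdict(Counter)   # trans[prev_tag][tag] = count
    emit = defaultdict(Counter)    # emit[tag][word] = count
    tag_counts = Counter()
    for sent in tagged_sentences:
        prev = "<s>"               # sentence-start pseudo tag
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            tag_counts[tag] += 1
            prev = tag
    vocab = {w for counts in emit.values() for w in counts}
    return trans, emit, tag_counts, vocab

def viterbi(words, model, unknown_fallback=True):
    """Return (best tag path, its probability) for a non-empty word list."""
    trans, emit, tag_counts, vocab = model
    tags = list(tag_counts)
    most_probable = tag_counts.most_common(1)[0][0]

    def p_emit(tag, word):
        if word not in vocab:
            # Assumed fallback: an unknown word is emitted only by the
            # most frequent training tag; without it, every path gets 0.
            if unknown_fallback:
                return 1.0 if tag == most_probable else 0.0
            return 0.0
        return emit[tag][word] / tag_counts[tag]

    def p_trans(prev, tag):
        # Add-one smoothing over tags (an illustrative choice).
        total = sum(trans[prev].values())
        return (trans[prev][tag] + 1) / (total + len(tags))

    # best[tag] = (probability of best path ending in tag, that path)
    best = {t: (p_trans("<s>", t) * p_emit(t, words[0]), [t]) for t in tags}
    for word in words[1:]:
        step = {}
        for t in tags:
            prob, path = max(
                ((best[p][0] * p_trans(p, t) * p_emit(t, word), best[p][1])
                 for p in tags),
                key=lambda item: item[0],
            )
            step[t] = (prob, path + [t])
        best = step
    prob, path = max(best.values(), key=lambda item: item[0])
    return path, prob
```

With a toy corpus such as `[[("the","DET"),("dog","NOUN"),("runs","VERB")], [("the","DET"),("cat","NOUN"),("sleeps","VERB")], [("dogs","NOUN"),("chase","VERB"),("cats","NOUN")]]`, tagging `["the", "wug", "sleeps"]` with `unknown_fallback=False` gives every path probability zero, while the fallback lets the unseen word take the most frequent tag so that a nonzero best path exists.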

Joint Learning of Pre-Trained and Random Units for Domain Adaptation in Part-of-Speech Tagging

10.18653/v1/n19-1416 ◽

2019 ◽

Cited By ~ 1

Author(s):

Sara Meftah ◽

Youssef Tamaazousti ◽

Nasredine Semmar ◽

Hassane Essafi ◽

Fatiha Sadat

Keyword(s):

Domain Adaptation ◽

Joint Learning ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging

Hidden Markov Based Mathematical Model dedicated to Extract Ingredients from Recipe Text

10.31219/osf.io/gvj45 ◽

2019 ◽

Author(s):

Zied Baklouti

Keyword(s):

Mathematical Model ◽

Language Processing ◽

Hidden Markov ◽

Stochastic Methods ◽

Part Of Speech Tagging ◽

Pos Tagging ◽

Part Of Speech ◽

Unknown Words ◽

High Level ◽

Speech Tagging

Natural Language Processing (NLP) is a branch of machine learning that gives machines the ability to decode human languages. Part-of-speech tagging (POS tagging) is a preprocessing task that requires an annotated corpus. Rule-based and stochastic methods have shown great results for POS tag prediction. In this work, I built a mathematical model based on Hidden Markov structures and obtained high accuracy for ingredients extracted from recipe text, a performance greater than what traditional methods could achieve without accounting for unknown words.

Part-Of-Speech Tagging and the Recognition of the Korean Unknown-words Based on Machine Learning

The KIPS Transactions PartB ◽

10.3745/kipstb.2011.18b.1.045 ◽

2011 ◽

Vol 18B (1) ◽

pp. 45-50 ◽

Cited By ~ 1

Keyword(s):

Machine Learning ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Unknown Words ◽

Speech Tagging

Domain Adaptation for Part-of-Speech Tagging of Indonesian Text Using Affix Information

Procedia Computer Science ◽

10.1016/j.procs.2021.01.050 ◽

2021 ◽

Vol 179 ◽

pp. 640-647

Author(s):

Aditya Maulana ◽

Ade Romadhony

Keyword(s):

Domain Adaptation ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging

Domain Adaptation in Part-of-Speech Tagging

Emerging Applications of Natural Language Processing ◽

10.4018/978-1-4666-2169-5.ch003 ◽

2013 ◽

pp. 52-72

Author(s):

Miriam Lúcia Domingues ◽

Eloi Luiz Favero

Keyword(s):

Language Processing ◽

Domain Adaptation ◽

Scientific Texts ◽

New Words ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Linguistic Structures ◽

Speech Tagging ◽

Accuracy Rates

Many Natural Language Processing (NLP) applications rely on the accuracy of part-of-speech taggers. Although many taggers have good accuracy for the domain in which they were trained, that accuracy typically does not carry over to new domains because of problems such as different linguistic structures or the presence of new words. The need for domain adaptation has emerged as a new challenge for part-of-speech tagging and for most NLP tasks. The goal of this chapter is to highlight solutions that handle labeled and unlabeled data, methods that use such data to solve the domain adaptation problem, and to present a case study that has achieved significant accuracy rates on tagging journalistic and scientific texts.

Lexical Rule and Lexicon Effect for Part of Speech Tagging Bahasa Madura

Matrik Jurnal Manajemen Teknik Informatika dan Rekayasa Komputer ◽

10.30812/matrik.v18i1.332 ◽

2018 ◽

Vol 18 (1) ◽

pp. 65-72

Author(s):

Nindian Puspa Dewi ◽

Ubaidi Ubaidi

Keyword(s):

Text Processing ◽

Part Of Speech Tagging ◽

Pos Tagging ◽

Part Of Speech ◽

Speech Tagging ◽

Bahasa Indonesia

POS tagging is the foundation for developing text processing for a language. In this study we examine how the use of a lexicon and morphological changes in words affect the choice of the correct tag for a word. Rules based on word morphology, such as prefixes, suffixes, and infixes, are commonly called lexical rules. This study applies lexical rules learned with the Brill Tagger algorithm. Madurese (Bahasa Madura) is a regional language spoken on Madura Island and several other islands in East Java. The object of this study is Madurese, which has far more affixation variants than Indonesian (Bahasa Indonesia). In this study, the lexicon is used not only to look up Madurese root words but also as one of the stages of POS tagging. Experiments using the lexicon reached an accuracy of 86.61%, whereas without the lexicon the accuracy reached only 28.95%. From this it can be concluded that the lexicon has a strong effect on POS tagging.
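The combination of a lexicon stage and affix-based lexical rules described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the rules here are hand-written stand-ins rather than rules learned by a Brill tagger, the toy English entries in `LEXICON` and `SUFFIX_RULES` are hypothetical and not taken from the study's Madurese data, and the function name `tag_word` is invented for this example.

```python
# Hypothetical toy lexicon mapping known words to tags.
LEXICON = {"walk": "VERB", "house": "NOUN", "quick": "ADJ"}

# Hand-written (suffix, tag) pairs standing in for learned lexical rules.
SUFFIX_RULES = [("ly", "ADV"), ("ness", "NOUN"), ("ed", "VERB")]

def tag_word(word, use_lexicon=True, default="NOUN"):
    """Tag one word: lexicon lookup first, then affix rules, then a default."""
    if use_lexicon and word in LEXICON:
        return LEXICON[word]
    for suffix, tag in SUFFIX_RULES:
        # Require a non-empty stem so the suffix alone does not match.
        if word.endswith(suffix) and len(word) > len(suffix):
            return tag
    return default

def tag_sentence(words, use_lexicon=True):
    """Tag each word independently and return (word, tag) pairs."""
    return [(w, tag_word(w, use_lexicon=use_lexicon)) for w in words]
```

Calling `tag_sentence(["quick", "walked", "quickly"])` uses the lexicon for the first word and suffix rules for the other two; passing `use_lexicon=False` leaves only the rules and the default tag, which mirrors the with-lexicon and without-lexicon comparison in spirit only.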
