Building Balinese Part-of-Speech Tagger Using Hidden Markov Model (HMM)

Part-of-speech (POS) tagging, or word class labeling, is the process of assigning a word class to each word in a sentence. Previous research on POS taggers, especially for Indonesian, has used various approaches and obtained high accuracy. However, few researchers have built a POS tagger for Balinese. In this article, we build a POS tagger for Balinese using a probabilistic approach, specifically the Hidden Markov Model (HMM). The HMM was selected to deal with ambiguity because it gives higher accuracy and fast processing time. We used k-fold cross-validation (k = 10) on a tagged corpus of around 3,669 tokens with 21 tags. In the experiments conducted, the HMM method obtained an accuracy of 68.56%.
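
To make the HMM approach described in this abstract concrete, the following is a minimal sketch of a bigram HMM tagger with Viterbi decoding. It is not the authors' implementation: the toy English corpus, the three tags, the add-one smoothing, and the names `train_hmm`, `log_prob`, and `viterbi` are all assumptions chosen for illustration, and the Balinese corpus and its 21-tag tagset are not reproduced here.

```python
import math
from collections import Counter, defaultdict

def train_hmm(tagged_sentences):
    """Count tag transitions and word emissions from tagged sentences."""
    transitions = defaultdict(Counter)  # previous tag -> counts of next tag
    emissions = defaultdict(Counter)    # tag -> counts of emitted words
    vocab = set()
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start marker
        for word, tag in sentence:
            transitions[prev][tag] += 1
            emissions[tag][word] += 1
            vocab.add(word)
            prev = tag
    return transitions, emissions, sorted(emissions), vocab

def log_prob(counter, key, num_outcomes):
    """Add-one smoothed log probability (smoothing is an assumption here)."""
    total = sum(counter.values())
    return math.log((counter[key] + 1) / (total + num_outcomes))

def viterbi(words, model):
    """Return the most probable tag sequence for words under the model."""
    if not words:
        return []
    transitions, emissions, tags, vocab = model
    n_tags, n_words = len(tags), len(vocab) + 1  # +1 slot for unseen words
    # best[i][t] = (log score of best path ending in tag t at i, previous tag)
    best = [{}]
    for t in tags:
        score = (log_prob(transitions["<s>"], t, n_tags)
                 + log_prob(emissions[t], words[0], n_words))
        best[0][t] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for t in tags:
            prev = max(tags, key=lambda p: best[i - 1][p][0]
                       + log_prob(transitions[p], t, n_tags))
            score = (best[i - 1][prev][0]
                     + log_prob(transitions[prev], t, n_tags)
                     + log_prob(emissions[t], words[i], n_words))
            best[i][t] = (score, prev)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]

# Toy corpus (hypothetical data, not from the paper).
toy_corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
model = train_hmm(toy_corpus)
predicted = viterbi(["a", "cat", "barks"], model)
```

Under 10-fold cross-validation as reported in the abstract, a tagger like this would be trained on nine folds and scored on the held-out fold, with accuracy averaged over the ten runs.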

POS Tagging Bahasa Indonesia Dengan HMM dan Rule Based

Jurnal Informatika ◽ 10.21460/inf.2012.82.125 ◽ 2013 ◽ Vol 8 (2) ◽ Cited By ~ 1

Author(s): Kathryn Widhiyanti ◽ Agus Harjoko

Keyword(s): Markov Model ◽ Hidden Markov Model ◽ Hidden Markov ◽ Word Class ◽ Rule Based ◽ Part Of Speech Tagging ◽ Pos Tagging ◽ Part Of Speech ◽ Class Labelling ◽ Speech Tagging

This research performs part-of-speech tagging (POS tagging) for Indonesian text, in support of other natural language processing tasks such as parsing Indonesian text. POS tagging is the automated process of labelling each word in a sentence with its word class (Jurafsky and Martin, 2000). The central problem is how to obtain accurate word class labels at the sentence level. The authors propose a method that combines a Hidden Markov Model with a rule-based method. The expected outcome is better word class labelling accuracy than that obtained using the Hidden Markov Model alone. The labels produced by the Hidden Markov Model are refined by validating them against rules that are composed automatically from the corpus used. In experiments on several POS-tagged documents, the Hidden Markov Model reached a highest accuracy of 100% on text identical to text in the corpus. On text that differs from the reference corpus but uses words contained in the corpus, the highest accuracy was 92.2%.
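
The abstract does not say what form the automatically composed rules take, so the following is only an illustrative stand-in, not the authors' method. It assumes the simplest kind of corpus-derived rule: a word that carries exactly one tag everywhere in the corpus always receives that tag, overriding the HMM output. The function names, the tag labels, and the toy Indonesian sentences are hypothetical.

```python
from collections import defaultdict

def learn_lexical_rules(tagged_sentences):
    """Derive one rule per word that is unambiguous in the corpus."""
    seen = defaultdict(set)  # word -> set of tags observed for it
    for sentence in tagged_sentences:
        for word, tag in sentence:
            seen[word].add(tag)
    # Keep only words observed with a single tag (assumed rule form).
    return {w: next(iter(tags)) for w, tags in seen.items() if len(tags) == 1}

def refine(words, hmm_tags, rules):
    """Override HMM tags wherever a lexical rule applies."""
    return [rules.get(word, tag) for word, tag in zip(words, hmm_tags)]

# Toy corpus (hypothetical): "bisa" is ambiguous, so no rule is learned for it.
corpus = [
    [("saya", "PRON"), ("bisa", "MD"), ("makan", "VERB"), ("nasi", "NOUN")],
    [("bisa", "NOUN"), ("ular", "NOUN")],
]
rules = learn_lexical_rules(corpus)

# Suppose the HMM mislabelled "saya"; the rule corrects it, "bisa" is untouched.
refined = refine(["saya", "bisa", "makan"], ["NOUN", "MD", "VERB"], rules)
```

Ambiguous words are deliberately left to the HMM in this sketch, since a single-tag rule cannot resolve them.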

Twitter Storytelling Generator Using Latent Dirichlet Allocation and Hidden Markov Model POS-TAG (Part-of-Speech Tagging)

2019 3rd International Conference on Informatics and Computational Sciences (ICICoS) ◽ 10.1109/icicos48119.2019.8982411 ◽ 2019

Author(s): Yasir Abdur Rohman ◽ Retno Kusumaningrum

Keyword(s): Markov Model ◽ Hidden Markov Model ◽ Latent Dirichlet Allocation ◽ Hidden Markov ◽ Part Of Speech Tagging ◽ Part Of Speech ◽ Speech Tagging ◽ Dirichlet Allocation

Part-of-speech tagging based on hidden Markov model assuming joint independence

10.3115/1075218.1075252 ◽ 2000 ◽ Cited By ~ 3

Author(s): Sang-Zoo Lee ◽ Jun-ichi Tsujii ◽ Hae-Chang Rim

Keyword(s): Markov Model ◽ Hidden Markov Model ◽ Hidden Markov ◽ Part Of Speech Tagging ◽ Part Of Speech ◽ Speech Tagging

Twitter part-of-speech tagging using pre-classification Hidden Markov model

2012 IEEE International Conference on Systems, Man, and Cybernetics (SMC) ◽ 10.1109/icsmc.2012.6377881 ◽ 2012 ◽ Cited By ~ 6

Author(s): Shichang Sun ◽ Hongbo Liu ◽ Hongfei Lin ◽ Ajith Abraham

Keyword(s): Markov Model ◽ Hidden Markov Model ◽ Hidden Markov ◽ Part Of Speech Tagging ◽ Part Of Speech ◽ Speech Tagging

A comparative study of hidden Markov model and conditional random fields on a Yorùba part-of-speech tagging task

2017 International Conference on Computing Networking and Informatics (ICCNI) ◽ 10.1109/iccni.2017.8123784 ◽ 2017

Author(s): Ikechukwu I. Ayogu ◽ Adebayo O. Adetunmbi ◽ Bolanle A. Ojokoh ◽ Samuel A. Oluwadare

Keyword(s): Comparative Study ◽ Markov Model ◽ Hidden Markov Model ◽ Random Fields ◽ Conditional Random Fields ◽ Hidden Markov ◽ Part Of Speech Tagging ◽ Part Of Speech ◽ Speech Tagging

Named entity recognition based on a Hidden Markov Model in part-of-speech tagging

2008 First International Conference on the Applications of Digital Information and Web Technologies (ICADIWT) ◽ 10.1109/icadiwt.2008.4664380 ◽ 2008 ◽ Cited By ~ 4

Author(s): Ryohei Ageishi ◽ Takao Miura

Keyword(s): Markov Model ◽ Hidden Markov Model ◽ Hidden Markov ◽ Named Entity Recognition ◽ Entity Recognition ◽ Named Entity ◽ Part Of Speech Tagging ◽ Part Of Speech ◽ Speech Tagging

Part-of-speech Tagging and Named Entity Recognition Using Improved Hidden Markov Model and Bloom Filter

2018 International Conference on Computing, Power and Communication Technologies (GUCON) ◽ 10.1109/gucon.2018.8674901 ◽ 2018 ◽ Cited By ~ 1

Author(s): Ankita ◽ K. A. Abdul Nazeer

Keyword(s): Markov Model ◽ Hidden Markov Model ◽ Hidden Markov ◽ Named Entity Recognition ◽ Bloom Filter ◽ Entity Recognition ◽ Named Entity ◽ Part Of Speech Tagging ◽ Part Of Speech ◽ Speech Tagging

A hidden Markov model for Persian part-of-speech tagging

Procedia Computer Science ◽ 10.1016/j.procs.2010.12.160 ◽ 2011 ◽ Vol 3 ◽ pp. 977-981 ◽ Cited By ~ 4

Author(s): Morteza Okhovvat ◽ Behrouz Minaei Bidgoli

Keyword(s): Markov Model ◽ Hidden Markov Model ◽ Hidden Markov ◽ Part Of Speech Tagging ◽ Part Of Speech ◽ Speech Tagging

Hidden Markov model-based Korean part-of-speech tagging considering high agglutinativity, word-spacing, and lexical correlativity

10.3115/1075218.1075266 ◽ 2000 ◽ Cited By ~ 4

Author(s): Sang-Zoo Lee ◽ Jun-ichi Tsujii ◽ Hae-Chang Rim

Keyword(s): Markov Model ◽ Hidden Markov Model ◽ Hidden Markov ◽ Part Of Speech Tagging ◽ Model Based ◽ Part Of Speech ◽ Speech Tagging

Pelabelan Kelas Kata Bahasa Jawa Menggunakan Hidden Markov Model

Mobile and Forensics ◽ 10.12928/mf.v2i2.2450 ◽ 2020 ◽ Vol 2 (2) ◽ pp. 71-83

Author(s): Mohammad Mursyit ◽ Aji Prasetya Wibawa ◽ Ilham Ari Elbaith Zaeni ◽ Harits Ar Rosyid

Keyword(s): Short Stories ◽ Markov Model ◽ Hidden Markov Model ◽ Hidden Markov ◽ Part Of Speech Tagging ◽ Pos Tagging ◽ Part Of Speech ◽ Improve Accuracy ◽ Unknown Words ◽ Speech Tagging

Part-of-speech tagging (POS tagging) is the process of automatically assigning a label to each word in a sentence. This study uses the Hidden Markov Model (HMM) algorithm for POS tagging. Unknown words are handled with the Most Probable POS-Tag method. The dataset consists of 10 Javanese short stories comprising 10,180 words annotated with a Javanese tagset. POS tagging is run under two scenarios. The first uses the HMM algorithm without any treatment for unknown words. The second uses HMM together with Most Probable POS-Tag for unknown words. The results show that the first scenario produces an accuracy of 45.5% and the second an accuracy of 70.78%. Most Probable POS-Tag can improve POS tagging accuracy but does not always produce correct labels. It can also remove zero-valued probabilities from the HMM tagger. These results indicate that POS tagging with the Hidden Markov Model is influenced by the treatment of unknown words, the vocabulary, and the relationships between word labels in the dataset.
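
The abstract does not give the exact formulation of Most Probable POS-Tag, so the sketch below is an assumption: an unknown word is treated as if it were emitted with probability 1 by the tag that is most frequent in the training data. It illustrates why a plain HMM assigns zero probability to every tag for an unseen word and how such a fallback removes those zeros. The function names, tag labels, and toy Javanese sentences are hypothetical and do not come from the paper's dataset or tagset.

```python
from collections import Counter, defaultdict

def count_emissions(tagged_sentences):
    """Count how often each tag emits each word, plus total tag frequencies."""
    emissions = defaultdict(Counter)  # tag -> word counts
    tag_totals = Counter()            # tag -> number of tokens with that tag
    for sentence in tagged_sentences:
        for word, tag in sentence:
            emissions[tag][word] += 1
            tag_totals[tag] += 1
    return emissions, tag_totals

def emission_prob(word, tag, emissions, tag_totals, use_fallback):
    """Unsmoothed P(word | tag), with an optional most-probable-tag fallback."""
    known = any(word in emissions[t] for t in tag_totals)
    if known or not use_fallback:
        # Plain relative frequency: zero for every tag if the word is unseen.
        return emissions[tag][word] / tag_totals[tag]
    # Assumed fallback: the unknown word goes to the most frequent tag.
    most_probable = tag_totals.most_common(1)[0][0]
    return 1.0 if tag == most_probable else 0.0

# Toy training data (hypothetical): NOUN is the most frequent tag.
training = [
    [("aku", "PRON"), ("mangan", "VERB"), ("sega", "NOUN")],
    [("kucing", "NOUN"), ("mangan", "VERB"), ("iwak", "NOUN")],
]
emissions, tag_totals = count_emissions(training)

# "omah" never occurs in training, so it is an unknown word.
without_fallback = {t: emission_prob("omah", t, emissions, tag_totals, False)
                    for t in tag_totals}
with_fallback = {t: emission_prob("omah", t, emissions, tag_totals, True)
                 for t in tag_totals}
```

As the abstract notes, a fallback of this kind does not always yield the correct label: every unknown word receives the same tag regardless of context.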