POS Tagging Bahasa Indonesia Dengan HMM dan Rule Based

2013 ◽  
Vol 8 (2) ◽  
Author(s):  
Kathryn Widhiyanti ◽  
Agus Harjoko

This research performs part-of-speech tagging (POS tagging) on Indonesian text, in support of other natural language processing tasks such as parsing Indonesian text. POS tagging is the automated process of labelling each word in a sentence with its word class (Jurafsky and Martin, 2000). The central problem is how to obtain accurate word class labels at the sentence level. The authors propose a method that combines a Hidden Markov Model (HMM) with a rule-based method. The expected outcome is higher labelling accuracy than is obtained using the Hidden Markov Model alone. The labels produced by the Hidden Markov Model are refined by validating them against rules that are composed automatically from the corpus in use. In tests on several POS tagging documents, the Hidden Markov Model reached a highest accuracy of 100% on text identical to the corpus. On text that differs from the reference corpus but uses words contained in it, the highest accuracy was 92.2%.
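The combination described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the tag names, the add-one smoothing, and the rule format (words seen with exactly one tag in the corpus keep that tag) are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def train_hmm(tagged_sentences):
    """Count tag transitions and word emissions from a tagged corpus."""
    trans = defaultdict(Counter)  # trans[previous_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    for sentence in tagged_sentences:
        prev = "<s>"
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word.lower()] += 1
            prev = tag
    return trans, emit

def log_prob(counter, key, n_outcomes):
    # Add-one smoothing (an assumption) avoids log(0) for unseen events.
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))

def viterbi(words, trans, emit):
    """Return the most likely tag sequence under a bigram HMM."""
    if not words:
        return []
    tags = list(emit)
    vocab_size = len({w for c in emit.values() for w in c}) + 1
    # best[i][tag] = (log score of best path ending in tag, previous tag)
    best = [{}]
    for t in tags:
        score = (log_prob(trans["<s>"], t, len(tags))
                 + log_prob(emit[t], words[0].lower(), vocab_size))
        best[0][t] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for t in tags:
            prev_tag, score = max(
                ((p, best[i - 1][p][0] + log_prob(trans[p], t, len(tags)))
                 for p in tags),
                key=lambda pair: pair[1],
            )
            score += log_prob(emit[t], words[i].lower(), vocab_size)
            best[i][t] = (score, prev_tag)
    # Follow the back-pointers from the best final tag.
    tag = max(best[-1], key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]

def build_rules(tagged_sentences):
    """Hypothetical rule set: words that carry exactly one tag in the corpus."""
    seen = defaultdict(set)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            seen[word.lower()].add(tag)
    return {w: next(iter(ts)) for w, ts in seen.items() if len(ts) == 1}

def refine(words, tags, rules):
    """Override an HMM tag when a corpus-derived rule covers the word."""
    return [rules.get(w.lower(), t) for w, t in zip(words, tags)]

# Toy corpus; words and tags are illustrative only.
corpus = [
    [("saya", "PRP"), ("makan", "VB"), ("nasi", "NN")],
    [("dia", "PRP"), ("makan", "VB"), ("roti", "NN")],
    [("saya", "PRP"), ("suka", "VB"), ("roti", "NN")],
]
trans, emit = train_hmm(corpus)
rules = build_rules(corpus)
sentence = ["dia", "suka", "nasi"]
print(refine(sentence, viterbi(sentence, trans, emit), rules))
```

The refinement step only changes a tag when a rule applies, so the HMM output is kept for ambiguous words and for words outside the corpus.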

2020 ◽  
Vol 9 (2) ◽  
pp. 303
Author(s):  
I Gde Made Hendra Pradiptha ◽  
Ngurah Agus Sanjaya ER

Part-of-speech tagging, or word class labeling, is the process of assigning a word class to each word in a sentence. Previous research on POS taggers, especially for Indonesian, has used various approaches and obtained high accuracy. However, few researchers have built a POS tagger for Balinese. In this article, we build a POS tagger for Balinese using a probabilistic approach, specifically the Hidden Markov Model (HMM). HMM was selected to deal with ambiguity because it gives higher accuracy and fast processing time. We used k-fold cross-validation (with k = 10) on a tagged corpus of around 3669 tokens with 21 tags. In the experiments conducted, the HMM method obtained an accuracy of 68.56%.
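The evaluation protocol named here, k-fold cross-validation with k = 10, can be sketched as follows. This is an illustration under assumptions: the abstract does not say whether folds are formed over sentences or tokens (sentences are used here), and the majority-tag tagger is only a stand-in for the HMM so that the harness stays short. All function names are hypothetical.

```python
from collections import Counter

def k_fold_splits(items, k=10):
    """Yield (train, test) pairs; each item lands in exactly one test fold."""
    for i in range(k):
        test = items[i::k]
        train = [x for j, x in enumerate(items) if j % k != i]
        yield train, test

def train_majority_tagger(train_sentences):
    """Stand-in tagger: labels every word with the most frequent training tag."""
    counts = Counter(tag for s in train_sentences for _, tag in s)
    majority = counts.most_common(1)[0][0]
    return lambda words: [majority] * len(words)

def cross_validate(sentences, train_fn, k=10):
    """Average token-level accuracy over k folds.

    train_fn(train_sentences) must return a function mapping words to tags.
    """
    accuracies = []
    for train, test in k_fold_splits(sentences, k):
        tagger = train_fn(train)
        correct = total = 0
        for sentence in test:
            words = [w for w, _ in sentence]
            gold = [t for _, t in sentence]
            predicted = tagger(words)
            correct += sum(p == g for p, g in zip(predicted, gold))
            total += len(gold)
        if total:
            accuracies.append(correct / total)
    return sum(accuracies) / len(accuracies)

# Toy data: 20 identical sentences with placeholder words and tags.
data = [[("w", "NN"), ("v", "NN"), ("u", "VB")] for _ in range(20)]
print(cross_validate(data, train_majority_tagger, k=10))
```

Any tagger can be plugged in through `train_fn`, so the same harness would serve for an HMM trained on each fold's training portion.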


2020 ◽  
Vol 2 (2) ◽  
pp. 71-83
Author(s):  
Mohammad Mursyit ◽  
Aji Prasetya Wibawa ◽  
Ilham Ari Elbaith Zaeni ◽  
Harits Ar Rosyid

Part-of-speech tagging (POS tagging) is the process of automatically assigning a label to each word in a sentence. This study uses the Hidden Markov Model (HMM) algorithm for POS tagging. Unknown words are handled with the Most Probable POS-Tag method. The dataset consists of 10 short stories in Javanese, totalling 10,180 words, annotated with a Javanese tagset. POS tagging is run under two scenarios. The first uses the HMM algorithm without any treatment of unknown words. The second uses HMM together with Most Probable POS-Tag for unknown words. The first scenario yields an accuracy of 45.5% and the second an accuracy of 70.78%. Most Probable POS-Tag improves POS tagging accuracy but does not always produce correct labels. It can also eliminate the zero-valued probabilities that arise in HMM-based POS tagging. The results indicate that HMM-based POS tagging is influenced by the treatment of unknown words, the vocabulary, and the relationships between word labels in the dataset.
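The zero-probability problem and the unknown-word treatment can be shown in a small sketch. It rests on one reading of "Most Probable POS-Tag", namely that an unknown word is assigned the tag that is most frequent in the training data; the abstract does not spell out the exact definition. The toy Javanese words, the tag names, and the function names are assumptions.

```python
from collections import Counter, defaultdict

def train_emissions(tagged_sentences):
    """Collect word-given-tag counts and overall tag counts."""
    emit = defaultdict(Counter)  # emit[tag][word]
    tag_counts = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            emit[tag][word.lower()] += 1
            tag_counts[tag] += 1
    return emit, tag_counts

def emission_prob(word, tag, emit, tag_counts, handle_unknown=False):
    """P(word | tag), optionally with the unknown-word fallback."""
    word = word.lower()
    if tag_counts[tag] == 0:
        return 0.0
    known = any(word in counter for counter in emit.values())
    if not known and handle_unknown:
        # Assumed fallback: give the unknown word to the most frequent tag.
        most_probable = tag_counts.most_common(1)[0][0]
        return 1.0 if tag == most_probable else 0.0
    # Unsmoothed relative frequency: zero for any word unseen with this tag.
    return emit[tag][word] / tag_counts[tag]

# Toy corpus; words and tags are illustrative only.
corpus = [
    [("aku", "PRP"), ("mangan", "VB"), ("sega", "NN")],
    [("bapak", "NN"), ("tuku", "VB"), ("buku", "NN")],
]
emit, tag_counts = train_emissions(corpus)

for tag in ("PRP", "VB", "NN"):
    without = emission_prob("pitik", tag, emit, tag_counts)
    with_fallback = emission_prob("pitik", tag, emit, tag_counts, handle_unknown=True)
    print(tag, without, with_fallback)
```

Without the fallback, every tag gives the unseen word "pitik" an emission probability of zero, so every candidate tag sequence scores zero and decoding cannot prefer one over another. With the fallback, exactly one tag keeps a non-zero score, which is also why the chosen label is not always correct.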


2018 ◽  
Vol 2 (3) ◽  
pp. 157
Author(s):  
Ahmad Subhan Yazid ◽  
Agung Fatwanto

Indonesian plays a fundamental role in communication, but ambiguity is a problem when it is processed by machine learning. In natural language processing, part-of-speech (POS) tagging helps reduce this problem. This study uses a rule-based method to determine the best word class for ambiguous words in Indonesian. The research proceeds in stages: knowledge inventory, algorithm design, implementation, testing, analysis, and conclusions. The primary data is an Indonesian corpus developed by the language department of the Faculty of Computer Science, University of Indonesia. The data is then processed and presented descriptively according to defined rules and specifications. The result is a POS tagging algorithm comprising 71 rules, expressed as flowcharts and descriptive sentences. In testing, the algorithm correctly labelled 92 of 100 test words (92%). The results of the implementation depend on the availability of rules, word class tagsets, and corpus data.
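A rule-based disambiguation step of the kind described here might look like the sketch below. The lexicon, the tag names, and the two rules are invented for illustration and are not taken from the study's 71 rules. The example uses the Indonesian word "bisa", which can be a modal ("can") or a noun ("venom").

```python
# Hypothetical lexicon: each word maps to its candidate word classes.
LEXICON = {
    "dia": ["PRP"],
    "bisa": ["MD", "NN"],  # "can" (modal) or "venom" (noun)
    "makan": ["VB"],
    "ular": ["NN"],
    "itu": ["DT"],
    "berbahaya": ["JJ"],
}

# Each rule sees the previous tag (left context, unused by these two rules),
# the candidates of the ambiguous word, and the candidates of the next word.
# It returns a tag, or None when it does not apply.
def rule_modal_before_verb(prev_tag, candidates, next_candidates):
    if "MD" in candidates and "VB" in next_candidates:
        return "MD"
    return None

def rule_noun_before_noun(prev_tag, candidates, next_candidates):
    if "NN" in candidates and "NN" in next_candidates:
        return "NN"
    return None

RULES = [rule_modal_before_verb, rule_noun_before_noun]

def tag_sentence(words, lexicon=LEXICON, rules=RULES, default="NN"):
    """Tag each word; apply rules in order only when a word is ambiguous."""
    tags = []
    for i, word in enumerate(words):
        candidates = lexicon.get(word.lower(), [default])
        if len(candidates) == 1:
            tags.append(candidates[0])
            continue
        prev_tag = tags[-1] if tags else None
        if i + 1 < len(words):
            next_candidates = lexicon.get(words[i + 1].lower(), [default])
        else:
            next_candidates = []
        chosen = None
        for rule in rules:
            chosen = rule(prev_tag, candidates, next_candidates)
            if chosen:
                break
        # Fall back to the first listed candidate when no rule fires.
        tags.append(chosen or candidates[0])
    return tags

print(tag_sentence(["dia", "bisa", "makan"]))
print(tag_sentence(["bisa", "ular", "itu", "berbahaya"]))
```

As the abstract notes, a tagger like this is only as good as its rules, tagset, and lexicon: a word missing from the lexicon simply receives the default tag.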

