Prediksi Jeda dalam Ucapan Kalimat Bahasa Melayu Pontianak Menggunakan Hidden Markov Model Berbasis Part of Speech

Pause information is one of the factors that supports high-quality speech produced by a Text to Speech system. Earlier research predicted pauses in Pontianak Malay using other methods but did not obtain good results. This study aims to predict pauses in spoken Pontianak Malay sentences based on part of speech (PoS) using a Hidden Markov Model (HMM), which calculates the probability of each possible outcome. The data are recordings of speakers reading 500 Pontianak Malay sentences, together with a new PoS set developed from several existing PoS sets. The system outputs the text of each Pontianak Malay sentence along with its predicted pauses. Pause indices fall into five categories: "0" indicates no pause, "1" a short pause, "2" a long pause, "," a comma, and "." the end of the sentence. The predictions are evaluated with a pause match accuracy test over one full sentence and with precision, recall and f-measure. The pause phrases tested are pause phrase 1+2 and pause phrase 2. The evaluation compares the results of bigram and trigram models; in these tests, the trigram model produced better pause predictions for Pontianak Malay sentences.
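The evaluation described in this abstract can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not confirm: that "pause phrase 1+2" counts indices "1" and "2" together as a single positive class, that "pause phrase 2" counts only "2", and that sentence-level accuracy means an exact match of the whole pause sequence. The function names and the sample sequences are hypothetical.

```python
def pause_prf(reference, predicted, positive):
    """Precision, recall and F-measure for the pause indices in `positive`."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have equal length")
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r in positive and p in positive)
    fp = sum(1 for r, p in pairs if r not in positive and p in positive)
    fn = sum(1 for r, p in pairs if r in positive and p not in positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


def sentence_match_accuracy(references, predictions):
    """Fraction of sentences whose full pause sequence is predicted exactly."""
    matches = sum(1 for r, p in zip(references, predictions) if r == p)
    return matches / len(references) if references else 0.0


# Hypothetical pause indices, one per word boundary of a single sentence.
reference = ["0", "1", "0", "2", "0", "."]
predicted = ["0", "2", "1", "2", "0", "."]

print(pause_prf(reference, predicted, {"1", "2"}))  # pause phrase 1+2 (assumed meaning)
print(pause_prf(reference, predicted, {"2"}))       # pause phrase 2 (assumed meaning)
```

Under this reading, a short pause predicted as a long pause still counts as correct for phrase 1+2 but as a false positive for phrase 2.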

Twitter Storytelling Generator Using Latent Dirichlet Allocation and Hidden Markov Model POS-TAG (Part-of-Speech Tagging)

2019 3rd International Conference on Informatics and Computational Sciences (ICICoS) ◽

10.1109/icicos48119.2019.8982411 ◽

2019 ◽

Author(s):

Yasir Abdur Rohman ◽

Retno Kusumaningrum

Keyword(s):

Markov Model ◽

Hidden Markov Model ◽

Latent Dirichlet Allocation ◽

Hidden Markov ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging ◽

Dirichlet Allocation

Part-of-speech tagging based on hidden Markov model assuming joint independence

10.3115/1075218.1075252 ◽

2000 ◽

Cited By ~ 3

Author(s):

Sang-Zoo Lee ◽

Jun-ichi Tsujii ◽

Hae-Chang Rim

Keyword(s):

Markov Model ◽

Hidden Markov Model ◽

Hidden Markov ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging

Post-Processing Using Speech Enhancement Techniques for Unit Selection and Hidden Markov Model Based Low Resource Language Marathi Text-to-Speech System

10.21437/sltu.2018-20 ◽

2018 ◽

Author(s):

Sangramsing Kayte ◽

Monica Mundada

Keyword(s):

Markov Model ◽

Hidden Markov Model ◽

Speech Enhancement ◽

Hidden Markov ◽

Post Processing ◽

Text To Speech ◽

Unit Selection ◽

Low Resource ◽

Model Based

POS Tagging Bahasa Indonesia Dengan HMM dan Rule Based

Jurnal Informatika ◽

10.21460/inf.2012.82.125 ◽

2013 ◽

Vol 8 (2) ◽

Cited By ~ 1

Author(s):

Kathryn Widhiyanti ◽

Agus Harjoko

Keyword(s):

Markov Model ◽

Hidden Markov Model ◽

Hidden Markov ◽

Word Class ◽

Rule Based ◽

Part Of Speech Tagging ◽

Pos Tagging ◽

Part Of Speech ◽

Class Labelling ◽

Speech Tagging

This research performs Part of Speech tagging (POS tagging) on Indonesian text, in support of other steps in digitising natural language such as parsing Indonesian text. POS tagging is the automated process of labelling the word class of each word in a sentence (Jurafsky and Martin, 2000). The central problem is how to obtain accurate word class labels at the sentence level. The authors propose a method that combines a Hidden Markov Model with a rule-based method. The expected outcome is higher word class labelling accuracy than that obtained by using the Hidden Markov Model alone. The labels produced by the Hidden Markov Model are refined by validating them against rules composed automatically from the corpus. In experiments on several POST documents, the Hidden Markov Model reached a highest accuracy of 100% on text identical to text in the corpus. On text that differs from the reference corpus but uses words found in it, the highest accuracy was 92.2%.
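The Hidden Markov Model stage of such a tagger can be sketched in Python as follows. This is a generic bigram HMM with add-one smoothing and Viterbi decoding, not the authors' implementation, and it leaves out the rule-based refinement, whose rules the abstract does not describe. The toy corpus, the tag names and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_hmm(tagged_sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(Counter)
    emit = defaultdict(Counter)
    vocab = set()
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo tag
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(emit), vocab


def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability of `key` under `counter`."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi(words, model):
    """Return the most probable tag sequence for `words`."""
    if not words:
        return []
    trans, emit, tags, vocab = model
    n_tags, n_words = len(tags), len(vocab) + 1  # +1 reserves mass for unknown words
    # Each column maps tag -> (best log score, previous tag on that path).
    columns = [{t: (log_prob(trans["<s>"], t, n_tags)
                    + log_prob(emit[t], words[0], n_words), None) for t in tags}]
    for word in words[1:]:
        last, column = columns[-1], {}
        for t in tags:
            score, prev = max((last[p][0] + log_prob(trans[p], t, n_tags), p)
                              for p in tags)
            column[t] = (score + log_prob(emit[t], word, n_words), prev)
        columns.append(column)
    # Follow back-pointers from the best final tag.
    tag = max(tags, key=lambda t: columns[-1][t][0])
    path = [tag]
    for column in reversed(columns[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


# Hypothetical three-sentence Indonesian corpus with made-up tag names.
corpus = [
    [("saya", "PRON"), ("makan", "VERB"), ("nasi", "NOUN")],
    [("dia", "PRON"), ("minum", "VERB"), ("air", "NOUN")],
    [("saya", "PRON"), ("minum", "VERB"), ("air", "NOUN")],
]
model = train_hmm(corpus)
print(viterbi(["dia", "makan", "air"], model))
```

The test sentence does not appear in the corpus, but every word in it does, which mirrors the second evaluation condition in the abstract. A rule-based pass would then inspect and correct the returned tags.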

Twitter part-of-speech tagging using pre-classification Hidden Markov model

2012 IEEE International Conference on Systems, Man, and Cybernetics (SMC) ◽

10.1109/icsmc.2012.6377881 ◽

2012 ◽

Cited By ~ 6

Author(s):

Shichang Sun ◽

Hongbo Liu ◽

Hongfei Lin ◽

Ajith Abraham

Keyword(s):

Markov Model ◽

Hidden Markov Model ◽

Hidden Markov ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging

Building Balinese Part-of-Speech Tagger Using Hidden Markov Model (HMM)

JELIKU (Jurnal Elektronik Ilmu Komputer Udayana) ◽

10.24843/jlk.2020.v09.i02.p18 ◽

2020 ◽

Vol 9 (2) ◽

pp. 303

Author(s):

I Gde Made Hendra Pradiptha ◽

Ngurah Agus Sanjaya ER

Keyword(s):

Markov Model ◽

Hidden Markov Model ◽

Hidden Markov ◽

Probabilistic Approach ◽

Word Class ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Fast Processing ◽

Pos Tagger ◽

Speech Tagging

Part-of-Speech tagging, or word class labeling, is the process of assigning a word class to each word in a sentence. Previous research on POS taggers, especially for Indonesian, has used various approaches and obtained high accuracy. However, few researchers have built a POS tagger for Balinese. In this article, we build a POS tagger for Balinese using a probabilistic approach, specifically the Hidden Markov Model (HMM). HMM was selected to deal with ambiguity because it gives higher accuracy and fast processing time. We used k-fold cross-validation (with k = 10) on a tagged corpus of around 3669 tokens with 21 tags. In the experiments conducted, the HMM method obtained an accuracy of 68.56%.
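The k-fold evaluation mentioned in this abstract can be sketched in Python as below. The sketch makes several assumptions: folds are formed by taking every k-th sentence, accuracy is measured per token and averaged over folds, and a most-frequent-tag baseline stands in for the HMM so that the example stays short. The abstract confirms none of these details, and the data and function names are hypothetical.

```python
from collections import Counter, defaultdict


def k_fold_splits(items, k=10):
    """Yield (train, test) pairs; fold i tests on every k-th item from i."""
    if k < 2 or k > len(items):
        raise ValueError("k must be between 2 and the number of items")
    for i in range(k):
        test = items[i::k]
        train = [item for j, item in enumerate(items) if j % k != i]
        yield train, test


def cross_validate(sentences, train_fn, tag_fn, k=10):
    """Average token-level tagging accuracy over k folds."""
    accuracies = []
    for train, test in k_fold_splits(sentences, k):
        model = train_fn(train)
        correct = total = 0
        for sentence in test:
            words = [word for word, _ in sentence]
            gold = [tag for _, tag in sentence]
            predicted = tag_fn(model, words)
            correct += sum(p == g for p, g in zip(predicted, gold))
            total += len(gold)
        accuracies.append(correct / total if total else 0.0)
    return sum(accuracies) / len(accuracies)


def train_baseline(tagged_sentences):
    """Stand-in model (not an HMM): remember each word's most frequent tag."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def tag_baseline(model, words):
    """Tag known words from the lookup table; unknown words get 'UNK'."""
    return [model.get(word, "UNK") for word in words]


# Hypothetical corpus: 20 one-word sentences with two word types.
sentences = [[("w%d" % (i % 2), "A" if i % 2 == 0 else "B")] for i in range(20)]
print(cross_validate(sentences, train_baseline, tag_baseline, k=10))
```

Any tagger that exposes a training function and a tagging function with the same shapes, such as an HMM trained on the nine training folds, could be passed in place of the baseline.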
