Prediksi Jeda dalam Ucapan Kalimat Bahasa Melayu Pontianak Menggunakan Hidden Markov Model Berbasis Part of Speech (Pause Prediction in Spoken Pontianak Malay Sentences Using a Part-of-Speech-Based Hidden Markov Model)

2020 ◽  
Vol 7 (4) ◽  
pp. 755
Author(s):  
Arif Bijaksana Putra Negara ◽  
Hafiz Muhardi ◽  
Evi Fathiyah Muniyati

<p><em><strong>Abstract</strong></em></p><p><em>Pause information is one of the factors that supports the quality of speech produced by a Text to Speech system. Earlier research predicted pauses in Pontianak Malay using other methods but did not obtain good results. This study aims to predict pauses in spoken Pontianak Malay sentences based on part of speech (PoS), using Hidden Markov Model (HMM) tools. The HMM calculates the probability of each possible outcome. The data consist of recordings of speakers reading 500 Pontianak Malay sentences, together with a new PoS set developed from several existing PoS sets. The system outputs the text of each Pontianak Malay sentence along with its predicted pauses. Pause indices fall into five categories: "0" indicates no pause, "1" a short pause, "2" a long pause, "," a comma, and "." the end of the sentence. The predictions are evaluated in two ways: by the accuracy of pause matches across a full sentence, and by precision, recall, and F-measure. The pause phrases tested are pause phrase 1+2 and pause phrase 2. The tests compare the results of the bigram and trigram models. Based on these tests, the trigram model is better at predicting speech pauses in Pontianak Malay sentences.</em></p>
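The two evaluations named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact scoring protocol, so everything here is an assumption: pause labels are compared position by position, "pause phrase 1+2" is read as treating labels "1" and "2" as one positive class, and the function names `pause_prf` and `sentence_match_accuracy` and the sample labels are hypothetical.

```python
def pause_prf(gold, pred, positive):
    """Precision, recall and F-measure for the pause labels in `positive`.

    `gold` and `pred` are equal-length lists of pause indices, one per
    word boundary. Assumption: labels are scored position by position.
    """
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g in positive and p in positive)
    fp = sum(1 for g, p in pairs if g not in positive and p in positive)
    fn = sum(1 for g, p in pairs if g in positive and p not in positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


def sentence_match_accuracy(gold_sents, pred_sents):
    """Fraction of sentences whose whole pause sequence is predicted exactly."""
    if not gold_sents:
        return 0.0
    matches = sum(1 for g, p in zip(gold_sents, pred_sents) if g == p)
    return matches / len(gold_sents)


# Made-up pause indices for one sentence (not data from the study).
gold = ["0", "1", "0", "2", "0", "."]
pred = ["0", "2", "1", "2", "0", "."]

print(pause_prf(gold, pred, {"1", "2"}))  # pause phrase 1+2
print(pause_prf(gold, pred, {"2"}))       # pause phrase 2
print(sentence_match_accuracy([gold], [pred]))
```

Scoring "1+2" as a single class rewards a model for finding any pause, while scoring "2" alone checks whether long pauses specifically are recovered.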

2013 ◽  
Vol 8 (2) ◽  
Author(s):  
Kathryn Widhiyanti ◽  
Agus Harjoko

This research performs Part of Speech tagging (POS-tagging) on Indonesian text, in support of other natural language digitisation processes such as Indonesian text parsing. POS-tagging is the automated process of labelling each word in a sentence with its word class (Jurafsky and Martin, 2000). The central issue is how to label word classes accurately within a sentence. The authors propose a method that combines a Hidden Markov Model with a rule-based method. The expected outcome is better word class labelling accuracy than is achieved with the Hidden Markov Model alone. The labels produced by the Hidden Markov Model are refined by validating them against rules that are composed automatically from the corpus used. In experiments on several POS-tagging (POST) documents, the Hidden Markov Model reached a highest accuracy of 100% for text identical to text in the corpus. For text that differs from the reference corpus but uses words contained in it, the highest accuracy was 92.2%.
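As an illustration of the Hidden Markov Model step only, the following Python sketch decodes the most probable tag sequence with the Viterbi algorithm under a bigram HMM. It is not the authors' implementation and omits their rule-based refinement. The function name `viterbi`, the three-tag set, the sample sentence, and all probabilities are made-up assumptions.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-6):
    """Most probable tag sequence under a bigram HMM (log-space Viterbi).

    Probabilities missing from a table fall back to `floor`, a crude
    stand-in for smoothing (an assumption of this sketch).
    """
    if not words:
        return []

    def lp(table, key):
        return math.log(table.get(key, floor))

    # best[i][t] = (log-prob of the best path ending in tag t, previous tag)
    best = [{t: (lp(start_p, t) + lp(emit_p, (t, words[0])), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p][0] + lp(trans_p, (p, t))) for p in tags),
                key=lambda pair: pair[1],
            )
            column[t] = (score + lp(emit_p, (t, words[i])), prev)
        best.append(column)

    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


# Toy model with invented probabilities (not estimated from any corpus).
tags = ["PRON", "VERB", "NOUN"]
start_p = {"PRON": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans_p = {
    ("PRON", "VERB"): 0.7, ("PRON", "NOUN"): 0.2, ("PRON", "PRON"): 0.1,
    ("VERB", "NOUN"): 0.6, ("VERB", "PRON"): 0.3, ("VERB", "VERB"): 0.1,
    ("NOUN", "VERB"): 0.5, ("NOUN", "NOUN"): 0.3, ("NOUN", "PRON"): 0.2,
}
emit_p = {
    ("PRON", "saya"): 0.9,
    ("VERB", "makan"): 0.8, ("NOUN", "makan"): 0.1,
    ("NOUN", "nasi"): 0.9,
}

result = viterbi(["saya", "makan", "nasi"], tags, start_p, trans_p, emit_p)
print(result)
```

In a real tagger the three tables would be estimated from counts in a tagged corpus. Working in log space avoids numerical underflow on long sentences.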


2020 ◽  
Vol 9 (2) ◽  
pp. 303
Author(s):  
I Gde Made Hendra Pradiptha ◽  
Ngurah Agus Sanjaya ER

Part-of-Speech (POS) tagging, or word class labeling, is the process of assigning a word class to each word in a sentence. Previous research on POS taggers, especially for Indonesian, has used various approaches and obtained high accuracy. However, few researchers have built a POS tagger for Balinese. In this article, we build a POS tagger for Balinese using a probabilistic approach, specifically the Hidden Markov Model (HMM). HMM is selected to deal with ambiguity because it gives higher accuracy and fast processing time. We used k-fold cross-validation (with k = 10) on a tagged corpus of around 3669 tokens with 21 tags. Based on the experiments conducted, the HMM method obtained an accuracy of 68.56%.
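The k-fold cross-validation mentioned above can be sketched in Python as follows. The abstract does not say how the corpus was partitioned, so the round-robin assignment of items to folds, the function name `k_fold_splits`, and the placeholder data are all assumptions.

```python
def k_fold_splits(items, k=10):
    """Yield (train, test) pairs so that each item is tested exactly once.

    Assumption: items are assigned to folds round-robin by index; the
    study may have split its corpus differently.
    """
    if not 2 <= k <= len(items):
        raise ValueError("k must be between 2 and the number of items")
    for fold in range(k):
        test = [x for i, x in enumerate(items) if i % k == fold]
        train = [x for i, x in enumerate(items) if i % k != fold]
        yield train, test


# Placeholder "sentences" standing in for a tagged corpus.
sentences = [f"sentence-{i}" for i in range(25)]
splits = list(k_fold_splits(sentences, k=10))

for train, test in splits:
    # A tagger would be trained on `train` and scored on `test` here;
    # the reported accuracy is then typically the mean over the k folds.
    print(len(train), len(test))
```

Splitting by sentence rather than by token keeps each sentence's tag sequence intact, which matters for a model that depends on tag transitions.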

