Sistem Question Answering untuk Bahasa Bali menggunakan Metode Rule-Based dan String Similarity (A Question Answering System for Balinese Using Rule-Based and String Similarity Methods)

Techno Com, 2021, Vol 20 (2), pp. 300-308
Author(s): Made Agus Putra Subali, Puritan Wijaya

A question answering system is a system's ability to answer questions posed by a user. To date, no research on question answering systems for the Balinese language has been carried out. This study uses ordinary interrogative sentences, for example "akuda memene ngubuh siap?" (in Indonesian, "berapa ibumu memelihara ayam?", i.e. "how many chickens does your mother keep?"). The data consist of fifty documents written in Balinese, and testing was carried out with twenty questions. The proposed method works in three steps: a question is entered, the document most relevant to that question is retrieved, and the answer is obtained using rules defined for each question. On the twenty test questions, the proposed method achieved an accuracy of 40% in terms of the correctness of the responses it returned.
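The three-step pipeline described above (question input, retrieval of the most relevant document, rule-based answer extraction) can be sketched roughly as follows. This is a minimal illustration under stated assumptions and is not the authors' implementation. Token-overlap (Jaccard) similarity stands in for the paper's unspecified string-similarity measure. The `RULES` table, with its single entry for *akuda* ("how many"), is hypothetical, and the two documents are made-up toy examples.

```python
import re

def tokens(text: str) -> set:
    """Lowercased word tokens (digits included)."""
    return set(re.findall(r"\w+", text.lower()))

def jaccard(a: set, b: set) -> float:
    """Token-overlap similarity, a stand-in for the paper's string similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical rule table: question word -> pattern the answer must match.
# "akuda" asks "how many", so the answer is expected to be a number.
RULES = {"akuda": re.compile(r"\d+")}

def answer(question: str, documents: list):
    q = tokens(question)
    # Step 1: retrieve the most relevant document.
    best_doc = max(documents, key=lambda d: jaccard(q, tokens(d)))
    # Step 2: pick the sentence in that document closest to the question.
    sentences = [s for s in re.split(r"[.!?]", best_doc) if s.strip()]
    best_sent = max(sentences, key=lambda s: jaccard(q, tokens(s)))
    # Step 3: apply the rule for the question word, if one exists.
    for qword, pattern in RULES.items():
        if qword in q:
            match = pattern.search(best_sent)
            return match.group() if match else None
    return None

# Toy documents, made up for illustration.
docs = ["Memene ngubuh siap 12.", "Bapane ngubuh celeng 3."]
print(answer("akuda memene ngubuh siap?", docs))  # prints 12
```

A real system would need one rule per question type (who, where, when, and so on). Low accuracy on a small test set, such as the 40% reported, is plausible when both retrieval and rules are this shallow.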

2019, Vol 15 (3), pp. 79-100
Author(s): Watanee Jearanaiwongkul, Frederic Andres, Chutiporn Anutariya

Nowadays, farmers can search for treatments for their plants using search engines and applications. Most existing works are developed as rule-based question answering platforms. However, a farmer may report an observation incorrectly. This work therefore recommends that diseases and treatments be determined from a set of related observations. To this end, we develop a theoretical framework for systems that manage farmers' observation data, and we investigate and formalize the desirable characteristics of such systems. Each observation is attached to a geolocation, through which related contextual data can be found. The framework is formalized algebraically, and the required types and functions are identified. Its key characteristics are: (1) a defined type, called warncons, for representing observation data; (2) a similarity function for warncons; and (3) a composition function for combining similar warncons. Finally, we show that the framework enriches observation data and improves advice-finding.
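The following hypothetical sketch makes the idea of a typed observation, with a similarity function and a composition function, concrete. The paper formalizes these algebraically, and the definitions below are assumptions chosen only for illustration: a region label stands in for the geolocation, a set of observed symptoms represents the observation, similarity is Jaccard similarity, and composition takes the union of the observations.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Warncons:
    """Hypothetical observation record: where it was made and what was seen."""
    region: str               # stand-in for the paper's geolocation (assumption)
    observations: frozenset   # observed symptoms, e.g. {"yellow leaves"}

def similarity(a: Warncons, b: Warncons) -> float:
    """Jaccard similarity of observations; 0 if the regions differ (assumption)."""
    if a.region != b.region:
        return 0.0
    union = a.observations | b.observations
    return len(a.observations & b.observations) / len(union) if union else 0.0

def compose(a: Warncons, b: Warncons, threshold: float = 0.3) -> Warncons:
    """Merge two sufficiently similar warncons into one richer record."""
    if similarity(a, b) < threshold:
        raise ValueError("warncons are not similar enough to compose")
    return Warncons(a.region, a.observations | b.observations)

a = Warncons("plot-1", frozenset({"yellow leaves", "brown spots"}))
b = Warncons("plot-1", frozenset({"brown spots", "wilting"}))
merged = compose(a, b)  # similarity is 1/3, above the 0.3 threshold
```

Composing similar records is what lets a single, possibly wrong, observation be outweighed by related ones before advice is looked up.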


2018, Vol 2 (3), pp. 157
Author(s): Ahmad Subhan Yazid, Agung Fatwanto

The Indonesian language plays a fundamental role in communication, but ambiguity makes it hard to process by machine. In Natural Language Processing, Part-of-Speech (POS) tagging helps reduce this problem. This study uses a rule-based method to determine the best word class for ambiguous Indonesian words. The research follows these stages: knowledge inventory, algorithm design, implementation, testing, analysis, and conclusions. The primary data is an Indonesian corpus developed by the Language department of the Faculty of Computer Science, Universitas Indonesia. The data is processed and presented descriptively according to specific rules and specifications. The result is a POS tagging algorithm comprising 71 rules, expressed as flowcharts and descriptive sentence notation. In testing, the algorithm correctly labeled 92 of the 100 tested words (92%). Its performance depends on the availability of rules, the word-class tagset, and the corpus data.
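A rule-based disambiguation step of the kind described can be sketched as follows. Each word gets its candidate tags from a lexicon, and contextual rules that look at the next word resolve any ambiguous word. The lexicon, the tag labels and the two rules are hypothetical and only show the mechanism; the paper's 71 rules are not reproduced here. The example ambiguity is *bisa*, which is either a modal ("can") or a noun ("venom").

```python
# Hypothetical lexicon: word -> set of possible tags.
LEXICON = {
    "dia": {"PRP"},          # pronoun 'he/she'
    "bisa": {"MD", "NN"},    # modal 'can' or noun 'venom'
    "berenang": {"VB"},      # verb 'to swim'
    "ular": {"NN"},          # noun 'snake'
}

# Contextual rules, tried in order: (tag to choose, tag expected on the next word).
RULES = [("MD", "VB"), ("NN", "NN")]

def tag(sentence: str) -> list:
    words = sentence.lower().split()
    # Unknown words default to NN (an assumption for this sketch).
    candidates = [LEXICON.get(w, {"NN"}) for w in words]
    tags = []
    for i, cands in enumerate(candidates):
        if len(cands) == 1:
            tags.append(next(iter(cands)))
            continue
        # Look at the candidate tags of the following word, if any.
        following = candidates[i + 1] if i + 1 < len(candidates) else set()
        chosen = next((t for t, nxt in RULES if t in cands and nxt in following), None)
        # Fallback when no rule fires: a fixed, deterministic choice.
        tags.append(chosen or sorted(cands)[0])
    return list(zip(words, tags))

print(tag("Dia bisa berenang"))  # [('dia', 'PRP'), ('bisa', 'MD'), ('berenang', 'VB')]
```

In this design, coverage comes entirely from the rule list. That is consistent with the abstract's remark that results depend on the availability of rules, the tagset and the corpus.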


2021
Author(s): Samreen Ahmed, Shakeel Khoja

In recent years, low-resource Machine Reading Comprehension (MRC) has made significant progress, with models achieving remarkable performance on datasets in various languages. However, none of these models has been customized for Urdu. This work explores the semi-automated creation of the Urdu Question Answering Dataset (UQuAD1.0) by combining machine-translated SQuAD with human-generated samples derived from Wikipedia articles and Urdu reading-comprehension (RC) worksheets from Cambridge O-level books. UQuAD1.0 is a large-scale Urdu dataset for extractive machine reading comprehension. It consists of 49k question-answer pairs in (question, passage, answer) format: 45,000 pairs were generated by machine translation of the original SQuAD1.0 and approximately 4,000 pairs via crowdsourcing. We evaluated two types of MRC models: a rule-based baseline and advanced Transformer-based models. Because the Transformer-based models outperformed the baseline, we concentrate solely on Transformer-based architectures. Using XLM-RoBERTa and multilingual BERT, we obtain F1 scores of 0.66 and 0.63, respectively.
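In SQuAD-style extractive QA, F1 scores like those reported above are usually computed per question as the harmonic mean of token-level precision and recall between the predicted span and a gold answer. The per-question scores are then averaged. The sketch below is minimal and assumes only lowercasing and whitespace tokenization. The official SQuAD script also strips punctuation and English articles, and the abstract does not state how UQuAD normalizes Urdu text.

```python
from collections import Counter

def token_f1(prediction: str, gold: str) -> float:
    """Token-level F1 between a predicted answer span and one gold answer."""
    pred = prediction.lower().split()
    ref = gold.lower().split()
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

def dataset_f1(predictions: list, golds: list) -> float:
    """Average over questions of the best F1 against any of its gold answers."""
    scores = [max(token_f1(p, g) for g in gs) for p, gs in zip(predictions, golds)]
    return sum(scores) / len(scores)
```

For example, a prediction "the cat sat" against the gold answer "cat sat down" shares 2 of 3 tokens each way. Precision and recall are both 2/3, so F1 is also 2/3.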


2016, Vol 24, pp. 1534-1541
Author(s): S.M. Archana, Naima Vahab, Rekha Thankappan, C. Raseek

Author(s): Ria Hari Gusmita, Yusuf Durachman, Salman Harun, Asep Fajar Firmansyah, Husni Teja Sukmana, ...

2015, Vol 6 (4)
Author(s): Bonifacius Vicky Indriyono, Ema Utami, Andi Sunyoto

Abstract. Stemming is the process of mapping and decomposing the various forms (variants) of a word in order to find its root word. It is also referred to as conflation. Stemming is widely used in information retrieval (searching for information) to improve the quality of the information obtained. It can be performed using a dictionary of root words or using affix rules. The Porter stemmer for Indonesian, commonly referred to as the Tala stemmer, uses rule-based analysis to find the root of a word: it does not use a dictionary and relies instead on a rule-based algorithm. This study addresses how to classify books and other library materials quickly and effectively, so as to minimize errors in determining a book's type. The proposed solution applies Porter stemming for Indonesian.
Keywords: Stemming, Information Retrieval, Porter Stemmer, Classification
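A simplified sketch illustrates the dictionary-free, rule-based stemming attributed to the Tala stemmer. The steps are applied in order:

1. Strip a particle (-kah, -lah, -pun).
2. Strip a possessive pronoun (-nya, -ku, -mu).
3. Strip one prefix.
4. Strip a derivational suffix (-kan, -an, -i).

An affix is removed only while the word has more than two syllables, and syllables are approximated here by counting vowels. The affix lists and the recoding of *meny-* to *s* and *mem-* to *p* before a vowel are a reduced, loosely modeled subset written for illustration. They are not the complete Tala rule set, and the sketch makes mistakes a full stemmer would avoid.

```python
VOWELS = set("aeiou")

def syllables(word: str) -> int:
    """Approximate syllable count as the number of vowels."""
    return sum(ch in VOWELS for ch in word)

PARTICLES = ("kah", "lah", "pun")
POSSESSIVES = ("nya", "ku", "mu")
# (prefix, letter restored when the remainder starts with a vowel); order matters.
FIRST_ORDER = [("meng", ""), ("meny", "s"), ("men", ""), ("mem", "p"), ("me", ""),
               ("peng", ""), ("peny", "s"), ("pen", ""), ("pem", "p"),
               ("di", ""), ("ter", ""), ("ke", "")]
SECOND_ORDER = ("ber", "per", "pe")
SUFFIXES = ("kan", "an", "i")

def strip_suffix(word: str, suffixes) -> str:
    """Remove the first matching suffix, but only from words of 3+ syllables."""
    if syllables(word) > 2:
        for s in suffixes:
            if word.endswith(s):
                return word[: -len(s)]
    return word

def stem(word: str) -> str:
    word = strip_suffix(word.lower(), PARTICLES)
    word = strip_suffix(word, POSSESSIVES)
    removed_prefix = False
    if syllables(word) > 2:
        for prefix, recode in FIRST_ORDER:
            if word.startswith(prefix):
                rest = word[len(prefix):]
                word = recode + rest if recode and rest[:1] in VOWELS else rest
                removed_prefix = True
                break
    # Simplification: try a second-order prefix only if no first-order one matched.
    if not removed_prefix and syllables(word) > 2:
        for prefix in SECOND_ORDER:
            if word.startswith(prefix):
                word = word[len(prefix):]
                break
    return strip_suffix(word, SUFFIXES)

print([stem(w) for w in ["bukunya", "membacakan", "menyapu", "bermain"]])
# ['buku', 'baca', 'sapu', 'main']
```

Grouping books by the stems of their title or subject words, rather than by surface forms, is what makes variants such as *membaca* and *dibaca* count as the same term.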


Author(s): Dian Puspita Tedjosurya, Suharjito Suharjito

With recent developments in information technology, many new applications have emerged, especially on mobile phones. Besides serving as a communication medium, mobile phones are also used for learning, for example through translator applications. A translator application can be a tool for learning a language, such as an English-to-Indonesian translator. The purpose of this research is to let users translate English into Indonesian easily on a mobile phone. The translator application was developed in Java (specifically J2ME), which can run on various operating systems and, being open source, can be easily developed and distributed. Data were collected through a literature study, observation, and a review of similar applications. The system was developed using object-oriented analysis and design, described with use case diagrams, class diagrams, sequence diagrams, and activity diagrams. The translation process uses a rule-based method. The result of this research is a Java-based translator that translates English sentences into Indonesian sentences and can be accessed from a mobile phone with an Internet connection. The application also has a spell-check feature that detects misspelled words and suggests alternative words close to the input. The application translates everyday conversational sentences quite well, producing sentence structures that correspond to the originals and meanings close to them.
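Below is a toy version of rule-based translation with a spelling suggestion. The application itself was written in Java (J2ME), and this Python sketch does not reproduce its rules. The five-entry-per-class lexicon, the single adjective-noun reordering rule and the use of `difflib.get_close_matches` for spelling suggestions are illustrative assumptions.

```python
import difflib

# Hypothetical bilingual lexicon: English word -> (Indonesian word, part of speech).
LEXICON = {
    "i": ("saya", "PRON"),
    "read": ("membaca", "VERB"),
    "a": ("sebuah", "DET"),
    "new": ("baru", "ADJ"),
    "book": ("buku", "NOUN"),
    "red": ("merah", "ADJ"),
    "car": ("mobil", "NOUN"),
}

def lookup(word: str):
    """Exact lookup, falling back to the closest known spelling (spell check)."""
    if word in LEXICON:
        return LEXICON[word]
    close = difflib.get_close_matches(word, LEXICON.keys(), n=1, cutoff=0.6)
    return LEXICON[close[0]] if close else (word, "UNK")

def translate(sentence: str) -> str:
    entries = [lookup(w) for w in sentence.lower().split()]
    # Reordering rule: English ADJ + NOUN becomes Indonesian NOUN + ADJ.
    i = 0
    while i < len(entries) - 1:
        if entries[i][1] == "ADJ" and entries[i + 1][1] == "NOUN":
            entries[i], entries[i + 1] = entries[i + 1], entries[i]
            i += 2
        else:
            i += 1
    return " ".join(word for word, _ in entries)

print(translate("I read a new book"))  # saya membaca sebuah buku baru
```

Because of the spelling fallback, the misspelled input "I read a new bok" yields the same output. Word-for-word lookup plus a few reordering rules also explains why such a system handles short everyday sentences better than complex ones.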

