scholarly journals Analisis Morfologi untuk Menangani Out-of-Vocabulary Words pada Part-of-Speech Tagger Bahasa Indonesia Menggunakan Hidden Markov Model

2019 ◽  
Vol 2 (1) ◽  
pp. 6 ◽  
Author(s):  
Febyana Ramadhanti ◽  
Yudi Wibisono ◽  
Rosa Ariani Sukamto

Part-of-speech (PoS) tagger merupakan salah satu task dalam bidang natural language processing (NLP) sebagai proses penandaan kategori kata (part-of-speech) untuk setiap kata pada teks kalimat masukan. Hidden markov model (HMM) merupakan algoritma PoS tagger berbasis probabilistik, sehingga sangat tergantung pada train corpus. Terbatasnya komponen dalam train corpus dan luasnya kata dalam bahasa Indonesia menimbulkan masalah yang disebut out-of-vocabulary (OOV) words. Penelitian ini membandingkan PoS tagger yang menggunakan HMM+AM (analisis morfologi) dan PoS tagger HMM tanpa AM, dengan menggunakan train corpus dan testing corpus yang sama. Testing corpus mengandung 30% tingkat OOV dari 6.676 token atau 740 kalimat masukan. Hasil yang diperoleh dari sistem HMM saja memiliki akurasi 97.54%, sedangkan sistem HMM dengan metode analisis morfologi memiliki akurasi tertinggi 99.14%. 

2021 ◽  
pp. 587-595
Author(s):  
Alebachew Chiche ◽  
Hiwot Kadi ◽  
Tibebu Bekele

Natural language processing plays a great role in providing an interface for human-computer communication. It enables people to talk with the computer in their formal language rather than machine language. This study aims at presenting a Part of speech tagger that can assign word class to words in a given paragraph sentence. Some of the researchers developed parts of speech taggers for different languages such as English Amharic, Afan Oromo, Tigrigna, etc. On the other hand, many other languages do not have POS taggers like Shekki’noono language.  POS tagger is incorporated in most natural language processing tools like machine translation, information extraction as a basic component. So, it is compulsory to develop a part of speech tagger for languages then it is possible to work with an advanced natural language application. Because those applications enhance machine to machine, machine to human, and human to human communications. Although, one language POS tagger cannot be directly applied for other languages POS tagger. With the purpose for developing the Shekki’noono POS tagger, we have used the stochastic Hidden Markov Model. For the study, we have used 1500 sentences collected from different sources such as newspapers (which includes social, economic, and political aspects), modules, textbooks, Radio Programs, and bulletins.  The collected sentences are labeled by language experts with their appropriate parts of speech for each word.  With the experiments carried out, the part of speech tagger is trained on the training sets using Hidden Markov model. As experiments showed, HMM based POS tagging has achieved 92.77 % accuracy for Shekki’noono. And the POS tagger model is compared with the previous experiments in related works using HMM. As a future work, the proposed approaches can be utilized to perform an evaluation on a larger corpus.


2020 ◽  
Vol 9 (2) ◽  
pp. 303
Author(s):  
I Gde Made Hendra Pradiptha ◽  
Ngurah Agus Sanjaya ER

Part-of-Speech tagging or word class labeling is a process for labeling a word class in a word in a sentence. Previous research on POS Tagger, especially for Indonesian, has been done using various approaches and obtained high accuracy values. However, not many researchers have built POS Tagger for Balinese. In this article, we are interested in building a POS Tagger for Balinese using a probabilistic approach, specifically the Hidden Markov Model (HMM). HMM is selected to deal with ambiguity since it gives higher accuracy and fast processing time. We used k-fold cross-validation (with k = 10) and tagged corpus around 3669 tokens with 21 tags. Based on the experiments conducted, the HMM method obtained an accuracy of 68.56%.


2019 ◽  
Vol 8 (2) ◽  
pp. 3899-3903

Part of Speech Tagging has continually been a difficult mission in the era of Natural Language Processing. This article offers POS tagging for Gujarati textual content the use of Hidden Markov Model. Using Gujarati text annotated corpus for training checking out statistics set are randomly separated. 80% accuracy is given by model. Error analysis in which the mismatches happened is likewise mentioned in element.


2012 ◽  
Vol 198-199 ◽  
pp. 852-855
Author(s):  
Xi Jie Wang ◽  
Shun Yi Hu

Part-of-Speech Tagging is the fundamental problems in natural language processing .The paper introduces the representation of the Hidden Markov Model (HMM) and the needs to solve the problem, and then discusses the parameter estimation method of the HMM model, and research on basic principles of Part-of Speech Tagging using Viterbi algorithm.


The software development procedure begins with identifying the requirement analysis. The process levels of the requirements start from analysing the requirements to sketch the design of the program, which is very critical work for programmers and software engineers. Moreover, many errors will happen during the requirement analysis cycle transferring to other stages, which leads to the high cost of the process more than the initial specified process. The reason behind this is because of the specifications of software requirements created in the natural language. To minimize these errors, we can transfer the software requirements to the computerized form by the UML diagram. To overcome this, a device has been designed, which plans can provide semi-automatized aid for designers to provide UML class version from software program specifications using natural Language Processing techniques. The proposed technique outlines the class diagram in a well-known configuration and additionally facts out the relationship between instructions. In this research, we propose to enhance the procedure of producing the UML diagrams by utilizing the Natural Language, which will help the software development to analyze the software requirements with fewer errors and efficient way. The proposed approach will use the parser analyze and Part of Speech (POS) tagger to analyze the user requirements entered by the user in the English language. Then, extract the verbs and phrases, etc. in the user text. The obtained results showed that the proposed method got better results in comparison with other methods published in the literature. The proposed method gave a better analysis of the given requirements and better diagrams presentation, which can help the software engineers. Key words: Part of Speech,UM


Author(s):  
Ayush Srivastav ◽  
Hera Khan ◽  
Amit Kumar Mishra

The chapter provides an eloquent account of the major methodologies and advances in the field of Natural Language Processing. The most popular models that have been used over time for the task of Natural Language Processing have been discussed along with their applications in their specific tasks. The chapter begins with the fundamental concepts of regex and tokenization. It provides an insight to text preprocessing and its methodologies such as Stemming and Lemmatization, Stop Word Removal, followed by Part-of-Speech tagging and Named Entity Recognition. Further, this chapter elaborates the concept of Word Embedding, its various types, and some common frameworks such as word2vec, GloVe, and fastText. A brief description of classification algorithms used in Natural Language Processing is provided next, followed by Neural Networks and its advanced forms such as Recursive Neural Networks and Seq2seq models that are used in Computational Linguistics. A brief description of chatbots and Memory Networks concludes the chapter.


Author(s):  
Marina Sokolova ◽  
Stan Szpakowicz

This chapter presents applications of machine learning techniques to traditional problems in natural language processing, including part-of-speech tagging, entity recognition and word-sense disambiguation. People usually solve such problems without difficulty or at least do a very good job. Linguistics may suggest labour-intensive ways of manually constructing rule-based systems. It is, however, the easy availability of large collections of texts that has made machine learning a method of choice for processing volumes of data well above the human capacity. One of the main purposes of text processing is all manner of information extraction and knowledge extraction from such large text. Machine learning methods discussed in this chapter have stimulated wide-ranging research in natural language processing and helped build applications with serious deployment potential.


Sign in / Sign up

Export Citation Format

Share Document