Morphological Analysis for Handling Out-of-Vocabulary Words in an Indonesian Part-of-Speech Tagger Using Hidden Markov Model

Febyana Ramadhanti; Yudi Wibisono; Rosa Ariani Sukamto

doi:10.26418/jlk.v2i1.13


Jurnal Linguistik Komputasional (JLK), Vol 2 (1), 2019, pp. 6

Keyword(s): Natural Language Processing, Natural Language, Markov Model, Hidden Markov Model, Language Processing, Hidden Markov, Part Of Speech, Pos Tagger, Bahasa Indonesia

A part-of-speech (PoS) tagger performs one of the tasks in natural language processing (NLP): labeling each word of an input sentence with its word category (part of speech). The hidden Markov model (HMM) is a probabilistic PoS tagging algorithm, so it depends heavily on the training corpus. The limited contents of a training corpus and the breadth of the Indonesian vocabulary give rise to the problem of out-of-vocabulary (OOV) words. This study compares a PoS tagger that uses HMM+AM (morphological analysis) with an HMM PoS tagger without AM, using the same training and testing corpora. The testing corpus has an OOV rate of 30% over 6,676 tokens, or 740 input sentences. The HMM-only system achieved an accuracy of 97.54%, while the HMM system with morphological analysis achieved the highest accuracy, 99.14%.
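The abstract does not describe the morphological rules themselves. The following is a minimal sketch, assuming simple Indonesian affix heuristics, of how an HMM tagger can fall back on word shape instead of a zero emission probability when a word is out of vocabulary. The affix lists, the tag names (`NN`, `NNP`, `VB`, `JJ`, `CD`), and the functions `guess_tags` and `emission_score` are hypothetical illustrations, not the authors' method.

```python
# Minimal sketch of an affix-based fallback for out-of-vocabulary (OOV) words.
# The affix lists and tag names are illustrative assumptions, not the paper's rules.

VERB_PREFIXES = ("me", "ber", "di", "ter")  # e.g. menulis, berlari, dimakan, tertulis
NOUN_PREFIXES = ("pe", "ke")                # combined with -an: pendidikan, kebersihan
NOUN_SUFFIX = "an"


def guess_tags(word: str) -> dict[str, float]:
    """Return a hypothetical tag distribution for an OOV word based on its shape and affixes."""
    w = word.lower()
    if any(ch.isdigit() for ch in w):
        return {"CD": 1.0}                      # numerals
    if word[:1].isupper():
        return {"NNP": 0.9, "NN": 0.1}          # capitalized: likely a proper noun
    if w.startswith(NOUN_PREFIXES) and w.endswith(NOUN_SUFFIX):
        return {"NN": 0.9, "VB": 0.1}           # pe-...-an / ke-...-an nominalizations
    if w.startswith(VERB_PREFIXES):
        return {"VB": 0.8, "NN": 0.2}           # common verbal prefixes
    return {"NN": 0.5, "VB": 0.25, "JJ": 0.25}  # no affix evidence


def emission_score(word: str, tag: str, emissions: dict[str, dict[str, float]]) -> float:
    """Use the trained P(word | tag) for known words and the affix guess for OOV words.

    Simplification: the affix guess approximates P(tag | word), not P(word | tag).
    """
    known = any(word in row for row in emissions.values())
    if known:
        return emissions.get(tag, {}).get(word, 0.0)
    return guess_tags(word).get(tag, 0.0)
```

A fuller treatment would convert the affix-based estimate of P(tag | word) into P(word | tag) with Bayes' rule before plugging it into the HMM; the sketch skips that step for brevity.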


A Hidden Markov Model-based Part of Speech Tagger for Shekki’noono Language

International Journal of Computing, 2021, pp. 587-595. doi:10.47839/ijc.20.4.2448

Author(s): Alebachew Chiche, Hiwot Kadi, Tibebu Bekele

Keyword(s): Natural Language Processing, Natural Language, Markov Model, Hidden Markov Model, Language Processing, Hidden Markov, Parts Of Speech, Pos Tagging, Part Of Speech, Pos Tagger

Natural language processing plays a major role in providing an interface for human-computer communication. It enables people to talk with the computer in their own language rather than in machine language. This study presents a part-of-speech tagger that assigns a word class to each word in a given sentence or paragraph. Researchers have developed part-of-speech taggers for languages such as English, Amharic, Afan Oromo, and Tigrigna, but many other languages, such as Shekki’noono, still lack POS taggers. A POS tagger is a basic component of most natural language processing tools, such as machine translation and information extraction. Developing a part-of-speech tagger for a language is therefore a prerequisite for building advanced natural language applications, which enhance machine-to-machine, machine-to-human, and human-to-human communication. However, a POS tagger built for one language cannot be applied directly to another. To develop the Shekki’noono POS tagger, we used a stochastic Hidden Markov Model. For the study, we used 1,500 sentences collected from sources such as newspapers (covering social, economic, and political topics), modules, textbooks, radio programs, and bulletins. Language experts labeled each word in the collected sentences with its appropriate part of speech. In the experiments, the tagger was trained on the training set using the Hidden Markov Model, and the HMM-based POS tagger achieved 92.77% accuracy for Shekki’noono. The model was also compared with previous HMM-based experiments reported in related work. As future work, the proposed approach can be evaluated on a larger corpus.
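Training an HMM tagger on expert-labeled sentences usually means estimating transition and emission probabilities by relative frequency. The sketch below shows that maximum-likelihood estimation without smoothing; the English toy sentences and tag names are placeholders (no Shekki’noono data is reproduced here), and `train_hmm` is a hypothetical name.

```python
from collections import Counter, defaultdict

START = "<s>"  # pseudo-tag marking the beginning of a sentence


def normalize(counter: Counter) -> dict:
    """Turn raw counts into relative frequencies that sum to 1."""
    total = sum(counter.values())
    return {key: n / total for key, n in counter.items()}


def train_hmm(tagged_sentences):
    """Maximum-likelihood estimates of P(tag | previous tag) and P(word | tag), without smoothing."""
    trans_counts = defaultdict(Counter)
    emit_counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            trans_counts[prev][tag] += 1  # count tag bigrams, including <s> -> first tag
            emit_counts[tag][word] += 1   # count word emissions per tag
            prev = tag
    transitions = {prev: normalize(c) for prev, c in trans_counts.items()}
    emissions = {tag: normalize(c) for tag, c in emit_counts.items()}
    return transitions, emissions
```

Unseen words and tag pairs get no probability mass here, which is exactly the out-of-vocabulary gap that smoothing or morphological fallbacks are meant to fill.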


Natural Language Processing Based Part of Speech Tagger using Hidden Markov Model

2019 Third International conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), 2019. doi:10.1109/i-smac47947.2019.9032593

Author(s): Sindhya K Nambiar, Antony Leons, Soniya Jose, Arunsree

Keyword(s): Natural Language Processing, Natural Language, Markov Model, Hidden Markov Model, Language Processing, Hidden Markov, Part Of Speech


Hidden Markov Model and its Application in Natural Language Processing

Information Technology Journal, Vol 12 (17), 2013, pp. 4256-4261. doi:10.3923/itj.2013.4256.4261

Author(s): Xuexia Gao, Nan Zhu

Keyword(s): Natural Language Processing, Natural Language, Markov Model, Hidden Markov Model, Language Processing, Hidden Markov


Review on Usage of Hidden Markov Model in Natural Language Processing

Smart Innovation, Systems and Technologies - Intelligent and Cloud Computing, 2020, pp. 415-423. doi:10.1007/978-981-15-5971-6_45

Author(s): Amrita Anandika, Smita Prava Mishra, Madhusmita Das

Keyword(s): Natural Language Processing, Natural Language, Markov Model, Hidden Markov Model, Language Processing, Hidden Markov


Building Balinese Part-of-Speech Tagger Using Hidden Markov Model (HMM)

JELIKU (Jurnal Elektronik Ilmu Komputer Udayana), Vol 9 (2), 2020, pp. 303. doi:10.24843/jlk.2020.v09.i02.p18

Author(s): I Gde Made Hendra Pradiptha, Ngurah Agus Sanjaya ER

Keyword(s): Markov Model, Hidden Markov Model, Hidden Markov, Probabilistic Approach, Word Class, Part Of Speech Tagging, Part Of Speech, Fast Processing, Pos Tagger, Speech Tagging

Part-of-speech (POS) tagging, or word class labeling, is the process of labeling each word in a sentence with its word class. Previous research on POS taggers, especially for Indonesian, has used various approaches and obtained high accuracy. However, few researchers have built a POS tagger for Balinese. In this article, we build a POS tagger for Balinese using a probabilistic approach, specifically the Hidden Markov Model (HMM). HMM was selected to deal with ambiguity because it offers high accuracy and fast processing. We used k-fold cross-validation (k = 10) on a tagged corpus of about 3,669 tokens with 21 tags. In our experiments, the HMM method obtained an accuracy of 68.56%.
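The 10-fold cross-validation mentioned above can be sketched as follows. This is a minimal version under stated assumptions: the corpus is split at the sentence level after a seeded shuffle, and the hypothetical `train_and_score` callback stands in for training an HMM tagger on one split and returning its accuracy on the other.

```python
import random
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def k_fold_splits(items: Sequence[T], k: int = 10, seed: int = 0):
    """Yield (train, test) lists for k-fold cross-validation after a seeded shuffle."""
    if not 2 <= k <= len(items):
        raise ValueError("k must be between 2 and the number of items")
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # round-robin: fold sizes differ by at most 1
    for test_idx in folds:
        test_set = set(test_idx)
        train = [items[j] for j in indices if j not in test_set]
        test = [items[j] for j in test_idx]
        yield train, test


def cross_validate(items: Sequence[T], train_and_score: Callable[[list, list], float], k: int = 10) -> float:
    """Average the score returned by train_and_score over the k folds."""
    scores = [train_and_score(train, test) for train, test in k_fold_splits(items, k)]
    return sum(scores) / len(scores)
```

Every item lands in exactly one test fold, so each sentence is used for evaluation once and for training k - 1 times.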


A Statistical Method for Evaluating Performance of Part of Speech Tagger for Gujarati

International Journal of Recent Technology and Engineering - 2, Vol 8 (2), 2019, pp. 3899-3903. doi:10.35940/ijrte.b1492.078219

Keyword(s): Natural Language Processing, Markov Model, Language Processing, Hidden Markov, Model Error, Part Of Speech Tagging, Pos Tagging, Part Of Speech, Textual Content, Speech Tagging

Part-of-speech tagging has long been a difficult task in natural language processing. This article presents POS tagging for Gujarati text using a Hidden Markov Model. An annotated Gujarati text corpus is randomly split into training and testing data sets. The model achieves 80% accuracy. An error analysis of where the mismatches occurred is also discussed in detail.
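A minimal sketch of the evaluation described above: a seeded random split of the annotated sentences, token-level accuracy, and a tally of (gold, predicted) mismatches for error analysis. The 80/20 ratio and the function names are assumptions; the abstract does not state how the data was divided. `evaluate` expects the gold and predicted tags of the test set flattened into two equal-length lists.

```python
import random
from collections import Counter


def random_split(sentences, train_fraction=0.8, seed=42):
    """Shuffle a copy of the corpus and split it into training and testing sets."""
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def evaluate(gold_tags, predicted_tags):
    """Return token accuracy and a Counter of (gold, predicted) mismatches for error analysis."""
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("gold and predicted sequences must have the same length")
    mismatches = Counter((g, p) for g, p in zip(gold_tags, predicted_tags) if g != p)
    correct = len(gold_tags) - sum(mismatches.values())
    return correct / len(gold_tags), mismatches
```

The mismatch counter makes the most frequent confusions (for example, adjectives tagged as nouns) easy to list with `most_common()`.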


Part-Of Speech Tagging Base on Hidden Markov Model

Applied Mechanics and Materials, Vol 198-199, 2012, pp. 852-855. doi:10.4028/www.scientific.net/amm.198-199.852

Author(s): Xi Jie Wang, Shun Yi Hu

Keyword(s): Markov Model, Hidden Markov Model, Language Processing, Viterbi Algorithm, Hidden Markov, Estimation Method, Basic Principles, Part Of Speech Tagging, Part Of Speech, Speech Tagging

Part-of-speech tagging is one of the fundamental problems in natural language processing. The paper introduces the representation of the Hidden Markov Model (HMM) and the problems it must solve, then discusses parameter estimation for the HMM, and examines the basic principles of part-of-speech tagging with the Viterbi algorithm.
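Viterbi decoding, which the paper discusses for tagging, finds the single most probable tag sequence by dynamic programming. The sketch below is a generic first-order version working in log space to avoid numeric underflow; the toy probabilities are hypothetical, and the `floor` value that replaces zero probabilities is a simplifying assumption, not a smoothing method taken from the paper.

```python
import math


def log_prob(p: float, floor: float) -> float:
    """Log of p, clamped at a small floor so unseen events do not give log(0)."""
    return math.log(max(p, floor))


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable tag sequence for `words` under a first-order HMM."""
    if not words:
        return []
    # best[i][t]: best log-probability of any tag path for words[:i+1] ending in tag t
    best = [{t: log_prob(start_p.get(t, 0.0), floor)
                + log_prob(emit_p.get(t, {}).get(words[0], 0.0), floor) for t in tags}]
    back = [{}]  # back[i][t]: previous tag on that best path
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev_tag, score = max(
                ((s, best[i - 1][s] + log_prob(trans_p.get(s, {}).get(t, 0.0), floor)) for s in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + log_prob(emit_p.get(t, {}).get(words[i], 0.0), floor)
            back[i][t] = prev_tag
    # Trace back from the best final tag.
    path = [max(best[-1], key=best[-1].get)]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]
```

For toy parameters where "the" is always `DT`, `DT` is usually followed by `NN`, and "runs" is mostly a verb, the call `viterbi(["the", "dog", "runs"], ...)` returns `["DT", "NN", "VB"]`.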


A Framework for the Generation of Class Diagram from Text Requirements using Natural language Processing

International Journal of Advanced Trends in Computer Science and Engineering, Vol 10 (1), 2021, pp. 25-31. doi:10.30534/ijatcse/2021/041012021

Keyword(s): Natural Language Processing, Software Development, Natural Language, Language Processing, English Language, Software Requirements, Class Diagram, Requirement Analysis, Part Of Speech, Software Engineers

The software development process begins with requirements analysis. The requirements work proceeds from analyzing the requirements to sketching the program design, which is critical work for programmers and software engineers. Moreover, errors made during requirements analysis carry over to later stages, raising the cost of the process above what was initially specified. This happens because software requirements specifications are written in natural language. To minimize these errors, the software requirements can be converted into a computerized form as UML diagrams. To this end, a tool has been designed that provides semi-automated support for designers to produce UML class models from software specifications using natural language processing techniques. The proposed technique lays out the class diagram in a standard configuration and also identifies the relationships between classes. In this research, we propose to improve the process of producing UML diagrams from natural language, helping software developers analyze software requirements more efficiently and with fewer errors. The proposed approach uses a parser and a part-of-speech (POS) tagger to analyze requirements entered by the user in English, and then extracts the verbs, phrases, and other elements from the user's text. The results show that the proposed method performs better than other methods published in the literature. It gives a better analysis of the given requirements and a better presentation of the diagrams, which can help software engineers.
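The abstract does not list its extraction rules. A widely used heuristic in this area maps nouns to candidate classes and verbs to candidate operations; the sketch below applies it to input that has already been POS-tagged with Penn Treebank tags. The heuristic, the function `extract_candidates`, and the sample sentence are illustrative assumptions rather than the proposed framework.

```python
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}  # Penn Treebank noun tags


def extract_candidates(tagged_tokens):
    """Collect candidate class names (nouns) and candidate operations (verbs), in first-seen order."""
    classes, operations = [], []
    for word, tag in tagged_tokens:
        if tag in NOUN_TAGS:
            name = word.capitalize()        # "customer" -> "Customer"
            if name not in classes:
                classes.append(name)
        elif tag.startswith("VB"):          # VB, VBD, VBG, VBN, VBP, VBZ
            operation = word.lower()
            if operation not in operations:
                operations.append(operation)
    return classes, operations
```

For "A customer places an order and pays the order", this yields the classes `Customer` and `Order` and the operations `places` and `pays`; a real tool would also lemmatize verbs and filter nouns that are attributes rather than classes.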


Advances in Computational Linguistics and Text Processing Frameworks

Advances in Computer and Electrical Engineering - Handbook of Research on Engineering Innovations and Technology Management in Organizations, 2020, pp. 217-244. doi:10.4018/978-1-7998-2772-6.ch012

Author(s): Ayush Srivastav, Hera Khan, Amit Kumar Mishra

Keyword(s): Neural Networks, Natural Language Processing, Natural Language, Computational Linguistics, Language Processing, Text Processing, Named Entity Recognition, Entity Recognition, Named Entity, Part Of Speech

The chapter provides an eloquent account of the major methodologies and advances in the field of natural language processing. It discusses the most popular models used over time for natural language processing tasks, along with their applications to those tasks. The chapter begins with the fundamental concepts of regular expressions (regex) and tokenization. It provides insight into text preprocessing and its methods, such as stemming and lemmatization and stop-word removal, followed by part-of-speech tagging and named entity recognition. The chapter then elaborates on word embeddings, their various types, and common frameworks such as word2vec, GloVe, and fastText. A brief description of classification algorithms used in natural language processing follows, then neural networks and their advanced forms, such as Recursive Neural Networks and Seq2seq models, that are used in computational linguistics. A brief description of chatbots and Memory Networks concludes the chapter.
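Regex-based tokenization and stop-word removal, the first preprocessing steps the chapter covers, can be sketched with Python's standard `re` module. The token pattern and the tiny stop-word list are assumptions for illustration; real systems use larger, language-specific lists.

```python
import re

# Words (optionally with one internal apostrophe, e.g. "cat's") or single punctuation marks.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "to"}  # illustrative, not exhaustive


def tokenize(text: str) -> list[str]:
    """Split text into word tokens and punctuation tokens, skipping whitespace."""
    return TOKEN_PATTERN.findall(text)


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Drop tokens whose lowercase form is in the stop-word list."""
    return [tok for tok in tokens if tok.lower() not in STOP_WORDS]
```

For example, `tokenize("The cat's toy is red.")` gives `["The", "cat's", "toy", "is", "red", "."]`, and removing stop words leaves `["cat's", "toy", "red", "."]`.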


Machine Learning in Natural Language Processing

Handbook of Research on Machine Learning Applications and Trends, 2010, pp. 302-324. doi:10.4018/978-1-60566-766-9.ch014

Author(s): Marina Sokolova, Stan Szpakowicz

Keyword(s): Machine Learning, Natural Language Processing, Natural Language, Language Processing, Text Processing, Word Sense Disambiguation, Machine Learning Techniques, Word Sense, Part Of Speech, Applications Of Machine Learning

This chapter presents applications of machine learning techniques to traditional problems in natural language processing, including part-of-speech tagging, entity recognition, and word-sense disambiguation. People usually solve such problems without difficulty, or at least do a very good job. Linguistics may suggest labour-intensive ways of manually constructing rule-based systems. It is, however, the easy availability of large text collections that has made machine learning the method of choice for processing volumes of data well beyond human capacity. One of the main purposes of text processing is information and knowledge extraction of all kinds from such large collections of text. The machine learning methods discussed in this chapter have stimulated wide-ranging research in natural language processing and helped build applications with serious deployment potential.
