Utilizing Morphological Features for Part-of-Speech Tagging of Bahasa Indonesia in Bidirectional LSTM

POS tagging is the foundation for developing text processing for a language. This study examines how using a lexicon and the morphological changes of words affect the choice of the correct tag for a word. Rules based on word morphology, such as prefixes, suffixes, and infixes, are commonly called lexical rules. This study applies lexical rules produced by the learner of the Brill Tagger algorithm. Madurese is a regional language spoken on Madura Island and several other islands in East Java. The object of this study is Madurese, which has far more affixation variants than Indonesian. In this study, the lexicon is used not only to look up Madurese root words but also as one of the stages of POS tagging. Experiments with the lexicon reached an accuracy of 86.61%, whereas without the lexicon accuracy reached only 28.95%. From this it can be concluded that the lexicon strongly influences POS tagging.
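The two-stage idea in this abstract (lexicon lookup first, affix-based lexical rules as a fallback) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the lexicon entries, the tag names, and the two affix rules are hypothetical, and they use Indonesian words for readability rather than Madurese data. In a Brill-style setup the rules would be learned from a tagged corpus instead of written by hand.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon (word -> tag); entries are illustrative only.
LEXICON: Dict[str, str] = {"makan": "VB", "buku": "NN", "saya": "PRP"}

# Hypothetical lexical rules: (affix, kind, tag).
# A real Brill-style learner would induce these from training data.
LEXICAL_RULES: List[Tuple[str, str, str]] = [
    ("me", "prefix", "VB"),
    ("an", "suffix", "NN"),
]

DEFAULT_TAG = "NN"  # assumed fallback tag when nothing else matches


def tag_word(word: str, use_lexicon: bool = True) -> str:
    """Tag one word: lexicon lookup first, then affix rules, then default."""
    w = word.lower()
    if use_lexicon and w in LEXICON:
        return LEXICON[w]
    for affix, kind, tag in LEXICAL_RULES:
        if len(w) <= len(affix):
            continue  # the affix must leave a non-empty stem
        if kind == "prefix" and w.startswith(affix):
            return tag
        if kind == "suffix" and w.endswith(affix):
            return tag
    return DEFAULT_TAG


def tag_sentence(words: List[str], use_lexicon: bool = True) -> List[Tuple[str, str]]:
    """Tag every word of a tokenized sentence."""
    return [(w, tag_word(w, use_lexicon)) for w in words]
```

With these toy rules, `tag_word("makan")` uses the lexicon entry, while `tag_word("makan", use_lexicon=False)` falls through to the suffix rule and gets a different tag, which illustrates in miniature why removing the lexicon can hurt accuracy.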

PENENTUAN KELAS KATA PADA PART OF SPEECH TAGGING KATA AMBIGU BAHASA INDONESIA

JISKA (Jurnal Informatika Sunan Kalijaga) ◽

10.14421/jiska.2018.23-05 ◽

2018 ◽

Vol 2 (3) ◽

pp. 157

Author(s):

Ahmad Subhan Yazid ◽

Agung Fatwanto

Keyword(s):

Language Processing ◽

Word Class ◽

Rule Based ◽

Part Of Speech Tagging ◽

Pos Tagging ◽

Part Of Speech ◽

Ambiguous Words ◽

Computer Science Faculty ◽

Speech Tagging ◽

Indonesian plays a fundamental role in communication, but its machine learning implementations face an ambiguity problem. In Natural Language Processing, Part of Speech (POS) tagging helps reduce this problem. This study uses a rule-based method to determine the best word class for ambiguous words in Indonesian. The research follows these stages: knowledge inventory, algorithm design, implementation, testing, analysis, and conclusions. The initial data is an Indonesian corpus developed by the Language department of the Computer Science Faculty, Indonesia University. The data is then processed and presented descriptively according to certain rules and specifications. The result is a POS tagging algorithm comprising 71 rules, expressed in flowchart and descriptive sentence notation. In testing, the algorithm correctly labeled 92 of 100 tested words (92%). The implementation results are influenced by the availability of rules, word class tagsets, and corpus data.
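A rule-based disambiguation step of the kind this abstract describes might look like the sketch below. The abstract does not list its 71 rules, so the candidate classes, the tag names, and both context rules here are hypothetical stand-ins; only the accuracy calculation (correct labels divided by tested words) mirrors the reported 92 of 100.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical candidate word classes for an ambiguous word.
# "bisa" is used only as an illustration (modal vs. noun reading).
AMBIGUOUS: Dict[str, List[str]] = {"bisa": ["MD", "NN"]}

Rule = Callable[[Optional[str], Optional[str]], Optional[str]]


def rule_modal_before_verb(prev_tag: Optional[str], next_tag: Optional[str]) -> Optional[str]:
    """Hypothetical rule: read the word as a modal when a verb follows."""
    return "MD" if next_tag == "VB" else None


def rule_noun_before_noun(prev_tag: Optional[str], next_tag: Optional[str]) -> Optional[str]:
    """Hypothetical rule: read the word as a noun when a noun follows."""
    return "NN" if next_tag == "NN" else None


RULES: Dict[str, List[Rule]] = {"bisa": [rule_modal_before_verb, rule_noun_before_noun]}


def disambiguate(word: str, prev_tag: Optional[str], next_tag: Optional[str]) -> Optional[str]:
    """Return a tag for an ambiguous word, or None if the word is not ambiguous."""
    candidates = AMBIGUOUS.get(word.lower())
    if candidates is None:
        return None
    for rule in RULES.get(word.lower(), []):
        tag = rule(prev_tag, next_tag)
        if tag in candidates:
            return tag
    return candidates[0]  # assumed fallback: first listed class


def accuracy(predicted: List[str], gold: List[str]) -> float:
    """Fraction of predicted labels that match the gold labels."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)
```

Rules are tried in order and the first one that returns a valid candidate wins, so rule ordering acts as a simple priority scheme.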

A comparison of different part-of-speech tagging technique for text in Bahasa Indonesia

2017 7th International Annual Engineering Seminar (InAES) ◽

10.1109/inaes.2017.8068538 ◽

2017 ◽

Cited By ~ 1

Author(s):

Ahmad Zuli Amrullah ◽

Rudy Hartanto ◽

I Wayan Mustika

Keyword(s):

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging ◽

Automatic Amharic Part of Speech Tagging (AAPOST): A Comparative Approach Using Bidirectional LSTM and Conditional Random Fields (CRF) Methods

Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering - Advances of Science and Technology ◽

10.1007/978-3-030-43690-2_37 ◽

2020 ◽

pp. 512-521

Author(s):

Worku Kelemework Birhanie ◽

Miriam Butt

Keyword(s):

Random Fields ◽

Conditional Random Fields ◽

Comparative Approach ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Bidirectional Lstm ◽

Evaluating the use of word embeddings for part-of-speech tagging in Bahasa Indonesia

2016 International Conference on Computer, Control, Informatics and its Applications (IC3INA) ◽

10.1109/ic3ina.2016.7863051 ◽

2016 ◽

Cited By ~ 4

Author(s):

Achmad F. Abka

Keyword(s):

Word Embeddings ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging ◽

Integration of morphological features and contextual weightage using monotonic chunk attention for part of speech tagging

Journal of King Saud University - Computer and Information Sciences ◽

10.1016/j.jksuci.2021.08.023 ◽

2021 ◽

Author(s):

Rajesh Kumar Mundotiya ◽

Arpit Mehta ◽

Rupjyoti Baruah ◽

Anil Kumar Singh

Keyword(s):

Morphological Features ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Research on Joint Models for Korean Word Spacing and POS (Part-Of-Speech) Tagging based on Bidirectional LSTM-CRF

Journal of KIISE ◽

10.5626/jok.2018.45.8.792 ◽

2018 ◽

Vol 45 (8) ◽

pp. 792-800 ◽

Cited By ~ 3

Author(s):

Seon-Wu Kim ◽

Sung-Pil Choi

Keyword(s):

Joint Models ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Bidirectional Lstm ◽

A standard tag set expounding traditional morphological features for Arabic language part-of-speech tagging

WORD Structure ◽

10.3366/word.2013.0035 ◽

2013 ◽

Vol 6 (1) ◽

pp. 43-99 ◽

Cited By ~ 4

Author(s):

Majdi Sawalha ◽

Eric Atwell

Keyword(s):

Arabic Language ◽

Morphological Features ◽

Inflectional Morphology ◽

Parts Of Speech ◽

Word Structure ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

European Languages ◽

Generic Design ◽

The SALMA Morphological Features Tag Set (SALMA, Sawalha Atwell Leeds Morphological Analysis tag set for Arabic) captures long-established traditional morphological features of grammar and Arabic, in a compact yet transparent notation. First, we introduce Part-of-Speech tagging and tag set standards for English and other European languages, and then survey Arabic Part-of-Speech taggers and corpora, and long-established Arabic traditions in analysis of morphology. A range of existing Arabic Part-of-Speech tag sets are illustrated and compared; and we review generic design criteria for corpus tag sets. For a morphologically-rich language like Arabic, the Part-of-Speech tag set should be defined in terms of morphological features characterizing word structure. We describe the SALMA Tag Set in detail, explaining and illustrating each feature and possible values. In our analysis, a tag consists of 22 characters; each position represents a feature and the letter at that location represents a value or attribute of the morphological feature; the dash ‘-’ represents a feature not relevant to a given word. The first character shows the main Parts of Speech, from: noun, verb, particle, punctuation, and Other (residual); these last two are an extension to the traditional three classes to handle modern texts. ‘Noun’ in Arabic subsumes what are traditionally referred to in English as ‘noun’ and ‘adjective’. The characters 2, 3, and 4 are used to represent subcategories; traditional Arabic grammar recognizes 34 subclasses of noun (letter 2), 3 subclasses of verb (letter 3), 21 subclasses of particle (letter 4). Others (residuals) and punctuation marks are represented in letters 5 and 6 respectively. 
The next letters represent traditional morphological features: gender (7), number (8), person (9), inflectional morphology (10), case or mood (11), case and mood marks (12), definiteness (13), voice (14), emphasized and non-emphasized (15), transitivity (16), rational (17), declension and conjugation (18). Finally, there are four characters representing morphological information which is useful in Arabic text analysis, although not all linguists would count these as traditional features: unaugmented and augmented (19), number of root letters (20), verb root (21), types of nouns according to their final letters (22). The SALMA Tag Set is not tied to a specific tagging algorithm or theory, and other tag sets could be mapped onto this standard, to simplify and promote comparisons between and reuse of Arabic taggers and tagged corpora.
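The positional layout described in this abstract (22 characters, one feature per position, a dash for features that do not apply) can be decoded with a few lines of code. The sketch below takes the position-to-feature mapping from the abstract; the function name and the short feature labels are assumptions, and it does not interpret the value letters, since the abstract does not list them.

```python
from typing import Dict, List

# Feature carried by each of the 22 tag positions, in order,
# as enumerated in the abstract (labels shortened for readability).
SALMA_POSITIONS: List[str] = [
    "main part of speech",              # 1
    "noun subclass",                    # 2
    "verb subclass",                    # 3
    "particle subclass",                # 4
    "other (residual)",                 # 5
    "punctuation",                      # 6
    "gender",                           # 7
    "number",                           # 8
    "person",                           # 9
    "inflectional morphology",          # 10
    "case or mood",                     # 11
    "case and mood marks",              # 12
    "definiteness",                     # 13
    "voice",                            # 14
    "emphasized and non-emphasized",    # 15
    "transitivity",                     # 16
    "rational",                         # 17
    "declension and conjugation",       # 18
    "unaugmented and augmented",        # 19
    "number of root letters",           # 20
    "verb root",                        # 21
    "noun type by final letters",       # 22
]


def decode_salma_tag(tag: str) -> Dict[str, str]:
    """Map each relevant position of a 22-character tag to its raw value letter.

    Positions holding '-' (feature not relevant to the word) are skipped.
    The value letters themselves are returned uninterpreted.
    """
    if len(tag) != len(SALMA_POSITIONS):
        raise ValueError(f"expected {len(SALMA_POSITIONS)} characters, got {len(tag)}")
    return {name: ch for name, ch in zip(SALMA_POSITIONS, tag) if ch != "-"}
```

For instance, a made-up tag such as `"ab" + "-" * 20` (the letters `a` and `b` are placeholders, not real SALMA values) decodes to a dictionary with entries for the main part of speech and the noun subclass only.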

Mongolian part-of-speech tagging approach based on conditional random fields

Journal of Computer Applications ◽

10.3724/sp.j.1087.2010.02038 ◽

2010 ◽

Vol 30 (8) ◽

pp. 2038-2040

Author(s):

Yu-long YING ◽

Miao LI ◽

bala Wuda ◽

Hai ZHU

Keyword(s):

Random Fields ◽

Conditional Random Fields ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

The Impact of Arabic Part of Speech Tagging on Sentiment Analysis: A New Corpus and Deep Learning Approach

Procedia Computer Science ◽

10.1016/j.procs.2021.03.026 ◽

2021 ◽

Vol 184 ◽

pp. 148-155

Author(s):

Abdul Munem Nerabie ◽

Manar AlKhatib ◽

Sujith Samuel Mathew ◽

May El Barachi ◽

Farhad Oroumchian

Keyword(s):

Deep Learning ◽

Sentiment Analysis ◽

Learning Approach ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

The Impact ◽